Matches in SemOpenAlex for { <https://semopenalex.org/work/W3174129638> ?p ?o ?g. }
- W3174129638 abstract "Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to models being pretrained on large datasets, they can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning with each NLI subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an alternative to the commonly used Multi-Layer Perceptron (MLP) classification head. GBDTs have desirable properties such as good performance on dense, numerical features and are effective where the ratio of the number of samples w.r.t. the number of features is low. We then introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network. We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model (RoBERTa-large with MNLI pretraining). The FreeGBDT shows a consistent improvement over the MLP classification head." @default.
- W3174129638 created "2021-07-05" @default.
- W3174129638 creator A5001624616 @default.
- W3174129638 creator A5011959340 @default.
- W3174129638 creator A5017419164 @default.
- W3174129638 date "2021-01-01" @default.
- W3174129638 modified "2023-10-03" @default.
- W3174129638 title "Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning" @default.
- W3174129638 cites W1565746575 @default.
- W3174129638 cites W1594031697 @default.
- W3174129638 cites W1678356000 @default.
- W3174129638 cites W1840435438 @default.
- W3174129638 cites W2073302931 @default.
- W3174129638 cites W2130158090 @default.
- W3174129638 cites W2145073242 @default.
- W3174129638 cites W2194775991 @default.
- W3174129638 cites W2220384803 @default.
- W3174129638 cites W2270070752 @default.
- W3174129638 cites W2396767181 @default.
- W3174129638 cites W2592340788 @default.
- W3174129638 cites W2737706773 @default.
- W3174129638 cites W2761700016 @default.
- W3174129638 cites W2768348081 @default.
- W3174129638 cites W2779834058 @default.
- W3174129638 cites W2808986925 @default.
- W3174129638 cites W2891308403 @default.
- W3174129638 cites W2899370105 @default.
- W3174129638 cites W2911964244 @default.
- W3174129638 cites W2912934387 @default.
- W3174129638 cites W2933138175 @default.
- W3174129638 cites W2946659172 @default.
- W3174129638 cites W2947404296 @default.
- W3174129638 cites W2950445386 @default.
- W3174129638 cites W2962739339 @default.
- W3174129638 cites W2963026768 @default.
- W3174129638 cites W2963310665 @default.
- W3174129638 cites W2963341956 @default.
- W3174129638 cites W2963403868 @default.
- W3174129638 cites W2963748441 @default.
- W3174129638 cites W2963846996 @default.
- W3174129638 cites W2963961878 @default.
- W3174129638 cites W2964054038 @default.
- W3174129638 cites W2964121744 @default.
- W3174129638 cites W2965373594 @default.
- W3174129638 cites W2970597249 @default.
- W3174129638 cites W2980282514 @default.
- W3174129638 cites W2990704537 @default.
- W3174129638 cites W2994934025 @default.
- W3174129638 cites W3005854269 @default.
- W3174129638 cites W3006647218 @default.
- W3174129638 cites W3017003177 @default.
- W3174129638 cites W3026404337 @default.
- W3174129638 cites W3034255912 @default.
- W3174129638 cites W3034850762 @default.
- W3174129638 cites W3082274269 @default.
- W3174129638 cites W3105721709 @default.
- W3174129638 cites W3128654100 @default.
- W3174129638 cites W1857789879 @default.
- W3174129638 cites W2525127255 @default.
- W3174129638 doi "https://doi.org/10.18653/v1/2021.findings-acl.26" @default.
- W3174129638 hasPublicationYear "2021" @default.
- W3174129638 type Work @default.
- W3174129638 sameAs 3174129638 @default.
- W3174129638 citedByCount "1" @default.
- W3174129638 countsByYear W31741296382023 @default.
- W3174129638 crossrefType "proceedings-article" @default.
- W3174129638 hasAuthorship W3174129638A5001624616 @default.
- W3174129638 hasAuthorship W3174129638A5011959340 @default.
- W3174129638 hasAuthorship W3174129638A5017419164 @default.
- W3174129638 hasBestOaLocation W31741296381 @default.
- W3174129638 hasConcept C11413529 @default.
- W3174129638 hasConcept C114793014 @default.
- W3174129638 hasConcept C119857082 @default.
- W3174129638 hasConcept C121332964 @default.
- W3174129638 hasConcept C127313418 @default.
- W3174129638 hasConcept C153180895 @default.
- W3174129638 hasConcept C154945302 @default.
- W3174129638 hasConcept C162324750 @default.
- W3174129638 hasConcept C165801399 @default.
- W3174129638 hasConcept C179717631 @default.
- W3174129638 hasConcept C187736073 @default.
- W3174129638 hasConcept C204321447 @default.
- W3174129638 hasConcept C2780312720 @default.
- W3174129638 hasConcept C2780451532 @default.
- W3174129638 hasConcept C41008148 @default.
- W3174129638 hasConcept C45374587 @default.
- W3174129638 hasConcept C50644808 @default.
- W3174129638 hasConcept C60908668 @default.
- W3174129638 hasConcept C62520636 @default.
- W3174129638 hasConcept C66322947 @default.
- W3174129638 hasConceptScore W3174129638C11413529 @default.
- W3174129638 hasConceptScore W3174129638C114793014 @default.
- W3174129638 hasConceptScore W3174129638C119857082 @default.
- W3174129638 hasConceptScore W3174129638C121332964 @default.
- W3174129638 hasConceptScore W3174129638C127313418 @default.
- W3174129638 hasConceptScore W3174129638C153180895 @default.
- W3174129638 hasConceptScore W3174129638C154945302 @default.
- W3174129638 hasConceptScore W3174129638C162324750 @default.
- W3174129638 hasConceptScore W3174129638C165801399 @default.
- W3174129638 hasConceptScore W3174129638C179717631 @default.