Matches in SemOpenAlex for { <https://semopenalex.org/work/W3099299360> ?p ?o ?g. }
- W3099299360 abstract "NLP is currently dominated by general-purpose pretrained language models like RoBERTa, which achieve strong performance on NLU tasks through pretraining on billions of words. But what exact knowledge or skills do Transformer LMs learn from large-scale pretraining that they cannot learn from less data? We adopt four probing methods---classifier probing, information-theoretic probing, unsupervised relative acceptability judgment, and fine-tuning on NLU tasks---and draw learning curves that track the growth of these different measures of linguistic ability with respect to pretraining data volume using the MiniBERTas, a group of RoBERTa models pretrained on 1M, 10M, 100M and 1B words. We find that LMs require only about 10M or 100M words to learn representations that reliably encode most syntactic and semantic features we test. A much larger quantity of data is needed in order to acquire enough commonsense knowledge and other skills required to master typical downstream NLU tasks. The results suggest that, while the ability to encode linguistic features is almost certainly necessary for language understanding, it is likely that other forms of knowledge are the major drivers of recent improvements in language understanding among large pretrained models." @default.
- W3099299360 created "2020-11-23" @default.
- W3099299360 creator A5007865421 @default.
- W3099299360 creator A5025675890 @default.
- W3099299360 creator A5067390670 @default.
- W3099299360 creator A5089452138 @default.
- W3099299360 date "2020-11-10" @default.
- W3099299360 modified "2023-10-11" @default.
- W3099299360 title "When Do You Need Billions of Words of Pretraining Data?" @default.
- W3099299360 cites W1711163617 @default.
- W3099299360 cites W1752492850 @default.
- W3099299360 cites W1998674455 @default.
- W3099299360 cites W2081580037 @default.
- W3099299360 cites W2105941071 @default.
- W3099299360 cites W2130158090 @default.
- W3099299360 cites W2145755360 @default.
- W3099299360 cites W2156523753 @default.
- W3099299360 cites W2250263931 @default.
- W3099299360 cites W2396767181 @default.
- W3099299360 cites W2516090925 @default.
- W3099299360 cites W2605221307 @default.
- W3099299360 cites W2771275742 @default.
- W3099299360 cites W2906152891 @default.
- W3099299360 cites W2946359678 @default.
- W3099299360 cites W2946417913 @default.
- W3099299360 cites W2946659172 @default.
- W3099299360 cites W2953369973 @default.
- W3099299360 cites W2956105246 @default.
- W3099299360 cites W2962776659 @default.
- W3099299360 cites W2963341956 @default.
- W3099299360 cites W2964121744 @default.
- W3099299360 cites W2965373594 @default.
- W3099299360 cites W2970862333 @default.
- W3099299360 cites W2971016963 @default.
- W3099299360 cites W2986154550 @default.
- W3099299360 cites W2990704537 @default.
- W3099299360 cites W2996728628 @default.
- W3099299360 cites W3004346089 @default.
- W3099299360 cites W3026404337 @default.
- W3099299360 cites W3030163527 @default.
- W3099299360 cites W3031001133 @default.
- W3099299360 cites W3034255912 @default.
- W3099299360 cites W3034775979 @default.
- W3099299360 cites W3035305735 @default.
- W3099299360 cites W3082274269 @default.
- W3099299360 cites W3085876565 @default.
- W3099299360 cites W3090805418 @default.
- W3099299360 cites W3092592453 @default.
- W3099299360 cites W3093002075 @default.
- W3099299360 cites W3098613713 @default.
- W3099299360 cites W3102226577 @default.
- W3099299360 cites W3105069964 @default.
- W3099299360 cites W3118485687 @default.
- W3099299360 cites W3168987555 @default.
- W3099299360 cites W2525127255 @default.
- W3099299360 doi "https://doi.org/10.48550/arxiv.2011.04946" @default.
- W3099299360 hasPublicationYear "2020" @default.
- W3099299360 type Work @default.
- W3099299360 sameAs 3099299360 @default.
- W3099299360 citedByCount "14" @default.
- W3099299360 countsByYear W30992993602020 @default.
- W3099299360 countsByYear W30992993602021 @default.
- W3099299360 crossrefType "posted-content" @default.
- W3099299360 hasAuthorship W3099299360A5007865421 @default.
- W3099299360 hasAuthorship W3099299360A5025675890 @default.
- W3099299360 hasAuthorship W3099299360A5067390670 @default.
- W3099299360 hasAuthorship W3099299360A5089452138 @default.
- W3099299360 hasBestOaLocation W30992993601 @default.
- W3099299360 hasConcept C104317684 @default.
- W3099299360 hasConcept C121332964 @default.
- W3099299360 hasConcept C137293760 @default.
- W3099299360 hasConcept C154945302 @default.
- W3099299360 hasConcept C161301231 @default.
- W3099299360 hasConcept C165801399 @default.
- W3099299360 hasConcept C185592680 @default.
- W3099299360 hasConcept C204321447 @default.
- W3099299360 hasConcept C2776145971 @default.
- W3099299360 hasConcept C30542707 @default.
- W3099299360 hasConcept C41008148 @default.
- W3099299360 hasConcept C55493867 @default.
- W3099299360 hasConcept C62520636 @default.
- W3099299360 hasConcept C66322947 @default.
- W3099299360 hasConcept C66746571 @default.
- W3099299360 hasConcept C95623464 @default.
- W3099299360 hasConceptScore W3099299360C104317684 @default.
- W3099299360 hasConceptScore W3099299360C121332964 @default.
- W3099299360 hasConceptScore W3099299360C137293760 @default.
- W3099299360 hasConceptScore W3099299360C154945302 @default.
- W3099299360 hasConceptScore W3099299360C161301231 @default.
- W3099299360 hasConceptScore W3099299360C165801399 @default.
- W3099299360 hasConceptScore W3099299360C185592680 @default.
- W3099299360 hasConceptScore W3099299360C204321447 @default.
- W3099299360 hasConceptScore W3099299360C2776145971 @default.
- W3099299360 hasConceptScore W3099299360C30542707 @default.
- W3099299360 hasConceptScore W3099299360C41008148 @default.
- W3099299360 hasConceptScore W3099299360C55493867 @default.
- W3099299360 hasConceptScore W3099299360C62520636 @default.
- W3099299360 hasConceptScore W3099299360C66322947 @default.
- W3099299360 hasConceptScore W3099299360C66746571 @default.
- W3099299360 hasConceptScore W3099299360C95623464 @default.
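The listing above enumerates the matches for the quad pattern in the header. As a minimal sketch (not part of the listing itself), the same bindings could be retrieved programmatically with a standard SPARQL query in which the graph position is expressed via `GRAPH ?g { ... }`; the endpoint URL `https://semopenalex.org/sparql` used below is an assumption and should be verified against the SemOpenAlex documentation.

```python
# Sketch: fetch ?p ?o ?g matches for the work above from a SPARQL endpoint.
# The endpoint URL is assumed, not taken from the listing.
import requests

ENDPOINT = "https://semopenalex.org/sparql"  # assumed SemOpenAlex endpoint
QUERY = """
SELECT ?p ?o ?g WHERE {
  GRAPH ?g { <https://semopenalex.org/work/W3099299360> ?p ?o . }
}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# Print each (predicate, object, graph) binding, mirroring the rows above.
for row in resp.json()["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"], row["g"]["value"])
```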