Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385573174> ?p ?o ?g. }
Showing items 1 to 67 of 67 with 100 items per page.
- W4385573174 abstract "English pretrained language models, which make up the backbone of many modern NLP systems, require huge amounts of unlabeled training data. These models are generally presented as being trained only on English text but have been found to transfer surprisingly well to other languages. We investigate this phenomenon and find that common English pretraining corpora actually contain significant amounts of non-English text: even when less than 1% of data is not English (well within the error rate of strong language classifiers), this leads to hundreds of millions of foreign language tokens in large-scale datasets. We then demonstrate that even these small percentages of non-English data facilitate cross-lingual transfer for models trained on them, with target language performance strongly correlated to the amount of in-language data seen during pretraining. In light of these findings, we argue that no model is truly monolingual when pretrained at scale, which should be considered when evaluating cross-lingual transfer." @default.
- W4385573174 created "2023-08-05" @default.
- W4385573174 creator A5067919401 @default.
- W4385573174 creator A5088311840 @default.
- W4385573174 date "2022-01-01" @default.
- W4385573174 modified "2023-09-24" @default.
- W4385573174 title "Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models" @default.
- W4385573174 doi "https://doi.org/10.18653/v1/2022.emnlp-main.233" @default.
- W4385573174 hasPublicationYear "2022" @default.
- W4385573174 type Work @default.
- W4385573174 citedByCount "0" @default.
- W4385573174 crossrefType "proceedings-article" @default.
- W4385573174 hasAuthorship W4385573174A5067919401 @default.
- W4385573174 hasAuthorship W4385573174A5088311840 @default.
- W4385573174 hasBestOaLocation W43855731741 @default.
- W4385573174 hasConcept C114010052 @default.
- W4385573174 hasConcept C116081451 @default.
- W4385573174 hasConcept C121332964 @default.
- W4385573174 hasConcept C129353971 @default.
- W4385573174 hasConcept C137293760 @default.
- W4385573174 hasConcept C138885662 @default.
- W4385573174 hasConcept C150899416 @default.
- W4385573174 hasConcept C154945302 @default.
- W4385573174 hasConcept C171041071 @default.
- W4385573174 hasConcept C173608175 @default.
- W4385573174 hasConcept C195324797 @default.
- W4385573174 hasConcept C204321447 @default.
- W4385573174 hasConcept C2776175482 @default.
- W4385573174 hasConcept C2778755073 @default.
- W4385573174 hasConcept C2987496018 @default.
- W4385573174 hasConcept C41008148 @default.
- W4385573174 hasConcept C41895202 @default.
- W4385573174 hasConcept C62520636 @default.
- W4385573174 hasConceptScore W4385573174C114010052 @default.
- W4385573174 hasConceptScore W4385573174C116081451 @default.
- W4385573174 hasConceptScore W4385573174C121332964 @default.
- W4385573174 hasConceptScore W4385573174C129353971 @default.
- W4385573174 hasConceptScore W4385573174C137293760 @default.
- W4385573174 hasConceptScore W4385573174C138885662 @default.
- W4385573174 hasConceptScore W4385573174C150899416 @default.
- W4385573174 hasConceptScore W4385573174C154945302 @default.
- W4385573174 hasConceptScore W4385573174C171041071 @default.
- W4385573174 hasConceptScore W4385573174C173608175 @default.
- W4385573174 hasConceptScore W4385573174C195324797 @default.
- W4385573174 hasConceptScore W4385573174C204321447 @default.
- W4385573174 hasConceptScore W4385573174C2776175482 @default.
- W4385573174 hasConceptScore W4385573174C2778755073 @default.
- W4385573174 hasConceptScore W4385573174C2987496018 @default.
- W4385573174 hasConceptScore W4385573174C41008148 @default.
- W4385573174 hasConceptScore W4385573174C41895202 @default.
- W4385573174 hasConceptScore W4385573174C62520636 @default.
- W4385573174 hasLocation W43855731741 @default.
- W4385573174 hasOpenAccess W4385573174 @default.
- W4385573174 hasPrimaryLocation W43855731741 @default.
- W4385573174 hasRelatedWork W2103132763 @default.
- W4385573174 hasRelatedWork W2279152923 @default.
- W4385573174 hasRelatedWork W2353848860 @default.
- W4385573174 hasRelatedWork W2359001871 @default.
- W4385573174 hasRelatedWork W2366315565 @default.
- W4385573174 hasRelatedWork W2377690353 @default.
- W4385573174 hasRelatedWork W2381222156 @default.
- W4385573174 hasRelatedWork W2947661788 @default.
- W4385573174 hasRelatedWork W4285152691 @default.
- W4385573174 hasRelatedWork W4319998608 @default.
- W4385573174 isParatext "false" @default.
- W4385573174 isRetracted "false" @default.
- W4385573174 workType "article" @default.