Matches in SemOpenAlex for { <https://semopenalex.org/work/W4306703986> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4306703986 abstract "We present M2D2, a fine-grained, massively multi-domain corpus for studying domain adaptation in language models (LMs). M2D2 consists of 8.5B tokens and spans 145 domains extracted from Wikipedia and Semantic Scholar. Using ontologies derived from Wikipedia and ArXiv categories, we organize the domains in each data source into 22 groups. This two-level hierarchy enables the study of relationships between domains and their effects on in- and out-of-domain performance after adaptation. We also present a number of insights into the nature of effective domain adaptation in LMs, as examples of the new types of studies M2D2 enables. To improve in-domain performance, we show the benefits of adapting the LM along a domain hierarchy; adapting to smaller amounts of fine-grained domain-specific data can lead to larger in-domain performance gains than larger amounts of weakly relevant data. We further demonstrate a trade-off between in-domain specialization and out-of-domain generalization within and across ontologies, as well as a strong correlation between out-of-domain performance and lexical overlap between domains." @default.
- W4306703986 created "2022-10-19" @default.
- W4306703986 creator A5067919401 @default.
- W4306703986 creator A5073571802 @default.
- W4306703986 creator A5075783850 @default.
- W4306703986 creator A5077994189 @default.
- W4306703986 date "2022-10-13" @default.
- W4306703986 modified "2023-09-29" @default.
- W4306703986 title "M2D2: A Massively Multi-domain Language Modeling Dataset" @default.
- W4306703986 doi "https://doi.org/10.48550/arxiv.2210.07370" @default.
- W4306703986 hasPublicationYear "2022" @default.
- W4306703986 type Work @default.
- W4306703986 citedByCount "0" @default.
- W4306703986 crossrefType "posted-content" @default.
- W4306703986 hasAuthorship W4306703986A5067919401 @default.
- W4306703986 hasAuthorship W4306703986A5073571802 @default.
- W4306703986 hasAuthorship W4306703986A5075783850 @default.
- W4306703986 hasAuthorship W4306703986A5077994189 @default.
- W4306703986 hasBestOaLocation W43067039861 @default.
- W4306703986 hasConcept C120665830 @default.
- W4306703986 hasConcept C121332964 @default.
- W4306703986 hasConcept C127313418 @default.
- W4306703986 hasConcept C134306372 @default.
- W4306703986 hasConcept C139807058 @default.
- W4306703986 hasConcept C154945302 @default.
- W4306703986 hasConcept C162324750 @default.
- W4306703986 hasConcept C17409809 @default.
- W4306703986 hasConcept C177148314 @default.
- W4306703986 hasConcept C193669473 @default.
- W4306703986 hasConcept C204321447 @default.
- W4306703986 hasConcept C2776434776 @default.
- W4306703986 hasConcept C2778648169 @default.
- W4306703986 hasConcept C31170391 @default.
- W4306703986 hasConcept C33923547 @default.
- W4306703986 hasConcept C34447519 @default.
- W4306703986 hasConcept C36503486 @default.
- W4306703986 hasConcept C41008148 @default.
- W4306703986 hasConcept C80444323 @default.
- W4306703986 hasConcept C85345410 @default.
- W4306703986 hasConcept C94184115 @default.
- W4306703986 hasConcept C95623464 @default.
- W4306703986 hasConceptScore W4306703986C120665830 @default.
- W4306703986 hasConceptScore W4306703986C121332964 @default.
- W4306703986 hasConceptScore W4306703986C127313418 @default.
- W4306703986 hasConceptScore W4306703986C134306372 @default.
- W4306703986 hasConceptScore W4306703986C139807058 @default.
- W4306703986 hasConceptScore W4306703986C154945302 @default.
- W4306703986 hasConceptScore W4306703986C162324750 @default.
- W4306703986 hasConceptScore W4306703986C17409809 @default.
- W4306703986 hasConceptScore W4306703986C177148314 @default.
- W4306703986 hasConceptScore W4306703986C193669473 @default.
- W4306703986 hasConceptScore W4306703986C204321447 @default.
- W4306703986 hasConceptScore W4306703986C2776434776 @default.
- W4306703986 hasConceptScore W4306703986C2778648169 @default.
- W4306703986 hasConceptScore W4306703986C31170391 @default.
- W4306703986 hasConceptScore W4306703986C33923547 @default.
- W4306703986 hasConceptScore W4306703986C34447519 @default.
- W4306703986 hasConceptScore W4306703986C36503486 @default.
- W4306703986 hasConceptScore W4306703986C41008148 @default.
- W4306703986 hasConceptScore W4306703986C80444323 @default.
- W4306703986 hasConceptScore W4306703986C85345410 @default.
- W4306703986 hasConceptScore W4306703986C94184115 @default.
- W4306703986 hasConceptScore W4306703986C95623464 @default.
- W4306703986 hasLocation W43067039861 @default.
- W4306703986 hasOpenAccess W4306703986 @default.
- W4306703986 hasPrimaryLocation W43067039861 @default.
- W4306703986 hasRelatedWork W2152148513 @default.
- W4306703986 hasRelatedWork W2280198878 @default.
- W4306703986 hasRelatedWork W2783380393 @default.
- W4306703986 hasRelatedWork W2951876757 @default.
- W4306703986 hasRelatedWork W3096565154 @default.
- W4306703986 hasRelatedWork W4281397339 @default.
- W4306703986 hasRelatedWork W4283450023 @default.
- W4306703986 hasRelatedWork W4306703986 @default.
- W4306703986 hasRelatedWork W4312910505 @default.
- W4306703986 hasRelatedWork W4376166954 @default.
- W4306703986 isParatext "false" @default.
- W4306703986 isRetracted "false" @default.
- W4306703986 workType "article" @default.