Matches in SemOpenAlex for { <https://semopenalex.org/work/W4281716122> ?p ?o ?g. }
- W4281716122 endingPage "103377" @default.
- W4281716122 startingPage "103377" @default.
- W4281716122 abstract "This paper experiments with frequency-based corpus similarity measures across 39 languages using a register prediction task. The goal is to quantify (i) the distance between different corpora from the same language and (ii) the homogeneity of individual corpora. Both of these goals are essential for measuring how well corpus-based linguistic analysis generalizes from one dataset to another. The problem is that previous work has focused on Indo-European languages, raising the question of whether these measures are able to provide robust generalizations across diverse languages. This paper uses a register prediction task to evaluate competing measures across 39 languages: how well are they able to distinguish between corpora representing different contexts of production? Each experiment compares three corpora from a single language, with the same three digital registers shared across all languages: social media, web pages, and Wikipedia. Results show that measures of corpus similarity retain their validity across different language families, writing systems, and types of morphology. Further, the measures remain robust when evaluated on out-of-domain corpora, when applied to low-resource languages, and when applied to different sets of registers. These findings are significant given our need to make generalizations across the rapidly increasing number of corpora available for analysis." @default.
- W4281716122 created "2022-06-13" @default.
- W4281716122 creator A5028523606 @default.
- W4281716122 creator A5047552450 @default.
- W4281716122 date "2022-08-01" @default.
- W4281716122 modified "2023-10-10" @default.
- W4281716122 title "Corpus similarity measures remain robust across diverse languages" @default.
- W4281716122 cites W1501311083 @default.
- W4281716122 cites W2006539462 @default.
- W4281716122 cites W2008030627 @default.
- W4281716122 cites W2038178664 @default.
- W4281716122 cites W2041532239 @default.
- W4281716122 cites W2059396130 @default.
- W4281716122 cites W2161411600 @default.
- W4281716122 cites W2571688478 @default.
- W4281716122 cites W2611873983 @default.
- W4281716122 cites W2886308453 @default.
- W4281716122 cites W2894950577 @default.
- W4281716122 cites W3009487227 @default.
- W4281716122 cites W3100806282 @default.
- W4281716122 cites W3106272061 @default.
- W4281716122 cites W4211148787 @default.
- W4281716122 doi "https://doi.org/10.1016/j.lingua.2022.103377" @default.
- W4281716122 hasPublicationYear "2022" @default.
- W4281716122 type Work @default.
- W4281716122 citedByCount "1" @default.
- W4281716122 countsByYear W42817161222022 @default.
- W4281716122 crossrefType "journal-article" @default.
- W4281716122 hasAuthorship W4281716122A5028523606 @default.
- W4281716122 hasAuthorship W4281716122A5047552450 @default.
- W4281716122 hasBestOaLocation W42817161222 @default.
- W4281716122 hasConcept C103278499 @default.
- W4281716122 hasConcept C115961682 @default.
- W4281716122 hasConcept C121332964 @default.
- W4281716122 hasConcept C138885662 @default.
- W4281716122 hasConcept C154945302 @default.
- W4281716122 hasConcept C162324750 @default.
- W4281716122 hasConcept C187736073 @default.
- W4281716122 hasConcept C203005215 @default.
- W4281716122 hasConcept C204321447 @default.
- W4281716122 hasConcept C2778334786 @default.
- W4281716122 hasConcept C2780451532 @default.
- W4281716122 hasConcept C2985367798 @default.
- W4281716122 hasConcept C41008148 @default.
- W4281716122 hasConcept C41895202 @default.
- W4281716122 hasConcept C44870925 @default.
- W4281716122 hasConceptScore W4281716122C103278499 @default.
- W4281716122 hasConceptScore W4281716122C115961682 @default.
- W4281716122 hasConceptScore W4281716122C121332964 @default.
- W4281716122 hasConceptScore W4281716122C138885662 @default.
- W4281716122 hasConceptScore W4281716122C154945302 @default.
- W4281716122 hasConceptScore W4281716122C162324750 @default.
- W4281716122 hasConceptScore W4281716122C187736073 @default.
- W4281716122 hasConceptScore W4281716122C203005215 @default.
- W4281716122 hasConceptScore W4281716122C204321447 @default.
- W4281716122 hasConceptScore W4281716122C2778334786 @default.
- W4281716122 hasConceptScore W4281716122C2780451532 @default.
- W4281716122 hasConceptScore W4281716122C2985367798 @default.
- W4281716122 hasConceptScore W4281716122C41008148 @default.
- W4281716122 hasConceptScore W4281716122C41895202 @default.
- W4281716122 hasConceptScore W4281716122C44870925 @default.
- W4281716122 hasLocation W42817161221 @default.
- W4281716122 hasLocation W42817161222 @default.
- W4281716122 hasOpenAccess W4281716122 @default.
- W4281716122 hasPrimaryLocation W42817161221 @default.
- W4281716122 hasRelatedWork W2123678043 @default.
- W4281716122 hasRelatedWork W2125885330 @default.
- W4281716122 hasRelatedWork W2143927888 @default.
- W4281716122 hasRelatedWork W2242707303 @default.
- W4281716122 hasRelatedWork W2251265917 @default.
- W4281716122 hasRelatedWork W2757753881 @default.
- W4281716122 hasRelatedWork W2902262852 @default.
- W4281716122 hasRelatedWork W3123064333 @default.
- W4281716122 hasRelatedWork W37319627 @default.
- W4281716122 hasRelatedWork W1785384086 @default.
- W4281716122 hasVolume "275" @default.
- W4281716122 isParatext "false" @default.
- W4281716122 isRetracted "false" @default.
- W4281716122 workType "article" @default.