Matches in SemOpenAlex for { <https://semopenalex.org/work/W4382699765> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
- W4382699765 abstract "Resource constraints often require researchers to restrict their attention to a subset of the tokens returned by a corpus query. This paper sketches a methodology for down-sampling and offers a survey of current practices. The most prevalent approach, drawing a random sample from the list of corpus hits, has been shown to be inefficient if tokens are clustered by text file. We extend the evaluation of down-sampling designs to settings where tokens are also clustered by lexical item. Our case study, which deals with the replacement of third-person present-tense verb inflection -(e)th by -(e)s in Early Modern English, focuses on five predictors: time, gender, genre, frequency, and phonological context. Assuming we are able to analyze only 2,000 (out of 12,244) tokens, we compare two strategies for selecting a sub-sample of this size: simple down-sampling, where each hit has the same probability of being selected; and structured down-sampling, where this probability is inversely proportional to the author- and verb-specific token count. We form 500 sub-samples using each scheme and compare estimates based on mixed-effects logistic regression to a reference model fit to the full set of cases. We observe that structured down-sampling outperforms simple down-sampling on several evaluation criteria." @default.
- W4382699765 created "2023-07-01" @default.
- W4382699765 creator A5067490399 @default.
- W4382699765 date "2023-06-29" @default.
- W4382699765 modified "2023-09-23" @default.
- W4382699765 title "Down-sampling from hierarchically structured corpus data" @default.
- W4382699765 doi "https://doi.org/10.31234/osf.io/4vtja" @default.
- W4382699765 hasPublicationYear "2023" @default.
- W4382699765 type Work @default.
- W4382699765 citedByCount "0" @default.
- W4382699765 crossrefType "posted-content" @default.
- W4382699765 hasAuthorship W4382699765A5067490399 @default.
- W4382699765 hasBestOaLocation W43826997651 @default.
- W4382699765 hasConcept C105795698 @default.
- W4382699765 hasConcept C106131492 @default.
- W4382699765 hasConcept C119857082 @default.
- W4382699765 hasConcept C140779682 @default.
- W4382699765 hasConcept C144024400 @default.
- W4382699765 hasConcept C149923435 @default.
- W4382699765 hasConcept C151730666 @default.
- W4382699765 hasConcept C151956035 @default.
- W4382699765 hasConcept C154945302 @default.
- W4382699765 hasConcept C159403335 @default.
- W4382699765 hasConcept C177264268 @default.
- W4382699765 hasConcept C185592680 @default.
- W4382699765 hasConcept C198531522 @default.
- W4382699765 hasConcept C199360897 @default.
- W4382699765 hasConcept C20353970 @default.
- W4382699765 hasConcept C204321447 @default.
- W4382699765 hasConcept C2776397901 @default.
- W4382699765 hasConcept C2779343474 @default.
- W4382699765 hasConcept C2908647359 @default.
- W4382699765 hasConcept C31972630 @default.
- W4382699765 hasConcept C33923547 @default.
- W4382699765 hasConcept C38652104 @default.
- W4382699765 hasConcept C41008148 @default.
- W4382699765 hasConcept C43617362 @default.
- W4382699765 hasConcept C48145219 @default.
- W4382699765 hasConcept C49898467 @default.
- W4382699765 hasConcept C75373757 @default.
- W4382699765 hasConcept C86803240 @default.
- W4382699765 hasConceptScore W4382699765C105795698 @default.
- W4382699765 hasConceptScore W4382699765C106131492 @default.
- W4382699765 hasConceptScore W4382699765C119857082 @default.
- W4382699765 hasConceptScore W4382699765C140779682 @default.
- W4382699765 hasConceptScore W4382699765C144024400 @default.
- W4382699765 hasConceptScore W4382699765C149923435 @default.
- W4382699765 hasConceptScore W4382699765C151730666 @default.
- W4382699765 hasConceptScore W4382699765C151956035 @default.
- W4382699765 hasConceptScore W4382699765C154945302 @default.
- W4382699765 hasConceptScore W4382699765C159403335 @default.
- W4382699765 hasConceptScore W4382699765C177264268 @default.
- W4382699765 hasConceptScore W4382699765C185592680 @default.
- W4382699765 hasConceptScore W4382699765C198531522 @default.
- W4382699765 hasConceptScore W4382699765C199360897 @default.
- W4382699765 hasConceptScore W4382699765C20353970 @default.
- W4382699765 hasConceptScore W4382699765C204321447 @default.
- W4382699765 hasConceptScore W4382699765C2776397901 @default.
- W4382699765 hasConceptScore W4382699765C2779343474 @default.
- W4382699765 hasConceptScore W4382699765C2908647359 @default.
- W4382699765 hasConceptScore W4382699765C31972630 @default.
- W4382699765 hasConceptScore W4382699765C33923547 @default.
- W4382699765 hasConceptScore W4382699765C38652104 @default.
- W4382699765 hasConceptScore W4382699765C41008148 @default.
- W4382699765 hasConceptScore W4382699765C43617362 @default.
- W4382699765 hasConceptScore W4382699765C48145219 @default.
- W4382699765 hasConceptScore W4382699765C49898467 @default.
- W4382699765 hasConceptScore W4382699765C75373757 @default.
- W4382699765 hasConceptScore W4382699765C86803240 @default.
- W4382699765 hasLocation W43826997651 @default.
- W4382699765 hasOpenAccess W4382699765 @default.
- W4382699765 hasPrimaryLocation W43826997651 @default.
- W4382699765 hasRelatedWork W1605393171 @default.
- W4382699765 hasRelatedWork W2056178673 @default.
- W4382699765 hasRelatedWork W2075114053 @default.
- W4382699765 hasRelatedWork W2350399852 @default.
- W4382699765 hasRelatedWork W2555206834 @default.
- W4382699765 hasRelatedWork W2595070305 @default.
- W4382699765 hasRelatedWork W2605994566 @default.
- W4382699765 hasRelatedWork W3047864323 @default.
- W4382699765 hasRelatedWork W3126132007 @default.
- W4382699765 hasRelatedWork W567739382 @default.
- W4382699765 isParatext "false" @default.
- W4382699765 isRetracted "false" @default.
- W4382699765 workType "article" @default.