Matches in SemOpenAlex for { <https://semopenalex.org/work/W1568996612> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W1568996612 abstract "The statistical NLP and IR literatures tend to make a “homogeneity assumption” about the distribution of terms, either by adopting a “bag of words” model, or in their treatment of function words. In this paper we develop a notion of homogeneity detection to a level of statistical significance, and conduct a series of experiments on different datasets, to show that the homogeneity assumption does not generally hold. We show that it also does not hold for function words. Importantly, datasets and document collections are found not to be neutral with respect to the property of homogeneity, even for function words. The homogeneity assumption is defeated substantially even for collections known to contain similar documents, and more drastically for diverse collections. We conclude that it is statistically unreasonable to assume that word distribution within a corpus is homogeneous. Because homogeneity findings differ substantially between different collections, we argue for the use of homogeneity measures as a means of profiling datasets." @default.
- W1568996612 created "2016-06-24" @default.
- W1568996612 creator A5015098106 @default.
- W1568996612 creator A5037904300 @default.
- W1568996612 creator A5087301006 @default.
- W1568996612 date "2004-03-01" @default.
- W1568996612 modified "2023-09-24" @default.
- W1568996612 title "Defeating the Homogeneity Assumption" @default.
- W1568996612 cites W147918118 @default.
- W1568996612 cites W1574901103 @default.
- W1568996612 cites W1664671319 @default.
- W1568996612 cites W1784531618 @default.
- W1568996612 cites W1973551733 @default.
- W1568996612 cites W1975690018 @default.
- W1568996612 cites W2066651136 @default.
- W1568996612 cites W2113110240 @default.
- W1568996612 cites W2116780029 @default.
- W1568996612 cites W2787797090 @default.
- W1568996612 hasPublicationYear "2004" @default.
- W1568996612 type Work @default.
- W1568996612 sameAs 1568996612 @default.
- W1568996612 citedByCount "6" @default.
- W1568996612 countsByYear W15689966122017 @default.
- W1568996612 crossrefType "journal-article" @default.
- W1568996612 hasAuthorship W1568996612A5015098106 @default.
- W1568996612 hasAuthorship W1568996612A5037904300 @default.
- W1568996612 hasAuthorship W1568996612A5087301006 @default.
- W1568996612 hasConcept C105795698 @default.
- W1568996612 hasConcept C114614502 @default.
- W1568996612 hasConcept C142259097 @default.
- W1568996612 hasConcept C149782125 @default.
- W1568996612 hasConcept C154945302 @default.
- W1568996612 hasConcept C33923547 @default.
- W1568996612 hasConcept C41008148 @default.
- W1568996612 hasConcept C66882249 @default.
- W1568996612 hasConceptScore W1568996612C105795698 @default.
- W1568996612 hasConceptScore W1568996612C114614502 @default.
- W1568996612 hasConceptScore W1568996612C142259097 @default.
- W1568996612 hasConceptScore W1568996612C149782125 @default.
- W1568996612 hasConceptScore W1568996612C154945302 @default.
- W1568996612 hasConceptScore W1568996612C33923547 @default.
- W1568996612 hasConceptScore W1568996612C41008148 @default.
- W1568996612 hasConceptScore W1568996612C66882249 @default.
- W1568996612 hasLocation W15689966121 @default.
- W1568996612 hasOpenAccess W1568996612 @default.
- W1568996612 hasPrimaryLocation W15689966121 @default.
- W1568996612 hasRelatedWork W119366804 @default.
- W1568996612 hasRelatedWork W1664671319 @default.
- W1568996612 hasRelatedWork W1967728069 @default.
- W1568996612 hasRelatedWork W1975400356 @default.
- W1568996612 hasRelatedWork W2010784539 @default.
- W1568996612 hasRelatedWork W2012606707 @default.
- W1568996612 hasRelatedWork W2017686852 @default.
- W1568996612 hasRelatedWork W2040067411 @default.
- W1568996612 hasRelatedWork W2085384568 @default.
- W1568996612 hasRelatedWork W2092504635 @default.
- W1568996612 hasRelatedWork W2118309356 @default.
- W1568996612 hasRelatedWork W2474499347 @default.
- W1568996612 hasRelatedWork W3118555138 @default.
- W1568996612 isParatext "false" @default.
- W1568996612 isRetracted "false" @default.
- W1568996612 magId "1568996612" @default.
- W1568996612 workType "article" @default.