Matches in SemOpenAlex for { <https://semopenalex.org/work/W3103458843> ?p ?o ?g. }
- W3103458843 endingPage "113010" @default.
- W3103458843 startingPage "113010" @default.
- W3103458843 abstract "In this paper we combine statistical analysis of large text databases and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. Besides the sublinear scaling of the vocabulary size with database size (Heaps' law), here we report a new scaling of the fluctuations around this average (fluctuation scaling analysis). We explain both scaling laws by modeling the usage of words by simple stochastic processes in which the overall distribution of word-frequencies is fat tailed (Zipf's law) and the frequency of a single word is subject to fluctuations across documents (as in topic models). In this framework, the mean and the variance of the vocabulary size can be expressed as quenched averages, implying that: i) the inhomogeneous dissemination of words causes a reduction of the average vocabulary size in comparison to the homogeneous case, and ii) correlations in the co-occurrence of words lead to an increase in the variance and the vocabulary size becomes a non-self-averaging quantity. We address the implications of these observations to the measurement of lexical richness. We test our results in three large text databases (Google-ngram, English Wikipedia, and a collection of scientific articles)." @default.
- W3103458843 created "2020-11-23" @default.
- W3103458843 creator A5008805047 @default.
- W3103458843 creator A5023812700 @default.
- W3103458843 date "2014-11-04" @default.
- W3103458843 modified "2023-09-26" @default.
- W3103458843 title "Scaling laws and fluctuations in the statistics of word frequencies" @default.
- W3103458843 cites W1509465553 @default.
- W3103458843 cites W1532634507 @default.
- W3103458843 cites W1964274103 @default.
- W3103458843 cites W1973589994 @default.
- W3103458843 cites W1984515987 @default.
- W3103458843 cites W1989338016 @default.
- W3103458843 cites W1993803315 @default.
- W3103458843 cites W2008203686 @default.
- W3103458843 cites W2010846043 @default.
- W3103458843 cites W2015087122 @default.
- W3103458843 cites W2017392697 @default.
- W3103458843 cites W2019096529 @default.
- W3103458843 cites W2030678494 @default.
- W3103458843 cites W2060022568 @default.
- W3103458843 cites W2061114779 @default.
- W3103458843 cites W2061491895 @default.
- W3103458843 cites W2080843536 @default.
- W3103458843 cites W2083743389 @default.
- W3103458843 cites W2096602434 @default.
- W3103458843 cites W2113813706 @default.
- W3103458843 cites W2115054880 @default.
- W3103458843 cites W2119877592 @default.
- W3103458843 cites W2138821388 @default.
- W3103458843 cites W2144022212 @default.
- W3103458843 cites W2161291053 @default.
- W3103458843 cites W2174706414 @default.
- W3103458843 cites W2323648741 @default.
- W3103458843 cites W2950627632 @default.
- W3103458843 cites W3098759601 @default.
- W3103458843 cites W3099247429 @default.
- W3103458843 cites W3103059940 @default.
- W3103458843 cites W3103362336 @default.
- W3103458843 cites W4213009331 @default.
- W3103458843 cites W4214686357 @default.
- W3103458843 cites W4238346259 @default.
- W3103458843 cites W4243394285 @default.
- W3103458843 cites W4255116730 @default.
- W3103458843 doi "https://doi.org/10.1088/1367-2630/16/11/113010" @default.
- W3103458843 hasPublicationYear "2014" @default.
- W3103458843 type Work @default.
- W3103458843 sameAs 3103458843 @default.
- W3103458843 citedByCount "46" @default.
- W3103458843 countsByYear W31034588432015 @default.
- W3103458843 countsByYear W31034588432016 @default.
- W3103458843 countsByYear W31034588432017 @default.
- W3103458843 countsByYear W31034588432018 @default.
- W3103458843 countsByYear W31034588432019 @default.
- W3103458843 countsByYear W31034588432020 @default.
- W3103458843 countsByYear W31034588432021 @default.
- W3103458843 countsByYear W31034588432022 @default.
- W3103458843 countsByYear W31034588432023 @default.
- W3103458843 crossrefType "journal-article" @default.
- W3103458843 hasAuthorship W3103458843A5008805047 @default.
- W3103458843 hasAuthorship W3103458843A5023812700 @default.
- W3103458843 hasBestOaLocation W31034588431 @default.
- W3103458843 hasConcept C105795698 @default.
- W3103458843 hasConcept C121332964 @default.
- W3103458843 hasConcept C121864883 @default.
- W3103458843 hasConcept C121955636 @default.
- W3103458843 hasConcept C125932096 @default.
- W3103458843 hasConcept C138885662 @default.
- W3103458843 hasConcept C144133560 @default.
- W3103458843 hasConcept C149782125 @default.
- W3103458843 hasConcept C154945302 @default.
- W3103458843 hasConcept C175293574 @default.
- W3103458843 hasConcept C196083921 @default.
- W3103458843 hasConcept C2524010 @default.
- W3103458843 hasConcept C2777530160 @default.
- W3103458843 hasConcept C2777601683 @default.
- W3103458843 hasConcept C2988430800 @default.
- W3103458843 hasConcept C33923547 @default.
- W3103458843 hasConcept C41008148 @default.
- W3103458843 hasConcept C41895202 @default.
- W3103458843 hasConcept C43596424 @default.
- W3103458843 hasConcept C90805587 @default.
- W3103458843 hasConcept C99844830 @default.
- W3103458843 hasConceptScore W3103458843C105795698 @default.
- W3103458843 hasConceptScore W3103458843C121332964 @default.
- W3103458843 hasConceptScore W3103458843C121864883 @default.
- W3103458843 hasConceptScore W3103458843C121955636 @default.
- W3103458843 hasConceptScore W3103458843C125932096 @default.
- W3103458843 hasConceptScore W3103458843C138885662 @default.
- W3103458843 hasConceptScore W3103458843C144133560 @default.
- W3103458843 hasConceptScore W3103458843C149782125 @default.
- W3103458843 hasConceptScore W3103458843C154945302 @default.
- W3103458843 hasConceptScore W3103458843C175293574 @default.
- W3103458843 hasConceptScore W3103458843C196083921 @default.
- W3103458843 hasConceptScore W3103458843C2524010 @default.
- W3103458843 hasConceptScore W3103458843C2777530160 @default.
- W3103458843 hasConceptScore W3103458843C2777601683 @default.
- W3103458843 hasConceptScore W3103458843C2988430800 @default.