Matches in SemOpenAlex for { <https://semopenalex.org/work/W2588474037> ?p ?o ?g. }
Showing items 1 to 93 of 93 with 100 items per page.
- W2588474037 abstract "Defining New Words in Corpus Data: Productivity of English Suffixes in the British National Corpus Eiji Nishimoto (enishimoto@gc.cuny.edu) Ph.D. Program in Linguistics, The Graduate Center The City University of New York 365 Fifth Avenue, New York, NY 10016 USA Abstract The present study introduces a method of identifying potentially new words in a large corpus of texts, and assesses the morphological productivity of 12 English suffixes, based on some 78 million words of the written component (books and periodicals) of the British National Corpus (BNC). The method compares two corpus segments (created by randomly sampling at the level of documents within the BNC), and defines new words as those that are not shared across segments (segments being interpreted as randomly sampled speaker groups). The approach taken differs from others in the literature in that new words are identified irrespective of how many times a given word is used by the same speaker (author). A productivity ranking of the 12 English suffixes is obtained, and the results are shown to be intuitively satisfying and stable over different sample sizes. With a psycholinguistic interpretation of the data, implications for the nature of intuitions about productivity are considered. Introduction Morphological productivity is central to the study of word formation, but it continues to defy a solid, uniform description (see e.g., Aronoff, 1976; Bauer, 2001; Plag, 1999). The coinage of a “new” word is abundant in our daily use of language; for example, a person who is being gossiped about may be referred to as a gossipee, or a used book may be cleanish. 
Affixation in English (as in gossip + -ee → gossipee; clean + -ish → cleanish) is a productive word formation process, and there is plenty of evidence that affixes differ in their degree of productivity (e.g., Aronoff, 1976; Bauer, 2001); for example, words can in general be formed more easily with -ness than with -ity (and thus we may accept cleanness but not cleanity). The majority of researchers investigating the issue of productivity are interested in accounting for varying degrees of productivity, and several productivity measures have been proposed in the literature (e.g., Aronoff, 1976; Baayen, 1992, 2001; Bauer, 2001; Plag, 1999). Assessing the degree of productivity, however, has proven to be a complex task (Bauer, 2001): while the consensus seems to be that capturing the coinage of new words is essential in assessing productivity, there is an inherent difficulty in defining what a “new” word is. Most notable among previous studies is a corpus-based approach proposed by Baayen (1992, 2001). Based on word frequency in a large corpus of texts, his productivity measure is formulated as P = n₁/N, where given a particular affix, n₁ is the number of word types with that affix that occur only once (the so-called hapax legomena, hereafter hapaxes), N is the sum of word tokens with that affix, and P is the productivity index. P is interpreted as expressing the probability of encountering a word type with a given affix that has not been seen in the sampled corpus. Thus, new words are defined under this measure as “unseen” words in a corpus. An important characteristic of P is that it is based on token frequency: N directly refers to a count over tokens, and a word is included in the n₁ count only if it occurs just once. The measure P, with its focus on hapaxes as estimators of unseen words, is motivated by the probability estimation method of Good (1953), also known as the Good-Turing estimation method (Church & Gale, 1991). 
While a dictionary provides another source of data for quantifying morphological productivity, a corpus-based approach has many advantages. A large corpus of texts contains productively formed words that are typically not listed in a dictionary (e.g., gossipee), and corpus data reflect how words are actually used (Baayen & Lieber, 1991; Baayen & Renouf, 1996). The present study pursues and extends the corpus-based approach by introducing a new method of identifying new words and assessing productivity. Type Frequency and Deleted Estimation It has been suggested that the type frequency for an affix (the number of word types with an affix) in a corpus, represented by V, is inadequate in expressing its degree of productivity. Baayen and Lieber (1991: 804) point out that in their reference corpus of 18 million words, the type frequencies for -ness (497) and -ity (405) do not adequately express the fact that -ness is intuitively felt to be much more productive than -ity. They find that the P indices for -ness (0.0044) and -ity (0.0007) are more in line with linguists’ intuitive estimates for these suffixes. There are, however, some aspects of the measure P that can be quite counter-intuitive. In Baayen and Lieber (1991), for example, the P index for the verbal suffix -ize (0.00007) is substantially lower [Footnotes: As is usually the case in a corpus study, the term token refers to each occurrence of a word, and the term type refers to each distinct word. For instance, if we have {awareness, fairness, fairness, sharpness, sharpness}, the token frequency for -ness is 5 (the sum of all occurrences of -ness), whereas the type frequency for -ness is 3 (the number of distinct words with -ness). For more detail, see Baayen (2001).]" @default.
- W2588474037 created "2017-02-24" @default.
- W2588474037 creator A5035926687 @default.
- W2588474037 date "2004-01-01" @default.
- W2588474037 modified "2023-09-23" @default.
- W2588474037 title "Defining New Words in Corpus Data: Productivity of English Suffixes in the British National Corpus" @default.
- W2588474037 cites W1574901103 @default.
- W2588474037 cites W1948131710 @default.
- W2588474037 cites W2015418650 @default.
- W2588474037 cites W2031189361 @default.
- W2588474037 cites W2059800182 @default.
- W2588474037 cites W2064318101 @default.
- W2588474037 cites W2082092506 @default.
- W2588474037 cites W2083758911 @default.
- W2588474037 cites W2115054880 @default.
- W2588474037 cites W2154206898 @default.
- W2588474037 cites W2305225949 @default.
- W2588474037 cites W2330771623 @default.
- W2588474037 cites W2612886076 @default.
- W2588474037 cites W2993650909 @default.
- W2588474037 cites W54741546 @default.
- W2588474037 cites W621048792 @default.
- W2588474037 cites W93013668 @default.
- W2588474037 cites W1973452070 @default.
- W2588474037 hasPublicationYear "2004" @default.
- W2588474037 type Work @default.
- W2588474037 sameAs 2588474037 @default.
- W2588474037 citedByCount "6" @default.
- W2588474037 countsByYear W25884740372012 @default.
- W2588474037 countsByYear W25884740372018 @default.
- W2588474037 crossrefType "journal-article" @default.
- W2588474037 hasAuthorship W2588474037A5035926687 @default.
- W2588474037 hasConcept C138885662 @default.
- W2588474037 hasConcept C139719470 @default.
- W2588474037 hasConcept C154945302 @default.
- W2588474037 hasConcept C162324750 @default.
- W2588474037 hasConcept C185592680 @default.
- W2588474037 hasConcept C189430467 @default.
- W2588474037 hasConcept C198531522 @default.
- W2588474037 hasConcept C204321447 @default.
- W2588474037 hasConcept C204983608 @default.
- W2588474037 hasConcept C2776725116 @default.
- W2588474037 hasConcept C41008148 @default.
- W2588474037 hasConcept C41895202 @default.
- W2588474037 hasConcept C43617362 @default.
- W2588474037 hasConcept C527412718 @default.
- W2588474037 hasConcept C532629269 @default.
- W2588474037 hasConcept C90805587 @default.
- W2588474037 hasConceptScore W2588474037C138885662 @default.
- W2588474037 hasConceptScore W2588474037C139719470 @default.
- W2588474037 hasConceptScore W2588474037C154945302 @default.
- W2588474037 hasConceptScore W2588474037C162324750 @default.
- W2588474037 hasConceptScore W2588474037C185592680 @default.
- W2588474037 hasConceptScore W2588474037C189430467 @default.
- W2588474037 hasConceptScore W2588474037C198531522 @default.
- W2588474037 hasConceptScore W2588474037C204321447 @default.
- W2588474037 hasConceptScore W2588474037C204983608 @default.
- W2588474037 hasConceptScore W2588474037C2776725116 @default.
- W2588474037 hasConceptScore W2588474037C41008148 @default.
- W2588474037 hasConceptScore W2588474037C41895202 @default.
- W2588474037 hasConceptScore W2588474037C43617362 @default.
- W2588474037 hasConceptScore W2588474037C527412718 @default.
- W2588474037 hasConceptScore W2588474037C532629269 @default.
- W2588474037 hasConceptScore W2588474037C90805587 @default.
- W2588474037 hasIssue "26" @default.
- W2588474037 hasLocation W25884740371 @default.
- W2588474037 hasOpenAccess W2588474037 @default.
- W2588474037 hasPrimaryLocation W25884740371 @default.
- W2588474037 hasRelatedWork W1604474074 @default.
- W2588474037 hasRelatedWork W1894217104 @default.
- W2588474037 hasRelatedWork W1997161938 @default.
- W2588474037 hasRelatedWork W2018575461 @default.
- W2588474037 hasRelatedWork W2154206898 @default.
- W2588474037 hasRelatedWork W2160685804 @default.
- W2588474037 hasRelatedWork W2172177048 @default.
- W2588474037 hasRelatedWork W218204549 @default.
- W2588474037 hasRelatedWork W2189126098 @default.
- W2588474037 hasRelatedWork W2326584371 @default.
- W2588474037 hasRelatedWork W234013945 @default.
- W2588474037 hasRelatedWork W2401086188 @default.
- W2588474037 hasRelatedWork W2505389254 @default.
- W2588474037 hasRelatedWork W2804053484 @default.
- W2588474037 hasRelatedWork W293378725 @default.
- W2588474037 hasRelatedWork W2990593945 @default.
- W2588474037 hasRelatedWork W334669044 @default.
- W2588474037 hasRelatedWork W561913045 @default.
- W2588474037 hasRelatedWork W745355315 @default.
- W2588474037 hasRelatedWork W180960733 @default.
- W2588474037 hasVolume "26" @default.
- W2588474037 isParatext "false" @default.
- W2588474037 isRetracted "false" @default.
- W2588474037 magId "2588474037" @default.
- W2588474037 workType "article" @default.
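
The abstract above defines type frequency (V), token frequency (N), hapaxes, and the productivity index P as the number of hapaxes divided by N. A minimal Python sketch of those counts follows, run on the five-word -ness example from the abstract. The function name `productivity` and the use of a plain string-ending test to detect a suffix are assumptions made for illustration; the paper's own procedure for identifying suffixed words in the BNC is not described in this record and is likely more careful (a bare ending test would, for instance, count "witness" as a -ness word).

```python
from collections import Counter


def productivity(tokens, suffix):
    """Return (V, N, n1, P) for the words in `tokens` that end in `suffix`.

    V  = type frequency (number of distinct words with the suffix)
    N  = token frequency (total occurrences of words with the suffix)
    n1 = number of hapaxes (types occurring exactly once)
    P  = n1 / N, the productivity index as defined in the abstract

    Assumption: suffix membership is approximated with str.endswith,
    which is a simplification and not the paper's method.
    """
    counts = Counter(w for w in tokens if w.endswith(suffix))
    V = len(counts)
    N = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    P = n1 / N if N else 0.0
    return V, N, n1, P


# Example set taken from the abstract's footnote on tokens and types.
sample = ["awareness", "fairness", "fairness", "sharpness", "sharpness"]
V, N, n1, P = productivity(sample, "ness")
print(V, N, n1, P)  # 3 5 1 0.2
```

The type and token counts (3 and 5) match the figures given in the abstract; the hapax count and the resulting P value for this toy set are only a consequence of the definition and are not results reported by the paper.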