Matches in SemOpenAlex for { <https://semopenalex.org/work/W129342743> ?p ?o ?g. }
- W129342743 abstract "Word sense disambiguation has been recognized as a major problem in natural language processing research for over forty years. Much of this work has been stymied by difficulties in acquiring appropriate lexical resources, such as semantic networks and annotated corpora. Following the suggestion in Brown et al. (1991a) and Dagan et al. (1991), we have recently made considerable progress by taking advantage of a new source of testing and training materials. Rather than depending on small amounts of hand-labeled text, we have been making use of relatively large amounts of parallel text, such as the Canadian Hansards (parliamentary debates), which are available in two (or more) languages. The translation can often be used in lieu of hand-labeling. For example, consider the polysemous word 'sentence', which has two major senses: (1) a judicial sentence and (2) a syntactic sentence. We can collect a number of sense (1) examples by extracting instances that are translated as 'peine', and a number of sense (2) examples by extracting instances that are translated as 'phrase'. In this way, we have been able to acquire a considerable amount of testing and training material for developing and testing our disambiguation algorithms. The availability of this material has enabled us to develop quantitative disambiguation methods that achieve 90% accuracy in discriminating between two very distinct senses of a noun such as 'sentence'. In the training phase, we collect a number of instances of each sense of the polysemous noun. Then, in the testing phase, we are given a new instance of the noun and are asked to assign it to one of the senses. We attempt to answer this question by comparing the context of the unknown instance with the contexts of known instances, using a Bayesian argument that has been applied successfully in related applications such as author identification and information retrieval. 
The final section of the paper will describe a number of methodological studies which show that the training set need not be large and that it need not be free from errors. Perhaps most surprisingly, we find that the context should extend ±50 words, an order of magnitude larger than one typically finds in the literature. 1. Word-Sense Disambiguation. Consider, for example, the word 'duty', which has at least two quite distinct senses: (1) a tax and (2) an obligation. Three examples of each sense are given in Table 1 below. The classic disambiguation problem is to construct a means of discriminating between two or more sets of examples such as those shown in Table 1. This paper will focus on the methodology required to address the classic problem and will have less to say about the details required for practical application of this methodology. Consequently, the reader should exercise some caution in interpreting the 90% figure reported here; this figure could easily be swamped in a practical system by any number of factors that go beyond the scope of this paper. In particular, the Canadian Hansards, one of the few currently available sources of parallel text, is extremely unbalanced and is therefore severely limited as a basis for a practical disambiguation system." @default.
- W129342743 created "2016-06-24" @default.
- W129342743 creator A5015876016 @default.
- W129342743 creator A5016543371 @default.
- W129342743 creator A5068671901 @default.
- W129342743 date "2005-01-01" @default.
- W129342743 modified "2023-09-24" @default.
- W129342743 title "Using Bilingual Materials to Develop Word Sense Disambiguation Methods" @default.
- W129342743 cites W1489181569 @default.
- W129342743 cites W1548013757 @default.
- W129342743 cites W1550769831 @default.
- W129342743 cites W1570542661 @default.
- W129342743 cites W1601817326 @default.
- W129342743 cites W1969030495 @default.
- W129342743 cites W1973394465 @default.
- W129342743 cites W1977182536 @default.
- W129342743 cites W1980491396 @default.
- W129342743 cites W2007780422 @default.
- W129342743 cites W2019911971 @default.
- W129342743 cites W2035408139 @default.
- W129342743 cites W2040004971 @default.
- W129342743 cites W2082291422 @default.
- W129342743 cites W2090543924 @default.
- W129342743 cites W2099247782 @default.
- W129342743 cites W2117652747 @default.
- W129342743 cites W2129139611 @default.
- W129342743 cites W2137638032 @default.
- W129342743 cites W2148426685 @default.
- W129342743 cites W2153890685 @default.
- W129342743 cites W2154384676 @default.
- W129342743 cites W2505454603 @default.
- W129342743 cites W2798434063 @default.
- W129342743 cites W30254004 @default.
- W129342743 cites W3133994440 @default.
- W129342743 cites W32412494 @default.
- W129342743 cites W396771645 @default.
- W129342743 cites W50958056 @default.
- W129342743 hasPublicationYear "2005" @default.
- W129342743 type Work @default.
- W129342743 sameAs 129342743 @default.
- W129342743 citedByCount "62" @default.
- W129342743 countsByYear W1293427432012 @default.
- W129342743 countsByYear W1293427432013 @default.
- W129342743 countsByYear W1293427432014 @default.
- W129342743 countsByYear W1293427432015 @default.
- W129342743 countsByYear W1293427432016 @default.
- W129342743 countsByYear W1293427432019 @default.
- W129342743 countsByYear W1293427432021 @default.
- W129342743 crossrefType "journal-article" @default.
- W129342743 hasAuthorship W129342743A5015876016 @default.
- W129342743 hasAuthorship W129342743A5016543371 @default.
- W129342743 hasAuthorship W129342743A5068671901 @default.
- W129342743 hasConcept C121934690 @default.
- W129342743 hasConcept C138885662 @default.
- W129342743 hasConcept C153962237 @default.
- W129342743 hasConcept C154945302 @default.
- W129342743 hasConcept C157659113 @default.
- W129342743 hasConcept C162324750 @default.
- W129342743 hasConcept C187736073 @default.
- W129342743 hasConcept C203005215 @default.
- W129342743 hasConcept C204321447 @default.
- W129342743 hasConcept C2776224158 @default.
- W129342743 hasConcept C2777530160 @default.
- W129342743 hasConcept C2780451532 @default.
- W129342743 hasConcept C41008148 @default.
- W129342743 hasConcept C41895202 @default.
- W129342743 hasConcept C44572571 @default.
- W129342743 hasConcept C51646954 @default.
- W129342743 hasConcept C90805587 @default.
- W129342743 hasConceptScore W129342743C121934690 @default.
- W129342743 hasConceptScore W129342743C138885662 @default.
- W129342743 hasConceptScore W129342743C153962237 @default.
- W129342743 hasConceptScore W129342743C154945302 @default.
- W129342743 hasConceptScore W129342743C157659113 @default.
- W129342743 hasConceptScore W129342743C162324750 @default.
- W129342743 hasConceptScore W129342743C187736073 @default.
- W129342743 hasConceptScore W129342743C203005215 @default.
- W129342743 hasConceptScore W129342743C204321447 @default.
- W129342743 hasConceptScore W129342743C2776224158 @default.
- W129342743 hasConceptScore W129342743C2777530160 @default.
- W129342743 hasConceptScore W129342743C2780451532 @default.
- W129342743 hasConceptScore W129342743C41008148 @default.
- W129342743 hasConceptScore W129342743C41895202 @default.
- W129342743 hasConceptScore W129342743C44572571 @default.
- W129342743 hasConceptScore W129342743C51646954 @default.
- W129342743 hasConceptScore W129342743C90805587 @default.
- W129342743 hasLocation W1293427431 @default.
- W129342743 hasOpenAccess W129342743 @default.
- W129342743 hasPrimaryLocation W1293427431 @default.
- W129342743 hasRelatedWork W1971220772 @default.
- W129342743 hasRelatedWork W1977182536 @default.
- W129342743 hasRelatedWork W2033200726 @default.
- W129342743 hasRelatedWork W2038721957 @default.
- W129342743 hasRelatedWork W2040004971 @default.
- W129342743 hasRelatedWork W2047620598 @default.
- W129342743 hasRelatedWork W2065157922 @default.
- W129342743 hasRelatedWork W2084708974 @default.
- W129342743 hasRelatedWork W2101210369 @default.
- W129342743 hasRelatedWork W2102381086 @default.
- W129342743 hasRelatedWork W2106022904 @default.
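The abstract above describes two steps: labeling instances of a polysemous noun by their translation in parallel text ('peine' vs. 'phrase' for 'sentence'), then classifying a new instance by comparing its ±50-word context against the contexts of the labeled instances with a Bayesian argument. The following is a minimal sketch of that pipeline, assuming a plain naive Bayes model with add-one smoothing that ignores context words never seen in training. The paper's actual Bayesian formulation and smoothing may differ. The names `TRANSLATION_TO_SENSE`, `context_window`, `label_by_translation`, and `NaiveBayesWSD` are hypothetical, and the aligned sentences are invented toy data, not Hansard text.

```python
import math
from collections import Counter

# Hypothetical mapping from French translation to sense label for 'sentence'.
TRANSLATION_TO_SENSE = {"peine": "judicial", "phrase": "syntactic"}


def context_window(tokens, index, width=50):
    """Return the tokens within +/-width positions of tokens[index], excluding it."""
    lo = max(0, index - width)
    return tokens[lo:index] + tokens[index + 1:index + 1 + width]


def label_by_translation(aligned_examples, width=50):
    """Turn (english_tokens, target_index, french_word) triples into (context, sense) pairs."""
    labeled = []
    for tokens, idx, french in aligned_examples:
        sense = TRANSLATION_TO_SENSE.get(french)
        if sense is not None:  # skip instances whose translation is not in the mapping
            labeled.append((context_window(tokens, idx, width), sense))
    return labeled


class NaiveBayesWSD:
    """Naive Bayes sense classifier over bag-of-words contexts (an assumed model)."""

    def __init__(self):
        self.word_counts = {}         # sense -> Counter of context words
        self.sense_counts = Counter()  # sense -> number of training instances
        self.vocab = set()

    def train(self, labeled):
        for context, sense in labeled:
            self.sense_counts[sense] += 1
            self.word_counts.setdefault(sense, Counter()).update(context)
            self.vocab.update(context)

    def score(self, context, sense):
        """Log prior plus add-one-smoothed log likelihoods of known context words."""
        counts = self.word_counts[sense]
        total = sum(counts.values())
        v = len(self.vocab)
        s = math.log(self.sense_counts[sense] / sum(self.sense_counts.values()))
        for w in context:
            if w in self.vocab:  # words never seen in training are ignored
                s += math.log((counts[w] + 1) / (total + v))
        return s

    def classify(self, tokens, index, width=50):
        context = context_window(tokens, index, width)
        return max(self.sense_counts, key=lambda sense: self.score(context, sense))


# Toy aligned data: (English tokens, position of 'sentence', French translation).
aligned = [
    ("the judge handed down a harsh sentence for the crime".split(), 6, "peine"),
    ("the court reduced his prison sentence on appeal".split(), 5, "peine"),
    ("this sentence has a subject verb and object".split(), 1, "phrase"),
    ("parse each sentence and count the words in the grammar".split(), 2, "phrase"),
    ("he received a life sentence".split(), 4, "condamnation"),  # unmapped, so skipped
]

labeled = label_by_translation(aligned)
model = NaiveBayesWSD()
model.train(labeled)
print(model.classify("the judge said the sentence was too harsh".split(), 4))
```

Using the translation as the label is what lets the training set grow without hand annotation; the `width` parameter defaults to the ±50 words the abstract reports, and shrinking it is one way to reproduce the narrower windows it contrasts against.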