Matches in SemOpenAlex for { <https://semopenalex.org/work/W73413019> ?p ?o ?g. }
- W73413019 abstract "Based on the concepts of bidirectional conversion and automatic evaluation, we propose two user-adaptation mechanisms, character-preference learning and pseudo-word learning, for resolving Chinese homophone ambiguities in syllable-to-character conversion. The 1991 United Daily corpus of approximately 10 million Chinese characters is used for extraction of 10 reporter-specific article databases and for computation of word frequencies and character bigrams. Experiments show that ~0.5 percent (testing sets) to 71.8 percent (training sets) of conversion errors can be eliminated through the proposed mechanisms. These concepts are thus very useful in applications such as Chinese input methods and speech recognition systems. 1 Introduction Corpus-based Chinese NLP research has been very active in the recent years as more and more computer readable Chinese corpora are available. Reported corpus-based NLP applications [10] include machine translation, word segmentation, character recognition, text classification, lexicography, and spelling checker. In this paper, we will describe our work on adaptive Chinese homophone disambiguation (also known as phonetic-input-to-character conversion or phonetic decoding) using part of the 1991 United Daily (UD) corpus of approximately 10 million Chinese characters (Hanzi). It requires a coding method, structural or phonetic, to input Chinese characters into a computer, since there are more than 10,000 of them in common use. In the literature [3, 7], there are several hundred different coding methods for this purpose. For most users, phonetic coding (Pinyin or Bopomofo) is the choice. To input a Chinese character, the user simply keys in its corresponding phonetic code. It is easy to learn, but suffers from the homophone problem, i.e., a phonetic code corresponding to several different characters. Therefore, the user needs to choose the desired character from a (usually long) list of candidate characters. 
It is inefficient and annoying. So, automatic homophone disambiguation is highly desirable. Several disambiguation approaches have been reported in the literature [3, 7]. Some of them have even been realized in commercial input methods, e.g., Hanin, WangXing, Going. However, the accuracies of these disambiguators are not satisfactory. In this paper, we propose a corpus-based adaptation method for improving the accuracy of homophone disambiguation. For homophone disambiguation, what we need as input is syllable (phonetic code) corpora instead of text corpora. For adaptation, what we need is personal corpora instead of general corpora (such as the UD corpus). Thus, we first design a selection procedure to extract articles by individual reporters. Ten personal corpora were set up in this way. An additional domain-specific corpus, translated AP news, was built up similarly. Then, we design a highly-reliable (99.7% correct) character-to-syllable converter [1] to transfer the text corpora into syllable corpora. Our baseline disambiguator is rather conventional, composed of a word-lattice searching module, a path scorer, and a lexicon-driven word hypothesizer. Using the original text corpora and the corresponding syllable corpora, we propose a user-adaptation method, applying the concept of bidirectional conversion [1] and automatic evaluation [2]. The adaptation method includes two parts: character-preference learning and pseudo word learning. Given a personal corpus (i.e., sample text), the adaptation pro-" @default.
- W73413019 created "2016-06-24" @default.
- W73413019 creator A5031252462 @default.
- W73413019 date "1993-01-01" @default.
- W73413019 modified "2023-09-24" @default.
- W73413019 title "Corpus-Based Adaptation Mechanisms for Chinese Homophone Disambiguation." @default.
- W73413019 cites W1965307992 @default.
- W73413019 cites W1969426968 @default.
- W73413019 cites W1989669548 @default.
- W73413019 cites W2143459009 @default.
- W73413019 cites W66504724 @default.
- W73413019 hasPublicationYear "1993" @default.
- W73413019 type Work @default.
- W73413019 sameAs 73413019 @default.
- W73413019 citedByCount "2" @default.
- W73413019 countsByYear W734130192019 @default.
- W73413019 crossrefType "proceedings-article" @default.
- W73413019 hasAuthorship W73413019A5031252462 @default.
- W73413019 hasConcept C105795698 @default.
- W73413019 hasConcept C138885662 @default.
- W73413019 hasConcept C154945302 @default.
- W73413019 hasConcept C160253069 @default.
- W73413019 hasConcept C179518139 @default.
- W73413019 hasConcept C204321447 @default.
- W73413019 hasConcept C2524010 @default.
- W73413019 hasConcept C2777801307 @default.
- W73413019 hasConcept C2780861071 @default.
- W73413019 hasConcept C2781051154 @default.
- W73413019 hasConcept C2781095461 @default.
- W73413019 hasConcept C28490314 @default.
- W73413019 hasConcept C33923547 @default.
- W73413019 hasConcept C41008148 @default.
- W73413019 hasConcept C41895202 @default.
- W73413019 hasConceptScore W73413019C105795698 @default.
- W73413019 hasConceptScore W73413019C138885662 @default.
- W73413019 hasConceptScore W73413019C154945302 @default.
- W73413019 hasConceptScore W73413019C160253069 @default.
- W73413019 hasConceptScore W73413019C179518139 @default.
- W73413019 hasConceptScore W73413019C204321447 @default.
- W73413019 hasConceptScore W73413019C2524010 @default.
- W73413019 hasConceptScore W73413019C2777801307 @default.
- W73413019 hasConceptScore W73413019C2780861071 @default.
- W73413019 hasConceptScore W73413019C2781051154 @default.
- W73413019 hasConceptScore W73413019C2781095461 @default.
- W73413019 hasConceptScore W73413019C28490314 @default.
- W73413019 hasConceptScore W73413019C33923547 @default.
- W73413019 hasConceptScore W73413019C41008148 @default.
- W73413019 hasConceptScore W73413019C41895202 @default.
- W73413019 hasLocation W734130191 @default.
- W73413019 hasOpenAccess W73413019 @default.
- W73413019 hasPrimaryLocation W734130191 @default.
- W73413019 hasRelatedWork W175870160 @default.
- W73413019 hasRelatedWork W2043476754 @default.
- W73413019 hasRelatedWork W2186141093 @default.
- W73413019 hasRelatedWork W2475422983 @default.
- W73413019 hasRelatedWork W2972161408 @default.
- W73413019 hasRelatedWork W3165364314 @default.
- W73413019 hasRelatedWork W3192117492 @default.
- W73413019 hasRelatedWork W2207885129 @default.
- W73413019 hasRelatedWork W2812785011 @default.
- W73413019 hasRelatedWork W2815475153 @default.
- W73413019 hasRelatedWork W2820667905 @default.
- W73413019 hasRelatedWork W2825447793 @default.
- W73413019 hasRelatedWork W2830657100 @default.
- W73413019 hasRelatedWork W2831169937 @default.
- W73413019 hasRelatedWork W2837045883 @default.
- W73413019 hasRelatedWork W2854350534 @default.
- W73413019 hasRelatedWork W2862630605 @default.
- W73413019 hasRelatedWork W2883802218 @default.
- W73413019 hasRelatedWork W2932088059 @default.
- W73413019 hasRelatedWork W2959740002 @default.
- W73413019 isParatext "false" @default.
- W73413019 isRetracted "false" @default.
- W73413019 magId "73413019" @default.
- W73413019 workType "article" @default.