Matches in SemOpenAlex for { <https://semopenalex.org/work/W2247198797> ?p ?o ?g. }
- W2247198797 abstract "Data-driven Machine Translation (MT) systems have been found to require large amounts of data to function well. However, obtaining parallel texts for many languages is time-consuming, expensive and difficult. This thesis aims at improving translation quality for languages that have limited resources by making use of the available data more efficiently. Templates or generalizations of sentence-pairs where sequences of one or more words are replaced by variables are used in the translation model to handle data-sparsity challenges. Templates are built from clusters or equivalence classes that group related terms (words and phrases). As generating such clusters can be time-consuming, clusters are automatically generated by grouping terms based on semantic-similarity, syntactical-coherence and context. Data-sparsity is also a big challenge in statistical language modeling. In many MT systems, sophisticated tools are developed to make the translation models better but they still rely heavily on a restricted-decoder which uses unreliable language models that may not be well suited for translation tasks especially in sparse-data scenarios. Templates can also be used in Language Modeling. Limited training data also increases the number of out-of-vocabulary words and reduces the quality of the translations. Many of the present MT systems either ignore these unknown words or pass them on as is to the final translation assuming that they could be proper nouns. Presence of out-of-vocabulary words and rare words in the input sentence prevents an MT system from finding longer phrasal matches and produces low quality translations due to less reliable language model estimates. Approaches in the past have suggested using stems and synonyms of OOV words as replacements. 
This thesis uses an algorithm to find possible replacements, which are not necessarily synonyms, for out-of-vocabulary words as well as rare words based on the context in which these words appear. The effectiveness of each of the template-based approaches, both in the translation model and in the language model, is demonstrated for English→Chinese and English→French. The algorithm to handle out-of-vocabulary and rare words is tested on English→French, English→Chinese and English→Haitian. A hybrid approach combining all the techniques is also studied in English→Chinese." @default.
- W2247198797 created "2016-06-24" @default.
- W2247198797 creator A5014460968 @default.
- W2247198797 creator A5062362044 @default.
- W2247198797 date "2011-01-01" @default.
- W2247198797 modified "2023-09-23" @default.
- W2247198797 title "Coping with data-sparsity in example-based machine translation" @default.
- W2247198797 cites W133045130 @default.
- W2247198797 cites W1494910745 @default.
- W2247198797 cites W1498238796 @default.
- W2247198797 cites W1508165687 @default.
- W2247198797 cites W1523767002 @default.
- W2247198797 cites W1528268292 @default.
- W2247198797 cites W1531482446 @default.
- W2247198797 cites W1534482508 @default.
- W2247198797 cites W1549285799 @default.
- W2247198797 cites W1573344341 @default.
- W2247198797 cites W158414620 @default.
- W2247198797 cites W1584791343 @default.
- W2247198797 cites W1586528281 @default.
- W2247198797 cites W1589170661 @default.
- W2247198797 cites W1591604169 @default.
- W2247198797 cites W1662133657 @default.
- W2247198797 cites W174630521 @default.
- W2247198797 cites W1850668662 @default.
- W2247198797 cites W1887950249 @default.
- W2247198797 cites W1965605789 @default.
- W2247198797 cites W1966812932 @default.
- W2247198797 cites W1978400666 @default.
- W2247198797 cites W1980414235 @default.
- W2247198797 cites W2002785199 @default.
- W2247198797 cites W2004329216 @default.
- W2247198797 cites W2031351647 @default.
- W2247198797 cites W2036516910 @default.
- W2247198797 cites W2058335193 @default.
- W2247198797 cites W2065459442 @default.
- W2247198797 cites W2078861931 @default.
- W2247198797 cites W2086039194 @default.
- W2247198797 cites W2095743640 @default.
- W2247198797 cites W2096175520 @default.
- W2247198797 cites W2097333193 @default.
- W2247198797 cites W2097661835 @default.
- W2247198797 cites W2097884808 @default.
- W2247198797 cites W2101105183 @default.
- W2247198797 cites W2107130271 @default.
- W2247198797 cites W2107695330 @default.
- W2247198797 cites W2111798208 @default.
- W2247198797 cites W2114013702 @default.
- W2247198797 cites W2116316001 @default.
- W2247198797 cites W2116599427 @default.
- W2247198797 cites W2119168550 @default.
- W2247198797 cites W2121227244 @default.
- W2247198797 cites W2124807415 @default.
- W2247198797 cites W2129610796 @default.
- W2247198797 cites W2136612048 @default.
- W2247198797 cites W2145685230 @default.
- W2247198797 cites W2146113428 @default.
- W2247198797 cites W2146474458 @default.
- W2247198797 cites W2147880316 @default.
- W2247198797 cites W2149327368 @default.
- W2247198797 cites W2150903784 @default.
- W2247198797 cites W2152263452 @default.
- W2247198797 cites W2152322845 @default.
- W2247198797 cites W2153903004 @default.
- W2247198797 cites W2154124206 @default.
- W2247198797 cites W2156985047 @default.
- W2247198797 cites W2161792612 @default.
- W2247198797 cites W2161877964 @default.
- W2247198797 cites W2165199647 @default.
- W2247198797 cites W2165874743 @default.
- W2247198797 cites W2166880518 @default.
- W2247198797 cites W2186267129 @default.
- W2247198797 cites W2403096031 @default.
- W2247198797 cites W2432289224 @default.
- W2247198797 cites W2467575451 @default.
- W2247198797 cites W2882319491 @default.
- W2247198797 cites W2950186769 @default.
- W2247198797 cites W29933666 @default.
- W2247198797 cites W3103747391 @default.
- W2247198797 cites W3144373754 @default.
- W2247198797 cites W3197037075 @default.
- W2247198797 cites W3210345896 @default.
- W2247198797 cites W4629839 @default.
- W2247198797 cites W626175318 @default.
- W2247198797 cites W72275503 @default.
- W2247198797 cites W1533339658 @default.
- W2247198797 cites W1857789879 @default.
- W2247198797 hasPublicationYear "2011" @default.
- W2247198797 type Work @default.
- W2247198797 sameAs 2247198797 @default.
- W2247198797 citedByCount "1" @default.
- W2247198797 countsByYear W22471987972013 @default.
- W2247198797 crossrefType "journal-article" @default.
- W2247198797 hasAuthorship W2247198797A5014460968 @default.
- W2247198797 hasAuthorship W2247198797A5062362044 @default.
- W2247198797 hasConcept C137293760 @default.
- W2247198797 hasConcept C138885662 @default.
- W2247198797 hasConcept C154945302 @default.
- W2247198797 hasConcept C203005215 @default.
- W2247198797 hasConcept C204321447 @default.