Matches in SemOpenAlex for { <https://semopenalex.org/work/W623156858> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W623156858 endingPage "296" @default.
- W623156858 startingPage "289" @default.
- W623156858 abstract "In this thesis I aim to improve phrase-based statistical machine translation (PBSMT) in a number of ways by the use of text harmonization strategies. PBSMT systems are built by training statistical models on large corpora of human translations. This architecture generally performs well for languages with similar structure. If the languages are different for example with respect to word order or morphological complexity, however, the standard methods do not tend to work well. I address this problem through text harmonization, by making texts more similar before training and applying a PBSMT system. I investigate how text harmonization can be used to improve PBSMT with a focus on four areas: compounding, definiteness, word order, and unknown words. For the first three areas, the focus is on linguistic differences between languages, which I address by applying transformation rules, using either rule-based or machine learning-based techniques, to the source or target data. For the last area, unknown words, I harmonize the translation input to the training data by replacing unknown words with known alternatives. I show that translation into languages with closed compounds can be improved by splitting and merging compounds. I develop new merging algorithms that outperform previously suggested algorithms and show how part-of-speech tags can be used to improve the order of compound parts. Scandinavian definite noun phrases are identified as a problem for PBSMT in translation into Scandinavian languages and I propose a preprocessing approach that addresses this problem and gives large improvements over a baseline. Several previous proposals for how to handle differences in reordering exist; I propose two types of extensions, iterating reordering and word alignment and using automatically induced word classes, which allow these methods to be used for less-resourced languages.
Finally I identify several ways of replacing unknown words in the translation input, most notably a spell checking-inspired algorithm, which can be trained using character-based PBSMT techniques. Overall I present several approaches for extending PBSMT by the use of pre- and postprocessing techniques for text harmonization, and show experimentally that these methods work. Text harmonization methods are an efficient way to improve statistical machine translation within the phrase-based approach, without resorting to more complex models." @default.
- W623156858 created "2016-06-24" @default.
- W623156858 creator A5051985869 @default.
- W623156858 date "2011-01-01" @default.
- W623156858 modified "2023-09-23" @default.
- W623156858 title "Definite Noun Phrases in Statistical Machine Translation into Scandinavian Languages" @default.
- W623156858 cites W1510290649 @default.
- W623156858 cites W1631260214 @default.
- W623156858 cites W1966678781 @default.
- W623156858 cites W2008961349 @default.
- W623156858 cites W2015350341 @default.
- W623156858 cites W2078861931 @default.
- W623156858 cites W2080012968 @default.
- W623156858 cites W2091542740 @default.
- W623156858 cites W2101105183 @default.
- W623156858 cites W2117745860 @default.
- W623156858 cites W2124807415 @default.
- W623156858 cites W2162245945 @default.
- W623156858 cites W2170464899 @default.
- W623156858 cites W2171421863 @default.
- W623156858 cites W22168010 @default.
- W623156858 cites W2405635321 @default.
- W623156858 cites W2407523058 @default.
- W623156858 cites W2494948086 @default.
- W623156858 cites W314345615 @default.
- W623156858 hasPublicationYear "2011" @default.
- W623156858 type Work @default.
- W623156858 sameAs 623156858 @default.
- W623156858 citedByCount "2" @default.
- W623156858 countsByYear W6231568582012 @default.
- W623156858 crossrefType "journal-article" @default.
- W623156858 hasAuthorship W623156858A5051985869 @default.
- W623156858 hasConcept C120665830 @default.
- W623156858 hasConcept C121332964 @default.
- W623156858 hasConcept C121934690 @default.
- W623156858 hasConcept C138885662 @default.
- W623156858 hasConcept C153962237 @default.
- W623156858 hasConcept C154945302 @default.
- W623156858 hasConcept C192209626 @default.
- W623156858 hasConcept C203005215 @default.
- W623156858 hasConcept C204321447 @default.
- W623156858 hasConcept C2776224158 @default.
- W623156858 hasConcept C2779056149 @default.
- W623156858 hasConcept C34736171 @default.
- W623156858 hasConcept C41008148 @default.
- W623156858 hasConcept C41895202 @default.
- W623156858 hasConcept C70777604 @default.
- W623156858 hasConceptScore W623156858C120665830 @default.
- W623156858 hasConceptScore W623156858C121332964 @default.
- W623156858 hasConceptScore W623156858C121934690 @default.
- W623156858 hasConceptScore W623156858C138885662 @default.
- W623156858 hasConceptScore W623156858C153962237 @default.
- W623156858 hasConceptScore W623156858C154945302 @default.
- W623156858 hasConceptScore W623156858C192209626 @default.
- W623156858 hasConceptScore W623156858C203005215 @default.
- W623156858 hasConceptScore W623156858C204321447 @default.
- W623156858 hasConceptScore W623156858C2776224158 @default.
- W623156858 hasConceptScore W623156858C2779056149 @default.
- W623156858 hasConceptScore W623156858C34736171 @default.
- W623156858 hasConceptScore W623156858C41008148 @default.
- W623156858 hasConceptScore W623156858C41895202 @default.
- W623156858 hasConceptScore W623156858C70777604 @default.
- W623156858 hasLocation W6231568581 @default.
- W623156858 hasOpenAccess W623156858 @default.
- W623156858 hasPrimaryLocation W6231568581 @default.
- W623156858 hasRelatedWork W1569284214 @default.
- W623156858 hasRelatedWork W1582459552 @default.
- W623156858 hasRelatedWork W170557285 @default.
- W623156858 hasRelatedWork W1775221659 @default.
- W623156858 hasRelatedWork W1954109191 @default.
- W623156858 hasRelatedWork W2019102947 @default.
- W623156858 hasRelatedWork W2034386547 @default.
- W623156858 hasRelatedWork W2049392535 @default.
- W623156858 hasRelatedWork W2055866651 @default.
- W623156858 hasRelatedWork W2080012968 @default.
- W623156858 hasRelatedWork W2126725946 @default.
- W623156858 hasRelatedWork W2132248251 @default.
- W623156858 hasRelatedWork W2137049801 @default.
- W623156858 hasRelatedWork W2156843425 @default.
- W623156858 hasRelatedWork W2371019081 @default.
- W623156858 hasRelatedWork W2523099195 @default.
- W623156858 hasRelatedWork W2883766708 @default.
- W623156858 hasRelatedWork W2915346886 @default.
- W623156858 hasRelatedWork W2974969219 @default.
- W623156858 hasRelatedWork W116513489 @default.
- W623156858 isParatext "false" @default.
- W623156858 isRetracted "false" @default.
- W623156858 magId "623156858" @default.
- W623156858 workType "article" @default.