Matches in SemOpenAlex for { <https://semopenalex.org/work/W26823> ?p ?o ?g. }
- W26823 abstract "The development of large-scale rules and grammars for a Rule-Based Machine Translation (RBMT) system is labour-intensive, error-prone and expensive. Current research in Machine Translation (MT) tends to focus on the development of corpus-based systems which can overcome the problem of knowledge acquisition. Corpus-Based Machine Translation (CBMT) can take the form of Statistical Machine Translation (SMT) or Example-Based Machine Translation (EBMT). Despite the benefits of EBMT, SMT is currently the dominant paradigm and many systems classified as example-based integrate additional rule-based and statistical techniques. The benefits of an EBMT system which does not require extensive linguistic resources and can produce reasonably intelligible and accurate translations cannot be overlooked. We show that our linguistics-lite EBMT system can outperform an SMT system trained on the same data. The work reported in this thesis describes the development of a linguistics-lite EBMT system which does not have recourse to extensive linguistic resources. We apply the Marker Hypothesis (Green, 1979), a psycholinguistic theory which states that all natural languages are ‘marked’ for complex syntactic structure at surface form by a closed set of specific lexemes and morphemes. We use this technique in different environments to segment aligned (English, French) phrases and sentences. We then apply an alignment algorithm which can deduce smaller aligned chunks and words. Following a process similar to (Block, 2000), we generalise these alignments by replacing certain function words with an associated tag. In so doing, we cluster on marker words and add flexibility to our matching process. In a post hoc stage we treat the World Wide Web as a large corpus and validate and correct instances of determiner-noun and noun-verb boundary friction. We have applied our marker-based EBMT system to different bitexts and have explored its applicability in various environments.
We have developed a phrase-based EBMT system (Gough et al., 2002; Way and Gough, 2003). We show that despite the perceived low quality of on-line MT systems, our EBMT system can produce good quality translations when such systems are used to seed its memories. (Carl, 2003a; Schaler et al., 2003) suggest that EBMT is more suited to controlled translation than RBMT as it has been known to overcome the ‘knowledge acquisition bottleneck’. To this end, we developed the first controlled EBMT system (Gough and Way, 2003; Way and Gough, 2004). Given the lack of controlled bitexts, we used an on-line MT system, Logomedia, to translate a set of controlled English sentences. We performed experiments using controlled analysis and generation and assessed the performance of our system at each stage. We made a number of improvements to our sub-sentential alignment algorithm and, following some minimal adjustments to our system, we show that our controlled EBMT system can outperform an RBMT system. We applied the Marker Hypothesis to a more scalable data set. We trained our system on 203,529 sentences extracted from a Sun Microsystems Translation Memory. We thus reduced problems of data-sparseness and limited our dependence on Logomedia. We show that scaling up data in a marker-based EBMT system improves the quality of our translations. We also report on the benefits of extracting lexical equivalences from the corpus using Mutual Information." @default.
- W26823 created "2016-06-24" @default.
- W26823 creator A5089995576 @default.
- W26823 date "2005-01-01" @default.
- W26823 modified "2023-09-24" @default.
- W26823 title "Example-based machine translation using the marker hypothesis" @default.
- W26823 cites W127405262 @default.
- W26823 cites W142697413 @default.
- W26823 cites W1481504940 @default.
- W26823 cites W1484082930 @default.
- W26823 cites W1489409710 @default.
- W26823 cites W1489834179 @default.
- W26823 cites W1492999621 @default.
- W26823 cites W1527762853 @default.
- W26823 cites W1534482508 @default.
- W26823 cites W1538741010 @default.
- W26823 cites W1544826511 @default.
- W26823 cites W1550540280 @default.
- W26823 cites W1550830489 @default.
- W26823 cites W1551705512 @default.
- W26823 cites W1555499375 @default.
- W26823 cites W1569261078 @default.
- W26823 cites W1573634232 @default.
- W26823 cites W1580248332 @default.
- W26823 cites W1581347075 @default.
- W26823 cites W1586060904 @default.
- W26823 cites W1586528281 @default.
- W26823 cites W1592451621 @default.
- W26823 cites W1593045043 @default.
- W26823 cites W1644866298 @default.
- W26823 cites W1782150360 @default.
- W26823 cites W1872988096 @default.
- W26823 cites W191944152 @default.
- W26823 cites W193690862 @default.
- W26823 cites W1966032129 @default.
- W26823 cites W1970026646 @default.
- W26823 cites W1974689608 @default.
- W26823 cites W1974731480 @default.
- W26823 cites W2008875494 @default.
- W26823 cites W2011008859 @default.
- W26823 cites W2048390999 @default.
- W26823 cites W2062766658 @default.
- W26823 cites W2065459442 @default.
- W26823 cites W2070333056 @default.
- W26823 cites W2079025601 @default.
- W26823 cites W2088781183 @default.
- W26823 cites W2093825590 @default.
- W26823 cites W2098551021 @default.
- W26823 cites W2099838536 @default.
- W26823 cites W2116316001 @default.
- W26823 cites W2120513984 @default.
- W26823 cites W2127571918 @default.
- W26823 cites W2131850986 @default.
- W26823 cites W2134163909 @default.
- W26823 cites W2153653739 @default.
- W26823 cites W2156985047 @default.
- W26823 cites W2161792612 @default.
- W26823 cites W2163068386 @default.
- W26823 cites W2166274350 @default.
- W26823 cites W2170026053 @default.
- W26823 cites W2188715685 @default.
- W26823 cites W2246695851 @default.
- W26823 cites W2951039835 @default.
- W26823 cites W3037924790 @default.
- W26823 cites W3100571865 @default.
- W26823 cites W3101931880 @default.
- W26823 cites W3198494294 @default.
- W26823 cites W33032091 @default.
- W26823 cites W52237254 @default.
- W26823 cites W634521669 @default.
- W26823 cites W89958697 @default.
- W26823 cites W95780973 @default.
- W26823 cites W1533339658 @default.
- W26823 cites W1578564752 @default.
- W26823 cites W16231057 @default.
- W26823 hasPublicationYear "2005" @default.
- W26823 type Work @default.
- W26823 sameAs 26823 @default.
- W26823 citedByCount "7" @default.
- W26823 countsByYear W268232020 @default.
- W26823 crossrefType "dissertation" @default.
- W26823 hasAuthorship W26823A5089995576 @default.
- W26823 hasConcept C130597682 @default.
- W26823 hasConcept C148526163 @default.
- W26823 hasConcept C154945302 @default.
- W26823 hasConcept C177264268 @default.
- W26823 hasConcept C199360897 @default.
- W26823 hasConcept C203005215 @default.
- W26823 hasConcept C204321447 @default.
- W26823 hasConcept C24687705 @default.
- W26823 hasConcept C41008148 @default.
- W26823 hasConcept C53893814 @default.
- W26823 hasConcept C98045186 @default.
- W26823 hasConceptScore W26823C130597682 @default.
- W26823 hasConceptScore W26823C148526163 @default.
- W26823 hasConceptScore W26823C154945302 @default.
- W26823 hasConceptScore W26823C177264268 @default.
- W26823 hasConceptScore W26823C199360897 @default.
- W26823 hasConceptScore W26823C203005215 @default.
- W26823 hasConceptScore W26823C204321447 @default.