Matches in SemOpenAlex for { <https://semopenalex.org/work/W3132094881> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W3132094881 abstract "We introduce a Content-based Document Alignment approach (CDA), an efficient method to align multilingual web documents based on content in creating parallel training data for machine translation (MT) systems operating at the industrial level. CDA works in two steps: (i) projecting documents of a web domain to a shared multilingual space; then (ii) aligning them based on the similarity of their representations in such space. We leverage lexical translation models to build vector representations using TF-IDF. CDA achieves performance comparable with state-of-the-art systems in the WMT-16 Bilingual Document Alignment Shared Task benchmark while operating in multilingual space. Besides, we created two web-scale datasets to examine the robustness of CDA in an industrial setting involving up to 28 languages and millions of documents. The experiments show that CDA is robust, cost-effective, and is significantly superior in (i) processing large and noisy web data and (ii) scaling to new and low-resourced languages." @default.
- W3132094881 created "2021-03-01" @default.
- W3132094881 creator A5056376686 @default.
- W3132094881 creator A5056497976 @default.
- W3132094881 date "2021-02-20" @default.
- W3132094881 modified "2023-09-27" @default.
- W3132094881 title "CDA: a Cost Efficient Content-based Multilingual Web Document Aligner" @default.
- W3132094881 cites W1937758960 @default.
- W3132094881 cites W2000748429 @default.
- W3132094881 cites W2047295649 @default.
- W3132094881 cites W2068297964 @default.
- W3132094881 cites W2092094655 @default.
- W3132094881 cites W2101096097 @default.
- W3132094881 cites W2145080939 @default.
- W3132094881 cites W2156985047 @default.
- W3132094881 cites W2170716095 @default.
- W3132094881 cites W22168010 @default.
- W3132094881 cites W2251569308 @default.
- W3132094881 cites W2508809683 @default.
- W3132094881 cites W2513768144 @default.
- W3132094881 cites W2517504876 @default.
- W3132094881 cites W2773493195 @default.
- W3132094881 cites W2891177506 @default.
- W3132094881 cites W2903035303 @default.
- W3132094881 cites W2949547296 @default.
- W3132094881 cites W2963118869 @default.
- W3132094881 cites W2998215494 @default.
- W3132094881 cites W3003748523 @default.
- W3132094881 cites W8895266 @default.
- W3132094881 hasPublicationYear "2021" @default.
- W3132094881 type Work @default.
- W3132094881 sameAs 3132094881 @default.
- W3132094881 citedByCount "0" @default.
- W3132094881 crossrefType "posted-content" @default.
- W3132094881 hasAuthorship W3132094881A5056376686 @default.
- W3132094881 hasAuthorship W3132094881A5056497976 @default.
- W3132094881 hasConcept C104317684 @default.
- W3132094881 hasConcept C153083717 @default.
- W3132094881 hasConcept C154945302 @default.
- W3132094881 hasConcept C185592680 @default.
- W3132094881 hasConcept C203005215 @default.
- W3132094881 hasConcept C204321447 @default.
- W3132094881 hasConcept C23123220 @default.
- W3132094881 hasConcept C41008148 @default.
- W3132094881 hasConcept C55493867 @default.
- W3132094881 hasConcept C63479239 @default.
- W3132094881 hasConceptScore W3132094881C104317684 @default.
- W3132094881 hasConceptScore W3132094881C153083717 @default.
- W3132094881 hasConceptScore W3132094881C154945302 @default.
- W3132094881 hasConceptScore W3132094881C185592680 @default.
- W3132094881 hasConceptScore W3132094881C203005215 @default.
- W3132094881 hasConceptScore W3132094881C204321447 @default.
- W3132094881 hasConceptScore W3132094881C23123220 @default.
- W3132094881 hasConceptScore W3132094881C41008148 @default.
- W3132094881 hasConceptScore W3132094881C55493867 @default.
- W3132094881 hasConceptScore W3132094881C63479239 @default.
- W3132094881 hasLocation W31320948811 @default.
- W3132094881 hasOpenAccess W3132094881 @default.
- W3132094881 hasPrimaryLocation W31320948811 @default.
- W3132094881 hasRelatedWork W1516946508 @default.
- W3132094881 hasRelatedWork W1562514231 @default.
- W3132094881 hasRelatedWork W1565384306 @default.
- W3132094881 hasRelatedWork W168502520 @default.
- W3132094881 hasRelatedWork W1888418830 @default.
- W3132094881 hasRelatedWork W1984411289 @default.
- W3132094881 hasRelatedWork W1992942687 @default.
- W3132094881 hasRelatedWork W1999548925 @default.
- W3132094881 hasRelatedWork W2045488547 @default.
- W3132094881 hasRelatedWork W2153488166 @default.
- W3132094881 hasRelatedWork W2165259567 @default.
- W3132094881 hasRelatedWork W2553990560 @default.
- W3132094881 hasRelatedWork W26005485 @default.
- W3132094881 hasRelatedWork W2785713289 @default.
- W3132094881 hasRelatedWork W2998272617 @default.
- W3132094881 hasRelatedWork W3037610486 @default.
- W3132094881 hasRelatedWork W3166235828 @default.
- W3132094881 hasRelatedWork W3201191103 @default.
- W3132094881 hasRelatedWork W356468192 @default.
- W3132094881 hasRelatedWork W2003981969 @default.
- W3132094881 isParatext "false" @default.
- W3132094881 isRetracted "false" @default.
- W3132094881 magId "3132094881" @default.
- W3132094881 workType "article" @default.
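
The abstract recorded above describes CDA as two steps: project documents into a shared multilingual space with lexical translation models and TF-IDF, then align them by the similarity of their representations. The following is a minimal sketch of that idea only, not the authors' implementation. The toy lexicon, the function names, the smoothed IDF formula, the cosine measure, and the greedy one-to-one matching are all assumptions made for illustration; the record does not specify any of them.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (French -> English). It stands in for a lexical
# translation model; the real model is not described in this record.
TOY_LEXICON = {"le": "the", "la": "the", "chat": "cat", "maison": "house"}


def project(tokens, lexicon):
    """Step (i): map tokens into the shared vocabulary; unknown tokens pass through."""
    return [lexicon.get(tok, tok) for tok in tokens]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) with a smoothed IDF (assumed variant)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for key, tokens in docs.items():
        counts = Counter(tokens)
        vectors[key] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(source_docs, target_docs, lexicon):
    """Step (ii): greedy one-to-one alignment by descending similarity (assumed strategy)."""
    shared = {("src", k): project(toks, lexicon) for k, toks in source_docs.items()}
    shared.update({("tgt", k): list(toks) for k, toks in target_docs.items()})
    vectors = tfidf_vectors(shared)
    scored = sorted(
        (
            (cosine(vectors[("src", s)], vectors[("tgt", t)]), s, t)
            for s in source_docs
            for t in target_docs
        ),
        key=lambda item: item[0],
        reverse=True,
    )
    pairs, used_targets = {}, set()
    for _score, s, t in scored:
        if s not in pairs and t not in used_targets:
            pairs[s] = t
            used_targets.add(t)
    return pairs


# Made-up example documents, already tokenized.
source_docs = {"fr1": ["le", "chat"], "fr2": ["la", "maison"]}
target_docs = {"en1": ["the", "house"], "en2": ["the", "cat"]}
print(align(source_docs, target_docs, TOY_LEXICON))
```

In this toy run each French document is paired with the English document that shares its projected content words. A web-scale system of the kind the abstract mentions would need sparse matrix operations and candidate pruning instead of scoring every pair.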