Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287777476> ?p ?o ?g. }
- W4287777476 abstract "Machine translation requires large amounts of parallel text. While such datasets are abundant in domains such as newswire, they are less accessible in the biomedical domain. Chinese and English are two of the most widely spoken languages, yet to our knowledge a parallel corpus in the biomedical domain does not exist for this language pair. In this study, we develop an effective pipeline to acquire and process an English-Chinese parallel corpus, consisting of about 100,000 sentence pairs and 3,000,000 tokens on each side, from the New England Journal of Medicine (NEJM). We show that training on out-of-domain data and fine-tuning with as few as 4,000 NEJM sentence pairs improve translation quality by 25.3 (13.4) BLEU for en→zh (zh→en) directions. Translation quality continues to improve at a slower pace on larger in-domain datasets, with an increase of 33.0 (24.3) BLEU for en→zh (zh→en) directions on the full dataset." @default.
- W4287777476 created "2022-07-26" @default.
- W4287777476 creator A5027639672 @default.
- W4287777476 creator A5052636953 @default.
- W4287777476 date "2020-05-18" @default.
- W4287777476 modified "2023-09-26" @default.
- W4287777476 title "NEJM-enzh: A Parallel Corpus for English-Chinese Translation in the Biomedical Domain" @default.
- W4287777476 hasPublicationYear "2020" @default.
- W4287777476 type Work @default.
- W4287777476 citedByCount "0" @default.
- W4287777476 crossrefType "posted-content" @default.
- W4287777476 hasAuthorship W4287777476A5027639672 @default.
- W4287777476 hasAuthorship W4287777476A5052636953 @default.
- W4287777476 hasBestOaLocation W42877774761 @default.
- W4287777476 hasConcept C104317684 @default.
- W4287777476 hasConcept C105580179 @default.
- W4287777476 hasConcept C111472728 @default.
- W4287777476 hasConcept C13280743 @default.
- W4287777476 hasConcept C134306372 @default.
- W4287777476 hasConcept C138885662 @default.
- W4287777476 hasConcept C149364088 @default.
- W4287777476 hasConcept C154945302 @default.
- W4287777476 hasConcept C185592680 @default.
- W4287777476 hasConcept C199360897 @default.
- W4287777476 hasConcept C203005215 @default.
- W4287777476 hasConcept C204321447 @default.
- W4287777476 hasConcept C205649164 @default.
- W4287777476 hasConcept C2777526511 @default.
- W4287777476 hasConcept C2777530160 @default.
- W4287777476 hasConcept C2779530757 @default.
- W4287777476 hasConcept C33923547 @default.
- W4287777476 hasConcept C36503486 @default.
- W4287777476 hasConcept C41008148 @default.
- W4287777476 hasConcept C43521106 @default.
- W4287777476 hasConcept C55493867 @default.
- W4287777476 hasConcept C98045186 @default.
- W4287777476 hasConceptScore W4287777476C104317684 @default.
- W4287777476 hasConceptScore W4287777476C105580179 @default.
- W4287777476 hasConceptScore W4287777476C111472728 @default.
- W4287777476 hasConceptScore W4287777476C13280743 @default.
- W4287777476 hasConceptScore W4287777476C134306372 @default.
- W4287777476 hasConceptScore W4287777476C138885662 @default.
- W4287777476 hasConceptScore W4287777476C149364088 @default.
- W4287777476 hasConceptScore W4287777476C154945302 @default.
- W4287777476 hasConceptScore W4287777476C185592680 @default.
- W4287777476 hasConceptScore W4287777476C199360897 @default.
- W4287777476 hasConceptScore W4287777476C203005215 @default.
- W4287777476 hasConceptScore W4287777476C204321447 @default.
- W4287777476 hasConceptScore W4287777476C205649164 @default.
- W4287777476 hasConceptScore W4287777476C2777526511 @default.
- W4287777476 hasConceptScore W4287777476C2777530160 @default.
- W4287777476 hasConceptScore W4287777476C2779530757 @default.
- W4287777476 hasConceptScore W4287777476C33923547 @default.
- W4287777476 hasConceptScore W4287777476C36503486 @default.
- W4287777476 hasConceptScore W4287777476C41008148 @default.
- W4287777476 hasConceptScore W4287777476C43521106 @default.
- W4287777476 hasConceptScore W4287777476C55493867 @default.
- W4287777476 hasConceptScore W4287777476C98045186 @default.
- W4287777476 hasLocation W42877774761 @default.
- W4287777476 hasOpenAccess W4287777476 @default.
- W4287777476 hasPrimaryLocation W42877774761 @default.
- W4287777476 hasRelatedWork W10287945 @default.
- W4287777476 hasRelatedWork W11012074 @default.
- W4287777476 hasRelatedWork W13211496 @default.
- W4287777476 hasRelatedWork W2061806 @default.
- W4287777476 hasRelatedWork W3769408 @default.
- W4287777476 hasRelatedWork W3939803 @default.
- W4287777476 hasRelatedWork W867563 @default.
- W4287777476 hasRelatedWork W8895266 @default.
- W4287777476 hasRelatedWork W8912579 @default.
- W4287777476 hasRelatedWork W7571534 @default.
- W4287777476 isParatext "false" @default.
- W4287777476 isRetracted "false" @default.
- W4287777476 workType "article" @default.