Matches in SemOpenAlex for { <https://semopenalex.org/work/W4285222470> ?p ?o ?g. }
Showing items 1 to 89 of 89 with 100 items per page.
- W4285222470 endingPage "78938" @default.
- W4285222470 startingPage "78928" @default.
- W4285222470 abstract "A bilingual corpus is vital for natural language processing problems, especially in machine translation. The larger and better quality the corpus is, the higher the efficiency of the resulting machine translation is. There are two popular approaches to building a bilingual corpus. The first is building one automatically based on resources that are available on the internet, typically bilingual websites. The second approach is to construct one manually. Automated construction methods are being used more frequently because they are less expensive and there are a growing number of bilingual websites to exploit. In this paper, we use automated collection methods for a bilingual website to create a bilingual Chinese-Vietnamese corpus. In particular, the bilingual website we use to collect the data is the website of a multilingual dictionary (https://glosbe.com). We collected the Chinese-Vietnamese corpus from this website that includes more than 400k sentence pairs. We chose 100,000 sentence pairs in this corpus for machine translation experiments. From the corpus, we built five datasets consisting of 20k, 40k, 60k, 80k, and 100k sentence pairs, respectively. In addition, we built five additional datasets, applying word segmentation on the sentences of the original datasets. The experimental results showed that: (1) the quality of the corpus is relatively good with the highest BLEU score of 19.8, although there are still some issues that need to be addressed in future works; (2) the larger the corpus is, the higher the machine translation quality is; and (3) the untokenized datasets help train better translation models than the tokenized datasets." @default.
- W4285222470 created "2022-07-14" @default.
- W4285222470 creator A5052627468 @default.
- W4285222470 creator A5053403067 @default.
- W4285222470 creator A5058762356 @default.
- W4285222470 creator A5061796051 @default.
- W4285222470 creator A5069886442 @default.
- W4285222470 date "2022-01-01" @default.
- W4285222470 modified "2023-10-14" @default.
- W4285222470 title "A Method of Chinese-Vietnamese Bilingual Corpus Construction for Machine Translation" @default.
- W4285222470 cites W1902237438 @default.
- W4285222470 cites W1978167847 @default.
- W4285222470 cites W1991659251 @default.
- W4285222470 cites W2086202918 @default.
- W4285222470 cites W2091957306 @default.
- W4285222470 cites W2101105183 @default.
- W4285222470 cites W2123675427 @default.
- W4285222470 cites W2165142900 @default.
- W4285222470 cites W2550455130 @default.
- W4285222470 cites W2624054409 @default.
- W4285222470 cites W2918555272 @default.
- W4285222470 cites W2963641561 @default.
- W4285222470 cites W3035016936 @default.
- W4285222470 cites W3040449765 @default.
- W4285222470 cites W3101547357 @default.
- W4285222470 doi "https://doi.org/10.1109/access.2022.3186978" @default.
- W4285222470 hasPublicationYear "2022" @default.
- W4285222470 type Work @default.
- W4285222470 citedByCount "3" @default.
- W4285222470 countsByYear W42852224702022 @default.
- W4285222470 countsByYear W42852224702023 @default.
- W4285222470 crossrefType "journal-article" @default.
- W4285222470 hasAuthorship W4285222470A5052627468 @default.
- W4285222470 hasAuthorship W4285222470A5053403067 @default.
- W4285222470 hasAuthorship W4285222470A5058762356 @default.
- W4285222470 hasAuthorship W4285222470A5061796051 @default.
- W4285222470 hasAuthorship W4285222470A5069886442 @default.
- W4285222470 hasBestOaLocation W42852224701 @default.
- W4285222470 hasConcept C103621254 @default.
- W4285222470 hasConcept C111472728 @default.
- W4285222470 hasConcept C138885662 @default.
- W4285222470 hasConcept C154945302 @default.
- W4285222470 hasConcept C165696696 @default.
- W4285222470 hasConcept C199360897 @default.
- W4285222470 hasConcept C203005215 @default.
- W4285222470 hasConcept C204321447 @default.
- W4285222470 hasConcept C23123220 @default.
- W4285222470 hasConcept C2777530160 @default.
- W4285222470 hasConcept C2779235283 @default.
- W4285222470 hasConcept C2779530757 @default.
- W4285222470 hasConcept C2780801425 @default.
- W4285222470 hasConcept C38652104 @default.
- W4285222470 hasConcept C41008148 @default.
- W4285222470 hasConcept C41895202 @default.
- W4285222470 hasConceptScore W4285222470C103621254 @default.
- W4285222470 hasConceptScore W4285222470C111472728 @default.
- W4285222470 hasConceptScore W4285222470C138885662 @default.
- W4285222470 hasConceptScore W4285222470C154945302 @default.
- W4285222470 hasConceptScore W4285222470C165696696 @default.
- W4285222470 hasConceptScore W4285222470C199360897 @default.
- W4285222470 hasConceptScore W4285222470C203005215 @default.
- W4285222470 hasConceptScore W4285222470C204321447 @default.
- W4285222470 hasConceptScore W4285222470C23123220 @default.
- W4285222470 hasConceptScore W4285222470C2777530160 @default.
- W4285222470 hasConceptScore W4285222470C2779235283 @default.
- W4285222470 hasConceptScore W4285222470C2779530757 @default.
- W4285222470 hasConceptScore W4285222470C2780801425 @default.
- W4285222470 hasConceptScore W4285222470C38652104 @default.
- W4285222470 hasConceptScore W4285222470C41008148 @default.
- W4285222470 hasConceptScore W4285222470C41895202 @default.
- W4285222470 hasLocation W42852224701 @default.
- W4285222470 hasOpenAccess W4285222470 @default.
- W4285222470 hasPrimaryLocation W42852224701 @default.
- W4285222470 hasRelatedWork W1520632506 @default.
- W4285222470 hasRelatedWork W1527658996 @default.
- W4285222470 hasRelatedWork W2004185987 @default.
- W4285222470 hasRelatedWork W2091957306 @default.
- W4285222470 hasRelatedWork W2123675427 @default.
- W4285222470 hasRelatedWork W2189309697 @default.
- W4285222470 hasRelatedWork W2765190181 @default.
- W4285222470 hasRelatedWork W2775554247 @default.
- W4285222470 hasRelatedWork W2914866113 @default.
- W4285222470 hasRelatedWork W4285222470 @default.
- W4285222470 hasVolume "10" @default.
- W4285222470 isParatext "false" @default.
- W4285222470 isRetracted "false" @default.
- W4285222470 workType "article" @default.
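
Each entry above has the shape "- SUBJECT PREDICATE OBJECT @GRAPH.", matching the ?p ?o ?g variables in the query pattern at the top. As a minimal sketch, assuming that line shape holds for every entry, the listing could be read into tuples and grouped by predicate as follows. The names LINE_RE, parse_line, and group_by_predicate are hypothetical helpers introduced for illustration; they are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed shape of one listing line: "- SUBJECT PREDICATE OBJECT @GRAPH."
# (hypothetical pattern inferred from the entries above, not an official format)
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.?$', re.DOTALL)


def parse_line(line):
    """Return (subject, predicate, object, graph), or None for non-entry lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Quoted objects are literal values; bare tokens are identifiers.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect the objects of all parsed entries under their predicate."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few entries copied from the listing above.
sample = [
    'Showing items 1 to 89 of',
    '- W4285222470 endingPage "78938" @default.',
    '- W4285222470 startingPage "78928" @default.',
    '- W4285222470 cites W1902237438 @default.',
    '- W4285222470 cites W1978167847 @default.',
    '- W4285222470 type Work @default.',
]

grouped = group_by_predicate(sample)
print(grouped["cites"])
```

Grouping by predicate keeps repeated properties such as cites, hasConcept, and hasRelatedWork together as lists, while single-valued properties such as title or doi end up as one-element lists.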