Matches in SemOpenAlex for { <https://semopenalex.org/work/W2952662724> ?p ?o ?g. }
- W2952662724 abstract "Segmenting a chunk of text into words is usually the first step of processing Chinese text, but its necessity has rarely been explored. In this paper, we ask the fundamental question of whether Chinese word segmentation (CWS) is necessary for deep learning-based Chinese Natural Language Processing. We benchmark neural word-based models which rely on word segmentation against neural char-based models which do not involve word segmentation in four end-to-end NLP benchmark tasks: language modeling, machine translation, sentence matching/paraphrase and text classification. Through direct comparisons between these two types of models, we find that char-based models consistently outperform word-based models. Based on these observations, we conduct comprehensive experiments to study why word-based models underperform char-based models in these deep learning-based NLP tasks. We show that it is because word-based models are more vulnerable to data sparsity and the presence of out-of-vocabulary (OOV) words, and thus more prone to overfitting. We hope this paper could encourage researchers in the community to rethink the necessity of word segmentation in deep learning-based Chinese Natural Language Processing. (Yuxian Meng and Xiaoya Li contributed equally to this paper.)" @default.
- W2952662724 created "2019-06-27" @default.
- W2952662724 creator A5014777126 @default.
- W2952662724 creator A5023207081 @default.
- W2952662724 creator A5025603106 @default.
- W2952662724 creator A5062811173 @default.
- W2952662724 creator A5074612639 @default.
- W2952662724 creator A5077851706 @default.
- W2952662724 date "2019-05-14" @default.
- W2952662724 modified "2023-09-24" @default.
- W2952662724 title "Is Word Segmentation Necessary for Deep Learning of Chinese Representations?" @default.
- W2952662724 cites W1552919843 @default.
- W2952662724 cites W1588242179 @default.
- W2952662724 cites W1832693441 @default.
- W2952662724 cites W1840435438 @default.
- W2952662724 cites W1902237438 @default.
- W2952662724 cites W1938755728 @default.
- W2952662724 cites W1982498087 @default.
- W2952662724 cites W2033295622 @default.
- W2952662724 cites W2036516910 @default.
- W2952662724 cites W2053130360 @default.
- W2952662724 cites W2064675550 @default.
- W2952662724 cites W2100259670 @default.
- W2952662724 cites W2116343275 @default.
- W2952662724 cites W2118414455 @default.
- W2952662724 cites W2120354757 @default.
- W2952662724 cites W2130942839 @default.
- W2952662724 cites W2136165248 @default.
- W2952662724 cites W2147880316 @default.
- W2952662724 cites W2163605009 @default.
- W2952662724 cites W2250659129 @default.
- W2952662724 cites W2250739653 @default.
- W2952662724 cites W2251362855 @default.
- W2952662724 cites W2251681966 @default.
- W2952662724 cites W2251811146 @default.
- W2952662724 cites W2252264945 @default.
- W2952662724 cites W2274880506 @default.
- W2952662724 cites W2358307482 @default.
- W2952662724 cites W2516334389 @default.
- W2952662724 cites W2566150155 @default.
- W2952662724 cites W2593833795 @default.
- W2952662724 cites W2740418170 @default.
- W2952662724 cites W2740603853 @default.
- W2952662724 cites W2757350179 @default.
- W2952662724 cites W2759366113 @default.
- W2952662724 cites W2787109023 @default.
- W2952662724 cites W2799090016 @default.
- W2952662724 cites W2876111955 @default.
- W2952662724 cites W2889968917 @default.
- W2952662724 cites W2962784628 @default.
- W2952662724 cites W2962801832 @default.
- W2952662724 cites W2962814195 @default.
- W2952662724 cites W2963355640 @default.
- W2952662724 cites W2963403868 @default.
- W2952662724 cites W2963628345 @default.
- W2952662724 cites W2963888305 @default.
- W2952662724 cites W2963913268 @default.
- W2952662724 cites W2963997155 @default.
- W2952662724 cites W2964093505 @default.
- W2952662724 cites W2964352165 @default.
- W2952662724 cites W61894391 @default.
- W2952662724 cites W982451576 @default.
- W2952662724 doi "https://doi.org/10.48550/arxiv.1905.05526" @default.
- W2952662724 hasPublicationYear "2019" @default.
- W2952662724 type Work @default.
- W2952662724 sameAs 2952662724 @default.
- W2952662724 citedByCount "12" @default.
- W2952662724 countsByYear W29526627242019 @default.
- W2952662724 countsByYear W29526627242020 @default.
- W2952662724 countsByYear W29526627242021 @default.
- W2952662724 crossrefType "posted-content" @default.
- W2952662724 hasAuthorship W2952662724A5014777126 @default.
- W2952662724 hasAuthorship W2952662724A5023207081 @default.
- W2952662724 hasAuthorship W2952662724A5025603106 @default.
- W2952662724 hasAuthorship W2952662724A5062811173 @default.
- W2952662724 hasAuthorship W2952662724A5074612639 @default.
- W2952662724 hasAuthorship W2952662724A5077851706 @default.
- W2952662724 hasBestOaLocation W29526627241 @default.
- W2952662724 hasConcept C108583219 @default.
- W2952662724 hasConcept C13280743 @default.
- W2952662724 hasConcept C137293760 @default.
- W2952662724 hasConcept C138885662 @default.
- W2952662724 hasConcept C154945302 @default.
- W2952662724 hasConcept C185798385 @default.
- W2952662724 hasConcept C203005215 @default.
- W2952662724 hasConcept C204321447 @default.
- W2952662724 hasConcept C205649164 @default.
- W2952662724 hasConcept C22019652 @default.
- W2952662724 hasConcept C2777530160 @default.
- W2952662724 hasConcept C2780922921 @default.
- W2952662724 hasConcept C41008148 @default.
- W2952662724 hasConcept C41895202 @default.
- W2952662724 hasConcept C50644808 @default.
- W2952662724 hasConcept C89600930 @default.
- W2952662724 hasConcept C90805587 @default.
- W2952662724 hasConcept C98501671 @default.
- W2952662724 hasConceptScore W2952662724C108583219 @default.
- W2952662724 hasConceptScore W2952662724C13280743 @default.
- W2952662724 hasConceptScore W2952662724C137293760 @default.
- W2952662724 hasConceptScore W2952662724C138885662 @default.