Matches in SemOpenAlex for { <https://semopenalex.org/work/W1998974009> ?p ?o ?g. }
- W1998974009 endingPage "149" @default.
- W1998974009 startingPage "127" @default.
- W1998974009 abstract "Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously." @default.
- W1998974009 created "2016-06-24" @default.
- W1998974009 creator A5065472000 @default.
- W1998974009 creator A5076876084 @default.
- W1998974009 date "2003-06-01" @default.
- W1998974009 modified "2023-09-27" @default.
- W1998974009 title "Mostly-unsupervised statistical segmentation of Japanese kanji sequences" @default.
- W1998974009 cites W123715776 @default.
- W1998974009 cites W138393627 @default.
- W1998974009 cites W144101457 @default.
- W1998974009 cites W1558333962 @default.
- W1998974009 cites W1583770177 @default.
- W1998974009 cites W1586407478 @default.
- W1998974009 cites W1593045043 @default.
- W1998974009 cites W177759513 @default.
- W1998974009 cites W186138571 @default.
- W1998974009 cites W1911521811 @default.
- W1998974009 cites W1979262500 @default.
- W1998974009 cites W1980862600 @default.
- W1998974009 cites W1992696103 @default.
- W1998974009 cites W1997644175 @default.
- W1998974009 cites W2008434289 @default.
- W1998974009 cites W2015201047 @default.
- W1998974009 cites W201917955 @default.
- W1998974009 cites W204603808 @default.
- W1998974009 cites W2047541753 @default.
- W1998974009 cites W2048179523 @default.
- W1998974009 cites W2099111195 @default.
- W1998974009 cites W2101711363 @default.
- W1998974009 cites W2105783022 @default.
- W1998974009 cites W2110190189 @default.
- W1998974009 cites W2121497944 @default.
- W1998974009 cites W2129287413 @default.
- W1998974009 cites W2142263282 @default.
- W1998974009 cites W2153161205 @default.
- W1998974009 cites W2158874082 @default.
- W1998974009 cites W2160356260 @default.
- W1998974009 cites W2164177007 @default.
- W1998974009 cites W2217873082 @default.
- W1998974009 cites W2244930836 @default.
- W1998974009 cites W23665312 @default.
- W1998974009 cites W2420187884 @default.
- W1998974009 cites W2430312675 @default.
- W1998974009 cites W2518835881 @default.
- W1998974009 cites W2996160789 @default.
- W1998974009 cites W3043710305 @default.
- W1998974009 cites W82047487 @default.
- W1998974009 cites W3143432948 @default.
- W1998974009 doi "https://doi.org/10.1017/s1351324902002954" @default.
- W1998974009 hasPublicationYear "2003" @default.
- W1998974009 type Work @default.
- W1998974009 sameAs 1998974009 @default.
- W1998974009 citedByCount "29" @default.
- W1998974009 countsByYear W19989740092012 @default.
- W1998974009 countsByYear W19989740092013 @default.
- W1998974009 countsByYear W19989740092014 @default.
- W1998974009 countsByYear W19989740092018 @default.
- W1998974009 countsByYear W19989740092020 @default.
- W1998974009 countsByYear W19989740092021 @default.
- W1998974009 crossrefType "journal-article" @default.
- W1998974009 hasAuthorship W1998974009A5065472000 @default.
- W1998974009 hasAuthorship W1998974009A5076876084 @default.
- W1998974009 hasBestOaLocation W19989740092 @default.
- W1998974009 hasConcept C111472728 @default.
- W1998974009 hasConcept C134306372 @default.
- W1998974009 hasConcept C138885662 @default.
- W1998974009 hasConcept C153180895 @default.
- W1998974009 hasConcept C154945302 @default.
- W1998974009 hasConcept C204321447 @default.
- W1998974009 hasConcept C2776372474 @default.
- W1998974009 hasConcept C2778121359 @default.
- W1998974009 hasConcept C2781051154 @default.
- W1998974009 hasConcept C33923547 @default.
- W1998974009 hasConcept C41008148 @default.
- W1998974009 hasConcept C41895202 @default.
- W1998974009 hasConcept C77618280 @default.
- W1998974009 hasConcept C83535845 @default.
- W1998974009 hasConcept C89600930 @default.
- W1998974009 hasConcept C90805587 @default.
- W1998974009 hasConcept C98501671 @default.
- W1998974009 hasConceptScore W1998974009C111472728 @default.
- W1998974009 hasConceptScore W1998974009C134306372 @default.
- W1998974009 hasConceptScore W1998974009C138885662 @default.
- W1998974009 hasConceptScore W1998974009C153180895 @default.
- W1998974009 hasConceptScore W1998974009C154945302 @default.
- W1998974009 hasConceptScore W1998974009C204321447 @default.
- W1998974009 hasConceptScore W1998974009C2776372474 @default.
- W1998974009 hasConceptScore W1998974009C2778121359 @default.
- W1998974009 hasConceptScore W1998974009C2781051154 @default.
- W1998974009 hasConceptScore W1998974009C33923547 @default.
- W1998974009 hasConceptScore W1998974009C41008148 @default.
- W1998974009 hasConceptScore W1998974009C41895202 @default.
- W1998974009 hasConceptScore W1998974009C77618280 @default.
- W1998974009 hasConceptScore W1998974009C83535845 @default.
- W1998974009 hasConceptScore W1998974009C89600930 @default.
- W1998974009 hasConceptScore W1998974009C90805587 @default.
- W1998974009 hasConceptScore W1998974009C98501671 @default.
- W1998974009 hasIssue "2" @default.