Matches in SemOpenAlex for { <https://semopenalex.org/work/W2166987457> ?p ?o ?g. }
- W2166987457 endingPage "41" @default.
- W2166987457 startingPage "1" @default.
- W2166987457 abstract "Among statistical approaches to Chinese word segmentation, the word-based n-gram (generative) model and the character-based tagging (discriminative) model are two dominant approaches in the literature. The former gives excellent performance for the in-vocabulary (IV) words; however, it handles out-of-vocabulary (OOV) words poorly. On the other hand, though the latter is more robust for OOV words, it fails to deliver satisfactory performance for IV words. These two approaches behave differently due to the unit they use (word vs. character) and the model form they adopt (generative vs. discriminative). In general, character-based approaches are more robust than word-based ones, as the vocabulary of characters is a closed set; and discriminative models are more robust than generative ones, since they can flexibly include all kinds of available information, such as future context. This article first proposes a character-based n-gram model to enhance the robustness of the generative approach. Then the proposed generative model is further integrated with the character-based discriminative model to take advantage of both approaches. Our experiments show that this integrated approach outperforms all the existing approaches reported in the literature. Afterwards, a complete and detailed error analysis is conducted. Since a significant portion of the critical errors is related to numerical/foreign strings, character-type information is then incorporated into the model to further improve its performance. Last, the proposed integrated approach is tested on cross-domain corpora, and a semi-supervised domain adaptation algorithm is proposed and shown to be effective in our experiments." @default.
- W2166987457 created "2016-06-24" @default.
- W2166987457 creator A5015785439 @default.
- W2166987457 creator A5065145414 @default.
- W2166987457 creator A5070399630 @default.
- W2166987457 date "2012-06-01" @default.
- W2166987457 modified "2023-09-26" @default.
- W2166987457 title "Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation" @default.
- W2166987457 cites W1971678616 @default.
- W2166987457 cites W1979145089 @default.
- W2166987457 cites W1982498087 @default.
- W2166987457 cites W1997129315 @default.
- W2166987457 cites W2008652694 @default.
- W2166987457 cites W2010576059 @default.
- W2166987457 cites W2032585622 @default.
- W2166987457 cites W2036516910 @default.
- W2166987457 cites W2040786972 @default.
- W2166987457 cites W2056250865 @default.
- W2166987457 cites W2086846715 @default.
- W2166987457 cites W2096175520 @default.
- W2166987457 cites W2108220507 @default.
- W2166987457 cites W2113242163 @default.
- W2166987457 cites W2116983617 @default.
- W2166987457 cites W2140016149 @default.
- W2166987457 cites W2156346614 @default.
- W2166987457 cites W2163568299 @default.
- W2166987457 cites W2170469979 @default.
- W2166987457 cites W2217873082 @default.
- W2166987457 doi "https://doi.org/10.1145/2184436.2184440" @default.
- W2166987457 hasPublicationYear "2012" @default.
- W2166987457 type Work @default.
- W2166987457 sameAs 2166987457 @default.
- W2166987457 citedByCount "12" @default.
- W2166987457 countsByYear W21669874572012 @default.
- W2166987457 countsByYear W21669874572013 @default.
- W2166987457 countsByYear W21669874572014 @default.
- W2166987457 countsByYear W21669874572015 @default.
- W2166987457 countsByYear W21669874572016 @default.
- W2166987457 countsByYear W21669874572017 @default.
- W2166987457 countsByYear W21669874572020 @default.
- W2166987457 countsByYear W21669874572023 @default.
- W2166987457 crossrefType "journal-article" @default.
- W2166987457 hasAuthorship W2166987457A5015785439 @default.
- W2166987457 hasAuthorship W2166987457A5065145414 @default.
- W2166987457 hasAuthorship W2166987457A5070399630 @default.
- W2166987457 hasConcept C104317684 @default.
- W2166987457 hasConcept C137293760 @default.
- W2166987457 hasConcept C138885662 @default.
- W2166987457 hasConcept C151730666 @default.
- W2166987457 hasConcept C153180895 @default.
- W2166987457 hasConcept C154945302 @default.
- W2166987457 hasConcept C167966045 @default.
- W2166987457 hasConcept C185592680 @default.
- W2166987457 hasConcept C204321447 @default.
- W2166987457 hasConcept C2524010 @default.
- W2166987457 hasConcept C2777601683 @default.
- W2166987457 hasConcept C2779343474 @default.
- W2166987457 hasConcept C2780861071 @default.
- W2166987457 hasConcept C28490314 @default.
- W2166987457 hasConcept C33923547 @default.
- W2166987457 hasConcept C39890363 @default.
- W2166987457 hasConcept C41008148 @default.
- W2166987457 hasConcept C41895202 @default.
- W2166987457 hasConcept C55493867 @default.
- W2166987457 hasConcept C63479239 @default.
- W2166987457 hasConcept C86803240 @default.
- W2166987457 hasConcept C89600930 @default.
- W2166987457 hasConcept C90805587 @default.
- W2166987457 hasConcept C97931131 @default.
- W2166987457 hasConceptScore W2166987457C104317684 @default.
- W2166987457 hasConceptScore W2166987457C137293760 @default.
- W2166987457 hasConceptScore W2166987457C138885662 @default.
- W2166987457 hasConceptScore W2166987457C151730666 @default.
- W2166987457 hasConceptScore W2166987457C153180895 @default.
- W2166987457 hasConceptScore W2166987457C154945302 @default.
- W2166987457 hasConceptScore W2166987457C167966045 @default.
- W2166987457 hasConceptScore W2166987457C185592680 @default.
- W2166987457 hasConceptScore W2166987457C204321447 @default.
- W2166987457 hasConceptScore W2166987457C2524010 @default.
- W2166987457 hasConceptScore W2166987457C2777601683 @default.
- W2166987457 hasConceptScore W2166987457C2779343474 @default.
- W2166987457 hasConceptScore W2166987457C2780861071 @default.
- W2166987457 hasConceptScore W2166987457C28490314 @default.
- W2166987457 hasConceptScore W2166987457C33923547 @default.
- W2166987457 hasConceptScore W2166987457C39890363 @default.
- W2166987457 hasConceptScore W2166987457C41008148 @default.
- W2166987457 hasConceptScore W2166987457C41895202 @default.
- W2166987457 hasConceptScore W2166987457C55493867 @default.
- W2166987457 hasConceptScore W2166987457C63479239 @default.
- W2166987457 hasConceptScore W2166987457C86803240 @default.
- W2166987457 hasConceptScore W2166987457C89600930 @default.
- W2166987457 hasConceptScore W2166987457C90805587 @default.
- W2166987457 hasConceptScore W2166987457C97931131 @default.
- W2166987457 hasFunder F4320321001 @default.
- W2166987457 hasIssue "2" @default.
- W2166987457 hasLocation W21669874571 @default.
- W2166987457 hasOpenAccess W2166987457 @default.
- W2166987457 hasPrimaryLocation W21669874571 @default.