Matches in SemOpenAlex for { <https://semopenalex.org/work/W3081260973> ?p ?o ?g. }
- W3081260973 endingPage "103542" @default.
- W3081260973 startingPage "103542" @default.
- W3081260973 abstract "This study aims at realizing unsupervised term discovery in Chinese electronic health records (EHRs) by using the word segmentation technique. The existing supervised algorithms do not perform satisfactorily in the case of EHRs, as annotated medical data are scarce. We propose an unsupervised segmentation method (GTS) based on the graph partition principle, whose multi-granular segmentation capability can help realize efficient term discovery. A sentence is converted to an undirected graph, with the edge weights based on n-gram statistics, and ratio cut is used to split the sentence into words. The graph partition is solved efficiently via dynamic programming, and multi-granularity is realized by setting different partition numbers. A BERT-based discriminator is trained using generated samples to verify the correctness of the word boundaries. The words that are not recorded in existing dictionaries are retained as potential medical terms. We compared the GTS approach with mature segmentation systems for both word segmentation and term discovery. MD students manually segmented Chinese EHRs at fine and coarse granularity levels and reviewed the term discovery results. The proposed unsupervised method outperformed all the competing algorithms in the word segmentation task. In term discovery, GTS outperformed the best baseline by 17 percentage points (a 47% relative percentage of increment) on F1-score. In the absence of annotated training data, the graph partition technique can effectively use the corpus statistics and even expert knowledge to realize unsupervised word segmentation of EHRs. Multi-granular segmentation can be used to provide potential medical terms of various lengths with high accuracy." @default.
- W3081260973 created "2020-09-01" @default.
- W3081260973 creator A5009668529 @default.
- W3081260973 creator A5020682113 @default.
- W3081260973 creator A5039861476 @default.
- W3081260973 creator A5040501126 @default.
- W3081260973 creator A5042384959 @default.
- W3081260973 creator A5062834242 @default.
- W3081260973 creator A5073779383 @default.
- W3081260973 date "2020-10-01" @default.
- W3081260973 modified "2023-10-17" @default.
- W3081260973 title "Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition" @default.
- W3081260973 cites W1970820654 @default.
- W3081260973 cites W1979145089 @default.
- W3081260973 cites W1982498087 @default.
- W3081260973 cites W2036516910 @default.
- W3081260973 cites W2057913811 @default.
- W3081260973 cites W2093424574 @default.
- W3081260973 cites W2105637130 @default.
- W3081260973 cites W2120718680 @default.
- W3081260973 cites W2122402213 @default.
- W3081260973 cites W2125531986 @default.
- W3081260973 cites W2126377586 @default.
- W3081260973 cites W2132914434 @default.
- W3081260973 cites W2141099517 @default.
- W3081260973 cites W2145905222 @default.
- W3081260973 cites W2146089916 @default.
- W3081260973 cites W2159583324 @default.
- W3081260973 cites W2165345215 @default.
- W3081260973 cites W2190421341 @default.
- W3081260973 cites W2250739653 @default.
- W3081260973 cites W2296283641 @default.
- W3081260973 cites W2404901863 @default.
- W3081260973 cites W2520392019 @default.
- W3081260973 cites W2765693998 @default.
- W3081260973 cites W2769851464 @default.
- W3081260973 cites W2772987967 @default.
- W3081260973 cites W2891546148 @default.
- W3081260973 cites W2911462778 @default.
- W3081260973 cites W2962904552 @default.
- W3081260973 cites W2965414772 @default.
- W3081260973 cites W3115271704 @default.
- W3081260973 doi "https://doi.org/10.1016/j.jbi.2020.103542" @default.
- W3081260973 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/32853795" @default.
- W3081260973 hasPublicationYear "2020" @default.
- W3081260973 type Work @default.
- W3081260973 sameAs 3081260973 @default.
- W3081260973 citedByCount "15" @default.
- W3081260973 countsByYear W30812609732021 @default.
- W3081260973 countsByYear W30812609732022 @default.
- W3081260973 countsByYear W30812609732023 @default.
- W3081260973 crossrefType "journal-article" @default.
- W3081260973 hasAuthorship W3081260973A5009668529 @default.
- W3081260973 hasAuthorship W3081260973A5020682113 @default.
- W3081260973 hasAuthorship W3081260973A5039861476 @default.
- W3081260973 hasAuthorship W3081260973A5040501126 @default.
- W3081260973 hasAuthorship W3081260973A5042384959 @default.
- W3081260973 hasAuthorship W3081260973A5062834242 @default.
- W3081260973 hasAuthorship W3081260973A5073779383 @default.
- W3081260973 hasBestOaLocation W30812609731 @default.
- W3081260973 hasConcept C111919701 @default.
- W3081260973 hasConcept C11413529 @default.
- W3081260973 hasConcept C114614502 @default.
- W3081260973 hasConcept C121332964 @default.
- W3081260973 hasConcept C132525143 @default.
- W3081260973 hasConcept C153180895 @default.
- W3081260973 hasConcept C154945302 @default.
- W3081260973 hasConcept C177774035 @default.
- W3081260973 hasConcept C2777530160 @default.
- W3081260973 hasConcept C33923547 @default.
- W3081260973 hasConcept C41008148 @default.
- W3081260973 hasConcept C42812 @default.
- W3081260973 hasConcept C48903430 @default.
- W3081260973 hasConcept C55439883 @default.
- W3081260973 hasConcept C61797465 @default.
- W3081260973 hasConcept C62520636 @default.
- W3081260973 hasConcept C80444323 @default.
- W3081260973 hasConcept C89600930 @default.
- W3081260973 hasConcept C98501671 @default.
- W3081260973 hasConceptScore W3081260973C111919701 @default.
- W3081260973 hasConceptScore W3081260973C11413529 @default.
- W3081260973 hasConceptScore W3081260973C114614502 @default.
- W3081260973 hasConceptScore W3081260973C121332964 @default.
- W3081260973 hasConceptScore W3081260973C132525143 @default.
- W3081260973 hasConceptScore W3081260973C153180895 @default.
- W3081260973 hasConceptScore W3081260973C154945302 @default.
- W3081260973 hasConceptScore W3081260973C177774035 @default.
- W3081260973 hasConceptScore W3081260973C2777530160 @default.
- W3081260973 hasConceptScore W3081260973C33923547 @default.
- W3081260973 hasConceptScore W3081260973C41008148 @default.
- W3081260973 hasConceptScore W3081260973C42812 @default.
- W3081260973 hasConceptScore W3081260973C48903430 @default.
- W3081260973 hasConceptScore W3081260973C55439883 @default.
- W3081260973 hasConceptScore W3081260973C61797465 @default.
- W3081260973 hasConceptScore W3081260973C62520636 @default.
- W3081260973 hasConceptScore W3081260973C80444323 @default.
- W3081260973 hasConceptScore W3081260973C89600930 @default.
- W3081260973 hasConceptScore W3081260973C98501671 @default.
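
The abstract of W3081260973 describes converting a sentence to a graph with n-gram-based edge weights, splitting it by ratio cut, solving the partition by dynamic programming, and varying the partition number to obtain multiple granularities. The following is a minimal sketch of that idea, not the authors' GTS implementation. It assumes a simplified chain graph in which only adjacent characters are linked, and it takes the edge weights as caller-supplied numbers rather than deriving them from corpus statistics. The function name `ratio_cut_segment` and the sample weights are hypothetical.

```python
from typing import List, Sequence


def ratio_cut_segment(text: str, weights: Sequence[float], k: int) -> List[str]:
    """Split `text` into k contiguous pieces minimising a ratio-cut cost.

    weights[p] is the assumed affinity between text[p] and text[p + 1].
    Only adjacent characters are linked (a simplification of the graph
    described in the abstract).
    """
    n = len(text)
    if len(weights) != n - 1:
        raise ValueError("need one weight per adjacent character pair")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(text)")

    # boundary[p] = edge weight cut by placing a boundary before position p;
    # the two sentence ends cut nothing.
    boundary = [0.0] + list(weights) + [0.0]

    def cost(i: int, j: int) -> float:
        # Ratio-cut term for segment text[i:j]: cut weight / segment size.
        return (boundary[i] + boundary[j]) / (j - i)

    inf = float("inf")
    # best[m][j] = minimal cost of splitting text[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cand = best[m - 1][i] + cost(i, j)
                if cand < best[m][j]:
                    best[m][j] = cand
                    back[m][j] = i

    # Walk the back-pointers from the end of the sentence to recover pieces.
    pieces = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        pieces.append(text[i:j])
        j = i
    return pieces[::-1]


# Hypothetical weights: a weak link between "c" and "d", strong links elsewhere.
coarse = ratio_cut_segment("abcdef", [9, 9, 1, 9, 9], k=2)  # ['abc', 'def']
```

Calling the function again with a larger `k` on the same sentence yields a finer segmentation, which is how this sketch mirrors the multi-granular behaviour the abstract mentions. The BERT-based boundary discriminator and the dictionary filtering step are not modelled here.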