Matches in SemOpenAlex for { <https://semopenalex.org/work/W3033372647> ?p ?o ?g. }
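The pattern in the heading above can be read as a SPARQL query. The following is a minimal sketch of how such a request might be built, not a description of how this page is actually generated. The endpoint URL, the constant names, and the helpers `build_query` and `build_request_url` are assumptions made for illustration. Because every row below is reported in the `@default` graph, the sketch uses a plain triple pattern rather than a `GRAPH ?g` clause.

```python
from urllib.parse import urlencode

# IRI of the work, taken from the heading of this listing.
WORK_IRI = "https://semopenalex.org/work/W3033372647"

# Assumption: illustrative endpoint URL; consult the SemOpenAlex
# documentation for the actual SPARQL endpoint before using it.
ENDPOINT = "https://semopenalex.org/sparql"


def build_query(iri: str) -> str:
    """Return a SPARQL query selecting every predicate/object pair of `iri`."""
    return f"SELECT ?p ?o WHERE {{ <{iri}> ?p ?o . }}"


def build_request_url(endpoint: str, iri: str) -> str:
    """Return a GET URL carrying the query in the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": build_query(iri)})


if __name__ == "__main__":
    # Only prints the query and URL; no network request is made here.
    print(build_query(WORK_IRI))
    print(build_request_url(ENDPOINT, WORK_IRI))
```

Each result row of such a query would correspond to one line of the listing, for example the `doi` or `cites` statements.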
- W3033372647 endingPage "e0234214" @default.
- W3033372647 startingPage "e0234214" @default.
- W3033372647 abstract "Symbolic sequential data are produced in huge quantities in numerous contexts, such as text and speech data, biometrics, genomics, financial market indexes, music sheets, and online social media posts. In this paper, an unsupervised approach for the chunking of idiomatic units of sequential text data is presented. Text chunking refers to the task of splitting a string of textual information into non-overlapping groups of related units. This is a fundamental problem in numerous fields where understanding the relation between raw units of symbolic sequential data is relevant. Existing methods are based primarily on supervised and semi-supervised learning approaches; however, in this study, a novel unsupervised approach is proposed based on the existing concept of n-grams, which requires no labeled text as an input. The proposed methodology is applied to two natural language corpora: a Wall Street Journal corpus and a Twitter corpus. In both cases, the corpus length was increased gradually to measure the accuracy with a different number of unitary elements as inputs. Both corpora reveal improvements in accuracy proportional with increases in the number of tokens. For the Twitter corpus, the increase in accuracy follows a linear trend. The results show that the proposed methodology can achieve a higher accuracy with incremental usage. A future study will aim at designing an iterative system for the proposed methodology." @default.
- W3033372647 created "2020-06-12" @default.
- W3033372647 creator A5059229813 @default.
- W3033372647 creator A5059975108 @default.
- W3033372647 creator A5073309615 @default.
- W3033372647 date "2020-06-08" @default.
- W3033372647 modified "2023-09-26" @default.
- W3033372647 title "Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets" @default.
- W3033372647 cites W1991566372 @default.
- W3033372647 cites W1995449614 @default.
- W3033372647 cites W2040909025 @default.
- W3033372647 cites W2062331696 @default.
- W3033372647 cites W2074613462 @default.
- W3033372647 cites W2074790252 @default.
- W3033372647 cites W2075508846 @default.
- W3033372647 cites W2081228205 @default.
- W3033372647 cites W2086613190 @default.
- W3033372647 cites W2088672056 @default.
- W3033372647 cites W2091203480 @default.
- W3033372647 cites W2097125878 @default.
- W3033372647 cites W2098921539 @default.
- W3033372647 cites W2101711363 @default.
- W3033372647 cites W2129882630 @default.
- W3033372647 cites W2132103315 @default.
- W3033372647 cites W2135843243 @default.
- W3033372647 cites W2143017621 @default.
- W3033372647 cites W2143296986 @default.
- W3033372647 cites W2156515921 @default.
- W3033372647 cites W2251329024 @default.
- W3033372647 cites W2252211741 @default.
- W3033372647 cites W2295297373 @default.
- W3033372647 cites W2481544931 @default.
- W3033372647 cites W2790949759 @default.
- W3033372647 cites W2793909381 @default.
- W3033372647 cites W2891602716 @default.
- W3033372647 cites W2901784470 @default.
- W3033372647 cites W2902306948 @default.
- W3033372647 cites W2948947170 @default.
- W3033372647 cites W2953641512 @default.
- W3033372647 cites W2962739339 @default.
- W3033372647 cites W2963369167 @default.
- W3033372647 cites W2963419157 @default.
- W3033372647 cites W2963563735 @default.
- W3033372647 cites W2963706742 @default.
- W3033372647 cites W2969545244 @default.
- W3033372647 cites W3012488989 @default.
- W3033372647 cites W3099937403 @default.
- W3033372647 cites W3102012802 @default.
- W3033372647 cites W3102424179 @default.
- W3033372647 cites W4206415992 @default.
- W3033372647 doi "https://doi.org/10.1371/journal.pone.0234214" @default.
- W3033372647 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/7790378" @default.
- W3033372647 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/33411738" @default.
- W3033372647 hasPublicationYear "2020" @default.
- W3033372647 type Work @default.
- W3033372647 sameAs 3033372647 @default.
- W3033372647 citedByCount "5" @default.
- W3033372647 countsByYear W30333726472020 @default.
- W3033372647 countsByYear W30333726472021 @default.
- W3033372647 countsByYear W30333726472022 @default.
- W3033372647 crossrefType "journal-article" @default.
- W3033372647 hasAuthorship W3033372647A5059229813 @default.
- W3033372647 hasAuthorship W3033372647A5059975108 @default.
- W3033372647 hasAuthorship W3033372647A5073309615 @default.
- W3033372647 hasBestOaLocation W30333726471 @default.
- W3033372647 hasConcept C117884012 @default.
- W3033372647 hasConcept C137293760 @default.
- W3033372647 hasConcept C154945302 @default.
- W3033372647 hasConcept C161369605 @default.
- W3033372647 hasConcept C195324797 @default.
- W3033372647 hasConcept C203357204 @default.
- W3033372647 hasConcept C204321447 @default.
- W3033372647 hasConcept C28490314 @default.
- W3033372647 hasConcept C41008148 @default.
- W3033372647 hasConcept C523546767 @default.
- W3033372647 hasConcept C54355233 @default.
- W3033372647 hasConcept C86803240 @default.
- W3033372647 hasConceptScore W3033372647C117884012 @default.
- W3033372647 hasConceptScore W3033372647C137293760 @default.
- W3033372647 hasConceptScore W3033372647C154945302 @default.
- W3033372647 hasConceptScore W3033372647C161369605 @default.
- W3033372647 hasConceptScore W3033372647C195324797 @default.
- W3033372647 hasConceptScore W3033372647C203357204 @default.
- W3033372647 hasConceptScore W3033372647C204321447 @default.
- W3033372647 hasConceptScore W3033372647C28490314 @default.
- W3033372647 hasConceptScore W3033372647C41008148 @default.
- W3033372647 hasConceptScore W3033372647C523546767 @default.
- W3033372647 hasConceptScore W3033372647C54355233 @default.
- W3033372647 hasConceptScore W3033372647C86803240 @default.
- W3033372647 hasIssue "6" @default.
- W3033372647 hasLocation W30333726471 @default.
- W3033372647 hasLocation W30333726472 @default.
- W3033372647 hasLocation W30333726473 @default.
- W3033372647 hasOpenAccess W3033372647 @default.
- W3033372647 hasPrimaryLocation W30333726471 @default.
- W3033372647 hasRelatedWork W1542956019 @default.
- W3033372647 hasRelatedWork W1563618553 @default.
- W3033372647 hasRelatedWork W1649222155 @default.