Matches in SemOpenAlex for { <https://semopenalex.org/work/W3080232144> ?p ?o ?g. }
Showing items 1 to 92 of 92 with 100 items per page.
- W3080232144 endingPage "144" @default.
- W3080232144 startingPage "144" @default.
- W3080232144 abstract "The paper is dedicated to solving the problem of optimal text classification in the area of automated detection of typology of texts. In conventional approaches to topicality-based text classification (including topic modeling), the number of clusters is to be set up by the scholar, and the optimal number of clusters, as well as the quality of the model that designates proximity of texts to each other, remain unresolved questions. We propose a novel approach to the automated definition of the optimal number of clusters that also incorporates an assessment of word proximity of texts, combined with text encoding model that is based on the system of sentence embeddings. Our approach combines Universal Sentence Encoder (USE) data pre-processing, agglomerative hierarchical clustering by Ward’s method, and the Markov stopping moment for optimal clustering. The preferred number of clusters is determined based on the “e-2” hypothesis. We set up an experiment on two datasets of real-world labeled data: News20 and BBC. The proposed model is tested against more traditional text representation methods, like bag-of-words and word2vec, to show that it provides a much better-resulting quality than the baseline DBSCAN and OPTICS models with different encoding methods. We use three quality metrics to demonstrate that clustering quality does not drop when the number of clusters grows. Thus, we get close to the convergence of text clustering and text classification." @default.
- W3080232144 created "2020-09-01" @default.
- W3080232144 creator A5018171102 @default.
- W3080232144 creator A5020072191 @default.
- W3080232144 creator A5048335742 @default.
- W3080232144 creator A5067606425 @default.
- W3080232144 creator A5079672944 @default.
- W3080232144 date "2020-08-26" @default.
- W3080232144 modified "2023-09-27" @default.
- W3080232144 title "Topic Detection Based on Sentence Embeddings and Agglomerative Clustering with Markov Moment" @default.
- W3080232144 cites W1987971958 @default.
- W3080232144 cites W2006610032 @default.
- W3080232144 cites W2341256577 @default.
- W3080232144 cites W2493916176 @default.
- W3080232144 cites W2801797627 @default.
- W3080232144 cites W2808079449 @default.
- W3080232144 cites W2911642618 @default.
- W3080232144 cites W2936143904 @default.
- W3080232144 cites W2967857541 @default.
- W3080232144 cites W4250191879 @default.
- W3080232144 cites W4254912674 @default.
- W3080232144 doi "https://doi.org/10.3390/fi12090144" @default.
- W3080232144 hasPublicationYear "2020" @default.
- W3080232144 type Work @default.
- W3080232144 sameAs 3080232144 @default.
- W3080232144 citedByCount "14" @default.
- W3080232144 countsByYear W30802321442020 @default.
- W3080232144 countsByYear W30802321442021 @default.
- W3080232144 countsByYear W30802321442022 @default.
- W3080232144 countsByYear W30802321442023 @default.
- W3080232144 crossrefType "journal-article" @default.
- W3080232144 hasAuthorship W3080232144A5018171102 @default.
- W3080232144 hasAuthorship W3080232144A5020072191 @default.
- W3080232144 hasAuthorship W3080232144A5048335742 @default.
- W3080232144 hasAuthorship W3080232144A5067606425 @default.
- W3080232144 hasAuthorship W3080232144A5079672944 @default.
- W3080232144 hasBestOaLocation W30802321441 @default.
- W3080232144 hasConcept C124101348 @default.
- W3080232144 hasConcept C153180895 @default.
- W3080232144 hasConcept C154945302 @default.
- W3080232144 hasConcept C177264268 @default.
- W3080232144 hasConcept C177937566 @default.
- W3080232144 hasConcept C199360897 @default.
- W3080232144 hasConcept C204321447 @default.
- W3080232144 hasConcept C2524010 @default.
- W3080232144 hasConcept C2776461190 @default.
- W3080232144 hasConcept C2777530160 @default.
- W3080232144 hasConcept C33923547 @default.
- W3080232144 hasConcept C41008148 @default.
- W3080232144 hasConcept C41608201 @default.
- W3080232144 hasConcept C73555534 @default.
- W3080232144 hasConcept C90805587 @default.
- W3080232144 hasConcept C92835128 @default.
- W3080232144 hasConceptScore W3080232144C124101348 @default.
- W3080232144 hasConceptScore W3080232144C153180895 @default.
- W3080232144 hasConceptScore W3080232144C154945302 @default.
- W3080232144 hasConceptScore W3080232144C177264268 @default.
- W3080232144 hasConceptScore W3080232144C177937566 @default.
- W3080232144 hasConceptScore W3080232144C199360897 @default.
- W3080232144 hasConceptScore W3080232144C204321447 @default.
- W3080232144 hasConceptScore W3080232144C2524010 @default.
- W3080232144 hasConceptScore W3080232144C2776461190 @default.
- W3080232144 hasConceptScore W3080232144C2777530160 @default.
- W3080232144 hasConceptScore W3080232144C33923547 @default.
- W3080232144 hasConceptScore W3080232144C41008148 @default.
- W3080232144 hasConceptScore W3080232144C41608201 @default.
- W3080232144 hasConceptScore W3080232144C73555534 @default.
- W3080232144 hasConceptScore W3080232144C90805587 @default.
- W3080232144 hasConceptScore W3080232144C92835128 @default.
- W3080232144 hasFunder F4320311239 @default.
- W3080232144 hasIssue "9" @default.
- W3080232144 hasLocation W30802321441 @default.
- W3080232144 hasLocation W30802321442 @default.
- W3080232144 hasOpenAccess W3080232144 @default.
- W3080232144 hasPrimaryLocation W30802321441 @default.
- W3080232144 hasRelatedWork W1558697903 @default.
- W3080232144 hasRelatedWork W2184440854 @default.
- W3080232144 hasRelatedWork W2282393731 @default.
- W3080232144 hasRelatedWork W2408387521 @default.
- W3080232144 hasRelatedWork W2985498865 @default.
- W3080232144 hasRelatedWork W2995505879 @default.
- W3080232144 hasRelatedWork W3107535086 @default.
- W3080232144 hasRelatedWork W3217529043 @default.
- W3080232144 hasRelatedWork W4226410418 @default.
- W3080232144 hasRelatedWork W4321146132 @default.
- W3080232144 hasVolume "12" @default.
- W3080232144 isParatext "false" @default.
- W3080232144 isRetracted "false" @default.
- W3080232144 magId "3080232144" @default.
- W3080232144 workType "article" @default.