Matches in SemOpenAlex for { <https://semopenalex.org/work/W4200551986> ?p ?o ?g. }
Showing items 1 to 72 of 72 with 100 items per page.
- W4200551986 abstract "Text clustering is the task of grouping a set of texts so that texts in the same group are more similar to each other than to those in other groups. Grouping text manually requires a significant amount of time and labor, so automation using machine learning is necessary. The standard method for representing textual data is Term Frequency-Inverse Document Frequency (TFIDF). However, TFIDF cannot take into account the position and context of a word in a sentence. The Bidirectional Encoder Representations from Transformers (BERT) model can produce text representations that incorporate the position and context of a word in a sentence. This research analyzes the performance of the BERT model as a data representation for text. Moreover, various feature extraction and normalization methods are applied to the BERT representation. To examine the performance of BERT, we use four clustering algorithms: k-means clustering, eigenspace-based fuzzy c-means, deep embedded clustering, and improved deep embedded clustering. Our simulations show that BERT outperforms the standard TFIDF method in 28 out of 36 metrics. Furthermore, different feature extraction and normalization methods produced varied performance. The choice of feature extraction and normalization method must be adapted to the text clustering algorithm used." @default.
- W4200551986 created "2021-12-31" @default.
- W4200551986 creator A5039989474 @default.
- W4200551986 creator A5041208201 @default.
- W4200551986 creator A5052210284 @default.
- W4200551986 date "2021-09-30" @default.
- W4200551986 modified "2023-09-27" @default.
- W4200551986 title "The Performance of BERT as Data Representation of Text Clustering" @default.
- W4200551986 doi "https://doi.org/10.21203/rs.3.rs-940164/v1" @default.
- W4200551986 hasPublicationYear "2021" @default.
- W4200551986 type Work @default.
- W4200551986 citedByCount "0" @default.
- W4200551986 crossrefType "posted-content" @default.
- W4200551986 hasAuthorship W4200551986A5039989474 @default.
- W4200551986 hasAuthorship W4200551986A5041208201 @default.
- W4200551986 hasAuthorship W4200551986A5052210284 @default.
- W4200551986 hasBestOaLocation W42005519861 @default.
- W4200551986 hasConcept C111919701 @default.
- W4200551986 hasConcept C121332964 @default.
- W4200551986 hasConcept C124101348 @default.
- W4200551986 hasConcept C136886441 @default.
- W4200551986 hasConcept C144024400 @default.
- W4200551986 hasConcept C153180895 @default.
- W4200551986 hasConcept C154945302 @default.
- W4200551986 hasConcept C17212007 @default.
- W4200551986 hasConcept C177937566 @default.
- W4200551986 hasConcept C180505990 @default.
- W4200551986 hasConcept C19165224 @default.
- W4200551986 hasConcept C204321447 @default.
- W4200551986 hasConcept C2777530160 @default.
- W4200551986 hasConcept C2780288562 @default.
- W4200551986 hasConcept C41008148 @default.
- W4200551986 hasConcept C61797465 @default.
- W4200551986 hasConcept C62520636 @default.
- W4200551986 hasConcept C73555534 @default.
- W4200551986 hasConcept C81758059 @default.
- W4200551986 hasConceptScore W4200551986C111919701 @default.
- W4200551986 hasConceptScore W4200551986C121332964 @default.
- W4200551986 hasConceptScore W4200551986C124101348 @default.
- W4200551986 hasConceptScore W4200551986C136886441 @default.
- W4200551986 hasConceptScore W4200551986C144024400 @default.
- W4200551986 hasConceptScore W4200551986C153180895 @default.
- W4200551986 hasConceptScore W4200551986C154945302 @default.
- W4200551986 hasConceptScore W4200551986C17212007 @default.
- W4200551986 hasConceptScore W4200551986C177937566 @default.
- W4200551986 hasConceptScore W4200551986C180505990 @default.
- W4200551986 hasConceptScore W4200551986C19165224 @default.
- W4200551986 hasConceptScore W4200551986C204321447 @default.
- W4200551986 hasConceptScore W4200551986C2777530160 @default.
- W4200551986 hasConceptScore W4200551986C2780288562 @default.
- W4200551986 hasConceptScore W4200551986C41008148 @default.
- W4200551986 hasConceptScore W4200551986C61797465 @default.
- W4200551986 hasConceptScore W4200551986C62520636 @default.
- W4200551986 hasConceptScore W4200551986C73555534 @default.
- W4200551986 hasConceptScore W4200551986C81758059 @default.
- W4200551986 hasLocation W42005519861 @default.
- W4200551986 hasLocation W42005519862 @default.
- W4200551986 hasOpenAccess W4200551986 @default.
- W4200551986 hasPrimaryLocation W42005519861 @default.
- W4200551986 hasRelatedWork W159132833 @default.
- W4200551986 hasRelatedWork W1700031991 @default.
- W4200551986 hasRelatedWork W2094936398 @default.
- W4200551986 hasRelatedWork W2184440854 @default.
- W4200551986 hasRelatedWork W2240708491 @default.
- W4200551986 hasRelatedWork W2928550689 @default.
- W4200551986 hasRelatedWork W2941132005 @default.
- W4200551986 hasRelatedWork W3183300589 @default.
- W4200551986 hasRelatedWork W3191458944 @default.
- W4200551986 hasRelatedWork W4366123738 @default.
- W4200551986 isParatext "false" @default.
- W4200551986 isRetracted "false" @default.
- W4200551986 workType "article" @default.