Matches in SemOpenAlex for { <https://semopenalex.org/work/W4375869806> ?p ?o ?g. }
Showing items 1 to 55 of 55 with 100 items per page.
- W4375869806 abstract "With the advent of e-commerce platforms, reviews are crucial for customers to assess the credibility of a product. The star ratings do not always match the review text written by the customer. For example, a three star rating (out of five) may be incongruous with the review text, which may be more suitable for a five star review. A clustering approach can be used to relabel the correct star ratings by grouping the text reviews into individual groups. In this work, we explore the task of choosing different text embeddings to represent these reviews and also explore the impact the embedding choice has on the performance of various classes of clustering algorithms. We use contextual (BERT) and non-contextual (Word2Vec) text embeddings to represent the text and measure their impact on three classes of clustering algorithms - partitioning based (KMeans), single linkage agglomerative hierarchical, and density based (DBSCAN and HDBSCAN), each with various experimental settings. We use the silhouette score, adjusted Rand index score, and cluster purity score metrics to evaluate the performance of the algorithms and discuss the impact of different embeddings on the clustering performance. Our results indicate that the type of embedding chosen drastically affects the performance of the algorithm, the performance varies greatly across different types of clustering algorithms, no embedding type is better than the other, and DBSCAN outperforms KMeans and single linkage agglomerative clustering but also labels more data points as outliers. We provide a thorough comparison of the performances of different algorithms and provide numerous ideas to foster further research in the domain of text clustering." @default.
- W4375869806 created "2023-05-10" @default.
- W4375869806 creator A5054410152 @default.
- W4375869806 date "2023-05-04" @default.
- W4375869806 modified "2023-09-24" @default.
- W4375869806 title "Influence of various text embeddings on clustering performance in NLP" @default.
- W4375869806 doi "https://doi.org/10.48550/arxiv.2305.03144" @default.
- W4375869806 hasPublicationYear "2023" @default.
- W4375869806 type Work @default.
- W4375869806 citedByCount "0" @default.
- W4375869806 crossrefType "posted-content" @default.
- W4375869806 hasAuthorship W4375869806A5054410152 @default.
- W4375869806 hasBestOaLocation W43758698061 @default.
- W4375869806 hasConcept C119857082 @default.
- W4375869806 hasConcept C124101348 @default.
- W4375869806 hasConcept C154945302 @default.
- W4375869806 hasConcept C17212007 @default.
- W4375869806 hasConcept C204321447 @default.
- W4375869806 hasConcept C23123220 @default.
- W4375869806 hasConcept C2776461190 @default.
- W4375869806 hasConcept C33704608 @default.
- W4375869806 hasConcept C41008148 @default.
- W4375869806 hasConcept C41608201 @default.
- W4375869806 hasConcept C46576248 @default.
- W4375869806 hasConcept C73555534 @default.
- W4375869806 hasConcept C92835128 @default.
- W4375869806 hasConceptScore W4375869806C119857082 @default.
- W4375869806 hasConceptScore W4375869806C124101348 @default.
- W4375869806 hasConceptScore W4375869806C154945302 @default.
- W4375869806 hasConceptScore W4375869806C17212007 @default.
- W4375869806 hasConceptScore W4375869806C204321447 @default.
- W4375869806 hasConceptScore W4375869806C23123220 @default.
- W4375869806 hasConceptScore W4375869806C2776461190 @default.
- W4375869806 hasConceptScore W4375869806C33704608 @default.
- W4375869806 hasConceptScore W4375869806C41008148 @default.
- W4375869806 hasConceptScore W4375869806C41608201 @default.
- W4375869806 hasConceptScore W4375869806C46576248 @default.
- W4375869806 hasConceptScore W4375869806C73555534 @default.
- W4375869806 hasConceptScore W4375869806C92835128 @default.
- W4375869806 hasLocation W43758698061 @default.
- W4375869806 hasOpenAccess W4375869806 @default.
- W4375869806 hasPrimaryLocation W43758698061 @default.
- W4375869806 hasRelatedWork W1595915502 @default.
- W4375869806 hasRelatedWork W1596996943 @default.
- W4375869806 hasRelatedWork W1978862868 @default.
- W4375869806 hasRelatedWork W2019134706 @default.
- W4375869806 hasRelatedWork W2157001754 @default.
- W4375869806 hasRelatedWork W2474073737 @default.
- W4375869806 hasRelatedWork W2574513950 @default.
- W4375869806 hasRelatedWork W2907290785 @default.
- W4375869806 hasRelatedWork W2998213281 @default.
- W4375869806 hasRelatedWork W3037830725 @default.
- W4375869806 isParatext "false" @default.
- W4375869806 isRetracted "false" @default.
- W4375869806 workType "article" @default.
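
The abstract above names cluster purity as one of its evaluation metrics. As a rough illustration only, the following sketch computes purity in the usual way (the fraction of points that share the majority true label of their cluster). The function name `cluster_purity` and the toy star ratings are hypothetical and are not taken from the work itself; the authors' actual implementation is not shown in this record.

```python
from collections import Counter


def cluster_purity(true_labels, cluster_ids):
    """Fraction of points carrying the majority true label of their cluster.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")

    # Count how often each true label occurs inside each cluster.
    by_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cluster, Counter())[label] += 1

    # For every cluster, keep only the count of its most frequent label.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in by_cluster.values()
    )
    return majority_total / len(true_labels)


# Made-up toy data: star ratings as true labels, two assumed clusters.
stars = [5, 5, 3, 3, 1, 1]
clusters = [0, 0, 0, 1, 1, 1]
purity = cluster_purity(stars, clusters)
print(f"purity = {purity:.3f}")
```

Purity rewards clusters dominated by a single label, so it rises trivially as the number of clusters grows; that is one reason it is typically reported alongside other scores such as the adjusted Rand index.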