Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287705063> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4287705063 abstract "Processing of raw text is the crucial first step in text classification and sentiment analysis. However, text processing steps are often performed using off-the-shelf routines and pre-built word dictionaries without optimizing for domain, application, and context. This paper investigates the effect of seven text processing scenarios on a particular text domain (Twitter) and application (sentiment classification). Skip gram-based word embeddings are developed to include Twitter colloquial words, emojis, and hashtag keywords that are often removed for being unavailable in conventional literature corpora. Our experiments reveal negative effects on sentiment classification of two common text processing steps: 1) stop word removal and 2) averaging of word vectors to represent individual tweets. New effective steps for 1) including non-ASCII emoji characters, 2) measuring word importance from word embedding, 3) aggregating word vectors into a tweet embedding, and 4) developing linearly separable feature space have been proposed to optimize the sentiment classification pipeline. The best combination of text processing steps yields the highest average area under the curve (AUC) of 88.4 (+/-0.4) in classifying 14,640 tweets with three sentiment labels. Word selection from context-driven word embedding reveals that only the ten most important words in Tweets cumulatively yield over 98% of the maximum accuracy. Results demonstrate a means for data-driven selection of important words in tweet classification as opposed to using pre-built word dictionaries. The proposed tweet embedding is robust to and alleviates the need for several text processing steps." @default.
- W4287705063 created "2022-07-26" @default.
- W4287705063 creator A5000804818 @default.
- W4287705063 creator A5005819975 @default.
- W4287705063 creator A5009713952 @default.
- W4287705063 date "2020-07-25" @default.
- W4287705063 modified "2023-09-27" @default.
- W4287705063 title "Effect of Text Processing Steps on Twitter Sentiment Classification using Word Embedding" @default.
- W4287705063 doi "https://doi.org/10.48550/arxiv.2007.13027" @default.
- W4287705063 hasPublicationYear "2020" @default.
- W4287705063 type Work @default.
- W4287705063 citedByCount "0" @default.
- W4287705063 crossrefType "posted-content" @default.
- W4287705063 hasAuthorship W4287705063A5000804818 @default.
- W4287705063 hasAuthorship W4287705063A5005819975 @default.
- W4287705063 hasAuthorship W4287705063A5009713952 @default.
- W4287705063 hasBestOaLocation W42877050631 @default.
- W4287705063 hasConcept C123406163 @default.
- W4287705063 hasConcept C151730666 @default.
- W4287705063 hasConcept C154945302 @default.
- W4287705063 hasConcept C199360897 @default.
- W4287705063 hasConcept C204321447 @default.
- W4287705063 hasConcept C2524010 @default.
- W4287705063 hasConcept C2776461190 @default.
- W4287705063 hasConcept C2777462759 @default.
- W4287705063 hasConcept C2779343474 @default.
- W4287705063 hasConcept C2779500292 @default.
- W4287705063 hasConcept C28490314 @default.
- W4287705063 hasConcept C2983335612 @default.
- W4287705063 hasConcept C33923547 @default.
- W4287705063 hasConcept C41008148 @default.
- W4287705063 hasConcept C41608201 @default.
- W4287705063 hasConcept C43521106 @default.
- W4287705063 hasConcept C66402592 @default.
- W4287705063 hasConcept C81917197 @default.
- W4287705063 hasConcept C86803240 @default.
- W4287705063 hasConcept C90805587 @default.
- W4287705063 hasConceptScore W4287705063C123406163 @default.
- W4287705063 hasConceptScore W4287705063C151730666 @default.
- W4287705063 hasConceptScore W4287705063C154945302 @default.
- W4287705063 hasConceptScore W4287705063C199360897 @default.
- W4287705063 hasConceptScore W4287705063C204321447 @default.
- W4287705063 hasConceptScore W4287705063C2524010 @default.
- W4287705063 hasConceptScore W4287705063C2776461190 @default.
- W4287705063 hasConceptScore W4287705063C2777462759 @default.
- W4287705063 hasConceptScore W4287705063C2779343474 @default.
- W4287705063 hasConceptScore W4287705063C2779500292 @default.
- W4287705063 hasConceptScore W4287705063C28490314 @default.
- W4287705063 hasConceptScore W4287705063C2983335612 @default.
- W4287705063 hasConceptScore W4287705063C33923547 @default.
- W4287705063 hasConceptScore W4287705063C41008148 @default.
- W4287705063 hasConceptScore W4287705063C41608201 @default.
- W4287705063 hasConceptScore W4287705063C43521106 @default.
- W4287705063 hasConceptScore W4287705063C66402592 @default.
- W4287705063 hasConceptScore W4287705063C81917197 @default.
- W4287705063 hasConceptScore W4287705063C86803240 @default.
- W4287705063 hasConceptScore W4287705063C90805587 @default.
- W4287705063 hasLocation W42877050631 @default.
- W4287705063 hasOpenAccess W4287705063 @default.
- W4287705063 hasPrimaryLocation W42877050631 @default.
- W4287705063 hasRelatedWork W2335882425 @default.
- W4287705063 hasRelatedWork W2760392765 @default.
- W4287705063 hasRelatedWork W2896498353 @default.
- W4287705063 hasRelatedWork W2898264138 @default.
- W4287705063 hasRelatedWork W2952874106 @default.
- W4287705063 hasRelatedWork W3036348210 @default.
- W4287705063 hasRelatedWork W3044335045 @default.
- W4287705063 hasRelatedWork W3046869600 @default.
- W4287705063 hasRelatedWork W4210823838 @default.
- W4287705063 hasRelatedWork W4287705063 @default.
- W4287705063 isParatext "false" @default.
- W4287705063 isRetracted "false" @default.
- W4287705063 workType "article" @default.