Matches in SemOpenAlex for { <https://semopenalex.org/work/W4221140362> ?p ?o ?g. }
Showing items 1 to 82 of 82, with 100 items per page.
- W4221140362 abstract "Social media data such as Twitter messages (tweets) pose a particular challenge to NLP systems because of their short, noisy, and colloquial nature. Tasks such as Named Entity Recognition (NER) and syntactic parsing require highly domain-matched training data for good performance. To date, there is no complete training corpus for both NER and syntactic analysis (e.g., part of speech tagging, dependency parsing) of tweets. While there are some publicly available annotated NLP datasets of tweets, they are only designed for individual tasks. In this study, we aim to create Tweebank-NER, an English NER corpus based on Tweebank V2 (TB2), train state-of-the-art (SOTA) Tweet NLP models on TB2, and release an NLP pipeline called Twitter-Stanza. We annotate named entities in TB2 using Amazon Mechanical Turk and measure the quality of our annotations. We train the Stanza pipeline on TB2 and compare with alternative NLP frameworks (e.g., FLAIR, spaCy) and transformer-based models. The Stanza tokenizer and lemmatizer achieve SOTA performance on TB2, while the Stanza NER tagger, part-of-speech (POS) tagger, and dependency parser achieve competitive performance against non-transformer models. The transformer-based models establish a strong baseline in Tweebank-NER and achieve the new SOTA performance in POS tagging and dependency parsing on TB2. We release the dataset and make both the Stanza pipeline and BERTweet-based models available off-the-shelf for use in future Tweet NLP research. Our source code, data, and pre-trained models are available at: https://github.com/social-machines/TweebankNLP." @default.
- W4221140362 created "2022-04-03" @default.
- W4221140362 creator A5004281470 @default.
- W4221140362 creator A5018152054 @default.
- W4221140362 creator A5064910304 @default.
- W4221140362 creator A5081953757 @default.
- W4221140362 date "2022-01-18" @default.
- W4221140362 modified "2023-10-18" @default.
- W4221140362 title "Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis" @default.
- W4221140362 doi "https://doi.org/10.48550/arxiv.2201.07281" @default.
- W4221140362 hasPublicationYear "2022" @default.
- W4221140362 type Work @default.
- W4221140362 citedByCount "1" @default.
- W4221140362 countsByYear W42211403622022 @default.
- W4221140362 crossrefType "posted-content" @default.
- W4221140362 hasAuthorship W4221140362A5004281470 @default.
- W4221140362 hasAuthorship W4221140362A5018152054 @default.
- W4221140362 hasAuthorship W4221140362A5064910304 @default.
- W4221140362 hasAuthorship W4221140362A5081953757 @default.
- W4221140362 hasBestOaLocation W42211403621 @default.
- W4221140362 hasConcept C121332964 @default.
- W4221140362 hasConcept C136764020 @default.
- W4221140362 hasConcept C138885662 @default.
- W4221140362 hasConcept C154945302 @default.
- W4221140362 hasConcept C162324750 @default.
- W4221140362 hasConcept C164883195 @default.
- W4221140362 hasConcept C164913051 @default.
- W4221140362 hasConcept C165801399 @default.
- W4221140362 hasConcept C186644900 @default.
- W4221140362 hasConcept C187736073 @default.
- W4221140362 hasConcept C19768560 @default.
- W4221140362 hasConcept C199360897 @default.
- W4221140362 hasConcept C204321447 @default.
- W4221140362 hasConcept C23123220 @default.
- W4221140362 hasConcept C2776751804 @default.
- W4221140362 hasConcept C2779135771 @default.
- W4221140362 hasConcept C2780451532 @default.
- W4221140362 hasConcept C41008148 @default.
- W4221140362 hasConcept C41895202 @default.
- W4221140362 hasConcept C43521106 @default.
- W4221140362 hasConcept C518677369 @default.
- W4221140362 hasConcept C62520636 @default.
- W4221140362 hasConcept C66322947 @default.
- W4221140362 hasConceptScore W4221140362C121332964 @default.
- W4221140362 hasConceptScore W4221140362C136764020 @default.
- W4221140362 hasConceptScore W4221140362C138885662 @default.
- W4221140362 hasConceptScore W4221140362C154945302 @default.
- W4221140362 hasConceptScore W4221140362C162324750 @default.
- W4221140362 hasConceptScore W4221140362C164883195 @default.
- W4221140362 hasConceptScore W4221140362C164913051 @default.
- W4221140362 hasConceptScore W4221140362C165801399 @default.
- W4221140362 hasConceptScore W4221140362C186644900 @default.
- W4221140362 hasConceptScore W4221140362C187736073 @default.
- W4221140362 hasConceptScore W4221140362C19768560 @default.
- W4221140362 hasConceptScore W4221140362C199360897 @default.
- W4221140362 hasConceptScore W4221140362C204321447 @default.
- W4221140362 hasConceptScore W4221140362C23123220 @default.
- W4221140362 hasConceptScore W4221140362C2776751804 @default.
- W4221140362 hasConceptScore W4221140362C2779135771 @default.
- W4221140362 hasConceptScore W4221140362C2780451532 @default.
- W4221140362 hasConceptScore W4221140362C41008148 @default.
- W4221140362 hasConceptScore W4221140362C41895202 @default.
- W4221140362 hasConceptScore W4221140362C43521106 @default.
- W4221140362 hasConceptScore W4221140362C518677369 @default.
- W4221140362 hasConceptScore W4221140362C62520636 @default.
- W4221140362 hasConceptScore W4221140362C66322947 @default.
- W4221140362 hasLocation W42211403621 @default.
- W4221140362 hasOpenAccess W4221140362 @default.
- W4221140362 hasPrimaryLocation W42211403621 @default.
- W4221140362 hasRelatedWork W1592893681 @default.
- W4221140362 hasRelatedWork W1847370584 @default.
- W4221140362 hasRelatedWork W2020540721 @default.
- W4221140362 hasRelatedWork W2294376144 @default.
- W4221140362 hasRelatedWork W2511797247 @default.
- W4221140362 hasRelatedWork W2741097343 @default.
- W4221140362 hasRelatedWork W3035375600 @default.
- W4221140362 hasRelatedWork W3104453603 @default.
- W4221140362 hasRelatedWork W44286443 @default.
- W4221140362 hasRelatedWork W3024604895 @default.
- W4221140362 isParatext "false" @default.
- W4221140362 isRetracted "false" @default.
- W4221140362 workType "article" @default.