Matches in SemOpenAlex for { <https://semopenalex.org/work/W3191380211> ?p ?o ?g. }
- W3191380211 endingPage "648" @default.
- W3191380211 startingPage "607" @default.
- W3191380211 abstract "The wide usage of multiple spoken Arabic dialects on social networking sites stimulates increasing interest in Natural Language Processing (NLP) for dialectal Arabic (DA). Arabic dialects represent true linguistic diversity and differ from Modern Standard Arabic (MSA). In fact, the complexity and variety of these dialects make it insufficient to build one NLP system that is suitable for all of them. In comparison with MSA, the available datasets for various dialects are generally limited in terms of size, genre and scope. In this article, we present a novel approach that automatically develops an annotated country-level dialectal Arabic corpus and builds lists of words that encompass 15 Arabic dialects. The algorithm uses an iterative procedure consisting of two main components: automatic creation of lists of dialectal words and automatic creation of an annotated Arabic dialect identification corpus. To our knowledge, our study is the first of its kind to examine and analyse the poor performance of the MSA part-of-speech tagger on dialectal Arabic content and to exploit that in order to extract the dialectal words. The pointwise mutual information association measure and the geographical frequency of word occurrence online are used to classify dialectal words. The annotated dialectal Arabic corpus (Twt15DA), built using our algorithm, is collected from Twitter and consists of 311,785 tweets containing 3,858,459 words in total. We randomly selected a sample of 75 tweets per country, 1125 tweets in total, and had native speakers perform a manual dialect identification task. The results show an average inter-annotator agreement score equal to 64%, which reflects satisfactory agreement considering the overlapping features of the 15 Arabic dialects." @default.
- W3191380211 created "2021-08-16" @default.
- W3191380211 creator A5049157081 @default.
- W3191380211 date "2021-08-09" @default.
- W3191380211 modified "2023-09-25" @default.
- W3191380211 title "Creation of annotated country-level dialectal Arabic resources: An unsupervised approach" @default.
- W3191380211 cites W105270443 @default.
- W3191380211 cites W1604955787 @default.
- W3191380211 cites W1934455055 @default.
- W3191380211 cites W1975755876 @default.
- W3191380211 cites W1977195462 @default.
- W3191380211 cites W1978174332 @default.
- W3191380211 cites W1993833738 @default.
- W3191380211 cites W1996430422 @default.
- W3191380211 cites W2003458432 @default.
- W3191380211 cites W200941853 @default.
- W3191380211 cites W201141796 @default.
- W3191380211 cites W2022414723 @default.
- W3191380211 cites W2043693083 @default.
- W3191380211 cites W2066682493 @default.
- W3191380211 cites W2103195385 @default.
- W3191380211 cites W2104463314 @default.
- W3191380211 cites W2147272182 @default.
- W3191380211 cites W2160802179 @default.
- W3191380211 cites W2164777277 @default.
- W3191380211 cites W2189831162 @default.
- W3191380211 cites W2250224884 @default.
- W3191380211 cites W2250358209 @default.
- W3191380211 cites W2251190179 @default.
- W3191380211 cites W2251259636 @default.
- W3191380211 cites W2251867433 @default.
- W3191380211 cites W2252067490 @default.
- W3191380211 cites W2293965775 @default.
- W3191380211 cites W2492815064 @default.
- W3191380211 cites W2560280095 @default.
- W3191380211 cites W2614862557 @default.
- W3191380211 cites W2729871481 @default.
- W3191380211 cites W2740432722 @default.
- W3191380211 cites W2767566483 @default.
- W3191380211 cites W2776928811 @default.
- W3191380211 cites W2884753007 @default.
- W3191380211 cites W289085066 @default.
- W3191380211 cites W2937743229 @default.
- W3191380211 cites W2970513828 @default.
- W3191380211 cites W4237289832 @default.
- W3191380211 cites W4240298715 @default.
- W3191380211 cites W785060174 @default.
- W3191380211 doi "https://doi.org/10.1017/s135132492100019x" @default.
- W3191380211 hasPublicationYear "2021" @default.
- W3191380211 type Work @default.
- W3191380211 sameAs 3191380211 @default.
- W3191380211 citedByCount "2" @default.
- W3191380211 countsByYear W31913802112022 @default.
- W3191380211 crossrefType "journal-article" @default.
- W3191380211 hasAuthorship W3191380211A5049157081 @default.
- W3191380211 hasConcept C116834253 @default.
- W3191380211 hasConcept C136197465 @default.
- W3191380211 hasConcept C138885662 @default.
- W3191380211 hasConcept C152139883 @default.
- W3191380211 hasConcept C154945302 @default.
- W3191380211 hasConcept C199360897 @default.
- W3191380211 hasConcept C204321447 @default.
- W3191380211 hasConcept C2776321320 @default.
- W3191380211 hasConcept C2778012447 @default.
- W3191380211 hasConcept C2778243841 @default.
- W3191380211 hasConcept C41008148 @default.
- W3191380211 hasConcept C41895202 @default.
- W3191380211 hasConcept C59822182 @default.
- W3191380211 hasConcept C7797323 @default.
- W3191380211 hasConcept C86803240 @default.
- W3191380211 hasConcept C96455323 @default.
- W3191380211 hasConceptScore W3191380211C116834253 @default.
- W3191380211 hasConceptScore W3191380211C136197465 @default.
- W3191380211 hasConceptScore W3191380211C138885662 @default.
- W3191380211 hasConceptScore W3191380211C152139883 @default.
- W3191380211 hasConceptScore W3191380211C154945302 @default.
- W3191380211 hasConceptScore W3191380211C199360897 @default.
- W3191380211 hasConceptScore W3191380211C204321447 @default.
- W3191380211 hasConceptScore W3191380211C2776321320 @default.
- W3191380211 hasConceptScore W3191380211C2778012447 @default.
- W3191380211 hasConceptScore W3191380211C2778243841 @default.
- W3191380211 hasConceptScore W3191380211C41008148 @default.
- W3191380211 hasConceptScore W3191380211C41895202 @default.
- W3191380211 hasConceptScore W3191380211C59822182 @default.
- W3191380211 hasConceptScore W3191380211C7797323 @default.
- W3191380211 hasConceptScore W3191380211C86803240 @default.
- W3191380211 hasConceptScore W3191380211C96455323 @default.
- W3191380211 hasIssue "5" @default.
- W3191380211 hasLocation W31913802111 @default.
- W3191380211 hasOpenAccess W3191380211 @default.
- W3191380211 hasPrimaryLocation W31913802111 @default.
- W3191380211 hasRelatedWork W2024903555 @default.
- W3191380211 hasRelatedWork W2043693083 @default.
- W3191380211 hasRelatedWork W2109704865 @default.
- W3191380211 hasRelatedWork W2137397532 @default.
- W3191380211 hasRelatedWork W2250358209 @default.
- W3191380211 hasRelatedWork W2571755499 @default.
- W3191380211 hasRelatedWork W3016716103 @default.