Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385964553> ?p ?o ?g. }
Showing items 1 to 55 of 55 with 100 items per page.
- W4385964553 abstract "Lemmatization is a Natural Language Processing (NLP) technique used to normalize text by changing morphological derivations of words to their root forms. It is used as a core pre-processing step in many NLP tasks including text indexing, information retrieval, and machine learning for NLP, among others. This paper pioneers the development of text lemmatization for the Somali language, a low-resource language with very limited or no prior effective adoption of NLP methods and datasets. We especially develop a lexicon and rule-based lemmatizer for Somali text, which is a starting point for a full-fledged Somali lemmatization system for various NLP tasks. With consideration of the language morphological rules, we have developed an initial lexicon of 1247 root words and 7173 derivationally related terms enriched with rules for lemmatizing words not present in the lexicon. We have tested the algorithm on 120 documents of various lengths including news articles, social media posts, and text messages. Our initial results demonstrate that the algorithm achieves an accuracy of 57% for relatively long documents (e.g. full news articles), 60.57% for news article extracts, and high accuracy of 95.87% for short texts such as social media messages." @default.
- W4385964553 created "2023-08-18" @default.
- W4385964553 creator A5004590250 @default.
- W4385964553 creator A5076232395 @default.
- W4385964553 date "2023-08-03" @default.
- W4385964553 modified "2023-10-01" @default.
- W4385964553 title "Lexicon and Rule-based Word Lemmatization Approach for the Somali Language" @default.
- W4385964553 doi "https://doi.org/10.48550/arxiv.2308.01785" @default.
- W4385964553 hasPublicationYear "2023" @default.
- W4385964553 type Work @default.
- W4385964553 citedByCount "0" @default.
- W4385964553 crossrefType "posted-content" @default.
- W4385964553 hasAuthorship W4385964553A5004590250 @default.
- W4385964553 hasAuthorship W4385964553A5076232395 @default.
- W4385964553 hasBestOaLocation W43859645531 @default.
- W4385964553 hasConcept C117884012 @default.
- W4385964553 hasConcept C137293760 @default.
- W4385964553 hasConcept C138885662 @default.
- W4385964553 hasConcept C154945302 @default.
- W4385964553 hasConcept C161831844 @default.
- W4385964553 hasConcept C171078966 @default.
- W4385964553 hasConcept C204321447 @default.
- W4385964553 hasConcept C2776831955 @default.
- W4385964553 hasConcept C2778121359 @default.
- W4385964553 hasConcept C41008148 @default.
- W4385964553 hasConcept C41895202 @default.
- W4385964553 hasConcept C66402592 @default.
- W4385964553 hasConceptScore W4385964553C117884012 @default.
- W4385964553 hasConceptScore W4385964553C137293760 @default.
- W4385964553 hasConceptScore W4385964553C138885662 @default.
- W4385964553 hasConceptScore W4385964553C154945302 @default.
- W4385964553 hasConceptScore W4385964553C161831844 @default.
- W4385964553 hasConceptScore W4385964553C171078966 @default.
- W4385964553 hasConceptScore W4385964553C204321447 @default.
- W4385964553 hasConceptScore W4385964553C2776831955 @default.
- W4385964553 hasConceptScore W4385964553C2778121359 @default.
- W4385964553 hasConceptScore W4385964553C41008148 @default.
- W4385964553 hasConceptScore W4385964553C41895202 @default.
- W4385964553 hasConceptScore W4385964553C66402592 @default.
- W4385964553 hasLocation W43859645531 @default.
- W4385964553 hasOpenAccess W4385964553 @default.
- W4385964553 hasPrimaryLocation W43859645531 @default.
- W4385964553 hasRelatedWork W1484312846 @default.
- W4385964553 hasRelatedWork W1994972134 @default.
- W4385964553 hasRelatedWork W2107817331 @default.
- W4385964553 hasRelatedWork W2344644918 @default.
- W4385964553 hasRelatedWork W2467206427 @default.
- W4385964553 hasRelatedWork W2965885965 @default.
- W4385964553 hasRelatedWork W3011677438 @default.
- W4385964553 hasRelatedWork W3153487575 @default.
- W4385964553 hasRelatedWork W3209080089 @default.
- W4385964553 hasRelatedWork W4226173368 @default.
- W4385964553 isParatext "false" @default.
- W4385964553 isRetracted "false" @default.
- W4385964553 workType "article" @default.
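
The statements above follow a regular one-per-line layout, so they can be read back into (subject, predicate, object, graph) tuples. The following is a minimal sketch, assuming exactly the layout shown in this listing: a leading dash, a subject, a predicate, an object that is either a double-quoted literal or a bare identifier, and a trailing `@default.` graph marker. The names `parse_line` and `group_by_predicate` are hypothetical helpers introduced for illustration; they are not part of any SemOpenAlex tooling.

```python
import re
from collections import defaultdict

# One listing line: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal or a bare identifier.
# This pattern is an assumption based on the layout of this listing only.
LINE_RE = re.compile(
    r'^-\s+(?P<s>\S+)\s+(?P<p>\S+)\s+'
    r'(?:"(?P<lit>.*)"|(?P<ref>\S+))'
    r'\s+@(?P<g>\w+)\.$'
)


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Prefer the quoted literal when present (it may be an empty string).
    obj = m.group("lit") if m.group("lit") is not None else m.group("ref")
    return (m.group("s"), m.group("p"), obj, m.group("g"))


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that do not parse."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


sample = [
    '- W4385964553 created "2023-08-18" @default.',
    '- W4385964553 creator A5004590250 @default.',
    '- W4385964553 creator A5076232395 @default.',
    '- W4385964553 type Work @default.',
]
grouped = group_by_predicate(sample)
```

Grouping by predicate keeps repeated properties such as `creator` or `hasConcept` together as lists, which matches how the listing repeats one line per value.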