Matches in SemOpenAlex for { <https://semopenalex.org/work/W3003386358> ?p ?o ?g. }
- W3003386358 abstract "This paper is an effort to complement the contributions made by researchers working toward the inclusion of non-English languages in natural language processing studies. Two novel Hindi language resources have been created and released for public consumption. The first resource is a corpus consisting of nearly a thousand pre-processed fictional and nonfictional texts spanning over a hundred years. The second resource is an exhaustive list of stop lemmas created from 12 corpora across multiple domains, consisting of over 13 million words, from which more than 200,000 lemmas were generated, and 11 publicly available stop word lists comprising over 1000 words, from which nearly 400 unique lemmas were generated. This research lays emphasis on the use of stop lemmas instead of stop words owing to the presence of various, but not all, morphological forms of a word in stop word lists, as opposed to the presence of only the root form of the word, from which variations could be derived if required. It was also observed that stop lemmas were more consistent across multiple sources as compared to stop words. In order to generate a stop lemma list, the parts of speech of the lemmas were investigated but rejected as it was found that there was no significant correlation between the rank of a word in the frequency list and its part of speech. The stop lemma list was assessed using a comparative method. A formal evaluation method is suggested as future work arising from this study." @default.
- W3003386358 created "2020-02-07" @default.
- W3003386358 creator A5010238856 @default.
- W3003386358 creator A5055903884 @default.
- W3003386358 creator A5068895642 @default.
- W3003386358 date "2020-02-01" @default.
- W3003386358 modified "2023-09-27" @default.
- W3003386358 title "Novel Language Resources for Hindi: An Aesthetics Text Corpus and a Comprehensive Stop Lemma List" @default.
- W3003386358 cites W1721182246 @default.
- W3003386358 cites W1814140194 @default.
- W3003386358 cites W2001348188 @default.
- W3003386358 cites W2097277954 @default.
- W3003386358 cites W2110474840 @default.
- W3003386358 cites W2121167884 @default.
- W3003386358 cites W2130428585 @default.
- W3003386358 cites W2168965629 @default.
- W3003386358 cites W2171770604 @default.
- W3003386358 cites W2472058251 @default.
- W3003386358 cites W2501092290 @default.
- W3003386358 cites W2508762758 @default.
- W3003386358 cites W2515218431 @default.
- W3003386358 cites W2536677016 @default.
- W3003386358 cites W2549869559 @default.
- W3003386358 cites W2806402810 @default.
- W3003386358 cites W2899606919 @default.
- W3003386358 hasPublicationYear "2020" @default.
- W3003386358 type Work @default.
- W3003386358 sameAs 3003386358 @default.
- W3003386358 citedByCount "1" @default.
- W3003386358 countsByYear W30033863582021 @default.
- W3003386358 crossrefType "posted-content" @default.
- W3003386358 hasAuthorship W3003386358A5010238856 @default.
- W3003386358 hasAuthorship W3003386358A5055903884 @default.
- W3003386358 hasAuthorship W3003386358A5068895642 @default.
- W3003386358 hasConcept C104317684 @default.
- W3003386358 hasConcept C112313634 @default.
- W3003386358 hasConcept C114614502 @default.
- W3003386358 hasConcept C123406163 @default.
- W3003386358 hasConcept C127716648 @default.
- W3003386358 hasConcept C138885662 @default.
- W3003386358 hasConcept C154945302 @default.
- W3003386358 hasConcept C164226766 @default.
- W3003386358 hasConcept C171078966 @default.
- W3003386358 hasConcept C185592680 @default.
- W3003386358 hasConcept C188082640 @default.
- W3003386358 hasConcept C188338183 @default.
- W3003386358 hasConcept C18903297 @default.
- W3003386358 hasConcept C204321447 @default.
- W3003386358 hasConcept C206345919 @default.
- W3003386358 hasConcept C2777759810 @default.
- W3003386358 hasConcept C31258907 @default.
- W3003386358 hasConcept C33923547 @default.
- W3003386358 hasConcept C34736171 @default.
- W3003386358 hasConcept C41008148 @default.
- W3003386358 hasConcept C41895202 @default.
- W3003386358 hasConcept C46757340 @default.
- W3003386358 hasConcept C519982507 @default.
- W3003386358 hasConcept C55493867 @default.
- W3003386358 hasConcept C86803240 @default.
- W3003386358 hasConcept C90805587 @default.
- W3003386358 hasConceptScore W3003386358C104317684 @default.
- W3003386358 hasConceptScore W3003386358C112313634 @default.
- W3003386358 hasConceptScore W3003386358C114614502 @default.
- W3003386358 hasConceptScore W3003386358C123406163 @default.
- W3003386358 hasConceptScore W3003386358C127716648 @default.
- W3003386358 hasConceptScore W3003386358C138885662 @default.
- W3003386358 hasConceptScore W3003386358C154945302 @default.
- W3003386358 hasConceptScore W3003386358C164226766 @default.
- W3003386358 hasConceptScore W3003386358C171078966 @default.
- W3003386358 hasConceptScore W3003386358C185592680 @default.
- W3003386358 hasConceptScore W3003386358C188082640 @default.
- W3003386358 hasConceptScore W3003386358C188338183 @default.
- W3003386358 hasConceptScore W3003386358C18903297 @default.
- W3003386358 hasConceptScore W3003386358C204321447 @default.
- W3003386358 hasConceptScore W3003386358C206345919 @default.
- W3003386358 hasConceptScore W3003386358C2777759810 @default.
- W3003386358 hasConceptScore W3003386358C31258907 @default.
- W3003386358 hasConceptScore W3003386358C33923547 @default.
- W3003386358 hasConceptScore W3003386358C34736171 @default.
- W3003386358 hasConceptScore W3003386358C41008148 @default.
- W3003386358 hasConceptScore W3003386358C41895202 @default.
- W3003386358 hasConceptScore W3003386358C46757340 @default.
- W3003386358 hasConceptScore W3003386358C519982507 @default.
- W3003386358 hasConceptScore W3003386358C55493867 @default.
- W3003386358 hasConceptScore W3003386358C86803240 @default.
- W3003386358 hasConceptScore W3003386358C90805587 @default.
- W3003386358 hasLocation W30033863581 @default.
- W3003386358 hasOpenAccess W3003386358 @default.
- W3003386358 hasPrimaryLocation W30033863581 @default.
- W3003386358 hasRelatedWork W120797962 @default.
- W3003386358 hasRelatedWork W1589953164 @default.
- W3003386358 hasRelatedWork W1645168958 @default.
- W3003386358 hasRelatedWork W1966201627 @default.
- W3003386358 hasRelatedWork W1977666781 @default.
- W3003386358 hasRelatedWork W2029115643 @default.
- W3003386358 hasRelatedWork W2188745269 @default.
- W3003386358 hasRelatedWork W2463925662 @default.
- W3003386358 hasRelatedWork W2478465355 @default.
- W3003386358 hasRelatedWork W2767483085 @default.
- W3003386358 hasRelatedWork W2992211973 @default.
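
The abstract above observes that stop word lists tend to contain some, but not all, morphological forms of a word, and that stop lemmas agree better across sources. The following is a minimal sketch of that idea, not the authors' pipeline: the lemma table `TOY_LEMMAS`, the two small word lists, and the helper names `to_lemmas` and `jaccard` are all hypothetical and chosen only for illustration. A real workflow would use a proper Hindi lemmatizer and the published lists.

```python
# Hypothetical toy lemma table (an assumption for illustration, not data from the paper).
# It maps a few inflected Hindi forms to a single root form.
TOY_LEMMAS = {
    "का": "का", "की": "का", "के": "का",
    "करता": "करना", "करती": "करना", "करते": "करना",
}


def to_lemmas(words, lemma_table):
    """Map each word to its lemma; words missing from the table are kept as-is."""
    return {lemma_table.get(word, word) for word in words}


def jaccard(a, b):
    """Jaccard overlap of two collections, used here as a simple consistency measure."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two made-up stop word lists that cover different inflected forms of the same words.
list_a = ["का", "की", "करता"]
list_b = ["के", "करती", "करते"]

lemmas_a = to_lemmas(list_a, TOY_LEMMAS)
lemmas_b = to_lemmas(list_b, TOY_LEMMAS)

word_overlap = jaccard(list_a, list_b)       # the surface forms do not overlap
lemma_overlap = jaccard(lemmas_a, lemmas_b)  # the lemmas coincide
merged_stop_lemmas = lemmas_a | lemmas_b     # combined stop lemma set

print(word_overlap, lemma_overlap, merged_stop_lemmas)
```

In this toy case the two lists share no surface forms yet reduce to the same two lemmas, which is the kind of cross-source consistency the abstract attributes to stop lemmas.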