Matches in SemOpenAlex for { <https://semopenalex.org/work/W2991215715> ?p ?o ?g. }
Showing items 1 to 80 of 80 with 100 items per page.
- W2991215715 abstract "Representing words and phrases as dense vectors of real numbers that encode semantic and syntactic properties is a vital constituent of natural language processing (NLP). The success of neural network (NN) models in NLP largely relies on such dense word representations learned on large unlabeled corpora. Sindhi, a morphologically rich language spoken by a large population in Pakistan and India, lacks corpora, which play an essential role as a test-bed for generating word embeddings and developing language-independent NLP systems. In this paper, a large corpus of more than 61 million words is developed for the low-resourced Sindhi language for training neural word embeddings. The corpus is acquired from multiple web resources using web-scrappy. Due to the unavailability of open-source preprocessing tools for Sindhi, the preprocessing of such a large corpus becomes a challenging problem, especially the cleaning of noisy data extracted from web resources. Therefore, a preprocessing pipeline is employed for the filtration of noisy text. Afterwards, the cleaned vocabulary is utilized for training Sindhi word embeddings with the state-of-the-art GloVe, Skip-Gram (SG), and Continuous Bag of Words (CBoW) word2vec algorithms. The intrinsic evaluation approaches of cosine similarity matrix and WordSim-353 are employed for the evaluation of the generated Sindhi word embeddings. Moreover, we compare the proposed word embeddings with the recently revealed Sindhi fastText (SdfastText) word representations. Our intrinsic evaluation results demonstrate the high quality of our generated Sindhi word embeddings using SG, CBoW, and GloVe compared to the SdfastText word representations." @default.
- W2991215715 created "2019-12-05" @default.
- W2991215715 creator A5016123311 @default.
- W2991215715 creator A5016649343 @default.
- W2991215715 creator A5051227924 @default.
- W2991215715 creator A5072850680 @default.
- W2991215715 date "2019-11-28" @default.
- W2991215715 modified "2023-10-08" @default.
- W2991215715 title "A New Corpus for Low-Resourced Sindhi Language with Word Embeddings." @default.
- W2991215715 hasPublicationYear "2019" @default.
- W2991215715 type Work @default.
- W2991215715 sameAs 2991215715 @default.
- W2991215715 citedByCount "1" @default.
- W2991215715 countsByYear W29912157152020 @default.
- W2991215715 crossrefType "posted-content" @default.
- W2991215715 hasAuthorship W2991215715A5016123311 @default.
- W2991215715 hasAuthorship W2991215715A5016649343 @default.
- W2991215715 hasAuthorship W2991215715A5051227924 @default.
- W2991215715 hasAuthorship W2991215715A5072850680 @default.
- W2991215715 hasConcept C138885662 @default.
- W2991215715 hasConcept C144024400 @default.
- W2991215715 hasConcept C149923435 @default.
- W2991215715 hasConcept C153180895 @default.
- W2991215715 hasConcept C154945302 @default.
- W2991215715 hasConcept C204321447 @default.
- W2991215715 hasConcept C2524010 @default.
- W2991215715 hasConcept C2776461190 @default.
- W2991215715 hasConcept C2777601683 @default.
- W2991215715 hasConcept C2780762811 @default.
- W2991215715 hasConcept C2908647359 @default.
- W2991215715 hasConcept C33923547 @default.
- W2991215715 hasConcept C34736171 @default.
- W2991215715 hasConcept C41008148 @default.
- W2991215715 hasConcept C41608201 @default.
- W2991215715 hasConcept C41895202 @default.
- W2991215715 hasConcept C90805587 @default.
- W2991215715 hasConceptScore W2991215715C138885662 @default.
- W2991215715 hasConceptScore W2991215715C144024400 @default.
- W2991215715 hasConceptScore W2991215715C149923435 @default.
- W2991215715 hasConceptScore W2991215715C153180895 @default.
- W2991215715 hasConceptScore W2991215715C154945302 @default.
- W2991215715 hasConceptScore W2991215715C204321447 @default.
- W2991215715 hasConceptScore W2991215715C2524010 @default.
- W2991215715 hasConceptScore W2991215715C2776461190 @default.
- W2991215715 hasConceptScore W2991215715C2777601683 @default.
- W2991215715 hasConceptScore W2991215715C2780762811 @default.
- W2991215715 hasConceptScore W2991215715C2908647359 @default.
- W2991215715 hasConceptScore W2991215715C33923547 @default.
- W2991215715 hasConceptScore W2991215715C34736171 @default.
- W2991215715 hasConceptScore W2991215715C41008148 @default.
- W2991215715 hasConceptScore W2991215715C41608201 @default.
- W2991215715 hasConceptScore W2991215715C41895202 @default.
- W2991215715 hasConceptScore W2991215715C90805587 @default.
- W2991215715 hasLocation W29912157151 @default.
- W2991215715 hasOpenAccess W2991215715 @default.
- W2991215715 hasPrimaryLocation W29912157151 @default.
- W2991215715 hasRelatedWork W2501758317 @default.
- W2991215715 hasRelatedWork W2540218936 @default.
- W2991215715 hasRelatedWork W2573020085 @default.
- W2991215715 hasRelatedWork W2789889349 @default.
- W2991215715 hasRelatedWork W2805170835 @default.
- W2991215715 hasRelatedWork W2807629530 @default.
- W2991215715 hasRelatedWork W2903973694 @default.
- W2991215715 hasRelatedWork W2935663798 @default.
- W2991215715 hasRelatedWork W2949318522 @default.
- W2991215715 hasRelatedWork W2984250044 @default.
- W2991215715 hasRelatedWork W3011510812 @default.
- W2991215715 hasRelatedWork W3090900859 @default.
- W2991215715 hasRelatedWork W3113915881 @default.
- W2991215715 hasRelatedWork W3123905954 @default.
- W2991215715 hasRelatedWork W3130502449 @default.
- W2991215715 hasRelatedWork W3153132431 @default.
- W2991215715 hasRelatedWork W3159183446 @default.
- W2991215715 hasRelatedWork W3184609040 @default.
- W2991215715 hasRelatedWork W3206914982 @default.
- W2991215715 hasRelatedWork W2587051312 @default.
- W2991215715 isParatext "false" @default.
- W2991215715 isRetracted "false" @default.
- W2991215715 magId "2991215715" @default.
- W2991215715 workType "article" @default.
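
Each item above follows the `?p ?o ?g` pattern from the query: a subject, a predicate, an object, and a graph name. The following is a minimal sketch of how such listing lines could be read into tuples and grouped by predicate. The line grammar is inferred from this listing alone and is not an official SemOpenAlex format, and the names `parse_line` and `group_by_predicate` are hypothetical helpers introduced for illustration.

```python
import re
from collections import defaultdict

# Assumed shape of one listing line: "- SUBJECT PREDICATE OBJECT @GRAPH."
# (inferred from the listing above, not from any official specification).
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Literal values appear wrapped in double quotes in the listing; strip them.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that do not parse."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    '- W2991215715 hasPublicationYear "2019" @default.',
    '- W2991215715 creator A5016123311 @default.',
    '- W2991215715 creator A5016649343 @default.',
    '- W2991215715 type Work @default.',
]
grouped = group_by_predicate(sample)
print(grouped["creator"])
```

Grouping by predicate makes repeated properties such as `creator`, `hasConcept`, and `hasRelatedWork` easy to read as lists, while single-valued properties such as `title` end up as one-element lists.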