Matches in SemOpenAlex for { <https://semopenalex.org/work/W3204019825> ?p ?o ?g. }
Showing items 1 to 66 of 66, with 100 items per page.
- W3204019825 abstract "Natural language processing (NLP) is an area of machine learning that has garnered a lot of attention in recent years due to the revolution in artificial intelligence, robotics, and smart devices. NLP focuses on training machines to understand and analyze various languages, extract meaningful information from them, translate from one language to another, correct grammar, predict the next word, complete a sentence, or even generate a completely new sentence from an existing corpus. A major challenge in NLP lies in training a model to achieve high prediction accuracy, since training requires a vast dataset. For widely used languages like English, many datasets are available for NLP tasks such as model training and summarization, but for languages like Bengali, which is spoken primarily in South Asia, there is a dearth of large datasets that can be used to build a robust machine learning model. Therefore, NLP researchers who mainly work with the Bengali language will find an extensive, robust dataset incredibly useful for their NLP tasks involving Bengali. With this pressing issue in mind, this research work has prepared a dataset whose content is curated from social media, blogs, newspapers, wiki pages, and other similar resources. The dataset contains 19,132,010 samples, and sample length varies from 3 to 512 words. It can easily be used to build any unsupervised machine learning model with the aim of performing the necessary NLP tasks involving the Bengali language. This research work also releases two preprocessed versions of the dataset that are especially suited for training both core machine learning-based and statistical models. As very few attempts have been made in this domain, and keeping Bengali language researchers in mind, it is believed that the proposed dataset will significantly contribute to the Bengali machine learning and NLP community." @default.
- W3204019825 created "2021-10-11" @default.
- W3204019825 creator A5006491727 @default.
- W3204019825 creator A5008347106 @default.
- W3204019825 creator A5050202324 @default.
- W3204019825 creator A5079597664 @default.
- W3204019825 creator A5080689303 @default.
- W3204019825 creator A5086971685 @default.
- W3204019825 date "2021-09-02" @default.
- W3204019825 modified "2023-09-27" @default.
- W3204019825 title "BanglaLM: Data Mining based Bangla Corpus for Language Model Research" @default.
- W3204019825 cites W1760821052 @default.
- W3204019825 cites W2010595692 @default.
- W3204019825 cites W2116405309 @default.
- W3204019825 cites W22168010 @default.
- W3204019825 cites W2317879529 @default.
- W3204019825 cites W2800708042 @default.
- W3204019825 cites W2903499587 @default.
- W3204019825 cites W2997501009 @default.
- W3204019825 doi "https://doi.org/10.1109/icirca51532.2021.9544818" @default.
- W3204019825 hasPublicationYear "2021" @default.
- W3204019825 type Work @default.
- W3204019825 sameAs 3204019825 @default.
- W3204019825 citedByCount "3" @default.
- W3204019825 countsByYear W32040198252021 @default.
- W3204019825 countsByYear W32040198252023 @default.
- W3204019825 crossrefType "proceedings-article" @default.
- W3204019825 hasAuthorship W3204019825A5006491727 @default.
- W3204019825 hasAuthorship W3204019825A5008347106 @default.
- W3204019825 hasAuthorship W3204019825A5050202324 @default.
- W3204019825 hasAuthorship W3204019825A5079597664 @default.
- W3204019825 hasAuthorship W3204019825A5080689303 @default.
- W3204019825 hasAuthorship W3204019825A5086971685 @default.
- W3204019825 hasConcept C137293760 @default.
- W3204019825 hasConcept C154945302 @default.
- W3204019825 hasConcept C170858558 @default.
- W3204019825 hasConcept C19235068 @default.
- W3204019825 hasConcept C204321447 @default.
- W3204019825 hasConcept C2777530160 @default.
- W3204019825 hasConcept C41008148 @default.
- W3204019825 hasConcept C66402592 @default.
- W3204019825 hasConceptScore W3204019825C137293760 @default.
- W3204019825 hasConceptScore W3204019825C154945302 @default.
- W3204019825 hasConceptScore W3204019825C170858558 @default.
- W3204019825 hasConceptScore W3204019825C19235068 @default.
- W3204019825 hasConceptScore W3204019825C204321447 @default.
- W3204019825 hasConceptScore W3204019825C2777530160 @default.
- W3204019825 hasConceptScore W3204019825C41008148 @default.
- W3204019825 hasConceptScore W3204019825C66402592 @default.
- W3204019825 hasLocation W32040198251 @default.
- W3204019825 hasOpenAccess W3204019825 @default.
- W3204019825 hasPrimaryLocation W32040198251 @default.
- W3204019825 hasRelatedWork W10582454 @default.
- W3204019825 hasRelatedWork W13336246 @default.
- W3204019825 hasRelatedWork W1563810 @default.
- W3204019825 hasRelatedWork W2060686 @default.
- W3204019825 hasRelatedWork W3484117 @default.
- W3204019825 hasRelatedWork W4867410 @default.
- W3204019825 hasRelatedWork W5274612 @default.
- W3204019825 hasRelatedWork W5473796 @default.
- W3204019825 hasRelatedWork W978549 @default.
- W3204019825 hasRelatedWork W13002482 @default.
- W3204019825 isParatext "false" @default.
- W3204019825 isRetracted "false" @default.
- W3204019825 magId "3204019825" @default.
- W3204019825 workType "article" @default.
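The listing above can be reproduced programmatically. The sketch below rewrites the quad pattern from the heading ({ <https://semopenalex.org/work/W3204019825> ?p ?o ?g. }) as a standard SPARQL GRAPH query and runs it from Python with the SPARQLWrapper package. The endpoint URL https://semopenalex.org/sparql and unauthenticated access are assumptions, not confirmed by this page; adjust them to whatever SPARQL endpoint you actually use.

```python
# Minimal sketch: fetch all predicate/object pairs (and their graph) for the
# work W3204019825, assuming a public SPARQL endpoint at the URL below.
# Requires: pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint URL

# Standard-SPARQL equivalent of the quad pattern in the heading.
QUERY = """
SELECT ?p ?o ?g
WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W3204019825> ?p ?o .
  }
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    # Print one property/value/graph row per line, mirroring the listing above.
    print(row["p"]["value"],
          row["o"]["value"],
          row.get("g", {}).get("value", ""))
```

Requesting JSON keeps each result row close to the property/value form shown in the listing; switching the projection to SELECT * or adding FILTERs on ?p narrows the output to specific properties such as cites or hasConcept.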