Matches in SemOpenAlex for { <https://semopenalex.org/work/W3099919888> ?p ?o ?g. }
- W3099919888 abstract "In this paper, we introduce NLP resources for 11 major Indian languages from two major language families. These resources include: (a) large-scale sentence-level monolingual corpora, (b) pre-trained word embeddings, (c) pre-trained language models, and (d) multiple NLU evaluation datasets (IndicGLUE benchmark). The monolingual corpora contain a total of 8.8 billion tokens across all 11 languages and Indian English, primarily sourced from news crawls. The word embeddings are based on FastText, hence suitable for handling the morphological complexity of Indian languages. The pre-trained language models are based on the compact ALBERT model. Lastly, we compile the IndicGLUE benchmark for Indian language NLU. To this end, we create datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple-choice QA, Winograd NLI and COPA. We also include publicly available datasets for some Indic languages for tasks like Named Entity Recognition, Cross-lingual Sentence Retrieval, Paraphrase Detection, etc. Our embeddings are competitive with or better than existing pre-trained embeddings on multiple tasks. We hope that the availability of the dataset will accelerate Indic NLP research, which has the potential to impact more than a billion people. It can also help the community in evaluating advances in NLP over a more diverse pool of languages. The data and models are available at https://indicnlp.ai4bharat.org." @default.
- W3099919888 created "2020-11-23" @default.
- W3099919888 creator A5026375250 @default.
- W3099919888 creator A5031343973 @default.
- W3099919888 creator A5031718651 @default.
- W3099919888 creator A5050036814 @default.
- W3099919888 creator A5052909911 @default.
- W3099919888 creator A5081984451 @default.
- W3099919888 creator A5087781473 @default.
- W3099919888 date "2020-01-01" @default.
- W3099919888 modified "2023-10-17" @default.
- W3099919888 title "IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages" @default.
- W3099919888 cites W1599016936 @default.
- W3099919888 cites W1614298861 @default.
- W3099919888 cites W2120101509 @default.
- W3099919888 cites W2153579005 @default.
- W3099919888 cites W2250539671 @default.
- W3099919888 cites W2250600805 @default.
- W3099919888 cites W2250653840 @default.
- W3099919888 cites W2252106004 @default.
- W3099919888 cites W2316579313 @default.
- W3099919888 cites W2481265265 @default.
- W3099919888 cites W2493916176 @default.
- W3099919888 cites W2576516426 @default.
- W3099919888 cites W2586756071 @default.
- W3099919888 cites W2739853973 @default.
- W3099919888 cites W2790325757 @default.
- W3099919888 cites W2806157494 @default.
- W3099919888 cites W2888740011 @default.
- W3099919888 cites W2948902769 @default.
- W3099919888 cites W2962739339 @default.
- W3099919888 cites W2963118869 @default.
- W3099919888 cites W2963216505 @default.
- W3099919888 cites W2963250244 @default.
- W3099919888 cites W2963310665 @default.
- W3099919888 cites W2963341956 @default.
- W3099919888 cites W2963667932 @default.
- W3099919888 cites W2964583233 @default.
- W3099919888 cites W2965373594 @default.
- W3099919888 cites W2970476646 @default.
- W3099919888 cites W2971324494 @default.
- W3099919888 cites W2987270981 @default.
- W3099919888 cites W2990704537 @default.
- W3099919888 cites W2996428491 @default.
- W3099919888 cites W3028807187 @default.
- W3099919888 cites W3031586918 @default.
- W3099919888 cites W3032532958 @default.
- W3099919888 cites W3032816972 @default.
- W3099919888 cites W3035390927 @default.
- W3099919888 cites W3045462440 @default.
- W3099919888 cites W95183648 @default.
- W3099919888 doi "https://doi.org/10.18653/v1/2020.findings-emnlp.445" @default.
- W3099919888 hasPublicationYear "2020" @default.
- W3099919888 type Work @default.
- W3099919888 sameAs 3099919888 @default.
- W3099919888 citedByCount "145" @default.
- W3099919888 countsByYear W30999198882020 @default.
- W3099919888 countsByYear W30999198882021 @default.
- W3099919888 countsByYear W30999198882022 @default.
- W3099919888 countsByYear W30999198882023 @default.
- W3099919888 crossrefType "proceedings-article" @default.
- W3099919888 hasAuthorship W3099919888A5026375250 @default.
- W3099919888 hasAuthorship W3099919888A5031343973 @default.
- W3099919888 hasAuthorship W3099919888A5031718651 @default.
- W3099919888 hasAuthorship W3099919888A5050036814 @default.
- W3099919888 hasAuthorship W3099919888A5052909911 @default.
- W3099919888 hasAuthorship W3099919888A5081984451 @default.
- W3099919888 hasAuthorship W3099919888A5087781473 @default.
- W3099919888 hasBestOaLocation W30999198881 @default.
- W3099919888 hasConcept C13280743 @default.
- W3099919888 hasConcept C137293760 @default.
- W3099919888 hasConcept C138885662 @default.
- W3099919888 hasConcept C154945302 @default.
- W3099919888 hasConcept C185798385 @default.
- W3099919888 hasConcept C204321447 @default.
- W3099919888 hasConcept C205649164 @default.
- W3099919888 hasConcept C2777530160 @default.
- W3099919888 hasConcept C2778689934 @default.
- W3099919888 hasConcept C2780922921 @default.
- W3099919888 hasConcept C41008148 @default.
- W3099919888 hasConcept C41895202 @default.
- W3099919888 hasConcept C90805587 @default.
- W3099919888 hasConceptScore W3099919888C13280743 @default.
- W3099919888 hasConceptScore W3099919888C137293760 @default.
- W3099919888 hasConceptScore W3099919888C138885662 @default.
- W3099919888 hasConceptScore W3099919888C154945302 @default.
- W3099919888 hasConceptScore W3099919888C185798385 @default.
- W3099919888 hasConceptScore W3099919888C204321447 @default.
- W3099919888 hasConceptScore W3099919888C205649164 @default.
- W3099919888 hasConceptScore W3099919888C2777530160 @default.
- W3099919888 hasConceptScore W3099919888C2778689934 @default.
- W3099919888 hasConceptScore W3099919888C2780922921 @default.
- W3099919888 hasConceptScore W3099919888C41008148 @default.
- W3099919888 hasConceptScore W3099919888C41895202 @default.
- W3099919888 hasConceptScore W3099919888C90805587 @default.
- W3099919888 hasLocation W30999198881 @default.
- W3099919888 hasOpenAccess W3099919888 @default.
- W3099919888 hasPrimaryLocation W30999198881 @default.
- W3099919888 hasRelatedWork W1973985309 @default.
- W3099919888 hasRelatedWork W2741632991 @default.