Matches in SemOpenAlex for { <https://semopenalex.org/work/W3136221257> ?p ?o ?g. }
Showing items 1 to 99 of 99 with 100 items per page.
- W3136221257 abstract "India is a multilingual society with 1369 rationalized languages and dialects being spoken across the country (INDIA, 2011). Of these, the 22 scheduled languages have a staggering total of 1.17 billion speakers and 121 languages have more than 10,000 speakers (INDIA, 2011). India also has the second largest (and an ever growing) digital footprint (Statista, 2020). Despite this, today's state-of-the-art multilingual systems perform suboptimally on Indian (IN) languages. This can be explained by the fact that multilingual language models (LMs) are often trained on 100+ languages together, leading to a small representation of IN languages in their vocabulary and training data. Multilingual LMs are substantially less effective in resource-lean scenarios (Wu and Dredze, 2020; Lauscher et al., 2020), as limited data doesn't help capture the various nuances of a language. One also commonly observes IN language text transliterated to Latin or code-mixed with English, especially in informal settings (for example, on social media platforms) (Rijhwani et al., 2017). This phenomenon is not adequately handled by current state-of-the-art multilingual LMs. To address the aforementioned gaps, we propose MuRIL, a multilingual LM specifically built for IN languages. MuRIL is trained on significantly large amounts of IN text corpora only. We explicitly augment monolingual text corpora with both translated and transliterated document pairs, that serve as supervised cross-lingual signals in training. MuRIL significantly outperforms multilingual BERT (mBERT) on all tasks in the challenging cross-lingual XTREME benchmark (Hu et al., 2020). We also present results on transliterated (native to Latin script) test sets of the chosen datasets and demonstrate the efficacy of MuRIL in handling transliterated data." @default.
- W3136221257 created "2021-03-29" @default.
- W3136221257 creator A5006765506 @default.
- W3136221257 creator A5013522456 @default.
- W3136221257 creator A5013898359 @default.
- W3136221257 creator A5032345820 @default.
- W3136221257 creator A5033696194 @default.
- W3136221257 creator A5041736588 @default.
- W3136221257 creator A5049835802 @default.
- W3136221257 creator A5050999417 @default.
- W3136221257 creator A5052885801 @default.
- W3136221257 creator A5063681225 @default.
- W3136221257 creator A5065013252 @default.
- W3136221257 creator A5067304700 @default.
- W3136221257 creator A5086465246 @default.
- W3136221257 creator A5089407156 @default.
- W3136221257 date "2021-03-19" @default.
- W3136221257 modified "2023-09-27" @default.
- W3136221257 title "MuRIL: Multilingual Representations for Indian Languages." @default.
- W3136221257 cites W2121879602 @default.
- W3136221257 cites W2187509670 @default.
- W3136221257 cites W2270070752 @default.
- W3136221257 cites W2278252977 @default.
- W3136221257 cites W2740885753 @default.
- W3136221257 cites W2914120296 @default.
- W3136221257 cites W3000965575 @default.
- W3136221257 cites W3013840636 @default.
- W3136221257 cites W3029760648 @default.
- W3136221257 cites W3034469191 @default.
- W3136221257 cites W3035390927 @default.
- W3136221257 cites W3085479580 @default.
- W3136221257 cites W3100198908 @default.
- W3136221257 cites W3105190698 @default.
- W3136221257 hasPublicationYear "2021" @default.
- W3136221257 type Work @default.
- W3136221257 sameAs 3136221257 @default.
- W3136221257 citedByCount "14" @default.
- W3136221257 countsByYear W31362212572021 @default.
- W3136221257 countsByYear W31362212572022 @default.
- W3136221257 crossrefType "posted-content" @default.
- W3136221257 hasAuthorship W3136221257A5006765506 @default.
- W3136221257 hasAuthorship W3136221257A5013522456 @default.
- W3136221257 hasAuthorship W3136221257A5013898359 @default.
- W3136221257 hasAuthorship W3136221257A5032345820 @default.
- W3136221257 hasAuthorship W3136221257A5033696194 @default.
- W3136221257 hasAuthorship W3136221257A5041736588 @default.
- W3136221257 hasAuthorship W3136221257A5049835802 @default.
- W3136221257 hasAuthorship W3136221257A5050999417 @default.
- W3136221257 hasAuthorship W3136221257A5052885801 @default.
- W3136221257 hasAuthorship W3136221257A5063681225 @default.
- W3136221257 hasAuthorship W3136221257A5065013252 @default.
- W3136221257 hasAuthorship W3136221257A5067304700 @default.
- W3136221257 hasAuthorship W3136221257A5086465246 @default.
- W3136221257 hasAuthorship W3136221257A5089407156 @default.
- W3136221257 hasConcept C13280743 @default.
- W3136221257 hasConcept C138885662 @default.
- W3136221257 hasConcept C154945302 @default.
- W3136221257 hasConcept C185798385 @default.
- W3136221257 hasConcept C203005215 @default.
- W3136221257 hasConcept C204321447 @default.
- W3136221257 hasConcept C205649164 @default.
- W3136221257 hasConcept C41008148 @default.
- W3136221257 hasConcept C41895202 @default.
- W3136221257 hasConceptScore W3136221257C13280743 @default.
- W3136221257 hasConceptScore W3136221257C138885662 @default.
- W3136221257 hasConceptScore W3136221257C154945302 @default.
- W3136221257 hasConceptScore W3136221257C185798385 @default.
- W3136221257 hasConceptScore W3136221257C203005215 @default.
- W3136221257 hasConceptScore W3136221257C204321447 @default.
- W3136221257 hasConceptScore W3136221257C205649164 @default.
- W3136221257 hasConceptScore W3136221257C41008148 @default.
- W3136221257 hasConceptScore W3136221257C41895202 @default.
- W3136221257 hasLocation W31362212571 @default.
- W3136221257 hasOpenAccess W3136221257 @default.
- W3136221257 hasPrimaryLocation W31362212571 @default.
- W3136221257 hasRelatedWork W1807656721 @default.
- W3136221257 hasRelatedWork W1999602292 @default.
- W3136221257 hasRelatedWork W2889648809 @default.
- W3136221257 hasRelatedWork W2963341956 @default.
- W3136221257 hasRelatedWork W2963721344 @default.
- W3136221257 hasRelatedWork W2965373594 @default.
- W3136221257 hasRelatedWork W3012990076 @default.
- W3136221257 hasRelatedWork W3014346867 @default.
- W3136221257 hasRelatedWork W3017852021 @default.
- W3136221257 hasRelatedWork W3032433061 @default.
- W3136221257 hasRelatedWork W3110945136 @default.
- W3136221257 hasRelatedWork W3147984406 @default.
- W3136221257 hasRelatedWork W3156170450 @default.
- W3136221257 hasRelatedWork W3166656494 @default.
- W3136221257 hasRelatedWork W3175567752 @default.
- W3136221257 hasRelatedWork W3185236989 @default.
- W3136221257 hasRelatedWork W3197148020 @default.
- W3136221257 hasRelatedWork W3211456888 @default.
- W3136221257 hasRelatedWork W3213418658 @default.
- W3136221257 hasRelatedWork W3213438371 @default.
- W3136221257 isParatext "false" @default.
- W3136221257 isRetracted "false" @default.
- W3136221257 magId "3136221257" @default.
- W3136221257 workType "article" @default.