Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386005268> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4386005268 endingPage "1199" @default.
- W4386005268 startingPage "1184" @default.
- W4386005268 abstract "Sentence representation from vanilla BERT models does not work well on sentence similarity tasks. Sentence-BERT models specifically trained on STS or NLI datasets are shown to provide state-of-the-art performance. However, building these models for low-resource languages is not straightforward due to the lack of these specialized datasets. This work focuses on two low-resource Indian languages, Hindi and Marathi. We train sentence-BERT models for these languages using synthetic NLI and STS datasets prepared using machine translation. We show that the strategy of NLI pre-training followed by STSb fine-tuning is effective in generating high-performance sentence-similarity models for Hindi and Marathi. The vanilla BERT models trained using this simple strategy outperform the multilingual LaBSE trained using a complex training strategy. These models are evaluated on downstream text classification and similarity tasks. We evaluate these models on real text classification datasets to show embeddings obtained from synthetic data training are generalizable to real datasets as well and thus represent an effective training strategy for low-resource languages. We also provide a comparative analysis of sentence embeddings from fast text models, multilingual BERT models (mBERT, IndicBERT, xlm-RoBERTa, MuRIL), multilingual sentence embedding models (LASER, LaBSE), and monolingual BERT models based on L3Cube-MahaBERT and HindBERT. We release L3Cube-MahaSBERT and HindSBERT, the state-of-the-art sentence-BERT models for Marathi and Hindi respectively. Our work also serves as a guide to building low-resource sentence embedding models." @default.
- W4386005268 created "2023-08-20" @default.
- W4386005268 creator A5004478007 @default.
- W4386005268 creator A5009725385 @default.
- W4386005268 creator A5076277754 @default.
- W4386005268 creator A5089077061 @default.
- W4386005268 creator A5089285146 @default.
- W4386005268 date "2023-01-01" @default.
- W4386005268 modified "2023-10-12" @default.
- W4386005268 title "L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi" @default.
- W4386005268 cites W2963026768 @default.
- W4386005268 cites W2963918774 @default.
- W4386005268 cites W2964204621 @default.
- W4386005268 cites W2970641574 @default.
- W4386005268 cites W3035390927 @default.
- W4386005268 cites W3042631625 @default.
- W4386005268 cites W3099919888 @default.
- W4386005268 cites W3104382433 @default.
- W4386005268 cites W3105816068 @default.
- W4386005268 cites W3162462834 @default.
- W4386005268 cites W3163018411 @default.
- W4386005268 cites W3203765809 @default.
- W4386005268 cites W4308862025 @default.
- W4386005268 cites W4385574194 @default.
- W4386005268 cites W4385574330 @default.
- W4386005268 doi "https://doi.org/10.1007/978-3-031-37963-5_82" @default.
- W4386005268 hasPublicationYear "2023" @default.
- W4386005268 type Work @default.
- W4386005268 citedByCount "0" @default.
- W4386005268 crossrefType "book-chapter" @default.
- W4386005268 hasAuthorship W4386005268A5004478007 @default.
- W4386005268 hasAuthorship W4386005268A5009725385 @default.
- W4386005268 hasAuthorship W4386005268A5076277754 @default.
- W4386005268 hasAuthorship W4386005268A5089077061 @default.
- W4386005268 hasAuthorship W4386005268A5089285146 @default.
- W4386005268 hasBestOaLocation W43860052682 @default.
- W4386005268 hasConcept C138885662 @default.
- W4386005268 hasConcept C154945302 @default.
- W4386005268 hasConcept C204321447 @default.
- W4386005268 hasConcept C2776844415 @default.
- W4386005268 hasConcept C2777530160 @default.
- W4386005268 hasConcept C41008148 @default.
- W4386005268 hasConcept C41608201 @default.
- W4386005268 hasConcept C41895202 @default.
- W4386005268 hasConcept C519982507 @default.
- W4386005268 hasConceptScore W4386005268C138885662 @default.
- W4386005268 hasConceptScore W4386005268C154945302 @default.
- W4386005268 hasConceptScore W4386005268C204321447 @default.
- W4386005268 hasConceptScore W4386005268C2776844415 @default.
- W4386005268 hasConceptScore W4386005268C2777530160 @default.
- W4386005268 hasConceptScore W4386005268C41008148 @default.
- W4386005268 hasConceptScore W4386005268C41608201 @default.
- W4386005268 hasConceptScore W4386005268C41895202 @default.
- W4386005268 hasConceptScore W4386005268C519982507 @default.
- W4386005268 hasLocation W43860052681 @default.
- W4386005268 hasLocation W43860052682 @default.
- W4386005268 hasOpenAccess W4386005268 @default.
- W4386005268 hasPrimaryLocation W43860052681 @default.
- W4386005268 hasRelatedWork W2097740971 @default.
- W4386005268 hasRelatedWork W2336638260 @default.
- W4386005268 hasRelatedWork W2784413230 @default.
- W4386005268 hasRelatedWork W2789013119 @default.
- W4386005268 hasRelatedWork W3082016734 @default.
- W4386005268 hasRelatedWork W3119657211 @default.
- W4386005268 hasRelatedWork W4309803833 @default.
- W4386005268 hasRelatedWork W4321480256 @default.
- W4386005268 hasRelatedWork W4386103000 @default.
- W4386005268 hasRelatedWork W637185019 @default.
- W4386005268 isParatext "false" @default.
- W4386005268 isRetracted "false" @default.
- W4386005268 workType "book-chapter" @default.