Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387522367> ?p ?o ?g. }
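The listing below can be reproduced with a plain SPARQL query. A minimal sketch, assuming the public SemOpenAlex endpoint at https://semopenalex.org/sparql; since every quad carries the `@default` graph marker, a simple triple pattern (no `GRAPH` clause) suffices:

```sparql
# Sketch: list every predicate/object pair for the work,
# mirroring the { <work> ?p ?o ?g . } pattern in the header.
# Endpoint assumed: https://semopenalex.org/sparql
SELECT ?p ?o
WHERE {
  <https://semopenalex.org/work/W4387522367> ?p ?o .
}
ORDER BY ?p
```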
- W4387522367 endingPage "240" @default.
- W4387522367 startingPage "226" @default.
- W4387522367 abstract "As the capabilities of language models continue to advance, it is conceivable that a “one-size-fits-all” model will remain the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models already extensively trained on diverse corpora. More specifically, we further pretrain GPT-J and LLaMA models on Portuguese texts using 3% or less of their original pretraining budget. Few-shot evaluations on Poeta, a suite of 14 Portuguese datasets, reveal that our models outperform English-centric and multilingual counterparts by a significant margin. Our best model, Sabiá-65B, performs on par with GPT-3.5-turbo. By evaluating on datasets originally conceived in the target language as well as translated ones, we study the impact of language-specific pretraining in terms of 1) capturing linguistic nuances and structures inherent to the target language, and 2) enriching the model’s knowledge about a domain or culture. Our results indicate that most benefits stem from the domain-specific knowledge acquired through monolingual pretraining. Finally, we show that our model optimized for Portuguese exhibits reduced performance on English tasks, thereby substantiating the inherent compromise in refining models for specific linguistic domains." @default.
- W4387522367 created "2023-10-12" @default.
- W4387522367 creator A5009573252 @default.
- W4387522367 creator A5030647281 @default.
- W4387522367 creator A5043191034 @default.
- W4387522367 creator A5086873271 @default.
- W4387522367 date "2023-01-01" @default.
- W4387522367 modified "2023-10-12" @default.
- W4387522367 title "Sabiá: Portuguese Large Language Models" @default.
- W4387522367 cites W2905246312 @default.
- W4387522367 cites W2946659172 @default.
- W4387522367 cites W2963250244 @default.
- W4387522367 cites W2986154550 @default.
- W4387522367 cites W2991878188 @default.
- W4387522367 cites W3001434439 @default.
- W4387522367 cites W3008931407 @default.
- W4387522367 cites W3035390927 @default.
- W4387522367 cites W3045958725 @default.
- W4387522367 cites W3096266342 @default.
- W4387522367 cites W3098637735 @default.
- W4387522367 cites W3101498587 @default.
- W4387522367 cites W3114950584 @default.
- W4387522367 cites W3153675281 @default.
- W4387522367 cites W3159795318 @default.
- W4387522367 cites W3164045210 @default.
- W4387522367 cites W3169483174 @default.
- W4387522367 cites W3177252310 @default.
- W4387522367 cites W3194676777 @default.
- W4387522367 cites W3213418658 @default.
- W4387522367 cites W4212932674 @default.
- W4387522367 cites W4220967417 @default.
- W4387522367 cites W4229506649 @default.
- W4387522367 cites W4284691825 @default.
- W4387522367 cites W4287888679 @default.
- W4387522367 cites W4289828103 @default.
- W4387522367 cites W4385570226 @default.
- W4387522367 cites W4385571124 @default.
- W4387522367 cites W4385572438 @default.
- W4387522367 doi "https://doi.org/10.1007/978-3-031-45392-2_15" @default.
- W4387522367 hasPublicationYear "2023" @default.
- W4387522367 type Work @default.
- W4387522367 citedByCount "0" @default.
- W4387522367 crossrefType "book-chapter" @default.
- W4387522367 hasAuthorship W4387522367A5009573252 @default.
- W4387522367 hasAuthorship W4387522367A5030647281 @default.
- W4387522367 hasAuthorship W4387522367A5043191034 @default.
- W4387522367 hasAuthorship W4387522367A5086873271 @default.
- W4387522367 hasConcept C119857082 @default.
- W4387522367 hasConcept C134306372 @default.
- W4387522367 hasConcept C137293760 @default.
- W4387522367 hasConcept C138885662 @default.
- W4387522367 hasConcept C154945302 @default.
- W4387522367 hasConcept C166957645 @default.
- W4387522367 hasConcept C204321447 @default.
- W4387522367 hasConcept C2992389322 @default.
- W4387522367 hasConcept C33923547 @default.
- W4387522367 hasConcept C35219183 @default.
- W4387522367 hasConcept C36503486 @default.
- W4387522367 hasConcept C41008148 @default.
- W4387522367 hasConcept C41895202 @default.
- W4387522367 hasConcept C774472 @default.
- W4387522367 hasConcept C79581498 @default.
- W4387522367 hasConcept C95457728 @default.
- W4387522367 hasConceptScore W4387522367C119857082 @default.
- W4387522367 hasConceptScore W4387522367C134306372 @default.
- W4387522367 hasConceptScore W4387522367C137293760 @default.
- W4387522367 hasConceptScore W4387522367C138885662 @default.
- W4387522367 hasConceptScore W4387522367C154945302 @default.
- W4387522367 hasConceptScore W4387522367C166957645 @default.
- W4387522367 hasConceptScore W4387522367C204321447 @default.
- W4387522367 hasConceptScore W4387522367C2992389322 @default.
- W4387522367 hasConceptScore W4387522367C33923547 @default.
- W4387522367 hasConceptScore W4387522367C35219183 @default.
- W4387522367 hasConceptScore W4387522367C36503486 @default.
- W4387522367 hasConceptScore W4387522367C41008148 @default.
- W4387522367 hasConceptScore W4387522367C41895202 @default.
- W4387522367 hasConceptScore W4387522367C774472 @default.
- W4387522367 hasConceptScore W4387522367C79581498 @default.
- W4387522367 hasConceptScore W4387522367C95457728 @default.
- W4387522367 hasLocation W43875223671 @default.
- W4387522367 hasOpenAccess W4387522367 @default.
- W4387522367 hasPrimaryLocation W43875223671 @default.
- W4387522367 hasRelatedWork W1511772879 @default.
- W4387522367 hasRelatedWork W1535222686 @default.
- W4387522367 hasRelatedWork W1964944391 @default.
- W4387522367 hasRelatedWork W2083794993 @default.
- W4387522367 hasRelatedWork W2787642437 @default.
- W4387522367 hasRelatedWork W2912615426 @default.
- W4387522367 hasRelatedWork W352609212 @default.
- W4387522367 hasRelatedWork W4200340037 @default.
- W4387522367 hasRelatedWork W4231704780 @default.
- W4387522367 hasRelatedWork W4285255813 @default.
- W4387522367 isParatext "false" @default.
- W4387522367 isRetracted "false" @default.
- W4387522367 workType "book-chapter" @default.
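The listing shows only local predicate names (cites, creator, and so on), not their full IRIs, so a follow-up query that dereferences the cited works and the four creator IDs can match predicates by IRI suffix rather than assuming a namespace. A sketch; treating foaf:name as the author label is an assumption about the SemOpenAlex schema, not something confirmed by the listing:

```sparql
# Sketch: follow the 'cites' and 'creator' edges without assuming
# the ontology prefix (only local names appear in the listing above).
# foaf:name as the author-name property is an assumption.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?cited ?author ?authorName
WHERE {
  {
    <https://semopenalex.org/work/W4387522367> ?p ?cited .
    FILTER(STRENDS(STR(?p), "/cites") || STRENDS(STR(?p), "#cites"))
  }
  UNION
  {
    <https://semopenalex.org/work/W4387522367> ?q ?author .
    FILTER(STRENDS(STR(?q), "/creator") || STRENDS(STR(?q), "#creator"))
    OPTIONAL { ?author foaf:name ?authorName }
  }
}
```

If SemOpenAlex follows the SPAR and Dublin Core vocabularies that these local names suggest (cito:cites, dcterms:creator), the suffix filters can be replaced by the prefixed predicates directly, which will be considerably faster on a large endpoint.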