Matches in SemOpenAlex for { <https://semopenalex.org/work/W4366196683> ?p ?o ?g. }
Showing items 1 to 91 of 91, with 100 items per page.
- W4366196683 abstract "Large decoder-only language models (LMs) can be largely improved in terms of perplexity by retrieval (e.g., RETRO), but its impact on text generation quality and downstream task accuracy is unclear. Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. We first provide the recipe to reproduce RETRO up to 9.5B parameters while retrieving a text corpus with 330B tokens. Based on that, we have the following novel findings: i) RETRO outperforms GPT on text generation with much less degeneration (i.e., repetition), moderately higher factual accuracy, and slightly lower toxicity with a nontoxic retrieval database. ii) On the LM Evaluation Harness benchmark, RETRO largely outperforms GPT on knowledge-intensive tasks, but is on par with GPT on other tasks. Furthermore, we introduce a simple variant of the model, RETRO++, which largely improves open-domain QA results of original RETRO (e.g., EM score +8.6 on Natural Question) and significantly outperforms retrieval-augmented GPT across different model sizes. Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models. We release our implementation at: https://github.com/NVIDIA/Megatron-LM#retro" @default.
- W4366196683 created "2023-04-19" @default.
- W4366196683 creator A5005843046 @default.
- W4366196683 creator A5011760791 @default.
- W4366196683 creator A5014498545 @default.
- W4366196683 creator A5025080382 @default.
- W4366196683 creator A5027894657 @default.
- W4366196683 creator A5034995105 @default.
- W4366196683 creator A5041191241 @default.
- W4366196683 creator A5048403564 @default.
- W4366196683 creator A5049184232 @default.
- W4366196683 creator A5049870254 @default.
- W4366196683 creator A5066242985 @default.
- W4366196683 creator A5072436307 @default.
- W4366196683 date "2023-04-13" @default.
- W4366196683 modified "2023-10-18" @default.
- W4366196683 title "Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study" @default.
- W4366196683 doi "https://doi.org/10.48550/arxiv.2304.06762" @default.
- W4366196683 hasPublicationYear "2023" @default.
- W4366196683 type Work @default.
- W4366196683 citedByCount "0" @default.
- W4366196683 crossrefType "posted-content" @default.
- W4366196683 hasAuthorship W4366196683A5005843046 @default.
- W4366196683 hasAuthorship W4366196683A5011760791 @default.
- W4366196683 hasAuthorship W4366196683A5014498545 @default.
- W4366196683 hasAuthorship W4366196683A5025080382 @default.
- W4366196683 hasAuthorship W4366196683A5027894657 @default.
- W4366196683 hasAuthorship W4366196683A5034995105 @default.
- W4366196683 hasAuthorship W4366196683A5041191241 @default.
- W4366196683 hasAuthorship W4366196683A5048403564 @default.
- W4366196683 hasAuthorship W4366196683A5049184232 @default.
- W4366196683 hasAuthorship W4366196683A5049870254 @default.
- W4366196683 hasAuthorship W4366196683A5066242985 @default.
- W4366196683 hasAuthorship W4366196683A5072436307 @default.
- W4366196683 hasBestOaLocation W43661966831 @default.
- W4366196683 hasConcept C100279451 @default.
- W4366196683 hasConcept C105795698 @default.
- W4366196683 hasConcept C13280743 @default.
- W4366196683 hasConcept C137293760 @default.
- W4366196683 hasConcept C154945302 @default.
- W4366196683 hasConcept C159877910 @default.
- W4366196683 hasConcept C162324750 @default.
- W4366196683 hasConcept C185798385 @default.
- W4366196683 hasConcept C187736073 @default.
- W4366196683 hasConcept C204321447 @default.
- W4366196683 hasConcept C205649164 @default.
- W4366196683 hasConcept C23123220 @default.
- W4366196683 hasConcept C2776214188 @default.
- W4366196683 hasConcept C2780451532 @default.
- W4366196683 hasConcept C28490314 @default.
- W4366196683 hasConcept C33923547 @default.
- W4366196683 hasConcept C41008148 @default.
- W4366196683 hasConcept C44291984 @default.
- W4366196683 hasConcept C48044578 @default.
- W4366196683 hasConcept C77088390 @default.
- W4366196683 hasConceptScore W4366196683C100279451 @default.
- W4366196683 hasConceptScore W4366196683C105795698 @default.
- W4366196683 hasConceptScore W4366196683C13280743 @default.
- W4366196683 hasConceptScore W4366196683C137293760 @default.
- W4366196683 hasConceptScore W4366196683C154945302 @default.
- W4366196683 hasConceptScore W4366196683C159877910 @default.
- W4366196683 hasConceptScore W4366196683C162324750 @default.
- W4366196683 hasConceptScore W4366196683C185798385 @default.
- W4366196683 hasConceptScore W4366196683C187736073 @default.
- W4366196683 hasConceptScore W4366196683C204321447 @default.
- W4366196683 hasConceptScore W4366196683C205649164 @default.
- W4366196683 hasConceptScore W4366196683C23123220 @default.
- W4366196683 hasConceptScore W4366196683C2776214188 @default.
- W4366196683 hasConceptScore W4366196683C2780451532 @default.
- W4366196683 hasConceptScore W4366196683C28490314 @default.
- W4366196683 hasConceptScore W4366196683C33923547 @default.
- W4366196683 hasConceptScore W4366196683C41008148 @default.
- W4366196683 hasConceptScore W4366196683C44291984 @default.
- W4366196683 hasConceptScore W4366196683C48044578 @default.
- W4366196683 hasConceptScore W4366196683C77088390 @default.
- W4366196683 hasLocation W43661966831 @default.
- W4366196683 hasOpenAccess W4366196683 @default.
- W4366196683 hasPrimaryLocation W43661966831 @default.
- W4366196683 hasRelatedWork W2169518243 @default.
- W4366196683 hasRelatedWork W2747680751 @default.
- W4366196683 hasRelatedWork W2943490406 @default.
- W4366196683 hasRelatedWork W3107474891 @default.
- W4366196683 hasRelatedWork W3154872984 @default.
- W4366196683 hasRelatedWork W3184187848 @default.
- W4366196683 hasRelatedWork W3197304116 @default.
- W4366196683 hasRelatedWork W3198803088 @default.
- W4366196683 hasRelatedWork W4281566512 @default.
- W4366196683 hasRelatedWork W4285757700 @default.
- W4366196683 isParatext "false" @default.
- W4366196683 isRetracted "false" @default.
- W4366196683 workType "article" @default.
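The triples listed above can be retrieved programmatically. Below is a minimal sketch using only the Python standard library; the endpoint URL (`https://semopenalex.org/sparql`) is an assumption based on SemOpenAlex's public service, and the query drops the `?g` graph variable from the header's pattern for simplicity.

```python
# Minimal sketch: fetch the property/value pairs shown above from the
# SemOpenAlex SPARQL endpoint. Endpoint URL is an assumption; adjust if
# the service exposes it elsewhere.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public endpoint
WORK_IRI = "https://semopenalex.org/work/W4366196683"

def build_query(work_iri: str) -> str:
    """Build a SELECT query matching the pattern in the page header
    (graph variable ?g omitted for simplicity)."""
    return f"SELECT ?p ?o WHERE {{ <{work_iri}> ?p ?o . }}"

def fetch_bindings(work_iri: str) -> list:
    """Run the query over HTTP and return the JSON result bindings
    (requires network access)."""
    params = urllib.parse.urlencode({"query": build_query(work_iri)})
    req = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

print(build_query(WORK_IRI))
```

Each binding in the JSON response carries a `?p` (the predicate, e.g. `hasAuthorship`) and a `?o` (the object, e.g. an author IRI or a literal such as the abstract), mirroring the rows above.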