Matches in SemOpenAlex for { <https://semopenalex.org/work/W4320087317> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
- W4320087317 abstract "The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based language models. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful failure analyses on each of the learning modalities and identify common sources of mistakes that highlight opportunities in equipping language models with the ability to generalize to longer problems." @default.
- W4320087317 created "2023-02-12" @default.
- W4320087317 creator A5010086803 @default.
- W4320087317 creator A5018113047 @default.
- W4320087317 creator A5019210292 @default.
- W4320087317 creator A5024901763 @default.
- W4320087317 creator A5030441349 @default.
- W4320087317 creator A5035369215 @default.
- W4320087317 creator A5050581734 @default.
- W4320087317 creator A5052645158 @default.
- W4320087317 creator A5059753414 @default.
- W4320087317 creator A5074830078 @default.
- W4320087317 date "2022-07-11" @default.
- W4320087317 modified "2023-09-29" @default.
- W4320087317 title "Exploring Length Generalization in Large Language Models" @default.
- W4320087317 doi "https://doi.org/10.48550/arxiv.2207.04901" @default.
- W4320087317 hasPublicationYear "2022" @default.
- W4320087317 type Work @default.
- W4320087317 citedByCount "0" @default.
- W4320087317 crossrefType "posted-content" @default.
- W4320087317 hasAuthorship W4320087317A5010086803 @default.
- W4320087317 hasAuthorship W4320087317A5018113047 @default.
- W4320087317 hasAuthorship W4320087317A5019210292 @default.
- W4320087317 hasAuthorship W4320087317A5024901763 @default.
- W4320087317 hasAuthorship W4320087317A5030441349 @default.
- W4320087317 hasAuthorship W4320087317A5035369215 @default.
- W4320087317 hasAuthorship W4320087317A5050581734 @default.
- W4320087317 hasAuthorship W4320087317A5052645158 @default.
- W4320087317 hasAuthorship W4320087317A5059753414 @default.
- W4320087317 hasAuthorship W4320087317A5074830078 @default.
- W4320087317 hasBestOaLocation W43200873171 @default.
- W4320087317 hasConcept C119599485 @default.
- W4320087317 hasConcept C119857082 @default.
- W4320087317 hasConcept C127413603 @default.
- W4320087317 hasConcept C134306372 @default.
- W4320087317 hasConcept C137293760 @default.
- W4320087317 hasConcept C144024400 @default.
- W4320087317 hasConcept C151730666 @default.
- W4320087317 hasConcept C154945302 @default.
- W4320087317 hasConcept C165801399 @default.
- W4320087317 hasConcept C177148314 @default.
- W4320087317 hasConcept C204321447 @default.
- W4320087317 hasConcept C2779343474 @default.
- W4320087317 hasConcept C2779903281 @default.
- W4320087317 hasConcept C33923547 @default.
- W4320087317 hasConcept C36289849 @default.
- W4320087317 hasConcept C41008148 @default.
- W4320087317 hasConcept C66322947 @default.
- W4320087317 hasConcept C80444323 @default.
- W4320087317 hasConcept C86803240 @default.
- W4320087317 hasConceptScore W4320087317C119599485 @default.
- W4320087317 hasConceptScore W4320087317C119857082 @default.
- W4320087317 hasConceptScore W4320087317C127413603 @default.
- W4320087317 hasConceptScore W4320087317C134306372 @default.
- W4320087317 hasConceptScore W4320087317C137293760 @default.
- W4320087317 hasConceptScore W4320087317C144024400 @default.
- W4320087317 hasConceptScore W4320087317C151730666 @default.
- W4320087317 hasConceptScore W4320087317C154945302 @default.
- W4320087317 hasConceptScore W4320087317C165801399 @default.
- W4320087317 hasConceptScore W4320087317C177148314 @default.
- W4320087317 hasConceptScore W4320087317C204321447 @default.
- W4320087317 hasConceptScore W4320087317C2779343474 @default.
- W4320087317 hasConceptScore W4320087317C2779903281 @default.
- W4320087317 hasConceptScore W4320087317C33923547 @default.
- W4320087317 hasConceptScore W4320087317C36289849 @default.
- W4320087317 hasConceptScore W4320087317C41008148 @default.
- W4320087317 hasConceptScore W4320087317C66322947 @default.
- W4320087317 hasConceptScore W4320087317C80444323 @default.
- W4320087317 hasConceptScore W4320087317C86803240 @default.
- W4320087317 hasLocation W43200873171 @default.
- W4320087317 hasOpenAccess W4320087317 @default.
- W4320087317 hasPrimaryLocation W43200873171 @default.
- W4320087317 hasRelatedWork W142374489 @default.
- W4320087317 hasRelatedWork W1563618553 @default.
- W4320087317 hasRelatedWork W2148757832 @default.
- W4320087317 hasRelatedWork W2359001871 @default.
- W4320087317 hasRelatedWork W2989932438 @default.
- W4320087317 hasRelatedWork W3033862527 @default.
- W4320087317 hasRelatedWork W3097571385 @default.
- W4320087317 hasRelatedWork W3107474891 @default.
- W4320087317 hasRelatedWork W3196747313 @default.
- W4320087317 hasRelatedWork W4287761227 @default.
- W4320087317 isParatext "false" @default.
- W4320087317 isRetracted "false" @default.
- W4320087317 workType "article" @default.