Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287641010> ?p ?o ?g. }
Showing items 1 to 77 of 77 with 100 items per page.
- W4287641010 abstract "Extrapolation to unseen sequence lengths is a challenge for neural generative models of language. In this work, we characterize the effect on length extrapolation of a modeling decision often overlooked: predicting the end of the generative process through the use of a special end-of-sequence (EOS) vocabulary item. We study an oracle setting - forcing models to generate to the correct sequence length at test time - to compare the length-extrapolative behavior of networks trained to predict EOS (+EOS) with networks not trained to (-EOS). We find that -EOS substantially outperforms +EOS, for example extrapolating well to lengths 10 times longer than those seen at training time in a bracket closing task, as well as achieving a 40% improvement over +EOS in the difficult SCAN dataset length generalization task. By comparing the hidden states and dynamics of -EOS and +EOS models, we observe that +EOS models fail to generalize because they (1) unnecessarily stratify their hidden states by their linear position in a sequence (structures we call length manifolds) or (2) get stuck in clusters (which we refer to as length attractors) once the EOS token is the highest-probability prediction." @default.
- W4287641010 created "2022-07-25" @default.
- W4287641010 creator A5017046391 @default.
- W4287641010 creator A5025255782 @default.
- W4287641010 creator A5030044044 @default.
- W4287641010 creator A5046006076 @default.
- W4287641010 date "2020-10-14" @default.
- W4287641010 modified "2023-09-30" @default.
- W4287641010 title "The EOS Decision and Length Extrapolation" @default.
- W4287641010 doi "https://doi.org/10.48550/arxiv.2010.07174" @default.
- W4287641010 hasPublicationYear "2020" @default.
- W4287641010 type Work @default.
- W4287641010 citedByCount "0" @default.
- W4287641010 crossrefType "posted-content" @default.
- W4287641010 hasAuthorship W4287641010A5017046391 @default.
- W4287641010 hasAuthorship W4287641010A5025255782 @default.
- W4287641010 hasAuthorship W4287641010A5030044044 @default.
- W4287641010 hasAuthorship W4287641010A5046006076 @default.
- W4287641010 hasBestOaLocation W42876410101 @default.
- W4287641010 hasConcept C105795698 @default.
- W4287641010 hasConcept C11413529 @default.
- W4287641010 hasConcept C115903868 @default.
- W4287641010 hasConcept C119857082 @default.
- W4287641010 hasConcept C132459708 @default.
- W4287641010 hasConcept C134306372 @default.
- W4287641010 hasConcept C154945302 @default.
- W4287641010 hasConcept C162324750 @default.
- W4287641010 hasConcept C177148314 @default.
- W4287641010 hasConcept C187736073 @default.
- W4287641010 hasConcept C197115733 @default.
- W4287641010 hasConcept C22019652 @default.
- W4287641010 hasConcept C2778112365 @default.
- W4287641010 hasConcept C2780451532 @default.
- W4287641010 hasConcept C33923547 @default.
- W4287641010 hasConcept C39890363 @default.
- W4287641010 hasConcept C41008148 @default.
- W4287641010 hasConcept C50644808 @default.
- W4287641010 hasConcept C54355233 @default.
- W4287641010 hasConcept C55166926 @default.
- W4287641010 hasConcept C86803240 @default.
- W4287641010 hasConceptScore W4287641010C105795698 @default.
- W4287641010 hasConceptScore W4287641010C11413529 @default.
- W4287641010 hasConceptScore W4287641010C115903868 @default.
- W4287641010 hasConceptScore W4287641010C119857082 @default.
- W4287641010 hasConceptScore W4287641010C132459708 @default.
- W4287641010 hasConceptScore W4287641010C134306372 @default.
- W4287641010 hasConceptScore W4287641010C154945302 @default.
- W4287641010 hasConceptScore W4287641010C162324750 @default.
- W4287641010 hasConceptScore W4287641010C177148314 @default.
- W4287641010 hasConceptScore W4287641010C187736073 @default.
- W4287641010 hasConceptScore W4287641010C197115733 @default.
- W4287641010 hasConceptScore W4287641010C22019652 @default.
- W4287641010 hasConceptScore W4287641010C2778112365 @default.
- W4287641010 hasConceptScore W4287641010C2780451532 @default.
- W4287641010 hasConceptScore W4287641010C33923547 @default.
- W4287641010 hasConceptScore W4287641010C39890363 @default.
- W4287641010 hasConceptScore W4287641010C41008148 @default.
- W4287641010 hasConceptScore W4287641010C50644808 @default.
- W4287641010 hasConceptScore W4287641010C54355233 @default.
- W4287641010 hasConceptScore W4287641010C55166926 @default.
- W4287641010 hasConceptScore W4287641010C86803240 @default.
- W4287641010 hasLocation W42876410101 @default.
- W4287641010 hasOpenAccess W4287641010 @default.
- W4287641010 hasPrimaryLocation W42876410101 @default.
- W4287641010 hasRelatedWork W10944326 @default.
- W4287641010 hasRelatedWork W11023528 @default.
- W4287641010 hasRelatedWork W11562254 @default.
- W4287641010 hasRelatedWork W14777649 @default.
- W4287641010 hasRelatedWork W14789944 @default.
- W4287641010 hasRelatedWork W14828854 @default.
- W4287641010 hasRelatedWork W3422034 @default.
- W4287641010 hasRelatedWork W6533109 @default.
- W4287641010 hasRelatedWork W6630852 @default.
- W4287641010 hasRelatedWork W8451425 @default.
- W4287641010 isParatext "false" @default.
- W4287641010 isRetracted "false" @default.
- W4287641010 workType "article" @default.
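
The items above all follow the shape "- subject predicate object @graph.". As a minimal sketch of how such a listing could be read back into structured records, the Python below splits each line into its four parts and groups objects by predicate. The line shape is inferred from this listing only, and the names parse_item and group_by_predicate are hypothetical helpers, not part of any SemOpenAlex tooling.

```python
import re
from collections import defaultdict

# Assumed line shape (inferred from the listing, not an official grammar):
#   "- <subject> <predicate> <object> @<graph>."
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+?)\.$')


def parse_item(line):
    """Split one listing line into (subject, predicate, object, graph).

    Returns None when the line does not match the assumed shape.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Quoted literals lose their surrounding quotes; bare tokens
    # (identifiers such as A5017046391) are kept unchanged.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect every object under its predicate, skipping unparsable lines."""
    grouped = defaultdict(list)
    for line in lines:
        item = parse_item(line)
        if item is not None:
            _, predicate, obj, _ = item
            grouped[predicate].append(obj)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        '- W4287641010 title "The EOS Decision and Length Extrapolation" @default.',
        '- W4287641010 creator A5017046391 @default.',
        '- W4287641010 creator A5025255782 @default.',
    ]
    for predicate, objects in group_by_predicate(sample).items():
        print(predicate, objects)
```

Grouping by predicate makes repeated properties such as creator, hasConcept, and hasRelatedWork easy to count or compare, while single-valued properties such as title or doi come back as one-element lists.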