Matches in SemOpenAlex for { <https://semopenalex.org/work/W3196318247> ?p ?o ?g. }
- W3196318247 abstract "Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question remains open: how to achieve extrapolation at inference time to longer sequences than seen during training? We first show that extrapolation can be improved by changing the position representation method, though we find that existing proposals do not allow efficient extrapolation. We introduce a simple and efficient method, Attention with Linear Biases (ALiBi), that allows for extrapolation. ALiBi does not add positional embeddings to the word embeddings; instead, it biases the query-key attention scores with a term that is proportional to their distance. We show that this method allows training a 1.3 billion parameter model on input sequences of length 1024 that extrapolates to input sequences of length 2048, achieving the same perplexity as a sinusoidal position embedding model trained on inputs of length 2048, 11% faster and using 11% less memory. ALiBi's inductive bias towards recency allows it to outperform multiple strong position methods on the WikiText-103 benchmark. Finally, we provide analysis of ALiBi to understand why it leads to better performance." @default.
- W3196318247 created "2021-09-13" @default.
- W3196318247 creator A5004412943 @default.
- W3196318247 creator A5035538068 @default.
- W3196318247 creator A5088517824 @default.
- W3196318247 date "2021-08-27" @default.
- W3196318247 modified "2023-09-27" @default.
- W3196318247 title "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation" @default.
- W3196318247 cites W1566289585 @default.
- W3196318247 cites W1591801644 @default.
- W3196318247 cites W179875071 @default.
- W3196318247 cites W1999965501 @default.
- W3196318247 cites W2051840895 @default.
- W3196318247 cites W2525332836 @default.
- W3196318247 cites W2789541106 @default.
- W3196318247 cites W2805206884 @default.
- W3196318247 cites W2919624000 @default.
- W3196318247 cites W2962964385 @default.
- W3196318247 cites W2963341956 @default.
- W3196318247 cites W2963347649 @default.
- W3196318247 cites W2963403868 @default.
- W3196318247 cites W2963631907 @default.
- W3196318247 cites W2963807318 @default.
- W3196318247 cites W2964110616 @default.
- W3196318247 cites W2965373594 @default.
- W3196318247 cites W2995154514 @default.
- W3196318247 cites W2995575179 @default.
- W3196318247 cites W3015468748 @default.
- W3196318247 cites W3030163527 @default.
- W3196318247 cites W3035390927 @default.
- W3196318247 cites W3035691519 @default.
- W3196318247 cites W3082274269 @default.
- W3196318247 cites W3098824823 @default.
- W3196318247 cites W3131922516 @default.
- W3196318247 cites W3147874613 @default.
- W3196318247 cites W3168847912 @default.
- W3196318247 cites W3170261818 @default.
- W3196318247 cites W3174401451 @default.
- W3196318247 hasPublicationYear "2021" @default.
- W3196318247 type Work @default.
- W3196318247 sameAs 3196318247 @default.
- W3196318247 citedByCount "2" @default.
- W3196318247 countsByYear W31963182472021 @default.
- W3196318247 crossrefType "posted-content" @default.
- W3196318247 hasAuthorship W3196318247A5004412943 @default.
- W3196318247 hasAuthorship W3196318247A5035538068 @default.
- W3196318247 hasAuthorship W3196318247A5088517824 @default.
- W3196318247 hasConcept C10138342 @default.
- W3196318247 hasConcept C105795698 @default.
- W3196318247 hasConcept C11413529 @default.
- W3196318247 hasConcept C132459708 @default.
- W3196318247 hasConcept C13280743 @default.
- W3196318247 hasConcept C154945302 @default.
- W3196318247 hasConcept C162324750 @default.
- W3196318247 hasConcept C17744445 @default.
- W3196318247 hasConcept C184898388 @default.
- W3196318247 hasConcept C185798385 @default.
- W3196318247 hasConcept C198082294 @default.
- W3196318247 hasConcept C199539241 @default.
- W3196318247 hasConcept C205649164 @default.
- W3196318247 hasConcept C2776214188 @default.
- W3196318247 hasConcept C2777492778 @default.
- W3196318247 hasConcept C33923547 @default.
- W3196318247 hasConcept C41008148 @default.
- W3196318247 hasConceptScore W3196318247C10138342 @default.
- W3196318247 hasConceptScore W3196318247C105795698 @default.
- W3196318247 hasConceptScore W3196318247C11413529 @default.
- W3196318247 hasConceptScore W3196318247C132459708 @default.
- W3196318247 hasConceptScore W3196318247C13280743 @default.
- W3196318247 hasConceptScore W3196318247C154945302 @default.
- W3196318247 hasConceptScore W3196318247C162324750 @default.
- W3196318247 hasConceptScore W3196318247C17744445 @default.
- W3196318247 hasConceptScore W3196318247C184898388 @default.
- W3196318247 hasConceptScore W3196318247C185798385 @default.
- W3196318247 hasConceptScore W3196318247C198082294 @default.
- W3196318247 hasConceptScore W3196318247C199539241 @default.
- W3196318247 hasConceptScore W3196318247C205649164 @default.
- W3196318247 hasConceptScore W3196318247C2776214188 @default.
- W3196318247 hasConceptScore W3196318247C2777492778 @default.
- W3196318247 hasConceptScore W3196318247C33923547 @default.
- W3196318247 hasConceptScore W3196318247C41008148 @default.
- W3196318247 hasLocation W31963182471 @default.
- W3196318247 hasOpenAccess W3196318247 @default.
- W3196318247 hasPrimaryLocation W31963182471 @default.
- W3196318247 hasRelatedWork W1548974228 @default.
- W3196318247 hasRelatedWork W2126763700 @default.
- W3196318247 hasRelatedWork W2157140289 @default.
- W3196318247 hasRelatedWork W2166721725 @default.
- W3196318247 hasRelatedWork W2175413297 @default.
- W3196318247 hasRelatedWork W2419788668 @default.
- W3196318247 hasRelatedWork W2554844166 @default.
- W3196318247 hasRelatedWork W2604700561 @default.
- W3196318247 hasRelatedWork W2805087131 @default.
- W3196318247 hasRelatedWork W3128590981 @default.
- W3196318247 hasRelatedWork W3157366283 @default.
- W3196318247 hasRelatedWork W3163785802 @default.
- W3196318247 hasRelatedWork W3170718858 @default.
- W3196318247 hasRelatedWork W3173222967 @default.
- W3196318247 hasRelatedWork W3174401451 @default.
- W3196318247 hasRelatedWork W3212632398 @default.
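The abstract says that ALiBi adds no positional embeddings. Instead, it biases each query-key attention score by a penalty proportional to the query-key distance. A minimal single-head sketch in pure Python shows the idea. It makes several assumptions:

- The function names are hypothetical.
- Attention is causal, with future positions masked.
- Scores use the standard 1/sqrt(d) scaled dot product.
- Per-head slopes follow a geometric sequence starting at 2^(-8/n) for n heads (for example, 1/2, 1/4, ..., 1/256 for 8 heads). This schedule comes from the full paper, not from this record, and the formula assumes n is a power of two.

This is an illustration, not the authors' implementation.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    # Assumed schedule: geometric sequence 2^(-8/n), 2^(-16/n), ...
    # (only defined this simply when num_heads is a power of two).
    start = 2 ** (-8 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    # bias[i][j] = -slope * (i - j): a linear penalty on query-key distance.
    # Future keys (j > i) are masked with -inf (causal attention assumed).
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


def softmax(row: list[float]) -> list[float]:
    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def alibi_attention(q, k, v, slope: float) -> list[list[float]]:
    # q, k, v: lists of seq_len vectors (single head). No positional
    # embeddings are used; position enters only through the bias term.
    n, d = len(q), len(q[0])
    bias = alibi_bias(n, slope)
    out = []
    for i in range(n):
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) + bias[i][j]
            for j in range(n)
        ]
        weights = softmax(scores)
        out.append([sum(weights[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return out
```

The bias depends only on the distance i - j and has no learned parameters. The same code therefore runs unchanged at any `seq_len`, including lengths longer than those seen in training. A larger slope shifts attention weight toward nearby tokens, which is the recency bias the abstract mentions.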