Matches in SemOpenAlex for { <https://semopenalex.org/work/W3166384429> ?p ?o ?g. }
Showing items 1 to 94 of 94, with 100 items per page.
- W3166384429 abstract "Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture. In this work we introduce a novel attention procedure called staircase attention that, unlike self-attention, operates across the sequence (in time) recurrently processing the input by adding another step of processing. A step in the staircase comprises backward tokens (encoding the sequence so far seen) and forward tokens (ingesting a new part of the sequence), or an extreme Ladder version with a forward step of zero that simply repeats the Transformer on each step of the ladder, sharing the weights. We thus describe a family of such models that can trade off performance and compute, by either increasing the amount of recurrence through time, the amount of sequential processing via recurrence in depth, or both. Staircase attention is shown to be able to solve tasks that involve tracking that conventional Transformers cannot, due to this recurrence. Further, it is shown to provide improved modeling power for the same size model (number of parameters) compared to self-attentive Transformers on large language modeling and dialogue tasks, yielding significant perplexity gains." @default.
- W3166384429 created "2021-06-22" @default.
- W3166384429 creator A5024825692 @default.
- W3166384429 creator A5049992350 @default.
- W3166384429 creator A5060255128 @default.
- W3166384429 creator A5076635608 @default.
- W3166384429 date "2021-06-08" @default.
- W3166384429 modified "2023-10-17" @default.
- W3166384429 title "Staircase Attention for Recurrent Processing of Sequences" @default.
- W3166384429 cites W1793121960 @default.
- W3166384429 cites W179875071 @default.
- W3166384429 cites W2064675550 @default.
- W3166384429 cites W2110485445 @default.
- W3166384429 cites W2132339004 @default.
- W3166384429 cites W2890394457 @default.
- W3166384429 cites W2896060389 @default.
- W3166384429 cites W2940744433 @default.
- W3166384429 cites W2946567085 @default.
- W3166384429 cites W2948981900 @default.
- W3166384429 cites W2963088785 @default.
- W3166384429 cites W2963149412 @default.
- W3166384429 cites W2963341956 @default.
- W3166384429 cites W2963403868 @default.
- W3166384429 cites W2963641307 @default.
- W3166384429 cites W2963925437 @default.
- W3166384429 cites W2964308564 @default.
- W3166384429 cites W2965373594 @default.
- W3166384429 cites W2973049837 @default.
- W3166384429 cites W2975059944 @default.
- W3166384429 cites W2983040767 @default.
- W3166384429 cites W2988945824 @default.
- W3166384429 cites W2994673210 @default.
- W3166384429 cites W2995289474 @default.
- W3166384429 cites W3002330681 @default.
- W3166384429 cites W3015468748 @default.
- W3166384429 cites W3023786569 @default.
- W3166384429 cites W3034337319 @default.
- W3166384429 cites W3119866685 @default.
- W3166384429 cites W3123673616 @default.
- W3166384429 cites W3147874613 @default.
- W3166384429 doi "https://doi.org/10.48550/arxiv.2106.04279" @default.
- W3166384429 hasPublicationYear "2021" @default.
- W3166384429 type Work @default.
- W3166384429 sameAs 3166384429 @default.
- W3166384429 citedByCount "0" @default.
- W3166384429 crossrefType "posted-content" @default.
- W3166384429 hasAuthorship W3166384429A5024825692 @default.
- W3166384429 hasAuthorship W3166384429A5049992350 @default.
- W3166384429 hasAuthorship W3166384429A5060255128 @default.
- W3166384429 hasAuthorship W3166384429A5076635608 @default.
- W3166384429 hasBestOaLocation W31663844291 @default.
- W3166384429 hasConcept C100279451 @default.
- W3166384429 hasConcept C11413529 @default.
- W3166384429 hasConcept C119599485 @default.
- W3166384429 hasConcept C127413603 @default.
- W3166384429 hasConcept C137293760 @default.
- W3166384429 hasConcept C154945302 @default.
- W3166384429 hasConcept C165801399 @default.
- W3166384429 hasConcept C2778112365 @default.
- W3166384429 hasConcept C28490314 @default.
- W3166384429 hasConcept C41008148 @default.
- W3166384429 hasConcept C54355233 @default.
- W3166384429 hasConcept C66322947 @default.
- W3166384429 hasConcept C86803240 @default.
- W3166384429 hasConceptScore W3166384429C100279451 @default.
- W3166384429 hasConceptScore W3166384429C11413529 @default.
- W3166384429 hasConceptScore W3166384429C119599485 @default.
- W3166384429 hasConceptScore W3166384429C127413603 @default.
- W3166384429 hasConceptScore W3166384429C137293760 @default.
- W3166384429 hasConceptScore W3166384429C154945302 @default.
- W3166384429 hasConceptScore W3166384429C165801399 @default.
- W3166384429 hasConceptScore W3166384429C2778112365 @default.
- W3166384429 hasConceptScore W3166384429C28490314 @default.
- W3166384429 hasConceptScore W3166384429C41008148 @default.
- W3166384429 hasConceptScore W3166384429C54355233 @default.
- W3166384429 hasConceptScore W3166384429C66322947 @default.
- W3166384429 hasConceptScore W3166384429C86803240 @default.
- W3166384429 hasLocation W31663844291 @default.
- W3166384429 hasOpenAccess W3166384429 @default.
- W3166384429 hasPrimaryLocation W31663844291 @default.
- W3166384429 hasRelatedWork W1989705153 @default.
- W3166384429 hasRelatedWork W2107734859 @default.
- W3166384429 hasRelatedWork W2496228846 @default.
- W3166384429 hasRelatedWork W2896411932 @default.
- W3166384429 hasRelatedWork W2936497627 @default.
- W3166384429 hasRelatedWork W3013624417 @default.
- W3166384429 hasRelatedWork W3016888273 @default.
- W3166384429 hasRelatedWork W3049463507 @default.
- W3166384429 hasRelatedWork W3100913109 @default.
- W3166384429 hasRelatedWork W4287826556 @default.
- W3166384429 isParatext "false" @default.
- W3166384429 isRetracted "false" @default.
- W3166384429 magId "3166384429" @default.
- W3166384429 workType "article" @default.
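The triple pattern at the top of this listing can be issued programmatically. Below is a minimal sketch, using only the Python standard library, of building and sending that query. It assumes the public SemOpenAlex SPARQL endpoint at `https://semopenalex.org/sparql` accepts standard SPARQL 1.1 requests returning JSON (an assumption, not stated in the listing), and it drops the graph variable `?g` for simplicity.

```python
# Sketch: fetch all predicate/object pairs for one SemOpenAlex work.
# ENDPOINT is an assumed URL; the query shape mirrors the pattern
# { <https://semopenalex.org/work/W3166384429> ?p ?o . } shown above.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint URL


def build_work_query(work_id: str) -> str:
    """Build the triple-pattern query for one work IRI."""
    iri = f"https://semopenalex.org/work/{work_id}"
    return f"SELECT ?p ?o WHERE {{ <{iri}> ?p ?o . }}"


def fetch_triples(work_id: str):
    """POST the query to the endpoint; return (predicate, object) pairs."""
    query = build_work_query(work_id)
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)
    return [
        (b["p"]["value"], b["o"]["value"])
        for b in results["results"]["bindings"]
    ]


# Show the generated query without hitting the network.
print(build_work_query("W3166384429"))
```

Run against the live endpoint, `fetch_triples("W3166384429")` would return the 94 predicate/object pairs listed above (creators, citations, concepts, locations, and so on).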