Matches in SemOpenAlex for { <https://semopenalex.org/work/W4226285793> ?p ?o ?g. }
- W4226285793 abstract "We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence, and has linear complexity with respect to sequence length. Our recurrent cell operates on blocks of tokens rather than single tokens during training, and leverages parallel computation within a block in order to make efficient use of accelerator hardware. The cell itself is strikingly simple. It is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens. Our design was inspired in part by LSTM cells, and it uses LSTM-style gates, but it scales the typical LSTM cell up by several orders of magnitude. Our implementation of recurrence has the same cost in both computation time and parameter count as a conventional transformer layer, but offers dramatically improved perplexity in language modeling tasks over very long sequences. Our model out-performs a long-range Transformer XL baseline by a wide margin, while running twice as fast. We demonstrate its effectiveness on PG19 (books), arXiv papers, and GitHub source code. Our code has been released as open source." @default.
- W4226285793 created "2022-05-05" @default.
- W4226285793 creator A5010086803 @default.
- W4226285793 creator A5019167028 @default.
- W4226285793 creator A5020043722 @default.
- W4226285793 creator A5024901763 @default.
- W4226285793 creator A5035369215 @default.
- W4226285793 date "2022-03-11" @default.
- W4226285793 modified "2023-10-17" @default.
- W4226285793 title "Block-Recurrent Transformers" @default.
- W4226285793 doi "https://doi.org/10.48550/arxiv.2203.07852" @default.
- W4226285793 hasPublicationYear "2022" @default.
- W4226285793 type Work @default.
- W4226285793 citedByCount "1" @default.
- W4226285793 countsByYear W42262857932023 @default.
- W4226285793 crossrefType "posted-content" @default.
- W4226285793 hasAuthorship W4226285793A5010086803 @default.
- W4226285793 hasAuthorship W4226285793A5019167028 @default.
- W4226285793 hasAuthorship W4226285793A5020043722 @default.
- W4226285793 hasAuthorship W4226285793A5024901763 @default.
- W4226285793 hasAuthorship W4226285793A5035369215 @default.
- W4226285793 hasBestOaLocation W42262857931 @default.
- W4226285793 hasConcept C100279451 @default.
- W4226285793 hasConcept C113775141 @default.
- W4226285793 hasConcept C11413529 @default.
- W4226285793 hasConcept C119599485 @default.
- W4226285793 hasConcept C127413603 @default.
- W4226285793 hasConcept C137293760 @default.
- W4226285793 hasConcept C154945302 @default.
- W4226285793 hasConcept C165801399 @default.
- W4226285793 hasConcept C199360897 @default.
- W4226285793 hasConcept C41008148 @default.
- W4226285793 hasConcept C43126263 @default.
- W4226285793 hasConcept C45374587 @default.
- W4226285793 hasConcept C66322947 @default.
- W4226285793 hasConceptScore W4226285793C100279451 @default.
- W4226285793 hasConceptScore W4226285793C113775141 @default.
- W4226285793 hasConceptScore W4226285793C11413529 @default.
- W4226285793 hasConceptScore W4226285793C119599485 @default.
- W4226285793 hasConceptScore W4226285793C127413603 @default.
- W4226285793 hasConceptScore W4226285793C137293760 @default.
- W4226285793 hasConceptScore W4226285793C154945302 @default.
- W4226285793 hasConceptScore W4226285793C165801399 @default.
- W4226285793 hasConceptScore W4226285793C199360897 @default.
- W4226285793 hasConceptScore W4226285793C41008148 @default.
- W4226285793 hasConceptScore W4226285793C43126263 @default.
- W4226285793 hasConceptScore W4226285793C45374587 @default.
- W4226285793 hasConceptScore W4226285793C66322947 @default.
- W4226285793 hasLocation W42262857931 @default.
- W4226285793 hasLocation W42262857932 @default.
- W4226285793 hasOpenAccess W4226285793 @default.
- W4226285793 hasPrimaryLocation W42262857931 @default.
- W4226285793 hasRelatedWork W1989705153 @default.
- W4226285793 hasRelatedWork W2496228846 @default.
- W4226285793 hasRelatedWork W2896411932 @default.
- W4226285793 hasRelatedWork W2936497627 @default.
- W4226285793 hasRelatedWork W3013624417 @default.
- W4226285793 hasRelatedWork W3049463507 @default.
- W4226285793 hasRelatedWork W4226285793 @default.
- W4226285793 hasRelatedWork W4287826556 @default.
- W4226285793 hasRelatedWork W4288365749 @default.
- W4226285793 hasRelatedWork W4360981678 @default.
- W4226285793 isParatext "false" @default.
- W4226285793 isRetracted "false" @default.
- W4226285793 workType "article" @default.