Matches in SemOpenAlex for { <https://semopenalex.org/work/W3123673616> ?p ?o ?g. }
- W3123673616 abstract "Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks. Unlike recurrent neural networks, Transformers use attention to capture temporal relations while processing input tokens in parallel. While this parallelization makes them computationally efficient, it restricts the model from fully exploiting the sequential nature of the input. The representation at a given layer can only access representations from lower layers, rather than the higher level representations already available. In this work, we propose the Feedback Transformer architecture that exposes all previous representations to all future representations, meaning the lowest representation of the current timestep is formed from the highest-level abstract representation of the past. We demonstrate on a variety of benchmarks in language modeling, machine translation, and reinforcement learning that the increased representation capacity can create small, shallow models with much stronger performance than comparable Transformers." @default.
- W3123673616 created "2021-02-01" @default.
- W3123673616 creator A5041693326 @default.
- W3123673616 creator A5060255128 @default.
- W3123673616 creator A5069316249 @default.
- W3123673616 creator A5083185771 @default.
- W3123673616 creator A5090457109 @default.
- W3123673616 date "2020-02-21" @default.
- W3123673616 modified "2023-10-17" @default.
- W3123673616 title "Addressing Some Limitations of Transformers with Feedback Memory" @default.
- W3123673616 cites W1544827683 @default.
- W3123673616 cites W1732222442 @default.
- W3123673616 cites W1793121960 @default.
- W3123673616 cites W1847088711 @default.
- W3123673616 cites W2145107163 @default.
- W3123673616 cites W2154652894 @default.
- W3123673616 cites W2173051530 @default.
- W3123673616 cites W2194775991 @default.
- W3123673616 cites W2626778328 @default.
- W3123673616 cites W2743945814 @default.
- W3123673616 cites W2768716007 @default.
- W3123673616 cites W2790259362 @default.
- W3123673616 cites W2792376130 @default.
- W3123673616 cites W2804044248 @default.
- W3123673616 cites W2806311723 @default.
- W3123673616 cites W2890177507 @default.
- W3123673616 cites W2896060389 @default.
- W3123673616 cites W2896528354 @default.
- W3123673616 cites W2908802752 @default.
- W3123673616 cites W2911109671 @default.
- W3123673616 cites W2936652946 @default.
- W3123673616 cites W2940744433 @default.
- W3123673616 cites W2946567085 @default.
- W3123673616 cites W2950527759 @default.
- W3123673616 cites W2951560313 @default.
- W3123673616 cites W2952436057 @default.
- W3123673616 cites W2952509486 @default.
- W3123673616 cites W2952913664 @default.
- W3123673616 cites W2955227499 @default.
- W3123673616 cites W2956480774 @default.
- W3123673616 cites W2962784628 @default.
- W3123673616 cites W2963034893 @default.
- W3123673616 cites W2963341956 @default.
- W3123673616 cites W2963601622 @default.
- W3123673616 cites W2963631907 @default.
- W3123673616 cites W2963641307 @default.
- W3123673616 cites W2963925437 @default.
- W3123673616 cites W2971842688 @default.
- W3123673616 cites W2975381464 @default.
- W3123673616 cites W2980433389 @default.
- W3123673616 cites W2986922898 @default.
- W3123673616 cites W2988841832 @default.
- W3123673616 cites W2991324852 @default.
- W3123673616 cites W2995575179 @default.
- W3123673616 cites W2997517014 @default.
- W3123673616 doi "https://doi.org/10.48550/arxiv.2002.09402" @default.
- W3123673616 hasPublicationYear "2020" @default.
- W3123673616 type Work @default.
- W3123673616 sameAs 3123673616 @default.
- W3123673616 citedByCount "11" @default.
- W3123673616 countsByYear W31236736162020 @default.
- W3123673616 countsByYear W31236736162021 @default.
- W3123673616 countsByYear W31236736162022 @default.
- W3123673616 crossrefType "posted-content" @default.
- W3123673616 hasAuthorship W3123673616A5041693326 @default.
- W3123673616 hasAuthorship W3123673616A5060255128 @default.
- W3123673616 hasAuthorship W3123673616A5069316249 @default.
- W3123673616 hasAuthorship W3123673616A5083185771 @default.
- W3123673616 hasAuthorship W3123673616A5090457109 @default.
- W3123673616 hasBestOaLocation W31236736161 @default.
- W3123673616 hasConcept C113775141 @default.
- W3123673616 hasConcept C118524514 @default.
- W3123673616 hasConcept C119599485 @default.
- W3123673616 hasConcept C119857082 @default.
- W3123673616 hasConcept C123657996 @default.
- W3123673616 hasConcept C127413603 @default.
- W3123673616 hasConcept C133731056 @default.
- W3123673616 hasConcept C142362112 @default.
- W3123673616 hasConcept C153349607 @default.
- W3123673616 hasConcept C154945302 @default.
- W3123673616 hasConcept C165801399 @default.
- W3123673616 hasConcept C17744445 @default.
- W3123673616 hasConcept C199539241 @default.
- W3123673616 hasConcept C203005215 @default.
- W3123673616 hasConcept C2776359362 @default.
- W3123673616 hasConcept C38858127 @default.
- W3123673616 hasConcept C41008148 @default.
- W3123673616 hasConcept C50644808 @default.
- W3123673616 hasConcept C66322947 @default.
- W3123673616 hasConcept C80444323 @default.
- W3123673616 hasConcept C94625758 @default.
- W3123673616 hasConcept C97541855 @default.
- W3123673616 hasConceptScore W3123673616C113775141 @default.
- W3123673616 hasConceptScore W3123673616C118524514 @default.
- W3123673616 hasConceptScore W3123673616C119599485 @default.
- W3123673616 hasConceptScore W3123673616C119857082 @default.
- W3123673616 hasConceptScore W3123673616C123657996 @default.
- W3123673616 hasConceptScore W3123673616C127413603 @default.
- W3123673616 hasConceptScore W3123673616C133731056 @default.
- W3123673616 hasConceptScore W3123673616C142362112 @default.
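The listing above is a raw quad dump of the pattern shown in the header. For reproducing it programmatically, a minimal sketch follows, assuming the public SemOpenAlex SPARQL endpoint lives at `https://semopenalex.org/sparql` and that the `?g` position in the header's quad pattern corresponds to a named `GRAPH` in standard SPARQL 1.1; it uses the `SPARQLWrapper` Python package.

```python
# Minimal sketch: fetch all predicate/object/graph matches for the work
# W3123673616 from SemOpenAlex. The endpoint URL is an assumption.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public endpoint

# The header's quad pattern { <work> ?p ?o ?g . } rewritten as standard
# SPARQL 1.1, using a GRAPH clause for the fourth (graph) position.
QUERY = """
SELECT ?p ?o ?g
WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W3123673616> ?p ?o .
  }
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    # Each binding row corresponds to one line of the listing above.
    print(row["p"]["value"], row["o"]["value"], row["g"]["value"])
```

If the endpoint serves this data in the default graph only, dropping the `GRAPH ?g { ... }` wrapper and selecting just `?p ?o` should return the same predicate/object pairs.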