Matches in SemOpenAlex for { <https://semopenalex.org/work/W3036475981> ?p ?o ?g. }
- W3036475981 abstract "Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks. However, most recent progress hinges on heuristic approaches with limited understanding of attention's role in model optimization and computation, and relies on considerable memory and computational resources that scale poorly. In this work, we present a formal analysis of how self-attention affects gradient propagation in recurrent networks, and prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies by establishing concrete bounds for gradient norms. Building on these results, we propose a relevancy screening mechanism, inspired by the cognitive process of memory consolidation, that allows for a scalable use of sparse self-attention with recurrence. While providing guarantees to avoid vanishing gradients, we use simple numerical experiments to demonstrate the tradeoffs in performance and computational resources by efficiently balancing attention and recurrence. Based on our results, we propose a concrete direction of research to improve scalability of attentive networks." @default.
- W3036475981 created "2020-06-25" @default.
- W3036475981 creator A5016769717 @default.
- W3036475981 creator A5022173687 @default.
- W3036475981 creator A5029376641 @default.
- W3036475981 creator A5043037494 @default.
- W3036475981 creator A5051127775 @default.
- W3036475981 creator A5086198262 @default.
- W3036475981 date "2020-06-16" @default.
- W3036475981 modified "2023-09-28" @default.
- W3036475981 title "Untangling tradeoffs between recurrence and self-attention in neural networks" @default.
- W3036475981 cites W1632114991 @default.
- W3036475981 cites W1793121960 @default.
- W3036475981 cites W1800356822 @default.
- W3036475981 cites W1815076433 @default.
- W3036475981 cites W2064675550 @default.
- W3036475981 cites W2107878631 @default.
- W3036475981 cites W2133564696 @default.
- W3036475981 cites W2141086994 @default.
- W3036475981 cites W2157331557 @default.
- W3036475981 cites W2278108219 @default.
- W3036475981 cites W2413794162 @default.
- W3036475981 cites W2597655663 @default.
- W3036475981 cites W2612675303 @default.
- W3036475981 cites W2626194254 @default.
- W3036475981 cites W2736601468 @default.
- W3036475981 cites W2759969248 @default.
- W3036475981 cites W2911109671 @default.
- W3036475981 cites W2914344989 @default.
- W3036475981 cites W2940744433 @default.
- W3036475981 cites W2950242052 @default.
- W3036475981 cites W2950527759 @default.
- W3036475981 cites W2963042606 @default.
- W3036475981 cites W2963341956 @default.
- W3036475981 cites W2963403868 @default.
- W3036475981 cites W2963870701 @default.
- W3036475981 cites W2964059481 @default.
- W3036475981 cites W2970555085 @default.
- W3036475981 cites W2970597249 @default.
- W3036475981 cites W2976023236 @default.
- W3036475981 cites W2990032740 @default.
- W3036475981 cites W2997753998 @default.
- W3036475981 cites W3000514857 @default.
- W3036475981 hasPublicationYear "2020" @default.
- W3036475981 type Work @default.
- W3036475981 sameAs 3036475981 @default.
- W3036475981 citedByCount "2" @default.
- W3036475981 countsByYear W30364759812020 @default.
- W3036475981 crossrefType "posted-content" @default.
- W3036475981 hasAuthorship W3036475981A5016769717 @default.
- W3036475981 hasAuthorship W3036475981A5022173687 @default.
- W3036475981 hasAuthorship W3036475981A5029376641 @default.
- W3036475981 hasAuthorship W3036475981A5043037494 @default.
- W3036475981 hasAuthorship W3036475981A5051127775 @default.
- W3036475981 hasAuthorship W3036475981A5086198262 @default.
- W3036475981 hasConcept C111472728 @default.
- W3036475981 hasConcept C111919701 @default.
- W3036475981 hasConcept C11413529 @default.
- W3036475981 hasConcept C119857082 @default.
- W3036475981 hasConcept C120314980 @default.
- W3036475981 hasConcept C138885662 @default.
- W3036475981 hasConcept C154945302 @default.
- W3036475981 hasConcept C15744967 @default.
- W3036475981 hasConcept C169760540 @default.
- W3036475981 hasConcept C169900460 @default.
- W3036475981 hasConcept C173801870 @default.
- W3036475981 hasConcept C2780586882 @default.
- W3036475981 hasConcept C41008148 @default.
- W3036475981 hasConcept C45374587 @default.
- W3036475981 hasConcept C48044578 @default.
- W3036475981 hasConcept C50644808 @default.
- W3036475981 hasConcept C77088390 @default.
- W3036475981 hasConcept C80444323 @default.
- W3036475981 hasConcept C98045186 @default.
- W3036475981 hasConceptScore W3036475981C111472728 @default.
- W3036475981 hasConceptScore W3036475981C111919701 @default.
- W3036475981 hasConceptScore W3036475981C11413529 @default.
- W3036475981 hasConceptScore W3036475981C119857082 @default.
- W3036475981 hasConceptScore W3036475981C120314980 @default.
- W3036475981 hasConceptScore W3036475981C138885662 @default.
- W3036475981 hasConceptScore W3036475981C154945302 @default.
- W3036475981 hasConceptScore W3036475981C15744967 @default.
- W3036475981 hasConceptScore W3036475981C169760540 @default.
- W3036475981 hasConceptScore W3036475981C169900460 @default.
- W3036475981 hasConceptScore W3036475981C173801870 @default.
- W3036475981 hasConceptScore W3036475981C2780586882 @default.
- W3036475981 hasConceptScore W3036475981C41008148 @default.
- W3036475981 hasConceptScore W3036475981C45374587 @default.
- W3036475981 hasConceptScore W3036475981C48044578 @default.
- W3036475981 hasConceptScore W3036475981C50644808 @default.
- W3036475981 hasConceptScore W3036475981C77088390 @default.
- W3036475981 hasConceptScore W3036475981C80444323 @default.
- W3036475981 hasConceptScore W3036475981C98045186 @default.
- W3036475981 hasLocation W30364759811 @default.
- W3036475981 hasOpenAccess W3036475981 @default.
- W3036475981 hasPrimaryLocation W30364759811 @default.
- W3036475981 hasRelatedWork W1486687522 @default.
- W3036475981 hasRelatedWork W1538919667 @default.
- W3036475981 hasRelatedWork W2904036825 @default.
- W3036475981 hasRelatedWork W2928146297 @default.
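The listing above is the result set for the triple pattern in the header, `{ <https://semopenalex.org/work/W3036475981> ?p ?o ?g. }`. A minimal sketch of how such a lookup could be issued against the public SemOpenAlex SPARQL endpoint is shown below; the endpoint URL (`https://semopenalex.org/sparql`) and the SPARQL-JSON response handling are assumptions based on standard SPARQL-over-HTTP conventions, and the quad pattern from the header is rewritten into a valid `GRAPH` form.

```python
# Sketch: fetch all predicate/object/graph triples for a SemOpenAlex work.
# Assumptions: the endpoint URL below is hypothetical here, and responses
# follow the standard application/sparql-results+json format.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public endpoint


def build_query(work_iri: str) -> str:
    """Build the query for the pattern shown in the listing header.

    The header's quad pattern { <iri> ?p ?o ?g . } is expressed as a
    GRAPH block, which is the valid SPARQL 1.1 equivalent.
    """
    return (
        "SELECT ?p ?o ?g WHERE { "
        "GRAPH ?g { "
        f"<{work_iri}> ?p ?o . "
        "} }"
    )


def fetch_triples(work_iri: str) -> list:
    """POST the query and return the result bindings (network call)."""
    data = urllib.parse.urlencode({"query": build_query(work_iri)}).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]


# Build the query for the work in this listing (no network call here).
query = build_query("https://semopenalex.org/work/W3036475981")
print(query)
```

Each binding in the returned list would correspond to one line of the listing above, e.g. a `?p` of `cites` with a `?o` of another work IRI.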