Matches in SemOpenAlex for { <https://semopenalex.org/work/W2971524460> ?p ?o ?g. }
- W2971524460 abstract "Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with $\alpha$-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the $\alpha$ parameter -- which controls the shape and sparsity of $\alpha$-entmax -- allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations." @default.
- W2971524460 created "2019-09-12" @default.
- W2971524460 creator A5051693368 @default.
- W2971524460 creator A5064085103 @default.
- W2971524460 creator A5074202735 @default.
- W2971524460 date "2019-08-30" @default.
- W2971524460 modified "2023-09-27" @default.
- W2971524460 title "Adaptively Sparse Transformers" @default.
- W2971524460 cites W1551360398 @default.
- W2971524460 cites W1983874169 @default.
- W2971524460 cites W2017697298 @default.
- W2971524460 cites W2101105183 @default.
- W2971524460 cites W2257408573 @default.
- W2971524460 cites W2270190199 @default.
- W2971524460 cites W2505728881 @default.
- W2971524460 cites W2512924740 @default.
- W2971524460 cites W2799051177 @default.
- W2971524460 cites W2805493160 @default.
- W2971524460 cites W2859444450 @default.
- W2971524460 cites W2888539709 @default.
- W2971524460 cites W2908336025 @default.
- W2971524460 cites W2921569601 @default.
- W2971524460 cites W2934842096 @default.
- W2971524460 cites W2940744433 @default.
- W2971524460 cites W2946417913 @default.
- W2971524460 cites W2946567085 @default.
- W2971524460 cites W2946794439 @default.
- W2971524460 cites W2950858167 @default.
- W2971524460 cites W2962784628 @default.
- W2971524460 cites W2962822108 @default.
- W2971524460 cites W2962839844 @default.
- W2971524460 cites W2962943802 @default.
- W2971524460 cites W2963062480 @default.
- W2971524460 cites W2963123301 @default.
- W2971524460 cites W2963341956 @default.
- W2971524460 cites W2963403868 @default.
- W2971524460 cites W2963502387 @default.
- W2971524460 cites W2963807318 @default.
- W2971524460 cites W2963828549 @default.
- W2971524460 cites W2963970238 @default.
- W2971524460 cites W2964265128 @default.
- W2971524460 cites W2964308564 @default.
- W2971524460 cites W3005389111 @default.
- W2971524460 hasPublicationYear "2019" @default.
- W2971524460 type Work @default.
- W2971524460 sameAs 2971524460 @default.
- W2971524460 citedByCount "16" @default.
- W2971524460 countsByYear W29715244602019 @default.
- W2971524460 countsByYear W29715244602020 @default.
- W2971524460 countsByYear W29715244602021 @default.
- W2971524460 crossrefType "posted-content" @default.
- W2971524460 hasAuthorship W2971524460A5051693368 @default.
- W2971524460 hasAuthorship W2971524460A5064085103 @default.
- W2971524460 hasAuthorship W2971524460A5074202735 @default.
- W2971524460 hasConcept C119599485 @default.
- W2971524460 hasConcept C119857082 @default.
- W2971524460 hasConcept C127413603 @default.
- W2971524460 hasConcept C134306372 @default.
- W2971524460 hasConcept C153180895 @default.
- W2971524460 hasConcept C154945302 @default.
- W2971524460 hasConcept C165801399 @default.
- W2971524460 hasConcept C177148314 @default.
- W2971524460 hasConcept C188441871 @default.
- W2971524460 hasConcept C204321447 @default.
- W2971524460 hasConcept C2781067378 @default.
- W2971524460 hasConcept C33923547 @default.
- W2971524460 hasConcept C41008148 @default.
- W2971524460 hasConcept C50644808 @default.
- W2971524460 hasConcept C66322947 @default.
- W2971524460 hasConceptScore W2971524460C119599485 @default.
- W2971524460 hasConceptScore W2971524460C119857082 @default.
- W2971524460 hasConceptScore W2971524460C127413603 @default.
- W2971524460 hasConceptScore W2971524460C134306372 @default.
- W2971524460 hasConceptScore W2971524460C153180895 @default.
- W2971524460 hasConceptScore W2971524460C154945302 @default.
- W2971524460 hasConceptScore W2971524460C165801399 @default.
- W2971524460 hasConceptScore W2971524460C177148314 @default.
- W2971524460 hasConceptScore W2971524460C188441871 @default.
- W2971524460 hasConceptScore W2971524460C204321447 @default.
- W2971524460 hasConceptScore W2971524460C2781067378 @default.
- W2971524460 hasConceptScore W2971524460C33923547 @default.
- W2971524460 hasConceptScore W2971524460C41008148 @default.
- W2971524460 hasConceptScore W2971524460C50644808 @default.
- W2971524460 hasConceptScore W2971524460C66322947 @default.
- W2971524460 hasLocation W29715244601 @default.
- W2971524460 hasOpenAccess W2971524460 @default.
- W2971524460 hasPrimaryLocation W29715244601 @default.
- W2971524460 hasRelatedWork W2615692152 @default.
- W2971524460 hasRelatedWork W2902066303 @default.
- W2971524460 hasRelatedWork W2906760449 @default.
- W2971524460 hasRelatedWork W2911109671 @default.
- W2971524460 hasRelatedWork W2940744433 @default.
- W2971524460 hasRelatedWork W2946567085 @default.
- W2971524460 hasRelatedWork W2952530889 @default.
- W2971524460 hasRelatedWork W2963123301 @default.
- W2971524460 hasRelatedWork W2963341956 @default.
- W2971524460 hasRelatedWork W2963403868 @default.
- W2971524460 hasRelatedWork W2964308564 @default.
- W2971524460 hasRelatedWork W2965373594 @default.
- W2971524460 hasRelatedWork W2970777192 @default.
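The abstract in this record says softmax is replaced by $\alpha$-entmax, which can give low-scoring words exactly zero weight. A minimal sketch of how such a mapping can be computed is shown below, assuming the standard closed form of $\alpha$-entmax for $\alpha > 1$, $p_i = [(\alpha-1)z_i - \tau]_+^{1/(\alpha-1)}$, with the threshold $\tau$ found by bisection so that the probabilities sum to 1. The function name `entmax_bisect_sketch` and the `n_iter` parameter are hypothetical illustrations written in plain Python. They are not the authors' code, and the sketch does not cover the paper's method for learning $\alpha$.

```python
def entmax_bisect_sketch(scores, alpha=1.5, n_iter=50):
    """Hypothetical sketch: alpha-entmax of a list of scores via bisection on tau.

    Assumes alpha > 1 (alpha = 2 is sparsemax; alpha -> 1 approaches softmax).
    """
    if alpha <= 1.0:
        raise ValueError("this sketch requires alpha > 1")
    if not scores:
        raise ValueError("scores must be non-empty")
    n = len(scores)
    z = [(alpha - 1.0) * s for s in scores]  # rescaled scores (alpha - 1) * z_i
    z_max = max(z)
    exponent = 1.0 / (alpha - 1.0)

    def probs(tau):
        # Entries with (alpha - 1) * z_i <= tau get exactly zero weight.
        return [max(zi - tau, 0.0) ** exponent for zi in z]

    # The mass sum(probs(tau)) decreases in tau and is >= 1 at lo, <= 1 at hi.
    lo = z_max - 1.0
    hi = z_max - (1.0 / n) ** (alpha - 1.0)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) >= 1.0:
            lo = mid  # still too much mass: the threshold can move up
        else:
            hi = mid
    p = probs(lo)
    total = sum(p)
    return [pi / total for pi in p]  # renormalize away residual bisection error


print(entmax_bisect_sketch([1.0, 0.0, -1.0], alpha=2.0))
print(entmax_bisect_sketch([2.0, 1.0, -3.0], alpha=1.5))
```

With `alpha=2.0` the first call behaves like sparsemax and puts all weight on the top score. In the second call the lowest-scoring entry receives exactly zero weight, while the other two share the mass unevenly. This illustrates the "precisely zero weight" behavior the abstract describes. Bisection was chosen here because it needs only a monotone mass function; sort-based exact algorithms also exist for particular values of $\alpha$.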