Matches in SemOpenAlex for { <https://semopenalex.org/work/W3091819431> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W3091819431 abstract "Low-resource language translation is a challenging but socially valuable NLP task. Building on recent work adapting the Transformer's normalization to this setting, we propose QKNorm, a normalization technique that modifies the attention mechanism to make the softmax function less prone to arbitrary saturation without sacrificing expressivity. Specifically, we apply $\ell_2$ normalization along the head dimension of each query and key matrix prior to multiplying them and then scale up by a learnable parameter instead of dividing by the square root of the embedding dimension. We show improvements averaging 0.928 BLEU over state-of-the-art bilingual benchmarks for 5 low-resource translation pairs from the TED Talks corpus and IWSLT'15." @default.
- W3091819431 created "2020-10-15" @default.
- W3091819431 creator A5015996294 @default.
- W3091819431 creator A5053433835 @default.
- W3091819431 creator A5056170993 @default.
- W3091819431 creator A5068566052 @default.
- W3091819431 date "2020-10-08" @default.
- W3091819431 modified "2023-09-23" @default.
- W3091819431 title "Query-Key Normalization for Transformers" @default.
- W3091819431 cites W2101105183 @default.
- W3091819431 cites W2143017621 @default.
- W3091819431 cites W222053410 @default.
- W3091819431 cites W2767989436 @default.
- W3091819431 cites W2798761464 @default.
- W3091819431 cites W2905927205 @default.
- W3091819431 cites W2919290281 @default.
- W3091819431 cites W2933138175 @default.
- W3091819431 cites W2949117887 @default.
- W3091819431 cites W2953830716 @default.
- W3091819431 cites W2962784628 @default.
- W3091819431 cites W2963086938 @default.
- W3091819431 cites W2963212250 @default.
- W3091819431 cites W2963403868 @default.
- W3091819431 cites W2963418779 @default.
- W3091819431 cites W2963532001 @default.
- W3091819431 cites W2963542740 @default.
- W3091819431 cites W2964085268 @default.
- W3091819431 cites W2964110616 @default.
- W3091819431 cites W2964213727 @default.
- W3091819431 cites W2970157301 @default.
- W3091819431 cites W2970903692 @default.
- W3091819431 cites W2972324944 @default.
- W3091819431 cites W2979636403 @default.
- W3091819431 cites W2997753998 @default.
- W3091819431 cites W3021357296 @default.
- W3091819431 cites W3026674654 @default.
- W3091819431 cites W3034772996 @default.
- W3091819431 cites W3035207248 @default.
- W3091819431 cites W3035618147 @default.
- W3091819431 cites W3093960091 @default.
- W3091819431 doi "https://doi.org/10.48550/arxiv.2010.04245" @default.
- W3091819431 hasPublicationYear "2020" @default.
- W3091819431 type Work @default.
- W3091819431 sameAs 3091819431 @default.
- W3091819431 citedByCount "1" @default.
- W3091819431 countsByYear W30918194312021 @default.
- W3091819431 crossrefType "posted-content" @default.
- W3091819431 hasAuthorship W3091819431A5015996294 @default.
- W3091819431 hasAuthorship W3091819431A5053433835 @default.
- W3091819431 hasAuthorship W3091819431A5056170993 @default.
- W3091819431 hasAuthorship W3091819431A5068566052 @default.
- W3091819431 hasBestOaLocation W30918194311 @default.
- W3091819431 hasConcept C108583219 @default.
- W3091819431 hasConcept C119599485 @default.
- W3091819431 hasConcept C127413603 @default.
- W3091819431 hasConcept C136886441 @default.
- W3091819431 hasConcept C144024400 @default.
- W3091819431 hasConcept C154945302 @default.
- W3091819431 hasConcept C165801399 @default.
- W3091819431 hasConcept C188441871 @default.
- W3091819431 hasConcept C19165224 @default.
- W3091819431 hasConcept C203005215 @default.
- W3091819431 hasConcept C204321447 @default.
- W3091819431 hasConcept C41008148 @default.
- W3091819431 hasConcept C41608201 @default.
- W3091819431 hasConcept C66322947 @default.
- W3091819431 hasConceptScore W3091819431C108583219 @default.
- W3091819431 hasConceptScore W3091819431C119599485 @default.
- W3091819431 hasConceptScore W3091819431C127413603 @default.
- W3091819431 hasConceptScore W3091819431C136886441 @default.
- W3091819431 hasConceptScore W3091819431C144024400 @default.
- W3091819431 hasConceptScore W3091819431C154945302 @default.
- W3091819431 hasConceptScore W3091819431C165801399 @default.
- W3091819431 hasConceptScore W3091819431C188441871 @default.
- W3091819431 hasConceptScore W3091819431C19165224 @default.
- W3091819431 hasConceptScore W3091819431C203005215 @default.
- W3091819431 hasConceptScore W3091819431C204321447 @default.
- W3091819431 hasConceptScore W3091819431C41008148 @default.
- W3091819431 hasConceptScore W3091819431C41608201 @default.
- W3091819431 hasConceptScore W3091819431C66322947 @default.
- W3091819431 hasLocation W30918194311 @default.
- W3091819431 hasOpenAccess W3091819431 @default.
- W3091819431 hasPrimaryLocation W30918194311 @default.
- W3091819431 hasRelatedWork W1484029852 @default.
- W3091819431 hasRelatedWork W1512718085 @default.
- W3091819431 hasRelatedWork W1585034923 @default.
- W3091819431 hasRelatedWork W1592339875 @default.
- W3091819431 hasRelatedWork W1697423248 @default.
- W3091819431 hasRelatedWork W2135598948 @default.
- W3091819431 hasRelatedWork W2435130738 @default.
- W3091819431 hasRelatedWork W2747680751 @default.
- W3091819431 hasRelatedWork W3107474891 @default.
- W3091819431 hasRelatedWork W2610387714 @default.
- W3091819431 isParatext "false" @default.
- W3091819431 isRetracted "false" @default.
- W3091819431 magId "3091819431" @default.
- W3091819431 workType "article" @default.
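The abstract describes the QKNorm attention change in one sentence: each query and key vector is $\ell_2$-normalized along the head dimension, and the resulting dot products are multiplied by a learnable scale $g$ rather than divided by $\sqrt{d}$. The following is a minimal single-head sketch of that description in plain Python, not the authors' implementation. Assumptions: the function names (`l2_normalize_rows`, `qknorm_attention`) and the `eps` guard are hypothetical choices for illustration, and `g` is passed as an ordinary float because there is no training loop here to learn it.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize_rows(m: Matrix, eps: float = 1e-6) -> Matrix:
    """Scale each row (one query or key vector over the head dimension) to unit L2 norm.

    eps is an assumed guard against division by zero for all-zero rows.
    """
    out = []
    for row in m:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / max(norm, eps) for x in row])
    return out


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def qknorm_attention(q: Matrix, k: Matrix, v: Matrix, g: float) -> Matrix:
    """Single-head attention with QKNorm-style scaling (sketch).

    q: n_q x d, k: n_k x d, v: n_k x d_v.
    g stands in for the learnable scale; here it is just a float argument.
    """
    qn = l2_normalize_rows(q)
    kn = l2_normalize_rows(k)
    out = []
    for qi in qn:
        # Each logit is g times a cosine similarity, so it lies in [-g, g],
        # replacing (q . k) / sqrt(d) from standard scaled dot-product attention.
        logits = [g * sum(a * b for a, b in zip(qi, kj)) for kj in kn]
        weights = softmax(logits)
        # Weighted sum of value rows, one output column at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

Because the logits are bounded by `g`, rescaling a query or key vector no longer changes the attention weights. For example, `qknorm_attention([[1.0, 0.0]], k, v, 1.0)` and `qknorm_attention([[10.0, 0.0]], k, v, 1.0)` give identical outputs, whereas in unnormalized dot-product attention the larger query would push the softmax toward saturation.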