Matches in SemOpenAlex for { <https://semopenalex.org/work/W4320854821> ?p ?o ?g. }
Showing items 1 to 65 of 65 with 100 items per page.
- W4320854821 abstract "The Transformer architecture consists of self-attention and feed-forward networks (FFNs) which can be viewed as key-value memories according to previous works. However, FFN and traditional memory utilize different activation functions (i.e., ReLU and Softmax respectively), which makes them not equivalent. In this paper, we first rebuild the connections between FFN and key-value memory by conducting extensive studies on ReLU and Softmax, and find they are equivalent when adding an additional layer normalization module on Softmax. In addition, ReLU outperforms Softmax on both FFN and key-value memory when the number of value slots is large. We analyze the reasons and then explore this good property of ReLU on the self-attention network where the original Softmax activation performs poorly on long input sequences. We then propose a full ReLU architecture named ReLUFormer which performs better than the baseline Transformer on long sequence tasks such as document translation. This paper sheds light on the following points: 1) Softmax and ReLU use different normalization methods over elements which lead to different variances of results, and ReLU is good at dealing with a large number of key-value slots; 2) FFN and key-value memory are equivalent, and thus the Transformer can be viewed as a memory network where FFNs and self-attention networks are both key-value memories." @default.
- W4320854821 created "2023-02-16" @default.
- W4320854821 creator A5005226552 @default.
- W4320854821 creator A5030951014 @default.
- W4320854821 creator A5042906130 @default.
- W4320854821 creator A5055122985 @default.
- W4320854821 creator A5063062444 @default.
- W4320854821 creator A5070812231 @default.
- W4320854821 date "2023-02-13" @default.
- W4320854821 modified "2023-10-16" @default.
- W4320854821 title "A Study on ReLU and Softmax in Transformer" @default.
- W4320854821 doi "https://doi.org/10.48550/arxiv.2302.06461" @default.
- W4320854821 hasPublicationYear "2023" @default.
- W4320854821 type Work @default.
- W4320854821 citedByCount "0" @default.
- W4320854821 crossrefType "posted-content" @default.
- W4320854821 hasAuthorship W4320854821A5005226552 @default.
- W4320854821 hasAuthorship W4320854821A5030951014 @default.
- W4320854821 hasAuthorship W4320854821A5042906130 @default.
- W4320854821 hasAuthorship W4320854821A5055122985 @default.
- W4320854821 hasAuthorship W4320854821A5063062444 @default.
- W4320854821 hasAuthorship W4320854821A5070812231 @default.
- W4320854821 hasBestOaLocation W43208548211 @default.
- W4320854821 hasConcept C119599485 @default.
- W4320854821 hasConcept C127413603 @default.
- W4320854821 hasConcept C136886441 @default.
- W4320854821 hasConcept C144024400 @default.
- W4320854821 hasConcept C154945302 @default.
- W4320854821 hasConcept C165801399 @default.
- W4320854821 hasConcept C188441871 @default.
- W4320854821 hasConcept C19165224 @default.
- W4320854821 hasConcept C26517878 @default.
- W4320854821 hasConcept C38652104 @default.
- W4320854821 hasConcept C41008148 @default.
- W4320854821 hasConcept C50644808 @default.
- W4320854821 hasConcept C66322947 @default.
- W4320854821 hasConceptScore W4320854821C119599485 @default.
- W4320854821 hasConceptScore W4320854821C127413603 @default.
- W4320854821 hasConceptScore W4320854821C136886441 @default.
- W4320854821 hasConceptScore W4320854821C144024400 @default.
- W4320854821 hasConceptScore W4320854821C154945302 @default.
- W4320854821 hasConceptScore W4320854821C165801399 @default.
- W4320854821 hasConceptScore W4320854821C188441871 @default.
- W4320854821 hasConceptScore W4320854821C19165224 @default.
- W4320854821 hasConceptScore W4320854821C26517878 @default.
- W4320854821 hasConceptScore W4320854821C38652104 @default.
- W4320854821 hasConceptScore W4320854821C41008148 @default.
- W4320854821 hasConceptScore W4320854821C50644808 @default.
- W4320854821 hasConceptScore W4320854821C66322947 @default.
- W4320854821 hasLocation W43208548211 @default.
- W4320854821 hasOpenAccess W4320854821 @default.
- W4320854821 hasPrimaryLocation W43208548211 @default.
- W4320854821 hasRelatedWork W2888789309 @default.
- W4320854821 hasRelatedWork W2922692936 @default.
- W4320854821 hasRelatedWork W2971416272 @default.
- W4320854821 hasRelatedWork W2977314777 @default.
- W4320854821 hasRelatedWork W3137649401 @default.
- W4320854821 hasRelatedWork W3185514949 @default.
- W4320854821 hasRelatedWork W3190626620 @default.
- W4320854821 hasRelatedWork W4283328349 @default.
- W4320854821 hasRelatedWork W4287268156 @default.
- W4320854821 hasRelatedWork W4307834408 @default.
- W4320854821 isParatext "false" @default.
- W4320854821 isRetracted "false" @default.
- W4320854821 workType "article" @default.
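The abstract's central idea is that an FFN is a key-value memory that reads with ReLU, while a classic memory reads with Softmax. The sketch below illustrates that idea in pure Python. It was written for this record and is not the authors' code. It assumes that keys play the role of the first FFN weight matrix and values the role of the second, so that a read computes `out = sum_i f(<x, k_i>) * v_i`. It also assumes the extra layer normalization is applied to the memory output, although the paper's exact placement may differ. The names `kv_memory` and `output_std`, and the random Gaussian setup, are hypothetical illustrations.

```python
import math
import random
import statistics


def softmax(scores):
    # Numerically stable softmax: weights are positive and sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relu(scores):
    # ReLU keeps positive scores unnormalized and zeroes the rest.
    return [max(0.0, s) for s in scores]


def layer_norm(v, eps=1e-5):
    # Rescale a vector to zero mean and unit (population) variance.
    mean = statistics.fmean(v)
    var = statistics.fmean([(a - mean) ** 2 for a in v])
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def kv_memory(x, keys, values, activation):
    """Read a key-value memory: out = sum_i f(<x, k_i>) * v_i.

    activation: "relu" (FFN-style), "softmax" (classic memory), or
    "softmax+ln" (softmax memory plus a layer norm on its output; this
    placement of the layer norm is an assumption of this sketch).
    """
    if activation not in ("relu", "softmax", "softmax+ln"):
        raise ValueError(f"unknown activation: {activation}")
    scores = [sum(a * b for a, b in zip(x, k)) for k in keys]
    weights = relu(scores) if activation == "relu" else softmax(scores)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, vj in enumerate(v):
            out[j] += w * vj
    return layer_norm(out) if activation == "softmax+ln" else out


def output_std(n_slots, activation, d=16, seed=0):
    # Hypothetical random query, keys and values; keys are scaled by
    # 1/sqrt(d) so that the scores <x, k_i> are roughly N(0, 1).
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    keys = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]
            for _ in range(n_slots)]
    values = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_slots)]
    return statistics.pstdev(kv_memory(x, keys, values, activation))


if __name__ == "__main__":
    for n in (16, 256, 4096):
        stds = {a: round(output_std(n, a), 3)
                for a in ("relu", "softmax", "softmax+ln")}
        print(n, stds)
```

Running the script prints the spread of the memory output for 16, 256 and 4096 slots. Softmax weights must sum to 1, so the Softmax read averages the values, and its output shrinks as more slots are added. The unnormalized ReLU read grows instead, and the layer norm restores unit variance. This qualitatively mirrors point 1) of the abstract. The sketch does not reproduce any result from the paper.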