Matches in SemOpenAlex for { <https://semopenalex.org/work/W4297798623> ?p ?o ?g. }
Showing items 1 to 55 of 55 with 100 items per page.
- W4297798623 abstract "Softmax is widely used in neural networks for multiclass classification, gate structures, and attention mechanisms. The statistical assumption that the input is normally distributed supports the gradient stability of Softmax. However, when Softmax is used in attention mechanisms such as transformers, the correlation scores between embeddings are often not normally distributed, so the gradient vanishing problem appears; we confirm this point experimentally. In this work, we propose replacing the exponential function with periodic functions, and we examine several potential periodic alternatives to Softmax from the perspective of value and gradient. Through experiments on a simple demo model based on LeViT, we show that our method alleviates the gradient problem and yields substantial improvements compared to Softmax and its variants. Further, we analyze the impact of pre-normalization on Softmax and on our methods through mathematical analysis and experiments. Lastly, we increase the depth of the demo and demonstrate the applicability of our method in deep structures." @default.
- W4297798623 created "2022-10-01" @default.
- W4297798623 creator A5048826252 @default.
- W4297798623 creator A5060470951 @default.
- W4297798623 creator A5069651351 @default.
- W4297798623 date "2021-08-16" @default.
- W4297798623 modified "2023-09-24" @default.
- W4297798623 title "Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism" @default.
- W4297798623 doi "https://doi.org/10.48550/arxiv.2108.07153" @default.
- W4297798623 hasPublicationYear "2021" @default.
- W4297798623 type Work @default.
- W4297798623 citedByCount "0" @default.
- W4297798623 crossrefType "posted-content" @default.
- W4297798623 hasAuthorship W4297798623A5048826252 @default.
- W4297798623 hasAuthorship W4297798623A5060470951 @default.
- W4297798623 hasAuthorship W4297798623A5069651351 @default.
- W4297798623 hasBestOaLocation W42977986231 @default.
- W4297798623 hasConcept C11413529 @default.
- W4297798623 hasConcept C134306372 @default.
- W4297798623 hasConcept C136886441 @default.
- W4297798623 hasConcept C144024400 @default.
- W4297798623 hasConcept C151376022 @default.
- W4297798623 hasConcept C154945302 @default.
- W4297798623 hasConcept C188441871 @default.
- W4297798623 hasConcept C19165224 @default.
- W4297798623 hasConcept C33923547 @default.
- W4297798623 hasConcept C41008148 @default.
- W4297798623 hasConcept C50644808 @default.
- W4297798623 hasConceptScore W4297798623C11413529 @default.
- W4297798623 hasConceptScore W4297798623C134306372 @default.
- W4297798623 hasConceptScore W4297798623C136886441 @default.
- W4297798623 hasConceptScore W4297798623C144024400 @default.
- W4297798623 hasConceptScore W4297798623C151376022 @default.
- W4297798623 hasConceptScore W4297798623C154945302 @default.
- W4297798623 hasConceptScore W4297798623C188441871 @default.
- W4297798623 hasConceptScore W4297798623C19165224 @default.
- W4297798623 hasConceptScore W4297798623C33923547 @default.
- W4297798623 hasConceptScore W4297798623C41008148 @default.
- W4297798623 hasConceptScore W4297798623C50644808 @default.
- W4297798623 hasLocation W42977986231 @default.
- W4297798623 hasOpenAccess W4297798623 @default.
- W4297798623 hasPrimaryLocation W42977986231 @default.
- W4297798623 hasRelatedWork W2888789309 @default.
- W4297798623 hasRelatedWork W2913039608 @default.
- W4297798623 hasRelatedWork W2922692936 @default.
- W4297798623 hasRelatedWork W2971416272 @default.
- W4297798623 hasRelatedWork W3000076038 @default.
- W4297798623 hasRelatedWork W3170224572 @default.
- W4297798623 hasRelatedWork W3185514949 @default.
- W4297798623 hasRelatedWork W4283328349 @default.
- W4297798623 hasRelatedWork W4287904794 @default.
- W4297798623 hasRelatedWork W4320925816 @default.
- W4297798623 isParatext "false" @default.
- W4297798623 isRetracted "false" @default.
- W4297798623 workType "article" @default.
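The abstract's central claim can be illustrated with a small pure-Python sketch. The claim is that Softmax gradients vanish when attention scores are far from normally distributed, and that a periodic replacement for the exponential avoids this. The sketch assumes a generic normalization p_i = g(s_i) / sum_k g(s_k). Its Jacobian is dp_i/ds_j = (g'(s_j) / Z)(delta_ij - p_i), which reduces to the familiar Softmax Jacobian when g = exp. It compares g(x) = exp(x) against a hypothetical periodic kernel g(x) = exp(sin x). That kernel is chosen only for illustration; the abstract does not name the periodic functions the authors evaluate, so this is not their method. The scores are made-up values.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Kernel = Tuple[Callable[[float], float], Callable[[float], float]]  # (g, g')

def normalize(scores: Vector, g: Callable[[float], float]) -> Vector:
    """p_i = g(s_i) / sum_k g(s_k) for a strictly positive kernel g."""
    weights = [g(s) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def jacobian(scores: Vector, g: Callable[[float], float],
             dg: Callable[[float], float]) -> Matrix:
    """Analytic Jacobian dp_i/ds_j = (g'(s_j) / Z) * (delta_ij - p_i)."""
    total = sum(g(s) for s in scores)
    p = normalize(scores, g)
    n = len(scores)
    return [[dg(scores[j]) / total * ((1.0 if i == j else 0.0) - p[i])
             for j in range(n)] for i in range(n)]

def max_abs(m: Matrix) -> float:
    """Largest gradient magnitude in the Jacobian."""
    return max(abs(v) for row in m for v in row)

# Softmax: g(x) = exp(x), so g'(x) = exp(x). No max-shift is applied here,
# so very large scores (above ~709) would overflow; fine for this demo.
EXP_KERNEL: Kernel = (math.exp, math.exp)

# Hypothetical periodic kernel (illustrative assumption, not from the paper):
# g(x) = exp(sin x) is positive and periodic; g'(x) = cos(x) * exp(sin x).
PERIODIC_KERNEL: Kernel = (
    lambda x: math.exp(math.sin(x)),
    lambda x: math.cos(x) * math.exp(math.sin(x)),
)

if __name__ == "__main__":
    scores = [20.0, 0.0, -5.0, 3.0]  # made-up, widely spread attention scores
    print("softmax max |dp/ds|: ", max_abs(jacobian(scores, *EXP_KERNEL)))
    print("periodic max |dp/ds|:", max_abs(jacobian(scores, *PERIODIC_KERNEL)))
```

With these scores, Softmax puts almost all of its mass on the largest entry, so every Jacobian entry is many orders of magnitude smaller than with the periodic kernel. Because exp(sin x) stays within [1/e, e], no single score can dominate the normalization. That bounded range is the property this sketch uses to keep gradients from collapsing.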