Matches in SemOpenAlex for { <https://semopenalex.org/work/W3083109176> ?p ?o ?g. }
Showing items 1 to 85 of 85 with 100 items per page.
- W3083109176 abstract "Though transformer architectures have shown dominance in many natural language understanding tasks, there are still unsolved issues in the training of transformer models, especially the need for a principled approach to warm-up, which has proven important for stable transformer training, as well as whether the task at hand prefers to scale the attention product or not. In this paper, we empirically explore automating the design choices in the transformer model, i.e., how to set layer-norm, whether to scale, number of layers, number of heads, activation function, etc., so that one can obtain a transformer architecture that better suits the tasks at hand. RL is employed to navigate the search space, and special parameter sharing strategies are designed to accelerate the search. It is shown that sampling a proportion of training data per epoch during search helps improve the search quality. Experiments on CoNLL03, Multi-30k, IWSLT14 and WMT-14 show that the searched transformer model can outperform the standard transformers. In particular, we show that our learned model can be trained more robustly with large learning rates without warm-up." @default.
- W3083109176 created "2020-09-11" @default.
- W3083109176 creator A5017557616 @default.
- W3083109176 creator A5023775139 @default.
- W3083109176 creator A5028055721 @default.
- W3083109176 creator A5035939993 @default.
- W3083109176 creator A5044665993 @default.
- W3083109176 date "2020-09-04" @default.
- W3083109176 modified "2023-09-25" @default.
- W3083109176 title "AutoTrans: Automating Transformer Design via Reinforced Architecture Search" @default.
- W3083109176 cites W2106411961 @default.
- W3083109176 cites W2250539671 @default.
- W3083109176 cites W2345720230 @default.
- W3083109176 cites W2912521296 @default.
- W3083109176 cites W2936599103 @default.
- W3083109176 cites W2944815030 @default.
- W3083109176 cites W2951104886 @default.
- W3083109176 cites W2962746461 @default.
- W3083109176 cites W2962784628 @default.
- W3083109176 cites W2963136578 @default.
- W3083109176 cites W2963341956 @default.
- W3083109176 cites W2963374479 @default.
- W3083109176 cites W2963403868 @default.
- W3083109176 cites W2963542740 @default.
- W3083109176 cites W2964081807 @default.
- W3083109176 cites W2964259004 @default.
- W3083109176 cites W2964444661 @default.
- W3083109176 cites W2970290486 @default.
- W3083109176 cites W2970777192 @default.
- W3083109176 cites W2983180560 @default.
- W3083109176 cites W2996428491 @default.
- W3083109176 cites W3000514857 @default.
- W3083109176 cites W3034772996 @default.
- W3083109176 cites W3041866211 @default.
- W3083109176 doi "https://doi.org/10.48550/arxiv.2009.02070" @default.
- W3083109176 hasPublicationYear "2020" @default.
- W3083109176 type Work @default.
- W3083109176 sameAs 3083109176 @default.
- W3083109176 citedByCount "3" @default.
- W3083109176 countsByYear W30831091762020 @default.
- W3083109176 countsByYear W30831091762021 @default.
- W3083109176 crossrefType "posted-content" @default.
- W3083109176 hasAuthorship W3083109176A5017557616 @default.
- W3083109176 hasAuthorship W3083109176A5023775139 @default.
- W3083109176 hasAuthorship W3083109176A5028055721 @default.
- W3083109176 hasAuthorship W3083109176A5035939993 @default.
- W3083109176 hasAuthorship W3083109176A5044665993 @default.
- W3083109176 hasBestOaLocation W30831091761 @default.
- W3083109176 hasConcept C119599485 @default.
- W3083109176 hasConcept C119857082 @default.
- W3083109176 hasConcept C123657996 @default.
- W3083109176 hasConcept C127413603 @default.
- W3083109176 hasConcept C142362112 @default.
- W3083109176 hasConcept C153349607 @default.
- W3083109176 hasConcept C154945302 @default.
- W3083109176 hasConcept C165801399 @default.
- W3083109176 hasConcept C41008148 @default.
- W3083109176 hasConcept C66322947 @default.
- W3083109176 hasConceptScore W3083109176C119599485 @default.
- W3083109176 hasConceptScore W3083109176C119857082 @default.
- W3083109176 hasConceptScore W3083109176C123657996 @default.
- W3083109176 hasConceptScore W3083109176C127413603 @default.
- W3083109176 hasConceptScore W3083109176C142362112 @default.
- W3083109176 hasConceptScore W3083109176C153349607 @default.
- W3083109176 hasConceptScore W3083109176C154945302 @default.
- W3083109176 hasConceptScore W3083109176C165801399 @default.
- W3083109176 hasConceptScore W3083109176C41008148 @default.
- W3083109176 hasConceptScore W3083109176C66322947 @default.
- W3083109176 hasLocation W30831091761 @default.
- W3083109176 hasOpenAccess W3083109176 @default.
- W3083109176 hasPrimaryLocation W30831091761 @default.
- W3083109176 hasRelatedWork W2364531466 @default.
- W3083109176 hasRelatedWork W2961085424 @default.
- W3083109176 hasRelatedWork W3046775127 @default.
- W3083109176 hasRelatedWork W3170094116 @default.
- W3083109176 hasRelatedWork W4205958290 @default.
- W3083109176 hasRelatedWork W4285260836 @default.
- W3083109176 hasRelatedWork W4286629047 @default.
- W3083109176 hasRelatedWork W4306321456 @default.
- W3083109176 hasRelatedWork W4306674287 @default.
- W3083109176 hasRelatedWork W4224009465 @default.
- W3083109176 isParatext "false" @default.
- W3083109176 isRetracted "false" @default.
- W3083109176 magId "3083109176" @default.
- W3083109176 workType "article" @default.
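The abstract above describes using reinforcement learning to choose transformer design options (layer-norm placement, whether to scale the attention product, number of layers, number of heads, activation function). The following is a minimal sketch of that general idea, assuming a REINFORCE-style controller with an independent categorical distribution per design choice; this record does not specify the paper's controller, exact search space, parameter-sharing scheme, or reward. `SEARCH_SPACE`, `Controller`, `toy_reward`, and `search` are hypothetical names, the option values are illustrative, and `toy_reward` is a synthetic stand-in for training a child model on a sampled fraction of the data and scoring it on validation. Its preferences are arbitrary and make no claim about which architecture the paper found.

```python
import math
import random

# Hypothetical search space built from the design choices named in the
# abstract; the option values are illustrative assumptions, not the paper's.
SEARCH_SPACE = {
    "layer_norm": ["pre", "post"],
    "scale_attention": [True, False],
    "num_layers": [2, 4, 6],
    "num_heads": [2, 4, 8],
    "activation": ["relu", "gelu", "swish"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Controller:
    """One independent categorical policy per design choice, trained with REINFORCE."""

    def __init__(self, space, lr=0.1):
        self.space = space
        self.lr = lr
        # Uniform initial policy: all logits start at zero.
        self.logits = {name: [0.0] * len(opts) for name, opts in space.items()}

    def sample(self, rng):
        """Draw one option index per design choice from the current policy."""
        return {
            name: rng.choices(range(len(opts)), weights=softmax(self.logits[name]))[0]
            for name, opts in self.space.items()
        }

    def decode(self, choice):
        """Map option indices back to option values."""
        return {name: self.space[name][idx] for name, idx in choice.items()}

    def update(self, choice, reward, baseline):
        """REINFORCE step: d log p(idx) / d logit_j = 1[j == idx] - p_j."""
        advantage = reward - baseline
        for name, idx in choice.items():
            probs = softmax(self.logits[name])  # computed before any logit changes
            for j, p in enumerate(probs):
                grad = (1.0 if j == idx else 0.0) - p
                self.logits[name][j] += self.lr * advantage * grad

    def best(self):
        """Greedy decode: the highest-probability option for each choice."""
        return self.decode({
            name: max(range(len(lg)), key=lg.__getitem__)
            for name, lg in self.logits.items()
        })


def toy_reward(arch):
    """Hypothetical stand-in for the validation score of a child model trained
    on a sampled fraction of the training data; a fixed synthetic function."""
    score = 0.0
    score += 1.0 if arch["layer_norm"] == "pre" else 0.0
    score += 0.5 if arch["scale_attention"] else 0.0
    score += 0.25 if arch["num_heads"] == 4 else 0.0
    return score


def search(steps=1000, seed=0):
    """Run the hypothetical controller against toy_reward and return its greedy pick."""
    rng = random.Random(seed)
    ctrl = Controller(SEARCH_SPACE)
    baseline = 0.0
    for _ in range(steps):
        choice = ctrl.sample(rng)
        reward = toy_reward(ctrl.decode(choice))
        ctrl.update(choice, reward, baseline)
        # Exponential moving-average baseline reduces gradient variance.
        baseline = 0.9 * baseline + 0.1 * reward
    return ctrl.best()


if __name__ == "__main__":
    print(search())
```

Calling `search()` returns a dictionary with one chosen value per design choice. With this synthetic reward the controller should settle on the options `toy_reward` favours for `layer_norm` and `scale_attention`, while choices that do not affect the reward (`num_layers`, `activation`) end up essentially arbitrary. Independent per-decision distributions keep the sketch short; an autoregressive controller would let later decisions condition on earlier ones.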