Matches in SemOpenAlex for { <https://semopenalex.org/work/W4221156872> ?p ?o ?g. }
Showing items 1 to 60 of 60, with 100 items per page.
- W4221156872 abstract "Adam and AdaBelief compute and make use of elementwise adaptive stepsizes in training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared gradient g_t^2 and the squared prediction error (m_t-g_t)^2, respectively, where m_t is the first momentum at iteration t and can be viewed as a prediction of g_t. In this work, we investigate whether layerwise gradient statistics can be exploited in Adam and AdaBelief to allow for more effective training of DNNs. We address the above research question in two steps. Firstly, we slightly modify Adam and AdaBelief by introducing layerwise adaptive stepsizes in their update procedures via either pre- or post-processing. Our empirical results indicate that the slight modification produces comparable performance for training VGG and ResNet models over CIFAR10 and CIFAR100, suggesting that layerwise gradient statistics play an important role in the success of Adam and AdaBelief for at least certain DNN tasks. In the second step, we propose Aida, a new optimisation method, with the objective that the elementwise stepsizes within each layer have significantly smaller statistical variances, and the layerwise average stepsizes are much more compact across all the layers. Motivated by the fact that (m_t-g_t)^2 in AdaBelief is conservative in comparison to g_t^2 in Adam in terms of layerwise statistical averages and variances, Aida is designed by tracking a more conservative function of m_t and g_t than (m_t-g_t)^2 via layerwise vector projections. Experimental results show that Aida produces either competitive or better performance with respect to a number of existing methods, including Adam and AdaBelief, for a set of challenging DNN tasks. Code is available <a href="https://github.com/guoqiang-x-zhang/AidaOptimizer">at this URL</a>" @default.
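The abstract's contrast between the two second-moment statistics can be illustrated with a minimal sketch. The snippet below (an illustration, not the authors' Aida implementation; the function name `ema_second_moments` and the NumPy setup are my own) tracks Adam's EMA of g_t^2 alongside AdaBelief's EMA of (m_t-g_t)^2, where m_t is the EMA of gradients acting as a prediction of g_t:

```python
import numpy as np

def ema_second_moments(grads, beta1=0.9, beta2=0.999):
    """Track the EMA second-moment statistics used by Adam and AdaBelief.

    Adam tracks an EMA of the squared gradient g_t^2, while AdaBelief
    tracks an EMA of the squared prediction error (m_t - g_t)^2, where
    m_t is the first momentum (an EMA of gradients, predicting g_t).
    """
    m = np.zeros_like(grads[0])         # first momentum m_t
    v_adam = np.zeros_like(grads[0])    # Adam: EMA of g_t^2
    s_belief = np.zeros_like(grads[0])  # AdaBelief: EMA of (m_t - g_t)^2
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
        v_adam = beta2 * v_adam + (1 - beta2) * g**2
        s_belief = beta2 * s_belief + (1 - beta2) * (m - g)**2
    return m, v_adam, s_belief

# For slowly varying gradients, m_t predicts g_t well, so (m_t - g_t)^2
# tends to stay below g_t^2 -- the conservatism the abstract refers to.
grads = [np.ones(3) for _ in range(10)]
m, v, s = ema_second_moments(grads)
```

Both optimisers then divide the (bias-corrected) momentum by the square root of their respective statistic to form elementwise stepsizes; Aida, per the abstract, replaces (m_t-g_t)^2 with an even more conservative function of m_t and g_t built from layerwise vector projections.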
- W4221156872 created "2022-04-03" @default.
- W4221156872 creator A5046170826 @default.
- W4221156872 creator A5046662102 @default.
- W4221156872 creator A5088034962 @default.
- W4221156872 date "2022-03-24" @default.
- W4221156872 modified "2023-10-16" @default.
- W4221156872 title "On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks" @default.
- W4221156872 hasPublicationYear "2022" @default.
- W4221156872 type Work @default.
- W4221156872 citedByCount "0" @default.
- W4221156872 crossrefType "posted-content" @default.
- W4221156872 hasAuthorship W4221156872A5046170826 @default.
- W4221156872 hasAuthorship W4221156872A5046662102 @default.
- W4221156872 hasAuthorship W4221156872A5088034962 @default.
- W4221156872 hasBestOaLocation W42211568721 @default.
- W4221156872 hasConcept C105795698 @default.
- W4221156872 hasConcept C11413529 @default.
- W4221156872 hasConcept C134306372 @default.
- W4221156872 hasConcept C139945424 @default.
- W4221156872 hasConcept C14036430 @default.
- W4221156872 hasConcept C151376022 @default.
- W4221156872 hasConcept C154945302 @default.
- W4221156872 hasConcept C28826006 @default.
- W4221156872 hasConcept C2984842247 @default.
- W4221156872 hasConcept C33923547 @default.
- W4221156872 hasConcept C41008148 @default.
- W4221156872 hasConcept C50644808 @default.
- W4221156872 hasConcept C78458016 @default.
- W4221156872 hasConcept C86803240 @default.
- W4221156872 hasConceptScore W4221156872C105795698 @default.
- W4221156872 hasConceptScore W4221156872C11413529 @default.
- W4221156872 hasConceptScore W4221156872C134306372 @default.
- W4221156872 hasConceptScore W4221156872C139945424 @default.
- W4221156872 hasConceptScore W4221156872C14036430 @default.
- W4221156872 hasConceptScore W4221156872C151376022 @default.
- W4221156872 hasConceptScore W4221156872C154945302 @default.
- W4221156872 hasConceptScore W4221156872C28826006 @default.
- W4221156872 hasConceptScore W4221156872C2984842247 @default.
- W4221156872 hasConceptScore W4221156872C33923547 @default.
- W4221156872 hasConceptScore W4221156872C41008148 @default.
- W4221156872 hasConceptScore W4221156872C50644808 @default.
- W4221156872 hasConceptScore W4221156872C78458016 @default.
- W4221156872 hasConceptScore W4221156872C86803240 @default.
- W4221156872 hasLocation W42211568721 @default.
- W4221156872 hasOpenAccess W4221156872 @default.
- W4221156872 hasPrimaryLocation W42211568721 @default.
- W4221156872 hasRelatedWork W1268192 @default.
- W4221156872 hasRelatedWork W12952539 @default.
- W4221156872 hasRelatedWork W1407330 @default.
- W4221156872 hasRelatedWork W15354828 @default.
- W4221156872 hasRelatedWork W3063455 @default.
- W4221156872 hasRelatedWork W3133799 @default.
- W4221156872 hasRelatedWork W3495599 @default.
- W4221156872 hasRelatedWork W699561 @default.
- W4221156872 hasRelatedWork W792754 @default.
- W4221156872 hasRelatedWork W9554121 @default.
- W4221156872 isParatext "false" @default.
- W4221156872 isRetracted "false" @default.
- W4221156872 workType "article" @default.