Matches in SemOpenAlex for { <https://semopenalex.org/work/W3004988528> ?p ?o ?g. }
- W3004988528 abstract "We provide an improved analysis of normalized SGD showing that adding momentum provably removes the need for large batch sizes on non-convex objectives. Then, we consider the case of objectives with bounded second derivative and show that in this case a small tweak to the momentum formula allows normalized SGD with momentum to find an $\epsilon$-critical point in $O(1/\epsilon^{3.5})$ iterations, matching the best-known rates without accruing any logarithmic factors or dependence on dimension. We also provide an adaptive method that automatically improves convergence rates when the variance in the gradients is small. Finally, we show that our method is effective when employed on popular large scale tasks such as ResNet-50 and BERT pretraining, matching the performance of the disparate methods used to get state-of-the-art results on both tasks." @default.
- W3004988528 created "2020-02-14" @default.
- W3004988528 creator A5006453681 @default.
- W3004988528 creator A5066011484 @default.
- W3004988528 date "2020-02-09" @default.
- W3004988528 modified "2023-10-01" @default.
- W3004988528 title "Momentum Improves Normalized SGD" @default.
- W3004988528 cites W1988720110 @default.
- W3004988528 cites W2107438106 @default.
- W3004988528 cites W2108598243 @default.
- W3004988528 cites W2194775991 @default.
- W3004988528 cites W2405601855 @default.
- W3004988528 cites W2622263826 @default.
- W3004988528 cites W2797328513 @default.
- W3004988528 cites W2808953177 @default.
- W3004988528 cites W2891952073 @default.
- W3004988528 cites W2896457183 @default.
- W3004988528 cites W2946511237 @default.
- W3004988528 cites W2948930516 @default.
- W3004988528 cites W2963217371 @default.
- W3004988528 cites W2963411541 @default.
- W3004988528 cites W2963470657 @default.
- W3004988528 cites W2964121744 @default.
- W3004988528 cites W2966465943 @default.
- W3004988528 cites W2970623506 @default.
- W3004988528 cites W2971216715 @default.
- W3004988528 cites W2993258424 @default.
- W3004988528 cites W836608889 @default.
- W3004988528 hasPublicationYear "2020" @default.
- W3004988528 type Work @default.
- W3004988528 sameAs 3004988528 @default.
- W3004988528 citedByCount "0" @default.
- W3004988528 crossrefType "posted-content" @default.
- W3004988528 hasAuthorship W3004988528A5006453681 @default.
- W3004988528 hasAuthorship W3004988528A5066011484 @default.
- W3004988528 hasConcept C10138342 @default.
- W3004988528 hasConcept C105795698 @default.
- W3004988528 hasConcept C112680207 @default.
- W3004988528 hasConcept C11413529 @default.
- W3004988528 hasConcept C114614502 @default.
- W3004988528 hasConcept C121332964 @default.
- W3004988528 hasConcept C121955636 @default.
- W3004988528 hasConcept C126255220 @default.
- W3004988528 hasConcept C134306372 @default.
- W3004988528 hasConcept C144133560 @default.
- W3004988528 hasConcept C145446738 @default.
- W3004988528 hasConcept C162324750 @default.
- W3004988528 hasConcept C165064840 @default.
- W3004988528 hasConcept C196083921 @default.
- W3004988528 hasConcept C2524010 @default.
- W3004988528 hasConcept C26517878 @default.
- W3004988528 hasConcept C2777303404 @default.
- W3004988528 hasConcept C2778755073 @default.
- W3004988528 hasConcept C28826006 @default.
- W3004988528 hasConcept C33676613 @default.
- W3004988528 hasConcept C33923547 @default.
- W3004988528 hasConcept C34388435 @default.
- W3004988528 hasConcept C38652104 @default.
- W3004988528 hasConcept C39927690 @default.
- W3004988528 hasConcept C41008148 @default.
- W3004988528 hasConcept C50522688 @default.
- W3004988528 hasConcept C57869625 @default.
- W3004988528 hasConcept C60718061 @default.
- W3004988528 hasConcept C62520636 @default.
- W3004988528 hasConceptScore W3004988528C10138342 @default.
- W3004988528 hasConceptScore W3004988528C105795698 @default.
- W3004988528 hasConceptScore W3004988528C112680207 @default.
- W3004988528 hasConceptScore W3004988528C11413529 @default.
- W3004988528 hasConceptScore W3004988528C114614502 @default.
- W3004988528 hasConceptScore W3004988528C121332964 @default.
- W3004988528 hasConceptScore W3004988528C121955636 @default.
- W3004988528 hasConceptScore W3004988528C126255220 @default.
- W3004988528 hasConceptScore W3004988528C134306372 @default.
- W3004988528 hasConceptScore W3004988528C144133560 @default.
- W3004988528 hasConceptScore W3004988528C145446738 @default.
- W3004988528 hasConceptScore W3004988528C162324750 @default.
- W3004988528 hasConceptScore W3004988528C165064840 @default.
- W3004988528 hasConceptScore W3004988528C196083921 @default.
- W3004988528 hasConceptScore W3004988528C2524010 @default.
- W3004988528 hasConceptScore W3004988528C26517878 @default.
- W3004988528 hasConceptScore W3004988528C2777303404 @default.
- W3004988528 hasConceptScore W3004988528C2778755073 @default.
- W3004988528 hasConceptScore W3004988528C28826006 @default.
- W3004988528 hasConceptScore W3004988528C33676613 @default.
- W3004988528 hasConceptScore W3004988528C33923547 @default.
- W3004988528 hasConceptScore W3004988528C34388435 @default.
- W3004988528 hasConceptScore W3004988528C38652104 @default.
- W3004988528 hasConceptScore W3004988528C39927690 @default.
- W3004988528 hasConceptScore W3004988528C41008148 @default.
- W3004988528 hasConceptScore W3004988528C50522688 @default.
- W3004988528 hasConceptScore W3004988528C57869625 @default.
- W3004988528 hasConceptScore W3004988528C60718061 @default.
- W3004988528 hasConceptScore W3004988528C62520636 @default.
- W3004988528 hasLocation W30049885281 @default.
- W3004988528 hasOpenAccess W3004988528 @default.
- W3004988528 hasPrimaryLocation W30049885281 @default.
- W3004988528 hasRelatedWork W2301987905 @default.
- W3004988528 hasRelatedWork W2811201469 @default.
- W3004988528 hasRelatedWork W2912205739 @default.
- W3004988528 hasRelatedWork W2940467104 @default.