Matches in SemOpenAlex for { <https://semopenalex.org/work/W3138587024> ?p ?o ?g. }
- W3138587024 endingPage "349" @default.
- W3138587024 startingPage "335" @default.
- W3138587024 abstract "Stochastic gradient descent (SGD) is an inherently sequential training algorithm: computing the gradient at batch $i$ depends on the model parameters learned from batch $i-1$. Prior approaches that break this dependence do not honor it (e.g., they sum the gradients for each batch, which is not what sequential SGD would do) and thus potentially suffer from poor convergence. This paper introduces a novel method to combine gradients called Adasum (for adaptive sum) that converges faster than prior work. Adasum is easy to implement, almost as efficient as simply summing gradients, and is integrated into the open-source toolkit Horovod. This paper first provides a formal justification for Adasum and then empirically demonstrates that Adasum is more accurate than prior gradient accumulation methods. It then introduces a series of case studies to show that Adasum works with multiple frameworks (TensorFlow and PyTorch) and scales multiple optimizers (Momentum-SGD, Adam, and LAMB) to larger batch sizes while still giving good downstream accuracy. It also proves that Adasum converges. To summarize, Adasum scales Momentum-SGD on the MLPerf Resnet50 benchmark to 64K examples before communication (no MLPerf v0.5 entry converged with more than 16K), the Adam optimizer to 64K examples before communication on BERT-LARGE (prior work showed Adam stopped scaling at 16K), and the LAMB optimizer to 128K before communication on BERT-LARGE (prior work used 64K), all while maintaining downstream accuracy metrics. Finally, if a user does not need to scale, we show LAMB with Adasum on BERT-LARGE converges in 30% fewer steps than the baseline." @default.
- W3138587024 created "2021-03-29" @default.
- W3138587024 creator A5001454502 @default.
- W3138587024 creator A5011198874 @default.
- W3138587024 creator A5013212213 @default.
- W3138587024 creator A5021349260 @default.
- W3138587024 creator A5023535717 @default.
- W3138587024 creator A5026843951 @default.
- W3138587024 creator A5027115167 @default.
- W3138587024 creator A5077514991 @default.
- W3138587024 date "2021-03-15" @default.
- W3138587024 modified "2023-09-26" @default.
- W3138587024 title "Scaling Distributed Training with Adaptive Summation" @default.
- W3138587024 hasPublicationYear "2021" @default.
- W3138587024 type Work @default.
- W3138587024 sameAs 3138587024 @default.
- W3138587024 citedByCount "1" @default.
- W3138587024 countsByYear W31385870242021 @default.
- W3138587024 crossrefType "journal-article" @default.
- W3138587024 hasAuthorship W3138587024A5001454502 @default.
- W3138587024 hasAuthorship W3138587024A5011198874 @default.
- W3138587024 hasAuthorship W3138587024A5013212213 @default.
- W3138587024 hasAuthorship W3138587024A5021349260 @default.
- W3138587024 hasAuthorship W3138587024A5023535717 @default.
- W3138587024 hasAuthorship W3138587024A5026843951 @default.
- W3138587024 hasAuthorship W3138587024A5027115167 @default.
- W3138587024 hasAuthorship W3138587024A5077514991 @default.
- W3138587024 hasConcept C10138342 @default.
- W3138587024 hasConcept C11413529 @default.
- W3138587024 hasConcept C126255220 @default.
- W3138587024 hasConcept C13280743 @default.
- W3138587024 hasConcept C143724316 @default.
- W3138587024 hasConcept C151730666 @default.
- W3138587024 hasConcept C153258448 @default.
- W3138587024 hasConcept C154945302 @default.
- W3138587024 hasConcept C162324750 @default.
- W3138587024 hasConcept C185798385 @default.
- W3138587024 hasConcept C205649164 @default.
- W3138587024 hasConcept C206688291 @default.
- W3138587024 hasConcept C2524010 @default.
- W3138587024 hasConcept C26517878 @default.
- W3138587024 hasConcept C2777303404 @default.
- W3138587024 hasConcept C33923547 @default.
- W3138587024 hasConcept C38652104 @default.
- W3138587024 hasConcept C41008148 @default.
- W3138587024 hasConcept C50522688 @default.
- W3138587024 hasConcept C50644808 @default.
- W3138587024 hasConcept C57869625 @default.
- W3138587024 hasConcept C60718061 @default.
- W3138587024 hasConcept C86803240 @default.
- W3138587024 hasConcept C99844830 @default.
- W3138587024 hasConceptScore W3138587024C10138342 @default.
- W3138587024 hasConceptScore W3138587024C11413529 @default.
- W3138587024 hasConceptScore W3138587024C126255220 @default.
- W3138587024 hasConceptScore W3138587024C13280743 @default.
- W3138587024 hasConceptScore W3138587024C143724316 @default.
- W3138587024 hasConceptScore W3138587024C151730666 @default.
- W3138587024 hasConceptScore W3138587024C153258448 @default.
- W3138587024 hasConceptScore W3138587024C154945302 @default.
- W3138587024 hasConceptScore W3138587024C162324750 @default.
- W3138587024 hasConceptScore W3138587024C185798385 @default.
- W3138587024 hasConceptScore W3138587024C205649164 @default.
- W3138587024 hasConceptScore W3138587024C206688291 @default.
- W3138587024 hasConceptScore W3138587024C2524010 @default.
- W3138587024 hasConceptScore W3138587024C26517878 @default.
- W3138587024 hasConceptScore W3138587024C2777303404 @default.
- W3138587024 hasConceptScore W3138587024C33923547 @default.
- W3138587024 hasConceptScore W3138587024C38652104 @default.
- W3138587024 hasConceptScore W3138587024C41008148 @default.
- W3138587024 hasConceptScore W3138587024C50522688 @default.
- W3138587024 hasConceptScore W3138587024C50644808 @default.
- W3138587024 hasConceptScore W3138587024C57869625 @default.
- W3138587024 hasConceptScore W3138587024C60718061 @default.
- W3138587024 hasConceptScore W3138587024C86803240 @default.
- W3138587024 hasConceptScore W3138587024C99844830 @default.
- W3138587024 hasLocation W31385870241 @default.
- W3138587024 hasOpenAccess W3138587024 @default.
- W3138587024 hasPrimaryLocation W31385870241 @default.
- W3138587024 hasRelatedWork W1565687702 @default.
- W3138587024 hasRelatedWork W1629097610 @default.
- W3138587024 hasRelatedWork W2120420045 @default.
- W3138587024 hasRelatedWork W2618398196 @default.
- W3138587024 hasRelatedWork W2619516334 @default.
- W3138587024 hasRelatedWork W2914311156 @default.
- W3138587024 hasRelatedWork W2962911098 @default.
- W3138587024 hasRelatedWork W2970498991 @default.
- W3138587024 hasRelatedWork W3005981462 @default.
- W3138587024 hasRelatedWork W3024569256 @default.
- W3138587024 hasRelatedWork W3032993654 @default.
- W3138587024 hasRelatedWork W3034797957 @default.
- W3138587024 hasRelatedWork W3035253236 @default.
- W3138587024 hasRelatedWork W3035445852 @default.
- W3138587024 hasRelatedWork W3037950432 @default.
- W3138587024 hasRelatedWork W3046881646 @default.
- W3138587024 hasRelatedWork W3094600974 @default.
- W3138587024 hasRelatedWork W3159886027 @default.
- W3138587024 hasRelatedWork W3165067279 @default.
- W3138587024 hasRelatedWork W3213997898 @default.
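
The abstract above contrasts plain gradient summation with Adasum but does not state the combining rule. As a minimal sketch only, the pairwise rule commonly given for Adasum scales each gradient by one minus half of its normalized projection onto the other, so that orthogonal gradients are summed and identical gradients are averaged. The helper names `dot` and `adasum_pair` below are hypothetical and are not Horovod's API; this record itself does not confirm the formula, so treat it as an assumption taken from the paper's usual presentation.

```python
def dot(u, v):
    """Inner product of two equal-length gradient vectors (plain lists)."""
    return sum(a * b for a, b in zip(u, v))


def adasum_pair(g1, g2):
    """Combine two gradients with the pairwise adaptive-sum rule (illustrative).

    Assumed rule:
        (1 - g1.g2 / (2*|g1|^2)) * g1 + (1 - g1.g2 / (2*|g2|^2)) * g2
    """
    n1 = dot(g1, g1)
    n2 = dot(g2, g2)
    # A zero gradient contributes nothing; this also avoids division by zero.
    if n1 == 0.0:
        return list(g2)
    if n2 == 0.0:
        return list(g1)
    d = dot(g1, g2)
    c1 = 1.0 - d / (2.0 * n1)
    c2 = 1.0 - d / (2.0 * n2)
    return [c1 * a + c2 * b for a, b in zip(g1, g2)]


# Orthogonal gradients behave like a plain sum.
orthogonal = adasum_pair([1.0, 0.0], [0.0, 1.0])
# Identical gradients behave like an average.
identical = adasum_pair([2.0, 0.0], [2.0, 0.0])
```

Combining more than two gradients would apply this rule repeatedly, for example in a tree over workers; that reduction order is not shown here.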