Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287554867> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4287554867 abstract "Intensive communication and synchronization cost for gradients and parameters is the well-known bottleneck of distributed deep learning training. Based on the observations that Synchronous SGD (SSGD) obtains good convergence accuracy while asynchronous SGD (ASGD) delivers a faster raw training speed, we propose Several Steps Delay SGD (SSD-SGD) to combine their merits, aiming at tackling the communication bottleneck via communication sparsification. SSD-SGD explores both global synchronous updates in the parameter servers and asynchronous local updates in the workers in each periodic iteration. The periodic and flexible synchronization makes SSD-SGD achieve good convergence accuracy and fast training speed. To the best of our knowledge, we strike the new balance between synchronization quality and communication sparsification, and improve the trade-off between accuracy and training speed. Specifically, the core components of SSD-SGD include proper warm-up stage, steps delay stage, and our novel algorithm of global gradient for local update (GLU). GLU is critical for local update operations to effectively compensate the delayed local weights. Furthermore, we implement SSD-SGD on MXNet framework and comprehensively evaluate its performance with CIFAR-10 and ImageNet datasets. Experimental results show that SSD-SGD can accelerate distributed training speed under different experimental configurations, by up to 110%, while achieving good convergence accuracy." @default.
- W4287554867 created "2022-07-25" @default.
- W4287554867 creator A5000688713 @default.
- W4287554867 creator A5006729432 @default.
- W4287554867 creator A5012198727 @default.
- W4287554867 creator A5012472163 @default.
- W4287554867 creator A5042353509 @default.
- W4287554867 date "2020-12-09" @default.
- W4287554867 modified "2023-10-16" @default.
- W4287554867 title "SSD-SGD: Communication sparsification for distributed deep learning training" @default.
- W4287554867 doi "https://doi.org/10.48550/arxiv.2012.05396" @default.
- W4287554867 hasPublicationYear "2020" @default.
- W4287554867 type Work @default.
- W4287554867 citedByCount "0" @default.
- W4287554867 crossrefType "posted-content" @default.
- W4287554867 hasAuthorship W4287554867A5000688713 @default.
- W4287554867 hasAuthorship W4287554867A5006729432 @default.
- W4287554867 hasAuthorship W4287554867A5012198727 @default.
- W4287554867 hasAuthorship W4287554867A5012472163 @default.
- W4287554867 hasAuthorship W4287554867A5042353509 @default.
- W4287554867 hasBestOaLocation W42875548671 @default.
- W4287554867 hasConcept C120314980 @default.
- W4287554867 hasConcept C121332964 @default.
- W4287554867 hasConcept C127162648 @default.
- W4287554867 hasConcept C149635348 @default.
- W4287554867 hasConcept C151319957 @default.
- W4287554867 hasConcept C153294291 @default.
- W4287554867 hasConcept C154945302 @default.
- W4287554867 hasConcept C162324750 @default.
- W4287554867 hasConcept C173608175 @default.
- W4287554867 hasConcept C2777211547 @default.
- W4287554867 hasConcept C2777303404 @default.
- W4287554867 hasConcept C2778562939 @default.
- W4287554867 hasConcept C2780513914 @default.
- W4287554867 hasConcept C31258907 @default.
- W4287554867 hasConcept C41008148 @default.
- W4287554867 hasConcept C50522688 @default.
- W4287554867 hasConcept C68339613 @default.
- W4287554867 hasConceptScore W4287554867C120314980 @default.
- W4287554867 hasConceptScore W4287554867C121332964 @default.
- W4287554867 hasConceptScore W4287554867C127162648 @default.
- W4287554867 hasConceptScore W4287554867C149635348 @default.
- W4287554867 hasConceptScore W4287554867C151319957 @default.
- W4287554867 hasConceptScore W4287554867C153294291 @default.
- W4287554867 hasConceptScore W4287554867C154945302 @default.
- W4287554867 hasConceptScore W4287554867C162324750 @default.
- W4287554867 hasConceptScore W4287554867C173608175 @default.
- W4287554867 hasConceptScore W4287554867C2777211547 @default.
- W4287554867 hasConceptScore W4287554867C2777303404 @default.
- W4287554867 hasConceptScore W4287554867C2778562939 @default.
- W4287554867 hasConceptScore W4287554867C2780513914 @default.
- W4287554867 hasConceptScore W4287554867C31258907 @default.
- W4287554867 hasConceptScore W4287554867C41008148 @default.
- W4287554867 hasConceptScore W4287554867C50522688 @default.
- W4287554867 hasConceptScore W4287554867C68339613 @default.
- W4287554867 hasLocation W42875548671 @default.
- W4287554867 hasOpenAccess W4287554867 @default.
- W4287554867 hasPrimaryLocation W42875548671 @default.
- W4287554867 hasRelatedWork W1559884866 @default.
- W4287554867 hasRelatedWork W2124870959 @default.
- W4287554867 hasRelatedWork W2133123956 @default.
- W4287554867 hasRelatedWork W2313989154 @default.
- W4287554867 hasRelatedWork W2364921833 @default.
- W4287554867 hasRelatedWork W4226025883 @default.
- W4287554867 hasRelatedWork W4242837953 @default.
- W4287554867 hasRelatedWork W4245208434 @default.
- W4287554867 hasRelatedWork W4249402901 @default.
- W4287554867 hasRelatedWork W94000989 @default.
- W4287554867 isParatext "false" @default.
- W4287554867 isRetracted "false" @default.
- W4287554867 workType "article" @default.
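
The listing above follows a regular "- subject predicate object @graph." shape, so it can be loaded into a simple structure for inspection. The following is a minimal sketch under that assumption only; the names `LINE_RE`, `parse_listing_line`, and `group_by_predicate` are hypothetical helpers, not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed line shape (inferred from the listing, not an official grammar):
#   - <subject> <predicate> <object> @<graph>.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+?)\.$')


def parse_listing_line(line):
    """Split one listing line into (subject, predicate, object, graph)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized listing line: {line!r}")
    subject, predicate, obj, graph = match.groups()
    # Quoted objects are treated as literals; drop the surrounding quotes.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values under each predicate, preserving listing order."""
    grouped = defaultdict(list)
    for line in lines:
        _, predicate, obj, _ = parse_listing_line(line)
        grouped[predicate].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    '- W4287554867 creator A5000688713 @default.',
    '- W4287554867 creator A5006729432 @default.',
    '- W4287554867 hasPublicationYear "2020" @default.',
    '- W4287554867 type Work @default.',
]
record = group_by_predicate(sample)
```

Grouping by predicate keeps multi-valued properties such as `creator` or `hasConcept` together as lists, while single-valued ones such as `hasPublicationYear` become one-element lists.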