Matches in SemOpenAlex for { <https://semopenalex.org/work/W2999276551> ?p ?o ?g. }
- W2999276551 abstract "In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and scientific applications. With increasing complexity of learning models and amounts of training data, data-parallel approaches based on frequent all-reduce synchronization steps are increasingly popular. Despite the fact that high-performance computing (HPC) technologies have been designed to address such patterns efficiently, the behavior of data-parallel approaches on HPC platforms is not well understood. To address this issue, in this paper we study the behavior of Horovod, a popular data-parallel approach that relies on MPI, on Theta, a pre-Exascale machine at Argonne National Laboratory. Using two representative applications, we explore two aspects: (1) how performance and scalability are affected by important parameters such as number of nodes, number of workers, threads per node, and batch size; (2) how computational phases are interleaved with all-reduce communication phases at fine granularity and what consequences this interleaving has in terms of potential bottlenecks. Our findings show that pipelining of back-propagation, gradient reduction and weight updates mitigates the effects of stragglers during all-reduce only partially. Furthermore, there can be significant delays between weight updates, which can be leveraged to mask the overhead of additional background operations that are coupled with the training." @default.
- W2999276551 created "2020-01-23" @default.
- W2999276551 creator A5010055736 @default.
- W2999276551 creator A5020776475 @default.
- W2999276551 creator A5023743368 @default.
- W2999276551 creator A5085745891 @default.
- W2999276551 date "2019-11-01" @default.
- W2999276551 modified "2023-10-18" @default.
- W2999276551 title "Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training" @default.
- W2999276551 cites W1533411515 @default.
- W2999276551 cites W1861703862 @default.
- W2999276551 cites W2108598243 @default.
- W2999276551 cites W2127941149 @default.
- W2999276551 cites W2155893237 @default.
- W2999276551 cites W2181607856 @default.
- W2999276551 cites W2194775991 @default.
- W2999276551 cites W2576960802 @default.
- W2999276551 cites W2805253785 @default.
- W2999276551 cites W2898319404 @default.
- W2999276551 cites W2906137127 @default.
- W2999276551 cites W2960293275 @default.
- W2999276551 cites W2961595369 @default.
- W2999276551 cites W2962747323 @default.
- W2999276551 cites W2962863496 @default.
- W2999276551 cites W2962911728 @default.
- W2999276551 cites W2963304552 @default.
- W2999276551 cites W2963382388 @default.
- W2999276551 cites W2963807318 @default.
- W2999276551 cites W2963933775 @default.
- W2999276551 doi "https://doi.org/10.1109/mlhpc49564.2019.00006" @default.
- W2999276551 hasPublicationYear "2019" @default.
- W2999276551 type Work @default.
- W2999276551 sameAs 2999276551 @default.
- W2999276551 citedByCount "5" @default.
- W2999276551 countsByYear W29992765512020 @default.
- W2999276551 countsByYear W29992765512021 @default.
- W2999276551 crossrefType "proceedings-article" @default.
- W2999276551 hasAuthorship W2999276551A5010055736 @default.
- W2999276551 hasAuthorship W2999276551A5020776475 @default.
- W2999276551 hasAuthorship W2999276551A5023743368 @default.
- W2999276551 hasAuthorship W2999276551A5085745891 @default.
- W2999276551 hasBestOaLocation W29992765512 @default.
- W2999276551 hasConcept C111919701 @default.
- W2999276551 hasConcept C120314980 @default.
- W2999276551 hasConcept C124101348 @default.
- W2999276551 hasConcept C127162648 @default.
- W2999276551 hasConcept C127413603 @default.
- W2999276551 hasConcept C165696696 @default.
- W2999276551 hasConcept C173608175 @default.
- W2999276551 hasConcept C177774035 @default.
- W2999276551 hasConcept C190475519 @default.
- W2999276551 hasConcept C2778562939 @default.
- W2999276551 hasConcept C2779960059 @default.
- W2999276551 hasConcept C2781172179 @default.
- W2999276551 hasConcept C28034677 @default.
- W2999276551 hasConcept C31258907 @default.
- W2999276551 hasConcept C38652104 @default.
- W2999276551 hasConcept C41008148 @default.
- W2999276551 hasConcept C48044578 @default.
- W2999276551 hasConcept C62611344 @default.
- W2999276551 hasConcept C66938386 @default.
- W2999276551 hasConcept C75684735 @default.
- W2999276551 hasConcept C77088390 @default.
- W2999276551 hasConcept C83283714 @default.
- W2999276551 hasConceptScore W2999276551C111919701 @default.
- W2999276551 hasConceptScore W2999276551C120314980 @default.
- W2999276551 hasConceptScore W2999276551C124101348 @default.
- W2999276551 hasConceptScore W2999276551C127162648 @default.
- W2999276551 hasConceptScore W2999276551C127413603 @default.
- W2999276551 hasConceptScore W2999276551C165696696 @default.
- W2999276551 hasConceptScore W2999276551C173608175 @default.
- W2999276551 hasConceptScore W2999276551C177774035 @default.
- W2999276551 hasConceptScore W2999276551C190475519 @default.
- W2999276551 hasConceptScore W2999276551C2778562939 @default.
- W2999276551 hasConceptScore W2999276551C2779960059 @default.
- W2999276551 hasConceptScore W2999276551C2781172179 @default.
- W2999276551 hasConceptScore W2999276551C28034677 @default.
- W2999276551 hasConceptScore W2999276551C31258907 @default.
- W2999276551 hasConceptScore W2999276551C38652104 @default.
- W2999276551 hasConceptScore W2999276551C41008148 @default.
- W2999276551 hasConceptScore W2999276551C48044578 @default.
- W2999276551 hasConceptScore W2999276551C62611344 @default.
- W2999276551 hasConceptScore W2999276551C66938386 @default.
- W2999276551 hasConceptScore W2999276551C75684735 @default.
- W2999276551 hasConceptScore W2999276551C77088390 @default.
- W2999276551 hasConceptScore W2999276551C83283714 @default.
- W2999276551 hasLocation W29992765511 @default.
- W2999276551 hasLocation W29992765512 @default.
- W2999276551 hasLocation W29992765513 @default.
- W2999276551 hasOpenAccess W2999276551 @default.
- W2999276551 hasPrimaryLocation W29992765511 @default.
- W2999276551 hasRelatedWork W12242549 @default.
- W2999276551 hasRelatedWork W1907684 @default.
- W2999276551 hasRelatedWork W2394243 @default.
- W2999276551 hasRelatedWork W2754543 @default.
- W2999276551 hasRelatedWork W5211984 @default.
- W2999276551 hasRelatedWork W5453676 @default.
- W2999276551 hasRelatedWork W5530410 @default.
- W2999276551 hasRelatedWork W6262472 @default.
- W2999276551 hasRelatedWork W7077492 @default.