Matches in SemOpenAlex for { <https://semopenalex.org/work/W3204371345> ?p ?o ?g. }
Showing items 1 to 76 of 76 with 100 items per page.
- W3204371345 abstract "High-performance communication for very large messages on modern multi-GPU nodes has become increasingly important for Deep Learning workloads. These computing nodes are equipped with state-of-the-art interconnects, such as Nvidia's NVLink and PCIe, to facilitate communication between GPUs, and between GPUs and the host processors. In this paper, we take on the challenge of designing efficient intra-socket GPU-to-GPU communication using multiple NVLink channels at the UCX and MPI levels, and then utilise it to design an intra-node hierarchical NVLink/PCIe-aware GPU-based MPI_Allreduce to enhance Horovod + TensorFlow with different models. UCX only utilises a small portion of the available NVLink bandwidth for intra-socket GPU-to-GPU communication. We propose a novel data transfer mechanism that stripes the message across multiple intra-socket communication channels and multiple memory regions using multiple GPU streams to utilise all available NVLink paths. Our approach achieves 1.69x and 1.84x higher bandwidth for UCX and Open MPI + UCX, respectively. We observe similar bandwidth improvements for large messages for MPI point-to-point communication when compared to other MPI implementations, as they are also limited to data transfers over a single path. We then propose a 3-stage hierarchical, pipelined MPI_Allreduce design that incorporates the new multi-path NVLink data transfer mechanism for intra-socket communication in the first and third stages of the collective, and PCIe and X-bus channels for inter-socket GPU communication in the second stage with minimal interference. For large messages, our proposed algorithm achieves a high speedup when compared to Spectrum MPI, Open MPI + UCX, Open MPI + HPC-X, MVAPICH2-GDR, and NCCL. We also observe significant speedup for the proposed MPI_Allreduce for Horovod with TensorFlow with a variety of Deep Learning models." @default.
- W3204371345 created "2021-10-11" @default.
- W3204371345 creator A5031362631 @default.
- W3204371345 creator A5039854005 @default.
- W3204371345 creator A5081527852 @default.
- W3204371345 creator A5090580559 @default.
- W3204371345 date "2021-08-01" @default.
- W3204371345 modified "2023-09-27" @default.
- W3204371345 title "Efficient Multi-Path NVLink/PCIe-Aware UCX based Collective Communication for Deep Learning" @default.
- W3204371345 cites W1572016165 @default.
- W3204371345 cites W1637731592 @default.
- W3204371345 cites W1962931680 @default.
- W3204371345 cites W1964479544 @default.
- W3204371345 cites W1980008253 @default.
- W3204371345 cites W1987274898 @default.
- W3204371345 cites W2000531806 @default.
- W3204371345 cites W2024565525 @default.
- W3204371345 cites W2111144707 @default.
- W3204371345 cites W2131613942 @default.
- W3204371345 cites W2160054705 @default.
- W3204371345 cites W2488497244 @default.
- W3204371345 cites W2777078856 @default.
- W3204371345 cites W2888980102 @default.
- W3204371345 cites W2903901007 @default.
- W3204371345 cites W2930869794 @default.
- W3204371345 cites W2985247698 @default.
- W3204371345 cites W3039165326 @default.
- W3204371345 cites W3046490292 @default.
- W3204371345 doi "https://doi.org/10.1109/hoti52880.2021.00018" @default.
- W3204371345 hasPublicationYear "2021" @default.
- W3204371345 type Work @default.
- W3204371345 sameAs 3204371345 @default.
- W3204371345 citedByCount "2" @default.
- W3204371345 countsByYear W32043713452022 @default.
- W3204371345 crossrefType "proceedings-article" @default.
- W3204371345 hasAuthorship W3204371345A5031362631 @default.
- W3204371345 hasAuthorship W3204371345A5039854005 @default.
- W3204371345 hasAuthorship W3204371345A5081527852 @default.
- W3204371345 hasAuthorship W3204371345A5090580559 @default.
- W3204371345 hasConcept C130795937 @default.
- W3204371345 hasConcept C149635348 @default.
- W3204371345 hasConcept C173608175 @default.
- W3204371345 hasConcept C199360897 @default.
- W3204371345 hasConcept C26713055 @default.
- W3204371345 hasConcept C2776257435 @default.
- W3204371345 hasConcept C31258907 @default.
- W3204371345 hasConcept C41008148 @default.
- W3204371345 hasConcept C42935608 @default.
- W3204371345 hasConcept C64270927 @default.
- W3204371345 hasConceptScore W3204371345C130795937 @default.
- W3204371345 hasConceptScore W3204371345C149635348 @default.
- W3204371345 hasConceptScore W3204371345C173608175 @default.
- W3204371345 hasConceptScore W3204371345C199360897 @default.
- W3204371345 hasConceptScore W3204371345C26713055 @default.
- W3204371345 hasConceptScore W3204371345C2776257435 @default.
- W3204371345 hasConceptScore W3204371345C31258907 @default.
- W3204371345 hasConceptScore W3204371345C41008148 @default.
- W3204371345 hasConceptScore W3204371345C42935608 @default.
- W3204371345 hasConceptScore W3204371345C64270927 @default.
- W3204371345 hasLocation W32043713451 @default.
- W3204371345 hasOpenAccess W3204371345 @default.
- W3204371345 hasPrimaryLocation W32043713451 @default.
- W3204371345 hasRelatedWork W10202958 @default.
- W3204371345 hasRelatedWork W11911270 @default.
- W3204371345 hasRelatedWork W13846533 @default.
- W3204371345 hasRelatedWork W14728754 @default.
- W3204371345 hasRelatedWork W4697903 @default.
- W3204371345 hasRelatedWork W5373012 @default.
- W3204371345 hasRelatedWork W551164 @default.
- W3204371345 hasRelatedWork W8213497 @default.
- W3204371345 hasRelatedWork W9190101 @default.
- W3204371345 hasRelatedWork W9541773 @default.
- W3204371345 isParatext "false" @default.
- W3204371345 isRetracted "false" @default.
- W3204371345 magId "3204371345" @default.
- W3204371345 workType "article" @default.