Matches in SemOpenAlex for { <https://semopenalex.org/work/W4384705436> ?p ?o ?g. }
Showing items 1 to 98 of 98 with 100 items per page.
- W4384705436 abstract "In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massive models necessitates advanced parallelism strategies [1], [2] to maintain efficiency. However, such distributed DL parallelism strategies require a varied mixture of collective and point-to-point communication operations across a broad range of message sizes and scales. Examples of models using advanced parallelism strategies include Deep Learning Recommendation Models (DLRM) [3] and Mixture-of-Experts (MoE) [4], [5]. Communication libraries’ performance varies wildly across different communication operations, scales, and message sizes. We propose MCR-DL: an extensible DL communication framework that supports all point-to-point and collective operations while enabling users to dynamically mix-and-match communication backends for a given operation without deadlocks. MCR-DL also comes packaged with a tuning suite for dynamically selecting the best communication backend for a given input tensor. We select DeepSpeed-MoE and DLRM as candidate DL models and demonstrate a 31% improvement in DS-MoE throughput on 256 V100 GPUs on the Lassen HPC system. Further, we achieve a 20% throughput improvement in a dense Megatron-DeepSpeed model and a 25% throughput improvement in DLRM on 32 A100 GPUs with the Theta-GPU HPC system." @default.
- W4384705436 created "2023-07-20" @default.
- W4384705436 creator A5004330728 @default.
- W4384705436 creator A5015153122 @default.
- W4384705436 creator A5024879682 @default.
- W4384705436 creator A5032141491 @default.
- W4384705436 creator A5034293705 @default.
- W4384705436 creator A5037534081 @default.
- W4384705436 creator A5040302174 @default.
- W4384705436 creator A5078128277 @default.
- W4384705436 date "2023-05-01" @default.
- W4384705436 modified "2023-09-26" @default.
- W4384705436 title "MCR-DL: Mix-and-Match Communication Runtime for Deep Learning" @default.
- W4384705436 cites W1991732708 @default.
- W4384705436 cites W2052440657 @default.
- W4384705436 cites W2128520152 @default.
- W4384705436 cites W2913752846 @default.
- W4384705436 cites W3017432630 @default.
- W4384705436 cites W3086105743 @default.
- W4384705436 cites W3097718020 @default.
- W4384705436 cites W3132977829 @default.
- W4384705436 cites W3164436820 @default.
- W4384705436 cites W4294008856 @default.
- W4384705436 doi "https://doi.org/10.1109/ipdps54959.2023.00103" @default.
- W4384705436 hasPublicationYear "2023" @default.
- W4384705436 type Work @default.
- W4384705436 citedByCount "0" @default.
- W4384705436 crossrefType "proceedings-article" @default.
- W4384705436 hasAuthorship W4384705436A5004330728 @default.
- W4384705436 hasAuthorship W4384705436A5015153122 @default.
- W4384705436 hasAuthorship W4384705436A5024879682 @default.
- W4384705436 hasAuthorship W4384705436A5032141491 @default.
- W4384705436 hasAuthorship W4384705436A5034293705 @default.
- W4384705436 hasAuthorship W4384705436A5037534081 @default.
- W4384705436 hasAuthorship W4384705436A5040302174 @default.
- W4384705436 hasAuthorship W4384705436A5078128277 @default.
- W4384705436 hasConcept C108583219 @default.
- W4384705436 hasConcept C111919701 @default.
- W4384705436 hasConcept C118524514 @default.
- W4384705436 hasConcept C120314980 @default.
- W4384705436 hasConcept C144024400 @default.
- W4384705436 hasConcept C154945302 @default.
- W4384705436 hasConcept C157764524 @default.
- W4384705436 hasConcept C158156997 @default.
- W4384705436 hasConcept C159985019 @default.
- W4384705436 hasConcept C166957645 @default.
- W4384705436 hasConcept C173608175 @default.
- W4384705436 hasConcept C192562407 @default.
- W4384705436 hasConcept C204323151 @default.
- W4384705436 hasConcept C2524010 @default.
- W4384705436 hasConcept C2781172179 @default.
- W4384705436 hasConcept C28719098 @default.
- W4384705436 hasConcept C33923547 @default.
- W4384705436 hasConcept C41008148 @default.
- W4384705436 hasConcept C46312422 @default.
- W4384705436 hasConcept C555944384 @default.
- W4384705436 hasConcept C61483411 @default.
- W4384705436 hasConcept C79581498 @default.
- W4384705436 hasConcept C95457728 @default.
- W4384705436 hasConceptScore W4384705436C108583219 @default.
- W4384705436 hasConceptScore W4384705436C111919701 @default.
- W4384705436 hasConceptScore W4384705436C118524514 @default.
- W4384705436 hasConceptScore W4384705436C120314980 @default.
- W4384705436 hasConceptScore W4384705436C144024400 @default.
- W4384705436 hasConceptScore W4384705436C154945302 @default.
- W4384705436 hasConceptScore W4384705436C157764524 @default.
- W4384705436 hasConceptScore W4384705436C158156997 @default.
- W4384705436 hasConceptScore W4384705436C159985019 @default.
- W4384705436 hasConceptScore W4384705436C166957645 @default.
- W4384705436 hasConceptScore W4384705436C173608175 @default.
- W4384705436 hasConceptScore W4384705436C192562407 @default.
- W4384705436 hasConceptScore W4384705436C204323151 @default.
- W4384705436 hasConceptScore W4384705436C2524010 @default.
- W4384705436 hasConceptScore W4384705436C2781172179 @default.
- W4384705436 hasConceptScore W4384705436C28719098 @default.
- W4384705436 hasConceptScore W4384705436C33923547 @default.
- W4384705436 hasConceptScore W4384705436C41008148 @default.
- W4384705436 hasConceptScore W4384705436C46312422 @default.
- W4384705436 hasConceptScore W4384705436C555944384 @default.
- W4384705436 hasConceptScore W4384705436C61483411 @default.
- W4384705436 hasConceptScore W4384705436C79581498 @default.
- W4384705436 hasConceptScore W4384705436C95457728 @default.
- W4384705436 hasLocation W43847054361 @default.
- W4384705436 hasOpenAccess W4384705436 @default.
- W4384705436 hasPrimaryLocation W43847054361 @default.
- W4384705436 hasRelatedWork W1508832769 @default.
- W4384705436 hasRelatedWork W1534022569 @default.
- W4384705436 hasRelatedWork W1608806855 @default.
- W4384705436 hasRelatedWork W2023505575 @default.
- W4384705436 hasRelatedWork W2047588290 @default.
- W4384705436 hasRelatedWork W2137702271 @default.
- W4384705436 hasRelatedWork W2313503008 @default.
- W4384705436 hasRelatedWork W2378666660 @default.
- W4384705436 hasRelatedWork W3095555187 @default.
- W4384705436 hasRelatedWork W4240606930 @default.
- W4384705436 isParatext "false" @default.
- W4384705436 isRetracted "false" @default.
- W4384705436 workType "article" @default.