Matches in SemOpenAlex for { <https://semopenalex.org/work/W3204921104> ?p ?o ?g. }
Showing items 1 to 94 of 94 with 100 items per page.
- W3204921104 abstract "Deep Neural Networks have gained significant traction due to their wide applicability in different domains. DNN sizes and training samples are constantly growing, making training of such workloads more challenging. Distributed training is a solution to reduce the training time. High-performance distributed training platforms should leverage multi-dimensional hierarchical networks, which interconnect accelerators through different levels of the network, to dramatically reduce the number of expensive NICs required for the scale-out network. However, this comes at the expense of communication overhead between distributed accelerators to exchange gradients or input/output activations. In order to allow for further scaling of the workloads, communication overhead needs to be minimized. In this paper, we motivate the fact that in training platforms, adding more intermediate network dimensions is beneficial for efficiently mitigating the excessive use of expensive NIC resources. Further, we address different challenges of DNN training on hierarchical networks. We discuss how, when designing the interconnect, to distribute network bandwidth resources across different dimensions in order to (i) maximize BW utilization of all dimensions, and (ii) minimize the overall training time for the target workload. We then implement a framework that, for a given workload, determines the best network configuration that maximizes performance, or performance-per-cost." @default.
- W3204921104 created "2021-10-11" @default.
- W3204921104 creator A5034089074 @default.
- W3204921104 creator A5065143008 @default.
- W3204921104 creator A5068175490 @default.
- W3204921104 creator A5088899364 @default.
- W3204921104 date "2021-09-24" @default.
- W3204921104 modified "2023-09-27" @default.
- W3204921104 title "Exploring Multi-dimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models." @default.
- W3204921104 cites W2009571103 @default.
- W3204921104 cites W2057332538 @default.
- W3204921104 cites W2131613942 @default.
- W3204921104 cites W2626778328 @default.
- W3204921104 cites W2807147113 @default.
- W3204921104 cites W2884711234 @default.
- W3204921104 cites W2900824371 @default.
- W3204921104 cites W2922527104 @default.
- W3204921104 cites W2947737663 @default.
- W3204921104 cites W2949161920 @default.
- W3204921104 cites W2949650786 @default.
- W3204921104 cites W2991040477 @default.
- W3204921104 cites W3016395792 @default.
- W3204921104 cites W3043522163 @default.
- W3204921104 cites W3092501829 @default.
- W3204921104 cites W3096425133 @default.
- W3204921104 cites W3129831491 @default.
- W3204921104 hasPublicationYear "2021" @default.
- W3204921104 type Work @default.
- W3204921104 sameAs 3204921104 @default.
- W3204921104 citedByCount "0" @default.
- W3204921104 crossrefType "posted-content" @default.
- W3204921104 hasAuthorship W3204921104A5034089074 @default.
- W3204921104 hasAuthorship W3204921104A5065143008 @default.
- W3204921104 hasAuthorship W3204921104A5068175490 @default.
- W3204921104 hasAuthorship W3204921104A5088899364 @default.
- W3204921104 hasConcept C111919701 @default.
- W3204921104 hasConcept C120314980 @default.
- W3204921104 hasConcept C121332964 @default.
- W3204921104 hasConcept C123745756 @default.
- W3204921104 hasConcept C153083717 @default.
- W3204921104 hasConcept C153294291 @default.
- W3204921104 hasConcept C154945302 @default.
- W3204921104 hasConcept C199845137 @default.
- W3204921104 hasConcept C2776257435 @default.
- W3204921104 hasConcept C2777211547 @default.
- W3204921104 hasConcept C2778476105 @default.
- W3204921104 hasConcept C2779960059 @default.
- W3204921104 hasConcept C31258907 @default.
- W3204921104 hasConcept C41008148 @default.
- W3204921104 hasConcept C48044578 @default.
- W3204921104 hasConcept C50644808 @default.
- W3204921104 hasConceptScore W3204921104C111919701 @default.
- W3204921104 hasConceptScore W3204921104C120314980 @default.
- W3204921104 hasConceptScore W3204921104C121332964 @default.
- W3204921104 hasConceptScore W3204921104C123745756 @default.
- W3204921104 hasConceptScore W3204921104C153083717 @default.
- W3204921104 hasConceptScore W3204921104C153294291 @default.
- W3204921104 hasConceptScore W3204921104C154945302 @default.
- W3204921104 hasConceptScore W3204921104C199845137 @default.
- W3204921104 hasConceptScore W3204921104C2776257435 @default.
- W3204921104 hasConceptScore W3204921104C2777211547 @default.
- W3204921104 hasConceptScore W3204921104C2778476105 @default.
- W3204921104 hasConceptScore W3204921104C2779960059 @default.
- W3204921104 hasConceptScore W3204921104C31258907 @default.
- W3204921104 hasConceptScore W3204921104C41008148 @default.
- W3204921104 hasConceptScore W3204921104C48044578 @default.
- W3204921104 hasConceptScore W3204921104C50644808 @default.
- W3204921104 hasLocation W32049211041 @default.
- W3204921104 hasOpenAccess W3204921104 @default.
- W3204921104 hasPrimaryLocation W32049211041 @default.
- W3204921104 hasRelatedWork W2023835913 @default.
- W3204921104 hasRelatedWork W2309679942 @default.
- W3204921104 hasRelatedWork W2583750011 @default.
- W3204921104 hasRelatedWork W2666552878 @default.
- W3204921104 hasRelatedWork W2745574548 @default.
- W3204921104 hasRelatedWork W2768646057 @default.
- W3204921104 hasRelatedWork W2794846391 @default.
- W3204921104 hasRelatedWork W3003883635 @default.
- W3204921104 hasRelatedWork W3003893379 @default.
- W3204921104 hasRelatedWork W3012536640 @default.
- W3204921104 hasRelatedWork W3015757494 @default.
- W3204921104 hasRelatedWork W3071522630 @default.
- W3204921104 hasRelatedWork W3108415931 @default.
- W3204921104 hasRelatedWork W3129155488 @default.
- W3204921104 hasRelatedWork W3134655840 @default.
- W3204921104 hasRelatedWork W3186518286 @default.
- W3204921104 hasRelatedWork W3187219391 @default.
- W3204921104 hasRelatedWork W3201181410 @default.
- W3204921104 hasRelatedWork W3204833512 @default.
- W3204921104 hasRelatedWork W3206122484 @default.
- W3204921104 isParatext "false" @default.
- W3204921104 isRetracted "false" @default.
- W3204921104 magId "3204921104" @default.
- W3204921104 workType "article" @default.
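
The rows above all follow the same shape: subject, predicate, object, then the graph name after `@`. As a minimal sketch, assuming that row layout holds for every line, the listing can be grouped by predicate in Python. The function name `group_by_predicate` and the regular expression are illustrative assumptions, not part of SemOpenAlex or any of its tooling.

```python
import re
from collections import defaultdict

# Hypothetical pattern for one listing row:
#   "- <subject> <predicate> <object> @<graph>."
# The object may be a quoted literal containing spaces, so it is matched lazily.
ROW = re.compile(r'^- (\S+) (\S+) (.+?) @(\S+?)\.?$')


def group_by_predicate(lines):
    """Group the objects of listing rows by predicate.

    Lines that do not look like listing rows (for example the page
    header) are skipped. Surrounding double quotes on literals are removed.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:
            continue  # not a listing row
        _subject, predicate, obj, _graph = match.groups()
        grouped[predicate].append(obj.strip('"'))
    return dict(grouped)


# A few rows copied from the listing above.
sample = [
    'Showing items 1 to 94 of 94 with 100 items per page.',
    '- W3204921104 created "2021-10-11" @default.',
    '- W3204921104 cites W2009571103 @default.',
    '- W3204921104 cites W2057332538 @default.',
    '- W3204921104 type Work @default.',
]

result = group_by_predicate(sample)
print(result["cites"])
```

Grouping this way makes it easy to count, say, how many `cites` or `hasRelatedWork` rows a work has without querying the endpoint again.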