Matches in SemOpenAlex for { <https://semopenalex.org/work/W4375958459> ?p ?o ?g. }
Showing items 1 to 57 of 57 with 100 items per page.
- W4375958459 abstract "Distributed Machine Learning (DML) systems are utilized to enhance the speed of model training in data centers (DCs) and edge nodes. The Parameter Server (PS) communication architecture is commonly employed, but it faces severe long-tail latency caused by many-to-one incast traffic patterns, negatively impacting training throughput. To address this challenge, we design the Loss-tolerant Transmission Protocol (LTP), which permits partial loss of gradients during synchronization to avoid unneeded retransmission and contributes to faster synchronization per iteration. LTP implements loss-tolerant transmission through out-of-order transmission and out-of-order Acknowledges (ACKs). LTP employs Early Close to adjust the loss-tolerant threshold based on network conditions and Bubble Filling for data correction to maintain training accuracy. LTP is implemented by C++ and integrated into PyTorch. Evaluations on a testbed of 8 worker nodes and one PS node demonstrate that LTP can significantly improve DML training task throughput by up to 30x compared to traditional TCP congestion controls, with no sacrifice to final accuracy." @default.
- W4375958459 created "2023-05-10" @default.
- W4375958459 creator A5008996538 @default.
- W4375958459 creator A5011724155 @default.
- W4375958459 creator A5025588604 @default.
- W4375958459 creator A5043927335 @default.
- W4375958459 creator A5061734822 @default.
- W4375958459 creator A5088839810 @default.
- W4375958459 date "2023-05-07" @default.
- W4375958459 modified "2023-10-16" @default.
- W4375958459 title "Boosting Distributed Machine Learning Training Through Loss-tolerant Transmission Protocol" @default.
- W4375958459 doi "https://doi.org/10.48550/arxiv.2305.04279" @default.
- W4375958459 hasPublicationYear "2023" @default.
- W4375958459 type Work @default.
- W4375958459 citedByCount "0" @default.
- W4375958459 crossrefType "posted-content" @default.
- W4375958459 hasAuthorship W4375958459A5008996538 @default.
- W4375958459 hasAuthorship W4375958459A5011724155 @default.
- W4375958459 hasAuthorship W4375958459A5025588604 @default.
- W4375958459 hasAuthorship W4375958459A5043927335 @default.
- W4375958459 hasAuthorship W4375958459A5061734822 @default.
- W4375958459 hasAuthorship W4375958459A5088839810 @default.
- W4375958459 hasBestOaLocation W43759584591 @default.
- W4375958459 hasConcept C111919701 @default.
- W4375958459 hasConcept C120314980 @default.
- W4375958459 hasConcept C154945302 @default.
- W4375958459 hasConcept C157764524 @default.
- W4375958459 hasConcept C31258907 @default.
- W4375958459 hasConcept C31395832 @default.
- W4375958459 hasConcept C41008148 @default.
- W4375958459 hasConcept C46686674 @default.
- W4375958459 hasConcept C555944384 @default.
- W4375958459 hasConceptScore W4375958459C111919701 @default.
- W4375958459 hasConceptScore W4375958459C120314980 @default.
- W4375958459 hasConceptScore W4375958459C154945302 @default.
- W4375958459 hasConceptScore W4375958459C157764524 @default.
- W4375958459 hasConceptScore W4375958459C31258907 @default.
- W4375958459 hasConceptScore W4375958459C31395832 @default.
- W4375958459 hasConceptScore W4375958459C41008148 @default.
- W4375958459 hasConceptScore W4375958459C46686674 @default.
- W4375958459 hasConceptScore W4375958459C555944384 @default.
- W4375958459 hasLocation W43759584591 @default.
- W4375958459 hasOpenAccess W4375958459 @default.
- W4375958459 hasPrimaryLocation W43759584591 @default.
- W4375958459 hasRelatedWork W1517466511 @default.
- W4375958459 hasRelatedWork W1669499690 @default.
- W4375958459 hasRelatedWork W1972041827 @default.
- W4375958459 hasRelatedWork W2006631655 @default.
- W4375958459 hasRelatedWork W2023981379 @default.
- W4375958459 hasRelatedWork W2096193336 @default.
- W4375958459 hasRelatedWork W2129451236 @default.
- W4375958459 hasRelatedWork W2273814841 @default.
- W4375958459 hasRelatedWork W2366080774 @default.
- W4375958459 hasRelatedWork W4312371084 @default.
- W4375958459 isParatext "false" @default.
- W4375958459 isRetracted "false" @default.
- W4375958459 workType "article" @default.
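
The entries above all follow the same shape: subject, predicate, object, then the graph name after `@`. As a minimal sketch, assuming that layout holds for every line, the listing could be read into tuples as shown here. The names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical helpers made up for this illustration; they are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed line layout (inferred from the listing, not from a specification):
#   - <subject> <predicate> <object> @<graph>.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+?)\.?$')


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Literal values are wrapped in double quotes; strip one surrounding pair.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that do not parse."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            grouped[parsed[1]].append(parsed[2])
    return dict(grouped)


sample = [
    '- W4375958459 created "2023-05-10" @default.',
    '- W4375958459 creator A5008996538 @default.',
    '- W4375958459 creator A5011724155 @default.',
    '- W4375958459 type Work @default.',
]
grouped = group_by_predicate(sample)
```

Grouping by predicate makes the multi-valued properties (`creator`, `hasConcept`, `hasRelatedWork`) easy to inspect as lists, while single-valued ones such as `created` come back as one-element lists.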