Matches in SemOpenAlex for { <https://semopenalex.org/work/W4320517655> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W4320517655 abstract "Scaling model parameters usually improves model quality, but at the price of high computation overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) architecture, have constant computation cost over their dense counterparts, thus providing opportunities to train and serve a large model at a reasonable cost. However, the distributed training of an MoE model is prone to low efficiency, mainly due to the interleaved all-to-all communication during model computation. This paper makes three main contributions. First, we systematically analyze the all-to-all overhead in distributed training of MoE. Second, we propose a new communication scheduling scheme based on tensor partitioning that prioritizes the all-to-all operations over other communication, due to its blocking nature. Third, we introduce expert packing that reduces the all-to-all transfer size and incorporates optimizations to mitigate its overheads. Both techniques effectively tackle the all-to-all bottleneck, and we integrate them into a new system called Lina. Experiments on an A100 GPU testbed show that Lina improves the training step time of popular NLP models by up to 1.73x over the state-of-the-art." @default.
- W4320517655 created "2023-02-14" @default.
- W4320517655 creator A5013473459 @default.
- W4320517655 creator A5028120333 @default.
- W4320517655 creator A5046078938 @default.
- W4320517655 creator A5064029047 @default.
- W4320517655 creator A5072547549 @default.
- W4320517655 date "2022-10-31" @default.
- W4320517655 modified "2023-10-18" @default.
- W4320517655 title "Lita: Accelerating Distributed Training of Sparsely Activated Models" @default.
- W4320517655 doi "https://doi.org/10.48550/arxiv.2210.17223" @default.
- W4320517655 hasPublicationYear "2022" @default.
- W4320517655 type Work @default.
- W4320517655 citedByCount "0" @default.
- W4320517655 crossrefType "posted-content" @default.
- W4320517655 hasAuthorship W4320517655A5013473459 @default.
- W4320517655 hasAuthorship W4320517655A5028120333 @default.
- W4320517655 hasAuthorship W4320517655A5046078938 @default.
- W4320517655 hasAuthorship W4320517655A5064029047 @default.
- W4320517655 hasAuthorship W4320517655A5072547549 @default.
- W4320517655 hasBestOaLocation W43205176551 @default.
- W4320517655 hasConcept C111919701 @default.
- W4320517655 hasConcept C11413529 @default.
- W4320517655 hasConcept C120314980 @default.
- W4320517655 hasConcept C126255220 @default.
- W4320517655 hasConcept C149635348 @default.
- W4320517655 hasConcept C206729178 @default.
- W4320517655 hasConcept C2779960059 @default.
- W4320517655 hasConcept C2780513914 @default.
- W4320517655 hasConcept C31258907 @default.
- W4320517655 hasConcept C31395832 @default.
- W4320517655 hasConcept C33923547 @default.
- W4320517655 hasConcept C41008148 @default.
- W4320517655 hasConcept C45374587 @default.
- W4320517655 hasConceptScore W4320517655C111919701 @default.
- W4320517655 hasConceptScore W4320517655C11413529 @default.
- W4320517655 hasConceptScore W4320517655C120314980 @default.
- W4320517655 hasConceptScore W4320517655C126255220 @default.
- W4320517655 hasConceptScore W4320517655C149635348 @default.
- W4320517655 hasConceptScore W4320517655C206729178 @default.
- W4320517655 hasConceptScore W4320517655C2779960059 @default.
- W4320517655 hasConceptScore W4320517655C2780513914 @default.
- W4320517655 hasConceptScore W4320517655C31258907 @default.
- W4320517655 hasConceptScore W4320517655C31395832 @default.
- W4320517655 hasConceptScore W4320517655C33923547 @default.
- W4320517655 hasConceptScore W4320517655C41008148 @default.
- W4320517655 hasConceptScore W4320517655C45374587 @default.
- W4320517655 hasLocation W43205176551 @default.
- W4320517655 hasOpenAccess W4320517655 @default.
- W4320517655 hasPrimaryLocation W43205176551 @default.
- W4320517655 hasRelatedWork W1669499690 @default.
- W4320517655 hasRelatedWork W1882733036 @default.
- W4320517655 hasRelatedWork W2092071486 @default.
- W4320517655 hasRelatedWork W2124870959 @default.
- W4320517655 hasRelatedWork W2157013742 @default.
- W4320517655 hasRelatedWork W2160425906 @default.
- W4320517655 hasRelatedWork W2391167130 @default.
- W4320517655 hasRelatedWork W2563188592 @default.
- W4320517655 hasRelatedWork W4242263690 @default.
- W4320517655 hasRelatedWork W94000989 @default.
- W4320517655 isParatext "false" @default.
- W4320517655 isRetracted "false" @default.
- W4320517655 workType "article" @default.