Matches in SemOpenAlex for { <https://semopenalex.org/work/W4366999497> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4366999497 abstract "The Mixture of Experts (MoE) model becomes an important choice of large language models nowadays because of its scalability with sublinear computational complexity for training and inference. However, existing MoE models suffer from two critical drawbacks, 1) tremendous inner-node and inter-node communication overhead introduced by all-to-all dispatching and gathering, and 2) limited scalability for the backbone because of the bound data parallel and expert parallel to scale in the expert dimension. In this paper, we systematically analyze these drawbacks in terms of training efficiency in the parallel framework view and propose a novel MoE architecture called Pipeline MoE (PPMoE) to tackle them. PPMoE builds expert parallel incorporating with tensor parallel and replaces communication-intensive all-to-all dispatching and gathering with a simple tensor index slicing and inner-node all-reduce. Besides, it is convenient for PPMoE to integrate pipeline parallel to further scale the backbone due to its flexible parallel architecture. Extensive experiments show that PPMoE not only achieves a more than $1.75\times$ speed up compared to existing MoE architectures but also reaches $90\%$ throughput of its corresponding backbone model that is $20\times$ smaller." @default.
- W4366999497 created "2023-04-27" @default.
- W4366999497 creator A5010146234 @default.
- W4366999497 creator A5023906567 @default.
- W4366999497 creator A5028462736 @default.
- W4366999497 creator A5033976028 @default.
- W4366999497 creator A5047455588 @default.
- W4366999497 creator A5056944785 @default.
- W4366999497 date "2023-04-22" @default.
- W4366999497 modified "2023-09-23" @default.
- W4366999497 title "Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism" @default.
- W4366999497 doi "https://doi.org/10.48550/arxiv.2304.11414" @default.
- W4366999497 hasPublicationYear "2023" @default.
- W4366999497 type Work @default.
- W4366999497 citedByCount "0" @default.
- W4366999497 crossrefType "posted-content" @default.
- W4366999497 hasAuthorship W4366999497A5010146234 @default.
- W4366999497 hasAuthorship W4366999497A5023906567 @default.
- W4366999497 hasAuthorship W4366999497A5028462736 @default.
- W4366999497 hasAuthorship W4366999497A5033976028 @default.
- W4366999497 hasAuthorship W4366999497A5047455588 @default.
- W4366999497 hasAuthorship W4366999497A5056944785 @default.
- W4366999497 hasBestOaLocation W43669994971 @default.
- W4366999497 hasConcept C120314980 @default.
- W4366999497 hasConcept C121332964 @default.
- W4366999497 hasConcept C127413603 @default.
- W4366999497 hasConcept C157764524 @default.
- W4366999497 hasConcept C173608175 @default.
- W4366999497 hasConcept C199360897 @default.
- W4366999497 hasConcept C2778755073 @default.
- W4366999497 hasConcept C2779960059 @default.
- W4366999497 hasConcept C41008148 @default.
- W4366999497 hasConcept C43521106 @default.
- W4366999497 hasConcept C48044578 @default.
- W4366999497 hasConcept C555944384 @default.
- W4366999497 hasConcept C62520636 @default.
- W4366999497 hasConcept C62611344 @default.
- W4366999497 hasConcept C66938386 @default.
- W4366999497 hasConcept C76155785 @default.
- W4366999497 hasConcept C77088390 @default.
- W4366999497 hasConceptScore W4366999497C120314980 @default.
- W4366999497 hasConceptScore W4366999497C121332964 @default.
- W4366999497 hasConceptScore W4366999497C127413603 @default.
- W4366999497 hasConceptScore W4366999497C157764524 @default.
- W4366999497 hasConceptScore W4366999497C173608175 @default.
- W4366999497 hasConceptScore W4366999497C199360897 @default.
- W4366999497 hasConceptScore W4366999497C2778755073 @default.
- W4366999497 hasConceptScore W4366999497C2779960059 @default.
- W4366999497 hasConceptScore W4366999497C41008148 @default.
- W4366999497 hasConceptScore W4366999497C43521106 @default.
- W4366999497 hasConceptScore W4366999497C48044578 @default.
- W4366999497 hasConceptScore W4366999497C555944384 @default.
- W4366999497 hasConceptScore W4366999497C62520636 @default.
- W4366999497 hasConceptScore W4366999497C62611344 @default.
- W4366999497 hasConceptScore W4366999497C66938386 @default.
- W4366999497 hasConceptScore W4366999497C76155785 @default.
- W4366999497 hasConceptScore W4366999497C77088390 @default.
- W4366999497 hasLocation W43669994971 @default.
- W4366999497 hasOpenAccess W4366999497 @default.
- W4366999497 hasPrimaryLocation W43669994971 @default.
- W4366999497 hasRelatedWork W1595151633 @default.
- W4366999497 hasRelatedWork W1596201972 @default.
- W4366999497 hasRelatedWork W1604898313 @default.
- W4366999497 hasRelatedWork W1788737569 @default.
- W4366999497 hasRelatedWork W2161252841 @default.
- W4366999497 hasRelatedWork W2364921833 @default.
- W4366999497 hasRelatedWork W2385146268 @default.
- W4366999497 hasRelatedWork W3048889998 @default.
- W4366999497 hasRelatedWork W94000989 @default.
- W4366999497 hasRelatedWork W2503642292 @default.
- W4366999497 isParatext "false" @default.
- W4366999497 isRetracted "false" @default.
- W4366999497 workType "article" @default.