Matches in SemOpenAlex for { <https://semopenalex.org/work/W4378468535> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4378468535 abstract "Training models with varying capacities can be advantageous for deploying them in different scenarios. While high-capacity models offer better performance, low-capacity models require fewer computing resources for training and inference. In this work, we propose a novel one-stop training framework to jointly train high-capacity and low-capacity models. This framework consists of two composite model architectures and a joint training algorithm called Two-Stage Joint-Training (TSJT). Unlike knowledge distillation, where multiple capacity models are trained from scratch separately, our approach integrates supervisions from different capacity models simultaneously, leading to faster and more efficient convergence. Extensive experiments on the multilingual machine translation benchmark WMT10 show that our method outperforms low-capacity baseline models and achieves comparable or better performance on high-capacity models. Notably, the analysis demonstrates that our method significantly influences the initial training process, leading to more efficient convergence and superior solutions." @default.
- W4378468535 created "2023-05-27" @default.
- W4378468535 creator A5011702824 @default.
- W4378468535 creator A5014662947 @default.
- W4378468535 creator A5020769158 @default.
- W4378468535 creator A5090166634 @default.
- W4378468535 creator A5092029654 @default.
- W4378468535 date "2023-05-23" @default.
- W4378468535 modified "2023-10-16" @default.
- W4378468535 title "One-stop Training of Multiple Capacity Models" @default.
- W4378468535 doi "https://doi.org/10.48550/arxiv.2305.14066" @default.
- W4378468535 hasPublicationYear "2023" @default.
- W4378468535 type Work @default.
- W4378468535 citedByCount "0" @default.
- W4378468535 crossrefType "posted-content" @default.
- W4378468535 hasAuthorship W4378468535A5011702824 @default.
- W4378468535 hasAuthorship W4378468535A5014662947 @default.
- W4378468535 hasAuthorship W4378468535A5020769158 @default.
- W4378468535 hasAuthorship W4378468535A5090166634 @default.
- W4378468535 hasAuthorship W4378468535A5092029654 @default.
- W4378468535 hasBestOaLocation W43784685351 @default.
- W4378468535 hasConcept C111368507 @default.
- W4378468535 hasConcept C111919701 @default.
- W4378468535 hasConcept C119857082 @default.
- W4378468535 hasConcept C121332964 @default.
- W4378468535 hasConcept C12725497 @default.
- W4378468535 hasConcept C127313418 @default.
- W4378468535 hasConcept C127413603 @default.
- W4378468535 hasConcept C13280743 @default.
- W4378468535 hasConcept C153294291 @default.
- W4378468535 hasConcept C154945302 @default.
- W4378468535 hasConcept C162324750 @default.
- W4378468535 hasConcept C170154142 @default.
- W4378468535 hasConcept C18555067 @default.
- W4378468535 hasConcept C185798385 @default.
- W4378468535 hasConcept C205649164 @default.
- W4378468535 hasConcept C2776214188 @default.
- W4378468535 hasConcept C2777211547 @default.
- W4378468535 hasConcept C2777303404 @default.
- W4378468535 hasConcept C41008148 @default.
- W4378468535 hasConcept C50522688 @default.
- W4378468535 hasConcept C98045186 @default.
- W4378468535 hasConceptScore W4378468535C111368507 @default.
- W4378468535 hasConceptScore W4378468535C111919701 @default.
- W4378468535 hasConceptScore W4378468535C119857082 @default.
- W4378468535 hasConceptScore W4378468535C121332964 @default.
- W4378468535 hasConceptScore W4378468535C12725497 @default.
- W4378468535 hasConceptScore W4378468535C127313418 @default.
- W4378468535 hasConceptScore W4378468535C127413603 @default.
- W4378468535 hasConceptScore W4378468535C13280743 @default.
- W4378468535 hasConceptScore W4378468535C153294291 @default.
- W4378468535 hasConceptScore W4378468535C154945302 @default.
- W4378468535 hasConceptScore W4378468535C162324750 @default.
- W4378468535 hasConceptScore W4378468535C170154142 @default.
- W4378468535 hasConceptScore W4378468535C18555067 @default.
- W4378468535 hasConceptScore W4378468535C185798385 @default.
- W4378468535 hasConceptScore W4378468535C205649164 @default.
- W4378468535 hasConceptScore W4378468535C2776214188 @default.
- W4378468535 hasConceptScore W4378468535C2777211547 @default.
- W4378468535 hasConceptScore W4378468535C2777303404 @default.
- W4378468535 hasConceptScore W4378468535C41008148 @default.
- W4378468535 hasConceptScore W4378468535C50522688 @default.
- W4378468535 hasConceptScore W4378468535C98045186 @default.
- W4378468535 hasLocation W43784685351 @default.
- W4378468535 hasOpenAccess W4378468535 @default.
- W4378468535 hasPrimaryLocation W43784685351 @default.
- W4378468535 hasRelatedWork W1485630101 @default.
- W4378468535 hasRelatedWork W2913647591 @default.
- W4378468535 hasRelatedWork W2963058055 @default.
- W4378468535 hasRelatedWork W2963941635 @default.
- W4378468535 hasRelatedWork W3115300116 @default.
- W4378468535 hasRelatedWork W3139398652 @default.
- W4378468535 hasRelatedWork W4212980275 @default.
- W4378468535 hasRelatedWork W4288635965 @default.
- W4378468535 hasRelatedWork W4309953041 @default.
- W4378468535 hasRelatedWork W4320505317 @default.
- W4378468535 isParatext "false" @default.
- W4378468535 isRetracted "false" @default.
- W4378468535 workType "article" @default.