Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387427596> ?p ?o ?g. }
Showing items 1 to 67 of 67, with 100 items per page.
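The triple pattern in the header can be reproduced programmatically. Below is a minimal Python sketch, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql and the SPARQLWrapper package (both are assumptions; check the current SemOpenAlex documentation). The `?g` graph variable from the header pattern is dropped for simplicity.

```python
# Minimal sketch: fetch all predicate/object pairs for work W4387427596.
# Assumes the public SemOpenAlex endpoint at https://semopenalex.org/sparql;
# verify the endpoint URL against the SemOpenAlex documentation.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setQuery("""
    SELECT ?p ?o WHERE {
        <https://semopenalex.org/work/W4387427596> ?p ?o .
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    # Each binding maps variable names to {"type": ..., "value": ...} dicts.
    print(binding["p"]["value"], "->", binding["o"]["value"])
```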
- W4387427596 abstract "Many recent breakthroughs in machine learning have been enabled by pre-trained foundation models. By scaling up model parameters, training data, and computation resources, foundation models have significantly advanced the state-of-the-art in many applications. However, it is still an open question how to use these models to perform downstream tasks efficiently. Knowledge distillation (KD) has been explored to tackle this challenge. KD transfers knowledge from a large teacher model to a smaller student model. While KD has been successful in improving student model performance, recent research has discovered that a powerful teacher does not necessarily lead to a powerful student, due to the huge capacity gap between them. In addition, potential distribution shifts between the pre-training data and downstream tasks can make knowledge transfer in KD sub-optimal for improving downstream task performance. In this paper, we extend KD with an interactive communication process to help student models of downstream tasks learn effectively from pre-trained foundation models. Our design is inspired by the way humans learn from teachers who can explain knowledge in a way that meets the students' needs. Specifically, we let each model (i.e., student and teacher) train two components: (1) an encoder encoding the model's hidden states to a message and (2) a decoder decoding any message to its own hidden states. With the encoder and decoder, not only can the teacher transfer rich information by encoding its hidden states, but the student can also send messages with information about its downstream tasks to the teacher. Therefore, knowledge passing from teacher to student can be tailored to the student's capacity and the downstream tasks' distributions. We conducted experiments on benchmark datasets to show that our communication mechanism outperforms state-of-the-art distillation techniques." @default.
- W4387427596 created "2023-10-08" @default.
- W4387427596 creator A5000787176 @default.
- W4387427596 creator A5015729655 @default.
- W4387427596 creator A5036550121 @default.
- W4387427596 creator A5055796262 @default.
- W4387427596 creator A5079085366 @default.
- W4387427596 creator A5086696415 @default.
- W4387427596 date "2023-10-04" @default.
- W4387427596 modified "2023-10-09" @default.
- W4387427596 title "Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication" @default.
- W4387427596 doi "https://doi.org/10.48550/arxiv.2310.03188" @default.
- W4387427596 hasPublicationYear "2023" @default.
- W4387427596 type Work @default.
- W4387427596 citedByCount "0" @default.
- W4387427596 crossrefType "posted-content" @default.
- W4387427596 hasAuthorship W4387427596A5000787176 @default.
- W4387427596 hasAuthorship W4387427596A5015729655 @default.
- W4387427596 hasAuthorship W4387427596A5036550121 @default.
- W4387427596 hasAuthorship W4387427596A5055796262 @default.
- W4387427596 hasAuthorship W4387427596A5079085366 @default.
- W4387427596 hasAuthorship W4387427596A5086696415 @default.
- W4387427596 hasBestOaLocation W43874275961 @default.
- W4387427596 hasConcept C107457646 @default.
- W4387427596 hasConcept C111919701 @default.
- W4387427596 hasConcept C118505674 @default.
- W4387427596 hasConcept C125411270 @default.
- W4387427596 hasConcept C127413603 @default.
- W4387427596 hasConcept C154945302 @default.
- W4387427596 hasConcept C201995342 @default.
- W4387427596 hasConcept C21547014 @default.
- W4387427596 hasConcept C2776207758 @default.
- W4387427596 hasConcept C2776960227 @default.
- W4387427596 hasConcept C2780451532 @default.
- W4387427596 hasConcept C41008148 @default.
- W4387427596 hasConcept C56739046 @default.
- W4387427596 hasConcept C98045186 @default.
- W4387427596 hasConceptScore W4387427596C107457646 @default.
- W4387427596 hasConceptScore W4387427596C111919701 @default.
- W4387427596 hasConceptScore W4387427596C118505674 @default.
- W4387427596 hasConceptScore W4387427596C125411270 @default.
- W4387427596 hasConceptScore W4387427596C127413603 @default.
- W4387427596 hasConceptScore W4387427596C154945302 @default.
- W4387427596 hasConceptScore W4387427596C201995342 @default.
- W4387427596 hasConceptScore W4387427596C21547014 @default.
- W4387427596 hasConceptScore W4387427596C2776207758 @default.
- W4387427596 hasConceptScore W4387427596C2776960227 @default.
- W4387427596 hasConceptScore W4387427596C2780451532 @default.
- W4387427596 hasConceptScore W4387427596C41008148 @default.
- W4387427596 hasConceptScore W4387427596C56739046 @default.
- W4387427596 hasConceptScore W4387427596C98045186 @default.
- W4387427596 hasLocation W43874275961 @default.
- W4387427596 hasOpenAccess W4387427596 @default.
- W4387427596 hasPrimaryLocation W43874275961 @default.
- W4387427596 hasRelatedWork W1950940422 @default.
- W4387427596 hasRelatedWork W2032507829 @default.
- W4387427596 hasRelatedWork W2060210989 @default.
- W4387427596 hasRelatedWork W2129146436 @default.
- W4387427596 hasRelatedWork W2349021146 @default.
- W4387427596 hasRelatedWork W3040203686 @default.
- W4387427596 hasRelatedWork W35583307 @default.
- W4387427596 hasRelatedWork W4214653257 @default.
- W4387427596 hasRelatedWork W4249524554 @default.
- W4387427596 hasRelatedWork W4283822356 @default.
- W4387427596 isParatext "false" @default.
- W4387427596 isRetracted "false" @default.
- W4387427596 workType "article" @default.
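The abstract above (the first item in the list) describes the paper's core mechanism: teacher and student each train an encoder (hidden states to message) and a decoder (message to its own hidden states), so knowledge transfer can be tailored through two-way communication. The following Python sketch illustrates that idea only; it is not the authors' implementation, and the linear encoder/decoder layers, message dimension, combination step, and loss choice are all illustrative assumptions.

```python
# Hypothetical sketch of the abstract's encoder/decoder communication (not the
# authors' code): teacher and student each learn an encoder (hidden -> message)
# and a decoder (message -> own hidden space), so messages can flow both ways.
import torch
import torch.nn as nn

MSG_DIM = 128  # assumed shared message dimensionality

class Communicator(nn.Module):
    """Wraps a model's hidden space with an encoder/decoder pair for messaging."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, MSG_DIM)  # hidden state -> message
        self.decoder = nn.Linear(MSG_DIM, hidden_dim)  # any message -> hidden state

    def send(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.encoder(hidden)

    def receive(self, message: torch.Tensor) -> torch.Tensor:
        return self.decoder(message)

# One communication round (shapes: [batch, hidden_dim] per model).
teacher_comm = Communicator(hidden_dim=1024)
student_comm = Communicator(hidden_dim=256)

teacher_hidden = torch.randn(8, 1024)  # stand-in for foundation-model states
student_hidden = torch.randn(8, 256)   # stand-in for downstream-model states

# The student tells the teacher about its downstream task ...
task_msg = student_comm.send(student_hidden)
teacher_view = teacher_comm.receive(task_msg)  # teacher decodes the request

# ... and the teacher replies with task-conditioned knowledge.
knowledge_msg = teacher_comm.send(teacher_hidden + teacher_view)
distill_target = student_comm.receive(knowledge_msg)

# A distillation loss would then pull the student's states toward the decoded
# message, tailoring the transfer to the student's capacity and task.
loss = nn.functional.mse_loss(student_hidden, distill_target)
```

The two-way exchange is the point of the design: because the student's message conditions what the teacher encodes, the transferred representation can adapt to the student's capacity and the downstream data distribution, which plain one-way KD cannot do.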