Matches in SemOpenAlex for { <https://semopenalex.org/work/W3090789082> ?p ?o ?g. }
- W3090789082 abstract "Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one. Although widely used, this objective by design assumes that all the dimensions of hidden representations are independent, failing to capture important structural knowledge in the intermediate layers of the teacher network. To achieve better distillation efficacy, we propose Contrastive Distillation on Intermediate Representations (CoDIR), a principled knowledge distillation framework where the student is trained to distill knowledge through intermediate layers of the teacher via a contrastive objective. By learning to distinguish a positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of rich information in the teacher's hidden layers. CoDIR can be readily applied to compress large-scale language models in both pre-training and finetuning stages, and achieves superb performance on the GLUE benchmark, outperforming state-of-the-art compression methods." @default.
- W3090789082 created "2020-10-08" @default.
- W3090789082 creator A5023777406 @default.
- W3090789082 creator A5026746295 @default.
- W3090789082 creator A5034826937 @default.
- W3090789082 creator A5039183544 @default.
- W3090789082 creator A5066666034 @default.
- W3090789082 creator A5077322975 @default.
- W3090789082 date "2020-09-29" @default.
- W3090789082 modified "2023-09-25" @default.
- W3090789082 title "Contrastive Distillation on Intermediate Representations for Language Model Compression" @default.
- W3090789082 cites W131533222 @default.
- W3090789082 cites W1566289585 @default.
- W3090789082 cites W1599016936 @default.
- W3090789082 cites W1821462560 @default.
- W3090789082 cites W2152790380 @default.
- W3090789082 cites W2251939518 @default.
- W3090789082 cites W2427527485 @default.
- W3090789082 cites W2607892599 @default.
- W3090789082 cites W2842511635 @default.
- W3090789082 cites W2924902521 @default.
- W3090789082 cites W2944828972 @default.
- W3090789082 cites W2946417913 @default.
- W3090789082 cites W2951292523 @default.
- W3090789082 cites W2951585248 @default.
- W3090789082 cites W2951873722 @default.
- W3090789082 cites W2952509486 @default.
- W3090789082 cites W2963310665 @default.
- W3090789082 cites W2963341956 @default.
- W3090789082 cites W2963403868 @default.
- W3090789082 cites W2964420626 @default.
- W3090789082 cites W2965373594 @default.
- W3090789082 cites W2969624041 @default.
- W3090789082 cites W2970454332 @default.
- W3090789082 cites W2970565456 @default.
- W3090789082 cites W2970597249 @default.
- W3090789082 cites W2971155163 @default.
- W3090789082 cites W2973061659 @default.
- W3090789082 cites W2973727699 @default.
- W3090789082 cites W2976833415 @default.
- W3090789082 cites W2978017171 @default.
- W3090789082 cites W2978670439 @default.
- W3090789082 cites W2979567256 @default.
- W3090789082 cites W2981794819 @default.
- W3090789082 cites W2987283559 @default.
- W3090789082 cites W2996428491 @default.
- W3090789082 cites W2997710335 @default.
- W3090789082 cites W3000514857 @default.
- W3090789082 cites W3005680577 @default.
- W3090789082 cites W3015298864 @default.
- W3090789082 cites W3018378048 @default.
- W3090789082 cites W3019527251 @default.
- W3090789082 cites W3104033643 @default.
- W3090789082 cites W3105966348 @default.
- W3090789082 cites W3177265267 @default.
- W3090789082 doi "https://doi.org/10.48550/arxiv.2009.14167" @default.
- W3090789082 hasPublicationYear "2020" @default.
- W3090789082 type Work @default.
- W3090789082 sameAs 3090789082 @default.
- W3090789082 citedByCount "3" @default.
- W3090789082 countsByYear W30907890822020 @default.
- W3090789082 countsByYear W30907890822022 @default.
- W3090789082 crossrefType "posted-content" @default.
- W3090789082 hasAuthorship W3090789082A5023777406 @default.
- W3090789082 hasAuthorship W3090789082A5026746295 @default.
- W3090789082 hasAuthorship W3090789082A5034826937 @default.
- W3090789082 hasAuthorship W3090789082A5039183544 @default.
- W3090789082 hasAuthorship W3090789082A5066666034 @default.
- W3090789082 hasAuthorship W3090789082A5077322975 @default.
- W3090789082 hasBestOaLocation W30907890821 @default.
- W3090789082 hasConcept C119857082 @default.
- W3090789082 hasConcept C121332964 @default.
- W3090789082 hasConcept C13280743 @default.
- W3090789082 hasConcept C137293760 @default.
- W3090789082 hasConcept C154945302 @default.
- W3090789082 hasConcept C159985019 @default.
- W3090789082 hasConcept C177264268 @default.
- W3090789082 hasConcept C178790620 @default.
- W3090789082 hasConcept C180016635 @default.
- W3090789082 hasConcept C185592680 @default.
- W3090789082 hasConcept C185798385 @default.
- W3090789082 hasConcept C192562407 @default.
- W3090789082 hasConcept C198531522 @default.
- W3090789082 hasConcept C199360897 @default.
- W3090789082 hasConcept C204030448 @default.
- W3090789082 hasConcept C204321447 @default.
- W3090789082 hasConcept C205649164 @default.
- W3090789082 hasConcept C2778755073 @default.
- W3090789082 hasConcept C41008148 @default.
- W3090789082 hasConcept C43617362 @default.
- W3090789082 hasConcept C62520636 @default.
- W3090789082 hasConceptScore W3090789082C119857082 @default.
- W3090789082 hasConceptScore W3090789082C121332964 @default.
- W3090789082 hasConceptScore W3090789082C13280743 @default.
- W3090789082 hasConceptScore W3090789082C137293760 @default.
- W3090789082 hasConceptScore W3090789082C154945302 @default.
- W3090789082 hasConceptScore W3090789082C159985019 @default.
- W3090789082 hasConceptScore W3090789082C177264268 @default.
- W3090789082 hasConceptScore W3090789082C178790620 @default.
- W3090789082 hasConceptScore W3090789082C180016635 @default.