Matches in SemOpenAlex for { <https://semopenalex.org/work/W3094342783> ?p ?o ?g. }
Showing items 1 to 96 of 96, with 100 items per page.
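The header above is a SPARQL quad pattern (subject, predicate, object, graph); the "@default" suffix on each item below records that the statement lives in the default graph. As a minimal sketch, the same listing could be reproduced with the query below, assuming the public SemOpenAlex SPARQL endpoint (commonly https://semopenalex.org/sparql; the endpoint URL is an assumption, not part of this listing):

    # List every outgoing statement of the work, one row per (predicate, object) pair.
    SELECT ?p ?o
    WHERE {
      <https://semopenalex.org/work/W3094342783> ?p ?o .
    }
    ORDER BY ?p

Run against the same data, this query should return the 96 rows itemized below.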
- W3094342783 abstract "Recent state-of-the-art approaches to summarization utilize large pre-trained Transformer models. Distilling these models to smaller student models has become critically important for practical use; however there are many different distillation methods proposed by the NLP literature. Recent work on distilling BERT for classification and regression tasks shows strong performance using direct knowledge distillation. Alternatively, machine translation practitioners distill using pseudo-labeling, where a small model is trained on the translations of a larger model. A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning. We compare these three approaches for distillation of Pegasus and BART, the current and former state of the art, pre-trained summarization models, and find that SFT outperforms knowledge distillation and pseudo-labeling on the CNN/DailyMail dataset, but under-performs pseudo-labeling on the more abstractive XSUM dataset. PyTorch Code and checkpoints of different sizes are available through Hugging Face transformers here http://tiny.cc/4iy0tz." @default.
- W3094342783 created "2020-10-29" @default.
- W3094342783 creator A5042449490 @default.
- W3094342783 creator A5085355324 @default.
- W3094342783 date "2020-10-24" @default.
- W3094342783 modified "2023-09-23" @default.
- W3094342783 title "Pre-trained Summarization Distillation" @default.
- W3094342783 cites W1821462560 @default.
- W3094342783 cites W1965555277 @default.
- W3094342783 cites W2101105183 @default.
- W3094342783 cites W2512924740 @default.
- W3094342783 cites W2606974598 @default.
- W3094342783 cites W2952942550 @default.
- W3094342783 cites W2956301977 @default.
- W3094342783 cites W2963310665 @default.
- W3094342783 cites W2963341956 @default.
- W3094342783 cites W2963736842 @default.
- W3094342783 cites W2970947975 @default.
- W3094342783 cites W2970971581 @default.
- W3094342783 cites W2974875810 @default.
- W3094342783 cites W2975381464 @default.
- W3094342783 cites W2978017171 @default.
- W3094342783 cites W3005444338 @default.
- W3094342783 cites W3008374555 @default.
- W3094342783 cites W3034457371 @default.
- W3094342783 cites W3034999214 @default.
- W3094342783 cites W3035812575 @default.
- W3094342783 cites W3036463250 @default.
- W3094342783 cites W3082274269 @default.
- W3094342783 cites W3082928416 @default.
- W3094342783 cites W3087395956 @default.
- W3094342783 cites W3107826490 @default.
- W3094342783 doi "https://doi.org/10.48550/arxiv.2010.13002" @default.
- W3094342783 hasPublicationYear "2020" @default.
- W3094342783 type Work @default.
- W3094342783 sameAs 3094342783 @default.
- W3094342783 citedByCount "14" @default.
- W3094342783 countsByYear W30943427832020 @default.
- W3094342783 countsByYear W30943427832021 @default.
- W3094342783 countsByYear W30943427832022 @default.
- W3094342783 countsByYear W30943427832023 @default.
- W3094342783 crossrefType "posted-content" @default.
- W3094342783 hasAuthorship W3094342783A5042449490 @default.
- W3094342783 hasAuthorship W3094342783A5085355324 @default.
- W3094342783 hasBestOaLocation W30943427831 @default.
- W3094342783 hasConcept C119599485 @default.
- W3094342783 hasConcept C119857082 @default.
- W3094342783 hasConcept C127413603 @default.
- W3094342783 hasConcept C153180895 @default.
- W3094342783 hasConcept C154945302 @default.
- W3094342783 hasConcept C165801399 @default.
- W3094342783 hasConcept C170858558 @default.
- W3094342783 hasConcept C177264268 @default.
- W3094342783 hasConcept C178790620 @default.
- W3094342783 hasConcept C185592680 @default.
- W3094342783 hasConcept C199360897 @default.
- W3094342783 hasConcept C203005215 @default.
- W3094342783 hasConcept C204030448 @default.
- W3094342783 hasConcept C204321447 @default.
- W3094342783 hasConcept C2776760102 @default.
- W3094342783 hasConcept C41008148 @default.
- W3094342783 hasConcept C66322947 @default.
- W3094342783 hasConceptScore W3094342783C119599485 @default.
- W3094342783 hasConceptScore W3094342783C119857082 @default.
- W3094342783 hasConceptScore W3094342783C127413603 @default.
- W3094342783 hasConceptScore W3094342783C153180895 @default.
- W3094342783 hasConceptScore W3094342783C154945302 @default.
- W3094342783 hasConceptScore W3094342783C165801399 @default.
- W3094342783 hasConceptScore W3094342783C170858558 @default.
- W3094342783 hasConceptScore W3094342783C177264268 @default.
- W3094342783 hasConceptScore W3094342783C178790620 @default.
- W3094342783 hasConceptScore W3094342783C185592680 @default.
- W3094342783 hasConceptScore W3094342783C199360897 @default.
- W3094342783 hasConceptScore W3094342783C203005215 @default.
- W3094342783 hasConceptScore W3094342783C204030448 @default.
- W3094342783 hasConceptScore W3094342783C204321447 @default.
- W3094342783 hasConceptScore W3094342783C2776760102 @default.
- W3094342783 hasConceptScore W3094342783C41008148 @default.
- W3094342783 hasConceptScore W3094342783C66322947 @default.
- W3094342783 hasLocation W30943427831 @default.
- W3094342783 hasOpenAccess W3094342783 @default.
- W3094342783 hasPrimaryLocation W30943427831 @default.
- W3094342783 hasRelatedWork W2747680751 @default.
- W3094342783 hasRelatedWork W3033862527 @default.
- W3094342783 hasRelatedWork W3094342783 @default.
- W3094342783 hasRelatedWork W3097571385 @default.
- W3094342783 hasRelatedWork W3107474891 @default.
- W3094342783 hasRelatedWork W3206841862 @default.
- W3094342783 hasRelatedWork W4281385036 @default.
- W3094342783 hasRelatedWork W4284703357 @default.
- W3094342783 hasRelatedWork W4287208479 @default.
- W3094342783 hasRelatedWork W4287761227 @default.
- W3094342783 isParatext "false" @default.
- W3094342783 isRetracted "false" @default.
- W3094342783 magId "3094342783" @default.
- W3094342783 workType "article" @default.
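Because the properties above are shown only by their local names, a follow-up query that narrows the listing to the outgoing references can match on the IRI suffix rather than assuming a particular ontology namespace. A hedged sketch, again against the assumed https://semopenalex.org/sparql endpoint:

    # Hypothetical follow-up: keep only the "cites" statements of the work.
    # STRENDS matches the predicate by its local-name suffix, since the full
    # predicate IRIs are not shown in the listing above.
    SELECT ?cited
    WHERE {
      <https://semopenalex.org/work/W3094342783> ?p ?cited .
      FILTER(STRENDS(STR(?p), "cites"))
    }

Against this listing, such a query would return the 25 cited works (W1821462560 through W3107826490).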