Matches in SemOpenAlex for { <https://semopenalex.org/work/W3207862512> ?p ?o ?g. }
- W3207862512 abstract "Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time. However, the improved inference speed may still be unsatisfactory for certain time-sensitive applications. In this paper, we aim to further push the limit of inference speed by exploring a new area in the design space of the student model. More specifically, we consider distilling a transformer-based text classifier into a billion-parameter, sparsely-activated student model with an embedding-averaging architecture. Our experiments show that the student models retain 97% of the RoBERTa-Large teacher performance on a collection of six text classification tasks. Meanwhile, the student model achieves up to 600x speed-up on both GPUs and CPUs, compared to the teacher models. Further investigation shows that our pipeline is also effective in privacy-preserving and domain generalization settings." @default.
- W3207862512 created "2021-10-25" @default.
- W3207862512 creator A5004412943 @default.
- W3207862512 creator A5009408707 @default.
- W3207862512 creator A5022956157 @default.
- W3207862512 creator A5049450075 @default.
- W3207862512 creator A5054253075 @default.
- W3207862512 creator A5082970015 @default.
- W3207862512 date "2021-10-16" @default.
- W3207862512 modified "2023-10-16" @default.
- W3207862512 title "Sparse Distillation: Speeding Up Text Classification by Using Bigger Models" @default.
- W3207862512 cites W1821462560 @default.
- W3207862512 cites W2070246124 @default.
- W3207862512 cites W2109664771 @default.
- W3207862512 cites W2113459411 @default.
- W3207862512 cites W2250473257 @default.
- W3207862512 cites W2250653840 @default.
- W3207862512 cites W2540646130 @default.
- W3207862512 cites W2907947679 @default.
- W3207862512 cites W2920807444 @default.
- W3207862512 cites W2924902521 @default.
- W3207862512 cites W2963012544 @default.
- W3207862512 cites W2963341956 @default.
- W3207862512 cites W2965373594 @default.
- W3207862512 cites W2971196067 @default.
- W3207862512 cites W2974875810 @default.
- W3207862512 cites W2975429091 @default.
- W3207862512 cites W2978017171 @default.
- W3207862512 cites W2988217457 @default.
- W3207862512 cites W2997710335 @default.
- W3207862512 cites W3015468748 @default.
- W3207862512 cites W3033529678 @default.
- W3207862512 cites W3034175374 @default.
- W3207862512 cites W3034340181 @default.
- W3207862512 cites W3034457371 @default.
- W3207862512 cites W3035102548 @default.
- W3207862512 cites W3085139254 @default.
- W3207862512 cites W3103412034 @default.
- W3207862512 cites W3104613728 @default.
- W3207862512 cites W3119866685 @default.
- W3207862512 cites W3132730484 @default.
- W3207862512 cites W3152698349 @default.
- W3207862512 cites W3162276117 @default.
- W3207862512 hasPublicationYear "2021" @default.
- W3207862512 type Work @default.
- W3207862512 sameAs 3207862512 @default.
- W3207862512 citedByCount "0" @default.
- W3207862512 crossrefType "posted-content" @default.
- W3207862512 hasAuthorship W3207862512A5004412943 @default.
- W3207862512 hasAuthorship W3207862512A5009408707 @default.
- W3207862512 hasAuthorship W3207862512A5022956157 @default.
- W3207862512 hasAuthorship W3207862512A5049450075 @default.
- W3207862512 hasAuthorship W3207862512A5054253075 @default.
- W3207862512 hasAuthorship W3207862512A5082970015 @default.
- W3207862512 hasBestOaLocation W32078625121 @default.
- W3207862512 hasConcept C11413529 @default.
- W3207862512 hasConcept C119599485 @default.
- W3207862512 hasConcept C119857082 @default.
- W3207862512 hasConcept C123657996 @default.
- W3207862512 hasConcept C127413603 @default.
- W3207862512 hasConcept C134306372 @default.
- W3207862512 hasConcept C142362112 @default.
- W3207862512 hasConcept C153349607 @default.
- W3207862512 hasConcept C154945302 @default.
- W3207862512 hasConcept C165801399 @default.
- W3207862512 hasConcept C173608175 @default.
- W3207862512 hasConcept C177148314 @default.
- W3207862512 hasConcept C178790620 @default.
- W3207862512 hasConcept C185592680 @default.
- W3207862512 hasConcept C199360897 @default.
- W3207862512 hasConcept C204030448 @default.
- W3207862512 hasConcept C2776214188 @default.
- W3207862512 hasConcept C33923547 @default.
- W3207862512 hasConcept C41008148 @default.
- W3207862512 hasConcept C41608201 @default.
- W3207862512 hasConcept C43521106 @default.
- W3207862512 hasConcept C45374587 @default.
- W3207862512 hasConcept C66322947 @default.
- W3207862512 hasConcept C68339613 @default.
- W3207862512 hasConcept C95623464 @default.
- W3207862512 hasConceptScore W3207862512C11413529 @default.
- W3207862512 hasConceptScore W3207862512C119599485 @default.
- W3207862512 hasConceptScore W3207862512C119857082 @default.
- W3207862512 hasConceptScore W3207862512C123657996 @default.
- W3207862512 hasConceptScore W3207862512C127413603 @default.
- W3207862512 hasConceptScore W3207862512C134306372 @default.
- W3207862512 hasConceptScore W3207862512C142362112 @default.
- W3207862512 hasConceptScore W3207862512C153349607 @default.
- W3207862512 hasConceptScore W3207862512C154945302 @default.
- W3207862512 hasConceptScore W3207862512C165801399 @default.
- W3207862512 hasConceptScore W3207862512C173608175 @default.
- W3207862512 hasConceptScore W3207862512C177148314 @default.
- W3207862512 hasConceptScore W3207862512C178790620 @default.
- W3207862512 hasConceptScore W3207862512C185592680 @default.
- W3207862512 hasConceptScore W3207862512C199360897 @default.
- W3207862512 hasConceptScore W3207862512C204030448 @default.
- W3207862512 hasConceptScore W3207862512C2776214188 @default.
- W3207862512 hasConceptScore W3207862512C33923547 @default.
- W3207862512 hasConceptScore W3207862512C41008148 @default.
- W3207862512 hasConceptScore W3207862512C41608201 @default.
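The listing above is the result of matching the pattern { <https://semopenalex.org/work/W3207862512> ?p ?o ?g. } against the SemOpenAlex graph. As a minimal sketch, the same lookup can be expressed as a SPARQL query string; the `GRAPH ?g` wrapping is one interpretation of the `?g` variable (the `@default` suffix above marks the default graph), and the `/sparql` endpoint path is an assumption, not confirmed by this listing.

```python
# Hypothetical sketch: rebuild the triple-pattern lookup shown above as a
# SPARQL query string. Only the query construction runs here; sending it
# to an endpoint (assumed URL below) would require a network call.
from urllib.parse import urlencode


def build_query(work_iri: str) -> str:
    """Return a SPARQL query selecting every (predicate, object, graph)
    binding whose subject is the given work IRI."""
    return (
        "SELECT ?p ?o ?g WHERE { "
        f"GRAPH ?g {{ <{work_iri}> ?p ?o . }} "
        "}"
    )


query = build_query("https://semopenalex.org/work/W3207862512")

# Assumed endpoint URL (hypothetical): the query would be URL-encoded and
# sent via HTTP GET per the SPARQL 1.1 Protocol.
request_url = "https://semopenalex.org/sparql?" + urlencode({"query": query})
```

Each row of the listing (e.g. `W3207862512 cites W1821462560 @default.`) corresponds to one `?p ?o ?g` binding returned by such a query.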