Matches in SemOpenAlex for { <https://semopenalex.org/work/W3113303810> ?p ?o ?g. }
- W3113303810 abstract "We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the SAME architecture, trained using the SAME algorithm on the SAME data set, and they only differ by the random seeds used in the initialization. We show that ensemble/knowledge distillation in Deep Learning works very differently from traditional learning theory (such as boosting or NTKs, neural tangent kernels). To properly understand them, we develop a theory showing that when data has a structure we refer to as "multi-view", then ensemble of independently trained neural networks can provably improve test accuracy, and such superior test accuracy can also be provably distilled into a single model by training a single model to match the output of the ensemble instead of the true label. Our result sheds light on how ensemble works in deep learning in a way that is completely different from traditional theorems, and how the "dark knowledge" is hidden in the outputs of the ensemble and can be used in distillation. In the end, we prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy." @default.
- W3113303810 created "2020-12-21" @default.
- W3113303810 creator A5048981295 @default.
- W3113303810 creator A5070340856 @default.
- W3113303810 date "2020-12-17" @default.
- W3113303810 modified "2023-09-24" @default.
- W3113303810 title "Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning" @default.
- W3113303810 cites W1534477342 @default.
- W3113303810 cites W1540358749 @default.
- W3113303810 cites W1546113605 @default.
- W3113303810 cites W1587157779 @default.
- W3113303810 cites W1597142226 @default.
- W3113303810 cites W1622484071 @default.
- W3113303810 cites W1678356000 @default.
- W3113303810 cites W1821462560 @default.
- W3113303810 cites W1967165756 @default.
- W3113303810 cites W1975846642 @default.
- W3113303810 cites W1988790447 @default.
- W3113303810 cites W1992018127 @default.
- W3113303810 cites W2009067475 @default.
- W3113303810 cites W2017337590 @default.
- W3113303810 cites W2024046085 @default.
- W3113303810 cites W2099454382 @default.
- W3113303810 cites W2100128988 @default.
- W3113303810 cites W2100805904 @default.
- W3113303810 cites W2107624415 @default.
- W3113303810 cites W2109094355 @default.
- W3113303810 cites W2113242816 @default.
- W3113303810 cites W2124868070 @default.
- W3113303810 cites W2124951716 @default.
- W3113303810 cites W2128073546 @default.
- W3113303810 cites W2135293965 @default.
- W3113303810 cites W2141014294 @default.
- W3113303810 cites W2142834259 @default.
- W3113303810 cites W2145073242 @default.
- W3113303810 cites W2150757437 @default.
- W3113303810 cites W2155806188 @default.
- W3113303810 cites W2158275940 @default.
- W3113303810 cites W2164073372 @default.
- W3113303810 cites W2167055186 @default.
- W3113303810 cites W2167917621 @default.
- W3113303810 cites W2171809276 @default.
- W3113303810 cites W2399994860 @default.
- W3113303810 cites W2401231614 @default.
- W3113303810 cites W2508418541 @default.
- W3113303810 cites W2557139899 @default.
- W3113303810 cites W2573849764 @default.
- W3113303810 cites W2587694128 @default.
- W3113303810 cites W2593709294 @default.
- W3113303810 cites W2593958421 @default.
- W3113303810 cites W2625063094 @default.
- W3113303810 cites W2633884958 @default.
- W3113303810 cites W2747909401 @default.
- W3113303810 cites W2765701314 @default.
- W3113303810 cites W2766371994 @default.
- W3113303810 cites W2791315675 @default.
- W3113303810 cites W2800987155 @default.
- W3113303810 cites W2803023299 @default.
- W3113303810 cites W2809090039 @default.
- W3113303810 cites W2886067286 @default.
- W3113303810 cites W2894604724 @default.
- W3113303810 cites W2898871740 @default.
- W3113303810 cites W2899748887 @default.
- W3113303810 cites W2900959181 @default.
- W3113303810 cites W2911867426 @default.
- W3113303810 cites W2912934387 @default.
- W3113303810 cites W2913473169 @default.
- W3113303810 cites W2913892099 @default.
- W3113303810 cites W2937297214 @default.
- W3113303810 cites W2948009788 @default.
- W3113303810 cites W2949798199 @default.
- W3113303810 cites W2949978219 @default.
- W3113303810 cites W2952318479 @default.
- W3113303810 cites W2958636547 @default.
- W3113303810 cites W2962698540 @default.
- W3113303810 cites W2962939986 @default.
- W3113303810 cites W2963417959 @default.
- W3113303810 cites W2963446085 @default.
- W3113303810 cites W2963519230 @default.
- W3113303810 cites W2964031251 @default.
- W3113303810 cites W2964220233 @default.
- W3113303810 cites W2964337047 @default.
- W3113303810 cites W2970241199 @default.
- W3113303810 cites W2970332347 @default.
- W3113303810 cites W2970618525 @default.
- W3113303810 cites W2971043187 @default.
- W3113303810 cites W2971169274 @default.
- W3113303810 cites W2972392510 @default.
- W3113303810 cites W2987861506 @default.
- W3113303810 cites W2991290085 @default.
- W3113303810 cites W3000127803 @default.
- W3113303810 cites W3006051380 @default.
- W3113303810 cites W3010154184 @default.
- W3113303810 cites W3010476364 @default.
- W3113303810 cites W3028525609 @default.
- W3113303810 cites W3046749015 @default.
- W3113303810 cites W3103722330 @default.
- W3113303810 cites W3118608800 @default.
- W3113303810 cites W3152114226 @default.
- W3113303810 doi "https://doi.org/10.48550/arxiv.2012.09816" @default.
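The header's match pattern `{ <…W3113303810> ?p ?o ?g. }` reads as a quad pattern: every predicate/object pair for the work, together with the named graph it lives in. A minimal sketch of how one might reconstruct this listing as a SPARQL query is below; the `GRAPH ?g` reading of the fourth variable and the endpoint URL mentioned in the comment are assumptions, so verify them against SemOpenAlex's own documentation before use.

```python
# Sketch: build the SPARQL query corresponding to the match pattern above.
WORK = "https://semopenalex.org/work/W3113303810"

def build_query(work_uri: str) -> str:
    """Return a SPARQL query selecting every (predicate, object, graph)
    triple whose subject is the given work URI."""
    return (
        "SELECT ?p ?o ?g WHERE {\n"
        f"  GRAPH ?g {{ <{work_uri}> ?p ?o . }}\n"
        "}"
    )

query = build_query(WORK)
print(query)

# To actually run it (untested here; requires network access, and the
# endpoint address is an assumption -- check SemOpenAlex's docs):
# import urllib.parse, urllib.request
# url = ("https://semopenalex.org/sparql?"
#        + urllib.parse.urlencode({"query": query, "format": "json"}))
# with urllib.request.urlopen(url) as resp:
#     print(resp.read()[:200])
```

Keeping the query construction separate from its execution makes the pattern easy to inspect and to reuse for other work IDs.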