Matches in SemOpenAlex for { <https://semopenalex.org/work/W3136879969> ?p ?o ?g. }
- W3136879969 abstract "The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings." @default.
- W3136879969 created "2021-03-29" @default.
- W3136879969 creator A5011999109 @default.
- W3136879969 creator A5030261391 @default.
- W3136879969 creator A5076656836 @default.
- W3136879969 date "2021-03-16" @default.
- W3136879969 modified "2023-09-27" @default.
- W3136879969 title "Deep learning: a statistical viewpoint" @default.
- W3136879969 cites W1480376833 @default.
- W3136879969 cites W1496357020 @default.
- W3136879969 cites W1501882007 @default.
- W3136879969 cites W1511694993 @default.
- W3136879969 cites W1522579744 @default.
- W3136879969 cites W1523240376 @default.
- W3136879969 cites W1526146785 @default.
- W3136879969 cites W1542886316 @default.
- W3136879969 cites W1546851689 @default.
- W3136879969 cites W1573820523 @default.
- W3136879969 cites W1576778649 @default.
- W3136879969 cites W1594986075 @default.
- W3136879969 cites W1614597761 @default.
- W3136879969 cites W1678356000 @default.
- W3136879969 cites W1966280301 @default.
- W3136879969 cites W1967573895 @default.
- W3136879969 cites W196871588 @default.
- W3136879969 cites W1975846642 @default.
- W3136879969 cites W1982032418 @default.
- W3136879969 cites W1988790447 @default.
- W3136879969 cites W2006698588 @default.
- W3136879969 cites W2011039300 @default.
- W3136879969 cites W2012501405 @default.
- W3136879969 cites W2022465039 @default.
- W3136879969 cites W2023163512 @default.
- W3136879969 cites W2024489202 @default.
- W3136879969 cites W2028461624 @default.
- W3136879969 cites W2029538739 @default.
- W3136879969 cites W2042024619 @default.
- W3136879969 cites W2045313701 @default.
- W3136879969 cites W2051580875 @default.
- W3136879969 cites W2053113791 @default.
- W3136879969 cites W2068221105 @default.
- W3136879969 cites W2072059627 @default.
- W3136879969 cites W2083853335 @default.
- W3136879969 cites W2084544490 @default.
- W3136879969 cites W2087258353 @default.
- W3136879969 cites W2099579348 @default.
- W3136879969 cites W2100116519 @default.
- W3136879969 cites W2103164654 @default.
- W3136879969 cites W2104364170 @default.
- W3136879969 cites W2108383531 @default.
- W3136879969 cites W2112545207 @default.
- W3136879969 cites W2115881141 @default.
- W3136879969 cites W2120875792 @default.
- W3136879969 cites W2122111042 @default.
- W3136879969 cites W2123395972 @default.
- W3136879969 cites W2132314398 @default.
- W3136879969 cites W2133442019 @default.
- W3136879969 cites W2134331967 @default.
- W3136879969 cites W2135046866 @default.
- W3136879969 cites W2135825502 @default.
- W3136879969 cites W2139338362 @default.
- W3136879969 cites W2141690599 @default.
- W3136879969 cites W2144902422 @default.
- W3136879969 cites W2147772771 @default.
- W3136879969 cites W2152026596 @default.
- W3136879969 cites W2153714959 @default.
- W3136879969 cites W2154952480 @default.
- W3136879969 cites W2155534369 @default.
- W3136879969 cites W2156876426 @default.
- W3136879969 cites W2159058260 @default.
- W3136879969 cites W2163321856 @default.
- W3136879969 cites W2165758113 @default.
- W3136879969 cites W2170586091 @default.
- W3136879969 cites W2185932763 @default.
- W3136879969 cites W2211925278 @default.
- W3136879969 cites W2215331545 @default.
- W3136879969 cites W2342070830 @default.
- W3136879969 cites W2557283755 @default.
- W3136879969 cites W2570121038 @default.
- W3136879969 cites W2579923771 @default.
- W3136879969 cites W2613715972 @default.
- W3136879969 cites W2752851182 @default.
- W3136879969 cites W2771061327 @default.
- W3136879969 cites W2787248994 @default.
- W3136879969 cites W2790253170 @default.
- W3136879969 cites W2809090039 @default.
- W3136879969 cites W2889737445 @default.
- W3136879969 cites W2896721680 @default.
- W3136879969 cites W2899476926 @default.
- W3136879969 cites W2899748887 @default.
- W3136879969 cites W2903327037 @default.
- W3136879969 cites W2905800572 @default.
- W3136879969 cites W2911742574 @default.
- W3136879969 cites W2923764619 @default.
- W3136879969 cites W2949247311 @default.
- W3136879969 cites W2952204734 @default.
- W3136879969 cites W2959995783 @default.
- W3136879969 cites W2962698540 @default.
- W3136879969 cites W2962857907 @default.
- W3136879969 cites W2963013450 @default.