Matches in SemOpenAlex for { <https://semopenalex.org/work/W3207786655> ?p ?o ?g. }
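The graph pattern above asks for every predicate ?p, object ?o, and graph ?g attached to the work W3207786655; the bullet list below enumerates those matches, all in the default graph. As a minimal sketch, and assuming SemOpenAlex exposes a public SPARQL endpoint at https://semopenalex.org/sparql that returns SPARQL JSON results (both the URL and the response format are assumptions here, and the graph variable is dropped for simplicity), the same matches could be fetched as follows:

```python
import requests

# Hypothetical sketch: query a SemOpenAlex SPARQL endpoint for all
# predicate/object pairs of work W3207786655 (the endpoint URL is an assumption).
ENDPOINT = "https://semopenalex.org/sparql"

QUERY = """
SELECT ?p ?o WHERE {
  <https://semopenalex.org/work/W3207786655> ?p ?o .
}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# Each printed ?p / ?o pair corresponds to one line of the listing below.
for binding in resp.json()["results"]["bindings"]:
    print(binding["p"]["value"], binding["o"]["value"])
```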
- W3207786655 abstract "Understanding the of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold. Intuitively, with a sufficiently small learning rate $eta$, SGD tracks Gradient Descent (GD) until it gets close to such manifold, where the gradient noise prevents further convergence. In such a regime, Blanc et al. (2020) proved that SGD with label noise locally decreases a regularizer-like term, the sharpness of loss, $mathrm{tr}[nabla^2 L]$. The current paper gives a general framework for such analysis by adapting ideas from Katzenberger (1991). It allows in principle a complete characterization for the regularization effect of SGD around such manifold -- i.e., the implicit bias -- using a stochastic differential equation (SDE) describing the limiting dynamics of the parameters, which is determined jointly by the loss function and the noise covariance. This yields some new results: (1) a global analysis of the valid for $eta^{-2}$ steps, in contrast to the local analysis of Blanc et al. (2020) that is only valid for $eta^{-1.6}$ steps and (2) allowing arbitrary noise covariance. As an application, we show with arbitrary large initialization, label noise SGD can always escape the kernel regime and only requires $O(kappaln d)$ samples for learning an $kappa$-sparse overparametrized linear model in $mathbb{R}^d$ (Woodworth et al., 2020), while GD initialized in the kernel regime requires $Omega(d)$ samples. This upper bound is minimax optimal and improves the previous $tilde{O}(kappa^2)$ upper bound (HaoChen et al., 2020)." @default.
- W3207786655 created "2021-10-25" @default.
- W3207786655 creator A5005741793 @default.
- W3207786655 creator A5013194814 @default.
- W3207786655 creator A5079951047 @default.
- W3207786655 date "2021-10-13" @default.
- W3207786655 modified "2023-09-27" @default.
- W3207786655 title "What Happens after SGD Reaches Zero Loss? - A Mathematical Framework." @default.
- W3207786655 cites W1590072560 @default.
- W3207786655 cites W1598866093 @default.
- W3207786655 cites W1684009677 @default.
- W3207786655 cites W1967869046 @default.
- W3207786655 cites W1974001213 @default.
- W3207786655 cites W1987217557 @default.
- W3207786655 cites W2004915807 @default.
- W3207786655 cites W2010029425 @default.
- W3207786655 cites W2132565565 @default.
- W3207786655 cites W2153542182 @default.
- W3207786655 cites W2474090883 @default.
- W3207786655 cites W2593958421 @default.
- W3207786655 cites W2617242334 @default.
- W3207786655 cites W2622263826 @default.
- W3207786655 cites W2766900638 @default.
- W3207786655 cites W2768267830 @default.
- W3207786655 cites W2799160179 @default.
- W3207786655 cites W2804386825 @default.
- W3207786655 cites W2809090039 @default.
- W3207786655 cites W2886067286 @default.
- W3207786655 cites W2892675615 @default.
- W3207786655 cites W2894201461 @default.
- W3207786655 cites W2894604724 @default.
- W3207786655 cites W2899748887 @default.
- W3207786655 cites W2911742574 @default.
- W3207786655 cites W2913473169 @default.
- W3207786655 cites W2914484425 @default.
- W3207786655 cites W2914671377 @default.
- W3207786655 cites W2921013684 @default.
- W3207786655 cites W2924791586 @default.
- W3207786655 cites W2951934643 @default.
- W3207786655 cites W2952204734 @default.
- W3207786655 cites W2958636547 @default.
- W3207786655 cites W2962698540 @default.
- W3207786655 cites W2962754331 @default.
- W3207786655 cites W2962798807 @default.
- W3207786655 cites W2963025848 @default.
- W3207786655 cites W2963239103 @default.
- W3207786655 cites W2963336603 @default.
- W3207786655 cites W2963376662 @default.
- W3207786655 cites W2963384892 @default.
- W3207786655 cites W2963474066 @default.
- W3207786655 cites W2963798163 @default.
- W3207786655 cites W2963959597 @default.
- W3207786655 cites W2964031251 @default.
- W3207786655 cites W2964047251 @default.
- W3207786655 cites W2964072432 @default.
- W3207786655 cites W2970170116 @default.
- W3207786655 cites W2970259623 @default.
- W3207786655 cites W2971067248 @default.
- W3207786655 cites W2995625976 @default.
- W3207786655 cites W3004700632 @default.
- W3207786655 cites W3022001591 @default.
- W3207786655 cites W3034731342 @default.
- W3207786655 cites W3034840375 @default.
- W3207786655 cites W3034877436 @default.
- W3207786655 cites W3046680711 @default.
- W3207786655 cites W3046739385 @default.
- W3207786655 cites W3057525668 @default.
- W3207786655 cites W3093317808 @default.
- W3207786655 cites W3104527631 @default.
- W3207786655 cites W3121845051 @default.
- W3207786655 cites W3130298903 @default.
- W3207786655 cites W3169250519 @default.
- W3207786655 cites W3170960122 @default.
- W3207786655 hasPublicationYear "2021" @default.
- W3207786655 type Work @default.
- W3207786655 sameAs 3207786655 @default.
- W3207786655 citedByCount "0" @default.
- W3207786655 crossrefType "posted-content" @default.
- W3207786655 hasAuthorship W3207786655A5005741793 @default.
- W3207786655 hasAuthorship W3207786655A5013194814 @default.
- W3207786655 hasAuthorship W3207786655A5079951047 @default.
- W3207786655 hasConcept C105795698 @default.
- W3207786655 hasConcept C114466953 @default.
- W3207786655 hasConcept C114614502 @default.
- W3207786655 hasConcept C115961682 @default.
- W3207786655 hasConcept C121332964 @default.
- W3207786655 hasConcept C126255220 @default.
- W3207786655 hasConcept C127413603 @default.
- W3207786655 hasConcept C14036430 @default.
- W3207786655 hasConcept C149728462 @default.
- W3207786655 hasConcept C153258448 @default.
- W3207786655 hasConcept C154945302 @default.
- W3207786655 hasConcept C178650346 @default.
- W3207786655 hasConcept C199360897 @default.
- W3207786655 hasConcept C206688291 @default.
- W3207786655 hasConcept C2776135515 @default.
- W3207786655 hasConcept C2779557605 @default.
- W3207786655 hasConcept C28826006 @default.
- W3207786655 hasConcept C33923547 @default.
- W3207786655 hasConcept C41008148 @default.
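The abstract above centers on label noise SGD: each step perturbs the target before computing the gradient, so even after the training loss is essentially zero the iterates keep drifting along the manifold of minimizers, which is the regime the paper's limiting SDE describes. The toy script below is a hypothetical illustration of that setup on a quadratically overparametrized linear model in the spirit of Woodworth et al. (2020); the parametrization w = u*u - v*v, the data generation, and every hyperparameter are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse regression problem: d-dimensional inputs, k-sparse ground truth.
d, n, k = 50, 20, 2
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:k] = 1.0
y = X @ w_star

# Quadratic overparametrization (illustrative): effective weights w = u*u - v*v.
u = np.full(d, 1.0)
v = np.full(d, 1.0)

def effective_w(u, v):
    return u * u - v * v

eta = 1e-3      # learning rate (illustrative)
sigma = 0.5     # label noise scale (illustrative)
steps = 100_000

for _ in range(steps):
    i = rng.integers(n)
    noisy_y = y[i] + sigma * rng.standard_normal()   # label noise added each step
    w = effective_w(u, v)
    residual = X[i] @ w - noisy_y
    # Half-squared loss L = 0.5 * (x.w - noisy_y)^2; chain rule through w = u*u - v*v
    # gives dL/du = 2 * residual * x * u and dL/dv = -2 * residual * x * v.
    grad_u = 2.0 * residual * X[i] * u
    grad_v = -2.0 * residual * X[i] * v
    u -= eta * grad_u
    v -= eta * grad_v

w = effective_w(u, v)
print("mean squared error on clean targets:", np.mean((X @ w - y) ** 2))
print("largest-magnitude coordinates of w:", np.argsort(-np.abs(w))[:k])
```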