Matches in SemOpenAlex for { <https://semopenalex.org/work/W3213600112> ?p ?o ?g. }
- W3213600112 abstract "The generalization mystery of overparametrized deep nets has motivated efforts to understand how gradient descent (GD) converges to low-loss solutions that generalize well. Real-life neural networks are initialized from small random values and trained with cross-entropy loss for classification (unlike the lazy or NTK regime of training where analysis was more successful), and a recent sequence of results (Lyu and Li, 2020; Chizat and Bach, 2020; Ji and Telgarsky, 2020) provides theoretical evidence that GD may converge to the max-margin solution with zero loss, which presumably generalizes well. However, the global optimality of margin is proved only in some settings where neural nets are infinitely or exponentially wide. The current paper is able to establish this global optimality for two-layer Leaky ReLU nets trained with gradient flow on linearly separable and symmetric data, regardless of the width. The analysis also gives some theoretical justification for recent empirical findings (Kalimeris et al., 2019) on the so-called simplicity bias of GD towards linear or other classes of solutions, especially early in training. On the pessimistic side, the paper suggests that such results are fragile. A simple data manipulation can make gradient flow converge to a linear classifier with suboptimal margin." @default.
- W3213600112 created "2021-11-22" @default.
- W3213600112 creator A5013194814 @default.
- W3213600112 creator A5030891201 @default.
- W3213600112 creator A5059693721 @default.
- W3213600112 creator A5079951047 @default.
- W3213600112 date "2021-10-26" @default.
- W3213600112 modified "2023-09-27" @default.
- W3213600112 title "Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias" @default.
- W3213600112 cites W1677182931 @default.
- W3213600112 cites W2000985550 @default.
- W3213600112 cites W2068357778 @default.
- W3213600112 cites W2155858901 @default.
- W3213600112 cites W2790253170 @default.
- W3213600112 cites W2790814024 @default.
- W3213600112 cites W2806252860 @default.
- W3213600112 cites W2809090039 @default.
- W3213600112 cites W2899748887 @default.
- W3213600112 cites W2900103278 @default.
- W3213600112 cites W2900959181 @default.
- W3213600112 cites W2911742574 @default.
- W3213600112 cites W2913180865 @default.
- W3213600112 cites W2951520791 @default.
- W3213600112 cites W2952204734 @default.
- W3213600112 cites W2962698540 @default.
- W3213600112 cites W2963100491 @default.
- W3213600112 cites W2963239103 @default.
- W3213600112 cites W2963285844 @default.
- W3213600112 cites W2963664410 @default.
- W3213600112 cites W2963798163 @default.
- W3213600112 cites W2963826371 @default.
- W3213600112 cites W2963837241 @default.
- W3213600112 cites W2964031251 @default.
- W3213600112 cites W2964072686 @default.
- W3213600112 cites W2964084001 @default.
- W3213600112 cites W2964161337 @default.
- W3213600112 cites W2964210434 @default.
- W3213600112 cites W2964220724 @default.
- W3213600112 cites W2965772785 @default.
- W3213600112 cites W2970166047 @default.
- W3213600112 cites W2970259623 @default.
- W3213600112 cites W2971127900 @default.
- W3213600112 cites W2989989463 @default.
- W3213600112 cites W2994848047 @default.
- W3213600112 cites W2995625976 @default.
- W3213600112 cites W2996168800 @default.
- W3213600112 cites W2996210821 @default.
- W3213600112 cites W3046680711 @default.
- W3213600112 cites W3046864714 @default.
- W3213600112 cites W3099180329 @default.
- W3213600112 cites W3100200769 @default.
- W3213600112 cites W3100332668 @default.
- W3213600112 cites W3105547603 @default.
- W3213600112 cites W3121845051 @default.
- W3213600112 cites W3122402965 @default.
- W3213600112 cites W3122626815 @default.
- W3213600112 cites W3129089775 @default.
- W3213600112 cites W3137695714 @default.
- W3213600112 cites W3142026569 @default.
- W3213600112 cites W3164705719 @default.
- W3213600112 cites W3168121459 @default.
- W3213600112 cites W3169764433 @default.
- W3213600112 cites W3171560591 @default.
- W3213600112 hasPublicationYear "2021" @default.
- W3213600112 type Work @default.
- W3213600112 sameAs 3213600112 @default.
- W3213600112 citedByCount "0" @default.
- W3213600112 crossrefType "posted-content" @default.
- W3213600112 hasAuthorship W3213600112A5013194814 @default.
- W3213600112 hasAuthorship W3213600112A5030891201 @default.
- W3213600112 hasAuthorship W3213600112A5059693721 @default.
- W3213600112 hasAuthorship W3213600112A5079951047 @default.
- W3213600112 hasConcept C11413529 @default.
- W3213600112 hasConcept C119857082 @default.
- W3213600112 hasConcept C126255220 @default.
- W3213600112 hasConcept C134306372 @default.
- W3213600112 hasConcept C153258448 @default.
- W3213600112 hasConcept C154945302 @default.
- W3213600112 hasConcept C167879884 @default.
- W3213600112 hasConcept C28826006 @default.
- W3213600112 hasConcept C33923547 @default.
- W3213600112 hasConcept C41008148 @default.
- W3213600112 hasConcept C50644808 @default.
- W3213600112 hasConcept C774472 @default.
- W3213600112 hasConceptScore W3213600112C11413529 @default.
- W3213600112 hasConceptScore W3213600112C119857082 @default.
- W3213600112 hasConceptScore W3213600112C126255220 @default.
- W3213600112 hasConceptScore W3213600112C134306372 @default.
- W3213600112 hasConceptScore W3213600112C153258448 @default.
- W3213600112 hasConceptScore W3213600112C154945302 @default.
- W3213600112 hasConceptScore W3213600112C167879884 @default.
- W3213600112 hasConceptScore W3213600112C28826006 @default.
- W3213600112 hasConceptScore W3213600112C33923547 @default.
- W3213600112 hasConceptScore W3213600112C41008148 @default.
- W3213600112 hasConceptScore W3213600112C50644808 @default.
- W3213600112 hasConceptScore W3213600112C774472 @default.
- W3213600112 hasLocation W32136001121 @default.
- W3213600112 hasOpenAccess W3213600112 @default.
- W3213600112 hasPrimaryLocation W32136001121 @default.
- W3213600112 hasRelatedWork W2102054292 @default.
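
Each match above follows the quad pattern from the header (`?p ?o ?g` for a fixed subject): a subject, a predicate, an object, and a graph name after `@`. The following is a minimal sketch of reading such lines into tuples, assuming the line shape `- <subject> <predicate> <object> @<graph>.` seen in this listing. The names `Quad`, `parse_line`, and `group_by_predicate` are hypothetical helpers for illustration and are not part of SemOpenAlex.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Quad(NamedTuple):
    """One listing line: subject, predicate, object, graph (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed line shape: "- <subject> <predicate> <object> @<graph>."
# The object may be a quoted literal containing spaces, so it is matched lazily.
_LINE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.?\s*$', re.DOTALL)


def parse_line(line: str) -> Optional[Quad]:
    """Parse one listing line; return None for lines that do not fit the shape."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Drop the surrounding double quotes of a literal value, if present.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return Quad(subject, predicate, obj, graph)


def group_by_predicate(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect object values per predicate, skipping unparseable lines."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        quad = parse_line(line)
        if quad is not None:
            grouped[quad.predicate].append(quad.obj)
    return dict(grouped)


if __name__ == "__main__":
    # A few lines copied from the listing above.
    sample = [
        'Matches in SemOpenAlex for { <https://semopenalex.org/work/W3213600112> ?p ?o ?g. }',
        '- W3213600112 created "2021-11-22" @default.',
        '- W3213600112 cites W1677182931 @default.',
        '- W3213600112 cites W2000985550 @default.',
        '- W3213600112 type Work @default.',
    ]
    for predicate, objects in group_by_predicate(sample).items():
        print(predicate, objects)
```

Grouping by predicate makes it easy to separate the single-valued fields (`title`, `date`, `created`) from the multi-valued ones (`cites`, `creator`, `hasConcept`) in a listing like this one.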