SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W3005981462> ?p ?o ?g. }

W3005981462 endingPage "7727" @default.
W3005981462 startingPage "7717" @default.
W3005981462 abstract "Most modern learning problems are highly overparameterized, i.e., have many more model parameters than the number of training data points. As a result, the training loss may have infinitely many global minima (parameter vectors that perfectly “interpolate” the training data). It is therefore imperative to understand which interpolating solutions we converge to, how they depend on the initialization and learning algorithm, and whether they yield different test errors. In this article, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which stochastic gradient descent (SGD) is a special case. Recently, it has been shown that for overparameterized linear models, SMD converges to the closest global minimum to the initialization point, where closeness is in terms of the Bregman divergence corresponding to the potential function of the mirror descent. With appropriate initialization, this yields convergence to the minimum-potential interpolating solution, a phenomenon referred to as implicit regularization. On the theory side, we show that for sufficiently-overparameterized nonlinear models, SMD with a (small enough) fixed step size converges to a global minimum that is “very close” (in Bregman divergence) to the minimum-potential interpolating solution, thus attaining approximate implicit regularization. On the empirical side, our experiments on the MNIST and CIFAR-10 datasets consistently confirm that the above phenomenon occurs in practical scenarios. 
They further indicate a clear difference in the generalization performances of different SMD algorithms: experiments on the CIFAR-10 dataset with different regularizers, $\ell_{1}$ to encourage sparsity, $\ell_{2}$ (SGD) to encourage small Euclidean norm, and $\ell_{\infty}$ to discourage large components, surprisingly show that the $\ell_{\infty}$ norm consistently yields better generalization performance than SGD, which in turn generalizes better than the $\ell_{1}$ norm." @default.
W3005981462 created "2020-02-24" @default.
W3005981462 creator A5002430773 @default.
W3005981462 creator A5005748450 @default.
W3005981462 creator A5036507616 @default.
W3005981462 date "2022-12-01" @default.
W3005981462 modified "2023-09-26" @default.
W3005981462 title "Stochastic Mirror Descent on Overparameterized Nonlinear Models" @default.
W3005981462 cites W1540586255 @default.
W3005981462 cites W2016384870 @default.
W3005981462 cites W2069317438 @default.
W3005981462 cites W2102800374 @default.
W3005981462 cites W2112796928 @default.
W3005981462 cites W2143612262 @default.
W3005981462 cites W2145339207 @default.
W3005981462 cites W2194775991 @default.
W3005981462 cites W2257979135 @default.
W3005981462 cites W2794413559 @default.
W3005981462 cites W2803423166 @default.
W3005981462 cites W2919115771 @default.
W3005981462 cites W2963177640 @default.
W3005981462 cites W2964047251 @default.
W3005981462 doi "https://doi.org/10.1109/tnnls.2021.3087480" @default.
W3005981462 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/34270431" @default.
W3005981462 hasPublicationYear "2022" @default.
W3005981462 type Work @default.
W3005981462 sameAs 3005981462 @default.
W3005981462 citedByCount "6" @default.
W3005981462 countsByYear W30059814622021 @default.
W3005981462 countsByYear W30059814622023 @default.
W3005981462 crossrefType "journal-article" @default.
W3005981462 hasAuthorship W3005981462A5002430773 @default.
W3005981462 hasAuthorship W3005981462A5005748450 @default.
W3005981462 hasAuthorship W3005981462A5036507616 @default.
W3005981462 hasBestOaLocation W30059814622 @default.
W3005981462 hasConcept C11413529 @default.
W3005981462 hasConcept C114466953 @default.
W3005981462 hasConcept C121332964 @default.
W3005981462 hasConcept C126255220 @default.
W3005981462 hasConcept C134306372 @default.
W3005981462 hasConcept C138885662 @default.
W3005981462 hasConcept C149073432 @default.
W3005981462 hasConcept C153258448 @default.
W3005981462 hasConcept C154945302 @default.
W3005981462 hasConcept C158622935 @default.
W3005981462 hasConcept C186633575 @default.
W3005981462 hasConcept C199360897 @default.
W3005981462 hasConcept C206688291 @default.
W3005981462 hasConcept C207390915 @default.
W3005981462 hasConcept C2776135515 @default.
W3005981462 hasConcept C28826006 @default.
W3005981462 hasConcept C33923547 @default.
W3005981462 hasConcept C41008148 @default.
W3005981462 hasConcept C41895202 @default.
W3005981462 hasConcept C50644808 @default.
W3005981462 hasConcept C62520636 @default.
W3005981462 hasConceptScore W3005981462C11413529 @default.
W3005981462 hasConceptScore W3005981462C114466953 @default.
W3005981462 hasConceptScore W3005981462C121332964 @default.
W3005981462 hasConceptScore W3005981462C126255220 @default.
W3005981462 hasConceptScore W3005981462C134306372 @default.
W3005981462 hasConceptScore W3005981462C138885662 @default.
W3005981462 hasConceptScore W3005981462C149073432 @default.
W3005981462 hasConceptScore W3005981462C153258448 @default.
W3005981462 hasConceptScore W3005981462C154945302 @default.
W3005981462 hasConceptScore W3005981462C158622935 @default.
W3005981462 hasConceptScore W3005981462C186633575 @default.
W3005981462 hasConceptScore W3005981462C199360897 @default.
W3005981462 hasConceptScore W3005981462C206688291 @default.
W3005981462 hasConceptScore W3005981462C207390915 @default.
W3005981462 hasConceptScore W3005981462C2776135515 @default.
W3005981462 hasConceptScore W3005981462C28826006 @default.
W3005981462 hasConceptScore W3005981462C33923547 @default.
W3005981462 hasConceptScore W3005981462C41008148 @default.
W3005981462 hasConceptScore W3005981462C41895202 @default.
W3005981462 hasConceptScore W3005981462C50644808 @default.
W3005981462 hasConceptScore W3005981462C62520636 @default.
W3005981462 hasFunder F4320306076 @default.
W3005981462 hasFunder F4320308258 @default.
W3005981462 hasFunder F4320310598 @default.
W3005981462 hasFunder F4320332375 @default.
W3005981462 hasIssue "12" @default.
W3005981462 hasLocation W30059814621 @default.
W3005981462 hasLocation W30059814622 @default.
W3005981462 hasLocation W30059814623 @default.
W3005981462 hasOpenAccess W3005981462 @default.
W3005981462 hasPrimaryLocation W30059814621 @default.
W3005981462 hasRelatedWork W2900959181 @default.
W3005981462 hasRelatedWork W2948482439 @default.
W3005981462 hasRelatedWork W2948488743 @default.
W3005981462 hasRelatedWork W3005981462 @default.
W3005981462 hasRelatedWork W3042560000 @default.
W3005981462 hasRelatedWork W3096287559 @default.
W3005981462 hasRelatedWork W3103375947 @default.
W3005981462 hasRelatedWork W3204601670 @default.
W3005981462 hasRelatedWork W4287606052 @default.
W3005981462 hasRelatedWork W4287714231 @default.
W3005981462 hasVolume "33" @default.