Matches in SemOpenAlex for { <https://semopenalex.org/work/W3136604584> ?p ?o ?g. }
Showing items 1 to 78 of 78 with 100 items per page.
- W3136604584 abstract "Understanding the algorithmic bias of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus on very small or even infinitesimal learning rate regime, and fail to cover practical scenarios where the learning rate is moderate and annealing. In this paper, we make an initial attempt to characterize the particular regularization effect of SGD in the moderate learning rate regime by studying its behavior for optimizing an overparameterized linear regression problem. In this case, SGD and GD are known to converge to the unique minimum-norm solution; however, with the moderate and annealing learning rate, we show that they exhibit different directional bias: SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions. Furthermore, we show that such directional bias does matter when early stopping is adopted, where the SGD output is nearly optimal but the GD output is suboptimal. Finally, our theory explains several folk arts in practice used for SGD hyperparameter tuning, such as (1) linearly scaling the initial learning rate with batch size; and (2) overrunning SGD with high learning rate even when the loss stops decreasing." @default.
- W3136604584 created "2021-03-29" @default.
- W3136604584 creator A5014462874 @default.
- W3136604584 creator A5051448391 @default.
- W3136604584 creator A5053736722 @default.
- W3136604584 creator A5085848346 @default.
- W3136604584 date "2020-09-28" @default.
- W3136604584 modified "2023-09-28" @default.
- W3136604584 title "Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate" @default.
- W3136604584 hasPublicationYear "2020" @default.
- W3136604584 type Work @default.
- W3136604584 sameAs 3136604584 @default.
- W3136604584 citedByCount "1" @default.
- W3136604584 countsByYear W31366045842021 @default.
- W3136604584 crossrefType "proceedings-article" @default.
- W3136604584 hasAuthorship W3136604584A5014462874 @default.
- W3136604584 hasAuthorship W3136604584A5051448391 @default.
- W3136604584 hasAuthorship W3136604584A5053736722 @default.
- W3136604584 hasAuthorship W3136604584A5085848346 @default.
- W3136604584 hasConcept C108583219 @default.
- W3136604584 hasConcept C11413529 @default.
- W3136604584 hasConcept C119857082 @default.
- W3136604584 hasConcept C121332964 @default.
- W3136604584 hasConcept C126255220 @default.
- W3136604584 hasConcept C154945302 @default.
- W3136604584 hasConcept C158693339 @default.
- W3136604584 hasConcept C206688291 @default.
- W3136604584 hasConcept C2524010 @default.
- W3136604584 hasConcept C28826006 @default.
- W3136604584 hasConcept C33923547 @default.
- W3136604584 hasConcept C41008148 @default.
- W3136604584 hasConcept C50644808 @default.
- W3136604584 hasConcept C62520636 @default.
- W3136604584 hasConcept C8642999 @default.
- W3136604584 hasConcept C99844830 @default.
- W3136604584 hasConceptScore W3136604584C108583219 @default.
- W3136604584 hasConceptScore W3136604584C11413529 @default.
- W3136604584 hasConceptScore W3136604584C119857082 @default.
- W3136604584 hasConceptScore W3136604584C121332964 @default.
- W3136604584 hasConceptScore W3136604584C126255220 @default.
- W3136604584 hasConceptScore W3136604584C154945302 @default.
- W3136604584 hasConceptScore W3136604584C158693339 @default.
- W3136604584 hasConceptScore W3136604584C206688291 @default.
- W3136604584 hasConceptScore W3136604584C2524010 @default.
- W3136604584 hasConceptScore W3136604584C28826006 @default.
- W3136604584 hasConceptScore W3136604584C33923547 @default.
- W3136604584 hasConceptScore W3136604584C41008148 @default.
- W3136604584 hasConceptScore W3136604584C50644808 @default.
- W3136604584 hasConceptScore W3136604584C62520636 @default.
- W3136604584 hasConceptScore W3136604584C8642999 @default.
- W3136604584 hasConceptScore W3136604584C99844830 @default.
- W3136604584 hasLocation W31366045841 @default.
- W3136604584 hasOpenAccess W3136604584 @default.
- W3136604584 hasPrimaryLocation W31366045841 @default.
- W3136604584 hasRelatedWork W2807930578 @default.
- W3136604584 hasRelatedWork W2884583528 @default.
- W3136604584 hasRelatedWork W2898987415 @default.
- W3136604584 hasRelatedWork W2902123811 @default.
- W3136604584 hasRelatedWork W2905181074 @default.
- W3136604584 hasRelatedWork W2951196414 @default.
- W3136604584 hasRelatedWork W2952132225 @default.
- W3136604584 hasRelatedWork W2962712496 @default.
- W3136604584 hasRelatedWork W2962781506 @default.
- W3136604584 hasRelatedWork W2963052924 @default.
- W3136604584 hasRelatedWork W2963317585 @default.
- W3136604584 hasRelatedWork W2996642285 @default.
- W3136604584 hasRelatedWork W2998281275 @default.
- W3136604584 hasRelatedWork W3037278715 @default.
- W3136604584 hasRelatedWork W3094062549 @default.
- W3136604584 hasRelatedWork W3097466446 @default.
- W3136604584 hasRelatedWork W3098727227 @default.
- W3136604584 hasRelatedWork W3123802267 @default.
- W3136604584 hasRelatedWork W3158775690 @default.
- W3136604584 hasRelatedWork W3194706005 @default.
- W3136604584 isParatext "false" @default.
- W3136604584 isRetracted "false" @default.
- W3136604584 magId "3136604584" @default.
- W3136604584 workType "article" @default.
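
The abstract above states that, on an overparameterized linear regression problem, GD is known to converge to the unique minimum-norm solution. A minimal numerical sketch of that one statement follows. It is not the paper's code: the synthetic Gaussian data, the zero initialization, the constant step size, and the helper name `gd_least_squares` are all assumptions made for illustration, and it does not reproduce the directional-bias or early-stopping results.

```python
import numpy as np


def gd_least_squares(X, y, lr, steps):
    """Full-batch gradient descent on the mean squared error, started at zero.

    Hypothetical helper for illustration; zero initialization is an assumption.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # gradient of (1 / 2n) * ||Xw - y||^2
        w -= lr * grad
    return w


# Overparameterized setting: fewer samples (n = 5) than parameters (d = 20).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)

# Constant step size 1 / lambda_max of the scaled Gram matrix (an assumed choice).
lr = 1.0 / np.linalg.eigvalsh(X @ X.T / X.shape[0]).max()

w_gd = gd_least_squares(X, y, lr, steps=5000)
w_min_norm = np.linalg.pinv(X) @ y  # minimum-norm interpolating solution

print("distance to min-norm solution:", np.linalg.norm(w_gd - w_min_norm))
print("training residual:", np.linalg.norm(X @ w_gd - y))
```

Starting from zero keeps every iterate in the row space of `X`, which is why the limit coincides with the pseudoinverse solution rather than some other interpolating solution.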