Matches in SemOpenAlex for { <https://semopenalex.org/work/W2970388773> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W2970388773 endingPage "11680" @default.
- W2970388773 startingPage "11669" @default.
- W2970388773 abstract "Stochastic gradient descent with a large initial learning rate is widely used for training modern neural net architectures. Although a small initial learning rate allows for faster training and better test performance initially, the large learning rate achieves better generalization soon after the learning rate is annealed. Towards explaining this phenomenon, we devise a setting in which we can prove that a two layer network trained with large initial learning rate and annealing provably generalizes better than the same network trained with a small learning rate from the start. The key insight in our analysis is that the order of learning different types of patterns is crucial: because the small learning rate model first memorizes low-noise, hard-to-fit patterns, it generalizes worse on hard-to-generalize, easier-to-fit patterns than its large learning rate counterpart. This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with small initial learning rate, but ignored by the model with large learning rate until after annealing. Our experiments show that this causes the small learning rate model's accuracy on unmodified images to suffer, as it relies too much on the patch early on." @default.
- W2970388773 created "2019-09-05" @default.
- W2970388773 creator A5055799536 @default.
- W2970388773 creator A5061905935 @default.
- W2970388773 creator A5070340856 @default.
- W2970388773 date "2019-07-10" @default.
- W2970388773 modified "2023-10-18" @default.
- W2970388773 title "Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks" @default.
- W2970388773 hasPublicationYear "2019" @default.
- W2970388773 type Work @default.
- W2970388773 sameAs 2970388773 @default.
- W2970388773 citedByCount "46" @default.
- W2970388773 countsByYear W29703887732019 @default.
- W2970388773 countsByYear W29703887732020 @default.
- W2970388773 countsByYear W29703887732021 @default.
- W2970388773 crossrefType "proceedings-article" @default.
- W2970388773 hasAuthorship W2970388773A5055799536 @default.
- W2970388773 hasAuthorship W2970388773A5061905935 @default.
- W2970388773 hasAuthorship W2970388773A5070340856 @default.
- W2970388773 hasConcept C108583219 @default.
- W2970388773 hasConcept C119857082 @default.
- W2970388773 hasConcept C134306372 @default.
- W2970388773 hasConcept C154945302 @default.
- W2970388773 hasConcept C177148314 @default.
- W2970388773 hasConcept C206688291 @default.
- W2970388773 hasConcept C26517878 @default.
- W2970388773 hasConcept C2776135515 @default.
- W2970388773 hasConcept C33923547 @default.
- W2970388773 hasConcept C38652104 @default.
- W2970388773 hasConcept C41008148 @default.
- W2970388773 hasConcept C50644808 @default.
- W2970388773 hasConcept C57869625 @default.
- W2970388773 hasConceptScore W2970388773C108583219 @default.
- W2970388773 hasConceptScore W2970388773C119857082 @default.
- W2970388773 hasConceptScore W2970388773C134306372 @default.
- W2970388773 hasConceptScore W2970388773C154945302 @default.
- W2970388773 hasConceptScore W2970388773C177148314 @default.
- W2970388773 hasConceptScore W2970388773C206688291 @default.
- W2970388773 hasConceptScore W2970388773C26517878 @default.
- W2970388773 hasConceptScore W2970388773C2776135515 @default.
- W2970388773 hasConceptScore W2970388773C33923547 @default.
- W2970388773 hasConceptScore W2970388773C38652104 @default.
- W2970388773 hasConceptScore W2970388773C41008148 @default.
- W2970388773 hasConceptScore W2970388773C50644808 @default.
- W2970388773 hasConceptScore W2970388773C57869625 @default.
- W2970388773 hasLocation W29703887731 @default.
- W2970388773 hasOpenAccess W2970388773 @default.
- W2970388773 hasPrimaryLocation W29703887731 @default.
- W2970388773 hasRelatedWork W2194775991 @default.
- W2970388773 hasRelatedWork W2622263826 @default.
- W2970388773 hasRelatedWork W2768267830 @default.
- W2970388773 hasRelatedWork W2809090039 @default.
- W2970388773 hasRelatedWork W2886067286 @default.
- W2970388773 hasRelatedWork W2899476926 @default.
- W2970388773 hasRelatedWork W2911742574 @default.
- W2970388773 hasRelatedWork W2952204734 @default.
- W2970388773 hasRelatedWork W2962698540 @default.
- W2970388773 hasRelatedWork W2962915600 @default.
- W2970388773 hasRelatedWork W2963069632 @default.
- W2970388773 hasRelatedWork W2963177640 @default.
- W2970388773 hasRelatedWork W2963959597 @default.
- W2970388773 hasRelatedWork W2964072432 @default.
- W2970388773 hasRelatedWork W2964121744 @default.
- W2970388773 hasRelatedWork W2970217468 @default.
- W2970388773 hasRelatedWork W2971043187 @default.
- W2970388773 hasRelatedWork W3009686669 @default.
- W2970388773 hasRelatedWork W3118608800 @default.
- W2970388773 hasRelatedWork W3137695714 @default.
- W2970388773 hasVolume "32" @default.
- W2970388773 isParatext "false" @default.
- W2970388773 isRetracted "false" @default.
- W2970388773 magId "2970388773" @default.
- W2970388773 workType "article" @default.