Matches in SemOpenAlex for { <https://semopenalex.org/work/W3008829306> ?p ?o ?g. }
- W3008829306 abstract "We investigate several confounding factors in the evaluation of optimization algorithms for deep learning. Primarily, we take a deeper look at how adaptive gradient methods interact with the learning rate schedule, a notoriously difficult-to-tune hyperparameter which has dramatic effects on the convergence and generalization of neural network training. We introduce a grafting experiment which decouples an update's magnitude from its direction, finding that many existing beliefs in the literature may have arisen from insufficient isolation of the implicit schedule of step sizes. Alongside this contribution, we present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods, aimed at bringing more clarity to this space." @default.
- W3008829306 created "2020-03-06" @default.
- W3008829306 creator A5007875487 @default.
- W3008829306 creator A5024431603 @default.
- W3008829306 creator A5038867895 @default.
- W3008829306 creator A5043228083 @default.
- W3008829306 creator A5053775980 @default.
- W3008829306 date "2020-02-26" @default.
- W3008829306 modified "2023-09-27" @default.
- W3008829306 title "Disentangling Adaptive Gradient Methods from Learning Rates" @default.
- W3008829306 cites W1598866093 @default.
- W3008829306 cites W1686810756 @default.
- W3008829306 cites W1815076433 @default.
- W3008829306 cites W1988720110 @default.
- W3008829306 cites W2093647425 @default.
- W3008829306 cites W2108598243 @default.
- W3008829306 cites W2146502635 @default.
- W3008829306 cites W2148825261 @default.
- W3008829306 cites W2296319761 @default.
- W3008829306 cites W2302255633 @default.
- W3008829306 cites W2402144811 @default.
- W3008829306 cites W2404385938 @default.
- W3008829306 cites W2518108298 @default.
- W3008829306 cites W2564486991 @default.
- W3008829306 cites W2577255746 @default.
- W3008829306 cites W2606722458 @default.
- W3008829306 cites W2622263826 @default.
- W3008829306 cites W2757910899 @default.
- W3008829306 cites W2776855315 @default.
- W3008829306 cites W2781726626 @default.
- W3008829306 cites W2785523195 @default.
- W3008829306 cites W2786263712 @default.
- W3008829306 cites W2796108585 @default.
- W3008829306 cites W2797328513 @default.
- W3008829306 cites W2808042107 @default.
- W3008829306 cites W2891952073 @default.
- W3008829306 cites W2893749619 @default.
- W3008829306 cites W2895674636 @default.
- W3008829306 cites W2896457183 @default.
- W3008829306 cites W2899402383 @default.
- W3008829306 cites W2903382683 @default.
- W3008829306 cites W2911867426 @default.
- W3008829306 cites W2912018747 @default.
- W3008829306 cites W2928941594 @default.
- W3008829306 cites W2930786691 @default.
- W3008829306 cites W2945586457 @default.
- W3008829306 cites W2945697643 @default.
- W3008829306 cites W2949117887 @default.
- W3008829306 cites W2949978754 @default.
- W3008829306 cites W2950813464 @default.
- W3008829306 cites W2951395930 @default.
- W3008829306 cites W2954490083 @default.
- W3008829306 cites W2962716258 @default.
- W3008829306 cites W2962760235 @default.
- W3008829306 cites W2962961534 @default.
- W3008829306 cites W2963000090 @default.
- W3008829306 cites W2963023528 @default.
- W3008829306 cites W2963120839 @default.
- W3008829306 cites W2963139417 @default.
- W3008829306 cites W2963208657 @default.
- W3008829306 cites W2963403868 @default.
- W3008829306 cites W2963433607 @default.
- W3008829306 cites W2963470657 @default.
- W3008829306 cites W2963518130 @default.
- W3008829306 cites W2963664311 @default.
- W3008829306 cites W2963794891 @default.
- W3008829306 cites W2963798163 @default.
- W3008829306 cites W2963826371 @default.
- W3008829306 cites W2964004663 @default.
- W3008829306 cites W2964121744 @default.
- W3008829306 cites W2964125128 @default.
- W3008829306 cites W2964319706 @default.
- W3008829306 cites W2965373594 @default.
- W3008829306 cites W2967536008 @default.
- W3008829306 cites W2970241199 @default.
- W3008829306 cites W2971216715 @default.
- W3008829306 cites W2980149079 @default.
- W3008829306 cites W2980844465 @default.
- W3008829306 cites W2990219166 @default.
- W3008829306 cites W2995435108 @default.
- W3008829306 cites W2995727426 @default.
- W3008829306 cites W3141595720 @default.
- W3008829306 cites W6908809 @default.
- W3008829306 cites W2995353295 @default.
- W3008829306 doi "https://doi.org/10.48550/arxiv.2002.11803" @default.
- W3008829306 hasPublicationYear "2020" @default.
- W3008829306 type Work @default.
- W3008829306 sameAs 3008829306 @default.
- W3008829306 citedByCount "7" @default.
- W3008829306 countsByYear W30088293062020 @default.
- W3008829306 countsByYear W30088293062021 @default.
- W3008829306 countsByYear W30088293062023 @default.
- W3008829306 crossrefType "posted-content" @default.
- W3008829306 hasAuthorship W3008829306A5007875487 @default.
- W3008829306 hasAuthorship W3008829306A5024431603 @default.
- W3008829306 hasAuthorship W3008829306A5038867895 @default.
- W3008829306 hasAuthorship W3008829306A5043228083 @default.
- W3008829306 hasAuthorship W3008829306A5053775980 @default.
- W3008829306 hasBestOaLocation W30088293061 @default.
- W3008829306 hasConcept C108583219 @default.