Matches in SemOpenAlex for { <https://semopenalex.org/work/W3082716386> ?p ?o ?g. }
- W3082716386 abstract "Bandit and reinforcement learning (RL) problems can often be framed as optimization problems where the goal is to maximize average performance while having access only to stochastic estimates of the true gradient. Traditionally, stochastic optimization theory predicts that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates. In this paper we demonstrate that this is not the case for bandit and RL problems. To allow our analysis to be interpreted in light of multi-step MDPs, we focus on techniques derived from stochastic optimization principles (e.g., natural policy gradient and EXP3) and we show that some standard assumptions from optimization theory are violated in these problems. We present theoretical results showing that, at least for bandit problems, curvature and noise are not sufficient to explain the learning dynamics and that seemingly innocuous choices like the baseline can determine whether an algorithm converges. These theoretical findings match our empirical evaluation, which we extend to multi-state MDPs." @default.
- W3082716386 created "2020-09-08" @default.
- W3082716386 creator A5005411106 @default.
- W3082716386 creator A5015846541 @default.
- W3082716386 creator A5071583471 @default.
- W3082716386 creator A5085413987 @default.
- W3082716386 date "2020-08-31" @default.
- W3082716386 modified "2023-09-27" @default.
- W3082716386 title "Beyond variance reduction: Understanding the true impact of baselines on policy optimization" @default.
- W3082716386 cites W1514587017 @default.
- W3082716386 cites W1998498767 @default.
- W3082716386 cites W2000080679 @default.
- W3082716386 cites W2049934117 @default.
- W3082716386 cites W2069739265 @default.
- W3082716386 cites W2077902449 @default.
- W3082716386 cites W2094387729 @default.
- W3082716386 cites W2108682071 @default.
- W3082716386 cites W2119567691 @default.
- W3082716386 cites W2119717200 @default.
- W3082716386 cites W2121863487 @default.
- W3082716386 cites W2125612430 @default.
- W3082716386 cites W2132083787 @default.
- W3082716386 cites W2196298154 @default.
- W3082716386 cites W2767002724 @default.
- W3082716386 cites W2786303200 @default.
- W3082716386 cites W2940545298 @default.
- W3082716386 cites W2949211412 @default.
- W3082716386 cites W2950492145 @default.
- W3082716386 cites W2951371685 @default.
- W3082716386 cites W2952191563 @default.
- W3082716386 cites W2963424548 @default.
- W3082716386 cites W2963433607 @default.
- W3082716386 cites W2963674921 @default.
- W3082716386 cites W2964055673 @default.
- W3082716386 cites W2964112534 @default.
- W3082716386 cites W2980897206 @default.
- W3082716386 cites W3031765924 @default.
- W3082716386 cites W3034426742 @default.
- W3082716386 cites W3034871777 @default.
- W3082716386 cites W3035598242 @default.
- W3082716386 cites W3037719421 @default.
- W3082716386 cites W3043114440 @default.
- W3082716386 cites W3046626913 @default.
- W3082716386 cites W659523800 @default.
- W3082716386 hasPublicationYear "2020" @default.
- W3082716386 type Work @default.
- W3082716386 sameAs 3082716386 @default.
- W3082716386 citedByCount "3" @default.
- W3082716386 countsByYear W30827163862021 @default.
- W3082716386 crossrefType "posted-content" @default.
- W3082716386 hasAuthorship W3082716386A5005411106 @default.
- W3082716386 hasAuthorship W3082716386A5015846541 @default.
- W3082716386 hasAuthorship W3082716386A5071583471 @default.
- W3082716386 hasAuthorship W3082716386A5085413987 @default.
- W3082716386 hasConcept C111368507 @default.
- W3082716386 hasConcept C115961682 @default.
- W3082716386 hasConcept C120665830 @default.
- W3082716386 hasConcept C121332964 @default.
- W3082716386 hasConcept C121955636 @default.
- W3082716386 hasConcept C126255220 @default.
- W3082716386 hasConcept C12725497 @default.
- W3082716386 hasConcept C127313418 @default.
- W3082716386 hasConcept C137836250 @default.
- W3082716386 hasConcept C154945302 @default.
- W3082716386 hasConcept C162324750 @default.
- W3082716386 hasConcept C192209626 @default.
- W3082716386 hasConcept C194387892 @default.
- W3082716386 hasConcept C195065555 @default.
- W3082716386 hasConcept C196083921 @default.
- W3082716386 hasConcept C2524010 @default.
- W3082716386 hasConcept C33923547 @default.
- W3082716386 hasConcept C41008148 @default.
- W3082716386 hasConcept C62644790 @default.
- W3082716386 hasConcept C97541855 @default.
- W3082716386 hasConcept C99498987 @default.
- W3082716386 hasConceptScore W3082716386C111368507 @default.
- W3082716386 hasConceptScore W3082716386C115961682 @default.
- W3082716386 hasConceptScore W3082716386C120665830 @default.
- W3082716386 hasConceptScore W3082716386C121332964 @default.
- W3082716386 hasConceptScore W3082716386C121955636 @default.
- W3082716386 hasConceptScore W3082716386C126255220 @default.
- W3082716386 hasConceptScore W3082716386C12725497 @default.
- W3082716386 hasConceptScore W3082716386C127313418 @default.
- W3082716386 hasConceptScore W3082716386C137836250 @default.
- W3082716386 hasConceptScore W3082716386C154945302 @default.
- W3082716386 hasConceptScore W3082716386C162324750 @default.
- W3082716386 hasConceptScore W3082716386C192209626 @default.
- W3082716386 hasConceptScore W3082716386C194387892 @default.
- W3082716386 hasConceptScore W3082716386C195065555 @default.
- W3082716386 hasConceptScore W3082716386C196083921 @default.
- W3082716386 hasConceptScore W3082716386C2524010 @default.
- W3082716386 hasConceptScore W3082716386C33923547 @default.
- W3082716386 hasConceptScore W3082716386C41008148 @default.
- W3082716386 hasConceptScore W3082716386C62644790 @default.
- W3082716386 hasConceptScore W3082716386C97541855 @default.
- W3082716386 hasConceptScore W3082716386C99498987 @default.
- W3082716386 hasLocation W30827163861 @default.
- W3082716386 hasOpenAccess W3082716386 @default.
- W3082716386 hasPrimaryLocation W30827163861 @default.
- W3082716386 hasRelatedWork W1551424636 @default.