Matches in SemOpenAlex for { <https://semopenalex.org/work/W3209211538> ?p ?o ?g. }
- W3209211538 abstract "We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions. First, we show that the preferability of optimization methods depends critically on whether stochastic versus exact gradients are used. In particular, unlike the true gradient setting, geometric information cannot be easily exploited in the stochastic case for accelerating policy optimization without detrimental consequences or impractical assumptions. Second, to explain these findings we introduce the concept of committal rate for stochastic policy optimization, and show that this can serve as a criterion for determining almost sure convergence to global optimality. Third, we show that in the absence of external oracle information, which allows an algorithm to determine the difference between optimal and sub-optimal actions given only on-policy samples, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely. That is, an uninformed algorithm either converges to a globally optimal policy with probability $1$ but at a rate no better than $O(1/t)$, or it achieves faster than $O(1/t)$ convergence but then must fail to converge to the globally optimal policy with some positive probability. Finally, we use the committal rate theory to explain why practical policy optimization methods are sensitive to random initialization, then develop an ensemble method that can be guaranteed to achieve near-optimal solutions with high probability." @default.
- W3209211538 created "2021-11-08" @default.
- W3209211538 creator A5010575626 @default.
- W3209211538 creator A5014823249 @default.
- W3209211538 creator A5026521600 @default.
- W3209211538 creator A5069856068 @default.
- W3209211538 creator A5086484914 @default.
- W3209211538 date "2021-10-29" @default.
- W3209211538 modified "2023-10-05" @default.
- W3209211538 title "Understanding the Effect of Stochasticity in Policy Optimization" @default.
- W3209211538 cites W1771410628 @default.
- W3209211538 cites W1805871248 @default.
- W3209211538 cites W1992208280 @default.
- W3209211538 cites W2043806097 @default.
- W3209211538 cites W2130801532 @default.
- W3209211538 cites W2155027007 @default.
- W3209211538 cites W2736601468 @default.
- W3209211538 cites W2945496654 @default.
- W3209211538 cites W2963120839 @default.
- W3209211538 cites W2964043796 @default.
- W3209211538 cites W2964319760 @default.
- W3209211538 cites W2990747716 @default.
- W3209211538 cites W2995931713 @default.
- W3209211538 cites W3004082694 @default.
- W3209211538 cites W3034426742 @default.
- W3209211538 cites W3041970508 @default.
- W3209211538 cites W3042983647 @default.
- W3209211538 cites W3044451384 @default.
- W3209211538 cites W3046626913 @default.
- W3209211538 cites W3082716386 @default.
- W3209211538 cites W3093528669 @default.
- W3209211538 cites W3098410977 @default.
- W3209211538 cites W3109546547 @default.
- W3209211538 cites W3127686539 @default.
- W3209211538 cites W3132054471 @default.
- W3209211538 cites W3132159071 @default.
- W3209211538 cites W3132322668 @default.
- W3209211538 cites W3159422316 @default.
- W3209211538 cites W3160101512 @default.
- W3209211538 hasPublicationYear "2021" @default.
- W3209211538 type Work @default.
- W3209211538 sameAs 3209211538 @default.
- W3209211538 citedByCount "0" @default.
- W3209211538 crossrefType "posted-content" @default.
- W3209211538 hasAuthorship W3209211538A5010575626 @default.
- W3209211538 hasAuthorship W3209211538A5014823249 @default.
- W3209211538 hasAuthorship W3209211538A5026521600 @default.
- W3209211538 hasAuthorship W3209211538A5069856068 @default.
- W3209211538 hasAuthorship W3209211538A5086484914 @default.
- W3209211538 hasConcept C114466953 @default.
- W3209211538 hasConcept C115903868 @default.
- W3209211538 hasConcept C126255220 @default.
- W3209211538 hasConcept C137836250 @default.
- W3209211538 hasConcept C162324750 @default.
- W3209211538 hasConcept C194387892 @default.
- W3209211538 hasConcept C199360897 @default.
- W3209211538 hasConcept C26517878 @default.
- W3209211538 hasConcept C2777303404 @default.
- W3209211538 hasConcept C33923547 @default.
- W3209211538 hasConcept C38652104 @default.
- W3209211538 hasConcept C41008148 @default.
- W3209211538 hasConcept C50522688 @default.
- W3209211538 hasConcept C55166926 @default.
- W3209211538 hasConcept C57869625 @default.
- W3209211538 hasConceptScore W3209211538C114466953 @default.
- W3209211538 hasConceptScore W3209211538C115903868 @default.
- W3209211538 hasConceptScore W3209211538C126255220 @default.
- W3209211538 hasConceptScore W3209211538C137836250 @default.
- W3209211538 hasConceptScore W3209211538C162324750 @default.
- W3209211538 hasConceptScore W3209211538C194387892 @default.
- W3209211538 hasConceptScore W3209211538C199360897 @default.
- W3209211538 hasConceptScore W3209211538C26517878 @default.
- W3209211538 hasConceptScore W3209211538C2777303404 @default.
- W3209211538 hasConceptScore W3209211538C33923547 @default.
- W3209211538 hasConceptScore W3209211538C38652104 @default.
- W3209211538 hasConceptScore W3209211538C41008148 @default.
- W3209211538 hasConceptScore W3209211538C50522688 @default.
- W3209211538 hasConceptScore W3209211538C55166926 @default.
- W3209211538 hasConceptScore W3209211538C57869625 @default.
- W3209211538 hasLocation W32092115381 @default.
- W3209211538 hasOpenAccess W3209211538 @default.
- W3209211538 hasPrimaryLocation W32092115381 @default.
- W3209211538 hasRelatedWork W1521225494 @default.
- W3209211538 hasRelatedWork W1880005013 @default.
- W3209211538 hasRelatedWork W202180931 @default.
- W3209211538 hasRelatedWork W2083849624 @default.
- W3209211538 hasRelatedWork W2151613683 @default.
- W3209211538 hasRelatedWork W2152231846 @default.
- W3209211538 hasRelatedWork W2165150801 @default.
- W3209211538 hasRelatedWork W2292096035 @default.
- W3209211538 hasRelatedWork W2899032397 @default.
- W3209211538 hasRelatedWork W2950650380 @default.
- W3209211538 hasRelatedWork W2963763047 @default.
- W3209211538 hasRelatedWork W2965260795 @default.
- W3209211538 hasRelatedWork W3039845099 @default.
- W3209211538 hasRelatedWork W3046626913 @default.
- W3209211538 hasRelatedWork W3098593577 @default.
- W3209211538 hasRelatedWork W3127035336 @default.
- W3209211538 hasRelatedWork W3167824561 @default.
- W3209211538 hasRelatedWork W3210035896 @default.