Matches in SemOpenAlex for { <https://semopenalex.org/work/W3039845099> ?p ?o ?g. }
- W3039845099 abstract "Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution or how they cope with approximation error due to using a restricted class of parametric policies. This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: tabular policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy; and parametric policy classes (considering both log-linear and neural policy classes), which may not contain the optimal policy and where we provide agnostic learning results. One central contribution of this work is in providing approximation guarantees that are average case -- which avoid explicit worst-case dependencies on the size of state space -- by making a formal connection to supervised learning under distribution shift. This characterization shows an important interplay between estimation error, approximation error, and exploration (as characterized through a precisely defined condition number)." @default.
- W3039845099 created "2020-07-10" @default.
- W3039845099 creator A5018792915 @default.
- W3039845099 creator A5031473871 @default.
- W3039845099 creator A5036435487 @default.
- W3039845099 creator A5059740024 @default.
- W3039845099 date "2019-08-01" @default.
- W3039845099 modified "2023-09-23" @default.
- W3039845099 title "On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift" @default.
- W3039845099 cites W107583932 @default.
- W3039845099 cites W114364418 @default.
- W3039845099 cites W1499669280 @default.
- W3039845099 cites W1505731132 @default.
- W3039845099 cites W1553598118 @default.
- W3039845099 cites W1564755532 @default.
- W3039845099 cites W1570963478 @default.
- W3039845099 cites W1575592356 @default.
- W3039845099 cites W1576452626 @default.
- W3039845099 cites W1597303641 @default.
- W3039845099 cites W1730555343 @default.
- W3039845099 cites W1771410628 @default.
- W3039845099 cites W187211397 @default.
- W3039845099 cites W1889629917 @default.
- W3039845099 cites W1987083649 @default.
- W3039845099 cites W1988790447 @default.
- W3039845099 cites W1993411524 @default.
- W3039845099 cites W2009941369 @default.
- W3039845099 cites W2021361347 @default.
- W3039845099 cites W2054434031 @default.
- W3039845099 cites W2074680702 @default.
- W3039845099 cites W2077723394 @default.
- W3039845099 cites W2094387729 @default.
- W3039845099 cites W2106017103 @default.
- W3039845099 cites W2115738253 @default.
- W3039845099 cites W2119579400 @default.
- W3039845099 cites W2119717200 @default.
- W3039845099 cites W2122689259 @default.
- W3039845099 cites W2128812357 @default.
- W3039845099 cites W2129732816 @default.
- W3039845099 cites W2130801532 @default.
- W3039845099 cites W2155027007 @default.
- W3039845099 cites W2157016390 @default.
- W3039845099 cites W2165421048 @default.
- W3039845099 cites W2397607997 @default.
- W3039845099 cites W2489939061 @default.
- W3039845099 cites W2545659366 @default.
- W3039845099 cites W2619268125 @default.
- W3039845099 cites W2736601468 @default.
- W3039845099 cites W2945496654 @default.
- W3039845099 cites W2948432982 @default.
- W3039845099 cites W2949099147 @default.
- W3039845099 cites W2949212702 @default.
- W3039845099 cites W2952500758 @default.
- W3039845099 cites W2956123884 @default.
- W3039845099 cites W2962749646 @default.
- W3039845099 cites W2962785728 @default.
- W3039845099 cites W2962821147 @default.
- W3039845099 cites W2962901215 @default.
- W3039845099 cites W2963092340 @default.
- W3039845099 cites W2963248893 @default.
- W3039845099 cites W2963670858 @default.
- W3039845099 cites W2963884015 @default.
- W3039845099 cites W2964043796 @default.
- W3039845099 cites W2964106499 @default.
- W3039845099 cites W2970355847 @default.
- W3039845099 cites W3046395471 @default.
- W3039845099 cites W3117137507 @default.
- W3039845099 cites W607505555 @default.
- W3039845099 hasPublicationYear "2019" @default.
- W3039845099 type Work @default.
- W3039845099 sameAs 3039845099 @default.
- W3039845099 citedByCount "76" @default.
- W3039845099 countsByYear W30398450992018 @default.
- W3039845099 countsByYear W30398450992019 @default.
- W3039845099 countsByYear W30398450992020 @default.
- W3039845099 countsByYear W30398450992021 @default.
- W3039845099 countsByYear W30398450992022 @default.
- W3039845099 crossrefType "posted-content" @default.
- W3039845099 hasAuthorship W3039845099A5018792915 @default.
- W3039845099 hasAuthorship W3039845099A5031473871 @default.
- W3039845099 hasAuthorship W3039845099A5036435487 @default.
- W3039845099 hasAuthorship W3039845099A5059740024 @default.
- W3039845099 hasConcept C105795698 @default.
- W3039845099 hasConcept C106189395 @default.
- W3039845099 hasConcept C117251300 @default.
- W3039845099 hasConcept C122383733 @default.
- W3039845099 hasConcept C126255220 @default.
- W3039845099 hasConcept C127162648 @default.
- W3039845099 hasConcept C151730666 @default.
- W3039845099 hasConcept C154945302 @default.
- W3039845099 hasConcept C159886148 @default.
- W3039845099 hasConcept C162324750 @default.
- W3039845099 hasConcept C2777212361 @default.
- W3039845099 hasConcept C2777303404 @default.
- W3039845099 hasConcept C2779343474 @default.
- W3039845099 hasConcept C28826006 @default.
- W3039845099 hasConcept C31258907 @default.
- W3039845099 hasConcept C33923547 @default.
- W3039845099 hasConcept C41008148 @default.
- W3039845099 hasConcept C50522688 @default.
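
The listing above follows one shape per line: `- subject predicate object @graph.` A minimal sketch of how such lines could be grouped by predicate is given below. It assumes that exact line shape; the function name `parse_listing` and the regular expression are illustrative and not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed line shape: "- <subject> <predicate> <object> @<graph>."
# The object may be a quoted literal containing spaces (e.g. the abstract).
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_listing(text):
    """Group object values by predicate; lines that do not match are skipped."""
    grouped = defaultdict(list)
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # e.g. the "Matches in SemOpenAlex for {...}" header line
        _subject, predicate, obj, _graph = match.groups()
        # Drop the surrounding double quotes of literal values, if present.
        grouped[predicate].append(obj.strip('"'))
    return dict(grouped)


if __name__ == "__main__":
    sample = "\n".join([
        '- W3039845099 cites W107583932 @default.',
        '- W3039845099 cites W114364418 @default.',
        '- W3039845099 citedByCount "76" @default.',
    ])
    result = parse_listing(sample)
    print(len(result["cites"]), result["citedByCount"])
```

Grouping by predicate makes it straightforward to count, for instance, the `cites` or `hasConcept` entries without reading the listing by hand.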