Matches in SemOpenAlex for { <https://semopenalex.org/work/W3118291054> ?p ?o ?g. }
- W3118291054 abstract "Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results for the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian, and the finite-horizon linear-quadratic disturbance attenuation problems. As a by-product, our results also provide the first sample complexity for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property, and which is an essential requirement in safety-critical control systems." @default.
- W3118291054 created "2021-01-18" @default.
- W3118291054 creator A5016980595 @default.
- W3118291054 creator A5019604570 @default.
- W3118291054 creator A5026432142 @default.
- W3118291054 creator A5047410441 @default.
- W3118291054 date "2021-01-04" @default.
- W3118291054 modified "2023-09-23" @default.
- W3118291054 title "Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity" @default.
- W3118291054 cites W1191599655 @default.
- W3118291054 cites W1542941925 @default.
- W3118291054 cites W1578630563 @default.
- W3118291054 cites W1579118835 @default.
- W3118291054 cites W1589395967 @default.
- W3118291054 cites W1771410628 @default.
- W3118291054 cites W1965473151 @default.
- W3118291054 cites W1965878388 @default.
- W3118291054 cites W1968315580 @default.
- W3118291054 cites W2005437559 @default.
- W3118291054 cites W2027591506 @default.
- W3118291054 cites W2035326564 @default.
- W3118291054 cites W2039153613 @default.
- W3118291054 cites W2056241376 @default.
- W3118291054 cites W2076626169 @default.
- W3118291054 cites W2080381596 @default.
- W3118291054 cites W2094387729 @default.
- W3118291054 cites W2096035449 @default.
- W3118291054 cites W2103979068 @default.
- W3118291054 cites W2105078254 @default.
- W3118291054 cites W2106929622 @default.
- W3118291054 cites W2130801532 @default.
- W3118291054 cites W2136503687 @default.
- W3118291054 cites W2145618894 @default.
- W3118291054 cites W2155027007 @default.
- W3118291054 cites W2156737235 @default.
- W3118291054 cites W2168565265 @default.
- W3118291054 cites W2173248099 @default.
- W3118291054 cites W2296319761 @default.
- W3118291054 cites W2575731723 @default.
- W3118291054 cites W2602963933 @default.
- W3118291054 cites W2736601468 @default.
- W3118291054 cites W2822752092 @default.
- W3118291054 cites W2886474253 @default.
- W3118291054 cites W2946019081 @default.
- W3118291054 cites W2949608212 @default.
- W3118291054 cites W2950300520 @default.
- W3118291054 cites W2953916822 @default.
- W3118291054 cites W2963774238 @default.
- W3118291054 cites W2964070557 @default.
- W3118291054 cites W2964189368 @default.
- W3118291054 cites W2964990165 @default.
- W3118291054 cites W2970416783 @default.
- W3118291054 cites W2970537473 @default.
- W3118291054 cites W2970802174 @default.
- W3118291054 cites W2984078320 @default.
- W3118291054 cites W2991046523 @default.
- W3118291054 cites W2994779591 @default.
- W3118291054 cites W2998481680 @default.
- W3118291054 cites W3007136572 @default.
- W3118291054 cites W3008744877 @default.
- W3118291054 cites W3019868075 @default.
- W3118291054 cites W3020047188 @default.
- W3118291054 cites W3034862928 @default.
- W3118291054 cites W3035665052 @default.
- W3118291054 cites W3037916866 @default.
- W3118291054 cites W3042871037 @default.
- W3118291054 cites W3046017196 @default.
- W3118291054 cites W3046090228 @default.
- W3118291054 cites W3046992251 @default.
- W3118291054 cites W3087176855 @default.
- W3118291054 cites W3091380613 @default.
- W3118291054 cites W3095407942 @default.
- W3118291054 cites W3100292177 @default.
- W3118291054 cites W3100975812 @default.
- W3118291054 cites W3108890229 @default.
- W3118291054 cites W3108902193 @default.
- W3118291054 cites W3109546547 @default.
- W3118291054 cites W3110042586 @default.
- W3118291054 cites W3119353806 @default.
- W3118291054 cites W3127429604 @default.
- W3118291054 cites W3216747129 @default.
- W3118291054 hasPublicationYear "2021" @default.
- W3118291054 type Work @default.
- W3118291054 sameAs 3118291054 @default.
- W3118291054 citedByCount "6" @default.
- W3118291054 countsByYear W31182910542021 @default.
- W3118291054 countsByYear W31182910542022 @default.
- W3118291054 crossrefType "posted-content" @default.
- W3118291054 hasAuthorship W3118291054A5016980595 @default.
- W3118291054 hasAuthorship W3118291054A5019604570 @default.
- W3118291054 hasAuthorship W3118291054A5026432142 @default.
- W3118291054 hasAuthorship W3118291054A5047410441 @default.
- W3118291054 hasBestOaLocation W31182910541 @default.
- W3118291054 hasConcept C104317684 @default.
- W3118291054 hasConcept C126255220 @default.
- W3118291054 hasConcept C134306372 @default.
- W3118291054 hasConcept C137836250 @default.
- W3118291054 hasConcept C149728462 @default.
- W3118291054 hasConcept C154945302 @default.
- W3118291054 hasConcept C185592680 @default.