Matches in SemOpenAlex for { <https://semopenalex.org/work/W3098412154> ?p ?o ?g. }
- W3098412154 endingPage "5298" @default.
- W3098412154 startingPage "5283" @default.
- W3098412154 abstract "The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a nonconvex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients." @default.
- W3098412154 created "2020-11-23" @default.
- W3098412154 creator A5034611919 @default.
- W3098412154 creator A5047397533 @default.
- W3098412154 creator A5051772147 @default.
- W3098412154 date "2021-11-01" @default.
- W3098412154 modified "2023-09-24" @default.
- W3098412154 title "Learning Optimal Controllers for Linear Systems With Multiplicative Noise via Policy Gradient" @default.
- W3098412154 cites W1616818660 @default.
- W3098412154 cites W1975498076 @default.
- W3098412154 cites W1981042292 @default.
- W3098412154 cites W1981066143 @default.
- W3098412154 cites W1990139897 @default.
- W3098412154 cites W1995502176 @default.
- W3098412154 cites W2002858896 @default.
- W3098412154 cites W2011866373 @default.
- W3098412154 cites W2022611390 @default.
- W3098412154 cites W2052334067 @default.
- W3098412154 cites W2055140741 @default.
- W3098412154 cites W2082210980 @default.
- W3098412154 cites W2120871244 @default.
- W3098412154 cites W2123486671 @default.
- W3098412154 cites W2124783336 @default.
- W3098412154 cites W2127107099 @default.
- W3098412154 cites W2129488707 @default.
- W3098412154 cites W2152161277 @default.
- W3098412154 cites W2153292828 @default.
- W3098412154 cites W2153461933 @default.
- W3098412154 cites W2168090960 @default.
- W3098412154 cites W2395955964 @default.
- W3098412154 cites W2590144118 @default.
- W3098412154 cites W2605102758 @default.
- W3098412154 cites W2822752092 @default.
- W3098412154 cites W2854640194 @default.
- W3098412154 cites W2902907165 @default.
- W3098412154 cites W2943040161 @default.
- W3098412154 cites W3012118400 @default.
- W3098412154 cites W3012546009 @default.
- W3098412154 cites W3045891384 @default.
- W3098412154 cites W3098045837 @default.
- W3098412154 cites W3210839039 @default.
- W3098412154 cites W4250589301 @default.
- W3098412154 doi "https://doi.org/10.1109/tac.2020.3037046" @default.
- W3098412154 hasPublicationYear "2021" @default.
- W3098412154 type Work @default.
- W3098412154 sameAs 3098412154 @default.
- W3098412154 citedByCount "21" @default.
- W3098412154 countsByYear W30984121542020 @default.
- W3098412154 countsByYear W30984121542021 @default.
- W3098412154 countsByYear W30984121542022 @default.
- W3098412154 countsByYear W30984121542023 @default.
- W3098412154 crossrefType "journal-article" @default.
- W3098412154 hasAuthorship W3098412154A5034611919 @default.
- W3098412154 hasAuthorship W3098412154A5047397533 @default.
- W3098412154 hasAuthorship W3098412154A5051772147 @default.
- W3098412154 hasConcept C104317684 @default.
- W3098412154 hasConcept C126255220 @default.
- W3098412154 hasConcept C131021393 @default.
- W3098412154 hasConcept C13412647 @default.
- W3098412154 hasConcept C134306372 @default.
- W3098412154 hasConcept C154945302 @default.
- W3098412154 hasConcept C18015164 @default.
- W3098412154 hasConcept C185592680 @default.
- W3098412154 hasConcept C2775924081 @default.
- W3098412154 hasConcept C33923547 @default.
- W3098412154 hasConcept C41008148 @default.
- W3098412154 hasConcept C42747912 @default.
- W3098412154 hasConcept C47446073 @default.
- W3098412154 hasConcept C55493867 @default.
- W3098412154 hasConcept C63479239 @default.
- W3098412154 hasConcept C84462506 @default.
- W3098412154 hasConcept C91575142 @default.
- W3098412154 hasConcept C9390403 @default.
- W3098412154 hasConcept C97541855 @default.
- W3098412154 hasConcept C98779006 @default.
- W3098412154 hasConceptScore W3098412154C104317684 @default.
- W3098412154 hasConceptScore W3098412154C126255220 @default.
- W3098412154 hasConceptScore W3098412154C131021393 @default.
- W3098412154 hasConceptScore W3098412154C13412647 @default.
- W3098412154 hasConceptScore W3098412154C134306372 @default.
- W3098412154 hasConceptScore W3098412154C154945302 @default.
- W3098412154 hasConceptScore W3098412154C18015164 @default.
- W3098412154 hasConceptScore W3098412154C185592680 @default.
- W3098412154 hasConceptScore W3098412154C2775924081 @default.
- W3098412154 hasConceptScore W3098412154C33923547 @default.
- W3098412154 hasConceptScore W3098412154C41008148 @default.
- W3098412154 hasConceptScore W3098412154C42747912 @default.
- W3098412154 hasConceptScore W3098412154C47446073 @default.
- W3098412154 hasConceptScore W3098412154C55493867 @default.
- W3098412154 hasConceptScore W3098412154C63479239 @default.
- W3098412154 hasConceptScore W3098412154C84462506 @default.
- W3098412154 hasConceptScore W3098412154C91575142 @default.
- W3098412154 hasConceptScore W3098412154C9390403 @default.
- W3098412154 hasConceptScore W3098412154C97541855 @default.
- W3098412154 hasConceptScore W3098412154C98779006 @default.
- W3098412154 hasFunder F4320338279 @default.
- W3098412154 hasFunder F4320338281 @default.
- W3098412154 hasIssue "11" @default.