Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287979710> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4287979710 abstract "Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems and the lack of exact gradient computation. In this paper, we take a step towards demystifying the performance and efficiency of such methods by focusing on the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We establish exponential stability for the ordinary differential equation (ODE) that governs the gradient-flow dynamics over the set of stabilizing feedback gains and show that a similar result holds for the gradient descent method that arises from the forward Euler discretization of the corresponding ODE. We also provide theoretical bounds on the convergence rate and sample complexity of the random search method with two-point gradient estimates. We prove that the required simulation time for achieving $\epsilon$-accuracy in the model-free setup and the total number of function evaluations both scale as $\log(1/\epsilon)$." @default.
- W4287979710 created "2022-07-26" @default.
- W4287979710 creator A5035198573 @default.
- W4287979710 creator A5046962187 @default.
- W4287979710 creator A5047399398 @default.
- W4287979710 creator A5087790067 @default.
- W4287979710 date "2019-12-26" @default.
- W4287979710 modified "2023-09-29" @default.
- W4287979710 title "Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem" @default.
- W4287979710 doi "https://doi.org/10.48550/arxiv.1912.11899" @default.
- W4287979710 hasPublicationYear "2019" @default.
- W4287979710 type Work @default.
- W4287979710 citedByCount "0" @default.
- W4287979710 crossrefType "posted-content" @default.
- W4287979710 hasAuthorship W4287979710A5035198573 @default.
- W4287979710 hasAuthorship W4287979710A5046962187 @default.
- W4287979710 hasAuthorship W4287979710A5047399398 @default.
- W4287979710 hasAuthorship W4287979710A5087790067 @default.
- W4287979710 hasBestOaLocation W42879797101 @default.
- W4287979710 hasConcept C119857082 @default.
- W4287979710 hasConcept C126255220 @default.
- W4287979710 hasConcept C134306372 @default.
- W4287979710 hasConcept C153258448 @default.
- W4287979710 hasConcept C154945302 @default.
- W4287979710 hasConcept C162324750 @default.
- W4287979710 hasConcept C26517878 @default.
- W4287979710 hasConcept C2777303404 @default.
- W4287979710 hasConcept C28826006 @default.
- W4287979710 hasConcept C33923547 @default.
- W4287979710 hasConcept C34862557 @default.
- W4287979710 hasConcept C38652104 @default.
- W4287979710 hasConcept C41008148 @default.
- W4287979710 hasConcept C50522688 @default.
- W4287979710 hasConcept C50644808 @default.
- W4287979710 hasConcept C51544822 @default.
- W4287979710 hasConcept C57869625 @default.
- W4287979710 hasConcept C73000952 @default.
- W4287979710 hasConcept C78045399 @default.
- W4287979710 hasConcept C91575142 @default.
- W4287979710 hasConcept C97541855 @default.
- W4287979710 hasConcept C98779006 @default.
- W4287979710 hasConceptScore W4287979710C119857082 @default.
- W4287979710 hasConceptScore W4287979710C126255220 @default.
- W4287979710 hasConceptScore W4287979710C134306372 @default.
- W4287979710 hasConceptScore W4287979710C153258448 @default.
- W4287979710 hasConceptScore W4287979710C154945302 @default.
- W4287979710 hasConceptScore W4287979710C162324750 @default.
- W4287979710 hasConceptScore W4287979710C26517878 @default.
- W4287979710 hasConceptScore W4287979710C2777303404 @default.
- W4287979710 hasConceptScore W4287979710C28826006 @default.
- W4287979710 hasConceptScore W4287979710C33923547 @default.
- W4287979710 hasConceptScore W4287979710C34862557 @default.
- W4287979710 hasConceptScore W4287979710C38652104 @default.
- W4287979710 hasConceptScore W4287979710C41008148 @default.
- W4287979710 hasConceptScore W4287979710C50522688 @default.
- W4287979710 hasConceptScore W4287979710C50644808 @default.
- W4287979710 hasConceptScore W4287979710C51544822 @default.
- W4287979710 hasConceptScore W4287979710C57869625 @default.
- W4287979710 hasConceptScore W4287979710C73000952 @default.
- W4287979710 hasConceptScore W4287979710C78045399 @default.
- W4287979710 hasConceptScore W4287979710C91575142 @default.
- W4287979710 hasConceptScore W4287979710C97541855 @default.
- W4287979710 hasConceptScore W4287979710C98779006 @default.
- W4287979710 hasLocation W42879797101 @default.
- W4287979710 hasOpenAccess W4287979710 @default.
- W4287979710 hasPrimaryLocation W42879797101 @default.
- W4287979710 hasRelatedWork W131075558 @default.
- W4287979710 hasRelatedWork W1535810608 @default.
- W4287979710 hasRelatedWork W2124450277 @default.
- W4287979710 hasRelatedWork W2133789993 @default.
- W4287979710 hasRelatedWork W2494394972 @default.
- W4287979710 hasRelatedWork W2951248065 @default.
- W4287979710 hasRelatedWork W2974753373 @default.
- W4287979710 hasRelatedWork W2985965953 @default.
- W4287979710 hasRelatedWork W3135923918 @default.
- W4287979710 hasRelatedWork W3210092022 @default.
- W4287979710 isParatext "false" @default.
- W4287979710 isRetracted "false" @default.
- W4287979710 workType "article" @default.
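
The abstract above mentions a random search method with two-point gradient estimates. The following is a minimal sketch of that idea, not the paper's algorithm or code. It assumes a scalar continuous-time system dx/dt = a*x + b*u with feedback u = -k*x, and it uses the closed-form cost as a stand-in for the simulated cost a model-free method would measure. All names (`lqr_cost`, `two_point_gradient`, `random_search`) and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def lqr_cost(k, a=0.0, b=1.0, q=1.0, r=1.0, x0=1.0):
    """Infinite-horizon cost of u = -k*x for dx/dt = a*x + b*u (scalar case).

    Assumption: the closed form stands in for a simulation oracle.
    Returns infinity when the gain k is not stabilizing.
    """
    closed_loop = a - b * k
    if closed_loop >= 0:
        return math.inf
    # Integral of (q + r*k^2) * x0^2 * exp(2*closed_loop*t) over t >= 0.
    return (q + r * k * k) * x0 * x0 / (-2.0 * closed_loop)


def two_point_gradient(f, x, radius, rng):
    """Estimate the gradient of f at x from two evaluations along a random direction."""
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]  # uniform direction on the unit sphere
    f_plus = f([xi + radius * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - radius * ui for xi, ui in zip(x, u)])
    # Factor d compensates for E[u u^T] = I / d on the unit sphere.
    scale = d * (f_plus - f_minus) / (2.0 * radius)
    return [scale * ui for ui in u]


def random_search(f, x_init, step, radius, iters, seed=0):
    """Gradient descent driven by two-point gradient estimates."""
    rng = random.Random(seed)
    x = list(x_init)
    for _ in range(iters):
        g = two_point_gradient(f, x, radius, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Start from a stabilizing gain (k = 3) and search for the optimal one.
gain = random_search(lambda x: lqr_cost(x[0]), [3.0], step=0.5, radius=1e-3, iters=200)
```

With the default values assumed here (a = 0, b = q = r = 1), the scalar Riccati equation gives an optimal gain of k = 1, so `gain[0]` should end up close to 1. In one dimension the random direction is just a sign, so the estimate reduces to a central finite difference. The random direction only matters when the gain is a matrix.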