Matches in SemOpenAlex for { <https://semopenalex.org/work/W4210527295> ?p ?o ?g. }
Showing items 1 to 94 of 94 with 100 items per page.
- W4210527295 abstract "In this paper, we study the Instantaneously Constrained Reinforcement Learning (ICRL) problem, in which we are tasked to find a reward-maximizing policy while satisfying certain constraints at each time step. We first extend a result on the strong duality of Constrained Markov Decision Process (CMDP) in the literature and propose a sufficient condition for strong duality of the ICRL problem. Inspired by the Augmented Lagrangian Method in constrained optimization, we propose a new surrogate objective function for ICRL, which could be efficiently optimized by common policy-gradient based RL algorithms. We show theoretically that a feasible and optimal policy could be obtained by optimizing this surrogate function, under certain conditions related to the feasible policy set. Our empirical results on a tabular Markov Decision Process and two nonlinear optimal control problems, a constrained pendulum and a constrained half-cheetah, justify our analysis, and suggest that our method could promote safety during learning and converge in a smaller number of iterations compared to the existing algorithms." @default.
- W4210527295 created "2022-02-08" @default.
- W4210527295 creator A5024100448 @default.
- W4210527295 creator A5045155646 @default.
- W4210527295 creator A5070827615 @default.
- W4210527295 creator A5072786751 @default.
- W4210527295 date "2021-12-14" @default.
- W4210527295 modified "2023-10-14" @default.
- W4210527295 title "Augmented Lagrangian Method for Instantaneously Constrained Reinforcement Learning Problems" @default.
- W4210527295 cites W1971086298 @default.
- W4210527295 cites W2027579135 @default.
- W4210527295 cites W2038818107 @default.
- W4210527295 cites W2145339207 @default.
- W4210527295 cites W2284269953 @default.
- W4210527295 cites W2575705757 @default.
- W4210527295 cites W2583891003 @default.
- W4210527295 cites W2895492910 @default.
- W4210527295 cites W2905810301 @default.
- W4210527295 cites W2963293747 @default.
- W4210527295 cites W2968547875 @default.
- W4210527295 cites W3004186391 @default.
- W4210527295 cites W3083356404 @default.
- W4210527295 cites W3156919398 @default.
- W4210527295 doi "https://doi.org/10.1109/cdc45484.2021.9683088" @default.
- W4210527295 hasPublicationYear "2021" @default.
- W4210527295 type Work @default.
- W4210527295 citedByCount "4" @default.
- W4210527295 countsByYear W42105272952023 @default.
- W4210527295 crossrefType "proceedings-article" @default.
- W4210527295 hasAuthorship W4210527295A5024100448 @default.
- W4210527295 hasAuthorship W4210527295A5045155646 @default.
- W4210527295 hasAuthorship W4210527295A5070827615 @default.
- W4210527295 hasAuthorship W4210527295A5072786751 @default.
- W4210527295 hasConcept C105795698 @default.
- W4210527295 hasConcept C106189395 @default.
- W4210527295 hasConcept C118615104 @default.
- W4210527295 hasConcept C121332964 @default.
- W4210527295 hasConcept C126255220 @default.
- W4210527295 hasConcept C14036430 @default.
- W4210527295 hasConcept C150452318 @default.
- W4210527295 hasConcept C154945302 @default.
- W4210527295 hasConcept C158622935 @default.
- W4210527295 hasConcept C159886148 @default.
- W4210527295 hasConcept C177264268 @default.
- W4210527295 hasConcept C199360897 @default.
- W4210527295 hasConcept C2778023678 @default.
- W4210527295 hasConcept C28826006 @default.
- W4210527295 hasConcept C33923547 @default.
- W4210527295 hasConcept C41008148 @default.
- W4210527295 hasConcept C53469067 @default.
- W4210527295 hasConcept C62520636 @default.
- W4210527295 hasConcept C78458016 @default.
- W4210527295 hasConcept C86803240 @default.
- W4210527295 hasConcept C91765299 @default.
- W4210527295 hasConcept C97541855 @default.
- W4210527295 hasConceptScore W4210527295C105795698 @default.
- W4210527295 hasConceptScore W4210527295C106189395 @default.
- W4210527295 hasConceptScore W4210527295C118615104 @default.
- W4210527295 hasConceptScore W4210527295C121332964 @default.
- W4210527295 hasConceptScore W4210527295C126255220 @default.
- W4210527295 hasConceptScore W4210527295C14036430 @default.
- W4210527295 hasConceptScore W4210527295C150452318 @default.
- W4210527295 hasConceptScore W4210527295C154945302 @default.
- W4210527295 hasConceptScore W4210527295C158622935 @default.
- W4210527295 hasConceptScore W4210527295C159886148 @default.
- W4210527295 hasConceptScore W4210527295C177264268 @default.
- W4210527295 hasConceptScore W4210527295C199360897 @default.
- W4210527295 hasConceptScore W4210527295C2778023678 @default.
- W4210527295 hasConceptScore W4210527295C28826006 @default.
- W4210527295 hasConceptScore W4210527295C33923547 @default.
- W4210527295 hasConceptScore W4210527295C41008148 @default.
- W4210527295 hasConceptScore W4210527295C53469067 @default.
- W4210527295 hasConceptScore W4210527295C62520636 @default.
- W4210527295 hasConceptScore W4210527295C78458016 @default.
- W4210527295 hasConceptScore W4210527295C86803240 @default.
- W4210527295 hasConceptScore W4210527295C91765299 @default.
- W4210527295 hasConceptScore W4210527295C97541855 @default.
- W4210527295 hasFunder F4320338279 @default.
- W4210527295 hasLocation W42105272951 @default.
- W4210527295 hasOpenAccess W4210527295 @default.
- W4210527295 hasPrimaryLocation W42105272951 @default.
- W4210527295 hasRelatedWork W166252871 @default.
- W4210527295 hasRelatedWork W1967010422 @default.
- W4210527295 hasRelatedWork W2027569164 @default.
- W4210527295 hasRelatedWork W2088707050 @default.
- W4210527295 hasRelatedWork W2376104286 @default.
- W4210527295 hasRelatedWork W2390246627 @default.
- W4210527295 hasRelatedWork W3007368744 @default.
- W4210527295 hasRelatedWork W3010376936 @default.
- W4210527295 hasRelatedWork W4236384091 @default.
- W4210527295 hasRelatedWork W4285149922 @default.
- W4210527295 isParatext "false" @default.
- W4210527295 isRetracted "false" @default.
- W4210527295 workType "article" @default.