Matches in SemOpenAlex for { <https://semopenalex.org/work/W4381586937> ?p ?o ?g. }
Showing items 1 to 65 of 65, with 100 items per page.
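
The listing below can be regenerated with a SPARQL query. This is a minimal sketch, assuming the public SemOpenAlex endpoint at https://semopenalex.org/sparql; the quad pattern `?p ?o ?g` from the header is expressed with a `GRAPH` clause:

```sparql
# Sketch: fetch every predicate/object/graph binding for the work,
# mirroring the { <...W4381586937> ?p ?o ?g. } pattern in the header above.
# Endpoint assumed: https://semopenalex.org/sparql
SELECT ?p ?o ?g
WHERE {
  GRAPH ?g {
    <https://semopenalex.org/work/W4381586937> ?p ?o .
  }
}
LIMIT 100
```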
- W4381586937 abstract "We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, bringing out issues such as violation of constraints and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that updates the policy using an entropy-regularized policy gradient, and the dual via a quadratic-regularized gradient ascent, simultaneously. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point with a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update primal/dual variables, simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, with a linear rate. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs." @default.
- W4381586937 created "2023-06-22" @default.
- W4381586937 creator A5008672638 @default.
- W4381586937 creator A5035849375 @default.
- W4381586937 creator A5047410441 @default.
- W4381586937 creator A5078862959 @default.
- W4381586937 date "2023-06-20" @default.
- W4381586937 modified "2023-10-18" @default.
- W4381586937 title "Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs" @default.
- W4381586937 doi "https://doi.org/10.48550/arxiv.2306.11700" @default.
- W4381586937 hasPublicationYear "2023" @default.
- W4381586937 type Work @default.
- W4381586937 citedByCount "0" @default.
- W4381586937 crossrefType "posted-content" @default.
- W4381586937 hasAuthorship W4381586937A5008672638 @default.
- W4381586937 hasAuthorship W4381586937A5035849375 @default.
- W4381586937 hasAuthorship W4381586937A5047410441 @default.
- W4381586937 hasAuthorship W4381586937A5078862959 @default.
- W4381586937 hasBestOaLocation W43815869371 @default.
- W4381586937 hasConcept C105795698 @default.
- W4381586937 hasConcept C106189395 @default.
- W4381586937 hasConcept C117160843 @default.
- W4381586937 hasConcept C126255220 @default.
- W4381586937 hasConcept C127162648 @default.
- W4381586937 hasConcept C134306372 @default.
- W4381586937 hasConcept C140479938 @default.
- W4381586937 hasConcept C159886148 @default.
- W4381586937 hasConcept C2524010 @default.
- W4381586937 hasConcept C2681867 @default.
- W4381586937 hasConcept C28826006 @default.
- W4381586937 hasConcept C31258907 @default.
- W4381586937 hasConcept C33923547 @default.
- W4381586937 hasConcept C41008148 @default.
- W4381586937 hasConcept C57869625 @default.
- W4381586937 hasConceptScore W4381586937C105795698 @default.
- W4381586937 hasConceptScore W4381586937C106189395 @default.
- W4381586937 hasConceptScore W4381586937C117160843 @default.
- W4381586937 hasConceptScore W4381586937C126255220 @default.
- W4381586937 hasConceptScore W4381586937C127162648 @default.
- W4381586937 hasConceptScore W4381586937C134306372 @default.
- W4381586937 hasConceptScore W4381586937C140479938 @default.
- W4381586937 hasConceptScore W4381586937C159886148 @default.
- W4381586937 hasConceptScore W4381586937C2524010 @default.
- W4381586937 hasConceptScore W4381586937C2681867 @default.
- W4381586937 hasConceptScore W4381586937C28826006 @default.
- W4381586937 hasConceptScore W4381586937C31258907 @default.
- W4381586937 hasConceptScore W4381586937C33923547 @default.
- W4381586937 hasConceptScore W4381586937C41008148 @default.
- W4381586937 hasConceptScore W4381586937C57869625 @default.
- W4381586937 hasLocation W43815869371 @default.
- W4381586937 hasOpenAccess W4381586937 @default.
- W4381586937 hasPrimaryLocation W43815869371 @default.
- W4381586937 hasRelatedWork W2055345933 @default.
- W4381586937 hasRelatedWork W2095374904 @default.
- W4381586937 hasRelatedWork W2161367706 @default.
- W4381586937 hasRelatedWork W2164546091 @default.
- W4381586937 hasRelatedWork W2272509765 @default.
- W4381586937 hasRelatedWork W2923608232 @default.
- W4381586937 hasRelatedWork W3136016287 @default.
- W4381586937 hasRelatedWork W3143470420 @default.
- W4381586937 hasRelatedWork W4287241112 @default.
- W4381586937 hasRelatedWork W4322733630 @default.
- W4381586937 isParatext "false" @default.
- W4381586937 isRetracted "false" @default.
- W4381586937 workType "article" @default.
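
The hasConcept triples above carry only opaque concept IDs. A follow-up query can resolve them to human-readable labels; this is a hedged sketch, assuming SemOpenAlex publishes concept names via `skos:prefLabel` and exposes the `hasConcept` predicate in its https://semopenalex.org/ontology/ namespace:

```sparql
# Sketch: resolve the work's concept IDs (C105795698, ...) to readable labels.
# The soa: ontology namespace and the skos:prefLabel usage are assumptions.
PREFIX soa:  <https://semopenalex.org/ontology/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {
  <https://semopenalex.org/work/W4381586937> soa:hasConcept ?concept .
  ?concept skos:prefLabel ?label .
}
```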
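On the content side, the abstract above describes the RPG-PD update verbally but gives no formulas. As a minimal illustration only (not the paper's own notation: the constraint direction $V_g^{\pi} \geq b$, the sign convention, and the symbols $\rho$, $b$, $\tau$, $\eta$, $\Lambda$ are assumptions), an entropy/quadratic-regularized Lagrangian and its simultaneous single-time-scale updates might read:

$$L_\tau(\theta,\lambda) \;=\; V_r^{\pi_\theta}(\rho) \;+\; \tau\,\mathcal{H}(\pi_\theta) \;+\; \lambda\big(V_g^{\pi_\theta}(\rho) - b\big) \;+\; \tfrac{\tau}{2}\,\lambda^2,$$

$$\theta_{t+1} = \theta_t + \eta\,\nabla_\theta L_\tau(\theta_t,\lambda_t), \qquad \lambda_{t+1} = \Pi_{[0,\Lambda]}\!\big[\lambda_t - \eta\,\nabla_\lambda L_\tau(\theta_t,\lambda_t)\big].$$

Both players move with the same stepsize $\eta$, which is the single-time-scale property the abstract emphasizes; under this sign convention the policy ascends while the bounded dual variable takes a projected descent step, though the paper's own orientation of the dual objective may differ.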