Matches in SemOpenAlex for { <https://semopenalex.org/work/W4319323677> ?p ?o ?g. }
Showing items 1 to 69 of 69 with 100 items per page.
- W4319323677 abstract "Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite the huge efforts directed at the design of efficient stochastic PG-type algorithms, the understanding of their convergence to a globally optimal policy is still limited. In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies which allows to address the case of continuous state action spaces. First, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive a $\tilde{\mathcal{O}}(\varepsilon^{-2.5})$ sample complexity of this method for finding a global $\varepsilon$-optimal policy. Improving over the previously known $\tilde{\mathcal{O}}(\varepsilon^{-3})$ complexity, this algorithm does not require the use of importance sampling or second-order information and samples only one trajectory per iteration. Second, we further improve this complexity to $\tilde{\mathcal{O}}(\varepsilon^{-2})$ by considering a Hessian-Aided Recursive Policy Gradient ((N)-HARPG) algorithm enhanced with a correction based on a Hessian-vector product. Interestingly, both algorithms are $(i)$ simple and easy to implement: single-loop, do not require large batches of trajectories and sample at most two trajectories per iteration; $(ii)$ computationally and memory efficient: they do not require expensive subroutines at each iteration and can be implemented with memory linear in the dimension of parameters." @default.
- W4319323677 created "2023-02-08" @default.
- W4319323677 creator A5003899242 @default.
- W4319323677 creator A5020090388 @default.
- W4319323677 creator A5065250840 @default.
- W4319323677 creator A5071683073 @default.
- W4319323677 date "2023-02-03" @default.
- W4319323677 modified "2023-09-27" @default.
- W4319323677 title "Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies" @default.
- W4319323677 doi "https://doi.org/10.48550/arxiv.2302.01734" @default.
- W4319323677 hasPublicationYear "2023" @default.
- W4319323677 type Work @default.
- W4319323677 citedByCount "0" @default.
- W4319323677 crossrefType "posted-content" @default.
- W4319323677 hasAuthorship W4319323677A5003899242 @default.
- W4319323677 hasAuthorship W4319323677A5020090388 @default.
- W4319323677 hasAuthorship W4319323677A5065250840 @default.
- W4319323677 hasAuthorship W4319323677A5071683073 @default.
- W4319323677 hasBestOaLocation W43193236771 @default.
- W4319323677 hasConcept C105795698 @default.
- W4319323677 hasConcept C11413529 @default.
- W4319323677 hasConcept C114614502 @default.
- W4319323677 hasConcept C118615104 @default.
- W4319323677 hasConcept C121332964 @default.
- W4319323677 hasConcept C126255220 @default.
- W4319323677 hasConcept C162324750 @default.
- W4319323677 hasConcept C165464430 @default.
- W4319323677 hasConcept C203616005 @default.
- W4319323677 hasConcept C2777303404 @default.
- W4319323677 hasConcept C28826006 @default.
- W4319323677 hasConcept C29406490 @default.
- W4319323677 hasConcept C33676613 @default.
- W4319323677 hasConcept C33923547 @default.
- W4319323677 hasConcept C50522688 @default.
- W4319323677 hasConcept C62520636 @default.
- W4319323677 hasConcept C72319582 @default.
- W4319323677 hasConceptScore W4319323677C105795698 @default.
- W4319323677 hasConceptScore W4319323677C11413529 @default.
- W4319323677 hasConceptScore W4319323677C114614502 @default.
- W4319323677 hasConceptScore W4319323677C118615104 @default.
- W4319323677 hasConceptScore W4319323677C121332964 @default.
- W4319323677 hasConceptScore W4319323677C126255220 @default.
- W4319323677 hasConceptScore W4319323677C162324750 @default.
- W4319323677 hasConceptScore W4319323677C165464430 @default.
- W4319323677 hasConceptScore W4319323677C203616005 @default.
- W4319323677 hasConceptScore W4319323677C2777303404 @default.
- W4319323677 hasConceptScore W4319323677C28826006 @default.
- W4319323677 hasConceptScore W4319323677C29406490 @default.
- W4319323677 hasConceptScore W4319323677C33676613 @default.
- W4319323677 hasConceptScore W4319323677C33923547 @default.
- W4319323677 hasConceptScore W4319323677C50522688 @default.
- W4319323677 hasConceptScore W4319323677C62520636 @default.
- W4319323677 hasConceptScore W4319323677C72319582 @default.
- W4319323677 hasLocation W43193236771 @default.
- W4319323677 hasOpenAccess W4319323677 @default.
- W4319323677 hasPrimaryLocation W43193236771 @default.
- W4319323677 hasRelatedWork W1568617615 @default.
- W4319323677 hasRelatedWork W2025314782 @default.
- W4319323677 hasRelatedWork W2076935033 @default.
- W4319323677 hasRelatedWork W2343305867 @default.
- W4319323677 hasRelatedWork W2351859806 @default.
- W4319323677 hasRelatedWork W2897391477 @default.
- W4319323677 hasRelatedWork W3044813583 @default.
- W4319323677 hasRelatedWork W3049601810 @default.
- W4319323677 hasRelatedWork W4221159909 @default.
- W4319323677 hasRelatedWork W4294690810 @default.
- W4319323677 isParatext "false" @default.
- W4319323677 isRetracted "false" @default.
- W4319323677 workType "article" @default.