Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313442804> ?p ?o ?g. }
Showing items 1 to 57 of 57 with 100 items per page.
- W4313442804 abstract "Many popular policy gradient methods for reinforcement learning follow a biased approximation of the policy gradient known as the discounted approximation. While it has been shown that the discounted approximation of the policy gradient is not the gradient of any objective function, little else is known about its convergence behavior or properties. In this paper, we show that if the discounted approximation is followed such that the discount factor is increased slowly at a rate related to a decreasing learning rate, the resulting method recovers the standard guarantees of gradient ascent on the undiscounted objective." @default.
- W4313442804 created "2023-01-06" @default.
- W4313442804 creator A5038122282 @default.
- W4313442804 date "2022-12-28" @default.
- W4313442804 modified "2023-10-18" @default.
- W4313442804 title "On the Convergence of Discounted Policy Gradient Methods" @default.
- W4313442804 doi "https://doi.org/10.48550/arxiv.2212.14066" @default.
- W4313442804 hasPublicationYear "2022" @default.
- W4313442804 type Work @default.
- W4313442804 citedByCount "0" @default.
- W4313442804 crossrefType "posted-content" @default.
- W4313442804 hasAuthorship W4313442804A5038122282 @default.
- W4313442804 hasBestOaLocation W43134428041 @default.
- W4313442804 hasConcept C115680565 @default.
- W4313442804 hasConcept C126255220 @default.
- W4313442804 hasConcept C14036430 @default.
- W4313442804 hasConcept C162324750 @default.
- W4313442804 hasConcept C26517878 @default.
- W4313442804 hasConcept C2777303404 @default.
- W4313442804 hasConcept C28826006 @default.
- W4313442804 hasConcept C33923547 @default.
- W4313442804 hasConcept C38652104 @default.
- W4313442804 hasConcept C41008148 @default.
- W4313442804 hasConcept C50522688 @default.
- W4313442804 hasConcept C57869625 @default.
- W4313442804 hasConcept C78458016 @default.
- W4313442804 hasConcept C86803240 @default.
- W4313442804 hasConceptScore W4313442804C115680565 @default.
- W4313442804 hasConceptScore W4313442804C126255220 @default.
- W4313442804 hasConceptScore W4313442804C14036430 @default.
- W4313442804 hasConceptScore W4313442804C162324750 @default.
- W4313442804 hasConceptScore W4313442804C26517878 @default.
- W4313442804 hasConceptScore W4313442804C2777303404 @default.
- W4313442804 hasConceptScore W4313442804C28826006 @default.
- W4313442804 hasConceptScore W4313442804C33923547 @default.
- W4313442804 hasConceptScore W4313442804C38652104 @default.
- W4313442804 hasConceptScore W4313442804C41008148 @default.
- W4313442804 hasConceptScore W4313442804C50522688 @default.
- W4313442804 hasConceptScore W4313442804C57869625 @default.
- W4313442804 hasConceptScore W4313442804C78458016 @default.
- W4313442804 hasConceptScore W4313442804C86803240 @default.
- W4313442804 hasLocation W43134428041 @default.
- W4313442804 hasOpenAccess W4313442804 @default.
- W4313442804 hasPrimaryLocation W43134428041 @default.
- W4313442804 hasRelatedWork W130791004 @default.
- W4313442804 hasRelatedWork W1968838599 @default.
- W4313442804 hasRelatedWork W2040523805 @default.
- W4313442804 hasRelatedWork W2128702080 @default.
- W4313442804 hasRelatedWork W2152704622 @default.
- W4313442804 hasRelatedWork W2620834727 @default.
- W4313442804 hasRelatedWork W3135034681 @default.
- W4313442804 hasRelatedWork W3201481289 @default.
- W4313442804 hasRelatedWork W4226232014 @default.
- W4313442804 hasRelatedWork W4286986829 @default.
- W4313442804 isParatext "false" @default.
- W4313442804 isRetracted "false" @default.
- W4313442804 workType "article" @default.