Matches in SemOpenAlex for { <https://semopenalex.org/work/W1967459934> ?p ?o ?g. }
- W1967459934 endingPage "376" @default.
- W1967459934 startingPage "342" @default.
- W1967459934 abstract "Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the policy parameter. That term involves the derivative of the stationary state distribution that corresponds to the sensitivity of its distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate γ for the value functions close to 1, these algorithms do not permit γ to be set exactly at γ = 1. In this article, we propose a method for estimating the log stationary state distribution derivative (LSD) as a useful form of the derivative of the stationary state distribution through backward Markov chain formulation and a temporal difference learning framework. A new policy gradient (PG) framework with an LSD is also proposed, in which the average reward gradient can be estimated by setting γ = 0, so it becomes unnecessary to learn the value functions. We also test the performance of the proposed algorithms using simple benchmark tasks and show that these can improve the performances of existing PG methods." @default.
- W1967459934 created "2016-06-24" @default.
- W1967459934 creator A5004840638 @default.
- W1967459934 creator A5031054137 @default.
- W1967459934 creator A5071281836 @default.
- W1967459934 creator A5083076675 @default.
- W1967459934 creator A5089173802 @default.
- W1967459934 date "2010-02-01" @default.
- W1967459934 modified "2023-09-24" @default.
- W1967459934 title "Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning" @default.
- W1967459934 cites W1541084404 @default.
- W1967459934 cites W1553320709 @default.
- W1967459934 cites W1570704007 @default.
- W1967459934 cites W1606011487 @default.
- W1967459934 cites W1814308503 @default.
- W1967459934 cites W2009303086 @default.
- W1967459934 cites W2046765929 @default.
- W1967459934 cites W2072931156 @default.
- W1967459934 cites W2073583350 @default.
- W1967459934 cites W2080759927 @default.
- W1967459934 cites W2093638026 @default.
- W1967459934 cites W2113501460 @default.
- W1967459934 cites W2114537044 @default.
- W1967459934 cites W2119717200 @default.
- W1967459934 cites W2127107099 @default.
- W1967459934 cites W2132351269 @default.
- W1967459934 cites W2137267792 @default.
- W1967459934 cites W2149418961 @default.
- W1967459934 cites W2172968643 @default.
- W1967459934 cites W2173945562 @default.
- W1967459934 cites W2610686804 @default.
- W1967459934 cites W3041202696 @default.
- W1967459934 cites W3103182070 @default.
- W1967459934 cites W4242606736 @default.
- W1967459934 cites W4246808543 @default.
- W1967459934 cites W562332459 @default.
- W1967459934 doi "https://doi.org/10.1162/neco.2009.12-08-922" @default.
- W1967459934 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/19842990" @default.
- W1967459934 hasPublicationYear "2010" @default.
- W1967459934 type Work @default.
- W1967459934 sameAs 1967459934 @default.
- W1967459934 citedByCount "17" @default.
- W1967459934 countsByYear W19674599342012 @default.
- W1967459934 countsByYear W19674599342015 @default.
- W1967459934 countsByYear W19674599342017 @default.
- W1967459934 countsByYear W19674599342018 @default.
- W1967459934 countsByYear W19674599342019 @default.
- W1967459934 countsByYear W19674599342020 @default.
- W1967459934 countsByYear W19674599342021 @default.
- W1967459934 countsByYear W19674599342022 @default.
- W1967459934 countsByYear W19674599342023 @default.
- W1967459934 crossrefType "journal-article" @default.
- W1967459934 hasAuthorship W1967459934A5004840638 @default.
- W1967459934 hasAuthorship W1967459934A5031054137 @default.
- W1967459934 hasAuthorship W1967459934A5071281836 @default.
- W1967459934 hasAuthorship W1967459934A5083076675 @default.
- W1967459934 hasAuthorship W1967459934A5089173802 @default.
- W1967459934 hasConcept C105795698 @default.
- W1967459934 hasConcept C106159729 @default.
- W1967459934 hasConcept C110121322 @default.
- W1967459934 hasConcept C111771559 @default.
- W1967459934 hasConcept C115680565 @default.
- W1967459934 hasConcept C126255220 @default.
- W1967459934 hasConcept C13280743 @default.
- W1967459934 hasConcept C134306372 @default.
- W1967459934 hasConcept C138885662 @default.
- W1967459934 hasConcept C154945302 @default.
- W1967459934 hasConcept C162324750 @default.
- W1967459934 hasConcept C185798385 @default.
- W1967459934 hasConcept C205649164 @default.
- W1967459934 hasConcept C28826006 @default.
- W1967459934 hasConcept C33923547 @default.
- W1967459934 hasConcept C39927690 @default.
- W1967459934 hasConcept C41008148 @default.
- W1967459934 hasConcept C41895202 @default.
- W1967459934 hasConcept C7149132 @default.
- W1967459934 hasConcept C97541855 @default.
- W1967459934 hasConcept C98763669 @default.
- W1967459934 hasConcept C98951983 @default.
- W1967459934 hasConceptScore W1967459934C105795698 @default.
- W1967459934 hasConceptScore W1967459934C106159729 @default.
- W1967459934 hasConceptScore W1967459934C110121322 @default.
- W1967459934 hasConceptScore W1967459934C111771559 @default.
- W1967459934 hasConceptScore W1967459934C115680565 @default.
- W1967459934 hasConceptScore W1967459934C126255220 @default.
- W1967459934 hasConceptScore W1967459934C13280743 @default.
- W1967459934 hasConceptScore W1967459934C134306372 @default.
- W1967459934 hasConceptScore W1967459934C138885662 @default.
- W1967459934 hasConceptScore W1967459934C154945302 @default.
- W1967459934 hasConceptScore W1967459934C162324750 @default.
- W1967459934 hasConceptScore W1967459934C185798385 @default.
- W1967459934 hasConceptScore W1967459934C205649164 @default.
- W1967459934 hasConceptScore W1967459934C28826006 @default.
- W1967459934 hasConceptScore W1967459934C33923547 @default.
- W1967459934 hasConceptScore W1967459934C39927690 @default.
- W1967459934 hasConceptScore W1967459934C41008148 @default.
- W1967459934 hasConceptScore W1967459934C41895202 @default.
- W1967459934 hasConceptScore W1967459934C7149132 @default.