Matches in SemOpenAlex for { <https://semopenalex.org/work/W2950403652> ?p ?o ?g. }
- W2950403652 abstract "The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters. However, most policy gradient methods drop the discount factor from the state distribution and therefore do not optimize the discounted objective. What do they optimize instead? This has been an open question for several years, and this lack of theoretical clarity has led to an abundance of misstatements in the literature. We answer this question by proving that the update direction approximated by most methods is not the gradient of any function. Further, we argue that algorithms that follow this direction are not guaranteed to converge to a reasonable fixed point by constructing a counterexample wherein the fixed point is globally pessimal with respect to both the discounted and undiscounted objectives. We motivate this work by surveying the literature and showing that there remains a widespread misunderstanding regarding discounted policy gradient methods, with errors present even in highly-cited papers published at top conferences." @default.
- W2950403652 created "2019-06-27" @default.
- W2950403652 creator A5038122282 @default.
- W2950403652 creator A5066332280 @default.
- W2950403652 date "2019-06-17" @default.
- W2950403652 modified "2023-10-12" @default.
- W2950403652 title "Is the Policy Gradient a Gradient?" @default.
- W2950403652 cites W1191599655 @default.
- W2950403652 cites W1606011487 @default.
- W2950403652 cites W1646707810 @default.
- W2950403652 cites W1771410628 @default.
- W2950403652 cites W1971942712 @default.
- W2950403652 cites W2075268401 @default.
- W2950403652 cites W2097415784 @default.
- W2950403652 cites W2101915445 @default.
- W2950403652 cites W2119717200 @default.
- W2950403652 cites W2121863487 @default.
- W2950403652 cites W2128477394 @default.
- W2950403652 cites W2137766593 @default.
- W2950403652 cites W2145339207 @default.
- W2950403652 cites W2153874061 @default.
- W2950403652 cites W2155027007 @default.
- W2950403652 cites W2165131254 @default.
- W2950403652 cites W2165150801 @default.
- W2950403652 cites W2395162158 @default.
- W2950403652 cites W2556958149 @default.
- W2950403652 cites W2575667675 @default.
- W2950403652 cites W2736601468 @default.
- W2950403652 cites W2749928749 @default.
- W2950403652 cites W2949608212 @default.
- W2950403652 cites W2962902376 @default.
- W2950403652 cites W2963864421 @default.
- W2950403652 cites W2963923407 @default.
- W2950403652 cites W2964043796 @default.
- W2950403652 cites W779665318 @default.
- W2950403652 hasPublicationYear "2019" @default.
- W2950403652 type Work @default.
- W2950403652 sameAs 2950403652 @default.
- W2950403652 citedByCount "2" @default.
- W2950403652 countsByYear W29504036522021 @default.
- W2950403652 crossrefType "posted-content" @default.
- W2950403652 hasAuthorship W2950403652A5038122282 @default.
- W2950403652 hasAuthorship W2950403652A5066332280 @default.
- W2950403652 hasConcept C10138342 @default.
- W2950403652 hasConcept C115680565 @default.
- W2950403652 hasConcept C118615104 @default.
- W2950403652 hasConcept C121332964 @default.
- W2950403652 hasConcept C126255220 @default.
- W2950403652 hasConcept C14036430 @default.
- W2950403652 hasConcept C144237770 @default.
- W2950403652 hasConcept C162324750 @default.
- W2950403652 hasConcept C162838799 @default.
- W2950403652 hasConcept C185592680 @default.
- W2950403652 hasConcept C18762648 @default.
- W2950403652 hasConcept C2524010 @default.
- W2950403652 hasConcept C2777146004 @default.
- W2950403652 hasConcept C28719098 @default.
- W2950403652 hasConcept C28826006 @default.
- W2950403652 hasConcept C33923547 @default.
- W2950403652 hasConcept C41008148 @default.
- W2950403652 hasConcept C55493867 @default.
- W2950403652 hasConcept C6177178 @default.
- W2950403652 hasConcept C78458016 @default.
- W2950403652 hasConcept C86803240 @default.
- W2950403652 hasConcept C97355855 @default.
- W2950403652 hasConceptScore W2950403652C10138342 @default.
- W2950403652 hasConceptScore W2950403652C115680565 @default.
- W2950403652 hasConceptScore W2950403652C118615104 @default.
- W2950403652 hasConceptScore W2950403652C121332964 @default.
- W2950403652 hasConceptScore W2950403652C126255220 @default.
- W2950403652 hasConceptScore W2950403652C14036430 @default.
- W2950403652 hasConceptScore W2950403652C144237770 @default.
- W2950403652 hasConceptScore W2950403652C162324750 @default.
- W2950403652 hasConceptScore W2950403652C162838799 @default.
- W2950403652 hasConceptScore W2950403652C185592680 @default.
- W2950403652 hasConceptScore W2950403652C18762648 @default.
- W2950403652 hasConceptScore W2950403652C2524010 @default.
- W2950403652 hasConceptScore W2950403652C2777146004 @default.
- W2950403652 hasConceptScore W2950403652C28719098 @default.
- W2950403652 hasConceptScore W2950403652C28826006 @default.
- W2950403652 hasConceptScore W2950403652C33923547 @default.
- W2950403652 hasConceptScore W2950403652C41008148 @default.
- W2950403652 hasConceptScore W2950403652C55493867 @default.
- W2950403652 hasConceptScore W2950403652C6177178 @default.
- W2950403652 hasConceptScore W2950403652C78458016 @default.
- W2950403652 hasConceptScore W2950403652C86803240 @default.
- W2950403652 hasConceptScore W2950403652C97355855 @default.
- W2950403652 hasLocation W29504036521 @default.
- W2950403652 hasOpenAccess W2950403652 @default.
- W2950403652 hasPrimaryLocation W29504036521 @default.
- W2950403652 hasRelatedWork W1606188723 @default.
- W2950403652 hasRelatedWork W1608247161 @default.
- W2950403652 hasRelatedWork W2007083034 @default.
- W2950403652 hasRelatedWork W2028482653 @default.
- W2950403652 hasRelatedWork W2064791110 @default.
- W2950403652 hasRelatedWork W2742582601 @default.
- W2950403652 hasRelatedWork W2901524545 @default.
- W2950403652 hasRelatedWork W2949651824 @default.
- W2950403652 hasRelatedWork W2953282161 @default.
- W2950403652 hasRelatedWork W3037435714 @default.