Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387561491> ?p ?o ?g. }
- W4387561491 abstract "Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to sub-optimal policies if the reward is misaligned. Moreover, recent works have demonstrated that effective shaping rewards for a particular problem can depend on the underlying learning algorithm. This paper introduces a novel way to encourage exploration called $f$-Policy Gradients, or $f$-PG. $f$-PG minimizes the f-divergence between the agent's state visitation distribution and the goal, which we show can lead to an optimal policy. We derive gradients for various f-divergences to optimize this objective. Our learning paradigm provides dense learning signals for exploration in sparse reward settings. We further introduce an entropy-regularized policy optimization objective, that we call $state$-MaxEnt RL (or $s$-MaxEnt RL) as a special case of our objective. We show that several metric-based shaping rewards like L2 can be used with $s$-MaxEnt RL, providing a common ground to study such metric-based shaping rewards with efficient exploration. We find that $f$-PG has better performance compared to standard policy gradient methods on a challenging gridworld as well as the Point Maze and FetchReach environments. More information on our website https://agarwalsiddhant10.github.io/projects/fpg.html." @default.
- W4387561491 created "2023-10-12" @default.
- W4387561491 creator A5001594330 @default.
- W4387561491 creator A5018095069 @default.
- W4387561491 creator A5052068951 @default.
- W4387561491 creator A5069875677 @default.
- W4387561491 date "2023-10-10" @default.
- W4387561491 modified "2023-10-13" @default.
- W4387561491 title "$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences" @default.
- W4387561491 doi "https://doi.org/10.48550/arxiv.2310.06794" @default.
- W4387561491 hasPublicationYear "2023" @default.
- W4387561491 type Work @default.
- W4387561491 citedByCount "0" @default.
- W4387561491 crossrefType "posted-content" @default.
- W4387561491 hasAuthorship W4387561491A5001594330 @default.
- W4387561491 hasAuthorship W4387561491A5018095069 @default.
- W4387561491 hasAuthorship W4387561491A5052068951 @default.
- W4387561491 hasAuthorship W4387561491A5069875677 @default.
- W4387561491 hasBestOaLocation W43875614911 @default.
- W4387561491 hasConcept C106301342 @default.
- W4387561491 hasConcept C11413529 @default.
- W4387561491 hasConcept C119857082 @default.
- W4387561491 hasConcept C121332964 @default.
- W4387561491 hasConcept C126255220 @default.
- W4387561491 hasConcept C137836250 @default.
- W4387561491 hasConcept C138885662 @default.
- W4387561491 hasConcept C14036430 @default.
- W4387561491 hasConcept C154945302 @default.
- W4387561491 hasConcept C162324750 @default.
- W4387561491 hasConcept C176217482 @default.
- W4387561491 hasConcept C178635117 @default.
- W4387561491 hasConcept C207390915 @default.
- W4387561491 hasConcept C21547014 @default.
- W4387561491 hasConcept C2524010 @default.
- W4387561491 hasConcept C28719098 @default.
- W4387561491 hasConcept C33923547 @default.
- W4387561491 hasConcept C38652104 @default.
- W4387561491 hasConcept C41008148 @default.
- W4387561491 hasConcept C41895202 @default.
- W4387561491 hasConcept C62520636 @default.
- W4387561491 hasConcept C78458016 @default.
- W4387561491 hasConcept C86803240 @default.
- W4387561491 hasConcept C89109886 @default.
- W4387561491 hasConcept C9679016 @default.
- W4387561491 hasConcept C97541855 @default.
- W4387561491 hasConceptScore W4387561491C106301342 @default.
- W4387561491 hasConceptScore W4387561491C11413529 @default.
- W4387561491 hasConceptScore W4387561491C119857082 @default.
- W4387561491 hasConceptScore W4387561491C121332964 @default.
- W4387561491 hasConceptScore W4387561491C126255220 @default.
- W4387561491 hasConceptScore W4387561491C137836250 @default.
- W4387561491 hasConceptScore W4387561491C138885662 @default.
- W4387561491 hasConceptScore W4387561491C14036430 @default.
- W4387561491 hasConceptScore W4387561491C154945302 @default.
- W4387561491 hasConceptScore W4387561491C162324750 @default.
- W4387561491 hasConceptScore W4387561491C176217482 @default.
- W4387561491 hasConceptScore W4387561491C178635117 @default.
- W4387561491 hasConceptScore W4387561491C207390915 @default.
- W4387561491 hasConceptScore W4387561491C21547014 @default.
- W4387561491 hasConceptScore W4387561491C2524010 @default.
- W4387561491 hasConceptScore W4387561491C28719098 @default.
- W4387561491 hasConceptScore W4387561491C33923547 @default.
- W4387561491 hasConceptScore W4387561491C38652104 @default.
- W4387561491 hasConceptScore W4387561491C41008148 @default.
- W4387561491 hasConceptScore W4387561491C41895202 @default.
- W4387561491 hasConceptScore W4387561491C62520636 @default.
- W4387561491 hasConceptScore W4387561491C78458016 @default.
- W4387561491 hasConceptScore W4387561491C86803240 @default.
- W4387561491 hasConceptScore W4387561491C89109886 @default.
- W4387561491 hasConceptScore W4387561491C9679016 @default.
- W4387561491 hasConceptScore W4387561491C97541855 @default.
- W4387561491 hasLocation W43875614911 @default.
- W4387561491 hasOpenAccess W4387561491 @default.
- W4387561491 hasPrimaryLocation W43875614911 @default.
- W4387561491 hasRelatedWork W1997473290 @default.
- W4387561491 hasRelatedWork W2058731384 @default.
- W4387561491 hasRelatedWork W2062145486 @default.
- W4387561491 hasRelatedWork W2080108722 @default.
- W4387561491 hasRelatedWork W2353911672 @default.
- W4387561491 hasRelatedWork W2393042414 @default.
- W4387561491 hasRelatedWork W2559130006 @default.
- W4387561491 hasRelatedWork W2752681920 @default.
- W4387561491 hasRelatedWork W4312713068 @default.
- W4387561491 hasRelatedWork W4385488867 @default.
- W4387561491 isParatext "false" @default.
- W4387561491 isRetracted "false" @default.
- W4387561491 workType "article" @default.