Matches in SemOpenAlex for { <https://semopenalex.org/work/W2893813829> ?p ?o ?g. }
- W2893813829 abstract "Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn continuous control. In the former, an agent learns a policy over $\mathbb{R}^d$, and in the latter, over a discrete set of actions, each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy-based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance, leading to instability and slow convergence. Unnecessary variance is introduced whenever policies over bounded action spaces are modeled using distributions with unbounded support by applying a transformation $T$ to the sampled action before execution in the environment. Recently, the variance-reduced clipped action policy gradient (CAPG) was introduced for actions in bounded intervals, but to date no variance-reduced methods exist when the action is a direction, something often seen in RTS games. To this end, we introduce the angular policy gradient (APG), a stochastic policy gradient method for directional control. With the marginal policy gradients family of estimators, we present a unified analysis of the variance reduction properties of APG and CAPG; our results provide a stronger guarantee than existing analyses for CAPG. Experimental results on a popular RTS game and a navigation task show that the APG estimator offers a substantial improvement over the standard policy gradient." @default.
- W2893813829 created "2018-10-05" @default.
- W2893813829 creator A5019708503 @default.
- W2893813829 creator A5049225160 @default.
- W2893813829 creator A5053914366 @default.
- W2893813829 creator A5069376593 @default.
- W2893813829 date "2018-06-13" @default.
- W2893813829 modified "2023-09-27" @default.
- W2893813829 title "Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications" @default.
- W2893813829 cites W1771410628 @default.
- W2893813829 cites W1777239053 @default.
- W2893813829 cites W2072403451 @default.
- W2893813829 cites W2089656010 @default.
- W2893813829 cites W2108682071 @default.
- W2893813829 cites W2121863487 @default.
- W2893813829 cites W2134110604 @default.
- W2893813829 cites W2145339207 @default.
- W2893813829 cites W2155027007 @default.
- W2893813829 cites W2165150801 @default.
- W2893813829 cites W2395575420 @default.
- W2893813829 cites W2518713116 @default.
- W2893813829 cites W2604763608 @default.
- W2893813829 cites W2618736734 @default.
- W2893813829 cites W2736601468 @default.
- W2893813829 cites W2740828027 @default.
- W2893813829 cites W2749807327 @default.
- W2893813829 cites W2753511062 @default.
- W2893813829 cites W2783932892 @default.
- W2893813829 cites W2899771611 @default.
- W2893813829 cites W2949608212 @default.
- W2893813829 cites W2963184621 @default.
- W2893813829 cites W2963286043 @default.
- W2893813829 cites W2963312729 @default.
- W2893813829 cites W2963454111 @default.
- W2893813829 cites W2963616477 @default.
- W2893813829 cites W2963945817 @default.
- W2893813829 cites W2964043796 @default.
- W2893813829 cites W2964121744 @default.
- W2893813829 cites W2964279789 @default.
- W2893813829 hasPublicationYear "2018" @default.
- W2893813829 type Work @default.
- W2893813829 sameAs 2893813829 @default.
- W2893813829 citedByCount "1" @default.
- W2893813829 countsByYear W28938138292019 @default.
- W2893813829 crossrefType "posted-content" @default.
- W2893813829 hasAuthorship W2893813829A5019708503 @default.
- W2893813829 hasAuthorship W2893813829A5049225160 @default.
- W2893813829 hasAuthorship W2893813829A5053914366 @default.
- W2893813829 hasAuthorship W2893813829A5069376593 @default.
- W2893813829 hasConcept C104317684 @default.
- W2893813829 hasConcept C105795698 @default.
- W2893813829 hasConcept C112972136 @default.
- W2893813829 hasConcept C119857082 @default.
- W2893813829 hasConcept C121332964 @default.
- W2893813829 hasConcept C121955636 @default.
- W2893813829 hasConcept C126255220 @default.
- W2893813829 hasConcept C134306372 @default.
- W2893813829 hasConcept C154945302 @default.
- W2893813829 hasConcept C162324750 @default.
- W2893813829 hasConcept C177264268 @default.
- W2893813829 hasConcept C185429906 @default.
- W2893813829 hasConcept C185592680 @default.
- W2893813829 hasConcept C196083921 @default.
- W2893813829 hasConcept C199360897 @default.
- W2893813829 hasConcept C204241405 @default.
- W2893813829 hasConcept C2777303404 @default.
- W2893813829 hasConcept C2780791683 @default.
- W2893813829 hasConcept C33923547 @default.
- W2893813829 hasConcept C34388435 @default.
- W2893813829 hasConcept C41008148 @default.
- W2893813829 hasConcept C50522688 @default.
- W2893813829 hasConcept C55493867 @default.
- W2893813829 hasConcept C62520636 @default.
- W2893813829 hasConcept C97541855 @default.
- W2893813829 hasConceptScore W2893813829C104317684 @default.
- W2893813829 hasConceptScore W2893813829C105795698 @default.
- W2893813829 hasConceptScore W2893813829C112972136 @default.
- W2893813829 hasConceptScore W2893813829C119857082 @default.
- W2893813829 hasConceptScore W2893813829C121332964 @default.
- W2893813829 hasConceptScore W2893813829C121955636 @default.
- W2893813829 hasConceptScore W2893813829C126255220 @default.
- W2893813829 hasConceptScore W2893813829C134306372 @default.
- W2893813829 hasConceptScore W2893813829C154945302 @default.
- W2893813829 hasConceptScore W2893813829C162324750 @default.
- W2893813829 hasConceptScore W2893813829C177264268 @default.
- W2893813829 hasConceptScore W2893813829C185429906 @default.
- W2893813829 hasConceptScore W2893813829C185592680 @default.
- W2893813829 hasConceptScore W2893813829C196083921 @default.
- W2893813829 hasConceptScore W2893813829C199360897 @default.
- W2893813829 hasConceptScore W2893813829C204241405 @default.
- W2893813829 hasConceptScore W2893813829C2777303404 @default.
- W2893813829 hasConceptScore W2893813829C2780791683 @default.
- W2893813829 hasConceptScore W2893813829C33923547 @default.
- W2893813829 hasConceptScore W2893813829C34388435 @default.
- W2893813829 hasConceptScore W2893813829C41008148 @default.
- W2893813829 hasConceptScore W2893813829C50522688 @default.
- W2893813829 hasConceptScore W2893813829C55493867 @default.
- W2893813829 hasConceptScore W2893813829C62520636 @default.
- W2893813829 hasConceptScore W2893813829C97541855 @default.
- W2893813829 hasLocation W28938138291 @default.