Matches in SemOpenAlex for { <https://semopenalex.org/work/W2808135057> ?p ?o ?g. }
- W2808135057 abstract "Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn a continuous control. In the former, an agent learns a policy over $\mathbb{R}^d$ and in the latter, over a discrete set of actions each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance leading to instability and slow convergence. Unnecessary variance is introduced whenever policies over bounded action spaces are modeled using distributions with unbounded support by applying a transformation $T$ to the sampled action before execution in the environment. Recently, the variance reduced clipped action policy gradient (CAPG) was introduced for actions in bounded intervals, but to date no variance reduced methods exist when the action is a direction, something often seen in RTS games. To this end we introduce the angular policy gradient (APG), a stochastic policy gradient method for directional control. With the marginal policy gradients family of estimators we present a unified analysis of the variance reduction properties of APG and CAPG; our results provide a stronger guarantee than existing analyses for CAPG. Experimental results on a popular RTS game and a navigation task show that the APG estimator offers a substantial improvement over the standard policy gradient." @default.
- W2808135057 created "2018-06-21" @default.
- W2808135057 creator A5019708503 @default.
- W2808135057 creator A5049225160 @default.
- W2808135057 creator A5053914366 @default.
- W2808135057 creator A5069376593 @default.
- W2808135057 date "2018-06-13" @default.
- W2808135057 modified "2023-09-26" @default.
- W2808135057 title "Marginal Policy Gradients for Complex Control." @default.
- W2808135057 cites W1191599655 @default.
- W2808135057 cites W1771410628 @default.
- W2808135057 cites W1777239053 @default.
- W2808135057 cites W2072403451 @default.
- W2808135057 cites W2108682071 @default.
- W2808135057 cites W2145339207 @default.
- W2808135057 cites W2165150801 @default.
- W2808135057 cites W2604763608 @default.
- W2808135057 cites W2606433045 @default.
- W2808135057 cites W2624413595 @default.
- W2808135057 cites W2736601468 @default.
- W2808135057 cites W2740828027 @default.
- W2808135057 cites W2753511062 @default.
- W2808135057 cites W2783932892 @default.
- W2808135057 cites W2963312729 @default.
- W2808135057 cites W2964043796 @default.
- W2808135057 cites W2964121744 @default.
- W2808135057 cites W2964279789 @default.
- W2808135057 hasPublicationYear "2018" @default.
- W2808135057 type Work @default.
- W2808135057 sameAs 2808135057 @default.
- W2808135057 citedByCount "0" @default.
- W2808135057 crossrefType "posted-content" @default.
- W2808135057 hasAuthorship W2808135057A5019708503 @default.
- W2808135057 hasAuthorship W2808135057A5049225160 @default.
- W2808135057 hasAuthorship W2808135057A5053914366 @default.
- W2808135057 hasAuthorship W2808135057A5069376593 @default.
- W2808135057 hasConcept C104317684 @default.
- W2808135057 hasConcept C105795698 @default.
- W2808135057 hasConcept C112972136 @default.
- W2808135057 hasConcept C119857082 @default.
- W2808135057 hasConcept C121332964 @default.
- W2808135057 hasConcept C121955636 @default.
- W2808135057 hasConcept C126255220 @default.
- W2808135057 hasConcept C134306372 @default.
- W2808135057 hasConcept C154945302 @default.
- W2808135057 hasConcept C162324750 @default.
- W2808135057 hasConcept C177264268 @default.
- W2808135057 hasConcept C185429906 @default.
- W2808135057 hasConcept C185592680 @default.
- W2808135057 hasConcept C196083921 @default.
- W2808135057 hasConcept C199360897 @default.
- W2808135057 hasConcept C204241405 @default.
- W2808135057 hasConcept C2777303404 @default.
- W2808135057 hasConcept C2780791683 @default.
- W2808135057 hasConcept C33923547 @default.
- W2808135057 hasConcept C34388435 @default.
- W2808135057 hasConcept C41008148 @default.
- W2808135057 hasConcept C50522688 @default.
- W2808135057 hasConcept C55493867 @default.
- W2808135057 hasConcept C62520636 @default.
- W2808135057 hasConcept C62644790 @default.
- W2808135057 hasConcept C97541855 @default.
- W2808135057 hasConceptScore W2808135057C104317684 @default.
- W2808135057 hasConceptScore W2808135057C105795698 @default.
- W2808135057 hasConceptScore W2808135057C112972136 @default.
- W2808135057 hasConceptScore W2808135057C119857082 @default.
- W2808135057 hasConceptScore W2808135057C121332964 @default.
- W2808135057 hasConceptScore W2808135057C121955636 @default.
- W2808135057 hasConceptScore W2808135057C126255220 @default.
- W2808135057 hasConceptScore W2808135057C134306372 @default.
- W2808135057 hasConceptScore W2808135057C154945302 @default.
- W2808135057 hasConceptScore W2808135057C162324750 @default.
- W2808135057 hasConceptScore W2808135057C177264268 @default.
- W2808135057 hasConceptScore W2808135057C185429906 @default.
- W2808135057 hasConceptScore W2808135057C185592680 @default.
- W2808135057 hasConceptScore W2808135057C196083921 @default.
- W2808135057 hasConceptScore W2808135057C199360897 @default.
- W2808135057 hasConceptScore W2808135057C204241405 @default.
- W2808135057 hasConceptScore W2808135057C2777303404 @default.
- W2808135057 hasConceptScore W2808135057C2780791683 @default.
- W2808135057 hasConceptScore W2808135057C33923547 @default.
- W2808135057 hasConceptScore W2808135057C34388435 @default.
- W2808135057 hasConceptScore W2808135057C41008148 @default.
- W2808135057 hasConceptScore W2808135057C50522688 @default.
- W2808135057 hasConceptScore W2808135057C55493867 @default.
- W2808135057 hasConceptScore W2808135057C62520636 @default.
- W2808135057 hasConceptScore W2808135057C62644790 @default.
- W2808135057 hasConceptScore W2808135057C97541855 @default.
- W2808135057 hasLocation W28081350571 @default.
- W2808135057 hasOpenAccess W2808135057 @default.
- W2808135057 hasPrimaryLocation W28081350571 @default.
- W2808135057 hasRelatedWork W2165150801 @default.
- W2808135057 hasRelatedWork W2684685482 @default.
- W2808135057 hasRelatedWork W2783932892 @default.
- W2808135057 hasRelatedWork W2883895200 @default.
- W2808135057 hasRelatedWork W2893813829 @default.
- W2808135057 hasRelatedWork W2951818274 @default.
- W2808135057 hasRelatedWork W2963285565 @default.
- W2808135057 hasRelatedWork W2963457007 @default.
- W2808135057 hasRelatedWork W2979330446 @default.