Matches in SemOpenAlex for { <https://semopenalex.org/work/W2684685482> ?p ?o ?g. }
- W2684685482 abstract "We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates across the action when estimating the gradient, instead of relying only on the action in the sampled trajectory. We establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. We also prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and, for the Gaussian case, with no computational overhead. Finally, we show that it is optimal in a certain sense to explore with a Gaussian policy such that the covariance is proportional to the exponential of the scaled Hessian of the critic with respect to the actions. We present empirical results confirming that this new form of exploration substantially outperforms DPG with the Ornstein-Uhlenbeck heuristic in four challenging MuJoCo domains." @default.
- W2684685482 created "2017-06-30" @default.
- W2684685482 creator A5043239539 @default.
- W2684685482 creator A5056879203 @default.
- W2684685482 date "2017-06-15" @default.
- W2684685482 modified "2023-09-27" @default.
- W2684685482 title "Expected Policy Gradients" @default.
- W2684685482 cites W1522301498 @default.
- W2684685482 cites W1646707810 @default.
- W2684685482 cites W1906772730 @default.
- W2684685482 cites W2100752967 @default.
- W2684685482 cites W2112964839 @default.
- W2684685482 cites W2115126871 @default.
- W2684685482 cites W2127107099 @default.
- W2684685482 cites W2130005627 @default.
- W2684685482 cites W2130801532 @default.
- W2684685482 cites W2136602922 @default.
- W2684685482 cites W2137104525 @default.
- W2684685482 cites W2155027007 @default.
- W2684685482 cites W2158782408 @default.
- W2684685482 cites W2165150801 @default.
- W2684685482 cites W2172968643 @default.
- W2684685482 cites W2173248099 @default.
- W2684685482 cites W2535247013 @default.
- W2684685482 cites W2565081646 @default.
- W2684685482 cites W2747402019 @default.
- W2684685482 cites W2949608212 @default.
- W2684685482 cites W2950471160 @default.
- W2684685482 cites W2950492145 @default.
- W2684685482 cites W3139377883 @default.
- W2684685482 hasPublicationYear "2017" @default.
- W2684685482 type Work @default.
- W2684685482 sameAs 2684685482 @default.
- W2684685482 citedByCount "6" @default.
- W2684685482 countsByYear W26846854822017 @default.
- W2684685482 countsByYear W26846854822019 @default.
- W2684685482 crossrefType "posted-content" @default.
- W2684685482 hasAuthorship W2684685482A5043239539 @default.
- W2684685482 hasAuthorship W2684685482A5056879203 @default.
- W2684685482 hasConcept C105795698 @default.
- W2684685482 hasConcept C121332964 @default.
- W2684685482 hasConcept C121955636 @default.
- W2684685482 hasConcept C126255220 @default.
- W2684685482 hasConcept C1276947 @default.
- W2684685482 hasConcept C129844170 @default.
- W2684685482 hasConcept C134306372 @default.
- W2684685482 hasConcept C13662910 @default.
- W2684685482 hasConcept C151376022 @default.
- W2684685482 hasConcept C154945302 @default.
- W2684685482 hasConcept C162324750 @default.
- W2684685482 hasConcept C163716315 @default.
- W2684685482 hasConcept C173801870 @default.
- W2684685482 hasConcept C178650346 @default.
- W2684685482 hasConcept C196083921 @default.
- W2684685482 hasConcept C203616005 @default.
- W2684685482 hasConcept C2524010 @default.
- W2684685482 hasConcept C2780791683 @default.
- W2684685482 hasConcept C28826006 @default.
- W2684685482 hasConcept C33923547 @default.
- W2684685482 hasConcept C41008148 @default.
- W2684685482 hasConcept C62520636 @default.
- W2684685482 hasConcept C97541855 @default.
- W2684685482 hasConceptScore W2684685482C105795698 @default.
- W2684685482 hasConceptScore W2684685482C121332964 @default.
- W2684685482 hasConceptScore W2684685482C121955636 @default.
- W2684685482 hasConceptScore W2684685482C126255220 @default.
- W2684685482 hasConceptScore W2684685482C1276947 @default.
- W2684685482 hasConceptScore W2684685482C129844170 @default.
- W2684685482 hasConceptScore W2684685482C134306372 @default.
- W2684685482 hasConceptScore W2684685482C13662910 @default.
- W2684685482 hasConceptScore W2684685482C151376022 @default.
- W2684685482 hasConceptScore W2684685482C154945302 @default.
- W2684685482 hasConceptScore W2684685482C162324750 @default.
- W2684685482 hasConceptScore W2684685482C163716315 @default.
- W2684685482 hasConceptScore W2684685482C173801870 @default.
- W2684685482 hasConceptScore W2684685482C178650346 @default.
- W2684685482 hasConceptScore W2684685482C196083921 @default.
- W2684685482 hasConceptScore W2684685482C203616005 @default.
- W2684685482 hasConceptScore W2684685482C2524010 @default.
- W2684685482 hasConceptScore W2684685482C2780791683 @default.
- W2684685482 hasConceptScore W2684685482C28826006 @default.
- W2684685482 hasConceptScore W2684685482C33923547 @default.
- W2684685482 hasConceptScore W2684685482C41008148 @default.
- W2684685482 hasConceptScore W2684685482C62520636 @default.
- W2684685482 hasConceptScore W2684685482C97541855 @default.
- W2684685482 hasLocation W26846854821 @default.
- W2684685482 hasOpenAccess W2684685482 @default.
- W2684685482 hasPrimaryLocation W26846854821 @default.
- W2684685482 hasRelatedWork W2155027007 @default.
- W2684685482 hasRelatedWork W2165150801 @default.
- W2684685482 hasRelatedWork W2783932892 @default.
- W2684685482 hasRelatedWork W2808135057 @default.
- W2684685482 hasRelatedWork W2883895200 @default.
- W2684685482 hasRelatedWork W2893813829 @default.
- W2684685482 hasRelatedWork W2895049160 @default.
- W2684685482 hasRelatedWork W2951818274 @default.
- W2684685482 hasRelatedWork W2963457007 @default.
- W2684685482 hasRelatedWork W2964043796 @default.
- W2684685482 hasRelatedWork W2964325153 @default.
- W2684685482 hasRelatedWork W3034567339 @default.