Matches in SemOpenAlex for { <https://semopenalex.org/work/W2767133776> ?p ?o ?g. }
- W2767133776 abstract "Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) to model-free policy gradient to significantly improve the sample efficiency. The SVRG estimation is incorporated into a trust-region Newton conjugate gradient framework for the policy optimization. On several MuJoCo tasks, our method achieves significantly better performance compared to the state-of-the-art model-free policy gradient methods in robotic continuous control such as trust region policy optimization (TRPO)." @default.
- W2767133776 created "2017-11-10" @default.
- W2767133776 creator A5014048942 @default.
- W2767133776 creator A5028384432 @default.
- W2767133776 creator A5061177999 @default.
- W2767133776 date "2017-10-16" @default.
- W2767133776 modified "2023-09-27" @default.
- W2767133776 title "Stochastic Variance Reduction for Policy Gradient Estimation" @default.
- W2767133776 cites W1515851193 @default.
- W2767133776 cites W1771410628 @default.
- W2767133776 cites W1967736575 @default.
- W2767133776 cites W1969074599 @default.
- W2767133776 cites W1991083751 @default.
- W2767133776 cites W2006722592 @default.
- W2767133776 cites W2012587148 @default.
- W2767133776 cites W2107438106 @default.
- W2767133776 cites W2109008048 @default.
- W2767133776 cites W2119717200 @default.
- W2767133776 cites W2124541940 @default.
- W2767133776 cites W2125612430 @default.
- W2767133776 cites W2126909264 @default.
- W2767133776 cites W2139053308 @default.
- W2767133776 cites W2140135625 @default.
- W2767133776 cites W2145339207 @default.
- W2767133776 cites W2155027007 @default.
- W2767133776 cites W2158782408 @default.
- W2767133776 cites W2342662072 @default.
- W2767133776 cites W2568597750 @default.
- W2767133776 cites W2594203335 @default.
- W2767133776 cites W2722088290 @default.
- W2767133776 cites W2739340005 @default.
- W2767133776 cites W2798766386 @default.
- W2767133776 cites W2963397933 @default.
- W2767133776 cites W2963941964 @default.
- W2767133776 cites W2964161785 @default.
- W2767133776 cites W3029645440 @default.
- W2767133776 doi "https://doi.org/10.48550/arxiv.1710.06034" @default.
- W2767133776 hasPublicationYear "2017" @default.
- W2767133776 type Work @default.
- W2767133776 sameAs 2767133776 @default.
- W2767133776 citedByCount "10" @default.
- W2767133776 countsByYear W27671337762019 @default.
- W2767133776 countsByYear W27671337762020 @default.
- W2767133776 countsByYear W27671337762021 @default.
- W2767133776 countsByYear W27671337762022 @default.
- W2767133776 crossrefType "posted-content" @default.
- W2767133776 hasAuthorship W2767133776A5014048942 @default.
- W2767133776 hasAuthorship W2767133776A5028384432 @default.
- W2767133776 hasAuthorship W2767133776A5061177999 @default.
- W2767133776 hasBestOaLocation W27671337761 @default.
- W2767133776 hasConcept C107673813 @default.
- W2767133776 hasConcept C111335779 @default.
- W2767133776 hasConcept C111350023 @default.
- W2767133776 hasConcept C11413529 @default.
- W2767133776 hasConcept C115680565 @default.
- W2767133776 hasConcept C121332964 @default.
- W2767133776 hasConcept C121683094 @default.
- W2767133776 hasConcept C121955636 @default.
- W2767133776 hasConcept C126255220 @default.
- W2767133776 hasConcept C13153151 @default.
- W2767133776 hasConcept C153258448 @default.
- W2767133776 hasConcept C154945302 @default.
- W2767133776 hasConcept C162324750 @default.
- W2767133776 hasConcept C178635117 @default.
- W2767133776 hasConcept C187736073 @default.
- W2767133776 hasConcept C196083921 @default.
- W2767133776 hasConcept C198531522 @default.
- W2767133776 hasConcept C2524010 @default.
- W2767133776 hasConcept C2778334786 @default.
- W2767133776 hasConcept C33923547 @default.
- W2767133776 hasConcept C38652104 @default.
- W2767133776 hasConcept C41008148 @default.
- W2767133776 hasConcept C44870925 @default.
- W2767133776 hasConcept C50644808 @default.
- W2767133776 hasConcept C62644790 @default.
- W2767133776 hasConcept C81184566 @default.
- W2767133776 hasConcept C89109886 @default.
- W2767133776 hasConcept C96250715 @default.
- W2767133776 hasConcept C97355855 @default.
- W2767133776 hasConcept C97541855 @default.
- W2767133776 hasConceptScore W2767133776C107673813 @default.
- W2767133776 hasConceptScore W2767133776C111335779 @default.
- W2767133776 hasConceptScore W2767133776C111350023 @default.
- W2767133776 hasConceptScore W2767133776C11413529 @default.
- W2767133776 hasConceptScore W2767133776C115680565 @default.
- W2767133776 hasConceptScore W2767133776C121332964 @default.
- W2767133776 hasConceptScore W2767133776C121683094 @default.
- W2767133776 hasConceptScore W2767133776C121955636 @default.
- W2767133776 hasConceptScore W2767133776C126255220 @default.
- W2767133776 hasConceptScore W2767133776C13153151 @default.
- W2767133776 hasConceptScore W2767133776C153258448 @default.
- W2767133776 hasConceptScore W2767133776C154945302 @default.
- W2767133776 hasConceptScore W2767133776C162324750 @default.
- W2767133776 hasConceptScore W2767133776C178635117 @default.
- W2767133776 hasConceptScore W2767133776C187736073 @default.
- W2767133776 hasConceptScore W2767133776C196083921 @default.
- W2767133776 hasConceptScore W2767133776C198531522 @default.
- W2767133776 hasConceptScore W2767133776C2524010 @default.
- W2767133776 hasConceptScore W2767133776C2778334786 @default.
- W2767133776 hasConceptScore W2767133776C33923547 @default.