Matches in SemOpenAlex for { <https://semopenalex.org/work/W2954295423> ?p ?o ?g. }
- W2954295423 endingPage "8831" @default.
- W2954295423 startingPage "8823" @default.
- W2954295423 abstract "Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the VRMPO algorithm: a sample-efficient policy gradient method with stochastic mirror descent. In VRMPO, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed VRMPO needs only O(ε^{-3}) sample trajectories to achieve an ε-approximate first-order stationary point, which matches the best sample complexity for policy optimization. Extensive empirical results demonstrate that VRMPO outperforms the state-of-the-art policy gradient methods in various settings." @default.
- W2954295423 created "2019-07-12" @default.
- W2954295423 creator A5003356739 @default.
- W2954295423 creator A5005694880 @default.
- W2954295423 creator A5014394467 @default.
- W2954295423 creator A5038119641 @default.
- W2954295423 creator A5069283448 @default.
- W2954295423 creator A5071773009 @default.
- W2954295423 creator A5084291326 @default.
- W2954295423 date "2022-06-28" @default.
- W2954295423 modified "2023-10-16" @default.
- W2954295423 title "Policy Optimization with Stochastic Mirror Descent" @default.
- W2954295423 cites W1499669280 @default.
- W2954295423 cites W1505731132 @default.
- W2954295423 cites W1540586255 @default.
- W2954295423 cites W1587704401 @default.
- W2954295423 cites W1771410628 @default.
- W2954295423 cites W1777239053 @default.
- W2954295423 cites W1953936588 @default.
- W2954295423 cites W1992208280 @default.
- W2954295423 cites W1995875735 @default.
- W2954295423 cites W2016384870 @default.
- W2954295423 cites W2029463628 @default.
- W2954295423 cites W2038497950 @default.
- W2954295423 cites W205960364 @default.
- W2954295423 cites W2107438106 @default.
- W2954295423 cites W2108682071 @default.
- W2954295423 cites W2119717200 @default.
- W2954295423 cites W2121863487 @default.
- W2954295423 cites W2122882636 @default.
- W2954295423 cites W2130801532 @default.
- W2954295423 cites W2145339207 @default.
- W2954295423 cites W2155027007 @default.
- W2954295423 cites W2156718681 @default.
- W2954295423 cites W2158782408 @default.
- W2954295423 cites W2162262334 @default.
- W2954295423 cites W2162831327 @default.
- W2954295423 cites W2165150801 @default.
- W2954295423 cites W2312609093 @default.
- W2954295423 cites W2554120691 @default.
- W2954295423 cites W2554984891 @default.
- W2954295423 cites W2594203335 @default.
- W2954295423 cites W2619167391 @default.
- W2954295423 cites W2736601468 @default.
- W2954295423 cites W2766447205 @default.
- W2954295423 cites W2767133776 @default.
- W2954295423 cites W2786887383 @default.
- W2954295423 cites W2806985155 @default.
- W2954295423 cites W2807821938 @default.
- W2954295423 cites W2889643881 @default.
- W2954295423 cites W2890597452 @default.
- W2954295423 cites W2913611331 @default.
- W2954295423 cites W2945007422 @default.
- W2954295423 cites W2949608212 @default.
- W2954295423 cites W2952215077 @default.
- W2954295423 cites W2962696654 @default.
- W2954295423 cites W2962777832 @default.
- W2954295423 cites W2962902376 @default.
- W2954295423 cites W2962916883 @default.
- W2954295423 cites W2962936775 @default.
- W2954295423 cites W2962970637 @default.
- W2954295423 cites W2963156201 @default.
- W2954295423 cites W2963184621 @default.
- W2954295423 cites W2963267001 @default.
- W2954295423 cites W2963411541 @default.
- W2954295423 cites W2963457007 @default.
- W2954295423 cites W2963650250 @default.
- W2954295423 cites W2963851840 @default.
- W2954295423 cites W2963864421 @default.
- W2954295423 cites W2963923407 @default.
- W2954295423 cites W2963965485 @default.
- W2954295423 cites W2963973601 @default.
- W2954295423 cites W2964025922 @default.
- W2954295423 cites W2964043796 @default.
- W2954295423 cites W2964410826 @default.
- W2954295423 cites W2967027099 @default.
- W2954295423 cites W3039845099 @default.
- W2954295423 cites W3046626913 @default.
- W2954295423 cites W3141595720 @default.
- W2954295423 cites W633277346 @default.
- W2954295423 cites W2859870583 @default.
- W2954295423 doi "https://doi.org/10.1609/aaai.v36i8.20863" @default.
- W2954295423 hasPublicationYear "2022" @default.
- W2954295423 type Work @default.
- W2954295423 sameAs 2954295423 @default.
- W2954295423 citedByCount "8" @default.
- W2954295423 countsByYear W29542954232019 @default.
- W2954295423 countsByYear W29542954232020 @default.
- W2954295423 countsByYear W29542954232021 @default.
- W2954295423 countsByYear W29542954232023 @default.
- W2954295423 crossrefType "journal-article" @default.
- W2954295423 hasAuthorship W2954295423A5003356739 @default.
- W2954295423 hasAuthorship W2954295423A5005694880 @default.
- W2954295423 hasAuthorship W2954295423A5014394467 @default.
- W2954295423 hasAuthorship W2954295423A5038119641 @default.
- W2954295423 hasAuthorship W2954295423A5069283448 @default.
- W2954295423 hasAuthorship W2954295423A5071773009 @default.
- W2954295423 hasAuthorship W2954295423A5084291326 @default.