Matches in SemOpenAlex for { <https://semopenalex.org/work/W2950230866> ?p ?o ?g. }
- W2950230866 abstract "Recently, a novel class of Approximate Policy Iteration (API) algorithms have demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search), that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved with guidance from the reactive policy. In this work we study this Dual Policy Iteration (DPI) strategy in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework which reduces the update of non-reactive policies to model-based optimal control using learned local models, and provides a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes." @default.
- W2950230866 created "2019-06-27" @default.
- W2950230866 creator A5012830032 @default.
- W2950230866 creator A5032266950 @default.
- W2950230866 creator A5034064327 @default.
- W2950230866 creator A5052552981 @default.
- W2950230866 date "2018-05-28" @default.
- W2950230866 modified "2023-09-27" @default.
- W2950230866 title "Dual Policy Iteration" @default.
- W2950230866 cites W112666333 @default.
- W2950230866 cites W1191599655 @default.
- W2950230866 cites W1564755532 @default.
- W2950230866 cites W1575592356 @default.
- W2950230866 cites W1625390266 @default.
- W2950230866 cites W1771410628 @default.
- W2950230866 cites W1941445455 @default.
- W2950230866 cites W2002793865 @default.
- W2950230866 cites W2105038027 @default.
- W2950230866 cites W2119579400 @default.
- W2950230866 cites W2119717200 @default.
- W2950230866 cites W2121103318 @default.
- W2950230866 cites W2130105540 @default.
- W2950230866 cites W2130801532 @default.
- W2950230866 cites W2138839127 @default.
- W2950230866 cites W2158782408 @default.
- W2950230866 cites W2165060096 @default.
- W2950230866 cites W2165421048 @default.
- W2950230866 cites W2169080882 @default.
- W2950230866 cites W2257979135 @default.
- W2950230866 cites W2268617045 @default.
- W2950230866 cites W2292274090 @default.
- W2950230866 cites W2528734395 @default.
- W2950230866 cites W2594640072 @default.
- W2950230866 cites W2618097077 @default.
- W2950230866 cites W2766447205 @default.
- W2950230866 cites W2789525339 @default.
- W2950230866 cites W2949608212 @default.
- W2950230866 cites W2950775382 @default.
- W2950230866 cites W2962957031 @default.
- W2950230866 cites W2963194474 @default.
- W2950230866 cites W2963630259 @default.
- W2950230866 cites W2964161785 @default.
- W2950230866 cites W2964349150 @default.
- W2950230866 cites W3139377883 @default.
- W2950230866 cites W3195133498 @default.
- W2950230866 cites W64088143 @default.
- W2950230866 hasPublicationYear "2018" @default.
- W2950230866 type Work @default.
- W2950230866 sameAs 2950230866 @default.
- W2950230866 citedByCount "17" @default.
- W2950230866 countsByYear W29502308662018 @default.
- W2950230866 countsByYear W29502308662019 @default.
- W2950230866 countsByYear W29502308662020 @default.
- W2950230866 countsByYear W29502308662021 @default.
- W2950230866 crossrefType "posted-content" @default.
- W2950230866 hasAuthorship W2950230866A5012830032 @default.
- W2950230866 hasAuthorship W2950230866A5032266950 @default.
- W2950230866 hasAuthorship W2950230866A5034064327 @default.
- W2950230866 hasAuthorship W2950230866A5052552981 @default.
- W2950230866 hasConcept C105795698 @default.
- W2950230866 hasConcept C106189395 @default.
- W2950230866 hasConcept C113174947 @default.
- W2950230866 hasConcept C124952713 @default.
- W2950230866 hasConcept C126255220 @default.
- W2950230866 hasConcept C134306372 @default.
- W2950230866 hasConcept C142362112 @default.
- W2950230866 hasConcept C154945302 @default.
- W2950230866 hasConcept C159886148 @default.
- W2950230866 hasConcept C162324750 @default.
- W2950230866 hasConcept C2777212361 @default.
- W2950230866 hasConcept C2777303404 @default.
- W2950230866 hasConcept C2780980858 @default.
- W2950230866 hasConcept C33923547 @default.
- W2950230866 hasConcept C41008148 @default.
- W2950230866 hasConcept C50522688 @default.
- W2950230866 hasConceptScore W2950230866C105795698 @default.
- W2950230866 hasConceptScore W2950230866C106189395 @default.
- W2950230866 hasConceptScore W2950230866C113174947 @default.
- W2950230866 hasConceptScore W2950230866C124952713 @default.
- W2950230866 hasConceptScore W2950230866C126255220 @default.
- W2950230866 hasConceptScore W2950230866C134306372 @default.
- W2950230866 hasConceptScore W2950230866C142362112 @default.
- W2950230866 hasConceptScore W2950230866C154945302 @default.
- W2950230866 hasConceptScore W2950230866C159886148 @default.
- W2950230866 hasConceptScore W2950230866C162324750 @default.
- W2950230866 hasConceptScore W2950230866C2777212361 @default.
- W2950230866 hasConceptScore W2950230866C2777303404 @default.
- W2950230866 hasConceptScore W2950230866C2780980858 @default.
- W2950230866 hasConceptScore W2950230866C33923547 @default.
- W2950230866 hasConceptScore W2950230866C41008148 @default.
- W2950230866 hasConceptScore W2950230866C50522688 @default.
- W2950230866 hasLocation W29502308661 @default.
- W2950230866 hasOpenAccess W2950230866 @default.
- W2950230866 hasPrimaryLocation W29502308661 @default.
- W2950230866 hasRelatedWork W1511927616 @default.
- W2950230866 hasRelatedWork W1575592356 @default.
- W2950230866 hasRelatedWork W1757796397 @default.
- W2950230866 hasRelatedWork W1980035368 @default.
- W2950230866 hasRelatedWork W2121103318 @default.
- W2950230866 hasRelatedWork W2140135625 @default.