Matches in SemOpenAlex for { <https://semopenalex.org/work/W2807939031> ?p ?o ?g. }
- W2807939031 abstract "Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Iterated-Width (IW) planner, a state-of-the-art planner that makes explicit use of the state representation to perform structured exploration. IW is able to scale up to problems independently of the size of the state space. From the state-actions visited by IW, the learning step estimates a compact policy, which in turn is used to guide the planning step. The type of exploration used by our method is radically different from the standard random exploration used in RL. We evaluate our method in simple problems where we show it to outperform the state-of-the-art reinforcement learning algorithms A2C and Alpha Zero. Finally, we present preliminary results in a subset of the Atari games suite." @default.
- W2807939031 created "2018-06-21" @default.
- W2807939031 creator A5053716315 @default.
- W2807939031 creator A5063092583 @default.
- W2807939031 creator A5086943516 @default.
- W2807939031 date "2018-06-15" @default.
- W2807939031 modified "2023-09-26" @default.
- W2807939031 title "Improving width-based planning with compact policies." @default.
- W2807939031 cites W1757796397 @default.
- W2807939031 cites W1777239053 @default.
- W2807939031 cites W1987411046 @default.
- W2807939031 cites W1997477668 @default.
- W2807939031 cites W2020920737 @default.
- W2807939031 cites W2098614265 @default.
- W2807939031 cites W2139612737 @default.
- W2807939031 cites W2145339207 @default.
- W2807939031 cites W2151210636 @default.
- W2807939031 cites W2190606234 @default.
- W2807939031 cites W2257979135 @default.
- W2807939031 cites W2279668280 @default.
- W2807939031 cites W2280163991 @default.
- W2807939031 cites W2401523698 @default.
- W2807939031 cites W2489939061 @default.
- W2807939031 cites W2606757878 @default.
- W2807939031 cites W2663108269 @default.
- W2807939031 cites W2766447205 @default.
- W2807939031 cites W2772709170 @default.
- W2807939031 cites W2783392051 @default.
- W2807939031 cites W2788989367 @default.
- W2807939031 cites W2962957031 @default.
- W2807939031 cites W2963024489 @default.
- W2807939031 cites W2964174623 @default.
- W2807939031 hasPublicationYear "2018" @default.
- W2807939031 type Work @default.
- W2807939031 sameAs 2807939031 @default.
- W2807939031 citedByCount "1" @default.
- W2807939031 countsByYear W28079390312019 @default.
- W2807939031 crossrefType "posted-content" @default.
- W2807939031 hasAuthorship W2807939031A5053716315 @default.
- W2807939031 hasAuthorship W2807939031A5063092583 @default.
- W2807939031 hasAuthorship W2807939031A5086943516 @default.
- W2807939031 hasConcept C105795698 @default.
- W2807939031 hasConcept C11413529 @default.
- W2807939031 hasConcept C119857082 @default.
- W2807939031 hasConcept C121332964 @default.
- W2807939031 hasConcept C126255220 @default.
- W2807939031 hasConcept C134306372 @default.
- W2807939031 hasConcept C140479938 @default.
- W2807939031 hasConcept C154945302 @default.
- W2807939031 hasConcept C166957645 @default.
- W2807939031 hasConcept C17744445 @default.
- W2807939031 hasConcept C199539241 @default.
- W2807939031 hasConcept C2776359362 @default.
- W2807939031 hasConcept C2776999362 @default.
- W2807939031 hasConcept C2778755073 @default.
- W2807939031 hasConcept C33923547 @default.
- W2807939031 hasConcept C41008148 @default.
- W2807939031 hasConcept C48103436 @default.
- W2807939031 hasConcept C62520636 @default.
- W2807939031 hasConcept C72434380 @default.
- W2807939031 hasConcept C79581498 @default.
- W2807939031 hasConcept C81074085 @default.
- W2807939031 hasConcept C90509273 @default.
- W2807939031 hasConcept C94625758 @default.
- W2807939031 hasConcept C95457728 @default.
- W2807939031 hasConcept C97541855 @default.
- W2807939031 hasConceptScore W2807939031C105795698 @default.
- W2807939031 hasConceptScore W2807939031C11413529 @default.
- W2807939031 hasConceptScore W2807939031C119857082 @default.
- W2807939031 hasConceptScore W2807939031C121332964 @default.
- W2807939031 hasConceptScore W2807939031C126255220 @default.
- W2807939031 hasConceptScore W2807939031C134306372 @default.
- W2807939031 hasConceptScore W2807939031C140479938 @default.
- W2807939031 hasConceptScore W2807939031C154945302 @default.
- W2807939031 hasConceptScore W2807939031C166957645 @default.
- W2807939031 hasConceptScore W2807939031C17744445 @default.
- W2807939031 hasConceptScore W2807939031C199539241 @default.
- W2807939031 hasConceptScore W2807939031C2776359362 @default.
- W2807939031 hasConceptScore W2807939031C2776999362 @default.
- W2807939031 hasConceptScore W2807939031C2778755073 @default.
- W2807939031 hasConceptScore W2807939031C33923547 @default.
- W2807939031 hasConceptScore W2807939031C41008148 @default.
- W2807939031 hasConceptScore W2807939031C48103436 @default.
- W2807939031 hasConceptScore W2807939031C62520636 @default.
- W2807939031 hasConceptScore W2807939031C72434380 @default.
- W2807939031 hasConceptScore W2807939031C79581498 @default.
- W2807939031 hasConceptScore W2807939031C81074085 @default.
- W2807939031 hasConceptScore W2807939031C90509273 @default.
- W2807939031 hasConceptScore W2807939031C94625758 @default.
- W2807939031 hasConceptScore W2807939031C95457728 @default.
- W2807939031 hasConceptScore W2807939031C97541855 @default.
- W2807939031 hasLocation W28079390311 @default.
- W2807939031 hasOpenAccess W2807939031 @default.
- W2807939031 hasPrimaryLocation W28079390311 @default.
- W2807939031 hasRelatedWork W1515308897 @default.
- W2807939031 hasRelatedWork W1601640269 @default.
- W2807939031 hasRelatedWork W18175453 @default.
- W2807939031 hasRelatedWork W1884070896 @default.
- W2807939031 hasRelatedWork W1976800061 @default.
- W2807939031 hasRelatedWork W1996625075 @default.
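The abstract's claim that IW "makes explicit use of the state representation" and scales "independently of the size of the state space" can be illustrated with plain breadth-first IW(1) as described in the width-based planning literature. The sketch below is not the authors' policy-guided method: it omits the learned policy entirely. It assumes a state is a tuple of feature values, treats each `(feature_index, value)` pair as an atom, and prunes any generated state that makes no atom true for the first time. The function `iw1`, the toy 5x5 grid, and `grid_step` are hypothetical illustrations.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Assumption: a state is a tuple of feature values; an "atom" is (feature_index, value).
State = Tuple[int, ...]


def iw1(
    initial: State,
    actions: Sequence[Hashable],
    step: Callable[[State, Hashable], Tuple[State, float]],
    max_nodes: int = 10_000,
) -> Dict[State, Tuple[List[Hashable], float]]:
    """Breadth-first IW(1) sketch: a generated state is kept only if it makes
    at least one atom true for the first time in the whole search."""
    seen_atoms = {(i, v) for i, v in enumerate(initial)}
    # For each kept state: the action sequence that reached it and its accumulated reward.
    kept: Dict[State, Tuple[List[Hashable], float]] = {initial: ([], 0.0)}
    queue = deque([initial])
    while queue and len(kept) < max_nodes:
        state = queue.popleft()
        plan, ret = kept[state]
        for action in actions:
            nxt, reward = step(state, action)
            new_atoms = {(i, v) for i, v in enumerate(nxt)} - seen_atoms
            if not new_atoms:
                continue  # no novel atom: pruned (this also drops revisited states)
            seen_atoms |= new_atoms
            kept[nxt] = (plan + [action], ret + reward)
            queue.append(nxt)
    return kept


# Hypothetical toy domain: a 5x5 grid, state = (x, y), reward 1.0 whenever x == 4.
def grid_step(state: State, action: Hashable) -> Tuple[State, float]:
    dx, dy = {"R": (1, 0), "U": (0, 1), "L": (-1, 0), "D": (0, -1)}[action]
    x = min(max(state[0] + dx, 0), 4)  # clamp moves to the grid
    y = min(max(state[1] + dy, 0), 4)
    return (x, y), (1.0 if x == 4 else 0.0)


kept = iw1((0, 0), ["R", "U", "L", "D"], grid_step)
best_state, (best_plan, best_return) = max(kept.items(), key=lambda kv: kv[1][1])
print(len(kept), best_state, best_plan, best_return)
```

Because every kept state adds at least one new atom, the search keeps at most one state more than the number of distinct atoms (here 10 atoms for 25 cells), which is the sense in which IW(1) depends on the feature representation rather than on the number of states. In this toy run it keeps 9 states and finds the rewarding plan `["R", "R", "R", "R"]`, but it never reaches `(4, 4)`, since that cell only becomes novel with respect to a pair of atoms.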