Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387355622> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W4387355622 abstract "Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally strong policies, we can use them to guide the search algorithm resulting in a virtuous cycle where better options lead to better search results which allows for training of better options. We demonstrate experimentally that planning using options learned with Option Iteration leads to a significant benefit in challenging planning environments compared to an analogous planning algorithm operating in the space of primitive actions and learning a single rollout policy with Expert Iteration." @default.
- W4387355622 created "2023-10-05" @default.
- W4387355622 creator A5004923102 @default.
- W4387355622 creator A5007831673 @default.
- W4387355622 date "2023-10-02" @default.
- W4387355622 modified "2023-10-06" @default.
- W4387355622 title "Iterative Option Discovery for Planning, by Planning" @default.
- W4387355622 doi "https://doi.org/10.48550/arxiv.2310.01569" @default.
- W4387355622 hasPublicationYear "2023" @default.
- W4387355622 type Work @default.
- W4387355622 citedByCount "0" @default.
- W4387355622 crossrefType "posted-content" @default.
- W4387355622 hasAuthorship W4387355622A5004923102 @default.
- W4387355622 hasAuthorship W4387355622A5007831673 @default.
- W4387355622 hasBestOaLocation W43873556221 @default.
- W4387355622 hasConcept C105795698 @default.
- W4387355622 hasConcept C119857082 @default.
- W4387355622 hasConcept C126255220 @default.
- W4387355622 hasConcept C154945302 @default.
- W4387355622 hasConcept C177264268 @default.
- W4387355622 hasConcept C18903297 @default.
- W4387355622 hasConcept C199360897 @default.
- W4387355622 hasConcept C26517878 @default.
- W4387355622 hasConcept C28761237 @default.
- W4387355622 hasConcept C33923547 @default.
- W4387355622 hasConcept C38652104 @default.
- W4387355622 hasConcept C41008148 @default.
- W4387355622 hasConcept C70771513 @default.
- W4387355622 hasConcept C72434380 @default.
- W4387355622 hasConcept C86803240 @default.
- W4387355622 hasConcept C97541855 @default.
- W4387355622 hasConceptScore W4387355622C105795698 @default.
- W4387355622 hasConceptScore W4387355622C119857082 @default.
- W4387355622 hasConceptScore W4387355622C126255220 @default.
- W4387355622 hasConceptScore W4387355622C154945302 @default.
- W4387355622 hasConceptScore W4387355622C177264268 @default.
- W4387355622 hasConceptScore W4387355622C18903297 @default.
- W4387355622 hasConceptScore W4387355622C199360897 @default.
- W4387355622 hasConceptScore W4387355622C26517878 @default.
- W4387355622 hasConceptScore W4387355622C28761237 @default.
- W4387355622 hasConceptScore W4387355622C33923547 @default.
- W4387355622 hasConceptScore W4387355622C38652104 @default.
- W4387355622 hasConceptScore W4387355622C41008148 @default.
- W4387355622 hasConceptScore W4387355622C70771513 @default.
- W4387355622 hasConceptScore W4387355622C72434380 @default.
- W4387355622 hasConceptScore W4387355622C86803240 @default.
- W4387355622 hasConceptScore W4387355622C97541855 @default.
- W4387355622 hasLocation W43873556221 @default.
- W4387355622 hasOpenAccess W4387355622 @default.
- W4387355622 hasPrimaryLocation W43873556221 @default.
- W4387355622 hasRelatedWork W2126211886 @default.
- W4387355622 hasRelatedWork W2350784623 @default.
- W4387355622 hasRelatedWork W2992629954 @default.
- W4387355622 hasRelatedWork W2999580272 @default.
- W4387355622 hasRelatedWork W3009457412 @default.
- W4387355622 hasRelatedWork W3212257828 @default.
- W4387355622 hasRelatedWork W4297873223 @default.
- W4387355622 hasRelatedWork W4306904969 @default.
- W4387355622 hasRelatedWork W4362501864 @default.
- W4387355622 hasRelatedWork W4225571923 @default.
- W4387355622 isParatext "false" @default.
- W4387355622 isRetracted "false" @default.
- W4387355622 workType "article" @default.
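
The abstract above describes training a set of option policies so that, for each state, at least one policy matches the search results for some horizon. The record does not give the paper's actual objective, so the following is only a minimal sketch of that "at least one matches" idea under stated assumptions: the per-step cross-entropy, the fixed horizon, and the names `cross_entropy` and `best_option_loss` are all hypothetical illustrations, not the authors' method.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target and a predicted action distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def best_option_loss(search_targets, option_predictions):
    """Pick the option that best matches the search results over a horizon.

    ASSUMPTION: this min-over-options loss is an illustrative reading of the
    abstract, not the objective defined in the paper.

    search_targets: one action distribution per step of the horizon,
        as produced by the search.
    option_predictions: for each option, one predicted action distribution
        per step of the same horizon.
    Returns (index of the best-matching option, its summed cross-entropy).
    """
    losses = [
        sum(cross_entropy(t, p) for t, p in zip(search_targets, preds))
        for preds in option_predictions
    ]
    best = min(range(len(losses)), key=losses.__getitem__)
    return best, losses[best]


# Toy example: two actions, a two-step horizon, two options.
search_targets = [[0.9, 0.1], [0.8, 0.2]]
option_predictions = [
    [[0.9, 0.1], [0.8, 0.2]],  # option 0 agrees with the search
    [[0.1, 0.9], [0.2, 0.8]],  # option 1 prefers the other action
]
best, loss = best_option_loss(search_targets, option_predictions)
```

In this sketch only the best-matching option would receive a training signal for the state, which is one way to let the set hedge across options instead of forcing a single policy to match the search everywhere.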