Matches in SemOpenAlex for { <https://semopenalex.org/work/W3202898482> ?p ?o ?g. }
- W3202898482 abstract "Although model-based reinforcement learning (RL) approaches are considered more sample efficient, existing algorithms usually rely on a sophisticated planning algorithm that is tightly coupled with the model-learning procedure. Hence the learned models may lack the ability to be re-used with more specialized planners. In this paper we address this issue and provide approaches to learn an RL model efficiently without the guidance of a reward signal. In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase and demand that any planning algorithm on the learned model can give a near-optimal policy. Specifically, we focus on the linear mixture MDP setting, where the probability transition matrix is an (unknown) convex combination of a set of existing models. We show that, by establishing a novel exploration algorithm, the plug-in approach learns a model by taking $\tilde{O}(d^2H^3/\epsilon^2)$ interactions with the environment and any $\epsilon$-optimal planner on the model gives an $O(\epsilon)$-optimal policy on the original model. This sample complexity matches lower bounds for non-plug-in approaches and is statistically optimal. We achieve this result by leveraging a careful maximum total-variance bound using the Bernstein inequality and properties specific to linear mixture MDPs." @default.
- W3202898482 created "2021-10-11" @default.
- W3202898482 creator A5043094856 @default.
- W3202898482 creator A5055723755 @default.
- W3202898482 creator A5072096775 @default.
- W3202898482 creator A5073109266 @default.
- W3202898482 date "2021-10-07" @default.
- W3202898482 modified "2023-09-23" @default.
- W3202898482 title "Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver" @default.
- W3202898482 cites W1867103660 @default.
- W3202898482 cites W2119738618 @default.
- W3202898482 cites W2120678009 @default.
- W3202898482 cites W2125510930 @default.
- W3202898482 cites W2168877982 @default.
- W3202898482 cites W2545659366 @default.
- W3202898482 cites W2614839826 @default.
- W3202898482 cites W2816041711 @default.
- W3202898482 cites W2920362155 @default.
- W3202898482 cites W2953708620 @default.
- W3202898482 cites W2956123884 @default.
- W3202898482 cites W2963215512 @default.
- W3202898482 cites W2964054583 @default.
- W3202898482 cites W2979895842 @default.
- W3202898482 cites W3025606523 @default.
- W3202898482 cites W3028766998 @default.
- W3202898482 cites W3029753614 @default.
- W3202898482 cites W3034335560 @default.
- W3202898482 cites W3034871777 @default.
- W3202898482 cites W3034897261 @default.
- W3202898482 cites W3035273634 @default.
- W3202898482 cites W3035454135 @default.
- W3202898482 cites W3035599863 @default.
- W3202898482 cites W3036002380 @default.
- W3202898482 cites W3036498527 @default.
- W3202898482 cites W3037850847 @default.
- W3202898482 cites W3044126222 @default.
- W3202898482 cites W3046395471 @default.
- W3202898482 cites W3046692137 @default.
- W3202898482 cites W3076127970 @default.
- W3202898482 cites W3093056657 @default.
- W3202898482 cites W3111437863 @default.
- W3202898482 cites W3129608001 @default.
- W3202898482 cites W3145894457 @default.
- W3202898482 cites W3167017381 @default.
- W3202898482 cites W3167472281 @default.
- W3202898482 cites W3169423158 @default.
- W3202898482 cites W3170402103 @default.
- W3202898482 cites W3192782817 @default.
- W3202898482 cites W3207591282 @default.
- W3202898482 cites W779494576 @default.
- W3202898482 cites W3139488757 @default.
- W3202898482 doi "https://doi.org/10.48550/arxiv.2110.03244" @default.
- W3202898482 hasPublicationYear "2021" @default.
- W3202898482 type Work @default.
- W3202898482 sameAs 3202898482 @default.
- W3202898482 citedByCount "1" @default.
- W3202898482 countsByYear W32028984822021 @default.
- W3202898482 crossrefType "posted-content" @default.
- W3202898482 hasAuthorship W3202898482A5043094856 @default.
- W3202898482 hasAuthorship W3202898482A5055723755 @default.
- W3202898482 hasAuthorship W3202898482A5072096775 @default.
- W3202898482 hasAuthorship W3202898482A5073109266 @default.
- W3202898482 hasBestOaLocation W32028984821 @default.
- W3202898482 hasConcept C120665830 @default.
- W3202898482 hasConcept C121332964 @default.
- W3202898482 hasConcept C121955636 @default.
- W3202898482 hasConcept C126255220 @default.
- W3202898482 hasConcept C144133560 @default.
- W3202898482 hasConcept C154945302 @default.
- W3202898482 hasConcept C177264268 @default.
- W3202898482 hasConcept C192209626 @default.
- W3202898482 hasConcept C196083921 @default.
- W3202898482 hasConcept C199360897 @default.
- W3202898482 hasConcept C2778770139 @default.
- W3202898482 hasConcept C33923547 @default.
- W3202898482 hasConcept C41008148 @default.
- W3202898482 hasConcept C97541855 @default.
- W3202898482 hasConceptScore W3202898482C120665830 @default.
- W3202898482 hasConceptScore W3202898482C121332964 @default.
- W3202898482 hasConceptScore W3202898482C121955636 @default.
- W3202898482 hasConceptScore W3202898482C126255220 @default.
- W3202898482 hasConceptScore W3202898482C144133560 @default.
- W3202898482 hasConceptScore W3202898482C154945302 @default.
- W3202898482 hasConceptScore W3202898482C177264268 @default.
- W3202898482 hasConceptScore W3202898482C192209626 @default.
- W3202898482 hasConceptScore W3202898482C196083921 @default.
- W3202898482 hasConceptScore W3202898482C199360897 @default.
- W3202898482 hasConceptScore W3202898482C2778770139 @default.
- W3202898482 hasConceptScore W3202898482C33923547 @default.
- W3202898482 hasConceptScore W3202898482C41008148 @default.
- W3202898482 hasConceptScore W3202898482C97541855 @default.
- W3202898482 hasLocation W32028984821 @default.
- W3202898482 hasOpenAccess W3202898482 @default.
- W3202898482 hasPrimaryLocation W32028984821 @default.
- W3202898482 hasRelatedWork W1562959674 @default.
- W3202898482 hasRelatedWork W2032560733 @default.
- W3202898482 hasRelatedWork W2952472710 @default.
- W3202898482 hasRelatedWork W2997849647 @default.
- W3202898482 hasRelatedWork W3037422413 @default.
- W3202898482 hasRelatedWork W3126373388 @default.