Matches in SemOpenAlex for { <https://semopenalex.org/work/W4286908619> ?p ?o ?g. }
- W4286908619 abstract "Although model-based reinforcement learning (RL) approaches are considered more sample efficient, existing algorithms are usually relying on sophisticated planning algorithm to couple tightly with the model-learning procedure. Hence the learned models may lack the ability of being re-used with more specialized planners. In this paper we address this issue and provide approaches to learn an RL model efficiently without the guidance of a reward signal. In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase and demand that any planning algorithm on the learned model can give a near-optimal policy. Specifically, we focus on the linear mixture MDP setting, where the probability transition matrix is an (unknown) convex combination of a set of existing models. We show that, by establishing a novel exploration algorithm, the plug-in approach learns a model by taking $\tilde{O}(d^2H^3/\epsilon^2)$ interactions with the environment and any $\epsilon$-optimal planner on the model gives an $O(\epsilon)$-optimal policy on the original model. This sample complexity matches lower bounds for non-plug-in approaches and is statistically optimal. We achieve this result by leveraging a careful maximum total-variance bound using Bernstein inequality and properties specified to linear mixture MDP." @default.
- W4286908619 created "2022-07-25" @default.
- W4286908619 creator A5043094856 @default.
- W4286908619 creator A5055723755 @default.
- W4286908619 creator A5072096775 @default.
- W4286908619 creator A5073109266 @default.
- W4286908619 date "2021-10-07" @default.
- W4286908619 modified "2023-09-23" @default.
- W4286908619 title "Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver" @default.
- W4286908619 hasPublicationYear "2021" @default.
- W4286908619 type Work @default.
- W4286908619 citedByCount "0" @default.
- W4286908619 crossrefType "posted-content" @default.
- W4286908619 hasAuthorship W4286908619A5043094856 @default.
- W4286908619 hasAuthorship W4286908619A5055723755 @default.
- W4286908619 hasAuthorship W4286908619A5072096775 @default.
- W4286908619 hasAuthorship W4286908619A5073109266 @default.
- W4286908619 hasBestOaLocation W42869086191 @default.
- W4286908619 hasConcept C120665830 @default.
- W4286908619 hasConcept C121332964 @default.
- W4286908619 hasConcept C121955636 @default.
- W4286908619 hasConcept C126255220 @default.
- W4286908619 hasConcept C144133560 @default.
- W4286908619 hasConcept C154945302 @default.
- W4286908619 hasConcept C177264268 @default.
- W4286908619 hasConcept C185592680 @default.
- W4286908619 hasConcept C192209626 @default.
- W4286908619 hasConcept C196083921 @default.
- W4286908619 hasConcept C198531522 @default.
- W4286908619 hasConcept C199360897 @default.
- W4286908619 hasConcept C2776999362 @default.
- W4286908619 hasConcept C2778445095 @default.
- W4286908619 hasConcept C2778770139 @default.
- W4286908619 hasConcept C33923547 @default.
- W4286908619 hasConcept C41008148 @default.
- W4286908619 hasConcept C43617362 @default.
- W4286908619 hasConcept C97541855 @default.
- W4286908619 hasConceptScore W4286908619C120665830 @default.
- W4286908619 hasConceptScore W4286908619C121332964 @default.
- W4286908619 hasConceptScore W4286908619C121955636 @default.
- W4286908619 hasConceptScore W4286908619C126255220 @default.
- W4286908619 hasConceptScore W4286908619C144133560 @default.
- W4286908619 hasConceptScore W4286908619C154945302 @default.
- W4286908619 hasConceptScore W4286908619C177264268 @default.
- W4286908619 hasConceptScore W4286908619C185592680 @default.
- W4286908619 hasConceptScore W4286908619C192209626 @default.
- W4286908619 hasConceptScore W4286908619C196083921 @default.
- W4286908619 hasConceptScore W4286908619C198531522 @default.
- W4286908619 hasConceptScore W4286908619C199360897 @default.
- W4286908619 hasConceptScore W4286908619C2776999362 @default.
- W4286908619 hasConceptScore W4286908619C2778445095 @default.
- W4286908619 hasConceptScore W4286908619C2778770139 @default.
- W4286908619 hasConceptScore W4286908619C33923547 @default.
- W4286908619 hasConceptScore W4286908619C41008148 @default.
- W4286908619 hasConceptScore W4286908619C43617362 @default.
- W4286908619 hasConceptScore W4286908619C97541855 @default.
- W4286908619 hasLocation W42869086191 @default.
- W4286908619 hasOpenAccess W4286908619 @default.
- W4286908619 hasPrimaryLocation W42869086191 @default.
- W4286908619 hasRelatedWork W10212228 @default.
- W4286908619 hasRelatedWork W12180670 @default.
- W4286908619 hasRelatedWork W1279312 @default.
- W4286908619 hasRelatedWork W13717812 @default.
- W4286908619 hasRelatedWork W3697118 @default.
- W4286908619 hasRelatedWork W706759 @default.
- W4286908619 hasRelatedWork W7149022 @default.
- W4286908619 hasRelatedWork W7225426 @default.
- W4286908619 hasRelatedWork W7342293 @default.
- W4286908619 hasRelatedWork W13071157 @default.
- W4286908619 isParatext "false" @default.
- W4286908619 isRetracted "false" @default.
- W4286908619 workType "article" @default.