Matches in SemOpenAlex for { <https://semopenalex.org/work/W2963215512> ?p ?o ?g. }
Showing items 1 to 98 of 98 with 100 items per page.
- W2963215512 abstract "Model-based reinforcement learning (RL) is considered to be a promising approach to reduce the sample complexity that hinders model-free RL. However, the theoretical understanding of such methods has been rather limited. This paper introduces a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. We design a meta-algorithm with a theoretical guarantee of monotone improvement to a local maximum of the expected reward. The meta-algorithm iteratively builds a lower bound of the expected reward based on the estimated dynamical model and sample trajectories, and then maximizes the lower bound jointly over the policy and the model. The framework extends the optimism-in-face-of-uncertainty principle to non-linear dynamical models in a way that requires no explicit uncertainty quantification. Instantiating our framework with simplification gives a variant of model-based RL algorithms Stochastic Lower Bounds Optimization (SLBO). Experiments demonstrate that SLBO achieves state-of-the-art performance when only one million or fewer samples are permitted on a range of continuous control benchmark tasks." @default.
- W2963215512 created "2019-07-30" @default.
- W2963215512 creator A5019405863 @default.
- W2963215512 creator A5029105520 @default.
- W2963215512 creator A5049093671 @default.
- W2963215512 creator A5061905935 @default.
- W2963215512 creator A5070340856 @default.
- W2963215512 creator A5084821923 @default.
- W2963215512 date "2018-09-27" @default.
- W2963215512 modified "2023-10-01" @default.
- W2963215512 title "Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees." @default.
- W2963215512 hasPublicationYear "2018" @default.
- W2963215512 type Work @default.
- W2963215512 sameAs 2963215512 @default.
- W2963215512 citedByCount "42" @default.
- W2963215512 countsByYear W29632155122019 @default.
- W2963215512 countsByYear W29632155122020 @default.
- W2963215512 countsByYear W29632155122021 @default.
- W2963215512 crossrefType "proceedings-article" @default.
- W2963215512 hasAuthorship W2963215512A5019405863 @default.
- W2963215512 hasAuthorship W2963215512A5029105520 @default.
- W2963215512 hasAuthorship W2963215512A5049093671 @default.
- W2963215512 hasAuthorship W2963215512A5061905935 @default.
- W2963215512 hasAuthorship W2963215512A5070340856 @default.
- W2963215512 hasAuthorship W2963215512A5084821923 @default.
- W2963215512 hasConcept C11413529 @default.
- W2963215512 hasConcept C126255220 @default.
- W2963215512 hasConcept C13280743 @default.
- W2963215512 hasConcept C134306372 @default.
- W2963215512 hasConcept C144024400 @default.
- W2963215512 hasConcept C154945302 @default.
- W2963215512 hasConcept C159985019 @default.
- W2963215512 hasConcept C176248197 @default.
- W2963215512 hasConcept C185798385 @default.
- W2963215512 hasConcept C192562407 @default.
- W2963215512 hasConcept C204323151 @default.
- W2963215512 hasConcept C205649164 @default.
- W2963215512 hasConcept C2524010 @default.
- W2963215512 hasConcept C2778445095 @default.
- W2963215512 hasConcept C2779304628 @default.
- W2963215512 hasConcept C2834757 @default.
- W2963215512 hasConcept C32254414 @default.
- W2963215512 hasConcept C33923547 @default.
- W2963215512 hasConcept C36289849 @default.
- W2963215512 hasConcept C41008148 @default.
- W2963215512 hasConcept C77553402 @default.
- W2963215512 hasConcept C77967617 @default.
- W2963215512 hasConcept C97541855 @default.
- W2963215512 hasConceptScore W2963215512C11413529 @default.
- W2963215512 hasConceptScore W2963215512C126255220 @default.
- W2963215512 hasConceptScore W2963215512C13280743 @default.
- W2963215512 hasConceptScore W2963215512C134306372 @default.
- W2963215512 hasConceptScore W2963215512C144024400 @default.
- W2963215512 hasConceptScore W2963215512C154945302 @default.
- W2963215512 hasConceptScore W2963215512C159985019 @default.
- W2963215512 hasConceptScore W2963215512C176248197 @default.
- W2963215512 hasConceptScore W2963215512C185798385 @default.
- W2963215512 hasConceptScore W2963215512C192562407 @default.
- W2963215512 hasConceptScore W2963215512C204323151 @default.
- W2963215512 hasConceptScore W2963215512C205649164 @default.
- W2963215512 hasConceptScore W2963215512C2524010 @default.
- W2963215512 hasConceptScore W2963215512C2778445095 @default.
- W2963215512 hasConceptScore W2963215512C2779304628 @default.
- W2963215512 hasConceptScore W2963215512C2834757 @default.
- W2963215512 hasConceptScore W2963215512C32254414 @default.
- W2963215512 hasConceptScore W2963215512C33923547 @default.
- W2963215512 hasConceptScore W2963215512C36289849 @default.
- W2963215512 hasConceptScore W2963215512C41008148 @default.
- W2963215512 hasConceptScore W2963215512C77553402 @default.
- W2963215512 hasConceptScore W2963215512C77967617 @default.
- W2963215512 hasConceptScore W2963215512C97541855 @default.
- W2963215512 hasLocation W29632155121 @default.
- W2963215512 hasOpenAccess W2963215512 @default.
- W2963215512 hasPrimaryLocation W29632155121 @default.
- W2963215512 hasRelatedWork W1491843047 @default.
- W2963215512 hasRelatedWork W1771410628 @default.
- W2963215512 hasRelatedWork W2121863487 @default.
- W2963215512 hasRelatedWork W2140135625 @default.
- W2963215512 hasRelatedWork W2145339207 @default.
- W2963215512 hasRelatedWork W2158782408 @default.
- W2963215512 hasRelatedWork W2257979135 @default.
- W2963215512 hasRelatedWork W2736601468 @default.
- W2963215512 hasRelatedWork W2789824229 @default.
- W2963215512 hasRelatedWork W2892230114 @default.
- W2963215512 hasRelatedWork W2920362155 @default.
- W2963215512 hasRelatedWork W2953708620 @default.
- W2963215512 hasRelatedWork W2962804251 @default.
- W2963215512 hasRelatedWork W2962872206 @default.
- W2963215512 hasRelatedWork W2962902376 @default.
- W2963215512 hasRelatedWork W2963846183 @default.
- W2963215512 hasRelatedWork W2963960193 @default.
- W2963215512 hasRelatedWork W2964006217 @default.
- W2963215512 hasRelatedWork W2964220198 @default.
- W2963215512 hasRelatedWork W2970277495 @default.
- W2963215512 isParatext "false" @default.
- W2963215512 isRetracted "false" @default.
- W2963215512 magId "2963215512" @default.
- W2963215512 workType "article" @default.
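
The quad pattern in the header of this listing, `{ <https://semopenalex.org/work/W2963215512> ?p ?o ?g. }`, can be approximated as a standard SPARQL query. The following is a minimal sketch only: the helper name `build_quad_query` is hypothetical, and mapping `?g` to a `GRAPH ?g` clause is an assumption. Whether rows shown here under `@default` are returned by such a clause depends on how the store exposes its default graph.

```python
def build_quad_query(work_iri: str, limit: int = 100) -> str:
    """Build a SPARQL SELECT string for all (?p, ?o, ?g) of one subject IRI.

    Hypothetical helper for illustration; it only builds the query text
    and does not contact any endpoint.
    """
    # Reject values that cannot be safely placed inside <...> in SPARQL.
    forbidden = '<>" {}|\\^`'
    if not work_iri.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) IRI")
    if any(ch in work_iri for ch in forbidden):
        raise ValueError("IRI contains characters not allowed in an IRI reference")
    if limit <= 0:
        raise ValueError("limit must be positive")

    # Assumption: ?g in the listing corresponds to a named-graph variable.
    return (
        "SELECT ?p ?o ?g WHERE {\n"
        f"  GRAPH ?g {{ <{work_iri}> ?p ?o . }}\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = build_quad_query("https://semopenalex.org/work/W2963215512")
print(query)
```

The default `limit` of 100 mirrors the page size reported at the top of the listing. If the store keeps these statements only in its default graph, dropping the `GRAPH ?g { ... }` wrapper and selecting just `?p ?o` may be the closer equivalent.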