Matches in SemOpenAlex for { <https://semopenalex.org/work/W2890211540> ?p ?o ?g. }
- W2890211540 abstract "Author(s): Moldovan, Teodor Mihai | Advisor(s): Abbeel, Pieter; Jordan, Michael I | Abstract: Replicating the human ability to solve complex planning problems based on minimal prior knowledge has been extensively studied in the field of reinforcement learning. Algorithms for discrete or approximate models are supported by theoretical guarantees but the necessary assumptions are often constraining. We aim to extend these results in the direction of practical applicability to more realistic settings. Our contributions are restricted to three specific aspects of practical problems that we believe to be important when applying reinforcement learning techniques: risk awareness, safe exploration and data-efficient exploration. Risk awareness is important in planning situations where restarts are not available and performance depends on one-off returns rather than average returns. The expected return is no longer an appropriate objective because the law of large numbers does not apply. In Chapter 2 we propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties, relating it to previously proposed risk-aware objectives: minmax, exponential utility, percentile and mean minus variance. In environments with uncertain dynamics, exploration is often necessary to improve performance. Existing reinforcement learning algorithms provide theoretical exploration guarantees, but they tend to rely on the assumption that any state is eventually reachable from any other state by following a suitable policy. For most physical systems this assumption is impractical as the systems would break before any reasonable exploration has taken place. In Chapter 3 we address the need for a safe exploration method. In Chapter 4 we address the specific challenges presented by extending model-based reinforcement learning methods from discrete to continuous dynamical systems. System representations based on explicitly enumerated states are no longer applicable. To address this challenge we use a Dirichlet process mixture of linear models to represent dynamics. The proposed model strikes a good balance between compact representation and flexibility. To address the challenge of an efficient exploration-exploitation trade-off we apply the principle of Optimism in the Face of Uncertainty that underlies numerous other provably efficient algorithms in simpler settings. Our algorithm reduces the exploration problem to a sequence of classical optimal control problems. Synthetic experiments illustrate the effectiveness of our methods." @default.
- W2890211540 created "2018-09-27" @default.
- W2890211540 creator A5001855685 @default.
- W2890211540 date "2014-01-01" @default.
- W2890211540 modified "2023-09-24" @default.
- W2890211540 title "Safety, Risk Awareness and Exploration in Reinforcement Learning" @default.
- W2890211540 cites W134786152 @default.
- W2890211540 cites W1501823362 @default.
- W2890211540 cites W1505937442 @default.
- W2890211540 cites W1518931405 @default.
- W2890211540 cites W15411808 @default.
- W2890211540 cites W1593140824 @default.
- W2890211540 cites W195033972 @default.
- W2890211540 cites W1985249384 @default.
- W2890211540 cites W1987073631 @default.
- W2890211540 cites W1998534269 @default.
- W2890211540 cites W2016211524 @default.
- W2890211540 cites W2018705428 @default.
- W2890211540 cites W2026570525 @default.
- W2890211540 cites W2058066080 @default.
- W2890211540 cites W2063575294 @default.
- W2890211540 cites W2066118288 @default.
- W2890211540 cites W2069045459 @default.
- W2890211540 cites W2077902449 @default.
- W2890211540 cites W2082691056 @default.
- W2890211540 cites W2086304253 @default.
- W2890211540 cites W2088413745 @default.
- W2890211540 cites W2102884619 @default.
- W2890211540 cites W2113789941 @default.
- W2890211540 cites W2117481033 @default.
- W2890211540 cites W2121863487 @default.
- W2890211540 cites W2123681024 @default.
- W2890211540 cites W2124352385 @default.
- W2890211540 cites W2127192854 @default.
- W2890211540 cites W2127498532 @default.
- W2890211540 cites W2134491302 @default.
- W2890211540 cites W2136074593 @default.
- W2890211540 cites W2136462844 @default.
- W2890211540 cites W2140135625 @default.
- W2890211540 cites W2145487433 @default.
- W2890211540 cites W2149365825 @default.
- W2890211540 cites W2153704625 @default.
- W2890211540 cites W2154032554 @default.
- W2890211540 cites W2165622730 @default.
- W2890211540 cites W2169071224 @default.
- W2890211540 cites W2279759792 @default.
- W2890211540 cites W2296319761 @default.
- W2890211540 cites W2342548755 @default.
- W2890211540 cites W2488247662 @default.
- W2890211540 cites W2951778453 @default.
- W2890211540 cites W2952647718 @default.
- W2890211540 cites W2952720101 @default.
- W2890211540 hasPublicationYear "2014" @default.
- W2890211540 type Work @default.
- W2890211540 sameAs 2890211540 @default.
- W2890211540 citedByCount "0" @default.
- W2890211540 crossrefType "journal-article" @default.
- W2890211540 hasAuthorship W2890211540A5001855685 @default.
- W2890211540 hasConcept C112930515 @default.
- W2890211540 hasConcept C119857082 @default.
- W2890211540 hasConcept C121955636 @default.
- W2890211540 hasConcept C126255220 @default.
- W2890211540 hasConcept C144133560 @default.
- W2890211540 hasConcept C149728462 @default.
- W2890211540 hasConcept C154945302 @default.
- W2890211540 hasConcept C196083921 @default.
- W2890211540 hasConcept C33923547 @default.
- W2890211540 hasConcept C41008148 @default.
- W2890211540 hasConcept C42475967 @default.
- W2890211540 hasConcept C71924100 @default.
- W2890211540 hasConcept C97541855 @default.
- W2890211540 hasConceptScore W2890211540C112930515 @default.
- W2890211540 hasConceptScore W2890211540C119857082 @default.
- W2890211540 hasConceptScore W2890211540C121955636 @default.
- W2890211540 hasConceptScore W2890211540C126255220 @default.
- W2890211540 hasConceptScore W2890211540C144133560 @default.
- W2890211540 hasConceptScore W2890211540C149728462 @default.
- W2890211540 hasConceptScore W2890211540C154945302 @default.
- W2890211540 hasConceptScore W2890211540C196083921 @default.
- W2890211540 hasConceptScore W2890211540C33923547 @default.
- W2890211540 hasConceptScore W2890211540C41008148 @default.
- W2890211540 hasConceptScore W2890211540C42475967 @default.
- W2890211540 hasConceptScore W2890211540C71924100 @default.
- W2890211540 hasConceptScore W2890211540C97541855 @default.
- W2890211540 hasLocation W28902115401 @default.
- W2890211540 hasOpenAccess W2890211540 @default.
- W2890211540 hasPrimaryLocation W28902115401 @default.
- W2890211540 hasRelatedWork W142858861 @default.
- W2890211540 hasRelatedWork W1616330627 @default.
- W2890211540 hasRelatedWork W2524179627 @default.
- W2890211540 hasRelatedWork W2773896370 @default.
- W2890211540 hasRelatedWork W2787477569 @default.
- W2890211540 hasRelatedWork W2799385102 @default.
- W2890211540 hasRelatedWork W2945132723 @default.
- W2890211540 hasRelatedWork W2952580032 @default.
- W2890211540 hasRelatedWork W2952720101 @default.
- W2890211540 hasRelatedWork W2953228110 @default.
- W2890211540 hasRelatedWork W2963482581 @default.
- W2890211540 hasRelatedWork W2970355847 @default.
- W2890211540 hasRelatedWork W2991551448 @default.