Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287726780> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W4287726780 abstract "A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of safe exploration, most existing techniques either 1) do not guarantee safety during the actual exploration process; and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE also guides exploration towards the most task-relevant states, which empirically results in significant improvements in terms of sample efficiency, when compared to existing methods." @default.
- W4287726780 created "2022-07-26" @default.
- W4287726780 creator A5020630629 @default.
- W4287726780 creator A5051270969 @default.
- W4287726780 creator A5058043823 @default.
- W4287726780 date "2020-07-07" @default.
- W4287726780 modified "2023-09-25" @default.
- W4287726780 title "Provably Safe PAC-MDP Exploration Using Analogies" @default.
- W4287726780 doi "https://doi.org/10.48550/arxiv.2007.03574" @default.
- W4287726780 hasPublicationYear "2020" @default.
- W4287726780 type Work @default.
- W4287726780 citedByCount "0" @default.
- W4287726780 crossrefType "posted-content" @default.
- W4287726780 hasAuthorship W4287726780A5020630629 @default.
- W4287726780 hasAuthorship W4287726780A5051270969 @default.
- W4287726780 hasAuthorship W4287726780A5058043823 @default.
- W4287726780 hasBestOaLocation W42877267801 @default.
- W4287726780 hasConcept C102634674 @default.
- W4287726780 hasConcept C105795698 @default.
- W4287726780 hasConcept C106189395 @default.
- W4287726780 hasConcept C111472728 @default.
- W4287726780 hasConcept C111919701 @default.
- W4287726780 hasConcept C112930515 @default.
- W4287726780 hasConcept C127413603 @default.
- W4287726780 hasConcept C134306372 @default.
- W4287726780 hasConcept C138885662 @default.
- W4287726780 hasConcept C151201525 @default.
- W4287726780 hasConcept C154945302 @default.
- W4287726780 hasConcept C159886148 @default.
- W4287726780 hasConcept C165696696 @default.
- W4287726780 hasConcept C201995342 @default.
- W4287726780 hasConcept C26517878 @default.
- W4287726780 hasConcept C2778445095 @default.
- W4287726780 hasConcept C2780451532 @default.
- W4287726780 hasConcept C33923547 @default.
- W4287726780 hasConcept C38652104 @default.
- W4287726780 hasConcept C41008148 @default.
- W4287726780 hasConcept C71924100 @default.
- W4287726780 hasConcept C75553542 @default.
- W4287726780 hasConcept C97541855 @default.
- W4287726780 hasConcept C98045186 @default.
- W4287726780 hasConceptScore W4287726780C102634674 @default.
- W4287726780 hasConceptScore W4287726780C105795698 @default.
- W4287726780 hasConceptScore W4287726780C106189395 @default.
- W4287726780 hasConceptScore W4287726780C111472728 @default.
- W4287726780 hasConceptScore W4287726780C111919701 @default.
- W4287726780 hasConceptScore W4287726780C112930515 @default.
- W4287726780 hasConceptScore W4287726780C127413603 @default.
- W4287726780 hasConceptScore W4287726780C134306372 @default.
- W4287726780 hasConceptScore W4287726780C138885662 @default.
- W4287726780 hasConceptScore W4287726780C151201525 @default.
- W4287726780 hasConceptScore W4287726780C154945302 @default.
- W4287726780 hasConceptScore W4287726780C159886148 @default.
- W4287726780 hasConceptScore W4287726780C165696696 @default.
- W4287726780 hasConceptScore W4287726780C201995342 @default.
- W4287726780 hasConceptScore W4287726780C26517878 @default.
- W4287726780 hasConceptScore W4287726780C2778445095 @default.
- W4287726780 hasConceptScore W4287726780C2780451532 @default.
- W4287726780 hasConceptScore W4287726780C33923547 @default.
- W4287726780 hasConceptScore W4287726780C38652104 @default.
- W4287726780 hasConceptScore W4287726780C41008148 @default.
- W4287726780 hasConceptScore W4287726780C71924100 @default.
- W4287726780 hasConceptScore W4287726780C75553542 @default.
- W4287726780 hasConceptScore W4287726780C97541855 @default.
- W4287726780 hasConceptScore W4287726780C98045186 @default.
- W4287726780 hasLocation W42877267801 @default.
- W4287726780 hasOpenAccess W4287726780 @default.
- W4287726780 hasPrimaryLocation W42877267801 @default.
- W4287726780 hasRelatedWork W1517383877 @default.
- W4287726780 hasRelatedWork W1874176344 @default.
- W4287726780 hasRelatedWork W2101748387 @default.
- W4287726780 hasRelatedWork W2949964922 @default.
- W4287726780 hasRelatedWork W2952448454 @default.
- W4287726780 hasRelatedWork W3040161731 @default.
- W4287726780 hasRelatedWork W3157242750 @default.
- W4287726780 hasRelatedWork W3213838085 @default.
- W4287726780 hasRelatedWork W4297095626 @default.
- W4287726780 hasRelatedWork W4313591620 @default.
- W4287726780 isParatext "false" @default.
- W4287726780 isRetracted "false" @default.
- W4287726780 workType "article" @default.