Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287028738> ?p ?o ?g. }
Showing items 1 to 78 of 78, with 100 items per page.
- W4287028738 abstract "Training-time safety violations have been a major concern when we deploy reinforcement learning algorithms in the real world. This paper explores the possibility of safe RL algorithms with zero training-time safety violations in the challenging setting where we are only given a safe but trivial-reward initial policy without any prior knowledge of the dynamics model and additional offline data. We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies. The barrier certificates, learned via adversarial training, ensure the policy's safety assuming calibrated learned dynamics model. We also add a regularization term to encourage larger certified regions to enable better exploration. Empirical simulations show that zero safety violations are already challenging for a suite of simple environments with only 2-4 dimensional state space, especially if high-reward policies have to visit regions near the safety boundary. Prior methods require hundreds of violations to achieve decent rewards on these tasks, whereas our proposed algorithms incur zero violations." @default.
- W4287028738 created "2022-07-25" @default.
- W4287028738 creator A5061905935 @default.
- W4287028738 creator A5074630977 @default.
- W4287028738 date "2021-08-04" @default.
- W4287028738 modified "2023-09-24" @default.
- W4287028738 title "Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations" @default.
- W4287028738 hasPublicationYear "2021" @default.
- W4287028738 type Work @default.
- W4287028738 citedByCount "0" @default.
- W4287028738 crossrefType "posted-content" @default.
- W4287028738 hasAuthorship W4287028738A5061905935 @default.
- W4287028738 hasAuthorship W4287028738A5074630977 @default.
- W4287028738 hasBestOaLocation W42870287381 @default.
- W4287028738 hasConcept C11413529 @default.
- W4287028738 hasConcept C119857082 @default.
- W4287028738 hasConcept C121332964 @default.
- W4287028738 hasConcept C138885662 @default.
- W4287028738 hasConcept C144024400 @default.
- W4287028738 hasConcept C153294291 @default.
- W4287028738 hasConcept C154945302 @default.
- W4287028738 hasConcept C166957645 @default.
- W4287028738 hasConcept C17744445 @default.
- W4287028738 hasConcept C199539241 @default.
- W4287028738 hasConcept C2776135515 @default.
- W4287028738 hasConcept C2776775276 @default.
- W4287028738 hasConcept C2777211547 @default.
- W4287028738 hasConcept C2780813799 @default.
- W4287028738 hasConcept C37736160 @default.
- W4287028738 hasConcept C38652104 @default.
- W4287028738 hasConcept C41008148 @default.
- W4287028738 hasConcept C41895202 @default.
- W4287028738 hasConcept C46304622 @default.
- W4287028738 hasConcept C73484699 @default.
- W4287028738 hasConcept C79581498 @default.
- W4287028738 hasConcept C95457728 @default.
- W4287028738 hasConcept C96865113 @default.
- W4287028738 hasConcept C97541855 @default.
- W4287028738 hasConceptScore W4287028738C11413529 @default.
- W4287028738 hasConceptScore W4287028738C119857082 @default.
- W4287028738 hasConceptScore W4287028738C121332964 @default.
- W4287028738 hasConceptScore W4287028738C138885662 @default.
- W4287028738 hasConceptScore W4287028738C144024400 @default.
- W4287028738 hasConceptScore W4287028738C153294291 @default.
- W4287028738 hasConceptScore W4287028738C154945302 @default.
- W4287028738 hasConceptScore W4287028738C166957645 @default.
- W4287028738 hasConceptScore W4287028738C17744445 @default.
- W4287028738 hasConceptScore W4287028738C199539241 @default.
- W4287028738 hasConceptScore W4287028738C2776135515 @default.
- W4287028738 hasConceptScore W4287028738C2776775276 @default.
- W4287028738 hasConceptScore W4287028738C2777211547 @default.
- W4287028738 hasConceptScore W4287028738C2780813799 @default.
- W4287028738 hasConceptScore W4287028738C37736160 @default.
- W4287028738 hasConceptScore W4287028738C38652104 @default.
- W4287028738 hasConceptScore W4287028738C41008148 @default.
- W4287028738 hasConceptScore W4287028738C41895202 @default.
- W4287028738 hasConceptScore W4287028738C46304622 @default.
- W4287028738 hasConceptScore W4287028738C73484699 @default.
- W4287028738 hasConceptScore W4287028738C79581498 @default.
- W4287028738 hasConceptScore W4287028738C95457728 @default.
- W4287028738 hasConceptScore W4287028738C96865113 @default.
- W4287028738 hasConceptScore W4287028738C97541855 @default.
- W4287028738 hasLocation W42870287381 @default.
- W4287028738 hasOpenAccess W4287028738 @default.
- W4287028738 hasPrimaryLocation W42870287381 @default.
- W4287028738 hasRelatedWork W10379689 @default.
- W4287028738 hasRelatedWork W12291563 @default.
- W4287028738 hasRelatedWork W4412456 @default.
- W4287028738 hasRelatedWork W547392 @default.
- W4287028738 hasRelatedWork W5547603 @default.
- W4287028738 hasRelatedWork W7084024 @default.
- W4287028738 hasRelatedWork W8447228 @default.
- W4287028738 hasRelatedWork W8539471 @default.
- W4287028738 hasRelatedWork W868042 @default.
- W4287028738 hasRelatedWork W9657784 @default.
- W4287028738 isParatext "false" @default.
- W4287028738 isRetracted "false" @default.
- W4287028738 workType "article" @default.