Matches in SemOpenAlex for { <https://semopenalex.org/work/W4286952720> ?p ?o ?g. }
Showing items 1 to 53 of 53 with 100 items per page.
- W4286952720 abstract "One of the key challenges to deep reinforcement learning (deep RL) is to ensure safety at both training and testing phases. In this work, we propose a novel technique of unsupervised action planning to improve the safety of on-policy reinforcement learning algorithms, such as trust region policy optimization (TRPO) or proximal policy optimization (PPO). We design our safety-aware reinforcement learning by storing all the history of recovery actions that rescue the agent from dangerous situations into a separate safety buffer and finding the best recovery action when the agent encounters similar states. Because this functionality requires the algorithm to query similar states, we implement the proposed safety mechanism using an unsupervised learning algorithm, k-means clustering. We evaluate the proposed algorithm on six robotic control tasks that cover navigation and manipulation. Our results show that the proposed safety RL algorithm can achieve higher rewards compared with multiple baselines in both discrete and continuous control problems. The supplemental video can be found at: https://youtu.be/AFTeWSohILo." @default.
- W4286952720 created "2022-07-25" @default.
- W4286952720 creator A5064581452 @default.
- W4286952720 creator A5071485078 @default.
- W4286952720 creator A5086145215 @default.
- W4286952720 date "2021-09-29" @default.
- W4286952720 modified "2023-10-14" @default.
- W4286952720 title "Improving Safety in Deep Reinforcement Learning using Unsupervised Action Planning" @default.
- W4286952720 doi "https://doi.org/10.48550/arxiv.2109.14325" @default.
- W4286952720 hasPublicationYear "2021" @default.
- W4286952720 type Work @default.
- W4286952720 citedByCount "0" @default.
- W4286952720 crossrefType "posted-content" @default.
- W4286952720 hasAuthorship W4286952720A5064581452 @default.
- W4286952720 hasAuthorship W4286952720A5071485078 @default.
- W4286952720 hasAuthorship W4286952720A5086145215 @default.
- W4286952720 hasBestOaLocation W42869527201 @default.
- W4286952720 hasConcept C119857082 @default.
- W4286952720 hasConcept C121332964 @default.
- W4286952720 hasConcept C154945302 @default.
- W4286952720 hasConcept C2775924081 @default.
- W4286952720 hasConcept C2780791683 @default.
- W4286952720 hasConcept C41008148 @default.
- W4286952720 hasConcept C62520636 @default.
- W4286952720 hasConcept C73555534 @default.
- W4286952720 hasConcept C8038995 @default.
- W4286952720 hasConcept C97541855 @default.
- W4286952720 hasConceptScore W4286952720C119857082 @default.
- W4286952720 hasConceptScore W4286952720C121332964 @default.
- W4286952720 hasConceptScore W4286952720C154945302 @default.
- W4286952720 hasConceptScore W4286952720C2775924081 @default.
- W4286952720 hasConceptScore W4286952720C2780791683 @default.
- W4286952720 hasConceptScore W4286952720C41008148 @default.
- W4286952720 hasConceptScore W4286952720C62520636 @default.
- W4286952720 hasConceptScore W4286952720C73555534 @default.
- W4286952720 hasConceptScore W4286952720C8038995 @default.
- W4286952720 hasConceptScore W4286952720C97541855 @default.
- W4286952720 hasLocation W42869527201 @default.
- W4286952720 hasOpenAccess W4286952720 @default.
- W4286952720 hasPrimaryLocation W42869527201 @default.
- W4286952720 hasRelatedWork W3022038857 @default.
- W4286952720 hasRelatedWork W3046775127 @default.
- W4286952720 hasRelatedWork W3087576162 @default.
- W4286952720 hasRelatedWork W3123344745 @default.
- W4286952720 hasRelatedWork W3196155444 @default.
- W4286952720 hasRelatedWork W3209574120 @default.
- W4286952720 hasRelatedWork W3210156800 @default.
- W4286952720 hasRelatedWork W4285260836 @default.
- W4286952720 hasRelatedWork W4287665842 @default.
- W4286952720 hasRelatedWork W4319083788 @default.
- W4286952720 isParatext "false" @default.
- W4286952720 isRetracted "false" @default.
- W4286952720 workType "article" @default.
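
The abstract above describes a safety buffer that stores recovery actions and uses k-means clustering to find the best recovery action for similar states. The following is a minimal sketch of that idea, not the authors' implementation. The names `SafetyBuffer`, `kmeans`, and `best_recovery_action` are hypothetical, and so is the `score` field used to rank actions: the abstract does not say how "best" is measured. The k-means routine is plain Lloyd's algorithm written with the standard library only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


class SafetyBuffer:
    """Hypothetical buffer of (state, recovery_action, score) records.

    Assumption: `score` is some measure of how well the action rescued
    the agent; the abstract does not define it.
    """

    def __init__(self, k=2):
        self.k = k
        self.records = []
        self.centroids = []

    def add(self, state, action, score):
        self.records.append((tuple(state), action, score))

    def fit(self):
        """Cluster the stored states so similar states can be queried."""
        states = [s for s, _, _ in self.records]
        self.centroids = kmeans(states, min(self.k, len(states)))

    def _cluster(self, state):
        return min(range(len(self.centroids)),
                   key=lambda i: squared_distance(state, self.centroids[i]))

    def best_recovery_action(self, state):
        """Return the highest-scoring stored action from the state's cluster."""
        if not self.centroids:
            return None
        target = self._cluster(state)
        candidates = [(score, action) for s, action, score in self.records
                      if self._cluster(s) == target]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]


# Usage with made-up 2-D states and action labels
buf = SafetyBuffer(k=2)
buf.add((0, 0), "brake", 0.2)
buf.add((0, 1), "reverse", 0.9)
buf.add((10, 10), "turn_left", 0.5)
buf.add((10, 11), "turn_right", 0.1)
buf.fit()
action = buf.best_recovery_action((0.2, 0.4))
```

In an on-policy training loop such as TRPO or PPO, a lookup like this would presumably run only when the agent detects a dangerous state, with the policy's own action used otherwise. How the paper detects danger and when it refits the clusters is not stated in this record.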