Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387596105> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4387596105 abstract "Safety is a primary concern when applying reinforcement learning to real-world control tasks, especially in the presence of external disturbances. However, existing safe reinforcement learning algorithms rarely account for external disturbances, limiting their applicability and robustness in practice. To address this challenge, this paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances. First, this paper presents a policy iteration scheme to solve for the robust invariant set, i.e., a subset of the safe set, where persistent safety is only possible for states within. The key idea is to establish a two-player zero-sum game by leveraging the safety value function in Hamilton-Jacobi reachability analysis, in which the protagonist (i.e., control inputs) aims to maintain safety and the adversary (i.e., external disturbances) tries to break down safety. This paper proves that the proposed policy iteration algorithm converges monotonically to the maximal robust invariant set. Second, this paper integrates the proposed policy iteration scheme into a constrained reinforcement learning algorithm that simultaneously synthesizes the robust invariant set and uses it for constrained policy optimization. This algorithm tackles both optimality and safety, i.e., learning a policy that attains high rewards while maintaining safety under worst-case disturbances. Experiments on classic control tasks show that the proposed method achieves zero constraint violation with learned worst-case adversarial disturbances, while other baseline algorithms violate the safety constraints substantially. Our proposed method also attains comparable performance as the baselines even in the absence of the adversary." @default.
- W4387596105 created "2023-10-13" @default.
- W4387596105 creator A5034758970 @default.
- W4387596105 creator A5045375278 @default.
- W4387596105 creator A5072263969 @default.
- W4387596105 creator A5079033650 @default.
- W4387596105 creator A5088481026 @default.
- W4387596105 date "2023-10-11" @default.
- W4387596105 modified "2023-10-14" @default.
- W4387596105 title "Robust Safe Reinforcement Learning under Adversarial Disturbances" @default.
- W4387596105 doi "https://doi.org/10.48550/arxiv.2310.07207" @default.
- W4387596105 hasPublicationYear "2023" @default.
- W4387596105 type Work @default.
- W4387596105 citedByCount "0" @default.
- W4387596105 crossrefType "posted-content" @default.
- W4387596105 hasAuthorship W4387596105A5034758970 @default.
- W4387596105 hasAuthorship W4387596105A5045375278 @default.
- W4387596105 hasAuthorship W4387596105A5072263969 @default.
- W4387596105 hasAuthorship W4387596105A5079033650 @default.
- W4387596105 hasAuthorship W4387596105A5088481026 @default.
- W4387596105 hasBestOaLocation W43875961051 @default.
- W4387596105 hasConcept C104317684 @default.
- W4387596105 hasConcept C11413529 @default.
- W4387596105 hasConcept C126255220 @default.
- W4387596105 hasConcept C134306372 @default.
- W4387596105 hasConcept C136643341 @default.
- W4387596105 hasConcept C14646407 @default.
- W4387596105 hasConcept C154945302 @default.
- W4387596105 hasConcept C177264268 @default.
- W4387596105 hasConcept C185592680 @default.
- W4387596105 hasConcept C190470478 @default.
- W4387596105 hasConcept C199360897 @default.
- W4387596105 hasConcept C33923547 @default.
- W4387596105 hasConcept C37736160 @default.
- W4387596105 hasConcept C37914503 @default.
- W4387596105 hasConcept C38652104 @default.
- W4387596105 hasConcept C41008148 @default.
- W4387596105 hasConcept C41065033 @default.
- W4387596105 hasConcept C55493867 @default.
- W4387596105 hasConcept C63479239 @default.
- W4387596105 hasConcept C72169020 @default.
- W4387596105 hasConcept C97541855 @default.
- W4387596105 hasConceptScore W4387596105C104317684 @default.
- W4387596105 hasConceptScore W4387596105C11413529 @default.
- W4387596105 hasConceptScore W4387596105C126255220 @default.
- W4387596105 hasConceptScore W4387596105C134306372 @default.
- W4387596105 hasConceptScore W4387596105C136643341 @default.
- W4387596105 hasConceptScore W4387596105C14646407 @default.
- W4387596105 hasConceptScore W4387596105C154945302 @default.
- W4387596105 hasConceptScore W4387596105C177264268 @default.
- W4387596105 hasConceptScore W4387596105C185592680 @default.
- W4387596105 hasConceptScore W4387596105C190470478 @default.
- W4387596105 hasConceptScore W4387596105C199360897 @default.
- W4387596105 hasConceptScore W4387596105C33923547 @default.
- W4387596105 hasConceptScore W4387596105C37736160 @default.
- W4387596105 hasConceptScore W4387596105C37914503 @default.
- W4387596105 hasConceptScore W4387596105C38652104 @default.
- W4387596105 hasConceptScore W4387596105C41008148 @default.
- W4387596105 hasConceptScore W4387596105C41065033 @default.
- W4387596105 hasConceptScore W4387596105C55493867 @default.
- W4387596105 hasConceptScore W4387596105C63479239 @default.
- W4387596105 hasConceptScore W4387596105C72169020 @default.
- W4387596105 hasConceptScore W4387596105C97541855 @default.
- W4387596105 hasLocation W43875961051 @default.
- W4387596105 hasOpenAccess W4387596105 @default.
- W4387596105 hasPrimaryLocation W43875961051 @default.
- W4387596105 hasRelatedWork W106056076 @default.
- W4387596105 hasRelatedWork W203812490 @default.
- W4387596105 hasRelatedWork W2067910792 @default.
- W4387596105 hasRelatedWork W2135200719 @default.
- W4387596105 hasRelatedWork W2136512912 @default.
- W4387596105 hasRelatedWork W2918664383 @default.
- W4387596105 hasRelatedWork W3123119822 @default.
- W4387596105 hasRelatedWork W4320018150 @default.
- W4387596105 hasRelatedWork W4320855730 @default.
- W4387596105 hasRelatedWork W2127267268 @default.
- W4387596105 isParatext "false" @default.
- W4387596105 isRetracted "false" @default.
- W4387596105 workType "article" @default.
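
The abstract above describes computing a robust invariant set from a safety value function in a zero-sum game between the control input and the disturbance. The sketch below illustrates that idea on a toy problem only. It is not the paper's algorithm: it uses plain fixed-point value iteration on a small integer grid rather than the policy iteration scheme the abstract refers to. The dynamics x' = 2x + u + d, the safe radius, the control and disturbance sets, the assumption that the control moves first, and every name in the code are hypothetical choices made for illustration.

```python
# Toy illustration (hypothetical setup, not the paper's method): fixed-point
# iteration on a discrete safety value function V(x) = min(h(x), max_u min_d V(x')).
# States with V(x) >= 0 at the fixed point form the robust invariant set.

UNSAFE = -1  # value assigned to any successor state outside the safe grid


def robust_invariant_set(safe_radius, controls, disturbances, step):
    """Return the sorted states from which safety can be kept forever.

    safe_radius  -- the safe set is {x : |x| <= safe_radius}
    controls     -- iterable of admissible control inputs (protagonist)
    disturbances -- iterable of admissible disturbances (adversary)
    step         -- function (x, u, d) -> next integer state
    """
    states = range(-safe_radius, safe_radius + 1)
    # Initialise with the constraint function h(x) = safe_radius - |x|.
    value = {x: safe_radius - abs(x) for x in states}
    while True:
        new_value = {}
        for x in states:
            # Control picks u first; the disturbance then picks the worst d.
            best = max(
                min(value.get(step(x, u, d), UNSAFE) for d in disturbances)
                for u in controls
            )
            new_value[x] = min(safe_radius - abs(x), best)
        if new_value == value:  # fixed point reached
            break
        value = new_value
    return sorted(x for x in states if value[x] >= 0)


def unstable_step(x, u, d):
    """Hypothetical unstable dynamics: the state doubles each step."""
    return 2 * x + u + d


with_adversary = robust_invariant_set(5, range(-2, 3), [-1, 0, 1], unstable_step)
without_adversary = robust_invariant_set(5, range(-2, 3), [0], unstable_step)
```

In this toy setting the set computed with the adversary is a strict subset of the one computed without it, which mirrors the abstract's point that ignoring disturbances overestimates where persistent safety is possible.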