Matches in SemOpenAlex for { <https://semopenalex.org/work/W4226232680> ?p ?o ?g. }
- W4226232680 abstract "Recently, improving the robustness of policies across different environments has attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve max-min robustness by optimizing the policy's performance in the worst-case environment. However, in practice, a user who uses an RL policy may have different preferences over its performance across environments. Clearly, the aforementioned max-min robustness is often too conservative to satisfy user preference. Therefore, in this paper, we integrate user preference into policy learning in robust RL, and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two different UOR-RL training algorithms for the scenarios with or without an a priori known environment distribution, respectively. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or no knowledge of the environment distribution. Furthermore, we carry out extensive experimental evaluations on 4 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to the state-of-the-art baselines under the average and worst-case performance metrics, and more importantly establishes new state-of-the-art performance under the UOR metric." @default.
- W4226232680 created "2022-05-05" @default.
- W4226232680 creator A5009569401 @default.
- W4226232680 creator A5010823846 @default.
- W4226232680 creator A5030508108 @default.
- W4226232680 creator A5034483183 @default.
- W4226232680 creator A5055606460 @default.
- W4226232680 creator A5074839359 @default.
- W4226232680 date "2022-02-15" @default.
- W4226232680 modified "2023-10-16" @default.
- W4226232680 title "User-Oriented Robust Reinforcement Learning" @default.
- W4226232680 doi "https://doi.org/10.48550/arxiv.2202.07301" @default.
- W4226232680 hasPublicationYear "2022" @default.
- W4226232680 type Work @default.
- W4226232680 citedByCount "0" @default.
- W4226232680 crossrefType "posted-content" @default.
- W4226232680 hasAuthorship W4226232680A5009569401 @default.
- W4226232680 hasAuthorship W4226232680A5010823846 @default.
- W4226232680 hasAuthorship W4226232680A5030508108 @default.
- W4226232680 hasAuthorship W4226232680A5034483183 @default.
- W4226232680 hasAuthorship W4226232680A5055606460 @default.
- W4226232680 hasAuthorship W4226232680A5074839359 @default.
- W4226232680 hasBestOaLocation W42262326801 @default.
- W4226232680 hasConcept C104317684 @default.
- W4226232680 hasConcept C111472728 @default.
- W4226232680 hasConcept C119857082 @default.
- W4226232680 hasConcept C126255220 @default.
- W4226232680 hasConcept C127413603 @default.
- W4226232680 hasConcept C138885662 @default.
- W4226232680 hasConcept C154945302 @default.
- W4226232680 hasConcept C162324750 @default.
- W4226232680 hasConcept C165696696 @default.
- W4226232680 hasConcept C176217482 @default.
- W4226232680 hasConcept C185592680 @default.
- W4226232680 hasConcept C187736073 @default.
- W4226232680 hasConcept C21547014 @default.
- W4226232680 hasConcept C2780898871 @default.
- W4226232680 hasConcept C33923547 @default.
- W4226232680 hasConcept C38652104 @default.
- W4226232680 hasConcept C41008148 @default.
- W4226232680 hasConcept C55493867 @default.
- W4226232680 hasConcept C63479239 @default.
- W4226232680 hasConcept C75553542 @default.
- W4226232680 hasConcept C97541855 @default.
- W4226232680 hasConceptScore W4226232680C104317684 @default.
- W4226232680 hasConceptScore W4226232680C111472728 @default.
- W4226232680 hasConceptScore W4226232680C119857082 @default.
- W4226232680 hasConceptScore W4226232680C126255220 @default.
- W4226232680 hasConceptScore W4226232680C127413603 @default.
- W4226232680 hasConceptScore W4226232680C138885662 @default.
- W4226232680 hasConceptScore W4226232680C154945302 @default.
- W4226232680 hasConceptScore W4226232680C162324750 @default.
- W4226232680 hasConceptScore W4226232680C165696696 @default.
- W4226232680 hasConceptScore W4226232680C176217482 @default.
- W4226232680 hasConceptScore W4226232680C185592680 @default.
- W4226232680 hasConceptScore W4226232680C187736073 @default.
- W4226232680 hasConceptScore W4226232680C21547014 @default.
- W4226232680 hasConceptScore W4226232680C2780898871 @default.
- W4226232680 hasConceptScore W4226232680C33923547 @default.
- W4226232680 hasConceptScore W4226232680C38652104 @default.
- W4226232680 hasConceptScore W4226232680C41008148 @default.
- W4226232680 hasConceptScore W4226232680C55493867 @default.
- W4226232680 hasConceptScore W4226232680C63479239 @default.
- W4226232680 hasConceptScore W4226232680C75553542 @default.
- W4226232680 hasConceptScore W4226232680C97541855 @default.
- W4226232680 hasLocation W42262326801 @default.
- W4226232680 hasOpenAccess W4226232680 @default.
- W4226232680 hasPrimaryLocation W42262326801 @default.
- W4226232680 hasRelatedWork W1874176344 @default.
- W4226232680 hasRelatedWork W2023749213 @default.
- W4226232680 hasRelatedWork W2168895868 @default.
- W4226232680 hasRelatedWork W266446692 @default.
- W4226232680 hasRelatedWork W2732251946 @default.
- W4226232680 hasRelatedWork W3013347679 @default.
- W4226232680 hasRelatedWork W3022038857 @default.
- W4226232680 hasRelatedWork W3095098996 @default.
- W4226232680 hasRelatedWork W3154711472 @default.
- W4226232680 hasRelatedWork W4287824031 @default.
- W4226232680 isParatext "false" @default.
- W4226232680 isRetracted "false" @default.
- W4226232680 workType "article" @default.
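
The abstract above says the UOR metric weights environments by user preference and generalizes max-min robustness. The record does not give the metric's definition, so the sketch below is only an illustration of how a weighted score over environments can contain the worst-case score as a special case. The function `rank_weighted_score`, the rank-based weighting, and the sample returns are all assumptions made for this example, not the paper's formulation.

```python
from typing import Sequence


def rank_weighted_score(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-environment returns, sorted worst-first.

    Hypothetical helper for illustration only; not the UOR metric
    as defined in the paper. weights[0] applies to the worst
    environment and weights[-1] to the best.
    """
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ranked = sorted(returns)  # ascending order: worst environment first
    return sum(w * r for w, r in zip(weights, ranked))


# Made-up per-environment returns for one policy.
returns = [120.0, 80.0, 100.0]

# All weight on the worst environment reproduces the max-min objective's inner min.
worst_case = rank_weighted_score(returns, [1.0, 0.0, 0.0])

# Uniform weights reproduce the average-case objective.
average = rank_weighted_score(returns, [1 / 3, 1 / 3, 1 / 3])
```

Any weight vector between these two extremes expresses an intermediate preference, which is the sense in which a weighted metric of this kind can be less conservative than pure worst-case optimization.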