SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4226232680> ?p ?o ?g. }

W4226232680 abstract "Recently, improving the robustness of policies across different environments attracts increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve the max-min robustness by optimizing the policy's performance in the worst-case environment. However, in practice, a user that uses an RL policy may have different preferences over its performance across environments. Clearly, the aforementioned max-min robustness is oftentimes too conservative to satisfy user preference. Therefore, in this paper, we integrate user preference into policy learning in robust RL, and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two different UOR-RL training algorithms for the scenarios with or without a priori known environment distribution, respectively. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or completely no knowledge about the environment distribution. Furthermore, we carry out extensive experimental evaluations in 4 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to the state-of-the-art baselines under the average and worst-case performance metrics, and more importantly establishes new state-of-the-art performance under the UOR metric." @default.
W4226232680 created "2022-05-05" @default.
W4226232680 creator A5009569401 @default.
W4226232680 creator A5010823846 @default.
W4226232680 creator A5030508108 @default.
W4226232680 creator A5034483183 @default.
W4226232680 creator A5055606460 @default.
W4226232680 creator A5074839359 @default.
W4226232680 date "2022-02-15" @default.
W4226232680 modified "2023-10-16" @default.
W4226232680 title "User-Oriented Robust Reinforcement Learning" @default.
W4226232680 doi "https://doi.org/10.48550/arxiv.2202.07301" @default.
W4226232680 hasPublicationYear "2022" @default.
W4226232680 type Work @default.
W4226232680 citedByCount "0" @default.
W4226232680 crossrefType "posted-content" @default.
W4226232680 hasAuthorship W4226232680A5009569401 @default.
W4226232680 hasAuthorship W4226232680A5010823846 @default.
W4226232680 hasAuthorship W4226232680A5030508108 @default.
W4226232680 hasAuthorship W4226232680A5034483183 @default.
W4226232680 hasAuthorship W4226232680A5055606460 @default.
W4226232680 hasAuthorship W4226232680A5074839359 @default.
W4226232680 hasBestOaLocation W42262326801 @default.
W4226232680 hasConcept C104317684 @default.
W4226232680 hasConcept C111472728 @default.
W4226232680 hasConcept C119857082 @default.
W4226232680 hasConcept C126255220 @default.
W4226232680 hasConcept C127413603 @default.
W4226232680 hasConcept C138885662 @default.
W4226232680 hasConcept C154945302 @default.
W4226232680 hasConcept C162324750 @default.
W4226232680 hasConcept C165696696 @default.
W4226232680 hasConcept C176217482 @default.
W4226232680 hasConcept C185592680 @default.
W4226232680 hasConcept C187736073 @default.
W4226232680 hasConcept C21547014 @default.
W4226232680 hasConcept C2780898871 @default.
W4226232680 hasConcept C33923547 @default.
W4226232680 hasConcept C38652104 @default.
W4226232680 hasConcept C41008148 @default.
W4226232680 hasConcept C55493867 @default.
W4226232680 hasConcept C63479239 @default.
W4226232680 hasConcept C75553542 @default.
W4226232680 hasConcept C97541855 @default.
W4226232680 hasConceptScore W4226232680C104317684 @default.
W4226232680 hasConceptScore W4226232680C111472728 @default.
W4226232680 hasConceptScore W4226232680C119857082 @default.
W4226232680 hasConceptScore W4226232680C126255220 @default.
W4226232680 hasConceptScore W4226232680C127413603 @default.
W4226232680 hasConceptScore W4226232680C138885662 @default.
W4226232680 hasConceptScore W4226232680C154945302 @default.
W4226232680 hasConceptScore W4226232680C162324750 @default.
W4226232680 hasConceptScore W4226232680C165696696 @default.
W4226232680 hasConceptScore W4226232680C176217482 @default.
W4226232680 hasConceptScore W4226232680C185592680 @default.
W4226232680 hasConceptScore W4226232680C187736073 @default.
W4226232680 hasConceptScore W4226232680C21547014 @default.
W4226232680 hasConceptScore W4226232680C2780898871 @default.
W4226232680 hasConceptScore W4226232680C33923547 @default.
W4226232680 hasConceptScore W4226232680C38652104 @default.
W4226232680 hasConceptScore W4226232680C41008148 @default.
W4226232680 hasConceptScore W4226232680C55493867 @default.
W4226232680 hasConceptScore W4226232680C63479239 @default.
W4226232680 hasConceptScore W4226232680C75553542 @default.
W4226232680 hasConceptScore W4226232680C97541855 @default.
W4226232680 hasLocation W42262326801 @default.
W4226232680 hasOpenAccess W4226232680 @default.
W4226232680 hasPrimaryLocation W42262326801 @default.
W4226232680 hasRelatedWork W1874176344 @default.
W4226232680 hasRelatedWork W2023749213 @default.
W4226232680 hasRelatedWork W2168895868 @default.
W4226232680 hasRelatedWork W266446692 @default.
W4226232680 hasRelatedWork W2732251946 @default.
W4226232680 hasRelatedWork W3013347679 @default.
W4226232680 hasRelatedWork W3022038857 @default.
W4226232680 hasRelatedWork W3095098996 @default.
W4226232680 hasRelatedWork W3154711472 @default.
W4226232680 hasRelatedWork W4287824031 @default.
W4226232680 isParatext "false" @default.
W4226232680 isRetracted "false" @default.
W4226232680 workType "article" @default.