Matches in SemOpenAlex for { <https://semopenalex.org/work/W3217647572> ?p ?o ?g. }
- W3217647572 abstract "The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems." @default.
- W3217647572 created "2021-12-06" @default.
- W3217647572 creator A5005536479 @default.
- W3217647572 creator A5053096993 @default.
- W3217647572 date "2021-12-02" @default.
- W3217647572 modified "2023-09-25" @default.
- W3217647572 title "Sample Complexity of Robust Reinforcement Learning with a Generative Model" @default.
- W3217647572 cites W1965878388 @default.
- W3217647572 cites W2039439610 @default.
- W3217647572 cites W2106378743 @default.
- W3217647572 cites W2110546349 @default.
- W3217647572 cites W2120678009 @default.
- W3217647572 cites W2121863487 @default.
- W3217647572 cites W2136503687 @default.
- W3217647572 cites W2139914196 @default.
- W3217647572 cites W2155153696 @default.
- W3217647572 cites W2165428239 @default.
- W3217647572 cites W2168565265 @default.
- W3217647572 cites W2257317309 @default.
- W3217647572 cites W2553297237 @default.
- W3217647572 cites W2593952959 @default.
- W3217647572 cites W2602963933 @default.
- W3217647572 cites W2787248994 @default.
- W3217647572 cites W2890347272 @default.
- W3217647572 cites W2946019081 @default.
- W3217647572 cites W2950300520 @default.
- W3217647572 cites W2962850106 @default.
- W3217647572 cites W2964182728 @default.
- W3217647572 cites W2964751386 @default.
- W3217647572 cites W2970160802 @default.
- W3217647572 cites W3092476885 @default.
- W3217647572 cites W3098237412 @default.
- W3217647572 cites W3102674085 @default.
- W3217647572 cites W3157375408 @default.
- W3217647572 cites W3168872570 @default.
- W3217647572 cites W3195133498 @default.
- W3217647572 hasPublicationYear "2021" @default.
- W3217647572 type Work @default.
- W3217647572 sameAs 3217647572 @default.
- W3217647572 citedByCount "0" @default.
- W3217647572 crossrefType "posted-content" @default.
- W3217647572 hasAuthorship W3217647572A5005536479 @default.
- W3217647572 hasAuthorship W3217647572A5053096993 @default.
- W3217647572 hasConcept C105795698 @default.
- W3217647572 hasConcept C106189395 @default.
- W3217647572 hasConcept C11413529 @default.
- W3217647572 hasConcept C126255220 @default.
- W3217647572 hasConcept C13280743 @default.
- W3217647572 hasConcept C138885662 @default.
- W3217647572 hasConcept C14036430 @default.
- W3217647572 hasConcept C14646407 @default.
- W3217647572 hasConcept C154945302 @default.
- W3217647572 hasConcept C159886148 @default.
- W3217647572 hasConcept C167966045 @default.
- W3217647572 hasConcept C177264268 @default.
- W3217647572 hasConcept C185592680 @default.
- W3217647572 hasConcept C185798385 @default.
- W3217647572 hasConcept C198531522 @default.
- W3217647572 hasConcept C199360897 @default.
- W3217647572 hasConcept C205649164 @default.
- W3217647572 hasConcept C207390915 @default.
- W3217647572 hasConcept C2778445095 @default.
- W3217647572 hasConcept C33923547 @default.
- W3217647572 hasConcept C37404715 @default.
- W3217647572 hasConcept C39890363 @default.
- W3217647572 hasConcept C41008148 @default.
- W3217647572 hasConcept C41895202 @default.
- W3217647572 hasConcept C43617362 @default.
- W3217647572 hasConcept C78458016 @default.
- W3217647572 hasConcept C86803240 @default.
- W3217647572 hasConcept C97541855 @default.
- W3217647572 hasConceptScore W3217647572C105795698 @default.
- W3217647572 hasConceptScore W3217647572C106189395 @default.
- W3217647572 hasConceptScore W3217647572C11413529 @default.
- W3217647572 hasConceptScore W3217647572C126255220 @default.
- W3217647572 hasConceptScore W3217647572C13280743 @default.
- W3217647572 hasConceptScore W3217647572C138885662 @default.
- W3217647572 hasConceptScore W3217647572C14036430 @default.
- W3217647572 hasConceptScore W3217647572C14646407 @default.
- W3217647572 hasConceptScore W3217647572C154945302 @default.
- W3217647572 hasConceptScore W3217647572C159886148 @default.
- W3217647572 hasConceptScore W3217647572C167966045 @default.
- W3217647572 hasConceptScore W3217647572C177264268 @default.
- W3217647572 hasConceptScore W3217647572C185592680 @default.
- W3217647572 hasConceptScore W3217647572C185798385 @default.
- W3217647572 hasConceptScore W3217647572C198531522 @default.
- W3217647572 hasConceptScore W3217647572C199360897 @default.
- W3217647572 hasConceptScore W3217647572C205649164 @default.
- W3217647572 hasConceptScore W3217647572C207390915 @default.
- W3217647572 hasConceptScore W3217647572C2778445095 @default.
- W3217647572 hasConceptScore W3217647572C33923547 @default.
- W3217647572 hasConceptScore W3217647572C37404715 @default.
- W3217647572 hasConceptScore W3217647572C39890363 @default.
- W3217647572 hasConceptScore W3217647572C41008148 @default.
- W3217647572 hasConceptScore W3217647572C41895202 @default.
- W3217647572 hasConceptScore W3217647572C43617362 @default.
- W3217647572 hasConceptScore W3217647572C78458016 @default.
- W3217647572 hasConceptScore W3217647572C86803240 @default.
- W3217647572 hasConceptScore W3217647572C97541855 @default.
- W3217647572 hasLocation W32176475721 @default.