Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312597946> ?p ?o ?g. }
- W4312597946 abstract "Evaluating rare but high-stakes events is one of the main challenges in obtaining reliable reinforcement learning policies, especially in large or infinite state/action spaces where limited scalability dictates a prohibitively large number of testing iterations. On the other hand, a biased or inaccurate policy evaluation in a safety-critical system could potentially cause unexpected catastrophic failures during deployment. This paper proposes the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes. The APE method treats the environment nature as an adversarial agent and learns towards, through adaptive importance sampling, the zero-variance sampling distribution for the policy evaluation. Moreover, APE is scalable to large discrete or continuous spaces by incorporating function approximators. We investigate the convergence property of APE in the tabular setting. Our empirical studies show that APE can estimate the rare event probability with a smaller bias while only using orders of magnitude fewer samples than baselines in multi-agent and single-agent environments." @default.
- W4312597946 created "2023-01-05" @default.
- W4312597946 creator A5007087808 @default.
- W4312597946 creator A5010028058 @default.
- W4312597946 creator A5015581947 @default.
- W4312597946 creator A5025075066 @default.
- W4312597946 creator A5026307285 @default.
- W4312597946 creator A5037644321 @default.
- W4312597946 creator A5049539929 @default.
- W4312597946 creator A5086653913 @default.
- W4312597946 creator A5090788140 @default.
- W4312597946 date "2022-10-23" @default.
- W4312597946 modified "2023-10-16" @default.
- W4312597946 title "Scalable Safety-Critical Policy Evaluation with Accelerated Rare Event Sampling" @default.
- W4312597946 cites W2037393162 @default.
- W4312597946 cites W2115264563 @default.
- W4312597946 cites W2126025075 @default.
- W4312597946 cites W2147632348 @default.
- W4312597946 cites W2158126207 @default.
- W4312597946 cites W2165428239 @default.
- W4312597946 cites W2338318698 @default.
- W4312597946 cites W2438413413 @default.
- W4312597946 cites W2511072509 @default.
- W4312597946 cites W2525936901 @default.
- W4312597946 cites W2758442112 @default.
- W4312597946 cites W2798302089 @default.
- W4312597946 cites W3098217413 @default.
- W4312597946 cites W3117478135 @default.
- W4312597946 cites W3133465684 @default.
- W4312597946 cites W3207250575 @default.
- W4312597946 cites W4211042066 @default.
- W4312597946 cites W4233696721 @default.
- W4312597946 doi "https://doi.org/10.1109/iros47612.2022.9981867" @default.
- W4312597946 hasPublicationYear "2022" @default.
- W4312597946 type Work @default.
- W4312597946 citedByCount "0" @default.
- W4312597946 crossrefType "proceedings-article" @default.
- W4312597946 hasAuthorship W4312597946A5007087808 @default.
- W4312597946 hasAuthorship W4312597946A5010028058 @default.
- W4312597946 hasAuthorship W4312597946A5015581947 @default.
- W4312597946 hasAuthorship W4312597946A5025075066 @default.
- W4312597946 hasAuthorship W4312597946A5026307285 @default.
- W4312597946 hasAuthorship W4312597946A5037644321 @default.
- W4312597946 hasAuthorship W4312597946A5049539929 @default.
- W4312597946 hasAuthorship W4312597946A5086653913 @default.
- W4312597946 hasAuthorship W4312597946A5090788140 @default.
- W4312597946 hasBestOaLocation W43125979462 @default.
- W4312597946 hasConcept C105339364 @default.
- W4312597946 hasConcept C105795698 @default.
- W4312597946 hasConcept C106131492 @default.
- W4312597946 hasConcept C106189395 @default.
- W4312597946 hasConcept C111919701 @default.
- W4312597946 hasConcept C119857082 @default.
- W4312597946 hasConcept C121332964 @default.
- W4312597946 hasConcept C121955636 @default.
- W4312597946 hasConcept C140779682 @default.
- W4312597946 hasConcept C144133560 @default.
- W4312597946 hasConcept C154945302 @default.
- W4312597946 hasConcept C159886148 @default.
- W4312597946 hasConcept C162324750 @default.
- W4312597946 hasConcept C19499675 @default.
- W4312597946 hasConcept C196083921 @default.
- W4312597946 hasConcept C2777303404 @default.
- W4312597946 hasConcept C2777317252 @default.
- W4312597946 hasConcept C2779662365 @default.
- W4312597946 hasConcept C31972630 @default.
- W4312597946 hasConcept C33923547 @default.
- W4312597946 hasConcept C41008148 @default.
- W4312597946 hasConcept C48044578 @default.
- W4312597946 hasConcept C50522688 @default.
- W4312597946 hasConcept C52740198 @default.
- W4312597946 hasConcept C62520636 @default.
- W4312597946 hasConcept C77088390 @default.
- W4312597946 hasConcept C97541855 @default.
- W4312597946 hasConcept C98763669 @default.
- W4312597946 hasConceptScore W4312597946C105339364 @default.
- W4312597946 hasConceptScore W4312597946C105795698 @default.
- W4312597946 hasConceptScore W4312597946C106131492 @default.
- W4312597946 hasConceptScore W4312597946C106189395 @default.
- W4312597946 hasConceptScore W4312597946C111919701 @default.
- W4312597946 hasConceptScore W4312597946C119857082 @default.
- W4312597946 hasConceptScore W4312597946C121332964 @default.
- W4312597946 hasConceptScore W4312597946C121955636 @default.
- W4312597946 hasConceptScore W4312597946C140779682 @default.
- W4312597946 hasConceptScore W4312597946C144133560 @default.
- W4312597946 hasConceptScore W4312597946C154945302 @default.
- W4312597946 hasConceptScore W4312597946C159886148 @default.
- W4312597946 hasConceptScore W4312597946C162324750 @default.
- W4312597946 hasConceptScore W4312597946C19499675 @default.
- W4312597946 hasConceptScore W4312597946C196083921 @default.
- W4312597946 hasConceptScore W4312597946C2777303404 @default.
- W4312597946 hasConceptScore W4312597946C2777317252 @default.
- W4312597946 hasConceptScore W4312597946C2779662365 @default.
- W4312597946 hasConceptScore W4312597946C31972630 @default.
- W4312597946 hasConceptScore W4312597946C33923547 @default.
- W4312597946 hasConceptScore W4312597946C41008148 @default.
- W4312597946 hasConceptScore W4312597946C48044578 @default.
- W4312597946 hasConceptScore W4312597946C50522688 @default.
- W4312597946 hasConceptScore W4312597946C52740198 @default.
- W4312597946 hasConceptScore W4312597946C62520636 @default.