Matches in SemOpenAlex for { <https://semopenalex.org/work/W3089710958> ?p ?o ?g. }
- W3089710958 abstract "Real-world robots are subject to complex, strict constraints. Safe reinforcement learning algorithms that can simultaneously minimize the total cost and the risk of constraint violation are therefore crucial. However, to the best of our knowledge, almost no existing algorithms scale to high-dimensional systems. In this paper, we propose Dynamic Actor-Advisor Programming (DAAP), an algorithm for sample-efficient and scalable safe reinforcement learning. DAAP employs two control policies, an actor and an advisor. They are updated in an intertwined way to minimize the total cost and the risk of constraint violation, moving smoothly toward each other because each uses the other as the baseline policy in the Kullback-Leibler divergence term of the Dynamic Policy Programming framework. We demonstrate the scalability and sample efficiency of DAAP on simulated robot arm control tasks, with performance comparisons to baselines." @default.
- W3089710958 created "2020-10-08" @default.
- W3089710958 creator A5011048472 @default.
- W3089710958 creator A5012201453 @default.
- W3089710958 creator A5042074952 @default.
- W3089710958 date "2020-05-01" @default.
- W3089710958 modified "2023-09-24" @default.
- W3089710958 title "Dynamic Actor-Advisor Programming for Scalable Safe Reinforcement Learning" @default.
- W3089710958 cites W1518931405 @default.
- W3089710958 cites W1575592356 @default.
- W3089710958 cites W1845972764 @default.
- W3089710958 cites W2118556122 @default.
- W3089710958 cites W2121863487 @default.
- W3089710958 cites W2130005627 @default.
- W3089710958 cites W2145339207 @default.
- W3089710958 cites W2151237105 @default.
- W3089710958 cites W2164479831 @default.
- W3089710958 cites W2555811267 @default.
- W3089710958 cites W2618318883 @default.
- W3089710958 cites W2730929966 @default.
- W3089710958 cites W2766447205 @default.
- W3089710958 cites W2808709600 @default.
- W3089710958 cites W2900582619 @default.
- W3089710958 cites W2911305319 @default.
- W3089710958 cites W2921639061 @default.
- W3089710958 cites W2962736495 @default.
- W3089710958 cites W2962901215 @default.
- W3089710958 cites W2963780574 @default.
- W3089710958 cites W2964340170 @default.
- W3089710958 cites W3104013016 @default.
- W3089710958 cites W3106238320 @default.
- W3089710958 doi "https://doi.org/10.1109/icra40945.2020.9197200" @default.
- W3089710958 hasPublicationYear "2020" @default.
- W3089710958 type Work @default.
- W3089710958 sameAs 3089710958 @default.
- W3089710958 citedByCount "3" @default.
- W3089710958 countsByYear W30897109582021 @default.
- W3089710958 countsByYear W30897109582022 @default.
- W3089710958 countsByYear W30897109582023 @default.
- W3089710958 crossrefType "proceedings-article" @default.
- W3089710958 hasAuthorship W3089710958A5011048472 @default.
- W3089710958 hasAuthorship W3089710958A5012201453 @default.
- W3089710958 hasAuthorship W3089710958A5042074952 @default.
- W3089710958 hasConcept C111368507 @default.
- W3089710958 hasConcept C11413529 @default.
- W3089710958 hasConcept C119857082 @default.
- W3089710958 hasConcept C126255220 @default.
- W3089710958 hasConcept C12725497 @default.
- W3089710958 hasConcept C127313418 @default.
- W3089710958 hasConcept C127413603 @default.
- W3089710958 hasConcept C137631369 @default.
- W3089710958 hasConcept C138885662 @default.
- W3089710958 hasConcept C154945302 @default.
- W3089710958 hasConcept C173404611 @default.
- W3089710958 hasConcept C185592680 @default.
- W3089710958 hasConcept C198531522 @default.
- W3089710958 hasConcept C207390915 @default.
- W3089710958 hasConcept C2775924081 @default.
- W3089710958 hasConcept C2776036281 @default.
- W3089710958 hasConcept C33923547 @default.
- W3089710958 hasConcept C37404715 @default.
- W3089710958 hasConcept C41008148 @default.
- W3089710958 hasConcept C41895202 @default.
- W3089710958 hasConcept C43617362 @default.
- W3089710958 hasConcept C48044578 @default.
- W3089710958 hasConcept C77088390 @default.
- W3089710958 hasConcept C78519656 @default.
- W3089710958 hasConcept C90509273 @default.
- W3089710958 hasConcept C97541855 @default.
- W3089710958 hasConceptScore W3089710958C111368507 @default.
- W3089710958 hasConceptScore W3089710958C11413529 @default.
- W3089710958 hasConceptScore W3089710958C119857082 @default.
- W3089710958 hasConceptScore W3089710958C126255220 @default.
- W3089710958 hasConceptScore W3089710958C12725497 @default.
- W3089710958 hasConceptScore W3089710958C127313418 @default.
- W3089710958 hasConceptScore W3089710958C127413603 @default.
- W3089710958 hasConceptScore W3089710958C137631369 @default.
- W3089710958 hasConceptScore W3089710958C138885662 @default.
- W3089710958 hasConceptScore W3089710958C154945302 @default.
- W3089710958 hasConceptScore W3089710958C173404611 @default.
- W3089710958 hasConceptScore W3089710958C185592680 @default.
- W3089710958 hasConceptScore W3089710958C198531522 @default.
- W3089710958 hasConceptScore W3089710958C207390915 @default.
- W3089710958 hasConceptScore W3089710958C2775924081 @default.
- W3089710958 hasConceptScore W3089710958C2776036281 @default.
- W3089710958 hasConceptScore W3089710958C33923547 @default.
- W3089710958 hasConceptScore W3089710958C37404715 @default.
- W3089710958 hasConceptScore W3089710958C41008148 @default.
- W3089710958 hasConceptScore W3089710958C41895202 @default.
- W3089710958 hasConceptScore W3089710958C43617362 @default.
- W3089710958 hasConceptScore W3089710958C48044578 @default.
- W3089710958 hasConceptScore W3089710958C77088390 @default.
- W3089710958 hasConceptScore W3089710958C78519656 @default.
- W3089710958 hasConceptScore W3089710958C90509273 @default.
- W3089710958 hasConceptScore W3089710958C97541855 @default.
- W3089710958 hasLocation W30897109581 @default.
- W3089710958 hasOpenAccess W3089710958 @default.
- W3089710958 hasPrimaryLocation W30897109581 @default.
- W3089710958 hasRelatedWork W102453 @default.
- W3089710958 hasRelatedWork W10515480 @default.
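The abstract's key mechanism, two policies that each use the other as the baseline in a Kullback-Leibler penalty, can be illustrated with a small tabular sketch. This is not the DAAP algorithm from the paper. It assumes a discrete setting with fixed, hypothetical cost and risk Q-tables, and it substitutes the standard closed-form KL-regularized greedy step, pi(a|s) proportional to pi_base(a|s) * exp(-eta * Q(s,a)), for the actual Dynamic Policy Programming recursion. The names `kl_regularized_update`, `actor_advisor_step` and `eta` are hypothetical.

```python
import numpy as np

def kl_regularized_update(baseline, q, eta):
    """Solve argmin_pi E_pi[q] + (1/eta) * KL(pi || baseline) for each state.

    Closed form: pi(a|s) is proportional to baseline(a|s) * exp(-eta * q(s, a)).
    baseline and q have shape (n_states, n_actions); baseline must be strictly positive.
    """
    logits = np.log(baseline) - eta * q
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

def actor_advisor_step(actor, advisor, q_cost, q_risk, eta):
    """One hypothetical interleaved step.

    The actor minimizes cost while being pulled toward the advisor, and the
    advisor minimizes risk while being pulled toward the actor.
    """
    new_actor = kl_regularized_update(advisor, q_cost, eta)
    new_advisor = kl_regularized_update(actor, q_risk, eta)
    return new_actor, new_advisor
```

With a single state and three actions where action 1 is cheapest, action 0 is safest and action 2 is both costly and risky, repeated steps drive both policies away from action 2 and toward a compromise between actions 0 and 1, which is the "toward each other's direction" behavior the abstract describes.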