Matches in SemOpenAlex for { <https://semopenalex.org/work/W3111570652> ?p ?o ?g. }
- W3111570652 abstract "Despite its promise, reinforcement learning's real-world adoption has been hampered by the need for costly exploration to learn a good policy. Imitation learning (IL) mitigates this shortcoming by using an oracle policy during training as a bootstrap to accelerate the learning process. However, in many practical situations, the learner has access to multiple suboptimal oracles, which may provide conflicting advice in a state. The existing IL literature provides a limited treatment of such scenarios. Whereas in the single-oracle case, the return of the oracle's policy provides an obvious benchmark for the learner to compete against, neither such a benchmark nor principled ways of outperforming it are known for the multi-oracle setting. In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles. Using a reduction of policy optimization to online learning, we introduce a novel IL algorithm MAMBA, which can provably learn a policy competitive with this benchmark. In particular, MAMBA optimizes policies by using a gradient estimator in the style of generalized advantage estimation (GAE). Our theoretical analysis shows that this design makes MAMBA robust and enables it to outperform the oracle policies by a larger margin than the IL state of the art, even in the single-oracle case. In an evaluation against standard policy gradient with GAE and AggreVaTe(D), we showcase MAMBA's ability to leverage demonstrations both from a single and from multiple weak oracles, and significantly speed up policy optimization." @default.
- W3111570652 created "2020-12-21" @default.
- W3111570652 creator A5036435487 @default.
- W3111570652 creator A5062476223 @default.
- W3111570652 creator A5067758026 @default.
- W3111570652 date "2020-07-01" @default.
- W3111570652 modified "2023-09-26" @default.
- W3111570652 title "Policy Improvement via Imitation of Multiple Oracles" @default.
- W3111570652 cites W112666333 @default.
- W3111570652 cites W1575592356 @default.
- W3111570652 cites W1777239053 @default.
- W3111570652 cites W1850531616 @default.
- W3111570652 cites W1970789124 @default.
- W3111570652 cites W1977655452 @default.
- W3111570652 cites W1999874108 @default.
- W3111570652 cites W2020677283 @default.
- W3111570652 cites W2098774185 @default.
- W3111570652 cites W2100677568 @default.
- W3111570652 cites W2116064496 @default.
- W3111570652 cites W2119567691 @default.
- W3111570652 cites W2148825261 @default.
- W3111570652 cites W2150884987 @default.
- W3111570652 cites W2158782408 @default.
- W3111570652 cites W2167224731 @default.
- W3111570652 cites W2594640072 @default.
- W3111570652 cites W2787065812 @default.
- W3111570652 cites W2793955514 @default.
- W3111570652 cites W2794908222 @default.
- W3111570652 cites W2804930149 @default.
- W3111570652 cites W2890326782 @default.
- W3111570652 cites W2891236810 @default.
- W3111570652 cites W2898035736 @default.
- W3111570652 cites W2962694783 @default.
- W3111570652 cites W2962957031 @default.
- W3111570652 cites W2963098081 @default.
- W3111570652 cites W2963277051 @default.
- W3111570652 cites W2963328631 @default.
- W3111570652 cites W2963349913 @default.
- W3111570652 cites W2963590100 @default.
- W3111570652 cites W2963821151 @default.
- W3111570652 cites W2963888186 @default.
- W3111570652 cites W2964121744 @default.
- W3111570652 cites W2964134150 @default.
- W3111570652 cites W2982530739 @default.
- W3111570652 cites W3031124621 @default.
- W3111570652 cites W3072315125 @default.
- W3111570652 doi "https://doi.org/10.48550/arxiv.2007.00795" @default.
- W3111570652 hasPublicationYear "2020" @default.
- W3111570652 type Work @default.
- W3111570652 sameAs 3111570652 @default.
- W3111570652 citedByCount "0" @default.
- W3111570652 crossrefType "posted-content" @default.
- W3111570652 hasAuthorship W3111570652A5036435487 @default.
- W3111570652 hasAuthorship W3111570652A5062476223 @default.
- W3111570652 hasAuthorship W3111570652A5067758026 @default.
- W3111570652 hasBestOaLocation W31115706521 @default.
- W3111570652 hasConcept C105795698 @default.
- W3111570652 hasConcept C111919701 @default.
- W3111570652 hasConcept C115903868 @default.
- W3111570652 hasConcept C119857082 @default.
- W3111570652 hasConcept C126388530 @default.
- W3111570652 hasConcept C12725497 @default.
- W3111570652 hasConcept C13280743 @default.
- W3111570652 hasConcept C153083717 @default.
- W3111570652 hasConcept C154945302 @default.
- W3111570652 hasConcept C15744967 @default.
- W3111570652 hasConcept C17744445 @default.
- W3111570652 hasConcept C185429906 @default.
- W3111570652 hasConcept C185798385 @default.
- W3111570652 hasConcept C199539241 @default.
- W3111570652 hasConcept C205649164 @default.
- W3111570652 hasConcept C33923547 @default.
- W3111570652 hasConcept C41008148 @default.
- W3111570652 hasConcept C55166926 @default.
- W3111570652 hasConcept C774472 @default.
- W3111570652 hasConcept C77805123 @default.
- W3111570652 hasConcept C97541855 @default.
- W3111570652 hasConcept C98045186 @default.
- W3111570652 hasConceptScore W3111570652C105795698 @default.
- W3111570652 hasConceptScore W3111570652C111919701 @default.
- W3111570652 hasConceptScore W3111570652C115903868 @default.
- W3111570652 hasConceptScore W3111570652C119857082 @default.
- W3111570652 hasConceptScore W3111570652C126388530 @default.
- W3111570652 hasConceptScore W3111570652C12725497 @default.
- W3111570652 hasConceptScore W3111570652C13280743 @default.
- W3111570652 hasConceptScore W3111570652C153083717 @default.
- W3111570652 hasConceptScore W3111570652C154945302 @default.
- W3111570652 hasConceptScore W3111570652C15744967 @default.
- W3111570652 hasConceptScore W3111570652C17744445 @default.
- W3111570652 hasConceptScore W3111570652C185429906 @default.
- W3111570652 hasConceptScore W3111570652C185798385 @default.
- W3111570652 hasConceptScore W3111570652C199539241 @default.
- W3111570652 hasConceptScore W3111570652C205649164 @default.
- W3111570652 hasConceptScore W3111570652C33923547 @default.
- W3111570652 hasConceptScore W3111570652C41008148 @default.
- W3111570652 hasConceptScore W3111570652C55166926 @default.
- W3111570652 hasConceptScore W3111570652C774472 @default.
- W3111570652 hasConceptScore W3111570652C77805123 @default.
- W3111570652 hasConceptScore W3111570652C97541855 @default.
- W3111570652 hasConceptScore W3111570652C98045186 @default.
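
The abstract above proposes the state-wise maximum of the oracle policies' values as a baseline for resolving conflicting advice. A minimal sketch of that quantity in Python follows, assuming a small tabular setting in which each oracle's value estimate per state is already available. The function name `max_baseline` and the example numbers are hypothetical and not taken from the paper, and this is not the MAMBA algorithm itself.

```python
def max_baseline(oracle_values):
    """Compute the state-wise maximum over oracle value estimates.

    oracle_values: list of rows, one per oracle; row k holds an (assumed
    known) value estimate of oracle k for each state, in a fixed state order.
    Returns (baseline, best_oracle), where baseline[s] is the largest value
    any oracle attains in state s and best_oracle[s] is the index of an
    oracle attaining it (lowest index on ties).
    """
    if not oracle_values:
        raise ValueError("need at least one oracle")
    num_states = len(oracle_values[0])
    if any(len(row) != num_states for row in oracle_values):
        raise ValueError("all oracles must cover the same states")

    baseline, best_oracle = [], []
    for s in range(num_states):
        column = [row[s] for row in oracle_values]  # all oracles' values in state s
        top = max(column)
        baseline.append(top)
        best_oracle.append(column.index(top))
    return baseline, best_oracle


# Hypothetical value estimates for two oracles over three states.
example_values = [
    [1.0, 0.2, 0.5],  # oracle 0
    [0.4, 0.9, 0.5],  # oracle 1
]
baseline, best_oracle = max_baseline(example_values)
```

In this made-up example neither oracle is best everywhere: oracle 0 is stronger in the first state and oracle 1 in the second, so the baseline takes a different oracle's value in each. That is the sense in which a state-wise maximum can exceed the value of any single oracle.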