Matches in SemOpenAlex for { <https://semopenalex.org/work/W4381551344> ?p ?o ?g. }
Showing items 1 to 84 of 84 with 100 items per page.
- W4381551344 abstract "Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improve their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy sample efficiency advantage over the state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps." @default.
- W4381551344 created "2023-06-22" @default.
- W4381551344 creator A5023817654 @default.
- W4381551344 creator A5051862250 @default.
- W4381551344 creator A5061140388 @default.
- W4381551344 creator A5071030696 @default.
- W4381551344 creator A5086678121 @default.
- W4381551344 date "2023-06-17" @default.
- W4381551344 modified "2023-09-25" @default.
- W4381551344 title "Active Policy Improvement from Multiple Black-box Oracles" @default.
- W4381551344 doi "https://doi.org/10.48550/arxiv.2306.10259" @default.
- W4381551344 hasPublicationYear "2023" @default.
- W4381551344 type Work @default.
- W4381551344 citedByCount "0" @default.
- W4381551344 crossrefType "posted-content" @default.
- W4381551344 hasAuthorship W4381551344A5023817654 @default.
- W4381551344 hasAuthorship W4381551344A5051862250 @default.
- W4381551344 hasAuthorship W4381551344A5061140388 @default.
- W4381551344 hasAuthorship W4381551344A5071030696 @default.
- W4381551344 hasAuthorship W4381551344A5086678121 @default.
- W4381551344 hasBestOaLocation W43815513441 @default.
- W4381551344 hasConcept C11413529 @default.
- W4381551344 hasConcept C115903868 @default.
- W4381551344 hasConcept C119857082 @default.
- W4381551344 hasConcept C126388530 @default.
- W4381551344 hasConcept C14036430 @default.
- W4381551344 hasConcept C154945302 @default.
- W4381551344 hasConcept C15744967 @default.
- W4381551344 hasConcept C166957645 @default.
- W4381551344 hasConcept C177264268 @default.
- W4381551344 hasConcept C199360897 @default.
- W4381551344 hasConcept C2775924081 @default.
- W4381551344 hasConcept C2776760102 @default.
- W4381551344 hasConcept C2777212361 @default.
- W4381551344 hasConcept C41008148 @default.
- W4381551344 hasConcept C48103436 @default.
- W4381551344 hasConcept C55166926 @default.
- W4381551344 hasConcept C77805123 @default.
- W4381551344 hasConcept C78458016 @default.
- W4381551344 hasConcept C79581498 @default.
- W4381551344 hasConcept C86803240 @default.
- W4381551344 hasConcept C94966114 @default.
- W4381551344 hasConcept C95457728 @default.
- W4381551344 hasConcept C97541855 @default.
- W4381551344 hasConceptScore W4381551344C11413529 @default.
- W4381551344 hasConceptScore W4381551344C115903868 @default.
- W4381551344 hasConceptScore W4381551344C119857082 @default.
- W4381551344 hasConceptScore W4381551344C126388530 @default.
- W4381551344 hasConceptScore W4381551344C14036430 @default.
- W4381551344 hasConceptScore W4381551344C154945302 @default.
- W4381551344 hasConceptScore W4381551344C15744967 @default.
- W4381551344 hasConceptScore W4381551344C166957645 @default.
- W4381551344 hasConceptScore W4381551344C177264268 @default.
- W4381551344 hasConceptScore W4381551344C199360897 @default.
- W4381551344 hasConceptScore W4381551344C2775924081 @default.
- W4381551344 hasConceptScore W4381551344C2776760102 @default.
- W4381551344 hasConceptScore W4381551344C2777212361 @default.
- W4381551344 hasConceptScore W4381551344C41008148 @default.
- W4381551344 hasConceptScore W4381551344C48103436 @default.
- W4381551344 hasConceptScore W4381551344C55166926 @default.
- W4381551344 hasConceptScore W4381551344C77805123 @default.
- W4381551344 hasConceptScore W4381551344C78458016 @default.
- W4381551344 hasConceptScore W4381551344C79581498 @default.
- W4381551344 hasConceptScore W4381551344C86803240 @default.
- W4381551344 hasConceptScore W4381551344C94966114 @default.
- W4381551344 hasConceptScore W4381551344C95457728 @default.
- W4381551344 hasConceptScore W4381551344C97541855 @default.
- W4381551344 hasLocation W43815513441 @default.
- W4381551344 hasLocation W43815513442 @default.
- W4381551344 hasOpenAccess W4381551344 @default.
- W4381551344 hasPrimaryLocation W43815513441 @default.
- W4381551344 hasRelatedWork W2099308455 @default.
- W4381551344 hasRelatedWork W2795798526 @default.
- W4381551344 hasRelatedWork W2795910581 @default.
- W4381551344 hasRelatedWork W2805805280 @default.
- W4381551344 hasRelatedWork W2944362136 @default.
- W4381551344 hasRelatedWork W2963098081 @default.
- W4381551344 hasRelatedWork W3022038857 @default.
- W4381551344 hasRelatedWork W4297785905 @default.
- W4381551344 hasRelatedWork W4306666666 @default.
- W4381551344 hasRelatedWork W4319083788 @default.
- W4381551344 isParatext "false" @default.
- W4381551344 isRetracted "false" @default.
- W4381551344 workType "article" @default.