Matches in SemOpenAlex for { <https://semopenalex.org/work/W3038892247> ?p ?o ?g. }
- W3038892247 abstract "Solving sparse reward tasks through exploration is one of the major challenges in deep reinforcement learning, especially in three-dimensional, partially-observable environments. Critically, the algorithm proposed in this article uses a single human demonstration to solve hard-exploration problems. We train an agent on a combination of demonstrations and its own experience to solve problems with variable initial conditions. We adapt this idea and integrate it with proximal policy optimization (PPO). The agent is able to increase its performance and to tackle harder problems by replaying its own past trajectories, prioritizing them based on the obtained reward and the maximum value of the trajectory. We compare different variations of this algorithm to behavioral cloning on a set of hard-exploration tasks in the Animal-AI Olympics environment. To the best of our knowledge, learning a task of comparable difficulty in a three-dimensional environment from only one human demonstration has never been considered before." @default.
- W3038892247 created "2020-07-10" @default.
- W3038892247 creator A5027026045 @default.
- W3038892247 creator A5033313151 @default.
- W3038892247 date "2020-07-07" @default.
- W3038892247 modified "2023-09-27" @default.
- W3038892247 title "Guided Exploration with Proximal Policy Optimization using a Single Demonstration." @default.
- W3038892247 cites W1522301498 @default.
- W3038892247 cites W1993411524 @default.
- W3038892247 cites W2034806191 @default.
- W3038892247 cites W2145339207 @default.
- W3038892247 cites W2157331557 @default.
- W3038892247 cites W2201581102 @default.
- W3038892247 cites W2561776174 @default.
- W3038892247 cites W2601322194 @default.
- W3038892247 cites W2614839826 @default.
- W3038892247 cites W2736601468 @default.
- W3038892247 cites W2766447205 @default.
- W3038892247 cites W2786036274 @default.
- W3038892247 cites W2803616302 @default.
- W3038892247 cites W2899205164 @default.
- W3038892247 cites W2904157920 @default.
- W3038892247 cites W2914261249 @default.
- W3038892247 cites W2914898814 @default.
- W3038892247 cites W2946579496 @default.
- W3038892247 cites W2948199445 @default.
- W3038892247 cites W2962957031 @default.
- W3038892247 cites W2963126744 @default.
- W3038892247 cites W2963160877 @default.
- W3038892247 cites W2963276097 @default.
- W3038892247 cites W2963277051 @default.
- W3038892247 cites W2963328631 @default.
- W3038892247 cites W2963376229 @default.
- W3038892247 cites W2963403143 @default.
- W3038892247 cites W2964157221 @default.
- W3038892247 cites W2971870892 @default.
- W3038892247 cites W2973515080 @default.
- W3038892247 cites W2973525135 @default.
- W3038892247 cites W3006178546 @default.
- W3038892247 cites W3013618273 @default.
- W3038892247 cites W3115293622 @default.
- W3038892247 cites W3118210634 @default.
- W3038892247 hasPublicationYear "2020" @default.
- W3038892247 type Work @default.
- W3038892247 sameAs 3038892247 @default.
- W3038892247 citedByCount "0" @default.
- W3038892247 crossrefType "posted-content" @default.
- W3038892247 hasAuthorship W3038892247A5027026045 @default.
- W3038892247 hasAuthorship W3038892247A5033313151 @default.
- W3038892247 hasConcept C119857082 @default.
- W3038892247 hasConcept C121050878 @default.
- W3038892247 hasConcept C121332964 @default.
- W3038892247 hasConcept C126255220 @default.
- W3038892247 hasConcept C127413603 @default.
- W3038892247 hasConcept C1276947 @default.
- W3038892247 hasConcept C134306372 @default.
- W3038892247 hasConcept C13662910 @default.
- W3038892247 hasConcept C154945302 @default.
- W3038892247 hasConcept C177264268 @default.
- W3038892247 hasConcept C182365436 @default.
- W3038892247 hasConcept C199360897 @default.
- W3038892247 hasConcept C201995342 @default.
- W3038892247 hasConcept C2776291640 @default.
- W3038892247 hasConcept C2780451532 @default.
- W3038892247 hasConcept C33923547 @default.
- W3038892247 hasConcept C41008148 @default.
- W3038892247 hasConcept C97541855 @default.
- W3038892247 hasConceptScore W3038892247C119857082 @default.
- W3038892247 hasConceptScore W3038892247C121050878 @default.
- W3038892247 hasConceptScore W3038892247C121332964 @default.
- W3038892247 hasConceptScore W3038892247C126255220 @default.
- W3038892247 hasConceptScore W3038892247C127413603 @default.
- W3038892247 hasConceptScore W3038892247C1276947 @default.
- W3038892247 hasConceptScore W3038892247C134306372 @default.
- W3038892247 hasConceptScore W3038892247C13662910 @default.
- W3038892247 hasConceptScore W3038892247C154945302 @default.
- W3038892247 hasConceptScore W3038892247C177264268 @default.
- W3038892247 hasConceptScore W3038892247C182365436 @default.
- W3038892247 hasConceptScore W3038892247C199360897 @default.
- W3038892247 hasConceptScore W3038892247C201995342 @default.
- W3038892247 hasConceptScore W3038892247C2776291640 @default.
- W3038892247 hasConceptScore W3038892247C2780451532 @default.
- W3038892247 hasConceptScore W3038892247C33923547 @default.
- W3038892247 hasConceptScore W3038892247C41008148 @default.
- W3038892247 hasConceptScore W3038892247C97541855 @default.
- W3038892247 hasLocation W30388922471 @default.
- W3038892247 hasOpenAccess W3038892247 @default.
- W3038892247 hasPrimaryLocation W30388922471 @default.
- W3038892247 hasRelatedWork W1604959332 @default.
- W3038892247 hasRelatedWork W2075471234 @default.
- W3038892247 hasRelatedWork W2539326829 @default.
- W3038892247 hasRelatedWork W2891234582 @default.
- W3038892247 hasRelatedWork W2891921281 @default.
- W3038892247 hasRelatedWork W2913273311 @default.
- W3038892247 hasRelatedWork W2959488596 @default.
- W3038892247 hasRelatedWork W2963099939 @default.
- W3038892247 hasRelatedWork W2963176272 @default.
- W3038892247 hasRelatedWork W3000757491 @default.
- W3038892247 hasRelatedWork W3019235189 @default.
- W3038892247 hasRelatedWork W3020367954 @default.