Matches in SemOpenAlex for { <https://semopenalex.org/work/W3120288532> ?p ?o ?g. }
- W3120288532 abstract "Meta-reinforcement learning (RL) addresses the problem of sample inefficiency in deep RL by using experience obtained in past tasks for a new task to be solved. However, most meta-RL methods require partially or fully on-policy data, i.e., they cannot reuse the data collected by past policies, which hinders the improvement of sample efficiency. To alleviate this problem, we propose a novel off-policy meta-RL method, embedding learning and evaluation of uncertainty (ELUE). An ELUE agent is characterized by the learning of a feature embedding space shared among tasks. It learns a belief model over the embedding space and a belief-conditional policy and Q-function. Then, for a new task, it collects data by the pretrained policy, and updates its belief based on the belief model. Thanks to the belief update, the performance can be improved with a small amount of data. In addition, it updates the parameters of the neural networks to adjust the pretrained relationships when there are enough data. We demonstrate that ELUE outperforms state-of-the-art meta-RL methods through experiments on meta-RL benchmarks." @default.
- W3120288532 created "2021-01-18" @default.
- W3120288532 creator A5002489351 @default.
- W3120288532 creator A5064113904 @default.
- W3120288532 creator A5064754457 @default.
- W3120288532 date "2021-01-06" @default.
- W3120288532 modified "2023-10-02" @default.
- W3120288532 title "Off-Policy Meta-Reinforcement Learning Based on Feature Embedding Spaces" @default.
- W3120288532 cites W1686946872 @default.
- W3120288532 cites W2060277733 @default.
- W3120288532 cites W2097707447 @default.
- W3120288532 cites W2151710526 @default.
- W3120288532 cites W2168359464 @default.
- W3120288532 cites W2188365844 @default.
- W3120288532 cites W2578206533 @default.
- W3120288532 cites W2604763608 @default.
- W3120288532 cites W2753798143 @default.
- W3120288532 cites W2766447205 @default.
- W3120288532 cites W2794363191 @default.
- W3120288532 cites W2795900505 @default.
- W3120288532 cites W2797527950 @default.
- W3120288532 cites W2908064123 @default.
- W3120288532 cites W2913854057 @default.
- W3120288532 cites W2916826721 @default.
- W3120288532 cites W2921037632 @default.
- W3120288532 cites W2923504512 @default.
- W3120288532 cites W2945020056 @default.
- W3120288532 cites W2962854145 @default.
- W3120288532 cites W2962902376 @default.
- W3120288532 cites W2963680188 @default.
- W3120288532 cites W2964009285 @default.
- W3120288532 cites W2964160479 @default.
- W3120288532 cites W2964168257 @default.
- W3120288532 cites W2970105755 @default.
- W3120288532 cites W2970214542 @default.
- W3120288532 cites W2971014752 @default.
- W3120288532 cites W2981344907 @default.
- W3120288532 cites W2996037775 @default.
- W3120288532 cites W2996148148 @default.
- W3120288532 cites W3032377877 @default.
- W3120288532 cites W3035389468 @default.
- W3120288532 hasPublicationYear "2021" @default.
- W3120288532 type Work @default.
- W3120288532 sameAs 3120288532 @default.
- W3120288532 citedByCount "1" @default.
- W3120288532 countsByYear W31202885322022 @default.
- W3120288532 crossrefType "posted-content" @default.
- W3120288532 hasAuthorship W3120288532A5002489351 @default.
- W3120288532 hasAuthorship W3120288532A5064113904 @default.
- W3120288532 hasAuthorship W3120288532A5064754457 @default.
- W3120288532 hasConcept C105795698 @default.
- W3120288532 hasConcept C111919701 @default.
- W3120288532 hasConcept C119857082 @default.
- W3120288532 hasConcept C127413603 @default.
- W3120288532 hasConcept C138885662 @default.
- W3120288532 hasConcept C14036430 @default.
- W3120288532 hasConcept C154945302 @default.
- W3120288532 hasConcept C162324750 @default.
- W3120288532 hasConcept C175444787 @default.
- W3120288532 hasConcept C185592680 @default.
- W3120288532 hasConcept C198531522 @default.
- W3120288532 hasConcept C201995342 @default.
- W3120288532 hasConcept C206588197 @default.
- W3120288532 hasConcept C2776401178 @default.
- W3120288532 hasConcept C2778572836 @default.
- W3120288532 hasConcept C2778869765 @default.
- W3120288532 hasConcept C2779436431 @default.
- W3120288532 hasConcept C2780451532 @default.
- W3120288532 hasConcept C2781002164 @default.
- W3120288532 hasConcept C33923547 @default.
- W3120288532 hasConcept C41008148 @default.
- W3120288532 hasConcept C41608201 @default.
- W3120288532 hasConcept C41895202 @default.
- W3120288532 hasConcept C43617362 @default.
- W3120288532 hasConcept C548081761 @default.
- W3120288532 hasConcept C72434380 @default.
- W3120288532 hasConcept C78458016 @default.
- W3120288532 hasConcept C86803240 @default.
- W3120288532 hasConcept C97541855 @default.
- W3120288532 hasConceptScore W3120288532C105795698 @default.
- W3120288532 hasConceptScore W3120288532C111919701 @default.
- W3120288532 hasConceptScore W3120288532C119857082 @default.
- W3120288532 hasConceptScore W3120288532C127413603 @default.
- W3120288532 hasConceptScore W3120288532C138885662 @default.
- W3120288532 hasConceptScore W3120288532C14036430 @default.
- W3120288532 hasConceptScore W3120288532C154945302 @default.
- W3120288532 hasConceptScore W3120288532C162324750 @default.
- W3120288532 hasConceptScore W3120288532C175444787 @default.
- W3120288532 hasConceptScore W3120288532C185592680 @default.
- W3120288532 hasConceptScore W3120288532C198531522 @default.
- W3120288532 hasConceptScore W3120288532C201995342 @default.
- W3120288532 hasConceptScore W3120288532C206588197 @default.
- W3120288532 hasConceptScore W3120288532C2776401178 @default.
- W3120288532 hasConceptScore W3120288532C2778572836 @default.
- W3120288532 hasConceptScore W3120288532C2778869765 @default.
- W3120288532 hasConceptScore W3120288532C2779436431 @default.
- W3120288532 hasConceptScore W3120288532C2780451532 @default.
- W3120288532 hasConceptScore W3120288532C2781002164 @default.
- W3120288532 hasConceptScore W3120288532C33923547 @default.
- W3120288532 hasConceptScore W3120288532C41008148 @default.