Matches in SemOpenAlex for { <https://semopenalex.org/work/W3109396871> ?p ?o ?g. }
- W3109396871 abstract "We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem. Model-free reinforcement learning algorithms can achieve good asymptotic performance in multi-task learning at the cost of extensive sampling, due to their approach, which requires learning from scratch. While model-based approaches are among the most data-efficient learning algorithms, they still struggle with complex tasks and model uncertainties. Meta-reinforcement learning addresses the efficiency and generalization challenges of multi-task learning by quickly leveraging the meta-prior policy for a new task. In this paper, we propose a meta-reinforcement learning approach to learn the dynamic model of a non-stationary environment to be used for meta-policy optimization later. Due to the sample efficiency of model-based learning methods, we are able to simultaneously train both the meta-model of the non-stationary environment and the meta-policy until dynamic model convergence. Then, the meta-learned dynamic model of the environment will generate simulated data for meta-policy optimization. Our experiment demonstrates that our proposed method can meta-learn the policy in a non-stationary environment with the data efficiency of model-based learning approaches while achieving the high asymptotic performance of model-free meta-reinforcement learning." @default.
- W3109396871 created "2020-12-07" @default.
- W3109396871 creator A5002752114 @default.
- W3109396871 creator A5063341901 @default.
- W3109396871 date "2020-11-21" @default.
- W3109396871 modified "2023-10-08" @default.
- W3109396871 title "Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments" @default.
- W3109396871 cites W1757796397 @default.
- W3109396871 cites W1771410628 @default.
- W3109396871 cites W1868559974 @default.
- W3109396871 cites W1923344279 @default.
- W3109396871 cites W2100235553 @default.
- W3109396871 cites W2119717200 @default.
- W3109396871 cites W2121103318 @default.
- W3109396871 cites W2134491302 @default.
- W3109396871 cites W2136503687 @default.
- W3109396871 cites W2137825550 @default.
- W3109396871 cites W2140135625 @default.
- W3109396871 cites W2188233853 @default.
- W3109396871 cites W2257979135 @default.
- W3109396871 cites W2281096776 @default.
- W3109396871 cites W2290354866 @default.
- W3109396871 cites W2416041116 @default.
- W3109396871 cites W2575705757 @default.
- W3109396871 cites W2578206533 @default.
- W3109396871 cites W2604763608 @default.
- W3109396871 cites W2742093937 @default.
- W3109396871 cites W2753160622 @default.
- W3109396871 cites W2768607686 @default.
- W3109396871 cites W2787501667 @default.
- W3109396871 cites W2789824229 @default.
- W3109396871 cites W2794363191 @default.
- W3109396871 cites W2794757725 @default.
- W3109396871 cites W2795900505 @default.
- W3109396871 cites W2892230114 @default.
- W3109396871 cites W2942608247 @default.
- W3109396871 cites W2962872206 @default.
- W3109396871 cites W2962899390 @default.
- W3109396871 cites W2963025296 @default.
- W3109396871 cites W2963303956 @default.
- W3109396871 cites W2963547174 @default.
- W3109396871 cites W2963581679 @default.
- W3109396871 cites W2963775850 @default.
- W3109396871 cites W2963809569 @default.
- W3109396871 cites W2963846183 @default.
- W3109396871 cites W2964093801 @default.
- W3109396871 cites W2964161785 @default.
- W3109396871 cites W2964173023 @default.
- W3109396871 cites W2996148148 @default.
- W3109396871 cites W3195133498 @default.
- W3109396871 hasPublicationYear "2020" @default.
- W3109396871 type Work @default.
- W3109396871 sameAs 3109396871 @default.
- W3109396871 citedByCount "1" @default.
- W3109396871 countsByYear W31093968712021 @default.
- W3109396871 crossrefType "posted-content" @default.
- W3109396871 hasAuthorship W3109396871A5002752114 @default.
- W3109396871 hasAuthorship W3109396871A5063341901 @default.
- W3109396871 hasConcept C119857082 @default.
- W3109396871 hasConcept C127413603 @default.
- W3109396871 hasConcept C134306372 @default.
- W3109396871 hasConcept C154945302 @default.
- W3109396871 hasConcept C162324750 @default.
- W3109396871 hasConcept C177148314 @default.
- W3109396871 hasConcept C185592680 @default.
- W3109396871 hasConcept C198531522 @default.
- W3109396871 hasConcept C201995342 @default.
- W3109396871 hasConcept C2777303404 @default.
- W3109396871 hasConcept C2779436431 @default.
- W3109396871 hasConcept C2780451532 @default.
- W3109396871 hasConcept C2781002164 @default.
- W3109396871 hasConcept C33923547 @default.
- W3109396871 hasConcept C41008148 @default.
- W3109396871 hasConcept C43617362 @default.
- W3109396871 hasConcept C50522688 @default.
- W3109396871 hasConcept C97541855 @default.
- W3109396871 hasConceptScore W3109396871C119857082 @default.
- W3109396871 hasConceptScore W3109396871C127413603 @default.
- W3109396871 hasConceptScore W3109396871C134306372 @default.
- W3109396871 hasConceptScore W3109396871C154945302 @default.
- W3109396871 hasConceptScore W3109396871C162324750 @default.
- W3109396871 hasConceptScore W3109396871C177148314 @default.
- W3109396871 hasConceptScore W3109396871C185592680 @default.
- W3109396871 hasConceptScore W3109396871C198531522 @default.
- W3109396871 hasConceptScore W3109396871C201995342 @default.
- W3109396871 hasConceptScore W3109396871C2777303404 @default.
- W3109396871 hasConceptScore W3109396871C2779436431 @default.
- W3109396871 hasConceptScore W3109396871C2780451532 @default.
- W3109396871 hasConceptScore W3109396871C2781002164 @default.
- W3109396871 hasConceptScore W3109396871C33923547 @default.
- W3109396871 hasConceptScore W3109396871C41008148 @default.
- W3109396871 hasConceptScore W3109396871C43617362 @default.
- W3109396871 hasConceptScore W3109396871C50522688 @default.
- W3109396871 hasConceptScore W3109396871C97541855 @default.
- W3109396871 hasLocation W31093968711 @default.
- W3109396871 hasOpenAccess W3109396871 @default.
- W3109396871 hasPrimaryLocation W31093968711 @default.
- W3109396871 hasRelatedWork W2923504512 @default.
- W3109396871 hasRelatedWork W2949555518 @default.
- W3109396871 hasRelatedWork W2949945034 @default.