Matches in SemOpenAlex for { <https://semopenalex.org/work/W2992767559> ?p ?o ?g. }
- W2992767559 abstract "In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast reinforcement learning (RL) strategies that transfer to similar tasks. However, current meta-RL approaches rely on manually-defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can useful pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e. an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution. We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner's data distribution, and describe a practical instantiation which alternates between integration of recent experience into the task distribution and meta-learning of the updated tasks. Repeating this procedure leads to iterative reorganization such that the curriculum adapts as the meta-learner's data distribution shifts. In particular, we show how discriminative clustering for visual representation can support trajectory-level task acquisition and exploration in domains with pixel observations, avoiding pitfalls of alternatives. In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient supervised meta-learning of test task distributions." @default.
- W2992767559 created "2019-12-13" @default.
- W2992767559 creator A5005431772 @default.
- W2992767559 creator A5017906439 @default.
- W2992767559 creator A5026322200 @default.
- W2992767559 creator A5055442462 @default.
- W2992767559 creator A5081649142 @default.
- W2992767559 creator A5084558494 @default.
- W2992767559 date "2019-12-09" @default.
- W2992767559 modified "2023-09-25" @default.
- W2992767559 title "Unsupervised Curricula for Visual Meta-Reinforcement Learning" @default.
- W2992767559 cites W115285041 @default.
- W2992767559 cites W1585013689 @default.
- W2992767559 cites W1680245855 @default.
- W2992767559 cites W1786044565 @default.
- W2992767559 cites W1799227866 @default.
- W2992767559 cites W2108384452 @default.
- W2992767559 cites W2139612737 @default.
- W2992767559 cites W2144433748 @default.
- W2992767559 cites W2151083897 @default.
- W2992767559 cites W2151834591 @default.
- W2992767559 cites W2158782408 @default.
- W2992767559 cites W2160589914 @default.
- W2992767559 cites W2194775991 @default.
- W2992767559 cites W2528489519 @default.
- W2992767559 cites W2556477470 @default.
- W2992767559 cites W2578206533 @default.
- W2992767559 cites W2604373826 @default.
- W2992767559 cites W2604763608 @default.
- W2992767559 cites W2726717203 @default.
- W2992767559 cites W2736601468 @default.
- W2992767559 cites W2744921630 @default.
- W2992767559 cites W2751973545 @default.
- W2992767559 cites W2785397462 @default.
- W2992767559 cites W2787501667 @default.
- W2992767559 cites W2788904251 @default.
- W2992767559 cites W2808682055 @default.
- W2992767559 cites W2842511635 @default.
- W2992767559 cites W2883433335 @default.
- W2992767559 cites W2883725317 @default.
- W2992767559 cites W2892490014 @default.
- W2992767559 cites W2902697547 @default.
- W2992767559 cites W2903327785 @default.
- W2992767559 cites W2908470496 @default.
- W2992767559 cites W2912889105 @default.
- W2992767559 cites W2913340405 @default.
- W2992767559 cites W2915604253 @default.
- W2992767559 cites W2923504512 @default.
- W2992767559 cites W2938321354 @default.
- W2992767559 cites W2950736586 @default.
- W2992767559 cites W2951004968 @default.
- W2992767559 cites W2962723954 @default.
- W2992767559 cites W2962730405 @default.
- W2992767559 cites W2963025296 @default.
- W2992767559 cites W2963276097 @default.
- W2992767559 cites W2963293881 @default.
- W2992767559 cites W2963311874 @default.
- W2992767559 cites W2963406904 @default.
- W2992767559 cites W2963430173 @default.
- W2992767559 cites W2963438456 @default.
- W2992767559 cites W2963495051 @default.
- W2992767559 cites W2963577640 @default.
- W2992767559 cites W2963581679 @default.
- W2992767559 cites W2963646405 @default.
- W2992767559 cites W2963820385 @default.
- W2992767559 cites W2963871073 @default.
- W2992767559 cites W2964032613 @default.
- W2992767559 cites W2964067469 @default.
- W2992767559 cites W2964084698 @default.
- W2992767559 cites W2964227899 @default.
- W2992767559 cites W2964327384 @default.
- W2992767559 cites W2964342357 @default.
- W2992767559 cites W2972758308 @default.
- W2992767559 cites W2986775770 @default.
- W2992767559 cites W567721252 @default.
- W2992767559 cites W99485931 @default.
- W2992767559 hasPublicationYear "2019" @default.
- W2992767559 type Work @default.
- W2992767559 sameAs 2992767559 @default.
- W2992767559 citedByCount "14" @default.
- W2992767559 countsByYear W29927675592020 @default.
- W2992767559 countsByYear W29927675592021 @default.
- W2992767559 crossrefType "posted-content" @default.
- W2992767559 hasAuthorship W2992767559A5005431772 @default.
- W2992767559 hasAuthorship W2992767559A5017906439 @default.
- W2992767559 hasAuthorship W2992767559A5026322200 @default.
- W2992767559 hasAuthorship W2992767559A5055442462 @default.
- W2992767559 hasAuthorship W2992767559A5081649142 @default.
- W2992767559 hasAuthorship W2992767559A5084558494 @default.
- W2992767559 hasConcept C119857082 @default.
- W2992767559 hasConcept C154945302 @default.
- W2992767559 hasConcept C162324750 @default.
- W2992767559 hasConcept C187736073 @default.
- W2992767559 hasConcept C2780451532 @default.
- W2992767559 hasConcept C2781002164 @default.
- W2992767559 hasConcept C28006648 @default.
- W2992767559 hasConcept C41008148 @default.
- W2992767559 hasConcept C73555534 @default.
- W2992767559 hasConcept C8038995 @default.
- W2992767559 hasConcept C97541855 @default.
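
The rows above all follow one shape: a leading dash, a subject, a predicate, an object (either a bare identifier or a double-quoted literal), and a graph marker such as `@default`. As a minimal sketch, assuming that shape holds for every row, the Python below splits one row into its four parts. The names `Quad` and `parse_line` are hypothetical helpers introduced for illustration only; they are not part of SemOpenAlex or any library.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One row of the listing (hypothetical container, not a SemOpenAlex type)."""
    subject: str
    predicate: str
    obj: str
    graph: str
    is_literal: bool


# Assumed row shape: "- <subject> <predicate> <object> @<graph>."
# The object may contain spaces when it is a quoted literal (e.g. the abstract).
_ROW = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.+)\s+@(\S+?)\.\s*$')


def parse_line(line: str) -> Optional[Quad]:
    """Parse one listing row; return None for lines that are not rows."""
    match = _ROW.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Treat an object wrapped in double quotes as a literal and unwrap it.
    is_literal = len(obj) >= 2 and obj.startswith('"') and obj.endswith('"')
    if is_literal:
        obj = obj[1:-1]
    return Quad(subject, predicate, obj, graph, is_literal)


# Usage on a few rows copied from the listing above.
sample = [
    'Matches in SemOpenAlex for { <https://semopenalex.org/work/W2992767559> ?p ?o ?g. }',
    '- W2992767559 created "2019-12-13" @default.',
    '- W2992767559 cites W115285041 @default.',
    '- W2992767559 cites W1585013689 @default.',
    '- W2992767559 citedByCount "14" @default.',
]
quads = [q for q in (parse_line(row) for row in sample) if q is not None]
cited = [q.obj for q in quads if q.predicate == "cites"]
```

Applied to the full listing, filtering on `predicate == "cites"` would collect the cited work identifiers, and `is_literal` separates quoted values such as dates and counts from identifiers.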