Matches in SemOpenAlex for { <https://semopenalex.org/work/W2951725892> ?p ?o ?g. }
- W2951725892 abstract "Solving complex, temporally-extended tasks is a long-standing problem in reinforcement learning (RL). We hypothesize that one critical element of solving such problems is the notion of compositionality. With the ability to learn concepts and sub-skills that can be composed to solve longer tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors. However, acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. In this paper, we propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems. Our approach learns an instruction-following low-level policy and a high-level policy that can reuse abstractions across tasks, in essence, permitting agents to reason using structured language. To study compositional task learning, we introduce an open-source object interaction environment built using the MuJoCo physics engine and the CLEVR engine. We find that, using our approach, agents can learn to solve diverse, temporally-extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations. Our analysis reveals that the compositional nature of language is critical for learning diverse sub-skills and systematically generalizing to new sub-skills in comparison to non-compositional abstractions that use the same supervision." @default.
- W2951725892 created "2019-06-27" @default.
- W2951725892 creator A5002713363 @default.
- W2951725892 creator A5005431772 @default.
- W2951725892 creator A5011289358 @default.
- W2951725892 creator A5061613634 @default.
- W2951725892 date "2019-06-18" @default.
- W2951725892 modified "2023-09-25" @default.
- W2951725892 title "Language as an Abstraction for Hierarchical Deep Reinforcement Learning" @default.
- W2951725892 cites W1191599655 @default.
- W2951725892 cites W1489525520 @default.
- W2951725892 cites W1492014007 @default.
- W2951725892 cites W1507087299 @default.
- W2951725892 cites W1522301498 @default.
- W2951725892 cites W1556824961 @default.
- W2951725892 cites W1585861384 @default.
- W2951725892 cites W2077502546 @default.
- W2951725892 cites W2109910161 @default.
- W2951725892 cites W2111967991 @default.
- W2951725892 cites W2121517924 @default.
- W2951725892 cites W2139612737 @default.
- W2951725892 cites W2145339207 @default.
- W2951725892 cites W2158548602 @default.
- W2951725892 cites W2158782408 @default.
- W2951725892 cites W2160371091 @default.
- W2951725892 cites W2160808139 @default.
- W2951725892 cites W2264742718 @default.
- W2951725892 cites W2465628802 @default.
- W2951725892 cites W2534060593 @default.
- W2951725892 cites W2553882142 @default.
- W2951725892 cites W2561715562 @default.
- W2951725892 cites W2567070169 @default.
- W2951725892 cites W2606433045 @default.
- W2951725892 cites W2609374097 @default.
- W2951725892 cites W2615790994 @default.
- W2951725892 cites W2739330054 @default.
- W2951725892 cites W2754203286 @default.
- W2951725892 cites W2765602917 @default.
- W2951725892 cites W2766447205 @default.
- W2951725892 cites W2767020241 @default.
- W2951725892 cites W2780057514 @default.
- W2951725892 cites W2781726626 @default.
- W2951725892 cites W2785678896 @default.
- W2951725892 cites W2793351326 @default.
- W2951725892 cites W2804672169 @default.
- W2951725892 cites W2805516822 @default.
- W2951725892 cites W2806532810 @default.
- W2951725892 cites W2895560838 @default.
- W2951725892 cites W2897513296 @default.
- W2951725892 cites W2914112028 @default.
- W2951725892 cites W2922007426 @default.
- W2951725892 cites W2948380112 @default.
- W2951725892 cites W2949267040 @default.
- W2951725892 cites W2949801941 @default.
- W2951725892 cites W2949888546 @default.
- W2951725892 cites W2950151997 @default.
- W2951725892 cites W2951004968 @default.
- W2951725892 cites W2952523895 @default.
- W2951725892 cites W2953174743 @default.
- W2951725892 cites W2962823158 @default.
- W2951725892 cites W2962832483 @default.
- W2951725892 cites W2963262099 @default.
- W2951725892 cites W2963799213 @default.
- W2951725892 cites W2963800628 @default.
- W2951725892 cites W2963921132 @default.
- W2951725892 cites W2964001908 @default.
- W2951725892 cites W2964036701 @default.
- W2951725892 cites W2964118262 @default.
- W2951725892 cites W2964161785 @default.
- W2951725892 cites W2964164804 @default.
- W2951725892 cites W2964227312 @default.
- W2951725892 cites W2964342357 @default.
- W2951725892 cites W2987688408 @default.
- W2951725892 cites W567721252 @default.
- W2951725892 hasPublicationYear "2019" @default.
- W2951725892 type Work @default.
- W2951725892 sameAs 2951725892 @default.
- W2951725892 citedByCount "13" @default.
- W2951725892 countsByYear W29517258922019 @default.
- W2951725892 countsByYear W29517258922020 @default.
- W2951725892 countsByYear W29517258922021 @default.
- W2951725892 countsByYear W29517258922022 @default.
- W2951725892 crossrefType "posted-content" @default.
- W2951725892 hasAuthorship W2951725892A5002713363 @default.
- W2951725892 hasAuthorship W2951725892A5005431772 @default.
- W2951725892 hasAuthorship W2951725892A5011289358 @default.
- W2951725892 hasAuthorship W2951725892A5061613634 @default.
- W2951725892 hasConcept C105795698 @default.
- W2951725892 hasConcept C107457646 @default.
- W2951725892 hasConcept C111472728 @default.
- W2951725892 hasConcept C121375916 @default.
- W2951725892 hasConcept C124304363 @default.
- W2951725892 hasConcept C134306372 @default.
- W2951725892 hasConcept C136197465 @default.
- W2951725892 hasConcept C138885662 @default.
- W2951725892 hasConcept C154945302 @default.
- W2951725892 hasConcept C162324750 @default.
- W2951725892 hasConcept C177148314 @default.
- W2951725892 hasConcept C187736073 @default.
- W2951725892 hasConcept C18903297 @default.