Matches in SemOpenAlex for { <https://semopenalex.org/work/W3120229615> ?p ?o ?g. }
Showing items 1 to 80 of 80 with 100 items per page.
- W3120229615 abstract "The option framework, one of the most promising Hierarchical Reinforcement Learning (HRL) frameworks, is developed based on the Semi-Markov Decision Problem (SMDP) and employs a triple formulation of the option (i.e., an action policy, a termination probability, and an initiation set). These design choices, however, mean that the option framework: 1) has low sample efficiency, 2) cannot use more stable Markov Decision Problem (MDP) based learning algorithms, 3) represents abstract actions implicitly, and 4) is expensive to scale up. To overcome these problems, here we propose a simple yet effective MDP implementation of the option framework: the Skill-Action (SA) architecture. Derived from a novel discovery that the SMDP option framework has an MDP equivalence, SA hierarchically extracts skills (abstract actions) from primary actions and explicitly encodes this knowledge into skill context vectors (embedding vectors). Although SA is MDP formulated, skills can still be temporally extended by applying the attention mechanism to skill context vectors. Unlike the option framework, which requires M action policies for M skills, SA's action policy only needs one decoder to decode skill context vectors into primary actions. Under this formulation, SA can be optimized with any MDP based policy gradient algorithm. Moreover, it is sample efficient, cheap to scale up, and theoretically proven to have lower variance. Our empirical studies on challenging infinite horizon robot simulation environments demonstrate that SA not only outperforms all baselines by a large margin, but also exhibits smaller variance, faster convergence, and good interpretability. On transfer learning tasks, SA also outperforms the other models and shows its advantage on reusing knowledge across tasks. A potential impact of SA is to pave the way for a large scale pre-training architecture in the reinforcement learning area." @default.
- W3120229615 created "2021-01-18" @default.
- W3120229615 creator A5001819736 @default.
- W3120229615 creator A5013197657 @default.
- W3120229615 creator A5069334991 @default.
- W3120229615 date "2021-05-04" @default.
- W3120229615 modified "2023-09-27" @default.
- W3120229615 title "The Skill-Action Architecture: Learning Abstract Action Embeddings for Reinforcement Learning" @default.
- W3120229615 hasPublicationYear "2021" @default.
- W3120229615 type Work @default.
- W3120229615 sameAs 3120229615 @default.
- W3120229615 citedByCount "1" @default.
- W3120229615 countsByYear W31202296152021 @default.
- W3120229615 crossrefType "journal-article" @default.
- W3120229615 hasAuthorship W3120229615A5001819736 @default.
- W3120229615 hasAuthorship W3120229615A5013197657 @default.
- W3120229615 hasAuthorship W3120229615A5069334991 @default.
- W3120229615 hasConcept C105795698 @default.
- W3120229615 hasConcept C106189395 @default.
- W3120229615 hasConcept C112972136 @default.
- W3120229615 hasConcept C119857082 @default.
- W3120229615 hasConcept C126255220 @default.
- W3120229615 hasConcept C151730666 @default.
- W3120229615 hasConcept C154945302 @default.
- W3120229615 hasConcept C159886148 @default.
- W3120229615 hasConcept C2779343474 @default.
- W3120229615 hasConcept C2781067378 @default.
- W3120229615 hasConcept C28761237 @default.
- W3120229615 hasConcept C33923547 @default.
- W3120229615 hasConcept C41008148 @default.
- W3120229615 hasConcept C41608201 @default.
- W3120229615 hasConcept C774472 @default.
- W3120229615 hasConcept C86803240 @default.
- W3120229615 hasConcept C97541855 @default.
- W3120229615 hasConcept C98763669 @default.
- W3120229615 hasConceptScore W3120229615C105795698 @default.
- W3120229615 hasConceptScore W3120229615C106189395 @default.
- W3120229615 hasConceptScore W3120229615C112972136 @default.
- W3120229615 hasConceptScore W3120229615C119857082 @default.
- W3120229615 hasConceptScore W3120229615C126255220 @default.
- W3120229615 hasConceptScore W3120229615C151730666 @default.
- W3120229615 hasConceptScore W3120229615C154945302 @default.
- W3120229615 hasConceptScore W3120229615C159886148 @default.
- W3120229615 hasConceptScore W3120229615C2779343474 @default.
- W3120229615 hasConceptScore W3120229615C2781067378 @default.
- W3120229615 hasConceptScore W3120229615C28761237 @default.
- W3120229615 hasConceptScore W3120229615C33923547 @default.
- W3120229615 hasConceptScore W3120229615C41008148 @default.
- W3120229615 hasConceptScore W3120229615C41608201 @default.
- W3120229615 hasConceptScore W3120229615C774472 @default.
- W3120229615 hasConceptScore W3120229615C86803240 @default.
- W3120229615 hasConceptScore W3120229615C97541855 @default.
- W3120229615 hasConceptScore W3120229615C98763669 @default.
- W3120229615 hasLocation W31202296151 @default.
- W3120229615 hasOpenAccess W3120229615 @default.
- W3120229615 hasPrimaryLocation W31202296151 @default.
- W3120229615 hasRelatedWork W143164768 @default.
- W3120229615 hasRelatedWork W1521625446 @default.
- W3120229615 hasRelatedWork W1658094677 @default.
- W3120229615 hasRelatedWork W1958015824 @default.
- W3120229615 hasRelatedWork W1976800061 @default.
- W3120229615 hasRelatedWork W1982948368 @default.
- W3120229615 hasRelatedWork W2032854309 @default.
- W3120229615 hasRelatedWork W2038771780 @default.
- W3120229615 hasRelatedWork W2140332127 @default.
- W3120229615 hasRelatedWork W2154023516 @default.
- W3120229615 hasRelatedWork W2166265228 @default.
- W3120229615 hasRelatedWork W2557609013 @default.
- W3120229615 hasRelatedWork W2572448624 @default.
- W3120229615 hasRelatedWork W2907385442 @default.
- W3120229615 hasRelatedWork W2975199294 @default.
- W3120229615 hasRelatedWork W2995726179 @default.
- W3120229615 hasRelatedWork W3042947755 @default.
- W3120229615 hasRelatedWork W3072315125 @default.
- W3120229615 hasRelatedWork W3139043417 @default.
- W3120229615 hasRelatedWork W36691172 @default.
- W3120229615 isParatext "false" @default.
- W3120229615 isRetracted "false" @default.
- W3120229615 magId "3120229615" @default.
- W3120229615 workType "article" @default.