Matches in SemOpenAlex for { <https://semopenalex.org/work/W4286892057> ?p ?o ?g. }
Showing items 1 to 70 of 70 with 100 items per page.
- W4286892057 abstract "Learning meaningful behaviors in the absence of reward is a difficult problem in reinforcement learning. A desirable and challenging unsupervised objective is to learn a set of diverse skills that provide a thorough coverage of the state space while being directed, i.e., reliably reaching distinct regions of the environment. In this paper, we build on the mutual information framework for skill discovery and introduce UPSIDE, which addresses the coverage-directedness trade-off in the following ways: 1) We design policies with a decoupled structure of a directed skill, trained to reach a specific region, followed by a diffusing part that induces a local coverage. 2) We optimize policies by maximizing their number under the constraint that each of them reaches distinct regions of the environment (i.e., they are sufficiently discriminable) and prove that this serves as a lower bound to the original mutual information objective. 3) Finally, we compose the learned directed skills into a growing tree that adaptively covers the environment. We illustrate in several navigation and control environments how the skills learned by UPSIDE solve sparse-reward downstream tasks better than existing baselines." @default.
- W4286892057 created "2022-07-25" @default.
- W4286892057 creator A5002110131 @default.
- W4286892057 creator A5014791481 @default.
- W4286892057 creator A5027101473 @default.
- W4286892057 creator A5031635996 @default.
- W4286892057 creator A5071798388 @default.
- W4286892057 date "2021-10-27" @default.
- W4286892057 modified "2023-09-29" @default.
- W4286892057 title "Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching" @default.
- W4286892057 doi "https://doi.org/10.48550/arxiv.2110.14457" @default.
- W4286892057 hasPublicationYear "2021" @default.
- W4286892057 type Work @default.
- W4286892057 citedByCount "0" @default.
- W4286892057 crossrefType "posted-content" @default.
- W4286892057 hasAuthorship W4286892057A5002110131 @default.
- W4286892057 hasAuthorship W4286892057A5014791481 @default.
- W4286892057 hasAuthorship W4286892057A5027101473 @default.
- W4286892057 hasAuthorship W4286892057A5031635996 @default.
- W4286892057 hasAuthorship W4286892057A5071798388 @default.
- W4286892057 hasBestOaLocation W42868920571 @default.
- W4286892057 hasConcept C105795698 @default.
- W4286892057 hasConcept C113174947 @default.
- W4286892057 hasConcept C11413529 @default.
- W4286892057 hasConcept C119857082 @default.
- W4286892057 hasConcept C134306372 @default.
- W4286892057 hasConcept C154945302 @default.
- W4286892057 hasConcept C177264268 @default.
- W4286892057 hasConcept C199360897 @default.
- W4286892057 hasConcept C2524010 @default.
- W4286892057 hasConcept C2775924081 @default.
- W4286892057 hasConcept C2776036281 @default.
- W4286892057 hasConcept C33923547 @default.
- W4286892057 hasConcept C41008148 @default.
- W4286892057 hasConcept C48103436 @default.
- W4286892057 hasConcept C72434380 @default.
- W4286892057 hasConcept C97541855 @default.
- W4286892057 hasConceptScore W4286892057C105795698 @default.
- W4286892057 hasConceptScore W4286892057C113174947 @default.
- W4286892057 hasConceptScore W4286892057C11413529 @default.
- W4286892057 hasConceptScore W4286892057C119857082 @default.
- W4286892057 hasConceptScore W4286892057C134306372 @default.
- W4286892057 hasConceptScore W4286892057C154945302 @default.
- W4286892057 hasConceptScore W4286892057C177264268 @default.
- W4286892057 hasConceptScore W4286892057C199360897 @default.
- W4286892057 hasConceptScore W4286892057C2524010 @default.
- W4286892057 hasConceptScore W4286892057C2775924081 @default.
- W4286892057 hasConceptScore W4286892057C2776036281 @default.
- W4286892057 hasConceptScore W4286892057C33923547 @default.
- W4286892057 hasConceptScore W4286892057C41008148 @default.
- W4286892057 hasConceptScore W4286892057C48103436 @default.
- W4286892057 hasConceptScore W4286892057C72434380 @default.
- W4286892057 hasConceptScore W4286892057C97541855 @default.
- W4286892057 hasLocation W42868920571 @default.
- W4286892057 hasLocation W42868920572 @default.
- W4286892057 hasOpenAccess W4286892057 @default.
- W4286892057 hasPrimaryLocation W42868920571 @default.
- W4286892057 hasRelatedWork W2094557321 @default.
- W4286892057 hasRelatedWork W2951871955 @default.
- W4286892057 hasRelatedWork W2954804306 @default.
- W4286892057 hasRelatedWork W3022038857 @default.
- W4286892057 hasRelatedWork W3103643887 @default.
- W4286892057 hasRelatedWork W3170446423 @default.
- W4286892057 hasRelatedWork W3173185086 @default.
- W4286892057 hasRelatedWork W3196472998 @default.
- W4286892057 hasRelatedWork W4287598111 @default.
- W4286892057 hasRelatedWork W4319083788 @default.
- W4286892057 isParatext "false" @default.
- W4286892057 isRetracted "false" @default.
- W4286892057 workType "article" @default.