Matches in SemOpenAlex for { <https://semopenalex.org/work/W3205046940> ?p ?o ?g. }
- W3205046940 endingPage "6892" @default.
- W3205046940 startingPage "6884" @default.
- W3205046940 abstract "Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. These pre-trained policies can accelerate learning once external reward is provided, and can also be used as primitive options in hierarchical reinforcement learning. Conventional approaches to unsupervised skill discovery feed a latent variable to the agent and strengthen its influence on the agent’s behavior through mutual information (MI) maximization. However, the policies learned by MI-based methods cannot sufficiently explore the state space, even though they can be successfully distinguished from one another. We therefore propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), in which we directly maximize the distance between the state distributions induced by different policies. Additionally, we overcome the difficulties of simultaneously training N (N > 2) policies and of amortizing the overall reward to each step. Experiments show that policies learned by our approach outperform MI-based methods on the metric of Wasserstein distance while keeping high discriminability. Furthermore, agents trained by WURL can sufficiently explore the state space in mazes and MuJoCo tasks, and the pre-trained policies can be applied to downstream tasks through hierarchical learning." @default.
- W3205046940 created "2021-10-25" @default.
- W3205046940 creator A5003607744 @default.
- W3205046940 creator A5024401174 @default.
- W3205046940 creator A5032664588 @default.
- W3205046940 creator A5043447775 @default.
- W3205046940 creator A5086972360 @default.
- W3205046940 date "2022-06-28" @default.
- W3205046940 modified "2023-09-25" @default.
- W3205046940 title "Wasserstein Unsupervised Reinforcement Learning" @default.
- W3205046940 cites W1771410628 @default.
- W3205046940 cites W2145339207 @default.
- W3205046940 cites W2158131535 @default.
- W3205046940 cites W2173248099 @default.
- W3205046940 cites W2257979135 @default.
- W3205046940 cites W2556477470 @default.
- W3205046940 cites W2736601468 @default.
- W3205046940 cites W2739748921 @default.
- W3205046940 cites W2785342287 @default.
- W3205046940 cites W2785620140 @default.
- W3205046940 cites W2788904251 @default.
- W3205046940 cites W2803213319 @default.
- W3205046940 cites W2883433335 @default.
- W3205046940 cites W2894973163 @default.
- W3205046940 cites W2922007426 @default.
- W3205046940 cites W2922142804 @default.
- W3205046940 cites W2925140953 @default.
- W3205046940 cites W2949608212 @default.
- W3205046940 cites W2962738371 @default.
- W3205046940 cites W2962902376 @default.
- W3205046940 cites W2962970351 @default.
- W3205046940 cites W2963006832 @default.
- W3205046940 cites W2963438456 @default.
- W3205046940 cites W2963563295 @default.
- W3205046940 cites W2963639957 @default.
- W3205046940 cites W2963746531 @default.
- W3205046940 cites W2963800509 @default.
- W3205046940 cites W2964001908 @default.
- W3205046940 cites W2964263543 @default.
- W3205046940 cites W3027456239 @default.
- W3205046940 cites W3034731451 @default.
- W3205046940 cites W3035403520 @default.
- W3205046940 cites W3113118584 @default.
- W3205046940 cites W3119054718 @default.
- W3205046940 cites W3121427321 @default.
- W3205046940 doi "https://doi.org/10.1609/aaai.v36i6.20645" @default.
- W3205046940 hasPublicationYear "2022" @default.
- W3205046940 type Work @default.
- W3205046940 sameAs 3205046940 @default.
- W3205046940 citedByCount "1" @default.
- W3205046940 countsByYear W32050469402022 @default.
- W3205046940 crossrefType "journal-article" @default.
- W3205046940 hasAuthorship W3205046940A5003607744 @default.
- W3205046940 hasAuthorship W3205046940A5024401174 @default.
- W3205046940 hasAuthorship W3205046940A5032664588 @default.
- W3205046940 hasAuthorship W3205046940A5043447775 @default.
- W3205046940 hasAuthorship W3205046940A5086972360 @default.
- W3205046940 hasBestOaLocation W32050469401 @default.
- W3205046940 hasConcept C105795698 @default.
- W3205046940 hasConcept C111919701 @default.
- W3205046940 hasConcept C119857082 @default.
- W3205046940 hasConcept C120822770 @default.
- W3205046940 hasConcept C126255220 @default.
- W3205046940 hasConcept C127413603 @default.
- W3205046940 hasConcept C154945302 @default.
- W3205046940 hasConcept C176217482 @default.
- W3205046940 hasConcept C21547014 @default.
- W3205046940 hasConcept C2776330181 @default.
- W3205046940 hasConcept C2778572836 @default.
- W3205046940 hasConcept C33923547 @default.
- W3205046940 hasConcept C41008148 @default.
- W3205046940 hasConcept C66938386 @default.
- W3205046940 hasConcept C67203356 @default.
- W3205046940 hasConcept C72434380 @default.
- W3205046940 hasConcept C8038995 @default.
- W3205046940 hasConcept C97541855 @default.
- W3205046940 hasConceptScore W3205046940C105795698 @default.
- W3205046940 hasConceptScore W3205046940C111919701 @default.
- W3205046940 hasConceptScore W3205046940C119857082 @default.
- W3205046940 hasConceptScore W3205046940C120822770 @default.
- W3205046940 hasConceptScore W3205046940C126255220 @default.
- W3205046940 hasConceptScore W3205046940C127413603 @default.
- W3205046940 hasConceptScore W3205046940C154945302 @default.
- W3205046940 hasConceptScore W3205046940C176217482 @default.
- W3205046940 hasConceptScore W3205046940C21547014 @default.
- W3205046940 hasConceptScore W3205046940C2776330181 @default.
- W3205046940 hasConceptScore W3205046940C2778572836 @default.
- W3205046940 hasConceptScore W3205046940C33923547 @default.
- W3205046940 hasConceptScore W3205046940C41008148 @default.
- W3205046940 hasConceptScore W3205046940C66938386 @default.
- W3205046940 hasConceptScore W3205046940C67203356 @default.
- W3205046940 hasConceptScore W3205046940C72434380 @default.
- W3205046940 hasConceptScore W3205046940C8038995 @default.
- W3205046940 hasConceptScore W3205046940C97541855 @default.
- W3205046940 hasIssue "6" @default.
- W3205046940 hasLocation W32050469401 @default.
- W3205046940 hasLocation W32050469402 @default.
- W3205046940 hasOpenAccess W3205046940 @default.