Matches in SemOpenAlex for { <https://semopenalex.org/work/W4224130159> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W4224130159 endingPage "104" @default.
- W4224130159 startingPage "90" @default.
- W4224130159 abstract "Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. Subsequently, we show that our method enables more effective few-shot adaptation compared with existing methods." @default.
- W4224130159 created "2022-04-20" @default.
- W4224130159 creator A5028032706 @default.
- W4224130159 creator A5055819776 @default.
- W4224130159 creator A5063407210 @default.
- W4224130159 date "2022-08-01" @default.
- W4224130159 modified "2023-10-16" @default.
- W4224130159 title "Discovering diverse solutions in deep reinforcement learning by maximizing state–action-based mutual information" @default.
- W4224130159 cites W1516111018 @default.
- W4224130159 cites W1738827650 @default.
- W4224130159 cites W2119717200 @default.
- W4224130159 cites W2158782408 @default.
- W4224130159 cites W2257979135 @default.
- W4224130159 cites W2462548332 @default.
- W4224130159 cites W2796290181 @default.
- W4224130159 cites W2964227312 @default.
- W4224130159 cites W2997947190 @default.
- W4224130159 cites W3104512559 @default.
- W4224130159 doi "https://doi.org/10.1016/j.neunet.2022.04.009" @default.
- W4224130159 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/35523085" @default.
- W4224130159 hasPublicationYear "2022" @default.
- W4224130159 type Work @default.
- W4224130159 citedByCount "2" @default.
- W4224130159 countsByYear W42241301592023 @default.
- W4224130159 crossrefType "journal-article" @default.
- W4224130159 hasAuthorship W4224130159A5028032706 @default.
- W4224130159 hasAuthorship W4224130159A5055819776 @default.
- W4224130159 hasAuthorship W4224130159A5063407210 @default.
- W4224130159 hasBestOaLocation W42241301592 @default.
- W4224130159 hasConcept C105795698 @default.
- W4224130159 hasConcept C119857082 @default.
- W4224130159 hasConcept C120665830 @default.
- W4224130159 hasConcept C121332964 @default.
- W4224130159 hasConcept C139807058 @default.
- W4224130159 hasConcept C14036430 @default.
- W4224130159 hasConcept C152139883 @default.
- W4224130159 hasConcept C154945302 @default.
- W4224130159 hasConcept C177264268 @default.
- W4224130159 hasConcept C185429906 @default.
- W4224130159 hasConcept C199360897 @default.
- W4224130159 hasConcept C33923547 @default.
- W4224130159 hasConcept C41008148 @default.
- W4224130159 hasConcept C51167844 @default.
- W4224130159 hasConcept C78458016 @default.
- W4224130159 hasConcept C8038995 @default.
- W4224130159 hasConcept C86803240 @default.
- W4224130159 hasConcept C97541855 @default.
- W4224130159 hasConceptScore W4224130159C105795698 @default.
- W4224130159 hasConceptScore W4224130159C119857082 @default.
- W4224130159 hasConceptScore W4224130159C120665830 @default.
- W4224130159 hasConceptScore W4224130159C121332964 @default.
- W4224130159 hasConceptScore W4224130159C139807058 @default.
- W4224130159 hasConceptScore W4224130159C14036430 @default.
- W4224130159 hasConceptScore W4224130159C152139883 @default.
- W4224130159 hasConceptScore W4224130159C154945302 @default.
- W4224130159 hasConceptScore W4224130159C177264268 @default.
- W4224130159 hasConceptScore W4224130159C185429906 @default.
- W4224130159 hasConceptScore W4224130159C199360897 @default.
- W4224130159 hasConceptScore W4224130159C33923547 @default.
- W4224130159 hasConceptScore W4224130159C41008148 @default.
- W4224130159 hasConceptScore W4224130159C51167844 @default.
- W4224130159 hasConceptScore W4224130159C78458016 @default.
- W4224130159 hasConceptScore W4224130159C8038995 @default.
- W4224130159 hasConceptScore W4224130159C86803240 @default.
- W4224130159 hasConceptScore W4224130159C97541855 @default.
- W4224130159 hasFunder F4320334764 @default.
- W4224130159 hasLocation W42241301591 @default.
- W4224130159 hasLocation W42241301592 @default.
- W4224130159 hasLocation W42241301593 @default.
- W4224130159 hasOpenAccess W4224130159 @default.
- W4224130159 hasPrimaryLocation W42241301591 @default.
- W4224130159 hasRelatedWork W3022038857 @default.
- W4224130159 hasRelatedWork W3046775127 @default.
- W4224130159 hasRelatedWork W3123344745 @default.
- W4224130159 hasRelatedWork W3196155444 @default.
- W4224130159 hasRelatedWork W3208099188 @default.
- W4224130159 hasRelatedWork W3209574120 @default.
- W4224130159 hasRelatedWork W4285260836 @default.
- W4224130159 hasRelatedWork W4319083788 @default.
- W4224130159 hasRelatedWork W4367692580 @default.
- W4224130159 hasRelatedWork W4386462264 @default.
- W4224130159 hasVolume "152" @default.
- W4224130159 isParatext "false" @default.
- W4224130159 isRetracted "false" @default.
- W4224130159 workType "article" @default.