Matches in SemOpenAlex for { <https://semopenalex.org/work/W3133860714> ?p ?o ?g. }
- W3133860714 abstract "In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by finite-horizon trajectories is a sensible target. In particular, we present a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), to learn a policy that maximizes a non-parametric, $k$-nearest neighbors estimate of the state distribution entropy. In contrast to known methods, MEPOL is completely model-free, as it requires neither estimating the state distribution of any policy nor modeling the transition dynamics. Then, we empirically show that MEPOL can learn a maximum-entropy exploration policy in high-dimensional, continuous-control domains, and how this policy facilitates learning a variety of meaningful reward-based tasks downstream." @default.
- W3133860714 created "2021-03-15" @default.
- W3133860714 creator A5017130830 @default.
- W3133860714 creator A5047410750 @default.
- W3133860714 creator A5089411660 @default.
- W3133860714 date "2020-07-09" @default.
- W3133860714 modified "2023-09-23" @default.
- W3133860714 title "Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate" @default.
- W3133860714 cites W1601389419 @default.
- W3133860714 cites W172298727 @default.
- W3133860714 cites W1771410628 @default.
- W3133860714 cites W1786044565 @default.
- W3133860714 cites W1995875735 @default.
- W3133860714 cites W2012587148 @default.
- W3133860714 cites W2049446512 @default.
- W3133860714 cites W2084176271 @default.
- W3133860714 cites W2089943482 @default.
- W3133860714 cites W2101234009 @default.
- W3133860714 cites W2101524054 @default.
- W3133860714 cites W2119567691 @default.
- W3133860714 cites W2121863487 @default.
- W3133860714 cites W2139612737 @default.
- W3133860714 cites W2145339207 @default.
- W3133860714 cites W2158782408 @default.
- W3133860714 cites W2160589914 @default.
- W3133860714 cites W2257979135 @default.
- W3133860714 cites W2293729149 @default.
- W3133860714 cites W2336509789 @default.
- W3133860714 cites W2561776174 @default.
- W3133860714 cites W2604763608 @default.
- W3133860714 cites W2614839826 @default.
- W3133860714 cites W2808682055 @default.
- W3133860714 cites W2883433335 @default.
- W3133860714 cites W2890597452 @default.
- W3133860714 cites W2902298341 @default.
- W3133860714 cites W2914261249 @default.
- W3133860714 cites W2915306724 @default.
- W3133860714 cites W2949608212 @default.
- W3133860714 cites W2952193948 @default.
- W3133860714 cites W2953326529 @default.
- W3133860714 cites W2962730405 @default.
- W3133860714 cites W2962902376 @default.
- W3133860714 cites W2963177395 @default.
- W3133860714 cites W2963276097 @default.
- W3133860714 cites W2963438456 @default.
- W3133860714 cites W2963639957 @default.
- W3133860714 cites W2963641140 @default.
- W3133860714 cites W2963646405 @default.
- W3133860714 cites W2963680188 @default.
- W3133860714 cites W2964067469 @default.
- W3133860714 cites W2964083594 @default.
- W3133860714 cites W2980964291 @default.
- W3133860714 cites W2990747716 @default.
- W3133860714 cites W2996037775 @default.
- W3133860714 cites W2997343068 @default.
- W3133860714 cites W3028821797 @default.
- W3133860714 cites W3034769194 @default.
- W3133860714 cites W3034893468 @default.
- W3133860714 cites W3035599863 @default.
- W3133860714 cites W3035642820 @default.
- W3133860714 cites W3035717769 @default.
- W3133860714 cites W3036002380 @default.
- W3133860714 cites W3089395821 @default.
- W3133860714 cites W3173335063 @default.
- W3133860714 hasPublicationYear "2020" @default.
- W3133860714 type Work @default.
- W3133860714 sameAs 3133860714 @default.
- W3133860714 citedByCount "3" @default.
- W3133860714 countsByYear W31338607142021 @default.
- W3133860714 crossrefType "posted-content" @default.
- W3133860714 hasAuthorship W3133860714A5017130830 @default.
- W3133860714 hasAuthorship W3133860714A5047410750 @default.
- W3133860714 hasAuthorship W3133860714A5089411660 @default.
- W3133860714 hasConcept C105795698 @default.
- W3133860714 hasConcept C106301342 @default.
- W3133860714 hasConcept C117251300 @default.
- W3133860714 hasConcept C121332964 @default.
- W3133860714 hasConcept C126255220 @default.
- W3133860714 hasConcept C127413603 @default.
- W3133860714 hasConcept C154945302 @default.
- W3133860714 hasConcept C201995342 @default.
- W3133860714 hasConcept C2780451532 @default.
- W3133860714 hasConcept C33923547 @default.
- W3133860714 hasConcept C41008148 @default.
- W3133860714 hasConcept C62520636 @default.
- W3133860714 hasConcept C9679016 @default.
- W3133860714 hasConceptScore W3133860714C105795698 @default.
- W3133860714 hasConceptScore W3133860714C106301342 @default.
- W3133860714 hasConceptScore W3133860714C117251300 @default.
- W3133860714 hasConceptScore W3133860714C121332964 @default.
- W3133860714 hasConceptScore W3133860714C126255220 @default.
- W3133860714 hasConceptScore W3133860714C127413603 @default.
- W3133860714 hasConceptScore W3133860714C154945302 @default.
- W3133860714 hasConceptScore W3133860714C201995342 @default.
- W3133860714 hasConceptScore W3133860714C2780451532 @default.
- W3133860714 hasConceptScore W3133860714C33923547 @default.
- W3133860714 hasConceptScore W3133860714C41008148 @default.
- W3133860714 hasConceptScore W3133860714C62520636 @default.
- W3133860714 hasConceptScore W3133860714C9679016 @default.
- W3133860714 hasOpenAccess W3133860714 @default.
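The abstract's central quantity is a non-parametric, k-nearest-neighbors estimate of state-distribution entropy. As a minimal sketch, the code below implements one standard estimator of that family (the Kozachenko-Leonenko / Singh et al. form), assuming states are real-valued vectors compared with Euclidean distance. It is not the authors' MEPOL code: it only estimates entropy from a batch of states and omits the policy-gradient step entirely. The function names (`knn_entropy`, `digamma_int`, `log_unit_ball_volume`) are hypothetical. The estimate, in nats, is H = log N + log V_d - psi(k) + (d/N) * sum_i log R_i, where R_i is the distance from state i to its k-th nearest neighbor and V_d is the volume of the d-dimensional unit ball.

```python
import heapq
import math
import random
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def digamma_int(k: int) -> float:
    """Digamma at a positive integer: psi(k) = -gamma + sum_{j=1}^{k-1} 1/j."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def log_unit_ball_volume(d: int) -> float:
    """Log-volume of the unit Euclidean ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)


def knn_entropy(states: Sequence[Sequence[float]], k: int = 4) -> float:
    """Hypothetical k-NN differential-entropy estimate (nats) of a batch of states.

    Uses a brute-force O(N^2) neighbor search, which is only suitable
    for small illustrative batches.
    """
    n = len(states)
    if n <= k:
        raise ValueError("need more samples than k")
    d = len(states[0])
    log_r_sum = 0.0
    for i, x in enumerate(states):
        # Distance to the k-th nearest *other* state (the point itself is excluded).
        r_k = heapq.nsmallest(
            k, (math.dist(x, y) for j, y in enumerate(states) if j != i)
        )[-1]
        log_r_sum += math.log(max(r_k, 1e-12))  # floor avoids log(0) on duplicate states
    return math.log(n) + log_unit_ball_volume(d) - digamma_int(k) + d * log_r_sum / n


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[rng.gauss(0.0, 1.0)] for _ in range(1000)]
    print(knn_entropy(samples, k=4))
```

For a 1-D standard normal sample, the printed value should land near the true entropy 0.5 * log(2 * pi * e), which is about 1.419 nats. Because the estimate depends only on pairwise distances between sampled states, it needs no density model or transition model, which matches the abstract's model-free framing.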