Matches in SemOpenAlex for { <https://semopenalex.org/work/W3186762757> ?p ?o ?g. }
- W3186762757 abstract "A hallmark of human intelligence, but challenging for reinforcement learning (RL) agents, is the ability to compositionally generalise, that is, to recompose familiar knowledge components in novel ways to solve new problems. For instance, when navigating in a city, one needs to know the location of the destination and how to operate a vehicle to get there, whether it be pedalling a bike or operating a car. In RL, these correspond to the reward function and transition function, respectively. To compositionally generalise, these two components need to be transferable independently of each other: multiple modes of transport can reach the same goal, and any given mode can be used to reach multiple destinations. Yet there are also instances where it can be helpful to learn and transfer entire structures, jointly representing goals and transitions, particularly whenever these recur in natural tasks (e.g., given a suggestion to get ice cream, one might prefer to bike, even in new towns). Prior theoretical work has explored how, in model-based RL, agents can learn and generalise task components (transition and reward functions). But a satisfactory account for how a single agent can simultaneously satisfy the two competing demands is still lacking. Here, we propose a hierarchical RL agent that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model of the task. It maintains a factorised representation of task components through a hierarchical Dirichlet process, but it also represents different possible covariances between these components through a standard Dirichlet process. 
We validate our approach on a variety of navigation tasks covering a wide range of statistical correlations between task components and show that it can also improve generalisation and transfer in more complex, hierarchical tasks with goal/subgoal structures. Finally, we discuss our work, including how this clustering algorithm could conceivably be implemented by cortico-striatal gating circuits in the brain." @default.
- W3186762757 created "2021-08-02" @default.
- W3186762757 creator A5007609257 @default.
- W3186762757 creator A5071428721 @default.
- W3186762757 date "2021-07-21" @default.
- W3186762757 modified "2023-10-16" @default.
- W3186762757 title "Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning" @default.
- W3186762757 cites W1966678693 @default.
- W3186762757 cites W1982670892 @default.
- W3186762757 cites W2006114123 @default.
- W3186762757 cites W2056354534 @default.
- W3186762757 cites W2061304498 @default.
- W3186762757 cites W2072916238 @default.
- W3186762757 cites W2086710210 @default.
- W3186762757 cites W2092387347 @default.
- W3186762757 cites W2109910161 @default.
- W3186762757 cites W2113122939 @default.
- W3186762757 cites W2117726420 @default.
- W3186762757 cites W2118373646 @default.
- W3186762757 cites W2119842714 @default.
- W3186762757 cites W2121517924 @default.
- W3186762757 cites W2140956413 @default.
- W3186762757 cites W2145339207 @default.
- W3186762757 cites W2158266063 @default.
- W3186762757 cites W2169743339 @default.
- W3186762757 cites W2170014483 @default.
- W3186762757 cites W2194321275 @default.
- W3186762757 cites W2257979135 @default.
- W3186762757 cites W2341634245 @default.
- W3186762757 cites W263845233 @default.
- W3186762757 cites W2761873684 @default.
- W3186762757 cites W2766447205 @default.
- W3186762757 cites W2827257388 @default.
- W3186762757 cites W2949369413 @default.
- W3186762757 cites W2950347959 @default.
- W3186762757 cites W2952705521 @default.
- W3186762757 cites W2953170694 @default.
- W3186762757 cites W2974979868 @default.
- W3186762757 cites W3016035772 @default.
- W3186762757 cites W3092740831 @default.
- W3186762757 cites W3096511324 @default.
- W3186762757 cites W3103493959 @default.
- W3186762757 cites W3156105972 @default.
- W3186762757 cites W4229781645 @default.
- W3186762757 cites W4251668937 @default.
- W3186762757 doi "https://doi.org/10.1101/2021.07.20.453122" @default.
- W3186762757 hasPublicationYear "2021" @default.
- W3186762757 type Work @default.
- W3186762757 sameAs 3186762757 @default.
- W3186762757 citedByCount "1" @default.
- W3186762757 countsByYear W31867627572022 @default.
- W3186762757 crossrefType "posted-content" @default.
- W3186762757 hasAuthorship W3186762757A5007609257 @default.
- W3186762757 hasAuthorship W3186762757A5071428721 @default.
- W3186762757 hasBestOaLocation W31867627571 @default.
- W3186762757 hasConcept C104317684 @default.
- W3186762757 hasConcept C111919701 @default.
- W3186762757 hasConcept C119857082 @default.
- W3186762757 hasConcept C14036430 @default.
- W3186762757 hasConcept C154945302 @default.
- W3186762757 hasConcept C162324750 @default.
- W3186762757 hasConcept C17744445 @default.
- W3186762757 hasConcept C185592680 @default.
- W3186762757 hasConcept C187736073 @default.
- W3186762757 hasConcept C194232998 @default.
- W3186762757 hasConcept C199539241 @default.
- W3186762757 hasConcept C2776359362 @default.
- W3186762757 hasConcept C2780451532 @default.
- W3186762757 hasConcept C41008148 @default.
- W3186762757 hasConcept C55493867 @default.
- W3186762757 hasConcept C78458016 @default.
- W3186762757 hasConcept C86803240 @default.
- W3186762757 hasConcept C94625758 @default.
- W3186762757 hasConcept C97541855 @default.
- W3186762757 hasConcept C98045186 @default.
- W3186762757 hasConceptScore W3186762757C104317684 @default.
- W3186762757 hasConceptScore W3186762757C111919701 @default.
- W3186762757 hasConceptScore W3186762757C119857082 @default.
- W3186762757 hasConceptScore W3186762757C14036430 @default.
- W3186762757 hasConceptScore W3186762757C154945302 @default.
- W3186762757 hasConceptScore W3186762757C162324750 @default.
- W3186762757 hasConceptScore W3186762757C17744445 @default.
- W3186762757 hasConceptScore W3186762757C185592680 @default.
- W3186762757 hasConceptScore W3186762757C187736073 @default.
- W3186762757 hasConceptScore W3186762757C194232998 @default.
- W3186762757 hasConceptScore W3186762757C199539241 @default.
- W3186762757 hasConceptScore W3186762757C2776359362 @default.
- W3186762757 hasConceptScore W3186762757C2780451532 @default.
- W3186762757 hasConceptScore W3186762757C41008148 @default.
- W3186762757 hasConceptScore W3186762757C55493867 @default.
- W3186762757 hasConceptScore W3186762757C78458016 @default.
- W3186762757 hasConceptScore W3186762757C86803240 @default.
- W3186762757 hasConceptScore W3186762757C94625758 @default.
- W3186762757 hasConceptScore W3186762757C97541855 @default.
- W3186762757 hasConceptScore W3186762757C98045186 @default.
- W3186762757 hasLocation W31867627571 @default.
- W3186762757 hasOpenAccess W3186762757 @default.
- W3186762757 hasPrimaryLocation W31867627571 @default.
- W3186762757 hasRelatedWork W1562959674 @default.
- W3186762757 hasRelatedWork W2023365303 @default.