Matches in SemOpenAlex for { <https://semopenalex.org/work/W4362697178> ?p ?o ?g. }
- W4362697178 abstract "Abstract: To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming “beliefs”—optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN’s learned representation encodes belief information, but only when the RNN’s capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity. Author Summary: Natural environments are full of uncertainty. For example, just because my fridge had food in it yesterday does not mean it will have food today. Despite such uncertainty, animals can estimate which states and actions are the most valuable. Previous work suggests that animals estimate value using a brain area called the basal ganglia, using a process resembling a reinforcement learning algorithm called TD learning. However, traditional reinforcement learning algorithms cannot accurately estimate value in environments with state uncertainty (e.g., when my fridge’s contents are unknown).
One way around this problem is if agents form “beliefs,” a probabilistic estimate of how likely each state is, given any observations so far. However, estimating beliefs is a demanding process that may not be possible for animals in more complex environments. Here we show that an artificial recurrent neural network (RNN) trained with TD learning can estimate value from observations, without explicitly estimating beliefs. The trained RNN’s error signals resembled the neural activity of dopamine neurons measured during the same task. Importantly, the RNN’s activity resembled beliefs, but only when the RNN had enough capacity. This work illustrates how animals could estimate value in uncertain environments without needing to first form beliefs, which may be useful in environments where computing the true beliefs is too costly." @default.
- W4362697178 created "2023-04-09" @default.
- W4362697178 creator A5031715686 @default.
- W4362697178 creator A5042128956 @default.
- W4362697178 creator A5054079047 @default.
- W4362697178 creator A5057544670 @default.
- W4362697178 creator A5063041513 @default.
- W4362697178 creator A5085316776 @default.
- W4362697178 date "2023-04-07" @default.
- W4362697178 modified "2023-09-27" @default.
- W4362697178 title "Emergence of belief-like representations through reinforcement learning" @default.
- W4362697178 cites W1980324747 @default.
- W4362697178 cites W2002935439 @default.
- W4362697178 cites W2011074553 @default.
- W4362697178 cites W2011868317 @default.
- W4362697178 cites W2017357931 @default.
- W4362697178 cites W2022561563 @default.
- W4362697178 cites W2045833919 @default.
- W4362697178 cites W2046813707 @default.
- W4362697178 cites W2054217036 @default.
- W4362697178 cites W2092861175 @default.
- W4362697178 cites W2117726420 @default.
- W4362697178 cites W2119170562 @default.
- W4362697178 cites W2119259385 @default.
- W4362697178 cites W2152681028 @default.
- W4362697178 cites W2167362547 @default.
- W4362697178 cites W2168359464 @default.
- W4362697178 cites W2171865010 @default.
- W4362697178 cites W2346736747 @default.
- W4362697178 cites W2557342183 @default.
- W4362697178 cites W2592093412 @default.
- W4362697178 cites W2593605634 @default.
- W4362697178 cites W2769753143 @default.
- W4362697178 cites W2797393690 @default.
- W4362697178 cites W2801251491 @default.
- W4362697178 cites W2804780080 @default.
- W4362697178 cites W2976141814 @default.
- W4362697178 cites W3041566370 @default.
- W4362697178 cites W3041725488 @default.
- W4362697178 cites W4210609138 @default.
- W4362697178 cites W4225107246 @default.
- W4362697178 cites W4255472410 @default.
- W4362697178 cites W4288280627 @default.
- W4362697178 doi "https://doi.org/10.1101/2023.04.04.535512" @default.
- W4362697178 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37066383" @default.
- W4362697178 hasPublicationYear "2023" @default.
- W4362697178 type Work @default.
- W4362697178 citedByCount "2" @default.
- W4362697178 countsByYear W43626971782023 @default.
- W4362697178 crossrefType "posted-content" @default.
- W4362697178 hasAuthorship W4362697178A5031715686 @default.
- W4362697178 hasAuthorship W4362697178A5042128956 @default.
- W4362697178 hasAuthorship W4362697178A5054079047 @default.
- W4362697178 hasAuthorship W4362697178A5057544670 @default.
- W4362697178 hasAuthorship W4362697178A5063041513 @default.
- W4362697178 hasAuthorship W4362697178A5085316776 @default.
- W4362697178 hasBestOaLocation W43626971781 @default.
- W4362697178 hasConcept C105795698 @default.
- W4362697178 hasConcept C106189395 @default.
- W4362697178 hasConcept C107673813 @default.
- W4362697178 hasConcept C111919701 @default.
- W4362697178 hasConcept C119857082 @default.
- W4362697178 hasConcept C147168706 @default.
- W4362697178 hasConcept C154945302 @default.
- W4362697178 hasConcept C159886148 @default.
- W4362697178 hasConcept C17744445 @default.
- W4362697178 hasConcept C199539241 @default.
- W4362697178 hasConcept C2776291640 @default.
- W4362697178 hasConcept C2776359362 @default.
- W4362697178 hasConcept C28826006 @default.
- W4362697178 hasConcept C33923547 @default.
- W4362697178 hasConcept C36299963 @default.
- W4362697178 hasConcept C41008148 @default.
- W4362697178 hasConcept C48044578 @default.
- W4362697178 hasConcept C50644808 @default.
- W4362697178 hasConcept C77088390 @default.
- W4362697178 hasConcept C94625758 @default.
- W4362697178 hasConcept C97541855 @default.
- W4362697178 hasConcept C98045186 @default.
- W4362697178 hasConceptScore W4362697178C105795698 @default.
- W4362697178 hasConceptScore W4362697178C106189395 @default.
- W4362697178 hasConceptScore W4362697178C107673813 @default.
- W4362697178 hasConceptScore W4362697178C111919701 @default.
- W4362697178 hasConceptScore W4362697178C119857082 @default.
- W4362697178 hasConceptScore W4362697178C147168706 @default.
- W4362697178 hasConceptScore W4362697178C154945302 @default.
- W4362697178 hasConceptScore W4362697178C159886148 @default.
- W4362697178 hasConceptScore W4362697178C17744445 @default.
- W4362697178 hasConceptScore W4362697178C199539241 @default.
- W4362697178 hasConceptScore W4362697178C2776291640 @default.
- W4362697178 hasConceptScore W4362697178C2776359362 @default.
- W4362697178 hasConceptScore W4362697178C28826006 @default.
- W4362697178 hasConceptScore W4362697178C33923547 @default.
- W4362697178 hasConceptScore W4362697178C36299963 @default.
- W4362697178 hasConceptScore W4362697178C41008148 @default.
- W4362697178 hasConceptScore W4362697178C48044578 @default.
- W4362697178 hasConceptScore W4362697178C50644808 @default.
- W4362697178 hasConceptScore W4362697178C77088390 @default.
- W4362697178 hasConceptScore W4362697178C94625758 @default.
- W4362697178 hasConceptScore W4362697178C97541855 @default.