Matches in SemOpenAlex for { <https://semopenalex.org/work/W137357473> ?p ?o ?g. }
- W137357473 abstract "Attention and Reinforcement Learning: Constructing Representations from Indirect Feedback ˜ (canas@colorado.edu) & Matt Jones (mcj@colorado.edu) Fabi´an Ca nas University of Colorado, Department of Psychology & Neuroscience Boulder, CO 80309 USA Abstract Reinforcement learning (RL) shows great promise as a theory of learning in complex, dynamic tasks. However, the learn- ing performance of RL models depends strongly on how stim- uli are represented, because this determines how knowledge is generalized among stimuli. We propose a mechanism by which RL autonomously constructs representations that suit its needs, using selective attention among stimulus dimensions to bootstrap off of internal value estimates and improve those same estimates, thereby speeding learning. Results of a behav- ioral experiment support this proposal, by showing people can learn selective attention for actions that do not lead directly to reward, through internally generated feedback. The results are cast in a larger framework for integrating RL with psychologi- cal mechanisms of representation learning. Keywords: Reinforcement Learning; attention; generalization Introduction Humans have an incredible capacity to learn new and com- plex tasks in dynamic environments. In recent years, Rein- forcement Learning (RL) has emerged as a theoretical frame- work that may explain how such powerful learning takes place (e.g., Sutton & Barto, 1998). Reinforcement learn- ing draws on a synthesis of machine learning and neuro- science and offers a set of computational principles for de- scribing learning of dynamic tasks. RL has led to major ad- vances in the ability of machines to learn difficult tasks such as backgammon and autonomous helicopter flight (Tesauro, 1995; Bagnell & Schneider, 2001). RL has also received much interest in neuroscience, based on findings that phasic dopamine signals have similar properties to the internal feed- back computed by RL algorithms (Schultz, Dayan, & Mon- tague, 1997). This correspondence suggests that RL offers a useful model of biological learning. Despite the promise of this framework, the learning perfor- mance of RL algorithms strongly depends on the representa- tions on which they operate. RL works by learning which action to perform in each state of a task’s environment. In realistically complex tasks with large state spaces, learning about every state individually is impossible, and instead the learner must generalize knowledge among states. General- ization is closely tied to similarity (Shepard, 1987), which in turn depends on how stimuli or situations are represented. Therefore the efficacy of generalization depends on how a task is internally represented. Most often in machine-learning applications, representations are pre-supplied by the modeler based on features that are carefully crafted to capture the most important aspects of the task being learned (e.g., Tesauro, 1995). In psychological contexts, stimuli are chosen so that the subject’s representation is transparent, and consequently the question of how the representation is learned is neglected (Schyns, Goldstone, & Thibaut, 1998). A great deal of psychological research in domains other than RL focuses on how people learn representations to fa- cilitate learning, inference, and decision-making. The aim of our general research program is to explore how such mecha- nisms might interact with RL, and in particular how RL can build its own representations to bootstrap learning. 
In the present paper we focus on selective attention, building on models from the literature on category learning (Kruschke, 1992). In a companion paper (Jones & Ca˜nas, 2010), we pro- vide a formal framework for integrating representation learn- ing with RL and implement a specific computational model based on selective attention. Here, we present a behavioral experiment that support the thesis that RL can drive represen- tational learning. Our results show that the internally gener- ated feedback signals at the core of RL can direct shifts of attention toward those stimulus dimensions that are most di- agnostic of optimal action. The remainder of this paper begins with background on RL and modeling of attention learning in categorization. We then outline our proposal for how RL and attention learning can bootstrap off of each other. We then report the results of a sequential decision-making experiment designed to test this specific proposal. Implications are discussed for the role of attention in more complex and temporally extended tasks, prescriptions for training in such tasks, and interactions be- tween representation learning and declarative memory. Reinforcement Learning RL is a computational framework for learning dynamic tasks based on feedback from the environment. RL models rep- resent a task as a set of environmental states together with a set of available actions in each state. The action selected at each step determines the immediate reward as well as the ensuing state. This general framework accommodates nearly any psychological task, from simple conditioning to elaborate planning (Sutton & Barto, 1998). RL works by estimating values of states and actions, which reflect predictions of total future reward. From any given state, the action with the highest estimated value represents a best guess of the choice that will lead to the highest long- term reward. The key to learning value estimates, which lies at the heart of all RL models, is an internally generated feed- back signal known as Temporal Difference (TD) error. TD error represents the discrepancy between the estimated value of an action prior to its execution and a new estimate based" @default.
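  Note: the abstract's description of the TD error is cut off in the indexed text. As a minimal sketch of the mechanism it refers to, the standard tabular temporal-difference update for action values is shown below. The SARSA-style form of the target, the learning rate alpha, the discount gamma, and the dictionary-based Q table are illustrative assumptions, not details taken from the paper itself.

    # Minimal tabular TD sketch (illustrative; SARSA-style target assumed,
    # not taken from the paper). Q maps (state, action) pairs to value estimates.
    from collections import defaultdict

    def td_error(Q, state, action, reward, next_state, next_action, gamma=0.9):
        # Discrepancy between the value estimated before acting and the new
        # estimate formed from the observed reward plus the discounted value
        # of the next state-action pair.
        return reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]

    def td_update(Q, state, action, reward, next_state, next_action,
                  alpha=0.1, gamma=0.9):
        # Move the old estimate a small step (alpha) toward the new estimate.
        delta = td_error(Q, state, action, reward, next_state, next_action, gamma)
        Q[(state, action)] += alpha * delta
        return delta

    Q = defaultdict(float)  # unvisited state-action pairs default to a value of 0.0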
- W137357473 created "2016-06-24" @default.
- W137357473 creator A5021482091 @default.
- W137357473 creator A5050892279 @default.
- W137357473 date "2010-01-01" @default.
- W137357473 modified "2023-09-26" @default.
- W137357473 title "Attention and Reinforcement Learning: Constructing Representations from Indirect Feedback" @default.
- W137357473 cites W1990198033 @default.
- W137357473 cites W1995501628 @default.
- W137357473 cites W2013369031 @default.
- W137357473 cites W2018124860 @default.
- W137357473 cites W2106925043 @default.
- W137357473 cites W2114826854 @default.
- W137357473 cites W2117726420 @default.
- W137357473 cites W2121863487 @default.
- W137357473 cites W2129668249 @default.
- W137357473 cites W2130105540 @default.
- W137357473 cites W2132089731 @default.
- W137357473 cites W2132271720 @default.
- W137357473 cites W2134145060 @default.
- W137357473 cites W2154164802 @default.
- W137357473 cites W2169214866 @default.
- W137357473 cites W2170014483 @default.
- W137357473 cites W2622232467 @default.
- W137357473 cites W2766069215 @default.
- W137357473 cites W86588979 @default.
- W137357473 cites W2131600418 @default.
- W137357473 hasPublicationYear "2010" @default.
- W137357473 type Work @default.
- W137357473 sameAs 137357473 @default.
- W137357473 citedByCount "3" @default.
- W137357473 countsByYear W1373574732014 @default.
- W137357473 countsByYear W1373574732016 @default.
- W137357473 crossrefType "journal-article" @default.
- W137357473 hasAuthorship W137357473A5021482091 @default.
- W137357473 hasAuthorship W137357473A5050892279 @default.
- W137357473 hasConcept C111472728 @default.
- W137357473 hasConcept C138885662 @default.
- W137357473 hasConcept C154945302 @default.
- W137357473 hasConcept C15744967 @default.
- W137357473 hasConcept C177148314 @default.
- W137357473 hasConcept C177264268 @default.
- W137357473 hasConcept C17744445 @default.
- W137357473 hasConcept C180747234 @default.
- W137357473 hasConcept C188147891 @default.
- W137357473 hasConcept C199360897 @default.
- W137357473 hasConcept C199539241 @default.
- W137357473 hasConcept C2776359362 @default.
- W137357473 hasConcept C2779918689 @default.
- W137357473 hasConcept C41008148 @default.
- W137357473 hasConcept C67203356 @default.
- W137357473 hasConcept C77805123 @default.
- W137357473 hasConcept C94625758 @default.
- W137357473 hasConcept C97541855 @default.
- W137357473 hasConceptScore W137357473C111472728 @default.
- W137357473 hasConceptScore W137357473C138885662 @default.
- W137357473 hasConceptScore W137357473C154945302 @default.
- W137357473 hasConceptScore W137357473C15744967 @default.
- W137357473 hasConceptScore W137357473C177148314 @default.
- W137357473 hasConceptScore W137357473C177264268 @default.
- W137357473 hasConceptScore W137357473C17744445 @default.
- W137357473 hasConceptScore W137357473C180747234 @default.
- W137357473 hasConceptScore W137357473C188147891 @default.
- W137357473 hasConceptScore W137357473C199360897 @default.
- W137357473 hasConceptScore W137357473C199539241 @default.
- W137357473 hasConceptScore W137357473C2776359362 @default.
- W137357473 hasConceptScore W137357473C2779918689 @default.
- W137357473 hasConceptScore W137357473C41008148 @default.
- W137357473 hasConceptScore W137357473C67203356 @default.
- W137357473 hasConceptScore W137357473C77805123 @default.
- W137357473 hasConceptScore W137357473C94625758 @default.
- W137357473 hasConceptScore W137357473C97541855 @default.
- W137357473 hasIssue "32" @default.
- W137357473 hasLocation W1373574731 @default.
- W137357473 hasOpenAccess W137357473 @default.
- W137357473 hasPrimaryLocation W1373574731 @default.
- W137357473 hasRelatedWork W2018124860 @default.
- W137357473 hasRelatedWork W2147198965 @default.
- W137357473 hasRelatedWork W2549418705 @default.
- W137357473 hasRelatedWork W2585812899 @default.
- W137357473 hasRelatedWork W2589919542 @default.
- W137357473 hasRelatedWork W2590516042 @default.
- W137357473 hasRelatedWork W259238774 @default.
- W137357473 hasRelatedWork W2617908825 @default.
- W137357473 hasRelatedWork W2622232467 @default.
- W137357473 hasRelatedWork W2767316680 @default.
- W137357473 hasRelatedWork W2767710836 @default.
- W137357473 hasRelatedWork W2779844519 @default.
- W137357473 hasRelatedWork W2904815624 @default.
- W137357473 hasRelatedWork W2945791024 @default.
- W137357473 hasRelatedWork W295306427 @default.
- W137357473 hasRelatedWork W2973379954 @default.
- W137357473 hasRelatedWork W3001936345 @default.
- W137357473 hasRelatedWork W3170157804 @default.
- W137357473 hasRelatedWork W3206188474 @default.
- W137357473 hasRelatedWork W86588979 @default.
- W137357473 hasVolume "32" @default.
- W137357473 isParatext "false" @default.
- W137357473 isRetracted "false" @default.
- W137357473 magId "137357473" @default.