Matches in SemOpenAlex for { <https://semopenalex.org/work/W2767331574> ?p ?o ?g. }
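The header above is a SPARQL triple pattern over the work's URI. As a minimal sketch of how these statements could be retrieved programmatically — assuming SemOpenAlex exposes a public SPARQL endpoint at https://semopenalex.org/sparql (the endpoint URL is an assumption here, not stated on this page) — using the SPARQLWrapper Python library:

```python
# Minimal sketch: fetch all (?p, ?o) statements for work W2767331574 via SPARQL.
# Assumed endpoint URL; adjust if the SemOpenAlex service is hosted elsewhere.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://semopenalex.org/sparql")
endpoint.setQuery("""
    SELECT ?p ?o WHERE {
        <https://semopenalex.org/work/W2767331574> ?p ?o .
    }
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["p"]["value"], binding["o"]["value"])
```

The `?g` position in the original pattern names the containing graph; wrapping the triple pattern in a `GRAPH ?g { ... }` clause would recover it as well.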
- W2767331574 abstract "Reinforcement Learning Leads to Risk Averse Behavior. Jerker C. Denrell (denrell@gsb.stanford.edu), Graduate School of Business, Stanford University, 518 Memorial Way, Stanford, CA 94305 USA. Keywords: Reinforcement Learning; Risk Taking; Adaptation; Exploration.

Animals and humans often have to choose between options with reward distributions that are initially unknown and can only be learned through experience. Recent experimental and theoretical work has demonstrated that such decision processes can be modeled using computational models of reinforcement learning (Daw et al., 2006; Erev & Barron, 2005; Sutton & Barto, 1998). In these models, agents use past rewards to form estimates of the rewards generated by the different options, and the probability of choosing an option is an increasing function of its reward estimate. Here I show that such models lead to risk averse behavior.

Reinforcement learning leads to improved performance by increasing the probability of sampling alternatives with good past outcomes and avoiding alternatives with poor past outcomes. Such adaptive sampling is sensible but introduces an asymmetry in experiential learning. Because alternatives with poor past outcomes are avoided, errors that involve underestimation of rewards are unlikely to be corrected; because alternatives with favorable past outcomes are sampled again, errors of overestimation are likely to be corrected (Denrell & March, 2001; Denrell, 2005, 2007; March, 1996). Due to this asymmetry, reinforcement learning leads to systematic biases in decision making (e.g., Denrell, 2005; Denrell & Le Mens, 2007). In this paper I demonstrate formally that because of this asymmetry, reinforcement learning leads to risk averse behavior: among a set of uncertain alternatives with identical expected value, the learner will, in the long run, be most likely to choose the least variable alternative.

In particular, suppose that: 1) In each period, the learner must choose one of N alternatives, each with a normally distributed reward $r_{i,t}$, where $\mu_i$ and $\sigma_i^2$ are the expected reward and the variance of alternative i. 2) The learner uses a weighted average of past experiences to form a reward estimate, $y_{i,t}$, for each alternative; specifically, the reward estimate of alternative i is $y_{i,t+1} = (1-b)\, y_{i,t} + b\, r_{i,t+1}$. 3) The learner chooses among alternatives according to a logit choice rule: the probability that alternative i is chosen in period t is $\exp(S y_{i,t}) / \sum_{j=1}^{N} \exp(S y_{j,t})$.

This model leads to risk averse behavior: asymptotically, the probability that alternative i is chosen is

$$\lim_{t \to \infty} P_{i,t} = \frac{\exp\!\left[ S\mu_i - \dfrac{S^2 b \sigma_i^2}{2(2-b)} \right]}{\sum_{j=1}^{N} \exp\!\left[ S\mu_j - \dfrac{S^2 b \sigma_j^2}{2(2-b)} \right]}.$$

This probability is an increasing function of the expected reward but a decreasing function of the variance. Moreover, these choice probabilities are identical to those of a decision maker who knows the probability distributions, prefers alternatives with high mean and low variance, and chooses between options according to a logit choice rule. Thus, the learning model generates choice probabilities identical to a random utility model assuming mean-variance preferences. I prove that the result that reinforcement learning leads to risk averse behavior generalizes to a large class of probability distributions and to several other belief-updating rules and choice rules. If the reward distributions are not symmetric, I show the learning model can generate behavior consistent with a preference for alternatives with a reward distribution with low variance and positive skew. I also show that a modified logit choice rule can generate behavior consistent with an s-shaped utility function.

References: Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441, 876-879. Denrell, J. (2005). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, 112, 951-978. Denrell, J. (2007). Adaptive learning and risk taking. Psychological Review, 114, 177-187. Denrell, J., & Le Mens, G. (2007). Interdependent sampling and social influence. Psychological Review, 114, 398-422. Denrell, J., & March, J. G. (2001). Adaptation as information restriction: The hot stove effect. Organization Science, 12, 523-538. Erev, I., & Barron, G. (2005). On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review, 112, 912-931. March, J. G. (1996). Learning to be risk averse. Psychological Review, 103, 309-319. Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning. Cambridge, MA: The MIT Press." @default.
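The abstract fully specifies the learning model, so its prediction can be checked numerically. Below is a minimal simulation sketch of that model — exponentially weighted estimates updated only for the chosen alternative, plus a softmax (logit) choice rule — with the empirical choice frequencies compared against the paper's asymptotic formula. The parameter values (S, b, the means and variances) are illustrative choices, not taken from the paper, and the match is only approximate at finite t.

```python
# Simulation sketch of the reinforcement learning model in the abstract.
# Three alternatives with identical means but different variances; the
# asymptotic result predicts the least variable one is chosen most often.
import numpy as np

rng = np.random.default_rng(0)

N = 3
mu = np.array([0.0, 0.0, 0.0])     # identical expected rewards
sigma = np.array([0.5, 1.0, 2.0])  # increasing reward standard deviations
b = 0.2                            # weight on the most recent reward
S = 2.0                            # sensitivity of the logit choice rule
T = 200_000                        # number of learning periods

y = np.zeros(N)                    # reward estimates y_{i,t}
counts = np.zeros(N)               # how often each alternative is chosen

for t in range(T):
    # Logit choice rule: P(i) = exp(S*y_i) / sum_j exp(S*y_j),
    # computed with the usual max-subtraction for numerical stability.
    p = np.exp(S * y - np.max(S * y))
    p /= p.sum()
    i = rng.choice(N, p=p)
    counts[i] += 1
    # Only the chosen alternative yields a reward and has its estimate
    # updated: y_{i,t+1} = (1 - b) * y_{i,t} + b * r_{i,t+1}.
    r = rng.normal(mu[i], sigma[i])
    y[i] = (1 - b) * y[i] + b * r

# Asymptotic prediction: P_i proportional to
# exp(S*mu_i - S^2 * b * sigma_i^2 / (2 * (2 - b))).
pred = np.exp(S * mu - S**2 * b * sigma**2 / (2 * (2 - b)))
pred /= pred.sum()

print("empirical choice frequencies:", counts / T)
print("asymptotic prediction:       ", pred)
```

With these illustrative parameters the predicted probabilities are roughly 0.44, 0.37, and 0.19 for the low-, medium-, and high-variance alternatives, so despite identical means the learner concentrates on the least variable option, consistent with the mean-variance form of the asymptotic choice probabilities.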
- W2767331574 created "2017-11-17" @default.
- W2767331574 creator A5058172408 @default.
- W2767331574 date "2008-01-01" @default.
- W2767331574 modified "2023-09-26" @default.
- W2767331574 title "Reinforcement Learning Leads to Risk Averse Behavior" @default.
- W2767331574 cites W1966625800 @default.
- W2767331574 cites W2077611535 @default.
- W2767331574 cites W2079843690 @default.
- W2767331574 cites W2090148394 @default.
- W2767331574 cites W2094177537 @default.
- W2767331574 cites W2165357065 @default.
- W2767331574 cites W2168419630 @default.
- W2767331574 cites W2911283634 @default.
- W2767331574 cites W2914656440 @default.
- W2767331574 hasPublicationYear "2008" @default.
- W2767331574 type Work @default.
- W2767331574 sameAs 2767331574 @default.
- W2767331574 citedByCount "0" @default.
- W2767331574 crossrefType "journal-article" @default.
- W2767331574 hasAuthorship W2767331574A5058172408 @default.
- W2767331574 hasConcept C154945302 @default.
- W2767331574 hasConcept C15744967 @default.
- W2767331574 hasConcept C162324750 @default.
- W2767331574 hasConcept C175444787 @default.
- W2767331574 hasConcept C180747234 @default.
- W2767331574 hasConcept C339426 @default.
- W2767331574 hasConcept C41008148 @default.
- W2767331574 hasConcept C67203356 @default.
- W2767331574 hasConcept C77805123 @default.
- W2767331574 hasConcept C97541855 @default.
- W2767331574 hasConceptScore W2767331574C154945302 @default.
- W2767331574 hasConceptScore W2767331574C15744967 @default.
- W2767331574 hasConceptScore W2767331574C162324750 @default.
- W2767331574 hasConceptScore W2767331574C175444787 @default.
- W2767331574 hasConceptScore W2767331574C180747234 @default.
- W2767331574 hasConceptScore W2767331574C339426 @default.
- W2767331574 hasConceptScore W2767331574C41008148 @default.
- W2767331574 hasConceptScore W2767331574C67203356 @default.
- W2767331574 hasConceptScore W2767331574C77805123 @default.
- W2767331574 hasConceptScore W2767331574C97541855 @default.
- W2767331574 hasIssue "30" @default.
- W2767331574 hasLocation W27673315741 @default.
- W2767331574 hasOpenAccess W2767331574 @default.
- W2767331574 hasPrimaryLocation W27673315741 @default.
- W2767331574 hasRelatedWork W1487265965 @default.
- W2767331574 hasRelatedWork W1557479620 @default.
- W2767331574 hasRelatedWork W1595536930 @default.
- W2767331574 hasRelatedWork W1971278719 @default.
- W2767331574 hasRelatedWork W2109435795 @default.
- W2767331574 hasRelatedWork W2137102617 @default.
- W2767331574 hasRelatedWork W2288539272 @default.
- W2767331574 hasRelatedWork W2466839386 @default.
- W2767331574 hasRelatedWork W2547385335 @default.
- W2767331574 hasRelatedWork W2615791846 @default.
- W2767331574 hasRelatedWork W2786762040 @default.
- W2767331574 hasRelatedWork W2884887308 @default.
- W2767331574 hasRelatedWork W2889340798 @default.
- W2767331574 hasRelatedWork W2895316822 @default.
- W2767331574 hasRelatedWork W3011154666 @default.
- W2767331574 hasRelatedWork W3121159245 @default.
- W2767331574 hasRelatedWork W3123578931 @default.
- W2767331574 hasRelatedWork W3157960594 @default.
- W2767331574 hasRelatedWork W412111484 @default.
- W2767331574 hasRelatedWork W758889636 @default.
- W2767331574 hasVolume "30" @default.
- W2767331574 isParatext "false" @default.
- W2767331574 isRetracted "false" @default.
- W2767331574 magId "2767331574" @default.
- W2767331574 workType "article" @default.