Adaptive properties of differential learning rates for positive and negative outcomes

  • Letter to the Editor
  • Published in Biological Cybernetics

Abstract

The concept of the reward prediction error—the difference between reward obtained and reward predicted—continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static “bandit” choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
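To make the mechanism described in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of value learning on a static two-armed Bernoulli bandit with separate learning rates for positive and negative prediction errors. The names (alpha_pos, alpha_neg, softmax_beta) and parameter values are illustrative assumptions, not the paper's notation or settings.

```python
# Illustrative sketch only: asymmetric delta-rule learning on a static
# two-armed Bernoulli bandit. Parameter names and values are assumptions
# chosen for clarity, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def run_bandit(p_reward=(0.8, 0.9), alpha_pos=0.1, alpha_neg=0.4,
               softmax_beta=5.0, n_trials=1000):
    """Simulate one session; return total reward and final learned values."""
    q = np.zeros(len(p_reward))          # learned action values
    total_reward = 0.0
    for _ in range(n_trials):
        # softmax action selection
        probs = np.exp(softmax_beta * q)
        probs /= probs.sum()
        a = rng.choice(len(q), p=probs)
        # Bernoulli reward from the chosen arm
        r = float(rng.random() < p_reward[a])
        total_reward += r
        # asymmetric update: different step sizes for +/- prediction errors
        delta = r - q[a]
        alpha = alpha_pos if delta > 0 else alpha_neg
        q[a] += alpha * delta
    return total_reward, q

# When both arms are frequently rewarded, weighting negative prediction
# errors more heavily lowers both learned values but widens the gap
# between them, which makes the better arm easier to pick out under
# softmax choice -- the "better separation" effect the abstract describes.
reward_sym, q_sym = run_bandit(alpha_pos=0.25, alpha_neg=0.25)
reward_asym, q_asym = run_bandit(alpha_pos=0.1, alpha_neg=0.4)
print("symmetric :", reward_sym, q_sym)
print("asymmetric:", reward_asym, q_asym)
```

At the fixed point of this update, an arm with reward probability p settles at Q = p·alpha_pos / (p·alpha_pos + (1−p)·alpha_neg), so the learned values are biased away from the true probabilities; the point of the sketch is that this bias can nonetheless improve discrimination between arms in a static task.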




Acknowledgments

This work originated at the Okinawa Computational Neuroscience Course at the Okinawa Institute of Science and Technology (OIST), Japan. We are grateful to the organizers for providing a stimulating learning environment.

Author information


Correspondence to Matthijs A. A. van der Meer.

Additional information

R. D. Cazé is supported by a Marie Curie initial training fellowship (PITN-GA-2011-289146 of the European Union’s Seventh Framework Programme FP7 2007–13). M. A. A. van der Meer is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).


Cite this article

Cazé, R.D., van der Meer, M.A.A. Adaptive properties of differential learning rates for positive and negative outcomes. Biol Cybern 107, 711–719 (2013). https://doi.org/10.1007/s00422-013-0571-5

