Reinforcement Learning Models Then-and-Now: From Single Cells to Modern Neuroimaging

Chapter in: 20 Years of Computational Neuroscience

Part of the book series: Springer Series in Computational Neuroscience (NEUROSCI, volume 9)

Abstract

Although largely ignored in some intellectual circles today, behaviorism and its models from the early to middle parts of the twentieth century provided the basis for some of the first computational accounts of reward learning. The best expression of this work emerged in the early 1970s with the Rescorla–Wagner model of Pavlovian conditioning. This model accounted for a range of behavioral data about how animals learn about cues that predict rewarding outcomes. The step forward in this account was that learning was depicted as being driven by failed predictions—that is, some system collected information, formed expectations about how much reward to expect (associated with the “conditioned stimuli,” or CS), and generated learning updates proportional to the size and sign of the prediction error. While successful in describing a large body of data, the Rescorla–Wagner model failed at one critical aspect of simple learning—the capacity to “chain” predictive cues together into a trajectory of learned associations—a feature called secondary conditioning: “A predicts B predicts food,” for example.
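As a concrete illustration of the error-driven learning just described, the short Python sketch below implements the Rescorla–Wagner update (an illustrative reconstruction, not code from the chapter; the learning rate, trial counts, and cue names are arbitrary choices). Its second training phase also shows why a rule driven only by the reward delivered on the current trial cannot chain predictions together: a cue that merely predicts another rewarded cue acquires no positive strength.

```python
# Minimal sketch of the Rescorla-Wagner rule (illustrative; not the chapter's code).
# V[c] is the associative strength of conditioned stimulus c. On each trial the
# update is proportional to the size and sign of the prediction error
# delta = reward - sum of V over the cues present on that trial.

ALPHA = 0.1  # learning rate; arbitrary value chosen for illustration

def rw_trial(V, cues_present, reward, alpha=ALPHA):
    """Run one Rescorla-Wagner trial, updating V in place."""
    prediction = sum(V[c] for c in cues_present)
    delta = reward - prediction            # prediction error drives learning
    for c in cues_present:
        V[c] += alpha * delta
    return delta

V = {"A": 0.0, "B": 0.0}

# Phase 1: cue B is repeatedly paired with food; V["B"] climbs toward 1.
for _ in range(200):
    rw_trial(V, ["B"], reward=1.0)

# Phase 2: cue A is paired with B, but no food is delivered on these trials.
# Because the rule sees only the current trial's (zero) reward, the error is
# negative; A acquires no positive strength and B extinguishes, so the model
# never learns the chain "A predicts B predicts food."
for _ in range(200):
    rw_trial(V, ["A", "B"], reward=0.0)

print(V)  # V["A"] ends at or below zero, not the positive value animals show
```

Temporal-difference models (Sutton and Barto 1990, 1998) close exactly this gap by letting the prediction error bootstrap from the next prediction rather than from the delivered reward alone.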

References

  • American Psychiatric Association (2000) Diagnostic and statistical manual of mental disorders, Revised 4th edn. American Psychiatric Association, Washington, DC
  • Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13:834–846
  • Bayer HM, Glimcher PW (2005) Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47(1):129–141
  • Bellman RE (1957) Dynamic programming. Princeton University Press, Princeton, NJ
  • Chiu PH, Lohrenz TM, Montague PR (2008) Smokers’ brains compute, but ignore, a fictive error signal in a sequential investment task. Nat Neurosci 11(4):514–520
  • Dani JA, Montague PR (2007) Disrupting addiction through the loss of drug-associated internal states. Nat Neurosci 10(4):403–404
  • Daw ND, Doya K (2006) The computational neurobiology of learning and reward. Curr Opin Neurobiol 16(2):199–204
  • Hayden BY, Pearson JM, Platt ML (2009) Fictive reward signals in the anterior cingulate cortex. Science 324(5929):948–950
  • Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1:304–309
  • Koob GF, Le Moal M (1997) Drug abuse: hedonic homeostatic dysregulation. Science 278(5335):52–58
  • Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopaminergic neurons during learning of behavioral reactions. J Neurophysiol 67:145–163
  • Lohrenz T, McCabe K, Camerer CF, Montague PR (2007) Neural signature of fictive learning signals in a sequential investment task. Proc Natl Acad Sci USA 104(22):9493–9498
  • McClure SM, Daw ND, Montague PR (2003) A computational substrate for incentive salience. Trends Neurosci 26(8):423–428
  • Montague PR, Dayan P, Nowlan SJ, Pouget A, Sejnowski TJ (1993) Using aperiodic reinforcement for directed self-organization. In: Giles CL, Hanson SJ, Cowan JD (eds) Advances in neural information processing systems, vol 5. Morgan Kaufmann, San Mateo, CA, pp 969–976
  • Montague PR, Dayan P, Person C, Sejnowski TJ (1995) Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377:725–728
  • Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16(5):1936–1947
  • Montague PR, Hyman SE, Cohen JD (2004) Computational roles for dopamine in behavioural control. Nature 431(7010):760–767
  • O’Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003) Temporal difference models and reward-related learning in the human brain. Neuron 38(2):329–337
  • Pidoplichko VI, DeBiasi M, Williams JT, Dani JA (1997) Nicotine activates and desensitizes midbrain dopamine neurons. Nature 390(6658):401–404
  • Quartz SR, Dayan P, Montague PR, Sejnowski TJ (1993) Expectation learning in the brain using diffuse ascending projections. Soc Neurosci Abstr 18:1210
  • Redish AD (2004) Addiction as a computational process gone awry. Science 306(5703):1944–1947
  • Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and non-reinforcement. In: Black A, Prokasy W (eds) Classical conditioning. Appleton-Century-Crofts, New York, NY, pp 64–99
  • Romo R, Schultz W (1990) Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J Neurophysiol 63:592–606
  • Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913
  • Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599
  • Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, MA, pp 497–537
  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
  • Tobler PN, Fiorillo CD, Schultz W (2005) Adaptive coding of reward value by dopamine neurons. Science 307(5715):1642–1645
  • Volkow ND, Fowler JS, Wang GJ (2002) Role of dopamine in drug reinforcement and addiction in humans: results from imaging studies. Behav Pharmacol 13(5–6):355–366

Author information

Corresponding author: P. Read Montague.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Montague, P.R. (2013). Reinforcement Learning Models Then-and-Now: From Single Cells to Modern Neuroimaging. In: Bower, J. (eds) 20 Years of Computational Neuroscience. Springer Series in Computational Neuroscience, vol 9. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1424-7_13
