Reinforcement Learning Models Then-and-Now: From Single Cells to Modern Neuroimaging

Montague, P. Read

doi:10.1007/978-1-4614-1424-7_13

Part of the book series: Springer Series in Computational Neuroscience (NEUROSCI, volume 9)

Abstract

Although largely ignored in some intellectual circles today, behaviorism and its models from the early and middle twentieth century provided the basis for some of the first computational accounts of reward learning. The best expression of this work emerged in the early 1970s with the Rescorla–Wagner model of Pavlovian conditioning, which accounted for a range of behavioral data on how animals learn about cues that predict rewarding outcomes. The step forward in this account was that learning was depicted as being driven by failed predictions: some system collected information, formed expectations about how much reward to expect (associated with “conditioned stimuli,” or CSs), and generated learning updates proportional to the size and sign of the prediction error. While successful in describing a large body of data, the Rescorla–Wagner model failed at one critical aspect of simple learning: the capacity to “chain” important cues together into a trajectory of learned associations, a feature called secondary conditioning (for example, “A predicts B predicts food”).
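
The error-driven update described in the abstract can be sketched in a few lines. This is a minimal illustration, not the chapter's own formulation: the function name `rescorla_wagner_update`, the single `learning_rate` parameter (standing in for the model's separate salience and learning-rate terms), the trial counts, and the choice to treat A and B as a simultaneous compound on each trial are all assumptions made for the example. Under those assumptions, it also shows why a trial-level rule of this kind does not produce chaining: after B has been paired with food, presenting A together with B without food yields a negative error, so A's associative strength moves below zero instead of inheriting B's prediction.

```python
def rescorla_wagner_update(weights, present, reward, learning_rate=0.1):
    """Apply one trial-level, Rescorla-Wagner-style update in place.

    weights: dict mapping cue name -> associative strength
    present: list of cues shown on this trial
    reward:  reward delivered on this trial
    Returns the prediction error for the trial.
    """
    # The prediction is the summed strength of all cues present.
    prediction = sum(weights[cue] for cue in present)
    # Learning is driven by the signed difference between outcome and prediction.
    error = reward - prediction
    for cue in present:
        weights[cue] += learning_rate * error
    return error


weights = {"A": 0.0, "B": 0.0}

# Phase 1 (hypothetical schedule): cue B alone is followed by food.
for _ in range(200):
    rescorla_wagner_update(weights, ["B"], reward=1.0)
v_b_after_phase1 = weights["B"]

# Phase 2: A is presented with B and no food is delivered.
for _ in range(20):
    rescorla_wagner_update(weights, ["A", "B"], reward=0.0)

print(f"V(B) after phase 1: {v_b_after_phase1:.3f}")
print(f"V(A) after phase 2: {weights['A']:.3f}")
print(f"V(B) after phase 2: {weights['B']:.3f}")
```

In this sketch A ends up with a negative strength, whereas secondary conditioning would require A to acquire a positive prediction through its relationship to B.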

References

American Psychiatric Association (2000) Diagnostic and statistical manual of mental disorders, Revised 4th edn. American Psychiatric Association, Washington, DC
Barto AG, Sutton RS, Anderson CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13:834–846
Bayer HM, Glimcher PW (2005) Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47(1):129–141
Bellman RE (1957) Dynamic programming. Princeton University Press, Princeton, NJ
Chiu PH, Lohrenz TM, Montague PR (2008) Smokers’ brains compute, but ignore, a fictive error signal in a sequential investment task. Nat Neurosci 11(4):514–520
Dani JA, Montague PR (2007) Disrupting addiction through the loss of drug-associated internal states. Nat Neurosci 10(4):403–404
Daw ND, Doya K (2006) The computational neurobiology of learning and reward. Curr Opin Neurobiol 16(2):199–204
Hayden BY, Pearson JM, Platt ML (2009) Fictive reward signals in the anterior cingulate cortex. Science 324(5929):948–950
Hollerman JR, Schultz W (1998) Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1:304–309
Koob GF, Le Moal M (1997) Drug abuse: hedonic homeostatic dysregulation. Science 278(5335):52–58
Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopaminergic neurons during learning of behavioral reactions. J Neurophysiol 67:145–163
Lohrenz T, McCabe K, Camerer CF, Montague PR (2007) Neural signature of fictive learning signals in a sequential investment task. Proc Natl Acad Sci USA 104(22):9493–9498
McClure SM, Daw ND, Montague PR (2003) A computational substrate for incentive salience. Trends Neurosci 26(8):423–428
Montague PR, Dayan P, Nowlan SJ, Pouget A, Sejnowski TJ (1993) Using aperiodic reinforcement for directed self-organization. In: Giles CL, Hanson SJ, Cowan JD (eds) Advances in neural information processing systems, vol 5. Morgan Kaufmann, San Mateo, CA, pp 969–976
Montague PR, Dayan P, Person C, Sejnowski TJ (1995) Bee foraging in uncertain environments using predictive hebbian learning. Nature 377:725–728
Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive hebbian learning. J Neurosci 16(5):1936–1947
Montague PR, Hyman SE, Cohen JD (2004) Computational roles for dopamine in behavioural control. Nature 431(7010):760–767
O’Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ (2003) Temporal difference models and reward-related learning in the human brain. Neuron 38(2):329–337
Pidoplichko VI, DeBiasi M, Williams JT, Dani JA (1997) Nicotine activates and desensitizes midbrain dopamine neurons. Nature 390(6658):401–404
Quartz SR, Dayan P, Montague PR, Sejnowski TJ (1993) Expectation learning in the brain using diffuse ascending projections. Soc Neurosci Abstr 18:1210
Redish AD (2004) Addiction as a computational process gone awry. Science 306(5703):1944–1947
Rescorla RA, Wagner AR (1972) A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and non-reinforcement. In: Black A, Prokasy W (eds) Classical conditioning. Appleton-Century-Crofts, New York, NY, pp 64–99
Romo R, Schultz W (1990) Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. J Neurophysiol 63:592–606
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599
Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, MA, pp 497–537
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
Tobler PN, Fiorillo CD, Schultz W (2005) Adaptive coding of reward value by dopamine neurons. Science 307(5715):1642–1645
Volkow ND, Fowler JS, Wang GJ (2002) Role of dopamine in drug reinforcement and addiction in humans: results from imaging studies. Behav Pharmacol 13(5–6):355–366

Author information

Authors and Affiliations

Human Neuroimaging Lab, The Virginia Tech Carilion Research Institute, 2 Riverside Circle, Roanoke, VA, 24016, USA
P. Read Montague
Department of Physics, Virginia Tech, Roanoke, VA, USA
P. Read Montague
Wellcome Trust Centre for Neuroimaging, University College London, London, UK
P. Read Montague

Corresponding author

Correspondence to P. Read Montague.

Editor information

Editors and Affiliations

7703 Floyd Curl Dr, San Antonio, 78229-3901, Texas, USA
James M Bower

About this chapter

Cite this chapter

Montague, P.R. (2013). Reinforcement Learning Models Then-and-Now: From Single Cells to Modern Neuroimaging. In: Bower, J. (eds) 20 Years of Computational Neuroscience. Springer Series in Computational Neuroscience, vol 9. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1424-7_13

DOI: https://doi.org/10.1007/978-1-4614-1424-7_13
Published: 17 June 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1423-0
Online ISBN: 978-1-4614-1424-7
eBook Packages: Biomedical and Life Sciences (R0)
