Abstract
Although largely ignored in some intellectual circles today, behaviorism and its models from the early and middle twentieth century provided the basis for some of the first computational accounts of reward learning. The best expression of this work emerged in the early 1970s with the Rescorla–Wagner model of Pavlovian conditioning. This model accounted for a range of behavioral data on how animals learn about cues that predict rewarding outcomes. Its step forward was to depict learning as driven by failed predictions: some system collected information, formed expectations about how much reward to expect (associated with "conditioned stimuli," or CSs), and generated learning updates proportional to the size and sign of the prediction error. While successful in describing a large body of data, the Rescorla–Wagner model failed at one critical aspect of simple learning: the capacity to "chain" important cues together into a trajectory of learned associations, for example "A predicts B predicts food." This feature is called secondary conditioning.
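The error-driven update described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the chapter: the function names, the cue label "light", and the single `learning_rate` parameter (standing in for the model's salience and rate terms) are assumptions made for the example.

```python
def rescorla_wagner_update(weights, present, reward, learning_rate=0.1):
    """Apply one trial of a Rescorla-Wagner style update.

    weights: dict mapping cue name -> current associative strength
    present: cues shown on this trial
    reward:  reward actually delivered on this trial
    learning_rate: assumed single rate parameter (hypothetical simplification)
    """
    # The expected reward is the summed strength of all cues present.
    prediction = sum(weights[cue] for cue in present)
    # Prediction error: positive if reward exceeded expectation, negative if it fell short.
    error = reward - prediction
    updated = dict(weights)
    for cue in present:
        # Each present cue changes in proportion to the size and sign of the error.
        updated[cue] = weights[cue] + learning_rate * error
    return updated, error


def simulate(trials, learning_rate=0.1):
    """Run a sequence of (present_cues, reward) trials, starting from zero strength."""
    weights = {}
    errors = []
    for present, reward in trials:
        for cue in present:
            weights.setdefault(cue, 0.0)
        weights, error = rescorla_wagner_update(weights, present, reward, learning_rate)
        errors.append(error)
    return weights, errors


# A single cue paired with a unit reward for 50 trials.
weights, errors = simulate([(("light",), 1.0)] * 50)
```

In this sketch the error is largest on the first pairing and shrinks as the cue comes to predict the reward, so learning slows as predictions improve. Omitting an expected reward produces a negative error and weakens the association.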
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Montague, P.R. (2013). Reinforcement Learning Models Then-and-Now: From Single Cells to Modern Neuroimaging. In: Bower, J. (ed) 20 Years of Computational Neuroscience. Springer Series in Computational Neuroscience, vol 9. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1424-7_13
DOI: https://doi.org/10.1007/978-1-4614-1424-7_13
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1423-0
Online ISBN: 978-1-4614-1424-7
eBook Packages: Biomedical and Life Sciences (R0)