Abstract
The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have inspired computational theories such as temporal difference learning¹. However, there is at present no adequate neurobiological account of how this learning occurs. Here, in a functional magnetic resonance imaging (fMRI) study of higher-order aversive conditioning, we describe a key computational strategy that humans use to learn predictions about pain. We show that neural activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals for sequential learning predicted by temporal difference models. This result reveals a flexible aversive learning process ideally suited to the changing and uncertain nature of real-world environments. Taken with existing data on reward learning², our results suggest a critical role for the ventral striatum in integrating complex appetitive and aversive predictions to coordinate behaviour.
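To illustrate what "chains of successively earlier predictors" means for a temporal difference model, here is a minimal tabular TD(0) sketch. It is a generic textbook formulation, not the model fitted in this study: the two-cue chain (cue A, then cue B, then pain), the learning rate alpha, the discount gamma, the pain value of -1 and the function name td_higher_order are all illustrative assumptions. It shows the prediction error at the time of pain shrinking as cue B comes to predict it, while cue A, the earliest reliable predictor, inherits a discounted negative value.

```python
from typing import Dict, List, Tuple


def td_higher_order(n_trials: int = 200, alpha: float = 0.1,
                    gamma: float = 0.9, pain: float = -1.0
                    ) -> Tuple[Dict[str, float], List[List[float]]]:
    """Tabular TD(0) on a hypothetical chain: cue A -> cue B -> pain.

    Illustrative assumptions only (not the study's fitted model).
    Returns the learned cue values V and, for every trial, the
    prediction errors at [A onset, B onset, outcome].
    """
    V = {"A": 0.0, "B": 0.0}
    history: List[List[float]] = []
    for _ in range(n_trials):
        # Cue A arrives unpredictably, so the pre-cue baseline value is
        # fixed at 0 and the error at A onset is the discounted value of A.
        delta_a = gamma * V["A"] - 0.0
        # Transition A -> B: no outcome yet (reward 0); updates V["A"].
        delta_b = 0.0 + gamma * V["B"] - V["A"]
        V["A"] += alpha * delta_b
        # Transition B -> end of trial: pain delivered, terminal value 0;
        # updates V["B"].
        delta_out = pain - V["B"]
        V["B"] += alpha * delta_out
        history.append([delta_a, delta_b, delta_out])
    return V, history


if __name__ == "__main__":
    values, errors = td_higher_order()
    print("first trial errors:", errors[0])
    print("last trial errors:", [round(e, 3) for e in errors[-1]])
    print("learned values:", {k: round(v, 3) for k, v in values.items()})
```

On the first trial the only error occurs at the pain itself; after training, V["B"] approaches -1, V["A"] approaches gamma * V["B"], and the residual error has moved to the onset of cue A, which is the "as early as they are reliable" behaviour described above.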
References
Sutton, R. S. & Barto, A. G. in Learning and Computational Neuroscience: Foundations of Adaptive Networks (eds Gabriel, M. & Moore, J.) 497–537 (MIT, Cambridge, Massachusetts, 1990)
Everitt, B. J. et al. Associative processes in addiction and reward. The role of amygdala–ventral striatal subsystems. Ann. NY Acad. Sci. 877, 412–438 (1999)
LeDoux, J. Fear and the brain: where have we been, and where are we going? Biol. Psychiatry 44, 1229–1238 (1998)
Buchel, C. & Dolan, R. J. Classical fear conditioning in functional neuroimaging. Curr. Opin. Neurobiol. 10, 219–223 (2000)
Ploghaus, A. et al. Dissociating pain from its anticipation in the human brain. Science 284, 1979–1981 (1999)
Ploghaus, A. et al. Learning about pain: the neural substrate of the prediction error for aversive events. Proc. Natl Acad. Sci. USA 97, 9281–9286 (2000)
Dickinson, A. Contemporary Animal Learning Theory (Cambridge Univ. Press, Cambridge, UK, 1980)
Sutton, R. S. & Barto, A. G. Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 88, 135–170 (1981)
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT, Cambridge, Massachusetts, 1998)
Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996)
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)
Suri, R. E. & Schultz, W. Temporal difference model reproduces anticipatory neural activity. Neural Comput. 13, 841–862 (2001)
O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H. & Dolan, R. J. Temporal difference models and reward-related learning in the human brain. Neuron 38, 329–337 (2003)
Friston, K. J., Tononi, G., Reeke, G. N. Jr, Sporns, O. & Edelman, G. M. Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience 59, 229–243 (1994)
McClure, S. M., Berns, G. S. & Montague, P. R. Temporal prediction errors in a passive learning task activate human striatum. Neuron 38, 339–346 (2003)
Daw, N. D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603–616 (2002)
Brandon, S. E., Vogel, E. H. & Wagner, A. R. Stimulus representation in SOP: I. Theoretical rationalization and some implications. Behav. Processes 62, 5–25 (2003)
Barto, A. G., Sutton, R. S. & Anderson, C. W. Neuronlike elements that can solve difficult learning problems. IEEE Trans. Syst. Man Cybern. 13, 834–846 (1983)
Barto, A. G., Sutton, R. S. & Watkins, C. J. C. H. in Learning and Computational Neuroscience: Foundations of Adaptive Networks (eds Gabriel, M. & Moore, J.) 539–602 (MIT, Cambridge, Massachusetts, 1990)
Barto, A. G. in Models of Information Processing in the Basal Ganglia (eds Houk, J. C., Davis, J. L. & Beiser, D. G.) 215–232 (MIT, Cambridge, Massachusetts, 1995)
Chudler, E. H. & Dong, W. K. The role of the basal ganglia in nociception and pain. Pain 60, 3–38 (1995)
Solomon, R. L. & Corbit, J. D. An opponent-process theory of motivation. I. Temporal dynamics of affect. Psychol. Rev. 81, 119–145 (1974)
Dickinson, A. & Dearing, M. F. in Mechanisms of Learning and Motivation (eds Dickinson, A. & Boakes, R. A.) 203–231 (Erlbaum, Hillsdale, New Jersey, 1979)
Horvitz, J. C. Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96, 651–656 (2000)
Azmitia, E. C. & Segal, M. An autoradiographic analysis of the differential ascending projections of the dorsal and median raphe nuclei in the rat. J. Comp. Neurol. 179, 641–667 (1978)
Mirenowicz, J. & Schultz, W. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379, 449–451 (1996)
Horvitz, J. C. Dopamine gating of glutamatergic sensorimotor and incentive motivational input signals to the striatum. Behav. Brain Res. 137, 65–74 (2002)
Ploghaus, A., Becerra, L., Borras, C. & Borsook, D. Neural circuitry underlying pain modulation: expectation, hypnosis, placebo. Trends Cogn. Sci. 7, 197–200 (2003)
Deichmann, R., Gottfried, J. A., Hutton, C. & Turner, R. Optimized EPI for fMRI studies of the orbitofrontal cortex. Neuroimage 19, 430–441 (2003)
Buchel, C., Dolan, R. J., Armony, J. L. & Friston, K. J. Amygdala–hippocampal involvement in human aversive trace conditioning revealed through event-related functional magnetic resonance imaging. J. Neurosci. 19, 10869–10876 (1999)
Acknowledgements
We thank P. Allen and E. Featherstone for technical help. This work was funded by Wellcome Trust program grants to R.S.F., K.J.F., M.K. and R.J.D. P.D. was funded by the Gatsby Charitable Foundation.
Ethics declarations
Competing interests
The authors declare that they have no competing financial interests.
About this article
Cite this article
Seymour, B., O'Doherty, J., Dayan, P. et al. Temporal difference models describe higher-order learning in humans. Nature 429, 664–667 (2004). https://doi.org/10.1038/nature02581
DOI: https://doi.org/10.1038/nature02581
- Springer Nature Limited
This article is cited by
- Aberrations in temporal dynamics of cognitive processing induced by Parkinson’s disease and Levodopa. Scientific Reports (2023)
- Strengths of social ties modulate brain computations for third-party punishment. Scientific Reports (2023)
- Striatal hub of dynamic and stabilized prediction coding in forebrain networks for olfactory reinforcement learning. Nature Communications (2022)
- Personalized information and willingness to pay for non-financial risk prevention: An experiment. Journal of Risk and Uncertainty (2022)
- Anatomical dissociation of intracerebral signals for reward and punishment prediction errors in humans. Nature Communications (2021)