Reinforcement learning, conditioning, and the brain: Successes and challenges

Abstract

The field of reinforcement learning has greatly influenced the neuroscientific study of conditioning. This article provides an introduction to reinforcement learning, followed by an examination of the successes and challenges of using reinforcement learning to understand the neural bases of conditioning. Successes reviewed include (1) the mapping of positive and negative prediction errors to the firing of dopamine neurons and neurons in the lateral habenula, respectively; (2) the mapping of model-based and model-free reinforcement learning to associative and sensorimotor cortico-basal ganglia-thalamo-cortical circuits, respectively; and (3) the mapping of the actor and the critic to the dorsal and ventral striatum, respectively. The challenges reviewed consist of several behavioral and neural findings that are at odds with standard reinforcement-learning models, including, among others, evidence for hyperbolic discounting and adaptive coding. The article suggests ways of reconciling reinforcement-learning models with many of these challenging findings, and highlights the need for further theoretical development where necessary. Additional information related to this study may be downloaded from http://cabn.psychonomic-journals.org/content/supplemental.
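Two of the quantities the abstract refers to can be made concrete in a few lines. The sketch below, in Python, illustrates (a) the temporal-difference prediction error whose positive and negative components the article maps onto dopamine neurons and the lateral habenula, and (b) hyperbolic versus exponential discounting, one of the behavioral findings noted as being at odds with standard reinforcement-learning models. All function names and parameter values here are illustrative assumptions, not taken from the article.

```python
def td_error(reward, v_next, v_current, gamma=0.95):
    """Temporal-difference prediction error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current

# A cue predicting reward with value 0.5; the trial then ends (terminal value 0).
v = {"cue": 0.5, "terminal": 0.0}

# Reward better than predicted -> positive error (dopamine-like signal).
positive = td_error(reward=1.0, v_next=v["terminal"], v_current=v["cue"])  # +0.5
# Reward omitted -> negative error (lateral-habenula-like signal).
negative = td_error(reward=0.0, v_next=v["terminal"], v_current=v["cue"])  # -0.5

# Standard RL discounts a delayed reward exponentially in the delay d,
# whereas behavioral data often fit a hyperbolic form, which falls off
# more steeply at short delays and more shallowly at long ones.
def exponential_discount(d, gamma=0.95):
    return gamma ** d

def hyperbolic_discount(d, k=0.1):
    return 1.0 / (1.0 + k * d)
```

For example, at a delay of 10 steps the hyperbolic value (0.5 with k = 0.1) already falls below the exponential value (about 0.60 with gamma = 0.95), a divergence that underlies preference reversals in intertemporal choice.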

References

  1. Abler, B., Walter, H., Erk, S., Kammerer, H., & Spitzer, M. (2006). Prediction error as a linear function of reward probability is coded in human nucleus accumbens. NeuroImage, 31, 790–795.

    PubMed  Article  Google Scholar 

  2. Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 34B, 77–98.

    Google Scholar 

  3. Ainslie, G. (1975). Specious reward: A behavioral theory of impulsiveness and impulse control. Psychological Bulletin, 82, 463–496.

    PubMed  Article  Google Scholar 

  4. Alexander, G. E., DeLong, M. R., & Strick, P. L. (1986). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annual Review of Neuroscience, 9, 357–381.

    PubMed  Article  Google Scholar 

  5. Aron, A. R., Shohamy, D., Clark, J., Myers, C., Gluck, M. A., & Poldrack, R. A. (2004). Human midbrain sensitivity to cognitive feedback and uncertainty during classification learning. Journal of Neurophysiology, 92, 1144–1152.

    PubMed  Article  Google Scholar 

  6. Barnes, T. D., Kubota, Y., Hu, D., Jin, D. Z., & Graybiel, A. M. (2005). Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature, 437, 1158–1161.

    PubMed  Article  Google Scholar 

  7. Barto, A. G. (1995). Adaptive critics and the basal ganglia. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 215–232). Cambridge, MA: MIT Press.

    Google Scholar 

  8. Barto, A. G., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems: Theory & Applications, 13, 343–379.

    Google Scholar 

  9. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, & Cybernetics, 13, 834–846.

    Google Scholar 

  10. Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141.

    PubMed  Article  Google Scholar 

  11. Bayer, H. M., Lau, B., & Glimcher, P. W. (2007). Statistics of midbrain dopamine neuron spike trains in the awake primate. Journal of Neurophysiology, 98, 1428–1439.

    PubMed  Article  Google Scholar 

  12. Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.

    Google Scholar 

  13. Belova, M. A., Paton, J. J., & Salzman, C. D. (2008). Moment-tomoment tracking of state value in the amygdala. Journal of Neuroscience, 28, 10023–10030.

    PubMed  Article  Google Scholar 

  14. Bernoulli, D. (1954). Exposition of a new theory on the measurement of risk. Econometrica, 22, 23–36. (Original work published 1738)

    Article  Google Scholar 

  15. Berns, G. S., Capra, C. M., Chappelow, J., Moore, S., & Noussair, C. (2008). Nonlinear neurobiological probability weighting functions for aversive outcomes. NeuroImage, 39, 2047–2057.

    PubMed  Article  Google Scholar 

  16. Botvinick, M. M., Niv, Y., & Barto, A. G. (in press). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition. doi:10.1016/j.cognition.2008.08.011

  17. Botvinick, M. M., & Plaut, D. C. (2004). Doing without schema hierarchies: A recurrent connectionist approach to normal and impaired routine sequential action. Psychological Review, 111, 395–429.

    PubMed  Article  Google Scholar 

  18. Bradtke, S. J., & Duff, M. O. (1995). Reinforcement learning methods for continuous-time Markov decision problems. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems (Vol. 7, pp. 393–400). Cambridge, MA: MIT Press.

    Google Scholar 

  19. Bray, S., & O’Doherty, J. (2007). Neural coding of reward-prediction error signals during classical conditioning with attractive faces. Journal of Neurophysiology, 97, 3036–3045.

    PubMed  Article  Google Scholar 

  20. Brischoux, F., Chakraborty, S., Brierley, D. I., & Ungless, M. A. (2009). Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli. Proceedings of the National Academy of Sciences, 106, 4894–4899.

    Article  Google Scholar 

  21. Brown, L. L., & Wolfson, L. I. (1983). A dopamine-sensitive striatal efferent system mapped with [14C]deoxyglucose in the rat. Brain Research, 261, 213–229.

    PubMed  Article  Google Scholar 

  22. Calabresi, P., Pisani, A., Centonze, D., & Bernardi, G. (1997). Synaptic plasticity and physiological interactions between dopamine and glutamate in the striatum. Neuroscience & Biobehavioral Reviews, 21, 519–523.

    Article  Google Scholar 

  23. Camerer, C. F., & Loewenstein, G. (2004). Behavioral economics: Past, present, future. In C. F. Camerer, G. Loewenstein, & M. Rabin (Eds.), Advances in behavioral economics (pp. 3–51). Princeton, NJ: Princeton University Press.

    Google Scholar 

  24. Cardinal, R. N., Parkinson, J. A., Hall, J., & Everitt, B. J. (2002). Emotion and motivation: The role of the amygdala, ventral striatum, and prefrontal cortex. Neuroscience & Biobehavioral Reviews, 26, 321–352.

    Article  Google Scholar 

  25. Cassandra, A. R., Kaelbling, L. P., & Littman, M. L. (1994). Acting optimally in partially observable stochastic domains. In Proceedings of the 12th National Conference on Artificial Intelligence (pp. 1023–1028). Menlo Park, CA: AAAI Press.

    Google Scholar 

  26. Cavada, C., Company, T., Tejedor, J., Cruz-Rizzolo, R. J., & Reinoso-Suarez, F. (2000). The anatomical connections of the macaque monkey orbitofrontal cortex: A review. Cerebral Cortex, 10, 220–242.

    PubMed  Article  Google Scholar 

  27. Christoph, G. R., Leonzio, R. J., & Wilcox, K. S. (1986). Stimulation of the lateral habenula inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area of the rat. Journal of Neuroscience, 6, 613–619.

    PubMed  Google Scholar 

  28. Cools, R., Robinson, O. J., & Sahakian, B. (2008). Acute tryptophan depletion in healthy volunteers enhances punishment prediction but does not affect reward prediction. Neuropsychopharmacology, 33, 2291–2299.

    PubMed  Article  Google Scholar 

  29. D’Ardenne, K., McClure, S. M., Nystrom, L. E., & Cohen, J. D. (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science, 319, 1264–1267.

    PubMed  Article  Google Scholar 

  30. Daw, N. D. (2003). Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh.

    Google Scholar 

  31. Daw, N. D., Courville, A. C., & Dayan, P. (2008). Semi-rational models of conditioning: The case of trial order. In N. Chater & M. Oaksford (Eds.), The probabilistic mind: Prospects for Bayesian cognitive science (pp. 431–452). Oxford: Oxford University Press.

    Google Scholar 

  32. Daw, N. D., Courville, A. C., & Touretzky, D. S. (2006). Representation and timing in theories of the dopamine system. Neural Computation, 18, 1637–1677.

    PubMed  Article  Google Scholar 

  33. Daw, N. D., Kakade, S., & Dayan, P. (2002). Opponent interactions between serotonin and dopamine. Neural Networks, 15, 603–616.

    PubMed  Article  Google Scholar 

  34. Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8, 1704–1711.

    PubMed  Article  Google Scholar 

  35. Daw, N. D., Niv, Y., & Dayan, P. (2006). Actions, policies, values, and the basal ganglia. In E. Bezard (Ed.), Recent breakthroughs in basal ganglia research (pp. 111–130). New York: Nova Science.

    Google Scholar 

  36. Day, J. J., Roitman, M. F., Wightman, R. M., & Carelli, R. M. (2007). Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nature Neuroscience, 10, 1020–1028.

    PubMed  Article  Google Scholar 

  37. Dearden, R., Friedman, N., & Russell, S. (1998). Bayesian Q- learning. In Proceedings of the 15th National Conference on Artificial Intelligence (pp. 761–768). Menlo Park, CA: AAAI Press.

    Google Scholar 

  38. De Pisapia, N., & Goddard, N. H. (2003). A neural model of fronto striatal interactions for behavioural planning and action chunking. Neurocomputing, 52–54, 489–495. doi:10.1016/S0925-2312(02)00753-1

    Article  Google Scholar 

  39. Dickinson, A. (1985). Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society B, 308, 67–78.

    Article  Google Scholar 

  40. Dickinson, A. (1994). Instrumental conditioning. In N. J. Mackintosh (Ed.), Animal learning and cognition (pp. 45–79). San Diego: Academic Press.

    Google Scholar 

  41. Domjan, M. (2003). The principles of learning and behavior (5th ed.). Belmont, CA: Thomson/Wadsworth.

    Google Scholar 

  42. Doya, K. (1996). Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems (Vol. 8, pp. 1073–1079). Cambridge, MA: MIT Press.

    Google Scholar 

  43. Eblen, F., & Graybiel, A. M. (1995). Highly restricted origin of prefrontal cortical inputs to striosomes in the macaque monkey. Journal of Neuroscience, 15, 5999–6013.

    PubMed  Google Scholar 

  44. Elliott, R., Newman, J. L., Longe, O. A., & Deakin, J. F. W. (2004). Instrumental responding for rewards is associated with enhanced neuronal response in subcortical reward systems. NeuroImage, 21, 984–990.

    PubMed  Article  Google Scholar 

  45. Elster, J. (1979). Ulysses and the sirens: Studies in rationality and irrationality. Cambridge: Cambridge University Press.

    Google Scholar 

  46. Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: The Gaussian process approach to temporal difference learning. In Proceedings of the 20th International Conference on Machine Learning (pp. 154–161). Menlo Park, CA: AAAI Press.

    Google Scholar 

  47. Ferraro, G., Montalbano, M. E., Sardo, P., & La Grutta, V. (1996). Lateral habenular influence on dorsal raphe neurons. Brain Research Bulletin, 41, 47–52. doi:10.1016/0361-9230(96)00170-0

    PubMed  Article  Google Scholar 

  48. Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299, 1898–1902.

    PubMed  Article  Google Scholar 

  49. Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2005). Evidence that the delay-period activity of dopamine neurons corresponds to reward uncertainty rather than backpropagating TD errors. Behavioral & Brain Functions, 1, 7.

    Article  Google Scholar 

  50. Frederick, S., Loewenstein, G., & O’Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, 40, 351–401.

    Article  Google Scholar 

  51. Fujii, N., & Graybiel, A. M. (2005). Time-varying covariance of neural activities recorded in striatum and frontal cortex as monkeys perform sequential-saccade tasks. Proceedings of the National Academy of Sciences, 102, 9032–9037.

    Article  Google Scholar 

  52. Gao, D. M., Hoffman, D., & Benabid, A. L. (1996). Simultaneous recording of spontaneous activities and nociceptive responses from neurons in the pars compacta of substantia nigra and in the lateral habenula. European Journal of Neuroscience, 8, 1474–1478.

    PubMed  Article  Google Scholar 

  53. Geisler, S., Derst, C., Veh, R. W., & Zahm, D. S. (2007). Glutamatergic afferents of the ventral tegmental area in the rat. Journal of Neuroscience, 27, 5730–5743.

    PubMed  Article  Google Scholar 

  54. Geisler, S., & Trimble, M. (2008). The lateral habenula: No longer neglected. CNS Spectrums, 13, 484–489.

    PubMed  Google Scholar 

  55. Gerfen, C. R. (1984). The neostriatal mosaic: Compartmentalization of corticostriatal input and striatonigral output systems. Nature, 311, 461–464. doi:10.1038/311461a0

    PubMed  Article  Google Scholar 

  56. Gerfen, C. R. (1985). The neostriatal mosaic. I. Compartmental organization of projections from the striatum to the substantia nigra in the rat. Journal of Comparative Neurology, 236, 454–476.

    PubMed  Article  Google Scholar 

  57. Grace, A. A. (1991). Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis for the etiology of schizophrenia. Neuroscience, 41, 1–24.

    PubMed  Article  Google Scholar 

  58. Grace, A. A. (2000). The tonic/phasic model of dopamine system regulation and its implications for understanding alcohol and psychostimulant craving. Addiction, 95(Suppl. 2), S119-S128.

    PubMed  Google Scholar 

  59. Gray, T. S. (1999). Functional and anatomical relationships among the amygdala, basal forebrain, ventral striatum, and cortex: An integrative discussion. In J. F. McGinty (Ed.), Advancing from the ventral striatum to the amygdala: Implications for neuropsychiatry and drug abuse (Annals of the New York Academy of Sciences, Vol. 877, pp. 439–444). New York: New York Academy of Sciences.

    Google Scholar 

  60. Graybiel, A. M. (1990). Neurotransmitters and neuromodulators in the basal ganglia. Trends in Neurosciences, 13, 244–254.

    PubMed  Article  Google Scholar 

  61. Graybiel, A. M. (1998). The basal ganglia and chunking of action repertoires. Neurobiology of Learning & Memory, 70, 119–136.

    Article  Google Scholar 

  62. Graybiel, A. M., & Ragsdale, C. W., Jr. (1978). Histochemically distinct compartments in the striatum of human, monkeys, and cat demonstrated by acetylthiocholinesterase staining. Proceedings of the National Academy of Sciences, 75, 5723–5726.

    Article  Google Scholar 

  63. Green, L., & Myerson, J. (2004). A discounting framework for choice with delayed and probabilistic rewards. Psychological Bulletin, 130, 769–792.

    PubMed  Article  Google Scholar 

  64. Guarraci, F. A., & Kapp, B. S. (1999). An electrophysiological characterization of ventral tegmental area dopaminergic neurons during differential Pavlovian fear conditioning in the awake rabbit. Behavioural Brain Research, 99, 169–179.

    PubMed  Article  Google Scholar 

  65. Haber, S. N. (2003). The primate basal ganglia: Parallel and integrative networks. Journal of Chemical Neuroanatomy, 26, 317–330.

    PubMed  Article  Google Scholar 

  66. Haber, S. N., & Fudge, J. L. (1997). The interface between dopamine neurons and the amygdala: Implications for schizophrenia. Schizophrenia Bulletin, 23, 471–482. doi:10.1093/schbul/23.3.471

    PubMed  Google Scholar 

  67. Hastie, R., & Dawes, R. M. (2001). Rational choice in an uncertain world: The psychology of judgment and decision making. New York: Sage.

    Google Scholar 

  68. Herkenham, M., & Nauta, W. J. (1979). Efferent connections of the habenular nuclei in the rat. Journal of Comparative Neurology, 187, 19–47.

    PubMed  Article  Google Scholar 

  69. Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15, 534–539. doi:10.1111/j.0956-7976.2004.00715.x

    PubMed  Article  Google Scholar 

  70. Hikosaka, K., & Watanabe, M. (2000). Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cerebral Cortex, 10, 263–271.

    PubMed  Article  Google Scholar 

  71. Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations (pp. 77–109). Cambridge, MA: MIT Press.

    Google Scholar 

  72. Ho, M.-Y., Mobini, S., Chiang, T.-J., Bradshaw, C. M., & Szabadi, E. (1999). Theory and method in the quantitative analysis of “impulsive choice” behaviour: Implications for psychopharmacology. Psychopharmacology, 146, 362–372.

    PubMed  Article  Google Scholar 

  73. Hollerman, J. R., & Schultz, W. (1998). Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience, 1, 304–309.

    PubMed  Article  Google Scholar 

  74. Horvitz, J. C. (2000). Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience, 96, 651–656.

    PubMed  Article  Google Scholar 

  75. Houk, J. C., Adams, J. L., & Barto, A. G. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 249–270). Cambridge, MA: MIT Press.

    Google Scholar 

  76. Hsu, M., Krajbich, I., Zhao, C., & Camerer, C. F. (2009). Neural response to reward anticipation under risk is nonlinear in probabilities. Journal of Neuroscience, 29, 2231–2237. doi:10.1523/jneurosci.5296-08.2009

    PubMed  Article  Google Scholar 

  77. Huettel, S. A., Stowe, C. J., Gordon, E. M., Warner, B. T., & Platt, M. L. (2006). Neural signatures of economic preferences for risk and ambiguity. Neuron, 49, 765–775.

    PubMed  Article  Google Scholar 

  78. Jay, T. M. (2003). Dopamine: A potential substrate for synaptic plasticity and memory mechanisms. Progress in Neurobiology, 69, 375–390. doi:10.1016/S0301-0082(03)00085-6

    PubMed  Article  Google Scholar 

  79. Ji, H., & Shepard, P. D. (2007). Lateral habenula stimulation inhibits rat midbrain dopamine neurons through a GABAA receptor-mediated mechanism. Journal of Neuroscience, 27, 6923–6930. doi:10.1523/ jneurosci.0958-07.2007

    PubMed  Article  Google Scholar 

  80. Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks, 15, 535–547.

    PubMed  Article  Google Scholar 

  81. Joel, D., & Weiner, I. (2000). The connections of the dopaminergic system with the striatum in rats and primates: An analysis with respect to the functional and compartmental organization of the striatum. Neuroscience, 96, 451–474.

    PubMed  Article  Google Scholar 

  82. Jog, M. S., Kubota, Y., Connolly, C. I., Hillegaart, V., & Graybiel, A. M. (1999). Building neural representations of habits. Science, 286, 1745–1749.

    PubMed  Article  Google Scholar 

  83. Johnson, A., van der Meer, M. A. A., & Redish, A. D. (2007). Integrating hippocampus and striatum in decision-making. Current Opinion in Neurobiology, 17, 692–697.

    PubMed  Article  Google Scholar 

  84. Kable, J. W., & Glimcher, P. W. (2007). The neural correlates of subjective value during intertemporal choice. Nature Neuroscience, 10, 1625–1633.

    PubMed  Article  Google Scholar 

  85. Kacelnik, A. (1997). Normative and descriptive models of decision making: Time discounting and risk sensitivity. In G. R. Bock & G. Cardew (Eds.), Characterizing human psychological adaptations (Ciba Foundation Symposium, No. 208, pp. 51–70). New York: Wiley.

    Google Scholar 

  86. Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.

    Article  Google Scholar 

  87. Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.

    Google Scholar 

  88. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.

    Article  Google Scholar 

  89. Kalen, P., Strecker, R. E., Rosengren, E., & Bjorklund, A. (1989). Regulation of striatal serotonin release by the lateral habenula-dorsal raphe pathway in the rat as demonstrated by in vivo microdialysis: Role of excitatory amino acids and GABA. Brain Research, 492, 187–202.

    PubMed  Article  Google Scholar 

  90. Killcross, S., & Coutureau, E. (2003). Coordination of actions and habits in the medial prefrontal cortex of rats. Cerebral Cortex, 13, 400–408.

    PubMed  Article  Google Scholar 

  91. Kim, S., Hwang, J., & Lee, D. (2008). Prefrontal coding of temporally discounted values during intertemporal choice. Neuron, 59, 161–172.

    PubMed  Article  Google Scholar 

  92. Kirkland, K. L. (2002). High-tech brains: A history of technologybased analogies and models of nerve and brain function. Perspectives in Biology & Medicine, 45, 212–223. doi:10.1353/pbm.2002.0033

    Article  Google Scholar 

  93. Knight, F. H. (1921). Risk, uncertainty and profit. Boston: Houghton Mifflin.

    Google Scholar 

  94. Knowlton, B. J., Mangels, J. A., & Squire, L. R. (1996). A neostriatal habit learning system in humans. Science, 273, 1399–1402.

    PubMed  Article  Google Scholar 

  95. Knutson, B., & Gibbs, S. E. (2007). Linking nucleus accumbens dopamine and blood oxygenation. Psychopharmacology, 191, 813–822.

    PubMed  Article  Google Scholar 

  96. Kobayashi, S., & Schultz, W. W. (2008). Influence of reward delays on responses of dopamine neurons. Journal of Neuroscience, 28, 7837–7846. doi:10.1523/jneurosci.1600-08.2008

    PubMed  Article  Google Scholar 

  97. Kozlowski, M. R., & Marshall, J. F. (1980). Plasticity of [14C]2-deoxy-D-glucose incorporation into neostriatum and related structures in response to dopamine neuron damage and apomorphine replacement. Brain Research, 197, 167–183.

    PubMed  Article  Google Scholar 

  98. Laibson, D. (1997). Golden eggs and hyperbolic discounting. Quarterly Journal of Economics, 112, 443–477.

    Article  Google Scholar 

  99. Lévesque, M., & Parent, A. (2005). The striatofugal fiber system in primates: A reevaluation of its organization based on single-axon tracing studies. Proceedings of the National Academy of Sciences, 102, 11888–11893. doi:10.1073/pnas.0502710102

    Article  Google Scholar 

  100. Loewenstein, G. (1996). Out of control: Visceral influences on behavior. Organizational Behavior & Human Decision Processes, 65, 272–292.

    Article  Google Scholar 

  101. Logothetis, N. K., Pauls, J., Augath, M., Trinath, T., & Oeltermann, A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature, 412, 150–157. doi:10.1038/35084005

    PubMed  Article  Google Scholar 

  102. Ludvig, E. A., Sutton, R. S., & Kehoe, E. J. (2008). Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Computation, 20, 3034–3054.

    PubMed  Article  Google Scholar 

  103. Mantz, J., Thierry, A. M., & Glowinski, J. (1989). Effect of noxious tail pinch on the discharge rate of mesocortical and mesolimbic do pamine neurons: Selective activation of the mesocortical system. Brain Research, 476, 377–381.

    PubMed  Article  Google Scholar 

  104. Matsumoto, M., & Hikosaka, O. (2007). Lateral habenula as a source of negative reward signals in dopamine neurons. Nature, 447, 1111–1115.

    PubMed  Article  Google Scholar 

  105. Matsumoto, M., & Hikosaka, O. (2009a). Representation of negative motivational value in the primate lateral habenula. Nature Neuroscience, 12, 77–84.

    PubMed  Article  Google Scholar 

  106. Matsumoto, M., & Hikosaka, O. (2009b). Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature, 459, 837–841.

    PubMed  Article  Google Scholar 

  107. Mazur, J. E. (1987). An adjusting procedure for studying delayed reinforcement. In M. L. Commons, J. E. Mazur, J. A. Nevin, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 5. The effect of delay and of intervening events on reinforcement value (pp. 55–73). Hillsdale, NJ: Erlbaum.

    Google Scholar 

  108. Mazur, J. E. (2001). Hyperbolic value addition and general models of animal choice. Psychological Review, 108, 96–112.

    PubMed  Article  Google Scholar 

  109. Mazur, J. E. (2007). Choice in a successive-encounters procedure and hyperbolic decay of reinforcement. Journal of the Experimental Analysis of Behavior, 88, 73–85.

    PubMed  Article  Google Scholar 

  110. McClure, S. M., Berns, G. S., & Montague, P. R. (2003). Temporal prediction errors in a passive learning task activate human striatum. Neuron, 38, 339–346.

    PubMed  Article  Google Scholar 

  111. McClure, S. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2004). Separate neural systems value immediate and delayed monetary rewards. Science, 306, 503–507.

    PubMed  Article  Google Scholar 

  112. McCoy, A. N., & Platt, M. L. (2005). Risk-sensitive neurons in macaque posterior cingulate cortex. Nature Neuroscience, 8, 1220–1227.

    PubMed  Article  Google Scholar 

  113. McCulloch, J., Savaki, H. E., & Sokoloff, L. (1980). Influence of dopaminergic systems on the lateral habenular nucleus of the rat. Brain Research, 194, 117–124.

    PubMed  Article  Google Scholar 

  114. Metcalfe, J., & Mischel, W. (1999). A hot/cool-system analysis of delay of gratification: Dynamics of willpower. Psychological Review, 106, 3–19.

    PubMed  Article  Google Scholar 

  115. Michie, D. (1961). Trial and error. In S. A. Barnett & A. McLaren (Eds.), Science survey (Part 2, pp. 129–145). Harmondsworth, U.K.: Penguin.

    Google Scholar 

  116. Middleton, F. A., & Strick, P. L. (2001). A revised neuroanatomy of frontal-subcortical circuits. In D. G. Lichter & J. L. Cummings (Eds.), Frontal-subcortical circuits in psychiatric and neurological disorders. New York: Guilford.

    Google Scholar 

  117. Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt, Rinehart & Winston.

    Google Scholar 

  118. Minsky, M. (1963). Steps toward artificial intelligence. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought (pp. 406–450). New York: McGraw-Hill.

    Google Scholar 

  119. Mirenowicz, J., & Schultz, W. (1994). Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology, 72, 1024–1027.

    PubMed  Google Scholar 

  120. Mirenowicz, J., & Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379, 449–451.

    PubMed  Article  Google Scholar 

  121. Monahan, G. E. (1982). A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Science, 28, 1–16.

    Article  Google Scholar 

  122. Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16, 1936–1947.

    PubMed  Google Scholar 

  123. Morecraft, R. J., Geula, C., & Mesulam, M. M. (1992). Cytoarchitecture and neural afferents of orbitofrontal cortex in the brain of the monkey. Journal of Comparative Neurology, 323, 341–358.

    PubMed  Article  Google Scholar 

  124. Morris, G., Arkadir, D., Nevet, A., Vaadia, E., & Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron, 43, 133–143.

    PubMed  Article  Google Scholar 

  125. Myerson, J., & Green, L. (1995). Discounting of delayed rewards: Models of individual choice. Journal of the Experimental Analysis of Behavior, 64, 263–276.

    PubMed  Article  Google Scholar 

  126. Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y., & Hikosaka, O. (2004). Dopamine neurons can represent context-dependent prediction error. Neuron, 41, 269–280.

    PubMed  Article  Google Scholar 

  127. Nakamura, K., Matsumoto, M., & Hikosaka, O. (2008). Reward-dependent modulation of neuronal activity in the primate dorsal raphe nucleus. Journal of Neuroscience, 28, 5331–5343.

    PubMed  Article  Google Scholar 

  128. Niv, Y., Duff, M. O., & Dayan, P. (2005). Dopamine, uncertainty and TD learning. Behavioral & Brain Functions, 1, 6. doi:10.1186/1744-9081-1-6

    Article  Google Scholar 

  129. Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences, 12, 265–272.

    PubMed  Article  Google Scholar 

  130. Oades, R. D., & Halliday, G. M. (1987). Ventral tegmental (A10) system: Neurobiology. 1. Anatomy and connectivity. Brain Research, 434, 117–165.

    PubMed  Google Scholar 

  131. O’Doherty, J., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal difference models and reward-related learning in the human brain. Neuron, 38, 329–337.

    PubMed  Article  Google Scholar 

  132. O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304, 452–454.

    PubMed  Article  Google Scholar 

  133. O’Donoghue, T., & Rabin, M. (1999). Doing it now or later. American Economic Review, 89, 103–124.

    Article  Google Scholar 

  134. Ongur, D., An, X., & Price, J. L. (1998). Prefrontal cortical projections to the hypothalamus in macaque monkeys. Journal of Comparative Neurology, 401, 480–505.

    PubMed  Article  Google Scholar 

  135. Packard, M. G., & Knowlton, B. J. (2002). Learning and memory functions of the basal ganglia. Annual Review of Neuroscience, 25, 563–593.

    PubMed  Article  Google Scholar 

  136. Pagnoni, G., Zink, C. F., Montague, P. R., & Berns, G. S. (2002). Activity in human ventral striatum locked to errors of reward prediction. Nature Neuroscience, 5, 97–98.

    PubMed  Article  Google Scholar 

  137. Park, M. R. (1987). Monosynaptic inhibitory postsynaptic potentials from lateral habenula recorded in dorsal raphe neurons. Brain Research Bulletin, 19, 581–586.

    PubMed  Article  Google Scholar 

  138. Parr, R. (1998). Hierarchical control and learning for Markov decision processes. Unpublished doctoral dissertation, University of California, Berkeley.

    Google Scholar 

  139. Paton, J. J., Belova, M. A., Morrison, S. E., & Salzman, C. D. (2006). The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature, 439, 865–870.

    PubMed  Article  Google Scholar 

  140. Paulus, M. P., & Frank, L. R. (2006). Anterior cingulate activity modulates nonlinear decision weight function of uncertain prospects. NeuroImage, 30, 668–677.

    PubMed  Article  Google Scholar 

  141. Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin rewardseeking behaviour in humans. Nature, 442, 1042–1045.

  142. Poupart, P., Vlassis, N., Hoey, J., & Regan, K. (2006). An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (pp. 697–704). New York: ACM.

  143. Prelec, D., & Loewenstein, G. (1991). Decision making over time and under uncertainty: A common approach. Management Science, 37, 770–786.

  144. Preuschoff, K., & Bossaerts, P. (2007). Adding prediction risk to the theory of reward learning. In B. W. Balleine, K. Doya, J. O’Doherty, & M. Sakagami (Eds.), Reward and decision making in corticobasal ganglia networks (Annals of the New York Academy of Sciences, Vol. 1104, pp. 135–146). New York: New York Academy of Sciences.

  145. Preuschoff, K., Bossaerts, P., & Quartz, S. R. (2006). Neural differentiation of expected reward and risk in human subcortical structures. Neuron, 51, 381–390.

  146. Puterman, M. L. (2001). Dynamic programming. In R. A. Meyers (Ed.), Encyclopedia of physical science and technology (3rd ed., Vol. 4, pp. 673–696). San Diego: Academic Press.

  147. Puterman, M. L. (2005). Markov decision processes: Discrete stochastic dynamic programming. Hoboken, NJ: Wiley-Interscience.

  148. Rachlin, H., Raineri, A., & Cross, D. (1991). Subjective probability and delay. Journal of the Experimental Analysis of Behavior, 55, 233–244.

  149. Ramm, P., Beninger, R. J., & Frost, B. J. (1984). Functional activity in the lateral habenular and dorsal raphe nuclei following administration of several dopamine receptor antagonists. Canadian Journal of Physiology & Pharmacology, 62, 1530–1533.

  150. Redish, A. D., & Johnson, A. (2007). A computational model of craving and obsession. In B. W. Balleine, K. Doya, J. O’Doherty, & M. Sakagami (Eds.), Reward and decision making in corticobasal ganglia networks (Annals of the New York Academy of Sciences, Vol. 1104, pp. 324–339). New York: New York Academy of Sciences.

  151. Reisine, T. D., Soubrié, P., Artaud, F., & Glowinski, J. (1982). Involvement of lateral habenula-dorsal raphe neurons in the differential regulation of striatal and nigral serotonergic transmission in cats. Journal of Neuroscience, 2, 1062–1071.

  152. Rempel-Clower, N. L. (2007). Role of orbitofrontal cortex connections in emotion. In G. Schoenbaum, J. A. Gottfried, E. A. Murray, & S. J. Ramus (Eds.), Linking affect to action: Critical contributions of the orbitofrontal cortex (Annals of the New York Academy of Sciences, Vol. 1121, pp. 72–86). New York: New York Academy of Sciences.

  153. Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.

  154. Reynolds, J. N., & Wickens, J. R. (2002). Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks, 15, 507–521. doi:10.1016/S0893-6080(02)00045-X

  155. Richards, J. B., Mitchell, S. H., de Wit, H., & Seiden, L. S. (1997). Determination of discount functions in rats with an adjusting-amount procedure. Journal of the Experimental Analysis of Behavior, 67, 353–366.

  156. Rodriguez, P. F., Aron, A. R., & Poldrack, R. A. (2006). Ventral-striatal/nucleus-accumbens sensitivity to prediction errors during classification learning. Human Brain Mapping, 27, 306–313.

  157. Samuelson, P. (1937). A note on measurement of utility. Review of Economic Studies, 4, 155–161.

  158. Santamaria, J. C., Sutton, R. S., & Ram, A. (1998). Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior, 6, 163–218.

  159. Schoenbaum, G., Chiba, A. A., & Gallagher, M. (1998). Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nature Neuroscience, 1, 155–159.

  160. Schoenbaum, G., & Roesch, M. (2005). Orbitofrontal cortex, associative learning, and expectancies. Neuron, 47, 633–636.

  161. Schönberg, T., Daw, N. D., Joel, D., & O’Doherty, J. P. (2007). Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. Journal of Neuroscience, 27, 12860–12867.

  162. Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27.

  163. Schultz, W. (2000). Multiple reward signals in the brain. Nature Reviews Neuroscience, 1, 199–207.

  164. Schultz, W. (2002). Getting formal with dopamine and reward. Neuron, 36, 241–263.

  165. Schultz, W., Apicella, P., Scarnati, E., & Ljungberg, T. (1992). Neuronal activity in monkey ventral striatum related to the expectation of reward. Journal of Neuroscience, 12, 4595–4610.

  166. Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.

  167. Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience, 23, 473–500.

  168. Schultz, W., Preuschoff, K., Camerer, C., Hsu, M., Fiorillo, C. D., Tobler, P. N., & Bossaerts, P. (2008). Explicit neural signals reflecting reward uncertainty. Philosophical Transactions of the Royal Society B, 363, 3801–3811. doi:10.1098/rstb.2008.0152

  169. Schultz, W., & Romo, R. (1987). Responses of nigrostriatal dopamine neurons to high-intensity somatosensory stimulation in the anesthetized monkey. Journal of Neurophysiology, 57, 201–217.

  170. Schultz, W., Tremblay, L., & Hollerman, J. R. (2000). Reward processing in primate orbitofrontal cortex and basal ganglia. Cerebral Cortex, 10, 272–283.

  171. Schweimer, J. V., Brierley, D. I., & Ungless, M. A. (2008). Phasic nociceptive responses in dorsal raphe serotonin neurons. Fundamental & Clinical Pharmacology, 22, 119.

  172. Setlow, B., Schoenbaum, G., & Gallagher, M. (2003). Neural encoding in ventral striatum during olfactory discrimination learning. Neuron, 38, 625–636.

  173. Shohamy, D., Myers, C. E., Grossman, S., Sage, J., Gluck, M. A., & Poldrack, R. A. (2004). Cortico-striatal contributions to feedback-based learning: Converging data from neuroimaging and neuropsychology. Brain, 127, 851–859.

  174. Simmons, J. M., Ravel, S., Shidara, M., & Richmond, B. J. (2007). A comparison of reward-contingent neuronal activity in monkey orbitofrontal cortex and ventral striatum: Guiding actions toward rewards. In G. Schoenbaum, J. A. Gottfried, E. A. Murray, & S. J. Ramus (Eds.), Linking affect to action: Critical contributions of the orbitofrontal cortex (Annals of the New York Academy of Sciences, Vol. 1121, pp. 376–394). New York: New York Academy of Sciences.

  175. Smart, W. D., & Kaelbling, L. P. (2000). Practical reinforcement learning in continuous spaces. In Proceedings of the 17th International Conference on Machine Learning (pp. 903–910). San Francisco: Morgan Kaufmann.

  176. Sozou, P. D. (1998). On hyperbolic discounting and uncertain hazard rates. Proceedings of the Royal Society B, 265, 2015–2020.

  177. Stern, W. C., Johnson, A., Bronzino, J. D., & Morgane, P. J. (1979). Effects of electrical stimulation of the lateral habenula on single-unit activity of raphe neurons. Experimental Neurology, 65, 326–342.

  178. Stevens, S. S. (1957). On the psychophysical law. Psychological Review, 64, 153–181.

  179. Suri, R. E. (2002). TD models of reward predictive responses in dopamine neurons. Neural Networks, 15, 523–533.

  180. Suri, R. E., & Schultz, W. (1999). A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience, 91, 871–890.

  181. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.

  182. Sutton, R. S., & Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. R. Gabriel & J. Moore (Eds.), Learning and computational neuroscience: Foundations of adaptive networks (pp. 497–537). Cambridge, MA: MIT Press.

  183. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

  184. Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181–211.

  185. Tan, C. O., & Bullock, D. (2008). A local circuit model of learned striatal and dopamine cell responses under probabilistic schedules of reward. Journal of Neuroscience, 28, 10062–10074.

  186. Thiébot, M. H., Hamon, M., & Soubrié, P. (1983). The involvement of nigral serotonin innervation in the control of punishment-induced behavioral inhibition in rats. Pharmacology, Biochemistry & Behavior, 19, 225–229.

  187. Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplements, 2(4, Whole No. 8).

  188. Tobler, P. N., Christopoulos, G. I., O’Doherty, J. P., Dolan, R. J., & Schultz, W. (2008). Neuronal distortions of reward probability without choice. Journal of Neuroscience, 28, 11703–11711.

  189. Tobler, P. N., Fiorillo, C. D., & Schultz, W. (2005). Adaptive coding of reward value by dopamine neurons. Science, 307, 1642–1645.

  190. Tobler, P. N., O’Doherty, J. P., Dolan, R. J., & Schultz, W. (2007). Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. Journal of Neurophysiology, 97, 1621–1632.

  191. Tolman, E. C. (1932). Purposive behavior in animals and men. New York: Appleton-Century.

  192. Tremblay, L., & Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398, 704–708.

  193. Tremblay, L., & Schultz, W. (2000). Reward-related neuronal activity during go-nogo task performance in primate orbitofrontal cortex. Journal of Neurophysiology, 83, 1864–1876.

  194. Trepel, C., Fox, C. R., & Poldrack, R. A. (2005). Prospect theory on the brain? Toward a cognitive neuroscience of decision under risk. Cognitive Brain Research, 23, 34–50.

  195. Tricomi, E. M., Delgado, M. R., & Fiez, J. A. (2004). Modulation of caudate activity by action contingency. Neuron, 41, 281–292.

  196. Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk & Uncertainty, 5, 297–323.

  197. Tye, N. C., Everitt, B. J., & Iversen, S. D. (1977). 5-Hydroxytryptamine and punishment. Nature, 268, 741–743.

  198. Ungless, M. A., Magill, P. J., & Bolam, J. P. (2004). Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science, 303, 2040–2042.

  199. von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.

  200. Wan, X., & Peoples, L. L. (2006). Firing patterns of accumbal neurons during a Pavlovian-conditioned approach task. Journal of Neurophysiology, 96, 652–660.

  201. Wang, R. Y., & Aghajanian, G. K. (1977). Physiological evidence for habenula as major link between forebrain and midbrain raphe. Science, 197, 89–91.

  202. White, N. M., & Hiroi, N. (1998). Preferential localization of self-stimulation sites in striosomes/patches in the rat striatum. Proceedings of the National Academy of Sciences, 95, 6486–6491.

  203. Wickens, J. R., Budd, C. S., Hyland, B. I., & Arbuthnott, G. W. (2007). Striatal contributions to reward and decision making: Making sense of regional variations in a reiterated processing matrix. In B. W. Balleine, K. Doya, J. O’Doherty, & M. Sakagami (Eds.), Reward and decision making in corticobasal ganglia networks (Annals of the New York Academy of Sciences, Vol. 1104, pp. 192–212). New York: New York Academy of Sciences.

  204. Wilkinson, L. O., & Jacobs, B. L. (1988). Lack of response of serotonergic neurons in the dorsal raphe nucleus of freely moving cats to stressful stimuli. Experimental Neurology, 101, 445–457.

  205. Witten, I. H. (1977). An adaptive optimal controller for discrete-time Markov environments. Information & Control, 34, 286–295.

  206. Wooten, G. F., & Collins, R. C. (1981). Metabolic effects of unilateral lesion of the substantia nigra. Journal of Neuroscience, 1, 285–291.

  207. Yang, L.-M., Hu, B., Xia, Y.-H., Zhang, B.-L., & Zhao, H. (2008). Lateral habenula lesions improve the behavioral response in depressed rats via increasing the serotonin level in dorsal raphe nucleus. Behavioural Brain Research, 188, 84–90.

  208. Yin, H. H., & Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nature Reviews Neuroscience, 7, 464–476.

  209. Zald, D. H., & Kim, S. W. (2001). The orbitofrontal cortex. In S. P. Salloway, P. F. Malloy, & J. D. Duffy (Eds.), The frontal lobes and neuropsychiatric illness (pp. 33–69). Washington, DC: American Psychiatric Publishing.

Corresponding author

Correspondence to Tiago V. Maia.

Cite this article

Maia, T.V. Reinforcement learning, conditioning, and the brain: Successes and challenges. Cognitive, Affective, & Behavioral Neuroscience 9, 343–364 (2009). https://doi.org/10.3758/CABN.9.4.343

Keywords

  • Conditioned Stimulus
  • Prediction Error
  • Reinforcement Learning
  • Dopamine Neuron
  • Markov Decision Process