Dual-System Learning Models and Drugs of Abuse

  • Chapter in Computational Neuroscience of Drug Addiction

Part of the book series: Springer Series in Computational Neuroscience (NEUROSCI, volume 10)

Abstract

Dual-system theories in psychology and neuroscience propose that a deliberative or goal-directed decision system is accompanied by a more automatic or habitual path to action. In computational terms, the latter is prominently associated with model-free reinforcement learning algorithms such as temporal-difference learning, and the former with model-based approaches. Due in part to the close association between drugs of abuse and dopamine, and also between dopamine, temporal-difference learning, and habitual behavior, addictive drugs are often thought to specifically target the habitual system.
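To make the model-free versus model-based distinction concrete, the following is a minimal sketch and not the chapter's own model. It assumes a hypothetical three-state chain with a single reward; the names `step`, `td0_values`, and `model_based_values`, as well as the discount and learning-rate values, are illustrative choices. The model-free learner caches values and adjusts them by a temporal-difference error after each experienced step, while the model-based learner computes values directly from a known transition and reward model.

```python
# Hypothetical toy task: states 0 -> 1 -> 2, where state 2 is terminal
# and entering it yields a reward of 1.0. All constants are assumptions.
N_STATES = 3
GAMMA = 0.9


def step(state):
    """Deterministic toy environment: move one state to the right."""
    next_state = state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def td0_values(episodes=500, alpha=0.1):
    """Model-free: cached values updated from experienced transitions only."""
    V = [0.0] * N_STATES
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            s_next, r = step(s)
            delta = r + GAMMA * V[s_next] - V[s]  # temporal-difference error
            V[s] += alpha * delta
            s = s_next
    return V


def model_based_values(sweeps=50):
    """Model-based: values computed by sweeping over the known model."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            s_next, r = step(s)  # here the environment doubles as the model
            V[s] = r + GAMMA * V[s_next]
    return V


if __name__ == "__main__":
    print("TD(0) cached values:", td0_values())
    print("Model-based values: ", model_based_values())
```

In this toy setting both approaches approach the same values; the difference lies in how they get there. The cached values change only through repeated experience, whereas the model-based values would change immediately if the model were altered.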

However, although many drug-taking behaviors are well explained under such a theory, evidence suggests that drug-seeking behaviors must leverage a goal-directed controller as well. Indeed, one exhaustive theoretical account proposed that drugs may have numerous, distinct impacts on both systems as well as on other processes.

Here, we seek a more parsimonious account of these phenomena by asking whether the apparent profligacy of drugs’ effects might be explained by a single mechanism of action. In particular, we propose that the pattern of effects observed under drug abuse may reveal interactions between the two controllers, which have typically been modeled as separate and parallel. We sketch several different candidate characterizations and architectures by which model-free effects may impinge on a model-based system, including sharing of cached values through truncated tree search and bias of transition selection for prioritized value sweeping.
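One of the candidate interactions named above, sharing of cached values through truncated tree search, can be sketched as follows. This is an illustrative toy under stated assumptions, not the architecture proposed in the chapter: the task in `MODEL`, the function `plan`, and the value tables `accurate` and `inflated` are all hypothetical. The planner searches the model to a fixed depth and, at the depth limit, substitutes a cached (model-free) value for the unexplored remainder of the tree.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical deterministic model: MODEL[state][action] = (next_state, reward).
# States without an entry are treated as terminal.
MODEL = {
    "start": {"left": ("A", 0.0), "right": ("B", 0.0)},
    "A": {"go": ("A_end", 1.0)},
    "B": {"go": ("B_end", 0.5)},
}


def plan(state, depth, cached_v):
    """Return (value, best_action) from a depth-limited tree search.

    When the depth limit is reached, the search stops expanding and uses
    the cached value of the leaf state instead of looking further ahead.
    """
    if state not in MODEL:
        return 0.0, None  # terminal state: nothing further to collect
    if depth == 0:
        return cached_v.get(state, 0.0), None  # bootstrap from cached value
    best_value, best_action = float("-inf"), None
    for action, (next_state, reward) in MODEL[state].items():
        future, _ = plan(next_state, depth - 1, cached_v)
        value = reward + GAMMA * future
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


accurate = {"A": 1.0, "B": 0.5}  # cached values that match the model
inflated = {"A": 1.0, "B": 3.0}  # assumed distortion of one cached value

if __name__ == "__main__":
    print(plan("start", 1, accurate))  # shallow search, accurate cache
    print(plan("start", 1, inflated))  # shallow search, distorted cache
    print(plan("start", 2, inflated))  # full search ignores the cache here
```

In this sketch a distorted cached value changes the planner's choice only when the search is truncated before reaching the outcome; a search deep enough to reach the terminal states never consults the cache. That is one way a model-free distortion could, in principle, surface in apparently goal-directed choices.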

Acknowledgements

The authors are supported by a Scholar Award from the McKnight Foundation, a NARSAD Young Investigator Award, Human Frontiers Science Program Grant RGP0036/2009-C, and NIMH grant 1R01MH087882-01, part of the CRCNS program.

Corresponding author

Correspondence to Nathaniel D. Daw.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Simon, D.A., Daw, N.D. (2012). Dual-System Learning Models and Drugs of Abuse. In: Gutkin, B., Ahmed, S. (eds) Computational Neuroscience of Drug Addiction. Springer Series in Computational Neuroscience, vol 10. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0751-5_5
