Tonic dopamine: opportunity costs and the control of response vigor
- 3.2k Downloads
Dopamine neurotransmission has long been known to exert a powerful influence over the vigor, strength, or rate of responding. However, there exists no clear understanding of the computational foundation for this effect; predominant accounts of dopamine’s computational function focus on a role for phasic dopamine in controlling the discrete selection between different actions and have nothing to say about response vigor or indeed the free-operant tasks in which it is typically measured.
We seek to accommodate free-operant behavioral tasks within the realm of models of optimal control and thereby capture how dopaminergic and motivational manipulations affect response vigor.
We construct an average reward reinforcement learning model in which subjects choose both which action to perform and also the latency with which to perform it. Optimal control balances the costs of acting quickly against the benefits of getting reward earlier and thereby chooses a best response latency.
In this framework, the long-run average rate of reward plays a key role as an opportunity cost and mediates motivational influences on rates and vigor of responding. We review evidence suggesting that the average reward rate is reported by tonic levels of dopamine putatively in the nucleus accumbens.
Our extension of reinforcement learning models to free-operant tasks unites psychologically and computationally inspired ideas about the role of tonic dopamine in striatum, explaining from a normative point of view why higher levels of dopamine might be associated with more vigorous responding.
KeywordsDopamine Motivation Response rate Energizing Reinforcement learning Free operant
This work was funded by the Gatsby Charitable Foundation, a Hebrew University Rector Fellowship (Y.N.), the Royal Society (N.D.), and the EU Bayesian Inspired Brain and Artefacts (BIBA) project (N.D. and P.D.). We are grateful to Saleem Nicola, Mark Walton, and Matthew Rushworth for valuable discussions.
- Barto AG (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 215–232Google Scholar
- Beninger RJ (1983) The role of dopamine in locomotor activity and learning. Brain Res Brain Res Rev 6:173–196Google Scholar
- Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena, BelmontGoogle Scholar
- Bolles RC (1967) Theory of motivation. Harper and Row, New YorkGoogle Scholar
- Daw ND (2003) Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon UniversityGoogle Scholar
- Daw ND, Touretzky DS (2002) Long-term reward prediction in TD models of the dopamine system. Neural Comp 14:2567–2583Google Scholar
- Denk F, Walton ME, Jennings KA, Sharp T, Rushworth MF, Bannerman DM (2005) Differential involvement of serotonin and dopamine systems in cost–benefit decisions about delay or effort. Psychopharmacology (Berl) 179(3):587–596Google Scholar
- Dickinson A (1985) Actions and habits: the development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 308(1135):67–78Google Scholar
- Dickinson A, Balleine B (1994) Motivational control of goal-directed action. Anim Learn Behav 22:1–18Google Scholar
- Dickinson A, Balleine B (2002) The role of learning in the operation of motivational systems. In: Pashler H, Gallistel R (eds) Stevens’ handbook of experimental psychology. Learning, motivation and emotion, 3rd edn, vol 3. Wiley, New York, pp 497–533Google Scholar
- Domjan M (2003) Principles of learning and behavior, 5th edn. Thomson/Wadsworth, BelmontGoogle Scholar
- Evenden JL, Robbins TW (1983) Increased dopamine switching, perseveration and perseverative switching following d-amphetamine in the rat. Psychopharmacology (Berl) 80:67–73Google Scholar
- Fletcher PJ, Korth KM (1999) Activation of 5-HT1B receptors in the nucleus accumbens reduces amphetamine-induced enhancement of responding for conditioned reward. Psychopharmacology (Berl) 142:165–174Google Scholar
- Gibbon J (1977) Scalar expectancy theory and Weber’s law in animal timing. Psychol Rev 84(3):279–325Google Scholar
- Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 249–270Google Scholar
- Kacelnik A (1997) Normative and descriptive models of decision making: time discounting and risk sensitivity. In: Bock GR, Cardew G (eds) Characterizing human psychological adaptations: Ciba Foundation symposium 208. Wiley, Chichester, pp 51–70Google Scholar
- Konorski J (1967) Integrative activity of the brain: an interdisciplinary approach. University of Chicago Press, ChicagoGoogle Scholar
- Ljungberg T, Enquist M (1987) Disruptive effects of low doses of d-amphetamine on the ability of rats to organize behaviour into functional sequences. Psychopharmacology (Berl) 93:146–151Google Scholar
- Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopaminergic neurons during learning of behavioral reactions. J Neurophys 67:145–163Google Scholar
- Lyon M, Robbins TW (1975) The action of central nervous system stimulant drugs: a general theory concerning amphetamine effects. In: Current developments in psychopharmacology. Spectrum, New York, pp 80–163Google Scholar
- Mahadevan S (1996) Average reward reinforcement learning: foundations, algorithms and empirical results. Mach Learn 22:1–38Google Scholar
- Montague PR (2006) Why choose this book?: how we make decisions. Dutton, New YorkGoogle Scholar
- Niv Y, Daw ND, Dayan P (2005a) How fast to work: response vigor, motivation and tonic dopamine. In: Weiss Y, Schölkopf B, Platt J (eds) NIPS 18. MIT Press, Cambridge, pp 1019–1026Google Scholar
- Niv Y, Daw ND, Joel D, Dayan P (2005b) Motivational effects on behavior: towards a reinforcement learning model of rates of responding. COSYNE 2005, Salt Lake CityGoogle Scholar
- Salamone JD, Wisniecki A, Carlson BB, Correa M (2001) Nucleus accumbens dopamine depletions make animals highly sensitive to high fixed ratio requirements but do not impair primary food reinforcement. Neuroscience 5(4):863–870Google Scholar
- Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophys 80:1–27Google Scholar
- Schwartz A (1993) A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the tenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 298–305Google Scholar
- Staddon JER (2001) Adaptive dynamics. MIT Press, CambridgeGoogle Scholar
- Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, pp 497–537Google Scholar
- Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, CambridgeGoogle Scholar
- Taylor JR, Robbins TW (1984) Enhanced behavioural control by conditioned reinforcers following microinjections of d-amphetamine into the nucleus accumbens. Psychopharmacology (Berl) 84:405–412Google Scholar
- Taylor JR, Robbins TW (1986) 6-Hydroxydopamine lesions of the nucleus accumbens, but not of the caudate nucleus, attenuate enhanced responding with reward-related stimuli produced by intra-accumbens d-amphetamine. Psychopharmacology (Berl) 90:390–397Google Scholar
- Walton ME, Kennerley SW, Bannerman DM, Phillips PEM, Rushworth MFS (2006) Weighing up the benefits of work: behavioral and neural analyses of effort-related decision making. Neural networks (in press)Google Scholar
- Weiner I, Joel D (2002) Dopamine in schizophrenia: dysfunctional information processing in basal ganglia-thalamocortical split circuits. In: Chiara GD (ed) Handbook of experimental pharmacology, vol 154/II. Dopamine in the CNS II. Springer, Berlin Heidelberg New York, pp 417–472Google Scholar
- Wickens J (1990) Striatal dopamine in motor activation and reward-mediated learning: steps towards a unifying model. J Neural Transm 80:9–31Google Scholar
- Wickens J, Kötter R (1995) Cellular models of reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 187–214Google Scholar