Abstract
Rationale
Dopamine neurotransmission has long been known to exert a powerful influence over the vigor, strength, or rate of responding. However, there exists no clear understanding of the computational foundation for this effect; predominant accounts of dopamine’s computational function focus on a role for phasic dopamine in controlling the discrete selection between different actions and have nothing to say about response vigor or indeed the free-operant tasks in which it is typically measured.
Objectives
We seek to accommodate free-operant behavioral tasks within the realm of models of optimal control and thereby capture how dopaminergic and motivational manipulations affect response vigor.
Methods
We construct an average reward reinforcement learning model in which subjects choose both which action to perform and the latency with which to perform it. Optimal control balances the costs of acting quickly against the benefits of obtaining reward earlier and thereby selects an optimal response latency.
Results
In this framework, the long-run average rate of reward plays a key role as an opportunity cost and mediates motivational influences on rates and vigor of responding. We review evidence suggesting that the average reward rate is reported by tonic levels of dopamine, putatively in the nucleus accumbens.
Conclusions
Our extension of reinforcement learning models to free-operant tasks unites psychologically and computationally inspired ideas about the role of tonic dopamine in striatum, explaining from a normative point of view why higher levels of dopamine might be associated with more vigorous responding.
Notes
Given that the actions we include in “Other” are typically performed in experimental scenarios despite not being rewarded by the experimenter, we assume these entail some “internal” reward, modeled simply as a negative unit cost.
Realistically, even in a well-learned task, the average reward rate and response rates may not be perfectly stable. For instance, during a session, both would decline progressively as satiety reduces the utility of obtained rewards. However, this is negligible in most free-operant scenarios in which sessions are short or sparsely rewarded.
References
Aberman JE, Salamone JD (1999) Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience 92(2):545–552
Ainslie G (1975) Specious reward: a behavioural theory of impulsiveness and impulse control. Psychol Bull 82:463–496
Barrett JE, Stanley JA (1980) Effects of ethanol on multiple fixed-interval fixed-ratio schedule performances: dynamic interactions at different fixed-ratio values. J Exp Anal Behav 34(2):185–198
Barto AG (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 215–232
Beninger RJ (1983) The role of dopamine in locomotor activity and learning. Brain Res Brain Res Rev 6:173–196
Bergstrom BP, Garris PA (2003) ‘Passive stabilization’ of striatal extracellular dopamine across the lesion spectrum encompassing the presymptomatic phase of Parkinson’s disease: a voltammetric study in the 6-OHDA lesioned rat. J Neurochem 87(5):1224–1236
Berridge KC (2004) Motivation concepts in behavioral neuroscience. Physiol Behav 81(2):179–209
Berridge KC, Robinson TE (1998) What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Res Brain Res Rev 28:309–369
Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena, Belmont
Bolles RC (1967) Theory of motivation. Harper and Row, New York
Carr GD, White NM (1987) Effects of systemic and intracranial amphetamine injections on behavior in the open field: a detailed analysis. Pharmacol Biochem Behav 27:113–122
Catania AC, Reynolds GS (1968) A quantitative analysis of the responding maintained by interval schedules of reinforcement. J Exp Anal Behav 11:327–383
Catania AC, Matthews TJ, Silverman PJ, Yohalem R (1977) Yoked variable-ratio and variable-interval responding in pigeons. J Exp Anal Behav 28:155–161
Chéramy A, Barbeito L, Godeheu G, Desce J, Pittaluga A, Galli T, Artaud F, Glowinski J (1990) Respective contributions of neuronal activity and presynaptic mechanisms in the control of the in vivo release of dopamine. J Neural Transm Suppl 29:183–193
Chesselet MF (1990) Presynaptic regulation of dopamine release. Implications for the functional organization of the basal ganglia. Ann N Y Acad Sci 604:17–22
Correa M, Carlson BB, Wisniecki A, Salamone JD (2002) Nucleus accumbens dopamine and work requirements on interval schedules. Behav Brain Res 137:179–187
Cousins MS, Atherton A, Turner L, Salamone JD (1996) Nucleus accumbens dopamine depletions alter relative response allocation in a T-maze cost/benefit task. Behav Brain Res 74:189–197
Daw ND (2003) Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon University
Daw ND, Touretzky DS (2002) Long-term reward prediction in TD models of the dopamine system. Neural Comp 14:2567–2583
Daw ND, Kakade S, Dayan P (2002) Opponent interactions between serotonin and dopamine. Neural Netw 15(4–6):603–616
Daw ND, Niv Y, Dayan P (2005) Uncertainty based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8(12):1704–1711
Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441:876–879
Dawson GR, Dickinson A (1990) Performance on ratio and interval schedules with matched reinforcement rates. Q J Exp Psychol B 42:225–239
Denk F, Walton ME, Jennings KA, Sharp T, Rushworth MF, Bannerman DM (2005) Differential involvement of serotonin and dopamine systems in cost–benefit decisions about delay or effort. Psychopharmacology (Berl) 179(3):587–596
Dickinson A (1985) Actions and habits: the development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 308(1135):67–78
Dickinson A, Balleine B (1994) Motivational control of goal-directed action. Anim Learn Behav 22:1–18
Dickinson A, Balleine B (2002) The role of learning in the operation of motivational systems. In: Pashler H, Gallistel R (eds) Stevens’ handbook of experimental psychology. Learning, motivation and emotion, 3rd edn, vol 3. Wiley, New York, pp 497–533
Dickinson A, Smith J, Mirenowicz J (2000) Dissociation of Pavlovian and instrumental incentive learning under dopamine agonists. Behav Neurosci 114(3):468–483
Domjan M (2003) Principles of learning and behavior, 5th edn. Thomson/Wadsworth, Belmont
Dragoi V, Staddon JER (1999) The dynamics of operant conditioning. Psychol Rev 106(1):20–61
Evenden JL, Robbins TW (1983) Increased response switching, perseveration and perseverative switching following d-amphetamine in the rat. Psychopharmacology (Berl) 80:67–73
Faure A, Haberland U, Condé F, Massioui NE (2005) Lesion to the nigrostriatal dopamine system disrupts stimulus–response habit formation. J Neurosci 25:2771–2780
Fiorillo C, Tobler P, Schultz W (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299(5614):1898–1902
Fletcher PJ, Korth KM (1999) Activation of 5-HT1B receptors in the nucleus accumbens reduces amphetamine-induced enhancement of responding for conditioned reward. Psychopharmacology (Berl) 142:165–174
Floresco SB, West AR, Ash B, Moore H, Grace AA (2003) Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat Neurosci 6(9):968–973
Foster TM, Blackman KA, Temple W (1997) Open versus closed economies: performance of domestic hens under fixed-ratio schedules. J Exp Anal Behav 67:67–89
Friston KJ, Tononi G, Reeke GNJ, Sporns O, Edelman GM (1994) Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience 59(2):229–243
Gallistel CR, Gibbon J (2000) Time, rate and conditioning. Psychol Rev 107:289–344
Gallistel CR, Stellar J, Bubis E (1974) Parametric analysis of brain stimulation reward in the rat: I. The transient process and the memory-containing process. J Comp Physiol Psychol 87:848–860
Gibbon J (1977) Scalar expectancy theory and Weber’s law in animal timing. Psychol Rev 84(3):279–325
Goto Y, Grace A (2005) Dopaminergic modulation of limbic and cortical drive of nucleus accumbens in goal-directed behavior. Nat Neurosci 8:805–812
Grace AA (1991) Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: a hypothesis for the etiology of schizophrenia. Neuroscience 41(1):1–24
Hernandez G, Hamdani S, Rajabi H, Conover K, Stewart J, Arvanitogiannis A, Shizgal P (2006) Prolonged rewarding stimulation of the rat medial forebrain bundle: neurochemical and behavioral consequences. Behav Neurosci 120(4):888–904
Herrnstein RJ (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4(3):267–272
Herrnstein RJ (1970) On the law of effect. J Exp Anal Behav 13(2):243–266
Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 249–270
Ikemoto S, Panksepp J (1999) The role of nucleus accumbens dopamine in motivated behavior: a unifying interpretation with special reference to reward-seeking. Brain Res Brain Res Rev 31:6–41
Jackson DM, Anden N, Dahlstrom A (1975) A functional effect of dopamine in the nucleus accumbens and in some other dopamine-rich parts of the rat brain. Psychopharmacologia 45:139–149
Kacelnik A (1997) Normative and descriptive models of decision making: time discounting and risk sensitivity. In: Bock GR, Cardew G (eds) Characterizing human psychological adaptations: Ciba Foundation symposium 208. Wiley, Chichester, pp 51–70
Killeen PR (1995) Economics, ecologies and mechanics: the dynamics of responding under conditions of varying motivation. J Exp Anal Behav 64:405–431
Konorski J (1967) Integrative activity of the brain: an interdisciplinary approach. University of Chicago Press, Chicago
Lauwereyns J, Watanabe K, Coe B, Hikosaka O (2002) A neural correlate of response bias in monkey caudate nucleus. Nature 418(6896):413–417
Le Moal M, Simon H (1991) Mesocorticolimbic dopaminergic network: functional and regulatory roles. Physiol Rev 71:155–234
Ljungberg T, Enquist M (1987) Disruptive effects of low doses of d-amphetamine on the ability of rats to organize behaviour into functional sequences. Psychopharmacology (Berl) 93:146–151
Ljungberg T, Apicella P, Schultz W (1992) Responses of monkey dopaminergic neurons during learning of behavioral reactions. J Neurophys 67:145–163
Lodge DJ, Grace AA (2005) The hippocampus modulates dopamine neuron responsivity by regulating the intensity of phasic neuron activation. Neuropsychopharmacology 31:1356–1361
Lodge DJ, Grace AA (2006) The laterodorsal tegmentum is essential for burst firing of ventral tegmental area dopamine neurons. Proc Nat Acad Sci U S A 103(13):5167–5172
Lyon M, Robbins TW (1975) The action of central nervous system stimulant drugs: a general theory concerning amphetamine effects. In: Current developments in psychopharmacology. Spectrum, New York, pp 80–163
Mahadevan S (1996) Average reward reinforcement learning: foundations, algorithms and empirical results. Mach Learn 22:1–38
Mazur JA (1983) Steady-state performance on fixed-, mixed-, and random-ratio schedules. J Exp Anal Behav 39(2):293–307
McClure SM, Daw ND, Montague PR (2003) A computational substrate for incentive salience. Trends Neurosci 26(8):423–428
Mingote S, Weber SM, Ishiwari K, Correa M, Salamone JD (2005) Ratio and time requirements on operant schedules: effort-related effects of nucleus accumbens dopamine depletions. Eur J Neurosci 21:1749–1757
Montague PR (2006) Why choose this book?: how we make decisions. Dutton, New York
Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16(5):1936–1947
Moore H, West AR, Grace AA (1999) The regulation of forebrain dopamine transmission: relevance to the psychopathology of schizophrenia. Biol Psychiatry 46:40–55
Murschall A, Hauber W (2006) Inactivation of the ventral tegmental area abolished the general excitatory influence of Pavlovian cues on instrumental performance. Learn Mem 13:123–126
Niv Y, Daw ND, Dayan P (2005a) How fast to work: response vigor, motivation and tonic dopamine. In: Weiss Y, Schölkopf B, Platt J (eds) NIPS 18. MIT Press, Cambridge, pp 1019–1026
Niv Y, Daw ND, Joel D, Dayan P (2005b) Motivational effects on behavior: towards a reinforcement learning model of rates of responding. COSYNE 2005, Salt Lake City
Niv Y, Joel D, Dayan P (2006) A normative perspective on motivation. Trends Cogn Sci 10:375–381
Oades RD (1985) The role of noradrenaline in tuning and dopamine in switching between signals in the CNS. Neurosci Biobehav Rev 9(2):261–282
Packard MG, Knowlton BJ (2002) Learning and memory functions of the basal ganglia. Annu Rev Neurosci 25:563–593
Phillips PEM, Wightman RM (2004) Extrasynaptic dopamine and phasic neuronal activity. Nat Neurosci 7:199
Phillips PEM, Stuber GD, Heien MLAV, Wightman RM, Carelli RM (2003) Subsecond dopamine release promotes cocaine seeking. Nature 422:614–618
Redgrave P, Prescott TJ, Gurney K (1999) The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89:1009–1023
Robbins TW, Everitt BJ (1982) Functional studies of the central catecholamines. Int Rev Neurobiol 23:303–365
Roitman MF, Stuber GD, Phillips PEM, Wightman RM, Carelli RM (2004) Dopamine operates as a subsecond modulator of food seeking. J Neurosci 24(6):1265–1271
Salamone JD, Correa M (2002) Motivational views of reinforcement: implications for understanding the behavioral functions of nucleus accumbens dopamine. Behav Brain Res 137:3–25
Salamone JD, Wisniecki A, Carlson BB, Correa M (2001) Nucleus accumbens dopamine depletions make animals highly sensitive to high fixed ratio requirements but do not impair primary food reinforcement. Neuroscience 105(4):863–870
Satoh T, Nakai S, Sato T, Kimura M (2003) Correlated coding of motivation and outcome of decision by dopamine neurons. J Neurosci 23(30):9913–9923
Schoenbaum G, Setlow B, Nugent S, Saddoris M, Gallagher M (2003) Lesions of orbitofrontal cortex and basolateral amygdala complex disrupt acquisition of odor-guided discriminations and reversals. Learn Mem 10:129–140
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophys 80:1–27
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci 13:900–913
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599
Schwartz A (1993) A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the tenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 298–305
Sokolowski JD, Salamone JD (1998) The role of accumbens dopamine in lever pressing and response allocation: effects of 6-OHDA injected into core and dorsomedial shell. Pharmacol Biochem Behav 59(3):557–566
Solomon RL, Corbit JD (1974) An opponent-process theory of motivation. I. Temporal dynamics of affect. Psychol Rev 81:119–145
Staddon JER (2001) Adaptive dynamics. MIT Press, Cambridge
Sutton RS, Barto AG (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88:135–170
Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, pp 497–537
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Taghzouti K, Simon H, Louilot A, Herman J, Le Moal M (1985) Behavioral study after local injection of 6-hydroxydopamine into the nucleus accumbens in the rat. Brain Res 344:9–20
Takikawa Y, Kawagoe R, Itoh H, Nakahara H, Hikosaka O (2002) Modulation of saccadic eye movements by predicted reward outcome. Exp Brain Res 142(2):284–291
Taylor JR, Robbins TW (1984) Enhanced behavioural control by conditioned reinforcers following microinjections of d-amphetamine into the nucleus accumbens. Psychopharmacology (Berl) 84:405–412
Taylor JR, Robbins TW (1986) 6-Hydroxydopamine lesions of the nucleus accumbens, but not of the caudate nucleus, attenuate enhanced responding with reward-related stimuli produced by intra-accumbens d-amphetamine. Psychopharmacology (Berl) 90:390–397
Tobler P, Fiorillo C, Schultz W (2005) Adaptive coding of reward value by dopamine neurons. Science 307(5715):1642–1645
van den Bos R, Charria Ortiz GA, Bergmans AC, Cools AR (1991) Evidence that dopamine in the nucleus accumbens is involved in the ability of rats to switch to cue-directed behaviours. Behav Brain Res 42:107–114
Waelti P, Dickinson A, Schultz W (2001) Dopamine responses comply with basic assumptions of formal learning theory. Nature 412:43–48
Walton ME, Kennerley SW, Bannerman DM, Phillips PEM, Rushworth MFS (2006) Weighing up the benefits of work: behavioral and neural analyses of effort-related decision making. Neural Netw (in press)
Watanabe M, Cromwell H, Tremblay L, Hollerman J, Hikosaka K, Schultz W (2001) Behavioral reactions reflecting differential reward expectations in monkeys. Exp Brain Res 140(4):511–518
Weiner I (1990) Neural substrates of latent inhibition: the switching model. Psychol Bull 108:442–461
Weiner I, Joel D (2002) Dopamine in schizophrenia: dysfunctional information processing in basal ganglia-thalamocortical split circuits. In: Chiara GD (ed) Handbook of experimental pharmacology, vol 154/II. Dopamine in the CNS II. Springer, Berlin Heidelberg New York, pp 417–472
Wickens J (1990) Striatal dopamine in motor activation and reward-mediated learning: steps towards a unifying model. J Neural Transm 80:9–31
Wickens J, Kötter R (1995) Cellular models of reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 187–214
Wilson C, Nomikos GG, Collu M, Fibiger HC (1995) Dopaminergic correlates of motivated behavior: importance of drive. J Neurosci 15(7):5169–5178
Wise RA (2004) Dopamine, learning and motivation. Nat Rev Neurosci 5:483–495
Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19:181–189
Zuriff GE (1970) A comparison of variable-ratio and variable-interval schedules of reinforcement. J Exp Anal Behav 13:369–374
Acknowledgements
This work was funded by the Gatsby Charitable Foundation, a Hebrew University Rector Fellowship (Y.N.), the Royal Society (N.D.), and the EU Bayesian Inspired Brain and Artefacts (BIBA) project (N.D. and P.D.). We are grateful to Saleem Nicola, Mark Walton, and Matthew Rushworth for valuable discussions.
Appendix
Here we describe in more mathematical detail the proposed RL model of free-operant response rates (Niv et al. 2005a) from which the results in this paper were derived.
Formally, the action selection problem faced by the rat can be characterized by a series of states, \(S \in \mathcal{S}\), in each of which the rat must choose an action and a latency \((a,\tau)\), which will entail a unit cost, \(C_u\), and a vigor cost, \(C_v/\tau\), and result in a possible transition to a new state, \(S'\), and a possible immediate reward with utility \(U_r\). The unit cost constant \(C_u\) and the vigor cost constant \(C_v\) can take different values depending on the identity of the currently chosen action \(a \in \{\text{LP}, \text{NP}, \text{“Other”}\}\) and on that of the previously performed action. The transitions between states and the probability of reward for each action are governed by the schedule of reinforcement. For instance, in a random-ratio 5 (RR5) schedule, every LP action has probability \(p = 0.2\) of inducing a transition from the state in which no food is available in the magazine to that in which food is available. An NP action in the “no-reward-available” state is never rewarded and, conversely, is rewarded with certainty (\(p_r = 1\)) in the “food-available-in-magazine” state. As a simplification, for each reinforcement schedule, we define states that incorporate all the available information relevant to decision making, such as the identity of the previously chosen action, whether or not food is available in the magazine, the time that has elapsed since the last lever press (in random-interval schedules only), and the number of lever presses since the last reward (in fixed-ratio schedules only). The animal’s behavior in the experiment is thus fully described by the successive actions and latencies chosen at the different states the animal encountered, \(\{(a_i, \tau_i, S_i),\ i = 1, 2, 3, \ldots\}\). The average reward rate \(\overline{R}\) is simply the sum of all the rewards obtained minus all the costs incurred, divided by the total amount of time.
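To make the formalization concrete, the schedule-defined transition and reward structure for this RR5 example can be sketched as follows (a minimal, hypothetical encoding under the assumptions above; the state and action names are illustrative and not taken from the original implementation):

```python
# Minimal sketch of the RR5 transition/reward structure described above.
# State and action names are illustrative placeholders.
ACTIONS = ("LP", "NP", "Other")
STATES = ("no_food", "food_in_magazine")

def transitions(state, action, p_ratio=0.2):
    """Return a list of (probability, next_state, rewarded) outcomes."""
    if state == "no_food" and action == "LP":
        # each lever press earns the food-available state with probability 1/5
        return [(p_ratio, "food_in_magazine", False),
                (1.0 - p_ratio, "no_food", False)]
    if state == "food_in_magazine" and action == "NP":
        # a nose poke collects the reward with certainty (p_r = 1)
        return [(1.0, "no_food", True)]
    # all other choices leave the schedule state unchanged and unrewarded
    return [(1.0, state, False)]
```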
Using this formulation, we can define the differential value of a state, denoted \(V(S)\), as the expected sum of future rewards minus costs encountered from this state onward, compared with the expected average reward rate. Defining the value as an expectation over a sum means that it can be written recursively as the expected reward minus the cost due to the current action, minus the average reward forfeited during the action's latency, plus the value of the next state (averaged over the possible next states). To find the optimal differential values of the different states, that is, the values \(V^*(S)\) (and the optimal average reward rate \(\overline{R}^*\)) given the optimal action selection strategy, we can simultaneously solve the set of equations defining these values:
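$$V^*(S) = \max_{a,\tau}\left[\, p_r\, U_r - C_u - \frac{C_v}{\tau} - \tau\,\overline{R}^* + \sum_{S'} p(S'\,|\,a,\tau,S)\, V^*(S')\,\right] \qquad (1)$$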
in which there is one equation for every state \(S \in \mathcal{S}\), and \(p(S'\,|\,a,\tau,S)\) is the schedule-defined probability of transitioning to state \(S'\) given that \((a,\tau)\) was performed in state \(S\).
The theory of dynamic programming (Bertsekas and Tsitsiklis 1996) ensures that these equations have one solution for the optimal attainable average reward rate \(\overline{R}^*\) and the optimal differential state values \(V^*(S)\) (which are defined up to an additive constant). This solution can be found using iterative dynamic programming methods such as “value iteration” (Bertsekas and Tsitsiklis 1996) or approximated through online sampling of the task dynamics and temporal-difference learning (Schwartz 1993; Mahadevan 1996; Sutton and Barto 1998). Here we used the former and report results using the true optimal differential values. We compare these model results with the steady-state behavior of well-trained animals, as the optimal values correspond to those that would be learned online over an extensive training period.
Given the optimal state values, the optimal differential value of an \((a,\tau)\) pair taken in state \(S\), denoted \(Q^*(a,\tau,S)\), is simply:
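$$Q^*(a,\tau,S) = p_r\, U_r - C_u - \frac{C_v}{\tau} - \tau\,\overline{R}^* + \sum_{S'} p(S'\,|\,a,\tau,S)\, V^*(S') \qquad (2)$$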
The animal can select actions optimally (that is, so as to obtain the maximal possible average reward rate \(\overline{R}^*\)) by comparing the differential values of the different \((a,\tau)\) pairs at the current state and choosing the action and latency with the highest value. Alternatively, to allow more flexible behavior and occasional exploratory actions (Daw et al. 2006), response selection can be based on the so-called “soft-max” rule (or Boltzmann distribution), in which the probability of choosing an \((a,\tau)\) pair increases with its differential value. In this case, which is the one we used here, actions that are “almost optimal” are chosen almost as frequently as actions that are strictly optimal. Specifically, the probability of choosing \((a,\tau)\) in state \(S\) is:
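$$P(a,\tau\,|\,S) = \frac{e^{\beta\, Q^*(a,\tau,S)}}{\sum_{a',\tau'} e^{\beta\, Q^*(a',\tau',S)}} \qquad (3)$$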
where β is the inverse temperature controlling the steepness of the soft-max function (a value of zero corresponds to uniform selection of actions, whereas higher values correspond to a more maximizing strategy).
To simulate the (immediate) effects of depletion of tonic dopamine (Fig. 3b), Q values were recomputed from the optimal V values (using Eq. 2), but taking into account a lower average reward rate (specifically, \(\overline{R}_{\text{depleted}} = 0.4\,\overline{R}^*\)). Actions were then chosen as usual, using the soft-max function of these new Q values, to generate behavior.
Finally, note that Eq. 2 is a function relating actions and latencies to values. Accordingly, one way to find the optimal latency is to differentiate Eq. 2 with respect to \(\tau\) and find its maximum. For ratio schedules (in which the identity and value of the subsequent state \(S'\) do not depend on \(\tau\)), this gives:
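$$\frac{\partial Q^*(a,\tau,S)}{\partial \tau} = \frac{C_v}{\tau^2} - \overline{R}^* = 0 \quad\Rightarrow\quad \tau^* = \sqrt{\frac{C_v}{\overline{R}^*}} \qquad (4)$$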
showing that the optimal latency \(\tau^*\) depends solely on the vigor cost constant and the average reward rate. This is true regardless of the chosen action \(a\), which is why a change in the average reward has a similar effect on the latencies of all actions. In interval schedules, the situation is slightly more complex because the identity of the subsequent state depends on the latency, and this must be taken into account when taking the derivative. However, in this case as well, the optimal latency is inversely related to the average reward rate.
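To give a concrete feel for these relationships, the following self-contained numerical sketch (not the original model code; all parameter values are illustrative assumptions) computes the optimal latencies for a simplified RR5 schedule in which the subject lever-presses until reward is earned and then nose-pokes once to collect it. It checks that the grid-search optimum matches \(\tau^* = \sqrt{C_v/\overline{R}^*}\), and that a lower (simulated dopamine-depleted) average reward rate implies uniformly longer latencies:

```python
import numpy as np

U_R = 10.0                      # utility of the food reward (assumed)
C_U = {"LP": 0.5, "NP": 0.5}    # unit cost per response (assumed)
C_V = {"LP": 1.0, "NP": 1.0}    # vigor cost constants, cost = C_v / tau (assumed)
MEAN_PRESSES = 5.0              # expected lever presses per reward on RR5

def average_reward_rate(tau_lp, tau_np):
    """Long-run net reward per unit time for this cyclic policy
    (renewal-reward: expected net reward per cycle / expected cycle time)."""
    reward_per_cycle = (U_R
                        - MEAN_PRESSES * (C_U["LP"] + C_V["LP"] / tau_lp)
                        - (C_U["NP"] + C_V["NP"] / tau_np))
    time_per_cycle = MEAN_PRESSES * tau_lp + tau_np
    return reward_per_cycle / time_per_cycle

# Brute-force grid search for the latencies that maximize the average rate.
taus = np.linspace(0.1, 10.0, 400)
rates = np.array([[average_reward_rate(t_lp, t_np) for t_np in taus] for t_lp in taus])
i, j = np.unravel_index(rates.argmax(), rates.shape)
tau_lp_opt, tau_np_opt, r_opt = taus[i], taus[j], rates[i, j]

# The grid optimum agrees (up to grid resolution) with tau* = sqrt(C_v / R*).
print("optimal LP latency:", tau_lp_opt, "closed form:", np.sqrt(C_V["LP"] / r_opt))
print("optimal NP latency:", tau_np_opt, "closed form:", np.sqrt(C_V["NP"] / r_opt))

# Simulated tonic-dopamine depletion: latencies implied by a lower reported
# average reward rate are uniformly longer, i.e., responding slows down.
r_depleted = 0.4 * r_opt
print("LP latency with depleted rate:", np.sqrt(C_V["LP"] / r_depleted))
```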