Decisions, large and small, depend on the valuation and comparison of the rewards associated with different options. Functional neuroimaging research on reward valuation has implicated a network of regions that include the ventral striatum (VS) and the orbitofrontal/medial prefrontal cortex (OFC/mPFC; Bartra, McGuire, & Kable, 2013; Glimcher & Fehr, 2013; Platt & Huettel, 2008). Blood-oxygen-level-dependent (BOLD) activity in the ventral striatum has been found to scale with cued reward magnitude during the anticipatory phase of the monetary incentive delay task (Knutson, Adams, Fong, & Hommer, 2001), providing evidence that the VS encodes reward values. Subsequent research found VS and OFC/mPFC activity during both anticipation and consumption of reward (Diekhof, Kaps, Falkai, & Gruber, 2012). Several studies have attempted to dissociate VS versus OFC/mPFC components in coding reward parameters, such as encoding expected value (operationalized as the product of reward magnitude and probability) and prediction errors (deviations between the current reward magnitude and the current estimate of expected value; Knutson et al., 2001; Knutson & Cooper, 2005; Plassmann, O’Doherty, & Rangel, 2007; Rolls, McCabe, & Redoute, 2008). The current most parsimonious and broadly supported model is that the VS signal is in large part encoding prediction error, whereas the OFC/mPFC is representing expected reward value (Abler et al., 2006; Hare, O’Doherty, Camerer, Schultz, & Rangel, 2008; McClure, Berns, & Montague, 2003; O’Doherty, Dayan, Friston, Critchley, & Dolan, 2003; Pagnoni, Zink, Montague, & Berns, 2002; Rolls et al., 2008), although we note this is not a pure dissociation (prediction error and expected value are linearly related in most paradigms; Niv, 2009), and expected value signals can be observed in the ventral striatum (Samejima, Ueda, Doya, & Kimura, 2005; Schultz, Apicella, Scarnati, & Ljungberg, 1992).

Both the ventral striatum and OFC/mPFC receive projections from dopamine neurons in the midbrain, which animal studies have consistently shown to be critical for reward-seeking behavior (Cromwell & Schultz, 2003; Fiorillo, Tobler, & Schultz, 2003; Gan, Walton, & Phillips, 2010; Roesch, Takahashi, Gugsa, Bissonette, & Schoenbaum, 2007; Samejima et al., 2005; Schultz, 2010; Tobler, Fiorillo, & Schultz, 2005). Given the involvement of dopamine in reward processes, it seems reasonable to hypothesize that differences in aspects of dopamine functioning influence reward valuation processes. Previous efforts to understand the association between reward valuation-related BOLD activity and dopamine function in humans have provided some support for this hypothesis. One study found that the Ankyrin 1 Taq1A polymorphism, whose alleles have been associated with different striatal dopamine D2 receptor (DRD2) density in several studies (Gluskin & Mickey, 2016), modulated effects of the dopamine agonist bromocriptine on BOLD activity related to reward anticipation (Kirsch et al., 2006). A second study found that the Taq1A polymorphism modulated the predictive power of valuation-related BOLD activity on subsequent weight gain (Davis et al., 2008; Kirsch et al., 2006; Stice, Yokum, Bohon, Marti, & Smolen, 2010). Nonetheless, the Taq1A polymorphism (which lies in a neighboring gene region, rather than the DRD2 gene itself) is only one of many factors impacting DRD2 availability, and two large studies failed to replicate the association between Taq1A and DRD2 availability (Laruelle, Gelernter, & Innis, 1998; Smith et al., 2017). A more accurate assessment of DRD2 availability than a genetic proxy can be achieved with positron emission tomography (PET).
Using PET, our lab and others previously observed an association between dopamine release and BOLD activity related to reward anticipation during a monetary incentive delay task, but these studies did not examine or report on the role of baseline DRD2 availability in reward valuation (Buckholtz, Treadway, Cowan, Woodward, Benning, et al., 2010a; Schott et al., 2008), and there is evidence showing that baseline DRD2 availability and dopamine release are not positively correlated (Buckholtz, Treadway, Cowan, Woodward, Li, et al., 2010b; Grace, 2000). Also, the anticipation phase of the monetary incentive delay task requires substantial preparatory responses to detect and rapidly respond to the task target, which may complicate interpretation of the BOLD signal as a “pure” reflection of valuation.

The present study directly measured DRD2 availability with PET using the high-affinity DRD2 ligand [18F]fallypride and measured BOLD activation elicited by a reward valuation fMRI task. We focused on DRD2 availability in two regions: the striatum, which has the highest concentration of postsynaptic DRD2, and the midbrain, from which dopamine projections arise and which is also the site of DRD2 autoreceptors. This allowed us to examine whether midbrain dopamine function and dopamine-related function downstream, in the largest targets of dopaminergic projections, differ in their association with BOLD activity related to reward valuation, particularly BOLD activity in the VS and OFC/mPFC. In contrast to other reward tasks, such as the monetary incentive delay task, the reward valuation task parametrically varied both reward magnitude and probability, which allows examination of the extent to which BOLD activity scales with expected value. The reward valuation task also did not require rapid target detection, thus allowing BOLD activity associated with reward valuation to be dissociated from preparatory responses.

Method

Subjects

Fourteen healthy subjects (mean age 34.0 ± 8.4 years, five females) participated in this study. Because of the expense of PET imaging, subjects were recruited to the present fMRI study from a list of subjects who had recently successfully undergone PET-fallypride imaging as part of ongoing or recently completed PET studies in our lab. Subjects were excluded if they reported any history of psychiatric illness on screening interview (a Structured Clinical Interview for DSM-IV Diagnosis [First et al., 1997] was also available for all subjects and confirmed no history of major Axis I disorders), any history of head trauma, any significant medical condition, or any condition that would interfere with MRI (e.g., extreme obesity, claustrophobia, cochlear implant, metal fragments in eyes, cardiac pacemaker, neural stimulator, pregnancy, anemia, metallic body inclusions or other metal implanted in the body). Subjects were also excluded if they reported a history of substance abuse, current tobacco use, alcohol consumption greater than 8 ounces of whiskey or equivalent per week, use of psychostimulants (excluding caffeine) more than twice at any time in their life or at all in the past 6 months, or any psychotropic medication in the last 6 months other than occasional use of benzodiazepines for sleep. Any illicit drug use in the last 2 months was grounds for exclusion, even in subjects who did not otherwise meet criteria for substance abuse. Urine drug tests, performed with the Construction 12-Drug Screen Test Card distributed by Innovacon, Inc., were available during the screening for all subjects, with any positive tests for the presence of amphetamines, cocaine, marijuana, PCP, opiates, benzodiazepines, or barbiturates reflecting grounds for exclusion. Written informed consent was obtained from all subjects. 
This study was approved by the Institutional Review Board at Vanderbilt University and performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

PET data acquisition

PET imaging was performed on a GE Discovery STE scanner located at Vanderbilt University Medical Center. The scanner had an axial resolution of 4 mm and in-plane resolution of 4.5–5.5 mm FWHM at the center of the field of view. [18F]fallypride ((S)-N-[(1-allyl-2-pyrrolidinyl)methyl]-5-(3-[18F]fluoropropyl)-2,3-dimethoxybenzamide) was produced in the radiochemistry laboratory attached to the PET unit, following synthesis and quality control procedures described in U.S. Food and Drug Administration IND 12,035. [18F]fallypride is a substituted benzamide with very high affinity to D2/D3 receptors (Mukherjee, Yang, Das, & Brown, 1995).

Three-dimensional emission scans were performed following a 5.0 mCi slow bolus injection of [18F]fallypride (specific activity greater than 3,000 Ci/mmol). CT scans were collected for attenuation correction prior to each of the three emission scans, which together lasted approximately 3.5 hours, with two 15-minute breaks for subject comfort. Emission scans were acquired in the following sequence of frames: Scan 1: 8 × 15 s, 6 × 30 s, 6 × 60 s, 1 × 150 s, 3 × 300 s, 4 × 600 s; Scan 2: 4 × 750 s; and Scan 3: 3 × 1,200 s. PET images were reconstructed with decay correction, attenuation correction, scatter correction, and calibration.
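As a quick arithmetic check on the acquisition protocol, the frame schedule can be tallied as in the sketch below. The frame counts and durations are copied from the text; the variable and function names are illustrative only, and the time taken by the CT scans is not included because it is not reported.

```python
# Frame schedule as (number of frames, seconds per frame), copied from the text.
SCANS = {
    "Scan 1": [(8, 15), (6, 30), (6, 60), (1, 150), (3, 300), (4, 600)],
    "Scan 2": [(4, 750)],
    "Scan 3": [(3, 1200)],
}
BREAK_MINUTES = 2 * 15  # two 15-minute breaks between emission scans


def scan_summary(frames):
    """Return (frame count, duration in minutes) for one emission scan."""
    n_frames = sum(n for n, _ in frames)
    seconds = sum(n * s for n, s in frames)
    return n_frames, seconds / 60.0


summaries = {name: scan_summary(frames) for name, frames in SCANS.items()}
total_frames = sum(n for n, _ in summaries.values())
emission_minutes = sum(m for _, m in summaries.values())
session_hours = (emission_minutes + BREAK_MINUTES) / 60.0

for name, (n, minutes) in summaries.items():
    print(f"{name}: {n} frames, {minutes:.1f} min")
print(f"Total: {total_frames} frames, {emission_minutes:.1f} min of emission data")
print(f"Session including breaks: {session_hours:.2f} h")
```

Under this tally the three scans contain 35 frames and 178.5 minutes of emission data, which together with the two breaks is consistent with the stated session length of approximately 3.5 hours.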

PET data analysis

Voxelwise D2/D3 binding potential (BPND) images were calculated using the simplified reference tissue model, which has been shown to provide stable estimates of [18F]fallypride BPND (Siessmeier et al., 2005). The cerebellum served as the reference region because of its relative lack of D2/D3 receptors (Camps, Cortes, Gueye, Probst, & Palacios, 1989). The cerebellar reference region was obtained from an atlas provided by the ANSIR laboratory at Wake Forest University. Limited PET spatial resolution introduces blurring and causes signal to spill onto neighboring regions. Because the anterior cerebellum is located relatively proximal to the midbrain (the location of dopamine neurons) and the colliculi (which possess DRD2 receptors), only the posterior three quarters of the cerebellum was included in the ROI in order to avoid contamination by [18F]fallypride signal from the midbrain and colliculi. The cerebellum ROI also excluded voxels within 5 mm of the cortex to prevent contamination by cortical signals. The putamen ROI, drawn according to guidelines by Mawlawi et al. (2001), served as the receptor-rich region in the analysis. The cerebellum and putamen ROIs were registered to each subject’s T1 image using FSL nonlinear registration of the MNI template to individual subject T1. T1 images and their associated cerebellum and putamen ROIs were then coregistered to the mean image of all realigned frames in the PET scan using FSL linear registration (http://www.fmrib.ox.ac.uk/fsl/, Version 6.00). Emission images from the three PET scans were merged temporally into a 4-D file. To correct for motion during scanning and misalignment between the three PET scans, all PET frames were realigned to the twentieth frame using SPM8 (www.fil.ion.ucl.ac.uk/spm/). Model fitting and BPND calculation were performed using the PMOD Biomedical Imaging Quantification software (PMOD Technologies, Switzerland).
BPND images represent the ratio of specifically bound ligand ([18F]fallypride in this study) to its free concentration. Mean BPND values in the midbrain and three striatal ROIs (caudate, putamen, and ventral striatum) were extracted to regress against BOLD activity associated with reward valuation. The midbrain and striatal ROIs were drawn in MNI standard space using previously described guidelines (Dang, O’Neil, & Jagust, 2012a, b; Mawlawi et al., 2001) and registered to PET images using the same transformations that were used for cerebellum registration to PET images (see Fig. 1).
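For readers unfamiliar with the simplified reference tissue model named above, its standard operational equation is sketched below. The notation is an assumption made for illustration and is not taken from this paper: $C_T(t)$ is the time-activity curve in the target voxel or region, $C_R(t)$ is the time-activity curve in the reference region (here the cerebellum), $R_1$ is the ratio of tracer delivery to the target relative to the reference region, $k_2$ is the efflux rate constant from the target tissue, and $\otimes$ denotes convolution.

```latex
C_T(t) = R_1\, C_R(t)
       + \left( k_2 - \frac{R_1 k_2}{1 + BP_{ND}} \right)
         C_R(t) \otimes \exp\!\left( -\frac{k_2\, t}{1 + BP_{ND}} \right)
```

Fitting this equation to the measured curves yields the three parameters $R_1$, $k_2$, and $BP_{ND}$ without requiring arterial blood sampling; the specific fitting settings used in PMOD are not reported in the text and are not assumed here.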

Fig. 1

ROIs and BPND images. a Striatal (top) and midbrain (bottom) ROIs used for extracting BPND. b Example of one subject’s [18F]fallypride BPND image in native PET space. BPND was highest in the striatum (top) and the midbrain (bottom). Note. Scaling is different in the bottom figure to allow better visualization of the midbrain. (Color figure online)

Reward valuation task

The task was adapted from a design by D’Ardenne, McClure, Nystrom, and Cohen (2008; see Fig. 2). On each trial, subjects were shown a number and asked to guess whether the second number, hidden under a white box, was smaller or larger than the first number. Both numbers were between zero and 10 and were pseudorandomly chosen so that they were never equal. Each trial carried a monetary value between $0.25 and $1.25, which was displayed at the top of the stimulus window. Subjects won the specified amount for correct guesses and lost that amount for wrong guesses. The probability of guessing correctly was greatest when the first number was closer to zero or 10 and decreased as the first number got closer to 5. The first numbers were selected from a distribution that yielded a 66% probability of guessing correctly. Subjects were required to perform above chance level (i.e., 50% accuracy) to be included in the analyses, and all subjects passed this criterion. To minimize habituation of reward processing, the value of each trial increased with the trial number such that early trials were valued at $0.25 and increased periodically to reach $1.25 for the last few trials.
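The dependence of guessing accuracy on the first number can be illustrated with a short sketch. It rests on assumptions that are not stated in the text: the numbers are taken to be the integers 0 through 10, the hidden number is taken to be equally likely to be any value other than the first, and the subject is assumed to always choose the likelier side. The function name `p_correct` is hypothetical.

```python
from fractions import Fraction

NUMBERS = range(0, 11)  # assumption: integers 0..10 inclusive


def p_correct(first):
    """Chance of a correct guess when always choosing the likelier side.

    Assumes the hidden number is uniform over all values except `first`.
    """
    others = [n for n in NUMBERS if n != first]  # the two numbers are never equal
    larger = sum(n > first for n in others)
    smaller = len(others) - larger
    return Fraction(max(larger, smaller), len(others))


for first in NUMBERS:
    print(f"first number {first:2d}: p(correct) = {float(p_correct(first)):.1f}")
```

Under these assumptions the probability is 1.0 at the extremes (0 or 10) and falls to 0.5 when the first number is 5. The distribution of first numbers that yielded the overall 66% figure is not specified in the text and is not reconstructed here.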

Fig. 2

Task schematic. Subjects had to guess whether the number hidden under the white box was greater or less than the number shown. Numbers ranged from zero to 10. Subjects won or lost the money amount shown if they guessed correctly or incorrectly. Reward value for each trial ranged from $0.25 to $1.25.

Subjects practiced for 4 minutes or until they understood the task, as indicated by above-chance performance. During scanning, each subject performed four blocks of 25 trials each; one subject completed only three blocks due to technical failure during the fourth block. Each trial began with a fixation cross lasting from 4 to 14 seconds. Then, the first number and the trial value appeared for 2 seconds. During this reward valuation and decision period, subjects were asked to guess whether the second number was greater or less than the number shown. Yellow arrows then appeared, signaling subjects to make their response by pressing the response pad with their index or middle finger to indicate a greater than or less than choice, respectively. Immediately after subjects made their response, a delay window showing green arrows appeared for 6 to 10 seconds. If subjects did not respond within 2 seconds of the yellow arrows appearing, the delay window would appear at the end of those 2 seconds. Lastly, feedback appeared for 4 seconds, telling subjects whether they guessed correctly or incorrectly or did not respond. If subjects made a response, the feedback also indicated the amount they won or lost.

MRI data acquisition

Structural and functional MRI scans were performed on a 3 Tesla Philips Achieva scanner located at the Vanderbilt University Institute for Imaging Science. T1-weighted high-resolution 3-D anatomical scans (1 × 1 × 1-mm resolution) were obtained for each participant. Functional (T2* weighted) images were acquired using a gradient-echo echo-planar imaging (EPI) pulse sequence with the following parameters: TR = 2,000 ms; TE = 28 ms; flip angle = 79°; 38 axial oblique slices (3.2 mm thick, 0.35 mm gap) oriented approximately 15 degrees from the AC-PC line. Two hundred and thirty-four volumes were acquired during two task blocks and 227 volumes during the other two blocks.

fMRI data analysis

We used FSL for preprocessing and statistical analyses. Preprocessing included motion correction with MCFLIRT, brain extraction with BET, spatial smoothing with a 5-mm full width half maximum Gaussian kernel, and high-pass temporal filtering (100 s). Statistical analyses were performed using a general linear model implemented by FEAT. FILM prewhitening was applied to correct for temporal autocorrelation. Temporal derivatives and temporal filtering were included to improve fitting of the model to the data. Reward valuation was modeled as the expected value, which is the product of reward magnitude and reward probability. Expected value in the valuation/anticipatory phase is inversely related to prediction error in the feedback phase on correct trials since both depend on the likelihood of guessing correctly given the relative distance from the number 5. As such, it is not always clear whether BOLD signals associated with anticipatory reward valuation are distinct from the subsequent prediction error signals (Niv, 2009). To minimize the overlap between expected value and subsequent prediction error BOLD signal, expected value was modeled during the reward valuation phase along with negative and positive prediction error parameters modeled during the feedback/outcome phase as variables of no interest, for each subject and each task block.
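The inverse relation between expected value and the prediction error on correct trials can be made concrete with a small sketch. Expected value follows the definition given in the text (magnitude times probability). The prediction error formula, obtained outcome minus expected value with a loss coded as a negative outcome, is an assumption made for illustration; the exact parameterization of the prediction error regressors is not given in the text, and the function names are hypothetical.

```python
def expected_value(magnitude, p_correct):
    """Expected value as defined in the text: reward magnitude times probability."""
    return magnitude * p_correct


def prediction_error(magnitude, p_correct, correct):
    """Assumed form: obtained outcome minus expected value.

    A correct guess wins `magnitude`; a wrong guess loses it.
    """
    outcome = magnitude if correct else -magnitude
    return outcome - expected_value(magnitude, p_correct)


magnitude = 1.00  # dollars, within the task's $0.25-$1.25 range
for p in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    ev = expected_value(magnitude, p)
    pe_win = prediction_error(magnitude, p, correct=True)
    print(f"p = {p:.1f}  EV = {ev:.2f}  PE on a correct trial = {pe_win:.2f}")
```

For a fixed magnitude, expected value rises with the probability of a correct guess while the positive prediction error on a correct trial shrinks, which is why the two signals are modeled in separate task phases.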

For group-level analyses, we first coregistered first-level results to the T1-weighted anatomical image using boundary-based registration (Greve & Fischl, 2009). T1 images and associated first-level results were normalized to MNI space using FSL nonlinear registration (Andersson, Jenkinson, & Smith, 2007). Then we averaged first-level results across the four task blocks for each subject and averaged across subjects to isolate patterns of activity specific to expected value. To examine relations between [18F]fallypride BPND and BOLD activity, we regressed BPND, with age and sex as covariates of noninterest, in a voxelwise analysis against subject-level contrast maps for expected value. All contrast maps were thresholded at z > 2.3 with cluster thresholding (p < .05) to correct for multiple comparisons; FSL cluster thresholding has been shown to be less likely to produce false positives compared with cluster thresholding by other frequently used fMRI analysis software (Eklund, Nichols, & Knutsson, 2016). Regression results were masked with the pattern of activity associated with expected value.
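The logic of regressing BPND against expected-value activation with covariates of noninterest can be sketched for a single voxel or ROI as follows. This is not the FSL voxelwise analysis described above: it is an ordinary least squares illustration on synthetic data, with no cluster thresholding, and every number in it (the simulated BPND values, ages, effect size, and noise level) is invented for demonstration. The function name `ols_t` is hypothetical.

```python
import numpy as np


def ols_t(y, predictors):
    """Fit y = intercept + predictors; return (betas, t-values, residual df)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ betas
    df = len(y) - X.shape[1]
    sigma2 = resid @ resid / df                      # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return betas, betas / se, df


rng = np.random.default_rng(0)                       # synthetic data only
n = 14                                               # sample size from the text
bpnd = rng.normal(1.5, 0.3, n)                       # hypothetical midbrain BPND
age = rng.normal(34.0, 8.4, n)                       # hypothetical ages
sex = np.array([1.0] * 5 + [0.0] * 9)                # five females, nine males
bold = 2.0 * bpnd + rng.normal(0.0, 0.5, n)          # simulated EV-related signal

betas, tvals, df = ols_t(bold, [bpnd, age, sex])
print(f"BPND slope = {betas[1]:.2f}, t({df}) = {tvals[1]:.2f}")
```

With 14 subjects and four model columns (intercept, BPND, age, sex), this sketch leaves 10 residual degrees of freedom; the slope on BPND is the quantity of interest, and age and sex only absorb variance.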

Results

Reward valuation and BOLD activation

Performance accuracy ranged from 58% to 76%, with a mean of 72%. Expected value, modeled during the valuation/anticipatory phase, correlated with bilateral activation in the lateral prefrontal cortex, lateral parietal cortex, cingulate, insular cortex, striatum, thalamus, and midbrain (see Fig. 3a and Table 1).

Fig. 3

Reward valuation, BOLD activation, and BPND. a Expected value correlated with activation in the lateral prefrontal cortex, lateral parietal cortex, cingulate, insular cortex, striatum, thalamus, and midbrain. b Midbrain BPND correlated positively with BOLD activity associated with expected value in the left ventral striatum/caudate (peak z stat = 3.27, peak coordinate: x = −14, y = 6, z = 8). Results were thresholded at z > 2.3 with cluster thresholding (p < .05). c Z stats extracted from the left ventral striatal ROI in b were plotted against midbrain BPND to show the correlation graphically. (Color figure online)

Table 1 Peak coordinates of BOLD activation associated with expected value

BOLD activation and [18F]fallypride BPND

Voxelwise analyses examining the association between BPND and reward valuation-related BOLD activations, controlling for age and sex, revealed that BPND in the midbrain ROI correlated significantly with BOLD activation in the left ventral striatum (extending to the left caudate) such that greater BPND correlated with greater BOLD activation (see Fig. 3b). We extracted mean z statistics for each subject from this left ventral striatal ROI and plotted them against midbrain BPND to show the correlation graphically (see Fig. 3c). BPND in the striatum (caudate, putamen, or ventral striatum) did not correlate with BOLD activation associated with expected value (z > 2.3, cluster thresholding p < .05).

An average of 6.6 ± 3.9 months separated the PET and fMRI scans. The relation between midbrain BPND and reward valuation-related BOLD activations remained significant after controlling for the time difference between PET and fMRI scans, t(11) = 3.08, p = .011.

Discussion

The present study aimed to examine whether D2/D3 BPND in the midbrain was related to individual differences in BOLD activity during reward valuation. Reward valuation, operationalized here as expected value in a guessing task, was correlated with neural activity in the striatum, thalamus, midbrain, lateral prefrontal cortex, lateral parietal cortex, cingulate, and insula. A number of these brain areas have been commonly associated with reward processing (Der-Avakian & Markou, 2012; McClure, York, & Montague, 2004; Montague, King-Casas, & Cohen, 2006; O’Doherty, 2004). Although models of dopamine functions in the striatum have often emphasized prediction error signaling, the emergence of striatal, particularly ventral striatal, activations in relation to reward valuation is consistent with both electrophysiological recordings in the ventral striatum of monkeys that show neural activity responsive to expected value (Samejima et al., 2005; Schultz et al., 1992), and fMRI data during performance of reward tasks (Knutson et al., 2001; Schott et al., 2008).

The novel finding from this study is that individual differences in DRD2 BPND in the midbrain were positively related to the degree to which BOLD responses in the ventral striatum increased in relation to expected value. This finding complements data from nonhuman primates showing that midbrain dopamine activity adaptively scales with expected value (Schultz, 2010; Tobler et al., 2005). However, interpretation of the present result must be considered in light of two sources of individual differences in midbrain DRD2 BPND. Midbrain BPND reflects both the number of dopamine neurons in the midbrain and the density of autoreceptors on those neurons. At present, the relative contributions of these two determinants of midbrain BPND are not known.

If interpreted as reflecting numbers of dopamine neurons, the positive direction of the observed relation between midbrain BPND and BOLD activity is congruent with evidence that BOLD signal associated with reward anticipation correlates positively with dopamine synthesis capacity (Dreher, Meyer-Lindenberg, Kohn, & Berman, 2008), as well as evidence that amphetamine- and task-induced dopamine release in the striatum (particularly the ventral striatum) correlates positively with BOLD activation associated with reward anticipation in the monetary incentive delay task (Buckholtz, Treadway, Cowan, Woodward, Benning, et al., 2010a; Schott et al., 2008). Conversely, lesions of dopaminergic pathways reduce metabolic activity in target sites (Schwartz, Sharp, Gunn, & Evarts, 1976). As such, a greater number of dopamine neurons would be predicted to produce enhanced valuation signals in the ventral striatum.

However, midbrain BPND also reflects DRD2 autoreceptors, which have downregulatory effects on dopamine release by increasing the threshold for neuronal firing and reducing dopamine synthesis (Cooper, Bloom, & Roth, 2003; O’Donnell & Grace, 1994; Wolf & Roth, 1990). In past studies, we have emphasized the density of autoreceptors rather than the number of dopamine neurons when interpreting DRD2 signals in the midbrain (Buckholtz, Treadway, Cowan, Woodward, Li, et al., 2010b; Zald et al., 2008). The interpretation that higher midbrain BPND primarily reflects autoreceptors is more challenging to reconcile with the current data, in that it would suggest an inverse rather than a positive relationship between midbrain BPND and expected value signals, due to the autoreceptors' inhibition of dopamine synthesis and release. A few factors may account for this discrepancy. First, in the present study, we were specifically measuring the scaling of BOLD signal to different expected reward values. This differs from studies using the monetary incentive delay task, which have examined relations between dopamine and general responses to reward rather than the scaling of reward values. It is possible that dopamine parameters differentially influence overall valuation versus the scaling of values. Second, it must be noted that these past studies have not explicitly reported an association between midbrain DRD2 and BOLD responses. Midbrain BPND explained less than 20% of the variance in striatal dopamine release in Buckholtz, Treadway, Cowan, Woodward, Li, et al. (2010b), and ventral striatal amphetamine-induced dopamine release explained about 25% of the variance in ventral striatal BOLD activations in Buckholtz, Treadway, Cowan, Woodward, Benning, et al. (2010a). Thus, the extent of autoreceptor regulation, and the conditions under which it is most relevant, remain to be fully established.
Future studies assessing the relation between midbrain DRD2 availability, dopamine release, and BOLD response across different components of reward would provide clarity on this issue. Studies utilizing additional markers of presynaptic dopamine functions, such as dopamine synthesis capacity, would also help disambiguate the mechanistic relation between midbrain DRD2 and striatal valuation signals. Furthermore, while dopamine receptors are highly concentrated in the striatum, other neurotransmitters (e.g., glutamate from the cortex) also have a role in striatal functions, so examinations of dopaminergic versus nondopaminergic influences on striatal functions would enhance understanding of the mechanism of the present findings.

Interestingly, the relation between midbrain DRD2 availability and reward valuation-related BOLD activity was specific to the left ventral striatum. This left lateralization of striatal responses to reward has been reported across multiple study populations, encompassing individuals with and without pathology (Brody et al., 2004; Juckel et al., 2006; Linnet, Peterson, Doudet, Gjedde, & Moller, 2010; McClure et al., 2003; Menza, Mark, Burn, & Brooks, 1995; Schott et al., 2008; Tomer & Aharon-Peretz, 2004). Although the weight of evidence favors an association between reward function and left striatum, we note that several studies have emphasized the importance of left/right striatal asymmetry in reward responses (Cannon et al., 2009; Tomer, Goldstein, Wang, Wong, & Volkow, 2008; Tomer et al., 2014), and some have shown greater reward-related responses in the right striatum (Martin-Soelch et al., 2011; Tricomi & Lempert, 2015).

An obvious limitation of this study is the small sample size, which negatively impacts statistical power. However, the sample size of this study is comparable to or larger than that of similar multimodal studies employing PET (Monchi, Ko, & Strafella, 2006; Pappata et al., 2002; Zald et al., 2004). Due to the high costs and necessary radiation exposure, the sample sizes of typical PET studies are small relative to studies employing other imaging methods, such as fMRI. A replication of the present results with a larger sample size for greater statistical power would provide additional support for these findings. Another limitation is that PET-[18F]fallypride data were acquired months before fMRI data acquisition. Although the inclusion of this time difference in the analysis did not change the results, this statistical control may not account for all changes in dopamine function during this time that could have altered the relation between DRD2 availability and reward valuation-related fMRI activation. Cognitive performance and associated fMRI activation, particularly in frontostriatal circuitry, which is relevant for the current study, have high concordance across one year (Aron, Gluck, & Poldrack, 2006). Published data on the long-term stability of [18F]fallypride binding are lacking at present. However, D2-like receptor availability as measured by [18F]fallypride shows good test–retest reliability across time periods of a month or more and thus appears to provide a reasonably stable index of individual differences in striatal dopamine D2-like function (Mukherjee et al., 2002).

In summary, this study provides in vivo evidence for a relationship between DRD2 availability and reward valuation-related BOLD activity in humans. The findings should contribute to a better understanding of the neurobiological influences on decision-making, in terms of both everyday choices among healthy individuals and abnormalities of reward valuation in psychopathology.