Abstract
Neurons comprising nigrostriatal system play important roles in action selection. However, it remains unclear how this system integrates recent outcome information with current action (movement) and outcome (reward or no reward) information to achieve appropriate subsequent action. We examined how neuronal activity of substantia nigra pars compacta (SNc) and dorsal striatum reflects the level of reward expectation from recent outcomes in rats performing a reward-based choice task. Movement-related activity of direct and indirect pathway striatal projection neurons (dSPNs and iSPNs, respectively) were enhanced by reward expectation, similarly to the SNc dopaminergic neurons, in both medial and lateral nigrostriatal projections. Given the classical basal ganglia model wherein dopamine stimulates dSPNs and suppresses iSPNs through distinct dopamine receptors, dopamine might not be the primary driver of iSPN activity increasing following higher reward expectation. In contrast, outcome-related activity was affected by reward expectation in line with the classical model and reinforcement learning theory, suggesting purposive effects of reward expectation.
Similar content being viewed by others
Introduction
Dopamine neurons (DANs) in the substantia nigra pars compacta (SNc) play an important role in movement and reward signaling1,2. Recent studies report that the phasic activity of SNc DANs changes synchronously with action onset, and that optogenetic manipulation of this activity instantaneously affects the initiation of action2,3, thereby suggesting that phasic DAN activity is involved in immediate action modulation. It is currently believed that SNc DANs influence the dorsal striatum to perform this function4. The majority of striatal neurons consist of projection neurons (SPNs)5, which give rise to the two intrinsic pathways of the basal ganglia, the direct and indirect pathways (dSPNs and iSPNs, respectively)1,6,7. In general, these two striatal pathways are hypothesized to exert an antagonistic effect on motor control through the excitation and inhibition of the downstream thalamus and cortex1,6,8, in agreement with reports on their antagonistic involvement in locomotor activity9, reinforcement10,11, and drug sensitization12. Although it is well acknowledged that dopamine depolarizes and hyperpolarizes dSPNs and iSPNs through dopamine D1 and D2 receptors, respectively13, dSPNs and iSPNs are now known to be activated concurrently to produce coherent sequences of voluntary actions and to convey similar reward information14,15,16,17. In addition, we recently reported that in goal-directed action selection, outcome signals are differentially represented via dSPNs (encoding reward outcome) and iSPNs (encoding no-reward outcome) in the rat dorsomedial striatum (DMS), although showing concurrent activation during movement initiation18. However, there is still a debate on how the phasic activity of SNc dopamine and dorsal striatum neurons integrates the information on recently obtained rewards with the current action and outcome information to achieve the subsequent action appropriately (i.e. action selection guided by reward acquisition in past trials).
On the other hand, many lines of evidence showed that phasic DAN activity is related to outcome information processing rather than action execution, particularly encoding reward prediction error (RPE) to learn the most valuable action through trial and error19,20. The models explaining the roles of dopamine in outcome information processing, in addition to action execution, propose that different functions depend on the timing of the DAN activity change (e.g. the phasic activity of DANs is mainly related to the RPE signal, while persistent activity changes are related to action execution)21,22. Besides the DANs, ~30% of the neurons in the SNc are GABAergic and may be critically involved in the regulation of DAN activity23. Moreover, the nigro-striatal system exhibits a medio-lateral functional and anatomical segregation24. In rodents, the medial and lateral regions of the SNc (hereafter, mSNc and lSNc, respectively) send distinct reward-related dopamine signals to the DMS or dorsolateral striatum (DLS)25,26,27. In monkeys, the mSNc encodes motivational value, whereas the lSNc encodes motivational salience28. It is also known that the target DMS and DLS neurons have different functional properties29,30, receiving topographical inputs from SNc and various areas of the cerebral cortex31,32. In particular, DMS is associated with goal-directed behavior33,34, while DLS is associated with the formation of habitual behavior34,35.
Taken together, a pivotal question arises as to how the different neuron populations of the nigro-striatal system participate in the integration of previous reward information with current action- and outcome-related activity. To address this question, we trained rats under head-fixed conditions to complete a reward-based choice task18. The rats had to select and execute the adequate action based on the presence or absence of a reward in past trials. We examined the action- and outcome-related activity of optogenetically identified striatal and nigral projection neurons during task performance, together with measurement of local dopamine release in the dorsal striatum.
Results
Contribution of recent outcomes to current action choice strategy
To study how the action- and outcome-related neuronal activity and nigrostriatal dopamine transmission relate to recently obtained rewards during decision making, we trained adult wild type (WT) rats (n = 6) and transgenic rats expressing Cre recombinase under Tac1 (n = 7), D2R (n = 6), or TH (n = 7) promotor to perform a push/pull choice task based on probabilistic reward18 (Fig. 1a). After holding a lever in the center position for more than 300 ms, a visual cue (Go-cue) was provided, serving as a signal for the rats to either pushing or pulling a lever based on the reward probability assigned to each action. 70% of correct (preferable) responses (e.g., push in the case of Fig. 1a) and 10% of incorrect (unpreferable) responses (pull in the case) were followed by a high-tone sound (10 kHz, 60 dB, 0.3 s) signaling a reward outcome. The reward (water) was then delivered with a delay of 0.3 s. Conversely, the remaining responses were followed by a low-tone sound (4 kHz, 60 dB, 0.3 s) indicating a no-reward outcome. In this case, no water was delivered. The high-probability rewarded choice (e.g., push in Fig. 1a) was changed after at least 30 correct choices and a 79% correct rate in the last ten trials. Individual rats mastered the task (error rate <15%) within 25 days of training. On average, the rats adapted to select actions associated with high reward probability within 15 trials and then continued to select them in more than 79% of subsequent trials (Fig. 1b, c). Next, we evaluated the effects of recent outcomes on the upcoming action choice. The contribution of recent outcomes to the action choice strategy can be estimated by logistic regression36. We used a regression model in which the probability of staying or switching selection in the next trial was determined by the rats’ recent outcomes (Fig. 1d). The contribution of recent outcomes declined with the passage of trials. Rewards in the previous five trials had a significant effect on choices in the upcoming trial, persuading the animal to stay with the same choice (positive regression coefficients; β = 0.27 ± 0.36, p = 0.0015 with the fifth-past trial). The absence of reward in the previous two trials promoted switching (negative regression coefficients; β = −0.18 ± 0.26, p = 0.04 with the second-past trial). Given that the choice on a given trial was influenced by the outcomes of the preceding five trials, we quantified the average reward experience as a proportion, referred to as the ‘reward rate’ (Fig. 1b, green). Specifically, the reward rate is calculated as the mean number of rewards received in the last five trials (without including current trial), i.e., the number of rewards divided by five. This yields a value between 0 and 1, with 0 indicating no rewards received in the last five trials, and 1 indicating that all five last trials were rewarded. By using this proportional scale, we can accurately represent the recent reward experience for each trial. We observed that the rats implemented a win-stay lose-switch strategy in which rewards served as evidence to stay with the same choice, and no-reward outcomes promoted switching (Fig. 1e). The reaction time (time from Go-cue to movement onset; 218 ± 61 ms overall) and lever movement time (143 ± 224 ms) were also significantly correlated with reward rate (Fig. 1f). These results suggest that the rats used the outcome information from up to the recent five trials to guide their behavior. Based on the contribution of the reward rate to the next choice, from here on, we will evaluate how the reward rate correlates with dopamine release and activity of neurons in the nigral and striatal regions.
Striatal dopamine release correlated with reward rate
To explore how dopamine release in dorsal striatum related to the reward rate, we injected AAV vector into DMS or DLS to express dLight 1.3b and monitored the fluorescence signal around different task-related events (Go-cue, movement onset, and outcome tone: The movement onset occurred after a variable delay following the Go-cue, i.e., reaction time) using fiber photometry (Fig. 2a and Supplementary Fig. 1). The dLight signal at 470 nm excitation showed robust task-related changes (Fig. 2b). In contrast, no change was observed in the control 405 nm excitation (Fig. 2c). The population data showed a rapid robust signal increase after the Go-cue, with an ascending phase coincident with movement onset, illustrating the delay between the Go-cue and the actual movement initiation (Fig. 2d, e). This action-related signal activation was similar in the DMS and DLS (Fig. 2g, h), independent of the current trial outcome. We also observed a characteristic signal change in the DMS and DLS in response to outcome: a clear signal increase after the reward tone and a signal decrease after the no-reward tone (‘dopamine dip’). However, there was a difference in the time course of their changes: in the DLS dopamine signal exhibited a sharp response and a return to baseline within 500 ms (Fig. 2i), whereas in the DMS signal showed a slower and long-lasting change (Fig. 2f). Because we observed different movement-related dopamine signal changes during push and pull selections (Supplementary Fig. 1), we used only the sessions that exhibited a preferred activity modulation during either push or pull selections. To evaluate how the dopamine signal correlated with recently obtained rewards, we used the average number of rewards over the last five trials (reward rate), based on the behavioral analysis (Fig. 1). We evaluated the reward rate correlation with the population dopamine signal at different time windows: immediately after the Go-cue, after the lever movement onset, and after the outcome tone, differentiating between rewarded and non-rewarded trials (Fig. 2d–i, insets; Supplementary Fig. 2; see also Supplementary Table 1). After the movement onset, the DMS and DLS dopamine signals were positively correlated with reward rate (Fig. 2e, h and Supplementary Fig. 2a, b). However, during the outcome period, the signal was stronger when the reward rate was lower, being consistent with positive RPE (Fig. 2f, i and Supplementary Fig. 2a, b).
Identified SNc DAN activity correlated with reward rate
It was reported that some patterns of local dopamine release are inconsistent with the firing activity of dopamine cells37, suggesting that the examination of firing patterns of dopamine cells was needed for understanding the role of dopamine signals in reward information processing. We recorded the neuronal activity in the SNc of task-performing TH-Cre rats under a head-fixed condition. We isolated a total of 1523 neurons during task performance. These neurons were further classified into putative DANs (n = 502) and GABAergic interneurons (n = 1021) according to a clear bimodal distribution of their spike duration (we cannot discard the possibility that the narrow spikes may correspond to neighboring SNr neurons, Supplementary Fig. 3a). We then further selected neurons that showed a significant movement-related and/or outcome-related response (see “Methods” section, DANs: n = 212; INs: n = 138; see Supplementary Fig. 4 for representatives). The ongoing spike rates of the putative GABAergic neurons (3.01 ± 6.2 Hz) were significantly higher than those of putative DANs (1.14 ± 2.25 Hz; p = 1.42e−04, Wilcoxon rank-sum test), consistent with previous reports37. To confidently differentiate between the SNc DANs projecting to the DMS and those projecting to the DLS (SNc-DMS and SNc-DLS DANs, respectively), we used an optogenetically evoked spike collision test with antidromic stimulation of either DMS or DLS while recording SNc neurons of TH-Cre rats38. We injected AAV2-EF1α-Flex-ChRWR/Venus into the left SNc of TH-Cre rats. The expression of Venus in the striatum and the coexpression of Venus and TH in the SNc DANs were confirmed histologically (Fig. 3a). Figure 3b shows representative tetrode traces of antidromically evoked spikes (black) and their disappearance due to collision with spontaneous spikes (yellow) in an SNc recorded neuron. We identified 33 SNc-DMS DANs and 23 SNc-DLS DANs. The latency of the antidromically evoked spikes in SNc-DMS DANs and SNc-DLS DANs was 14.5 ± 4.6 ms and 12.6 ± 3.8 ms, respectively (Supplementary Fig. 5a, b). All DANs exhibited a wide spike waveform (SNc-DMS: 1.05 ± 0.16 ms; SNc-DLS: 0.96 ± 0.12 ms) and a low spike rate (SNc-DMS: 2.02 ± 1.60 Hz; SNc-DLS: 1.92 ± 1.45 Hz; Supplementary Fig. 5c, d). We then further selected neurons that showed a significant movement-related and/or outcome-related response (see Methods), 23 SNc-DMS and 22 SNc-DLS DANs were obtained in both. To evaluate how the neuronal activity variated depending on the reward rate, we used only the neurons that exhibited preferred activity modulation during either push or pull selection. Nineteen SNc-DMS and 11 SNc-DLS DANs exhibited higher activation during either a push or pull selection. These identified task-related neurons showed a mediolateral segregation (Supplementary Fig. 5e, f). Some neurons classified as GABAergic interneurons exhibited some response to the light stimulation. However, this response did not comply with the criteria used for the analyzed neurons (i.e., fixed latency, small jitter, collision test). So, we did not include these data of GABAergic interneurons in the analysis.
Both SNc-DMS and SNc-DLS DANs exhibited robust phasic activation after the Go-cue. Consistent with the activity of the putative SNc DANs (Supplementary Fig. 3b), the identified striatum-projecting DANs showed a characteristic pattern in response to reward and no-reward outcomes (Fig. 3c), i.e., increased activity after the reward tone and suppression after the no-reward tone. The latency of the DANs activity was shorter than that of the dopamine signal response (Fig. 3c vs. Fig. 2e, h), as expected considering the time needed for dopamine release and uptake39. Contrastingly, putative GABAergic interneurons displayed a phasic short-latency activation followed by prolonged activation after the reward tone, and a slower activation followed by prolonged suppression in response to no-reward tone (Supplementary Fig. 3b). SNc-DMS DANs showed higher activation than SNc-DLS DANs in response to a reward (p = 2.6e−21, rank-sum test). Additionally, the activity suppression after no-reward in SNc-DLS DANs was smaller than in the SNc-DMS DANs (p = 0.005, rank-sum test). This response pattern agrees with previous reports40, where DLS-projecting neurons presented activation in response to a reward but showed a lack of a dopamine dip after reward omission.
Next, we evaluated how the neuronal activity correlated with the reward rate (Supplementary Fig. 3c). Consistent with the putative DANs (Supplementary Fig. 3d), the SNc-DMS and SNc-DLS DAN activity after Go-cue and movement onset was positively correlated with reward rate (Fig. 3d, e, g, h). Additionally, only the SNc-DMS DANs exhibited a positive correlation with reward rate during the outcome period (Fig. 3f), whereas the SNc-DLS DANs had a tendency for a negative correlation (Fig. 3i). The correlation of the movement-related SNc-DMS DANs was significantly stronger than that of the SNc-DLS DANs (rewarded: p = 0.009; non-rewarded: p = 0.047, Fisher’s z-test, Fig. 3j). Meanwhile, no significant difference in the reward rate correlation of movement-related activity was found between rewarded and no rewarded trials in SNc-DMS or SNc-DLS DANs (Fig. 3j). These results suggest that the different neuronal populations in the SNc may have a similar movement-related activity correlation with reward rate, consistent with change in state value37,41. On the other hand, the reward rate signaling during the outcome period differed across population types and their mediolateral distribution (Fig. 3f, i). Additionally, the correlation of the outcome-related activity of SNc-DMS DAN with the reward rate notably differed from the dopamine signal in the DMS (Fig. 3f vs. Fig. 2f, inset), where the SNc-DMS DAN exhibited a higher activity as the reward rate was higher, while the dopamine signal exhibited the opposite pattern. These results indicate that dopamine release may convey different information to dopamine neuron firing, supporting the idea that dopamine local release dynamics may be controlled in different ways37.
Identified dSPN and iSPN activities correlated with reward rate
Previous studies have shown a high degree of specificity in the interactions between different cortical and thalamic pathways according to postsynaptic striatal cell types42. Our results showed that the action-related DAN activity and local dopamine release increased with higher reward expectation. The classical model for direct (dSPN) and indirect pathway neurons (iSPN) in the striatum postulates excitatory and inhibitory effects of dopamine on dSPN and iSPN, respectively6. Therefore, a higher dopamine concentration should exert opposite effect on dSPNs and iSPNs acting through dopamine D1 and D2 receptors, respectively. We performed DMS and DLS recordings and selectively tracked the activity of dSPNs (expressing D1 receptors) and iSPNs (expressing D2 receptors) using AAV2-EF1α-Flex-ChRWR/Venus or AAV2-Syn-Flex-rcChrimsonR-tdTomato injections in Tac1-Cre (for dSPN) or Drd2-Cre (for iSPN) transgenic rats. We confirmed that dSPNs and iSPNs project to the SNr (striatonigral neurons) and GPe (striatopallidal neurons), respectively (Figs. 4a and 5a).
We isolated a total of 2002 neurons from the dorsal striatum and classified them using three parameters: ongoing spike rate, spike duration and coefficient variation (CV) of interspike interval (ISI) (Supplementary Fig. 6a). Three clearly separate clusters were formed, assignable to the previously described striatal neuron subpopulations, i.e., SPNs (n = 716), tonically active neurons (TANs, n = 1224), and fast-spiking interneurons (FSIs, n = 62; Supplementary Fig. 6b): we obtained the task-related neurons using the same criteria as for the SNc DANs (SPNs, n = 138; TANs, n = 274; and FSIs, n = 17). Putative DMS-SPNs and DLS-SPNs exhibited a phasic activation in response to the Go-cue and coincided with the movement onset (Supplementary Fig. 6c). Movement- and outcome-related activities of SPNs were positively correlated with the reward rate (Supplementary Fig. 6d). TANs in both DMS and DLS exhibited a subtle activation after the Go-cue and movement-onset, which was positively correlated with the reward rate in non-rewarded trials. FSIs showed activation after Go-cues and movement onset, which was also positively correlated with reward rate in non-rewarded trials. These results show that the positive reward rate correlation with the action-related activity was predominant in SPNs and interneurons.
For the identification of the dSPNs, we confirmed the responsiveness to single light pulses delivered next to the recording sites in the DMS or DLS, selecting only those neurons that exhibited short latency (<8 ms), small jitter (<1 ms), and a high waveform correlation between light-evoked and spontaneous spikes (Fig. 4b, c; see “Methods” section). We identified 50 task-related dSPNs in the DMS and 28 dSPNs in the DLS. We used only those neurons that exhibited a preferred activation during the movement period (DMS-dSPNs: n = 44; DLS-dSPNs: n = 22). dSPNs showed robust activation during the movement onset in both the DMS and DLS (Fig. 4d). DMS- and DLS-dSPNs showed a higher activation in response to rewarded than non-rewarded outcomes (Fig. 4d), in agreement with previous results from our laboratory18. The movement-related activity of DMS-dSPNs showed positive correlation with reward rate in both rewarded and no-rewarded trials (Fig. 4f). That of DLS-dSPNs exhibited positive correlation with reward rate only in no-rewarded trials (Fig. 4i and Supplementary Fig. 2e, f). The outcome-related activity of DLS-dSPNs after no-reward was positively correlated to reward-rate (Fig. 4j).
For the identification of the iSPNs, we evaluated the responsiveness of the striatopallidal neurons to single light pulses delivered from the target GPe (Fig. 5b, c). We matched the light-evoked spikes to the spontaneous spikes, and thereby distinguished iSPNs from D2-positive TANs, identifying 31 task-related iSPNs from DMS and 32 iSPNs from DLS. We analyzed only those neurons that exhibited a preferred activation during the movement period in accordance with dopamine analysis (DMS-iSPNs, n = 22; DLS-iSPNs, n = 20). iSPNs showed robust activation during the movement period in both DMS and DLS (Fig. 5d). Overall, peak latencies of movement-related firing activity were similar among different groups of nigral and striatal neurons except for DMS–SNc neurons (Supplementary Fig. 7). Importantly, the activation of those neurons preceded dopamine release in the striatum (e.g., Fig. 2e. h vs. Figs. 3c, 4d, and 5d), suggesting possible common inputs other than dopamine.
In contrast to dSPNs response, DMS-iSPNs exhibited activation after no-reward outcomes and suppression after reward outcomes (Fig. 5d and Supplementary Fig. 4d) consistent with our previous observation18. However, DLS-iSPN activity did not differ between rewarded and non-rewarded trials at a population level, although some neurons showed higher activation after no reward tone (Supplementary Fig. 4f). The activity of both DMS- and DLS- iSPNs populations was positively correlated with reward rate during movement and outcome periods (Fig. 5e–j and Supplementary Fig. 2g, h). The DMS-iSPNs exhibited the strongest correlation with reward rate in both movement- and outcome related activity among DMS and DLS SPNs (Fig. 5k).
Discussion
In the present study, we investigated how previous rewards information is integrated into action- and outcome-related activity through distinct types of projection neurons and interneurons in SNc and dorsal striatum according to their mediolateral topography (Fig. 6). We used rats under a head-fixed condition performing a reward-context-dependent lever push/pull choice task (Fig. 1), and examined local dopamine release, DMS- and DLS-projecting SNc DAN activity, and DMS and DLS SPN and interneuron activity. To identify different neuron types, we applied optogenetics and electrophysiological techniques, accomplishing high spatial and temporal resolution at a cellular level18,43,44,45.
In our study, the action-related (both Go-cue- and movement-related) local dopamine release (Fig. 2) and activity of both DMS- and DLS-projecting SNc DANs (Fig. 3) were enhanced by reward expectation reflected by reward rate (Fig. 6; green, small upward arrows indicate enhancement). Additionally, the action-related activity of most types of DMS and DLS neurons also exhibited similar positive correlation with the reward rate (Figs. 4–6). According to the classical model1,6,10, dopamine in the striatum exerts an excitatory and inhibitory effect on dSPNs and iSPNs, respectively, thereby regulating the activity balance of the downstream thalamus and cortex antagonistically. However, it was reported that the concomitant action-related activity of dSPNs and iSPNs may convey similar reward information14,18, which cannot be explained by the classical model alone. In this study, the action-related activity of unidentified SPN populations was positively biased toward higher reward rate in both the DMS and DLS (Fig. 6), suggesting an enhancing effect of reward expectation on their activity. In particular, the action-related activities of the identified DMS- and DLS-iSPNs were positively correlated with the reward rate, suggesting a potential dissociation between iSPN activity and dopamine release timing or that reward expectation can override such dopamine effects all through mediolateral nigrostriatal and striato-pallidal systems (Fig. 5). The robustness observed in the iSPN activity could potentially offset the inhibitory influence of dopamine on these neurons, such as the anticipated inhibitory effect mediated through dopamine D2 receptors13. It is important to note that the modulation of iSPN activity by dopamine is not proposed to occur instantaneously but rather in correlation with the patterns of dopamine release over a broader time window. As seen in our results, the onset of dopamine release and changes in SNc DAN and iSPN activity do not align on a moment-to-moment basis, suggesting the impact of other factors in this complex dynamic. Dopamine effects can be multifaceted, impacting different aspects of neuronal function including excitability, synaptic plasticity, and long-term changes in gene expression, which are context dependent. It is worth emphasizing that our study does not imply a direct influence of dopamine on iSPN activity, but rather highlights the role of reward expectation in modulating neural activity, potentially through other parallel inputs. As evidenced by our observations, the precise temporal dynamics of this interplay remain an intriguing area for future investigation.
The processing of reward information, as manifested in the action-related activity, might be under the influence of local striatal control, as well as be affected by inputs other than dopamine (Fig. 6). This includes inputs from diverse brain regions such as the medial prefrontal cortex, anterior cingulate cortex, orbitofrontal cortex, motor cortex, thalamus, and globus pallidus. In particular, the role of intratelencephalic (IT) neurons, which constitute an important portion of the corticostriatal projections, cannot be overlooked. These IT neurons, which send projections from the cortex to the striatum, are known to play a critical role in information processing within the striatum, influencing the activity of SPNs during tasks such as the one we used in the present study. There is a large body of evidence suggesting the central role of IT neurons in modulating striatal activity, emphasizing their importance in our understanding of reward-related neuronal activity46,47. In addition, it has been observed that pyramidal tract-type neurons in the motor cortex provide collateral input to the dorsolateral striatum. This input may convey specific or biased information to the downstream striatal circuits48. This illustrates the complexity and multifaceted nature of inputs into the striatum, and the variety of potential sources that could influence the activity of SPNs in a task-dependent manner.
Furthermore, we evaluated the reward information processing in medial and lateral nigrostriatal systems. In rodents, the SNc and dorsal striatum are thought to have medial and lateral topographical and reciprocal connections reflecting different functions4,49; for example, mSNc-DANs encode signals related to motivational value, whereas lSNc-DANs responsible for motivationally related salience4,26. Likewise, DMS is involved in goal-directed behavior whereas DLS is involved in habitual behavior33,34,35. Nonetheless, our results suggest that during choice selection, nigral and striatal neuron systems convey similar reward information regardless of their medio-lateral topographical segregation, in agreement with dopamine signal restoration studies showing that many cognitive processes are shared by DMS and DLS50,51. Additionally, we observed a substantial number of neurons whose Go cue- and movement-related activity correlated with reaction time, which may suggest that the activation during this period is monitoring movement dynamics. Among all the identified populations, the DLS-SNc and DMS-SNc DANs had the higher proportion of units correlated with reaction time (Supplementary Fig. 8), which may suggest that the activation during this period is monitoring movement dynamics in addition to reward monitoring.
On the other hand, the outcome-related activity, typically differentiable according to reward or no-reward, was distinctly modified by recent rewards, depending on the outcome situation, neuron subtype, and area (Fig. 6, red and blue arrows). This discrepancy during the action and outcome periods suggest that reward rate may be processed in two different modes, action-related and outcome-related modes, by the same set of basal ganglia neurons. Moreover, during the outcome period, only a small number of units exhibited a correlation with reaction time or movement time (Supplementary Fig. 8) suggesting that contrary to the movement-related activity, the activity during outcome period is not related to monitoring movement dynamics, but with monitoring the reward or no-reward of selected action. The uniformity of the correlation pattern of movement period with reward rate could reflect an increase in the value of the external cue or actions, or an increase in motivation37,41,52, whereas the diversity during the outcome period may result from adaptive coordination for different brain functions such as outcome feedback, action selection18, state change53 and motivation54. Movement-related activity in the dorsal striatum as well as SNc may be driven by robust inputs including the cortical and thalamic inputs43,55, rendering it less susceptible to modulation by dopamine inputs. Still, the SPNs can be affected by dopamine from SNc DANs activated simultaneously (Fig. 6 and Supplementary Fig. 7). Previous studies suggest that phasic dopamine release in the dorsal striatum is crucial for regulating movement intensity, where increased motivation is linked to increased actions to obtain a reward56. Our findings echo this relationship, as we identified a similar interplay between movement-associated activity in DMS-SNc neurons and SPNs, hinting at a plausible link between dopamine levels and movement intensity under our experimental context. Also, the latency of the DANs firing activity was shorter than that of the dopamine signal response, as expected considering the time needed for dopamine release and uptake39. As for the SNc DANs, the reinforcement learning theory can consistently explain their enhanced action-related activity and reduced outcome-related activity as encoding of an RPE or temporal difference (TD) error19. This pattern indicates that the basal ganglia system may dynamically alter its integration mode, facilitating the real-time processing of past outcome information with each instance of action-outcome feedback.
Additionally, contrary to the movement period, during the outcome period (when the result of the action was presented), the identified DMS-iSPNs responded strongly to the no-reward outcome (Fig. 5d)18, while DMS-projecting SNc-DANs activity and dopamine signal decreased (Figs. 2b and 3c). These results indicate that the effect exerted by the dopamine on the DMS during the outcome period agrees with the classical models, exciting dSPN and suppressing iSPN. Nevertheless, we observed some discrepancies between the firing activity of DANs and the dopamine release measured by fiber photometry. In contrast to the DMS-projecting SNc-DAN responses to no reward (Fig. 3f, i), the outcome-related dopamine signal was higher when reward expectation was lower, consistent with the RPE (Fig. 2f, i). Discrepancies between DAN activity in the ventral tegmental area and dopamine release in the nucleus accumbens have also been reported in behaving rodents37. Another potential explanation is that the influence of the reward rate on dopamine release during the feedback period may be locally controlled. Previous studies revealed that TANs and FSIs play important roles in the processing of reward information57,58,59. For example, it was reported that cholinergic interneurons (corresponding to TANs) in the striatum may accomplish control of striatal dopamine signaling independent of the activity of midbrain cell bodies60. In our study, action-related activity of TANs and FSIs was positively correlated with reward rate, like other neuron types. Also, although the outcome response was subtle, putative DMS-TANs and FSIs exhibited a pattern influenced by the reward rate (Fig. 6).
In further studies, more precise examination of the activity of various neuron types with consideration of pathway-specific activity is essential for advancing our understanding of how reward and action information is integrated in the nigral and striatal neuron systems. Additionally, although there are some limitations in our study addressing a direct dependence of the neural activity on reward history, and further investigations that manipulate reward expectation directly are necessary, this study provides insights into the neural mechanisms underlying reward processing and highlights the potential role of reward expectation in modulating neural activity, shedding light on the neural mechanisms underlying the link between reward and motor control. In conclusion, we have defined the action-related activity of diverse nigral and striatal neuron types is positively enhanced by reward rate, providing further insights into the mechanism of the basal ganglia circuitry underlying reward-based action selection and execution.
Methods
Animals and surgery
All experiments were approved by the Animal Care and Use Committee of Tokyo Medical and Dental University (A2019-274, A2021-041), and were performed in accordance with the Fundamental Guidelines for Proper Conduct of Animal Experiment and Related Activities in Academic Research Institutions (MEXT, Japan) and the Guidelines for Animal Experimentation in Neuroscience (Japan Neuroscience Society). All surgical procedures were performed under appropriate isoflurane anesthesia, and all efforts were made to minimize suffering. Our procedures for animal experiments are described in our previous studies18,43.
Seven tachykinin precursor 1 (Tac1-Cre) transgenic rats (246 ± 28 g, all male), six dopamine D2 receptor-Cre (Drd2-Cre) transgenic rats (262 ± 24 g, all male), seven tyrosine hydroxylase-Cre (TH-Cre) transgenic rats (269 ± 38 g, all male), and six wild-type (WT) Long-Evans rats (248 ± 31 g, all male) were kept in their home cages under an inverted light schedule (lights off at 9:00 A.M.; lights on at 9:00 P.M.). The rats were briefly handled by the experimenter (10 min, twice) in advance. For head-plate (CFR-2, Narishige) implantation, the animals were anesthetized with isoflurane (4.5% for induction and 2.0–2.5% for maintenance, Pfizer) using an inhalation anesthesia apparatus (Univentor 400 anesthesia unit, Univentor) and were placed on a stereotaxic frame (SR-10R-HT, Narishige). Lidocaine (Astra Zeneca) was administered around the surgical incisions. Reference and ground electrodes (Teflon-coated silver wires, A-M Systems; 125 μm diameter) were implanted above the cerebellum. During anesthesia, body temperature was maintained at 37 °C using an animal warmer (BWT-100, Bio Research Center). Analgesics and antibiotics were applied postoperatively, as required (meloxicam, 1 mg/kg s.c., Boehringer Ingelheim; gentamicin ointment, 0.1%, MSD).
After full recovery from surgery (3–7 days later), the rats had ad libitum access to water during the weekends, but during the rest of the week they obtained water only when they performed the task correctly. When necessary, agar was given to the rats in their home cage to maintain them at >85% of original body weight55.
Vector injection
For the optogenetic identification and manipulation experiments, 1 μl of AAV2-EF1α-Flex-ChRWR/Venus (titer: 2.06 × 1012 vg/mL) or AAV2-Syn-Flex-rcChrimsonR-tdTomato (titer: 8.32 × 1012 vg/ml) vector was injected into the DMS (A: +1.0, L: +2.3, DV: +3.7 mm from bregma), DLS (A: −1.0, L: +4.5, DV: +4.6), or SNc (A: −5.5, L: +1.8, DV: +7.5). For the fiber photometry experiments, 0.5 μl of AAV9-Syn-dLight1.3b (1 × 10¹³ vg/mL; Addgene, MA, USA) was injected into the DMS (A: +1.0, L: +2.3, DV: +3.7) or DLS (A: −1.0, L: +4.5, DV: +4.6). The vector solution was slowly injected (200 nl/min) through a 50 μm tip glass capillary using a syringe pump (Legato 100; KD Scientific) with a 10 μl Hamilton syringe. After the injection, the capillary was left in place for 10 min. The experiments were performed at least 3 weeks after the injections.
Behavioral choice task for probabilistic reward
The push/pull choice task based on probabilistic reward was performed by head-fixed rats in a sound-attenuated chamber shielded from light and electricity (60 × 75 × 60 cm, custom-made by O’Hara & Co.). The rats obtained water as a reward (5 μl, three times in 0.3 s, dispensed by a micropump) from a spout in front of their mouth. Orofacial movements related to the consumption of the reward water were monitored by measuring the torque on the spout (KFG-2N amplified by DPM-711B; Kyowa). On the first day of training, the rats were rewarded immediately after they grasped a lever and after they either pushed it forward or pulled it back. The lever glided over a rail of the acrylic plate and returned automatically toward the center position if released. The rats adjusted the lever to the center position to start the next trial. Stoppers for push and pull targets were separated by 10 degrees. On consecutive training days, after the animals were able to achieve rewards 300‒500 times, the lever push or pull was followed by a high-tone sound (instructing a reward outcome; 10 kHz, 60 dB, 0.3 s), whereas the opposite movement was followed by a low-tone sound (instructing a no-reward outcome; 4 kHz, 60 dB, 0.3 s). The 10 kHz tone was followed by delivery of reward water with a delay of 0.3 s, whereas the 4 kHz tone was not followed by a reward. Over the course of 3 days, most rats learned to choose the reward action consistently after about ten trials. Next, before pushing or pulling, the rats needed to hold the lever at the center position for an incremental duration of up to 0.4 ± 0.1 s (Hold 1, pre-Go, Fig. 1a), and then push or pull the lever in response to the Go signal (chamber light, 30 lux). After reaching the push or pull target, rats were required to keep holding the lever for 0.3 s. The Go light stayed on until 1 s after the reward or no-reward tone onset but turned off when the rats erroneously moved or released the lever. As a final step, the high- and low-tone sounds indicating reward and no reward, respectively, became probabilistic for each movement (rewarded at 70% for push, and at 10% for pull in Fig. 1a). There were two blocks of trials with different action–outcome contingencies (push block: push 70% versus pull 10%; pull block: push 10% versus pull 70%), and these were alternated when the rats selected the high-value option in more than 79% of the previous ten trials (about 30‒50 trials). Task training and recording of neuronal and behavioral data continued for 3–4 h a day (total: at least 15 training days in 3 weeks). The lever position was always monitored by the task-training system through an 8-bit encoder.
Behavioral analysis
The reaction time was defined as the point where the cumulative sum of the lever trajectory drifts more than three standard deviations beyond mean trajectory during the holding period. The movement time was defined as the time from movement onset until the lever entered the pull or push area. The contributions of past rewards or lack of rewards on the animals’ current choice were analyzed on a trial-by-trial basis using the following logistic regression model:
where PS(i) is the probability of push selection in the ith trial, and n indicates the number of past trials included in the model (n = 8). The regression coefficients βjReward and βjNo-Reward represent the contributions of past rewards and lack of rewards, respectively, while β0 indicates the intrinsic bias of the animal.
Fiber photometry
The genetically encoded optical dopamine sensor dLight 1.3 was expressed in dorsal striatum using a viral vector approach. In the same surgery, a fiber optic cannula (200 μm core, Thorlabs) was inserted (~100 μm in-depth dorsal to the AAV injection site) and fixed using dental cement. For signal acquisition, custom-written LabVIEW software was modified to control the recording hardware. LEDs of 470 nm (for dLight) and 405 nm (for isosbestic signal) were alternately turned on and off at 40 Hz, and bulk fluorescence was acquired using a photomultiplier (H10721; Hamamatsu, photonics). The signal was sampled at 1 kHz with a data acquisition device (NI USB 6211) and downsampled to 40 Hz for further analysis. The acquired photometry data were processed with custom-written code in MATLAB. First, a fitting curve was estimated and subtracted from the original signal to remove exponential and linear signal decay during the recording session. A linear fit was applied to align the 405 nm signal to the 470 nm signal, and then the fitted 405 nm signal was subtracted from the 470 nm channel and divided by the fitted 405 nm signal to calculate ΔF/F values. The ΔF/F time-series trace was normalized using z-scores to account for data variability across animals and sessions61,62. Representative 470 nm and 405 nm signals are shown separately in Fig. 2b.
Electrophysiological recording
Supported by agarose gel (2% agarose-HGT, Nacalai Tesque) on the brain, a 64-channel silicon probe (Isomura64-4x-tet-lin-A64, with 16 tetrode-like arrangements on four shanks; NeuroNexus Technologies) was inserted vertically into the left DMS (A: +1.0, L: +2.3, DV: +3.7 mm from bregma), DLS (A: −1, L: 4.5, DV: 4.6 from bregma), or SNc (A: −5.5, L: 0.8 ~ 2.4, DV: 7.5 from bregma) using a micromanipulator (SM-15A, Narishige) on a stereotaxic frame (SR-10R-HT, Narishige). Multineuronal (multiple isolated single unit) recordings were performed during the conditioning task. The neuronal activity was amplified by two main amplifiers (32 channel: FA-32, Multi-Channel Systems; final gain, 2000; band-pass filter, 0.5 Hz to 10 kHz) through two 32-channel head-stage miniature preamplifiers (MPA 32I, Multi-Channel Systems; gain 10). The amplified signals were digitized at 20 kHz with two 32-channel hard-disk recorders (LX-120, TEAC).
Spike isolation
Raw signal data were processed offline to isolate the spike events of individual neurons in each tetrode. Spike candidates were detected and clustered using EToS semiautomatic spike-sorting software63,64. Using the manual clustering software Klusters and the viewing software NeuroScope65, spike clusters were further combined, divided, and/or discarded manually, to refine single-neuron clusters according to the presence of refractory periods (<2 ms) in the autocorrelograms and the absence of refractory periods in cross-correlograms with other clusters.
Optogenetic identification of DANs and iSPNs
After completing a session of recording neuronal activity during the operant conditioning task, we investigated whether the recorded neurons were responsive to light stimulation. To identify striatum-projecting SNc neurons, we used the light-evoked collision test, which was previously established18,43,44,55 (see Fig. 3b and Supplementary Fig. 5). Briefly, before the insertion of silicon probes, an optical fiber for stimulation was placed on either DMS or DLS using a micromanipulator (SM-25A; Narishige). To evoke antidromic spikes in specific axonal projections, a blue LED light pulse (intensity, 5–10 mW; duration, 0.5–2 ms, typically 1 ms) was applied through each of the two optical fibers using an ultra-high-power LED light source (see below) and a stimulator. To be classified as projecting neurons, neurons were required to meet several criteria, including constant latency, high frequency following (frequency-following test, two pulses at 100–250 Hz), and collision tests66. The collision test was visually confirmed online to accumulate spike collision data that would be sufficient for post hoc analysis. After offline sorting for spike isolation, we compared filtered tetrode (four-channel) traces that had no spikes before the stimulus (see Fig. 3b, black, control traces) with those that had a spike in one spike cluster (see Fig. 3b, yellow test traces). MATLAB (The MathWorks) was used for these comparisons. If we found antidromic-like (all-or-none and no jittering) spike activities with constant latency (<25 ms) in many of the control traces (Supplementary Fig. 5a, b), we set a time window for counting possible antidromic spikes based on a clear dissociation between averaged control and test traces due to the presence or absence of spikes. A cutoff threshold defined from a receiver operating characteristic (ROC) curve for the distribution of the most negative points within the time window was used to determine whether the spikes were present or not, allowing us to obtain spike and no-spike counts in the control and test events. According to this method, we included spike clusters with control spike probability >50% and test spike probability less than half that of the control. Finally, the passing of the collision test was statistically justified by a 2 × 2 chi-square test (p < 0.05) for spike and no-spike counts in control and test events. The latency of antidromic spikes was defined as the time from the onset of stimulation to the median (the second quartile, 50%) of their peak positions within the time window, and their jitter was defined as the time between the first (25%) and third (75%) quartiles of their peak positions within the time window. In this way, we judged these spikes to be antidromic or not based on the collisional disappearance of antidromic spikes (collision test), as well as their all-or-none properties, absence of jitter (constant latency test; <0.5 ms), and high reliability (frequency-following test; if applicable in the tentative collision test).
To identify iSPNs, an optical fiber was inserted into the external segment of the globus pallidus (GPe; A: −2.0, L: 3.0, DV: 5.5) in six Drd2-Cre rats to stimulate axons of ChR- or ChrimsonR-expressing striatopallidal neurons. A single light pulse (460 nm or 595 nm, <10 mW, 1.0 ms duration) was delivered through the optical fiber (FT400EMT, FC, Thorlabs; NA, 0.39; internal/external diameter, 400/425 μm). The light pulses were generated by an ultra-high-power LED light source (UHP-Mic-LED-460/595, FC; Prizmatix) triggered by a stimulator (SEN-8203; Nihon Kohden) every 3 s18. We considered action potentials to have been elicited by the light pulse if they occurred at a latency shorter than 10 ms with short jittering (<1 ms). We calculated the correlation coefficient for four-channel patterns of mean spike waveforms (0.125 ms before and after the spike peak) between spontaneous and light-evoked spikes and plotted it against the energy of light-evoked responses for each neuron. The light-evoked spike waveforms needed to appear almost identical (correlation coefficient >0.9) to spontaneously occurring waveforms. For the dSPNs identification, a single light pulse was delivered through the optical fiber in the DMS or DLS (adjacent to recording sites) in Tac1-Cre rats to stimulate axons of ChR- or ChrimsonR-expressing striatonigral neurons. We considered action potentials to have been elicited by the light pulse if they occurred at a shorter latency than 8 ms with small jittering (<1 ms). We performed similar waveform correlation analysis as for iSPNs identification (Figs. 4b, c and 5b, c).
Selection of putative SNc and striatal neurons
For the selection of putative SNc DANs and GABAergic interneurons, the spike duration of the SNc recorded neurons was defined as the time from spike onset to the first positive peak of the aligned averaged spike waveform. Different subpopulations formed two clearly separable clusters. Non-optogenetically identified SNc neurons with a spike duration <800 µs were classified as non-dopamine cells. We further subdivided putative DMS- or DLS-projecting neurons according to their mediolateral recording location (medial: 0.8–1.8 mm; lateral: 1.8–2.4 mm from bregma). To ensure that recorded neurons were located in the SNc, we only analyzed cells recorded during sessions with at least one optically identified dopamine cell.
Striatal nonoptically identified task-related neurons were further subdivided into putative SPNs, TANs, or FSIs using k-means clustering applied to three parameters: 1) ongoing spike rate, 2) peak-to-valley width of the four-channel average spike waveform, and 3) coefficient of variation (CV) of the interspike interval (ISI). The different subpopulations formed three clearly separable clusters67 (Supplementary Fig. 6a, b). The silhouette technique was used to evaluate the cluster separation (average silhouette value = 0.603). The waveform peak-to-valley widths were calculated using the waveforms of the whole recording periods. SPN waveforms were typically wide and displayed a long positive second phase. SPNs and TANs were further subdivided into putative DMS- and DLS-SPNs, and DMS- and DLS-TANs, according to the recording site. We did not perform further subdivisions of FSI neurons because of their small numbers.
Immunofluorescence
Brains were transcardially fixed with 4% paraformaldehyde/0.1 M phosphate buffer under deep anesthesia and sectioned with a vibratome (VT1000S, Leica, Wetzlar, Germany). Brain sections were incubated with specific primary antibody, namely, polyclonal anti-GFP (RRID: AB_221569; 1:1000), polyclonal anti-RFP (AB_10013483; 0.25 mg/ml, 1:1000), or polyclonal anti-TH (AB_741693; 1 mg/ml, 1:1000), followed by species-specific secondary antibodies conjugated with Alexa Fluor 488 (AB_2556542; 2 mg/ml, 1:500) or Alexa Fluor 555 (AB_2762834; 2 mg/ml, 1:500; Invitrogen, ThermoFisher). Images were acquired on a confocal microscope (FV1000, Olympus, Tokyo, Japan).
Statistics and reproducibility
To determine whether a neuron showed a significant action-related and/or outcome-related response, we applied the Wilcoxon rank‒sum test to trial-by-trial firing rates during the baseline period (2–3 s before the Go-cue), 0.3 s period after the Go-cue (Go window), 0.3 s period after movement onset (movement window), and 0.5 s period after the onset of reward and no-reward tones (outcome tone window). The perievent time histograms (PETH; bin width, 20 ms) were smoothed by averaging across a 20 ms sliding window with a 10 ms step. Rewarded (red) and non-rewarded (blue) responses were compared using Student’s t test. Because of the considerable variation in background firing rates of individual neurons, we used z-score normalization to measure neuronal responses. To measure responses to movement onset and outcomes, the discharge rates from a given task period were adjusted by subtracting the mean firing rate of the baseline period and then dividing the difference by the standard deviation of the baseline period. After selecting the task-related neurons using the above-described criteria, we determined whether the firing rate was preferentially related to a specific action (push or pull) by comparing the mean activity after pushing or pulling during the movement window. For this further analysis, we selected only neurons that had a spike rate difference of at least 5% between push and pull trials. We only considered the preferred action activity for the reward rate analysis. To quantify the degree of recently obtained rewards, the reward rate was calculated as the mean number of rewards received in the last five trials (without including current trial), i.e., the number of rewards divided by five (reward = 1; no-reward = 0), irrespective of the selection. This allowed us to quantify the average rewards received recently, providing a window into the animal’s immediate reward history. We obtain the Pearson’s correlation coefficient to investigate the relationship between the reward rate and neuronal activity (expressed as a z-score) in individual neurons. We plotted these correlations, with the reward rate on the x-axis and the corresponding z-scored neuronal activity on the y-axis. The slope of the lines in these plots represents the strength and direction of the correlation; positive slopes indicate that neuronal activity increases with higher reward rate, while negative slopes suggest the activity decreases with higher reward rate. We assessed the distribution of correlation coefficient (r) for a particular neuronal population at a specific task event. And compared the distribution of r values and 0 using Wilcoxon rank-sum test (Figs. 2–5). Additionally, we visualized a tendency of the distribution of correlation coefficients in a boxplot for each dataset. On each box plot, the central mark indicates the median, the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers show the extreme data points not considered outliers. Supplementary Table 1 shows statistical tests of dLight signal and neuron types. To compare the strength of the correlation between two or more populations, we used the Fisher’s z-test of the correlation coefficients with Bonferroni correction for multiple comparison.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Data needed to evaluate the conclusions in the paper are present in the paper and/or the supplementary materials. Source data underlying figures can be consulted from: https://doi.org/10.5281/zenodo.8282962.
References
DeLong, M. R. Primate models of movement disorders of basal ganglia origin. Trends Neurosci. 13, 281–285 (1990).
Howe, M. W. & Dombeck, D. A. Rapid signalling in distinct dopaminergic axons during locomotion and reward. Nature 535, 505–510 (2016).
da Silva, J. A., Tecuapetla, F., Paixao, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
Schultz, W. Multiple dopamine functions at different time courses. Annu. Rev. Neurosci. 30, 259–288 (2007).
Kemp, J. M. & Powell, T. P. The site of termination of afferent fibres in the caudate nucleus. Philos. Trans. R. Soc. Lond. B Biol. Sci. 262, 413–427 (1971).
Albin, R. L., Young, A. B. & Penney, J. B. The functional anatomy of basal ganglia disorders. Trends Neurosci. 12, 366–375 (1989).
Gerfen, C. R. et al. D1 and D2 dopamine receptor-regulated gene expression of striatonigral and striatopallidal neurons. Science 250, 1429–1432 (1990).
Alexander, G. E. & Crutcher, M. D. Functional architecture of basal ganglia circuits: neural substrates of parallel processing. Trends Neurosci. 13, 266–271 (1990).
Nambu, A. Seven problems on the basal ganglia. Curr. Opin. Neurobiol. 18, 595–604 (2008).
Kravitz, A. V., Tye, L. D. & Kreitzer, A. C. Distinct roles for direct and indirect pathway striatal neurons in reinforcement. Nat. Neurosci. 15, 816–818 (2012).
Shin, J. H., Kim, D. & Jung, M. W. Differential coding of reward and movement information in the dorsomedial striatal direct and indirect pathways. Nat. Commun. 9, 404 (2018).
Ferguson, S. M. et al. Transient neuronal inhibition reveals opposing roles of indirect and direct pathways in sensitization. Nat. Neurosci. 14, 22–24 (2011).
Gerfen, C. R. & Surmeier, D. J. Modulation of striatal projection systems by dopamine. Annu. Rev. Neurosci. 34, 441–466 (2011).
Isomura, Y. et al. Reward-modulated motor information in identified striatum neurons. J. Neurosci. 33, 10209–10220 (2013).
Cui, G. et al. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature 494, 238–242 (2013).
Klaus, A. et al. The spatiotemporal organization of the striatum encodes action space. Neuron 95, 1171–1180.e1177 (2017).
Sheng, M. J., Lu, D., Shen, Z. M. & Poo, M. M. Emergence of stable striatal D1R and D2R neuronal ensembles with distinct firing sequence during motor learning. Proc. Natl Acad. Sci. USA 116, 11038–11047 (2019).
Nonomura, S. et al. Monitoring and updating of action selection for goal-directed behavior through the striatal direct and indirect pathways. Neuron 99, 1302–1314.e1305 (2018).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Satoh, T., Nakai, S., Sato, T. & Kimura, M. Correlated coding of motivation and outcome of decision by dopamine neurons. J. Neurosci. 23, 9913–9923 (2003).
Schultz, W. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998).
Mazzoni, P., Hristova, A. & Krakauer, J. W. Why don’t we move faster? Parkinson’s disease, movement vigor, and implicit motivation. J. Neurosci. 27, 7105–7116 (2007).
Nair-Roberts, R. et al. Stereological estimates of dopaminergic, GABAergic and glutamatergic neurons in the ventral tegmental area, substantia nigra and retrorubral field in the rat. Neuroscience 152, 1024–1031 (2008).
Cox, J. & Witten, I. B. Striatal circuits for reward learning and decision-making. Nat. Rev. Neurosci. 20, 482–494 (2019).
Lerner, T. N. et al. Intact-brain analyses reveal distinct information carried by SNc dopamine subcircuits. Cell 162, 635–647 (2015).
Parker, N. F. et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854 (2016).
Collins, A. L. & Saunders, B. T. Heterogeneity in striatal dopamine circuits: Form and function in dynamic reward seeking. J. Neurosci. Res. 98, 1046–1069 (2020).
Matsumoto, M. & Hikosaka, O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459, 837–841 (2009).
Balleine, B. W., Liljeholm, M. & Ostlund, S. B. The integrative function of the basal ganglia in instrumental conditioning. Behav. Brain Res. 199, 43–52 (2009).
Redgrave, P. et al. Goal-directed and habitual control in the basal ganglia: implications for Parkinson’s disease. Nat. Rev. Neurosci. 11, 760–772 (2010).
McGeorge, A. J. & Faull, R. L. The organization of the projection from the cerebral cortex to the striatum in the rat. Neuroscience 29, 503–537 (1989).
Pan, W. X., Mao, T. & Dudman, J. T. Inputs to the dorsal striatum of the mouse reflect the parallel circuit architecture of the forebrain. Front. Neuroanat. 4, 147 (2010).
Yin, H. H., Ostlund, S. B., Knowlton, B. J. & Balleine, B. W. The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523 (2005).
Ito, M. & Doya, K. Distinct neural representation in the dorsolateral, dorsomedial, and ventral parts of the striatum during fixed- and free-choice tasks. J. Neurosci. 35, 3499–3514 (2015).
Yin, H. H. & Knowlton, B. J. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–476 (2006).
Tai, L. H., Lee, A. M., Benavidez, N., Bonci, A. & Wilbrecht, L. Transient stimulation of distinct subpopulations of striatal neurons mimics changes in action value. Nat. Neurosci. 15, 1281–1289 (2012).
Mohebi, A. et al. Dissociable dopamine dynamics for learning and motivation. Nature 570, 65–70 (2019).
Matsushita, N. et al. Catecholaminergic cell type-specific expression of Cre recombinase in knock-in transgenic rats generated by the Combi-CRISPR technology. J. Neurosci. Methods 381, 109707 (2022).
Chergui, K., Suaud-Chagny, M. F. & Gonon, F. Nonlinear relationship between impulse flow, dopamine release and dopamine elimination in the rat brain in vivo. Neuroscience 62, 641–645 (1994).
Tsutsui-Kimura, I. et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. Elife 9, e62390 (2020).
Hamid, A. A. et al. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126 (2016).
Johansson, Y. & Silberberg, G. The functional organization of cortical and thalamic inputs onto five types of striatal neurons is determined by source and target cell identities. Cell Rep. 30, 1178–1194.e1173 (2020).
Rios, A. et al. Differential changes in the lateralized activity of identified projection neurons of motor cortex in hemiparkinsonian rats. eNeuro 6, 0110–19 (2019).
Saiki, A. et al. In Vivo spiking dynamics of intra- and extratelencephalic projection neurons in rat motor cortex. Cereb. Cortex 28, 1024–1038 (2018).
Soma, S. et al. Ipsilateral-dominant control of limb movements in rodent posterior parietal cortex. J. Neurosci. 39, 485–502 (2019).
Guo, Q. et al. Whole-brain mapping of inputs to projection neurons and cholinergic interneurons in the dorsal striatum. PLoS ONE 10, e0123381 (2015).
Kress, G. J. et al. Convergent cortical innervation of striatal projection neurons. Nat. Neurosci. 16, 665–667 (2013).
Nelson, A., Abdelmesih, B. & Costa, R. M. Corticospinal populations broadcast complex motor signals to coordinated spinal and striatal circuits. Nat. Neurosci. 24, 1721–1732 (2021).
Ikemoto, S. Dopamine reward circuitry: two projection systems from the ventral midbrain to the nucleus accumbens-olfactory tubercle complex. Brain Res. Rev. 56, 27–78 (2007).
Darvas, M. & Palmiter, R. D. Restriction of dopamine signaling to the dorsolateral striatum is sufficient for many cognitive behaviors. Proc. Natl Acad. Sci. USA 106, 14664–14669 (2009).
Darvas, M. & Palmiter, R. D. Restricting dopaminergic signaling to either dorsolateral or medial striatum facilitates cognition. J. Neurosci. 30, 1158–1165 (2010).
Samejima, K., Ueda, Y., Doya, K. & Kimura, M. Representation of action-specific reward values in the striatum. Science 310, 1337–1340 (2005).
Starkweather, C. K., Gershman, S. J. & Uchida, N. The medial prefrontal cortex shapes dopamine reward prediction errors under state uncertainty. Neuron 98, 616–629 (2018).
Smith, K. S., Berridge, K. C. & Aldridge, J. W. Disentangling pleasure from incentive salience and learning signals in brain reward circuitry. Proc. Natl Acad. Sci. USA 108, E255–E264 (2011).
Soma, S. et al. Distinct laterality in forelimb-movement representations of rat primary and secondary motor cortical neurons with intratelencephalic and pyramidal tract projections. J. Neurosci. 37, 10904–10916 (2017).
Panigrahi, B. et al. Dopamine is required for the neural representation and control of movement vigor. Cell 162, 1418–1430 (2015).
Phelps, P. E., Houser, C. R. & Vaughn, J. E. Immunocytochemical localization of choline acetyltransferase within the rat neostriatum: a correlated light and electron microscopic study of cholinergic neurons and synapses. J. Comp. Neurol. 238, 286–307 (1985).
Yamada, H., Matsumoto, N. & Kimura, M. Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J. Neurosci. 24, 3500–3510 (2004).
Berke, J. D. Functional properties of striatal fast-spiking interneurons. Front. Syst. Neurosci. 5, 45 (2011).
Liu, C. et al. An action potential initiation mechanism in distal axons for the control of dopamine release. Science 375, 1378–1385 (2022).
Martianova, E., Aronson, S. & Proulx, C. D. Multi-fiber photometry to record neural activity in freely-moving animals. J. Vis. Exp. https://doi.org/10.3791/60278 (2019).
Robinson, J. E. et al. Optical dopamine monitoring with dLight1 reveals mesolimbic phenotypes in a mouse model of neurofibromatosis type 1. Elife 8, e48983 (2019).
Takekawa, T., Isomura, Y. & Fukai, T. Accurate spike sorting for multi-unit recordings. Eur. J. Neurosci. 31, 263–272 (2010).
Takekawa, T., Isomura, Y. & Fukai, T. Spike sorting of heterogeneous neuron types by multimodality-weighted PCA and explicit robust variational Bayes. Front. Neuroinform. 6, 5 (2012).
Hazan, L., Zugaro, M. & Buzsáki, G. Klusters, NeuroScope, NDManager: a free software suite for neurophysiological data processing and visualization. J. Neurosci. Methods 155, 207–216 (2006).
Lipski, J. Antidromic activation of neurones as an analytic tool in the study of the central nervous system. J. Neurosci. Methods 4, 1–32 (1981).
Yael, D. et al. Haloperidol-induced changes in neuronal activity in the striatum of the freely moving rat. Front. Syst. Neurosci. 7, 110 (2013).
Acknowledgements
This work was supported by Brain/MINDS (JP19dm0207089 to Y.I.; JP18dm0307005 and JP21dm0207115 to A.N.; JP22dm0207113 to K.K) from AMED; by CREST (JPMJCR1751 to Y.I.) from JST; by Grants-in-Aid for Scientific Research on Innovative Areas (JP16H06276 and JP20H05053 to Y.I.; JP18H05524 to Y.S.) and Transformative Research Areas (A) (JP21H05242 to Y.I; JP22H05497 to S.N.; JP23H04688 to A.N.) from MEXT; by Scientific Research (B) (JP19H03342 and JP23H02589 to Y.I.; JP23H02594 to A.N.); Fund for the Promotion of Joint International Research (B) (JP19KK0193 to A.N.) and Early-Career Scientists (JP21K15184 to A.R.; JP19K16300 and JP22K15226 to S.N.) from JSPS; and by the Takeda Science Foundation (Y.I.).
Author information
Authors and Affiliations
Contributions
Conceptualization: A.R., S.N., Y.S., M.K., and Y.I.; Methodology: A.R., S.N., S.K., J.Y., N.M., R.H., Y.S., M.K., and Y.I.; Investigation: A.R. and S.N.; Visualization: A.R. and S.N.; Supervision: A.N., M.T., K.K., Y.S., M.K., and Y.I.; Writing—original draft: A.R., S.N., and Y.I.; Writing—review & editing: A.R., S.N., and Y.I.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Charles Gerfen and Pavel Rueda-Orozco for their contribution to the peer review of this work. Primary Handling Editors: Daniel Bendor and George Inglis. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rios, A., Nonomura, S., Kato, S. et al. Reward expectation enhances action-related activity of nigral dopaminergic and two striatal output pathways. Commun Biol 6, 914 (2023). https://doi.org/10.1038/s42003-023-05288-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s42003-023-05288-x
- Springer Nature Limited
This article is cited by
-
Free will: An Example of the Dopaminergic System
Integrative Psychological and Behavioral Science (2024)