5-HT2C receptor perturbation has bidirectional influence over instrumental vigour and restraint

The serotonin (5-HT) system, particularly the 5-HT2C receptor, has consistently been implicated in behavioural control. However, while some studies have focused on the role 5-HT2C receptors play in regulating motivation to work for reward, others have highlighted their importance in response restraint. To date, it is unclear how 5-HT transmission at this receptor regulates the balance of response invigoration and restraint in anticipation of future reward. In addition, it remains to be established how 5-HT2C receptors gate the influence of internal versus cue-driven processes over reward-guided actions. To elucidate these issues, we investigated the effects of administering the 5-HT2C receptor antagonist SB242084, both systemically and directly into the nucleus accumbens core (NAcC), in rats performing a Go/No-Go task for small or large rewards. The results were compared with those of administering d-amphetamine into the NAcC, which has previously been shown to promote behavioural activation. Systemic perturbation of 5-HT2C receptors, but crucially not intra-NAcC infusions, consistently boosted rats' performance and instrumental vigour on Go trials when they were required to act. Concomitantly, systemic administration also reduced their ability to withhold responding for rewards on No-Go trials, particularly late in the holding period. Notably, these effects were often apparent only when the reward on offer was small. By contrast, inducing a hyperdopaminergic state in the NAcC with d-amphetamine strongly impaired response restraint on No-Go trials both early and late in the holding period, as well as speeding action initiation. Together, these findings suggest that 5-HT2C receptor transmission, outside the NAcC, shapes the vigour of ongoing goal-directed action as well as the likelihood of responding as a function of expected reward.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00213-021-05992-8.


Introduction
The central neurotransmitter serotonin (5-HT) has been implicated in the motivational control of behaviour (McElroy et al. 1982; Soubrié 1986; Kulichenko and Pavlenko 2004; Cools et al. 2011). Transmission at the 5-HT2C receptor, in particular, may play a crucial role in this through the regulation of instrumental vigour, i.e. the energisation of physical goal-directed response, evident as a change in the speed, amplitude, or frequency of movements (Salamone et al. 2007; Dudman and Krakauer 2016). For instance, depletion of central 5-HT or antagonism of the 5-HT2C receptor both speed responding and increase willingness to work for reward (Fletcher et al. 2007; Simpson et al. 2011; Bailey et al. 2016, 2018; Browne et al. 2017; Silveira et al. 2020). This has resulted in 5-HT2C receptors being considered a possible target for the treatment of disorders of motivation such as apathy (Simpson et al. 2011; Bailey et al. 2016, 2018; Browne and Fletcher 2016; Browne et al. 2017).
However, a largely separate literature has highlighted a key function for intact 5-HT signalling (and again, the 5-HT2C receptor in particular) in enabling appropriate response restraint. The tonic firing of 5-HT neurons increases whilst waiting for reward and decays in the period before an animal ceases to wait (Miyazaki et al. 2011), and depletion of central 5-HT or administration of a 5-HT2C antagonist can also increase inappropriate motor responses in rodents, particularly in anticipation of reward (Winstanley et al. 2004b; Fletcher et al. 2007; Robinson et al. 2008; Agnoli and Carli 2012; Quarta et al. 2012; Adams et al. 2017; Higgins et al. 2020; Silveira et al. 2020). Therefore, a fundamental yet unaddressed question is under what circumstances transmission at 5-HT2C receptors promotes action over inaction and how this might interact with potential future rewards.
One possible reason for this lack of clarity is that most studies to date have required animals to work for a constant reward; yet, the activity of 5-HT neurons is modulated both by reward magnitude and reward context (Cohen et al. 2015; Li et al. 2016; Matias et al. 2017). A second issue is that the study of action vigour often uses internally guided instrumental paradigms, whilst those investigating the role of serotonin in action restraint have predominantly used stimulus-driven tasks. Such differences are likely to be important as 5-HT has been implicated in gating sensory processing (Petzold et al. 2009), and it has recently been shown that administration of a 5-HT2C receptor antagonist can specifically reduce the influence of cues over decision-making policies (Adams et al. 2017). An important open question concerns whether the influence of 5-HT2C receptors in behavioural control depends on the level of reward expectation and if this varies as a function of temporal proximity to reward-predicting cues.
Therefore, to better understand the role of 5-HT2C receptors in shaping the influence of environmental stimuli and reward expectations on action initiation and restraint, we investigated the effect of SB242084, a functionally selective ligand that is considered to be an antagonist at 5-HT2C receptors (Kennett et al. 1997; Di Matteo et al. 2000; De Deurwaerdère et al. 2004), on rats' performance of a Go/No-Go task designed to separate action requirements from the value of acting (Syed et al. 2016). We predicted that while disruption of 5-HT2C receptors would invigorate instrumental responding on Go trials, it might in tandem impair action restraint for reward on No-Go trials.
After confirming the effect of systemic administration, we further examined whether such effects might be mediated specifically via 5-HT2C transmission in the NAcC. The NAcC is a key site for regulating how motivation translates into action (Robbins and Everitt 1992; Pothuizen et al. 2005; Pattij et al. 2007; du Hoffmann and Nicola 2014; Floresco 2015; Ko and Wanat 2016; Syed et al. 2016). While the majority of studies have focused on how dopamine regulates these processes, the NAcC also receives notable serotonin input (Azmitia and Segal 1978; Vertes 1991) and expresses 5-HT2C receptors (Pazos et al. 1985; Eberle-Wang et al. 1997; De Deurwaerdère et al. 2013). Further, disruption of 5-HT2C transmission in the NAcC, but not in medial prefrontal regions, has been shown to increase premature responding on the 5-choice serial reaction time task (5-CSRTT) (Robinson et al. 2008). Therefore, we compared the effects of SB242084 infused into the NAcC with those of d-amphetamine, under the hypothesis that both drugs might weaken action restraint for reward.

Subjects
All procedures were carried out in accordance with the UK Animals (Scientific Procedures) Act (1986) and its associated guidelines. A total of 26 group-housed male Sprague-Dawley rats (Envigo, UK), aged ~ 2 months at the beginning of training, were used. Animals were maintained on a 12-h light/dark cycle (lights on 07.00). During testing, rats were food-restricted to maintain ~ 85-90% of their free-feeding weight. Water was available ad libitum in their home cages. For the NAcC infusion studies, animals were implanted with bilateral guide cannulae (Plastics One) 1.5 mm above the target site of the NAcC (AP: + 1.4 mm, ML: ± 1.7 mm, DV: -6.0 mm from skull surface) under isoflurane anaesthesia and secured with dental acrylic.

Apparatus
Testing was carried out in operant chambers (30.5 × 24.1 × 29.2 cm; Med Associates) (Fig. 2a). Each chamber was housed within a sound-attenuating cabinet ventilated with a fan, which provided a constant background noise of ~ 59 dB. Each chamber contained two retractable levers, 9.5 cm to either side of a central nose poke, which was fitted with an infrared beam signalling when animals entered the receptacle. The wall opposite contained a food magazine into which 45-mg sucrose pellets (Sandown Scientific, UK) could be dispensed. Each chamber was also fitted with a house-light and a speaker for delivering auditory stimuli.

Paradigm
The task design is shown in Figs. 1 and 2a, b. The task required animals to use an auditory cue to guide whether to initiate a response and press either the left or right lever (Go left/right) or to withhold responding (No-Go) to gain either a small or large reward. A trial was initiated when the rat voluntarily entered and stayed in a central nose poke for 0.3-0.7 s. This triggered the presentation of one of four auditory cues, which signalled (a) the action requirement (Go left/right or No-Go) and (b) the reward for a correctly performed trial (small or large, respectively, 1 or 2 sucrose pellets) (Fig. 2b). Go trials required animals to make two presses on the correct lever within 5 s of cue onset (Figs. 1a and 2b). On No-Go trials, animals were required to remain in the nose poke for the No-Go hold period (Fig. 1b). While No-Go trials always posed the same requirement regardless of the promised reward, the left/right mapping on Go trials was consistently associated with a specific reward size (small/large), with the side and reward associations counterbalanced across the cohort. This allowed us to independently manipulate action requirement and reward expectation, and additionally, to assess how this influenced the direction of motor responses on Go trials. Successful trials caused reward to be delivered to a food magazine on the opposite wall of the chamber.
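The trial rules above can be sketched as a small scoring function. This is only an illustration under stated assumptions, not the authors' analysis code: the `Trial` record, its field names, and the outcome labels are hypothetical, a wrong-lever error is assumed to be decided by the first press, and a Go trial with only one correct press inside the 5-s window is given its own label because the text does not say how such trials were scored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

GO_WINDOW_S = 5.0       # Go trials: presses must occur within 5 s of cue onset
REQUIRED_PRESSES = 2    # two presses on the correct lever in the full task


@dataclass
class Trial:
    """Hypothetical per-trial record; times are seconds from cue onset."""
    kind: str                          # "go" or "nogo"
    correct_lever: Optional[str]       # "left"/"right" on Go trials, None on No-Go
    hold_required_s: float             # jittered No-Go hold period (unused on Go)
    presses: List[Tuple[float, str]]   # (time, lever), in chronological order
    nose_exit_s: Optional[float]       # nose-poke exit time, None if no exit


def classify(trial: Trial) -> str:
    """Return an outcome label for one trial."""
    if trial.kind == "nogo":
        left_early = (trial.nose_exit_s is not None
                      and trial.nose_exit_s < trial.hold_required_s)
        return "nogo_premature" if left_early else "nogo_correct"

    # Go trial: only presses inside the response window count.
    in_window = [(t, lever) for t, lever in trial.presses if t <= GO_WINDOW_S]
    if not in_window:
        return "go_omission"
    if in_window[0][1] != trial.correct_lever:   # assumption: first press decides
        return "go_wrong_lever"
    n_correct = sum(1 for _, lever in in_window if lever == trial.correct_lever)
    # Assumption: a single correct press is labelled separately (not in the text).
    return "go_correct" if n_correct >= REQUIRED_PRESSES else "go_incomplete"
```

Keeping the action requirement (`kind`) separate from the reward size means the same function scores small and large reward trials identically, which mirrors the task's aim of dissociating the two.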

Training
To teach rats to respond or withhold actions to go and no-go cues, animals were trained as described by Syed et al. (2016). Briefly, animals were first habituated to the operant chamber and learned to retrieve pellet rewards from the food magazine tray. Rats then commenced training initially exclusively with the No-Go trial type. Over several sessions, they were gradually trained to make and hold a response in the central nose poke, such that on No-Go trials, they were eventually able to withhold responding during a 0.3- to 0.7-s pre-cue nose poke hold period and a subsequent No-Go hold period (Cohort 1, systemic SB242084: 1.7-1.9 s; Cohort 2, local SB242084, d-amphetamine and systemic SB242084 replication: 1.5-1.7 s) to gain a reward (1 or 2 sucrose pellets, respectively, for "No-Go small" or "No-Go large" trials). The cue was either a tone, buzz, white noise, or clicker, counterbalanced across animals (each ~ 74 dB). A premature head exit caused the house light to be illuminated as the animal exited the nose poke for the duration of a 5-s time-out; after which, the house light turned off, and a standard 5-s inter-trial interval (ITI) commenced.
Once 60% of No-Go Small and Large trials were performed correctly, rats were next trained exclusively on Go trials. Here, after a 0.3- to 0.7-s central nose poke, one of the two remaining auditory cues would sound, one requiring a left lever press and the other a right lever press (side counterbalanced across animals). Correct responding on a particular lever was associated throughout testing with either a single sucrose pellet ("Go small") or two sucrose pellets ("Go large"). An incorrect lever press would result in the house light switching on for a 5-s time-out period, followed by a 5-s ITI. During training, an error-correction procedure was used so that the next trial after an error would always be of the same cue/trial type with the wrong lever withdrawn. Once a criterion of 60% successful Go responses was reached, interleaved No-Go and Go trials were introduced, each with a 25% probability (other than correction trials). Once an average 60% success rate on all four trial types in a session was achieved, the rats moved onto the full version of the task. Here, error correction trials were removed. Further, the number of necessary lever presses on Go trials was increased to two. This ensured that the interval between cue onset to reward delivery was similar between Go and No-Go trials. Reward delivery was delayed for 1 s after the successful completion of a trial. Similarly, the error signal (the house light being illuminated) was also delayed for 1 s following an erroneous response. Throughout training and testing, a session ended after the animals had either earned 100 rewards or had spent 60 min in a session, whichever came first (although rats always met the former criterion during test sessions in the current study). Typically, rats took 1-1.5 months to train to reach stable baseline performance, and each rat in the current study had performed the full task at least 10 times before undergoing drug testing sessions.
In between every drug testing session, rats underwent at least one training session without drug manipulations to reestablish baseline performance, where error correction trials were reintroduced, but all other parameters were kept constant.
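The two training rules described above (the 60% criterion on every trial type and the session end condition) are simple enough to state as code. The sketch below is illustrative only: the function names and trial-type keys are hypothetical, the criterion is assumed to be inclusive (at least 60%), and "rewards" is treated as a plain count because the text does not specify whether it refers to rewarded trials or to pellets.

```python
CRITERION_PCT = 60.0      # success rate required on each trial type
MAX_REWARDS = 100         # session ends once 100 rewards are earned ...
MAX_SESSION_MIN = 60.0    # ... or after 60 min, whichever comes first

# Hypothetical keys for the four trial types
TRIAL_TYPES = ("go_small", "go_large", "nogo_small", "nogo_large")


def meets_criterion(success_pct: dict) -> bool:
    """True if every trial type reaches the criterion (assumed inclusive)."""
    return all(success_pct[t] >= CRITERION_PCT for t in TRIAL_TYPES)


def session_finished(rewards_earned: int, elapsed_min: float) -> bool:
    """True once either stopping condition is met."""
    return rewards_earned >= MAX_REWARDS or elapsed_min >= MAX_SESSION_MIN
```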

Pharmacological challenges
Drugs were administered in a within-subjects regular Latin square design. The local infusion sessions were performed following the full completion of an experiment to assess the effect of intra-NAc infusions of D1 agonist or antagonist drugs on task performance (data to be reported separately) and after returning to stable baseline levels of performance. This meant that, prior to the start of local infusion experiments reported here, rats had received 6-7 NAcC infusions (plus one mock infusion session where no substance was injected). Animals always performed at least one behavioural session without injection or infusion in between each drug manipulation to reestablish baseline performance and rule out lasting effects of drugs.
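As an illustration of how a within-subjects Latin square can order drug sessions, the sketch below builds a cyclic square in which each condition appears once in every session position, then cycles through its rows to assign orders to animals. The cyclic construction, the function names, and the condition labels are assumptions made for the example; the text does not state which specific square was used.

```python
def cyclic_latin_square(conditions):
    """Row r is the condition list rotated by r positions."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(n_animals, conditions):
    """Give each animal one row of the square, cycling through the rows."""
    square = cyclic_latin_square(conditions)
    return [square[i % len(square)] for i in range(n_animals)]


# Hypothetical labels for the three systemic conditions
orders = assign_orders(12, ["vehicle", "low", "high"])
for animal, order in enumerate(orders, start=1):
    print(animal, order)
```

With 12 animals and three conditions, each of the three session orders is used four times, so every condition is tested equally often in the first, second, and third drug session.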

Systemic pharmacology
SB242084 (6-chloro-2,3-dihydro-5-methyl-N-[6-[(2-methyl-3-pyridinyl)oxy]-3-pyridinyl]-1H-indole-1-carboxyamide dihydrochloride, Tocris) was dissolved in 25 mM citric acid in 8% w/v cyclodextrin in distilled water, and the pH adjusted to 6-7 using 5 M NaOH. Doses for systemic administration match those previously shown to increase impulsive responding (Winstanley et al. 2004a, b; Fletcher et al. 2007; Silveira et al. 2020) and instrumental vigour. Stock solutions of the vehicle (25 mM citric acid, 8% w/v cyclodextrin in distilled water), SB242084 0.1 mg/ml (referred to in the Results section as "low" dose) and SB242084 0.5 mg/ml ("high" dose), were prepared and aliquoted before being frozen at − 20 °C. Concentrations of both drugs were calculated as the salt. On each experimental day, one aliquot of each drug was defrosted. Systemic injections of drug or vehicle, containing 25 mM citric acid and 8% w/v cyclodextrin in distilled water, were given intraperitoneally in a volume of 1 ml/kg. Injections were given 20 min prior to testing.

Local infusions
SB242084 was made up as described above. Doses were chosen based on a prior report showing a consistent and dose-dependent increase in impulsive responding on the 5-CSRTT (Robinson et al. 2008). Stock solutions (vehicle, SB242084 0.2 μg/μl, 1.0 μg/μl) were prepared and aliquoted before being frozen at − 20 °C. Concentrations of both drugs were calculated as the salt. On each experimental day, one aliquot of each drug was defrosted. D-amphetamine ((+)-α-methylphenethylamine hemisulfate, Sigma-Aldrich) was dissolved in 0.9% NaCl solution to reach a concentration of 10 μg/μl. This was aliquoted and frozen at − 20 °C. On each experimental day, one aliquot of the drug was defrosted.

Surgical procedures
Rats were anaesthetised using inhaled isoflurane (4% in O2 for induction and 1.5% for maintenance) and given buprenorphine (Vetergesic, 0.03 mg/kg) and meloxicam (Metacam, 2 mg/kg) at the start of surgery. Once animals were secured in a stereotaxic frame and their scalp shaved and cleaned with dilute Hibiscrub and 70% alcohol, a local anaesthetic (bupivacaine, 2 mg/kg) was administered to the area. The skull was then exposed and craniotomies were made for implantation of bilateral guide cannulae (Plastics One, UK), consisting of an 8-mm plastic pedestal holding together two 26 gauge metal cannulae with a centre-to-centre distance of 3.4 mm and a length of 7.5 mm. The cannulae were implanted 1.5 mm above the target site of the NAcC, at AP + 1.4 mm, ML ± 1.7 mm, DV − 6.0 mm from the surface of the skull and relative to bregma (Franklin and Paxinos 2007). Four anchoring screws (Precision Technology Supplies) were also implanted and dental acrylic was applied to secure the cannulae to the skull. At the end of the surgery, dummy cannulae of the same length as the guide cannulae were inserted to ensure patency and a dust cap was fitted onto the pedestal to secure the dummy. Following surgery, animals were again administered buprenorphine (0.03 mg/kg) and meloxicam (2 mg/kg) and were thereafter given meloxicam for up to 3 days post-surgery.

Local infusion procedure
Animals were first habituated to the manipulation of their implants by being lightly restrained and having the dummy cannulae removed and a fresh set reinserted. Retraining of the animals commenced approximately 2 weeks after surgery. All animals returned to criterion performance (≥ 60% success rate on each trial type) before drug infusions began. During infusions, the rats were gently restrained whilst the dummy cannulae were removed and the 33 gauge bilateral infusion cannulae, at a length of 9 mm, were inserted into the NAcC. A total of 0.5 μl of vehicle or drug solution was injected per hemisphere at a rate of 0.25 μl/min. The infusion cannulae were left in place for 2 min after the cessation of the infusion to allow diffusion of solution from the cannulae. Next, the infusion cannulae were removed, the dummy cannulae and dust cap replaced, and the rats were returned to their home cage for 10 min to reduce the possible effects of injection stress on performance. Testing then commenced.
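The stock concentrations and infusion parameters given above determine the dose delivered per hemisphere and the duration of each infusion. The short calculation below makes that arithmetic explicit; the variable and function names are hypothetical, while the numbers are taken from the text.

```python
INFUSION_VOLUME_UL = 0.5         # volume per hemisphere (μl)
INFUSION_RATE_UL_PER_MIN = 0.25  # infusion rate (μl/min)

# Stock concentrations in μg/μl, as stated in the Local infusions section
STOCK_UG_PER_UL = {
    "SB242084 low": 0.2,
    "SB242084 high": 1.0,
    "d-amphetamine": 10.0,
}


def dose_per_hemisphere_ug(concentration_ug_per_ul: float) -> float:
    """Dose (μg) = concentration (μg/μl) x infused volume (μl)."""
    return concentration_ug_per_ul * INFUSION_VOLUME_UL


def infusion_duration_min() -> float:
    """Duration (min) = volume (μl) / rate (μl/min)."""
    return INFUSION_VOLUME_UL / INFUSION_RATE_UL_PER_MIN


for name, conc in STOCK_UG_PER_UL.items():
    print(f"{name}: {dose_per_hemisphere_ug(conc):.1f} μg per hemisphere")
print(f"infusion duration: {infusion_duration_min():.0f} min")
```

These values (0.1 and 0.5 μg of SB242084 and 5 μg of d-amphetamine per hemisphere, each delivered over 2 min) agree with the per-hemisphere doses quoted in the Results.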

Histology
At the end of data collection, animals were deeply anaesthetised with sodium pentobarbitone (200 mg/kg, i.p. injection). They were then transcardially perfused with 0.9% saline followed by a 10% formalin solution (vol/vol). The brains were kept in 10% formalin solution until being sectioned. Brains were sectioned into 60-μm-thick coronal sections using a vibratome (Leica). The sections were then stained with cresyl violet (Sigma-Aldrich) before being mounted in DePeX mounting medium onto 1.5% gelatin-coated slides and enclosed with coverslips.

Data analysis
Latency and performance measures of interest are summarized in Table 1.

Measures of performance
The primary measure of performance was Go and No-Go trial success, measured as the percentage of correctly performed small or large reward Go or No-Go trials, respectively. On Go trials, we considered two types of errors: (i) no response on either lever within 5 s ("response omission" Go) or (ii) incorrect lever response ("wrong lever" Go). No-Go trials were considered to be erroneous if rats left the nose poke prematurely before the end of the holding period while the cue was still playing. We reasoned that premature responses could result from either a failure to inhibit fast cue-driven responses or from a failure to wait for the appropriate time period before initiating a response. We anticipated that the former process would lead to premature responses clustered near cue presentation, while the latter would manifest in No-Go failures nearer the end of the holding period. In support of this distinction, we had previously observed that stimulation of dopamine receptors in this task selectively increased cue-elicited No-Go errors (Grima et al. 2021). Therefore, instances of premature action initiation were quantified within the first and second halves of the No-Go hold period ("early" and "late" epochs, < 0.95 s and ≥ 0.95 s or < 0.85 s and ≥ 0.85 s for the 1st and 2nd cohort, respectively).
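The early/late split of premature nose poke exits can be sketched as follows. The cut-offs per cohort come from the text; the function names are hypothetical, and exit latencies are assumed to be measured in seconds from No-Go cue onset.

```python
from collections import Counter

# Boundary between "early" and "late" epochs of the No-Go hold period (s)
EPOCH_SPLIT_S = {1: 0.95, 2: 0.85}   # cohort number -> cut-off


def epoch(exit_latency_s: float, cohort: int) -> str:
    """Label a premature exit as 'early' (< cut-off) or 'late' (>= cut-off)."""
    return "early" if exit_latency_s < EPOCH_SPLIT_S[cohort] else "late"


def epoch_percentages(exit_latencies_s, cohort: int) -> dict:
    """Percentage of premature exits falling in each epoch."""
    counts = Counter(epoch(t, cohort) for t in exit_latencies_s)
    total = len(exit_latencies_s)
    if total == 0:
        return {"early": 0.0, "late": 0.0}
    return {label: 100.0 * counts[label] / total for label in ("early", "late")}
```

Note that the same latency can fall in different epochs depending on the cohort: an exit at 0.9 s is "early" for cohort 1 but "late" for cohort 2, because the hold periods differ.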
Additionally, there were three other behaviours that animals could exhibit that reflected possible changes in performance. First, after exiting the nose poke prematurely on an unsuccessful No-Go trial, animals sometimes pressed a lever. If this occurred during a period equivalent to the minimum No-Go hold interval, these instances were labelled as "invalid lever press" trials and quantified. Second, the number of presses on the correct lever preceding food magazine entry was quantified in Go trials, as rats sometimes continued to press beyond the required number to gain reward. Third, occasionally, rats exited the nose poke prematurely during the pre-cue period and thereby failed to initiate a new trial. These instances were labelled "aborted trials".
Action initiation latency was measured as the time from cue onset to exit of the nose poke in both Go and No-Go trials. Following on from the planned analyses of the patterns of No-Go errors, we conducted additional exploratory analyses on successful No-Go trials where the nose poke exits were classified as falling within the pre-reward interval (between cue offset and reward delivery 1 s later) or within the post-reward interval. We analysed observations within the first 0.8 s of each of these intervals (instead of the full 1 s), as the jitter of the No-Go hold duration sometimes resulted in the remaining 0.2 s falling outside of the time interval of interest. Similar to our conceptualisation of No-Go errors, we reasoned that responses clustered just after No-Go cue offset in the pre-reward period or just after reward delivery in the post-reward interval could be considered cue-driven responses. Travel time in Go trials was measured as the time from nose poke head exit to 1st press on the appropriate lever. Inter-press latency was measured as the time interval between the 1st and the 2nd press on the appropriate lever.
Magazine latency was defined as the time interval between completion of a successful No-Go response or the Go response and entry to the food magazine. Changes in within-trial response latencies were used as proxies for changes in animals' instrumental vigour. Reengagement latency was defined as the time taken to reenter the nose poke either after leaving the food magazine in successful trials or 1 s following the registration of an error on unsuccessful trials.
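The within-trial latency definitions above amount to differences between consecutive event timestamps. A minimal sketch for a successful Go trial follows; the `GoTrialEvents` record and its field names are hypothetical, and all timestamps are assumed to share one clock in seconds.

```python
from dataclasses import dataclass


@dataclass
class GoTrialEvents:
    """Hypothetical event timestamps (s) for one successful Go trial."""
    cue_onset: float
    nose_exit: float
    first_press: float
    second_press: float
    magazine_entry: float


def go_latencies(ev: GoTrialEvents) -> dict:
    """Compute the latency measures defined in the text for a Go trial."""
    return {
        # cue onset -> exit of the nose poke
        "action_initiation": ev.nose_exit - ev.cue_onset,
        # nose poke exit -> 1st press on the appropriate lever
        "travel_time": ev.first_press - ev.nose_exit,
        # 1st press -> 2nd press on the appropriate lever
        "inter_press": ev.second_press - ev.first_press,
        # completion of the Go response (2nd press) -> food magazine entry
        "magazine": ev.magazine_entry - ev.second_press,
    }


example = GoTrialEvents(cue_onset=10.0, nose_exit=10.5, first_press=11.5,
                        second_press=11.75, magazine_entry=13.0)
print(go_latencies(example))
```

Because the four intervals are contiguous, they sum to the total time from cue onset to magazine entry, which makes it easy to check that no segment has been double-counted.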

Intra-NAcC d-amphetamine
In order to determine whether any of the effects in the local SB242084 study might have been masked by tissue damage around the tips of the cannulae, we also carried out intra-NAcC infusions of d-amphetamine in this cohort, which has previously been shown to increase behavioural activation and premature responding (Cole and Robbins 1987; West et al. 1998). This meant that, as well as analysing the effect of a hyperdopaminergic state in NAcC on task performance and latencies, we could also use two behavioural measures that were predicted to be strongly influenced by this drug in the majority of animals (No-Go accuracy and Go trial action initiation latency) as a potential marker for the continued viability of the brain tissue around the cannulae. Specifically, if the effect of d-amphetamine on these measures within a subject did not exceed the cohort mean d-amphetamine-induced change by ≥ 50% in at least one of these behavioural outcomes, that animal was categorized as potentially having functionally significant tissue damage around the cannulae tips.

Statistical approaches
All data were analysed using MATLAB (Mathworks), SPSS (IBM), and R (The R Foundation). Performance and time measures were mainly analysed using repeated measures ANOVAs with drug dose and reward size as the within-subject factors unless specified otherwise. As we have described the effects of reward on task performance in detail in a separate manuscript (Grima et al. 2021), the primary focus here was on the effects of and interaction with the drug. Any significant main effects of reward are, however, reported in the figure legends. Behavioural measures not reported in the main text are documented in Supplementary Tables 1-3 (1, systemic SB242084; 2, intra-NAcC SB242084; 3, intra-NAcC d-amphetamine).
In the d-amphetamine infusion experiment, we used sessions where saline was locally infused into NAcC from a parallel dopamine receptor pharmacology experiment in the same animals as vehicle data for within-subjects analysis. In the systemic SB242084 replication experiment, an additional within-subject factor "training experience" was specified, due to the inclusion of 2 separate replications of systemic SB242084 administration, one prior to cannulation surgery and one after local infusion experiments had been completed. Percentages of nose poke exits in the early and late parts of the No-Go hold period were compared using the chi-squared test of contingency on the group level, as the low number of such observations resulted in skewed residual distributions in ANOVAs. Whenever there was a significant main effect of a drug or an interaction with a task variable of interest, test results were reported in the main text or figure legends and post hoc comparisons across levels of that effect were carried out. The influence of outliers on the ANOVA results was minimized by excluding from that analysis any subject with an absolute standardised residual greater than 3. A p-value less than 0.05 was considered significant.
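For readers unfamiliar with the chi-squared test of contingency, the sketch below computes the Pearson statistic and its degrees of freedom for a table of counts, using only the standard library. It is an illustration, not the authors' procedure: the exact layout of their contingency tables is not given in the text, no continuity correction is applied, and the counts in the example are hypothetical.

```python
def chi_squared_contingency(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table.

    `table` is a list of rows (e.g. one row per drug condition) with one
    column per response category (e.g. early and late exits).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column has no observations")
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = vehicle, drug; columns = early, late exits
stat, dof = chi_squared_contingency([[10, 20], [20, 10]])
print(f"chi-squared({dof}) = {stat:.2f}")
```

In practice a statistics package would also supply the p-value; the point here is only to show how the expected counts under independence enter the statistic.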

Systemic SB242084 increases accuracy and instrumental drive on Go trials
We first examined how SB242084 influenced action selection on Go trials, where rats were required to initiate an action and press the appropriate lever twice for a small or large reward. In vehicle sessions, rats' success rate was on average > 80% on both trial types. A first analysis did not find any significant effects of the drug on Go performance (all F < 1.49, p > 0.248). However, as prior research has reported SB242084 to have nonlinear effects on performance with increasing drug doses (Fletcher et al. 2007), we sought to test for nonlinearities by running within-subject contrasts. This revealed that the 5-HT2C antagonist caused an overall improvement in performance for both low and high reward trials selectively at the low dose, where 11 of the 12 rats showed greater than average success rates compared to vehicle (quadratic effect of drug: F 1, 11 = 11.26, p = 0.006; drug X reward interaction: F 2, 22 = 0.29, p = 0.789; Fig. 2c). Further analyses showed the improvement in performance on the low dose was caused by a decrease in lever press omissions (quadratic effect of drug: F 1, 11 = 7.73, p = 0.018; Fig. 2d), with no corresponding change in the frequency of incorrect lever response trials (all F < 1.43, p > 0.261; Table S1).
We next examined whether administration of the drug altered motor responses within Go trials. While there was no reliable change in latency to initiate action (all F < 1.65, p > 0.216; Fig. 2e), administration of either dose of the drug reduced travel time to the correct lever (main effect of drug: F 2, 22 = 7.15, p = 0.004; Fig. 2f) and decreased the inter-press latency (main effect of drug: F 2, 20 = 4.41, p = 0.026; pairwise comparisons: high dose v vehicle, p = 0.004, all other ps > 0.199; Table S1) as well as magazine latency (main effect of drug: F 2, 18 = 6.71, p = 0.007; Fig. 2g). Despite this invigoration, the ligand did not change the number of lever presses made on the correct lever (all main effects and interactions: Fs < 1.91, ps > 0.176; Table S1). These findings suggest that systemic SB242084 promotes engagement with the task and invigorates ongoing motor responses for rewards, and, at a low dose, can do so without compromise to instrumental precision.

Systemic SB242084 increases impulsive action on No-Go trials
We next considered the effect of SB242084 on rats' ability to withhold action for reward by comparing the proportion of correct small and large reward No-Go trials in each of the drug administration conditions. On vehicle, animals successfully withheld responding for small and large rewards on average on > 84% of No-Go trials. However, in contrast to Go trials, administration of either dose of the drug impaired performance when the prospective future reward was small (drug x reward interaction: F 2, 22 = 5.18, p = 0.014; Fig. 3a). This was not caused by a general inability to withhold action as not only did the ligand have no effect on performance when the large reward was on offer, it also did not change the number of aborted trials, when the rats failed to sustain the pre-cue nose poke required to initiate a new trial (no main effect or interactions with the drug: F < 0.93, p > 0.409; Table S1). Further, the magnitude of the impairment did not reliably correlate with each animal's improvement in Go trial success for either dose of the drug (all − 0.30 < r < − 0.08; all p > 0.34).
We reasoned that successful action restraint during the No-Go hold interval relies, first, on inhibition of reactive responses triggered by the onset of the cue, followed, second, by appropriate control of anticipatory responses targeted at reward retrieval. To test this more directly, we quantified premature head exits in either the "early" or "late" epoch of the No-Go hold period of error trials (Fig. 3b). This revealed that both doses of the ligand promoted erroneous "late" over "early" head exits, again only when the rats were anticipating a small reward (small reward: vehicle v low dose: χ2(2) = 9.16, p = 0.010; vehicle v high dose: χ2(2) = 6.86, p = 0.032; large reward: all χ2 < 5.26, p > 0.072; Fig. 3c).
We also investigated whether SB242084 affected action initiation latencies following the successful completion of the No-Go waiting requirement. Although there were no significant differences in correct No-Go trial hold durations on and off the ligand (all F < 0.83, p > 0.451), based on the patterns of No-Go errors, we predicted the ligand might have also altered the balance of fast and slow head exit timings with respect to salient task events (cue offset, signalling the end of the No-Go period, and reward delivery 1 s later). We therefore examined the distribution of head exit latencies on correct No-Go trials within the pre-reward interval (i.e. in the 1 s between No-Go cue offset and reward delivery) as well as an equivalent post-reward interval, both split into "early" and "late" epochs. As can be observed in Fig. 3d, rats that were given vehicle injections on average had the highest likelihood of leaving in the early part of each interval. However, this pattern switched such that the rats given the ligand, particularly at the lower dose, became more likely to leave in the late part of each interval (early-late X drug interaction: F 2, 22 = 5.81, p = 0.009; Fig. 3e). Similar to Go trials, once an action had been initiated, the drug dose-dependently reduced magazine latencies (main effect of drug, F 2, 18 = 7.51, p = 0.004; vehicle v low dose, p = 0.006; vehicle v high dose, p = 0.023; Table S1).
In summary, SB242084 selectively impaired animals' ability to withhold responding when the reward on offer was small. This reflected a significant increase in inappropriately leaving the nose poke port late in the hold period without a concomitant increase in early cue-driven responses. This shift from early to late responses persisted on successful No-Go trials, both in the pre-reward and post-reward intervals.

Effects of 5-HT 2C receptor manipulation on action restraint/instrumental drive not localisable to NAcC
It has previously been suggested that 5-HT 2C receptors in the NAcC are important for incentive motivation and inhibitory control (Robinson et al. 2008). Therefore, to determine whether the effects of attenuation of goal-directed inhibition and invigoration of reward-related lever pressing observed after systemic injections were dependent on 5-HT 2C receptors in the NAcC, we tested a second cohort of animals on the Go/No-Go task following local infusion of either vehicle, 0.1 μg or 0.5 μg per hemisphere of SB242084. Of the 14 implanted animals, one was excluded for having misplaced cannulae and two others did not complete all testing sessions, resulting in an n of 11 rats (Fig. 4a).
In contrast to the effects of systemic administration, and against our expectations, there were no reliable effects of intra-NAcC infusions of SB242084 on any performance measure (Table S2). This remained the case even when limiting analyses to animals that showed a change following d-amphetamine infusions (see next section).

Amphetamine in NAcC impairs action restraint but does not improve instrumental performance
The lack of effect of intra-NAcC infusions of the ligand appears to suggest no direct role for 5-HT 2C receptors in this region in performance on this task. However, it is also possible that it reflects a failure of the drug to reach viable brain tissue or a difference in the behavioural strategy of this cohort of animals from the group that was used for the systemic manipulation.
To rule out the former explanation, and also to provide a direct comparison with a dopaminergic manipulation, we next examined the effect of intra-NAcC infusions of 5 μg per hemisphere of d-amphetamine in the same cohort of animals (n = 13). Infusions of d-amphetamine did not influence Go trial success rate (F < 0.24, p > 0.630) (Fig. 5a). Nonetheless, the drug substantially and selectively speeded action initiation on successful Go trials (main effect of the drug: F 1, 12 = 12.74, p = 0.004; Fig. 5b).
On No-Go trials, intra-NAcC d-amphetamine markedly impaired rats' ability to withhold action. It caused a substantial increase in No-Go errors, and these occurred both in the early and the late epochs of the No-Go holding interval (main effect of the drug: F 1, 12 = 14.05, p = 0.003; all other F < 3.66, p > 0.080) (Fig. 5e-g). Notably, these No-Go errors were often followed by lever pressing within the No-Go holding interval (F 1, 12 = 12.40, p = 0.004). This deficit in withholding actions also generalised to the pre-cue period, where intra-NAcC d-amphetamine substantially increased the numbers of aborted trials (main effect of the drug: F 1, 12 = 5.76, p = 0.035; Table S3).
Taken together, these findings demonstrate that intra-NAcC d-amphetamine significantly biased behavioural strategies towards action over inaction, speeding action initiation and impairing action restraint. In turn, this demonstrates that the null effects observed following intra-NAcC infusions of SB242084 cannot be simply attributed to tissue damage around the cannulae tips blocking the drug from reaching its target.

Effects of SB242084 do not depend on training history or changes in task parameters
The systemic and local NAcC 5-HT 2C perturbations were carried out in separate cohorts, each having trained with slightly different No-Go hold intervals (1.7-1.9 s and 1.5-1.7 s for the systemic and the local 5-HT 2C manipulation, respectively). To ensure that the effects found were not attributable to any such differences in task parameters, or due to training experience, we analysed the effects of two replications of systemic administration of the low dose of SB242084 in the second cohort of animals, one administered before cannulae surgery and one performed after the infusion experiments had been completed.
Arguing against this possibility, the systemic administration of SB242084 caused a very similar pattern of changes in task performance (see Table 2). The ligand again improved Go trial success rate, especially on small reward trials (drug × reward interaction: F 1, 13 = 13.56, p = 0.003, vehicle versus drug on small and large reward trials: p = 0.001 and p = 0.869, respectively). Likewise, the ligand again speeded up travel times and magazine latencies (main effect of the drug: both F 1, 13 > 13.22, p < 0.004). Similarly, on No-Go trials, there was an interaction between the effects of the drug and reward, caused by a reduction in success rate in small but not large reward trials following SB242084 (drug × reward interaction: F 1, 13 = 7.14, p = 0.019). Importantly, there was no effect of whether the drug was given before or after the local infusion experiments in any of the above analyses (all F 1, 13 < 1.40, p > 0.258). Therefore, the lack of effects seen after the NAcC infusions of the 5-HT 2C ligand cannot simply be attributed to changes in specific task parameters or experience.

Discussion
Here, we studied the role of 5-HT 2C receptors, an important modulator of instrumental vigour (Simpson et al. 2011; Bailey et al. 2016, 2018; Browne et al. 2017), in controlling action and restraint. Subtle yet consistent effects on both facets of behavioural control were detected. Systemic, but not intra-NAcC, administration of a low dose of the 5-HT 2C receptor antagonist SB242084 improved instrumental performance on Go trials. This was apparent even in the face of high baseline rates of accuracy and was caused by a reduction in the rates of response omissions. Furthermore, although systemic 5-HT 2C antagonism had no effect on cued action initiation latencies, the drug dose-dependently speeded progress through the trial regardless of the reward size on offer. By contrast, 5-HT 2C blockade had a detrimental effect on goal-directed restraint of actions but only on No-Go trials which promised small rewards. This was characterised by a potentiation of impulsive responses in the latter part of the No-Go holding interval, and contrasted with the effects of intra-NAcC infusions of d-amphetamine which amplified both early and late premature action. Taken together, this suggests that 5-HT 2C receptors, outside the NAcC, play an important role in determining internally-driven response likelihood and instrumental vigour, shaped by the anticipated benefits of action or restraint.
A number of previous studies using a variety of tasks have reported reduced operant responding latencies and increased motivation to work, particularly in high effort situations, following systemic administration of SB242084 (Simpson et al. 2011; Bailey et al. 2016, 2018; Browne et al. 2017). Here we also observed an increased success rate on Go trials and the dose-dependent invigoration of instrumental actions. Notably, however, this occurred in the context of a task with no equivalent effort requirements (note that while this task was arguably cognitively demanding, a recent study found no effect of 5-HT 2C receptor agents on cognitive effort allocation; Silveira et al. 2020). Therefore, while our results are generally consistent with the idea that perturbing transmission at 5-HT 2C receptors boosts goal-directed motivation and willingness to work (Simpson et al. 2011; Browne et al. 2017; Bailey et al. 2018), the current data refine these definitions and also suggest that there are potentially several distinct processes at play.
First, it was not the case that all response latencies were faster after systemic SB242084 as the average speed of cued action initiation on both Go and No-Go trials remained unchanged. This would suggest that the effect of the drug is not simply to boost motor output or attention, but is more specific in this task to the invigoration of ongoing reward-seeking movements. Second, while there was a monotonic effect with increasing drug dose on those latencies that were affected, the change in Go trial response omissions was limited to the low dose. This divergence between the effect of SB242084 on latencies and omissions is consistent with several previous reports (Winstanley et al. 2004b; Fletcher et al. 2007; Silveira et al. 2020). One possibility is that this reflects a shift in the balance between response speed and precision. While the low dose of the 5-HT 2C receptor antagonist enabled rats to respond faster without making them less likely to choose the correct lever or more likely to make more lever presses than required, the additional reduction in response times at the high dose might have started to make their responses less precise. Future studies, using a greater range of drug doses and video tracking of behaviour, are needed to test this idea.
Third, there was some evidence that performance, though not response latencies, was most affected when the small reward was on offer. Although this might in part reflect a ceiling effect given high levels of baseline performance on large reward Go trials, note that a similar selective effect of the drug when the small reward was on offer was also observed for No-Go performance. Further, this is in line with previous work showing that the most prominent effects of SB242084 were apparent when animals were working for lower net value options (Bailey et al. 2016; Browne and Fletcher 2016). Therefore, the regulation of instrumental performance by 5-HT 2C receptors might depend on the opportunity cost, i.e. the magnitude of what would be lost by failing to perform appropriately. While the SB242084-driven increased engagement with low net value options may be advantageous in some circumstances, it can also impede flexible behaviour when the goal requires withholding a response.
In parallel to the improvements in Go trial performance, we also observed an increase in inappropriate, premature responses on No-Go trials following systemic SB242084, which again was specific to small reward trials. Moreover, there was evidence that the increase in No-Go failures was not reliably coupled to the increase in Go trial performance or responding vigour, indicating the changes are likely not attributable to the same psychological mechanism. There is much evidence from both manipulation (Harrison et al. 1997a, b) and physiological (Miyazaki et al. 2012; Fonseca et al. 2015) approaches that central serotonin is a key modulator of the ability to wait for a reward, with 5-HT 2C receptors playing a central role in mediating this (Higgins et al. , 2020; Winstanley et al. 2004b; Fletcher et al. 2007; Quarta et al. 2012; Silveira et al. 2020). However, the effect we observed here did not manifest as an overall increase in impulsivity nor a gross timing deficit, again suggesting that it is not simply a manifestation of a general increase in motor output. Instead, close inspection of the pattern of errors on these trials showed that the ligand specifically increased the proportion of errors that occurred in the late period of the No-Go holding interval on small reward trials, but had no influence on the rate of fast, cue-elicited No-Go errors. Furthermore, when the animals were able correctly to withhold responding during the No-Go period, the 5-HT 2C receptor ligand shifted the pattern of subsequent action initiation latencies away from cue-elicited responses (i.e. ones clustered around cue offset at the end of the No-Go holding period or reward delivery 1 s later).
Taken together with the absence of an observed effect on action initiation latencies, this points towards 5-HT 2C receptors having a more prominent influence over non-cue-driven processes, resulting in an increased instrumental drive to act. Such an effect of SB242084 administration was also reported in a gambling task, where the ligand caused a selective amelioration of the influence of cues on risk-based decision-making (Adams et al. 2017). More generally, this perspective is compatible with recent evidence from optogenetic activation of dorsal raphe serotonin neurons that suggested serotonin may modulate the speed of evidence accumulation when deciding whether or not to switch away from a current behavioural policy, which in turn can also influence the vigour of the ongoing behaviour (Lottem et al. 2018). However, while the authors of this study suggested this was caused by serotonin modulation of uncertainty, it is unlikely that this factor played a significant role here as the rats in the current study were highly trained and the cue-action-reward contingencies were both deterministic and unchanging. Instead, our data implicate 5-HT 2C receptors in the regulation of instrumental drive based on the future benefits and opportunity costs of acting.
We had hypothesised that one potential locus for the effects of systemic administration of SB242084 on instrumental drive and restraint would be the NAcC. Not only does this region receive a notable 5-HT projection (Azmitia and Segal 1978; Vertes 1991) and express notable levels of 5-HT 2C receptors (Pazos et al. 1985; Eberle-Wang et al. 1997), but also a previous study, using the same doses of SB242084, reported an increase in impulsivity on the 5-CSRTT specifically after infusions into NAcC and not into medial frontal regions (Robinson et al. 2008). Therefore, it was unexpected that intra-NAcC administration of SB242084 had no effect on any performance measure. This discrepancy was not caused by the cannulae targeting a part of the NAcC that is not important for the task or by tissue damage caused by insertion of the injectors, as subsequent microinjections of d-amphetamine into the NAcC had a substantial influence on performance. Nor can the lack of effect result from subtle task or training differences in the different cohorts of animals that underwent the main systemic experiment and the NAcC cannulation, respectively; systemic administration of SB242084 in the cannulated cohort replicated the original pattern of results in the first cohort. Instead, the findings presented here favour a possible fractionation of 5-HT 2C -modulated "waiting" impulsivity (Bari and Robbins 2013; Dalley and Robbins 2017) depending on whether the animals have to withhold a specific response until a cue is presented (as happens in the 5-CSRTT) or, as here, to withhold competing motor responses in the presence of a cue predicting future reward. These findings are also compatible with the possibility that systemic SB242084 affected the speeding of reward-guided responses through the dorsal striatum (Bailey et al. 2018) and inappropriate action through the medial prefrontal cortex (Miyazaki et al. 2020).
Previous studies have demonstrated a direct influence of systemic SB242084 on ventral tegmental area dopamine firing rates and dopamine levels in the NAcC (Matteo et al. 1999; De Deurwaerdere et al. 2004; Browne et al. 2017), and mesolimbic dopamine is known to shape goal-directed motivation and the balance of action initiation and restraint (Roitman et al. 2004; Wassum et al. 2012; Hamid et al. 2016; Syed et al. 2016; Dalley and Robbins 2017). Here, microinjections of d-amphetamine into NAcC, known to potentiate and prolong dopamine and, to a lesser extent, serotonin transmission (Kankaanpää et al. 1998), also caused a marked increase in premature responses on No-Go trials and speeded action latencies on Go trials. However, the pattern of these changes was strikingly different to those observed after systemic administration of SB242084. Specifically, d-amphetamine exclusively speeded action initiation latencies on Go trials, which had been unaffected by the administration of SB242084, but had no effect on the speed of other responses in a trial or on overall success rates, all of which had been altered by systemic 5-HT 2C receptor manipulations. Similarly, premature responses were elevated both in the pre-cue period and throughout the No-Go holding interval after NAcC d-amphetamine, as compared to a selective elevation in the later No-Go holding period after systemic SB242084. This raises the interesting possibility that mesolimbic dopamine and serotonin, acting through 5-HT 2C Rs, provide complementary but distinct modulation of instrumental drive and response restraint, with the former regulating the rapid initiation of reward-seeking actions and the latter affecting instrumental drive for reward. Future studies where the two systems are directly manipulated in the context of the same behavioural task may help to further substantiate the existence of these complementary roles.
Note though that this does not rule out additional, more direct interactions between 5-HT 2C transmission and dopamine elsewhere in the basal ganglia implicated in regulating action restraint and response vigour. For example, Bailey and colleagues demonstrated dorsal striatal dopamine levels to be a crucial modulator of instrumental performance after administration of SB242084 (Eberle-Wang et al. 1997; Agnoli and Carli 2012; Bailey et al. 2018). Furthermore, SB242084 could be acting in a dopamine-independent manner in the dorsal striatum or elsewhere in the basal ganglia such as the subthalamic nucleus or NAc shell (Eberle-Wang et al. 1997; Filip and Cunningham 2002; Agnoli and Carli 2012; Bailey et al. 2018).
Previously found effects of SB242084 on goal-directed instrumental vigour have raised the possibility that ligands targeting the 5-HT 2C R (and in particular SB242084, given its functional selectivity over signalling pathways coupled to 5-HT 2C receptors) could potentially be used to treat patients with motivational deficits such as apathy (Matteo et al. 1999; Simpson et al. 2011; Bailey et al. 2018). Our and others' data add a note of caution by showing that the improvement in the instrumental drive for reward may, in certain contexts, also have detrimental effects on response restraint. Similarly, 5-HT 2C R blockade has been observed both to reduce and to amplify particular behaviours associated with obsessive-compulsive disorders (Martin et al. 2002; Albelda and Joel 2012). Therefore, further research will be required to understand the neural mechanisms underlying the complex changes in response vigour and in response restraint reported here, to allow for more precise targeting of 5-HT 2C receptors and their associated signalling pathways in the future.