5-HT 2A and 5-HT 2C receptor antagonism differentially modulate reinforcement learning and cognitive flexibility: behavioural and computational evidence

Rationale Cognitive flexibility, the ability to adapt behaviour in response to a changing environment, is disrupted in several neuropsychiatric disorders, including obsessive–compulsive disorder and major depressive disorder. Evidence suggests that flexibility, which can be operationalised using reversal learning tasks, is modulated by serotonergic transmission. However, how exactly flexible behaviour and associated reinforcement learning (RL) processes are modulated by 5-HT action on specific receptors is unknown. Objectives We investigated the effects of 5-HT2A receptor (5-HT2AR) and 5-HT2C receptor (5-HT2CR) antagonism on flexibility and underlying RL mechanisms. Methods Thirty-six male Lister hooded rats were trained on a touchscreen visual discrimination and reversal task. We evaluated the effects of systemic treatments with the 5-HT2AR and 5-HT2CR antagonists M100907 and SB-242084, respectively, on reversal learning and performance on probe trials where correct and incorrect stimuli were presented with a third, probabilistically rewarded, stimulus. Computational models were fitted to task choice data to extract RL parameters, including a novel model designed specifically for this task. Results 5-HT2AR antagonism impaired reversal learning only after an initial perseverative phase, during a period of random choice and then new learning. 5-HT2CR antagonism, on the other hand, impaired learning from positive feedback. RL models further differentiated these effects. 5-HT2AR antagonism decreased punishment learning rate (i.e. negative feedback) at high and low doses. The low dose also decreased reinforcement sensitivity (beta) and increased stimulus and side stickiness (i.e., the tendency to repeat a choice regardless of outcome). 5-HT2CR antagonism also decreased beta, but reduced side stickiness. 
Conclusions These data indicate that 5-HT2A and 5-HT2CRs both modulate different aspects of flexibility, with 5-HT2ARs modulating learning from negative feedback as measured using RL parameters and 5-HT2CRs for learning from positive feedback assessed through conventional measures. Supplementary Information The online version contains supplementary material available at 10.1007/s00213-024-06586-w.


Introduction
The monoamine neurotransmitter serotonin (5-hydroxytryptamine; 5-HT) system is implicated in several neuropsychiatric disorders, including major depressive disorder (MDD) and obsessive-compulsive disorder (OCD) (Andersen et al. 2021; Carhart-Harris and Friston 2019; Doss et al. 2021; Stroud et al. 2018). Thus, understanding the role of serotonergic modulation mediated by specific 5-HT receptors is critical for developing future therapies for disorders characterized by inflexible behaviour and diminished reinforcement learning (RL).
5-HT contributes to various cognitive processes across species, including RL (Den Ouden et al. 2013; Iigaya et al. 2018) and cognitive flexibility (Alsiö et al. 2021; Barlow et al. 2015; Clarke et al. 2004). Cognitive flexibility is defined as the ability to adapt behaviour in response to changes in the environment. Inflexible behaviour can manifest itself as compulsive behaviour, e.g. excessively perseverative actions that are independent of outcome-value associations (Berlin and Hollander 2014; Jentsch and Taylor 2001; Koob and Volkow 2016). Moreover, the ability to adjust behaviour to changes in the environment is closely linked to underlying RL processes, which integrate positive and negative feedback from the environment to maximise rewards and minimise punishment (Sutton and Barto 1998).
Flexible responding can be assessed using reversal learning paradigms across species (Uddin 2021). During reversal learning tasks, initially learned stimulus contingencies change and the subject needs to update behaviour accordingly. Substantial evidence suggests that 5-HT is involved in the modulation of reversal learning, as shown through 5-HT depletion in the orbitofrontal cortex (OFC) in monkeys (Clarke et al. 2004, 2005; Rygula et al. 2015) and rats (Alsiö et al. 2021; Izquierdo et al. 2012). In humans, acute tryptophan depletion (reducing 5-HT levels due to a reduction in its amino-acid precursor tryptophan) increases outcome-independent choice perseveration (Seymour et al. 2012) and impairs reversal learning (Kanen et al. 2021). 5-HT also modulates RL processes underlying flexible behaviour, possibly through distinct mechanisms (Bari et al. 2010; Seymour et al. 2012). In healthy human participants, short-term administration of the selective serotonin reuptake inhibitor (SSRI) citalopram results in increased punishment learning and reduced reward learning (Michely et al. 2022). In patients with MDD, SSRIs impair learning from negative feedback, while having negligible effects on learning from positive feedback (Herzallah et al. 2013). In rats, acute low-dose citalopram improves negative feedback sensitivity, while acute high-dose citalopram impairs negative feedback sensitivity, similarly to observations in human studies (Bari et al. 2010).
While it is evident that 5-HT is a key modulator of behavioural flexibility, it targets a broad range of receptor subtypes with diverse actions, exerting both excitatory and inhibitory transmission depending on receptor subtype and localisation (Alvarez et al. 2021). Thus, it is vital to understand the modulatory role of 5-HT through different receptors on cognition and RL. In particular, the excitatory 5-HT 2A Rs, which are primarily localized on excitatory pyramidal neurons, and inhibitory 5-HT 2C Rs, found primarily on inhibitory parvalbumin neurons, seem to be involved in reversal learning, possibly with dissociable roles (Aghajanian and Marek 1999; Amargós-Bosch et al. 2004; Liu et al. 2007; Santana et al. 2004). Systemic 5-HT 2A R blockade impairs spatial reversal learning performance, whereas systemic blockade of 5-HT 2C Rs improves performance (Boulougouris et al. 2008). Moreover, high levels of perseveration in rats have been found to be associated with decreased levels of 5-HT 2A R in the OFC (Barlow et al. 2015), consistent with decreased levels of 5-HT 2A R density in the OFC and PFC predicting clinical severity in OCD patients (Perani et al. 2008). Recent findings also suggest that psilocybin improves cognitive flexibility through a mechanism dependent on 5-HT 2A Rs, but not 5-HT 2C Rs (Torrado Pacheco et al. 2023). Less is known about the effects of 5-HT 2A R and 5-HT 2C R stimulation and blockade on component processes of reversal learning, including sensitivity to feedback and subsequent action selection.
To investigate the specific roles of 5-HT receptors in flexibility and RL, we employed the valence-probe visual discrimination (VPVD) task (Alsiö et al. 2019) and combined this task with RL modelling to gain a deeper insight into the latent processes underlying behaviour. We recently employed RL computational modelling to assess effects of 5-HT depletion and SSRI treatment in a different, probabilistic reversal task (Luo et al. 2023). We thus aimed in this study to extend this analysis to specific 5-HT receptor agents. Such models are fitted to trial-by-trial data and allow for extraction of parameters such as value-dependent (i.e., dependent on wins/losses on the previous trial) positive and negative learning rates, the 'reinforcement sensitivity' parameter, as well as the value-independent side and stimulus stickiness parameters, which reflect repeated responses to the same side or stimulus, respectively, regardless of the outcome on the previous trial (Daw 2009). Stickiness differs from perseveration as it provides a measure of the overall tendency to repeat a choice based on all previous trials, whereas perseveration is usually measured as the number of responses to the previously correct stimulus after a reversal. These parameters reflect different aspects of flexibility and RL, separating value-dependent from value-independent components. We examined whether these parameters contribute to choice behaviour on the VPVD task and if they were affected by 5-HT 2A R or 5-HT 2C R blockade. We hypothesized that 5-HT 2A R blockade would increase stickiness parameters, and that 5-HT 2C R blockade would lead to higher learning rates, as previous studies (summarized above) have shown increased perseveration following 5-HT 2A R blockade and improved reversal learning behaviour resulting from 5-HT 2C R antagonism. Computational modelling thus enables us to investigate the roles of the different 5-HT 2 receptors more precisely in different aspects of RL behaviour.

Animals
Subjects were male hooded Lister rats (N = 36; Charles River, UK) (Fig. 1) housed in groups of three or four throughout the experiments. The rats underwent two experiments. In the first experiment (5-HT 2A R antagonism), all 36 rats were included. In the following 5-HT 2C R antagonist experiment, 35 rats were included, as one rat had to be euthanised due to seizures. The rats were housed under a reverse 12-h light/dark cycle with lights off at 0700 h. All training and testing was performed during the dark phase. To ensure sufficient motivation for task performance, the animals were food restricted with ad libitum access to water and fed once daily at random times after testing. Their body weights were maintained at 85% of their free-feeding weight. All experiments were subject to regulation by the United Kingdom Home Office (PPL 70/7548) in accordance with the Animals (Scientific Procedures) Act 1986.

Hierarchical Bayesian reinforcement learning modelling
The VPVD data were modelled with RL models using a hierarchical Bayesian approach. In total, nine different models were implemented in Stan (version 2.26.1), containing different combinations of parameters. The methods and models tested are described in more detail in the Supplementary Materials.
Q-values were updated on each trial using the following equation:
$$Q_{t+1}(c_t) = Q_t(c_t) + \alpha \, [r_t - Q_t(c_t)] \qquad (1)$$
where $Q_{t+1}(c_t)$ is the Q-value of the stimulus chosen on the current trial for the next trial, $Q_t(c_t)$ is the expected value of the stimulus selected on the current trial, $\alpha$ is the learning rate and $r_t$ is the reinforcement on trial $t$ (1 for reward and 0 for punishment). The learning rate reflects how much the Q-value is updated based on the prediction error $r_t - Q_t(c_t)$, with higher $\alpha$ driving faster learning.
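The delta-rule update described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Stan implementation used in the study; the function name `update_q` and the dictionary representation of Q-values are assumptions made for the example.

```python
def update_q(q_values, choice, reward, alpha):
    """Return a copy of q_values after one delta-rule update of the chosen stimulus.

    q_values: dict mapping stimulus label -> current Q-value (hypothetical layout)
    choice:   label of the stimulus chosen on this trial (c_t)
    reward:   1 for reward, 0 for punishment (r_t)
    alpha:    learning rate in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    updated = dict(q_values)
    prediction_error = reward - updated[choice]  # r_t - Q_t(c_t)
    updated[choice] += alpha * prediction_error  # only the chosen stimulus changes
    return updated


# Three consecutive choices of stimulus "A": rewarded, rewarded, punished.
q = {"A": 0.0, "B": 0.0}
for r in (1, 1, 0):
    q = update_q(q, "A", r, alpha=0.5)
print(q)
```

In models with separate reward and punishment learning rates, one plausible reading is that the rate applied on a given trial is selected by the outcome of that trial; the single-rate version is shown here because it is the form described in the text.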
Next, the softmax decision rule was used to calculate the probability of making one of two choices:
$$P(c_t = L) = \frac{\exp[\beta \, Q_t(L)]}{\exp[\beta \, Q_t(L)] + \exp[\beta \, Q_t(R)]} \qquad (2)$$
$Q_t(L)$ and $Q_t(R)$ are the Q-values of the left and right stimuli, and $\beta$ is the reinforcement sensitivity parameter, which determines to what extent the subject is driven by its reinforcement history (versus random choice). Lower values of $\beta$ indicate greater exploration and lower sensitivity to reinforcement, whereas greater values represent increased exploitation and greater sensitivity to reinforcement.
The behavioural data were simulated with the posterior group mean parameters from the winning model, to ensure that the model could reproduce behavioural observations. The simulations were then analysed using a conventional approach as described below.
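As an illustration of how the reinforcement sensitivity parameter shapes choice, the two-option softmax can be written as a logistic function of the Q-value difference. The sketch below is a minimal Python version under that assumption; `p_left` and `choose` are hypothetical names, and stickiness terms, which some of the fitted models include, are deliberately left out.

```python
import math
import random


def p_left(q_left, q_right, beta):
    """Softmax probability of choosing the left stimulus.

    For two options, exp(b*QL) / (exp(b*QL) + exp(b*QR)) is algebraically
    equal to 1 / (1 + exp(-b * (QL - QR))).
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_left - q_right)))


def choose(q_left, q_right, beta, rng):
    """Sample a choice ('L' or 'R') from the softmax probabilities."""
    return "L" if rng.random() < p_left(q_left, q_right, beta) else "R"


rng = random.Random(1)  # seeded only so that the example is reproducible
for beta in (0.0, 1.0, 5.0):
    print(beta, round(p_left(0.8, 0.2, beta), 3), choose(0.8, 0.2, beta, rng))
```

With beta = 0 the two options are chosen with equal probability whatever their values, and the preference for the higher-valued option grows as beta increases, which matches the exploration versus exploitation reading given above.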

Statistical analyses
Data across days within one reversal were collapsed, and trial outcomes were coded as perseverative, random, or learning depending on performance over bins of 30 trials in a rolling window, as described in detail and illustrated previously (Hervig et al. 2020), and following binomial distribution probabilities (Jones and Mishkin 1972).
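The phase coding can be illustrated with a short Python sketch. The exact cut-offs are those of the cited studies and are not restated here, so the sketch assumes a one-sided binomial test against chance (p = 0.5) at α = 0.05 within each 30-trial window; the function names and the threshold choice are assumptions for illustration only.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def classify_window(n_correct, n=30, alpha=0.05):
    """Label one window as 'perseverative', 'random' or 'learning'.

    Assumed rule: significantly below chance -> perseverative,
    significantly above chance -> learning, otherwise random.
    """
    p_low = binom_cdf(n_correct, n)              # P(X <= n_correct)
    p_high = 1.0 - binom_cdf(n_correct - 1, n)   # P(X >= n_correct)
    if p_low < alpha:
        return "perseverative"
    if p_high < alpha:
        return "learning"
    return "random"


def classify_rolling(outcomes, n=30):
    """Classify every rolling window of n trials (1 = correct, 0 = incorrect)."""
    return [
        classify_window(sum(outcomes[i:i + n]), n)
        for i in range(len(outcomes) - n + 1)
    ]


# Hypothetical reversal: 30 errors followed by 30 correct responses.
phases = classify_rolling([0] * 30 + [1] * 30)
print(phases[0], phases[15], phases[-1])
```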

Valence-probe visual discrimination task with reversal
Behavioural training was performed as previously described (Alsiö et al. 2019). The VPVD task can assess the effect of positive or negative feedback on learning through a neutral stimulus that is probabilistically reinforced (Phillips et al. 2018). For experimental timeline and design see Fig. 1 and for additional information on the apparatus, behavioural pre-training, and touchscreen visual discrimination and reversal, see Supplementary Materials.
After pre-training, the rats progressed to the VPVD task. The VPVD task was a three-stimulus task, during which responses to one stimulus (A+) were rewarded, whereas responding to the other stimulus (B−) was punished with a time-out. A third stimulus, probabilistically rewarded on average 50% of the time (C 50/50), was paired with either the A+ or B− on 'probe' trials (Fig. 1).
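The reinforcement contingencies of the three stimuli before reversal can be summarised in a small Python sketch. The dictionary layout and function names are assumptions for illustration; the probabilities are those stated above (A+ always rewarded, B− never rewarded, C rewarded on about half of trials).

```python
import random

# Reward probabilities before reversal, as described in the text.
REWARD_PROB = {"A": 1.0, "B": 0.0, "C": 0.5}


def outcome(stimulus, rng):
    """Return 1 (reward) or 0 (time-out) for a response to `stimulus`."""
    return 1 if rng.random() < REWARD_PROB[stimulus] else 0


def optimal_choice(pair):
    """Return the stimulus in `pair` with the higher reward probability."""
    return max(pair, key=REWARD_PROB.get)


rng = random.Random(0)
print(optimal_choice(("A", "C")))  # positive probe: A+ paired with C
print(optimal_choice(("B", "C")))  # negative probe: B- paired with C
print(sum(outcome("C", rng) for _ in range(2000)) / 2000)
```

On a positive probe trial the better option is A+, whereas on a negative probe trial it is the probe stimulus C, so the two probe types separate approach to the rewarded stimulus from avoidance of the punished one.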
The trial structure was kept constant, but a tone was played every time a trial was rewarded, and the stimulus duration was unlimited to ensure that animals completed the probe trials. The probe stimulus and frequency of probe trials (every 4 or 5 trials) were determined based on a previous study (Alsiö et al. 2019). After optimization, each of the probe trials was presented once every 8 trials: randomized, but never on the first trial within any 8-trial bin. There was a maximum of 200 trials per session. Both the intertrial interval and time-out (on non-rewarded trials) were 5 s. Rats were initially tested for 5 days on the same A+ and B− as during the pre-training reversal (i.e., 'horizontal bars' vs. 'vertical bars'). The animals then completed a visual discrimination with a novel pair of stimuli ('slashes' vs. 'backslashes'; counterbalanced across rats). Training continued for a minimum of 5 sessions but could be extended to allow rats to reach 80% correct on the standard trials within the task. Once all rats had reached the criterion, all rats progressed to the 'reversal learning experiment'. On the day before reversal and start of drug treatment, the rats received a saline injection and were given a retention test session. The next day, rats were matched for stimulus-reward contingencies, performance on the probe trials before reversal and pre-training reversal performance, and accordingly allocated to a drug group. The stimulus-reward contingencies were reversed on the first day of reversal and then remained the same for the duration of the training sessions (i.e., there were only between-session reversals). The drug was administered before testing each day. The same stimulus ('diamonds') was used as the probe stimulus for all rats and across each of the phases, both during training and test trials. Training during the SB-242084 experiment followed the same procedure as above but rats were trained on a new pair of stimuli ('arcs' vs.
'triangles'; counterbalanced across rats).
This model included the following parameters: α rew (reward learning rate), α pun (punishment learning rate), β (reinforcement sensitivity), κ stim (stimulus stickiness), κ side (side stickiness), and the discount factor ρ. Learning from negative feedback was decreased by both low (difference in parameter per-group mean, posterior 95% highest density interval (HDI) excluding zero (group difference, 0 ∉ 95% HDI)) and high (group difference, 0 ∉ 75% HDI) doses of M100907. There was some evidence that low, but not high, dose M100907 also decreased the reinforcement sensitivity parameter (reflecting decreased sensitivity to reinforcement) (group difference, 0 ∉ 75% HDI) and increased the stimulus stickiness parameter (group difference, 0 ∉ 75% HDI). The side (location) stickiness parameter was increased in the low dose group (group difference, 0 ∉ 95% HDI) and slightly increased in the high dose group (group difference, 0 ∉ 75% HDI). The reward learning rate and discount factor were unaffected by M100907 treatment (no group differences, 0 ∈ 75% HDI) (Fig. 2 and Table 2). The mean and standard deviation of the novel discount factor ρ for each group can be found in Supplementary Table 2. Furthermore, we simulated the behavioural data using the extracted parameters from the winning model. The data modelled was separated into standard, positive and negative probe trials. The simulations were able to capture the dynamics of behaviour on the VPVD task, as can be seen in the Supplementary Materials (Figure SF.1).
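The "0 ∉ HDI" criterion used for group differences can be illustrated with a sample-based estimate of the highest density interval. The sketch below is an assumption-laden illustration in Python, not the authors' analysis code: it takes the narrowest interval containing the requested share of posterior draws, and the draws themselves are simulated here purely as a stand-in for a posterior distribution of a group difference.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Width of every candidate interval spanning k consecutive sorted draws.
    widths = [(xs[i + k - 1] - xs[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return xs[start], xs[start + k - 1]


def excludes_zero(samples, mass):
    """True if zero lies outside the HDI of the given mass."""
    low, high = hdi(samples, mass)
    return low > 0 or high < 0


# Hypothetical posterior draws of a group difference (drug minus vehicle).
rng = random.Random(0)
diff = [rng.gauss(-1.0, 0.65) for _ in range(5000)]
print(hdi(diff, 0.75), excludes_zero(diff, 0.75))
print(hdi(diff, 0.95), excludes_zero(diff, 0.95))
```

A difference whose 75% HDI excludes zero while its 95% HDI does not corresponds to the weaker level of evidence reported for several parameters above.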

Effects of 5-HT 2A R blockade on VPVD reversal: standard behavioural parameters
There was weak evidence that systemic M100907 impaired performance on the VPVD task. On the standard (A−< B+) trials, there was a trend towards a main effect of dose (F 2,35 = 2.93, p = 0.066) and a trend towards a dose × session interaction (F 26,455 = 1.52, p = 0.051) (Fig. 2A). As there were evident trending effects (although non-significant), we
The main measures were percentage correct responses ('% correct') on the standard A−< B+ trials and '% optimal choice' for the negative and positive probe trials across sessions. The optimal choice percentage was defined as the percentage of trials where the highest reward-probability option was chosen. Only data up to (and including) the first block of 30 trials where a rat reached criterion (24/30 correct) were analysed.
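The criterion-based truncation can be sketched as follows. This Python example assumes that "block" means consecutive, non-overlapping blocks of 30 trials, which the text does not state explicitly; the function names are hypothetical.

```python
def truncate_at_criterion(outcomes, block=30, criterion=24):
    """Keep trials up to and including the first block that reaches criterion.

    outcomes: list of 1 (correct) / 0 (incorrect) in trial order.
    Assumes consecutive, non-overlapping blocks; if criterion is never
    reached, all trials are kept.
    """
    for start in range(0, len(outcomes) - block + 1, block):
        if sum(outcomes[start:start + block]) >= criterion:
            return list(outcomes[:start + block])
    return list(outcomes)


def percent_correct(outcomes):
    """Percentage of correct responses in a list of 1/0 outcomes."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical rat: one block of errors, then a block at 24/30, then more trials.
trials = [0] * 30 + [1] * 24 + [0] * 6 + [1] * 30
kept = truncate_at_criterion(trials)
print(len(kept), percent_correct(kept))
```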
We also analysed response and collection latencies. Drug effects on standard parameters were analysed using linear mixed-effects models with the lmer function of the lme4 package in R as described previously (Phillips et al. 2018) and as recommended for such data (Wickham 2014). The model contained two fixed factors (dose and session or dose and phase) and one random factor (subject). When relevant, further analyses were performed by conducting separate multilevel models on 'dose' for each session or phase. These analyses were followed by post hoc Dunnett's corrected pairwise comparisons with the relevant vehicle condition. Significance was set at α = 0.05.
Visualization and statistical tests were performed with R, version 4.1.2 (R Core Team 2021). Response frequencies were square-root transformed, latencies were log transformed and probabilities were arcsine transformed to ensure normality, as confirmed with a quantile-quantile plot of residuals.
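For illustration, the three variance-stabilising transformations can be written in Python as below. The analyses themselves were run in R; this sketch assumes the natural logarithm for latencies and the common arcsine-square-root form for probabilities, neither of which is specified in the text.

```python
import math


def sqrt_transform(count):
    """Square-root transform for response frequencies (counts)."""
    return math.sqrt(count)


def log_transform(latency_s):
    """Natural-log transform for latencies (assumed base; must be > 0)."""
    return math.log(latency_s)


def arcsine_transform(p):
    """Arcsine-square-root transform for a proportion p in [0, 1] (assumed form)."""
    return math.asin(math.sqrt(p))


print(sqrt_transform(9), log_transform(2.5), arcsine_transform(0.8))
```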

Effects of 5-HT 2C R blockade on VPVD reversal: standard behavioural parameters
Systemic SB-242084 impaired performance in the VPVD reversal learning task. On the standard (A−< B+) trials, there was a trend towards a main effect of dose (F 2,35 = 3.15, p = 0.055) but no dose × session interaction (F 26,455 = 0.81, p = 0.74) (Fig. 3). On positive probe trials, there was a significant main effect of dose on % optimal choice (F 2,35 = 7.38, p = 0.0021) but no dose × session interaction (F 26,455 = 1.04, p = 0.41). As there were evident trending effects (although non-significant), we performed further post hoc analyses within each session for the standard (A−< B+) trials. Post hoc comparisons revealed that 1.0 mg/kg SB-242084 significantly reduced % correct on sessions 7
Experiment 2: effects of systemic 5-HT 2C R blockade on reversal learning and reinforcement learning parameters

Effects of systemic 5-HT 2C R blockade on reinforcement learning processes: computational modeling
Model 7 was the winning model for this dataset (including parameters α rew, α pun, β and κ side) (Model 9 did not converge; see Supplementary Material). It showed that learning from positive and negative feedback were unaffected by SB-242084 (no group differences, 0 ∈ 75% HDI) (Fig. 3 and Table 2). High-dose SB-242084 decreased the reinforcement sensitivity parameter (i.e., reducing sensitivity to feedback) (group difference, 0 ∉ 75% HDI).
The side stickiness parameter was decreased by low-dose (group difference, 0 ∉ 95% HDI) and high-dose (group difference, 0 ∉ 75% HDI) SB-242084. We also simulated
partial 5-HT 2A R agonist, as well as general improvements in set-shifting following ketanserin administration in rats (Baker et al. 2011; Pokorny et al. 2020; Torrado Pacheco et al. 2023). However, such apparent inconsistencies may have resulted from the use of different paradigms to assess flexibility, such as set-shifting, which may involve neural and 5-HT dependent substrates distinct from those of reversal learning (Clarke et al. 2005; Dias et al. 1996). Dose may also be a relevant factor. The lower dose of 0.03 mg/kg M100907 affected reversal learning more than the 0.1 mg/kg dose, possibly reflecting an inverted U-curve effect, as previously reported for 5-HT 2A R antagonists (Marek et al. 2005). Dose-response studies have shown that moderate systemic doses of M100907 are more effective than low and high doses on a response-inhibition task and that intra-lOFC infusions with moderate M100907 doses induce the most detrimental effects on reversal learning (Furr et al. 2012; Marek et al. 2005). The high dose of the 5-HT 2A R antagonist may have induced receptor internalization, an established mechanism for the 5-HT 2A R which produces such apparently paradoxical effects (Roth 2011) (Fig. 4).
The findings align with our initial hypothesis of increased stickiness following 5-HT 2A R blockade. Selective depletions of 5-HT in the marmoset OFC and amygdala using 5,7-DHT also result in increased side stickiness rates, similar to our findings following 5-HT 2A R antagonism (Rygula et al. 2015), suggesting that 5-HT 2A Rs in these areas may modulate the stickiness parameter, i.e., repeating responses regardless of previous outcomes. This accords with the demonstration that side stickiness is correlated with functional connectivity between the amygdala and medial OFC in rats (Zühlsdorff et al. 2023).

Effects of 5-HT 2C R antagonism on reinforcement learning and cognitive flexibility
Antagonism of 5-HT 2C Rs with SB-242084 decreased % correct and % optimal choice on the VPVD task at high doses. Previous data have shown that this agent can improve serial reversal performance in the initial perseverative phases due to reduced perseveration but that there is an overall decremental effect on performance, possibly due to impaired (re-)learning of associations after perseveration has been overcome (Alsiö et al. 2015). This interpretation is supported by differential roles of 5-HT in lateral orbitofrontal and medial prefrontal cortex (Alsiö et al. 2019). In probabilistic reversal tasks, where there is already a high baseline of response shifting, further increases are unlikely to improve performance and may impair it (e.g., human data in Kanen et al. 2019). Using RL models, we found here that 5-HT 2C R blockade decreased the reinforcement sensitivity parameter at a higher dose and decreased side stickiness at low and high doses. In both the
(t 91.8 = -2.63, p = 0.020) and 8 (t 91.8 = -2.35, p = 0.040). On positive probe trials, post hoc analyses showed that % optimal choice was significantly decreased on sessions 8 (t 423 = -2.48, p = 0.026), 9 (t 423 = -2.61, p = 0.018), 11 (t 423 = -2.39, p = 0.034) and 12 (t 423 = -2.24, p = 0.049).
Win-stay/lose-shift and latency analyses for both experiments can be found in the Supplementary Materials.

Discussion
These findings indicated contrasting, as well as common, effects of 5-HT 2A and 5-HT 2C R antagonists on measures of RL and cognitive flexibility in the rat. We used a computational modelling approach to visual discrimination reversal that characterized novel drug effects not seen previously using standard behavioural measures. The RL parameters enabled us to gain a deeper insight into the latent mechanisms underlying behaviour on the VPVD task.

Effects of 5-HT 2A R antagonism on reinforcement learning and cognitive flexibility
Selective blockade of 5-HT 2A Rs using M100907 impaired reversal learning as reflected by reductions in % correct on standard trials and an increasing frequency of errors after the initial perseverative phase at the random choice and learning phases. This impairment was not associated with changes in response or collection latencies, showing that it was unlikely to be caused by motivational or sensorimotor deficits. Computational analyses revealed that 5-HT 2A R antagonism impaired learning from negative feedback, decreased the reinforcement sensitivity parameter and increased both side and stimulus 'stickiness', suggesting differential effects of 5-HT 2A R blockade on value-dependent (reinforcement sensitivity) compared to value-independent (stickiness) choices, which may reflect distinct facets of the cognitive flexibility construct.
Previous studies using systemic (Boulougouris et al. 2008) or intra-lateral OFC (Hervig et al. 2020) M100907 have also shown impaired reversal learning performance, consistent with the present findings. Moreover, lower 5-HT 2A R binding in the rat OFC is associated with more perseveration during spatial reversal (Barlow et al. 2015). Our findings may seem inconsistent with studies showing that the 5-HT 2A R antagonist ketanserin normalizes impairments in flexibility resulting from lysergic acid diethylamide (LSD), which is a
stickiness, whilst decreasing reward rate and increasing reinforcement sensitivity at a higher dose (Luo et al. 2023). Acute escitalopram in healthy human participants reduces the reward learning rate, decreases reinforcement sensitivity, and decreases stimulus stickiness (Luo et al. 2023), partially aligning with our findings following 5-HT 2C R blockade. Our findings using selective 5-HT 2A R and 5-HT 2C R antagonists may thus aid our understanding of mechanisms underlying cognitive flexibility and RL.
Psilocybin and other psychedelics are receiving increased attention for their therapeutic potential in treating neuropsychiatric disorders such as MDD and anxiety (Carhart-Harris et al. 2016, 2021; Goldberg et al. 2020). Even though their mechanisms are poorly understood, one hypothesis is that psilocybin improves cognitive flexibility (Baker et al. 2011; Torrado Pacheco et al. 2023). Psilocybin, which primarily exerts its psychoactive effects through 5-HT 2A R agonism (Madsen et al. 2019), has been shown to increase cognitive
present study and in Phillips et al. (2018), SB-242084 impaired performance and reduced reinforcement sensitivity. This drug therefore appeared to enhance flexible responding as reflected by the reinforcement sensitivity and side stickiness parameters (Fig. 5), and this may account for the initial positive effects on serial reversal. This observation is in accordance with studies showing SB-242084 to improve performance during perseverative phases of serial visual reversal learning (Boulougouris et al. 2008). Our findings indicate that this improvement may be due to decreased side stickiness following SB-242084 administration. However, the reduction in reinforcement sensitivity may lead to an overall deficit in performance.

Implications for mechanisms of action of SSRIs and psychedelics in psychiatric disorders
In a recent analysis, lower doses of the SSRI citalopram increased the reward learning rate and decreased side
-2014-2810 and R210-2015-2982). KZ was supported by the Institute for Neuroscience at the University of Cambridge, the Alan Turing Institute, London and the Angharad Dodds John Bursary in Mental Health and Neuropsychiatry, Downing College, Cambridge. JWD has received funding from GlaxoSmithKline and Boehringer Ingelheim Pharma GmbH and is a co-investigator on an MRC program grant (MR/N02530X/1). TWR is also a co-investigator of the latter grant. RNC's research is supported by the UK Medical Research Council (MRC) (MR/W014386/1). JA was supported by a short-term grant from Fudan University. SFO, TB and BP have no funding to declare.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
flexibility in individuals with MDD for at least 4 weeks (Doss et al. 2021). Ayahuasca, which contains the 5-HT 2A R agonist dimethyltryptamine, similarly increases cognitive flexibility in healthy volunteers (Kuypers et al. 2016; Murphy-Beiner and Soar 2020). In contrast, 2,5-dimethoxy-4-iodoamphetamine, a 5-HT 2A/C R agonist, impairs flexible strategy choice, highlighting different mechanisms of action of hallucinogenic substances (Torrado Pacheco et al. 2023). Finally, a recent study investigating the effects on RL parameters of the psychedelic LSD, a partial 5-HT 2A R agonist, has reported increased reward and punishment learning, and reduced stimulus stickiness (Kanen et al. 2022). Overall, these results suggest that 5-HT 2A R agonism can improve flexibility. In the present study, we show that antagonism of this receptor decreases the punishment learning rate and increases stickiness, mirroring these hypothetical effects of 5-HT 2A R agonism. A limitation of our study is the fact that only male animals were included; therefore, sex-dependent effects could not be investigated.

Declarations
In summary, we report that both 5-HT 2A R and 5-HT 2C R antagonism altered performance on a visual reversal task. We characterized this impairment using RL models, finding that 5-HT 2A R blockade reduced both learning from punishment and reinforcement sensitivity, but increased stickiness. 5-HT 2C R blockade impaired learning from positive feedback as assessed using conventional measures, suggesting a dissociation between the two receptors: the 5-HT 2C R is essential for learning from positive feedback and the 5-HT 2A R is important for learning from negative feedback. Additionally, 5-HT 2C R antagonism reduced reinforcement sensitivity and side stickiness parameters, indicating increased flexibility. These results provide novel insights into the mechanisms of 5-HT and the involvement of different 5-HT receptors in cognitive flexibility. This may be important for our understanding of neuropsychiatric conditions such as MDD and OCD, as well as for research into future treatments such as psychedelic agents that act as 5-HT 2A R agonists.

Fig. 1 Experimental design. (A) Table of groups and treatments (N, number of subjects). (B) VPVD stages and stimuli in the M100907 and SB-242084 experiments. A is the 100% reinforced stimulus, B is the 0% reinforced stimulus, C is reinforced on 50% of probe trials. (C)

Fig. 2 Effects of M100907 on VPVD parameters. (A) Percent correct and percent optimal choice across sessions. (B) Errors to criterion and errors per phase. Results are represented as mean ± standard error of the mean (SEM); *** p < 0.01, # p < 0.1

Fig. 3 Results from the hierarchical Bayesian winning RL model 9, showing differences in group mean parameters following M100907 administration. (A) Reward and punishment learning rate parameters. (B) Reinforcement sensitivity, side and stimulus stickiness parameters.

Table 1
Model comparison summary. Models were assumed to be equiprobable a priori.
Conflict of interest JWD has received research grants from Boehringer Ingelheim Pharma GmbH and GlaxoSmithKline and receives royalties from Springer Verlag. TWR discloses consultancy with Cambridge Cognition; he receives editorial honoraria from Springer-Nature and Elsevier and a research grant from Shionogi. RNC consults for Campden Instruments and receives royalties from Cambridge Enterprise, Routledge, and Cambridge University Press. KZ, MEH, JA, SFO, TB and BP have no conflicts to declare.