Escitalopram Restores Reversal Learning Impairments in Rats with Lesions of Orbital Frontal Cortex

The term ‘cognitive structures’ is used to describe the fact that mental models underlie thinking, reasoning and representing. Cognitive structures generally improve the efficiency of information processing by providing a situational framework within which there are parameters governing the nature and timing of information and appropriate responses can be anticipated. Unanticipated events that violate the parameters of the cognitive structure require the cognitive model to be updated, but this comes at an efficiency cost. In reversal learning a response that had been reinforced is no longer reinforced, while an alternative is now reinforced, having previously not been (A+/B− becomes A−/B+). Unanticipated changes of contingencies require that cognitive structures are updated. In this study, we examined the effect of lesions of the orbital frontal cortex (OFC) and the effects of the selective serotonin reuptake inhibitor (SSRI), escitalopram, on discrimination and reversal learning. Escitalopram was without effect in intact rats. Rats with OFC lesions had selective impairment of reversal learning, which was ameliorated by escitalopram. We conclude that reversal learning in OFC-lesioned rats is an easily administered and sensitive test that can detect effects of serotonergic modulation on cognitive structures that are involved in behavioural flexibility.


Introduction
The frontal lobes of the human brain are thought to be the 'seat of being', providing functions that are quintessentially human. These include language but also functions related to having goals, considering consequences, weighing options, abstracting rules, making plans for the future, and free-will: in short, the frontal lobes hold the cognitive structures that give rise to the essence of human 'self'. These are what Whitehead invoked when he wrote "the life of a human being receives its worth, its importance, from the way in which unrealised ideals shape its purposes and tinge its actions. The distinction between men and animals is in one sense only a difference in degree. But the extent of the degree makes all the difference" (Whitehead 1938, pp. 37-38).
It is not far-fetched to suggest that a hungry foraging rat has 'unrealised ideals' and that these are brought to bear in driving its behaviour and response choice, which determine future action. Furthermore, the frontal lobes of the rat contribute to this goal-directed behaviour and, from this, cognitive structures may be inferred. Therefore, quantifying this behaviour should demonstrate that it is possible, even if only within a relatively restricted cognitive domain, to measure the extent of the degree of difference referred to by Whitehead (1938).
Humans can verbalise many mental (cognitive) functions by introspection and communicate this to others. Without recourse to language, however, cognition cannot be directly measured, but rather only indirectly inferred from behaviour. The challenge then becomes that of finding suitable measures of behaviour that reflect the cognitions of interest in different species in order to take a comparative approach to understanding the neural basis of cognition. Such an approach has the obvious value that it could inform our understanding of fundamental properties of cognitive operations (Miller and Cohen 2001). However, there is an additional potential benefit, in that it enables the refinement of 'animal models' of human psychiatric disorders, such as schizophrenia or depression, in which cognitive flexibility is impaired (Murray et al. 2008; Kehagia et al. 2010; Murphy et al. 2012; Gilmour et al. 2013; Waltz 2017). In recent years pharmaceutical companies have curtailed investment in, or abandoned altogether, research into treatments for mental illness, and other funders are not stepping in to counteract this trend. We recently argued that one of the reasons for this retreat is that 'translational research' has often failed to deliver on its promise but, while the limits of 'animal models' must be acknowledged, they do have value in providing an understanding of the neural mechanisms of specific symptoms (Insel et al. 2012).
Thus, there are multiple good reasons to identify those cognitive structures that are relevant for human health and wellbeing and are both likely to be evolutionarily conserved and can be readily measured and quantified in different species.
The capacity to behave flexibly is an adaptation that is fundamental for evolutionary fitness and is quantifiable in many different species. This makes studies of behavioural, and the presumed underlying cognitive, flexibility exemplary for this purpose.

How Is Behavioural Flexibility Measured and Cognitive Flexibility Inferred?
Cognitive structures improve the efficiency of information processing by providing a situational framework within which there are parameters governing the nature and timing of information and appropriate responses can be anticipated. In a highly predictable situation, unanticipated events require flexibility: the cognitive model is updated so that appropriate responses are generated. However, this updating incurs a cost, usually measured as additional time or experience required to learn under the changed conditions. Most assays of cognitive flexibility exploit paradigms from the early psychology literature measuring perceptual attentional shifting (examples include the Wisconsin Card Sorting Test (Berg 1948) and the intra-/extra-dimensional (ID/ED) set shifting task (Lawrence 1949)) or response switching (examples include task switching (Jersild 1927) and 'learning set' (Harlow 1949)). Some tests include elements of both perceptual shifting and task or response switching (see Floresco and Jentsch 2011), which could be problematic if shifting and switching are separable processes (for an excellent discussion of this see Ravizza and Carter 2008). The third paradigm that is frequently used as a presumed measure of cognitive flexibility is reversal learning: after one reward pairing has been learned (e.g., 'A+/B−') it is reversed (e.g., 'A−/B+'). Reversal learning has a long history of use, but it has become increasingly popular, particularly in the last decade, because of the ease with which it can be measured in different species, making it particularly useful for translational research (for review, see Izquierdo et al. 2017).
In all of these measures of cognitive flexibility, the assumption is that a cognitive structure is formed due to the repetition of a particular situational context (i.e., a stable 'A+/B−' association; an attentional focus on a particular stimulus feature; an effective response strategy). The anticipation of future stability means that when it is violated (i.e., 'A+/B−' becomes 'A−/B+'; another stimulus attribute is relevant; an alternative response strategy is more effective), there is a 'cost', measured in retardation of learning, as the cognitive model is updated.
It has long been established that reversal learning is more rapid if the reversal is a reversion to a previously learned association. Furthermore, reversals are particularly rapid when they repeat serially (Harlow 1949). The benefit from repeating a reversal could arise in part from familiarity with the particular stimuli and the task requirements and is thus similar to the benefit of over-training (Dhawan et al. 2019). A benefit of repeating a reversal could also be due to incorporation into the cognitive structure of the concept that 'reversals may occur' (Izquierdo et al. 2017). In this study, we sought to tease these apart in the context of lesions of the orbital frontal cortex (OFC). We selected this particular brain region because damage to it has repeatedly been shown to impair reversal learning in many different forms (for review see Izquierdo et al. 2017). In addition, serotonin has been implicated in reversal learning (Boulougouris et al. 2007; Bari et al. 2010; Brigman et al. 2010). Therefore, we investigated the effects of the selective serotonin reuptake inhibitor (SSRI), escitalopram, on discrimination and reversal learning in OFC-lesioned rats, and on prefrontal Fos immunoreactivity.

Animals
Twenty-eight naïve male Lister hooded rats (Harlan, UK) were used. The rats were pair-housed and maintained on a 12-h light/dark schedule (lights on at 7 a.m.), with a diet of 15-20 g of standard laboratory chow each day with water available ad libitum. The initial weight range was between 300 and 350 g. At completion of the experiment the weight range was between 310 and 390 g. All procedures were carried out in accordance with the UK Animals (Scientific Procedures) Act 1986.

Apparatus
The apparatus for the task and the basic testing protocol were the same as those used for the rat attentional set-shifting task and have been described in detail elsewhere (Birrell and Brown 2000; Tait et al. 2018). In brief, the testing arena was constructed from large plastic home-cages (69.5 cm × 40.5 cm × 18.5 cm), with internal wooden runners permitting Perspex panels to selectively occlude either or both of two adjacent compartments, occupying one-third of the length of the cage, from the waiting area (the remaining two-thirds of the length). Within each of these compartments a ceramic digging bowl, containing scented digging media, could be placed.

Surgery
Fourteen rats were anaesthetised with an isoflurane (4%, reduced to 1% to maintain anaesthesia) and oxygen mix. Ibotenic acid (0.06 M) was administered bilaterally using a 0.5 µl Hamilton syringe with a 30 gauge needle attached, to target the orbital frontal cortex, at the following stereotaxic co-ordinates: tooth bar −3.3 mm, AP +4.0 mm, ML ±2.0 mm, DV −4.5 mm (from skull surface) (0.3 µl per site) over 2.5 min. The needle was left in situ for 3 min after administration. Rats were administered a 0.05 ml injection (s.c.) of the anti-inflammatory, carprofen, and a 0.25 ml injection (i.p.) of the sedative, diazepam, prior to surgery. One lesioned rat died two weeks post-surgery, before any testing.
Fourteen rats were administered sterile phosphate buffer instead of ibotenic acid and were assigned to the control groups.

Behavioural Training
Between 10 and 20 days after surgery, 11 rats (lesion group n = 5; control group n = 6) were tested on the reversal learning task. The rats were first given experience of digging in ceramic bowls (of the size used for the test) and habituating to the food reward. Bowls were placed in the home-cage, filled with sawdust and a quantity of Honey Loops ® (Kellogg Company, Manchester, UK). By the following morning, the food was always eaten. On the training day, rats were placed in the waiting areas of the testing cage, and underwent three stages of training. In stage 1, sawdust-filled bowls, with food bait (half of a Honey Loop) buried in each, were placed in the two smaller compartments, and the partitions were removed allowing rats to approach the bowls in turn, uncover and eat both of the cereal pieces. This was repeated for a total of six trials. If the rat did not uncover the rewards from both bowls within 10 min of being given access to them, then the partitions were lowered, both bowls were rebaited and the trial repeated. To ensure that the rats would respond promptly during sessions when escitalopram would be administered, they were given additional training in the test. In stage 2, rats were exposed to each of the exemplars that they would encounter the following day during testing. The exemplars were paired as they would be during testing, but with odours and media presented separately (see Table 1). Both bowls were baited with half a Honey Loop, and rats were exposed to each pair twice (sides switched). The rat was given 10 min to obtain the reward from each bowl as in stage 1 of the training. During stage 3 the rat learned two simple discriminations, in which the bowls had different odours (the sawdust was scented with mint or oregano) or were filled with different digging media (paper confetti or small polystyrene pieces), and the rat had to learn which of the two bowls was baited. 
The side of the baited bowl was determined pseudo-randomly for each trial, with the constraint that there were no more than three consecutive trials with the reward on the same side. If the rat dug in the correct bowl, the latency to dig was recorded and that trial was recorded as correct. The trial terminated when the rat returned to the waiting area of the box, at which point the barrier was lowered and the bowls re-baited. If the rat dug in the incorrect bowl, the latency to dig was recorded and the trial was marked as incorrect, but the rat was still permitted to continue to explore that bowl; the trial was only terminated when the rat returned to the waiting area, at which point the barrier was lowered. For the initial four trials at each stage of the test, the rat was allowed to dig in the correct bowl to recover the reward after an initial incorrect response; after the fourth trial an incorrect response terminated the trial. Whether the rat initiated digging in the first bowl encountered or whether it explored both bowls prior to initiating digging was also recorded. The rat was given up to 10 min to uncover the reward from the baited bowl; if the reward was not uncovered the partitions were lowered and the experimenter waited until the rat showed interest again.
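The pseudo-random side assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how the sequences were generated, and the function names `baited_sides` and `longest_run`, the 'L'/'R' coding and the `seed` parameter are hypothetical.

```python
import random


def baited_sides(n_trials, max_run=3, seed=None):
    """Return a list of 'L'/'R' sides with at most max_run identical sides in a row.

    Assumption: sides are drawn with equal probability and a draw that would
    extend a run beyond max_run is replaced by the opposite side.
    """
    rng = random.Random(seed)
    sides = []
    for _ in range(n_trials):
        side = rng.choice("LR")
        # If the previous max_run trials all used this side, force a switch.
        if len(sides) >= max_run and all(s == side for s in sides[-max_run:]):
            side = "R" if side == "L" else "L"
        sides.append(side)
    return sides


def longest_run(sides):
    """Length of the longest run of identical consecutive entries."""
    longest = current = 0
    previous = None
    for s in sides:
        current = current + 1 if s == previous else 1
        previous = s
        longest = max(longest, current)
    return longest
```

Forcing a switch after three same-side trials is one simple way to satisfy the constraint; rejection sampling of whole sequences would be an equally reasonable alternative.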
Criterion performance was six consecutive correct trials (the probability of making a correct choice 6 times consecutively by chance is 0.015), which could include the first four trials.
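The chance probability quoted above follows from treating each trial as an independent choice between two bowls. A minimal check, assuming a 0.5 probability of a correct choice per trial (the constant names are illustrative):

```python
# Chance probability of six consecutive correct choices between two bowls,
# assuming each choice is an independent 50:50 guess.
P_CORRECT_BY_CHANCE = 0.5
CRITERION_RUN = 6

p_run = P_CORRECT_BY_CHANCE ** CRITERION_RUN
print(f"{p_run:.6f}")  # 0.015625
```

This is the probability for one specific block of six trials, consistent with the value of approximately 0.015 given above; it is not the probability of a run of six appearing somewhere within a longer session.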

Behavioural Testing
On the first test day, the rat performed two series of three discriminations (Table 2). Both series consisted of a compound discrimination (acquisition (ACQ)), in which the rat must learn a novel discrimination between two exemplars of one dimension, ignoring the exemplars of an irrelevant dimension; a reversal (novel-reversal (REV)), where the exemplars remain the same as in the ACQ, but the correct and incorrect exemplars are reversed; a second reversal (reversal-back (BACK)), where the correct/incorrect status of the exemplars is reversed such that the discrimination is the same as during the ACQ stage. In the second series of three discriminations, novel stimuli were used, and the dimensional relevance to solving the discriminations was swapped.
Table 2 Within the two series of discriminations, the order of exemplar pair exposure and whether odour or medium was rewarded in series 1 or 2 was counterbalanced, and matched between groups. An example of the order of exemplar pair exposure [columns: Discrimination; Relevant dimension exemplars; Irrelevant dimension exemplars]
The task advanced to the next stage when the rat had reached criterion (six correct trials consecutively). The procedure followed was the same for each stage: for the first four trials, the rat had the opportunity to dig in the correct bowl if it had first dug in the incorrect bowl. Thereafter, when the rat started to dig in either bowl, the partition to the other compartment was lowered to prevent access to the other bowl. The trial was not terminated until the rat returned to the waiting area. If the rat did not dig within 10 min, the partitions were lowered, separating the rat from the bowls. The trial was aborted and recorded as 'non dig'.
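The stage-advance rule can be expressed as a short function that scores a stage from its trial-by-trial outcomes. This is a sketch, not the authors' scoring procedure: the outcome labels, the function name `trials_to_criterion`, and the choices that the six criterion trials are included in the count and that a 'non dig' trial neither counts nor breaks a run are all assumptions.

```python
def trials_to_criterion(outcomes, run_length=6):
    """Count scored trials up to and including the first run of correct trials.

    outcomes: iterable of 'correct', 'incorrect' or 'non-dig' (hypothetical labels).
    Returns None if the criterion run is never completed.
    """
    scored = 0
    streak = 0
    for outcome in outcomes:
        if outcome == "non-dig":
            # Aborted trials are excluded from the count (assumption: they
            # also leave the current run of correct trials unchanged).
            continue
        scored += 1
        streak = streak + 1 if outcome == "correct" else 0
        if streak == run_length:
            return scored
    return None
```

For example, a stage with one error, one correct trial, one 'non dig' and then five further correct trials would be scored as seven trials to criterion under these assumptions.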
Subsequent testing followed the same protocol, although rats did not need to be trained again for these tests.

Counterbalancing
Order of exposure to the dimensions (i.e., initial rewarded dimension being odour or medium) and to the exemplars was not fully counterbalanced due to the number of exemplars and their possible combinations. Exemplars were presented in pre-assigned pairs (see Table 1) and, within each dose, starting dimension and order of presentation of pairs were balanced. Counterbalancing was matched between lesioned and control rats.

Drug Administration
Rats were administered a 1 ml/kg (s.c.) injection of sterile saline on the two days prior to the first test. On the day of testing, rats were administered either a 1 ml/kg (s.c.) injection of sterile saline or a 1, 2, or 4 mg/kg (s.c.) injection of escitalopram (in sterile saline at 1 ml/kg) 30 min prior to testing. Administration of dose was counterbalanced according to a Latin square design. Each rat received each dose once, with the control and lesioned groups matched.
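A Latin square ordering of the four treatments can be generated as below. The paper does not state which square was used, so the cyclic construction, the function names and the rat identifiers are illustrative assumptions only.

```python
DOSES = ["vehicle", "1 mg/kg", "2 mg/kg", "4 mg/kg"]


def cyclic_latin_square(treatments):
    """Build an n x n Latin square by cyclically shifting the treatment list."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(rat_ids, treatments):
    """Assign each rat one row of the square, reusing rows in rotation."""
    square = cyclic_latin_square(treatments)
    return {rat: square[i % len(square)] for i, rat in enumerate(rat_ids)}


# Hypothetical identifiers for a six-rat group; each rat receives every dose once.
orders = assign_orders(["rat1", "rat2", "rat3", "rat4", "rat5", "rat6"], DOSES)
```

Each treatment appears once in every row and once in every test-session position. A cyclic square does not balance first-order carry-over effects; a Williams design would be needed for that.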

Histology
Rats were transcardially perfused with 4% paraformaldehyde in 0.1 M phosphate buffer (PB) after anaesthesia with 0.8 ml Dolethal. The brains were sectioned (50 µm) and stained for neuronal nuclei (NeuN) and counterstained with cresyl violet to map lesion extent, following standard protocols reported previously.

Data Analysis
Trials to criterion data (excluding non-digs) were analysed by repeated measures ANOVA (SPSS v 19.0) with dose (4 levels: vehicle, 1, 2 and 4 mg/kg escitalopram), discrimination series (2 levels: first and second) and stage (3 levels: ACQ, REV and BACK) as within subject variables, and group (2 levels: control and lesion) as between subjects variable.

Behavioural Training
Between 10 and 30 days after surgery, eight rats (lesion, n = 4; control, n = 4) were trained and tested on the reversal learning task. A further eight rats (lesion, n = 4; control, n = 4) were designated as their yoked controls. As rats were pair-housed, within each pair, one rat was designated to perform the reversal learning task, and the other would be its yoked control. The pair were trained and tested simultaneously. The eight rats that underwent the reversal learning task were trained and tested as described in experiment 1. The eight yoked controls underwent stage 1 of training as previously described, but thereafter training was altered. For stage 2 of training, yoked control rats dug in and obtained a single reward from each of two identical sawdust-filled bowls, an equal number of times to the reversal learning rat. During stage 3 of training, the yoked control rat was given access to two identical sawdust-filled bowls, each containing reward. Each time the reversal learning rat obtained reward, the yoked control rat was granted access to both bowls to obtain reward from one of them.

Behavioural Testing
The day after training, the reversal learning rats performed the two series of three discriminations as described in experiment 1. For the duration of testing, whenever the reversal learning rat obtained a reward the yoked control rat was given access to two identical sawdust-filled bowls and allowed to obtain reward from one of them.

Counterbalancing
With only two reversal learning rats in each condition, counterbalancing of exemplars was not possible. Therefore, exemplars were presented in pre-assigned pairs as in experiment 1 and the order of exposure was the same for all rats.

Drug Administration
Rats were administered a 1 ml/kg (s.c.) injection of sterile saline for two days prior to testing. On the day of testing, rats were administered either a 1 ml/kg (s.c.) injection of sterile saline or a 1 mg/kg (s.c.) injection of escitalopram (1 mg/ml in sterile saline) 30 min prior to testing. There were therefore four conditions with two reversal learning rats and two yoked controls in each: control/saline; control/escitalopram; OFC lesion/saline; and OFC lesion/escitalopram.

Histology
Rats were transcardially perfused 90 min after completion of testing and brain sections stained for neuronal nuclei (NeuN) and counterstained with cresyl violet as for experiment 1. For Fos immunoreactivity, sections were treated initially as for NeuN, except they were incubated in goat anti-Fos (dilution 1:8000) on a stirrer for 1 night, followed, after a 5 min wash in sterile PBS, by incubation on a shaker for one hour in rabbit anti-goat biotinylated secondary antibody (vector IgG solution at 5 µl/ml ADS). After washing in 0.1 M PBS again, sections were incubated on a stirrer in Vectastain ABC complex (as above) for a further hour. Sections were then washed in 0.1 M PBS again, and finally immersed in Sigma Fast DAB tablets for approximately 10 min, with the time being determined by visual inspection of the tissue. The tissue was removed when background staining was light but neurons were clearly visible. Sections were washed again in 0.1 M PBS and then mounted on treated glass slides, air-dried and cover-slipped with DPX. Fos positive neurons in the prelimbic area of the medial prefrontal cortex (mPFC) and in the OFC were counted by H. Lundbeck A/S.

Data Analysis
Trials to criterion data were analysed by repeated measures ANOVA (SPSS v 19.0) with stage (3 levels: ACQ, REV and BACK) as the within-subject variable, and dose (2 levels: vehicle and 1 mg/kg escitalopram) and group (2 levels: OFC lesion and control) as between-subject variables. Discrimination series was not used as a within-subject variable: whilst all rats completed the first series of discriminations, not all rats completed all stages in the second. A mean of the data collected over the two series was therefore used where rats had completed those stages.
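The collapsing of the two discrimination series can be sketched as follows. The data structure (one dictionary per series, with `None` marking a stage that was not completed) and the function name `collapse_series` are assumptions made for illustration; the paper does not describe how the data were stored.

```python
from statistics import mean

STAGES = ("ACQ", "REV", "BACK")


def collapse_series(series1, series2):
    """Average trials to criterion over the two series, stage by stage.

    Each argument maps a stage name to trials to criterion, or to None when
    the rat did not complete that stage. A stage completed in only one
    series keeps that single value.
    """
    collapsed = {}
    for stage in STAGES:
        values = [s[stage] for s in (series1, series2) if s.get(stage) is not None]
        collapsed[stage] = mean(values) if values else None
    return collapsed
```

Under this scheme a rat that ran out of time in the second series still contributes its first-series score to every stage, which matches the description of using a mean only where both values exist.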
Area-corrected Fos activation counts were analysed by repeated measures ANOVA with side (2 levels: right and left) as the within-subjects variable, and dose (as above), group (as above) and behaviour (2 levels: reversal learning and yoked control) as between-subjects variables.
Fig. 1 Coronal schematics of the rat brain (adapted from Paxinos and Watson 2006) showing greatest extent of (light grey), typical (mid grey) and smallest (dark grey) lesion damage for rats from experiment 1

Histology
Lesion placement was visualised in the NeuN/cresyl violet stained sections (Fig. 1). Approximately half of the lesions were positioned more dorsally, with the other half positioned ventrally. All lesioned rats showed cell loss in ventral and lateral OFC regions from bregma +5.00 to +3.50.
There was a three-way interaction between dose, group and stage (F (6,54) = 4.9, p < 0.05) (Fig. 3) in the context of no significant main effect of group (F (1,9) = 3.8, ns) or interactions of dose and group (F (3,27) = 2.4, ns), dose and stage (F (6,54) = 1.3, ns) or stage and group (F (2,18) = 2.8, ns). To probe this three-way interaction, corrected ANOVAs (using the error term from the omnibus ANOVA) were performed for each dose, with stage as within, and group as between-subjects variables.
In the vehicle condition, there was an interaction of stage and group (F (2,54) = 5.9, p < 0.05). Planned contrasts confirmed what is clear from Fig. 3: there was a difference between the groups at the REV (F (6,54) = 10.6, p < 0.05) and BACK (F (6,54) = 7.6, p < 0.05) stages, but not in the ACQ stage (F (6,54) = 1.4, ns).
Fig. 2 Mean + SEM trials to criterion from experiment 1 collapsed across group and dose to show the main effect of stage. Reversal stages required more trials to criterion than the acquisition (ACQ). The novel reversal stage (REV) also required more trials to criterion than the reversal back (BACK)
Fig. 3 Mean + SEM trials to criterion from experiment 1 collapsed across discrimination series. Lesioned rats were impaired relative to control rats at the REV and BACK stages only after vehicle administration. Escitalopram at all doses ameliorated the effects of the lesion without affecting control rat performance
In the three escitalopram conditions, there were no main effects of group, nor any interactions between group and stage. OFC-lesioned rat reversal performance is only impaired relative to control rats in the vehicle group: escitalopram administration at all three doses ameliorates the effects of the OFC lesion on both novel-reversals and reversals-back.
Fig. 4 Coronal schematics of the rat brain (adapted from Paxinos and Watson 2006) showing greatest extent of (light grey), typical (mid grey) and smallest (dark grey) lesion damage for rats from experiment 2

Histology
Lesion placement was visualised in the NeuN/cresyl violet stained sections (Fig. 4). All lesioned rats showed cell loss in ventral and lateral OFC regions from bregma +5.00 to +3.50.
Figure 5 shows the number of trials to criterion for each stage at each dose. All rats completed the first series of discriminations, but not all completed the second series within the 90-min testing window. Data were collapsed across discrimination series (acquisition, novel reversal (REV) and reversal back (BACK)) where possible. No statistically significant effects were found, likely due to variability within the small sample size, although the visual trend in the data suggests escitalopram is improving reversal learning in the lesioned rats as in experiment 1.
Fig. 5 Mean + SEM trials to criterion from experiment 2 collapsed across discrimination series. The pattern of impairment is similar to that seen in experiment 1, with the same beneficial effect of escitalopram administration on reversal performance in the lesioned rats

Fos Expression
Fos positive neurons were counted in the mPFC and OFC. Figure 6 shows area corrected (count/mm²) Fos counts for mPFC. There was an interaction between drug and group (F (1,8) = 6.87, p < 0.05): OFC-lesioned rats show greater Fos expression in mPFC than controls and escitalopram induces a further increase in Fos expression in rats with OFC lesions. The same pattern was also seen in the OFC (see Fig. 7): an interaction between group and dose (F (1,8) = 5.75, p < 0.05) arose because OFC-lesioned rats show greater Fos expression in surviving areas of OFC than was seen in the intact OFC of controls. Escitalopram induces a further increase in activation of remaining OFC neurons in OFC-lesioned rats.
Fig. 6 Mean + SEM Fos count/mm² in the mPFC collapsed across side (behaving and yoked rats combined). More Fos activity was recorded in the lesioned rats' mPFC regardless of behaviour. Escitalopram increased Fos activity in the lesioned rats (regardless of whether they were performing a task or yoked control; not shown) without effect in the control rats (* interaction of group and dose, p < 0.05)
Fig. 7 Mean + SEM Fos count/mm² in the OFC collapsed across side (behaving and yoked rats combined). More Fos activity was recorded in the lesioned rats' mPFC regardless of behaviour. Escitalopram increased Fos activity in the lesioned rats regardless of behaviour, without effect in the control rats

Discussion
The aim of this study was to examine the nature of cognitive structures in the rat, looking specifically at the processes and cognitive structures underlying reversal learning. As reported previously (Chase et al. 2012; McAlonan and Brown 2003; Tait and Brown 2007; Tait et al. 2018), rats with non-selective OFC lesions are impaired relative to controls during compound discrimination reversal learning. Our new data demonstrate that this impairment occurs equally in both novel reversals and reversals returning to a previously learned discrimination. This impairment is ameliorated by administration of the SSRI, escitalopram, at all doses investigated (1, 2 and 4 mg/kg).
Expression of Fos protein in both the mPFC and intact areas of OFC was increased in rats with OFC lesions. Escitalopram at 1 mg/kg potentiated this lesion-induced Fos increase, regardless of the behaviours investigated, but had no effect on Fos expression in control rats.

Reversal Learning
Previous investigations of serial reversal learning in rodents have involved consecutive stages requiring alternation of responding, typically requiring a spatial discrimination (e.g., Béracochéa et al. 2003; Boulougouris et al. 2007; Stalnaker et al. 2007). Serial discrimination reversal learning using visual stimuli has been reported in primates (e.g., Clarke et al. 2007) and using olfactory stimuli in rats (Kinoshita et al. 2008; Schoenbaum et al. 2003). In these studies, stimuli were "simple", in that there was one correct and one incorrect stimulus with no deliberately embedded irrelevant information; that is, any discriminable feature of a stimulus could be used to predict that stimulus' reward status. Our task design adapted the rodent ID/ED attentional set-shifting task, and therefore used compound stimuli: there was a dual dimensionality to the stimuli, with one dimension's features predicting reward status and the other being uncorrelated with reward status. A compound discrimination reversal must be more difficult than a simple discrimination reversal due to the additional requirement to filter out irrelevant information. Impaired performance at these reversal stages can therefore reflect a reduced ability either to adapt to changes in stimulus reward status, or to filter out this irrelevant information.
In a typical serial reversal learning task, there are several consecutive reversals, with the subject required to switch back and forth. Improvements occur with successive reversals. As our task design included a novel discrimination between four reversal stages, the third reversal is similar to the first (both are novel-reversals), and the fourth reversal is similar to the second (both are reversals-back). That we observed no difference in performance between the first and second discrimination series reversals, but that there is a difference between novel-reversals and reversals-back, suggests that a learning set did not form. Our data thus demonstrate that novel-reversals require more trials to learn than reversals-back. This difference likely arises from the reversals-back being facilitated by familiarity with the particular stimuli, rather than learning about reversals (which would also have benefitted the subsequent reversals).

The Effects of OFC Lesions on Reversal Learning
The role of the OFC in reversal learning in rats is well documented (Ghods-Sharifi et al. 2008; Kim and Ragozzino 2005; McAlonan and Brown 2003; Schoenbaum et al. 2002, 2003; Murray et al. 2007; Chase et al. 2012; Tait and Brown 2007). The processes underlying OFC lesion-induced reversal learning impairments are less clear. We have previously reported that OFC lesions impair reversal learning in compound discrimination reversal learning (McAlonan and Brown 2003) during a test of attentional set-shifting, and that this impairment likely does not arise from perseverative responding to previously rewarded stimuli (Tait and Brown 2007). However, rats with OFC lesions do not benefit from forming an attentional set: there was no difference in performance between intradimensional (ID) and extradimensional (ED) shift stages in the OFC-lesioned rat (McAlonan and Brown 2003; Chase et al. 2012). We have further reported that excitotoxic lesions of the nucleus basalis magnocellularis of the basal forebrain also impair reversal learning and also result in no difference between ID and ED shift performance. In these lesion studies where the ID/ED differences are lost, there is no evidence of a difference between control and lesion group ED shift performances. Instead the data suggest that the ID/ED difference is lost because of worsening performance at the ID stage. Whilst the experimental design of these studies precludes drawing strong conclusions about set-formation, it would be predicted that rats that fail to form an attentional set would not show a shifting cost at the ED stage, i.e., rats try to solve the ID and ED shift stages with no a priori dimensional bias, and there should therefore be no difference in performance between those two stages.
These data then imply one of two possibilities: either OFC lesions and/or basal forebrain lesions directly impair both reversal learning and attentional set-formation; or impairments in reversal learning induce impairments in attentional set-formation. To partially answer this question, we reported that OFC lesions do impair set-formation in rats independently of reversal learning in a variant of the ID/ED task with multiple ID stages and no reversal stages (Chase et al. 2012). We cannot yet, however, rule out the reverse: the possibility that impairments in set-formation result in a reduced reversal learning ability. However, given that there are considerable data demonstrating OFC lesion-induced reversal learning deficits outwith tests of compound discrimination reversal learning, we are confident in concluding that the OFC lesion-induced deficits in reversal learning that we report here reflect a fundamental impairment in reversal learning. That OFC-lesioned rats may find compound discrimination reversal learning more difficult than simple discrimination reversal learning because of an additional reduced ability to disregard the irrelevant information present in a compound discrimination is a possibility, but unlikely to be the sole source of the impairment. Furthermore, whilst our task is based on a modified version of the rodent ID/ED task, it does not contain measures of attentional set-formation or set-shifting per se, so attempts to draw conclusions on such would be overly speculative.

The Effects of Escitalopram on Reversal Learning
Increasing the availability of serotonin improves reversal learning in OFC-lesioned rats, and does so in both novel-reversals and reversals-back. Whilst there is a consensus that serotonergic (5-HT) manipulations impact reversal learning, reported results depend not just on the specific manipulation, but also on the form of reversal learning tested. Tryptophan depletion does not impair spatial reversal learning in rats (van der Plasse and Feenstra 2008), but inhibition of tryptophan hydroxylase by para-chlorophenylalanine does impair compound discrimination reversal learning in an attentional set-shifting task (Lapiz-Bluhm et al. 2009). In primates, 5,7-dihydroxytryptamine lesions of OFC impair visual discrimination reversal learning, both in simple discrimination serial reversal learning and in compound discrimination reversal learning during an attentional set-shifting task (Clarke et al. 2007). Increasing endogenous 5-HT improves reversal learning in rodents: citalopram, consisting of both the R- and S-citalopram enantiomers, improves probabilistic reversal learning after both acute and sub-chronic dosing regimes (Bari et al. 2010). Whilst an acute administration of 1 mg/kg citalopram impairs probabilistic reversal learning performance, a higher dose (10 mg/kg) improves it. Lower doses of escitalopram, being more potent than citalopram, would be expected to produce similar effects to higher doses of citalopram. Hence, the fact that we report amelioration of OFC lesion-induced reversal learning impairments at an escitalopram dose of 1 mg/kg should not be considered to conflict with the data showing that the same dose of citalopram impairs reversal learning. Indeed, Bari et al. (2010) discuss evidence that low levels of citalopram induce different outcomes on PFC 5-HT availability, which may explain their reported impairment.
It has also been reported that vortioxetine, an SSRI and serotonin receptor modulator, ameliorates reversal learning impairments in an attentional set-shifting task in rats subjected to freezing stress (Wallace et al. 2014).
Reversal learning was thought to involve two distinct phases (see Sutherland and Mackintosh 1971): initially, after the change in the reinforcement contingency is detected, the response must extinguish; subsequent to a period of responding randomly, the new association is gradually learned. We recently demonstrated that this is overly simplistic: responding 'at chance' while seeking a solution is unlikely to be governed by responding 'by chance' (Dhawan et al. 2019). While reversal learning paradigms can depend on model-free learning, they may also involve model-based processes (Doll et al. 2012; Izquierdo et al. 2017; Dhawan et al. 2019). In serial reversal learning tasks, performance improves with each reversal, as if the animal learns, over and above the particular S+/S− attribute, a win-stay/lose-shift rule, which Harlow (1949) referred to as a 'learning set'. In the present study, the rats performed a reversal and then reversed back only once, but already there was a learning benefit. However, it is unlikely that this benefit arose from learning a 'win-stay rule' because it did not extrapolate to either the first reversal of a subsequent novel discrimination or the reversal back of that second discrimination reversal.
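To make the win-stay/lose-shift account concrete, the following is a minimal illustrative sketch of an idealised agent that follows that rule perfectly in a two-choice discrimination with deterministic reward. It is not the analysis used in this study: the function names, the criterion of six consecutive correct trials and the stimulus labels "A" and "B" are all assumptions made for illustration.

```python
def other(stimulus):
    """Return the alternative stimulus in a two-choice discrimination."""
    return "B" if stimulus == "A" else "A"


def trials_to_criterion(rewarded, first_choice, criterion=6, max_trials=100):
    """Trials needed by a pure win-stay/lose-shift agent to reach criterion.

    `criterion` (consecutive correct choices) is a hypothetical value chosen
    for this sketch; reward is assumed to be deterministic.
    """
    choice = first_choice
    streak = 0
    for trial in range(1, max_trials + 1):
        if choice == rewarded:
            # Win: count the correct trial and stay with the same stimulus.
            streak += 1
            if streak == criterion:
                return trial
        else:
            # Loss: reset the run of correct trials and shift stimulus.
            streak = 0
            choice = other(choice)
    raise RuntimeError("criterion not reached within max_trials")


# Acquisition (A+/B-) starting from a wrong guess, then reversal (A-/B+),
# then reversal back (A+/B-). At each reversal the agent starts on the
# previously rewarded stimulus.
acquisition = trials_to_criterion(rewarded="A", first_choice="B")
reversal = trials_to_criterion(rewarded="B", first_choice="A")
reversal_back = trials_to_criterion(rewarded="A", first_choice="B")
print(acquisition, reversal, reversal_back)  # prints: 7 7 7
```

Under these assumptions, an agent that had fully acquired the rule would make a single error at every reversal, whether of familiar or of novel stimuli. A benefit confined to reversing back, with no transfer to the reversal of a new discrimination, is therefore not what this rule alone would predict.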
That neither OFC lesions nor administration of escitalopram affects the relationship between novel-reversals and reversals-back implies that similar processes are involved in each form of reversal or, more specifically, that the processes affected by OFC lesions and by their interaction with escitalopram mediate both reversing and reversing back. Whilst the task is sensitive enough to distinguish between novel-reversals and reversals-back, it is not sensitive enough to reveal differences between them after OFC lesions and escitalopram administration.

Fos Activity
The Fos expression data suggest that there is increased, behaviourally independent activation in both mPFC and OFC after OFC lesions, and that this increased activity is augmented by escitalopram, which has no significant effect in control animals. The Fos expression reported here is similar in pattern to that seen in surviving mPFC neurons after administration of the atypical antipsychotic, asenapine, to rats with mPFC lesions (Tait et al. 2009). Specifically, rats with mPFC lesions show increased activity in surviving mPFC neurons, an effect that is augmented by administration of asenapine but that is again behaviourally independent. The similarity of the activation pattern may suggest that both drugs act through overlapping mechanisms on the mPFC, i.e. escitalopram by increasing serotonin levels and asenapine by modulating activity of serotonin receptors (Homberg 2012).
The increased mPFC and OFC Fos expression in the rats with OFC lesions was seen both when they were performing discrimination learning and reversals and also in yoked controls. Consequently, we can conclude that this expression is not a marker of activity driven by the cognitive processes underlying discrimination and reversal learning. It is likely then that there is increased recruitment of PFC neurons resulting from the lesion irrespective of the cognitive demands on the rats.
In intact rats, there was similarly no difference in Fos expression between rats performing the task and their yoked controls. This suggests that the cognitive processes mediated by these brain regions likely require low levels of activity from a relatively large pool of available neurons. Thus, our observations of low levels of Fos expression in the control rats arise because few neurons are activated to a sufficient threshold for Fos to be expressed at a detectable level. In lesioned rats, with fewer PFC neurons, there must be increased recruitment of surviving neurons in order for cognition to approach normal levels: more neurons need to activate to the threshold level at which detectable Fos is expressed because there are fewer neurons to fulfil their respective roles. In the case of OFC-lesioned rats, this increased expression in a reduced number of neurons reflects increased neuronal activity that is insufficient to normalise reversal learning. However, escitalopram facilitates even greater PFC activity than could occur otherwise, and this increased activity is sufficient to normalise reversal learning in the OFC-lesioned rats. That we observed increased Fos activity in the mPFC of the OFC-lesioned rats, as well as in the OFC, is a reminder that a network of brain regions underlies complex cognition and behavioural flexibility. mPFC neurons may be recruited to compensate for the functions that are impaired when the OFC is damaged. The mPFC, being adjacent to the OFC, was also damaged to some extent in most of the lesioned rats. Although this incidental mPFC damage did not result in the behavioural profile associated with targeted mPFC lesions, it is possible that this is due to compensatory elevation of activity in the surviving mPFC neurons, as indicated by increased Fos activation.
In the case of both asenapine-treated mPFC-lesioned rats and escitalopram-treated OFC-lesioned rats, behaviourally independent drug-induced increases in activity in surviving neuronal populations likely facilitate the cognitive processes that have been impaired by damage, but do not reflect activity actually driven by the undertaking of those cognitive processes.
The fact that reversal learning can be readily measured in different species, using species-appropriate stimuli and responses, makes it a particularly valuable test for translational psychopharmacological research (see Izquierdo et al. 2017). Serial reversal learning is commonly used in non-human animals, often because this is a way to gather 'additional data' without recourse to lengthy training of new discriminations or the requirement to generate a large number of novel stimuli for testing. However, serial reversals should be thought of as more complex than simple repetition of the same thing. Reversing back benefits from the additional familiarity with the stimuli, a benefit that is also seen if an animal is given additional post-criterion trials of overtraining. This effect is seen even in the absence of a benefit from the formation of a 'learning set' (i.e., incorporating into the cognitive structure the concept that 'reversals can occur'). We report here no evidence of a learning set following a single reversal and reversal back: subsequent reversals of new stimuli were not more rapidly acquired, even while reversing back was consistently more rapid than initial reversing. That notwithstanding, we conclude that reversal learning in OFC-lesioned rats is both an easily administered and sensitive test that can detect effects of serotonergic modulation on cognitive structures that are involved in behavioural flexibility.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.