Introduction

Certain aspects of cognitive flexibility (i.e. the ability to adapt behaviour, see for instance, Fuster 1980; Kolb 1984; De Bruin et al. 1994; Dalley et al. 2004) depend critically on central serotonin (5-HT). Recent experiments in humans with tryptophan depletion, a method resulting in a transient decrease in 5-HT synthesis, show impairments in the ability to learn changed stimulus–reward associations (Rogers et al. 1999) and in the performance on tests of cognitive flexibility (Rubinsztein et al. 2001). Lesion studies in non-human primates corroborate these findings, showing that it is the 5-HT innervation of the prefrontal cortex (PFC) that is critical for adapting behaviour when stimulus–reward contingencies are reversed in a two-choice discrimination (reversal learning; Clarke et al. 2004, 2005). Similar observations have been made after central 5-HT depletion in rats, indicating that 5-HT depletion leads to impaired flexibility (Harrison et al. 1999), increased perseverative responding (Beninger and Phillips 1979; Morgan et al. 1993) and impaired impulse inhibition (Winstanley et al. 2004, 2006).

In addition to effects on cognitive flexibility, tryptophan depletion has also been reported to lead to impairments in affective processing (i.e. the ability to evaluate and integrate emotional content, see for instance, Hariri et al. 2006) and reward discrimination (Rubinsztein et al. 2001; Rogers et al. 2003; see also Roberts and Wallis 2000). Such changes in affective processing (see also Remijnse et al. 2005) have, however, not yet been reported in studies involving selective prefrontal 5-HT lesions in either primates or in rats. Elucidation of the neural substrates underlying this special role of 5-HT in affective processing is one aim of the current study.

A second issue that is central to studies of the PFC in cognitive flexibility is that of specific contributions of the different cortical sub-areas. Evidence for a functional dissociation within the PFC regarding cognitive flexibility comes from studies across species, and it is important to note that the functional differentiation (specificity) of the anatomical subdivision of the rat brain PFC is, to a large extent, comparable to that of the primate PFC (Uylings et al. 2003). In terms of homology between the primate and rodent PFC, we believe, given the data reviewed in Uylings et al. (2003), that the rat medial PFC (mPFC) shows a functional overlap with both lateral and medial frontal regions of the primate brain.

The orbital PFC has been shown to support reversal learning in both primates (Iversen and Mishkin 1970; Butters et al. 1973; Dias et al. 1996; Clarke et al. 2004; Kringelbach et al. 2003; Hornak et al. 2004) and rats (Schoenbaum et al. 2002; Chudasama and Robbins 2003; McAlonan and Brown 2003) and is implicated in the coding of reward-related information in non-human primates (Hikosaka and Watanabe 2000; Tremblay and Schultz 2000; Schultz 2006), rats (Schoenbaum et al. 2003; Bohn et al. 2003) and humans (O’Doherty et al. 2001; Kringelbach 2005).

The mPFC, on the other hand, supports rule switching and attentional set shifting in rats (De Bruin et al. 1994; Joel et al. 1997; Ragozzino et al. 1999a,b; Birrell and Brown 2000). This functional dissociation between reversal learning and attentional set shifting is also observed in primates, in which the latter is dependent on the lateral PFC (Dias et al. 1996; see also Robbins 2005). However, this particular distinction between PFC functions might be rather specific to visual- and odour-based learning in rats (Chudasama and Robbins 2003; Schoenbaum et al. 2002) and primates (Clarke et al. 2005, 2007), as several groups have reported an involvement of the mPFC in reversal learning using spatial discriminations (Kolb et al. 1974; Li and Shao 1998; Salazar et al. 2004; De Bruin et al. 2000). Furthermore, the mPFC has been shown to be involved in the initial processing of reward-related information, as recording of single neuron activity indicated this area to be involved in the coding of reward information in relation to spatial cues (Hok et al. 2005). From this, we conclude that the mPFC may have a specific role in spatial reversal learning. On a more general level, these findings have been paralleled and extended by human studies. Knutson et al. (2005), for example, reported on mPFC-mediated reward coding, and evidence for a mPFC involvement in implementation of that (reward) knowledge in existing stimulus–reward contingencies comes from Ridderinkhof et al. (2004).

Given that 5-HT is involved in reversal learning supported by the orbital PFC (see above) and that 5-HT lesions lead to functional deficits such as perseverative responding and loss of impulse inhibition (Beninger and Phillips 1979; Soubrié 1986; Harrison et al. 1999; Dalley et al. 2002; Winstanley et al. 2004, 2006), which are also observed after lesioning of the mPFC (Morgan et al. 1993; Passetti et al. 2002; Salazar et al. 2004), we feel that investigation of the involvement of 5-HT in the mPFC in spatial reversal learning is warranted. This forms the second main aim of these experiments.

Given the primate literature on the involvement of 5-HT in affective processing and of the mPFC in the implementation of reward information, we hypothesise that the medial prefrontal 5-HT is selectively involved in cognitive flexibility when affect guides decision making (i.e. when ‘emotional’ content drives decision making). Moreover, based on the ‘reversal’ literature, we hypothesise that 5-HT in the mPFC is especially important for successful behavioural adaptation when the stimulus modality is spatial but not odour-based.

To test our hypotheses, we conducted an experiment in which we selectively destroyed 5-HT terminals in the mPFC of male Wistar rats by means of local infusion of the toxin 5,7-dihydroxytryptamine (5,7-DHT). Both sham controls and lesioned animals were then assessed on a series of behavioural tests in which functions thought to underlie cognitive flexibility and affective processing were addressed (De Bruin et al. 2000; Schoenbaum et al. 2006).

To explore the hypothesis that mPFC 5-HT is involved in affective processing, we tested the impact of altered reward value on reversal learning. For half of the sham and lesioned animals, we switched the reward presented after a correct response from non-preferred reward pellets to preferred pellets at the time of the first reversal. Our expectation was that the introduction of this affective shift, on top of the reversal, would cause lesioned animals to show altered task performance because of an inability to accurately process the altered reward information.

For the hypothesis that stimulus modality determines specific involvement of the mPFC, we tested reversal learning across different stimulus modalities.

In line with our studies on mPFC lidocaine inactivation (De Bruin et al. 2000), we subjected the animals to a test of spatial discrimination and reversal learning. In addition, we included a second, operant, odour-based go/no-go reversal task. Based on our hypothesis regarding selective involvement of the mPFC in spatial learning, we expected impaired reversal learning in the spatial but not in the odour-guided task in lesioned animals.

To gain more insight into the mechanisms that might underlie impaired cognitive flexibility, we tested the effect of the lesion on extinction learning and impulse inhibition: we assessed the animals’ behaviour during extinction of the ‘spatial’ task and responding on ‘go’ and ‘no-go’ trials in the ‘odour’ task.

Materials and methods

All experiments were approved by the Animal Experimentation Committee of the Royal Netherlands Academy of Arts and Sciences and were carried out in agreement with Dutch Laws (Wet op de Dierproeven 1996) and European regulations (Guideline 86/609/EEC).

Subjects

Subjects were 32 male outbred Wistar rats (Harlan/CPB, Horst, The Netherlands), weighing 175–200 g upon arrival. The animals were socially housed in groups of four in standard type IV Macrolon cages, where they were kept under a reversed day/night cycle (dimmed red light from 7 a.m. until 7 p.m., white light from 7 p.m. until 7 a.m.) for the duration of the experiment. After surgery, the animals were food restricted (16 g/animal per day) to maintain their body weight at 90% of free-feeding weight. Water was available ad libitum throughout the experiment.

Surgery

Two weeks after arrival, the animals were randomly assigned to either the experimental lesion groups or the sham groups and subjected to surgery. The aim was to selectively destroy the serotonergic (5-HT) innervation of the mPFC. To prevent loss of dopaminergic (DA) and noradrenergic (NA) terminals in the target area because of the treatment, all animals were intraperitoneally injected with 20 mg/kg Desipramine HCl (Sigma, Zwijndrecht, The Netherlands) dissolved in MilliQ water (10 mg/ml) 30 min before surgery. The weight of the animals at the time of surgery was 247–289 g. Anaesthesia was induced with 5% isoflurane in oxygen and maintained for the duration of the surgery with 2% isoflurane. For the surgical procedure, the animals were mounted in a stereotactic frame with the toothbar set at −2.5 mm. Bilateral stainless steel cannulas (outside diameter 0.3 mm), connected to a microinfusion pump (801 syringe pump, Univentor, High Precision Engineering, Zejtun, Malta) with flexible PEEK tubing (0.51 mm outside diameter, 0.013 mm inside diameter; Aurora-Borealis, Schoonebeek, The Netherlands), were then placed in the mPFC at an angle of 12°, anterioposterior +30, lateral ±16; ventral −40 (Paxinos and Watson 1998).

After cannula placement, 0.5 μl of either 5,7-DHT (5,7-DHT creatinine sulphate salt, Sigma; 32 μg/μl, dissolved in 0.1% ascorbic acid) or vehicle was infused into the target area over a period of 1 min. After infusion, the cannulas were kept in place for an additional 2 min to allow for diffusion. The wound was closed with surgical stitches after the cannulas were removed from the brain. Post-operative pain reduction was achieved with Finadyne (flunixin meglumine 50 mg/ml, Schering-Plough, Segré, France), 0.01 ml/100 g body weight subcutaneously, approximately 2 h after surgery. After surgery, the animals were returned to their home cages.
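The doses stated in this section can be checked with simple arithmetic. The sketch below is illustrative only: the function names and the 250 g example body weight are assumptions introduced here, whereas the concentrations, volumes and dose rates are those given in the text.

```python
def injection_volume_ml(body_weight_g, dose_mg_per_kg, concentration_mg_per_ml):
    """Volume (ml) needed to deliver a weight-based dose from a stock solution."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0
    return dose_mg / concentration_mg_per_ml


def infused_mass_ug(volume_ul, concentration_ug_per_ul):
    """Mass (ug) of compound delivered by one infusion."""
    return volume_ul * concentration_ug_per_ul


# Hypothetical 250 g rat (inside the 247-289 g range reported at surgery).
desipramine_ml = injection_volume_ml(250, 20, 10)  # 20 mg/kg at 10 mg/ml
toxin_ug_per_infusion = infused_mass_ug(0.5, 32)   # 0.5 ul at 32 ug/ul
finadyne_mg_per_kg = 0.01 * 50 / 0.1               # 0.01 ml per 100 g at 50 mg/ml
```

Under these assumptions, a 250 g animal would receive 0.5 ml of the desipramine solution, each 0.5 μl infusion would carry 16 μg of the 5,7-DHT salt, and the Finadyne regimen corresponds to 5 mg/kg of flunixin meglumine.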

Apparatus

Two behavioural tasks were used. For the first (two-lever discrimination) task, locally constructed operant chambers were equipped with a set of two retractable levers positioned to the left and right of a food dispenser (distance lever–feeder, 10 cm). A house light was placed on the opposite wall, two lights were positioned above the levers, signalling their presentation, and one inside the food dispenser, signalling the delivery of a 45 mg food pellet (Noyes Formula AI or P, Research Diets, New Brunswick, NJ). Nose pokes were detected by an infrared sensor located inside the food well. Operation of the Skinner boxes was controlled by a personal computer running the Med-pc™ software (Med Associates, Sanddown Scientific, Middlesex, UK).

For the second (odour-based go/no-go) task, commercially available operant chambers (Med Associates, Sanddown Scientific) were equipped with a retractable lever and an odour-sampling unit positioned on opposing walls (spaced 34 cm apart), allowing the presentation of distinct (non-aversive natural) odours. A food dispenser, from the same company, was positioned next to the odour-sampling port. Trial lights indicated odour, lever and reward presentation.

Pellet rewards

To manipulate the affective value of the rewards, we used two different types of pellet rewards (Noyes ™ AI and P; for additional information see www.researchdiets.com) that were shown to be differentially preferred by the animals (see “Pellet preference”). Before the start of the experiment, all animals were familiarised with both types of pellets in their home cage. During the reversal phases of both tasks, the preferred pellet reward was only available to half of the animals (for groups, see below).

Pellet preference

To establish whether male Wistar rats showed a preference for a specific food pellet, we pre-tested three types of rewards (Noyes ™ formula AI, P, and FP; see Fig. 1) in a separate group of eight animals. The testing apparatus was a standard-size T-maze with trap doors in which both arms were baited. Five daily habituation sessions were given during which the animals were allowed to freely explore the entire T-maze and familiarise themselves with the position of the two different rewards that were tested that day. Subsequently, the animals were given the choice between the two arms, each baited with one of the three types of reward (15 trials). In these trials, the trap door was lowered after an arm entry was made to prevent the rat from entering the second arm. All three possible combinations of pellets were given over 3 consecutive days to all rats. To avoid ‘sequence’ effects, the rats were pseudo-randomly assigned to a particular pellet order. The animals were housed and food restricted in the same way as the experimental animals.

Table 1 Results of HPLC analysis of sham and 5-HT-lesioned animals in seven brain areas

Experimental procedure

Approximately 9 days after surgery, a period that a pilot experiment had indicated to be necessary to achieve maximum 5-HT depletion (data not shown), the behavioural experiments started. On experimental days, which typically lasted from 9 a.m. to 5 p.m., the animals were transferred to the experimental room, where they remained in their home cages until testing. In both the animal facility and the experimental room, a radio played to mask background noises.

Before the behavioural testing, both lesion and sham groups were further divided to create a total of four groups. Two groups, one lesion and one sham, received non-preferred pellets throughout the experiment, and two groups started receiving non-preferred pellets but only received preferred pellets from the start of the first reversal.

The two behavioural tasks were applied consecutively in the same order to all animals. At the start of the second task, all animals received non-preferred pellets again.

Behavioural task 1. Two-lever discrimination task (weeks 1–4)

This task was used to assess the effects of mPFC 5-HT depletion on the acquisition of an operant response, discrimination between two spatial stimuli and reversal and extinction of the acquired responses (De Bruin et al. 2000).

First, the animals were trained, in a maximum of nine shaping sessions (64 trials each), to press a lever for a food reward. During this shaping phase, a single lever was presented at each trial (randomly the left or right lever). In the course of the sessions, the animals were required to increase the number of lever responses per trial from a fixed ratio (FR) of one to three responses to obtain a reward. All of the trials in the two-lever spatial discrimination and reversal task (including shaping) were discrete choice trials; that is, the lever(s) were retracted upon responding or, in case of an omission, at the end of the trial duration (60 s). The inter-trial interval was set at 25 s.

Thereafter, a second lever was introduced, and the animal had to learn to press either the left or right lever to obtain a food reward (two-lever discrimination). Pressing the incorrect lever or omitting a response (i.e. not responding within 60 s) led to trial termination and an inter-trial interval timeout. After reaching criterion, i.e. obtaining 90% of the food pellets in a session, the animals were subjected to serial reversals. For two of the four groups, this coincided with the switch from non-preferred to preferred pellets. During this reversal phase, the rewarded and non-rewarded levers were reversed daily for 4 consecutive days. The final phase was an extinction phase in which both levers were presented but none were rewarded.

On both the two-lever discrimination and the serial reversal tests, the animals were tested twice daily: a morning and an afternoon session. The extinction phase consisted of a single daily session for 4 consecutive days. All sessions comprised eight blocks (with eight trials each), which were divided for analysis into two sets of four blocks. Session duration ranged from 45 min (two-lever discrimination) to 2 h (extinction).
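The session layout described above (eight blocks of eight trials, analysed as two sets of four blocks) can be expressed as a simple index mapping. This is a minimal sketch; the constant and function names are hypothetical and are not taken from the control software used in the study.

```python
TRIALS_PER_BLOCK = 8
BLOCKS_PER_SESSION = 8
BLOCKS_PER_SET = 4


def block_and_set(trial_index):
    """Map a 0-based trial index within a session to (block, set), both 1-based."""
    if not 0 <= trial_index < TRIALS_PER_BLOCK * BLOCKS_PER_SESSION:
        raise ValueError("trial index outside the 64-trial session")
    block = trial_index // TRIALS_PER_BLOCK + 1
    analysis_set = (block - 1) // BLOCKS_PER_SET + 1
    return block, analysis_set
```

With this mapping, trials 0 to 31 fall in the first set (blocks 1 to 4) and trials 32 to 63 in the second set (blocks 5 to 8), which matches the 32 trials per set indicated in the figure legends.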

Behavioural task 2. Odour go/no-go inhibition task (weeks 5–9)

This task was used to assess the acquisition of a two-odour discrimination, the inhibition of lever pressing and the reversal of the acquired discrimination.

In this task, the animals were first trained to sample an odour in an odour port and subsequently press a lever once for a food reward (FR1). This was achieved in four sessions (30 trials each) over 2 days. Once they acquired these two operant behaviours, they were trained in six daily sessions (30 trials each) to discriminate between two odours, one indicating a ‘go’ trial in which the animal was required to press the lever to obtain a food reward and another odour that signalled a ‘no-go’ trial in which the animal had to withhold pressing the lever to obtain a food reward. Failure to press the lever on a ‘go’ trial or inhibit a lever response on a ‘no-go’ trial, as well as a failure to sample the odour or respond within 60 s (omission), triggered a 30-s inter-trial timeout during which the house light was turned off and no pellet was delivered. The final phase was a single reversal in which the stimulus–response contingencies were switched and the animals were required to reverse their previously acquired go/no-go responses. For two of the four groups, this coincided with the switch from non-preferred to preferred pellets. In this final phase, the animals were tested in 24 daily sessions (45 trials each) over 12 days. The number of trials per session during this phase was increased to 45 to facilitate task acquisition.
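The go/no-go contingencies described above can be summarised as a small decision rule. The sketch below is an illustration, not the program that controlled the chambers; the function name, argument names and outcome labels are assumptions, as is the simplification that any trial without odour sampling counts as an omission.

```python
def score_trial(trial_type, sampled_odour, pressed_lever):
    """Classify one go/no-go trial as 'reward' or 'timeout'.

    trial_type: 'go' or 'no-go'.
    sampled_odour: True if the odour was sampled within the time limit.
    pressed_lever: True if the lever was pressed within the time limit.
    """
    if trial_type not in ("go", "no-go"):
        raise ValueError("trial_type must be 'go' or 'no-go'")
    if not sampled_odour:
        return "timeout"  # omission: the odour was never sampled
    if trial_type == "go":
        # A lever press is required on 'go' trials.
        return "reward" if pressed_lever else "timeout"
    # On 'no-go' trials the lever press must be withheld.
    return "timeout" if pressed_lever else "reward"
```

At reversal, the same rule would apply with the odour-to-trial-type assignment swapped, so an animal must press after the former 'no-go' odour and withhold after the former 'go' odour.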

Behavioural measures

Pellet preference test

The mean cumulative numbers of arm entries (pellet choices) per rat were grouped and compared over all 15-trial sessions.

General activity

To assess the possible effects of the lesion on general activity, the groups were compared regarding average number of head entries into the food dispenser and lever presses per block. These measures were taken during testing on the two-lever discrimination task. To exclude the influence of performance on these activity measures, data from the last acquisition session (both sets) were used. In this session, the performance of both groups was stable and statistically identical.

Tests of cognitive flexibility

The data obtained from both behavioural tasks were taken as measures of performance in the following fashion: the number of obtained rewards as a fraction of the number of obtainable rewards was taken as an index of overall performance on all test phases except the extinction phase of the two-lever discrimination. For this phase, the total number of lever responses was taken as the index, and, as this measure is prone to interference from pre-existing differences between groups in the total number of lever presses, the number of lever presses per block of eight trials was expressed as a percentage of the mean number of lever presses per group in the same block on the previous day.
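The baseline correction used for the extinction phase can be sketched as follows. The function name and the example numbers are hypothetical; only the rule itself (presses per block as a percentage of the group mean in the same block on the previous day) comes from the text.

```python
def normalise_extinction(presses_today, group_mean_previous_day):
    """Express lever presses per block as a percentage of the group's mean
    presses in the same block on the previous day."""
    if len(presses_today) != len(group_mean_previous_day):
        raise ValueError("both inputs must cover the same blocks")
    normalised = []
    for today, baseline in zip(presses_today, group_mean_previous_day):
        if baseline <= 0:
            raise ValueError("baseline must be positive")
        normalised.append(100.0 * today / baseline)
    return normalised


# Made-up counts for four blocks of one animal and its group baseline.
example_percentages = normalise_extinction([12, 9, 6, 3], [24.0, 18.0, 12.0, 12.0])
```

In this made-up example the animal presses at 50% of the group baseline in the first three blocks and at 25% in the fourth.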

Response accuracy, as an index of the number of response errors, was scored as the number of correct lever presses relative to the total number of lever presses. This measure, together with the number of lever omissions, was taken to explain sub-optimal performance.
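The relation between overall performance, response accuracy and omissions can be made explicit with a short sketch. It assumes, for illustration, that each trial yields exactly one outcome label and that one reward is obtainable per trial; the labels and the function name are hypothetical.

```python
def block_measures(outcomes):
    """Summarise one block of trial outcomes.

    outcomes: list of 'correct', 'error' or 'omission', one entry per trial.
    Returns (performance, accuracy, omissions).
    """
    if not outcomes:
        raise ValueError("block contains no trials")
    n_correct = outcomes.count("correct")
    n_error = outcomes.count("error")
    n_omission = outcomes.count("omission")
    if n_correct + n_error + n_omission != len(outcomes):
        raise ValueError("unknown outcome label")
    # Rewards obtained as a fraction of rewards obtainable.
    performance = n_correct / len(outcomes)
    # Correct presses relative to all presses; undefined if nothing was pressed.
    responses = n_correct + n_error
    accuracy = n_correct / responses if responses else None
    return performance, accuracy, n_omission
```

Because omissions lower performance without affecting accuracy, comparing the two measures indicates whether sub-optimal performance reflects wrong choices or a failure to respond.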

Performance on the two-odour go/no-go task was further divided into success and failure of responding on the ‘go’ trials and the ‘no-go’ trials as a measure of inhibition.

The effects of reward type (preferred vs non-preferred pellets) on reversal learning were measured by comparing performance scores of the reversal phase of both groups.

The success of task acquisition of any test phase was established by calculating the aforementioned performance. In case of the extinction phase, the success on the test was measured by the total number of lever presses.

Lesion evaluation

Neurochemistry

After all experimental procedures were completed, the animals were numbed with a mixture of O2/CO2, after which they were decapitated. Dissection of the brain and removal of the regions of interest immediately followed decapitation. Both the medial (prelimbic, infralimbic and anterior cingulate cortex) and orbitolateral (ventral orbital, ventrolateral orbital, lateral orbital, agranular insular) PFC were completely removed in three adjacent (anterior–posterior from frontal pole to genu of the corpus callosum) slices as well as a single control area (motor cortex, M-2; Paxinos and Watson 1998; Uylings et al. 2003). After dissection, all samples were stored at −80°C until analysis. All medial and orbitolateral PFC areas were analysed separately to gain more insight into the spread of the lesion. Tissue from both hemispheres was, however, analysed together.

Tissue samples were weighed and homogenised in ice-cold 0.1 mol/l perchloric acid and centrifuged for 15 min at 14,000 rpm. After centrifugation, the unfiltered supernatants were transferred to an autosampler, and 20-μl aliquots were injected onto a column for high-performance liquid chromatography (HPLC) analysis (Waters 600E pump and Waters 717plus autosampler, Waters Chromatography b.v., The Netherlands; Decade VT-03 electrochemical detector, Antec Leyden, The Netherlands) using Shimadzu Class-vp™ software (version 5.03, Shimadzu Duisburg, Germany).

The mobile phase consisted of 0.06 mol/l sodium acetate, 9 mmol/l citric acid, 0.37 mmol/l heptanesulphonic acid and 12.5% methanol. The flow rate was kept constant at 0.65 ml/min. Separation of 5-HT, DA and NA from other components was achieved with a Supelcosil column (LC-18-DB 25 cm × 4.6 mm × 5 μm), with a 2-cm guard column of the same material (Supelco Superguard ™, Supelco, USA) kept at a constant temperature of 28°C. Quantification was achieved by means of electrochemical detection (oxidation potential set at +0.65 V). The transmitter/metabolite content was calculated against a calibration curve of external standards.

The lowest detectable concentration in the supernatant of our main transmitter of interest, 5-HT, was 800 pg/ml, at a signal to noise ratio of 2:1.
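Quantification against external standards can be illustrated with a linear calibration. This is a sketch under stated assumptions: the text does not specify the form of the calibration curve, so a straight-line least-squares fit is assumed here, as is the normalisation to wet tissue weight; all function names and example values are hypothetical.

```python
def fit_calibration(concentrations, peak_areas):
    """Ordinary least-squares line: peak_area = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(peak_areas):
        raise ValueError("need at least two matched standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, peak_areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tissue_content_pg_per_mg(peak_area, slope, intercept, homogenate_ml, tissue_mg):
    """Convert a sample peak to transmitter content per mg of wet tissue."""
    conc_pg_per_ml = (peak_area - intercept) / slope
    return conc_pg_per_ml * homogenate_ml / tissue_mg


# Made-up standards (pg/ml) and detector responses (arbitrary units).
slope, intercept = fit_calibration([1000.0, 2000.0, 4000.0], [10.0, 20.0, 40.0])
content = tissue_content_pg_per_mg(30.0, slope, intercept,
                                   homogenate_ml=0.5, tissue_mg=10.0)
```

For these made-up values, a sample peak of 30 units corresponds to 3000 pg/ml in the supernatant, or 150 pg per mg tissue; a sample reading below the 800 pg/ml detection limit would be treated as not detectable.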

Data analysis

Pellet preference

The data of the pellet-preference experiment were analysed using a Friedman rank-order test with the cumulative number of choices per pellet as rank sums. Post-hoc testing consisted of two-group Mann–Whitney U comparisons. The significance threshold (two-tailed) was set at p < 0.01.
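The Friedman rank-order statistic can be computed directly from per-animal choice counts. The sketch below uses the textbook formula without tie correction and made-up counts for four animals; it is not the analysis script used for the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square (no tie correction); rows = animals, columns = pellets."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


# Made-up cumulative choices per animal, columns ordered AI, P, FP.
choices = [[2, 10, 3], [1, 9, 5], [3, 8, 4], [2, 10, 3]]
q_value, rank_sums = friedman_statistic(choices)
```

The statistic is referred to a chi-square distribution with k − 1 degrees of freedom; with few animals, exact tables are generally preferred over this approximation.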

Behavioural tasks 1 and 2

Behavioural data taken during these experiments were analysed using SPSS for Windows (version 11.0; SPSS, Gorinchem, The Netherlands). Shaping data from both behavioural tasks were analysed as the number of trials needed to reach a performance level of greater than 90%. Acquisition and reversal data obtained from both behavioural tasks were arcsine transformed to reduce the interference from ceiling effects (Microsoft Excel, Seattle, WA). The data of the extinction phase of the two-lever discrimination, where no ceiling effect was present, were not transformed. Based on previous results, in which effects of local mPFC inactivation were often observed only in one half of a learning session, we divided the data of each session of the two-lever discrimination task in two sets of four blocks, each consisting of eight trials (De Bruin et al. 2000). In contrast, data from the go/no-go task execution were analysed between sessions and not between sets or blocks because the observed learning in this task was slow and group differences spread out over multiple days. Because of technical problems, some data points were lost.
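The effect of the arcsine transformation on proportions near ceiling can be illustrated numerically. The text does not state which variant was applied; the sketch assumes the common arcsine square-root form, and the function name is hypothetical.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine square-root transform of a proportion in [0, 1] (radians)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# The same 5-point difference in raw performance is stretched near ceiling.
gap_near_ceiling = arcsine_sqrt(0.95) - arcsine_sqrt(0.90)
gap_mid_range = arcsine_sqrt(0.55) - arcsine_sqrt(0.50)
```

Under this assumption, a difference between 90% and 95% correct spans a larger interval on the transformed scale than a difference between 50% and 55%, which counteracts the compression of variance that occurs when most animals perform near 100%.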

For the data of both tasks, a repeated-measures analysis of variance (ANOVA) was used with treatment (control/lesion) and pellet (AI/P) as ‘between-subjects’ factors and block as the ‘within-subjects’ factor. When a ‘between-subjects’ effect was found, a one-way ANOVA and Student–Newman–Keuls (SNK) test was used to test for group differences. For the shaping and acquisition phases (which preceded the introduction of the different pellets), the behavioural data of both control groups and lesion groups were combined. This was done to increase power and because both control and lesion groups received the same pellet. In case a treatment effect was found, an independent-samples t test was used as the post-hoc test. Activity measures over the last acquisition session were analysed in a similar fashion.
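The one-way ANOVA used as a follow-up test compares between-group and within-group variance. The sketch below computes the F ratio from first principles on made-up scores; it does not reproduce the SPSS procedure and omits the SNK comparisons, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up scores for three groups of three animals.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For four groups, as in the reversal phases, the same function would return k − 1 = 3 numerator degrees of freedom, in line with the F (3,27) values reported for the first reversal.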

Neurochemical data

Transmitter tissue content was analysed against a calibration curve of external standards. Group comparisons were made using one-way ANOVA, and post-hoc testing was done with an independent-samples t test.

Results of the 5,7-DHT mPFC lesion experiment

Pellet preference

Figure 1 shows the mean cumulative scores (arm entries) for the three pellet types tested. Comparison of rank sums using a Friedman rank-order test revealed significant group differences (F (9,2) = 12.667, p = 0.002). Repeated post-hoc Mann–Whitney U tests revealed a food preference for pellet type ‘P’ over pellet types ‘FP’ and ‘AI’ (p < 0.001).

Fig. 1

Pellet preference test. Rats were tested over 3 consecutive days in a T-maze for pellet preference. Mean cumulative number of arm entries/pellet choices is displayed. Asterisk, different from ‘AI’ and ‘FP,’ p < 0.001

Neurochemical examination

Post-mortem analysis of brain tissue revealed that 5-HT levels were reduced in lesioned animals compared to control rats in all mPFC areas. Table 1 summarises the results. Spread to neighbouring areas was limited to a small but significant reduction of 5-HT in M-2. Pre-surgery desipramine administration prevented significant loss of DA terminals in all inspected areas, although reductions in both medial and orbital areas were observed. A modest but significant reduction in NA was found in the most frontal mPFC area (med-1). Two animals were excluded from the experiment and further data analysis for showing insufficient loss of 5-HT in the mPFC (i.e. less than 30%) or loss of transmitter in neighbouring areas (i.e. more than 50% of both DA and NA).
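The exclusion rule stated above can be written as a simple predicate. The sketch assumes that loss is expressed as a percentage relative to the sham group mean, which the text does not state explicitly; the function names are hypothetical.

```python
def percent_loss(lesion_value, sham_mean):
    """Transmitter loss of one lesioned animal relative to the sham mean (%)."""
    if sham_mean <= 0:
        raise ValueError("sham mean must be positive")
    return 100.0 * (1.0 - lesion_value / sham_mean)


def exclude_animal(loss_5ht, loss_da, loss_na):
    """Apply the two exclusion criteria to percentage losses."""
    insufficient_lesion = loss_5ht < 30.0                 # too little 5-HT loss
    non_selective = loss_da > 50.0 and loss_na > 50.0     # DA and NA both depleted
    return insufficient_lesion or non_selective
```

For example, an animal with 25% 5-HT loss would be excluded for an insufficient lesion, whereas one with 80% 5-HT loss and only DA reduced by more than 50% would be retained under this reading of the criteria.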

Body weight and general activity

Post-surgery body weight did not differ between the lesion group and animals that underwent sham surgery (data not shown). The behavioural measures taken to assess overall activity of the treatment groups also did not differ: neither head entries into the food dispenser (F (1,28) = 0.719, p = 0.404) nor the average number of lever presses (F (1,28) = 0.001, p = 0.977). Table 2 shows the averages for both parameters.

Table 2 General activity measures

Behavioural analysis

Effects of mPFC 5-HT depletion on the two-lever discrimination task

Shaping

During the initial shaping phase, the animals learned to press a lever for a food reward. No differences between any of the groups were observed in the number of trials needed to reach criterion (i.e. >90% of pellets obtained; F (3,28) = 1.597, p = 0.215). Shaping took maximally nine sessions spread out over 5 days. Performance, expressed as a percentage of obtained rewards, exceeded 90% before initiation of the next test phase.

Two-lever discrimination

Shaping on the operant task was followed by a two-lever discrimination. A main effect of block, showing that learning took place, was found over all blocks of the morning session (F (7,168) = 17.019, p < 0.001), after which performance levelled off. Initially, the overall performance of the sham controls fell behind that of lesioned animals (interaction effect), which made fewer errors in the second set of the morning session (F (1,24) = 5.599, p = 0.026; see Fig. 2). An independent-samples t test over this set showed that blocks 2 and 3 differed between lesion and control groups (block 2: t = −2.509, p = 0.025; block 3: t = −2.534, p = 0.024). No difference in the number of omissions was observed (F (1,24) = 1.045, p = 0.316). Group differences had disappeared by the second session.

Fig. 2

Acquisition of the two-lever discrimination. Depicted is the percentage of correct responses for both lesion and control groups on the first acquisition day. White and grey bars indicate sets of four blocks (32 trials). Asterisk indicates group differences, p < 0.05

Overall performance over the second set of the final acquisition session showed that all four groups had obtained at least 90% of the rewards and did not differ significantly in performance (F (1,23) = 0.980, p = 0.334; see Fig. 3a).

Fig. 3

Performance on the final acquisition phase and initial reversal. a Performance on the final acquisition sessions. All groups obtained at least 90% of the rewards over the last set of the afternoon session, and no group differences were found. b Performance over the first reversal sessions. Control animals that switched to preferred pellets fell behind the other groups. White and grey bars indicate sets of four blocks (32 trials). Asterisk indicates group differences, p < 0.05

Spatial reversal

After reaching criterion on the two-lever discrimination, the animals were subjected to four serial reversals. As each group (lesion and control) was now divided into a non-preferred- and preferred-pellet group, the number of groups was four from now on. A main effect of block (F (7,133) = 51.715, p < 0.001; not treatment or pellet, p > 0.3) was observed over the morning session of the first reversal (F (1,19) = 51.715, p < 0.001). During the afternoon session, a block effect over both sets taken together indicated that learning continued (F (7,140) = 22.669, p < 0.001). Main effects on performance for both treatment (F (1,23) = 5.245, p = 0.032) and pellet (F (1,23) = 6.158, p = 0.021) as well as a treatment/pellet interaction (F (1,23) = 5.701, p = 0.026) were observed over the first set. A one-way ANOVA with SNK post-hoc test revealed that sham animals receiving preferred pellets obtained fewer pellets than any other group in blocks 1 (F (3,27) = 4.163, p = 0.017), 2 (F (3,27) = 5.176, p = 0.007) and 4 (F (3,27) = 5.520, p = 0.012; see Fig. 3b). No differences for treatment, pellet or interaction were found over the second set of the afternoon session (p > 0.3). A covariate analysis revealed that the effects found during the reversal could not be attributed to a pre-existing performance difference between groups.

To examine if the group differences could be attributed to either an increase in errors or an increase in omissions, further analyses were performed. The data indicated that the control animals that were switched to preferred pellets exhibited a decrease in accurate lever pressing, i.e. increased errors (F (1,24) = 4.638, p = 0.042; see Fig. 4), as opposed to increased omissions (F (1,24) = 1.606, p = 0.213), in this group in the first set of the afternoon session. After the initial reversal, all groups performed in similar fashion in the three subsequent reversals.

Fig. 4

Omissions and response accuracy in the first reversal of the two-lever spatial discrimination task. The graph shows the percentage omissions (top) and percentage correct responses (bottom) during the initial reversal. Decreased lever press accuracy (increased errors) of control animals that receive preferred pellets is seen during the first set of the afternoon session. White and grey bars indicate sets of four blocks (32 trials). Post-hoc testing revealed the second block of trials in this set to be statistically different from all other groups. No differences were observed for the percentage of omissions

Extinction

After four consecutive reversals, the animals were subjected to an extinction phase during which lever pressing was no longer rewarded. When comparing the total number of lever presses corrected for baseline lever pressing, all groups showed a steady decrease over both sets of each of the four sessions (main effect of block: F (7,175) between 29.237 and 10.048; p < 0.001; see Fig. 5). There were, however, no main effects of treatment or pellet or any interaction effect between the groups (p > 0.2). Over the course of 4 days, lever pressing was never totally extinguished. Analysis of the uncorrected extinction data yielded identical results.

Fig. 5

Lever responding during extinction of the two-lever spatial discrimination task. The cumulative number of lever presses per group is shown for the extinction phase of the two-lever discrimination task, when lever pressing is no longer rewarded. Although a decrease is seen over each individual session, lever pressing is never totally abolished. White and grey bars indicate sets of four blocks (32 trials). No significant group differences were found

Effects of mPFC 5-HT depletion on the odour-based go/no-go task

Shaping

During the initial shaping phase, the animals learned to sample an odour and press a lever for a reward. Both groups acquired the responses (i.e. >90% of pellets obtained) within a maximum of eight sessions, spread out over 6 days, without showing any treatment effect (F (3,28) = 0.667, p = 0.590).

Two-odour discrimination

After shaping, the animals were trained, over six sessions, to respond to one odour with a lever press, while inhibiting a lever press after sampling a second, different, odour. Results show an overall increase in performance over these sessions, i.e. a main effect of block (F (5,140) = 16.074, p < 0.001) for all groups (data not shown). No effect of treatment was found (F (1,28) = 1.375, p = 0.251). A response preference for either the ‘go’ or the ‘no-go’ response was not observed.

Odour reversal

After the acquisition of the odour discrimination, the stimulus–response contingencies were reversed, and the animals had to switch their response to the odours. In this phase, the two groups were again divided into sub-groups for preferred and non-preferred pellets. In contrast to the quickly acquired spatial reversal, the odour reversal took 12 sessions to acquire (reaching ‘pre-reversal’ performance). Figure 6 shows the performance of all groups.

Fig. 6

Performance on the final acquisition phase and reversal of the two-odour go/no-go discrimination task. a Performance over the four final acquisition sessions. All groups obtained at least 90% of the rewards over the final four acquisition sessions, and no group differences were found. b Performance over the reversal sessions. Control animals that were rewarded with non-preferred pellets fell behind the other groups. Asterisk indicates group differences, p < 0.05

Over 6 consecutive test days (12 sessions), main effects of both session (F (1,10) = 40.163, p < 0.001) and treatment (F (1,10) = 9.025, p = 0.013) as well as a treatment/pellet interaction (F (1,10) = 5.764, p = 0.037) were observed. Further analysis (one-way ANOVA and SNK post-hoc test) revealed that in sessions 7 (F (3,13) = 6.851, p = 0.009), 8 (F (3,13) = 3.898, p = 0.044) and 9 (F (3,13) = 5.563, p = 0.017), control animals receiving non-preferred pellets obtained fewer rewards than all other groups.

Further analysis of the data showed that, unlike in the initial reversal task, none of the groups differed in response accuracy (F (1,10) = 0.158, p = 0.699). However, a treatment/pellet interaction (F (1,10) = 5.898, p = 0.036) was observed for the average number of omissions (see Fig. 7). Post-hoc testing showed that, in sessions 8 and 10, this measure differed significantly between sham animals on non-preferred rewards and all other groups (F (3,13) = 4.647, p = 0.028 and F (3,13) = 4.194, p = 0.037, respectively).

Fig. 7

Omissions and response accuracy during reversal learning of the two-odour go/no-go discrimination task. Mean percentage of omissions (top) and percentage correct responses (bottom) per group during the ‘odour-reversal’ are shown. An increase in the number of omissions is seen for control animals that received non-preferred pellets on sessions 8 and 10. No differences were observed for the percentage of correct responses

Discussion

With the current experiment, we found support for our hypothesis that medial prefrontal 5-HT is selectively involved in cognitive flexibility when affect guides decision making. Local depletion of mPFC 5-HT prevented the reversal learning impairments that were induced by a change in the reward value during operant behaviour. This effect was observed both in spatial two-lever discrimination reversal learning and during the reversal of the odour-based go/no-go task. In the former, control animals that were switched to preferred pellets showed increased erroneous lever pressing compared to all other groups, whereas in the latter, non-switched control animals showed increased omissions. Apparently, the lesion rendered the animals less sensitive to changed reward value. Similar effects, however, were not observed during extinction learning, when rewards were no longer available.

Taken together, these data suggest that depletion of mPFC 5-HT impairs goal-directed behaviour by rendering rats less capable of responding to a change in reward value (incentive learning; Balleine and Dickinson 1998; Balleine 2005; Niv et al. 2006), without disturbing their ability to detect the presence or absence of a reinforcer and to utilise this information to adapt behaviour accordingly.

The data moreover indicate that this effect occurs independently of the stimulus modality of the behavioural tasks, thereby contradicting our hypothesis that reversal deficits induced by mPFC 5-HT lesioning would be limited to spatial tasks, as opposed to odour-guided ones.

In the current experiment, we lesioned 5-HT terminals in the mPFC by means of a local microinjection of the toxin 5,7-DHT. HPLC analysis showed that this procedure was successful, as a marked loss of serotonergic innervation (70–90%), compared to sham controls, was observed in the target area. This reduction was observed without spread to neighbouring areas whilst sparing both DA and NA innervation.

Two behavioural paradigms were employed to study cognitive flexibility and affective processing. First, we used a task previously developed in our lab with which we showed mPFC involvement in spatial reversal learning (De Bruin et al. 2000). A second, symmetrically reinforced go/no-go odour reversal task was newly designed for the current experiments. These tasks were chosen to give insight into reversal learning across different stimulus modalities and to allow the study of different aspects of reversal learning, such as inhibitory processes involving both response extinction (spatial discrimination task) and impulse inhibition (odour-based, go/no-go task). Data showed that whilst the stimulus discriminations for both tasks were relatively quickly acquired, reversal learning took considerably longer for the odour stimuli. Whereas the ‘spatial’ lever press reversal was acquired within a day, the ‘odour’ go/no-go reversal took up to 12 sessions (6 test days). These differences are in line with existing literature on the acquisition of the spatial lever-press task (De Bruin et al. 2000; Van der Meulen et al. 2003) and a comparable odour-based go/no-go task (Schoenbaum et al. 2006).

Our hypothesis that medial prefrontal 5-HT depletion would have a stronger impact on reversal learning in the ‘spatial’ task than in the ‘odour’ task was not supported.

When comparing the performance of lesioned animals to that of control animals, we were unable to find any indication of impaired reversal learning on either task. Not only did 5-HT lesioned animals make fewer errors during the initial acquisition, they also proved fully capable of acquiring both types of reversals, demonstrating intact cognitive flexibility across stimulus modalities when the reward identity remains constant. These findings suggest that in rats, the acquisition of a spatial reversal task, which depends on the mPFC, does not depend on 5-HT innervation of that area. This contrasts with the finding that in primates, the orbital PFC involvement in visual reversal learning does depend on 5-HT.

To test our hypothesis that mPFC 5-HT is involved in affective processing, we used a pellet switch to induce a change in affective value of the reinforcer during both reversal tasks. The rationale for this was that the impact of the serotonergic lesion in the mPFC on cognitive flexibility would be greater in a situation where the affective value of the reward can be used to guide decision making. Whereas in the primate literature, the term ‘affective shifts’ has been used for a switch in the value associated with the stimulus (Dias et al. 1996), we wanted to introduce a switch in the affective value of the reinforcer. Similar to the experiments of Tremblay and Schultz (2000), we changed the relative reward value from non-preferred, although still salient, to preferred (see also Watanabe 1996), at the time of the reversal.

To determine food preference, food-deprived rats were pre-tested on three types of food reward in a T-maze, where all rats chose Noyes™ pellet P over AI. These were subsequently implemented in the task as ‘preferred’ and ‘non-preferred’ rewards, respectively, to examine the effect of this affective shift in both groups during reversal learning. To exclude a ‘novelty’ effect, the animals were acquainted with both types of pellets before testing. As we observed pellet switch-induced performance effects in control rats during both tasks, we conclude that the pellet rewards were sufficiently different. The possibility that the lesion induced an indifference to either reward does not seem likely. If lesioned animals were in fact indifferent to reward outcome, they would have performed as non-switched control animals throughout the experiment, but this was not the case (see below).

During initial ‘spatial’ reversal learning, lesioned animals behaved as non-switched controls, showing a quick reversal, without performance deficits on reversal learning per se. We did, however, observe that switching intact animals to preferred rewards resulted in reduced performance on the reversal task compared to lesioned animals. These results suggest that in intact animals, behavioural adaptation is at least partly guided by reward value, which, in this particular case, further complicated acquisition of the reversal.

A second instance of such a pellet-switch-induced performance deficit was observed during the ‘odour’ reversal. Whereas the initial reversal impairment in switched control animals was due to increased errors, during the ‘odour’ reversal, non-switched control animals fell behind in performance because of increased omissions.

As the pellet switch did not affect performance of lesioned animals at any time, we propose that an inability to use affective information (i.e. pellet value) in lesioned animals underlies the observed group differences. These data suggest that initial behavioural adaptation guided by altered reward outcome is dependent on medial prefrontal 5-HT.

In the following section, we will first discuss why, in our opinion, a number of possible mechanisms cannot explain the observed effects induced by the pellet switch. We will then propose how our observations fit and extend current literature.

Among the possible causes for these effects, altered associative learning is not a likely one. Although lesioned animals did make fewer errors during the acquisition of the spatial discrimination task, an effect that has also been observed by others (e.g. Ward et al. 1999) and that suggests possibly improved stimulus–outcome learning, performance of neither lesioned group (switched or non-switched) exceeded performance of the control animals during the subsequent testing. Moreover, the performance impairments observed in the control group occurred only during the early stages of reversal learning, leaving the acquisition of subsequent (spatial) reversals or stimulus–outcome learning unaffected (see also Harrison et al. 1999).

Despite reports on involvement of both the mPFC (Salazar et al. 2004; Winstanley et al. 2006) and 5-HT (Beninger and Phillips 1979; Soubrié 1986) in impulsive responding and extinction processes, our tests indicate that reduction in mPFC 5-HT does not lead to decreased extinction learning or increased impulsivity. No effects of the lesion were observed during extinction of lever pressing, nor was increased lever pressing observed during the go/no-go inhibition task. Although these results do not seem to fit the current literature on 5-HT in impulsivity, recent findings by Chamberlain et al. (2006) in human subjects show that impulse inhibition might be mediated via the NA system, rather than the classically implicated 5-HT. Therefore, the possibility that reversal learning improved in lesioned animals through altered inhibition or extinction processes is unlikely. Furthermore, subsequent analysis showed no difference between groups on general motor activity, as assessed by the average number of lever presses and nose pokes into the food dispenser.

The possibility that a motivational effect of 5-HT depletion underlies the observed deficits does not seem likely either. Although the control animals showed an increase in the number of omissions towards the end of the go/no-go task, an effect suggesting decreased motivation towards the end of the experiment for the non-preferred reward, none of the groups (lesion or sham) showed any sign of increased or decreased lever pressing during the preceding test phases, suggesting that motivational processes were intact in lesioned animals. These findings are supported by other reports where motivational changes after general 5-HT depletion in humans (Evers et al. 2005), primates (Clarke et al. 2004) and rodents (Harrison et al. 1999) were not observed either.

Earlier work by others (e.g. Roberts et al. 1994) suggests that (global) 5-HT depletion may induce increased reinforcing efficacy for both psychomotor stimulants and food rewards (but also see Tran-Nguyen et al. 2001). The evidence for such an effect is, however, inconclusive, and the observed changes can also be ascribed to increased impulsivity and insensitivity to punishment (e.g. Harrison et al. 1997). The current data, moreover, do not support these findings, as lesioned animals showed ‘normal’ extinction learning, whereas decreased extinction learning would be expected in case of increased reinforcing efficacy.

From these findings, we conclude that the behavioural effects are not due to a generalized decrement of overall functioning but are rather specific to an inability to effectively incorporate reward value. These results suggest that mPFC 5-HT depletion induces a specific deficit in goal-directed behaviour that is limited to the ability to use current reward value to guide behaviour (Balleine and Dickinson 1998; Niv et al. 2006).

Human literature provides a framework in which we think the current findings can be placed. Miller (2000) has described the PFC as a control centre over overt behaviour that regulates ‘active maintenance of patterns of activity that represent goals and the means to achieve them.’ Later work by Miller and Cohen (2001) and Ridderinkhof et al. (2004) extends this model by incorporating different cortical sub-areas and describes how these areas connect to combine (reward-related) information and lead to goal-directed behaviour. In this model, the orbital PFC can be seen as the area where reward-related information is encoded (as discussed in “Introduction”) and the mPFC functions as an area that regulates or signals the orbital PFC to implement performance adjustments (based on reward information).

A unified model for the rodent PFC, similar to that of Miller for the human PFC, has not been proposed, but the available experimental data obtained in rodent areas that show functional overlap with the human mPFC (Uylings et al. 2003) parallel and extend these findings. Balleine and Dickinson (1998), for example, showed that lesions of the prelimbic area (part of the mPFC) render the animals insensitive to variations in the contingency between a response and a specific reward. Moreover, the same authors (De Wit et al. 2006) showed that inactivation of the mPFC leads to impaired resolving of response conflict through loss of inhibitory function, without any apparent impairment in discriminative function for foods or loss of reinforcing function of food pellets and sugar. Further indications for a parallel between the human and rodent mPFC come from Cardinal et al. (2001), who showed that lesions of the mPFC yielded what Dalley et al. (2004) later described as a flattening of the shift from large-delayed rewards to small-immediate rewards. Although the authors ascribed this to a ‘timing impairment,’ these findings fit the described human model of mPFC-mediated performance monitoring and adjustment of behaviour. Finally, Hok et al. (2005) reported on neurons recorded in the mPFC that encode the motivational salience of places. These results add to the experimental evidence presented earlier of orbital PFC-mediated reward encoding by showing that reward-related information is also processed in the mPFC. Together, the presented rodent data support a role for the mPFC in the monitoring, processing and implementation of reward-related information, as suggested for the human mPFC.

Experiments concerning the underlying neurochemical mechanisms involved in the processing of reward-related stimuli have mainly focused on DA. Direct evidence for DA involvement comes from measurements of DA ventral tegmental area neurons in non-human primates that show reactivity to reward (Schultz 1998), probability of reward (Fiorillo et al. 2003) and adaptive neural firing in response to reward-predicting stimuli (Tobler et al. 2005). These findings are further supported by a recent study that shows midbrain activation in humans during reward anticipation (Wittmann et al. 2005).

The presented data, however, show that cognitive flexibility, when affect guides decision making, might be dependent on intact serotonergic projections. Although we cannot rule out possible compensatory mechanisms, our analysis shows that DA innervation in the lesioned animals remained intact. Support for our findings can be found in human studies where global depletion of 5-HT through tryptophan depletion caused altered processing of reward cues, leading to unfavourable choice behaviour in a gambling task (Rogers et al. 2003).

The current experiments support previous notions on the mPFC as a performance monitoring structure, responsible for the implementation of adjustments in cognitive control. Moreover, they extend the current model by implicating 5-HT, in addition to DA, in cognitive control.

With regard to the initial research question of whether cognitive flexibility is dependent on mPFC 5-HT, the present data suggest that in both the spatial and the odour tasks, the lesioned animals are fully capable of acquiring a reversal, show appropriate extinction learning in the absence of a reinforcer and exert normal inhibitory control. The current observations, however, also suggest that when the value of a reinforcer changes but not its presence or absence, mPFC 5-HT is necessary to adapt the behaviour accordingly.