Introduction

Haloperidol is a potent typical antipsychotic with high affinity for dopamine (DA) D2 receptors. In laboratory animals, it is used to model the extrapyramidal side effects of neuroleptic therapy: rats treated with haloperidol show symptoms similar to those observed in Parkinson’s disease (PD), including cognitive learning deficits as well as akinesia and rigidity (i.e., catalepsy), effects that are mediated by blockade of striatal D2 receptors (Sanberg 1980).

Interestingly, repeated administration of haloperidol leads to an intensification of catalepsy with each consecutive test, a process known as sensitization (Schmidt and Beninger 2006; Schmidt et al. 1999; Lanis and Schmidt 2001; Frank and Schmidt 2003; Barnes et al. 1990; Antelman et al. 1986). Catalepsy sensitization occurs not only under repeated administration of haloperidol but also in DA-deficient animals (bilateral striatal 6-OHDA lesion) (Klein and Schmidt 2003; Srinivasan and Schmidt 2004).

Similarly, antipsychotic medications do not improve symptoms of psychosis until after a delay of days to weeks (Reynolds 1992), implicating sensitization processes in the therapy of schizophrenia.

Notably, catalepsy sensitization by haloperidol and by 6-OHDA lesion is context-dependent: it is observed in the context in which haloperidol was originally administered, whereas testing in a novel context results in a significant decrease of catalepsy (Klein and Schmidt 2003; Srinivasan and Schmidt 2004; see Fig. 1).

Fig. 1

The Klein and Schmidt (2003) experiment on context dependency of haloperidol-induced catalepsy. One hour after having been injected with 0.25 mg/kg haloperidol, rats were placed with their forepaws on an elevated bar. Catalepsy is measured as the time until the first movement occurs. The test environment remained stable and represents context A. Catalepsy sensitization could be observed during the first 10 days of testing. On day 10, the rats were tested in context B, which was novel (different experimenter, light, smell, etc.). Catalepsy expression was significantly weaker than on the previous day in context A. From days 11 to 27, the rats were left in their home cages without haloperidol treatment. On day 28, the rats were again injected with haloperidol and tested in context A. The sensitized catalepsy was almost as strong as on day 9. Thus, catalepsy sensitization has a context-dependent component

Furthermore, although catalepsy expression can be extinguished following repeated injections of placebo instead of haloperidol, the sensitization nevertheless shows a non-extinguishable component: a single dose of haloperidol elicits renewed elevation of catalepsy relative to animals who had not been previously sensitized (Amtage and Schmidt 2003; see Fig. 2). Similar sensitization, extinction, and renewal phenomena are observed in response to drugs of abuse (e.g., Redish et al. 2007), raising the question of whether similar principles apply.

Fig. 2

The Amtage and Schmidt (2003) experiment on extinction and renewal of haloperidol-induced catalepsy sensitization. An increase in catalepsy (longer latency to get off the bar) was observed in the “paired” group, which was injected with a threshold dose of haloperidol 1 h prior to being placed on the bar and with saline 30 min afterward. The saline group was injected with saline instead of haloperidol and expressed no catalepsy. The “unpaired” group received saline 1 h prior to the catalepsy test and haloperidol 30 min after the bar test; the lack of catalepsy rules out the possibility that receptor upregulation alone can account for the day-to-day increase in catalepsy in the paired group. On day 9, the “paired” group received saline instead of haloperidol, yet catalepsy expression was still observed to a far greater degree than in the other groups. Rats from the “paired” group were then divided into two groups: one that received extinction training (“paired-E”) and one that was not tested and left in their cages (“paired-NE”). Catalepsy was progressively extinguished in the “paired-E” group. On day 15, all groups were again challenged with haloperidol on the bar. The “paired-NE” group, which had not received extinction training, showed strong catalepsy. Moreover, despite the previous extinction, the “paired-E” group expressed stronger catalepsy than the “saline” and “unpaired” groups. On day 16, every group was injected with saline before and after the bar test. An identical pattern of results was observed in the “grid test” (not shown)

Despite several years of research into these phenomena, a well elaborated mechanistic explanation for these observations is still lacking. A key observation might be that haloperidol-induced catalepsy sensitization could be related to changes in synaptic strength (i.e., learning) within the striatum—a brain region with a high rate of neuroplastic changes modulated by DA (e.g., Calabresi et al. 2007; Centonze et al. 2001; Robinson and Kolb 1997). Indeed, chronic haloperidol enhances synaptic plasticity via D2 receptor blockade (Centonze et al. 2004) and phosphorylation of GluR1 AMPA receptors in striatopallidal neurons (Håkansson et al. 2006). Haloperidol potently blocks dopaminergic D2 receptors which are primarily found in the indirect pathway of the basal ganglia (BG) (Gerfen 2000; Salin et al. 1996; Boraud et al. 2002; Robertson et al. 1992; Gerfen et al. 1995; Surmeier et al. 2007) and increases spike frequency in striatal spiny I neurons (Frank and Schmidt 2003). Neurons in this same pathway are also hyperactive or show abnormal burst firing in Parkinsonism (Albin et al. 1989; Bergman et al. 1998; Mallet et al. 2006).

In this paper, we focus on this well-established striatal D2 mechanism of haloperidol, in an attempt to account for catalepsy sensitization (see “Discussion” for other mechanisms). To explore the complex dynamic interactions among BG sub-regions in response to haloperidol, we use an explicit, computational model of the BG (Frank 2006), which is also grounded by its ability to account for other dopamine-dependent learning-related phenomena and to make predictions that have subsequently been confirmed via pharmacological manipulations in humans (Frank et al. 2004, 2007c; Frank 2005; Frank and O’Reilly 2006; Cools et al. 2006; Santesso et al. 2009; Moustafa et al. 2008). Here, we report that this same model also reproduces the effects of haloperidol on sensitization, context dependency, extinction, and renewal.

Materials and methods

Model: high level overview

The role of the BG can be seen as that of a dynamic modulator of frontal cortical action plans. With respect to motor control, the BG could function as an action selection device (Graybiel 2000; Redgrave et al. 1999; Frank 2005): efferent projections from the motor cortex reach the BG, which then facilitate appropriate motor commands while suppressing those that are inappropriate (Basso and Wurtz 2002; Brown et al. 2004; Gurney et al. 2001; Jiang et al. 2003; Mink 1996; Redgrave et al. 1999). These two functions can be supported by two separate striatofugal neural projections (Albin et al. 1989; Alexander and Crutcher 1990). The direct striatonigral pathway (expressing high levels of D1 receptors) functions as the “Go-” pathway, by facilitating the selection of particular actions when appropriate in a given sensory context. In contrast, neurons originating in the striato–pallidal–nigral pathway (expressing high levels of D2 receptors) function as the “NoGo-” pathway, by detecting the conditions in which a given action should be suppressed and counteracting the Go pathway at the level of BG output (Frank 2005).

According to the reward prediction error (RPE) hypothesis, midbrain dopaminergic neurons signal when outcomes are better or worse than expected via phasic bursts and pauses in firing (Schultz et al. 1997). Reward-associated behavior is potentiated by activation of D1 receptors and synaptic plasticity in the Go pathway following dopamine bursts (Reynolds and Wickens 2002). Conversely, maladaptive behaviors are suppressed via disinhibition of D2 receptors and potentiation of NoGo cells following dopamine dips (Frank 2005). Recent studies support these dual mechanisms of plasticity: D1 stimulation potentiates corticostriatal Go synapses, whereas a lack of D2 receptor stimulation (simulating the effect of a DA dip) was required to potentiate NoGo synapses (Shen et al. 2008).Footnote 1 Furthermore, this NoGo learning effect would be enhanced by D2 receptor sensitivity (Seeman 2008) and enhanced excitability of striatopallidal NoGo cells in the DA-depleted state (Surmeier et al. 2007; Shen et al. 2008).

Some question the RPE hypothesis of DA signaling altogether, suggesting that the timing of DA signals is too early to encode these errors (Redgrave et al. 1999; Redgrave and Gurney 2006). They argue that the functional role of phasic dopaminergic neuron firing is to reinforce the development of a novel action, rather than unpredicted reward per se, in response to a salient or novel stimulus (for recent reviews, see Redgrave and Gurney (2006) and Lisman and Grace (2005)). This theory was addressed in our second experiment (see below).

BG model functionality

The BG model’s basic functionality is to select an appropriate response in the output units when presented with a stimulus in the input (Fig. 3). This selection mechanism involves interactions between various BG nuclei beginning with the striatum, consisting of simulated Go (striatonigral) and NoGo (striatopallidal) units. Activity in Go units facilitates a response by effectively disinhibiting the thalamic units representing that response and allowing reverberatory thalamocortical projections to generate a cortical response. Activity in NoGo units counteracts the Go activity via inhibition of the external segment of globus pallidus (GPe), which has an opposing effect via focused projections from GPe to GPi (Parent and Hazrati 1995).Footnote 2 The substantia nigra pars compacta (SNc) sends dopaminergic projections to the striatum and signals positive or negative RPEs via DA bursts or dips, respectively. DA bursts further excite activated Go neurons via D1 receptors while inhibiting NoGo neurons via D2 receptors (Gerfen 1992; Joel and Weiner 1999; Brown et al. 2004; Frank 2005). In contrast, DA dips have the opposite effect: NoGo neurons have an increased probability of firing, due to removal of DA inhibition onto sensitive D2 receptors. Furthermore, only striatal neurons that are already activated via glutamatergic corticostriatal input (representing the particular stimulus-response conjunction) can increase or decrease their synaptic strengths, according to the Hebbian-like learning rule in our model (see Electronic supplementary material), similar to the three-factor learning rule proposed by Wickens and colleagues (e.g., Reynolds et al. 2001). Thus, DA bursts potentiate only Go synapses associated with the selected response and activated by the input stimulus, leading to positive reinforcement learning. 
Conversely, DA dips potentiate activated NoGo synapses such that this response will be more likely to be suppressed in future presentations of the same stimulus (Frank 2005, 2006). Implementation details of the original model can be found elsewhere (Frank 2006); additional changes to simulate the tasks here can be found in the Electronic supplementary material.
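The three-factor logic described above (presynaptic activity, postsynaptic activity, and a phasic DA signal) can be sketched in a few lines. This is a deliberately simplified illustration, not the learning rule of the actual model (which is specified in the Electronic supplementary material); the function name `update_weights`, the learning rate `lr`, and the sign coding of `da` (positive for a burst, negative for a dip) are assumptions made here.

```python
def update_weights(w_go, w_nogo, pre, go_act, nogo_act, da, lr=0.1):
    """Toy three-factor update: presynaptic x postsynaptic x dopamine.

    da > 0 stands for a phasic burst, da < 0 for a dip (assumed coding).
    A synapse changes only if both its input and its target unit are active.
    """
    if da > 0:
        # Burst: potentiate active Go synapses (D1-mediated in the text).
        w_go = w_go + lr * da * pre * go_act
    elif da < 0:
        # Dip: potentiate active NoGo synapses (release from D2 inhibition).
        w_nogo = w_nogo + lr * (-da) * pre * nogo_act
    return w_go, w_nogo
```

With `pre = 0` (the stimulus is absent) neither weight changes, which is the property that later makes learning specific to the trained context.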

Fig. 3

a Functional architecture of the BG model. The direct (“Go”) pathway disinhibits the thalamus via the internal segment of the globus pallidus (GPi) and facilitates the execution of an action represented in the cortex. The indirect (“NoGo”) pathway has an opposing effect of inhibiting the thalamus and suppressing the execution of the action. These pathways are modulated by the activity of the substantia nigra pars compacta (SNc) that has dopaminergic projections to the striatum. Go neurons express excitatory D1 receptors while NoGo neurons express inhibitory D2 receptors. b The Frank (2006) computational model of the BG. The input neurons project directly to the premotor cortex which in turn projects to the output (M1) response neurons. Motor cortical responses are modulated by projections from the thalamus. The left half of the striatum contains the Go neurons and the right half the NoGo neurons, each with separate columns for the competing responses R1 and R2. Dopaminergic projections from the substantia nigra pars compacta (SNc) modulate Go and NoGo activity by exciting the Go neurons (D1) and inhibiting the NoGo neurons (D2) in the striatum, which also drive learning during phasic DA bursts and dips. Projections to and from the subthalamic nucleus (STN) are included here for completeness (see Frank 2006 for their functionality)

Catalepsy simulations

A common measurement of catalepsy in rats is the bar test (e.g., Amtage and Schmidt 2003; Klein and Schmidt 2003) in which the animal is placed to stand with its forepaws on an elevated bar and the time until the first movement occurs is taken. But how does one simulate catalepsy in a computational model? Just as rats can take a longer time to descend off the bar, the BG model can take varying amounts of time to facilitate a response. To measure the latency until an action is selected in the model (hereafter, “response time” (RT)), we assessed the number of network processing cycles (see Electronic supplementary material) before a response was selected by the BG action selection network, i.e., until one of the thalamus units was disinhibited by BG circuitry. When a thalamus unit is activated, the corresponding response is swiftly executed (Frank 2006). Thus, catalepsy is associated with longer latencies to gate responses. These same methods were employed in previous model RT analyses (Frank et al. 2007b, d; Moustafa et al. 2008). Because the BG gating system is required to facilitate a cortical response, similar results are obtained by probing output unit activity.
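As an illustration of the RT measure described above, the following sketch counts processing cycles until a thalamus unit is disinhibited. It assumes that thalamic activations are available as one list of unit activities per cycle and that "disinhibited" can be read as crossing a fixed threshold; the function name `response_time` and the threshold value are hypothetical and are not taken from the model.

```python
def response_time(thalamus_trace, threshold=0.5):
    """Return the first cycle (1-based) at which any thalamus unit exceeds
    the threshold, or None if no response is gated within the trace.

    thalamus_trace: list of per-cycle lists of thalamus unit activations.
    """
    for cycle, activations in enumerate(thalamus_trace, start=1):
        if max(activations) > threshold:
            return cycle
    return None
```

A trace in which no unit ever crosses the threshold returns `None`, corresponding to a trial on which the BG does not gate any response.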

To gain insight into underlying processes leading to different RTs, we additionally probed striatal unit activity. As described above, the BG model simulates Go and NoGo neuronal populations in the striatum, which facilitate and suppress responses, respectively. If a Go population for a given response is more active than its NoGo counterpart, that response is more likely to be facilitated, thus the relative difference in Go–NoGo activity influences the speed at which the response is executed (Moustafa et al. 2008). If NoGo activity is relatively greater than Go activity, as seen in Parkinsonism, a response may not be selected at all by the BG; therefore the NoGo–Go contrast reflects an internal (“hidden variable”) measure of catalepsy (see Electronic supplementary material for the precise computation).
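The NoGo–Go contrast can be sketched as a difference of summed activations. The precise computation used in the study is given in the Electronic supplementary material; the version below is only an assumed, simplified form, and the function name `nogo_go_contrast` is hypothetical.

```python
def nogo_go_contrast(go_acts, nogo_acts):
    """Summed NoGo activity minus summed Go activity for one response.

    Positive values mean NoGo dominates (more catalepsy-like);
    negative values mean Go dominates (faster response gating).
    """
    return sum(nogo_acts) - sum(go_acts)
```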

To simulate the partial blockade of postsynaptic D2 receptors by haloperidol, we reduced the strength of the inhibitory SNc D2 projection onto NoGo neurons to 10% of its original value, representing a 90% occupancy of D2 receptors by haloperidol.Footnote 3
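The manipulation amounts to scaling one projection. The sketch below assumes a linear mapping from receptor occupancy to remaining projection strength (which reproduces the 90% to 10% correspondence stated above) and a simple subtractive form of D2 inhibition; both function names and the subtractive form are illustrative assumptions, not the model's equations.

```python
def d2_projection_scale(occupancy):
    """Fraction of SNc->NoGo inhibitory strength left at a given D2
    receptor occupancy (assumed linear: 0.9 occupancy -> 0.1 strength)."""
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must lie between 0 and 1")
    return 1.0 - occupancy


def nogo_net_input(cortical_drive, da_level, d2_weight, occupancy=0.0):
    """Excitatory cortical drive minus the remaining D2-mediated inhibition."""
    return cortical_drive - d2_projection_scale(occupancy) * d2_weight * da_level
```

With the same cortical drive and DA level, the NoGo input is larger at 90% occupancy than in the intact state, which is the disinhibition that the simulations rely on.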

Reward prediction error model

According to the RPE hypothesis, DA bursts signal unexpected reward and DA dips signal the lack of expected reward. In the aforementioned experiments on catalepsy sensitization, however, neither explicit reward nor punishment was used following motor responses. We reasoned that because the bar test is somewhat aversive (the animal does not want to be on the bar and therefore descends), the escape from aversive conditions may be associated with a positive DA burst. Indeed, there is evidence that an offset of an aversive stimulus is associated with increased striatal DA (Jackson and Moghaddam 2004). Accordingly, we applied a small DA burst following response execution during training.

The network was trained for 60 trials in the haloperidol mode in context A (represented by a set of four sensory input units), and then tested in that context and in an untrained context B (corresponding to a different set of input units).Footnote 4 During this testing procedure, the network’s weights were prevented from changing, so as to prevent learning in the test and to permit multiple tests across training (something which would have to be done between subjects in actual experiments). Next, we simulated extinction by continuing training for a further 40 trials with the network switched from haloperidol mode to the intact state (i.e., weights of the SNc→NoGo projections at 100%). Finally, the haloperidol mode was simulated for an additional five test trials, to determine whether the model still demonstrates sensitized catalepsy after extinction.
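The three-phase protocol can be written down as a small schedule. The trial counts and projection strengths are those given above; the `Phase` container, the phase labels, and the `trial_plan` helper are hypothetical names introduced for illustration. The interleaved context probes with frozen weights are not represented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    trials: int
    d2_scale: float  # SNc->NoGo projection strength relative to intact


SCHEDULE = (
    Phase("sensitization", 60, 0.1),  # haloperidol mode, context A
    Phase("extinction", 40, 1.0),     # intact mode
    Phase("re-challenge", 5, 0.1),    # haloperidol mode again
)


def trial_plan(schedule=SCHEDULE):
    """Yield (trial_index, phase_name, d2_scale) for every trial in order."""
    index = 0
    for phase in schedule:
        for _ in range(phase.trials):
            yield index, phase.name, phase.d2_scale
            index += 1
```

Iterating over `trial_plan()` gives trial indices 0 to 104, with the switch to the intact state at index 60 and the return to haloperidol mode at index 100, matching the epochs reported for the simulations.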

Novelty model

We further tested whether an implementation of the novelty hypothesis could account for the same findings, and if so, whether the two models make divergent predictions. In this case, it is the novelty of a stimulus, not the RPE associated with it that drives a DA burst. Accordingly, the apparent context dependency of catalepsy sensitization could arise simply because the animal is not familiar with context B; the associated novelty-driven DA burst (Lisman and Grace 2005; Kakade and Dayan 2002) could activate the Go pathway, promote locomotion and exploratory behavior, and thereby lead to reduction in catalepsy. Note that this hypothesis does not require the assumption that any NoGo learning is specific to sensory input units encoding the external context A. Instead, this learning might generalize across contexts, but the context dependency arises due to the novelty of surrounding context B which drives a DA burst that counteracts catalepsy expression.Footnote 5
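The figure captions describe the novelty signal as an exponentially decaying DA burst. A minimal sketch of such a signal follows; the amplitude and decay constant are placeholder values chosen for illustration, not the parameters used in the simulations (those are given in the Supplementary Material), and the function name is hypothetical.

```python
import math


def novelty_burst(trials_in_context, amplitude=1.0, tau=3.0):
    """Size of the novelty-driven DA burst after a number of exposures
    to a new context; largest on first exposure, then decaying."""
    if trials_in_context < 0:
        raise ValueError("trials_in_context must be non-negative")
    return amplitude * math.exp(-trials_in_context / tau)
```

Under this hypothesis a fully habituated context corresponds to a large `trials_in_context`, and hence to a burst close to zero, which is why habituation to context B is the critical manipulation for separating the two models.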

Behavioral experiment

After completion of simulation studies and on a suggestion from an anonymous reviewer, we conducted a simple behavioral experiment to distinguish between the RPE and novelty models. This experiment is similar to the context challenge experiment (Klein and Schmidt 2003), but an additional group of animals was habituated to context B (without haloperidol) prior to the sensitization phase, in order to eliminate the novelty of this context. If the novelty model is correct, we would expect to see continued catalepsy expression (or a smaller reduction in catalepsy) during the context challenge in this group, because the context is no longer novel and there should therefore not be a novelty-driven DA burst. In contrast, if the RPE model is correct, all animals should show reduced catalepsy in context B regardless of whether it is novel, because stimulus-NoGo learning only occurred in context A.

Methods

Subjects

A total of 20 male Sprague–Dawley rats (230–260 g at the beginning of the experiment; Charles River, Sulzfeld, Germany) were used. The animals were group-housed (four animals per cage, in standard Makrolon IV cages) under a 12/12 h light–dark cycle with restricted access to food (12 g per animal per day). Water was available ad libitum.

Substance

The neuroleptic agent haloperidol (Haldol® injection solution, Janssen, Germany) was diluted in saline (0.9% NaCl; Fresenius, Germany) to a concentration of 0.25 mg/ml. The substance was administered subcutaneously (s.c.) at 1 ml/kg body weight (i.e., a dose of 0.25 mg/kg), the same concentration used in Amtage and Schmidt (2003) and Klein and Schmidt (2003).
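The dosing arithmetic can be checked directly: concentration times injection volume per kilogram gives the dose per kilogram. The helper names below are hypothetical, and the 250 g body weight in the usage note is simply a value inside the reported 230–260 g range.

```python
def dose_mg_per_kg(concentration_mg_per_ml, volume_ml_per_kg):
    """Dose delivered per kg body weight."""
    return concentration_mg_per_ml * volume_ml_per_kg


def injection_volume_ml(body_weight_g, volume_ml_per_kg=1.0):
    """Volume to inject for an animal of the given body weight."""
    return body_weight_g / 1000.0 * volume_ml_per_kg
```

For example, `dose_mg_per_kg(0.25, 1.0)` gives 0.25 mg/kg, and a 250 g rat would receive `injection_volume_ml(250)`, i.e. 0.25 ml of the solution.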

Behavioral testing

To test for catalepsy, the animals performed a bar test. Within that test, a single rat was put gently with its forepaws on a horizontal bar (9 cm above the table surface, diameter of 0.5 cm). The descent latency, as a proxy for the degree of catalepsy, was measured by taking the time interval between the first placement of the animal on the apparatus and its first active paw movement. This procedure is identical to the one used in Amtage and Schmidt (2003) and Klein and Schmidt (2003).

To test for context dependency, two contexts (A and B) were used. The contexts differed in the room (with different lighting) and in the experimenter’s attire (in context A, the experimenter wore a white lab coat, while in context B, a black plastic poncho was worn over the lab coat). These context cues are very similar to those used in Klein and Schmidt (2003).

Experimental design

The rats were handled for five consecutive days prior to the first catalepsy test. During the following habituation phase, the animals received a (s.c.) saline injection and were tested 60 min later. The treatment during the habituation phase took place in context A for the first (non-habituated) group (n = 10) and in context B for the second (habituated) group (n = 10).

After the habituation phase, catalepsy sensitization was performed for both groups in context A, for a total of 9 days (after the seventh sensitization day, no testing took place for 2 days). On the first day after sensitization (day 17), both groups were tested in context B as the context challenge. On day 18, both groups were retested in context A.
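The design can be summarized as a lookup from experimental day and group to injection and test context. Day numbers follow those reported in the Results and in the Fig. 7 caption (habituation on days 1–5, no testing on days 13 and 14); the function name `session` and the return format are assumptions made for this sketch.

```python
def session(day, group):
    """Return (injection, test_context) for a test day, or None if no
    catalepsy test is described for that day.

    group: "non-habituated" or "habituated" (labels as in Fig. 7).
    """
    if group not in ("non-habituated", "habituated"):
        raise ValueError("unknown group: " + repr(group))
    if 1 <= day <= 5:                      # habituation phase
        return ("saline", "A" if group == "non-habituated" else "B")
    if 6 <= day <= 12 or day in (15, 16):  # sensitization phase
        return ("haloperidol", "A")
    if day == 17:                          # context challenge
        return ("haloperidol", "B")
    if day == 18:                          # retest in the trained context
        return ("haloperidol", "A")
    return None                            # days 13-14 and undefined days
```

Counting the days for which `session` returns haloperidol in context A before day 17 gives the nine sensitization days stated above, and the two groups differ only during days 1–5.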

Statistics

Statistical analysis was performed using GB STAT 7.0. Multiple values within a group were tested with the non-parametric Friedman ANOVA. Two values within one group were compared using the Wilcoxon signed rank test. To compare individual data between two groups, we used the Mann–Whitney U test.
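For readers unfamiliar with the between-group test, the following sketch shows what the Mann–Whitney U statistic counts. It is not the GB STAT procedure used in the study and computes no p-value; the function name is hypothetical, and any numbers passed to it here are placeholders rather than study data.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x relative to sample y: the number of
    (xi, yj) pairs with xi > yj, counting each tie as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

The two directions always sum to the number of pairs, `mann_whitney_u(x, y) + mann_whitney_u(y, x) == len(x) * len(y)`, so a value near half that product indicates little separation between the groups.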

Results

Reward prediction error model results

Catalepsy sensitization

During the first 60 training trials in the haloperidol mode, a steady increase of catalepsy (i.e., an increase in model latencies to select a response) can be observed in context A (Fig. 4a). As expected, the RTs are strongly correlated with relatively greater NoGo than Go activity across trials (Fig. 4b), allowing closer analysis of the mechanisms by which catalepsy materializes. This activation difference resulted from Hebbian learning processes in which active neurons adjust their weights. Because simulated haloperidol blocked dopamine from inhibiting NoGo units, the activity of these units increased as seen during DA depletion (e.g., Mallet et al. 2006). As a result, the synaptic weights between the sensory input (context A) units and these NoGo units increased, consistent with the potentiation of corticostriatal synapses in striatopallidal neurons following haloperidol administration (Centonze et al. 2004; Håkansson et al. 2006). Thus, the next time context A was presented, it elicited greater NoGo activity, which in turn further increased synaptic strength between context A and NoGo units, such that each trial of stimulus context presentation led to progressively greater NoGo activity. In contrast, control networks actually show a decrease of NoGo–Go activity, corresponding to greater D1-dependent Go learning to descend from the bar together with an inhibitory effect of DA onto NoGo neurons (via intact D2 receptors). Thus, this model provides a plausible explanation for the catalepsy intensification resulting from D2 receptor blockade.

Fig. 4

Simulation results for the implemented reward prediction error hypothesis. a Each data point shows the average RT (number of cycles until a striatal Go response led to disinhibition of the thalamus) across 30 networks. In epochs 0 to 59, the models are trained in the haloperidol mode on context A and tested on context A (solid line) and context B (dotted line). Note the context dependency of catalepsy sensitization (difference between green and black lines). For comparison, intact models are trained on context A and also tested on context A (dashed line) and B (dot-dashed line). In epochs 60 to 100, haloperidol models are switched to intact for extinction training. Note that catalepsy expression continues during extinction relative to the intact mode, but progressively decreases. In epochs 100 to 105, the switched model is switched back to haloperidol mode, to test for the non-extinguishable sensitized component. Models that had been trained in the haloperidol mode show substantially greater catalepsy in the trained context A than other models and contexts. b Summed striatal NoGo–Go activity in the reward prediction error model when presented with context A or B in intact and haloperidol modes. All effects mirror those found using response time analyses

Context dependency

In contrast to sensitization in context A, simulated catalepsy was roughly constant in context B regardless of the number of training trials with haloperidol in context A. We also confirmed that this context dependency arose due to differences in weights between context input units to the striatum (data not shown). Weights from the context A input neurons to the NoGo neurons increased, while those from context B units did not, due to the dependency of Hebbian learning on both pre- and postsynaptic activation (see equation A-8 in the Electronic supplementary material). Because the model was never trained with context B units active, its NoGo weights to the striatum did not increase. Thus, the model replicates the context dependency of haloperidol-induced catalepsy sensitization observed in rodents.

Extinction training

The model also captures extinction (Fig. 4a, b). After we switched the model from simulated haloperidol to the intact mode, cataleptic activity progressively decreased, reaching its starting value by the end of extinction. Again, these effects can be explained by examining the weights from the input units to the striatum. Initially, the network exhibited cataleptic activity in context A (despite being in the intact mode), due to prior NoGo learning. However, because the DA units can now inhibit NoGo units, the Go units were now free to fire more (due to less inhibitory competition from NoGo units). The DA bursts following response selection (corresponding to the offset of the aversive stimulus) also led to Go learning during this time, and thus a reduction in catalepsy.

Sensitized component

Finally, the haloperidol-trained network also exhibited a sensitized component that was resistant to extinction. As shown in Fig. 4, a switch back from intact to haloperidol mode at trial 100 was associated with a prominent rise of cataleptic activity (increased RT and associated NoGo activity) in context A. This sensitized catalepsy was observed despite the previous 40 trials of extinction training, in which catalepsy was reduced back to baseline, and was far greater than that observed in networks which had never undergone haloperidol sensitization. This qualitative pattern of data matches that observed in rats (Amtage and Schmidt 2003).

What are the underlying mechanisms that cause this non-extinguishable component?Footnote 6 Intriguingly, examination of the weights from the input to NoGo units revealed that the weights did not substantially decrease during extinction; that is, there was relatively little unlearning of prior NoGo associations. Instead, the steady decrease of cataleptic activity resulted primarily from an increase in Go weights during extinction. When switching from haloperidol to intact mode, the now intact SNc→NoGo projections inhibited the striatal NoGo neurons, which prevented these neurons from changing their weights due to the Hebbian learning rule. Consequently, the previously learned A→NoGo association was maintained, but was only prevented from being expressed, during extinction. Thus, when the model was ultimately switched back to haloperidol mode, this prior learning was then immediately uncovered. Finally, note that when the model was tested in context B in trial 100, there was not a large increase in cataleptic activity, due to the specificity of learned NoGo weights. Thus, according to this model implementation, we would expect the sensitized component to be context-dependent.
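The logic of a latent, masked NoGo association can be illustrated with a toy model that has one Go and one NoGo weight per context. This is not the BG network used in the study: every parameter value is an arbitrary assumption, and NoGo plasticity is simply switched off in proportion to D2 stimulation, a shortcut standing in for the activity-dependent Hebbian rule of the full model.

```python
INHIB = 0.5   # assumed fraction of NoGo activity removed by full D2 stimulation
BURST = 0.2   # assumed size of the DA burst after response execution
LR = 0.1      # assumed learning rate
HALOPERIDOL, INTACT = 0.1, 1.0  # SNc->NoGo projection strength, as in the text


def new_weights():
    # One Go and one NoGo weight per context; the starting value is arbitrary.
    return {ctx: {"go": 0.2, "nogo": 0.2} for ctx in ("A", "B")}


def contrast(weights, ctx, d2_scale):
    """NoGo minus Go activity, the toy stand-in for catalepsy."""
    w = weights[ctx]
    return w["nogo"] * (1.0 - INHIB * d2_scale) - w["go"]


def train_trial(weights, ctx, d2_scale):
    """One training trial; only the weights of the presented context change."""
    w = weights[ctx]
    nogo_act = w["nogo"] * (1.0 - INHIB * d2_scale)
    go_act = w["go"]
    # NoGo potentiation is gated by the lack of D2 stimulation (assumption).
    w["nogo"] += LR * (1.0 - d2_scale) * nogo_act * (1.0 - w["nogo"])
    # Go potentiation follows the small burst at response execution.
    w["go"] += LR * BURST * go_act * (1.0 - w["go"])


def run(sensitize):
    """60 training trials, 40 intact extinction trials, then haloperidol probes."""
    weights = new_weights()
    log = {}
    for _ in range(60):
        train_trial(weights, "A", HALOPERIDOL if sensitize else INTACT)
    log["trained_A"] = contrast(weights, "A", HALOPERIDOL)
    log["trained_B"] = contrast(weights, "B", HALOPERIDOL)
    log["extinction_start"] = contrast(weights, "A", INTACT)
    log["nogo_before_extinction"] = weights["A"]["nogo"]
    for _ in range(40):
        train_trial(weights, "A", INTACT)
    log["extinction_end"] = contrast(weights, "A", INTACT)
    log["nogo_after_extinction"] = weights["A"]["nogo"]
    log["rechallenge_A"] = contrast(weights, "A", HALOPERIDOL)
    return log
```

Comparing `run(True)` with `run(False)` shows the qualitative pattern discussed above: the contrast in context A falls during extinction because the Go weight grows while the NoGo weight stays where it was, and the haloperidol probe after extinction is higher in the sensitized run than in the never-sensitized one. The context B probe is unaffected because its weights are never updated.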

Novelty hypothesis model results

Full details of the novelty model and the results are presented in the online Supplementary Material. In brief, this model produces a similar rise in cataleptic activity due to the same NoGo learning mechanism and also exhibits context dependency due to the novelty of the untrained context (Fig. 5a, b) as well as the non-extinguishable component (Fig. 6a, b) without requiring the assumption of distinct context representations. However, the novelty model predicts that catalepsy sensitization would not be context-dependent if tested in a familiar context.

Fig. 5

a Response times of the models with the implemented novelty DA signals. In epochs 0 to 59, the models are trained in the haloperidol mode on context A and tested on context A (solid line). In epoch 60, a context change is simulated via an exponentially decaying novelty DA burst (without changing the input representations; see text), leading to context-dependent catalepsy expression. b Striatal NoGo–Go activity in the novelty model

Fig. 6

a Response times of networks with novelty-induced DA bursts, when tested for the sensitization in a novel context. In epochs 0 to 59, networks were trained in the haloperidol mode in context A and tested in context A (solid line), with an exponentially decaying DA burst in the first trial. For comparison, an intact model is also trained and tested (dashed line). In epochs 60 to 100, haloperidol networks are switched to intact for extinction training. In epochs 100 to 105, these networks are switched back to haloperidol mode and subject to a novelty-induced DA burst, to test whether a sensitized component is observed in a novel context. In contrast to the reward prediction error model, this model predicts that the sensitized component is not context-dependent. b NoGo–Go activity of the models with the implemented novelty hypothesis, when tested for the sensitization in a novel context

Behavioral experiment results

The results from the novel experiment designed to adjudicate between the two models are shown in Fig. 7. During the habituation phase (days 1–5), there was no significant increase of catalepsy in either group. During the sensitization phase (days 6–12 and days 15 and 16), a highly significant increase of descent latency (i.e., sensitization) was observed in both groups (p < 0.0001). The 2 days without testing (days 13 and 14) had no significant effect on catalepsy. Compared to the descent latencies on days 16 and 18, both groups showed a significant attenuation of the descent latencies on day 17 (p < 0.05). There were no significant between-group differences in descent latencies on days 16, 17, and 18, or in the decrease of descent latencies from day 16 to day 17. Thus, there was no effect of novelty on the context dependency of catalepsy expression.

Fig. 7

Experiment to test the novelty versus the reward prediction error model. The experimental days are shown on the abscissa, and average of the daily descent latencies on the ordinate, with error bars representing standard error. During the first 5 days (habituation phase), both groups received a saline injection 60 min before the catalepsy test. While the “non-habituated” group was placed in context A, the “habituated” group was placed in context B. From days 6 to 19, both groups were injected with haloperidol (0.25 mg/kg) instead of saline. Both groups were tested in context A during the sensitization phase (days 6 to 16, excluding days 13 and 14 during which no testing occurred). On day 17, the context challenge was performed—catalepsy was assessed in both groups in context B. On day 18, both groups were retested in context A. The development of catalepsy sensitization was highly significant in both groups (p < 0.0001, Friedman ANOVA). Both groups showed a significant decrease of descent latency on day 17 (p < 0.05, Wilcoxon signed rank test). There was no between-groups difference in catalepsy reduction from days 16 to 17 indicating no effect of the habituation, and hence no novelty effect

Discussion

In the present study, we explored possible neural mechanisms of haloperidol-induced catalepsy sensitization, using a computational model of the BG (Frank 2006). The model suggests that this catalepsy sensitization reflects a form of “NoGo” learning to suppress action execution, caused by disinhibition of striatopallidal neurons expressing D2 receptors in the basal ganglia. This notion is supported by studies showing that chronic haloperidol administration promotes synaptic potentiation in corticostriatal projections (Centonze et al. 2004), an effect that appears to be specific to NoGo/indirect pathway neurons (Håkansson et al. 2006). Thus, we posit catalepsy sensitization to result from the same mechanism that leads to relatively enhanced “NoGo” reinforcement learning in non-medicated Parkinson’s patients (Frank et al. 2004; Cools et al. 2006), schizophrenic patients treated with antipsychotics (Waltz et al. 2007), and healthy participants with enhanced striatal D2 receptor genetic function (Frank et al. 2007a).

To capture catalepsy sensitization, we measured the response times (RTs) for the simulated BG networks to select a response, and the associated striatal Go and NoGo activations, as a function of experience. Simulated haloperidol led to NoGo unit disinhibition, Hebbian learning in the corticostriatal pathway, and progressively slowed RTs specific to the stimulus context that had been repeatedly paired with simulated drug administration. Thus, sensitization was context-dependent, as observed experimentally (Klein and Schmidt 2003). This catalepsy was incrementally extinguished when the networks were switched back to the intact mode, due to Go learning associated with the removal of the aversive stimulus, and inhibition of NoGo representations. Critically, after extinction, when networks were again challenged with simulated haloperidol, they exhibited substantially more catalepsy than a model that was never sensitized in the first place, as seen in rats (Amtage and Schmidt 2003). This latter effect was due to the fact that NoGo representations were simply prevented from being expressed, and therefore from being unlearned, during extinction. The subsequent blockade of simulated D2 receptors uncovered this latent NoGo association.
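The logic of this account can be illustrated with a deliberately minimal sketch. It is not the network model of Frank (2006): each context is reduced to one Go and one NoGo weight, simulated haloperidol is reduced to a gain on NoGo activity, and the learning rule (only the more active pathway is potentiated) is a simplifying assumption. All names (`ToyStriatum`, `run_protocol`) and all parameter values are hypothetical and were chosen only to make the qualitative pattern visible.

```python
import math

# All values below are arbitrary illustration parameters, not fitted quantities.
NOGO_GAIN_DRUG = 1.0    # simulated D2 blockade: NoGo units disinhibited
NOGO_GAIN_INTACT = 0.2  # intact state: NoGo activity largely suppressed
LEARNING_RATE = 0.5
W_INIT_GO, W_INIT_NOGO = 0.3, 0.4  # starting weights for every new context


class ToyStriatum:
    """One Go and one NoGo weight per context (a toy, not the full BG model)."""

    def __init__(self):
        self.w_go = {}
        self.w_nogo = {}

    def trial(self, context, drug):
        """Run one test session; return the response latency (arbitrary units)."""
        w_go = self.w_go.setdefault(context, W_INIT_GO)
        w_nogo = self.w_nogo.setdefault(context, W_INIT_NOGO)
        gain = NOGO_GAIN_DRUG if drug else NOGO_GAIN_INTACT
        go, nogo = w_go, gain * w_nogo
        # Latency grows with the NoGo-minus-Go activity difference.
        latency = math.exp(3.0 * (nogo - go))
        # Competitive Hebbian step: only the more active pathway is potentiated,
        # with soft saturation at a maximum weight of 1.0.
        if nogo > go:
            self.w_nogo[context] = w_nogo + LEARNING_RATE * (nogo - go) * (1.0 - w_nogo)
        elif go > nogo:
            self.w_go[context] = w_go + LEARNING_RATE * (go - nogo) * (1.0 - w_go)
        return latency


def run_protocol():
    """Sensitization in A, challenge in B, extinction in A, then re-challenge."""
    net = ToyStriatum()
    sensitization = [net.trial("A", drug=True) for _ in range(10)]
    context_b = net.trial("B", drug=True)
    nogo_before = net.w_nogo["A"]
    extinction = [net.trial("A", drug=False) for _ in range(5)]
    nogo_after = net.w_nogo["A"]
    rechallenge = net.trial("A", drug=True)
    naive = ToyStriatum().trial("A", drug=True)  # never-sensitized comparison
    return {
        "sensitization": sensitization,
        "context_b": context_b,
        "extinction": extinction,
        "nogo_before": nogo_before,
        "nogo_after": nogo_after,
        "rechallenge": rechallenge,
        "naive": naive,
    }


if __name__ == "__main__":
    for name, value in run_protocol().items():
        print(name, value)
```

In this sketch, latencies rise across drug sessions in context A and are shorter in the untrained context B. Extinction shortens latencies through Go learning while leaving the NoGo weight for context A untouched. A renewed drug challenge therefore produces a longer latency than in a network that was never sensitized. With these particular parameter values the extinction phase is kept short; the sketch makes no claim about how the full model behaves under prolonged extinction.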

Based on this finding, our model predicts that it may be possible to prevent the development of a non-extinguishable component by blocking Go learning via D1 receptor blockade during extinction. (To prevent the D1 antagonist from inducing catalepsy itself, it could be administered after the extinction session, which should prevent consolidation of Go learning; e.g., Dalley et al. 2005.) In this case, we hypothesize that extinction will occur via unlearning of NoGo representations rather than new Go learning, such that the sensitized component will be entirely (or mostly) extinguishable even when re-challenged with haloperidol. If confirmed, such a result might hold practical importance for understanding and treating Parkinson’s symptoms. Levodopa, the main medication used to improve motor symptoms, induces immediate early gene expression associated with synaptic plasticity in striatonigral (Go), but not striatopallidal (NoGo), neurons (Carta et al. 2005; Knapska and Kaczmarek 2004). As such, the exaggerated potentiation of NoGo synapses in the DA-depleted state (Surmeier et al. 2007; Shen et al. 2008) may remain latent and may be uncovered once levodopa wears off, leading to the return of motor symptoms characterized by classical on/off states (e.g., Chen and Obering 2005).

Sensitization is not unique to aversive conditioning. Indeed, this same sensitization process is observed in response to amphetamine and drugs of abuse, where the strength of sensitization predicts relapse (e.g., Robinson and Berridge 2003; Schmidt and Beninger 2006). Furthermore, this sensitization is associated with an increase in striatal synaptic spine density (Li et al. 2004). Our models suggest a similar mechanism for reward-based sensitization, in that phasic DA reinforces contextual cues, but in this case involving postsynaptic D1-mediated Go learning in striatonigral neurons rather than NoGo sensitization in striatopallidal neurons. If our interpretation holds, it may also explain the high rates of relapse following rehabilitation: striatonigral Go neurons may never really unlearn the rewarding associations, which may only be prevented from being expressed during drug-free conditions. Overall, the above explanation is consonant with other evidence that extinction reflects new learning, rather than unlearning of original associations (Pavlov 1927; Bouton 2004; Redish et al. 2007).

It should further be mentioned that catalepsy is not uniquely induced by D2 antagonism. Several reports show that the selective D1 receptor antagonist SCH 23390 induces catalepsy as well (e.g., Morelli and Di Chiara 1985; Undie and Friedman 1988). In a preliminary study, we tested our model in response to simulated D1 receptor blockade and observed an increase in NoGo–Go activity and RTs, much like our haloperidol models. This result is not surprising because blocking the excitatory effect of dopaminergic projections onto striatal Go units leads to a reduction in Go activity, and hence an increase in catalepsy. Furthermore, simulated DA depletion as in Parkinson’s disease (Frank 2005, 2006) led to similar observations of catalepsy sensitization, thus raising the question of whether aspects of catalepsy in PD patients are partially learned via synaptic potentiation.

Prediction error versus novelty models of DA functioning: novel predictions

We also tested the implications of two distinct hypotheses of DA functioning. We showed that both the reward prediction error (RPE) and novelty hypotheses of phasic DA signals provide reasonable explanations for the observed behavior, but require different assumptions and make different predictions. The RPE model assumes that NoGo neurons learn specific associations to context A, which do not generalize to context B. In contrast, the novelty model need not assume separate NoGo representations of contexts A and B but instead assumes that context B elicits a novelty-related DA burst that promotes Go signals and thereby overcomes the catalepsy that would be produced by NoGo activity. This idea is consistent with evidence showing that phasic DA bursts in response to a conditioned stimulus are associated with speeded RTs in that trial (Satoh et al. 2003).

Our behavioral experiment discriminates between these accounts and falsifies the novelty hypothesis: controlled manipulation of context B novelty had no effect on the context dependency of catalepsy expression. This result is thus consistent with the prediction generated by our reward prediction error model, which posits that NoGo learning occurred in striatopallidal neurons linking the sensory context (A) with a NoGo response.
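The contrast between the two accounts can be stated as a small worked example. The following sketch is only a caricature of the two hypotheses, not the simulations reported here: the function names, the exponential habituation of the novelty burst, and all numerical values are assumptions made for illustration. The quantity computed is the predicted reduction in catalepsy when a sensitized animal is moved from context A to context B, as a function of how often it has previously been exposed to context B under saline.

```python
import math

SENSITIZED_NOGO = 1.0  # NoGo drive acquired in context A (arbitrary units)
NOVELTY_DECAY = 0.5    # hypothetical habituation rate per prior exposure to B


def rpe_drop(prior_exposures_to_b):
    """Predicted catalepsy reduction in B under the RPE account.

    NoGo weights are context-specific, and saline exposure is assumed to train
    no NoGo weights in B, so the argument does not enter the prediction.
    """
    nogo_in_b = 0.0
    return SENSITIZED_NOGO - nogo_in_b


def novelty_drop(prior_exposures_to_b):
    """Predicted catalepsy reduction in B under the novelty account.

    NoGo drive is assumed to generalize across contexts; the reduction comes
    only from a novelty-related DA burst that habituates with exposure.
    """
    return SENSITIZED_NOGO * math.exp(-NOVELTY_DECAY * prior_exposures_to_b)


if __name__ == "__main__":
    for label, exposures in (("non-habituated", 0), ("habituated", 5)):
        print(label, rpe_drop(exposures), novelty_drop(exposures))
```

Under these assumptions, the RPE account predicts the same reduction in both groups, whereas the novelty account predicts a much smaller reduction in the habituated group. The absence of a between-groups difference in the experiment corresponds to the first pattern.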

Limitations

Despite our model’s success in accounting for different aspects of haloperidol-induced catalepsy sensitization, extinction, and renewal within an existing framework, the model has several neurobiological limitations that need to be addressed in future work.

First, we focus on haloperidol effects on the D2 receptor (to which it binds most strongly) in the striatum, which contains by far the greatest number of D2 receptors (Camps et al. 1989) and has been implicated in PD. However, D2, D3, and D4 receptors are also likely to be blocked in the frontal cortex, olfactory bulb, amygdala, and hippocampus. Given limited data, it is not clear whether these effects play a crucial role in synaptic plasticity changes induced by haloperidol, nor whether these structures are involved in catalepsy expression. Haloperidol effects on synaptic plasticity in the striatal D2 pathway, on the other hand, are well studied and suffice to provide an explanation for the observed phenomena and to derive novel testable predictions.

Another effect not explicitly modeled is that haloperidol can also elevate striatal DA levels via concomitant blockade of presynaptic D2 autoreceptors (Wu et al. 2002; Garris et al. 2003; Frank and O’Reilly 2006). This increased DA would then stimulate D1 receptors, and could therefore actually enhance Go signals. Indeed, these presynaptic effects have been implicated in the delay to catalepsy onset (Garris et al. 2003). In humans, a single low dose of haloperidol can actually enhance Go learning, presumably via preferential presynaptic mechanisms (Frank and O’Reilly 2006). Nevertheless, with higher doses and chronic administration, the postsynaptic effect dominates (likely due to the greater excitability of NoGo than Go cells; Lei et al. 2004; Kreitzer and Malenka 2007), leading to overall more NoGo activation (and learning). Thus, inclusion of autoreceptor effects would only delay the inevitable occurrence of catalepsy.

D2 receptors can also act via presynaptic heteroreceptors to regulate cortical glutamatergic input to striatum. Thus, blockade of these receptors would lead to stronger cortical input. Because cortical input is stronger onto NoGo than Go neurons (reviewed above), this effect would likely add to that resulting from postsynaptic D2 blockade. Nevertheless, explicit modeling of this mechanism may shed more light on its potential relevance.

Conclusion

In sum, we provided a neurocomputational account for a constellation of findings in the domain of haloperidol-induced catalepsy sensitization. The model used to generate the findings is the same one that has accounted for differential patterns of learning in humans on and off DA medications. The current findings extend the generality of the model to observations in a completely different experimental procedure, setting, and species. The behavioral experiment suggests that the reward prediction error model is more suitable than the novelty model to explain the observed phenomena.