Background

Reinforcement-mediated associative learning, the ability to learn the relationships between environmental stimuli and the probability of receiving a primary reinforcer (e.g. food), has been divided into three components or stages [1]. The first, unlearned stage is the organism's experience with a primary reinforcer that elicits an innate physiological response (e.g. the palatability of a food, or satiety). The second stage involves the attachment of salience to external (discriminative) stimuli that signal whether a primary reinforcer will or will not be delivered following an emitted behavior (e.g. a rodent presses a lever for food pellets only when a stimulus light is illuminated in an operant chamber). The third stage is the maintenance of these associative relationships over time (e.g. the stimulus light always signals that a lever press will produce a food pellet) [see refs. [1, 2] for review]. Berridge and Robinson [1] have proposed that dopamine plays a pivotal role in the second stage of reinforcement-mediated associative learning by molding the "incentive salience," or learned motivational properties, of conditioned stimuli. In support of this view, several studies have demonstrated that disrupting dopamine signaling, either by acute blockade with dopamine receptor antagonists [3–9] or by 6-OHDA lesions [1, 10, 11], does not disrupt the first stage (the palatability or primary reinforcing properties of a natural reinforcer) but does disrupt the second stage, the formation of incentive salience.

Dopamine signaling is mediated at the cellular level by two major subclasses of G-protein coupled receptors, referred to as D1-like (D1aR, D1b/5R) and D2-like (D2R, D3R, D4R) [12, 13]. Although each dopamine receptor subtype is encoded by a separate gene, there is considerable pharmacological overlap between D1-like and D2-like receptors. Several studies have used pharmacological manipulations to investigate the contribution of dopamine receptor subtypes to various stages of associative learning [1, 2, 14]. It is currently held that excitatory dopamine D1Rs mediate the acquisition and expression of several conditioned behaviors involving food reinforcers [e.g. [6, 7]], as well as responding for conditioned cues predictive of cocaine delivery [9], consistent with the interpretation that D1R-mediated signaling might modulate the maintenance of stimulus-response outcomes. Despite a growing literature demonstrating disrupted associative learning following acute manipulations of D1R signaling during acquisition of food-reinforced associative learning tasks [e.g. [6, 7]], the contribution of D2R-mediated signaling to the acquisition and maintenance of associative learning in rodents remains unresolved.

Although D2R antagonists have been used in rodent associative learning paradigms, their lack of receptor subtype specificity [13] and their motor-disrupting effects [6, 8] have prevented a rigorous examination of the role this receptor subtype plays in associative and reversal learning of an operant behavior. Past experiments exploring the possible involvement of D2Rs in associative learning used low to moderate drug doses [2, 5, 6] and site-specific administration [7, 9] to avoid the confounding effect that locomotor disruption associated with acute D2R blockade can have on learning performance in rodents. Besides lacking receptor subtype specificity, acutely administered, commercially available D2R antagonists do not completely block D2R-mediated signaling. Moreover, acute dosing does not recapitulate the marked learning deficits produced in rodents [15, 16] by chronic exposure to D2R antagonists [6, 7].

The influence of D2R-mediated signaling on reversal learning is also poorly understood. Reversal learning can be conceptualized as the ability to recognize an unexpected consequence of a previously established associative rule and then to alter response strategies accordingly [17, 18]. In rodents, reversal learning can serve as an index of learning adaptability and alertness that correlates with human attentional-shift paradigms [17, 18]. Excitotoxic lesions of terminal field targets of mesocortical dopamine have also been shown to disrupt reversal learning [19]. The D2R/D3R antagonist sulpiride impairs spatial reversal learning in mice [20] and has been shown to impair attention and cognitive performance in healthy human volunteers [21]. Moreover, an inverse relationship between dopamine D2R levels and compulsive behavior has been reported in human subjects [22]. Taken together, these findings suggest, but do not establish, that D2R-mediated signaling plays a critical role in reversal learning.

To establish the contribution of D2R-mediated signaling to associative and non-spatial reversal learning in adult mice, we compared mice [23] that had developed in the complete absence of functional dopamine D2 receptors (D2R-/-) with wild-type littermates (D2R+/+) in a go/no-go operant learning procedure that measures primary reinforcement, sensory processing, and reversal learning [17–19]. Here we report that intact D2R-mediated signaling is required for a mouse to learn efficiently to associate a food reinforcer with a specific odor and then to disengage inappropriate behavioral strategies adaptively after the reinforcement contingencies are reversed.

Results

Both wild-type and D2R-deficient mice learn operant behaviors equally well

Mildly food-deprived D2R+/+ and D2R-/- mice readily learned to locate and consume food pellets during both training sessions. Figure 1 depicts the latency for each genotype to ambulate to a dish and retrieve an unhidden food pellet. The genotypes did not differ in latency to retrieve the reinforcers (F(1,59) = 0.2, p > 0.65), and both groups significantly decreased the time needed to perform this task over successive trials (F(4,59) = 5.74, p < 0.001). Figure 2 depicts the latency to dig for a food pellet buried in a single dish of unscented sand. No genotype differences were found in the ability to dig through unscented sand for a hidden food pellet (F(1,128) = 1.26, p > 0.27).

Figure 1

Mice lacking dopamine D2 receptors acquire a goal-directed behavior similar to wild-type mice. Both genotypes readily learned to retrieve food pellets from a small dish and significantly decreased their latencies to perform this task across trials (* p < 0.001).

Figure 2

Wild-type and D2R-deficient mice perform the digging task equally well. Latencies to retrieve the reinforcer did not differ as a function of genotype. These response patterns suggest that training on the previous day influenced behavior, indicating that both genotypes are capable of learning and retaining goal-directed behaviors.

Wild-type mice outperform D2R-/- mice in an odor-driven, stimulus-discrimination, operant task

To master the odor-driven stimulus discrimination task, D2R-/- mice required significantly more trials (Fig. 3A: t(14) = 2.20; p < 0.05) and committed more errors (Fig. 3B: t(14) = 2.92; p < 0.05) than D2R+/+ mice in learning to associate a specific odor with the presence or absence of the food reinforcer. Nevertheless, both genotypes learned the task and eventually reached the criterion of at least 8 correct responses in 10 consecutive trials.

Figure 3

Dopamine D2 receptor-mediated signaling contributes to the acquisition of odor discrimination/associative learning. Wild-type (D2R+/+) mice outperformed D2R-/- mice during acquisition of the odor discrimination: they (A) required fewer trials to reach criterion (*p < 0.05) and (B) committed significantly fewer discrimination errors (*p < 0.01). Values are mean + standard error.

Mice lacking dopamine D2 receptors engage in unreinforced behavior in a perseverative manner following reversal of reinforcement contingencies

D2R-/- mice repeatedly failed to inhibit responding according to previously established contingencies during reversal trials, requiring more trials than D2R+/+ mice to reach the reversal criterion (Fig. 4A: t(14) = 3.54; p < 0.01) and committing significantly more reversal errors (Fig. 4B: t(14) = 3.18; p < 0.01). Reversal errors were divided into two categories (Ferry et al., 2000): digging in the dish that did not contain the food pellet (S-; errors of commission, Fig. 5A) and failing to respond within 3 min of presentation (errors of omission, Fig. 5B). Both genotypes committed chiefly errors of commission rather than errors of omission (D2R-/- mice: U = 0.00, p < 0.01; D2R+/+ mice: U = 9.00, p < 0.05). D2R-/- mice committed more commission errors than D2R+/+ mice (U = 5.00, p < 0.05), whereas the genotypes did not differ in omission errors (U = 27.5, p = 0.65). Moreover, the number of commission errors committed during the first reversal session (an index of stimulus-bound perseveration, because all animals are then responding to the reversed contingencies for the first time; Fig. 6) revealed a deficit in D2R-/- mice relative to D2R+/+ mice (U = 8.5, p < 0.01).

Finally, to assess whether perseveration persisted across multiple reversal sessions (Fig. 7), we analyzed the number of commission errors emitted before relearning the reinforcement contingencies in each reversal session. Only subjects that had not yet achieved the 80% performance criterion contribute to each session. Therefore, 8 D2R+/+ mice are represented in sessions 1 and 2, 6 in session 3 (2 mice had met the criterion), and 5 in session 4 (3 mice had met the criterion). The remaining 5 D2R+/+ mice had all met the criterion following session 5. For the D2R-/- mice, each point on the graph represents 8 subjects; none of the D2R-/- mice achieved the 80% accuracy criterion by session 5. The overall two-way ANOVA revealed a significant difference between genotypes in the number of perseverative errors (F(1,51) = 16.61, p < 0.001).
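The commission/omission split above amounts to a simple per-trial rule. The sketch below encodes it under stated assumptions: each reversal trial is recorded as the dish the mouse dug in ("S+", "S-", or None) plus the latency to the first dig, and the 3-min response window is taken as 180 s. The record format and the names `Outcome` and `classify_trial` are hypothetical, not the authors' scoring procedure, and the example trials are invented for illustration.

```python
from collections import Counter
from enum import Enum
from typing import Optional

# Assumed response window: 3 min (180 s); a longer latency is treated as no response.
OMISSION_LIMIT_S = 180.0


class Outcome(Enum):
    CORRECT = "correct"        # dug in S+ and retrieved the pellet
    COMMISSION = "commission"  # dug in S- (failed to withhold responding)
    OMISSION = "omission"      # did not dig in either dish within the window


def classify_trial(dish_dug: Optional[str], latency_s: Optional[float]) -> Outcome:
    """Classify one reversal trial (hypothetical record format).

    dish_dug: "S+", "S-", or None if the mouse never dug.
    latency_s: seconds from presentation to the first dig, or None if no dig.
    """
    if dish_dug is None or latency_s is None or latency_s > OMISSION_LIMIT_S:
        return Outcome.OMISSION
    if dish_dug == "S-":
        return Outcome.COMMISSION
    if dish_dug == "S+":
        return Outcome.CORRECT
    raise ValueError(f"unexpected dish label: {dish_dug!r}")


# Invented example session: two commission errors, one omission, one correct trial.
session = [("S-", 12.0), ("S-", 8.5), (None, None), ("S+", 20.0)]
counts = Counter(classify_trial(dish, latency) for dish, latency in session)
print({outcome.value: n for outcome, n in counts.items()})
```

Per-mouse totals of the commission and omission categories are the quantities that were compared within and between genotypes.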

Figure 4

D2R-/- mice perseverate in unreinforced behavior during reversal learning trials. (A) Number of trials required to reach the reversal criterion (*p < 0.01) and (B) number of reversal errors committed (*p < 0.01) by D2R-/- and D2R+/+ mice (mean + standard error).

Figure 5

Reversal learning measures reveal that D2R-/- mice emit significantly more errors of commission, but not errors of omission, than D2R+/+ mice. The response patterns (mean + standard error) of D2R-/- mice are more suggestive of stimulus-bound perseveration than of extinction. D2R-/- mice committed significantly more errors of commission (A) than D2R+/+ mice (*p < 0.01), but errors of omission (B) did not differ between genotypes (p > 0.6).

Figure 6

Mice lacking functional D2Rs display an inability to withhold inappropriate responses. Response patterns during the first 10 reversal trials reveal deficits in mice lacking D2Rs. When the mice encountered the inverted reinforcement contingencies in the first reversal learning session, D2R-/- mice showed an almost complete inability to inhibit responding to the previously reinforced stimulus compared with D2R+/+ mice (* p < 0.001).

Figure 7

D2R-/- mice commit significantly more perseverative errors across reversal sessions than wild-type mice. An analysis of the time course of errors of commission across the first four reversal sessions revealed that D2R-/- mice committed more errors of commission than D2R+/+ mice (*p < 0.001).

Discussion

In this study, we sought to determine the contribution of dopamine D2R-mediated signaling to the various stages of associative and reversal learning. D2R-/- mice were capable of learning to locate and consume food pellets, indicating that their locomotor behavior was not detectably disrupted and that their primary motivation to obtain a natural reinforcer (a food pellet) was intact. In contrast, the impaired ability of D2R-/- mice to assign appropriate discriminative stimulus relationships in an operant discrimination task argues that D2R-mediated signaling contributes to the neuronal processes that attach salience to environmental stimuli. The deficient capacity of D2R-/- mice to disengage inappropriate decision strategies strongly argues that mesolimbic dopamine signaling mediated by D2Rs is essential for efficient reversal learning.

Mice with or without intact D2R-mediated signaling displayed similar decreases in latency to retrieve unhidden food pellets (Fig. 1) and learned to dig for food buried in unscented sand (Fig. 2). These findings agree with earlier studies in rats that used fairly selective D2R antagonists [3–8] or even very extensive 6-OHDA lesions [1, 10, 11] in associative and operant learning paradigms. Our results add to a growing literature demonstrating a negligible role for dopamine, and now specifically for D2Rs, in the unconditioned or hedonic value of natural (food) reinforcers [1, 2].

The comparatively poor performance of D2R-/- mice during discrimination trials suggests a role for D2Rs in the acquisition of appropriate S+/S- relationships in operant associative learning. Several studies have demonstrated that both the acquisition [6, 7] and the expression [9] of associative learning are mediated by dopamine D1Rs. Most literature reviews associate D1Rs with dopamine-mediated learning and D2Rs with motor-related behaviors [e.g. [24]]. Moreover, acute administration of the D2/3 antagonist raclopride has been reported to actually improve acquisition of food-motivated associative learning [6]. However, the antagonists in that study were administered only acutely, and learning was not measured during complete D1R or D2R blockade [6]. Importantly, the reports cited above did not address the technical limitations of this approach: the antagonists used lack adequate subtype specificity and only partially block D2R-mediated signaling, making it impossible to rigorously assess the role of D2R-mediated signaling in associative and reversal learning. Additionally, none of these studies addressed the observation that the effects of D2R antagonists on locomotion and learning depend on whether exposure is chronic or acute [e.g. [15, 16]].

The conclusion that D2R-mediated signaling orchestrates only the motor components of learning is potentially biased by two common practices: avoiding drug doses that induce catalepsy (so that only partial blockade of D2Rs is ever measured) and using single administrations [e.g. [3]]. Quite possibly, the functional role of D2Rs in associative learning is masked by this concern about locomotor disruption [2, 6, 8]. Doses of raclopride as low as 0.5 mg/kg significantly disrupt motor behavior [25], yet this peripheral dose is routinely used in learning paradigms [e.g. [6]]. How, then, could the role of D2Rs in associative learning be dissociated from motor behavior? Seminal experiments [26–28] have clearly demonstrated that repeated administration of catalepsy-inducing doses of D2R antagonists in rodents leads to striking behavioral tolerance to catalepsy, and such doses have been shown to occupy well over 80% of available D2Rs [29]. Measuring the acquisition of associative learning in rodents that have received chronic D2R antagonist treatment and developed tolerance to its motor-disrupting effects would therefore be a logical test of this hypothesis. However, such experiments would still have to contend with the fact that currently available antagonists concurrently target multiple dopamine receptor subtypes, such as D2Rs, D3Rs, and D4Rs. We would argue that the most parsimonious approach at present is to use mice that have been genetically altered to lack one or both functional alleles of the specific receptor of interest. Such mice have limitations (e.g. developmental compensation, and strain effects in lines not adequately backcrossed to a parental strain for at least 10 generations). Nevertheless, the use of our congenic (N20) animals in the present study, in which no differences in locomotor behavior were detected between D2R-/- and D2R+/+ mice (Figs. 1 and 2), revealed a previously unappreciated role of D2R-mediated signaling in associative learning and attention that could not be measured with the currently available, acutely administered D2R antagonists.

A cursory analysis of our data might suggest that genetic manipulation of the drd2 locus conferred a gross olfactory impairment on the D2R-/- mice. We argue that this is unlikely, because the D2R-/- mice did learn to retrieve food pellets from the dishes and eventually learned to discriminate the odors accurately. Recent electrophysiological data demonstrate that D2Rs are located on the glutamatergic axon terminals of olfactory nerves and depress excitatory input from the olfactory nerve to the mitral cells of the olfactory bulb [30, 31]. Consequently, our data, together with data from other studies, suggest that the complete lack of D2Rs in the olfactory bulb does not prevent transduction of olfactory stimuli; rather, it affects the ability to habituate, or tune, olfactory nerve activity associated with repeatedly encountered concentrations of chemical stimuli [32].

The performance of D2R-/- mice during reversal learning is deficient, revealing a role for dopamine D2R-mediated signaling in tasks requiring behavioral flexibility

Perseverative behavioral patterns distinguished D2R-/- mice from D2R+/+ mice during reversal learning sessions (Figure 4); they were evident in early reversal trials (Figure 6) and persisted across several sessions (Figure 7). These findings are significant because D2R-/- mice respond to food reinforcers and ultimately form and maintain odor-driven S+/S- relationships (a putative D1R-mediated behavior [6]) just as D2R+/+ mice do: both groups achieved ≥ 80% discrimination accuracy, and D1Rs were not targeted in our mice. A future extension of this study would be to test performance over a fixed number of trials and determine error-rate patterns during acquisition of the discrimination and reversal learning tasks. Controlling the number of trials in this way would determine whether the apparent difference in the absolute number of errors committed by the mutant mice (the putative performance deficit) simply reflects their additional trials. Nonetheless, the inability of D2R-/- mice to disengage from previously established S+/S- contingencies (responding to a discriminative stimulus; Fig. 4) strongly argues that mesolimbic dopamine, and in particular D2R-mediated signaling, modulates the process of alerting the subject that familiar contingencies are now associated with unexpected consequences.
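The proposed fixed-trial analysis can be illustrated with a short sketch. Assuming each mouse's trials are stored as a list of booleans (True for a correct response), the hypothetical helpers below count errors within the first n trials and compute errors per trial. The two example mice are invented, not data from this study: mouse B commits three times as many errors as mouse A in total simply because it ran three times as many trials, yet both have the same error rate and the same number of errors in the first 12 trials.

```python
from typing import Sequence


def errors_in_first_n(outcomes: Sequence[bool], n: int) -> int:
    """Number of incorrect responses among the first n trials."""
    if len(outcomes) < n:
        raise ValueError("fewer than n trials were recorded")
    return sum(1 for correct in outcomes[:n] if not correct)


def error_rate(outcomes: Sequence[bool]) -> float:
    """Incorrect responses per trial over all recorded trials."""
    return sum(1 for correct in outcomes if not correct) / len(outcomes)


# Invented records: A has 2 errors in 12 trials; B has 6 errors in 36 trials.
mouse_a = [False] * 2 + [True] * 10
mouse_b = ([False] + [True] * 5) * 6

for name, record in (("A", mouse_a), ("B", mouse_b)):
    total_errors = sum(1 for correct in record if not correct)
    print(name, total_errors, round(error_rate(record), 3), errors_in_first_n(record, 12))
```

Comparing genotypes on a fixed-window count or on a per-trial rate, rather than on absolute totals, is one way to separate a genuine performance deficit from the effect of running extra trials.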

Schultz and colleagues have demonstrated that dopamine cells display consistent tonic firing patterns during the maintenance of associative learning tasks [33]. However, phasic burst activity of dopaminergic cells occurs when discrepancies arise between predicted and actual reinforcement [14]. This robust increase in dopaminergic cell activation in response to unpredicted outcomes has been referred to as an "error" signal [14]. To date, the molecular basis of this error signal has not been identified. The inability of D2R-/- mice to stop responding to a previously reinforced stimulus suggests that the D2R might in fact be the focal point of this error-signaling cascade.

Electrophysiological data further support the hypothesis that D2R-mediated signaling tunes D1R-mediated mesocorticolimbic output. Calabresi et al. [34] demonstrated that tetanic stimulation of dorsal striatal slices prepared from D2R-/- mice is associated with enhanced EPSPs and, as a result, increased striatal synaptic efficacy. In contrast, stimulation of dorsal striatal slices from wild-type mice resulted in IPSP activity, long-term depression, and decreased neuronal activity of striatal efferents [34]. Carlsson and colleagues [35] have speculated that D2R stimulation in the striatum serves to "brake," or diminish, excitatory corticostriatal signaling and plasticity. Indeed, perseverative behavior is associated with overactivity of the dorsal striatum in rodents [36] and of the caudate in patients with ADHD [37], and a strong inverse correlation between D2R binding and compulsive behavior has been reported [22]. Importantly, our data indicate that the poor performance of D2R-/- mice reflects reversal learning deficits and not gross motivational or sensory impairments. We therefore argue that D2Rs participate in signaling, or alerting the organism to, changes in learning contingencies during reversal learning and thereby sculpt ongoing goal-directed behavior.

Conclusions

This study demonstrates that signaling by dopamine D2Rs does not mediate the hedonic value of reinforcers, supporting and refining earlier findings [1, 2]. However, mice completely deficient in D2Rs showed disrupted acquisition of discriminative stimulus relationships and an impaired ability to recognize changes in behavioral determinants. We therefore suggest that during associative learning, and when reinforcement outcomes are unexpectedly changed (reversal learning), dopamine-driven reinforcement impulses involve signaling mediated by D2Rs. In the course of associative learning, striatal synapses [38] receive experience-driven gain [7], permitting newly established reinforcement-mediated signals to traverse ascending efferents en route to the frontal cortex and completing a functional limbic/motor circuit [39–41]. Possibly, the lack of D2R-mediated signaling prevents refinement of the corticostriatal reinforcement circuits in the brains of D2R-/- mice, thereby impairing their ability to form S+/S- contingencies and to disengage inappropriate D1R-mediated associative responding when goal-directed behaviors have unexpected consequences. We therefore conclude that the perseverative performance deficits of D2R-/- mice in the tasks used in the present study reveal a previously unreported role for D2Rs in associative and reversal learning.

Methods

Subjects

The sixteen mice used in this study (8 per genotype; 8–10 weeks old; 25–30 g) were the congenic offspring of breeders descended from the original F2 hybrid (129/Sv × C57BL/6J; Kelly et al., 1997) that was backcrossed to inbred C57BL/6J stock (Jackson Laboratories, ME). This breeding strategy was repeated for 20 successive generations, with the sex of the donor/mutant parent alternating with each generation. All experimental subjects were genotyped by PCR as previously described [23]. All testing occurred randomly across estrous cycles. The Department of Comparative Medicine at Oregon Health and Science University approved all protocols, and all animals were maintained in accordance with the National Institutes of Health's Guide for the Care and Use of Laboratory Animals.

Experimental procedures

Odor discrimination training and testing

A two-odor discrimination paradigm was chosen because it permits a measure of attentional focus in goal-directed behavior, and odor discrimination in particular because it is a highly robust behavior in rodents [18]. To minimize experimenter bias, the experimenter was blind to genotype during testing. Mice were mildly food deprived to approximately 85% of their ad libitum body weight. D2R-/- and D2R+/+ mice (n = 8 per group) were initially trained to ambulate the length of a polyvinyl mouse cage (30 cm) to retrieve ~30 mg of Pico® fat-supplemented rodent food from a small plastic cup (3 cm) attached to the floor of the testing apparatus with Velcro. Each subject received five trials, and the latency to locate and begin consuming the food pellet was recorded manually with a hand-held stopwatch. Twenty-four hours later, subjects were trained to ambulate towards and dig through the same 3-cm dish filled with sterile sand to locate a food pellet buried underneath the sand. Subjects received 10 trials. If a mouse failed to locate the food pellet within 10 min of the first trial, an additional food pellet was dropped on top of the dish. This manipulation usually ensured that a subject would dig through the sand, locate the original pellet, and learn the task.

The next day, mice were placed in the testing apparatus, which now contained two dishes, each filled with differently scented sand. One odor signaled that a food reinforcer was buried in that dish (S+), while the other odor signaled that the dish contained no reinforcer (S-). The sand was scented by adding 0.80 g of either cinnamon or dill weed to 98.8 g of autoclaved playground sand purchased from a local farm and garden store. Finally, 0.40 g of finely ground food was mixed into the scented sand to mask any odor emitted by the food reinforcer in the S+ dish. This final mixture did not appear to be palatable to the subjects. A trial terminated when the mouse either dug in the correct dish and consumed the hidden pellet or began to dig in the incorrect dish. Digging was defined as the physical displacement of sand with the forepaws and/or snout. Placing the forelimbs on either dish and sniffing was not considered digging and was neither penalized nor scored. Trials were separated by a 20-s inter-trial interval (ITI). Preliminary experiments demonstrated that an extended time-out period following an incorrect response was not required to ensure goal-directed behavior in wild-type or mutant mice in this task (data not shown); therefore, the 20-s ITI was used to separate all trials, successful and unsuccessful alike. The S+ and S- scents were counterbalanced across subjects to control for potential scent bias (pilot data suggest there is no bias; data not shown). Reversal trials began once accurate discrimination, defined as 8 correct retrievals in 10 consecutive trials, had been achieved.
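As an arithmetic check on the recipe above, the three ingredients sum to 100.0 g, so the spice makes up 0.8% and the masking food powder 0.4% of the final mixture by weight. A minimal sketch (the function name `mixture_percentages` is hypothetical, for illustration only):

```python
def mixture_percentages(spice_g: float, sand_g: float, food_g: float) -> dict:
    """Return total mass (g) and each ingredient's share of the final mixture (% w/w)."""
    total = spice_g + sand_g + food_g
    return {
        "total_g": total,
        "spice_pct": 100 * spice_g / total,
        "sand_pct": 100 * sand_g / total,
        "food_pct": 100 * food_g / total,
    }


# Quantities from the Methods: 0.80 g cinnamon or dill weed, 98.8 g sand, 0.40 g ground food.
recipe = mixture_percentages(spice_g=0.80, sand_g=98.8, food_g=0.40)
for name, value in recipe.items():
    print(f"{name}: {value:.2f}")
```

Because the batch totals 100 g, the gram amounts double as weight percentages, which makes the recipe easy to scale.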

Reversal learning measurement

We next investigated the contribution of D2Rs to reversal learning by inverting the reinforcement contingencies, so that the former S+ now signaled no reinforcer and the former S- now signaled reinforcer availability. This manipulation permits an evaluation of the ability to inhibit responses to previously reinforced discriminative stimuli [18]. Successful reversal learning was defined as 8 out of 10 accurate responses, and the number of trials necessary to reach this criterion was recorded.
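The 8-of-10 criterion can be scored from a trial-by-trial record with a sliding window. The sketch below assumes a window of 10 consecutive trials (as stated for the discrimination phase), counts the criterion trial itself toward the trial total, and counts errors up to and including that trial; these counting conventions and the function names are assumptions, since the scoring procedure is not specified beyond the criterion itself. The example record is invented.

```python
from typing import Optional, Sequence


def trials_to_criterion(outcomes: Sequence[bool], window: int = 10,
                        required: int = 8) -> Optional[int]:
    """1-based trial number on which `required` correct responses within the
    last `window` consecutive trials are first reached, or None if never."""
    for end in range(window, len(outcomes) + 1):
        if sum(outcomes[end - window:end]) >= required:
            return end
    return None


def errors_to_criterion(outcomes: Sequence[bool], window: int = 10,
                        required: int = 8) -> Optional[int]:
    """Incorrect responses up to and including the criterion trial."""
    n = trials_to_criterion(outcomes, window, required)
    if n is None:
        return None
    return sum(1 for correct in outcomes[:n] if not correct)


# Invented record: 3 errors followed by 8 correct responses.
example = [False, False, False] + [True] * 8
print(trials_to_criterion(example), errors_to_criterion(example))  # 11 3
```

The same functions apply to the discrimination and reversal phases; only the trial record changes.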

Data analysis

The latency to retrieve a pellet from a dish containing no sand, the latency to dig through a dish filled with unscented sand, and the number of errors of commission committed across the reversal sessions were analyzed with two-way analyses of variance (genotype × trials). The number of trials required to attain criterion performance (8 correct responses in 10 consecutive trials) and the number of errors committed while learning were analyzed separately for each testing phase (odor discrimination and reversal learning) with unpaired two-tailed t-tests. Significance was set at p < 0.05. Errors committed during reversal trials were further classified as errors of commission (failing to withhold responding to S-) or errors of omission (failing to dig in either dish within 3 min). Categorical errors and the number of commission errors committed during the first reversal learning session were compared between groups with nonparametric Mann-Whitney U tests.
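To show what the two between-group statistics compute, the sketch below implements the unpaired t statistic in its equal-variance (Student) form, whose n1 + n2 - 2 degrees of freedom give the t(14) reported for two groups of 8, and the Mann-Whitney U statistic with average ranks for ties, reporting the smaller of the two U values. The equal-variance form is an assumption consistent with the reported degrees of freedom. The code uses only the Python standard library and omits p-values; in practice `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu` return both statistics and p-values, though libraries differ in which U they report, so check the documentation. Function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's unpaired t statistic (equal variances assumed) and its df."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """Smaller of the two Mann-Whitney U statistics for samples a and b."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Invented values: completely non-overlapping groups give U = 0.
group_1 = [1, 2, 3]
group_2 = [4, 5, 6]
print(pooled_t(group_1, group_2), mann_whitney_u(group_1, group_2))
```

A U of 0, as reported for the within-genotype commission/omission comparison in D2R-/- mice, means that every value in one group was smaller than every value in the other.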