Basal Ganglia: Habit Formation
Keywords: Conditional stimulus; Medial temporal lobe; Pavlovian conditioning; Dual task condition; Declarative memory
In large part, learning is characterized by the formation of new associations. Habit learning appears to be a form of associative learning that can occur in the absence of awareness of what has been learned. Behaviors that lead to a reward can become habitual with sufficient training. When behaviors are habitual, they can be automatically elicited by antecedent stimuli even when the outcome of the behavior is no longer attractive.
The Representation of Habits
Fundamentally, habits are associations between stimuli and responses, or S-R associations (Graybiel 2008; Mishkin and Petrie 1984; Packard and Knowlton 2002). For example, when you are confronted with an intersection on your drive to work, this visual stimulus may automatically elicit a left-turn response after years of driving the same route. S-R associations contrast with arbitrary associations between stimuli that we typically think of as declarative learning, such as learning associations between word pairs. However, the distinction between stimulus-stimulus (S-S) associations and S-R associations does not map neatly onto the distinction between declarative and nondeclarative memory. For example, Pavlovian conditioning is the S-S association of a conditional stimulus with an unconditional stimulus, and at least simple Pavlovian conditioning can occur in the absence of awareness of what has been learned and independently of medial temporal lobe structures, including the hippocampus, that support declarative memory. Thus, both S-S and S-R associations can be formed implicitly.
Properties of Habits
Independence of Awareness
In order to make the case that habit learning is distinct from other types of learning, it is necessary to develop a list of its properties. As stated above, learning of S-R habits does not appear to require awareness of what has been learned. That is, people may acquire a habit without realizing it until they notice themselves performing it. A powerful means to examine whether learning depends on awareness is to test amnesic patients. These patients have a selective deficit in declarative memory due to damage in the medial temporal lobe or associated diencephalic structures (Squire and Zola-Morgan 1991). Thus, if these patients are able to learn habits, it would suggest that habit learning can proceed independently of awareness. Several studies have demonstrated that patients with severe memory problems can acquire associations between stimuli and responses normally. For example, in a “weather prediction” task in which different cues were associated with the outcome either “rain” or “sun,” amnesic patients were able to learn to select the correct choice at the same rate as healthy control subjects. Over about 100 trials, both groups were able to achieve a level of performance significantly above chance. However, amnesic patients were unable to remember much about the testing event, such as the layout of the screen or the order of events on each trial (Knowlton et al. 1996). Amnesic patients’ knowledge of the association between the cues and the outcomes also seems to differ from that of the control subjects. Control subjects were able to answer questions about the cue-outcome associations when posed in a different way than during learning. For example, subjects might be asked to imagine that the outcome was sunny and that two cues had been presented, and to decide which cues those would most likely be. While control subjects seem to have flexible knowledge of these associations, amnesic patients do not (Reber et al. 1996).
It seems that amnesic patients’ knowledge of the association between the cues and outcomes is encapsulated as a set of stimulus–response habits, while control subjects have gained declarative knowledge of these associations. While the control subjects may have an S-R representation of the cues and the correct responses associated with them, they also appear to have declarative knowledge of which cues are associated with each outcome, which allows them flexible access to this knowledge. It appears that the amnesic patients are only able to solve the task using S-R habits, such that the cues come to elicit the correct response (pressing either the sun or rain key). These habits may exist independently of awareness and are only observable when the subject is confronted with the eliciting stimulus.
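The incremental, feedback-driven structure of a weather-prediction-style task can be sketched in a small simulation. This is an illustrative sketch only: the cue names and probabilities below are placeholders, not the actual values used by Knowlton et al. (1996), and the tally-based learner is just one simple stand-in for gradually strengthened S-R associations.

```python
import random

# Hypothetical cue -> P(sun) values (illustrative placeholders)
CUE_P_SUN = {"squares": 0.8, "diamonds": 0.6, "circles": 0.4, "triangles": 0.2}

def run_session(n_trials=100, seed=0):
    rng = random.Random(seed)
    # Tally of how often each cue has been followed by each outcome:
    # a crude stand-in for incrementally strengthened S-R associations
    counts = {c: {"sun": 0, "rain": 0} for c in CUE_P_SUN}
    correct = 0
    for _ in range(n_trials):
        # Present a random nonempty subset of the cues
        cues = [c for c in CUE_P_SUN if rng.random() < 0.5] or ["squares"]
        # Outcome is drawn from the average of the active cues' probabilities
        p_sun = sum(CUE_P_SUN[c] for c in cues) / len(cues)
        outcome = "sun" if rng.random() < p_sun else "rain"
        # Respond with whichever outcome the presented cues have been
        # paired with more often so far
        votes = sum(counts[c]["sun"] - counts[c]["rain"] for c in cues)
        guess = "sun" if votes >= 0 else "rain"
        correct += guess == outcome
        for c in cues:
            counts[c][outcome] += 1
    return correct / n_trials
```

Over a few hundred trials this learner climbs above chance, mirroring the gradual learning shown by both groups. Note that it stores only cue-response tallies: answering a "reversed" query, such as which cues a sunny outcome most likely implies, would require additional machinery, which parallels the inflexibility of the amnesic patients' knowledge.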
Lack of Flexibility
Another characteristic of habits is that the knowledge gained is not readily applied in other contexts. For example, in a forced-choice task, subjects were presented with a series of pairs of items and were required to make a choice between the items of each pair. One item of the pair was consistently rewarded. Patients with dense amnesia were able to gradually learn to select the correct item across many trials. However, they did not have any explicit knowledge of why they were selecting one item over the other, and they attributed their choices to preference rather than their experience on previous trials (Bayley et al. 2005). In contrast, control subjects learned to pick the correct item in each pair more quickly, and they were well aware that their choices were based on which item had been rewarded on previous trials. Their conscious memory for the rewarded items of each pair enabled them to pick these items out from among all the items – that is, their memory was flexible in that they could readily demonstrate their memory in a context different from the one present during training. Although the amnesic patients eventually reached the same level as the control subjects in the original forced-choice task, they were making their responses based on S-R habit. They did not have awareness of which items were rewarded, and they were unable to use their knowledge flexibly. This lack of awareness and flexibility of knowledge can be considered hallmarks of habit learning.
Insensitivity to Devaluation
One difficulty in determining whether performance on a task is based on habit or declarative memory is that in just about every circumstance in which performance could be supported by a habit, it could also be supported by declarative memory. If a rat learns to press a lever resulting in a food reward, it may be that it has developed an S-R habit, with the lever automatically eliciting a response. However, it may be that the rat remembers that food appears as an outcome of the lever-pressing action, an action-outcome (A-O) association. Similarly, a person may open their refrigerator door in order to get food that is known to be inside (an A-O association), or they may be simply opening the door out of “force of habit,” in an S-R fashion.
In order to determine the nature of the memory supporting behavior, we need to use probe tasks. For example, we can test subjects’ awareness and flexibility of knowledge to demonstrate that what was learned differed for control subjects and amnesic patients. However, these tasks did not probe the structure of the underlying association – whether performance was supported by S-R or A-O connections. In studies of animal learning, this has been accomplished by assessing the effect of devaluing the outcome on subsequent performance. The logic here is that in an S-R association, unlike in an A-O association, the outcome is not represented. Thus, performance of the habit should not change if the outcome is no longer desired (Dickinson 1985). For example, if you are in the habit of turning left at a certain intersection on your route to work, you may unintentionally make this turn one Saturday when you meant to go somewhere else. Although the outcome (getting to work) was not desired at that time, the habitual response persisted when the intersection was encountered. Another example may be entering a room and reaching to turn on a light switch through force of habit. Even when you know that the bulb has burned out, or it is daytime and you do not need any light, the habit persists.
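The logic of the devaluation probe can be made concrete with two toy agents. This is a hypothetical sketch, not a model from the literature: the S-R agent's stored representation omits the outcome entirely, while the A-O agent consults the outcome's current value before responding.

```python
class SRAgent:
    """Habit: reward stamps in a stimulus-response bond; the outcome
    itself is not part of the stored representation."""
    def __init__(self):
        self.strength = 0.0
    def learn(self, lr=0.2):
        # Each rewarded trial strengthens the bond toward an asymptote
        self.strength += lr * (1.0 - self.strength)
    def responds(self, current_outcome_value):
        # The outcome's current value is simply ignored
        return self.strength > 0.5

class AOAgent:
    """Goal-directed: the response is emitted only if the expected
    outcome is currently desirable."""
    def __init__(self):
        self.expects_outcome = False
    def learn(self, lr=0.2):
        self.expects_outcome = True  # learns that the action yields food
    def responds(self, current_outcome_value):
        return self.expects_outcome and current_outcome_value > 0.0

sr, ao = SRAgent(), AOAgent()
for _ in range(20):   # rewarded training trials for both agents
    sr.learn()
    ao.learn()
# Devaluation: the food's value is now negative (e.g., paired with illness);
# the S-R agent keeps responding, the A-O agent stops
```

After devaluation, `sr.responds(-1.0)` is still true while `ao.responds(-1.0)` is false, which is exactly the behavioral signature the devaluation probe looks for.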
In several studies of instrumental learning, the training schedule was shown to be an important factor in whether learning is based on an A-O or S-R association (Dickinson et al. 1983; Yin et al. 2004). In a ratio schedule, reinforcement is given each time the animal makes a certain number of responses – for example, food is given after every third bar press. When rats were trained to press a bar for food under a ratio schedule, devaluing the food reinforcer resulted in a decrease in response rate, suggesting that the rats were pressing the bar in order to get food. In contrast, training rats under an interval schedule leads to performance that is insensitive to reinforcer devaluation. In an interval schedule, reinforcement is given for a response made after an interval (e.g., 30 s) has elapsed; whether 1 or 100 responses are made during the interval, no additional reinforcement is given. A real-world analogy is checking your mailbox: checking more often does not make the mail arrive any sooner. After training rats in an interval schedule, one can then make the food that was used during training unattractive to the rats. One way to accomplish this is to give the rats an unlimited amount of the food while they are in their home cage. When they become sated on the food, they are immediately tested to see whether they will continue to respond as vigorously as a hungry rat. When rats are trained on an interval schedule, devaluation of the reinforcer has no effect. This is some of the strongest evidence that performance can be supported by S-R associations and that such habit learning is distinct from learning the association between responses and their outcomes.
One reason why training with ratio and interval schedules leads to such different underlying associative representations is that the contingency between the response and the outcome differs. Under a ratio schedule, each sequence of responses is reinforced – if responding decreases, the number of reinforcers decreases. However, with an interval schedule, the number of reinforcers received is not nearly so tied to the response rate – under a 30-s schedule, responding once per second and responding once every 30 s would yield the same amount of reward. The level of contingency between an action and its outcome appears to be a key factor in whether learning results in a habit (Balleine and Dickinson 1998).
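The difference in response-outcome contingency between the two schedules can be seen in a minimal simulation. This is an illustrative sketch; the schedule parameters (a fixed ratio of 3, a fixed 30-s interval, a 600-s session) are arbitrary choices, not values from the studies cited above.

```python
def rewards_fixed_ratio(n_responses, ratio=3):
    # Fixed-ratio schedule: one reinforcer per `ratio` responses, so
    # the reward earned is directly tied to how much the animal responds
    return n_responses // ratio

def rewards_fixed_interval(response_times, interval=30.0):
    # Fixed-interval schedule: the first response after each interval
    # elapses is reinforced; extra responses in between earn nothing
    rewards, next_available = 0, interval
    for t in sorted(response_times):
        if t >= next_available:
            rewards += 1
            next_available = t + interval
    return rewards

# A 600-s session: responding once per second vs. once every 30 s
fast = [float(t) for t in range(1, 601)]      # 600 responses
slow = [float(t) for t in range(30, 601, 30)] # 20 responses
```

Under the ratio schedule the fast responder earns 200 reinforcers to the slow responder's 6, while under the interval schedule both earn 20: the outcome no longer tracks the response rate, which is precisely the low-contingency condition under which habits form.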
Historically, the idea that learning is based on the formation of stimulus–response habits has been influential. Hull promoted the idea that all learning could be reduced to a series of S-R habits (Hull 1943). The role of reinforcement was to strengthen these S-R bonds. However, this view was challenged by the fact that some types of learning did not seem to be consistent with the acquisition of an S-R habit. The most striking example is latent learning described by Tolman and Honzik (1930), who showed that animals learn the spatial layout of an environment even when not given reinforcement. Although it seems clear in hindsight that not all learning is habit learning, the later acknowledgment that there are multiple learning and memory systems made it clear that habit learning can coexist with other forms of learning. Mishkin and colleagues made this point in drawing the distinction between habit and cognitive learning, with these two different types of learning depending on different brain systems (Mishkin et al. 1984). Differences in the neural architecture supporting these different types of learning give rise to differences in their psychological properties.
Another feature of habit learning that seems to distinguish it from declarative or “cognitive” learning is that it occurs relatively automatically. Declarative learning and the retrieval of information from declarative memory are fairly attention demanding. Memory encoding is poorer when performing a concurrent task, and while retrieval from memory may be accurate under dual task conditions, it is significantly slower (Craik et al. 1996). We certainly understand that being distracted while studying will hamper learning of new facts and that it is more difficult to remember information on a test when a distraction is present. In contrast, habit learning does not seem to benefit as much from focused attention. In fact, it is often when we are distracted that we produce a habitual response when we intended otherwise, such as when we make the turn to go to work when we in fact had set out for a different destination.
The presence of distraction during learning may result in a reliance on the habit learning system under circumstances in which one is likely to form declarative memory representations when attention is not divided. In a study using the probabilistic classification task, subjects learned two different sets of associations (Foerde et al. 2006). They were told that they would be learning to predict the weather in two different cities (Budapest and Prague) using two different sets of cues. The two different sets of cues were in different colors to make it clear which task they were performing on each trial. The subjects predicted the weather for one city under single task conditions – the weather prediction task was the focus of their attention. In other blocks, subjects predicted the weather for the other city while they concurrently performed another task in which they had to keep a running count of the number of tones that were presented. Although subjects performed numerically worse at predicting the weather when they were learning under dual task conditions, performance was similar for the two cities when subjects were later tested under single task conditions for both tasks. This suggests that while the performance on a concurrent task may slightly impair performance of a habit (perhaps by occasionally causing the subject to misperceive the cues or make the wrong response), it does not seem to impair the acquisition of the habit.
One of the most interesting behavioral findings of this study was the fact that the representations underlying single and dual task learning in these two tasks appeared to be different. For the city for which the associations were learned under single task conditions, subjects had flexible knowledge of the cue-outcome associations. They were able to accurately assess which cues were most likely to be present if it was rainy or sunny, so they were able to access their knowledge of the cue-outcome associations in a way that was different from that posed in the original task. For the city for which the weather prediction learning occurred under dual task conditions, this flexible knowledge was much less apparent, even though performance on the task itself was fairly normal. These results are consistent with the idea that habit learning can proceed under conditions in which attentional resources are limited. In addition, these data suggest that learning while distracted can result in the learned knowledge being restricted to S-R habits rather than flexible declarative representations.
An important characteristic of declarative learning is that it can occur in a single trial. We have conscious memories for specific moments in time that we keep for our entire lives. The unique structure of the hippocampus is thought to be able to support the rapid learning necessary for the encoding of episodes (Rolls 2007). Our experience of habit learning is different in that we think of habits as behaviors that develop over time. Habits are acquired gradually, with feedback incrementally strengthening the bond between stimulus and response (Packard and Knowlton 2002). The neural structures that support habit learning may not be capable of single-trial learning. It may be that the slow incremental learning of habits and the rapid learning of episodes are complementary, ensuring that behaviors that have a long history of reward are automatic and stable while still maintaining the flexibility to alter behavior when conditions change (Squire 1992).
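One common way to formalize this gradual strengthening is a delta-rule update, in which each rewarded pairing closes a fixed fraction of the remaining gap to an asymptote. This is an illustrative sketch (the learning rate of 0.1 is arbitrary), not a model proposed in the source:

```python
def habit_strength(n_trials, lr=0.1):
    # Each rewarded S-R pairing moves the association weight a
    # fraction `lr` of the way toward the asymptotic strength 1.0,
    # so acquisition is necessarily incremental, never one-trial
    w = 0.0
    for _ in range(n_trials):
        w += lr * (1.0 - w)
    return w
```

After one trial the strength is only 0.1; after 50 trials it is about 0.99 (in closed form, 1 - (1 - lr)^n). Contrast this with episodic encoding, where a single trial can suffice to store an event.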
The gradual nature of habit learning is exemplified by the fact that overtraining typically results in goal-directed actions becoming habits. As discussed above, when rats are trained to make an instrumental response under a ratio schedule, performance is goal directed, in that responding will decrease when the reinforcer used is devalued. However, when training is continued, performance will eventually become insensitive to reinforcer devaluation, indicating that the response has become “habitized” (Dickinson 1985). Overtraining may serve to break down the contingency between responding and the outcome, because very few responses are withheld at this point. The animal no longer experiences the consequences of not making a response, so the link between the response and the outcome is less apparent (Dickinson et al. 1983).
Dependence on Feedback
Finally, an important characteristic of habit learning is that rewarding feedback is necessary to strengthen the habit. Declarative learning is thought to occur continually and incidentally (Stark and Okado 2003). As in the case of the latent learning procedure described above, information is stored in memory even without reinforcement. For habit learning, reinforcement is thought to be critical for strengthening S-R bonds, even though the reinforcer is not part of the learned representation. One study that specifically examined the role of feedback in habit learning used the probabilistic classification task (Shohamy et al. 2004). In the version of the task used by these investigators, subjects learned to guess which flavor of ice cream (chocolate or vanilla) was preferred by a Mr. Potatohead figure that was presented on each trial. Unbeknownst to the subject, the features that could be present or absent on the Mr. Potatohead were probabilistically associated with either a chocolate or a vanilla preference. For some of the subjects, the procedure was similar to that used in the weather prediction version of the task: these subjects made a response (pressing either the “chocolate” or “vanilla” key) and received feedback on every trial indicating whether they were correct or incorrect. The other subjects received an “observational” version of the task, in which the figures were presented followed by the correct answer (chocolate or vanilla). In the observational task, the same information about the association between cues and outcomes was present, but the subjects did not make a response that was rewarded. It appeared that these two learning situations differ, in that only feedback learning was impaired in patients with Parkinson’s disease. This finding suggests that feedback-based learning is supported by different neural substrates than observational learning.
The idea that feedback-based and observational learning differ in terms of neural mechanisms is also supported by findings from neuroimaging. Results from a study using the weather prediction version of the probabilistic classification task have shown that when the task is learned with feedback, learning is associated with activation in the striatum, while learning through observation results in activation of medial temporal lobe (Poldrack et al. 2001).
Brain Mechanisms of Habit
Mishkin and Petrie (1984) argued that “Hullian” habit learning and the cognitive learning described by Tolman coexist, based on the idea that distinct brain systems support different forms of learning. Mishkin and Petrie identified the basal ganglia and medial temporal lobe structures as supporting habit versus cognitive learning, respectively. A great deal of support for this idea has come from lesion studies that illustrate double dissociations between the effects of damage to the striatum of the basal ganglia on habits and lesions of the medial temporal lobe on cognitive learning. In nonhuman primates, damage to medial temporal lobe structures, including the hippocampus and surrounding temporal lobe cortices, results in significant impairment on cognitive memory tasks such as the delayed non-match to sample task (Murray 2000). In this task, the subjects are presented with a sample item and, after a delay (from seconds to hours), are presented with the sample item and a different item. The subject’s task is to pick the new item. This task requires the monkey to remember an event – the presentation of the sample stimulus – in order to make the correct choice. Learning in this task is unaffected by lesions of the tail of the caudate nucleus – the region that receives visual information from cortex. In contrast, these lesions severely impair performance on a concurrent discrimination task (Fernandez-Ruiz et al. 2001). As described above, in this task the subject is given pairs of objects, with one element of the pair consistently rewarded, and the task can be learned either declaratively or as a set of S-R habits. It appears that when monkeys are given a single trial per day for each discrimination, learning becomes habitual, perhaps because they are unable to readily recall previous trials with such a long delay between them.
Under these conditions, lesions of the tail of the caudate nucleus disrupt learning, while medial temporal lobe lesions have no effect (Fernandez-Ruiz et al. 2001; Malamut et al. 1984). These results indicate that habit learning and declarative learning rely on different neural systems.
Double Dissociations Between Memory Systems
Several studies using rats have demonstrated clear double dissociations between medial temporal lobe and striatal learning systems. In a seminal study by Packard et al., rats were trained on two different versions of the radial arm maze task (Packard et al. 1989). In the radial arm maze, eight arms radiate out from a central platform, and a food reward is placed at the end of each arm. In the win-shift version of this task, arms are not rebaited as the rat eats the food, so the rat should not return to these arms during the session. The win-shift version of this task taps into foraging strategies that rats have presumably evolved to efficiently recover food over an area. Packard et al. (1989) replicated the findings from a number of studies that lesions of the hippocampus severely disrupt performance in this task (Becker et al. 1980). In order to avoid revisiting arms, the rat must remember the locations it has just visited on that session. This would seem to require episodic memory, or memory for specific events.
In contrast to the win-shift version of the task, rats can also learn a win-stay discrimination on the radial arm maze, in which a cue consistently signals when an arm is baited. For example, the cue may be a light that, when illuminated, indicates that food is present at the end of the arm. Unlike in the win-shift version of the task, the rat does not need to remember any specific event. Rather, the rat simply must learn to run down lit arms. Perhaps because it does not tap into a natural foraging strategy, rats learn this task gradually over many sessions – keep in mind that the important information learned in the win-shift task is acquired in a single trial on each testing day. Packard et al. (1989) found that lesions of the striatum (the lateral caudate nucleus, which roughly corresponds to the primate putamen) impair learning of the win-stay version of the radial arm maze task. Like other habit learning tasks, this task is not affected by damage to the hippocampal system. Because the win-shift version of the task was not impaired by lesions of the striatum, there is a double dissociation between the effects of lesions of the hippocampus and striatum. This finding provides strong evidence that these two tasks depend on independent brain systems.
A similar dissociation was shown using the cross maze described above (Packard and McGaugh 1996). Rats in which the hippocampus was inactivated by an intracranial injection of lidocaine did not exhibit a preference for the location strategy after a moderate amount of training, unlike rats in the control group which received saline injections. Hippocampal inactivation appeared to prevent the expression of the location response. In contrast, rats in which the dorsolateral striatum was inactivated after moderate training showed normal expression of the location response, but when the dorsolateral striatum was inactivated after overtraining, the rats maintained the location response and did not switch to the habit response. Thus, the dorsolateral striatum must be active for expression of the gradually developing habit response.
There is also evidence in human subjects that there is a dissociation between striatal-dependent habit learning and learning that depends on the medial temporal lobe. The probabilistic classification task has been extensively studied in patient groups. For example, patients with Alzheimer’s disease, which is accompanied by damage to the medial temporal lobe memory system, are able to learn the S-R associations in the probabilistic classification task (Eldridge et al. 2002). However, patients with Parkinson’s disease, which affects dopaminergic input to the striatum, appear to be impaired on this task (Knowlton et al. 1996). In patients with mild Parkinson’s disease, performance is not different than that of healthy control subjects. However, fMRI showed that the patients and control subjects used different neural systems during performance of this task (Moody et al. 2004). Performing the weather prediction task was associated with greater activation in the putamen in control subjects than in the patients. The control subjects also showed a decrease in activation in the hippocampus during the weather task, a finding that has been seen before. It appears that in the controls, the hippocampus is more active in the baseline condition, in which the subject is performing a very simple task (it is likely that under such conditions, the mind can wander and the subject spontaneously engages in mnemonic activity). In contrast, in the patients with Parkinson’s disease, hippocampal activation was greater than the level seen in the baseline condition, consistent with the idea that these subjects were relying on declarative memory to support performance in this task. Because the habit learning system was compromised in these patients, they appear to have relied on declarative memory to perform this task.
A similar distinction between habit and declarative learning was found in the fMRI data of the study described above comparing performance of the weather prediction task under single task conditions and during distraction (Foerde et al. 2006). For the associations learned under single task conditions, performance was actually correlated with hippocampal activity in these healthy young subjects. For the associations learned under distraction, performance correlated with activity in the striatum and not the hippocampus, suggesting that the different neural systems were responsible for learning under the two conditions. These findings also suggest that when declarative memory is functioning well, as in undistracted young subjects, it can support learning in the probabilistic classification task. However, habit learning appears to support performance when declarative memory is not strong (older subjects, distracted young subjects).
Although these dissociations based on lesions strongly suggest that different memory systems exist, these findings do not directly demonstrate that the striatal-dependent form of learning has the defining features of habit learning. While win-stay learning in rats has some of the properties of habit learning, in that it is gradual and does not depend on brain regions important for declarative learning, more recent work demonstrated that learning in this task does indeed depend on S-R associations (Sage and Knowlton 2000). After rats were trained on a win-stay task in which they ran down lit arms to receive food, the food reward was devalued by pairing it in the home cage with an injection of lithium chloride, which results in a learned taste aversion to the reward food. The key question was whether, when the rats were placed on the radial arm maze after devaluation, their behavior would be modified in any way in light of their acquired aversion to the food reward. In fact, rats continued to run down the correct arms just as rapidly as they had when the food was attractive to them. This suggests that the rats had learned an association between the light and the response of running down the arm, with the reward not represented in the association. In contrast, rats trained in the hippocampal-dependent win-shift task slow down significantly when the food reward is devalued. They still employ their typical foraging strategy of efficiently traversing the maze, but their slower responding indicates that their behavior is influenced by the reward value.
Lesions of the dorsolateral striatum have been shown to prevent habit learning in instrumental procedures as well. In one study, rats with dorsolateral striatal lesions and control rats were trained under an interval schedule to press a bar for a food reward (Yin et al. 2004). Both the lesioned group and the control group learned to make the response at roughly the same rate. However, when the reward food was devalued by pairing it with illness, the two groups behaved very differently. The performance of the control group was as expected for animals trained using an interval schedule; they continued to make the instrumental response even though the food reward was no longer attractive to them. In contrast, rats with dorsolateral striatal lesions showed a significant decrease in responding. In a sense, the lesioned rats behaved with greater “insight,” in that they avoided a response associated with an undesired outcome. These data strongly make the case that the dorsolateral striatum is necessary for the development of habits. In the absence of this structure, habitual responding does not appear to develop, and responding is guided by the outcome.
Basal Ganglia Loops
Other corticostriatal loops also appear to be important for behavior dependent on declarative learning. Damage to the dorsomedial caudate nucleus in the rat results in behavior that appears habitual under conditions in which intact rats perform based on declarative memory. In the Morris water maze, rats with dorsomedial caudate lesions have difficulty learning the location of a hidden platform and behave similarly to rats that have hippocampal lesions (Devan et al. 1999). This contrasts with the performance of rats with lesions of the dorsolateral caudate nucleus – this group is able to learn the location of the hidden platform, but has difficulty learning to swim to a cue that signals the location of the platform.
In another study using the cross maze, a dissociation was found between lesions of the dorsolateral and dorsomedial striatum (Yin and Knowlton 2004). Rats with dorsolateral striatal lesions did not show the typical tendency to run down an arm based on a stimulus–response habit after extensive overtraining. They retained the tendency to go to the rewarded location even when starting from a new arm. However, lesions placed more medially produced an effect more similar to that of hippocampal system lesions. These rats did not go to the location where they had been rewarded on previous trials; rather, they responded based on the motor habit. The dorsomedial striatum in the rat is similar to the head of the caudate in the primate. This region is interconnected with prefrontal regions that may be more involved in goal-directed action than in habit learning (Yeterian and Pandya 1991). This loop may in fact be important for implementing actions based on the contents of declarative memory.
Electrophysiological studies have also supported the idea that goal-directed actions and habits depend on different corticostriatal loops. In nonhuman primates, performance early in training was accompanied by increased firing in the caudate nucleus: when the slope of the learning curve was steep, firing in the caudate nucleus was correlated with that slope. In contrast, firing in the putamen was correlated with the level of learning, with firing highest after extensive training, when the monkeys were adept at the task (Williams and Eskandar 2006).
Communication Between Loops
Corticostriatal loops are often characterized as functioning independently, based to a great extent on anatomy: each loop's thalamic station projects back to the cortical region in which the loop originated. For example, in the motor loop, projections from the motor cortex are directed to the putamen, which projects to relatively discrete regions of the globus pallidus, which in turn project to the ventrolateral (VL) nucleus of the thalamus. This thalamic nucleus sends its cortical projections back to the motor cortex, thus completing the loop. However, these loops also have the potential to interact through other features of the anatomy. Thalamocortical projections terminate in multiple cortical layers, including layer III, the site of the neurons that give rise to cortico-cortical projections. Thus, the output of one loop is able to influence another through these cortico-cortical projections (McFarland and Haber 2002). In addition, there are extensive projections from the cortex back to the thalamus. For example, cortical regions involved in planning actions (the pre-supplementary motor area and premotor cortex) send projections to VL, which is part of the motor loop projecting to the primary motor cortex and supplementary motor cortex. Through these "open loop" projections, different cortical regions can influence corticostriatal loops at the level of the thalamus.
A third mechanism by which loops can communicate is through the projections from the striatum to dopaminergic neurons in the midbrain (Joel and Weiner 2000). These dopaminergic neurons in the ventral tegmental area and the substantia nigra are the source of a major projection to the striatum. These projections are reciprocal, in that each striatal region projects back to the midbrain region that supplied its dopaminergic input. For example, the ventral tegmental area is a major source of dopaminergic input to the ventral striatum, and the ventral striatum projects back to the ventral tegmental area. However, the ventral tegmental area also projects to other striatal regions, thereby allowing communication between the limbic loop and the other corticostriatal loops. Likewise, the dopaminergic neurons targeted by the striatal region of the association loop (i.e., the caudate nucleus) give rise both to reciprocal projections and to projections that target the striatal region of the motor loop (i.e., the putamen). Thus, the midbrain dopaminergic system is another means by which corticostriatal loops may interact.
An interesting feature of the striatal projections to both the thalamus and the dopaminergic system is that the potential interactions between loops seem to be hierarchically organized (Joel and Weiner 2000). Cortical regions in the limbic loop project to thalamic regions in loops involved in executive function, and cortical executive regions project to thalamic regions involved in the motor loop. A similar hierarchy exists for the dopaminergic projections. The ventral striatum (limbic loop) projects to midbrain dopaminergic regions supplying all other loops, striatal regions in the executive loop send projections to dopaminergic regions supplying the executive and motor loops, and the motor striatum primarily sends only reciprocal connections back to the dopaminergic neurons that project to it. Thus, in both mechanisms of interaction, the limbic loop has the potential for the greatest influence over the other loops; loops involved in executive function can influence those involved in motor function, while the motor loop has limited ability to influence the others.
The reciprocity of connections in the motor loop has been incorporated into computational models of learning in the striatum, in which dopaminergic input provides a reinforcement signal. The actor-critic architecture (Barto et al. 1983) captures this role by postulating a "critic," in the form of dopaminergic input, that provides the "actor," in the form of striatal neurons, with a reinforcement signal based on whether success has been obtained. One challenge for these models is the credit-assignment problem: the consequences of an action are often delayed. The models address this challenge by exploiting the fact that the dopaminergic input to each striatal region arises from the same midbrain zone that receives projections from that striatal region. This reciprocity means that dopaminergic activity will strengthen the response just preceding it; on subsequent trials, dopaminergic activity will then occur during this response, strengthening the response before it. Through this recursive mechanism, activity in the dopaminergic neurons becomes a predictive signal for reinforcement (Houk et al. 1995).
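The actor-critic scheme can be sketched in a few lines of code. This is a minimal illustration, not the published model: the learning rate, discount factor, and greedy action rule are arbitrary assumptions. The "critic" computes a reward-prediction error (delta), standing in for the dopaminergic teaching signal, which updates both the critic's own value estimate and the "actor's" striatal action preferences; the discount factor lets delayed reward propagate backward to earlier states, addressing credit assignment.

```python
ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.9   # discount factor: lets delayed reward reach earlier states

values = {}   # critic: state -> predicted future reward
prefs = {}    # actor: (state, action) -> response preference

def choose(state, actions):
    # Greedy actor: emit the currently preferred response.
    return max(actions, key=lambda a: prefs.get((state, a), 0.0))

def update(state, action, reward, next_state):
    # Reward-prediction error: the "dopaminergic" reinforcement signal.
    delta = (reward + GAMMA * values.get(next_state, 0.0)
             - values.get(state, 0.0))
    # The same signal trains both critic (values) and actor (prefs),
    # mirroring the shared midbrain dopaminergic input.
    values[state] = values.get(state, 0.0) + ALPHA * delta
    prefs[(state, action)] = prefs.get((state, action), 0.0) + ALPHA * delta
```

With repeated trials in which one response is rewarded and the other is not, the rewarded response accumulates positive preference while the critic's prediction error shrinks, so the dopaminergic signal becomes predictive rather than reactive.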
Plasticity in the Striatum
One question that arises in discussing the role of the neostriatum in learning is where the plastic changes that support learning occur, and whether the physiological mechanisms of this plasticity are similar to or different from those described in the medial temporal lobe system. As in the hippocampus, long-term potentiation (LTP) occurs in the striatum, and this phenomenon has been considered a model system for the formation of habit memories (Reynolds et al. 2001). Long-term depression (LTD) can also be induced in the striatum: high-frequency stimulation of the cerebral cortex induces LTP or LTD at the corticostriatal glutamatergic synapses. Dopamine is important for the induction of this plasticity, as neither LTP nor LTD can be readily induced in slices from animals in which dopamine has been depleted (Calabresi et al. 1992; Kerr and Wickens 2001).
Unlike in the hippocampus, where LTP has been studied more extensively than LTD, striatal LTD has historically received more attention than striatal LTP. High-frequency stimulation trains that produce LTP in the hippocampus produce LTD in the striatum; in fact, LTP was not initially obtained in the striatum unless magnesium was omitted. The facilitative effect of removing magnesium appears to be mediated by NMDA receptors, which are normally blocked by magnesium: with the block removed, NMDA receptors on presynaptic dopaminergic terminals can be activated, leading to an increase in dopamine levels (Wickens and Arbuthnott 2005).
In a study examining the role of dopamine in striatal plasticity, Shen et al. (2008) assessed LTP and LTD in both the direct and indirect pathways. In striatal medium spiny neurons that carry D1 receptors (the direct pathway), LTP can be generated by pairing presynaptic activity with postsynaptic depolarization; when D1 receptors are blocked, the same pairing leads to LTD. In medium spiny neurons with D2 receptors (the indirect pathway), either LTP or LTD can be induced, depending on the timing of the pre- and postsynaptic stimulation. Blocking D2 receptors disrupts LTD, while increasing D2 receptor activation produces LTD under conditions in which LTP is normally induced. Thus, at low dopamine levels, when only the higher-affinity D2 receptors are activated, plasticity is bidirectional in neurons containing D2 receptors, while only LTD is enabled in neurons containing D1 receptors. When dopamine levels rise, however, as when a reinforcing stimulus is present, LTP can be induced in neurons containing D1 receptors, while those containing D2 receptors are biased toward LTD.
These findings point to an important role for dopamine in controlling plasticity at corticostriatal synapses and suggest that medium spiny neurons with D1 and D2 receptors play different roles in learning. When D1 receptors are activated, as in the case of a large, unexpected reward, efficacy will increase at corticostriatal synapses on neurons containing D1 receptors, strengthening the direct pathway and producing greater excitation of the cortex. When dopamine remains at a tonic level, however, experience would enhance only corticostriatal synapses on neurons containing D2 receptors, strengthening the indirect pathway and increasing tonic inhibition of the cortex (Shen et al. 2008). Thus, beyond the differences in the behavioral roles of the corticostriatal loops, conditions within each loop (e.g., the dopamine level) control the net effect of that loop on its target cortical region.
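The sign of plasticity described above can be summarized as a small lookup function. This is a schematic sketch of the pattern just described, not a biophysical model: dopamine is collapsed to two levels ("high" for a phasic burst, "low" for tonic), and the bidirectional D2 case at tonic dopamine is simplified to its enhancement (LTP) outcome.

```python
def plasticity(receptor, dopamine):
    """Sign of corticostriatal plasticity under paired pre/postsynaptic
    activity, schematized from Shen et al. (2008) as described in the text.

    receptor: "D1" (direct pathway) or "D2" (indirect pathway)
    dopamine: "high" (phasic burst, e.g. unexpected reward) or "low" (tonic)
    """
    if receptor == "D1":
        # Lower-affinity D1 receptors are engaged only by high dopamine:
        # reward-driven bursts enable LTP in the direct pathway.
        return "LTP" if dopamine == "high" else "LTD"
    if receptor == "D2":
        # Higher-affinity D2 receptors are active even at tonic levels:
        # high dopamine biases the indirect pathway toward LTD, while
        # tonic dopamine permits enhancement (simplified here to LTP).
        return "LTD" if dopamine == "high" else "LTP"
    raise ValueError(f"unknown receptor type: {receptor}")
```

Reading the four cases off the table makes the push-pull arrangement explicit: a dopamine burst simultaneously strengthens the direct pathway and weakens the indirect one, while tonic dopamine does the reverse.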
It is tempting to think of these two mechanisms of plasticity as supporting the learning of actions and habits, respectively: early in learning, new associations are formed in the direct pathway when dopamine signals the presence of reinforcement. As practice accumulates and reinforcement becomes expected, dopamine levels decrease and performance becomes increasingly controlled by the indirect pathway, characterized by a lack of flexibility and insensitivity to devaluation.
The interplay between the direct and indirect pathways has been incorporated into models of dynamic gating by the basal ganglia (Frank et al. 2001). When the direct pathway is activated, the cortex is disinhibited, enabling the updating of working memory and behavioral flexibility. Activity in the indirect pathway has the opposite effect: it increases inhibition of the cortex, favoring maintenance. In this way, the architecture of the cortico-basal ganglia system balances the conflicting demands of updating versus maintaining behavior. Furthermore, the existence of parallel loops with relatively independent neuronal circuits allows some information to be updated while other information is maintained, according to these gating models (O'Reilly and Frank 2006).
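The gating idea can be made concrete with a toy sketch in the spirit of these models, though heavily simplified: each corticostriatal loop is reduced to a binary gate over one working-memory "stripe," and the class and signal names are illustrative assumptions rather than elements of the published models.

```python
class Loop:
    """One corticostriatal loop gating one working-memory stripe."""

    def __init__(self):
        self.memory = None  # current contents maintained by this stripe

    def step(self, new_input, go):
        # go=True: direct ("Go") pathway wins, cortex is disinhibited,
        # and the stripe updates to the new input.
        # go=False: indirect ("NoGo") pathway wins, cortical inhibition
        # increases, and the stripe maintains its current contents.
        if go:
            self.memory = new_input
        return self.memory

# Parallel, relatively independent loops: one stripe can update
# while another simultaneously maintains what it already holds.
loops = [Loop(), Loop()]
loops[0].step("task goal", go=True)    # stripe 0 updates
loops[1].step("distractor", go=False)  # stripe 1 maintains (ignores input)
```

The point of the sketch is the selectivity: because each stripe has its own gate, the system can protect ongoing task information in one loop while flexibly updating another, rather than updating or maintaining everything at once.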
Interactions Between Learning Systems
What do the anatomy and neurophysiology of the striatum suggest about how different learning systems interact? The loop architecture suggests parallel modes of learning: loops involving the caudate nucleus and prefrontal cortex may be more involved in voluntary action than loops involving the putamen and motor cortex. Learning may proceed in these loops in parallel, with behavior controlled by whichever loop is most effective at influencing the structures supporting behavioral output. While the limbic and associative loops are able to influence the effectiveness of the motor loop, there is substantial independence: one could be learning a goal-directed action while also strengthening a habit based in the motor loop. On this view, learning a goal-directed action does not impede the learning of a habit, or vice versa. On the other hand, the antagonistic relationship between plasticity in the direct and indirect pathways suggests that at some level there is competition between types of learning. Linking plasticity in the direct and indirect pathways with behavior is essential for understanding the significance of striatal architecture.
If different memory systems acquire memories in parallel, competition is likely to occur at the level of control over behavioral output, with selection perhaps based on the strength of the learned representation. Declarative memories may be formed faster and thus control behavior early in training, while habits accumulate associative strength more slowly but eventually become dominant. For example, in the cross-maze task described above, the correct response according to a declarative memory for the location of reward is incompatible with the stimulus–response habit of turning in one direction: the two responses lead to different places when a new starting location at the opposite end of the maze is used. If behavioral output is determined by relative memory strength, which likely changes with practice, the critical question is how the "strongest" memory is selected to control output. Of course, in most situations declarative memory and habit presumably lead to the same outcome, but situations such as the cross maze can bring the two systems into conflict. A "real-life" example is making a turn along a well-learned route when you had in fact intended to turn the other way at the intersection to reach a different destination that day; responding based on habit and responding based on declarative memory lead to opposing responses.
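The proposed strength-based arbitration can be illustrated with a toy model. Every quantitative choice here is an assumption for illustration: the declarative system learns quickly to a modest asymptote, the habit system learns slowly toward a higher one, and a simple winner-take-all rule selects the controller; none of these parameters come from data.

```python
DECL_RATE, DECL_MAX = 0.5, 1.0     # fast acquisition, limited asymptote
HABIT_RATE, HABIT_MAX = 0.05, 2.0  # slow accumulation, eventual dominance

def strengths(n_trials):
    """Associative strength of each system after n_trials of training."""
    decl = habit = 0.0
    for _ in range(n_trials):
        decl += DECL_RATE * (DECL_MAX - decl)     # saturating, fast
        habit += HABIT_RATE * (HABIT_MAX - habit)  # saturating, slow
    return decl, habit

def controller(n_trials):
    # Winner-take-all: the stronger representation controls output.
    decl, habit = strengths(n_trials)
    return "declarative" if decl > habit else "habit"
```

Under these assumptions the declarative system controls behavior early in training and the habit system takes over after extended practice, reproducing the crossover that the cross-maze and well-learned-route examples describe.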
Given that responses based on declarative memory and on habit learning can differ, it is important to determine the conditions that favor the use of one representation over the other. Our awareness is based on declarative knowledge; thus, when we are consciously aware of responding based on memory, we are using declarative memory. When awareness is otherwise occupied, however, behavior is more likely to depend on habits, owing to their relative insensitivity to working memory load (Foerde et al. 2007). Conditions that occupy or reduce working memory resources, such as anxiety, depression, and fatigue, may therefore bias behavior toward the learning and retrieval of habits. Individuals suffering from depression and anxiety exhibit impaired working memory (Rose and Ebmeier 2006; Samuelson et al. 2006), and even mild forms of depression are accompanied by a tendency to ruminate and to have intrusive thoughts, consuming working memory resources (Joormann and Gotlib 2008). Emotional factors may thus play a role in which representations are learned and which control behavior. One implication of this idea is that people who are stressed or preoccupied may be more likely to fall back on old habits even when an updated approach is warranted.
Habit learning is a form of instrumental learning, in that making the response is necessary to obtain the outcome during training. In Pavlovian conditioning, by contrast, pairing of a conditional stimulus (e.g., a buzzer) with an unconditional stimulus (e.g., food) leads to the conditional stimulus eliciting a conditional response (e.g., salivation); the response has no bearing on whether the unconditional stimulus is received. Interestingly, these two forms of learning interact: presenting cues that have served as conditional stimuli in Pavlovian conditioning can facilitate the transition from actions to habits. This phenomenon has been discussed most extensively in the context of addiction, where behaviors such as drug taking become compulsive and persist despite the desire to cease. The transition from voluntary action to habitual responding is thought to characterize, at least in part, the transition from drug user to drug addict (Everitt et al. 2001). During drug use, Pavlovian conditioning associates drug-related cues and contexts (conditional stimuli) with the physiological effects of the drugs (unconditional stimuli). In animal models of addiction, the presence of these conditional stimuli leads to faster learning of an instrumental response (Pavlovian-instrumental transfer) but also to a faster transition from voluntary responding to habitual stimulus–response responding. This effect of Pavlovian conditioning may result from the influence of the limbic loop of the striatum on the other loops (Everitt and Robbins 2005).
Although we are often not aware of our habits, they arguably have a greater influence on our ongoing behavior than declarative memories do. Our habits provide a backdrop to our conscious behavior. Psychiatric disorders such as addiction and compulsive behaviors are fundamentally problems with habits that are intrusive in daily life. Thus, how habits are formed and how they interact with declarative memories are key questions in understanding both normal and abnormal behaviors.
- Barto A, Sutton R, Anderson C (1983) Neuron-like elements that can solve difficult control problems. IEEE Trans Syst Man Cybern 13
- Dickinson A, Nicholas DJ, Adams CD (1983) The effect of instrumental training schedule on susceptibility of reinforcer devaluation. Q J Exp Psychol 35B:35–51
- Houk JC, Adams JL, Barto AG (1995) A model of how the basal ganglia generates and uses neural signals that predict reinforcement. In: Houk JC, Davis J, Beiser D (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, MA
- Hull CL (1943) Principles of behavior. Appleton, New York
- Mishkin M, Petrie HL (1984) Memories and habits: some implications for the analysis of learning and retention. In: Squire LR, Butters N (eds) Neuropsychology of memory. Guilford Press, New York, pp 287–296
- Mishkin M, Malamut B, Bachevalier J (1984) Memories and habits: two neural systems. In: Lynch G, McGaugh JL, Weinberger NW (eds) Neurobiology of learning and memory. Guilford, New York, pp 65–77
- Murray EA (2000) Memory for objects in nonhuman primates. In: Gazzaniga M (ed) The cognitive neurosciences, 2nd edn. MIT Press, Cambridge, MA, pp 753–763
- Tolman EC, Honzik CH (1930) Degrees of hunger, reward, and non-reward, and maze learning in rats. Univ Calif Publ Psychol 4:241–256
- Wickens JR, Arbuthnott GW (2005) Structural and functional interactions in the striatum at the receptor level. In: Dunnett SB, Bentovoglio M, Bjorklund A, Hokfelt T (eds) Dopamine, handbook of chemical neuroanatomy, vol 21. Elsevier, Amsterdam, pp 199–236