One of the most cited scientific reports of the last century (over 12,000 citations) is Craik and Lockhart (1972). It has been cited so often not because it reported a seminal finding, but for the point of view stated in the report: “the memory trace is viewed as the by-product of perceptual analysis” (p. 681). This statement captured the research zeitgeist of its time, and so was convenient to cite. However, the claim made by the statement is wrong. To understand how a wrong statement could be so pervasive, one has to understand the limits of knowledge at that time.

Ten years before Craik and Lockhart (1972), cognitive psychology did not exist. It burst on the scene with the publication of Ulric Neisser’s (1967) Cognitive Psychology, which replaced behaviorally driven experimental psychology with computationally obsessed cognitive psychology. Prior to its publication, except for a cognitive studies program at Harvard University and a cognitive science program at the University of California at San Diego, there was no nominal recognition of cognitive psychology within departments of psychology. Within 10 years of Neisser’s publication, the experimental programs of leading psychology departments across the country had renamed themselves as cognitive psychology programs (Miller, 2003). At the time of the birth of cognitive psychology, there was much more active research on perception than on action, as indicated by specialized journals and meetings. In addition to general experimental psychology journals, psychologists studying perception could submit their reports to Perceptual and Motor Skills (founded 1949), Vision Research (founded 1961), and Perception (founded 1972). Psychologists studying action had only Perceptual and Motor Skills as an outlet. From 1970, the Association for Research in Vision and Ophthalmology provided an annual meeting for vision scientists. To this day, there is no comparable annual meeting of scientists studying action. So, despite Neisser’s attempt to provide balance by championing an “analysis-by-synthesis” model, and Craik and Lockhart’s mention of nonperceptual “Type II processing,” the emerging cognitive field was perception-centric, which was perfectly caught by the Craik and Lockhart quote.

However, if we begin from an evolutionary perspective and ask why animals can think and plants cannot, we can see that the relationship among action, perception, and cognition is quite different. Animals can think because they can act. The entire purpose of cognition is to generate effective action (Barsalou, 2008). Because action is central to cognition, the entire purpose of sensation and perception is to direct an action to its target (Glass, 2016).

The neural systems for cognition are largely identical to those for motor action. Though the parietal cortex, premotor cortex, and basal ganglia were originally seen to influence the control of motor movements through the effects of injury (Cubelli, Marchetti, Boscolo, & Della Sala, 2000; Rothi, Ochipa, & Heilman, 1997), they are now known to be equally central to mental action involving recall and construction, especially of visual patterns and purposeful actions (Boecker et al., 2002; Braver et al., 2001; D’Esposito et al., 1998; Doyon et al., 1997). This is because the computational processes for mental action evolved from those for motor action and are generated by the same system.

Evolution of cognition from control of motor responses

The nervous system evolved to control internal body systems, but also to respond to stimulation from the environment (Eagleman & Downer, 2018). For example, the great physiologist Pavlov discovered that placing a small amount of meat in a dog’s mouth causes it to salivate, which begins the process of digesting the meat (Todes, 2000).

In the discussion below, a distinction will be made between behavior and cognition. Behavior will refer to reflexes and to the instinctual behaviors that can be described as aggregations of reflexes with a common purpose. Cognition will refer to the mental representation of the environment—that is, to perception—as well as to memory and learning. The discussion begins with the computational requirements of behaviors that do not require perception, memory, or learning. It will be shown that the computations required for instinctual behaviors were elaborated in order to make perception, memory, and learning possible.

Reflexes may be complex actions

In contrast to the well-known patellar reflex (Rosenzweig, Breedlove, & Watson, 2005), reflexes are often more than a ballistic movement. Reflexes are often sophisticated, controlled, complex responses to more specific stimuli. For example, when a painful stimulus is applied to the foot of many animals, from amphibians such as frogs to mammals such as humans, there is a defensive withdrawal response (Tresilian, 2012). Precisely where the foot is touched determines the exact trajectory of its withdrawal movement, to ensure that it is no longer in contact with the painful stimulus (Schouenborg, 2008). A stimulus may direct an action toward a target as well as away from it. The wiping reflex of the frog will cause the leg to move into position and then wipe away an irritating particle. Again, the action is directed by an integrated set of reflexes that direct it to the irritating particle (Berkinblit, Feldman, & Fukson, 1986).

Furthermore, a pattern of stimuli may elicit a sequence of responses through feed-forward stimulation, in which each response becomes the stimulus for the next response in the sequence. Hence, reflexes may include complex behaviors such as locomotion and feeding. When the response has several components extended over time, it is called a modal action pattern (Tresilian, 2012). In a modal action pattern, the initiation of some reflexes results in feed-forward stimulation that initiates other reflexes that form part of the action pattern. For example, birds engage in a complex sequence of actions to build nests (Walsh, Hansell, Borello, & Healy, 2011). Another example is the mammalian diving reflex (Alboni, Alboni, & Gianfranchi, 2011), which briefly survives in humans during the first months of life. When a newborn infant is placed in the water—for example, in a swimming pool—modal action patterns move the infant’s torso and limbs in an effective breast stroke, so the infant swims through the water for a few meters. Water that enters the infant’s mouth is diverted to its stomach, so the infant does not drown (Alboni et al., 2011).

Extended instinctual behaviors in specific contexts

Through modal pattern activation, reflexes respond together in functional systems to create instinctual behaviors. Animals engage in behaviors such as foraging, stalking, chasing, feeding, and courting. A behavior consists of a sequence of complementary actions that form a modal action pattern toward a common purpose (Lorenz, 1958).

When a single stimulus—for example, a tap on the knee—is innately sufficient to elicit a response—for example, a knee jerk—it is called an unconditioned stimulus (US). Unlike with the well-known patellar reflex (Rosenzweig et al., 2005), more than a single unconditioned stimulus may be involved in the initiation of both individual reflexes and instinctive behaviors. Several stimuli may ultimately increase the activation of response neurons at subthreshold levels, so each stimulus by itself does not elicit the response, but collectively the stimuli increase its probability, which henceforth will be called facilitation. In addition to external stimuli, internal states such as hunger, thirst, and fear may facilitate specific reflexes. External stimuli that produce subthreshold activation often occur in the same contexts as the stimuli that elicit the response, and hence are context cues. For example, the amygdala receives input from a variety of sensory neurons when they detect stimuli signaling threat and facilitates both a general response and specific responses to specific stimuli (Ohman, 2005). The amygdala aggregates and modulates the responses of the superior colliculus and inferior colliculus into a comprehensive orienting or startle response that categorizes the stimuli as good or bad and orients the body toward or away from them and prepares the body for action in response.

Conditioning

The next step in the evolution of the nervous system was the evolution of a neural mechanism that made it possible for additional contextual stimuli to elicit reflexes as the result of experience. It became possible for a contextual stimulus (e.g., a tone) that was predictive of a US (e.g., an air puff) to come to elicit the same response (e.g., an eyeblink).

When the appropriate neural pathways are in place and a contingent relationship exists between a prior contextual stimulus (e.g., a tone) encoded by a sensory neuron that facilitates a response and a subsequent US that elicits the response, the response neuron becomes more sensitive to signals from the sensory neuron for the tone, so now the earlier stimulus may also initiate the response by itself. When this occurs, the earlier stimulus is called the conditioned stimulus (CS). Kandel and his colleagues (Carew, Walters, & Kandel, 1981) demonstrated conditioning for a light touch in the sea snail, Aplysia, and determined the neural pathways that make conditioning possible. A similar mechanism has been found for conditioning eyeblink to a tone in rabbits (Christian & Thompson, 2003) and humans (Woodruff-Pak, Papka, & Ivry, 1996).

Despite the ability to condition salivation or an eyeblink to a tone, there is not a general mechanism in an animal brain that allows any stimulus to become conditioned to elicit any response. Rather, when the neural pathways for conditioning have been studied, pathways for specific possible contextual stimuli have been found that make the conditioning of that specific contextual stimulus possible (Carew et al., 1981; Christian & Thompson, 2003; Woodruff-Pak et al., 1996).

When the necessary neural pathway exists, as was mentioned above, conditioning is not the result of a temporal or spatial association between the contextual stimulus and the US, but rather is caused by a contingent relationship between them (Ward, Gallistel, & Balsam, 2013). This contingent relationship is computed by a neural mechanism that determines from the habituation level of the contextual stimulus whether it has recently appeared without immediately being followed by the US (Groves & Thompson, 1970). Consequently, when a contextual stimulus also occurs without immediately being followed by the US, no conditioning results from the contextual stimulus–US pairings (Randich & Lolordo, 1979). During conditioning the interval between the CS and the US is encoded by neurons in the cerebellum (Johansson, Jirenhed, Rasmussen, Zucca, & Hesslow, 2014). Consequently, the conditioned response occurs at the interval after the CS when the US is most likely to also occur.

The fact that conditioning is the result of a contingent relationship between the contextual stimulus and the US, rather than of their temporal or spatial association, ensures that conditioning will occur only when it has functional significance. Conditioning cannot cause a meaningless response, such as biting at air or salivating in response to an ever-present background tone. Rather, it initiates a meaningful response in a context where the corresponding US is likely to occur imminently.
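The difference between mere pairing and a contingent relationship can be made concrete with a small numerical sketch. The code below is only an illustration: it uses the textbook index ΔP = P(US | CS) − P(US | no CS), which is not the habituation-based neural mechanism described above, and the function name `delta_p` and both trial schedules are hypothetical. The two schedules contain the same number of tone–US pairings, but only the first makes the tone predictive of the US.

```python
def delta_p(trials):
    """Contingency index: P(US | CS) - P(US | no CS).

    `trials` is a list of (cs_present, us_follows) boolean pairs,
    one pair per observation window (an illustrative assumption).
    """
    cs_total = sum(1 for cs, _ in trials if cs)
    no_cs_total = sum(1 for cs, _ in trials if not cs)
    us_after_cs = sum(1 for cs, us in trials if cs and us)
    us_without_cs = sum(1 for cs, us in trials if not cs and us)
    p_us_given_cs = us_after_cs / cs_total if cs_total else 0.0
    p_us_given_no_cs = us_without_cs / no_cs_total if no_cs_total else 0.0
    return p_us_given_cs - p_us_given_no_cs


# Schedule 1: the tone is always followed by the US, and the US never occurs alone.
predictive = [(True, True)] * 10 + [(False, False)] * 10

# Schedule 2: the same 10 pairings, but the tone also occurs alone and the
# US also occurs without the tone, so the tone carries no information.
uninformative = (
    [(True, True)] * 10
    + [(True, False)] * 10
    + [(False, True)] * 10
    + [(False, False)] * 10
)

print(delta_p(predictive))     # 1.0
print(delta_p(uninformative))  # 0.0
```

Under this index, the second schedule yields no contingency despite its ten pairings, which is the sense in which pairing alone is insufficient for conditioning.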

Furthermore, conditioning does not establish only a single contextual stimulus as a CS. The facilitation effects of other contextual stimuli are also potentiated, and this may lead to a further specification of the precise context in which a CS is effective. Once a CS is established through conditioning, subsequent presentations of the CS that are not paired with the US do not elicit the unconditioned response forever. Rather, if the CS is consistently presented alone, it elicits the unconditioned response less and less, until it does not elicit it at all. This process is called extinction. However, after extinction occurs and there is a subsequent rest period when the CS is not presented, if the CS is again presented alone it may again elicit the unconditioned response. This is called spontaneous recovery (Todd, Vurbic, & Bouton, 2013).

When the CS is presented in the extinction context, the signals from the CS to the response neuron are weakened by contextual stimuli so that the CS no longer elicits the response. Hence, extinction is a context-specific effect. When the CS is presented later in a different context where the same set of contextual stimuli are not present, its signal is no longer as weakened and again sometimes elicits the response, thus exhibiting spontaneous recovery (Todd et al., 2013). Spontaneous recovery may also occur because the internal states partly defining the context of the extinguished CS have changed.

In addition to spontaneous recovery, there are a variety of other situations in which a change of context causes a CS that has undergone extinction to again elicit a response (Todd et al., 2013). In renewal, extinguished responding returns when the CS is tested in a different environmental context from the extinction context (e.g., Bouton & Bolles, 1979). In reinstatement, the response to the CS recovers even if the US is presented by itself again after extinction (e.g., Rescorla & Heth, 1975), because the US was part of the original training context but was not part of the extinction context. In rapid reacquisition, responding to the CS can return quickly if CS–US pairings are resumed after extinction (e.g., Napier, McRae, & Kehoe, 1992).

Extinction and recovery mark an important advance in the control of action. An action may not be appropriate in all contexts. Servers may not salivate to the smell of the food they bring because the context for them is different than that for the customers. They are not about to eat the food. The additional neural complexity of pathways from multiple sensory neurons to a response neuron makes it possible to control an action so that it is only made in an appropriate context. This is accomplished through an increased sophistication of the nervous system, which operates not only through activation, but through inhibition as well. The appropriate contexts for a response are selected by inhibiting it in contexts where it is inappropriate (Bouton & Bolles, 1979; Napier et al., 1992; Rescorla & Heth, 1975; Todd et al., 2013).
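The idea that extinction adds context-specific inhibition, rather than erasing the original conditioning, can be sketched as a toy model. Everything in the sketch is an assumption made for illustration: the class name `ContextGatedCS`, the learning rate, the response threshold, and the context labels are hypothetical and are not taken from the studies cited above. The excitatory strength of the CS is left untouched by extinction, and inhibition is stored separately for each context.

```python
from dataclasses import dataclass, field


@dataclass
class ContextGatedCS:
    """Toy CS pathway: one excitatory weight plus per-context inhibition."""
    excitation: float = 0.0
    inhibition: dict = field(default_factory=dict)  # context label -> strength
    threshold: float = 0.5  # hypothetical response threshold

    def acquire(self, trials, rate=0.3):
        # CS-US pairings strengthen excitation toward an asymptote of 1.0.
        for _ in range(trials):
            self.excitation += rate * (1.0 - self.excitation)

    def extinguish(self, context, trials, rate=0.3):
        # CS-alone trials build inhibition that is tied to the current context;
        # the excitatory weight itself is not reduced.
        for _ in range(trials):
            current = self.inhibition.get(context, 0.0)
            self.inhibition[context] = current + rate * (self.excitation - current)

    def responds(self, context):
        net = self.excitation - self.inhibition.get(context, 0.0)
        return net > self.threshold


cs = ContextGatedCS()
cs.acquire(trials=10)
cs.extinguish(context="extinction room", trials=10)

print(cs.responds("extinction room"))  # False: the response is inhibited here
print(cs.responds("training room"))    # True: renewal outside the extinction context
```

Because inhibition is indexed by context, the same CS is silent where it was extinguished and effective elsewhere, which is the pattern described for renewal.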

To summarize, conditioning has characteristics that make it useful for controlling action: A stimulus must predict some other stimulus that elicits action—that is, the US–unconditioned response pair (Lubow, 1965). In addition to the US, additional circuits between other stimuli and the unconditioned response may define the contexts in which the response is appropriate and the contexts in which it is not.

The sophisticated mechanisms for responding to stimuli in the environment provided the context for the evolutionary adaptation that crossed the boundary from producing phenomena that may be characterized as purely behavioral to phenomena that are cognitive. This is the evolution of trace conditioning.

Delay versus trace conditioning

For conditioning to occur, the contextual stimulus must begin prior to the US. The most common and robust form of conditioning is called delay conditioning. Delay conditioning occurs when the US begins in the presence of the contextual stimulus. For example, a tone is sounded, and while the tone is still being sounded a puff of air is sent to the eye, so that the termination of the tone and the application of the air to the eye coincide (Thompson, 1986). In humans, the neural pathways for delay conditioning for avoidance reflexes are clustered in two structures of the brain, the cerebellum and the amygdala (Eichenbaum, 2008). In order for delay conditioning to occur, it is only necessary for the US to occur after the onset of the contextual stimulus, so that the contextual stimulus begins first and then the contextual stimulus and US occur together. Neither memory, nor learning, nor even a perceptual system is required for two simultaneous stimuli to affect the nervous system so that subsequent behavior is altered.

At the top of the midbrain in mammals is the forebrain, which contains numerous structures, including an anatomical area called the hippocampus. The hippocampus makes an additional form of conditioning, called trace conditioning, possible. Trace conditioning describes the situation in which the contextual stimulus ends before the US begins. Hence, there is an interval between the end of the contextual stimulus and the onset of the US, which may be as brief as half a second. In order for conditioning to occur, some representation of the contextual stimulus, called a trace, must remain in the brain after it has ended—thus, the name trace conditioning (Beylin et al., 2001).
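The timing definitions of delay and trace conditioning can be stated compactly in code. The sketch below is a minimal illustration under stated assumptions: the function names and the example times (in seconds) are hypothetical, and the boundary case in which the US begins at the exact moment the contextual stimulus ends is treated as delay conditioning, because no gap has to be bridged.

```python
def conditioning_procedure(cs_onset, cs_offset, us_onset):
    """Classify a single trial by the timing of the contextual stimulus and US."""
    if cs_offset <= cs_onset:
        raise ValueError("the contextual stimulus must end after it begins")
    if us_onset <= cs_onset:
        # The contextual stimulus must begin before the US for conditioning.
        return "none"
    if us_onset <= cs_offset:
        # The US begins while the contextual stimulus is still present.
        return "delay"
    # The contextual stimulus has ended; only its trace can be conditioned.
    return "trace"


def trace_interval(cs_offset, us_onset):
    """Gap between the end of the contextual stimulus and the onset of the US."""
    return max(0.0, us_onset - cs_offset)


print(conditioning_procedure(0.0, 1.0, 0.75))  # delay
print(conditioning_procedure(0.0, 0.5, 1.0))   # trace
print(trace_interval(0.5, 1.0))                # 0.5
```

The second trial leaves a half-second gap, so a representation of the tone has to persist across that interval for the US to be associated with it.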

The brain must contain some mechanism for preserving the trace of the earlier stimulus. In addition to the structures necessary for delay conditioning, trace conditioning requires a functioning hippocampus. Both the cerebellum and the hippocampus are active during trace conditioning of the eyeblink reflex (Tseng, Guan, Disterhoft, & Weiss, 2004), and both the amygdala and the hippocampus are active during trace fear conditioning (Desmedt, Marighetto, Garcia, & Jaffard, 2003). Damage to the hippocampus eliminates trace conditioning but leaves delay conditioning intact. Furthermore, if the hippocampus is damaged after trace conditioning, the reflex continues to respond to the CS. So the role of the hippocampus in trace conditioning is specifically to associate the US with the trace of the novel stimulus during training. Once that association has been formed, a functioning hippocampus is not required in order to preserve it.

Of course, once the trace of a no-longer-present stimulus is being conditioned, memory has become an ability. It is no longer the case of stimuli eliciting responses. Memory requires intervening mental representation over time, which marks the beginning of cognition. Furthermore, the elicitation of the response is not the consequence of a single stimulus but may be the result of a set of stimuli that together provide a description of a specific context. When “description of a specific context” is a correct description of the stimulus set, this set is a representation of the world, so perception has also been added as a cognitive ability. In fact, the hippocampus receives input from both the visual and auditory perceptual systems (Glass, 2016). As we shall see below, it uses visual input to construct mental maps that direct exploration of the environment (Stensola et al., 2012).

Improvisation and habit

Given the sophisticated, context-specific, instinctual behaviors that are possible without perception, memory, or learning, one might ask why these cognitive abilities evolved. One possible explanation is the increasing unpredictability of the environment. The main source of the unpredictability would have been the variety of increasingly sophisticated attack behaviors of predators and the increasingly sophisticated defense behaviors of prey. An animal whose locations and defenses are predictable will eventually attract a predator that can find and overcome it. An animal whose attacks are predictable will eventually find itself in an environment with prey that can outwit it. When an animal’s behavior is no longer context-specific, when it is no longer the case that innately specified stimuli elicit innately specified responses, but instead the animal can explore and improvise actions in novel environments as the range of contexts in which it can find food and shelter is extended, its chance of survival increases (Glass, 2016).

Suppose that an improvised action in response to a threat or opportunity is successful. The value of the action is greatly increased if it can be remembered and repeated when a similar situation arises. Consequently, learning was added to the roster of cognitive abilities. To enable cognition, a forebrain grew on top of the brain stem that neither replaced nor bypassed the brain stem, but instead extended into the brain stem and repurposed operations that had originally evolved for controlling specific behaviors so that they could be recruited for more general use in the construction of improvisational—that is, voluntary—actions.

Voluntary action and learning are completely different from conditioning. In conditioning, the environment controls an animal’s actions by activating a reflex. For example, when you salivate, you do not choose to salivate. This occurs as the result of an external stimulus. However, when you pour yourself a glass of water and put it to your lips, you chose to perform these voluntary actions. In Pavlov’s classic experiment, a bell came to elicit salivation (Pavlov, 1927). In contrast, in Skinner’s classic experimental paradigm, an animal is given the opportunity to press a bar to obtain bits of food or water (Skinner, 1938). Given this opportunity, an animal will press the bar repeatedly to obtain the reward of food or water. In conditioning, a stimulus precedes and elicits an automatic response. In learning, a reward follows the voluntary action and increases the probability that it will be performed again.

Nevertheless, some animal behaviors have both improvisational and instinctual components. Neural pathways define the kinds of actions that will be elicited by specific rewards. Actions associated with feeding are increased by food rewards (Timberlake & Lucas, 1989). For example, in hamsters, a food reward will increase the frequency of foraging behaviors—for example, digging—but not of activities unrelated to foraging or feeding—for example, grooming (Shettleworth, 1975). Some birds improve the construction of successive nests, indicating some degree of learning from previous attempts (Walsh et al., 2011).

Finally, as we shall see below, the advantages of fast, automatic, context-specific, and instinctual behaviors controlled by the brain stem are recapitulated by the habit system in the forebrain.

To summarize, voluntary action and learning do not appear to have evolved all at once as replacements for complicated, context-specific, instinctual behaviors. Rather, they appear to have emerged as an ability to modify instinctual behaviors so that they would be effective in a wider range of contexts, and might improve in effectiveness when repeated in the same context. For mammals, these advantages appear to have been significant enough to result in an increase in the size of the forebrain, so that perception of the world came to dominate the detection of stimuli and the ability to improvise motor actions in new contexts and retain successful ones became a useful supplement to instinctive behaviors elicited by innately specified stimuli. Brain systems that improvise motor actions and retain successful ones are cognitive systems. Improvising motor actions and retaining them are descriptions of the core cognitive abilities of learning and memory, in the context of their core function of generating effective actions.

Furthermore, once the neural machinery was in place to inhibit motor actions by inhibiting the response neurons that initiated muscle movements, that machinery could be generalized to the inhibition of neurons that performed perceptual processing, as well (Knight, Hillyard, Woods, & Neville, 1981; Remington, 1980; Sperling & Weichselgartner, 1995). Consequently, perception is not a case of passive stimulation, and attention, which is mental action, is on the roster of cognitive abilities. Hence, it is not surprising that, as I described above, exactly the same cortical regions in the parietal and frontal cortices that were first discovered to be involved in motor action were later found to be involved in mental action, as well (Boecker et al., 2002; Braver et al., 2001; D’Esposito et al., 1998; Doyon et al., 1997).

Cognition did not somehow begin without purpose, independently of motor action, and then find a purpose in making motor responses more effective. Likewise, cognition did not begin when the plainly cognitive abilities involved in motor improvisation, learning, and repetition in an appropriate context were somehow duplicated in different neural circuitry for an abstract, nonmotor domain. Rather, the cognitive operations that controlled motor neurons were extended to perceptual neurons, so that the same cortical operations control both motor and perceptual—hence, mental—actions, as is evidenced by the fact that whether an action is motor or mental, it originates from the same region of the brain and passes through the same central subcortical transmission centers, the basal ganglia and the thalamus (Schmahmann & Pandya, 2008).

Organization of this report

At the end of the last century, long-standing lines of research finally revealed the organization of the neural systems underlying mammalian cognition. Three closely integrated neural systems—the improvisational subsystem of the instrumental system, the habit system, and the emotional–arousal system—provide the computational machinery of mammalian, including human, cognition (Knowlton, Mangels, & Squire, 1996a; Packard, 1999; Packard, Hirsh, & White, 1989; Packard & McGaugh, 1992; Packard & Teather, 1997). The instrumental system makes use of sensory information to construct a perceptual representation of the world, and the improvisational subsystem constructs ad-hoc actions to novel targets (Stensola et al., 2012). The habit system encodes sequences of both motor and mental actions and initiates previously successful actions to familiar targets (Sinha & Glass, 2017; Ziessler & Nattkemper, 2001). The emotional system evaluates both targets and the consequences of actions (Packard et al., 1989).

This review consists of six sections. First, foundational evidence from studies of both animal (Packard & McGaugh, 1996) and human (Ziessler & Nattkemper, 2001) learning will be reviewed, revealing the importance of two fundamental operations performed by two different functional systems in the brain: the ability of the habit system (Yin & Knowlton, 2006) to encode both sequences of actions and perceptual sequences, and the ability of the instrumental system to use the level of activation in the perceptual system to encode the perceived recency/novelty of a visual target (Suzuki & Naya, 2014).

Second, an overview of the human instrumental and habit systems is provided, and the version of the dual-system model presented here (Glass, 2016) is compared with earlier versions (Squire & Zola, 1996; Ullman, 2004). To adequately describe the relationship between the instrumental system and the habit system, it will be necessary to go beyond the focus of this review on the recency/novelty and sequence generation operations, and briefly to describe two subsystems of the instrumental system. The stop-action subsystem (Wiecki & Frank, 2013) enables the instrumental system to stop activity by the habit system when an unexpected target is encountered. The semantic subsystem makes available to the instrumental system everything known that is relevant to the unexpected target, in order to increase the likelihood that the action taken in response to that target will be effective (Neely, 1977).

Third, those tasks are reviewed for which fMRI (Petrides, Alivisatos, & Evans, 1995) or the effects of brain damage (Ullman, 2004) have provided direct evidence for the role of each system. Fourth, the results of other tasks that lack confirming fMRI evidence but appear consistent with the dual-system hypothesis are reviewed (Atkinson & Juola, 1973, 1974). The third and fourth sections will demonstrate that an important advantage of the dual-system model is that it provides a unifying explanatory framework for what are currently treated as disparate findings, through the known effects of actual functional neural systems of the brain. Fifth, the implications of the theory for working memory, divided attention, and consciousness, which are the core concerns of this special issue, are described. Sixth, the contribution of dual-system theory to the understanding of cognition is summarized.

There is no doubt, from the results of the studies mentioned above, that both emotional arousal (Packard et al., 1989) and reward (Knowlton & Patterson, 2018; Yin & Knowlton, 2006) play causal roles in animal, including human (Cahill & McGaugh, 1998), memory, because animal learning and memory systems are designed to retain what is important, and emotional arousal is an evaluative measure of importance. However, one difference between human and animal cognition lies in the conditions that elicit behavior and produce learning. Animal behaviors are elicited by reward or emotion, or are instinctive behaviors such as nest building (birds), foraging (hamsters), or tunnel digging (rats) that emerge at specific ages or are elicited by internal or external stimuli. For example, field rats live in mazes of tunnels that they dig. It was this observation that led Small (1901) to construct a model of the Hampton Court maze to investigate how rats learn to navigate through mazes. Furthermore, Tolman and Honzik (1930) demonstrated that rats learn from instinctive exploratory behavior even if it does not produce an immediate reward.

In contrast, humans can and will perform novel tasks unrelated to any instinct, and can learn from mundane tasks without reward. Consequently, it is possible to investigate and describe the performance of the improvisational and habit systems in novel, mundane tasks without reference to the emotional system, so the role of the emotional system in human cognition will not be reviewed here.

Discovery of the improvisational and habit systems of mammalian cognition

As an illustration of the methodology and results that have produced evidence for two distinct systems of mammalian cognition, several illuminating experiments will be described in detail.

Place versus response learning

Maze learning has proved to be one of the most fruitful experimental paradigms ever invented. Two rival explanations of what a rat learns when it navigates a maze were investigated for decades. The response-learning hypothesis was that the rat learned a sequence of turns—for example, right, left, right—that navigated the choice points from the entrance to the goal (Thorndike, 1933). The place-learning hypothesis was that the rat created a mental map of its environment so that, even if its usual route to the goal were blocked, it could use the map to find a new path to the goal (Tolman, 1932). Nearly a century of research into maze learning produced conflicting evidence, until in the final decade of the 20th century, both hypotheses were conclusively shown to be correct.

Packard and McGaugh (1996) placed a barrier in a cross maze to create a T-maze for rats to learn. Therefore, the rat had to make only one turn in order to reach the goal to obtain the reward. At the end of one and again at the end of two weeks of practice, the barrier and start point were changed for one trial, effectively flipping the T-maze 180 deg. If the rat had learned to make a response—for example, to turn right—during the first week, the rat should have continued to make the same response of turning right when the T-maze was flipped. However, if the rat had learned where the reward was located, then the rat should turn in the opposite direction, to the left. In fact, after one week of practice, the rats turned left, clearly supporting the place-learning hypothesis. After two weeks of practice, the rats turned right, clearly supporting the response-learning hypothesis.

To determine the neural systems controlling these behaviors, Packard and McGaugh (1996) injected the anesthetic lidocaine into either the caudate or hippocampus for half of their rodent subjects. By administering a tiny amount of anesthetic, they deactivated just one structure, while the animal itself was awake and active. After one week of practice, deactivating the caudate had no effect on performance, indicating that it had no effect on the decision to turn toward the goal. However, deactivating the hippocampus resulted in the animals turning either right or left with equal probabilities, indicating that they no longer had access to a map containing the location of the goal. Thus, the hippocampus was part of the system for encoding mental maps. After two weeks of practice, deactivating the hippocampus had no effect on performance, indicating that it had no effect on the decision to continue to make the response that had until then produced a reward. However, deactivating the caudate resulted in the animals turning in the opposite direction, toward the goal. Therefore, when the caudate was deactivated, the rat’s response was no longer controlled by a system that caused it to repeat its previous response. When this control was eliminated, the animals instead relied on the mental maps that had already been learned after one week of practice.
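The six outcomes just described (two probe weeks crossed with no inactivation, caudate inactivation, or hippocampal inactivation) are all reproduced by a simple priority rule: a learned response controls the turn whenever it is available, otherwise the mental map does, and otherwise the animal turns at chance. The sketch below is a summary device rather than a model from the original study; the function name `probe_strategy`, the string labels, and the simplification that the response habit is available only at the second-week probe are assumptions made for illustration.

```python
def probe_strategy(probe_week, inactivated=None):
    """Return the strategy expected on the flipped-maze probe trial.

    probe_week: 1 or 2 (the two probe points described in the text).
    inactivated: None, "caudate", or "hippocampus".
    """
    if probe_week not in (1, 2):
        raise ValueError("only the probes after one and two weeks are described")
    if inactivated not in (None, "caudate", "hippocampus"):
        raise ValueError("unknown structure")

    # Assumption: the repeated-response habit controls behavior only by week 2,
    # and only if the caudate is functioning.
    habit_available = probe_week == 2 and inactivated != "caudate"
    # The mental map requires a functioning hippocampus.
    map_available = inactivated != "hippocampus"

    if habit_available:
        return "response"  # repeat the trained turn
    if map_available:
        return "place"     # turn toward the remembered goal location
    return "chance"        # neither system can guide the turn


for week in (1, 2):
    for structure in (None, "caudate", "hippocampus"):
        print(week, structure, probe_strategy(week, structure))
```

The week-2 caudate case is the informative one: removing the habit does not produce chance behavior but uncovers the map that was learned during the first week.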

Win–shift versus win–stay

Another functional difference between the medial temporal (including the hippocampus) system and the basal ganglia (including the caudate) system was revealed by another experiment. Packard et al. (1989) investigated the responses of rats to two different reward schemes when the rats searched a radial arm maze (eight arms radiating from a central platform) for food. These schemes are called, respectively, the win–shift and win–stay tasks. The win–shift task rewards exploratory behavior: On each trial, the animal must shift to a new pathway not previously investigated in order to obtain all the food. The win–stay task rewards routine behavior: The animal must repeatedly visit the same pathways in order to obtain all the food. Packard et al. (1989) trained different rats on the win–shift and win–stay tasks in a radial maze. In the win–shift task, all maze arms contained one reward each day, and the rat had to visit each arm just once and then shift to a new arm in order to obtain all the food. This task required that the rat remember where it had just been. Rats with caudate damage performed normally on this task, but rats with hippocampus damage were impaired. In the win–stay task, four of the eight maze arms were illuminated each day, and a rat could repeatedly search any one to find a reward. This task required that the same actions be repeated. Rats with caudate damage were impaired on this task, but rats with fornix damage (adjacent to the hippocampus) performed better than normal rats.
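
The contrast between the two reward schemes can be made concrete with a short Python sketch. It is a simplification under stated assumptions: arms are numbered 0 to 7, each win–shift arm holds exactly one reward, and every visit to an illuminated win–stay arm is rewarded (the actual rebaiting schedule is not described here). The function names are hypothetical.

```python
def win_shift_rewards(visits):
    """Rewards earned when every arm holds one reward that is not replaced.

    Only the first visit to each arm pays off, so revisits are wasted.
    """
    visited = set()
    rewards = 0
    for arm in visits:
        if arm not in visited:
            rewards += 1
            visited.add(arm)
    return rewards


def win_stay_rewards(visits, lit_arms):
    """Rewards earned when every visit to an illuminated arm is rewarded.

    Assumption: a lit arm is rewarded on each visit; unlit arms never are.
    """
    return sum(1 for arm in visits if arm in lit_arms)
```

Under these assumptions, visiting eight different arms earns eight rewards in the win–shift task, whereas returning four times to arm 0 earns only one. In the win–stay task with arms 0, 2, 4, and 6 lit, the same four returns to arm 0 earn four rewards, so repetition rather than exploration is the efficient strategy.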

Together, Packard and McGaugh (1996) and Packard et al. (1989) produced evidence for two distinct neural systems of mammalian navigation. A wayfinding neural system, which includes the hippocampus, constructed and made use of a mental map of the area when exploring a new area. Packard et al. showed that the map kept track of where the animal had and had not been, which ensured that the territory was completely explored. A route-following system, which includes the caudate, retrieved and executed a sequence of actions to move through a familiar route to a goal. Packard et al. showed that retrieving a sequence of actions that moved along a familiar route to a goal was an efficient way of reaching a goal in a familiar context. For example, the first time you are in a new supermarket, you must remember where you have already been while searching for items on your list. When you return to a familiar store, you retrieve and retrace the steps you made on previous trips in order to find the items you need.

The Packard et al. (1989) and Packard and McGaugh (1996) studies were part of a worldwide explosion of thousands of trans-species studies of animal navigation centered on cognitive maps in the hippocampus (Jacobs & Schenk, 2003), which was stimulated by discoveries of place cells in the hippocampus that confirmed the existence of such maps (O’Keefe & Dostrovsky, 1971; O’Keefe & Nadel, 1978), as well as grid cells in the adjacent entorhinal cortex that indicated how the maps might be encoded (Stensola et al., 2012). The intensive study of navigation revealed general mammalian mechanisms for learning and retention in the context of navigation that have subsequently been shown to be relevant to a variety of tasks and that have been extended to studies of human cognition (Patterson & Knowlton, 2018; Knowlton & Patterson, 2018; Pennartz, Ito, Verschure, Battaglia, & Robbins, 2011; Ullman, 2004). The instrumental system generated a perceptual recency effect that made the win–shift strategy possible for exploration but was applicable to other tasks. The habit system’s ability to retrieve sequences of actions made it possible to retrieve familiar route sequences that led right to the goal, but this ability was also applicable to other tasks.

Perceptual recency/novelty

On your first visit to the supermarket, it may be that each location you visit is encoded on a mental map. However, there are a variety of ways that the location information encoded in the map may be expressed mentally in order to guide a rat or human exploring a new area. One way is that areas just explored could be perceived as recent and areas not yet explored could be perceived as novel. In fact, Suzuki and Naya (2014) found that one function of the medial temporal system is to generate perceptions of recency/novelty. Clearly, the ability to keep track of what you have just seen or done is essential for performing any task requiring sequential actions—for example, preparing a meal or shopping for clothes. The general usefulness of this ability suggests that the medial temporal instrumental system has a functional role in tasks beyond wayfinding.

Retrieving motor and perceptual sequences

The ability to learn a sequence of actions is obviously essential to a variety of tasks that people routinely perform beyond traversing a familiar route, from getting washed and dressed to performing complex calculations. The same basal ganglia structures are active in all of these tasks (Doya, 2000), and the learning of action sequences is impaired by damage to the basal ganglia, such as in Huntington’s disease (Holtbernd et al., 2016). However, though originally found to be associated with motor control, parts of the basal ganglia have also been found to be active in a variety of cognitive tasks involving perception, attention, language, and planning (Doya, 2000).

The link between the role of the basal ganglia in learning action sequences and the role of the basal ganglia in learning perceptual sequences was established in a series of experiments on perceptual sequence learning (Nissen & Bullemer, 1987; Ziessler & Nattkemper, 2001). If an initially randomly generated sequence of targets is repeated, and a participant must make a classification response to each target—for example, a left key for green targets and a right key for red targets—each time the sequence repeats, the time needed to respond to each target decreases. However, when the repeated sequence includes more than nine items and/or when a participant is distracted by performing another task at the same time, there is little or no recognition or recall of the sequence. In this case, the participant has no awareness that he or she has been responding to a repeated sequence, and has no ability to recognize the sequence (Nissen & Bullemer, 1987).

Ziessler and Nattkemper (2001) showed that a participant does not encode a perceptual representation of the target sequence. Nor does a participant encode a representation of the response sequence. Rather, the habit system begins by encoding an action–effect pair, in which the effect is the target that immediately follows an action. Since each target cues a specific action, the action–effect pair cues the next action, which may be prepared in advance of the appearance of its target cue and performed as soon as this target cue appears. Hence, response time decreases.
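
A minimal Python sketch of this action–effect account follows. It is not the authors' model. It assumes that each target maps onto a unique action, that one exposure is enough to store a pair, and that a prepared response is faster than an unprepared one by an arbitrary amount; the millisecond values and all names are hypothetical.

```python
def simulate_response_times(targets, target_to_action,
                            unprepared_rt=500, prepared_rt=350):
    """Return one response time (ms) per target in the stream.

    The learner stores action-effect pairs: each action is linked to the
    target that followed it. When the stored effect of the previous action
    matches the target that appears, the response was prepared in advance.
    """
    action_effect = {}      # previous action -> target that followed it
    previous_action = None
    response_times = []
    for target in targets:
        predicted = action_effect.get(previous_action)
        response_times.append(
            prepared_rt if predicted == target else unprepared_rt
        )
        if previous_action is not None:
            action_effect[previous_action] = target  # encode the pair
        previous_action = target_to_action[target]
    return response_times
```

With four targets mapped to four keys, for example `{"A": "a", "B": "b", "C": "c", "D": "d"}`, and the stream `list("ABCD") * 3`, the first five responses are unprepared and the remaining seven are prepared, reproducing the qualitative drop in response time with repetition. No representation of the sequence as a whole is stored at any point, only the pairs.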

Voluntary action to specific locations has been found to be the engine that organizes action–effect pairs into sequences and drives the learning of a sequence of actions to both visual and nonvisual targets (Deroost & Soetens, 2006), including auditory targets (Hartman, Knopman, & Nissen, 1989; Hoffmann, Sebald, & Stöcker, 2001).

Sequence learning is not impaired in patients with amnesia from hippocampal damage (Nissen & Bullemer, 1987; Nissen, Willingham, & Hartman, 1989; Reber & Squire, 1998). This robustness with respect to hippocampal damage, together with the lack of declarative memory, indicates that sequence learning is not produced by the improvisational system.

Sequence learning is impaired in patients with moderate Parkinson’s disease (Deroost, Kerckhofs, Coene, Wijnants, & Soetens, 2006; Vandenbossche et al., 2013) and in patients with Huntington’s disease (Knopman & Nissen, 1991), which are the results of damage to the caudate and associated basal ganglia structures, respectively. Hence, sequence learning is produced by the habit system.

To summarize, when a repeated sequence is embedded in a much longer perceptual sequence, but the beginning and end of the sequence are not perceptually marked in any way, the repeated sequence is not recognized as having been seen or heard before, despite multiple repetitions (Glass, Krejci, & Goldman, 1989). Nevertheless, if an individual perceiving the sequence makes a classification response to each item of the sequence, the responses to successive repetitions of the sequence are faster, indicating that the sequence has been encoded (Nissen & Bullemer, 1987). This is because the habit system encodes each item of the sequence as the consequence of the motor response to the previous item. Subsequently, when the sequence is repeated and each item is presented, the motor response to each item serves as a cue that retrieves the next item in the sequence, so that before the next item is shown, the response to it has been prepared (Ziessler & Nattkemper, 2001). Damage to the habit system impairs both sequence learning and sequence retrieval (Knopman & Nissen, 1991), but damage to the improvisational system impairs neither (Reber & Squire, 1998).

Simultaneous recency and sequence retrieval

The finding that it was the habit system that serially generated a visual sequence to produce perceptual sequence learning led Sinha and Glass (2017) to investigate whether the distinct roles of the improvisational and habit systems could explain a paradoxical finding about immediate visual recognition. They found that paradoxical response times (RTs) for same–different judgments were caused by the distinct, complementary contributions of the habit system and the improvisational system to visual recognition. Participants had to respond as rapidly as possible whether successively presented four-consonant strings (e.g., RQBC vs. RQDC) were the same or different. The “different” RT was an increasing function of the first left-to-right position at which there was a difference between the study string and the test string, indicating serial, left-to-right generation of the study string by the habit system, which terminated when a mismatch between the just-generated study-string consonant and the consonant in the same position of the test string was found. However, the “same” RT was faster than the “different” RT, indicating that the improvisational system generated a “same” response on the basis of the perceived recency of the entire test string, without comparing it to the study string (Bamber, 1969; Proctor & Healy, 1987). Supporting this interpretation, Sinha and Glass found that “different” responses were associated with fMRI activation of the caudate and hippocampus, and “same” responses were associated with activation just of the hippocampus.
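
The two routes to a response can be sketched in Python as follows. This is an illustration of the qualitative pattern only: the timing parameters are arbitrary assumptions chosen so that the recency-based response is faster than any serial comparison, and the function names are hypothetical rather than taken from Sinha and Glass (2017).

```python
def first_mismatch_position(study, test):
    """1-based position of the first differing letter, or None if identical."""
    for position, (s, t) in enumerate(zip(study, test), start=1):
        if s != t:
            return position
    return None


def predicted_response(study, test,
                       recency_rt=400, base_rt=450, per_letter_rt=50):
    """Return (response, RT in ms) under the assumed two-route account.

    "different": serial, left-to-right generation of the study string that
    stops at the first mismatch, so RT grows with mismatch position.
    "same": a single recency judgment on the whole test string, with no
    letter-by-letter comparison.
    """
    position = first_mismatch_position(study, test)
    if position is None:
        return ("same", recency_rt)
    return ("different", base_rt + per_letter_rt * position)
```

For the example pair, `predicted_response("RQBC", "RQDC")` returns `("different", 600)` because the mismatch is at the third position, whereas `predicted_response("RQBC", "RQBC")` returns `("same", 400)`. A purely serial comparison would instead predict the slowest RT for identical strings, which is the paradox the two-system account resolves.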

Yin and Knowlton (2006) theoretical integration

Contemporaneous animal behavioral and neural research broadened the scope of the systems to tasks other than navigation, as was reviewed by Yin and Knowlton (2006). These researchers described the mammalian brain as organized into two distinct neural systems: an improvisational system that controlled the construction of ad-hoc voluntary actions in a novel situation, and a habit system that retrieved and executed a previously effective sequence of actions when a familiar task was performed in a familiar context. Wayfinding and route-following were the consequences of applying the improvisational system and habit system, respectively, to the navigation of novel and familiar locales.

Overview of the systems-and-skills approach to human cognition

The subcortical and cortical structures that compose the instrumental and habit systems in other mammals have the same functional roles in humans. However, the giant human cortex plays a proportionately large role in human cognition, so it is necessary to include functional paths between subcortical and cortical structures in the description of the human systems.

Improvisational subsystem of instrumental system

The instrumental system constructs the visual representation of the world and directs the construction of ad-hoc actions in response to novel targets. As such, the entire visual recognition system, also called the “low road” or ventral system (Goodale & Milner, 1992; Ungerleider & Mishkin, 1982), is a subsystem of the instrumental system. The visual recognition system extends from the occipital cortex down through the inferior lateral temporal cortex to the medial temporal cortex. The visual recognition system constructs a succession of more detailed representations of the environment until it ultimately constructs a representation that contains objects that may be the targets of actions (Goodale, 2000).

When you awake in the morning, you do not see colored shapes, but meaningful objects whose purpose you know. The instrumental system orients an individual to its immediate environment, and this includes actions that have just been performed. To this end, objects are perceived as recent if they have just been seen, familiar if they have ever been seen, and novel otherwise. When you use a clicker to move through television channels and you complete a circuit, you notice the recency of a channel you previously clicked on. As you go through the channels, shows that you have seen at some point in the past appear familiar, and shows you have never seen before appear novel. Recency and novelty are generated within the visual system as a by-product of the degree of habituation of the neuronal response. When a visual object is repeated—hence, detected by exactly the same neurons as its previous appearance—habituation necessarily reduces the response (Groves & Thompson, 1970). The habituated response is ultimately detected by the perirhinal cortex (Suzuki & Naya, 2014). Jointly, the perirhinal cortex and area CA3 of the hippocampus determine the degree of perceived recency of the visual target (Dimsdale-Zucker, Ritchey, Ekstrom, Yonelinas, & Ranganath, 2018; Ji & Maren, 2008). When you click through the channels, the novelty of each perception informs you that there is more to see, until you perceive a channel is recent. Different vendors provide different numbers of channels. However, a viewer can always improvise a single cycle through them, because a single perception of recency indicates that all have been seen.
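
The channel example amounts to a loop whose stopping rule is a single recency signal rather than a count. A minimal Python sketch, assuming that channel names are unique and that a set of already-seen channels stands in for perceived recency (both are simplifications, and the names are hypothetical):

```python
def cycle_channels(lineup, start=0):
    """Step through channels until one is perceived as recent.

    The viewer never needs to know how many channels exist: novelty means
    there is more to see, and the first recent channel ends the circuit.
    """
    seen = set()        # stand-in for perceived recency
    visited = []
    index = start
    while lineup[index] not in seen:
        seen.add(lineup[index])
        visited.append(lineup[index])
        index = (index + 1) % len(lineup)  # next click, wrapping around
    return visited
```

Starting from the second of three channels, `cycle_channels(["news", "sports", "movies"], start=1)` visits `["sports", "movies", "news"]` and then stops, and the same function works unchanged for a lineup of any length.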

Habit system

When the instrumental system constructs the visual representation of the world, it is usually the habit system that directs the response to it. Life is routine. Every day a person directs the same actions to the same visual targets. One central function of the habit system is to encode and retrieve sequences of actions for routine tasks. To this end, the habit system must have access to visual target information in order to cue and direct actions. Therefore, to perform its function, the habit system includes the perceptual–motor system, which uses visual information for directing action, also called the “high road” or dorsal system (Goodale & Milner, 1992; Ungerleider & Mishkin, 1982). The perceptual–motor recognition system extends through the parietal cortex. To encode and retrieve sequences of actions, the habit system also encompasses all the motor-planning areas of the frontal cortex.

The motor-planning areas of the habit system are the basal ganglia and, within the frontal cortex, the premotor area, the supplementary motor area, and the ventrolateral area, including left Brodmann areas (BA) 44 and 45 (traditionally called Broca’s area) (G. E. Alexander, DeLong, & Strick, 1986; De Renzi, 1989; Heilman, Watson, & Rothi, 1997; Hikosaka et al., 1996; Middleton & Strick, 2000a, 2000b; Mishkin, Malamut, & Bachevalier, 1984; Rizzolatti, Fogassi, & Gallese, 2000; Squire & Zola, 1996). These areas have evolved to encode not only motor sequences (Aldridge & Berridge, 1998) but also perceptual and abstract sequences (Boecker et al., 2002; Doyon et al., 1997; Graybiel, 1995; Howard & Howard, 1997; Saint-Cyr, Taylor, & Lang, 1988). Areas that have been found to encode and retrieve perceptual sequences are Broca’s area (Gelfand & Bookheimer, 2003; Maess, Koelsch, Gunter, & Friederici, 2001) and the caudate nucleus of the basal ganglia (Sinha & Glass, 2017).

Stop-action subsystem

Skillful behavior involves close coordination between the habit system and the improvisational system. As long as predicted targets appear on schedule so that planned actions may be performed, the activity is under the control of the habit system. However, when a predicted target does not appear on schedule, the control of action immediately transfers to the improvisational system, which directs the construction of an ad-hoc action to the unexpected target. So, when one walks or runs through the world, changing terrain is not an impediment to progress. An unexpected step or incline is momentarily addressed by the improvisational system, and then ordinary walking under the control of the habit system resumes.

The habit system performs sequences of actions to familiar targets. Some sequences of actions have endogenous stopping points, such as tapping three times or singing until the end of a song. Many other sequences of actions, such as walking a familiar route to a goal, are under exogenous and subcortical control. Such a control loop provides rapid, accurate movement but sacrifices the additional processing capacity necessary to detect the context when the action sequence is to be terminated. The task of goal detection falls to the improvisational system, which inhibits the habit system from continuing to perform a sequence of actions (Glass, 2016).

Unfortunately, a clear description of the neural mechanisms for behavioral inhibition is bedeviled by the fact that behavioral inhibition may be initiated at the neural level by activation of a neural system that ultimately causes an activity to cease. To avoid confusion between whether behavioral or neural inhibition is referred to, at the behavioral level, stopping (rather than inhibiting) an action will be mentioned, following a convention used by others (Aron, Robbins, & Poldrack, 2014). There are a variety of mechanisms for stopping habitual action in a variety of contexts.

An alert, such as a fire alarm, may automatically stop action via the orienting response. The orienting response is initiated by subcortical systems (Groves & Thompson, 1970). Orienting is an important survival mechanism that alerts an animal to unexpected threats and opportunities.

Goal-directed action stoppages are under the control of a stop-action subsystem that includes the dorsolateral prefrontal cortex, the inferior frontal gyrus, presupplementary motor area, the striatum, and the subthalamic nucleus. The stop-action subsystem functions as part of the improvisational system, to inhibit automatic action sequences to predictable targets when an unexpected target is detected (Aron et al., 2014; Wiecki & Frank, 2013). The specific role of each anatomical component of the stop-action subsystem is disputed (Swick & Chatham, 2014), but the overall function of the system is not in dispute. Seeing a stop sign while driving may induce the stop-action subsystem to release the right foot from the gas pedal so that a subsequent motor command can move it to the brake pedal.

The period of perseveration early in life reveals competition between the improvisational and habit systems for control of a response, and the need for a stop-action subsystem to bring an end to a routine (hence, habitual) response in the context of a novel target. The earliest age at which any infants can be found who can sit up and reach for an object is 5 months. In a task called the A-not-B task, when given a choice between a pair of objects in view, the infant reaches for the one indicated by the caretaker (L. B. Smith, Thelen, Titzer, & McLin, 1999). When the task is repeated and the same location is always indicated, the infant repeatedly reaches for the one in the indicated location. However, suppose that after six trials of reaching for the object on one side, the object on the other side is indicated?

Clearfield, Diedrich, Smith, and Thelen (2006) tested infants who performed the A-not-B task at 5, 6, 7, and 8 months of age. At 5 months, even though both objects were in plain view, and the infant clearly saw the one he was invited to reach for on that trial, 15% of the time the infant continued to reach to the location that had been indicated in Trials 1–6 but that was not the location indicated on Trial 7. This inability to alter a previously correct response when a new response is called for is known as perseveration. Over the next three months, the proportion of perseverative responses made by infants increased to 58%, 72%, and finally 85% of responses!

Early perseveration occurs because the development of the stop-action subsystem lags slightly behind the development of the habit system, disrupting the balance of control between the two, so that once a habit is established, even a purposeless reach to a location may be initiated by the habit system (L. B. Smith et al., 1999). Consequently, an infant may even reach to a now-empty previous location instead of to the desirable toy (Bremner & Bryant, 2001).

The stop system’s original function was the stopping of sequential motor action. However, it has evolved to stop perceptual processing, as well. As a result, it is involved in all sophisticated perceptual skills. For example, sequential eye movements are pointless unless visual processing occurs for the target at each fixation point. Furthermore, sequential eye movements are more useful if they are guided rather than blind. The sophisticated primate system simultaneously samples information from two distinct regions of interest in the visual field. One region is the current fixation point, and the other is an area in the visual field that has not been recently visited and that low-level detection of contrast indicates contains visual information. The primate system responds by, first, stopping visual processing at the fixation point; second, initiating detailed processing at the new area of interest; and third, moving the eyes to fixate on the new region of interest (McConkie & Rayner, 1975; Remington, 1980).

The operations performed by the subsystems for eye movement control, from the superior colliculus to the frontal eye fields, which are subsystems of the habit system and the instrumental system, do not innately generate a roster of scanning patterns for all possible tasks that humans might undertake. Rather, these systems provide the ability to learn task-specific visual scanning skills—for example, for driving, face recognition, and reading. During reading, nearly all visual processing to the left of and above and below the fixation point is stopped (McConkie & Rayner, 1975). This would not do for driving.

Furthermore, the skillful performance of a continuous task in which an occasional unexpected target is encountered requires close coordination between the habit and improvisational systems and the rapid stopping and starting of action. For example, as I mentioned above, during reading, at the same time that the word to the right of the fixation point is read, the beginning of the next word to the right is found, and a saccade to the location of its initial letter is initiated. Also, at the semantic level, the word most likely to be found next is predicted, and the time necessary to confirm the prediction through visual processing at the fixation point is scheduled in advance (Ehrlich & Rayner, 1981). Sometimes, insufficient time is scheduled for a fixation, and the eyes move to the next word before the previous word in the sentence has been identified. In this case, forward saccades are stopped, which makes it possible to perform a regression to the unidentified previous word (Crowder, 1982, p. 9). The entire process, including both the stopping and restarting of forward eye movements, is so rapid that a reader is unaware of both the forward eye movements and the regressions.

A rapid stop action makes divided attention possible because, to divide attention between two tasks, the processing associated with one task must be stopped before the processing of the other task is started. A rapid stop action makes complex span tasks, which require divided attention, possible. Complex span tasks are closely linked to fluid intelligence (Shipstead & Engle, 2013). Hence, the various functions of a stop action show that a single, central mechanism is recruited for various skills at all levels of the cognitive system.

Finally, stop action is not only necessary for skilled performance, but also for skill learning. Skill learning occurs because the repetition of ineffective actions is stopped, allowing only effective actions to be repeated (Lee & Byeon, 2014).

Semantic subsystem

When an unexpected target is perceived, the semantic subsystem rapidly makes available to the instrumental system everything in memory that is relevant to the unexpected target, to guide the selection of an action in response to it. In an experiment designed so that an unexpected but semantically related target word appeared 250 ms after the cue word, a decrease in time to respond to the target word indicated that a relationship between the words had been established that facilitated the response (Neely, 1977).

The semantic subsystem requires a review of its own, which is outside the scope of this report. It is mentioned here because even though it is widely distributed through the brain, it includes the medial temporal region. Consequently, a double dissociation between the effects of factors associated with basal ganglia activation and the effects of factors associated with medial temporal activation does not always indicate the different effects of sequence generation versus perceived recency. Instead, such a dissociation may indicate the different effects of sequence generation versus semantic retrieval. Three examples of this dissociation, for language comprehension, recall, and prediction, are described below.

Skills

Although the improvisational and habit systems make cognition possible, they do not by themselves produce cognition. Rather, they provide the computational operations that make it possible for people to learn skills through repeated action to perceptual targets. Cognition refers to the skills, not to the systems that make them possible. For example, people have feet and a neural system for moving them. However, the motor system by itself implies no particular movement at all. Rather, through practice people learn to roll over, crawl, cruise, walk, run, hop, skip, kick, stomp, dance, jump, and climb. There are several ways to kick, as well as many different dances. Each of these is a different skill that a person can learn to perform.

Comparison with earlier dual-system models

In earlier versions of the dual-system model, the habit system was associated solely with procedural learning and implicit effects on performance. In contrast, the improvisational system was associated with declarative learning and semantic memory (Cohen, Poldrack, & Eichenbaum, 1997; Squire & Zola, 1996; Ullman, 2004). The hypothesis presented here is that each system contributes a different set of computational operations that are recruited by the many different skills a person acquires through learning.

One problem with the declarative/procedural distinction is that it is not always clear whether a task is declarative or procedural. It is clear that learning to walk is procedural, but what is learning to talk? Recognizing words is clearly declarative, but is organizing them into novel, grammatical sentences procedural or declarative? Athletic skills appear procedural, but what are math skills? Presumably, learning the numbers and simple addition is declarative. Does addition become procedural when a math problem is solved, even though the individual is applying a declarative rule that has been learned? With practice, does math knowledge become procedural as well as declarative? For this approach to succeed, the different representations of the same information in the declarative versus procedural systems must be described, and the different neural systems that specifically encode declarative versus procedural information must be identified.

It is this point that poses the most immediate difficulty for making the procedural/declarative distinction a neuroanatomical distinction. Once it appeared that the basal ganglia could be identified with motor, hence procedural, learning. However, as was mentioned above (e.g., Doya, 2000), it is now known that the basal ganglia are involved in declarative learning, as well. There does not appear to be an identifiable neural procedural system.

Even if motor skills are learned and performed exclusively by the habit system, as they appear to be, it may also be that the operations of the habit system that are recruited by the skills that make motor action possible are also essential to a variety of cognitive skills. In particular, the habit system encodes both motor (Doya, 2000) and perceptual (Nissen & Bullemer, 1987) sequences through action–effect learning (Ziessler & Nattkemper, 2001). In contrast, the instrumental system uses habituation levels in the perceptual system (Groves & Thompson, 1970) to impute degrees of recency/novelty to perceptual targets (Suzuki & Naya, 2014). Furthermore, the operations that encode sequences in memory and generate them from memory, and the operations that generate perceptions of recency and novelty, are available for incorporation in the operation of learned skills for all kinds of cognitive tasks, including recognition (Sinha & Glass, 2017), recall (Butters & Stuss, 1989), prediction (Seger & Cincotta, 2005), and language (Ullman, 2004). So, both the improvisational system and the habit system play causal roles in declarative learning and memory.

More generally, the purpose of the improvisational system is to respond to novel targets, so it has maximum access to both perceptual and semantic information (Neely, 1977), and it has maximum flexibility to improvise a new response (Glass, 2016). In contrast, the purpose of the habit system is to make preplanned responses to expected targets (Glass, 2016). Hence, the fundamental distinction of the dual-system model proposed here is not between declarative and procedural, but between unexpected and routine. In support of this approach, skilled behavior has been shown to require a mixture of two kinds of computational processes: planned actions in response to predictable targets, requiring the generation of learned sequences, and ad-hoc actions to unpredicted targets, which are identified as novel or recent or are not immediately identified at all (Ehrlich & Rayner, 1981). The rapid integration of responses to the two kinds of targets during a single task is made possible through the stop-action subsystem of the instrumental system (Aron et al., 2014). Otherwise, the existence of regressions during reading is difficult to explain (Ehrlich & Rayner, 1981).

Evidence for the basic mammalian neural systems in human cognition

In the studies of human cognition below, neural activation was measured through fMRI. Evidence for the habit system is inferred from activation of the caudate (or Broca’s area, for language) accompanied by behavioral evidence of sequence retrieval from memory, and is also inferred from evidence of impaired sequence retrieval as a result of damage to the caudate (or Broca’s area) or associated structures. Evidence for the role of the improvisational system is inferred from activation of the hippocampus when a target was recognized and/or a temporal (recent vs. novel) judgment was made, and is also inferred from impaired recognition or temporal judgments that resulted from damage to the hippocampus or associated structures. The tasks that meet these criteria for both the instrumental system and the habit system are navigation, visual recognition (described above), language comprehension, and recall.

Navigation

Studies of human navigation have used fMRI to record neural activity when exploring a new area (wayfinding) versus traversing a familiar area to a goal (route-following; Baumann, Chan, & Mattingley, 2010; T. I. Brown, Ross, Tobyne, & Stern, 2012; Doeller, King, & Burgess, 2008; Hirshhorn, Grady, Rosenbaum, Winocur, & Moscovitch, 2012; Konishi et al., 2013; Marchette, Bakker, & Shelton, 2011; Wegman, Tyborowska, & Janzen, 2014). They have consistently found that wayfinding is associated with hippocampus activation, and route-following is associated with caudate activation.

Language comprehension

Ullman (2004) reviewed the evidence for the roles of the habit and improvisational systems in language comprehension. Activation of the improvisational system is associated with the improvisational functions of word recognition and comprehension. Activation of the habit system is associated with the habit functions of the sequential production and comprehension of sentences.

Activation in the temporal lobe, including the hippocampus and other medial temporal lobe structures, occurs during the processing of both nonlinguistic conceptual–semantic knowledge and lexical knowledge (H. Damasio, Grabowski, Tranel, Hichwa, & Damasio, 1996; Martin, Ungerleider, & Haxby, 2000; Newman, Pancheva, Ozawa, Neville, & Ullman, 2001). Activation in ventrolateral prefrontal cortex surrounding Broca’s area occurs during the syntactic processing of sentences (Caplan, Alpert, & Waters, 1998; Embick, Marantz, Miyashita, O’Neil, & Sakai, 2000; Friederici, 2002; Indefrey, Hagoort, Herzog, Seitz, & Brown, 2001; Moro et al., 2001; Ni et al., 2000; Stromswold, Caplan, Alpert, & Rauch, 1996). Syntactic processing also elicits activation in the supplementary motor area (Caplan et al., 1998; Newman et al., 2001) and the caudate (Moro et al., 2001).

Language disorders (or aphasias) are of two main kinds that correspond, in both symptoms and damaged areas, to the contributions of the habit and improvisational systems to the skills that make language comprehension possible (M. P. Alexander, 1997; A. R. Damasio, 1992; Goodglass, 1993).

Broca’s (or nonfluent) aphasia is caused by damage surrounding Broca’s area in the left ventrolateral frontal cortex, as well as the basal ganglia (M. P. Alexander, 1997; A. R. Damasio, 1992; Dominey, Hoen, Blanc, & Lelekov-Boissard, 2003; Dronkers, Redfern, & Knight, 2000; Goschke, Friederici, Kotz, & van Kampen, 2001), areas that, as was mentioned above, are integral to the habit system. This disorder is characterized by an inability to generate (Szelag, von Steinbuchel, & Poppel, 1997) and understand grammatical sentences. Broca’s aphasics have relatively good recognition and comprehension of content words (e.g., nouns, adjectives; Goodglass, 1993). As would be expected with damage to the basal ganglia, there is difficulty retrieving content words, despite the spared recognition of these words (M. P. Alexander, 1997; A. R. Damasio, 1992; Dronkers et al., 2000; Goodglass, 1993).

Broca’s aphasia is strongly associated with impairments in generation of other language and nonlanguage sequences, including articulation and complex learned motor skills involving sequences (M. P. Alexander, 1997; Dronkers et al., 2000; Goodglass, 1993). These aphasics have impairments learning new sequences, especially sequences containing abstract structure (Dominey et al., 2003; Goschke et al., 2001).

Wernicke’s (or fluent) aphasia is caused by damage to the left temporal lobe. This aphasia includes impairments in the recognition of the sounds and meanings of content words, as well as of conceptual knowledge (M. P. Alexander, 1997; A. R. Damasio, 1992; Dronkers et al., 2000; Farah & Grossman, 1997). However, fluent aphasics tend to produce syntactically well-formed sentences.

Recall

Paired associates are two-item lists. Consequently, if list retrieval is an operation exclusive to the habit system, then recall of an associate in response to a cue must involve the habit system. In fact, activity in BA 45 occurs during verbal recall (Petrides et al., 1995), indicating habit system activity.

However, when the rate of presentation of a study list is too fast to allow the habit system to construct a complete sequence for the study list, the fragmentary sequences retrieved are subjected to a recency check. When the words on a study list are all related to a common word that is not on the list, that common word is as likely to be falsely recognized or recalled as being on the list as the moderate-frequency words that actually occurred on the list are to be recognized or recalled (Roediger & McDermott, 1995). This effect can be explained by the effect of the list words on the perceptual fluency of the unpresented word, which raises its perceived recency.

Damage to the instrumental and habit systems confirms that each plays a role in recall. Amnesias are of two types: Those that affect the medial temporal region impair the encoding and retrieval of information from the semantic system, consequently affecting both recognition and recall (Butters & Stuss, 1989; Milner, Corkin, & Teuber, 1968). However, Huntington’s disease, which only affects the basal ganglia—hence, the habit system—impairs recall but not recognition (Tröster, Salmon, McCullough, & Butters, 1989), indicating the role of the habit system in recall.

General features of human cognition explained by the dual-system model

Some features of human cognition are also explained by the dual-system model, though imaging studies to establish the neural mechanisms have not been done. The remember–know distinction provides evidence of the effects of the instrumental system and the habit system, whereas rehearsal, voluntary action, and prediction provide evidence of the effect of the habit system.

Remembering versus knowing

As was mentioned above, the degree of habituation in the visual system for the processing of a visual target is used by the perirhinal cortex to generate a context-free perception of recency/novelty for a test item (Sinha & Glass, 2017; Suzuki & Naya, 2014). In contrast, the caudate may generate a sequence of targets that were the context in which the test item appeared (Sinha & Glass, 2017). Atkinson and Juola (1973, 1974) used a task in which a word list was followed by test items that were either list words or foil words not on the list, and participants responded whether each test item had appeared on the just-presented study list. Different lists were presented in succession, each followed by its test items. The foils in a test sequence could be either words that had appeared on earlier lists or words that had not appeared before. RTs for both list words and foils were faster when the foils were new words that had not appeared before, indicating that recency/novelty judgments were being made and were faster than responses based on retrieval of the study list and comparison with the test word. Furthermore, Glass (1993) and Kristofferson (1972a, 1972b) found that RTs were not linearly related to the length of the study list when new-word foils were used, confirming that these responses were based on the recency/novelty of the word. In contrast, RTs increased linearly as a function of the size of the study list when previously seen foils were used, confirming that these judgments were made by serially generating the study list and comparing each list member to the test word.
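The contrast between the two response routes can be made concrete with a minimal sketch. It is an illustration only, not a model fitted to any of the cited data: the function name, the 500-ms base time, and the 40-ms per-item comparison time are hypothetical values. The flat function for new-word foils is also a simplifying assumption, since the findings above establish only that RT was not linearly related to list length in that condition.

```python
def predicted_rt_ms(list_size, foil_type, base_ms=500.0, scan_ms_per_item=40.0):
    """Illustrative RT prediction for the two routes (all parameters hypothetical).

    foil_type "new": foils never appeared before, so a recency/novelty
    judgment suffices and RT is assumed not to depend on list size.
    foil_type "old": foils appeared on earlier lists, so recency cannot
    discriminate them from list words; the study list is assumed to be
    generated serially and each member compared with the test word.
    """
    if foil_type == "new":
        return base_ms
    if foil_type == "old":
        return base_ms + scan_ms_per_item * list_size
    raise ValueError(f"unknown foil_type: {foil_type!r}")


# Print the predicted RTs for several list sizes under each foil condition.
for n in (4, 8, 16):
    print(n, predicted_rt_ms(n, "new"), predicted_rt_ms(n, "old"))
```

Under these assumptions, only the previously-seen-foil condition yields a constant RT increment for each added list item, which is the signature of serial list generation described above.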

Also, in a task in which the study items are word pairs and the test items are intact versus rearranged word pairs (two words from different study pairs), the habit system must retrieve a study pair at test containing at least one of the test words and compare it to the test pair in order to determine whether the test pair was a study pair. Rotello and Heit (2000) had participants perform the task under a short response deadline that precluded retrieving the study pair. If pair retrieval were the only process available for determining whether a test pair had been a study pair, then error rates should have increased for both targets (intact pairs) and foils (rearranged pairs). Instead, only false alarms increased, presumably caused by fast recency responses of the instrumental system to the recently presented words of the rearranged pairs.

Troyer, Winocur, Craik, and Moscovitch (1999) found that the ability of the habit system to encode perceptual sequences gave it the ability to sequentially encode the contextual features of a study item—for example, its size, color, and location—and then to retrieve the list of those features at test. This provided an experimental framework in which the effects of factors that influenced the ability of the participant to encode the study context were compared with the effects of factors that influenced the perceived recency of the test item. A double dissociation between the effects of these two kinds of factors was inferred to demonstrate the separate effects of retrieval from memory versus perceived recency/novelty on recognition.

Troyer et al. (1999) found that when a distractor task interfered with study, the recognition of the contextual information available at test for retrieval by the habit system was reduced. In contrast, the recognition of the test item itself, based on a recency/novelty judgment by the instrumental system, remained intact.

On the other hand, factors that affect perception affect recency without affecting recollection of contextual information. Priming increases perceptual fluency, which increases recency, but does not influence recollection. For example, briefly flashing a word just prior to presenting it in a recognition test, visually presenting a word more clearly than other words in a test, revealing a word letter by letter rather than presenting the entire word, or presenting a word in a conceptually predictive rather than an unrelated context increases the perceived recency—hence, the recognition—of the target items, but has no effect on recollection of the context. Consistent with the results of Rotello and Heit (2000), mentioned above, fluency effects are observed in single-item recognition tasks, but not in associative recognition tests that require discrimination of intact versus rearranged pairs, when there is sufficient study time, but such effects are observed for both item and associative recognition tests when study time is extremely brief, thus limiting the encoding of information that would support recollection (Cameron & Hockley, 2000).

Also, changing the perceptual characteristics of words between study and test reduces their perceptual fluency, and hence reduces recency, but does not influence recollection. Changing the modality from study to test had larger effects in speeded than in nonspeeded test conditions (Toth, 1996), again suggesting that the manipulation affected recency rather than recollection.

Finally, across 32 intervening items in a continuous recognition test, recognition memory for single items decreased significantly, whereas memory for associative recognition remained unchanged (Hockley, 1991, 1992), suggesting that recency, but not recollection, decreased across these delays. A similar pattern of disproportionate forgetting for item recognition as compared to associative recognition was also seen in procedures in which a study list was followed by a distractor list before recognition (Hockley, 1991, 1992).

As was mentioned above, Glass et al. (1989) found that when the beginning of a repeated substring in a long number sequence was perceptually segmented, it was recognized in an immediate test. Participants verbally shadowed the string, character by character, which was sufficient for perceptual encoding but prevented list encoding. Consistent with the results of Hockley (1991, 1992), when the recognition test was delayed and the interval between study and test was filled with the shadowing of a distractor string, the repeated substring was not recognized. In contrast, in another condition, instead of shadowing each character, at the end of each substring a participant had to generate it. Since this required retrieving the string from memory, the task was presumed to activate list encoding by the habit system. Again consistent with the results of Hockley (1991, 1992), when the recognition test was delayed and the interval between study and test was filled with the shadowing of a distractor string, recognition of the repeated substring was not impaired.

Tulving (1985) named the judgments based on recency/novelty know judgments, and judgments based on context retrieval remember judgments. In an unfortunate siloization of knowledge, recognition studies since then, including those described above, have been carried out within the framework of remember–know judgments (Diana, Reder, Arndt, & Park, 2006; Yonelinas, 2002) and have not mentioned the dual-system model. Consequently, considerable ongoing research pursued in ignorance of the dual-system model has consistently produced supporting evidence for this model.

Rehearsal

Tulving (1962) found that even when a study set of words is repeatedly presented in different orders, subjects consistently recall the words they do recall in the same order, indicating that the list is learned because the habit system encodes it as a sequence. Hebb (1961) found that distributed rehearsal of a superspan string without mnemonic intent, or even awareness that the string was repeated, was sufficient to produce accurate recall of the string.

Voluntary action and retention

If the purpose of cognition is to make action more effective, then what is retained long-term from an experience is the consequences of voluntary actions rather than aspects of the perceptual experience unrelated to action. One example of this is rehearsal, mentioned directly above. In fact, this behavioral effect is among those best-established in psychology, described in relation to different tasks as the generation effect (Bobrow & Bower, 1969), the retrieval effect (Pan & Rickard, 2018), and the testing effect (Bjork, 1975). A sentence (for example, one containing the words car and tree) is twice as likely to be remembered if it is self-generated as if it is read or heard (Bobrow & Bower, 1969). Cueing retrieval of something increases its subsequent retention more than presenting it for restudy does (Pan & Rickard, 2018). Similarly, following study material with questions on it increases long-term retention more than additional study does (Bjork, 1975). What these three tasks have in common is that an individual performs a mnemonic act: a voluntary action whose purpose is the retrieval of information from memory. In all cases, subsequent long-term retention is increased for whatever is retrieved.

Prediction task

The ability of the habit system to retrieve a study item and compare it with a subsequent test item becomes an important tool for classification learning, because it can be used to detect discrepancies between the previous predictions it retrieves and the subsequent events with which they are compared (Seger & Cincotta, 2005). So an operation provided by the habit system may influence conscious decisions about the future by making error detection possible. Furthermore, the experimental paradigm for the prediction task involves making a motor response to a visual cue, which is followed by a visual outcome. Hence, at the operational level, participants are engaging in action–effect learning, which activates the caudate (Poldrack, Prabhakaran, Seger, & Gabrieli, 1999). Damage to the basal ganglia, as occurs in Huntington’s disease (Knowlton, Squire, et al., 1996) or Parkinson’s disease (Shohamy et al., 2004), impaired performance on the prediction task.

Voluntary action, working memory, divided attention, and consciousness

Consistent with the purpose of this special issue, this section describes how the action-based dual-system model provides explanations for three central topics in attention: working memory, divided attention, and consciousness.

Short-term/working memory

Within the framework of the dual-system model, the explanation of short-term or working memory is different from those in the extant models. In 1959, the term short-term retention appeared for the first time in the titles of two journal articles studying immediate retention of a short list (Peterson & Peterson, 1959; Tulving, 1959). The term working memory first appeared in the classic preface to the era of cognitive psychology, Plans and the Structure of Behavior (Miller, Galanter, & Pribram, 1960). Subsequently, the term short-term memory rather than short-term retention entered colloquial English, possibly because of the Scientific American article (Peterson, 1966) entitled “Short-Term Memory.” Over the half century since then, the terms working memory and short-term memory have been associated, sometimes treated as synonyms and sometimes defined as cause and effect.

One meaning refers to an observable behavioral effect. If a person is given a subspan string of three characters to remember but is prevented from rehearsing them, the probability of recalling the three characters declines over the next 18 s. Hence, the memory of the characters is short-term (J. Brown, 1958; Peterson & Peterson, 1959). This meaning of short-term memory makes it effectively synonymous with the recency effect, which describes the same phenomenon in a different context. If immediate recall of a superspan list is permitted, then the last list items, up to four of them, are recalled (Glanzer & Cunitz, 1966). However, if rehearsal of the last four items is prevented, then recall of the last four items is no better than that for earlier items on the list (Postman & Phillips, 1965). Hence, the better memory for the most recently presented list items when immediate recall is permitted is called the recency effect.

With regard to this meaning, it is unfortunate that short-term memory replaced short-term retention as the descriptive term. Like recency effect, short-term retention clearly refers to an effect, not a cause. However, short-term memory reifies the description, tendentiously implying that information is stored in a special kind of storage device: the short-term memory.

Consequently, the second meaning of short-term/working memory is the description of a special kind of short-term storage mechanism that presumably explains why the short-term/working memory effect occurs. According to this model, memory is divided into a sensory register, a short-term store containing whatever is within awareness, and a long-term store containing whatever is outside of awareness. Atkinson and Shiffrin (1968, pp. 90–91) described the model as follows:

The short-term store is the subject’s working memory; it receives selected inputs from the sensory register and also from long-term store. Information in the short-term store decays completely and is lost within a period of about 30 seconds, but a control process called rehearsal can maintain a limited amount of information in this store as long as the subject desires.

Atkinson and Shiffrin (1971, p. 83) provided the following rationale for the model:

Our account of short-term and long-term storage does not require that the two stores necessarily be in different parts of the brain or involve different physiological structures. One might consider the short-term store simply as being a temporary activation of some portion of the long-term store. In our thinking we intend to equate the short-term store with “consciousness,” that is, the thoughts and information of which we are currently aware can be considered part of the contents of the short-term store.

The implications of the model described by Atkinson and Shiffrin (1968, 1971) were explored extensively by Baddeley, beginning with Baddeley and Hitch (1974), and by Cowan (2005). Both added details to the model consistent with the original definition.

Randall Engle and his colleagues (Shipstead & Engle, 2013; Shipstead, Lindsay, Marshall, & Engle, 2014) added a variety of tasks requiring the retention of novel sequences and patterns to the study of short-term retention. Also, in addition to a simple span task in which retention is limited only by the suppression of rehearsal, Engle and his colleagues have studied the effect of complex span tasks, which are divided-attention tasks in which one of the tasks is a retention task. For example, in the seminal task, participants read aloud a sequence of sentences and remember the last word of each sentence (Daneman & Carpenter, 1980).

To summarize, as one moves from the Atkinson and Shiffrin (1968) model to the successive models of Baddeley and Hitch (1974), Cowan (2005), and Engle (Shipstead et al., 2014), the initial model is elaborated to increasing levels of complexity. Because of the variety of tasks currently designated as working memory tasks and the variety of explanations for short-term retention from the tasks, at the current time the term short-term/working memory without further specification may refer to any, some, or all of the tasks that produce only short-term retention, or any, some, or all of the explanations for the effect. What they have in common is that a rapidly decaying activation level determines the probability of immediate recall, and that activation is maintained through rehearsal.

In the dual-system model, there is no structure that corresponds to a short-term store, and the level of activation plays no role in the probability of recall. Working memory definitely does not refer to a neural structure that preserves perceptual or semantic information for a brief period of time. Neither the lack of a dedicated short-term store nor the lack of a role for rapidly decaying activation in recall is necessarily a bar to the adequacy of the dual-system model, because the hypotheses of a short-term store and of decaying activation have not by themselves been sufficient to describe the observed parameters of short-term retention.

At the time of Atkinson and Shiffrin’s (1968) original proposal, there were objections to their hypothesized short-term store and the causal role of the level of activation, on the grounds that the factors influencing retention in the hypothetical short-term store were identical to those that had been long-established for long-term memory (Melton, 1963). For example, immediate-recall failure for a study item is a consequence of proactive interference (PI) from the previous encoding of other similar study items (Keppel & Underwood, 1962). PI is clearly a causal factor in short-term retention, because when PI is eliminated through a shift to a different kind of study item, immediate-recall failure is also eliminated (Wickens, 1972; Wickens, Born, & Allen, 1963). Furthermore, release from PI is achieved by the presentation of a semantic cue that is not presented with the study item during the task, but only associated with it in long-term memory (Gardiner, Craik, & Birtwhistle, 1972).

Atkinson and Shiffrin (1968) accounted for the effect of PI and any other effects that appeared to occur in long-term memory by pointing out that in their model, an item in the short-term store was also in the long-term store, which also influenced its probability of recall, so effects on recall deriving from long-term memory could be accounted for. Although this is an adroit response, it only provides a basis for discounting one kind of evidence against a short-term store. It does not provide any positive evidence for a short-term store. It thus leaves open the possibility that if it is necessary to assume that a study item is in both short-term and long-term stores in order to account for short-term retention, then it may be possible to entirely explain retention on the basis of effects on the long-term store, so that the short-term store is superfluous.

Mundane information to be recalled is encoded by the habit system as a list of items, and the first list item, or if necessary an initial sequence of items that uniquely identifies the list, serves as a cue to the habit system for generating the list (Glass et al., 1989). Consequently, within the framework of the dual-system model, virtually all recall is cued recall, and recall failure is the result of an insufficiently specific cue or of competition among different lists in memory that begin with the same cue. In a free recall task, the individual must first generate specific cues for the target lists and then use those cues to generate the target lists (Glass, 2016). A specific cue is a cue that has perceptual features that are shared with only the target, hence uniquely specifying it, or are shared with only a small set of items in memory that includes the target (Glass, 2016).

In tasks that produce immediate-recall failure, study items are deliberately selected that do not have retrieval cues other than themselves, such as meaningless consonant trigrams. Consequently, the only way to recall them is to rehearse them, maintaining them as cues for themselves. If rehearsal is prevented, then no cue is available for recall. Furthermore, in studies of long-term memory, attempts to generate (not recall) a categorically identified target, such as an animal name or a meaningless trigram beginning with the letter v, have shown that priming in the improvisational system biases the target generated toward something that had been recently perceived (Graf, Shimamura, & Squire, 1985). Consequently, when an individual in the distractor task (J. Brown, 1958; Peterson & Peterson, 1959) is asked to recall successive study items that do not have retrieval cues other than themselves, on the first trial there is a strong bias in long-term memory to generate the single, just-presented study item. However, when the successive items are different from each other, for each successive item the bias to generate one of the increasing number of incorrect earlier study items increases. This explanation, which is derived entirely from the known properties of long-term memory, provides an account of the findings of both J. Brown (1958) and Peterson and Peterson (1959), as well as of the other studies of short-term retention mentioned above, that is as accurate and comprehensive as the Atkinson and Shiffrin (1968) model, without invoking an ad-hoc short-term store or a magical decaying activation that increases the probability of recall.
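The qualitative pattern described here, a decline in recall across distractor-task trials and a recovery when the kind of study item is changed, can be illustrated with a toy calculation. The sketch below rests on a deliberately crude assumption that is not taken from the cited studies: every item of the current kind studied so far, including the just-presented one, is equally likely to be generated at recall. The function name and the shift_trial parameter are hypothetical.

```python
def p_correct_recall(trial, shift_trial=None):
    """Toy probability of generating the current study item when rehearsal is blocked.

    Hypothetical assumption: all items of the current kind studied so far,
    including the current one, compete equally at recall. If shift_trial is
    given, the kind of study item changes on that trial, so earlier items
    no longer compete (release from proactive interference).
    """
    if trial < 1:
        raise ValueError("trial numbers start at 1")
    if shift_trial is not None and trial >= shift_trial:
        competitors = trial - shift_trial + 1
    else:
        competitors = trial
    return 1.0 / competitors


# Recall probability over four trials, without and with a shift on trial 4.
print([p_correct_recall(t) for t in range(1, 5)])
print([p_correct_recall(t, shift_trial=4) for t in range(1, 5)])
```

The equal-competition rule is chosen only for simplicity. Any rule in which the number of competing earlier items grows across trials would produce the same qualitative decline and release, with no decaying activation term in the calculation.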

Furthermore, in part to account for the dual representation of novel, meaningless study items in both the short-term and long-term stores, Atkinson and Shiffrin (1968) included in their model the assumption that every time a study item was rehearsed, not only was its activation level increased in the short-term store, but there was also some probability that it would be transferred to the long-term store. However, Craik and Lockhart (1972) showed that this assumption was false. Mere rehearsal by itself did not increase the probability of long-term recall of a study item. Only when the rehearsal involved the generation of a cue that was specific to the study item was subsequent recall increased, as is predicted by the functional characteristics of the habit system in the dual-system model.

Some quantitative predictions of the probability of recall that are derived from the cue-specificity explanation of short-term retention may be identical to the predictions of a decaying-activation model, because for superspan lists in both models, the probability of recall is determined by the speed, hence the frequency, with which rehearsals may be generated (e.g., Barrouillet, Bernardin, & Camos, 2004). However, the explanations are not identical. They differ in the domains of the tasks they cover. When Olton (1979) performed radial-arm maze experiments with rats to determine how they remembered where they had been, he described their memory of the visited arms as short-term memory. In Packard and McGaugh’s (1996) experiment, the memory controlling the rat’s response for a week might be called a short-term memory, to contrast it with the memory that controls the response after two weeks. Baddeley (2010) pointed out that if the explanation of the result of the J. Brown (1958)–Peterson and Peterson (1959) distractor task requires activation decaying over 30 s, then the attribution of short-term memory to the Olton and to the Packard and McGaugh results is use of the same label for entirely different phenomena. In comparison, the dual-system hypothesis provides a common explanation for all the phenomena. In the J. Brown–Peterson and Peterson distractor task, on the first trial the recency effect in the improvisational system biases it toward finding the cue that generates the target in the habit system, but on each subsequent trial the recency effect in the improvisational system increasingly biases it against finding the correct cue. In the radial-arm maze, the recency effect in the improvisational system keeps track of where the rat has been. In learning of a T-maze, recency in the instrumental system is used to identify the arm to turn into, until the habit system takes control of the task.

Consistent with the continuity hypothesis that working memory tasks engage exactly the same neural computational systems as long-term memory tasks, the cortical areas found to be active during working memory tasks are precisely the areas of frontal cortex that are components of the habit system: the premotor area, supplementary motor area, and left ventrolateral areas (Braver et al., 2001; D’Esposito et al., 1998; E. E. Smith & Jonides, 1999).

Divided attention

Also, as was mentioned above, performance on complex span tasks is correlated with performance on a variety of other tasks. As Baddeley (2010) pointed out, when the complex span task is considered to be another task in which the suppression of rehearsal prevents reactivation, then the connection with other tasks that do not involve short-term retention of a novel sequence is not clear. Within the framework of the dual-system model, the complex span task is a divided-attention task involving two subtasks, one of which is a short-term retention task. It is the divided-attention aspect of the complex span task that produces correlations with performance in other divided-attention tasks.

Because a person can perform only one voluntary action at a time, the action-based dual-system hypothesis incorporates a response bottleneck that limits the ability to divide attention across tasks. The rate at which voluntary actions can be started, stopped, and started again by the improvisational subsystem of the instrumental system is one of two factors that constrain the performance of two or more tasks simultaneously. It is well-established that there is a time cost for switching from one task to another (Jersild, 1927; Rogers & Monsell, 1995; Spector & Biederman, 1976). The second factor is the ability to accurately predict the occurrence of task-relevant targets and to prepare a schedule of actions in response to them so that each action will be performed by the habit system as each target occurs. In cases of extreme certainty, the response may be initiated before the target is detected, so that the resulting action coincides with target onset. As long as targets and target times are predictable, the habit system is able to achieve a high degree of speed, accuracy, and automaticity (Schumacher et al., 2001). In humans, within the habit system the ability to encode and retrieve sequences of targets has extended to the precise timing of when each target should appear so that rhythm is encoded (Meck & Benson, 2002; Schubotz & von Cramon, 2001; Szelag et al., 1997).

Consciousness

As was mentioned by Glass (2016), the definition of voluntary action implies consciousness: By definition, if someone is not aware of what they are doing, the action isn’t voluntary.

The reverse is also largely true. Seeing is almost always the result of looking, and hearing is the result of listening. So perceptual experience—hence awareness of the world, hence consciousness—is the result of voluntary action. Thoughts are certainly mental actions or the consequences of mental actions. It would be an overstatement to claim that every bit of awareness is the result of voluntary action, but not by much. Even when awareness is not directly caused by a specific voluntary action, it is closely related to voluntary action. Perhaps the clearest example of an experience that is not always caused by voluntary action is pain. However, pain exists to guide voluntary action in order to avoid self-injury. Without voluntary action, there would be no reason for pain to exist.

Hence, if the purpose of voluntary action is to enable an animal to devise a response in an unfamiliar situation, then that is the purpose of consciousness.

Though voluntary action is synonymous with consciousness, the automatic subcortical operations underlying voluntary action that were incorporated from instinctive behaviors play a role. Merker (2013) traced in detail the pathways ascending and descending from the superior colliculus, which is the control center for eye movements. In primates, this area is part of a larger system that also includes the cortical eye fields. Merker argued that the superior colliculus does more than just move the eyes; it collects visual information that is used to construct the egocentric view of the world that people experience. This information is transmitted by the superior colliculus to the pulvinar nucleus of the thalamus, and is distributed from there across the cortex.

The view expressed by Glass (2016) that voluntary action and consciousness are the same thing is similar to the view of Morsella, Godwin, Jantz, Krieger, and Gazzaley (2016). Morsella et al. similarly suggested that consciousness is an effect of voluntary motor action and that perception is closely associated with voluntary action. However, they suggested that the purpose of consciousness is to provide information relevant to alternative courses of action when conflicting inclinations toward different actions are present.

Clearly, these two hypotheses about the reason for consciousness are related. A familiar situation will not lead to conflicting inclinations, because experience will have strengthened the inclination toward a previously successful course of action. Consequently, both Morsella et al. (2016), implicitly, and Glass (2016), explicitly, consider consciousness an evolutionary response to novelty. In this situation, Morsella et al. explicitly argued that consciousness arose as a low-level mechanism for resolving conflict when the novel situation elicited conflicting inclinations for action. The role of consciousness is to represent all of the information relevant to the possible courses of action. However, ultimately, a course of action is selected subconsciously, on the basis of the strength of the various inclinations and their fit with the information eliciting them. Morsella et al. called this hypothesis the passive frame theory. In contrast, Glass (2016) assumed that the selection of an action is a conscious decision and presented a theory of how higher levels of consciousness emerged that is similar to other proposals that will be mentioned below.

Passive frame theory is a general theory of animal consciousness that does not distinguish between the consciousness of a fish and a human. However, there are apparent differences among different animals in their understandings of the world—and hence, likely differences in their conscious experiences of the world. These differences can be explained by the variety of skills enabled by voluntary action. As was described above, rather than having specific computational operations dedicated to specific instinctive behaviors, voluntary action makes it possible to recruit any set of operations provided by the cognitive systems to create new skills. For humans, it was the evolution of the ability to construct increasingly sophisticated social skills that caused the elaboration of consciousness.

Social organization obviously does not require voluntary action. Edward O. Wilson, a student of insects, pointed out from the start of Sociobiology (1975) that bees and ants have high degrees of social organization. However, social organization based on voluntary action has a qualitative effect on cognition that is not required by a social organization based on instinct. Members born into a society based on voluntary action must learn their roles. In a society based on instinct, no learning is required. The learning required by such a social system leads to the evolutionary development of learning abilities that facilitate social learning. Social organizations based on learning may be more varied and more responsive to their immediate environments than those based on instinct.

Social organization exists among animals because it facilitates survival through better foraging and protection and, as Wilson (1975) pointed out, through collective altruism. A platoon of scouts can cover more territory in the search for food than can a single individual. A defensive circle of the strongest members of a herd can protect the entire community, but an individual alone cannot watch its own back. If the members of a community care for each other when they are temporarily incapacitated, temporary incapacitation will not lead to death from starvation or predation for any of them.

Within the animal kingdom, we can identify a small number of hypersocial animals. These are the animals that can be domesticated (Diamond, 1997). These animals so enjoy the company of others that they can form trans-species social bonds that make domestication possible. For example, race horses so despise being alone that another social animal (a dog, goat, or bird) may be placed in a horse’s stall to provide a companion (Hillenbrand, 1999).

Humans are the most social animals (Dunbar, 2013) and have unique cooperative abilities (Tomasello, 2018). For perhaps as long as three million years, these social abilities have made human society so adept at dominating its environment that death from hunger, thirst, exposure, or predation by other animals decreased as a significant factor in human lives, and hence in the evolution of the species (Gurven & Kaplan, 2007). However, the consequence of the overwhelming effect of social organization on environmental hazards was not that evolution slowed down, but that it sped up. When the effectiveness of the social organization became the determining factor for the survival of all the members of the community, and status within the social hierarchy became the determining factor for the well-being of an individual within the community, cognitive ability and social ability became locked in a positive feedback loop, because increases in the former increased the latter, which in turn elicited further demands on the former (Dunbar & Sutcliffe, 2013; Tomasello & Call, 1997). New cognitive operations made it possible for humans to understand the intentions of others (Stuss & Anderson, 2004), which made both greater cooperation and greater deception in social relations possible (Dunbar, 2013; Frith, 2010; Prinz, 2012), allowing individuals to rise in the social hierarchy by one means or the other.

Finally, the invention of language less than 200,000 years ago (Pagel, 2017) certainly influenced human consciousness, because without human language, there would be no autobiographical memory, and hence only an impoverished sense of personal identity. Autobiographical memory begins at the point in life at which a child has sufficient skill to tell a story. At that point, a child begins to compose the story of his or her own life (Jack, MacDonald, Reese, & Hayne, 2009; Mullen, 1994; Reese, Haden, & Fivush, 1993).

Contributions of the dual-system model

A descriptive model of behavioral measures of cognition, no matter how precise and accurate, cannot be considered explanatory if it cannot be connected with the neural processes producing cognition. Furthermore, the domain of speculative computational operations that could explain behavior is much larger than the set of known neural processes underlying behavior, so the known neural processes narrow the plausible computational operations to those consistent with them.

One important contribution of the dual-system model is that it derives the computational operations producing cognition from the operation of well-established functional neural systems (Yin & Knowlton, 2006), and so may be considered an explanatory rather than a merely descriptive model. Furthermore, the connection between the neural and the computational has provided an extremely useful additional measure for the evaluation of computational models. On the one hand, a more complicated dual-process model of recognition (Sinha & Glass, 2017) may be selected over a simpler single-process model (Tanner & Swets, 1954) on the basis of clear evidence that two distinct neural systems contribute to recognition judgments. On the other hand, once the dual-process model is validated, other computational mechanisms and processes become redundant and may be eliminated, such as a short-term store and a role for rapidly decaying activation in recall.

The systems-and-skills approach clarifies the relationship between innate neural computational operations and the flexible organization of operations through learning to enable a variety of task-specific skills. It avoids the trap of inventing a new neural system for whatever skill a person may learn.

Also, an important purpose of this review has been to document that the habit system does more than control motor action. Rather, it evolved from encoding and initiating motor sequences to encoding and retrieving perceptual sequences (Sinha & Glass, 2017; Ziessler & Nattkemper, 2001). When redundant perceptual constraints on storage and activation are replaced by sequence generation by the habit system, it becomes apparent that when action is made the causal agent in cognition, a general explanation of cognition can be achieved, as opposed to a multitude of piecemeal explanations that are restricted to different time frames, contents, and contexts. For a variety of cognitive tasks, including navigation, recognition, and recall, there is clear behavioral evidence that the bulk of the results can be explained through the effects of the same two well-established neural systems (see Table 1).

Table 1. Common terms for areas of study in which the effects of perceived recency/novelty and sequence generation are causally relevant for different task domains

Finally, basing a cognitive model on neuroscience is a step toward explaining how cognitive systems and skills evolve, which in turn is a step toward understanding why those systems evolved and the skills were invented; this is a more comprehensive and satisfying theory of human cognition than those currently available. Improvisation versus habit is a more useful and more accurate framework for describing cognition than the declarative–procedural framework.

Author note

This review was written while the author was a visiting scholar at UCLA at the Bjork Learning and Forgetting laboratory. The author is grateful to Elizabeth Bjork and Robert Bjork for welcoming him into a stimulating research environment and for their extreme hospitality. No new data are reported in this review.