At the heart of science is an essential tension between two seemingly contradictory attitudes – an openness to new ideas, no matter how bizarre or counterintuitive they may be, and the most ruthless skeptical scrutiny of all ideas, old and new. This is how deep truths are winnowed from deep nonsense.

—Carl Sagan (1987)

In broad strokes, cognitive science examines mental life, including perception, attention, memory, reasoning, language, and related functions. As an example, consider a brief scenario depicting the beginning of a typical day: A young woman is awakened by her alarm at 6:00. Although she presses “snooze” out of habit, she quickly recalls an early meeting on her calendar and begrudgingly gets up. While walking her dog, various thoughts run through her mind: She mulls over ideas for a Mother’s Day gift, plans her workday, and reminds herself to pay bills later. While considering these topics, she also appreciates her surroundings, noticing trees and birds, and following her usual morning route. She happily notes that someone removed a sandwich wrapper that so interested her dog yesterday. Upon seeing an unfamiliar car in a numbered parking spot, she wonders whether new neighbors have moved in downstairs. After showering, she stands in her closet, sipping coffee and considering potential outfits, while a song runs through her head. Realizing that time is growing short, she quickly selects jeans and a sweater, and then hurries off to work.

Although simplified and schematic, the foregoing example captures the essence of cognitive life: The mind is constantly perceiving visual, auditory, and tactile stimuli. Objects are reflexively categorized as chairs, clouds, or toothpaste. Among these perceived objects, many are personally “known” (e.g., a favorite coffee cup), such that perceiving them also awakens stored memories and associations. Attention waxes and wanes, sometimes focusing on the outside world but often shifting to an inner train of thought. All the while, thoughts and behaviors are guided by memory, such as knowing what to do next. Language is pervasive in cognitive life: Even when not engaged in conversation (or reading, watching TV, etc.), an “inner dialogue” often characterizes private thought. In our example, the young woman’s behavior is affected by her memory about an early meeting; her attention is captured by the presence of an unknown car and by the absence of a known wrapper. She mixes daily routine with novel concerns, such as reminding herself to pay bills and thinking about Mother’s Day. Such experiences typify the automatic, constant life of the mind. Beyond this loose classification, all her behaviors share another similarity: None can be plausibly explained, or even meaningfully addressed, by the principles of embodied cognition.

Mind and body

Everyone knows that mind and body are deeply connected. While watching a potential home run sailing perilously close to foul territory, the entire crowd bends sideways in unison, expressing their hopes via posture. People explore the environment with constant head and eye movements, gather information from touch, and fluently use tools as extensions of their limbs. People with expertise in specific skills (whether athletic, surgical, etc.) are more adept than novices in perceptually discriminating good from poor performances (Helsen & Starkes, 1999). Thinking about emotional topics can increase heart rate and temperature. In more cognitive terms, physical needs (e.g., hunger) will involuntarily direct attention to relevant objects (e.g., food) in the environment. Engaging mental imagery for an action activates the premotor and motor cortices of the brain, which then affect muscle tension in the limbs that were imagined. When people hold their hands near visible objects, attention toward those objects is systematically altered (e.g., R. A. Abrams, Davoli, Du, Knapp, & Paull, 2008; Weidler & Abrams, 2013). There are many examples of bodily states affecting cognition, and cognitive states affecting the body.

Beyond such relationships, there are certain “problems” that are not amenable to cognitive analysis. For example, knowing a ball’s size and weight does little to help a person successfully throw it at a target, but briefly wielding the ball will enable a reasonably accurate toss (e.g., Zhu & Bingham, 2008, 2010). A more dramatic example comes from the ability of baseball outfielders to chase and catch fly balls. A baseball in flight follows a parabolic trajectory, affected by numerous variables (e.g., the angle of the ball leaving the bat, spin on the ball, wind). Despite these challenges, fly balls are governed by constant principles. Saxberg (1987a, 1987b) theorized that outfielders can assess key flight parameters by observing the early moments of fly balls, using them to predict where the ball will land. This proposed solution was not attractive for several reasons. For example, distance from home plate to the outfield would make precise visual assessment nearly impossible, and the angle of flight relative to the observer could systematically warp perception. Moreover, although nobody knows the computational abilities of the human mind, the degree of real-time calculation required by Saxberg’s approach would be daunting.

In contrast to this computational approach, more elegant heuristic solutions were proposed, requiring only perception and action. According to optical acceleration cancellation (Chapman, 1968; Fink, Foo, & Warren, 2009), the outfielder can align himself in the path of flight and, by running forward, can make the ball appear to move with constant velocity. By doing so, the outfielder will intercept the ball. A more general solution is the linear optical trajectory (McBeath, Shaffer, & Kaiser, 1995), wherein the outfielder runs in any direction that makes the ball appear to follow a straight line. By either strategy, the outfielder uses perceptual information to guide locomotion and uses locomotion to hold the perceptual information constant. Such a tight coupling of perception and action is consistent with premises from ecological psychology (e.g., Gibson, 1966, 1979) and suggests that some psychological “problems” can be solved with simple perception–action coupling, rather than complex computation.
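
To make the contrast with Saxberg’s predictive account concrete, the sketch below verifies the geometric fact that underlies optical acceleration cancellation: for a parabolic fly ball, the tangent of the ball’s elevation angle rises at a constant rate exactly when the fielder moves so as to arrive at the landing point as the ball does. The launch speeds, distances, and numerical details are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

# Toy check of the geometry behind optical acceleration cancellation (OAC):
# when the fielder runs (at constant speed) so as to reach the landing point
# as the ball lands, tan(elevation angle) grows linearly (zero optical
# acceleration); standing in the wrong spot produces nonzero optical
# acceleration. All parameter values are arbitrary illustrations.

G = 9.8                      # gravity (m/s^2)
VX, VZ = 20.0, 22.0          # ball's horizontal / vertical launch speed (m/s)
T = 2 * VZ / G               # time of flight
X_LAND = VX * T              # landing distance from home plate
X0 = 110.0                   # fielder's starting distance from home plate (m)

t = np.linspace(0.05, T - 0.05, 200)
x_ball = VX * t
z_ball = VZ * t - 0.5 * G * t**2

def tan_elevation(x_fielder):
    # optical variable the fielder monitors: ball height over horizontal gap
    return z_ball / (x_fielder - x_ball)

x_run = X0 + (X_LAND - X0) * (t / T)   # run at constant speed to the landing point
x_stand = np.full_like(t, X0)          # stand still at the starting position

for label, x_f in [("runs to landing point", x_run), ("stands still", x_stand)]:
    tan_theta = tan_elevation(x_f)
    optical_accel = np.gradient(np.gradient(tan_theta, t), t)
    print(f"{label}: mean |optical acceleration| = {np.mean(np.abs(optical_accel)):.4f}")
```

Numerically, the “run to the landing point” strategy yields essentially zero optical acceleration, whereas standing in the wrong place does not; the fielder can therefore succeed simply by nulling that one optical signal, with no explicit computation of the trajectory.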

In recent years, there has been growing interest in embodied cognition (EC), with many books and journal articles appearing every year (see Mahon, 2015, Fig. 1). A keyword search on Google Scholar using “embodied cognition” shows over 15,000 books and articles published since the year 2000. Despite such extraordinary levels of activity, EC is often vaguely defined, with various authors attempting to clarify what the field actually entails (e.g., Adams, 2010; Barsalou, 2008; M. Wilson, 2002). This ambiguity is surprising, especially when some EC articles discuss replacing standard cognitive theories (e.g., A. D. Wilson & Golonka, 2013) or unifying all branches of psychology (e.g., Glenberg, 2010; Glenberg, Witt, & Metcalfe, 2013; Schubert & Semin, 2009). Many proponents view EC as a paradigm shift for cognitive science – a claim that requires careful scientific consideration.

Fig. 1 The challenge of face to photo-ID matching. The faces and licenses (shown with fabricated names and addresses) match in half of the examples, mismatching in the other half. The answers are provided in Footnote 2.

A logical critique of embodied cognition

The present article offers a critique of EC, taking a different approach from prior critiques, which each examine one specific domain in detail. For example, imagine that an EC proponent conducts a study, finding that perception of action-related words (e.g., kick) activates areas of motor cortex associated with the leg (e.g., Hauk, Johnsrude, & Pulvermüller, 2004; Pulvermüller, Hauk, Nikulin, & Ilmoniemi, 2005). Given their theoretical perspective, the researchers suggest an “embodied” interpretation that effector-specific areas of motor cortex actually mediate (or modulate) word perception itself. Thus, rather than assume that (1) word perception leads to (2) motor priming, the causal chain is reversed, such that (1) motor simulation of the word kick leads to (2) perceptual appreciation of the word itself. Given this strong and counterintuitive theoretical interpretation, researchers with different theoretical perspectives can be expected to critically evaluate the study, asking whether the evidence truly merits such an elaborate account. For example, Mahon and Caramazza (2008) considered numerous experiments and claims about embodied language processing (and its neural bases) and found little compelling evidence for an EC account.

We characterize the foregoing as the “standard approach” to scientific debate: Some empirical arena is determined, data are collected, and interpretations are offered. This begins a cycle wherein experiments are parametrically extended, theories are challenged, and points and counterpoints are published. Such cycles are often fruitful in science, and can sustain research for years. However, they also induce tunnel vision – theoretical debates zoom in on specific phenomena while broader assumptions are rarely examined. Proponents of the controversial new theory typically “strike first,” choosing the empirical arena(s). In the case of EC, several such domains have emerged, including the role of motor systems in language perception (Glenberg, Sato, & Cattaneo, 2008a; Glenberg, Sato, Cattaneo, Riggio, Palumbo, & Buccino, 2008b; Pulvermüller et al., 2005; Sato, Mengarelli, Riggio, Gallese, & Buccino, 2008), well-known perception–action loops in behavior (Witt & Proffitt, 2005), various social priming phenomena (Denke, Rotte, Heinze, & Schaefer, 2014; Zhong & Leonardelli, 2008), and others. Having such empirical domains is beneficial, as they foster concrete debate. Problems can arise, however, because strong theoretical claims may appear reasonable when confined to specific domains, but deeply flawed when extended to broader analysis.

In contrast to the standard approach, our goal is to “zoom out” from specific empirical debates, asking instead what EC offers to cognitive science in general. To preview, we argue that EC is theoretically vacuous with respect to nearly all cognitive phenomena. EC proponents selectively focus on a subset of domains that “work,” while ignoring nearly all the bedrock findings that define cognitive science. We also argue that the principles of EC are often (1) co-opted from other sources, such as evolution; (2) vague, such that model building is not feasible; (3) trivially true, offering little new insight; and, occasionally, (4) nonsensical. In fairness, some cognitive phenomena (e.g., mental rotation) appear consistent with embodied accounts. Our overall message, however, is that EC cannot replace cognitive psychology (Chemero, 2011; Shapiro, 2011; Wilson & Golonka, 2013), nor can it illuminate a path toward unifying myriad branches of psychology (Glenberg, 2010; Glenberg et al., 2013).

In the late 1950s, a “cognitive revolution” occurred. At the time, behaviorism was the dominant psychological paradigm; it fostered great strides in certain domains (e.g., associative and operant learning) but precluded theorizing about internal mental events. Eventually, scientific interest in topics such as attention and memory increased, creating natural tensions with the restrictive behaviorist framework. A watershed moment in the cognitive revolution occurred when Chomsky (1959) published a review of Skinner’s (1957) book Verbal Behavior. In his book, Skinner had attempted to extend theoretical principles of behaviorism to the acquisition and use of language, for example suggesting that verbal behavior reflects stimulus–response associations. Chomsky wrote a long, incisive review, articulating Skinner’s claims and refuting each using commonsense counterexamples. Of particular relevance to the present article, Chomsky (pp. 51–52, original emphasis) noted that Skinner’s theoretical ideas quickly became vacuous when applied to actual language use:

Consider first Skinner's use of the notions stimulus and response. . . . A typical example of stimulus control for Skinner would be the response to a piece of music with the utterance Mozart or to a painting with the response Dutch. These responses are asserted to be “under the control of extremely subtle properties” of the physical object or event. Suppose instead of saying Dutch, we had said Clashes with the wallpaper, I thought you liked abstract work, Never saw it before, Tilted, Hanging too low, Beautiful, Hideous, Remember our camping trip last summer?, or whatever else might come into our minds when looking at a picture . . . . Skinner could only say that each of these responses is under the control of some other stimulus property of the physical object. If we look at a red chair and say red, the response is under the control of the stimulus redness; if we say chair, it is under the control of […chairness]. This device is as simple as it is empty. . . . The word stimulus has lost all objectivity in this usage. Stimuli are no longer part of the outside physical world; they are driven back into the organism. . . . It is clear from such examples, which abound, that the talk of stimulus control simply disguises a complete retreat to mentalistic psychology.

In the present article, we first identify the core ideas that characterize EC. We then consider classic findings from cognitive science (and mental life), asking the logical question, “Do the EC principles have anything to offer?” Our approach is simple, like Chomsky’s thought exercise above, wherein we merely consider whether embodiment helps us understand various phenomena, or whether it conveys any scientific leverage. Relative to us, however, Chomsky had one clear advantage (beyond his sense of style): Skinner was clear and explicit about the principles of behaviorism, whereas the principles of embodied cognition are quite challenging to distill.

What is embodied cognition?

Reading the literature on EC can be a vexing experience. This is true partly because different theorists range from “mild embodiment” to “radical embodiment,” with very different claims included under a single, umbrella term. Generally speaking, theorists on the “mild embodiment” side (e.g., the perceptual symbols theory, PST, of Barsalou, 2008) contend that knowledge is not acquired in a vacuum. Instead, all cognitive experiences are necessarily grounded in the sensory and motor contexts of their occurrence. Sensorimotor information critically shapes conceptual representations and, during online cognition, those same sensorimotor codes actively shape processing. According to PST, during perception, people register multimodal perceptual, motor, and introspective states. Later, when similar perceptual information is processed, these representations are reactivated (i.e., motorically simulated), which allows the perceiver to apply the sensorimotor information that was previously encoded. PST is essentially an exemplar theory of perceptual learning and generalization (Goldinger, 1998; Medin & Schaffer, 1978; Nosofsky, 1984; see Mahon, 2015, for a critique of “weak embodied” theories), with sensorimotor codes folded into the multidimensional representations of experiences.

At the other end of the spectrum, “radical embodied” theorists argue that mental representation is an empty and misguided notion (e.g., Chemero, 2011; Wilson & Golonka, 2013; see the “replacement hypothesis” from Shapiro, 2011). According to this view, cognition does not merely happen “in the head” but is a distributed system that extends to the body and the environment. As Wilson and Golonka (2013, p. 1, original emphasis) wrote:

The most exciting idea in cognitive science right now is the theory that cognition is embodied. […] Embodiment is the surprisingly radical hypothesis that the brain is not the sole cognitive resource we have available to us to solve problems. Our bodies and their perceptually guided motions through the world do much of the work required to achieve our goals, replacing the need for complex internal mental representations. This simple fact utterly changes our idea of what “cognition” involves, and thus embodiment is not simply another factor acting on an otherwise disembodied cognitive processes.

This stance from Wilson and Golonka reflects their focus on ecological psychology. As discussed with respect to catching fly balls, there are certain problems that can be elegantly solved with minimal cognitive mediation, at least in theory. Such problems usually require a person (or animal) to move through space, to wield objects, and so on. If we ask someone to throw a grapefruit at a distant target, no amount of thinking (e.g., use grip strength X . . . .) will help her achieve that goal – the problem is simply not “cognitive.” However, if she holds the grapefruit for a moment, her sensorimotor systems will calibrate themselves, helping her quickly prepare her throw. Based on such phenomena, some EC theorists have generalized, arguing that cognition writ large is achieved without representations. As we note below, and as others have argued (e.g., M. Wilson, 2002), this claim quickly fails when the vast majority of cognitive life is considered. For example, it would be exceedingly challenging to recall that Bill Murray starred in Groundhog Day without some stored representation of the movie. For now, we merely note that the label “embodied cognition” denotes a dramatic range of interpretations, with very different theoretical implications at different ends of the spectrum.

A related, challenging issue is that EC is often described in vague terms that are hard to “use” in a scientific sense. M. Wilson (2002) similarly noted that definitions and concepts of EC vary across publications, which hinders theoretical progress. Fortunately, for our purposes, we do not require a singular definition of EC. Like Wilson, we identify several key themes that collectively communicate the essence of EC, and allow us to evaluate its utility. As the name implies, the foremost principle of EC is that cognitive processes are fundamentally embodied. This is a challenging concept to articulate with precision, but Glenberg et al. (2013, p. 573) were quite clear:

In preview, the fundamental tenet of embodied cognition research is that thinking is not something that is divorced from the body; instead, thinking is an activity strongly influenced by the body and the brain interacting with the environment. To say it differently, how we think depends on the sorts of bodies we have. Furthermore, the reason why cognition depends on the body is becoming clear: Cognition exists to guide action. We perceive in order to act (and what we perceive depends on how we intend to act); we have emotions to guide action; and understanding even the most abstract cognitive processes (e.g., the self, language) is benefited by considering how they are grounded in action. This concern for action contrasts with standard cognitive psychology that, for the most part, considers action (and the body) as secondary to cognition.

This quote conveys several core themes of EC. First, cognition is inherently “influenced by the body.” This statement is ambiguous, but we will assume that EC excludes any obviously trite interpretations. Everyone surely agrees that cognition cannot occur without a living body, that a person cannot see an object without directing her eyes at it, and that cognitive functions will vary in response to biological factors such as fatigue, hunger, and inebriation. Thus, we articulate the basic claim of embodiment as follows: When a person processes information (e.g., perceiving an image or understanding a sentence), her body is involved in some nontrivial way, as a constraint or bias on processing, perhaps via simulation. For example, when viewing a coffee cup, perception is fundamentally shaped by the presence of a handle that can be grasped (e.g., Bub & Masson, 2010). Or, when hearing a sentence describing some action (e.g., Mike handed Tony a salami), the implied action is implicitly simulated, which constitutes understanding (Glenberg & Kaschak, 2002). Thus, the most important theme is that (1) cognitive processing is influenced by the body.

The second main theme from EC is that (2) cognition is “situated,” meaning that cognitive activity occurs in the surrounding environment and intimately involves perception and action. This theme from ecological psychology arises repeatedly in the EC literature. As with embodiment, we assume that “situated cognition” is not meant to convey anything trite. For example, a person can only see objects in her immediate surroundings, which is trivially true and offers no insight. Instead, our interpretation of “situated cognition” is that cognitive processes change (either qualitatively or quantitatively) based on the person’s goals and the immediate context.

A closely related theme, seen in the previous quote from Wilson and Golonka (2013), is that (3) cognition can be off-loaded to the environment. There is abundant evidence for off-loading, and common experience attests to its utility. People make lists to avoid holding information in memory, use objects in the environment as memory cues, and so on. As a concrete example, it is easy to look around a cluttered room but hard to memorize where everything is located. Thus, visual search appears “amnesic” (e.g., Horowitz & Wolfe, 1998): People will repeatedly fixate incorrect locations while searching for targets, relying on the stable environment rather than memory. In our view, such off-loading appears theoretically neutral: Whether one assumes embodied or “disembodied” cognition, it makes sense to hypothesize that humans evolved to use perception whenever memorization is unnecessary. More challenging is the idea that (4) the cognitive system extends into the environment. As M. Wilson (2002, p. 630) explained this idea:

The claim is this: The forces that drive cognitive activity do not reside solely inside the head of the individual, but instead are distributed across the individual and the situation as they interact. Therefore, to understand cognition we must study the situation and the situated cognizer together as a single, unified system.

Wilson (2002) found this hypothesis problematic and logically flawed. We merely note that “extended cognition” appears both trivially true and trivially false. As an example, if your eyes fall upon the word avocado, then “avocado” and its various associations become active in your mind. The environment has shaped cognition by driving perception, making the hypothesis trivially true. Nevertheless, none of the activated associations (e.g., avocado is common in California cuisine) are present in the environment, which makes it trivially false to claim that the word itself is doing any “cognitive work.”

Another prominent theme from EC is directly stated in the foregoing quote from Glenberg et al. (2013), specifically that (5) cognition is for action. In our view, this is an overextension of a commonsense, evolutionary idea. Obviously, perceptual and cognitive systems evolved to maximize survival, just like circulatory and digestive systems. Perception and action are intimately linked, and examples abound of cognition that supports action. At the same time, while watching television, a person exhibits a vast range of cognitive behaviors (e.g., perception, attention, prediction, memory, language processing), all while sitting on the sofa, explicitly avoiding action. Nevertheless, the idea that “cognition exists for action” is central to EC.

A final theme that arises in some versions of EC is that (6) cognition does not involve mental representations (e.g., Chemero, 2011; Wilson & Golonka, 2013). This is not a unanimous view, as many EC theorists (e.g., Barsalou, 2008) explicitly posit a role for representations in cognition. As we will repeatedly note, this hypothesis is deeply flawed and untenable; we include it here only to acknowledge its prominence in the EC literature. Taking all six themes together, we suggest that several can be further consolidated, such that EC can be characterized by three principles: (1) cognition is influenced by the body, including its potential actions; (2) cognition is influenced by the environment, including off-loading; and (3) cognition may not require internal representation. When evaluating how EC fares relative to classic cognitive phenomena, we mainly consider the first two principles, as theorists differ widely about the third.

What can you do with embodied cognition?

In the remaining sections of this article, we consider a wide array of classic findings in “laboratory cognitive science,” asking whether EC offers any meaningful scientific insight into each. Before beginning our review, we must address one serious limitation of EC, as it currently stands. Specifically, consider the first principle above: Cognition is influenced by the body. As a scientist, what can you do with this claim? Given the hypothesis, a researcher can select some cognitive behavior, create experimental conditions that induce different bodily states, and then test whether the behavior changes. An example is asking people to judge the steepness of a hill while wearing either a light or heavy backpack (Bhalla & Proffitt, 1999; Proffitt, 2006). A researcher could also present stimuli for some task while varying the “embodied cues” inherent to those stimuli, such as presenting words for lexical decision, including verbs that imply hand- or leg-specific actions, then measuring motor priming or cortical activity (e.g., Buccino et al., 2005; Pulvermüller et al., 2005). In social priming, people may be placed in a physiological state, then make social judgments (e.g., while sitting in a wobbly chair, people judge celebrity marriages as less stable; Kille, Forrest, & Wood, 2013). Perception–action studies show that athletic skill (e.g., in batting) affects relevant perceptual judgments (see Witt, 2011), suggesting that a person’s potential actions can selectively warp their perception.

To date, studies such as these have constituted the EC research program (along with copious theoretical writing). On the surface, this appears to be a robust domain, with great opportunity for scientific advancement. Nevertheless, there have been incisive critiques of EC research. For example, Firestone (2013) powerfully critiqued the theory of “paternalistic vision” that wearing a heavy backpack should change the perceived steepness of a hill (see also Durgin et al., 2009; Shaffer, McManama, Swank, & Durgin, 2013). Similarly, Mahon and Caramazza (2008; Mahon 2015) critiqued a broad array of experiments implicating motor activity as a mediator of language perception. In the present article, we avoid focusing on such debates because we wish to offer a novel analysis, focusing instead on what EC can offer cognitive science as a whole.

More critically, although the EC hypothesis motivates many experiments, it appears extremely challenging to incorporate into a formal model and is therefore limited to broad, qualitative predictions. How might we write an equation that expresses embodiment? How can the environment (such as the affordances of various objects) be parameterized? Pezzulo et al. (2011) considered the challenges of computational modeling for EC and suggested some potential directions. For the time being, however, EC is largely defined by a set of vague and flexible claims. It is beyond our present scope to address the role of formal models in cognitive science, but their importance has been well established (e.g., Farrell & Lewandowsky, 2010; Hintzman, 1991; Lewandowsky, 1993; Wagenmakers, van der Maas, & Farrell, 2012). We only emphasize two points. First, although researchers argue over details, cognitive science has a wealth of impressive models. There are drift-diffusion models that predict RT distributions in perceptual or memory tasks (Norris, 2009; Ratcliff & McKoon, 2008), neural networks that predict word naming RTs (Perry, Ziegler, & Zorzi, 2010), and classification models that predict human behavior with precision (Shin & Nosofsky, 1992). There are models of attention in space (Itti, Koch, & Niebur, 1998), memory creation and retrieval (Shiffrin & Steyvers, 1997), the control of eye movements in reading (Reichle, Rayner, & Pollatsek, 2003), and many others. Given such progress in so many domains, it is an affront to reason to see theorists calling for a “replacement” agenda, wherein we throw out everything and start over. Such a proposal is akin to suggesting that people stop traveling in airplanes and instead begin flinging themselves around with giant catapults.

Second, the issue regarding models has immediate relevance, as it determines how we can proceed. Specifically, our present goal is to ask the question: What do the core principles of EC offer when applied to classic cognitive findings? In the next sections, we convey the essence of various findings, and then consider a potential role for embodiment. However, because modeling is currently impossible in EC, we cannot ask our question in a standard, scientific manner. We cannot compare the adequacy of formal accounts derived from EC and standard cognitive approaches. Instead, out of necessity, our approach is similar to Chomsky’s (1959) thought exercise presented earlier. In considering each finding, our primary ground rule is that, for EC to be considered relevant, it must offer some insight into the behavior, without resorting to trivial arguments. For example, it would not be valid to argue that word naming supports EC simply because a person must use her body for reading and speaking. Instead, the essential question is, when considering a domain such as word naming, are there classic findings that are better explained from the EC perspective, relative to standard cognitive theories? Clearly, although we have tried to be fair, readers may disagree with some (or all) of our determinations. To preview, for nearly all classic topics – with the only exception being mental rotation – we find almost no logical or empirical support for EC.

Classic topics in cognitive science

Having set the stage, we now briefly consider a series of classic, textbook ideas from cognitive science. To avoid having this review grow unmanageable, we have selected nine topics that illustrate our point. (Table 1 lists other candidate phenomena that we could have addressed, but excluded for brevity.) Because each finding is well known, we provide brief explanations, just noting the basics.

Table 1 Twenty additional findings from cognitive science that appear challenging to explain from an embodied cognition perspective

Word frequency and related effects

The well-known Hebb (1949) learning rule states that concurrent activation of connected neurons will strengthen their shared connections. Therefore, repeated exposure to a stimulus increases the fluency of neural subpopulations responding to its presence. In word perception, a person encodes a spoken or printed string, which automatically activates its corresponding meaning and syntax. The more commonly any word is experienced, the faster and more robust its lexical access becomes – the word frequency effect. Perhaps no experimental variable has pervaded the cognitive literature to a greater extent: Frequency affects every word-perception task, and it moderates the impact of other variables such that common words are immune to variations in other lexical dimensions, but rare words show many effects. In addition to laboratory measures such as lexical decision or naming, word frequency predicts eye fixation durations in reading (Inhoff & Rayner, 1986; Staub, White, Drieghe, Hollway, & Rayner, 2010) and ERP waveforms that occur before overt responses are generated (Polich & Donchin, 1988).

Beyond word perception, frequency effects also arise in recognition memory (e.g., the mirror effect; Glanzer & Adams, 1990), but the effect is flipped. Whereas high-frequency words show advantages in perception, low-frequency words show advantages in recognition memory. Because of its ubiquity, word frequency must be addressed by any viable model of word perception; the most prominent accounts are connectionist (neural network) models that track the statistical properties of large word corpora (e.g., Perry et al., 2010; Sibley, Kello, Plaut, & Elman, 2008). With respect to memory, word frequency is assumed to correlate with distinctiveness, allowing greater differentiation of targets and lures (e.g., Wagenmakers et al., 2004).
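
As an illustration of how standard, representational accounts capture the perceptual side of the frequency effect, the toy sketch below gives each word a detector whose resting activation scales with log frequency, broadly in the spirit of classic logogen or interactive-activation accounts; frequent words therefore reach a recognition threshold sooner. The corpus counts, threshold, and rate parameters are all invented for demonstration.

```python
import numpy as np

# Toy frequency-sensitive word detectors: resting activation scales with log
# frequency (a hypothetical, simplified mechanism), so high-frequency words
# need less perceptual evidence to reach threshold. All numbers are invented.

rng = np.random.default_rng(0)

word_counts = {"the": 50000, "house": 5000, "salami": 50, "grebe": 5}  # made-up corpus counts
THRESHOLD = 1.0      # activation needed for recognition
INPUT_RATE = 0.02    # evidence accrued per time step while the word is in view
REST_SCALE = 0.05    # how strongly log frequency sets the resting level

def mean_recognition_time(count, n_sims=500):
    times = []
    for _ in range(n_sims):
        act = REST_SCALE * np.log10(count)            # frequency-scaled resting activation
        steps = 0
        while act < THRESHOLD:
            act += INPUT_RATE + rng.normal(0, 0.005)  # noisy perceptual input
            steps += 1
        times.append(steps)
    return np.mean(times)

for word, count in word_counts.items():
    print(f"{word:>7}: mean time to threshold = {mean_recognition_time(count):5.1f} steps")
```

The point of the toy is not the particular mechanism, but that the effect falls out of graded, experience-dependent representations; the reversal in recognition memory is likewise handled by representational assumptions (e.g., distinctiveness), as noted above.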

Can the core principles of EC help explain word frequency effects? The first principle is that “cognitive processing is influenced by the body.” Stated plainly, we cannot conceive of any embodied account for frequency effects, without resorting to the trivially true notion that people’s bodies are present every time they perceive (or produce) words. Even then, frequency effects would imply that different bodily states exist across trials in word-perception tasks, and that opposite bodily states exist across trials in recognition memory. In similar fashion, the second EC principle (that “cognition is situated” in the environment) cannot explain the frequency effect, unless it merely means that words are experienced in the environment. Finally, explaining frequency effects without representations appears impossible – they reflect a lifetime of linguistic memory, and that memory must reside in some form.

Perhaps more reasonably, an EC theorist could argue that word perception involves motor simulation, which becomes more fluent with expertise (see Footnote 1). For example, one might appeal to the classic motor theory (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman & Mattingly, 1985), which posits that speech perception is accomplished through recreation of the motor commands for speaking. Similar theories have been offered for handwritten word perception (Babcock & Freyd, 1988). By this view, word frequency could be construed as a motor fluency effect, consistent with EC.

Despite this possible account, matters become far more complicated for EC when other word-perception findings are considered. Specifically, there are myriad effects showing that perception of any given word is profoundly affected by its relations to other potential words in memory. Regularity and consistency effects (Monaghan & Ellis, 2002; Seidenberg, Waters, Barnes, & Tanenhaus, 1984) show that perception of a word, such as GAVE, is affected by the presence and “strength” of other similar words with different pronunciations (WAVE and SAVE are “friends,” but HAVE is an “enemy”). Neighborhood effects (Andrews, 1989; Ziegler & Muneaux, 2007) show that perception is affected by the sheer number of words that resemble any given word. Imageability and concreteness effects (Strain, Patterson, & Seidenberg, 1995) show that word perception can be affected by semantic factors. All these effects interact with word frequency. Considering EC, neither motor simulation nor “bodily influences” can explain effects that derive from stable relationships among “covert” words. Words that are not present in the environment, but exist in memory, affect the perception of words that are actually shown. All these effects (and the models that predict them) are inherently statistical, with fine-tuned tradeoffs among myriad, complex variables. They cannot reasonably be explained by reference to the body, or the environment, or a mind devoid of lexical representations.

Concepts and prototypes

People are capable of remarkable feats of categorization. When motivating the EC theory, Glenberg and Kaschak (2002) described Harnad’s (1990) symbol grounding problem: A foreigner lands in a Chinese airport, speaking no Chinese, with only a Chinese dictionary. This is characterized as an impossible problem because unknown symbols can only be mapped onto other unknown symbols. But, although the traveler cannot read the airport signs, is he entirely out of luck? If he stumbles across baggage claim, can he identify his suitcase on the conveyor belt? Can he discriminate employees from other passengers? Can he locate an exit and a taxi? The answer to all these questions is clearly “yes” – you can travel anywhere in the world and rely upon past experience to help you classify new objects or interpret situations.

People experience the world largely in categories, fluently recognizing tables and parrots and lemons that were never previously encountered. People also have strong intuitive ideas about category prototypes, their central tendencies or best representations (even for ad hoc categories, defined on the spot; Barsalou, 1983). Two major theories explain how people derive prototypes. According to prototype views (e.g., Homa, Sterling, & Trepel, 1981; Reed, 1972), as category exemplars are experienced, perceivers gradually abstract generalities across items, unconsciously generating prototypes, even without veridically experiencing them. According to exemplar views (e.g., Medin & Schaffer, 1978; Nosofsky, 1988), perceivers store each category example in memory; prototypes are emergent properties of the memory-trace population. (For an interesting discussion regarding the logical limitations of exemplar theories, see Murphy, 2015.) By either theory, perceptual classification is a hallmark of cognitive life: We constantly recognize new instances of known categories, using prior knowledge to mediate new perception.

Research on prototype abstraction largely stems from Posner and Keele (1968, 1970), who had participants learn to classify “dot pattern” stimuli into categories. Each categorized pattern was actually a distorted version of an unseen prototype, with different categories derived from different prototypes. In subsequent transfer tests, people classified old and new items, including the unseen prototypes. Posner and Keele (1968) found that prototypes elicited the best classification, relative to transfer patterns that were equally similar to other training patterns. Posner and Keele (1970) later found that if testing was delayed by a week, the unseen prototypes were remembered better than the actually studied patterns (Homa & Cultice, 1984; Omohundro, 1981). Similar results have been obtained hundreds of times, and in various populations, such as amnesics (Knowlton & Squire, 1993), newborn infants (Walton & Bower, 1993), and nonhuman animals (e.g., Smith, Redford, & Haas, 2008; Wasserman, Kiedinger, & Bhatt, 1988).
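
The logic of these studies, and the exemplar account of prototype effects, can be conveyed with a toy simulation: a never-presented prototype is distorted to create the “training” patterns, and a simple summed-similarity computation over the stored exemplars (in the spirit of Medin & Schaffer, 1978) nonetheless treats the unseen prototype as the strongest category member. The stimulus and similarity parameters below are invented for illustration only.

```python
import numpy as np

# Toy version of the Posner & Keele dot-pattern logic with an exemplar model:
# prototype effects emerge from stored exemplars, with no abstraction step.
# All parameters (pattern size, distortion, similarity gradient) are invented.

rng = np.random.default_rng(1)
N_DOTS, DISTORTION, N_TRAIN = 9, 3.0, 20

def distort(pattern):
    # jitter every dot to create a new category member
    return pattern + rng.normal(0, DISTORTION, pattern.shape)

def summed_similarity(item, traces, c=0.05):
    # exemplar-style evidence: similarity to each trace decays with distance
    dists = np.linalg.norm(traces - item, axis=(1, 2))
    return np.exp(-c * dists).sum()

prototype = rng.uniform(0, 50, (N_DOTS, 2))                       # never shown during "training"
memory = np.stack([distort(prototype) for _ in range(N_TRAIN)])   # stored training exemplars

sim_old = np.mean([summed_similarity(x, memory) for x in memory])
sim_prototype = summed_similarity(prototype, memory)
sim_new = np.mean([summed_similarity(distort(prototype), memory) for _ in range(200)])

print("summed similarity to category memory:")
print(f"  old training patterns : {sim_old:.2f}")
print(f"  unseen prototype      : {sim_prototype:.2f}")
print(f"  new distortions       : {sim_new:.2f}")
```

Because the prototype sits at the center of the stored traces, it garners the strongest summed similarity despite never having been presented, which is the emergent “prototype advantage” that both theoretical camps must explain.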

In what manner might EC help us understand prototype abstraction? More basic still, how might EC help explain ubiquitous perceptual classification, such as recognizing dogs or books? As stated by Glenberg et al. (2013, p. 573), “thinking is an activity strongly influenced by the body and the brain interacting with the environment.” Does this assertion illuminate how a person recognizes common objects, or learns the central tendency of dot patterns? Without adding numerous complex assumptions, there is no reasonable way to argue that “the body” plays any role in these common acts of categorization. Similarly, the environment cannot explain the data, nor can off-loading, nor culture, nor emotions. Prototype abstraction is a purely cognitive activity, rooted in the relationships among encoded memories. Moreover, people have rich conceptual structures that guide thinking and behavior. Is a canary a bird? What is a “doggier” dog, a dachshund or a golden retriever? Where might you go for an unpleasant vacation? When answering these questions, are your answers somehow explained by bodily states or potential actions? People possess so much general knowledge with no appreciable connection to the body that it seems untenable to posit embodiment as a basis for thinking.

Although we consider conceptual knowledge difficult to reconcile with EC, there have been prior attempts (e.g., Allport, 1985; Barsalou, 2008). For example, making conceptual judgments is slower when a person must “switch implied modalities” from one trial to another in an experiment (e.g., Barsalou, 1999; Pecher, Zeelenberg, & Barsalou, 2004), such as judging whether “lemons are tart,” followed by whether “thunder is loud.” Pecher et al. (2004) suggested that people use sensorimotor simulation while accessing conceptual knowledge in such a task. Although we do not question the results, we note that a vast array of conceptual questions do not logically entail sensorimotor dimensions, and the implied manner of simulation is far too scientifically flexible. Is titanium a metal? A person might answer “yes” by finding that knowledge in memory, or by internally simulating some experience wherein she touched titanium and realized that it felt like other metals. The latter possibility is more complex, still requires memory, and appears unmotivated. It is a theory of embodiment, simply for the sake of embodiment.

Short-term memory scanning

In Chapter 1 of his book Embodied Cognition, Shapiro (2011) nicely summarizes the classic study by Sternberg (1966) on short-term memory scanning. In this procedure, a person first memorizes a series of one to six digits. A moment later, a test digit is shown, and the person must quickly indicate whether it was in the original set. Sternberg hypothesized that such “scanning of short-term memory” might engage any of three processes. It could occur in parallel, which would create flat RT functions: “Yes” and “no” responses would be equally fast and unaffected by set size. Alternatively, scanning might occur in a serial, self-terminating manner, with the person serially searching working memory for a match to the test digit, responding “yes” when a match is found or continuing until all options are exhausted. This would create a pattern wherein RTs increase with set size, but the slope for “no” responses would be double the slope for “yes” responses. (The serial, self-terminating search nicely accords with common experience: Once you find your keys, you stop searching for them.) Finally, a person might serially scan the memory set, but always scan the entire set, even if a match is detected along the way (serial-exhaustive search). This would create a pattern wherein “yes” and “no” RTs would again overlap, both increasing with larger set sizes.

The actual, surprising result matched the serial-exhaustive search prediction: RTs increased linearly with set size, with no divergence of “yes” and “no” trials. Thus, memory scanning is akin to searching your entire house for your keys, even after finding them. The counterintuitive result makes sense when considering that scanning time (approximately 40 ms per item) is very fast, whereas decision time (“yes” vs. “no”) is estimated to require about 250 ms. The original Sternberg (1966) study has been cited over 3,000 times (Google Scholar) and has inspired numerous empirical and theoretical extensions (Kahana & Sekuler, 2002; Monsell, 1978). For example, Nosofsky, Little, Donkin & Fific (2011) recently applied an exemplar-based random-walk model to the Sternberg paradigm, fitting an impressive array of data, including RT distributions from individual participants.
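
The three scanning accounts differ only in the RT patterns they predict across set sizes, a contrast that a few lines of arithmetic make explicit. The sketch below uses the roughly 40-ms-per-item scanning rate mentioned above, together with an illustrative 400-ms base time (encoding, decision, and response) that is assumed purely for demonstration.

```python
# Predicted mean RTs (ms) for the three scanning accounts described above.
# BASE is an illustrative constant for encoding + decision + response;
# SLOPE is the ~40 ms-per-item scanning rate estimated from Sternberg's data.

BASE, SLOPE = 400, 40

print(f"{'n':>2} {'parallel':>9} {'self-term yes':>14} {'self-term no':>13} {'exhaustive':>11}")
for n in range(1, 7):
    parallel = BASE                          # flat: unaffected by set size
    st_yes = BASE + SLOPE * (n + 1) / 2      # on average, search half the list
    st_no = BASE + SLOPE * n                 # must search the entire list
    exhaustive = BASE + SLOPE * n            # always search the entire list ("yes" = "no")
    print(f"{n:>2} {parallel:>9} {st_yes:>14.0f} {st_no:>13} {exhaustive:>11}")
```

Only the serial-exhaustive column reproduces the observed pattern: a single linear function, with identical slopes for “yes” and “no” responses.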

How might EC help us understand the Sternberg memory-scanning paradigm? Even Shapiro (2011) offered no embodied account: The speed of internal scanning is too fast to correspond with simulated action, and there is no reasonable way to attribute the results to the environment, perception–action loops, or a mind without representations. From a scientific perspective, there appears to be little gained from asserting that short-term memory scanning is rooted in bodily experience.

Priming effects

In any word-perception task (e.g., lexical decision, naming, identification), there are myriad and robust priming effects, costs and benefits based on recent context. Priming arises in both perception and memory, from various underlying relationships. Arguably the strongest are repetition priming effects, which arise in word perception (Forster & Davis, 1984; Scarborough, Cortese, & Scarborough, 1977) and memory (Jacoby & Dallas, 1981). In repetition priming, an item is presented at Time 1 and repeated at Time 2, with wide variations across experiments in terms of materials, tasks, and delays between repetitions. The effects are profound, changing perceptual fluency, feelings of memory, neural habituation, and other measures. There are also numerous form-priming effects, wherein perception of a word (e.g., clock) is affected by preceding words that are orthographically or phonologically similar (e.g., flock, click).

In some regards, priming effects appear consistent with EC. For example, assume that word perception inherently involves motor simulation of the articulatory gestures used for word production. It becomes easy to predict that repeated simulation will become more fluent. Other priming results also naturally emerge from EC. For example, the modality effect shows that repetition priming is stronger when both word presentations occur in the same modality (e.g., visual–visual), rather than changing modalities across repetitions (Scarborough, Gerard, & Cortese, 1979). Priming appears loosely tethered to the perceptual channel used for encoding, which seems consistent with an embodied account, relative to accounts based on abstract underlying symbols.

Once again, however, broader examination of priming quickly undermines any logical connection to EC, unless we resort to trivial truisms. Consider semantic priming (e.g., Becker, 1980; Meyer & Schvaneveldt, 1971): Processing a word such as bread improves processing of the related word butter (relative to a “neutral” prime, such as #####), and impairs processing of an unrelated word, such as giraffe. Semantic priming seems to entail both automatic activation of semantic neighbors in memory and strategic expectancy effects (Neely, 1977). With respect to EC, how might we explain semantic priming? Does the body explain why doctor primes nurse? Those are moderately “embodied” concepts, but what about sky priming cloud, or China priming Japan? The word light can create priming effects for switch, heavy, dark, weight, bulb, and house. In trying to explain such effects, do we gain any leverage from asserting that “cognitive processes are influenced by the body?” Do they suggest a mind without representations? Clearly, priming is guided by the person’s immediate environment (i.e., the presented words), but this statement is theoretically empty. Finally, there is a rich literature on masked priming, wherein primes are subliminal, yet create patterns of semantic and form priming (Abrams, Klinger & Greenwald, 2002; Kinoshita, 2006; Lupker & Davis, 2009). Such effects help elucidate what happens when lexical representations receive an activation boost, without strategic responding by observers. We cannot envision any reasonable embodied account that predicts or explains subliminal priming.

Face perception

People are both remarkably good and remarkably poor at face perception, a domain that demonstrates perception, attention, and memory working together. Imagine that you are at the airport, waiting to meet someone as they exit the terminal. Depending upon whom you are meeting, the experiences may differ dramatically. Perhaps you are meeting someone for the first time, but she has provided a description: “I’m blonde and will be wearing a red jacket.” Given this clue, you can tune visual attention, allowing red to “pop out” from the crowd, then focusing on each person who catches your eye. But, even if you see a person matching this description, “blonde” is a broad and common category, so several false alarms may occur before an eventual hit. Alternatively, perhaps you have seen a photograph of the person. This would allow you to scan the crowd, pausing to consider potential matches, and eventually spot someone who is probably correct. But if the person has changed hairstyles since the photograph was taken, the task will be challenging, with high potential for a miss. Finally, perhaps you are picking up a spouse or close friend. In this case, you can disengage attention almost entirely, loosely scanning the crowd, confident that your eyes will be drawn to your familiar target, regardless of clothing or variations in appearance.

In the foregoing example, the various target individuals differ only in their familiarity to the observer. In more cognitive terms, they differ in the degrees to which they allow top-down matching from memory. In visual search, speed and accuracy are powerfully affected by the quality of internal target representations (Hout & Goldinger, 2014). An expert radiologist can detect CT anomalies better than a novice; you can find your own child quickly on a crowded playground. In face perception, there are surprisingly profound performance differences based on top-down knowledge. Given unfamiliar faces, observers are surprisingly poor at detecting whether two photos depict the same person (Megreya & Burton, 2006, 2008; Papesh & Goldinger, 2014). Consider the photo-ID matching task shown in Fig. 1, with the simple task of deciding whether each license photo matches the adjacent person. Even knowing that half the examples are mismatching faces, the task is quite challenging (see Footnote 2).

Face matching is very different when viewing familiar people. Anecdotally, it is trivially easy to recognize a close friend, even if she changes hair color. Familiar actors are easily recognized across movies. In a recent study, Jenkins et al. (2011) had U.K. participants sort 40 photographs into separate piles, such that each pile should only contain photographs of the same person. Unknown to participants, only two individuals (both Dutch celebrities) were included in the set of 40 photographs. No participants accurately sorted the photographs into two piles, with 7.5 piles as the median performance. In contrast, nearly all Dutch participants (for whom the celebrities were familiar) sorted the photographs into two piles. Similarly, people may attend reunions once every 10 years, but easily recognize hundreds of old friends (Bahrick, Bahrick, & Wittlinger, 1975).

Another robust effect is the own-race bias (ORB): People are better at discriminating among (unknown) members of their own race, relative to other races (Chiroro & Valentine, 1995; Goldinger, He, & Papesh, 2009; Meissner & Brigham, 2001; Valentine & Endo, 1992). This effect does not reflect inherent differences in physiognomic variability across races and is not strongly predicted by racial attitudes. Instead, it seems to emerge as a function of perceptual expertise (the contact hypothesis), developing early in childhood (Kelly et al., 2007). Some findings suggest that own-race faces are processed more holistically than other-race faces (Michel, Rossion, Han, Chung, & Caldara, 2006), allowing own-race faces to be classified in a higher dimensional space. The ORB is widely observed in face learning, memory and neural-processing measures. But, as with face matching, the ORB does not affect the perception of familiar faces, which appear to enjoy “special” status.

Taking these ideas together, face perception is surprisingly error-prone when processing unknown faces, especially from other races. On the other hand, familiarity confers robust face recognition, despite myriad changes in appearance, age, or context. Even without familiarity, people fluently appreciate faces as high-dimensional perceptual objects, instantly classifying them with respect to sex, race, age, attractiveness, and emotional state. For known people, however, we are often sensitive to subtle cues that strangers might not appreciate. Face perception therefore illustrates a general principle in cognitive science: In perception, classification, and memory, theories must account for both the generality and specificity of knowledge. For example, people can appreciate dogs as a category, and can discriminate dachshunds, Dalmatians, and Pomeranians. But they can also recognize their own dogs as familiar pets.

In EC, a recurring theme is that embodiment connects cognitive, cultural and emotional processing (Glenberg, 2010). Of all topics considered thus far, embodiment appears best positioned to address face perception. Faces generate expressions by virtue of motor commands, leading to visible displays that are easily simulated and imitated. Faces (and people) move, allowing the full leverage of perception–action loops for tracking changes over time. Having conceded these points, we still cannot understand how face perception “works” from an EC perspective. How do bodily influences predict the own-race bias? What EC principle predicts the dramatic changes that arise between known and unknown people? Perhaps, once we become familiar with someone (even indirectly, as with famous actors), we develop fluent routines for simulating their idiosyncratic facial gestures. Given this hypothesis, why are there such dramatic differences between recognizing static images of known and unknown people? Finally, although EC is claimed to encompass cultural and emotional processing, the mechanisms to achieve such connections are unexplained.

Faces are visible and expressive parts of other peoples’ bodies and seem perfect for embodied theories of person perception. Yet, we quickly encounter the same conceptual barriers as before: How can EC explain large psychological effects that clearly derive from stored knowledge? In everyday cognitive life, you can scan a crowded room and easily spot your friend, an amazing perceptual feat. The fluency and stability conferred from known faces cannot be attributed to bodily states, or cues in the environment, or a mind without representations. Although person perception likely involves perception–action loops, they are not sufficient. In ecological psychology, there are principles to explain how a person tracks and intercepts a Frisbee (e.g., optic flow, tau). But what if there are multiple Frisbees in the air, and the perceiver must catch only his own? Perhaps all the flying Frisbees belong to the perceiver and, once they are airborne, he is told to “catch the one you bought last month.” Now, personal memory must be used in concert with ecological principles. Although perception–action coupling is critical to achieving the goal (as in McBeath et al., 1995), even “strongly embodied” behaviors are easily understood to require a broad array of psychological processes. Face perception has all the hallmarks of embodiment, but EC fails to address its inherently cognitive dimensions.

Serial recall

In a simple memory task, people may hear a series of words, then later recall them (either while trying to preserve order, or in any order). The results can be plotted, showing recall rates as a function of each word’s position in the original list (McCrary & Hunter, 1953; Deese & Kaufman, 1957). In almost all cases, items are best recalled from the beginning and ending of the list (the primacy and recency effects, respectively), leading to a U-shaped serial position curve (SPC; Murdock, 1962). The SPC is a classic result, easily replicated in a classroom and across numerous changes of materials, modes of presentation, and participants (Eslinger & Grattan, 1994).

The most common account of the SPC posits that the primacy and recency effects reflect different memory systems. Early items are rehearsed and transferred into long-term memory (LTM), allowing later retrieval. Late items are still active in short-term memory (STM) when testing begins, and can thus be recalled if no distraction occurs (e.g., Rundus, 1971). Consistent with this theory, behavioral manipulations elicit double dissociations of the primacy and recency effects. For example, presenting items faster decreases the primacy effect but leaves the recency effect unchanged. Conversely, distracting participants just after list presentation will eliminate the recency effect but leave the primacy effect unchanged. Different forms of brain damage selectively modulate each effect, leaving the other untouched.
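
The dual-store logic can be conveyed with a toy simulation: items are rehearsed in a limited-capacity buffer that transfers strength to LTM (producing primacy), and whatever remains in the buffer at test is recalled directly (producing recency). The buffer size, transfer rate, and recall rule below are invented parameters rather than a fitted model, but the sketch reproduces the qualitative dissociations described above: faster presentation weakens recall of the pre-recency items while leaving recency intact, and post-list distraction removes recency while leaving primacy intact.

```python
import numpy as np

# Toy dual-store (rehearsal buffer) account of the serial position curve.
# All parameters are invented; the aim is only to show the qualitative pattern.

rng = np.random.default_rng(2)
LIST_LEN, BUFFER_SIZE, N_TRIALS = 15, 4, 2000

def simulate(presentation_time=1.0, distraction=False):
    recalled = np.zeros(LIST_LEN)
    for _ in range(N_TRIALS):
        buffer, ltm_strength = [], np.zeros(LIST_LEN)
        for item in range(LIST_LEN):
            if len(buffer) == BUFFER_SIZE:              # buffer full: a random item is displaced
                buffer.pop(rng.integers(len(buffer)))
            buffer.append(item)
            ltm_strength[buffer] += presentation_time   # rehearsal transfers strength to LTM
        if distraction:
            buffer = []                                 # distractor task empties the buffer
        for item in range(LIST_LEN):
            p_ltm = 1 - np.exp(-0.4 * ltm_strength[item])
            if item in buffer or rng.random() < p_ltm:  # recall from buffer or from LTM
                recalled[item] += 1
    return recalled / N_TRIALS

slow = simulate(presentation_time=1.0)
fast = simulate(presentation_time=0.3)   # faster presentation: less rehearsal per item
flushed = simulate(distraction=True)     # post-list distraction: buffer is emptied

print(" pos  slow  fast  distracted")
for pos in range(LIST_LEN):
    print(f"{pos + 1:>4}  {slow[pos]:.2f}  {fast[pos]:.2f}  {flushed[pos]:.2f}")
```

As the section argues, models in this family (and far more sophisticated successors) make quantitative contact with the serial-recall literature, whereas a blanket denial of memory stores offers no comparable account.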

Unlike prior topics in this review, the theoretical division of STM and LTM has been directly addressed in the EC literature, most notably by Glenberg (1997). In our view, his theory is fairly schematic, with no clear account for the enormous empirical literature on serial recall. The first claim is that memory reflects modality- and effector-specific interactions with the world (Barsalou, 2008; Glenberg, 2010), meaning either real or simulated sensorimotor experiences. We cannot discriminate this claim from any cognitive theory, wherein memories reflect real or imagined experiences. The second claim is that memory is not dissociable into systems or subsystems. Glenberg (1997) explicitly rejected the hypothesis of short- and long-term stores, stating that STM is simply an “illusion.” Many cognitive theories posit continuity between these systems, for example, suggesting that STM is an activated subset of LTM (e.g., Cowan, 1993). However, by positing no division at all, it appears difficult for EC to predict primacy and recency effects, or to accommodate all the neurological and behavioral data for dissociations.

Speaking more generally from EC principles (rather than focusing on one specific article), we arrive at a familiar impasse. The data are simple: Words are presented in serial order, but recall creates a U-shaped function, with leading and trailing branches that are independently affected by different manipulations. As before, we must ask how the body, or the environment, or sensorimotor simulations create this pattern. Although we do not advocate for any particular model, the cognitive literature offers numerous computational models that address serial recall. Such models can predict the SPC and related effects; many produce impressive quantitative fits across dozens of experiments. The response from EC is a blanket rejection of the principles that motivate those models, with no coherent alternative explanation.

Generalization in psychological space

In classic research on associative learning in dogs, Pavlov (1927) famously discovered that if some signal (e.g., a bell) consistently preceded the delivery of food, the dogs would quickly learn its predictive value, and the signal could then trigger salivation alone. He also discovered stimulus generalization: Other sounds could also trigger salivation, with stronger responses for sounds that more closely resembled the original signal. In the following decades, generalization became a bedrock principle of learning and behavior: Once a person or animal learns something about stimulus X, that learning will generalize to stimulus Y, as a function of the perceived similarity between X and Y. Generalization can take many forms, such as perceptual confusion, slower discrimination, or implicit biases (e.g., a man dislikes his boss, then feels irrational hostility toward other people who resemble his boss).

Regardless of the organism or stimuli involved, the generalization gradient is a function that describes the “drop-off” in responding as the similarity between learned and novel stimuli decreases. Although learning theorists (such as Hull, 1943) were eager to discover a systematic function governing generalization, they became discouraged: When physical stimulus differences were measured, many different gradients were observed. Moreover, gradients differed across species, and across individual animals or people. Decades later, Shepard (1987) proposed a solution – a universal law of generalization is achievable when relations among stimuli are cast in psychological space, such as one derived using multidimensional scaling. Shepard showed that, once stimuli are properly represented in this abstract space, generalization across stimuli decreases exponentially with their psychological distance. He then derived a mathematical theory wherein simple geometric assumptions predict the exponential gradient, across numerous conditions. The concepts from Shepard’s theory have been expanded (Chater & Vitányi, 2003) and are critical to models of perceptual classification (e.g., the generalized context model; Nosofsky, 1984, 1988).
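In compact form (our notation; Shepard’s derivation, which integrates over candidate “consequential regions,” is omitted here), the law states that generalization declines approximately exponentially with distance in the recovered psychological space:

    g(x, y) \approx e^{-k \, d(x, y)}, \qquad d(x, y) = \Bigl( \sum_{m} |x_m - y_m|^{r} \Bigr)^{1/r},

where k scales the steepness of the gradient and the Minkowski exponent r is typically 1 (city-block) for separable dimensions or 2 (Euclidean) for integral dimensions. The generalized context model inherits this form, treating the similarity between an item and stored exemplars as an exponential (or Gaussian) function of the same distances.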

Without articulating many new assumptions, EC appears unable to explain (or even coherently address) lawful generalization across items in psychological space. As Shepard (1987, p. 1318) wrote, “Analogously in psychology, a law that is invariant across perceptual dimensions, modalities, individuals, and species may be attainable only by formulating that law with respect to the appropriate abstract psychological space.” By definition and design, the principles of EC are exceedingly concrete, such as bodily cues and actions, movement in the environment, external resources, and cognition without representations. None of these ideas comport with stimulus relations inside abstract psychological spaces. The universal law of generalization is an elegant achievement in cognitive science. It is inconsistent with EC, not only because EC is too vague to allow mathematical formulation, but because its core tenets directly contradict the critical ideas that make Shepard’s law possible.

Mental rotation

Among all topics in cognitive science, mental imagery is perhaps the most challenging to study with scientific rigor. A person may affirm that she is imagining some object or action, creating activity that registers in fMRI, but how can we evaluate the substance of her imagery? The best known approach is the mental rotation procedure, developed by Shepard and Metzler (1971). In this task, a person is shown two figures and must quickly decide whether they are identical, or mirror images of each other. The two figures are misaligned, with orientations that differ by varying angular amounts (in the picture plane or in depth). Shepard and Metzler’s data were striking: RTs to correctly classify “same” pairs increased in linear fashion as the angle of rotation increased, suggesting that people mentally rotated one image, relative to the other, until they could appreciate a match. Since the original study, hundreds of experiments have replicated and extended mental rotation, finding similar results across objects and procedures (e.g., Cooper & Shepard, 1973; Jolicoeur, 1985).
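The signature result can be summarized by a simple linear function (our schematic notation, with parameter values left unspecified):

    RT(\theta) = a + b \, \theta,

where \theta is the angular disparity between the two figures, the intercept a absorbs encoding, comparison, and response processes, and the slope b reflects the time needed to rotate the mental image through one degree (its reciprocal is often reported as a mental rotation “rate”).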

When it comes to mental rotation, there is considerable evidence that motor activity accompanies mental imagery, although EC does not provide a complete account. In behavioral data, Wexler, Kosslyn, and Berthoz (1998) observed systematic patterns of facilitation and interference when people performed concurrent mental and physical rotations. Dozens of neuroimaging studies have shown activity in premotor and motor cortices (among other brain regions) during mental rotation. These studies typically indicate motor-related activity as a fundamental correlate of mental rotation (Cohen & Bookheimer, 1994; Richter, Somorjai, Summers, & Jarmasz, 2000; Vingerhoets, de Lange, Vandemaele, Deblaere, & Achten, 2002; see Zacks, 2008, for a meta-analysis). Stimulating motor cortex with TMS also alters mental rotation performance (Ganis, Keenan, Kosslyn, & Pascual-Leone, 2000). Unlike prior topics in this review, mental rotation is influenced by the body and is performed (at least concurrently) with motor simulation. Moreover, in keeping with the spirit of this article, the embodiment hypothesis makes sense with respect to mental rotation. The task does not clearly require stored representations, the psychological process has a clear physical counterpart, and the task naturally recruits brain regions that typically guide object manipulation in space. Nevertheless, it remains challenging to argue that EC helps to explain mental imagery in a broader sense: Although mental rotation recruits motor systems, how might we address other forms of imagery (such as conjuring a mental image of a rose) that lack corresponding motoric tasks? In our view, a more reasonable claim is that the human mind can recruit motor knowledge when it is beneficial to some task, but motor knowledge cannot explain other common forms of mental imagery.

Sentence processing

The preceding sections have focused on classic findings from cognitive science (e.g., semantic priming), without regard to their presence or absence in the EC literature. In this final section, we specifically focus on the most prominent finding that motivates EC. The action-sentence compatibility effect (ACE; Glenberg & Kaschak, 2002) is a hallmark finding in EC, implicating the motor system in language comprehension. In the ACE paradigm, people make sensibility judgments about sentences that imply movement either toward or away from themselves. For example, concrete sentences might be “Close the drawer” or “You tossed the keys to Christine.” Experiments may also include abstract sentences such as “You told Mike about the theory” versus “Mike told you about the theory,” implying movement away from or toward the participant, respectively.

Glenberg and Kaschak (2002) developed an innovative method, allowing them to examine whether overt motor behaviors interact with (theorized) motor simulation during language processing. Participants made “yes/no” sensibility decisions using a special response box with buttons located near to and far from the body, plus a central key that served as a launching point. The “sensible” response button was located either near or far, such that responding involved moving the arm either toward or away from oneself. When sentence-implied movements matched the required response movements, reading times (the latency between sentence onset and releasing the “start” key) were relatively fast. When the implied and intended movements were incompatible, reading times were slower. As Glenberg and Kaschak (2002, p. 558) wrote, “These data are consistent with the claim that language comprehension is grounded in bodily action, and they are inconsistent with abstract symbol theories of meaning.”
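In design terms, the ACE is simply the interaction between implied direction and response direction. The short sketch below (in Python, with hypothetical condition labels, data layout, and placeholder reading times; it is not Glenberg and Kaschak’s analysis code) shows how the effect would be computed from a participant’s trials:

    # Illustrative sketch: computing an action-sentence compatibility effect (ACE).
    # The trial tuples and numbers below are placeholders, not real data.
    from statistics import mean

    # Each trial: (implied_direction, response_direction, reading_time_ms)
    trials = [
        ("toward", "toward", 1605), ("toward", "away", 1690),
        ("away", "away", 1598), ("away", "toward", 1702),
        # ... many more trials per participant ...
    ]

    def condition_mean(implied, response):
        return mean(rt for imp, resp, rt in trials
                    if imp == implied and resp == response)

    compatible = mean([condition_mean("toward", "toward"),
                       condition_mean("away", "away")])
    incompatible = mean([condition_mean("toward", "away"),
                         condition_mean("away", "toward")])

    ace = incompatible - compatible  # positive values indicate a compatibility advantage
    print(f"ACE = {ace:.0f} ms")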

The ACE is widely cited as evidence that language comprehension is embodied, rather than symbolic. As of January 2015, Glenberg and Kaschak (2002) had been cited over 1,400 times (Google Scholar). The ACE also motivated numerous studies examining motor activity during word or sentence perception (including behavioral data, neuroimaging, EMG measures, or TMS interference; Borreggine & Kaschak, 2006; Buccino et al., 2005; Chersi, Thill, Ziemke, & Borghi, 2010; de Vega, Moreno, & Castillo, 2013; de Vega & Urrutia, 2011; Glenberg et al., 2008a, b; Kaschak & Borreggine, 2008; Nazir et al., 2008; Pulvermüller et al., 2005; Sato et al., 2008; Zwaan & Taylor, 2006). These studies have typically produced results consistent with the EC view of language processing. For example, Pulvermüller et al. (2005) used MEG to show that processing action verbs elicits premotor and motor activity within 200 ms of word onset. Across studies, the typical account is that sentence processing requires internal simulation that recruits corresponding sensorimotor brain areas (an idea often linked to mirror neurons; e.g., Glenberg & Gallese, 2012).

The ACE has generated considerable debate, with authors questioning the results and interpretation (e.g., Arbib, Gasser, & Barrès, 2014; Mahon & Caramazza, 2008; Weiskopf, 2010). Our goal is to address a broader issue: Once researchers have defined an arena for scientific inquiry, there is a strong tendency for other researchers to focus on that arena. In the case of Glenberg and Kaschak (2002) and many following studies, there has been a strong focus on motor-related words and phrases. Many theorists have noted that purely abstract language poses a challenge to embodied accounts, and some EC theorists have conceded that hybrid theories may be required (e.g., Zwaan, 2014). Despite this concession, we must ask our familiar question: As with word frequency, prototype abstraction, and other findings, does EC really help explain sentence processing?

Here is the problem, stated plainly: In the present article, the vast majority of sentences cannot be “simulated,” or mapped onto actions, in any transparent manner. That is true for this sentence, and the prior one, and nearly every previous one. Consider the earlier sentence: “Familiar actors are easily recognized across movies.” This is a perfectly legitimate sentence, but it offers no obvious (or subtle) approach to simulation. Even though our opening paragraph described actions being performed by a young woman, it included sentences such as: “Upon seeing an unfamiliar car in a numbered parking spot, she wonders whether new neighbors have moved in downstairs.” This sentence is readily understood and can be visually imagined, but how exactly would the motor system intercede in comprehension? If vanishingly few sentences are suitable candidates for motor simulation (this one, for instance, is not), then positing simulation as a core principle is theoretically empty.

To their credit, Glenberg and Kaschak (2002) recognized this issue in their original article. They dismissed it, however, with a flourish of speculation, using the chimerical power of affordances. As they wrote (p. 563):

What is the scope of this analysis? Clearly, our data illustrate an action-based understanding for only a limited set of English constructions. Furthermore, the constructions we examined are closely associated with explicit action. Even the abstract transfer sentences are not far removed from literal action. Although we have not attempted a formal or an experimental analysis of how to extend the scope of the [indexical hypothesis], we provide three sketches that illustrate how it may be possible to do so. Consider first how we might understand such sentences as “The dog is growling” or “That is a beautiful sunset.” We propose that language is used and understood in rich contexts and that, in those rich contexts, some statements are understood as providing new perspectives—that is, as highlighting new affordances for action. Thus, while taking a walk in a neighborhood, one person may remark that an approaching dog is quite friendly. A companion might note, “The dog is growling.” This statement is meant to draw attention to a new aspect of the situation (i.e., a changing perspective), thereby revealing new affordances. These new affordances change the possibilities for action and, thus, change the meaning of the situation. A similar analysis applies to such sentences as “That is a beautiful sunset.” The statement is meant to change the meaning of a situation by calling attention to an affordance: The sunset affords looking at, and acting on this affordance results in the goal of a pleasurable experience.

We have two general responses to this quote. First, in mechanical terms, we cannot conceive of any language comprehension system that would allow a person to appreciate the affordances of a sunset as a precondition to understanding a sentence about that sunset. The claim is that motor simulations (or situational affordances) are integral to linguistic processing, but what system could theoretically activate such high-level semantics before the sentence itself is processed? This problem arises even for clear “motor” sentences, such as “Jane handed David the stapler.” Although “handing something” could activate a motor simulation, how would the rest of the sentence (two people and a stapler) become part of that simulation, in advance of sentence understanding? There are well-known theories in word perception (e.g., Harm & Seidenberg, 2004) wherein semantic features can generate top-down feedback to facilitate perception, typically for words that are “disadvantaged” (low-frequency, inconsistent words; Strain et al., 1995). Such a system could be conceived for motoric features, which are conceptually akin to concreteness, but their potential role is logically limited to a small set of sentences.

Second, we are powerfully struck by the similarity between Glenberg and Kaschak’s (2002) speculation and the earlier quote from Chomsky (1959). As presented, affordances are wholly unconstrained. Given the hypothesis that context constrains interpretation, we could doubtless find many confirming examples. However, we could also generate thousands of sentences with no contextual relevance (or affordances), and people would readily understand them all. “Few people realize it, but Hitler adored paintings of kittens.” An appeal to affordances does not address the motor simulation hypothesis, and it renders the embodied account untestable. Taking the EC principles in turn, the claim that language perception is “fundamentally embodied” or entails motor simulation is untenable. There are far too many sentences (like this one) wherein “simulation” makes no sense. Appealing to the environment (or context-specific affordances) does not help, because countless sentences are understandable without connections to context. Finally, explaining sentence perception without internal representations appears hopeless.

Closing comments: The emperor has a body, but no clothes

In this article, we have repeatedly suggested that, despite current enthusiasm, EC falls woefully short – on simple, logical grounds – of addressing any aspect of cognitive life. Clearly, people have profound connections of body and mind. From the perspective of cognitive science, it is theoretically comfortable to acknowledge that bodily states may affect cognition, and that cognition may affect bodily states. Our bodies provide sophisticated information-bearing channels, beyond vision or audition. A well-adapted mind should use any available, reliable signals. Similarly, we have evolved mechanisms wherein mental states (e.g., fear) can affect physiological functions. For these reasons, “weakly embodied” approaches to cognition are completely plausible (although, as noted by Mahon, 2015, they are largely indistinguishable from purely cognitive accounts). When research shows that action-related words trigger activity in motor cortex (Pulvermüller et al., 2005), or that object perception is affected by the presence of graspable handles (Bub & Masson, 2010), such effects are easily incorporated into perceptual models from cognitive science. The reverse relationship does not hold: If one adopts the stance that cognition is fundamentally rooted in bodily states, a vast array of data are immediately beyond hope of theoretical explanation. Strong versions of EC are logically unable to address almost any cognitive findings, including sentence processing, despite its prominence in the EC literature.

In our view, the enthusiasm surrounding EC is genuine but misguided. To help illustrate its essential, scientific flaw, we ask readers to imagine a scenario: It is approximately 30 years ago, before embodied cognition was ever hypothesized. Instead, a collection of researchers make several key observations showing that emotional states profoundly affect cognitive processing. When people experience strong negative or positive emotions, it has powerful effects on their attention to the environment, their perception of other people, the memories they create, the decisions they make, and their behavioral repertoires. In fact, some aspects of cognition cannot be satisfactorily explained without emotions, which seem to defy modeling via equations and parameters. When shown stimuli denoting emotional events, people display strong effects across the board, with changes in thought patterns, skin conductance, and muscle tension. Emotional states strongly modulate social and cultural interactions, effects that occur in individuals, but also in groups and even nations. When the brain is imaged using PET or fMRI, there are clear and powerful signatures indicating both the strength and valence of emotions.

The researchers name their burgeoning field emotional cognition and quickly discover that “everything makes sense.” Emotions are shared across animal species, and were a powerful evolutionary force that ensured survival. Emotions shape bonding relationships (parental, romantic, tribal) and are a cultural force. The brain is replete with deep connections between emotional and cognitive centers (e.g., the hippocampus and prefrontal cortex). Variations in emotional stability predict learning, economic prospects, creativity, and other important outcomes. Eventually, a core hypothesis emerges that “cognition is fundamentally rooted in emotional states, which are shared across species and cultures, have deep evolutionary roots, and are reflected in perceptual, neural, and endocrine systems.” In the laboratory, this core hypothesis repeatedly finds confirmation: Dozens of experiments show that emotion-related and neutral stimuli elicit different reactions and that emotions warp cognition. There is a surge of publications, students clamor to join prominent labs, and the field rapidly gains prominence.

This scenario, which closely parallels the emergence of embodied cognition, is seductive and compelling. What is hard to appreciate, however, is the quandary that arises for scientific debate. In their fervor, the researchers made an ill-founded leap: Having discovered that emotions affect numerous cognitive processes, and knowing that emotions are evolutionarily ancient and culturally profound, they theorize that emotions mediate all cognitive operations. This creates a scientific (and sociological) trap. The core hypothesis could be falsified, merely by documenting some cognitive behaviors that are not affected by emotions. This situation, however, requires scientists who disagree with emotional cognition to build their case using null results, experiments wherein emotions fail to affect behavior. As is well-known, null effects can arise for many reasons (including bad experiments) and are therefore rarely published. In theory, a few “good” null effects could topple a scientific theory. In practice, it would likely take many years and hundreds of studies, due to the obstacles surrounding null effects.

At a deeper level, once ideas take root, they can become profoundly difficult to dislodge. Scientists are human beings, unlikely to abandon a theory that feels right on the basis of some null effects. When a dozen experiments support a key hypothesis, it becomes easy to dismiss occasional failures to replicate. At best, a new research enterprise may emerge, as different researchers attempt to understand why the effects come and go. This cycle has a profound sociological effect, such that broad theoretical premises fade into the background as attention shifts toward technical details. Ultimately, the new theoretical perspective becomes part of the scientific landscape, without ever having to defend its core assumptions. What about all the cognitive domains that seem devoid of emotional (or embodied) content, such as mental arithmetic, reading simple text, discriminating cars from trucks, or recalling your childhood phone number? Such phenomena are either ignored or slated for “future research.”

We suggest that, in the case of embodied cognition, a similar course of events has taken place. Researchers identified a small set of phenomena, such as the ability to catch fly balls, that are not well suited to cognitive explanations but do involve locomotion. There are deep and obvious connections between the body and mind. It is readily shown that bodily states affect mental operations such as attention, perception, and reasoning. Conversely, mental states elicit bodily changes in posture, muscle tension, and adrenaline levels. Researchers appreciated that, if cognition were attuned to bodily states, it would have great evolutionary benefits. They also appreciated that different cultures are connected by their shared humanity, and that bodily movements are required for exploring and manipulating the environment. Everything makes sense, numerous experiments provide support, and enthusiastic collaborators and students embrace the new theory. Ultimately, the claim becomes that all cognition is profoundly rooted in bodily experience, accompanied by a call for a paradigm shift. From that moment forward, the blinders are firmly in place: Cognitive life continues apace, filled with behaviors that defy embodied accounts but are rarely acknowledged. All EC researchers can remember their mothers’ names, their favorite toothpastes, and their fifth-grade teachers. They can discriminate toucans and penguins, and appreciate that neither is very “birdy.” They should also appreciate that embodied cognition cannot logically explain these basic aspects of mental life.