Embodied cognition is sweeping the planet. On a non-embodied approach, the sensory system informs the cognitive system and the motor system does the cognitive system’s bidding. There are causal relations between the systems but the sensory and motor systems are not constitutive of cognition. For embodied views, the relation of the sensori-motor system to cognition is constitutive, not just causal. This paper examines some recent empirical evidence used to support the view that cognition is embodied and raises questions about some of the claims being made by supporters.
Keywords: Action-sentence compatibility effect; Affordances; A-modal symbols; Cognition; Embodied; Indexical hypothesis; Meaning; Sensible; Sensori-motor; Symbol-grounding
The view that cognition is embodied (Varela et al. 1991; Gibbs 2006; Gallagher 2005) is rapidly gaining prominence in the world of cognitive science, and is aiming for dominance. “Embodiment” in embodied cognition covers different things for different scientists, so I will limit my remarks here to a particular strand that is being investigated largely in the psychological literature (Barsalou 1999, 2010; Gallagher 2005; Glenberg and Kaschak 2002; Glenberg et al. 2004, 2010a, b; Pecher and Zwaan 2005). In this literature, the view that cognition is embodied amounts to the view that cognition takes place, not only in a central system (Fodor 1983), but in the perceptual and motor systems as well. The basis for this claim consists of some indeed stunning data from the psychological laboratory. The question I pursue here is whether the data, impressive though they are, support the claims being drawn from them. I believe that they do not support many of the conclusions drawn and will explain why.
I find a certain similarity between what is now happening in the literature on embodiment and what took place in the mid-twentieth century over the introduction of the Identity Theory of mind and brain. Arguments for the identity theory had two steps: an empirical step and a logical step. The empirical step consisted of finding empirical correlations between mental states and brain states. The logical step consisted of arguing that the best explanation of the empirical correlations was that mental states are brain states.
In the embodiment literature, we find the empirical step consisting of empirical correlations between certain kinds of cognitive processing and sentence comprehension and certain kinds of perceptual/motor performance. Then we find that the logical step is an argument to the conclusion that the best explanation of the empirical correlations is that cognitive processing of this type just is processing that includes perceptual/motor processing. It is simpler if cognition exploits representations already in the perceptual-motor system. And it helps to solve the symbol-grounding problem (or so it is claimed) if understanding is grounded in knowledge of sensori-motor contingencies recorded in the perceptual-motor system. So it is claimed that supporting embodied cognition (EC) and explaining the empirical data is best achieved by abandoning the notion that concepts are stored in a non-modal system (Barsalou 1999; Glenberg and Kaschak 2002; Glenberg et al. 2004, 2010a, b).
First, I will describe and document a set of data used in the empirical step towards embodied cognition. Then I will explain the claims of the logical step that are being drawn from the data, and I will explain why I have reservations about whether the data support some of the further claims of the logical step being made for embodiment of cognition. Lastly, I will provide some reasons to be skeptical of even weaker conclusions that might be drawn from the empirical data.
The data (empirical step)
Influenced by Barsalou (1999) and Gibson (1979) and being among those who are helping to develop the view that cognition is embodied, Glenberg and colleagues (for example, Glenberg and Kaschak 2002) accept the view that meaning is embodied and “consists in a set of affordances…a set of actions available to the animal” (558). On this view, words and phrases are indexed or mapped to perceptual symbols; they call this the Indexical Hypothesis (IH) about meaning. They also see perceptual symbols as modal and non-arbitrary. That is, the affordances are derived from perceptual symbols and the meanings of these symbols are grounded in the sensori-motor system.
Glenberg and Kaschak (2002) report a phenomenon that they call the action-sentence compatibility effect (ACE). They discovered that “merely comprehending a sentence that implies action in one direction (e.g., ‘close the drawer’ implies action away from the body) interferes with action in the opposite direction (e.g., movement toward the body)” (558). From this they maintain that “language is made meaningful by cognitively simulating the actions implied by sentences” (559). They also maintain that “[t]hese data are consistent with the claim that language comprehension is grounded in bodily action, and they are inconsistent with abstract symbol theories of meaning.” Abstract theories of meaning would be those like Fodor’s (1983, 1990), where symbols in the central system are not necessarily derived from the perceptual or motor systems and where the meaning of a symbol is not grounded in perceptual-motor activity (though of course it may be mediated by such activity).
In an experimental setup, Glenberg and Kaschak ask subjects to read sentences and determine whether they are “sensible” (as opposed to nonsense or non-sensible). A sensible sentence might be “Hang the coat on the vacuum cleaner.” A non-sensible sentence might be “Hang the coat on the coffee cup.” Subjects are instructed to determine as quickly as possible whether the sentences are sensible or not and then press a “yes” or a “no” button. They begin with their index finger on a neutral button. In the set-up the “yes” button can be either nearer to the body than the “no” button, or reversed and be farther from the body than the neutral button. So the subjects have to move their finger either towards their bodies (near) or away from their bodies (far) to answer the questions.
In the ACE effect, Glenberg and Kaschak discovered that subjects were either slower or faster to answer the questions depending on whether the movement they made (toward or away) from their bodies matched or conflicted with the movement in the meaning of the sentence. So a “toward” sentence might be “Open the drawer.” or “Put your finger under your nose.” These imply movement toward the body. And a typical “away” sentence might be “Close the drawer.” or “Put your finger under the faucet.” The yes-button was either near or far. On a button box 28 × 18 × 6 cm the subjects used their right index finger to respond to the sentence. Then they either had to move to the yes-button that was nearer their body (in the near condition) or farther away from their body (in the far condition).
The prediction was that to answer the question, the subjects had to run a mental simulation of the perceptual-motor system (Barsalou 1999). “If this simulation requires the same neural system as the planning and guidance of real action, understanding a toward sentence should interfere with making a movement away from the body to indicate yes.” Glenberg and Kaschak (2002) found that subjects were significantly slower (by up to 30 ms) to respond when there was a mismatch in direction (yes-is away and the sentence is toward/yes-is toward and the sentence is away). They later replicated their results (Glenberg et al. 2010b), finding that participants judged sensibility of sentences such as “You gave Andy the pizza” or “Andy gave you the pizza,” by moving the hand from a start to a yes button. Responding was faster when the movement was consistent with the action implied by the sentence. Apparently, understanding these action sentences called on the same neural and bodily states involved in the real action (3).
In a follow-up study, Glenberg et al. (2010b) decided to look at the connection between emotion and cognition. With the same view of embodiment in mind, they decided to test whether “full understanding of language about emotional states requires that those emotional states be simulated” (4). In this experimental paradigm, subjects again were asked to judge the sensibility of English sentences. But this time they either had a pencil between their teeth (promoting the facial configuration of a smile) or a pencil between their lips (promoting the facial configuration of a frown). Since “part of understanding emotional language is getting the body into the appropriate emotional state” subjects with pencil in teeth should be faster to understand “pleasant” sentences and with pencil in lips should be slower to understand “unpleasant” sentences.
Samples of pleasant sentences are: “The college president announces your name and you proudly step onto the stage” and “You and your lover embrace after a long separation.” Samples of unpleasant sentences include: “The police car rapidly pulls up behind you, siren blaring,” and “Your supervisor frowns as he hands you the sealed envelope.” The somewhat striking results are that the reaction time for subjects to acknowledge understanding of pleasant sentences was 122 ms faster with pencil in teeth. Their reaction time to acknowledge understanding of pleasant sentences was 45 ms slower with pencil in lips. Glenberg et al. (2010b) take these results to strongly confirm EC.
In another study (Borghi et al. 2004), it was found that subjects are faster to identify parts of objects depending on the subject’s spatial and functional perspective.
So, for example, subjects may read the sentence “You are driving a car” (inside perspective). Or they may read: “You are washing a car” (outside perspective). Subjects are then queried with an inside prompt “Can you see outside?”, “Can you touch the headlights?” or an outside prompt “Is the car in front of you?”, “Is the car behind you?” Subjects who adopted the inside perspective were 150 ms faster to identify inside car parts than outside car parts. Subjects who adopted the outside perspective were 50 ms faster to identify outside car parts than inside car parts.
In a second experiment (Borghi et al. 2004), subjects took the perspectives inside (driving the car) or outside (filling the tank). Then subjects were asked to identify car parts that would be near or far from those perspectives. From inside the license plate would be far. From outside the steering wheel would be far, and license plate near. From the inside perspective, subjects were faster (50 ms) to identify near inside car parts than far inside car parts. From the outside perspective, subjects were faster (100 ms) to identify near outside car parts than far outside car parts. Once again, these are stunning results and somewhat surprising, to say the least. But what do they show?
The conclusion Borghi et al. (2004) wish to maintain is that these results are due to the fact that cognitively the subjects are running perceptual simulators, thereby tapping into affordances (i.e., cognitive supports for interactions between animal and object). Affordances are derived from perceptual symbols and these affordances are what ground the meanings of linguistic and cognitive symbols…evidence…that cognition is grounded in perception and action. (871)
Conclusions drawn (logical step)
Let us look carefully at the claims that are being made in support of EC on the basis of this set of data. I believe that, stunning though the data are, the conclusions being drawn are not supported by the data.
First, Glenberg and Kaschak (2002) suggest that the sentence “Hang the coat on the upright vacuum cleaner” is sensible because one can derive from the perceptual symbol of the vacuum cleaner the affordances that allow it to be used as a coat rack. (559) In contrast, they claim that the sentence “Hang the coat on the upright cup” is not sensible in most contexts, because cups do not usually have the proper affordances to serve as coat racks. (559) They continue, claiming that: “Neither of these judgments could be based on…abstract symbols; because abstract symbols are arbitrarily related to their referents, one cannot derive new affordances from them.” (559)
Second, Glenberg and Kaschak (2002) further clarify their belief that EC is to be contrasted with the view of understanding that exploits a-modal symbols in a central system, where the meaning of a sentence is not derived from affordances of symbols in the perceptual-motor system. In contrast to that a-modal view, they maintain that “…language is made meaningful by cognitively simulating the actions implied by sentences.” (559) In support of EC, they offer the claim that “…most…would reject as nonsense a sentence such as ‘Art stood on the can opener to change the bulb in the ceiling fixture.’ According to the Indexical Hypothesis (Glenberg and Kaschak 2002) this sentence makes little sense because readers have a difficult time imagining how to combine the affordances of a can opener and a light bulb to accomplish the goal of changing a ceiling fixture.” (Glenberg et al. 2004, p. 426)
The first question I will raise about this line of reasoning is how Glenberg and Kaschak (2002) are using the word “sensible.” On the one hand, does “sensible” mean “meaningful”? Is this a new-wave, i.e., EC-verificationist theory of meaning? In some cases, it seems that Glenberg and Kaschak (2002) are claiming that the meaning of an English sentence consists in a set of sensori-motor simulations that a subject must perform to understand that meaning. Yet, if “sensible” means “meaningful,” then these claims are likely to all be false, and demonstrably so. Everyone knows what it would mean to attempt to hang the coat on the upright cup or to change the bulb in the ceiling fixture by standing on the can opener. Under normal circumstances with normal cups, coats, and can openers, one could not do these things. In fact, it is because one knows what these sentences mean that one can tell that they are FALSE (or that one could not comply with the request). False things are not non-sensible. False things are meaningful (sensible). “Hang the coat on the upright cup.” Everyone reading this knows what this means. It is because you know what it means that it seems silly or ridiculous…something with which you cannot comply. Still, you could probably even try to satisfy it, so there must be some affordances understood. “Art stood on the can opener to change the bulb in the ceiling fixture.” This too is very clear in meaning and false (no doubt)…though there are contexts in which it may be true, of course (huge can openers…extraordinarily long bulbs or low ceilings). The main point is that we know the truth (compliance) conditions…and thereby understand the meaning. The sentences are sensible, though silly or odd.
Now, on the other hand, if “sensible” does not mean meaningful, then what does it mean? Imaginable? Perceptually simulable? If it means one of these, then the claims Glenberg and company are making are analytic and uninteresting. We are being told that subjects cannot perceptually simulate experiences that ground these sentences. I think that is false, but even if it were true, it would be true by stipulation of ‘sensible.’ Even if true, would that tell us that the sentences were not meaningful? No. They still have very clear and determinate truth conditions. Would it tell us that subjects who did not readily simulate perceptual-motor groundings for them did not understand them? No, not necessarily. It may be that they are still quite understandable, even though subjects are faster on reaction times when there are perceptual groundings readily available. That is, the perspective taking and priming (pencil in teeth, lips) may causally induce certain perceptual-motor activity that, in turn, causally induces certain cognitive processes, without the perceptual-motor activity constituting cognitive processing.
Furthermore, if the “Indexical Hypothesis” amounts to a new verifiability theory of meaning, then it is likely to founder in the same place as the old one: on the meaning of the Indexical Hypothesis itself. It says that a sentence is only sensible if an agent can perceptually simulate it using the relevant affordances. Now: can one perceptually simulate IH itself? No! What are its affordances? So just as the verifiability criterion of meaning was not itself empirically verifiable (Hempel 1950), the Indexical Hypothesis of sensibility may not itself afford affordances for sensori-motor simulation.
To see that Glenberg and Kaschak (2002) and Glenberg et al. (2004, 2010a, b) are not alone in drawing these sorts of conclusions from the data, let us also compare claims by Zwaan and Madden (2005). Like Glenberg and company, they maintain that “…there are no clear demarcations between perception, action, and cognition. Interactions with the world leave traces of experience in the brain. These traces are (partially) retrieved and used in the mental simulations that make up cognition. Crucially, these traces bear a resemblance to the perceptual/action processes that generated them….” (224) They contrast this view with that of Chomsky, Fodor, and Pylyshyn who they claim hold the view that “…the human mind is like a bricklayer…who puts together bricks to build structures. The malleable clay of perception is converted to neat mental bricks we call words and propositions, units of meaning, which can be used in a variety of structures.” (224)
(5) John pounded the nail into the wall.
(6) John pounded the nail into the floor.
They claim to find a shortcoming in the traditional a-modal view. “There is nothing in the traditional view to suggest the nail’s orientation. There is nothing in the traditional view to suggest that the nail’s orientation should be part of the comprehender’s mental representation.”(226)
(9) He saw the eagle in the sky.
(10) He saw the eagle in the nest.
They follow with the claim that “[s]urely the eagle in (9) cannot be the same lego brick as the eagle in (10). In (9) the eagle has its wings stretched out, whereas the eagle in (10) most likely has its wings drawn in…and there is empirical evidence that comprehenders are sensitive to these differences…and this is not predicted by the traditional view.”(226) The idea being presented is that unless we consider the perceptual/motor simulations of driving the nail into the floor vs. the wall or of observing the eagle’s flight vs. its nesting, we cannot explain how sentence comprehension works (or related observed differences in reaction times that vary orientation).
Still, is it that “nail” means something different depending on orientation? Or “pounding” of the nail means something different? Aren’t the truth conditions the same whether I pound the nail vertically into the floor or the wall? It may be weird or unusual to pound a nail horizontally into the floor or vertically into the wall, but if I did so, would the sentences not be true, all the same? And is the eagle not still “on the nest,” with wings out? Or is the eagle not “in the sky” with wings folded, but diving downward?
The meaning of these target sentences is consistent with multiple perceptual/motor experiences. The literal/semantic content of “nail” in pairs (5) and (6) or “eagle” in (9) and (10) seems to be constant across differences in perceptual-motor activity or simulation. True: there is a customary orientation for nails in (5) and (6) (horizontal, vertical) and for eagles in (9) and (10) (wings out, folded). And this customary orientation may explain differences in reaction times, but it does not mean that the semantics of “nail” or “eagle” differs in them. So, once again, the empirical differences referred to (in reaction time studies) do not support the claims about semantics and EC that are being drawn from the empirical data.
(1) Xs cause “X”s (as a law)
(2) Ys cause “X”s (Ys are not Xs)
(3) (2) because (1), but not vice versa.
In his 1987 book, Fodor spends an entire section on what he calls “The psychophysical Basis” of his theory. (Chapter 4) Fodor tells us that horse tokens are caused by horses, via their horsy looks and then adds: “The story is, I admit, sort of old-fashioned…it connects having concepts with having experiences and knowing meaning with knowing what would count as evidence.” (p. 118) Now what is arbitrary is that one set of neurons rather than another instantiate one’s concept of horse. But what is not arbitrary is that one’s concept of horse is causally linked to horses via the perceptual presentation of horses to our cognitive mechanisms. It is not arbitrary that horsy percepts fire one’s concept horse. Not even on Fodor’s a-modal account.
So when EC theorists claim that cognition is grounded in perceptual/motor processing (simulations), is this a semantic thesis? Is it that the very meanings of the referring terms are or are to be verified in the simulations themselves? I’ve been suggesting that this is not the proper way to interpret the data because it is not supported by the data. Even Fodor’s semantics permits that there are causal connections between concepts and perceptual/motor activity. On his view, however, this falls well short of a constitutive relation. The simulations in the perceptual/motor system do not constitute the meaning of the concepts or terms. Symbols are about their real world referents, not perceptual or motor activity.
But might not the causal connectivity between employing concepts in sentence comprehension (and other) tasks and perceptual/motor activity enhance understanding? Can’t it prime or speed up reaction times and account for the kind of data in the studies of Glenberg and Kaschak (2002) and Glenberg et al. (2004, 2010a, b)? Yes, it can and this is the view I will ultimately adopt. I believe this is the proper sort of conclusion to draw from the empirical data on EC.
Conclusion: worries about what the EC data prove
Can the types of data for EC that we’ve considered support something still stronger: short of an account of meaning, but stronger than the view that there are causal connections between concepts and perceptual/motor activity? Can they support the claim that the activity involved in perceptual simulation is necessary for cognitive processing of certain kinds? I believe a stronger claim such as this is possible, but I now want to give some evidence that stands in the way of this stronger claim.
Caramazza and Mahon (2006) raise the issue that qualitatively the same bodily movements can accompany distinct cognitively driven actions. If so, then the “meaning” of a thought or sentence expressing the thought cannot be tied exclusively to perceptual-motor activity, for that activity may be ambiguous, but the meaning of the sentence or action not. They point out that “[o]ne issue that arises is whether such a framework would provide the means for individuating distinct mental states that are coextensive with the same motor program (for discussion, see Jacob and Jeannerod 2005).” (p. 26) Their example is the sentence (or action) “Fred takes a drink of water.” Is this because: (a) he’s nervous? Or because (b) he’s thirsty?
They express the worry this way: “This would seem to indicate that attributions of mental states depend on background knowledge. But how is “background knowledge” embedded in modality-specific input/output systems?” (p. 27)
More important than a sort of “ambiguity” of bodily movement brought out by this example is the matter of motor-deficiency. Caramazza and Mahon (2006) express the thought this way: “…representation of intentional content on the Simulationist Framework depends on at least two things: (1) empirical demonstrations that production programs are run in the course of recognition, and (2) that such production programs are sufficient to ground conceptual content…First, does recognition of biological motion involve the processes required to produce such motion? Second, what happens to recognition and access to conceptual knowledge when modality-specific output processes are damaged?” (p. 27)
Their answers are extremely interesting: answer #1, No. Infants recognize actions that they cannot themselves produce (walking, talking…specific movements, not general). Even adults can understand sentences of languages they cannot themselves produce. Answer #2, It is not significantly impaired. “Calder et al. (2000) report the performance of an individual, LP, who had bilateral paralysis of the face from infancy (Möbius syndrome). LP was normal on an unfamiliar face-matching test (Benton Test), impaired on Warrington’s Recognition Test for unfamiliar faces, and borderline impaired on a test requiring recognition of famous people. On a test of facial affect recognition (applying one of the six basic emotions to a face) LP was not impaired…the fact that the patient could succeed at all on tasks of emotion attribution based on facial affect indicates that the ability to attribute intentional states is not exhausted by simulation of the observed behaviour.” (p. 28)
Caramazza and Mahon (2006) also report on another study where Pavlova et al. (2003) asked whether the degree of motor impairment in 13- to 16-year-old children with congenital motor disorders was inversely related to sensitivity to Johansson point-light displays. There was no significant relationship between visual sensitivity to the point-light displays and severity of motor impairments, while there was a relation between degree of motor impairment and the volume of periventricular lesions in parietal–occipital areas. (p. 28) The subjects never produced these motions in their lives and yet they performed normally in perceiving and understanding these bodily motions.
Lastly, Caramazza and Mahon (2006) report and draw the strong conclusion that a “…patient of (Ochipa et al. 1989) was 17/20 for naming real objects, but could use only 2 of the 20 objects correctly. The performance of patients such as that reported by Ochipa and colleagues falsifies the claim that conceptual knowledge of manipulable objects is grounded in modality-specific output representations required to use them.” (p. 30)
I report these contra-indicating data cited by Caramazza and Mahon (2006) to record the negative results found on the other side of the EC issue. These data seem to indicate that there are at most causal correlations between perceptual-motor activity and cognition, but that such activity is not constitutive of cognition and that such activity may not even be necessary for normal levels of cognitive competence. So, in the end, what do the types of data in favor of EC cited here prove?
I agree with Caramazza and Mahon (2006) when they say: “…while simulations over modality-specific input/output representations are not sufficient to ground conceptual content, such “simulations” may contribute in important ways to the “full” meaning of object concepts. In other words, while one’s concept of HAMMER is not represented in terms of information required to use hammers, it might be that information required to use hammers nevertheless adds in important ways to our understanding of hammers.” (Caramazza and Mahon 2006, p. 30)
In closing, there are some differences between perceptual-motor activity and thinking (cognition) that I believe should not be overlooked when assessing the claims of EC. First, perceptual/motor experiences have a phenomenal content, a “what it’s like,” that the belief that p does not have. One may believe something more or less strongly, of course, but that is quite different from the taste of sugar or smell of a rose or feel of moving one’s feet in the sand. There is no phenomenology of the sort associated with perceptual-motor activity in cognition. This should not be overlooked. Second, perceptual states generally admit of more or less intensity; knowledge that p (which involves belief) does not. Believing can be more or less intense, but not the that p. And third, perceptual states have a particularity (this blue, that bitter). Beliefs have a generality (blueness, bitterness) that perceptual/motor states lack. Again, these differences should not be overlooked by those defending the thesis of embodied cognition. If cognition takes place in the whole brain, including perceptual-motor regions, we would expect such differences not to exist. At the very least, proponents of EC need to be able to explain (away) these differences, if EC is true.