Gestures are spontaneous hand movements that accompany speech (Goldin-Meadow & Brentari, in press; Kendon, 2004; McNeill, 1992). They have the capacity to portray actions or objects through their form (iconic gestures), to represent abstract ideas (metaphoric gestures), to provide emphasis to discourse structure (beat gestures), and to reference locations, items, or people in the world (deictic gestures). Children gesture before they can speak (Bates, 1976; Goldin-Meadow, 2014) and people all over the world have been found to gesture in one way or another (Kita, 2009). Gestures provide a spatial or imagistic complement to spoken language and are not limited to conventions and rules of formal linear-linguistic systems. Importantly, gestures play a unique role in communication, thinking, and learning and have been shown to affect the minds of both the people who see them and the people who produce them (Goldin-Meadow, 2003).

There are many questions that arise when we think about gesture: What makes us gesture? What types of events make gesture likely? What controls how often we gesture? These sorts of questions are all focused on the mechanism of gesture production––an important line of inquiry exploring the structures and processes that underlie how gesture is produced. Rather than ask about the mechanisms that lead to gesture, we focus on the consequences of having produced gesture––that is, on the function of gesture. What effects do gestures have on the listeners who see them and the speakers who produce them? What features of gestures contribute to these effects? How do these features and functions inform our understanding of what exactly gestures are?

We propose that gestures produce effects on thinking and learning because they are representational actions. When we say that gestures are representational actions, we mean that they are meaningful substitutions and analogical stand-ins for ideas, objects, actions, relations, etc. This use of the term representational should not be confused with the term representational gesture––a category of gestures that look like the ideas and items to which they refer (i.e., iconic and metaphoric gestures). Our proposal that gestures are representational is meant to apply to all types of nonconventional gestures, including representational gestures (iconics, metaphorics), deictic gestures, and even beat gestures. Iconic gestures can represent actions or objects; deictic gestures draw attention to the entities to which they refer; beat gestures reflect discourse structure. Most of this paper explores the functions of iconic and deictic gestures, but we believe that our framework can be applied to all nonconventional gestures.

Gestures are representational in that they represent something other than themselves, and they are actions in that they involve movements of the body. Most importantly, the fact that gestures are representational actions differentiates them from full-blown instrumental actions, whose purpose is to affect the world by directly interacting with it (e.g., grabbing a fork, opening a canister). In addition, gestures are unlike movements for their own sake (Schachner & Carey, 2013), whose purpose is the movement itself (e.g., dancing, exercising). Rather, gestures are movements whose power resides in their ability to represent actions, objects, or ideas.

Gestures have many similarities to actions simply because they are a type of action. Theories rooted in embodied cognition maintain that action experiences have profound effects on how we view objects (James & Swain, 2011), perceive others’ actions (Casile & Giese, 2006), and even understand language (Beilock, Lyons, Mattarella-Micke, Nusbaum, & Small, 2008). The Gesture as Simulated Action (GSA) framework grew out of the embodied cognition literature. The GSA proposes that gestures are the manifestation of action programs, which are simulated (but not actually carried out) when an action is imagined (Hostetter & Alibali, 2008). Following at least some accounts of embodied cognition (see Wilson, 2002, for review), the GSA suggests that when we think of an action (or an object that can be acted upon), we activate components of the motor network responsible for carrying out that action, in essence, simulating the action. If this simulation surpasses the “gesture threshold,” it will spill over and become a true motor expression––an overt gesture. The root of gesture, then, according to this framework, is simulation––partial motor activation without completion.

The GSA framework offers a useful explanation of how gesturing comes about (its mechanism) and the framework highlights gesture’s tight tie to action. However, this framework is primarily useful for understanding how gestures are produced, not for how they are understood, unless we assume that gesture comprehension (like language comprehension; Beilock et al., 2008) also involves simulating action. More importantly, the framework does not necessarily help us understand what gestures do both for the people who produce them and for the people who see them. We suggest that viewing gestures as simulated actions places too much emphasis on the action side of gesture and, in so doing, fails to explain the ways in which gesture’s functions differ from those of instrumental actions. The fact that gesture is an action is only one piece of the puzzle. Gesture is a special kind of action, one that represents the world rather than directly impacting the world. For example, producing a twisting gesture in the air near, but not on, a jar will not open the jar; only performing the twisting action on the jar itself will do that. We argue that this representational characteristic of gesture is key to understanding why gesturing occurs (its function).

Our hypothesis is that the effects gesture has on thinking and learning grow not only out of the fact that gesture is itself an action, but also out of the fact that gesture is abstracted away from action––the fact that it is representational. Importantly, we argue that this framework can account for the functions gesture serves both for producers of gesture and for perceivers of gesture. We begin by defining what we mean by gesture, and providing evidence that adults spontaneously view gesture-like movements as representational. Second, we review how gesture develops over ontogeny, and use evidence from developmental populations to suggest a need to move from thinking about gesture as simulated action to thinking about it as representational action. Finally, we review evidence that gesture can have an impact on cognitive processes, and explore this idea separately for producers of gesture and for receivers of gesture. We show that the effects that gesture has on both producers and receivers are distinct from the effects that instrumental action has. In each of these sections, our goal is to develop a framework for understanding gesture’s functions, thereby creating a more comprehensive account of cause in the phenomenon of gesture.

Part 1: What makes a movement a gesture?

Before we can unpack how gesture’s functions relate to its classification as representational action, we must establish how people distinguish gestures from the myriad of hand movements they encounter. Gestures have a few obvious features that differentiate them from other types of movements. The most obvious is that gestures happen off objects, in the air. This feature makes gestures qualitatively different from object-directed actions (e.g., grabbing a cup of coffee, typing on a keyboard, stirring a pot of soup), which involve manipulating objects and causing changes to the external world. A long-standing body of research has established that adults (as well as children and infants) process object-directed movements in a top-down, hierarchical manner, encoding the goal of an object-directed action as most important and ignoring the particular movements used to achieve that goal (Baldwin & Baird, 2001; Bower & Rinck, 1999; Searle, 1980; Trabasso & Nickels, 1992; Woodward, 1998; Zacks, Tversky, & Iyer, 2001). For example, the goal of twisting the lid of a jar is to open the jar––not just to twist one’s hand back and forth while holding onto the jar lid.

In contrast to actions that are produced to achieve external goals, if we interpret the goal of an action to be the movement itself, we are inclined to describe that movement in detail, focusing on its low-level features. According to Schachner and Carey (2013), adults consider the goal of an action to be the movement itself if the movement is irrational (e.g., moving toward an object and then away from it without explanation) or if it is produced in the absence of objects (e.g., making the same to-and-fro movements but without any objects present). These “movements for the sake of movement” can include dancing, producing ritualized movements, or exercising. For example, the goal of twisting one’s hands back and forth in the air when no jar is present might be to just stretch or to exercise the wrist and fingers.

So where does gesture fit in? Gestures look like movements for their own sake in that they occur off objects and, in this sense, resemble dance, ritual, and exercise. However, gestures are also similar to object-directed actions in that the movements that comprise a gesture are not the purpose of the gesture––those movements are a means to accomplish something else––communicating and representing information. Gestures also differ from object-directed actions, however, in their purpose––the purpose of an object-directed action is to accomplish a goal with the object (e.g., to open a jar, grab a cup of coffee); the purpose of a gesture is to represent information and perhaps communicate that information (e.g., to show someone how to open a jar, to tell someone that you want that cup of coffee). The question then is––how is an observer to know when a movement is a communicative symbol (i.e., a gesture) and when it is an object-directed action or a movement produced for its own sake?

To better understand how people know when they have seen a gesture, we asked adults to describe scenes in which a woman moved her hands under three conditions (Novack, Wakefield, & Goldin-Meadow, 2016). In the first condition (action on objects), the woman moved two blue balls into a blue box and two orange balls into an orange box. In the second condition (action off objects with the objects present), the balls and boxes were present, but the woman moved her hands as if moving the objects without actually touching them. Finally, in the third condition (action with the objects absent), the woman moved her hands as if moving the objects, but in the absence of any objects.

In addition to the presence or absence of objects, another feature that differentiates object-directed actions from gestures is co-occurrence with speech. Although actions can be produced along with speech, they need not be. In contrast, gestures not only routinely co-occur with speech, but they are also synchronized with that speech (Kendon, 1980; McNeill, 1992). People do, at times, spontaneously produce gesture without speech and, in fact, experimenters have begun to instruct participants to describe events using their hands and no speech (Gibson, Piantadosi, Brink, Bergen, Lim & Saxe, 2013; Goldin-Meadow, So, Özyürek, & Mylander, 2008; Hall, Ferreira & Mayberry, 2013). However, these silent gestures, as they are known, look qualitatively different from the co-speech gestures that speakers produce as they talk (Goldin-Meadow, McNeill & Singleton, 1996; Özçalışkan, Lucero & Goldin-Meadow, 2016; see Goldin-Meadow & Brentari, in press, for discussion). To explore this central feature of gesture, Novack et al. (2016) also varied whether the actor’s movements in their study were accompanied by filtered speech. Movements accompanied by speech-like sounds should be more likely to be seen as a gesture (i.e., as a representational action) than the same movements produced without speech-like sounds.

Participants’ descriptions of the event in the video were coded according to whether they described external goals (e.g., “the person placed balls in boxes”), movement-based goals (e.g., “a woman waved her hands over some balls and boxes”), or representational goals (e.g., “she showed how to sort objects”). As expected, all participants described the videos in which the actor moved the objects as depicting an external goal, whereas participants never gave this type of response for the empty-handed videos (i.e., videos in which the actor did not touch the objects). However, participants gave different types of responses as a function of the presence or absence of the objects in the empty-handed movement conditions. When the objects were there (but not touched), approximately 70 % of observers described the movements in terms of representational goals. In contrast, when the objects were not there (and obviously not touched), only 30 % of observers mentioned representational goals. Participants increased the number of representational goals they gave when the actor’s movements were accompanied by filtered speech (which made the movement feel like part of a communicative act).

Observers thus systematically described movements that have many of the features of gesture––no direct contact with objects, and co-occurrence with speech––as representational actions. Importantly, participants made a clear distinction between the instrumental object-directed action, and the two empty-handed movements (movements in the presence of objects and movements in the absence of objects), indicating that actions on objects have clear external goals, and actions off objects do not. Empty-handed movements are often interpreted as movements for their own sake. But if the conditions are right, observers go beyond the movements they see to make rich inferences about what those movements can represent.

Part 2: Learning from gestures over development

We now know that, under the right conditions, adults will view empty-handed movements as more than just movements for their own sake. We are perfectly positioned to ask how the ability to see movement as representational action develops over ontogeny. In this section, we look at both the production and comprehension of gesture in the early years, focusing on the development of two types of gestures––deictic gestures and iconic gestures.

Development of deictic gestures

We begin with deictic gestures, because these are the first gestures that children produce and understand. Although deictic gestures have a physically simple form (an outstretched arm and an index finger), their meaning is quite rich, representing social, communicative, and referential intentions (Tomasello, Carpenter & Liszkowski, 2007). Interestingly, deictic gestures are more difficult to produce and understand than their simple form would lead us to expect.

Producing deictic gestures

Infants begin to point between 9 and 12 months, even before they say their first words (Bates, 1976). Importantly, producing these first gesture forms signals advances in children’s cognitive processes, particularly with respect to their language production. For example, lexical items for objects to which a child points are soon found in that child’s verbal repertoire (Iverson & Goldin-Meadow, 2005). Similarly, pointing to one item (e.g., a chair) while producing a word for a different object (e.g., “mommy”) predicts the onset of two-word utterances (e.g., “mommy’s chair”) (Goldin-Meadow & Butcher, 2003; Iverson & Goldin-Meadow, 2005). Not only does the act of pointing preview the onset of a child’s linguistic skills, but it also plays a causal role in the development of those skills. One-and-a-half-year-old children given pointing training (i.e., they were told to point to pictures of objects as the experimenter named them) increased their own pointing in spontaneous interactions with their caregivers, which led to increases in their spoken vocabulary (LeBarton, Goldin-Meadow & Raudenbush, 2015). Finally, these language-learning effects are unique to pointing gestures, and do not arise in response to similar-looking instrumental actions like reaches. Eighteen-month-old children learn a novel label for an object if an experimenter says the label while the child is pointing at the object but not if the child is reaching to the object (Lucca & Wilbourn, 2016). Thus, as early as 18 months, we see that the representational status of the pointing gesture can have a unique effect on learning (i.e., language learning), an effect not found for a comparable instrumental act.

Perceiving deictic gestures

Children begin to understand others’ pointing gestures around the same age as they themselves begin to point. At 12 months, infants view points as goal-directed (Woodward & Guajardo, 2002) and recognize the communicative function of points (Behne, Liszkowski, Carpenter, & Tomasello, 2012). Infants even understand that pointing hands, but not nonpointing fists, communicate information to those who can see them (Krehm, Onishi & Vouloumanos, 2014). As is the case for producing pointing gestures, seeing pointing gestures results in effects that are not found for similar-looking instrumental actions. For example, Yoon, Johnson, and Csibra (2008) found that when 9-month-old children see someone point to an object, they are likely to remember the identity of that object. In contrast, if they see someone reach to an object (an instrumental act), 9-month-olds are likely to remember the location of the object, not its identity. Thus, as soon as children begin to understand pointing gestures, they seem to understand them as representational actions, rather than as instrumental actions.

Development of iconic gestures

Young children find it difficult to interpret iconic gestures, which, we argue, is an outgrowth of the general difficulty they have with interpreting representational forms (DeLoache, 1995). Interestingly, even though instrumental actions often look like iconic gestures, interpreting instrumental actions does not present the same challenges as interpreting gesture.

Producing iconic gestures

Producing iconic gestures is rare in the first years of life. Although infants do produce a few iconic gestures as early as 14 months (Acredolo & Goodwyn, 1985, 1988), these early gestures typically grow out of parent-child play routines (e.g., while singing the itsy-bitsy spider), suggesting that they are probably not child-driven representational inventions. It is not until 26 months that children begin to reliably produce iconic gestures in spontaneous settings (Özçalışkan & Goldin-Meadow, 2011) and in elicited laboratory experiments (Behne, Carpenter & Tomasello, 2014) and, even then, these iconic forms are extremely rare. Of the gestures that young children produce, only 1-5 % are iconic (Iverson, Capirci & Caselli, 1994; Nicoladis, Mayberry & Genesee, 1999; Özçalışkan & Goldin-Meadow, 2005). In contrast, 30 % of the gestures that adults produce are iconic (McNeill, 1992).

If gestures are simply a spillover from motor simulation (as the GSA predicts), we might expect children to begin producing a gesture for a given action as soon as they acquire the underlying action program for that action (e.g., we would expect a child to produce a gesture for eating as soon as the child is able to eat by herself). But children produce actions on objects well before they produce gestures for those actions (Özçalışkan & Goldin-Meadow, 2011). In addition, according to the GSA, gesture is produced when an inhibitory threshold is exceeded. Because young children have difficulty with inhibitory control, we might expect them to produce more gestures than adults, which turns out not to be the case (Özçalışkan & Goldin-Meadow, 2011). The relatively late onset and paucity of iconic gesture production is thus not predicted by the GSA. It is, however, consistent with the proposal that gestures are representational actions. As representational actions, gestures require sophisticated processing skills to produce and thus would not be expected in very young children.

Perceiving iconic gestures

Understanding iconic gestures is also difficult for toddlers. At 18 months, children are no more likely to associate an iconic gesture (e.g., hopping two fingers up and down to represent the rabbit’s ears as it hops) than an arbitrary gesture (holding a hand shaped in an arbitrary configuration to represent a rabbit) with an object (Namy, Campbell, & Tomasello, 2004). It is not until the middle of the second year that children begin to appreciate the relation between an iconic gesture and its referent (Goodrich & Hudson Kam, 2009; Marentette & Nicoladis, 2011; Namy, Campbell, & Tomasello, 2004; Namy, 2008; Novack, Goldin-Meadow, & Woodward, 2015). In many cases, children fail to correctly see the link between an iconic gesture and its referent until age 3 or even 4 years (e.g., when gestures represent the perceptual properties of an object; Hodges, Özçalışkan, & Williamson, 2015; Tolar, Lederberg, Gokhale, & Tomasello, 2008).

The relatively late onset of children’s comprehension of iconic gestures is also consistent with the proposal that gestures are representational actions. If gestures were simulations of actions, then as soon as an infant has a motor experience, the infant ought to be able to interpret that motor action as a gesture just by accessing her own motor experiences. But young children who are able to understand an instrumental action are not necessarily able to understand a gesture for that action. Consider, for example, a 2-year-old who is motorically capable of putting a ring on a post. If an adult models the ring-putting-on action for the child, she responds by putting the ring on the post (in fact, children put the ring on the post even if the adult tries to get the ring on the post but doesn’t succeed, i.e., if the adult models a failed attempt). If, however, the adult models a put-ring-on-post gesture (she shows how the ring can be put on the post without touching it), the 2-year-old frequently fails to place the ring on the post (Novack et al., 2015). In other words, at a time when a child understands the goal of an object-directed action and is able to perform the action, the child is still unable to understand a gesture for that action. This difficulty makes sense on the assumption that gestures are representational actions since children of this age are generally known to have difficulty with representation (DeLoache, 1995).

As another example, young children who can draw inferences from a hand that is used as an instrumental action (e.g., an object-directed reach) fail to draw inferences from the same hand used as a gesture. Studies of action processing find that infants as young as 6 months can use the shape of someone’s reaching hand to correctly predict the intended object of the reach (Ambrosini et al., 2013; Filippi & Woodward, 2016). For example, infants expect someone whose hand is shaped in a pincer grip to reach toward a small object, and someone whose hand is shaped in a more open grip to reach toward a large object (Ambrosini et al., 2013)––but they do so only when the handshape is embedded in an instrumental reach. Two-and-a-half-year-olds presented with the identical hand formations as gestures rather than reaches (i.e., an experimenter holding a pincer handshape or open handshape in gesture space) are unable to map the hand cue onto its referent (Novack, Filippi, Goldin-Meadow & Woodward, 2016). The fact that children can interpret handshape information accurately in instrumental actions by 6 months, but are unable to interpret handshape information in gesturing actions until 2 or 3 years, adds weight to the proposal that gestures are a special type of representational action.

Part 3: Gesture’s functions are supported by its action properties and its representational properties

Thus far, we have discussed how people come to see movements as gestures and have used findings from the developmental literature to raise questions about whether gesture is best classified as simulated action. We suggest that, even if gesture arises from simulated action programs, to understand fully its effects, we also need to think about gesture as representational action. Under this account, simulated actions are considered nonrepresentational, and it is the difference between representational gesture and veridical action that is key to understanding the effects that gesture has on producers and perceivers. In this section, we examine similarities and differences between gesture and action and discuss the implications of these similarities and differences for communication, problem solving, and learning.

Gesture versus action in communication

As previously mentioned, one way in which gestures differ from actions is in how they relate to spoken language. Unlike object-directed actions, gestures are seamlessly integrated with speech in both production (Bernardis & Gentilucci, 2006; Kendon, 1980; Kita & Özyürek, 2003) and comprehension (Kelly, Özyürek, & Maris, 2010), supporting the claim that speech and gesture form a single integrated system (McNeill, 1992). Indeed, the talk that accompanies gesture plays a role in determining the meaning taken from that gesture. For example, a spiraling gesture might refer to ascending a staircase when accompanied by the sentence, “I ran all the way up,” but to out-of-control prices when accompanied by the sentence, “The rates are rising every day.” Conversely, the gestures that accompany speech can influence the meaning taken from speech. For example, the sentence, “I ran all the way up,” is likely to describe mounting a spiral staircase when accompanied by an upward spiraling gesture, but a straight staircase when accompanied by an upward moving point. We discuss the effects of gesture-speech integration for the speakers who produce gesture, as well as the listeners who perceive it.

Producing gesture in communication

Gesture production is spontaneous and temporally linked to speech (Loehr, 2007; McNeill, 1992). Moreover, the tight temporal relation found between speech and gesture is not found between speech and instrumental action. For example, if adults are asked to explain how to throw a dart using the object in front of them (an instrumental action) or using just their hands with no object (a gesture), they display a tighter link between speech and the accompanying dart-throwing gesture than between speech and the accompanying dart-throwing action (Church, Kelly, & Holcombe, 2014). Other signatures of the gesture-speech system also seem to be unique to gesture, and are not found in instrumental actions. For example, gestures are more often produced with the right hand (suggesting a link to the left-hemisphere speech system), whereas self-touching adaptors (e.g., scratching, pushing back the hair), which are instrumental actions, are produced with both hands (Kimura, 1973).

The act of producing representational gesture along with speech has been found to have an effect on speakers themselves. Gesturing while speaking can improve the speaker’s lexical access and fluency (Graham & Heywood, 1975; Rauscher, Krauss, & Chen, 1996), help the speaker package information (Kita, 2000), and even lighten the speaker’s working memory load (Goldin-Meadow, Nusbaum, Kelly, & Wagner, 2001; Wagner, Nusbaum, & Goldin-Meadow, 2004). Moreover, movements that are not gestures, such as meaningless hand movements, do not have the same load-lightening effects on the speaker as gestures do (Cook, Yip, & Goldin-Meadow, 2012).

Perceiving gesture in communication

The gestures that accompany a speaker’s talk often emphasize information found in that talk. Seeing gestures has been found to improve comprehension for listeners, particularly for bilinguals with low proficiency in their second language (Sueyoshi & Hardison, 2005) or for young children (McNeil, Alibali & Evans, 2000). Seeing gestures also has been found to improve listeners’ mental imagery, particularly with respect to spatial topics (Driskell & Radtke, 2003). In a meta-analysis of gesture comprehension studies, messages with gesture were shown to have a moderate, but significant, comprehension advantage for the listener compared with messages without gesture (Hostetter, 2011). But gestures also can provide nonredundant information not found in the speaker’s talk (Church, Garber & Rogalski, 2007; Goldin-Meadow, 2003; Kelly, 2001; Kelly, Barr, Church & Lynch, 1999; McNeill, 1992), and listeners are able to take advantage of information conveyed uniquely in gesture (Goldin-Meadow & Sandhofer, 1999). For example, listeners are more likely to infer the meaning of an indirect request (e.g., “I’m getting cold”) if that speech is accompanied by a gesture (point to an open window) than if it is produced without the gesture (Kelly et al., 1999). Gesture thus serves a function not only for speakers but also for listeners.

Moreover, the effects of perceiving gesture are not the same as the effects of perceiving instrumental action. Although adults can seamlessly and easily integrate information conveyed in speech with gesture, they often fail to integrate that information with instrumental action. For example, adults can easily ignore actions that are incongruent with the speech with which they are produced, but they have difficulty ignoring gestures that are incongruent with the speech they accompany, suggesting a difference in the relative strength of speech-gesture integration versus speech-action integration (Kelly, Healy, Özyürek, & Holler, 2014). Thus, gesture has a different relationship to speech than instrumental action does and, in turn, has a different effect on listeners than instrumental action.

Gesture versus action in problem solving

Gesture not only has an impact on communication, but it also plays a role in more complex cognitive processes, such as conceptualization and problem-solving. Again, we find that gesture and instrumental action do not influence problem-solving in the same way.

Producing gesture in problem-solving

Viewing gestures as representational action acknowledges that gesture has its base in action. Indeed, gestures often faithfully reflect our action experiences on objects in the world. Take, for example, the Tower of Hanoi task (Newell & Simon, 1972). In this task, participants are asked to move a number of disks, stacked from largest to smallest, from one peg to another peg; the goal is to recreate the stacked arrangement without ever placing a larger disk on top of a smaller disk by moving only one disk at a time. Solving the task involves actions (i.e., moving the disks) and the gestures that participants use to later explain their solution represent elements of the actions that they produced while solving the task in the first place. More specifically, participants who solved the problem using a physical tower produce more grasping gestures and curved trajectories than participants who solved the problem using a computer program in which disk icons could be dragged across the screen using a mouse cursor (Cook & Tanenhaus, 2009). Gestures thus reflect a speaker’s action experiences in the world by re-presenting traces of those actions.

As noted earlier, gesturing about an action accomplishes nothing tangible––gesturing about moving disks does not actually move the disks. Even though gesture does not accomplish anything physical, it can change our cognition in ways that action does not. Using the Tower of Hanoi task again as an example, we see that individuals who gesture about how they moved the disks encode the problem differently from individuals who do not gesture. In one study using this paradigm, after explaining how they solved the task and gesturing while doing so, participants were surreptitiously given a new stack of disks that looked like the original stack but differed in weight––the largest disk was now the lightest, the smallest disk became the heaviest and could no longer be lifted with one hand (Goldin-Meadow & Beilock, 2010). Participants who had initially produced one-handed gestures when describing how to move the smallest disk were adversely affected by the switch in weights––the more these participants gestured about the small disk with one hand, the slower their time to solve the problem after the disk weights had been switched (recall that the small disk could now not be moved with one hand). By gesturing about the smallest disk with one hand, participants set themselves up to think of the disk as light––the unanticipated switch in disk weights violated this expectation, leading to relatively poor performance after the switch. Importantly, if participants are not asked to provide explanations before the switch––and thus do not gesture––the switch effect disappears (Beilock & Goldin-Meadow, 2010). Moreover, participants who are asked to act on the objects and actually move the disks while explaining their solution (instead of gesturing) also do not show the switch effect (Trofatter, Kontra, Beilock & Goldin-Meadow, 2014). Gesture can have an effect (in this case, a detrimental effect) on thinking, and it can have a more powerful effect on thinking than action does.

Finally, although gestures contain many components of the actions to which they refer, they also drop out components. Gestures are not, and cannot be, exact replicas of the actions to which they refer. Using the Tower of Hanoi task again as a case study, we see that one cannot veridically represent, in a single gesture, both the force needed to lift a heavy disk and the speed at which the disk is lifted. Incorporating into gesture the actual force needed to lift the disk (while lifting nothing) will necessarily result in a much faster movement than was made when the disk was actually lifted. Conversely, incorporating into gesture the speed at which the disk actually moved (while moving nothing) would not require the same force as is necessary with an object in hand. Thus, gestures are not just smaller versions of actions; they have fundamentally different features from actions and, perhaps as a result, have different functional effects on cognitive processes.

Perceiving gesture in problem-solving

The Tower of Hanoi task also exemplifies the impact that perceiving gesture has on the listener’s conceptualizations. As mentioned in the last section, participants gesture differently as a reflection of how they solved the Tower of Hanoi task, producing smaller arches to represent the movement of the disks if they had solved the task on a computer than if they had solved the task with actual disks (Cook & Tanenhaus, 2009). Participants who saw those gestured explanations, but did not act on the Tower themselves, were influenced by the gestures they saw when they were later asked to solve the problem themselves on a computer. Participants who watched someone explain how to solve the Tower of Hanoi task using gestures with high arches were more likely to produce higher arching movements themselves on the computer (even though it is not necessary to arch the movement at all on the computer) than participants who saw someone use gestures with smaller arches––in fact, the bigger the gestured arcs, the bigger the participant’s movements on the computer screen. The gestures we see can influence our own actions.

Gesture versus action in learning

Gesture also can lead learners to new ideas or concepts, both when learners see gesture in instruction and when they produce gesture themselves. Learners are more likely to profit from a lesson in which the teacher gestures than from a lesson in which the teacher does not gesture (Cook, Duffy & Fenn, 2013; Ping & Goldin-Meadow, 2008; Singer & Goldin-Meadow, 2005; Valenzeno, Alibali & Klatzky, 2003). When children gesture themselves, they are particularly likely to discover new ideas (Goldin-Meadow, Cook & Mitchell, 2009), retain those ideas (Cook, Mitchell & Goldin-Meadow, 2008), and generalize the ideas to novel problem types (Novack, Congdon, Hemani-Lopez & Goldin-Meadow, 2014). We argue that gesture can play this type of role in learning not only because it is an action and thus engages the motor system, but also because it represents information.

Learning from producing gesture

Producing one’s own actions has been found to support learning from infancy through adulthood (see Kontra, Goldin-Meadow, & Beilock, 2012, for a review). For example, 3-month-olds given experience wearing Velcro mittens that helped them grab the objects they reached for come to successfully interpret others’ goal-directed reaches in a subsequent habituation test. In contrast, infants given experience simply watching someone else obtain objects while wearing the mittens do not come to understand others’ reaches (Gerson & Woodward, 2014; Sommerville, Woodward & Needham, 2005). Even college-aged students benefit from active experience in learning contexts. When physics students are given the chance to feel the properties of angular momentum first-hand (by holding a system of two bicycle wheels spinning around an axle), they score higher on a test of their understanding of force than their counterparts who simply had access to a visible depiction of the angular momentum (i.e., watching the deflection of a laser pointer connected to the bicycle system) (Kontra, Lyons, Fischer, & Beilock, 2015). Finally, neuroimaging data suggest that active experience manipulating objects leaves a lasting neural signature that is found when learners later view the objects without manipulating them (James, 2010; James & Swain, 2011; Longcamp et al., 2003; Prinz, 1997). For example, children given active experience writing letters later show greater activation in motor regions when just passively looking at letters in the scanner compared with children who were given practice looking at letters without writing them (James, 2010). Given that gestures are a type of action and that action affects learning, we might expect learning from gesture to resemble learning from action.

In fact, recent work suggests that learning via producing gesture engages a motor network similar to the one engaged by learning via producing action. When children were taught how to solve mathematical equivalence problems while producing gesture strategies, they later showed greater activation in motor regions when passively solving the types of problems they had learned about compared with children who learned without gesture (Wakefield et al., 2016). The same motor regions have been implicated in studies looking at the effect of producing action on learning (James, 2010; James & Atwood, 2009; James & Swain, 2011), suggesting that gesture and action are similar in the effect they have on the brain.

But gestures differ from actions in a number of ways, and these differences might influence the impact that producing gesture has on learning. First, as mentioned earlier, actions are produced on objects; gestures are not. To compare the effects of learning via gesture versus learning via action, Novack and colleagues (2014) taught third-graders to produce actions on objects or gestures off objects during a math lesson. Children were shown movable number tiles placed over numbers in problems, such as 4 + 7 + 2 = __ + 2. Children in the Action condition were taught to pick up the first two number tiles (4 and 7) and then hold them in the blank. Children in the Concrete Gesture condition were taught to move their hands as if they were picking up the tiles and holding them in the blank but without actually moving them. Finally, children in the Abstract Gesture condition were taught to produce a V-point gesture to the first two numbers and then a point to the blank. In all three conditions, children were using their hands to represent a strategy for solving the problem––the grouping strategy in which the two numbers on the left side of the equation that are not found on the right are added and the sum is put in the blank. But the conditions differed in whether the hands actually moved objects. Although children in all three conditions learned how to solve the types of problems on which they had been trained, only children in the gesture conditions were able to transfer what they had learned to problems with a different format (near-transfer problems, e.g., 4 + 7 + 2 = 4 + __; far-transfer problems, e.g., 4 + 7 + 2 = __ + 6). Children in the Action condition seemed to have gotten “stuck” in the concrete nature of the movements, learning how to solve the problem at a shallow level that did not lead to transfer. Even more surprising, children in the Concrete Gesture condition were less successful on far-transfer problems than children in the Abstract Gesture condition, suggesting that the closer a gesture’s form is to action, the closer the gesture comes to behaving like action.
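To make the grouping strategy and the transfer formats concrete, the arithmetic for the three example problems can be worked through as follows. This is an illustrative sketch rather than material taken from the study, and the symbol $b$ for the blank is an assumed notation.

```latex
\begin{align*}
\text{Trained format:} \quad & 4 + 7 + 2 = b + 2 \;\Rightarrow\; b = 4 + 7 = 11 \\
\text{Near transfer:}  \quad & 4 + 7 + 2 = 4 + b \;\Rightarrow\; b = 7 + 2 = 9 \\
\text{Far transfer:}   \quad & 4 + 7 + 2 = b + 6 \;\Rightarrow\; b = 13 - 6 = 7
\end{align*}
```

In the trained format, adding the first two addends yields the answer directly. In the near-transfer format the addends to be grouped are the last two, and in the far-transfer format no addend repeats across the equal sign, so the learner must equate the two sides. A rote rule such as "add the first two numbers" would give 11 in all three cases, which is correct only for the trained format.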

Understanding how learners are affected by gesture compared to object-directed action is particularly important given the widespread use of manipulatives in educational settings (see Mix, 2010, for review). Manipulatives, or external symbols, are thought to help learners off-load some of the cognitive burden involved in maintaining abstract ideas in mind. Children can use concrete external symbols as a reference to be revisited, freeing up cognitive resources for other processing tasks. Importantly, external symbols can be moved and acted on, allowing for the integration of physical, motor processes with abstract conceptual ideas. Despite these potential benefits of learning through action, and consistent with findings on learning through gesture, research from the education literature casts doubt on manipulative-based learning. Interacting with a manipulative can encourage learners to focus on the object itself rather than its symbolic meaning (Uttal, Scudder & DeLoache, 1997). The perceptual features of objects can be distracting (McNeil, Uttal, Jarvin & Sternberg, 2009), and young children in particular may lose track of the fact that the manipulatives not only are objects but also stand for something else (DeLoache, 1995). Gesture has the potential to distance learners from the concrete details of a manipulative, thus encouraging them to approach the concept at a deeper level.

Learning from perceiving gesture

The gestures that children see in instruction also have beneficial effects on learning (Cook et al., 2013; Ping & Goldin-Meadow, 2008; Singer & Goldin-Meadow, 2005; Valenzeno et al., 2003). Some have suggested that seeing gestures can help learners connect abstract ideas, often presented in speech, to the concrete physical environment (Valenzeno et al., 2003). Seeing gesture also might support learning through the same mechanisms as producing gesture, that is, by engaging the motor system. Listeners recruit their own motor systems when listening to speakers who gesture (Ping, Goldin-Meadow, & Beilock, 2014), and neuroimaging research suggests that recruiting the motor system may be key in learning. Adults learn more foreign words if they are taught those words while seeing someone produce meaningful iconic gestures compared with seeing someone produce meaningless movements (Macedonia, Muller, & Friederici, 2011). Those adults then activate areas of their premotor cortex when later recognizing words initially learned while seeing gesture, implicating the motor cortex in learning from seeing gesture.

Another way that perceiving gesture might have an impact on learning is through its ability to integrate with speech. Children are more likely to learn from a math lesson if the teacher provides one problem-solving strategy in speech simultaneously with a different, complementary strategy in gesture (S1+G2) than if the teacher provides the same two strategies in speech (S1→S2), which, of course, must be produced sequentially (Singer & Goldin-Meadow, 2005). Moreover, it is gesture’s ability to be produced simultaneously with speech that appears to promote learning. Children are more likely to learn from the math lesson if the gesture strategy and the speech strategy occur at the same time (S1+G2) than if the speech strategy occurs first, followed by the gesture strategy (S1→G2). In other words, the benefit of simultaneous speech+gesture instruction disappears when the two strategies are presented sequentially rather than simultaneously (Congdon et al., 2016). A question for future work is whether learning through action will also be affected by timing––that is, will learning differ when an action problem-solving strategy is presented simultaneously with speech, compared to when the same action strategy is presented sequentially with speech? We suspect that this is yet another area where learning via gesture will differ from learning via action.

Part 4. Open questions and areas for future research

We have shown that, although gesture may be an effective learning tool, at least in part, because it is a type of action, it is the fact that gesture is abstracted action, or representational action, that likely gives rise to its far-reaching learning outcomes. Viewing gesture as representational action explains many of the benefits gesture confers in instruction and also may explain cases where using gesture in instruction is suboptimal. For example, gesture instruction is less useful than action instruction for 2-year-olds (Novack et al., 2015), likely because, at this young age, children are only beginning to be able to decode representational forms. Gesture instruction also has been shown to be less useful than action instruction in children with a rudimentary understanding of a concept (Congdon & Levine, 2016), raising the possibility that a learner’s initial understanding of a task affects that learner’s ability to profit from a lesson on the task containing representational action. In the final section, we explore open questions of this sort and discuss how their answers can inform the proposed framework.

One major topic that we have touched on in this paper, but that would benefit from additional research, is the relative effect of producing gesture versus perceiving gesture. We have provided evidence suggesting that gesture’s functions arise from its status as representational action both for the producer of gesture and for the perceiver of gesture. Thus, we believe that our framework can be applied to both situations. However, the magnitude of gesture’s effects may not be identical for doing versus seeing gesture (Goldin-Meadow et al., 2012). Moreover, there might be effects on thinking and learning that depend on whether a person is perceiving or producing a gesture. For example, gesture’s ability to support learning, retention, and generalization may depend on whether the gesture is produced or perceived. When children are shown a gesture that follows speech and is thus produced on its own, they do no better after instruction than when the same information is displayed entirely in speech (Congdon et al., 2016). In other words, learning from a seen gesture may depend on its being produced simultaneously with speech. In contrast, when children are told to produce a gesture, they profit from that instruction (Brooks & Goldin-Meadow, 2015) and retain what they have learned (Cook et al., 2008) even when the gesture is produced on its own without speech. Learning from a produced gesture does not seem to depend on its being produced along with speech. Producing versus perceiving gesture might then function through distinct mechanisms, although we suggest that gesture’s status as a representational form is still essential to both. Additional studies that directly compare learning from seeing versus doing gesture are needed to determine whether the mechanisms that underlie these two processes are the same or different, and whether seeing versus doing gesture interacts with interpreting gesture as representational action. For example, it may be easier to think of a gesture as representational when producing it (even if it’s a novel action) than when seeing someone else produce the gesture.

A related open question is whether movement is categorized as gesture in the same way for perceiving versus producing movement. We reviewed evidence about when perceivers of a movement see the movement as representational (Novack et al., 2016). However, it is unclear whether the same features lead producers of a movement to see the movement as representational. This question is particularly relevant in tasks where learners are taught to produce movements during a lesson (Goldin-Meadow et al., 2009). These movements are meaningless to the learner at the beginning of the lesson. The question is whether the movements become meaningful, and therefore representational, during the lesson and, if so, when. Do children think of these hand movements as “gesture” when they are initially taught them, or do they think of them first as “movement-for-its-own-sake” and only gradually come to see the movements as “gesture” as their conceptual understanding of the lesson shifts? If the process is gradual, might there be markers or features within the movement itself that an observer could use to determine when a rote movement has become a true gesture?

This possibility brings into focus whether learners need to be aware of the representational status of a gesture in order to benefit from that gesture during instruction. Although gesture-training studies find that, on average, instruction with gesture supports learning better than instruction without gesture (see Novack & Goldin-Meadow, 2015, for review), there is always variability in learner outcomes. Might a learner’s ability to profit from a gestural movement be related to that learner’s ability to categorize that movement as meaningful? Perhaps only learners who see a movement as gesture will benefit from incorporating that movement into instruction. Alternatively, learners may be able to benefit from gesture in instruction without being explicitly aware of its representational properties. Thomas and Lleras (2009) found that adults asked to produce arm movements that were consistent with the solution to an unrelated problem were more likely to subsequently solve the problem, compared to adults asked to produce arm movements that were inconsistent with the solution to the problem (see also Brooks & Goldin-Meadow, 2015, for similar evidence in children). Importantly, these adults were not aware of the link between the arm movements and the problem-solving task (they were told that the arm movements were for “exercise breaks”). Thus, at least in some cases, learners do not need to consciously see a movement as meaningful in order to learn from it.

Another open question related to the issue of categorizing movement as gesture or instrumental action is whether there are in-between cases. Clark (1996) has identified a class of movements called demonstrations––actions produced with the intention of showing something to someone. For example, if a mother were to show her child how to open a jar, she could hold the jar out in front of the child, twist open the lid in an exaggerated manner, and then put the lid back on the jar, handing the jar to the child to try the action himself. This object-focused movement has elements of an instrumental action––the mother’s hands directly interact with the object and cause a physical change. However, the movement also has obvious elements of representational actions––in the end, the jar is not open and the movement is clearly performed for communicative (as opposed to purely instrumental) purposes. As another example, consider “hold-ups”––gestures in which someone holds up an object to display it to someone else (e.g., a child holds up her bottle to draw it to her mother’s attention). Hold-ups have some aspects of gesture––they are intended to communicate and are like deictic pointing gestures in that they indicate a particular object. They also have aspects of instrumental actions; they are produced directly on objects. Developmentally, hold-ups tend to emerge before pointing gestures (Bates, Camaioni & Volterra, 1975), lending credence to the idea that hold-ups may not be as representational as empty-handed gestures. The important question from our point of view is whether hold-ups function like gestures for the child. It turns out that they do, in at least one sense; they predict the onset of various aspects of spoken language. For example, hold-ups have been counted as deictic gestures in studies finding that early gesture predicts the size of a child’s subsequent spoken vocabulary (Rowe & Goldin-Meadow, 2009), the introduction of particular lexical items into a child’s spoken vocabulary (Iverson & Goldin-Meadow, 2005), and the developmental onset of noun phrases (Cartmill, Hunsicker & Goldin-Meadow, 2014).

In terms of developmental questions, although we have reviewed evidence exploring the features that encourage adults to view a hand movement as a representational action (Novack et al., 2016), it is an open question as to what infants think about the movements they see. Infants have a special ability to process actions on objects (Woodward, 1998), but how do infants process actions off objects, that is, gestures? One possibility is that infants think of gestures as movements for their own sake––seeing them as mere hand waving (Schachner & Carey, 2013). Another possibility is that, despite the fact that infants may not be able to correctly interpret the meaning of a gesture, they can nonetheless categorize the gesture as a representational act. Just as infants seem to know that speech can communicate even if they cannot understand that speech (Vouloumanos, Onishi & Pogue, 2012), infants might know that gestures are meant to represent without being able to understand what they represent. Knowing what infants think about gesture (and whether they categorize it as a unique form) would contribute to our understanding of the development of gesture processing.

Finally, with respect to how gestures affect learning, additional research is needed to determine when in the learning process, and for which content domains, gesture instruction is particularly helpful. Gesture’s status as representational action might mean that it is most useful for some content domains, and not others. For example, gesture has been shown to support generalization and retention in math instruction. But math is a relatively abstract subject. Gesture may be less useful in domains grounded in physical experience, such as physics, where direct action on objects has been found to support learning (Kontra et al., 2015; generalization has yet to be studied in these domains). There are, however, domains that are grounded in physical experience, such as dance, where practicing gesture-like movements has been found to promote learning better than practicing the actual dance movements (Kirsh, 2010, 2011). Dancers often “mark” their movements, a form of practice in dance that involves producing attenuated versions of dance moves. Marking is comparable to gesturing in that the movements are produced to represent other movements, rather than to have a direct effect on the world (i.e., they represent movements that will, in the end, be seen by an audience; see Kirsh, 2011). Dancers use marking when practicing on their own, as well as when communicating with other dancers (Kirsh, 2010). This marking seems to function like gesture in that it promotes learning, even though dance is grounded in physical action.

Conclusions

In this paper, we present a framework for understanding gesture’s function. We propose that gesture has unique effects on thinking and learning because of its status as representational action. More specifically, the fact that gesture is representational action, and not instrumental action, is critical to its capacity to support generalization beyond the specific and to support retention over a period of time. Our proposal is agnostic about whether gesture’s role in learning depends on its being embodied, and about whether the Gesture-as-Simulated-Action (GSA) framework can account for the mechanism by which gesture is produced. The proposal is designed to account for why gesture is produced, that is, for the functions it serves, particularly in a learning context. Our proposal is thus not inconsistent with the mechanistic account of gesture production proposed in the GSA framework (Hostetter & Alibali, 2008). But it does offer another perspective: a functional perspective that highlights the differences between gestures and other types of actions.

Although in some cases mechanism and function are critically related, in other cases they are not. For example, consider an alligator’s nightly sojourn into the Mississippi River. The functional explanation for this phenomenon is that the alligator is cold-blooded, and in the evening the river water is warmer than the air; entering the water at night serves the function of helping the alligator maintain its body temperature during the overnight hours. However, the mechanism by which this behavior comes about has nothing to do with temperature and depends instead on changes in sunlight. The alligator heads for the water in response to fading light, a relationship that was discovered by experimentally dissociating temperature from light. Alligators approach the water as the light fades whether or not the temperature changes and do not approach the water as the temperature drops unless the light fades (Lang, 1976). Thus, the temperature-regulation function of the behavior (going into water to regulate temperature) is different from its light-sensitive mechanism (going into the water in response to changes in light). We therefore cannot assume that the function of a phenomenon is the complement of its mechanism and must explore function in its own right. Our hope is that by expanding the investigation of gesture to include a framework built around its functions, we will come to a more complete understanding of how and why we move our hands when we talk.