What the hands can tell us about language emergence
Why, in all cultures in which hearing is possible, has language become the province of speech and the oral modality? I address this question by widening the lens with which we look at language to include the manual modality. I suggest that human communication is most effective when it makes use of two types of formats––a discrete and segmented code, produced simultaneously along with an analog and mimetic code. The segmented code is supported by both the oral and the manual modalities. However, the mimetic code is more easily handled by the manual modality. We might then expect mimetic encoding to be done preferentially in the manual modality (gesture), leaving segmented encoding to the oral modality (speech). This argument rests on two assumptions: (1) The manual modality is as good at segmented encoding as the oral modality; sign languages, established and idiosyncratic, provide evidence for this assumption. (2) Mimetic encoding is important to human communication and best handled by the manual modality; co-speech gesture provides evidence for this assumption. By including the manual modality in two contexts––when it takes on the primary function of communication (sign language), and when it takes on a complementary communicative function (gesture)––in our analysis of language, we gain new perspectives on the origins and continuing development of language.
Keywords: Gesture; Mimetic encoding; Homesign; Sign language
Humans are equipotential with respect to the modality of the language they learn: children exposed to language in the manual modality, that is, to a signed language, learn that language as quickly and effortlessly as children exposed to language in the oral modality (Newport & Meier, 1985; Petitto, 1992). Thus, on an ontogenetic time scale, humans can, without retooling, acquire language in either the manual or the oral modality. Why then, on an evolutionary time scale, has the oral modality become the channel of choice for languages across the globe?
The oral modality could have triumphed over the manual modality simply because it is so good at encoding messages in the segmented and combinatorial format that characterizes human languages. But the manual modality turns out to be just as good as the oral modality at segmented and combinatorial encoding. Why then would language be biased to become the province of the oral modality rather than the manual modality?
I consider the possibility that the oral modality took over segmented and combinatorial encoding, not because of its strength at conveying information in a segmented and combinatorial format, but because of its weakness in conveying information mimetically. This weakness is relative. There is increasing evidence that the oral modality can convey some information mimetically (Dingemanse, 2012; Dingemanse, Blasi, Lupyan, Christiansen & Monaghan, 2015; Haiman, 1985; Hinton, Nichols & Ohala, 1994; Nuckolls, 1999; Perlman & Cain, 2014; Shintel, Nusbaum & Okrent, 2006). But whereas transparency between form and meaning is easy to find in the manual modality, we have to search for it in the oral modality. For example, when adults are asked to create labels for objects and actions with their hands and no voice, they create comprehensible symbols more easily than when they are asked to create the same labels with voice alone (Fay, Lister, Ellison & Goldin-Meadow, 2014).
Conveying information in an analog and mimetic format turns out to be important to human communication, and this function is well served by the manual modality in the form of the spontaneous gestures that accompany speech (Feyereisen & de Lannoy, 1991; McNeill, 1992). If both segmented and mimetic encoding are essential to human communication, and if mimetic encoding falls to the manual modality because the manual modality is so good at it, then segmented encoding falls, by default, to the oral modality. This argument rests on (at least) two assumptions: that the manual modality is good at segmented and combinatorial encoding, and that mimetic encoding is important to human communication and is handled well by the manual modality. In the next two sections, I review evidence for each of these assumptions.
Manual modality is good at segmented and combinatorial encoding
The first assumption is that the manual modality is as adept as the oral modality at segmented and combinatorial encoding. In fact, sign languages have been shown to have the essential properties of segmentation and combination that characterize all spoken language systems, even though sign languages are processed by hand and eye rather than by mouth and ear (Stokoe, 1960). For example, American Sign Language (ASL) is structured at the level of the sentence (i.e., syntactic structure; Liddell, 1980), at the level of the sign (i.e., morphological structure; Mathur & Rathmann, 2010), and at the level of meaningless subsign elements akin to phonemes (i.e., phonological structure; Brentari, 1995), just as spoken languages are.
Unlike spoken languages, however, the form-meaning pairs that comprise the morphology of ASL are not produced in a linear string but are often produced simultaneously (see Goldin-Meadow & Brentari, 2016, for further discussion of where sign languages do and do not resemble spoken languages). For example, the ASL verb “ask both” is composed of two parts simultaneously produced––“ask,” which involves moving the index finger away from the chest area and bending it as it moves, and “both” which involves reduplicating the motion. The sign “ask both” is produced by superimposing the grammatical morpheme “both” on the uninflected form of “ask,” resulting in reduplication of the basic outward bending movement, once directed to the left and once to the right (Klima & Bellugi, 1979). Despite the fact that the morphemes of ASL are produced simultaneously, they have psychological integrity as isolable parts. For example, children acquiring ASL produce the meaningful parts of signs (the morphemes) in isolation and prior to combining them into composite wholes (Newport, 1981; Supalla, 1982), even though the parts do not appear in isolation in their input. Thus, sign language, when developed within a community and passed down from generation to generation, is characterized by a system of segmented units that combine in rule-governed fashion.
Even more striking, segmentation and combination characterize communication in the manual modality when that communication is invented within a single generation by a deaf child of hearing parents (Goldin-Meadow, 2003a). Deaf children exposed from birth to a conventional sign language, such as ASL, acquire that language following steps comparable to those of hearing children acquiring a spoken language (Newport & Meier, 1985). However, 90% of deaf children are not born to deaf parents who could provide early exposure to a conventional sign language. Rather, they are born to hearing parents who, not surprisingly, speak to their children and want their children to learn to speak. Unfortunately, it is difficult for deaf children with severe to profound hearing losses to spontaneously acquire the spoken language of their hearing parents, and their speech is typically markedly delayed even when they receive intensive instruction (Mayberry, 1992). In addition, unless hearing parents send their deaf children to a school in which sign language is used, the children are not likely to be exposed to a conventional sign system.
Despite their lack of a usable model of conventional language, deaf children of hearing parents do manage to communicate and do so using a self-created system of gestures called homesign. Homesign systems are characterized by a variety of language-like properties, including segmentation and combination. Rather than mimetically portray a scene with their entire bodies, homesigners convey the message using segmented gestures combined into a rule-governed string. For example, rather than going over to a cookie jar and pretending to remove the cookie and eat it, the child points at the cookie and then jabs her hand several times toward her mouth, effectively conveying “cookie-eat.” Importantly, the gesture strings generated by each homesigner follow simple “rules” that predict which semantic elements are likely to be gestured and where in the gesture string those elements are likely to be produced. For example, homesigners tend to produce gestures for the object of an action (cookie in this example) and tend to put the gesture for the object before the gesture for the action (cookie-eat) (Feldman, Goldin-Meadow & Gleitman, 1978; Goldin-Meadow & Feldman, 1977; Goldin-Meadow & Mylander, 1984, 1990). Homesign thus has sentence structure.
In addition to structure at the sentence level, each child’s homesign system also has structure at the word level. Each gesture is composed of a handshape and a motion component, and the meaning of the gesture as a whole is determined by the meanings of these parts (Goldin-Meadow, Mylander & Butcher, 1995; Goldin-Meadow, Mylander & Franklin, 2007). For example, a child moves his hand, shaped in a fist, in a rotating motion to ask the experimenter to turn the key on a toy. The fist handshape represents an “object with a small diameter” in this gesture and in the child’s entire corpus of gestures, and the rotate motion represents “twisting” in this gesture and in the entire gesture corpus. When produced together within a single gesture, the component parts combine to create the meaning of the whole, “twisting an object with a small diameter.” In addition to combining components to create the stem of a gesture, homesigners can alter the internal parts of a gesture (the number of times a motion is performed and the placement of the gesture) to mark the grammatical function of that gesture, in particular, to distinguish between a noun role and a verb role (Goldin-Meadow, Butcher, Mylander & Dodge, 1994). For example, when using the fist+rotate gesture as a noun to refer to the key, one homesigner tended to produce the rotating motion only once and in neutral space (near the chest area); in contrast, when using the fist+rotate gesture as a verb to refer to the twisting act, the homesigner produced the rotating motion several times and extended it toward (but not on) the key. Thus, the parts of a gesture vary as a function of the gesture’s role in discourse, suggesting morphological structure.
Importantly, the structure found at the sentence and word levels in each homesigner’s gesture system cannot be traced back to the spontaneous co-speech gestures used by the child’s hearing parents (Goldin-Meadow & Mylander, 1983, 1984, 1998; Goldin-Meadow et al., 1994, 1995, 2007; Hunsicker & Goldin-Meadow, 2012). The systems thus appear to have been generated by the children themselves. It is consequently of particular interest that these self-created gesture systems contain the properties of segmentation and combination, properties that characterize all naturally evolving language systems, spoken or signed.
Mimetic encoding is important to human communication and is handled well by the manual modality
The second assumption on which the argument rests is that mimetic encoding is an important aspect of human communication, well served by the manual modality. The gestures that hearing speakers spontaneously produce as they talk provide evidence for this assumption. Although, as we have just seen, the manual modality can serve as a medium for language, communication in the manual modality does not always assume language-like forms. When speakers use their hands to gesture, those co-speech gestures convey meaning, but the gestures are not characterized by the analytic format found in speech (McNeill, 1992). Co-speech gesture thus conveys meaning differently from speech. Speech conveys meaning by rule-governed combinations of discrete units, codified according to the norms of that language. In contrast, gesture conveys meaning mimetically and idiosyncratically through continuously varying forms.
McNeill (1992:41) lists four fundamental properties of co-speech gesture: (1) Gestures are global in meaning, which means that the parts of a gesture are dependent for their meaning on the whole. (2) Gestures are non-combinatoric and thus do not combine to form larger, hierarchically structured forms. Most gestures are one to a clause and, even when there are successive gestures within a clause, each corresponds to an idea unit in and of itself. (3) Gestures are context-sensitive and are free to reflect only the salient and relevant aspects of the context. Each gesture is created at the moment of speaking and highlights what is relevant. Because gestures are sensitive to the context of the moment, there is variability in the forms a gesture takes within a speaker. (4) Gestures do not have standards of form, and different speakers display the same meanings in idiosyncratic ways, resulting in variability in gesture form across speakers.
Although gesture and speech represent meaning in different ways, the two modalities form a single system and are integrated both temporally and semantically. For example, a gesture and the linguistic segment conveying the same information are aligned temporally (Kendon, 2004). A speaker produced the following iconic gesture when describing a scene from a comic book in which a character bends a tree back to the ground (McNeill, 1992): he shaped his hand as though gripping something and pulled it back. He produced this gesture as he uttered the words “and he bends it way back.” The gesture was a concrete description of precisely the same event described in speech and thus contributed to a semantically coherent picture of a single scene. In addition, the speaker produced the “stroke” of the pulling-back gesture just as he said, “bends it way back.” The stroke of a gesture tends to precede or coincide with (but rarely follow) the tonic syllable of its related word, and the amount of time between the onset of the gesture stroke and the onset of the tonic syllable is quite systematic: the timing gap between gesture and word is larger for unfamiliar words than for familiar words (Morrell-Samuels & Krauss, 1992). The systematicity of this relation suggests that gesture and speech are part of a single production process.
Given that gesture and speech convey meaning differently (albeit within a unified system), it is possible for the meanings expressed in each of the two modalities to complement one another, creating a richer picture than the view offered by either modality alone. For example, when describing Granny’s chase after Sylvester in a cartoon narrative, a speaker said, “she chases him out again,” while moving her hand as though swinging an umbrella (McNeill, 1992). Speech conveys the ideas of pursuit and recurrence while gesture conveys the weapon used during the chase. Both speech and gesture refer to the same event, but each presents a different aspect of it. As a second example, a speaker who may not be able to convey a particular meaning in speech may still be able to express that meaning in gesture. At a certain stage in the acquisition of mathematical equivalence, a child explains that she solved the problem 6+3+4=__+4 by adding all of the numbers on both sides of the equation––she says, “I added the 6, the 3, the 4, and the other 4 and got 17”––never commenting on the fact that the equal sign divides the equation into two parts. However, in her gestures, the same child conveys just this notion––she produces a sweeping gesture under the 6, the 3, and the 4 on the left side of the equation with her left hand, and the same sweeping gesture under the blank and the 4 on the right side of the equation with her right hand (Perry, Church, & Goldin-Meadow, 1988). In fact, gesture can convey aspects of equivalence that are not found anywhere in a child's speech (Alibali & Goldin-Meadow, 1993; Goldin-Meadow, Alibali & Church, 1993). In this way, gesture expands on the representational possibilities offered by the codified spoken system.
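The arithmetic in the mathematical-equivalence example may be worth spelling out. The child's spoken add-all strategy sums every number in the problem, whereas an equivalence-based solution treats the equal sign as dividing the problem into two sides that must have the same total:

```latex
\begin{align*}
\text{add-all (the child's speech):}\quad & 6 + 3 + 4 + 4 = 17 \\
\text{equivalence:}\quad & 6 + 3 + 4 = \underline{\quad} + 4
  \;\Rightarrow\; 13 = \underline{\quad} + 4
  \;\Rightarrow\; \underline{\quad} = 9
\end{align*}
```

The child's two sweeping gestures, one under each side of the equal sign, group the numbers as in the second line, even though her spoken explanation reflects only the first.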
Not only can co-speech gesture add information to the information conveyed in speech, but the juxtaposition of two different messages––one in gesture and another in speech––appears to have cognitive significance. These gesture-speech “mismatches,” as these types of utterances are known (Church & Goldin-Meadow, 1986), can tell us when a learner is ready to make progress on a task––children who produce many gesture-speech mismatches when explaining their solutions to a given task are in a transitional state with respect to that task and are ready to learn it (Church & Goldin-Meadow, 1986; Goldin-Meadow, 2003b; Perry et al., 1988). Note that it is the juxtaposition of information from both modalities that predicts readiness-to-learn, pointing to the importance of having two vehicles for communication that can be deployed simultaneously (see Congdon et al., 2016, for evidence that gesture’s ability to be simultaneously produced with speech is an essential factor in promoting learning and generalization).
But is it the juxtaposition of two modalities per se that predicts learning, or the juxtaposition of the two representational formats found in those modalities: the segmented and combinatorial format characteristic of speech, and the analog and mimetic format characteristic of gesture? To distinguish between these possibilities, we need to know whether juxtaposing different ideas in two representational formats predicts learning even when the ideas are conveyed within the same modality. This question is difficult to address in hearing speakers, who gesture in one modality and speak in another. But it can be addressed in deaf signers, who produce gestures in the manual modality along with their signs, which are also in the manual modality (Emmorey, 1999; Duncan, 2005). If juxtaposing different ideas across two modalities is the key ingredient, within-modality mismatch should not predict learning in signers. Alternatively, if juxtaposing different ideas conveyed in two representational formats (an analog and mimetic format underlying gesture vs. a discrete and segmented format underlying language, spoken or signed) is key, within-modality mismatch should predict learning in signers just as across-modality mismatch predicts learning in speakers.
Goldin-Meadow, Shield, Lenzen, Herzig and Padden (2012) tested these possibilities in 40 ASL-signing deaf children asked to explain their solutions to math problems and then given instruction in those problems. Children who produced many gestures conveying information not found in the signs the gestures accompanied (i.e., gesture-sign mismatches) were more likely to succeed after instruction than children who produced few, suggesting that mismatch can occur within-modality, and that within-modality mismatch predicts learning just as across-modality mismatch does. This study thus reinforces the assumption with which we began––the mimetic representational format in gesture is a cognitively important component of communication, seamlessly integrated with the segmented representational format found in sign or in speech.
Why it’s good to have a segmented code and a mimetic code
Corballis (1989:500) describes the benefits of a generative system based on categorical elements for human language and thought: “Generativity is a powerful heuristic, for it allows us to describe, represent, or construct an enormous variety of composites, given only a relatively small number of building blocks and rules of construction.” At the same time, however, Corballis (1989) notes the limitations of generativity. A generative system becomes unworkable if the number of units in the system is too large. Moreover, the relatively small number of units required to make the system manageable also makes it difficult to capture subtle distinctions. These distinctions may be more easily expressed via an analog representational format. For example, a verbal description of the shape of the east coast of the United States is likely not only to be very cumbersome but also to leave out important information about the coastline (Huttenlocher, 1973, 1976). It is just this information that can easily be captured in a mimetic gesture tracing the outline of the coast. Having a mimetic code alongside a segmented and combinatorial code creates a composite communication system that is not only generative but also responsive to the context-specific communicative needs of human speakers. Such an integrated system retains the virtues of categorical generativity while avoiding the unworkability of an over-refined linguistic code. A mimetic code thus helps realize the advantages of the categorical code.
It is the manual modality––not the oral modality––that is particularly well suited to mimetic representation. As a result, the manual modality takes over the mimetic aspects of human communication, leaving the analytic aspects by default to speech. Under this scenario, the mimetic and linguistic sides of language evolved together, producing a single system (McNeill, 2012). In other words, the argument does not assume a gesture-first explanation of language evolution (Hewes, 1973; Armstrong, Stokoe, & Wilcox, 1995; Corballis, 2002) and, in fact, aligns better with the view that both modalities work together at all points during evolution and development––their relative contributions may wax and wane, but both modalities participate in communication throughout.
This arrangement allows for the simultaneous production of both formats, making possible the flexibility and scope of human language. Note that the alternative arrangement, in which the manual modality assumes the segmented code and the oral modality serves the mimetic functions, also allows for the simultaneous production of the two formats, but it has the disadvantage of forcing the oral modality to be unnaturally imagistic in form (although see Dingemanse, 2012; Dingemanse et al., 2015; Haiman, 1985; Hinton et al., 1994; Nuckolls, 1999; Perlman & Cain, 2014; Shintel et al., 2006, for evidence that the oral modality does exhibit some iconic properties). If the argument is correct, speech became the predominant medium of human language not because it is so well suited to the segmented and combinatorial requirements of symbolic communication (the manual modality is equally suited to the job), but rather because it is not particularly good at capturing the mimetic components of human communication (a task at which the manual modality excels).
This speculation about the importance of maintaining a vehicle for mimetic representation along with speech leads us to think again about sign language. In sign, it is the manual modality that assumes the segmented and combinatorial form essential to human language. Can the manual modality be used for holistic and mimetic expression at the same time? Do signers gesture along with their signs? We have seen that the answer to this question is “yes,” and that gesture is integrated into sign just as seamlessly as gesture is integrated into speech. Nevertheless, there may be disadvantages to having both segmented and mimetic encoding within a single modality, disadvantages that studies of gesture in sign can help us uncover. Moreover, given that it is possible to observe language emergence in the manual modality (see, for example, Goldin-Meadow, 2005; Senghas, 2003), we may be able to pinpoint the moment in development when a manual communication system takes on both segmented and mimetic encoding. We know that homesigners create a segmented and combinatorial code from the earliest stages (Goldin-Meadow, 2003a). Do they also convey information mimetically, as signers of established languages do, and, if not, at what point in their development can they be said to gesture?
Of course, homesign cannot be taken as a simulation of the first creation of language in hominid evolution, because modern-day homesigners are developing their gesture systems in a world in which language and its consequences are pervasive. But homesign, and the subsequent steps that homesign takes to become a full-fledged language over generations (Goldin-Meadow et al., 2015), can offer insight into the pressures that make a language system change. By widening our lens to include communication in the manual modality when it takes on the primary function of communication and a segmented code (as in established sign languages and homesign) and when it takes on a complementary communicative function and a mimetic code (as in co-speech and co-sign gesture), we gain new perspectives on the origins and continuing development of language.
This work was supported by Grant No. R01 DC00491 from the National Institute on Deafness and Other Communication Disorders, Grant No. R01-HD47450 from the National Institute of Child Health and Human Development, and Grant No. BCS-0925595 from the National Science Foundation.
- Brentari, D. (1995). Sign language phonology: ASL. In J. Goldsmith (Ed.), A Handbook of Phonological Theory (pp. 615–639). New York: Basil Blackwell.
- Congdon, E. L., Novack, M. A., Brooks, N., Hemani-Lopez, N., O’Keefe, L., & Goldin-Meadow, S. (2016). Better together: Simultaneous presentation of speech and gesture in math instruction supports generalization and retention. Under review.
- Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton, NJ: Princeton University Press.
- Duncan, S. (2005). Gesture in signing: A case study from Taiwan Sign Language. Language and Linguistics, 6(2), 279–318.
- Fay, N., Lister, C., Ellison, T. M., & Goldin-Meadow, S. (2014). Creating a communication system from scratch: Gesture beats vocalization hands down. In I. Berent & S. Goldin-Meadow (Research Topic Eds.), Language by Hand and by Mouth, Frontiers in Psychology (Language Sciences), 5, 354. doi: 10.3389/fpsyg.2014.00354
- Feldman, H., Goldin-Meadow, S., & Gleitman, L. (1978). Beyond Herodotus: The creation of language by linguistically deprived deaf children. In A. Lock (Ed.), Action, symbol, and gesture: The emergence of language (pp. 351–414). New York: Academic Press.
- Feyereisen, P., & de Lannoy, J.-D. (1991). Gestures and speech: Psychological investigations. Cambridge: Cambridge University Press.
- Goldin-Meadow, S. (2003a). The resilience of language: What gesture creation in deaf children can tell us about how all children learn language. New York: Psychology Press.
- Goldin-Meadow, S. (2003b). Hearing gesture: How our hands help us think. Cambridge, MA: Harvard University Press.
- Goldin-Meadow, S., & Brentari, D. (2016). Gesture, sign and language: The coming of age of sign language and gesture studies. Behavioral and Brain Sciences, in press.
- Hinton, L., Nichols, J., & Ohala, J. J. (Eds.). (1994). Sound Symbolism. Cambridge: Cambridge University Press.
- Huttenlocher, J. (1973). Language and thought. In G. Miller (Ed.), Communication, language and meaning: Psychological perspectives (pp. 172–184). New York: Basic Books.
- Huttenlocher, J. (1976). Language and intelligence. In L. Resnick (Ed.), The nature of intelligence (pp. 261–281). Hillsdale, NJ: Erlbaum.
- Klima, E., & Bellugi, U. (1979). The signs of language. Cambridge, MA: Harvard University Press.
- Liddell, S. (1980). American Sign Language syntax. The Hague: Mouton.
- Mayberry, R. I. (1992). The cognitive development of deaf children: Recent insights. In S. Segalowitz & I. Rapin (Eds.), Child Neuropsychology, Vol. 7. Handbook of Neuropsychology (pp. 51–68). Amsterdam: Elsevier.
- McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
- Morrell-Samuels, P., & Krauss, R. M. (1992). Word familiarity predicts temporal asynchrony of hand gestures and speech. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 615–622.
- Newport, E. L. (1981). Constraints on structure: Evidence from American Sign Language and language learning. In W. A. Collins (Ed.), Minnesota Symposium on Child Psychology (Vol. 1, pp. 93–124). Hillsdale, NJ: Erlbaum.
- Newport, E. L., & Meier, R. P. (1985). The acquisition of American Sign Language. In D. I. Slobin (Ed.), The cross-linguistic study of language acquisition (Vol. 1: The data, pp. 881–938). Hillsdale, NJ: Erlbaum.
- Petitto, L. A. (1992). Modularity and constraints in early lexical acquisition: Evidence from children's early language and gesture. In M. Gunnar (Ed.), Minnesota Symposium on Child Psychology (Vol. 25, pp. 25–58). Hillsdale, NJ: Erlbaum.
- Stokoe, W. C. (1960). Sign language structure: An outline of the visual communications systems. Studies in Linguistics, Occasional papers No. 8.
- Supalla, T. (1982). Structure and acquisition of verbs of motion and location in American Sign Language. Unpublished doctoral dissertation, University of California at San Diego.