We cannot arrive at a satisfying account of the evolution of language, which is generally spoken, without explaining how our ancestors became able to use their voices flexibly and creatively, and thus to make the kinds of sounds that, together with other changes, would have allowed them to communicate symbolically (Fitch, 2000). This capability presumably emerged before, not after, the appearance of language (Darwin, 1871). But what would a wordless use of the modulated voice have done for our prelinguistic ancestors? I will suggest that the voice both cued and signaled fitness qualities because of something attractive or informative about its physical form.

There is a lot of theoretical work to do here because chimpanzees, codescendants of our last common ancestor, tend not to vocalize flexibly or creatively, to invent new vocalizations, or to combine calls to convey new “meanings” (Fischer, Wheeler, & Higham, 2015; Zuberbühler, 2003). This leaves us with an important question: How did our ancestors acquire the diverse and flexible production capacity of modern humans? In stressing the role of production, of course, we must not neglect the evaluative role of receivers. An increase in diversity or attractiveness may have been driven by a change in the perceptual or evaluative systems of receivers who were motivated by new social pressures to make finer distinctions regarding fitness (Guilford & Dawkins, 1991).

One hypothesis is that the vocal logjam was broken by a huge expansion in the size of human groups, producing new social challenges that required more complex ways of signaling (Humphrey, 1976; Jolly, 1976; also see Freeberg, Dunbar, & Ord, 2012). This may have played a major role, but a general pressure of this kind would have had to act through particular situations in which individuals began to diversify and control their vocal behavior. I will claim that changes in sociality, in conjunction with modifications of life history, combined to liberate and diversify vocalization in two important contexts.

Infancy and childcare

The first proposal relates to infancy. Most evolutionary proposals say little about this stage, but no account can be complete without identifying the environmental changes that produced relevant traits in development and the reasons why the new traits would have conferred contemporaneous reproductive advantages. Nearly a century ago, Walter Garstang (1922) spoke of the “absurdity” of supposing that a new trait could evolve in mature members of the species, a judgment that has been repeatedly confirmed in the interim (Gould, 1977; Hall, 2002; Northcutt, 1990; West-Eberhard, 2003).

The context is care. Changes in life history left the human infant unusually helpless, unable to survive without an intensification and extension of care (Bogin, 1999; Falk, 2004). A second change, earlier weaning, reduced infancy from 5 years (the duration of chimpanzee weaning) to 3, producing a short childhood (Locke & Bogin, 2006). This also truncated the period of maternal lactation, which reduced the interbirth interval. The resulting increase in equally dependent siblings could only have stiffened competition for care (Bogin, 1999; Locke & Bogin, 2006). These changes, I suggest, encouraged infants to look for new ways to attract attention, and caregivers to search, unconsciously, for indications that their offspring were developing apace.

The increasing altriciality of human infants may also have encouraged parents to adopt cooperative breeding, parental sharing of care with relatives and others (Hrdy, 1999, 2004). In cooperative breeding arrangements, individuals who are less genetically related to the infant than his or her own parents may be less motivated to provide care. Here I would note that nestling birds that are genetically unrelated to prospective caregivers beg more loudly than others (Briskie, Naugler, & Leech, 1994), and human infants raised by step- or alloparents may try harder, or operate more strategically, to get the care they need (Locke, 2006).

The first negotiation

In the early 1970s, Trivers (1972, 1974) described a basic dilemma in the rearing of the young: a conflict between the needs of parents and their dependent offspring. On the one hand, the necessity of feeding and protecting their infants requires parents to monitor their offspring’s behavior for signs that attention is required. But parents have competing responsibilities, including the management of other children. Parents thus look for opportunities to withdraw care, and infants attempt to prevent this with increasingly clever bids for attention, and by monitoring parental responses to these behaviors. It is assumed that selection would have operated on both the infant’s use of, and the parents’ response to, behaviors that signaled infants’ needs. Today, a female advantage in the appraisal of vocal or facial affect, where it occurs, is occasionally attributed to evolutionary pressure to detect subtle changes in infant signals (Babchuk, Hames, & Thompson, 1985; Hampson, van Anders, & Mullin, 2006).

Crying reliably elicits care, and it carries transient information about the infant’s physical and emotional state. If noxious or inconsolable, crying can elevate the stress level of caregivers (Zeskind & Collins, 1987; Out, Pieper, Bakermans-Kranenburg, Zeskind, & van IJzendoorn, 2010) and lead to abuse (Frodi, 1985; Locke, 2006). Crying also carries information about health conditions. It has long been known that the crying of healthy infants can be acoustically discriminated from the crying of infants with low birth weight; metabolic, chromosomal, and endocrine disorders; and brain damage (Michelsson, 1986; Wolff, 1969).

But if human societies enlarged, were there parallel changes in the criteria parents and alloparents used in allocating attention? It is generally agreed that increases in group size ramped up levels of social competition and cooperation. What was at stake, often, was access to social capital—the knowledge, goods, and services that members of alliances and reciprocal friendships trade with each other. But these social arrangements are not automatic: they must be negotiated. Thus, as the young develop, parental criteria for continuing attention would be expected to shift from physical well-being to abilities in the social domain: infants most likely to receive extended care and instruction would be those who display the ability to attract and engage with others—early signs of the ability to negotiate.

This may have been what Trivers (1974) had in mind when he pointed out that infants may employ “psychological weapons” to keep their parents from withdrawing care. A potentially important weapon is instrumental crying. In his observations of crying, Wolff (1969) noticed that as early as the third week of life, many infants produce “fake” cries, presumably based on an earlier discovery that their genuine cries elicited care. Wolff also noticed that first-time mothers were more likely to respond to cries than multiparous mothers.

These observations suggest that, with experience, mothers may learn to discriminate honest crying from the false cries of infants who want attention but are not truly distressed. Because infants are the only ones who know whether they are “faking,” parents must learn to interpret these care-elicitation signals. Selection would have favored parental ability to discriminate honest from dishonest signals, driving infants to produce ever more convincing signs that they are worth continuing care. Once brought under voluntary control, the infant’s signals can be used in other contexts.

In the seventh month, crying typically subsides and a different form of vocalization appears (Koopmans-van Beinum & van der Stelt, 1986): babbling, the production of well-formed syllables that parents frequently hear as speech. Unlike crying, which can stress caregivers, babbling has the opposite effect on them, and it may have furthered the emancipation of vocal behavior from subcortical control (Myers, 1976). Initially, babbled sounds appear as fairly precise reduplications of apical stops (e.g., “da-da-da”), but in succeeding months the place of articulation and other phonetic features diversify. It has been speculated that the rapid and controlled shifting from sound to sound that occurs in variegated babbling “decouples” speech from more reflexive and prosodic vocal activity (Oller, 2000, 2004; also see Locke, 2004a).

Studies indicate that infants who produce a high rate of syllables per utterance are considered more pleasant, friendly, and likeable than infants who vocalize less complexly (Bloom, D’Odorico, & Beaumont, 1993; Bloom & Lo, 1990). When infants invent and use novel phonetic forms, parents appear to be pleased and incorporate these forms into their own speech. This tendency, which I have called “trickle up phonetics,” may have contributed to the universal pattern whereby “baby words,” like babbled utterances, are almost exclusively composed of reduplicated CV syllables, such as “dada” (or “papa”) and “mama” (Locke, 1990, 2004b).

To some degree, babbling may have persuaded parents that their offspring were physically fit, for the timely onset of babbling implies normal auditory sensation (Oller & Eilers, 1988) and the development of left-hemisphere control of fine motor movement (Locke, Bekken, McMinn-Larson, & Wein, 1995). There is evidence, too, that infants who are neurologically impaired or intellectually disabled are less likely to begin producing complex (syllabic) vocalization at the neurotypical age than their typically developing peers (Cobo-Lewis, Oller, Lynch, & Levine, 1996; Oller, Eilers, Neal, & Cobo-Lewis, 1998). Because there is continuity between babbling and speech, it is not surprising that strong positive correlations have been obtained between quantity and quality of vocalization in infancy and measures of intelligence in later stages of life history, including adulthood (Cameron, Livson, & Bayley, 1967).

If they were to thrive under conditions of increased competition, individuals would be expected to show promise of social ability in infancy. Today, decisions about care are also based on the infant’s ability to initiate and respond to social stimulation. This may have contributed to an extension of care into childhood. In fact, childhood itself may have been expanded to its present (4-year) length because parents needed to provide their outward-bound offspring with reliable information about how to deal with an increasingly complex social environment (Fitch, 2004, 2007), one that contained individuals of unknown intentions.

What I am suggesting is this: As reproductive fitness increasingly presupposed the ability to negotiate social relationships and social capital (an indirect effect of large group living), parents may well have intensified their search for signs that their offspring were engaging with them in order to negotiate continuing attention, including instruction. What would have been needed in our steadily socializing species was a more flexible and playful form of vocalization that would engage, possibly even entertain, parents who were looking for signs that their infant would eventually be able to develop and maintain friendships, interpret intentions, and do all the other things that cooperation and competition require. My claim is that infants who vocalized playfully and creatively received continuing care from their parents, whose perceptual and evaluative criteria were also becoming more finely tuned and more socially and psychologically oriented.

If social negotiations presuppose one’s ability to send and interpret social signals, they also reflect the ability to interact. In the 1970s, Daniel Stern studied 3- to 4-month-olds in interaction with their mothers. He found that when these dyads were positively aroused emotionally, they vocalized together and appeared to get a great deal of enjoyment from it. He called these episodes “coactional vocalizations,” early attachment behaviors that seemed to contribute to the formation of mother–infant bonds (Stern, Jaffe, Beebe, & Bennett, 1975). Since then, it has been found that vocal and other types of mimicry (and synchrony) facilitate emotional and social relationships (Carpenter, Uebel, & Tomasello, 2013; Cirelli, Einarson, & Trainor, 2014; Cirelli, Wan, & Trainor, 2014).

Mate selection

If, in evolution, selection acted on traits that were already present in some form, increased vocal ability emerging from infancy and childhood may have been appropriated and refined for new applications that arose in juvenility and adolescence. In songbirds, several studies have found an association between song complexity and learning proficiency, with males that produce more song phrase elements requiring fewer learning trials to solve a novel foraging task (Boogert, Giraldeau, & Lefebvre, 2008; Cauchard, Boogert, Lefebvre, Dubois, & Doligez, 2013).

In a reproductive context, there is evidence that females use accuracy of song learning by males as an honest cue to developmental history, thus to quality (Lachlan & Nowicki, 2012; Nowicki & Searcy, 2011). In one study, it was found that songs learned by well-nourished male swamp sparrows elicited significantly higher levels of courtship display from females than the songs that had been learned by undernourished male swamp sparrows. Songs of the properly nourished males were longer and displayed a higher trill rate, greater stereotypy, and more notes per syllable (Searcy, Peters, Kipper, & Nowicki, 2010).

There may also be a link between signal complexity and mating success in primates. In geladas, groups may contain over a thousand individuals and a number of small reproductive units, each under the control of a male “leader.” It was recently reported that males produce significantly more complex vocalizations than females do (Gustison, le Roux, & Bergman, 2012), and in playback experiments, female geladas displayed a preference for these more complex vocalizations (Gustison & Bergman, 2016).

Do displays of complex behaviors play a role in human mating? In a study of instrumental music (piano compositions), Charlton (2014) reported that, at the point in their menstrual cycle when conception risk was highest, young women preferred composers of more complex music over composers of less complex music as short-term sexual partners. How do young women respond to vocal complexity?

First, it should be acknowledged that one property of the speaking voice, its pitch, plays a key role in the evaluation of fitness. Men with low-pitched voices tend to have higher testosterone levels and are typically judged by female listeners to be more dominant and attractive (Collins, 2000; Feinberg et al., 2005; Puts, Gaulin, & Verdolini, 2006), and there is evidence that women prefer low-pitched male voices, especially when they hear them in a courtship or mating context (Apicella & Feinberg, 2009; Little, Connely, Feinberg, Jones, & Roberts, 2011). This preference is stronger when women are in the fertile phase of their ovulatory cycle, when estrogen levels are relatively high (Feinberg et al., 2006; Puts, 2005). Consistent with this, lower voice pitch predicts reproductive success in male hunter-gatherers (Apicella, Feinberg, & Marlowe, 2007).

An important transition in the evolution of vocal control may be linked to the distinction between cues and signals. A cue—some physical or behavioral feature that is informative—can evolve into a signal if its reproductive value is actively displayed (Maynard Smith & Harper, 1995). The voice is a reproductive cue when it varies with sex hormone levels, but it can also be a reproductive signal. It has been reported that men may lower their pitch, and women may raise theirs, in a contrived mating context (Fraccaro et al., 2011; Puts et al., 2006; see discussion in Pisanski, Cartei, McGettigan, Raine, & Reby, 2016).

What about vocal complexity in the form of speech? Men with novel, extensive, or intricate vocal repertoires tend to dominate other men and to enjoy greater sexual access (Locke, 2001, 2008). Is this simply a correlation, or do men, consciously or unconsciously, diversify or ornament their utterances when doing so would increase their perceived fitness?

There are indications that they do. In one study, young men defined contrived word combinations far more creatively when tested by an attractive young woman, or when in competition with other men (Franks & Rigby, 2005). In another, young men used more low-frequency words after imagining a romantic encounter with a younger woman than after imagining one with an older woman or with a man (Rosenberg & Tunney, 2008). In a third study, participants preferred an actor whose speech showed greater lexical diversity, grammatical complexity, and verbal fluency, in the context of both short- and long-term mating (Lange, Zaretsky, Schwarz, & Euler, 2014).

Concluding remarks

Because languages are universally spoken, it is incumbent on evolutionary theorists to identify the specific pressures that induced the vocal and phonetic skill required to speak. My hypothesis is that the ability to generate attractive strings of vocal and phonetic material first cued, then signaled, physical and psychological fitness in infancy, and that when human societies enlarged, the young benefitted additionally if they displayed the vocal skills needed to negotiate social relationships. These abilities would surely have produced changes at the neural level of vocal control, for intentionality plays a critical role in both instrumental crying and modulated vocalization.

I want to close with an anecdote that may be instructive. In 1783, Christfrid Ganander described an ancestral Finnish tradition in which fathers used riddles to test “the acuity, intelligence and skills” of their daughters’ suitors. When a young man sought romantic commitment, “three or more riddles were posed to him, to test his mind with them,” wrote Ganander, “and if he could answer and interpret them, he received the girl, otherwise not” (Maranda, 1976, p. 127). Ganander’s anecdote concerns the role of a parent in mate selection, but I see no reason why parents, aware that their offspring would ultimately be appraised on the basis of their social intelligence, might not have posed similar tests to their own children: tests of their ability to demonstrate or respond to vocal complexity, working much like a vocal riddle.