Language is not a single faculty, but an integration of multiple subfaculties (Fitch, 2006; Hauser, Fitch, & Chomsky, 2002; Okanoya, 2007), each of which must have evolved gradually through natural selection. Compared with the process of formation of each of the subfaculties, the process of integration must have been more rapid (Miyagawa, Berwick, & Okanoya, 2013); nevertheless, the process was subject to natural selection. In this sense, the theory that language appeared suddenly in humans cannot be accepted.

In my theory of language evolution, I assume that the evolution of form and the evolution of content are independent to some extent (Okanoya, 2002). However, content did not gain adequate structural complexity until after it was integrated with form. Even if I concede that language evolved to facilitate thinking, as the school of generative grammar claims (Fujita, 2016), a thought can be complex only if the form that supports its externalization has adequate structural complexity. For this reason, I regard the processes and mechanisms by which signals become complex as an important part of the study of language evolution.

Here, I use birdsong as an example to explain how signal complexity may have evolved. Of course, birds are phylogenetically quite distant from humans, but birds and humans share a common property of vocal learning (Jarvis, 2006). Because vocal learning is known to use similar brain substrates in birds and humans, by studying a process in which birdsong becomes complex, we should be able to gain some insight into the similar process of speech evolution in humans.

Evolution of communication and emergence of language

Communication is defined as the process of signal transmission whereby, in the long run, the receiver’s behavior is changed by the signal, and the sender gains benefit by the change in the receiver’s behavior (Slater & Halliday, 1994). Here, “the long run” could mean several generations. For example, the sender’s benefit could mean an increase in the occurrence of the sender’s genes. Taking the example of birdsong, a receiver (female) listens to the song of a sender (male) and decides whether to copulate with the sender. If she decides to do so, the song, in the long run, increases the fitness of the male singer.

Thought is a prefrontal function, and the prefrontal cortex has evolved in service of behavioral control (Arnsten, Wang, & Paspalas, 2012). In considering the emergence of language, it is therefore also valid to suppose that communicative behavior prepared the vehicle of thought, because communication uses a variety of motor movements as signals, and the movements associated with signals might be internalized (Pulvermüller & Fadiga, 2010). Internalized behavioral intentions could then be freely combined to enable mental simulation, so that behavior can be considered a preadaptation to thought (Pulvermüller, 2010). Thus, one way to resolve “the fallacy of communication” often raised by scholars in generative grammar (Fujita, 2016) is to assume not that thought produced behavioral complexity but, conversely, that behavioral complexity produced flexibility in thought. In the former view, the question remains how thought came about; in the latter, we need only account for how behavior became complex.

Behaviors that have a specific function can only have a degree of complexity that matches the function; complexity that exceeds the need of the function is simply a waste of resources. However, behaviors whose individual movements are not each tied to a specific function could develop complexity. Behaviors used for sexual display do not, in themselves, have a specific proximate function, but may serve to express the ultimate fitness of the displayer. Moreover, if the direct cost of receiving that behavior disappears, the behavior could gain more arbitrary complexity through relaxed selection (Deacon, 2010).

In many animals, vocal signals function as sexual signals. In birds and whales, this tendency is further developed and, in some of these species, learning is necessary to acquire species-specific mating signals, which take the form of songs. Language is a unique property of humans, but songs—long, complex sequences of vocalizations—are observed in many animal species. Because of this behavioral continuity, by taking song as a precursor to language, the biological origin of language has become accessible (Brown, 2000; Darwin, 1871; Fitch, 2013; Mithen, 2005).

Let us hypothesize that before language, protohumans developed singing behavior associated with various social contexts. If songs became a learned property, as they are in some bird and whale species, a phrase of syllables may have been shared by more than one song. Likewise, a part of the behavioral context in which a song was sung may also have been shared by more than one song. For example, a song sung when hunting (Song H) and a song sung when dining (Song D) might have shared the same phrase “h&d.” Song H and Song D would also have shared the context of “doing something together.” After a while, by singing the shared phrase “h&d,” the singer may have specified the context of “let’s do that together.” By repeating this process, holistic songs might have been deconstructed into specific phrases, and these phrases might then have become protowords. I call this the mutual segmentation hypothesis of song phrases and song contexts (Merker & Okanoya, 2007; Okanoya & Merker, 2007). I shall explain how the process of mutual segmentation became possible through an increase in the formal complexity of sexual songs in Bengalese finches.
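The Song H and Song D example can be restated as a small procedure. The following Python sketch is only an illustration under strong simplifying assumptions, not a model proposed in the cited work: songs are treated as lists of syllable labels, contexts as sets of labels, and the names `shared_run`, `mutual_segmentation`, and all syllable and context labels are hypothetical. For every pair of songs it pairs the longest phrase they share with the context they share.

```python
from itertools import combinations


def shared_run(a, b):
    """Return the longest contiguous run of syllables found in both songs."""
    best = ()
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > len(best):
                best = tuple(a[i:i + k])
    return best


def mutual_segmentation(songs):
    """Pair each phrase shared by two songs with the context they share.

    songs maps a song name to (syllable list, set of context labels).
    """
    protowords = {}
    for (_, (s1, c1)), (_, (s2, c2)) in combinations(songs.items(), 2):
        phrase = shared_run(s1, s2)
        context = c1 & c2
        if phrase and context:
            protowords[phrase] = context
    return protowords


# Hypothetical songs: the syllables "h", "d" stand for the phrase "h&d".
songs = {
    "H": (["x", "y", "h", "d", "z"], {"hunting", "together"}),
    "D": (["p", "h", "d", "q", "r"], {"dining", "together"}),
}
result = mutual_segmentation(songs)
print(result)  # {('h', 'd'): {'together'}}
```

In this toy setting the shared phrase comes to be associated with the shared context only. The sketch leaves out learning, repetition over generations, and partial overlap of contexts, all of which the hypothesis itself involves.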

Evolution of complexity in Bengalese finches

Bengalese finches are familiar as pet birds throughout the world, but they do not exist in nature. They are a domesticated strain of the wild white-rumped munia, Lonchura striata (Restall, 1996). The white-rumped munia, as its name suggests, has a white patch on a part of the rump, but otherwise the body feathers are dark and light brown. Bengalese finches, by contrast, typically have brown patches on a white background. Records indicate that approximately 250 years ago, a Japanese feudal lord imported white-rumped munias from China (Taka-Tsukasa, 1917; Washio, 1996). At the early stage of domestication, these birds were kept for their parental competence and used to foster birds that did not breed well in captivity. Approximately 120 years ago, white color mutations appeared, and the birds came to be called Bengalese finches rather than white-rumped munias.

The songs of white-rumped munias are simple, with approximately eight song syllables that are sung in a fixed order for several renditions, to construct a bout. The songs of Bengalese finches are more complex, both in their syllable acoustics and syllable sequences, and in their song syntax. Bengalese finch songs consist of chunks of two to five song syllables arranged in a probabilistic Markovian way, or in finite-state syntax (Katahira, Suzuki, Okanoya, & Okada, 2011).
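The contrast between a fixed-order song and a song with finite-state syntax can be made concrete with a small generator. The Python sketch below is a simplified illustration, not the model fitted by Katahira et al. (2011): the chunk labels, syllable letters, and transition probabilities are invented for the example, and a first-order Markov chain over chunks is assumed.

```python
import random

# Hypothetical chunks of two to five syllables; the letters are placeholders.
CHUNKS = {
    "A": ["a", "b"],
    "B": ["c", "d", "e"],
    "C": ["f", "g", "h", "i", "j"],
}

# Hypothetical first-order transition probabilities between chunks.
TRANSITIONS = {
    "A": {"B": 0.7, "C": 0.3},
    "B": {"A": 0.4, "C": 0.6},
    "C": {"A": 1.0},
}


def sing(start, n_chunks, rng):
    """Generate a syllable sequence by walking probabilistically between chunks."""
    state = start
    syllables = []
    for _ in range(n_chunks):
        syllables.extend(CHUNKS[state])
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
    return syllables


def fixed_song(syllables, renditions):
    """Repeat one fixed syllable order, as in the simpler munia-type song."""
    return list(syllables) * renditions


bengalese_like = sing("A", 6, random.Random(1))
munia_like = fixed_song(list("abcdefgh"), 3)
print(" ".join(bengalese_like))
print(" ".join(munia_like))
```

Syllable order is fixed within a chunk, while the order of chunks varies from bout to bout. The fixed-order song, by contrast, repeats identically on every rendition.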

To clarify the extent to which learning accounts for the song differences between the two strains, we cross-fostered their eggs; that is, eggs from white-rumped munias were incubated by Bengalese finches and vice versa (Takahasi & Okanoya, 2010). We found that while Bengalese finches learned most of the munias’ songs, munias could not learn some of the song elements of Bengalese finches. The results indicated that there were some heritable substrates related to the song differences between the two strains. Next, we compared the volumes of song-related telencephalic nuclei including HVC (proper name), RA (robust nucleus of the arcopallium), and Area X (striatal part of the subpallium). The result indicated that Bengalese finches had larger song-related brain structures than did munias (Okanoya, 2004). We also found no difference in the overall forebrain volumes between the two strains of finch (unpublished observation). This is the opposite of the tendency reported in domesticated mammalian species, most of which have been shown to have smaller brains than their wild counterparts, although the causal relation between domestication and brain size has not yet been established (Wilkins, Wrangham, & Fitch, 2014). We also compared the expression of glutamate receptors (Wada, Sakaguchi, Jarvis, & Hagiwara, 2004) in song-related nuclei in the two strains and found that when differences occurred, Bengalese finches always demonstrated denser expression of the receptors (Wada et al., 2008). Furthermore, body size and weight were correlated with song bout length in Bengalese finches, suggesting that receivers could judge the fitness of senders (Soma et al., 2006).

To explore evolutionary reasons for song complexity in Bengalese finches, we examined songs in populations of wild white-rumped munias in Taiwan, Republic of China. We identified three natural populations of white-rumped munia. Interestingly, each population was mixed to a different degree with a sympatric species, the spice finch, Lonchura punctulata. We found that song complexity was low in the population where the proportion of the sympatric species (the sympatric ratio) was high, while it was higher in the population where the sympatric ratio was low (Kagawa et al., 2012). We interpreted the results in the following manner: When munias coexist with a high proportion of a sympatric species, they need to make the species-specific characteristics of their songs more conspicuous to avoid cross-breeding, whereas when the proportion of the sympatric species is low, they can develop complex songs to attract conspecific females without the risk of hybridization. The hybridization of these two species has been reported in captivity, but the fertility of the hybrids is unknown (Restall, 1996).

To sum up, we consider that some of the reasons why songs become complex in domesticated Bengalese finches include female preference for complexity originating from wild white-rumped munias, the weakened need for the song to function as a species signature in the domesticated environment, and the relaxed environment of domestication, which does not have predation or feeding pressures (Okanoya, 2015).

Domestication syndrome in Bengalese finches

Humans have domesticated many species for a variety of purposes. Some of these animals show curious similarities to each other: loss of pigmentation in parts of the body surface, a round face and weak biting force, and decreased aggressiveness associated with decreased cortisol levels. These changes are collectively called “domestication syndrome,” and a recent theoretical analysis has proposed that they may result from the delayed migration of neural crest cells during embryogenesis (Wilkins et al., 2014). This hypothesis states that the adrenal glands that secrete stress hormones, pigmentation cells, and jaw bones descend from neural crest cells that were produced during embryogenesis. By selecting tame individuals, humans may have been selecting individuals with slower migration or lower production of these neural crest cells.

Because Bengalese finches originated from white-rumped munias imported from China some 250 years ago, there must have been selection for tameness, which would have aided survival during the long journey. Munias were initially used as foster parents for foreign birds, and this required further tameness and tolerance, while aggressive individuals must have been removed from the stock. Parenting in a small cage requires stress tolerance, and thus it was necessary for the level of stress hormones to decline. These requirements probably led to delayed migration of neural crest cells, resulting in the overall white appearance of domesticated Bengalese finches.

To examine whether the neural crest hypothesis of domestication syndrome (Wilkins et al., 2014) applies to Bengalese finches, we examined several socioemotional factors in this species and, when possible, compared them with those of white-rumped munias. Biting force was examined in both strains by holding the bird and challenging it with a stick equipped with a piezoelectric sensor (Suzuki et al., 2012). Munias bit twice as often, and their biting force was twice as strong as that of Bengalese finches. This suggested that Bengalese finches are less aggressive or less afraid of the situation. Fearfulness was examined using a tonic-immobility test (Suzuki, Ikebuchi, & Okanoya, 2013). The bird was held on its back for 15 s to simulate predation and then released. The time it took to move and the time it took to fly away were measured as an index of fearfulness, because remaining still is an adaptive trait in a predatory situation. White-rumped munias took three times as long as Bengalese finches to fly away; the results thus indicated that Bengalese finches are less fearful than munias. Finally, fecal corticosterone levels were measured in both strains of bird. Bengalese finches showed half the corticosterone level of munias, indicating lower stress levels in domesticated Bengalese finches (Suzuki, Yamada, Kobayashi, & Okanoya, 2012).

These results are consistent with the neural crest hypothesis and may further explain some of our other data. For example, we found that in a free-flight cage, Bengalese chicks learned songs not only from their fathers but also from other males (Takahasi, Yamada, & Okanoya, 2010). A similar experiment was conducted with munias, but they only learned from their fathers (Kagawa et al., 2008). This may be due, in part, to the decreased fearfulness and increased tameness of Bengalese finches. Furthermore, the smaller song-related neural structures in munias than in Bengalese finches might be explained by the decreased corticosterone level in the latter. Mineralocorticoid and glucocorticoid receptors coexist in the HVC of Bengalese finches (Suzuki, Matsunaga, Kobayashi, & Okanoya, 2011). It is known that in rodents with a higher corticosterone level, these receptors function to suppress neural growth and induce cell death (Abdanipour, Sagha, Noori-Zadeh, Pakzad, & Tiraihi, 2015). Although the process has never been demonstrated in avian species, if similar mechanisms function in the brains of Bengalese finches and munias, this could account for the result that more developed neural tissues are found in Bengalese finches than in munias.

Self-domestication and signal evolution in humans

It has been suggested, although still without much empirical evidence, that hominoids domesticated themselves over a period of millions of years (Francis, 2015; Hare, Wobber, & Wrangham, 2012; Omoto, 2003), and that the process of domestication included the selection of calm, sociable individuals to enable group living and collaborative activities. By protecting themselves from predators, hominoids may have spared more energy for sexual rituals (Miller, 2000). Vocal plasticity is a feature that may have either been sexually selected or evolved through relaxed selection (Wells, Dunn, Sergeant, & Davies, 2009). In birds, vocal plasticity is an adaptive trait that increases mating success. As in many domesticated animals, some of these traits might be related to cells descended from the neural crest (Wilkins et al., 2014). In humans, vocal plasticity might have prepared the behavioral vehicle on which thought could be organized. Thus, studies of song complexity in Bengalese finches may provide some insight into the preparatory phase of the emergence of human language (Okanoya, 2015).