Did language evolve in multilingual settings?

instead of seeing bilingualism as a peripheral ability to be studied after monolingualism is well understood, bilingualism can be a central part of the story of language evolution

Roberts 2013:192


Accounts of language evolution have largely suffered from a monolingual bias, assuming that language evolved in a single isolated community sharing most speech conventions. Rather, evidence from the small-scale societies who form the best simulacra available for ancestral human communities suggests that the combination of small societal scale and out-marriage pushed ancestral human communities to make use of multiple linguistic systems. Evolutionary innovations would have occurred in a number of separate communities, distributing the labor of structural invention between populations, and would then have been pooled gradually through multilingually mediated horizontal transfer to produce the technological package we now regard as a natural ensemble.



  1.

    Within certain approaches to language evolution, particularly the saltationist view associated with Chomsky and his collaborators (e.g. Hauser et al. 2002), just one of these innovations, namely the development of recursion (or its intellectual descendant Merge), gets privileged as THE crucial step in the development of language (or, more precisely, Faculty of Language in the Narrow Sense, in their terminology). Obviously such approaches are not automatically compatible with the gradualist, multi-adaptation view adopted here, though even in that intellectual tradition a ‘Faculty of Language in the Broad Sense’ would include most or all of the above. To that extent, the arguments made in this paper would be limited to evolution of ‘language in the broad sense’, but are applicable nonetheless.

  2.

    As Daniel Dor (p.c.) points out, humans have also invented other technologies for experiential displacement, such as drawings or maps. Language, crucially, allows for displacement of material that cannot readily be experientially displaced by these means, such as possible worlds (modalities), differences in common ground between interlocutors (e.g. definiteness), or chains of evidence (‘evidential’ inflections in e.g. Quechua).

  3.

    A referee asks whether we can project this genetic variability back into the small, early populations who were evolving speech. At present we do not have an empirically based answer to this question, which depends, among other parameters, on knowing what time in the past we are talking about and exactly how the population of proto-speakers is to be delimited (Sapiens only? Neanderthals and Denisovans as well?). However, induction from current human populations of comparable size makes it seem unlikely that such a population would have been genetically homogeneous.

  4.

    Ideas linking ‘race’ to language features were firmly rejected, within linguistics, by Franz Boas’ argument that a child of any racial background can acquire total mastery of any language provided they are exposed to it from early life. But this is not incompatible with the findings of Dediu, Ladd and others that certain linguistic features correlate with genetic ones. Small genetic differences between populations, iterated over many generations, can differentially favour the evolution of particular linguistic traits: ‘mathematical and computational models suggest that genetic biasing of language, even if small at the individual level, can act as a forcing factor on the trajectory of language change’ (Dediu 2011: 286). This does not, however, render such traits unlearnable by those from other populations once they have evolved.

  5.

    Cf. this definition in the German version of Wikipedia, where the German word Song is defined as a particular kind of Lied: Ein Song (englisch für Lied) ist ein Lied des 20. oder 21. Jahrhunderts, das sich an anglo-amerikanischen Vorbildern orientiert. Der Begriff findet vor allem in der populären Musik Verwendung und grenzt sich ab zum Kunstlied, zum Volkslied bzw. Folksong, zum Schlager im deutschsprachigen Raum und zum französischen Chanson. Anders als im englischsprachigen Raum, wo der Begriff „Song“ weitgehend synonym zur weiten Bedeutung des deutschen Wortes „Lied“ verwendet wird, ist im deutschsprachigen Raum der Song eine Liedgattung. Translation: ‘A Song (English for Lied) is a Lied of the twentieth or twenty-first century, which is oriented to Anglo-American examples. The concept is primarily used in popular music and is distinguished from the Kunstlied (‘art song’), the Volkslied or Folksong, the Schlager (‘hit’) in the German-speaking area and the French chanson. Unlike in the English-speaking area, where the concept ‘song’ is broadly used as a synonym for the broad meaning of the German word Lied, in the German-speaking area the Song is a type of Lied.’

  6.

    This is not too far below Dunbar’s (1992) ‘comfortable’ human group size of 148, though I hasten to point out that, as argued here, multilingualism ensures that the social group is substantially larger than the language group.

  7.

    Though this may be exaggerated—see Pascoe (2014) for an important recent critique arguing for forms of agriculture and other types of sedentary food production (fish traps, eel-channels etc.) over much of the continent.

  8.

    Further, the likely skewing of speaker-population sizes along a log-normal distribution means that the typical speaker population is likely to have been even smaller than this average suggests. I am grateful to an anonymous referee for pointing out this consequence.

  9.

    Before European contact, there were probably around 250 languages in Australia, though some recent estimates push this up to 407 (Bowern 2016)—and given that estimates of precontact population range from 250,000 to 750,000, this is roughly 650–3000 speakers per language. For New Guinea, a rough estimate of total number of languages is around 1200, for a precontact population of perhaps four to six million, giving the number of speakers per language as somewhere between 3300 and 5000.

  10.

    The usefulness, in the quest for a future spouse, of learning the language of one’s mother in addition to that of one’s father, comes out particularly clearly in this characterisation by Leenhardt (1946) of the traditional situation in New Caledonia: Les femmes enseignent aux enfants leur langue maternelle; de quelques pays qu’elles viennent, elles préparent leurs filles à aller un jour au pays de l’oncle utérin, et la connaissance de la langue du kaña leur paraîtra toujours indispensable dans ce but. De même, leur fille devra comprendre la langue du “frère” boru ña, qui sera un jour son mari. … nombre de jeunes gens continuaient leur séjour, et ne revenaient qu’après avoir épousé la femme qu’ils allaient ramener chez eux. Durant ce temps, ils avaient appris à fond la langue. Leur femme parlera sa langue en même temps qu’elle apprendra celle de son mari. Ainsi tout indigène était pour le moins bilingue. [‘Women teach their children the maternal language; wherever they come from, they prepare their daughters to go one day to the country of their maternal uncle, and the knowledge of the language of the kaña is indispensable for this. In the same way, their daughter must understand the language of the ‘brother’ boru ña, who one day will be her husband… [After certain feasts in the country of the maternal clan] a number of young people will stay on, not returning until they have married a woman whom they will bring back to their country. During this time, they will have mastered the language. Their wife will talk her language at the same time as she is learning that of her husband. Thus every indigenous person is at least bilingual.’] (Leenhardt 1946: xvi; my translation).

  11.

    Moore gives, as an example, the courting by a young man called Jonas of a girl called Gogo in her mother’s compound in the Mandara Mountains, Cameroon. To enhance his chances, Jonas courts her in Mada, her father’s language, even though they already have two other languages in common: Wandala, the local lingua franca, and Wuzla, the first language of Jonas’ father and Gogo’s mother. (In addition, Jonas speaks five other languages.) Prior to visiting Gogo’s mother’s compound, Jonas had jotted down a list of topics of conversation and relevant vocabulary on a piece of paper, but did not need to use them during the conversation.

  12.

    A ‘stretched frontier’, for example the linear peopling of a coastline, may have somewhat limited options, as compared to a more densely populated area where one has neighbouring groups on all sides, but all but the ‘tip group’ would have at least two options, one in front and one behind, from whom to draw mates.

  13.

    I am ignoring many differences of detail across the different regions mentioned here. For example, multilingual capacity may, in everyday practice, be played out in ‘asymmetric bilingual conversations’, where each party speaks their own language but understands the other’s, or participants may shift into whichever language is appropriate for their location. In some regions, this may result in people regularly exhibiting an active command of several languages, while in others they only speak one, but ‘hear’ others. Either way, however, the knowledge enabling them to interact must extend to the structures of two or more languages.

  14.

    Needless to say, this is presented here not as an actual account of early human lingualism, but rather to show in a particularly vivid way how deeply ingrained the social-signalling function of language can be in many cultures, and also how taken for granted it can be that cultural blocs can span multiple languages. The Warramurrungunji myth has been recorded in a number of languages (Iwaidja, Kunwinjku, Gun-djeihmi) from people belonging to quite different clans. See Evans (2010: 5–8) for more details of this myth.

  15.

    The most likely sociohistorical scenario is that, in the first generations, descendants of these first mixed marriages were bilingual, and served as go-betweens between Cree hunters of fur and Quebec French fur-traders. Subsequent legal changes in Canada, which formalised ethnic group membership, left the Métis in a position where they belonged neither to the recognised indigenous tribes nor to the mainstream white population. At that point fluency in one or both of the formant languages would have atrophied, and a new mixed language appears to have emerged as a group marker, through a process of what Bakker (1997) calls ‘language intertwining’. In subsequent generations this left Michif speakers whose language repertoire did not include the source languages.

  16.

    For an interesting line of research at the intersection of semantic and social structure, see the series of studies by McConvell (1985, 2018) which trace the genesis of Australia’s unique eight-class ‘subsection’ system—which assigns every individual to one of eight classes, effectively providing a schema for their kin relation to every other member of the social universe. Most proximally this appears to have originated through the interaction and integration of two isomorphic but differently named four-class systems across a language boundary among bilingual speakers; more distally, four-class systems in their turn may have originated through comparable interactions of two-class ‘moiety’ systems.

  17.

    A rebus is a symbol that borrows the sounds of an easily drawn word to represent a homophonous word that is more difficult to depict visually, e.g. using a picture of a bee to write the English verb ‘be’.

  18.

    The Siddham script, from which the Bengali, Tibetan and some other scripts evolved, is neither an alphabet, which has a distinct letter for each sound, nor a syllabary, which has a distinct letter for each syllable, e.g. な na versus ぬ nu in Japanese hiragana. Like the Semitic scripts, the Siddham script is an abugida: letters have an ‘inherent’ a-vowel which will be pronounced in the default case (e.g. Devanagari न na), but can be deleted and replaced by modifying the main letter (e.g. the underswirl in Devanagari नु nu). The fact that Kūkai’s exposure to the Siddham script led him to produce a script which, as a syllabary, differs typologically both from Chinese and from Indic abugidas is further testimony to the way that multilingualism (effectively between three languages in this case) can lead to the emergence of quite new structures.

  19.

    Hiragana is primary in the sense that it is learned first, and can be used to write any word; characters are still used alongside it. (There are in fact two syllabaries, hiragana and katakana, specialised for different purposes, one developed from the old regular script and one from the cursive script.)

  20.

    A referee raises the question of whether cross-borrowing would lead to homogenisation. The study of ‘linguistic areas’ in historical linguistics, whereby unrelated or only distantly related languages converge through time, has certainly identified several putative convergence zones or Sprachbünde (e.g. the Balkans, South Asia, Mainland Southeast Asia) in which languages come to possess certain structural features in common (e.g. not using infinitives in the Balkans, developing retroflexes and verb-final syntax in South Asia, tone, monosyllabicity and serial verbs in Mainland Southeast Asia). But in no case is the convergence perfect, and if anything the direction of current findings for most of the classic convergence zones is to show that they are a lot less well-defined than has been classically believed. As the distinguished contact linguist Sarah Thomason (2000) puts it: ‘Even in the strongest Sprachbünde, the often-cited “tendency toward isomorphism” rarely if ever leads to massive overall convergence.’

  21.

    This is a handy term (Croft 2000) for discussing units of linguistic structure in a way that is non-committal with regard to whether they concern sound (phonemes), word-structure (morphemes), etc.

  22.

    The initiation language Demiin, taught to second-degree initiates of the Lardil group in northern Queensland, presents an interesting example of speech-gesture hybrids (Hale 1973; McKnight 1999). Its set of spoken signs compresses the whole of Lardil vocabulary down to fewer than 200 words, but these are then disambiguated through gesture. Thus the word ɬ↓i can refer to any (gilled) fish (ɬ↓ is an ingressive lateral fricative sound unique to Demiin), but the different types of fish are disambiguated by making an appropriate gesture while saying the word, e.g. for ‘parrotfish’ (ngerrawurn in Lardil) the hand is held with the thumb out and up but inclined slightly: the thumb represents the dorsal fin and the inclination the fact that these fish generally tilt while eating coral.

  23.

    Note for the non-linguist: in the International Phonetic Alphabet the upside-down capital R, ʁ, is an Edith Piaf-style ‘uvular trill’, most often accomplished by English speakers when gargling.

  24.

    This practice is still found in languages like Ewe from Ghana: cf. pótópótó ‘sound of a small drum’ (high tone), potopoto ‘sound of a big drum’ (low tone). See Ameka (2001: 30).


Funding was provided by the Australian Research Council (Grant No. FL130100111), the Centre of Excellence for the Dynamics of Language (Grant No. CE140100041) and the Alexander von Humboldt-Stiftung (Anneliese Maier Forschungspreis).

