Introduction

Cultural evolution and biological evolution share a number of similarities that have long been recognised (e.g. Darwin 1871; Lumsden and Wilson 1981; Boyd and Richerson 1985; Durham 1991). But since cultural inheritance and biological inheritance also necessarily encompass different features (e.g. Boyd and Richerson 2005; Tëmkin and Eldredge 2007; Mace and Holden 2005), the two fields have followed relatively independent paths in terms of the methodologies and approaches used by each. Nevertheless, a number of scholars have argued that studying cultural phenomena within a unifying framework that takes insights from evolutionary biology is potentially useful to integrate separate disciplines (Mesoudi et al. 2006; Charbonneau 2016). Cross-disciplinary approaches have also been defended for the field of language evolution (e.g. Bickerton 2003; Christiansen et al. 2002), although the uniqueness of human languages has undoubtedly delayed the construction of integrated theoretical frameworks incorporating both the findings in computational modelling and state-of-the-art empirical knowledge in evolutionary developmental biology.

Human languages are different from other animal communication systems. For example, they exhibit a semantically compositional structure that enables humans to manipulate long and complex chains of signals. This feature is known as Frege’s Principle of Compositionality, which essentially boils down to the fact that the meaning of a complex expression is a function of the meaning of its parts (for an overview: Krifka 1999; Szabó 2017, and references therein). Compositionality of meaning is generally assumed to be intimately connected to two other distinguishing properties of natural languages, namely their productivity and their systematicity (Fodor and Lepore 2002), which in turn are related to a property of the syntactic principles responsible for the construction of complex expressions. This distinctive feature of language is often referred to as recursion (e.g. Hauser et al. 2002). Other properties of languages are duality of patterning, convexity, linearity or displacement (Hockett 1960, 1966). The simultaneous presence of these distinctive features makes human languages open-ended communication systems (Kirby 2017).

A common way to explain the origins of these features and, more generally, language structure, is natural selection (e.g. Pinker and Bloom 1990). But, as Eric Lenneberg pointed out, this approach can be problematic if it intends to explain evolution as a simplistic and unidirectional mapping of genotypes onto phenotypes, and it only pays attention to “the biological usefulness of certain features of animal communication” (Lenneberg 1967, p. 253). This is so, firstly, because evolutionary biological dynamics can be radically altered by other external pressures such as the environment (Gilbert and Epel 2009; Sultan 2015) or, in the case of linguistic phenotypes, culture (Kirby 2017); secondly, because dependencies between genes and phenotypes cannot be drawn unidirectionally or by attending to a single locus (Marcus and Fisher 2003; Fisher and Vernes 2015) when it comes to explaining language; and finally, because it has been shown that a constellation of processes that bias selection and modify the frequency of heritable variation, such as developmental biases and niche construction, can alter the way in which natural selection proceeds (Laland et al. 2000; Robert 2004; Deacon 2010; Lewens 2019).

This paper is structured as follows. In the next section, we will focus on the definition of domain specificity and we will propose a revision of the concept in light of alternative models that eschew traditional versions of genetic determinism. Then, in Sect. 3 we will review some relevant models suggesting that language regularities can be successfully acquired and transmitted without the need for strong genetic encoding. In Sect. 4, we expound the minimum requirements for iterated learning to work in the light of recent controversies. Then, in Sect. 5 we provide a brief review of the history that led to the development of eco-evo-devo models and argue for the need to abandon traditional dichotomies in order to better account for the linguistic phenotype. Finally, in Sect. 6 we will revisit a variety of studies that might add evidence in support of the main hypothesis of this paper: the conceptual apparatus of eco-evo-devo models is compatible with the findings of iterated learning models and helps dissolve the boundaries of a traditional dichotomy that has been limiting our understanding of the evolution of language. Using the notion of niche construction, where individual organisms play a much more central role than in standard approaches, we will propose an integrated theoretical framework that stresses the need to connect the development of sensory-guided motor capacities and the requirements for iterated learning. Our conceptual model intends to help fill gaps in our knowledge about how variational explanations (changes due to variation within the population) and developmental explanations (changes due to variation within the individual) relate, as well as to provide a framework for language and cultural evolution to advance in the construction of new hypotheses upon which triple-inheritance models can be developed.

Domain specificity

The modularity of mind is a hypothesis about the architecture of mind according to which a number of cognitive systems, typically associated with perception, operate in characteristic ways that make them, among other things, domain-specific and mostly impermeable to the operation of other modules and cognitive systems (Fodor 1983, 1985). There is no conceptual or logical connection between the notion of modularity and nativism. But it is often the case that their proponents, on the basis of such considerations as the poverty of stimulus argument, assume that such functionally defined modules are associated with the corresponding Chomskyan-modules (that is, innate repositories of domain-specific information that are supposed to underlie our cognitive abilities in various domains). Accordingly, a system is domain specific if the class of objects and properties that it computes information about is restricted within narrow limits (Fodor 2000; Robbins 2017). Under this definition, humans would be endowed with systems of knowledge which serve as specialized evolutionary devices for specific tasks. For example, knowledge of language would be a domain-specific system that gives humans the ability required for the acquisition and use of language (Chomsky 1986; Spelke and Kinzler 2007). A particularly radical version of this stance is exemplified by evolutionary psychology and its massive modularity thesis according to which all extant human cognitive abilities (not just the peripheral ones) are modular and, also, adaptations to the environment of the Stone Age (Barkow 1992; Plotkin 1997). Differences as to the extent of modularity notwithstanding and focusing our attention on language, it is certainly true that both stances appear to be committed to some form of nativism according to which neurally specific modules for language are shaped by specific genes (Pinker and Jackendoff 2005; Berwick and Chomsky 2016; Kirby 2017). Researchers have found support for domain specificity in many different ways: e.g., looking at the competencies of infants, comparing human capacities with those of other animals or using the poverty of the stimulus argument as evidence for universal grammar (Chomsky 1967; Pinker 1991; Berwick et al. 2011).

But attempts to address key questions such as “why only us?” (see Berwick and Chomsky 2016) or how children acquire language without sufficient evidence in the primary linguistic data (Chomsky 1965) have not always ended up proposing models that verify the existence of a domain-specific module for language. To be sure, even the human- and language-specificity of the computational operation Merge, the only putative genetically determined residue of UG in Chomsky’s Minimalist Program (Chomsky 1995, and later work), has been called into question on the grounds of a detailed analysis from the perspective of the notion of biological homology (Balari and Lorenzo 2013, 2015). In the field of cultural evolution, these same questions have been addressed using a variety of experimental and computational methods that, without relying on strong genetic constraints or domain specificity, model the successful acquisition and emergence of universal properties of language (e.g. Zuidema 2002; Kovas and Plomin 2006; Scerif and Karmiloff-Smith 2005; Morgan et al. 1989; Chater et al. 2009; Smith and Wonnacott 2010; Culbertson and Kirby 2016).

The notion of domain specificity has traditionally been linked with innateness in different ways, causing significant confusion in the field. However, if proponents of innateness argue that language acquisition is determined by genetic factors, and proponents of domain specificity claim that language is processed in localized modules that deal exclusively with a single information type, then we can no longer treat these two issues as automatically interchangeable in the debates about the evolution of language (Bates 1994; Elman et al. 1996).

It is relatively common ground in the field of cultural evolution studies that domain-specific constraints, when genetically wired, might have evolved to take the form of weak biases or general capacities that, amplified by culture, interact with the linguistic system in domain-specific ways (Culbertson and Kirby 2016). Be that as it may, we suspect that the whole debate may acquire a totally different flavor as soon as one adopts a developmental view. Firstly, because of the fact, firmly established already by 19th century embryologists like Karl Ernst von Baer, that all development follows a pattern going from the less specific to the particular and is hardly a matter of master control genes (Minelli 2003). Secondly, because the traditional interpretation of the innate-acquired distinction, where what is innate is typically assumed to be internal to the object in question and, consequently, genetic, is most probably misleading (Wimsatt 1986; Keller 2010). We see no reason why the case of language should be different.

In the next section we will review some models that show how language features can emerge in the absence of strong genetic constraints, and demonstrate how such abilities as copying and sharing might be sufficient, when combined with iterated learning, to yield outputs that appear domain-specific without the need for strong language-related biological predispositions.

Challenging domain-specificity

The argument from the poverty of stimulus (henceforth, POS) states that children are not exposed to sufficient data within their linguistic environment to induce their native language. In 1967, Mark Gold provided a formal proof that has usually been interpreted as evidence for this argument (Gold 1967). Gold’s proof showed that, given a context-free grammar, regardless of the number of samples from an infinite language presented to a learning algorithm, the algorithm cannot accurately determine whether the samples belong to an infinite language or to a finite subset thereof containing the samples in question.

To investigate how a grammar that would be unlearnable by Gold’s method could be acquired successfully, Zuidema (2002) constructed a model that uses “cultural evolution.” The model implements linguistic abilities using context-free grammars and three operations called “incorporation,” “compression” and “generalization.” When the algorithm is initiated it produces random strings, simulating transmission from the parent to the child. In these randomly generated strings some regularities may appear, for example: aab, bab, cab. In this example, the child can compress the substring ab into the non-terminal X: \(S\mapsto aX\), \(S\mapsto bX\), \(S\mapsto cX\), \(X\mapsto ab\). Then, say the child obtains another rule from another set of strings: \(Y\mapsto d\). Now, the generalization operation can equate the non-terminals X and Y. This means the child can obtain the unobserved strings ad, bd, cd from the resulting grammar. Over generations, in a population of agents, language becomes more structured and unseen strings more learnable, increasing communicative success. With this elegant model, Zuidema (2002) showed that POS is not necessarily a problem for learners to successfully acquire grammars from a class that is unlearnable by Gold’s criterion.
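The generalization step in this example can be made concrete with a minimal sketch. It is not Zuidema’s (2002) implementation: the grammar encoding and the helper names `expand` and `generalize` are assumptions introduced here for illustration, and only the rules quoted above are taken from the text.

```python
from itertools import product

def expand(symbol, rules, depth=0, max_depth=6):
    """Return the set of terminal strings derivable from `symbol`."""
    if symbol not in rules:            # terminals have no rewrite rules
        return {symbol}
    if depth > max_depth:              # guard against recursive grammars
        return set()
    strings = set()
    for rhs in rules[symbol]:
        parts = [expand(s, rules, depth + 1, max_depth) for s in rhs]
        for combo in product(*parts):
            strings.add("".join(combo))
    return strings

def generalize(rules, keep, drop):
    """Equate non-terminal `drop` with `keep` by renaming and merging rules."""
    merged = {}
    for lhs, rhss in rules.items():
        new_lhs = keep if lhs == drop else lhs
        for rhs in rhss:
            new_rhs = tuple(keep if s == drop else s for s in rhs)
            merged.setdefault(new_lhs, set()).add(new_rhs)
    return merged

# Grammar after compression: S -> aX | bX | cX, X -> ab, plus the extra rule Y -> d
rules = {
    "S": {("a", "X"), ("b", "X"), ("c", "X")},
    "X": {("a", "b")},
    "Y": {("d",)},
}

before = expand("S", rules)                        # strings licensed before generalization
after = expand("S", generalize(rules, "X", "Y"))   # strings licensed once X and Y are equated
unseen = after - before                            # strings the child never observed
print(sorted(before), sorted(unseen))
```

Equating X and Y lets the grammar license ad, bd and cd in addition to the observed aab, bab and cab, which is the sense in which the child generalizes beyond its input.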

To explore the extent to which language genes in the form of a highly specialized module could have co-evolved with language properties, another well-known computational model was constructed by Chater et al. (2009). They simulated a population of language learning agents where arbitrary linguistic principles could become genetically encoded via the Baldwin effect (Baldwin 1896; Weber and Depew 2003, for a contemporary perspective). In evolutionary biology, the Baldwin effect describes a process where individuals have the ability to acclimatise to new pressures during their lifespan by learning a new behaviour. This mechanism would affect the individuals’ reproductive success and the new trait could become gradually encoded in the genome over generations. However, Chater et al. (2009) showed in their study that this genetic encoding gets significantly reduced when the rate of language change is high enough. Therefore, they concluded, since language changes much more rapidly than genes, genetic evolution of domain-specific constraints is unlikely. As pointed out by Culbertson and Kirby (2016), there is nonetheless room for a more nuanced thesis that supports the existence of weak biases (that is, soft constraints that can impose a continuum of weak preferences) affecting language acquisition. And it could be the case that these weak biases of the individuals were not reflected in the spoken language.

Kirby et al. (2007) investigated this by testing how innate biases are related to universal properties of language. Their model shows that cultural transmission can amplify weak biases and end up producing language properties which are near universal. If this is the case, cultural transmission would have produced “apparent adaptations,” that prevented the evolution by natural selection of strong constraints in the form of domain-specific genes, mainly because those genes would be highly prone to drift.
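How a weak bias can be amplified by transmission can be illustrated with a toy calculation. This is a sketch under strong simplifying assumptions, not the model of Kirby et al. (2007): there are only two languages, each learner hears `n` utterances, an utterance misleadingly resembles the other language with probability `eps`, and the learner adopts the language with the highest posterior probability. All names and parameter values are hypothetical.

```python
from math import comb

def map_choice(k, n, eps, prior0):
    """Language (0 or 1) with the highest posterior when k of the n
    utterances look like language 0. Exact ties go to language 0."""
    post0 = (1 - eps) ** k * eps ** (n - k) * prior0
    post1 = eps ** k * (1 - eps) ** (n - k) * (1 - prior0)
    return 0 if post0 >= post1 else 1

def switch_prob(speaker, n, eps, prior0):
    """Probability that a learner taught by `speaker` adopts the other language."""
    p_look0 = 1 - eps if speaker == 0 else eps   # chance an utterance looks like language 0
    total = 0.0
    for k in range(n + 1):
        p_k = comb(n, k) * p_look0 ** k * (1 - p_look0) ** (n - k)
        if map_choice(k, n, eps, prior0) != speaker:
            total += p_k
    return total

def long_run_share_of_language0(n, eps, prior0):
    """Long-run proportion of generations speaking language 0 (two-state chain)."""
    q01 = switch_prob(0, n, eps, prior0)   # language 0 -> language 1
    q10 = switch_prob(1, n, eps, prior0)   # language 1 -> language 0
    return q10 / (q01 + q10)

prior0 = 0.6                                # a weak preference for language 0
share = long_run_share_of_language0(n=2, eps=0.3, prior0=prior0)
print(prior0, round(share, 3))
```

With these illustrative values, a prior preference of 0.6 yields a long-run share of about 0.85 for the favoured language, so the distribution of languages is more skewed than the bias that produced it.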

We know, however, that the relationship between learners’ biases and language structure is not straightforward when it comes to explaining linguistic variation. In a recent study that uses both a Bayesian model of learning and transmission and collected data from an artificial language learning experiment that mirrors the model, Smith et al. (2017) showed that weak biases can have a wide range of effects on language structure, from strong to weak or even no effects. Therefore, transmission and use are essential for understanding the interactions between biases and statistical learning.

For the purpose of this paper, the examined models constitute sufficient evidence to illustrate the discussion for the next section. For a more detailed review of this line of work see Kirby (2017).

What does iterated learning actually require?

Berwick and Chomsky (2016) argue that cultural evolutionary approaches have generally confused the word “universal” as a property of the faculty of language with Greenberg’s linguistic universals (Greenberg 1966), or properties of externalized languages. They claim that what ultimately evolves in these models is a population of learning agents’ choices and agents that already had the ability to choose between two alternative concept representations. In their view, this would not solve the problem of where a universal comes from (e.g. compositionality) because the ability to build context-free grammars, generate infinite languages and/or even something like Merge is presupposed. Thus, they conclude, iterated learning models do not satisfactorily attempt to delimit any pre-existing innate universal grammar (UG) related to the language faculty.

But this might not be the case. Iterated learning models of language evolution define stochastic processes that can be mathematically characterized using Markov chains. To analyze the requirements for iterated learning, it is necessary to understand the core concepts that define the properties of Markov chains. In Appendix A, we provide an accessible summary that includes a brief characterization of Markov chains, along with a numerical example applied to language transmission. For now, it is important to note that Markov chains are very useful to analyze iterated learning processes by computing a transition matrix (a square matrix that gives the probabilities of different languages going from one to another) and finding the stationary probability of each language. For more detailed explanations of Markov chains see Kemeny and Snell (1983), Brémaud (1999, Ch. 2) or Griffiths and Kalish (2007), for example.
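The two notions just introduced, the transition matrix and the stationary probability of each language, can be sketched as follows. The two-language matrix is a hypothetical example chosen for illustration (it is not the numerical example of Appendix A), and the row convention, where entry `T[i][j]` is the probability of moving from language i to language j in one generation, is an assumption of this sketch.

```python
def step(dist, T):
    """Advance the distribution over languages by one generation."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

def stationary(T, iterations=1000):
    """Approximate the stationary distribution by repeated transmission."""
    n = len(T)
    dist = [1.0 / n] * n               # start from a uniform distribution
    for _ in range(iterations):
        dist = step(dist, T)
    return dist

# Hypothetical transition matrix: each row sums to 1.
T = [[0.9, 0.1],    # language 0 is usually transmitted faithfully
     [0.5, 0.5]]    # language 1 is often replaced by language 0

pi = stationary(T)
print([round(p, 4) for p in pi])
```

For this matrix the chain settles at 5/6 for language 0 and 1/6 for language 1, whatever the starting distribution, and applying `step` to that distribution leaves it unchanged, which is what makes it stationary.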

However, in the real world, learners have individual biases that affect the predictions of an iterated learning process that has been reduced to a Markov chain. In order to construct learning algorithms that incorporate a broad characterization of these biases for a wide set of cognitive features, a number of researchers have used Bayesian agents applied to human cognition (e.g. Anderson 1990; Oaksford and Chater 1998) and the emergence of linguistic regularities (e.g. Kirby 2001; Brighton 2002; Smith et al. 2003). Interestingly, the predictions of these computational approaches have been successfully reproduced and tested against data obtained from psychology experiments with human participants (Tamariz and Kirby 2015; Kirby et al. 2008).

Here, we review Bayes’ rule applied to language acquisition. For a detailed analysis of iterated learning using learning algorithms based on Bayesian inference, see Griffiths and Kalish (2007).

The Bayesian framework used in iterated learning models computes the posterior probability of a hypothesis according to Bayes’ theorem:

$$\begin{aligned} P(h\mid d)=\frac{P(d\mid h)P(h)}{P(d)} \end{aligned}$$
(1)

where P(h), named the prior probability distribution, is the estimate of the probability of the hypothesis \(h\in {{\mathscr{H}}}\) before d is observed (it encodes the learner’s biases). \(P(h\mid d)\) is the posterior probability, the probability of h after d is observed. \(P(d\mid h)\), named the likelihood, is the probability of observing d given h, and P(d), named the marginal likelihood, is the probability of d averaged over all hypotheses,

$$\begin{aligned} P(d)={\sum _{h\in {{\mathscr{H}}}}P(d\mid h)P(h)} \end{aligned}$$
(2)

Applied to language acquisition, h is a language, and d the set of utterances sampled from the target language. Additionally, each learner has a learning algorithm (LA) that specifies the procedure for choosing h after observing d, and a production algorithm (PA) that specifies how they choose d given h.
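Equations 1 and 2 can be worked through numerically with a small sketch. The two candidate languages, their prior probabilities and the likelihoods of the observed utterances are hypothetical values chosen only to show the arithmetic.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a finite hypothesis space.

    prior:      {h: P(h)}
    likelihood: {h: P(d | h)} for the observed data d
    returns:    {h: P(h | d)}
    """
    marginal = sum(likelihood[h] * prior[h] for h in prior)            # Eq. 2
    return {h: likelihood[h] * prior[h] / marginal for h in prior}     # Eq. 1

# Hypothetical learner: biased towards language h1 before seeing any data,
# but the observed utterances are more probable under language h2.
prior = {"h1": 0.7, "h2": 0.3}
likelihood = {"h1": 0.2, "h2": 0.8}

post = posterior(prior, likelihood)
print({h: round(p, 3) for h, p in post.items()})
```

Here P(d) = 0.2 × 0.7 + 0.8 × 0.3 = 0.38, so the posterior is roughly 0.37 for h1 and 0.63 for h2: the data outweigh the learner’s initial bias without erasing it.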

Now, if we apply this rule from generation to generation, forming an iterated learning process based on the principles of Bayesian inference, we obtain a Markov chain in which each learner forms a posterior distribution over languages by combining a prior (representing their inductive biases) with the data produced by the previous generation, and then produces a new set of data. This data is in turn supplied to the next generation, and so on, as illustrated in Fig. 1.

Fig. 1. Iterated learning has been proposed as an explanation for the emergence of linguistic regularities and the existence of linguistic universals. Each learner sees a set of utterances (d) produced by the previous generation, forms a hypothesis (h) about the language from which those utterances were produced, and uses this hypothesis to produce the data that will be supplied to the next generation. Figure adapted from Griffiths and Kalish (2007)
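The chain of learners described above can be sketched for the simplest case, in which each learner samples a language from its posterior, each generation hears a single utterance, and there are two languages and two utterance types. The prior and likelihood values are hypothetical, and the function names are introduced here for illustration only.

```python
def posterior(prior, lik, d):
    """P(h | d) for every language h, given one observed utterance type d."""
    joint = [lik[h][d] * prior[h] for h in range(len(prior))]
    marginal = sum(joint)
    return [j / marginal for j in joint]

def transition_matrix(prior, lik):
    """T[i][j]: probability that a learner taught by a speaker of language i
    adopts language j, assuming learners sample from their posterior."""
    n_h, n_d = len(prior), len(lik[0])
    T = [[0.0] * n_h for _ in range(n_h)]
    for i in range(n_h):
        for d in range(n_d):
            post = posterior(prior, lik, d)
            for j in range(n_h):
                T[i][j] += lik[i][d] * post[j]   # speaker produces d, learner picks j
    return T

def run_chain(start, T, generations):
    """Distribution over languages after the given number of generations."""
    dist = list(start)
    for _ in range(generations):
        dist = [sum(dist[i] * T[i][j] for i in range(len(T))) for j in range(len(T))]
    return dist

prior = [0.7, 0.3]            # learners' inductive bias over two languages
lik = [[0.8, 0.2],            # P(utterance type | language 0)
       [0.3, 0.7]]            # P(utterance type | language 1)

T = transition_matrix(prior, lik)
final = run_chain([0.0, 1.0], T, 50)   # the first generation speaks language 1
print([round(p, 4) for p in final])
```

Although the chain starts from language 1, the distribution over languages settles at the prior (0.7, 0.3). This is the convergence result that Griffiths and Kalish (2007) derive for learners that sample from the posterior; learners that instead pick the most probable language need not behave in this way.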

As Griffiths and Kalish (2007) stress, a prior should not be interpreted as reflecting innate predispositions to language acquisition, but as a collection of factors, not necessarily domain-specific constraints, that affect the agents’ own hypothesis. So, although there might be a sense in which we could correctly say that there are basic functionalities built into the model, none of them are language related. In fact, these models require only two skills: the ability to learn data and the ability to produce data (for transmission). The mechanisms underlying these abilities may be quite elaborate, but, to make the point clearer, not innately determined (in the traditional sense) to deal with specifically linguistic data.

We can observe, then, that the concept of a pre-existing biological condition leading our species alone to possess language is not in fact disputed in the above computational models. Instead, what is challenged is the specificity and language-related origin of that genetic basis. In this line of thought, Kirby (2017) pointed out that if we want to look for human adaptations related to a precondition for language, then we might better look at the biological origin of these two traits: the ability to copy vast sets of behaviours, and our predisposition to share. These two non-language-specific predispositions are the only ones that are required for iterated learning models to work.

If this is true, and if we are looking for human adaptations that biologically configured the so-called language-ready brain in our species, the relevant question to ask now is which current biological approach best accounts for the emergence of the key biological changes that brought about the abilities just mentioned (to copy and share). In the next section, we will suggest that evo-devo can be seen as the best general perspective to be taken when approaching this question, and we will discuss which implementation of evo-devo best fits the requirements of iterated learning to operate.

Throughout this investigation, we will aim at showing that such traditional distinctions as general vs. specific or Faculty of Language (Narrow sense; FLN) vs. Faculty of Language (Broad sense; FLB) of Hauser et al. (2002), should be abandoned in order to construct a less simplistic developmental approach to the complex cognitive capacities that serve as the basis for iterated learning processes to give rise to language universals.

Which approach should we take to account for the preconditions for iterated learning?

The Modern Synthesis, a term popularized by Huxley (1942), gave rise to modern biology by gathering a number of postulates from natural selection, population genetics and Mendelian inheritance into an articulated corpus of empirical evidence and mathematical laws. Ernst Mayr, a key evolutionary biologist of the past century, was one of the main figures of this conceptual revolution in the field. Among his contributions, articulating the biological species concept and studying different forms of allopatric speciation stand out. Mayr’s open skepticism towards what he called “beanbag genetics” notwithstanding (Mayr 1963), he nonetheless contributed to the consolidation of a biological thought centered on the notion of a “genetic program” (Mayr 1982) and on a neat separation of “proximate” vs. “ultimate” causes (Mayr 1961). As a direct consequence of this stance, the mainstream orientation of the Modern Synthesis tended to ignore developmental processes and their role in evolutionary dynamics (Maynard Smith 1982; Robert 2004; Amundson 2005), while, at the same time, organisms disappeared from the explanatory apparatus of evolutionary biology (Walsh 2015). However, such a view on causality has often been considered as highly problematic (e.g., Lewontin 1974; Oyama 2000b; Laland et al. 2013; Walsh 2019), and Mayr’s genetic program has also been shown to be unable to reflect the environmental context-dependency of phenotypic outcomes (e.g., Lewontin 1983; Gilbert and Epel 2009, 2015; Sultan 2015).

But in light of the discovery of the toolkit genes (highly conserved genes whose products regulate gene expression and control the organism’s embryonic development) developmental geneticists and evolutionary biologists have been forced to confront each other’s ideas in a more interconnected way. This filled the gap between both levels of analysis and gave rise to evo-devo, a new discipline that, since its origins, has been expanding upon the evolutionary synthesis (Carroll 2008; Pigliucci and Müller 2010, for two slightly different perspectives of this new synthesis).

Evo-devo, however, is not a unified theory (Hall 2003; Benítez-Burraco and Longa 2010), but a theoretical trend or general perspective within which different evo-devo models occupy different positions. Balari and Lorenzo (2013, chapter 6) describe three main categories of evo-devo approaches:

  1. those that encompass genome deterministic models;

  2. those that include developmental factors beyond the genes but maintain a gene-centered approach; and

  3. those that hold that disparate factors interact to bring about ontogenetic outcomes.

The last category approximates what may be categorized as the “eco-evo-devo” approach, which shares a fair number of assumptions with the framework of the extended evolutionary synthesis (henceforth EES; Sultan 2017; Müller 2020). According to the proponents of the EES, developmental processes, including cellular products, intermediate phenotypic states, environmental inputs and behavioral practices, share with inclusive inheritance and niche construction, the potential to drive individual variation and, ultimately, evolution (Laland et al. 2015; Müller 2017). Given this classification, it is not difficult to tell a priori which evo-devo category fits better with the concept of a strong domain-specific faculty of language and which one explains human motor capacities as the result of a complex architecture of interconnected developmental levels.

For example, as pointed out by Benítez-Burraco and Longa (2010), Chomsky has recently suggested non-trivial analogies between the biolinguistic approach (BA) and evo-devo (Chomsky 2007, 2010). According to Benítez-Burraco and Longa, however, Chomsky’s analogies mostly refer to that version of evo-devo that fully assumes a gene-centered perspective (e.g., Carroll 2005), a stance that might have been appropriate as regards the Principles-and-Parameters Theory, but that, as these authors extensively argue, does not even fit well with a minimalist BA. Be that as it may, Chomsky has also advocated for a tripartite causal model according to which different aspects of the linguistic phenotype may be neatly attributed to well-delimited factors, namely genetic endowment, experience (i.e. the environment), and general principles not specific to the faculty of language like principles of data analysis, computational efficiency, or developmental constraints, among others (Chomsky 2005). This is precisely the kind of analysis of causes that Lewontin (1974) showed to be impossible and that today still survives in the nature vs. nurture debate under its different guises (Keller 2010).

Chomsky’s views are direct heirs of a tradition where such dichotomies as internal vs. external, inherited vs. acquired, and genes vs. environment (or culture) have played an important explanatory role. But this stance radically comes into conflict with the idea, widely shared by most supporters of the third view above, that nature is not genetic but phenotypic; that nature is not a self-contained internal program but rather the open-ended product of a dynamic developmental interaction between internal factors, including genes, and external, environmental ones (Oyama 2000a). Development thus arises from a complex network of causal interactions in which organism and environment co-construct each other (Laland et al. 2013, 2014) through reciprocal influences that effectively break the supposed barrier between the internal and the external (Sultan 2015, 2019). An immediate consequence of this is that the genetic regulatory systems of developmental genetics lose their causal primacy in favor of the causal complex made up by the organism and its environment. Similarly, the concept of inheritance also changes to embrace an extended form of inheritance where the developmental ‘resources’ range from DNA sequence, to environmentally-induced epigenetic marks, to the location and ecological niche the organism inhabits (Laland et al. 2015; Bonduriansky and Day 2018).

We have focused on Chomsky’s attitudes towards a number of crucial issues that eco-evo-devo invites us to look at through a different prism, but it should be clear that some of these criticisms also apply to other frameworks, not necessarily friendly to Chomskyan thought. To be sure, paraphrasing Lewontin (1974, p. 401), the relevant question is not whether the phenotype of an individual is the result of either environment or genotype, or of either biology or culture, or of either nature or nurture, because the phenotype, to the extent that all these dichotomies make real sense, is the result of both. Accordingly, the framework we are advocating for here definitely shares a number of central points with Cecilia Heyes’s “Cognitive Gadgets” (Heyes 2018). Indeed, we agree that when the cognitive equipment of newborn humans is exposed to “culture-soaked” human environments, it changes dramatically. Similarly, in this paper we propose that we are born endowed with complex cognitive mechanisms that emerge as part of our development and that these mechanisms continue to develop throughout life within our deeply social and anthropized environments. These social environments constitute diverse human niches, which are in turn affected by horizontal and vertical transmission processes in which the ability to copy and share information at a fast rate plays a crucial role (Tomasello et al. 2005; Tomasello and Carpenter 2007; Laland 2017a). But our approach here also differs from Heyes (2018), at least, in denying the necessity, if not the possibility, of retaining the nature vs. nurture dichotomy and of neatly identifying the causal contribution of each. The cognitive equipment of newborn humans may not differ substantially from the minds of closely related species. Yet, human cognitive diversity and evolved predispositions might be the product of observable causal processes whose causes cannot be depicted as totally disentangled, because that would lead to epistemological contradictions when it comes to clearly delimiting objects, causes and effects. For example, DNA is both inherited and environmentally responsive, and we know this in enough detail to move beyond the nature-nurture debate (Robinson 2004). Thus, in the model we propose here, the contingencies of those traditional categories are reduced to mere instrumental categories.

To the extent that this new paradigm constitutes the recognition of the need to adopt a pluralistic attitude toward the complex nature of the language faculty, the emergence of which cannot be clearly quantified in terms of internal versus external structures, nor characterized as a unique object, we think that eco-evo-devo and EES approaches are also demanding the abandonment of such traditional distinctions as FLN/FLB and others we already referred to above. Many of these arguments, which we will not expand on here, have been amply analyzed using a variety of biolinguistic approaches (Boeckx 2014; Balari and Lorenzo 2018).

Once such distinctions are dropped, we think, the search for the faculty of language is also freed, to some extent, from the metaphor of specificity/generality, facilitating the construction of a non-reductionist, less simplistic, general theory of language that encompasses a complex multifactorial cognitive human capacity that does not yield specific linguistic outputs by itself, but is required subsequently to give rise to the phenotype through learning and transmission.

In the next section we will revisit some studies that add evidence to support such an interconnection of factors underlying the so-called “language-ready brain,” without the need for strong, direct dependencies between specific genes and specific language properties, even if key genes have obvious and dramatic subsequent effects on the development of language. We will focus on the relationship between genetic factors, such as FOXP2, and cognitive abilities. Then, in the light of niche construction, we will argue in favor of a general theory of language evolution that integrates the developmental architecture of cognitive abilities and iterated learning models.

Towards an integrated theory: insights from comparative genomics and niche construction

As soon as we depart from a simplistic gene-centered approach that relies on an incredibly lucky mutation or behaviorally assimilated trait to explain language complexity and our capacity to acquire it, we face the need to expand our approach to incorporate developmental processes that explain how complex functional phenomena evolve. Over the last decades, neuroscientists have gathered evidence that some cognitive domains can operate as overlapping functional architectures. For example, language processing has traditionally been associated with Broca’s area, but fMRI studies have also identified activation patterns in Broca’s area associated with recognition, imitation or movement preparation (e.g. Anderson 2010). These neural reuse theories seem to be incompatible with strong conceptions of structural or functional modularity and offer an interesting perspective for the conception of more comprehensive evolutionary-developmental models.

Recent evo-devo approaches have focused on the molecular analysis of behavioural traits such as learning and memory applied to the evolution of language. For example, genes relevant for language, including the transcription factor FOXP2, have been identified. FOXP2 was initially identified as a genetic factor of a speech disorder in a family known as KE, and was thus the first gene to be associated with speech and language (Fisher et al. 1998; Lai et al. 2000).

Despite the strong correlation between a FOXP2 variant and developmental verbal dyspraxia (DVD) (Lai et al. 2001), it must be noted that FOXP2 belongs to a complex molecular network of genes that encode proteins that in turn regulate the expression of other genes. In particular, FOXP2 is controlled by a set of upstream regulators, and in turn it regulates a vast set of downstream target genes by repressing or activating them (Shu et al. 2001; Vernes et al. 2007).

Although FOXP2 is the best-known gene in the field of language evolution, it does not work alone. A huge variety of gene products regulate neuronal development and function, including “proliferation, migration, neurite outgrowth, and axon guidance, as well as development, maintenance, and plasticity of synapses” (Fisher and Vernes 2015). From an evo-devo point of view, human speech can be described as a form of auditory-guided, learned vocal motor behaviour, and FOXP2 and its regulatory molecular network might be key factors to “shape neural plasticity in cortico-basal ganglia circuits underlying the sensory-guided motor learning in animal models” (Scharff and Petri 2011). Indeed, the connection between vocal learning abilities in several species, including humans, and a number of homologous gene networks and brain structures is today incontestable (see Jarvis 2019 for a review). Since language is culturally transmitted, a cognitive impediment within these molecular networks would affect the emergence of language properties in a community through iterated learning processes. In fact, the emergence of language properties through cultural transmission requires both ingredients: the adequate development of neural circuits and an adequate social structure. Note that neither of these components by itself would be able to produce language-specific outcomes.

So, to what extent have these two processes (the emergence of a regulatory neural network and the social requirements for iterated information transmission) developed independently of each other? In Sect. 4, we showed that iterated learning does not require innately determined abilities to account for the emergence of linguistic regularities, and in Sect. 5 we showed that an “eco-evo-devo” approach is a valid framework to account for the preconditions for iterated learning without the need to rely on strong genetic constraints. Both iterated learning processes, by generating novel linguistic phenotypes, and “eco-evo-devo” processes (such as developmental plasticity, genetic accommodation and extragenic inheritance), by facilitating evolutionary transitions and the alteration of environments and niche construction (Gilbert et al. 2015), shape evolution by constructing extended phenotypes (Simon and Hessen 2019), which in turn promote niche construction, that is, the ability to produce better nests, houses, institutions or environments (including linguistic ones).

Niche construction, therefore, can be thought of as an emergent property of triple-inheritance systems that take into account all three transmission pathways of genes, culture and environment (Kobayashi et al. 2019). Just as adaptive behavioral phenomena result from iterative processes at different scales, niche construction selects the behavior of organisms in an iterative process during ontogeny (Simon and Hessen 2019). Organisms’ traits develop by interacting with the environment, in turn increasing the expression of synergistic relationships between different levels of development (Lewontin 1983). A similar argument was originally offered by Deacon (1997) and Bickerton (2009), who suggest that the repetitive use of symbolic communication can create socially artificial niches that in turn enforce new pressures on human cognition.

Using ideas originally developed by Lewontin (1983), Laland et al. (2000) constructed a version of this conceptual model by mapping the causal relationship between biological evolution and cultural change. This model proposes that biological evolution depends not only on natural selection and genetic inheritance but also on “niche construction.” According to this framework, phenotypes have a more active role in development, and culture amplifies the human capacity to alter sources of natural selection. Cultural traits affect the environment and may have additional effects on how evolution proceeds. These changes, in turn, may persist throughout generations, beyond the lifespan of an individual organism. Crucially, cultural change can occur at a much faster rate than biological change. Culture, therefore, can relax or intensify selection and create new demands by changing ecology, which favors new adaptations (Whitehead et al. 2019). Interestingly, a relaxation of selection at the organism level may have given rise to new complex synergistic features of the human language capacity, which may explain why so much language information is “inherited” socially (Deacon 2010). At the level of the population, as our species constructed its niche for enhanced social relations, “self-domestication” or “self-control” might have driven the selection of anatomical and behavioral traits whose functionality is related to mild neural crest cell deficits during embryonic development (Wilkins et al. 2014; Thomas and Kirby 2010; Shilton et al. 2020).

Fig. 2

Integrated causal graph for the emergence of a language phenotype through developmental interactions. A change in the source variable causes a change in the destination variable. The area of the square in the middle represents learning and production through iterated learning processes. Learning and production algorithms interact with the environment and the neural plasticity underlying the individual sensory-guided motor capacities. Niche construction from all ontogenetic processes modifies human selective environments. In turn, individual motor capacities within a population with social structures favoring transmission through iterative processes would result in a particular language phenotype. This integrated theory does not assume that acquired language regularities became innate or specific. Instead, language phenotypes would have evolved due to selection affecting multiple levels of all these mechanisms (this is represented in the graph with the right-hand bracket). From generation to generation, language change occurs faster than other biological processes (this is represented in the causal graph with arrow thickness). Dashed lines represent permeability between developmental categories. Some relations of this causal graph have been designed following an EES framework (Lewens 2019).

Niche construction can result from different sources (genetic, ontogenetic, and cultural processes) and affect both biological and cultural evolution (Laland et al. 2000); for a number of examples, see Naiman et al. (1988), for beavers, Laskowski and Pruitt (2014), for social spiders, and Feldman and Cavalli-Sforza (1976) and Lotem et al. (2017), for some cases concerning humans.

Regarding communication, niche construction has also been invoked in hypotheses about language evolution (e.g., Bickerton 2009, 2014; Deacon 2010; Laland 2017b, among others). A number of learning biases and sensory-guided motor capacities (e.g. vocal control) evolved in response to new environmental and social pressures. Since this new communicative feature became extremely important within human populations for successful integration in human societies and, in turn, reproduction, it also could have brought about selection favoring better acquisition and transmission. This would obviously include our capacity to copy and share large sets of communicative variants. However, unlike deterministic and Baldwinian models, niche construction does not assume genetic assimilation of linguistic features, nor innate or language-specific knowledge. Instead, niche construction favors selection of motor capacities, cognitive biases and environments (e.g. social structures) that in turn facilitate the maintenance of such a niche.

Using a version of the previous niche construction framework, a general causal graph can be constructed by putting together all the interactions between the relevant variables (Fig. 2). A key feature that differentiates our model is the inclusion of insights from both iterated learning models and current eco-evo-devo theoretical approaches. The model is constructed in the light of comparative genomics and niche construction: in our integrated version, niche construction processes, which are in continuous interaction with both the motor capacities of individuals and the environment, favor transmission through iterated learning processes, resulting in a particular language phenotype. Thus, niche construction is considered a key feature of the model, since it has a prominent role in altering two main sources of variation that are directly related to iterated learning processes: on the one hand, sensory-guided motor capacities arising from neural development, and on the other, environmental structures such as social structures, rules or cultural conventions. In turn, changes in these two sources of variation can be identified as two factors that modify the agents’ learning and production algorithms during cultural transmission. This consequently connects neural development and the agents’ own hypotheses in a way that could potentially be implemented using iterated learning models. To capture this idea in an integrated model, each learner’s learning algorithm (LA) and production algorithm (PA) should be constructed as a function of the variables altered by niche construction. Since cultural transmission of language occurs at a much faster rate than organic evolution, it can quickly create new pressures that, in fast iterative cycles, accelerate the emergence of new linguistic adaptations. This effect would relax selection at the level of the individual, due to a large redistribution of selective pressures and a diversification of the inheritance mechanisms for social traits.
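
As a purely illustrative sketch of what it could mean for the LA and PA to be functions of niche-dependent variables, consider a minimal transmission chain in which a language is reduced to the probability of using one of two competing variants. Everything in it is an assumption made for illustration and not part of the model described above: the names Niche, motor_fidelity and exposure, the reduction of sensory-guided motor capacity to an articulation error rate, the reduction of social structure to the number of utterances a learner observes, and the choice of a Beta-prior learner.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Niche:
    """Hypothetical variables altered by niche construction (illustrative only)."""
    motor_fidelity: float  # chance that an intended variant is articulated as intended
    exposure: int          # utterances a learner gets to observe (social structure proxy)


def make_production_algorithm(niche, utterances=20):
    """PA: sample utterances from a grammar; articulation noise depends on the niche."""
    def produce(p_variant_a, rng):
        data = []
        for _ in range(utterances):
            intended_a = rng.random() < p_variant_a
            articulated_ok = rng.random() < niche.motor_fidelity
            # A failed articulation yields the other variant.
            data.append(intended_a if articulated_ok else not intended_a)
        return data
    return produce


def make_learning_algorithm(niche, prior_strength=1.0):
    """LA: posterior mean under a symmetric Beta prior (an assumed learner)."""
    def learn(data):
        observed = data[:niche.exposure]  # transmission bottleneck set by the niche
        count_a = sum(observed)
        return (count_a + prior_strength) / (len(observed) + 2 * prior_strength)
    return learn


def iterate(niche, generations, p_initial, seed=0):
    """Run one transmission chain; return the grammar held at each generation."""
    rng = random.Random(seed)
    produce = make_production_algorithm(niche)
    learn = make_learning_algorithm(niche)
    history = [p_initial]
    for _ in range(generations):
        history.append(learn(produce(history[-1], rng)))
    return history


if __name__ == "__main__":
    for niche in (Niche(motor_fidelity=0.99, exposure=20),
                  Niche(motor_fidelity=0.80, exposure=5)):
        print(niche, [round(p, 2) for p in iterate(niche, 10, p_initial=0.9)])
```

The two algorithms are built by factory functions that take the niche as their only structural input, so any change in the niche variables changes how agents produce and learn without any language-specific knowledge being stored in the agents themselves. A fuller model would presumably also let the outcome of transmission feed back into the niche, which this sketch leaves out.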

We agree with Odling-Smee and Laland (2009) that niche construction will fail to account for the evolution/development of human language until we take into account the exceptionally powerful role of human cultural processes and the mutual scaffolding effects between them, cognitive abilities, and individual biases in our species (Wimsatt and Griesemer 2007; Wimsatt 2014). For example, since language change occurs at a faster rate than genetic change, neural development selected for language might reflect “the most persistent and invariant demands of the highly variable linguistic niche” (Deacon 2010; Chater et al. 2009). This idea is consistent with a model that incorporates niche construction, organic and cultural evolution, where phenotypes (such as the language phenotype) have a much more active role in evolution (Laland et al. 2000; Gilbert and Epel 2009, 2015; Sultan 2015).

Moreover, since the construction of human cultural niches is able to favor effective cultural responses beyond the lifetime of individuals, with or without an impact on human genetics, it adds more uncertainty to the evolutionary process. Depending on the time-frame used, the social and environmental structure, and the specific communicative feature examined, researchers might find different niche construction effects. Individual cases will require individual explanations, and they are gradually showing that the dissolution of traditional dichotomies based on general observations is more necessary than ever. Here, computer models of language evolution that simulate population dynamics using iterated learning are helpful for expanding upon theoretical frameworks for language development like the one that we present here.

There are several reasons to think that cultural niche construction can offer an alternative framework to understand language evolution and bring about new hypotheses to test the compatibility of biological and cultural explanations of language. First, cultural niche construction itself is a useful eco-evo-devo approach to bridge the gap between traditional dichotomies in the field of language evolution (e.g. biological/cultural, specific/general). Second, it does not assume that acquired language regularities themselves ever become innate (Deacon 2003, 2010), nor that cultural responses are automatically genetically encoded (Odling-Smee and Laland 2009). And third, it leaves room to integrate under the same umbrella the developmental molecular processes leading to the language capacity and the developmental processes of language change leading to the emergence of language universals.

In recent decades, advances in molecular biology and computational modelling have considerably narrowed down the processes of ecological inheritance related to language structures. Language is endowed with complex regularities that cannot be explained by learning or transmission alone, nor by the genetic encoding of human behaviours. Such complexity will probably remain unexplained for several more decades. In the meantime, one observation seems clear: whatever approach we take to explain language development and evolution, it will necessarily have to consider the vast interconnectedness of genetic, ontogenetic and cultural factors that shape language.

Conclusions

In this article we have reviewed a number of studies that show that iterated learning does not require strong genetic constraints in the form of a domain-specific module to give rise to near-universal properties of language. Instead, general abilities unrelated to informational specificity, such as the ability to copy and to share, are required to develop language through cultural evolution. These general abilities can nevertheless yield specific properties, and might have emerged from a complex multifactorial human cognitive capacity that includes genes, cellular products, phenotypic states, environmental inputs and behavioral practices. Here we have argued that the developmental explanation of human abilities and iterated learning through cultural transmission are mutually dependent processes and therefore compatible, insofar as both are common processes and interact, stabilizing selection at different levels. We have used the notion of niche construction to sketch an integrated framework that builds bridges between evolutionary developmental accounts of sensory-guided motor capacities and cultural evolution guided by iterated learning models. This integrated model aims to overcome traditional boundaries between biological and cultural approaches in the debates on language evolution.