Deciphering diversity through evolution

The living world is nested and multilevel, involves multiple agents and changes at different timescales. Evolutionary biology tries to characterize the dynamics responsible for such complexity to decipher the processes accounting for the past and extant diversity observed in molecules (namely, genes, RNA, proteins), cellular machineries, unicellular and multi-cellular organisms, species, communities and ecosystems. In the 1930s and 1940s, a unified framework to handle this task was built under the name of Modern Synthesis [1]. It encompassed Darwin’s idea of evolution by natural selection as an explanation for diversity and adaptation and Mendel’s idea of particulate inheritance, giving rise to population and quantitative genetics, a theoretical frame that corroborated Darwin’s hypothesis of the paramount power of selection for driving adaptive evolution [2]. This framework progressively aggregated multiple disciplines: behavioural ecology, microbiology, paleobiology, etc. Overall, this classic framework considers that the principal agency of evolution is natural selection of favourable variations, and that those variations are constituted by random mutations and recombination in a Mendelian population. The processes of microevolution, modelled by population and quantitative genetics, are likely to be extrapolated to macroevolution [3]. To this extent, models that focus on one or two loci are able to capture much of the evolutionary dynamics of an organism, even though in reality many interdependencies between thousands of loci (epistasis, dominance, etc.) occur as the basis of the production and functioning of a phenotypic trait. 
Among forces acting on populations and modelled by population geneticists, natural selection is the one that shapes traits as adaptations and the design of organisms; adaptive radiation then explains much of the diversity; and common descent from adapted organisms explains most of the commonalities across living forms (labelled homologies), and allows for classifying living beings into phylogenetic trees. Evolution is gradual because the effects of mutations are generally small, large ones being most likely to be deleterious as theorized by Fisher’s geometric model [4].

Many theoretical divergences surround this core view: not everyone agrees that evolution is change in allele frequencies, or that population genetics captures the whole of the evolutionary process, or that the genotypic viewpoint (tracking the dynamics of genes as ‘replicators’ [5] or the strategy ‘choices’ of organisms as fitness maximizing agents [6]) should be favoured to understand evolution. Nevertheless, it has been a powerful enough framework to drive successful research programs on speciation, adaptation, phylogenies, evolution of sex, cooperation, altruism, mutualism, etc., and incorporate apparent challenges such as neutral evolution [7], acknowledgement of constraints on variation [8], or the recent theoretical turn from genetics to genomics following the achievement of the Human Genome Project [9]. Causation is here overall conceived of as a linear causal relation of a twofold nature: from the genotype to the phenotype (assuming of course environmental parameters), and from the environment to the shaping of organisms via natural selection. For instance, in the classic case of evolution of peppered moths in urban forests at the time of the industrial revolution, trees became darkened with soot, and then natural selection favored darker morphs as ‘fitter’ ones, due to their being less easily detected by predator birds, resulting in a relative increase in frequency of the darker morphs in the population [10].

Yet in the last 15 years biologists and philosophers of biology have regularly questioned the genuinely unifying character of this Synthesis, as well as its explanatory accuracy [11]. Those criticisms questioned notably the set of objects privileged by the Modern Synthesis, arguably too gene-centered [12], and its key explanatory processes, since niche construction [13], lateral gene transfer [14, 15], phenotypic plasticity [16, 17], and mass extinction [18] could, for example, be added [11]. Usually these critiques emphasize aspects rooted in a particular biological discipline: lateral gene transfer from microbiology, plasticity from developmental biology, mass extinction from paleobiology, ecosystem engineering from functional ecology, etc. There were also recurring claims for novel transdisciplinary fields: evo-eco-devo [19], investigating the evolutionary dynamics of host and microbe associations (forming combinations often referred to as holobionts), evolutionary cell biology [20], or microbial endocrinology [21], among others. This latter discipline aims at understanding the evolved interactions between microbial signals and host development. Indeed, it is compelling for evolutionary biologists to decipher how such multi-species interactions became established (namely, whether they involved specific microbial species and molecules, and whether they evolved independently in different host lineages).

Evolutionary biology is thus currently undergoing various theoretical debates concerning the proper frame to formulate it [11, 22,23,24]. Here, we introduce an original solution which moves this debate forward, acknowledging that nothing on Earth evolves and makes sense in isolation, thereby challenging the key assumption of the Modern Synthesis framework that targeting the individual gene or organism (even when in principle knowing that it is part of a set of complex interactions) allows us to capture evolution in all its dimensions. Since the living world evolves as a dynamic network of interactions, we argue that evolutionary biology could become a science of evolving networks, which would allow biologists to explain organisational complexity, while providing a novel way to reframe and to unify evolutionary biology.

Biology is regulated by networks

Networks at the molecular level

Although numerous studies have focused on the functions of individual genes, proteins and other molecules, it is increasingly clear that each of these functions belongs to complex networks of interactions. Starting at the molecular scale, the importance of a diversity of molecular agents, such as (DNA-based) genes and their regulatory sequences, RNAs and proteins, is well recognized. Importantly, in terms of their origins and modes of evolution, these agents are diverse. Genes are replicated across generations, via the recruitment of bases along a DNA template, thereby forming continuous lineages, affected by Darwinian evolution. By contrast, proteins are reconstructed by recruitment of amino acids at the ribosomal machinery. There is no physical continuity between generations of proteins, and thus no possibility for these agents to directly accumulate beneficial mutations [25]. Moreover, all these molecular entities are compositionally complex, in the sense that they are made of inherited or reassembled parts. E pluribus unum: genes and proteins are (often) conglomerates of exons, introns [26,27,28], and domains [29,30,31]. Similar claims can be made about composite molecular systems, such as CRISPR and Casposons [32, 33], etc. This modular organisation has numerous consequences: among them, genes can be nested within genes [34]; proteins congregate in larger complexes [35]. Importantly, this modularity is not the mere result of a divergence from a single ancestral form, but also involves combinatorial processes and molecular tinkering of available genetic material [36,37,38]. The coupling and decoupling of molecular components can operate randomly, as in cases of presuppression proposed to neutrally lead to large molecular complexes [39,40,41]. Presuppression, also known as constructive neutralism, is a process that generates complexity by mechanically increasing dependencies between interacting molecules, in the absence of positive selection. 
When a deleterious mutation affects one molecular partner, existing properties of another molecule with which the mutated molecule already interacted can compensate for its partner’s defect. Presuppression operates like a ratchet, since the likelihood of restoring the original independence between molecules (by reverting the deleterious mutation) is lower than the likelihood of moving away from this original state (by accumulating other mutations). Molecular associations can also evolve under constraints [42], eventually reinforcing the relationships between molecular partners, as suggested for some operons [43] and fused genes [44, 45].
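The ratchet argument can be made concrete with a small calculation. The sketch below is an illustration, not a model taken from the cited work: it assumes, hypothetically, that in each round a pair of interacting molecules gains one compensated (dependency-creating) mutation with probability `p_accumulate`, reverts one with the lower probability `p_revert`, or stays unchanged. The function names and the probability values are assumptions chosen only for illustration.

```python
def dependency_distribution(steps, p_accumulate, p_revert):
    """Exact distribution of the number of dependencies after `steps` rounds.

    Hypothetical toy model: a biased random walk on 0, 1, 2, ...
    where 0 is the original state of independence.
    """
    if p_accumulate + p_revert > 1:
        raise ValueError("probabilities must sum to at most 1")
    dist = {0: 1.0}  # start fully independent
    for _ in range(steps):
        nxt = {}
        for k, p in dist.items():
            back = p_revert if k > 0 else 0.0  # nothing to revert at k == 0
            stay = 1.0 - p_accumulate - back
            for state, prob in ((k + 1, p_accumulate), (k - 1, back), (k, stay)):
                if prob > 0:
                    nxt[state] = nxt.get(state, 0.0) + p * prob
        dist = nxt
    return dist


def mean_dependencies(dist):
    """Expected number of dependencies under a distribution."""
    return sum(k * p for k, p in dist.items())


# Illustrative values only: accumulation is assumed five times likelier than reversion.
dist_10 = dependency_distribution(steps=10, p_accumulate=0.10, p_revert=0.02)
dist_50 = dependency_distribution(steps=50, p_accumulate=0.10, p_revert=0.02)
```

Under these assumed values the expected number of dependencies keeps growing with the number of rounds, although no step in the model is favoured by positive selection.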

Consistently, interconnectedness is a striking feature of the molecular world [46, 47]. Genes belong to regulatory networks with feedback loops [48]. Proteins belong to protein–protein interaction networks. This systemic view contrasts with former atomistic views assigning one function to one gene. First, it is not always correct that a gene produces only one protein, as shown by alternative splicing. Second, it is also unlikely that a protein performs one function, because no protein acts alone. Rather, biological traits result from co-production processes. This is nicely illustrated by the actual process of translation, during which both proteins and nucleic acids necessarily interact, allowing for the collective reproduction of these two types of molecular agents. How these different components became so tightly integrated is a central issue for explaining evolution. Understanding how the molecular world functions and evolves therefore requires analysing molecular organisation and the evolution of the architecture of interaction networks, especially since this structure can partly explain molecular reactions [46, 47, 49, 50]. Thus, systems biologists search for common motifs in molecular interaction networks from different organisms, such as feed-forward loops, assuming that some of these recurring patterns, because they affect different gene or protein sets, may reflect general rules and constraints affecting the construction and evolution of biological organisations [46].
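To show what searching for such a motif involves, the following minimal sketch lists feed-forward loops in a directed regulatory network. It assumes the simplest definition of the motif (X regulates Y, Y regulates Z, and X also regulates Z directly) and ignores the sign of regulation; the gene names and the function name are hypothetical.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return all ordered triples (x, y, z) with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})
    return [
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    ]


# Hypothetical regulatory network: each pair is (regulator, target).
toy_network = [
    ("geneA", "geneB"),
    ("geneB", "geneC"),
    ("geneA", "geneC"),  # shortcut that closes the feed-forward loop
    ("geneC", "geneD"),
]
loops = feed_forward_loops(toy_network)
```

Running the same function on networks from different organisms and comparing how often the motif occurs is one simple way to approach the comparison described above; assessing whether a motif is over-represented would additionally require a null model, which this sketch does not include.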

Focusing evolutionary explanations on the structure of the interactions between genes rather than on the primary sequence of the genes is fundamentally different from sequencing genes and inferring history from their sequences alone. One could think here of the case of explaining gene activation/repression. Comparative works on molecular interaction networks show that interactions affect the evolution of the molecules composing networks, which means that beyond compositional complexity, organisational complexity must be modeled to understand biological evolution [46, 51,52,53,54]. Before the analysis of complex networks, compensatory sets of elements, such as groups of sub-functional paralogous genes [55], or groups of genes with presuppressed mutations [39, 40], already stressed the evolutionary interdependence of molecules. However, compensatory interactions between agents, each of them being by themselves poorly adapted, ran counter to the intuition that natural selection will eliminate dysfunctional individual entities. Their recognition invites one to consider Earth as possibly populated by unions of individually dysfunctional agents rather than by the fittest survivors within individual lineages, possibly since early life, according to Woese’s theory on progenotes, namely communities of interacting protocells unable to sustain themselves alone, evolving via massive lateral genetic exchanges [56].

At the molecular level, it is reasonable to assume that processes resulting from interactions of a diversity of intertwined agents offer a crucial explanans of biological complexity. Rather than ‘one agent, one action’, it would be more accurate to consider ‘a relationship between agents, one action’ as the modus operandi of life. Multiple drivers, of different nature, contribute to the evolution of these interactions: among others, gene co-expression/co-regulation [57], sometimes mediated by transposons [58,59,60,61]; the evolutionary origin of the genes [62]; and also physical and chemical laws, as well as the presence of targeting machineries that constrain and regulate diffusion processes in the cell. These types of relationships described at the molecular level are also recovered at other levels of biological organisations.

Networks at the cellular level

Similar conclusions have been reached at the cellular level, also crucial for understanding life history. All prokaryotes and protists are unicellular organisations, and the cell is a fundamental building block of multicellular organisms. Cells must constantly evaluate the states of their inner and outer environments in order to adjust their gene expression and react accordingly [46]. This involves regulatory, transduction, developmental, and protein interaction networks, etc. Cells are built upon inner networks of interacting components, and involved in or affected by a diversity of exchanges, influences and modes of communications (namely, genetic, energetic, chemical and electrical modes). Microbiology has gone a long way toward unraveling these processes since its heyday of pure culture studies, a fruitful reductionist approach now complemented by environmental studies. The latter further revealed that cells compete and cooperate with, and even compensate for each other, within mono- or multispecific microbiomes [63, 64]. Both types of microbiomes have a fundamental commonality: they produce collective properties and co-constructed phenotypes (Fig. 1) evolving at the interface between cells. Such properties cannot be understood without considering networks of influences: the oscillatory growth of biofilms of Bacillus subtilis cannot be deduced from the analyses of the complete genomes of these clones, but requires modeling metabolic co-dependence within a monogenic community affected by a delayed feedback loop, involving chemical and electrical signals [65, 66].

Fig. 1.

An example of co-construction, the case of holobionts. The left circle represents the set of traits associated with a host, the right circle represents the set of traits associated with its microbial communities; the intersected area represents traits that are produced jointly as a result of the interaction between hosts and microbes. When this area becomes large or when co-constructed traits are remarkable, they cannot be correctly explained under a simple model treating hosts and microbes in isolation. This scheme holds for different types of partners

Furthermore, many cellular agents show a relative lack of autonomy. In nature, some groups of prokaryotes display complementary genomes with incomplete metabolic pathways, consistent with the black queen hypothesis, which predicts that our planet is populated by groups of (inter)dependent microbes [67, 68]. More precisely, this hypothesis predicts the loss of a costly function, encoded by a gene or a set of genes, in individuals, when this function becomes dispensable at the individual level, since it is achieved by other individuals that produce (usually leaky) public goods in sufficient amount to support the equilibrium of the community. Thus, gene losses in some cells are compensated by leaks of substrates from other cells, formerly encoded by the lost genes. Some microbes show division of labor [69]. Symbionts and endosymbionts depend on their hosts. The ‘kill the winner’ theory [70] further challenges the notion that the microbial world is a world of fit cellular individuals. This theory stresses a collective process via which viruses mechanically mostly attack cells that reproduce faster and thus regulate bacterial populations, these latter sustaining their diversity because these populations are comprised of individual prokaryotic cells that make a suboptimal use of a diversity of resources. Thus, cells belong to networks that affect their growth and survival, which might explain why most bacteria cannot be grown in pure culture. They only truly thrive within communities, whose global genetic instructions are spread over several genetically incomplete microbes.
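The idea of genetic instructions spread over several incomplete genomes can be stated as a simple set computation. The sketch below assumes, for illustration only, that each microbe is summarized by the set of functions its genome encodes and that every function is a freely shared (leaky) public good; the microbe names, the function names and the helper names are hypothetical.

```python
def community_is_self_sufficient(genomes, required):
    """True if the pooled genomes encode every required function."""
    pooled = set().union(*genomes.values())
    return required <= pooled


def dependent_members(genomes, required):
    """Names of members that lack at least one required function."""
    return sorted(name for name, funcs in genomes.items() if not required <= funcs)


# Hypothetical functions and genomes, chosen only to illustrate complementarity.
required_functions = {"nitrogen_fixation", "vitamin_b12", "peroxide_detox"}
community = {
    "microbe_1": {"nitrogen_fixation", "peroxide_detox"},
    "microbe_2": {"vitamin_b12"},
    "microbe_3": {"peroxide_detox", "vitamin_b12"},
}

whole = community_is_self_sufficient(community, required_functions)
without_1 = community_is_self_sufficient(
    {k: v for k, v in community.items() if k != "microbe_1"}, required_functions
)
```

In this toy community every member depends on the others, the collective covers all required functions, and removing `microbe_1` leaves a function uncovered, which mirrors why such cells may fail to grow in pure culture.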

Accounting for these internal and external cellular networks requires considering processes that are not central in the synthetic evolutionary theory. Typically, the notion that cellular evolution makes jumps, because new components and processes (such as metabolic pathways) are acquired from outside a given cellular lineage, contrasts with more gradual accounts of biological change, like accounts based on point mutations affecting genes already present in the lineage. Because saltations (macromutations) are essential evolutionary outcomes of introgressive processes, via the combination of components from different lineages, no complete picture of evolution can be provided without these jumps, which are naturally modeled by networks. Indeed, genetic information has been flowing both vertically and horizontally between prokaryotes for over 3.5 billion years [71,72,73,74,75,76,77], and possibly earlier, according to Woese, who proposed that our universal ancestor was not an entity but a process, that is, genetic and energetic exchanges within protocellular communities [56]. Remarkably, this latter case indicates that network modeling could help to tackle a fundamental issue in evolutionary biology: modeling the evolution of biological processes that emerge from interactions between biological entities. Since these interactions can be represented by a network, the evolution of these interactions, describing the evolution of biological processes, can then be represented by dynamic networks. Likewise, eukaryogenesis rested on the co-construction of a novel type of cell, as a result of the endosymbiosis of a bacterium within an archaeon [78,79,80]. Later, the evolution of photosynthetic protists emerged from endosymbioses involving unicellular eukaryotes and cyanobacteria, or various lineages of protists, namely in secondary and tertiary endosymbioses [81]. 
Such endosymbioses, and their outcomes as illustrated in our work [82, 83], are also naturally modeled using networks.

Moreover, the long-term impact of these introgressive processes on cellular evolution should not be underestimated. For instance, endosymbiosis does not merely introduce new cellular lineages, it also favors the evolution of chimeric structures and chimeric processes within cells [83,84,85,86,87,88,89,90,91]. Such intertwining cannot be modeled using a single genealogical tree, which only recapitulates cellular divergence from a last common ancestor. Even though cells always derive from other cells, a full cellular history cannot be reduced to the history of some cellular components that are assumed to track the history of cellular division [92]. In particular, phylogenetic analyses of informational genes cannot be the only clue to understanding the origins of cellular diversity, since these genes do not reflect how cells are organized, how they gather their energy, and how they interact with each other. Analyzing the co-construction side of evolution requires enhanced models: understanding eukaryotic evolution requires mixed considerations of cellular architecture, population genetics and energetics, which go beyond classic phylogenetic models, which not so long ago were still prone to considering three primary domains of life [93,94,95].
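The limitation of a single tree can be shown with a very small data structure. In the sketch below, ancestry is stored as (parent, child) pairs; in a tree every lineage has at most one parent, so any lineage with two or more parents marks an introgressive event that a tree cannot represent. The lineage labels are deliberately coarse, the single `ancestor` node is a simplification, and the function name is hypothetical.

```python
def reticulate_nodes(ancestry_edges):
    """Return lineages that have more than one parent lineage."""
    parents = {}
    for parent, child in ancestry_edges:
        parents.setdefault(child, set()).add(parent)
    return sorted(child for child, ps in parents.items() if len(ps) > 1)


# Coarse, illustrative ancestry; only the mergers discussed in the text are encoded.
ancestry = [
    ("ancestor", "bacteria"),
    ("ancestor", "archaea"),
    ("archaea", "eukaryote"),    # host lineage
    ("bacteria", "eukaryote"),   # endosymbiont lineage
    ("bacteria", "cyanobacteria"),
    ("eukaryote", "photosynthetic_protist"),
    ("cyanobacteria", "photosynthetic_protist"),
]
mergers = reticulate_nodes(ancestry)
```

An empty result would mean the ancestry is compatible with a tree; here the two nodes returned are exactly the chimeric lineages that require a network representation.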

Although invoking multiple agents rather than a single ancestor in evolutionary explanations might appear to contradict the famous Ockham’s razor [96], it does so only superficially when it is likely that many cells are co-constructed, especially in the context of a web of life. Enhanced models including intra- and extracellular interactions appear necessary to understand cellular complexity, including the predictable disappearance of traits (and processes), namely the convergent gene loss in mitochondria and plastids [97] by a process called dedarwinification [98, 99].

Networks beyond the cellular level

Studies of multicellular organisms (we will focus on animals) have led to similar general findings. Understanding animal traits and their evolution requires analyzing the relationships between a multiplicity of agents belonging to different levels of biological organisation, eventually nested, some of which co-construct animals and guarantee their complete lifecycle [100]. Because no sterile organism lives on Earth, animal development, health and survival depend on microbes. Granted, bacteria can often legitimately be seen as part of the environmental demands in an evolutionary model focused on the host’s lineage; or sometimes bacteria and host could also be considered as part of a coevolution process, with no need to posit the whole as a unit of selection [101]. However, asking ‘who is the beneficiary of the symbiosis as the result of evolution?’ may in some cases lead to the recognition that bacteria and host evolved together and were selected together [102]. More generally, while some microbes contribute to animals’ lives possibly as a result of host-derived selection, others contribute as a result of selectively neutral processes (like microbial priming [103]) [101, 104]. These interactions produce communication networks within the animal body: chemical information circulates between the animal brain and the gut microbiome. These interactions also result in communication and interaction networks between individuals. In some animal lineages, the microbiome affects social behaviors; for instance, fermenting microbes convey information about sex and reproductive status in hyenas [105]. Components of the microbiome also affect mating choice [106], reproductive isolation and possibly speciation. Consequently, the microbiome now appears as an essential component of animal studies [107]. 
Microbiome studies, the significance of which is overstated in some respects, nevertheless have shown that the evolutionary intertwining between many metazoa and commensal or symbiotic bacteria can no longer be neglected or black-boxed in favor of purely host gene-centered evolutionary models. And the associations between hosts and microbes do not need to be units of selection to be part of the recent insights that support the novel theoretical framework proposed here. Their interplay imposes reconfigurations of practices, theories and disciplines [108]. As a result of our improved insight into evolution, zoology and immunology [109] become theaters of new ecological considerations [110], sometimes strangely qualified as Lamarckian [111, 112], because animals can recruit environmental microbes and transmit them (with a non-null heritability [113]) to their progeny. Therefore, nuclear gene inheritance alone may provide too narrow a perspective to account for the evolution of all animal traits; as an example, aphid body color depends on animal genetics and the presence of Rickettsiella [114]. Population genetics gets included in a broader community genetics, which also considers transmission of microbes and their genes [108, 114]. The use of gnotobiotic and transbiotic animals becomes a new experimental standard to analyze multigenomic collectives without counterparts in modern synthesis theories. These collectives harbor morphological, physiological, developmental, ecological, behavioral and evolutionary features [115,116,117,118,119] that are not purely constructed by animal genes, but rather appear to be co-constructed at the genetic and metabolic interface between the microbial and macrobial worlds, while the content of the respective animal genomes only provides incomplete instructions. 
Understanding animal evolution requires understanding the interaction networks between components from which these taxa evolved, and the networks to which these taxa still belong.

In ecology, an analogous turn towards network thinking has been promoted since the 1990s with the general acceptance of the notions of metapopulations [120] and then metacommunities [121]. These views suggest that the dynamics of ecological biodiversity is not so much located within a community of species but rather in a metacommunity, which can be thought of as a network of communities exchanging species, while targeting one community blinds one to what genuinely accounts for biodiversity and ecosystem functioning [122].

This quick overview provides evidence that networks are at the origin of the genes of unicellular and multicellular organisms and central for their functions. The living world is a world of ‘and’ and ‘co-’. From division of labor and compensations, to dependencies and co-constructions, etc.: interactions (which only begin to be deciphered) are everywhere in biology. Thus, explaining the actual features of biodiversity requires explaining how multiple processes, interface phenomena (co-construction of biological features, niche construction, metabolic cooperation, co-infection and co-evolution) and organisations (for instance, from molecular pathways to organisms and ecosystems) arose from interacting components, and how these processes, phenomena and organisations may have been sustained and transformed on Earth.

Reframing evolutionary explanations from the scaffolded evolution perspective

Introducing a classification of interacting components

While classic evolutionary models, prompted by Darwin’s famous tree [123], mostly stress how related entities diverge in relative independence, it appears important to show how a diversity of components, which may not be related, interact and produce various evolutionary patterns.

The notion of scaffolding [124], which describes how one entity continues an event initiated by another entity, and relies on it up to the point that at some timescale it becomes dependent upon it for further evolution, appears as a fundamental relationship to describe the evolution of life. We propose scaffolding should become more central in explanations of evolution because no components from the biological world are actually able to reproduce, or persist, alone (Fig. 2). Each entity influences or is influenced by something external to it, and is consequently part of a process. Scaffolding thus defines the causal backbone of collective evolution. It describes the historical continuity between temporal slices of interaction networks, since any evolutionary stage relies on previously achieved networks and organisations. Therefore, describing the evolution of interactions requires explanations to address the following issues: what scaffolds what, what transforms the environment of what, and are these influences reciprocal? Characterizing the types of components that, together, have evolutionary importance through their potential interaction is therefore a central step to expanding evolutionary theory.

Fig. 2.

Different types of scaffolding, at four levels of biological organisations. a Functional interactions at the molecular level. b Introgression and vertical descent at the cellular level. c Co-construction at the multicellular level. d Niche-construction and physico-chemical interactions at the eco-systemic level

We propose that a first distinction can be made between obligate and facultative components. Suppressing the former impacts the course and eventually the reproduction of the process to which they contribute (Fig. 3), whereas facultative components do not hold such a crucial role, and may simply be involved by chance. A second distinction is whether the components are biotic (genes, proteins, organisms…) or abiotic (such as minerals, or environmental and cultural artefacts). Abiotic components can be recruited from the environment or be shaped by biological processes [125]. They can also alter the evolution of the biotic components; for example, environmental change can drive genetic and organismal evolution and selection. The history of life clearly depends on the interplay of both types of components. Biotic components, however, deserve a specific focus. Some of them form lineages (for instance, genes replicate), while others do not (for instance, proteins are reconstructed). Finally, interacting replicated components can be further classified into fraternal components when they share a close last common ancestor (e.g. in kin selection cases), and egalitarian components, when they belong to distinct lineages (as an example, think of the evolution of chimeric genes by fusion and shuffling [29, 45, 126]) [63].

Fig. 3.

Classification of major types of components in evolving systems. A process/collective cannot be completed in the absence of obligate components, whereas facultative components do not affect the outcome of the process/function of the collective. Biotic components are biological, material products, whereas abiotic components are environmental, geological, chemical, physical or cultural artefacts. Replicated components are produced by replication, which implies a physical continuity between ancestral and descendent components; they undergo a paradigmatic Darwinian evolution. Reconstructed components are reproduced without direct physical continuity, and cannot directly accumulate beneficial mutations. Fraternal components belong to the same lineage, whereas egalitarian components belong to different lineages
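The classification can be encoded as a small record type, which is one way to attach these categories to the nodes of an interaction network. The sketch below is an assumption about how such an encoding might look, not a format proposed in the text: the class name, field names and example components are hypothetical, and the fraternal/egalitarian relation is only computed for replicated components whose lineage is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    obligate: bool                 # False means facultative
    biotic: bool                   # False means abiotic
    replicated: bool = False       # only meaningful for biotic components
    lineage: Optional[str] = None  # lineage label for replicated components


def describe(component):
    """Return the classification labels that apply to one component."""
    labels = [
        "obligate" if component.obligate else "facultative",
        "biotic" if component.biotic else "abiotic",
    ]
    if component.biotic:
        labels.append("replicated" if component.replicated else "reconstructed")
    return labels


def relation(a, b):
    """Return 'fraternal', 'egalitarian', or None when the distinction does not apply."""
    if not (a.biotic and b.biotic and a.replicated and b.replicated):
        return None
    if a.lineage is None or b.lineage is None:
        return None
    return "fraternal" if a.lineage == b.lineage else "egalitarian"


# Hypothetical components used only to exercise the classification.
gene_a = Component("gene_a", obligate=True, biotic=True, replicated=True, lineage="family_1")
gene_b = Component("gene_b", obligate=False, biotic=True, replicated=True, lineage="family_1")
gene_c = Component("gene_c", obligate=True, biotic=True, replicated=True, lineage="family_2")
protein = Component("protease", obligate=True, biotic=True)
metal_ion = Component("metal_ion", obligate=True, biotic=False)
```

Reducing "same lineage" to equality of a label is a simplification; a fuller treatment would need a criterion for how close the last common ancestor must be.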

Introducing dynamic interaction networks

Biodiversity usually evolves from interactions between the diverse types of components described above. For example, metalloproteases emerge from the interaction between reconstructed biotic components (proteins) and a metal ion. Regulatory networks involve biotic components that can be either replicated (e.g. genes and promoters) or reconstructed (e.g. proteins). Protein interaction networks intertwine reconstructed egalitarian biotic components, that is, proteins that are not homologous. Evolutionary transitions such as eukaryogenesis result from the interweaving of biotic components (cells) from multiple lineages. Holobionts evolve from interactions between egalitarian biotic components (macrobial hosts and microbial communities) and possibly abiotic components, such as the mineral termite mounds, or the volatile chemicals produced by the microbial communities of hyenas [105].

Taking collectives of interacting components as central objects of study in evolutionary biology invites us to expand the methods of this field. It encourages developing statistical approaches or inference methods beyond those operating under the very common assumption that biological components are independent. Therefore, we propose to represent interactions between components in the form of networks in which components are nodes and their interactions (of various sorts) are edges. These networks are conceptually simple objects. They can be described as adjacency lists of interactions, in the form ‘component A interacts with component B, at time t (when such a temporal precision is known)’. Such dynamic interaction networks could become more central representations and analytical frameworks, and serve as a common explanans for various disciplines in an expanded evolutionary theory. Importantly, because these networks embed both abiotic and biotic, related and unrelated components (like viruses, cells and rocks), they should not be conflated with phylogenetic networks, but recognized as a more inclusive object of study (Fig. 4). Where phylogenies describe relationships, networks can describe organisations. How such organisations evolve could for example be described by identifying evolutionary stages, that is, sets of components and of their interactions simultaneously present in the network (Fig. 4). Investigating the evolution of an ecosystem corresponds to studying the succession of evolutionary stages in such networks and detecting possible regularities—in the sense that some evolutionary stages would fully or partly reiterate over time—or hinting at rules or constraints (like architectural contingencies [127, 128] or principles of organisations [46]) on the recruitment, reproduction and heritability of their components.

Fig. 4. An evolving interaction network. Nodes are components (circles are full when the component is biotic). Thick black edges represent interactions between these components. The network topology evolves as nodes or their connections change. Dashed edges represent the phylogenetic ancestry of lineage-forming components.

Thus, we suggest that evolutionary biology could be reframed as a science of evolving networks, because such a shift would allow inclusive, multilevel studies of a larger body of biological and abiotic data, via approaches from network sciences.

Concrete strategies to enhance network-based evolutionary analyses

Enhancing network-based evolutionary analyses, beyond the now classic research program of phylogenetic networks, could consolidate comparative analyses in the nascent field of evolutionary systems biology [129, 130], as illustrated by examples based on molecular networks. Network construction/gathering constitutes the first step of such analyses. This involves first defining nodes of the network, namely components suspected to be involved in a given system, and edges, namely qualitative (or quantitative, when weighted) interactions between these entities. Many biological interaction networks (gene co-expression networks (GCNs), gene regulatory networks (GRNs), metabolic networks, protein–protein interaction networks (PPIs), etc. [46]) are already known for some species, or can be inferred [131,132,133,134,135,136]. For example, GCNs offer an increasingly popular resource to study the evolution of biological pathways [137], as well as to reveal conservation and divergence in gene regulation [138]. GCNs are already used for micro-evolution studies, as in the case of fine-grained comparisons of expression variations between orthologous genes across closely related species, and for the analysis of minor evolutionary and ecological transitions, such as changes of ploidy [139, 140], adaptation to salty environments [141] or drugs [142], or the effects of plant domestication [143, 144]. Likewise, GRNs are starting to be used in micro-evolution and phenotypic plasticity studies [145]. Understanding the dynamics of GRNs appears critical to inferring the evolution of organismal traits, in particular during metazoan [146,147,148], plant [149] and fungal [150] evolution. We suggest that PPI, GCN and GRN studies could become mainstream and also be conducted at (much) larger evolutionary and temporal scales, to analyze additional, major, transitions.

Based on these established networks, two major types of evolutionary analyses (network-decomposition and graph-matching; Fig. 5) can be easily further developed by evolutionary biologists. More precisely, the above-mentioned kinds of biological networks could be systematically turned into what we call evolutionary colored biological networks (ECNs). In ECNs, each node of a given biological network is colored to reflect one or several evolutionary properties. For example, in molecular networks, nodes correspond to molecular sequences (genes, RNA, proteins) that belong to homologous families, whose phylogenetic distribution across host species allows them to be dated [137, 151,152,153,154,155,156]. The ‘age’ of the family at the node can thus become one evolutionary color (Fig. 5). Likewise, several processes affecting the evolution of a molecular family (selection, duplication, transfer, and divergence in primary sequence) can be inferred by classic phylogenetic analyses or, as we proposed, by analyses of sequence similarity networks [157]. Such studies provide additional evolutionary colors (like quantitative measures: intensity of selection, rates of duplication, transfer, and percentage of divergence), which can be associated with nodes in ECNs [139, 149, 154, 158,159,160,161]. Thus, ECNs contain both topological information, characteristic of the biological network under investigation, and evolutionary information: which node belongs to a family prone to duplication, divergence, or lateral transfer, and when this family arose. Combining these two types of information in a single graph allows us to test specific hypotheses regarding evolution.
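One possible way to hold an ECN in memory is sketched below in Python. The class name `ECN`, its methods, the gene names and the color values are hypothetical and serve only to show how topology (edges) and evolutionary colors (per-node properties) can sit in the same object.

```python
from dataclasses import dataclass, field

@dataclass
class ECN:
    """A biological network whose nodes carry evolutionary 'colors' (hypothetical layout)."""
    edges: set = field(default_factory=set)     # each edge is frozenset({u, v})
    colors: dict = field(default_factory=dict)  # node -> {color name: value}

    def add_interaction(self, u, v):
        """Record an undirected interaction and register both nodes."""
        self.edges.add(frozenset((u, v)))
        self.colors.setdefault(u, {})
        self.colors.setdefault(v, {})

    def color(self, node, **properties):
        """Attach or update evolutionary properties (age, duplication rate, ...)."""
        self.colors.setdefault(node, {}).update(properties)

    def degree(self, node):
        """Number of interactions involving the node."""
        return sum(1 for edge in self.edges if node in edge)

# Invented example: three genes, two interactions, a few colors.
ecn = ECN()
ecn.add_interaction("geneA", "geneB")
ecn.add_interaction("geneB", "geneC")
ecn.color("geneA", age="ancient", duplication_rate=0.1)
ecn.color("geneB", age="ancient")
ecn.color("geneC", age="clade-specific")
print(ecn.degree("geneB"), ecn.colors["geneC"])
```

Keeping colors as free-form dictionaries lets qualitative colors (age class) and quantitative ones (rates, percentage of divergence) coexist on the same node.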

Fig. 5. Workflow of the evolutionary analysis of interaction networks. From left to right: triangles represent components of interaction networks, edges between triangles represent interactions between these components. Interaction networks are first constructed/inferred, then their nodes and edges are colored to produce evolutionary colored networks (ECNs) that represent both the topological and the evolutionary properties of the networks. ECNs can be investigated individually by graph decomposition and centrality analyses, or several ECNs can be compared by graph alignment. The two types of comparisons can return conserved subgraphs that allow understanding of the dynamics of interaction networks, meaning when different sets of interactions (hence processes) evolved, and whether these interactions were evolutionarily stable. Ancient and Contemporary refer to the relative age of the sub-graphs, identifying new clade-specific relationships (here called refinement); introgression indicates that a component, and the relationship it entertains with the rest of the network, was inferred to result from a lateral acquisition.

Using ECNs, it is first fruitful to test whether (and which of) these evolutionary colors correlate with topological properties of the ECNs [162,163,164]. One can test the null hypothesis that nodes’ centrality, i.e. nodes’ positions in the network, is correlated neither with the age nor with the duplicability, transferability or divergence of the molecular entities represented by these nodes. Rejection of this hypothesis would hint at processes that affect the topology of biological networks or are affected by the network topology. For example, considering degree in networks, proteins with more neighbors are less easily transferred [163]; highly expressed genes, which are more connected in GCNs, evolve more slowly than weakly expressed genes [165]; and genes with lower degrees have higher duplicability in yeast, worm and flies [166]. Considering position in networks, node centrality correlates with evolutionary conservation [136], gene eccentricity correlates with level of gene expression and dispensability [167], and proteins interacting with the external environment have higher average duplicability than proteins localized within intracellular compartments [168]. Additionally, network structure gives a clue to evolution since old proteins have more interactions than new ones [169, 170]. Generalizing these disparate studies could help to understand the dynamics of biological networks, in other words how the architecture (the nodes and edges) of present-day networks evolved, and whether these changes involved random or biased sets of nodes and edges or followed general models of network growth with detectable drivers.
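The null hypothesis above can be tested in several ways. One simple option, shown here as a sketch rather than as the method of the cited studies, is a permutation test on the Spearman rank correlation between node degree and an evolutionary color such as age. The degree and age values below are invented, and the helper names (`ranks`, `spearman`, `permutation_p`) are hypothetical.

```python
import random

def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled colors correlate at least as strongly."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Invented data: one entry per node (degree in the network, age class of its family).
degree = [1, 2, 2, 3, 5, 6, 8]
age = [1, 1, 2, 2, 3, 4, 4]  # larger value = older family (assumption)
rho = spearman(degree, age)
p = permutation_p(degree, age)
print(rho, p)
```

Shuffling the colors while keeping the topology fixed preserves the degree distribution, so a small p-value points to an association between position and color rather than to a property of the network alone.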

This focus would complement a classic tree-based view. For instance, under the reasonable working hypothesis that pairs of connected nodes of a given age reflect an interaction between nodes that may have arisen at that time [154, 171], ECNs can easily be decomposed into sub-networks featuring processes of different ages (that is, sets of nodes of a given age, e.g. sets of interacting genes). This strategy allows identification of conserved network patterns, possibly under strong selective pressure [159]. Constructing and exploiting ECNs from bacteria, archaea, and eukaryotes thus has the potential to define conserved ancestral sets of relationships between components, allowing evolutionary biologists to infer aspects of the early biological networks of the last common ancestor of eukaryotes, archaea and bacteria and even of the last universal common ancestor of cells. Assuming that some of these topological units correspond to functional units [172], especially for broadly conserved subgraphs [138, 149, 152, 166, 173,174,175,176,177,178,179,180,181,182], would allow network decompositions to propose sets of important processes associated with the emergence of major lineages.
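The decomposition described above can be sketched as follows in Python. The fragment keeps only edges whose two endpoints share the same age and then extracts connected components for each age. The node names and the two age labels are invented, and treating "same age at both ends" as the criterion is one reading of the working hypothesis, stated here as an assumption.

```python
from collections import defaultdict

def decompose_by_age(edges, age):
    """Map each age to the connected components built from same-age edges."""
    adjacency = defaultdict(lambda: defaultdict(set))
    for u, v in edges:
        if age[u] == age[v]:  # keep interactions between nodes of the same age
            adjacency[age[u]][u].add(v)
            adjacency[age[u]][v].add(u)
    result = {}
    for a, adj in adjacency.items():
        seen, components = set(), []
        for start in sorted(adj):
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:  # depth-first traversal of one component
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(adj[node] - component)
            seen |= component
            components.append(sorted(component))
        result[a] = components
    return result

# Invented ECN: five ancient nodes and two recent ones.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("x", "y")]
age = {"a": "ancient", "b": "ancient", "c": "ancient",
       "d": "recent", "e": "recent", "x": "ancient", "y": "ancient"}
print(decompose_by_age(edges, age))
```

The edge between "c" and "d" joins nodes of different ages and is therefore left out of both sub-networks; such mixed-age edges could be analyzed separately as later recruitments.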

Moreover, graph-matching of ECNs allows several complementary analyses. First, for interaction networks, such as GRNs, whose sets of components and edges evolve rapidly [183,184,185], it becomes relevant to analyze where in the network such changes occur in addition to (simply) tracking conserved sets of components and edges. Whereas the latter can test to what extent conservation of the interaction networks across higher taxa supports generalizations made from a limited number of model species [186], the former allows us to test a general hypothesis: are there repeated types of network changes? For example, does network modification primarily affect nodes with particular centralities, as exemplified by terminal processes [187], or modules? Systematizing these analyses would provide new insights into whether the organisation principles of biological networks changed when major lineages evolved or remained conserved. In terms of the ECN, can the same model of graph evolution explain the topology of ECNs from different lineages? The null hypothesis would be that these major transitions left no common traces in biological networks. An alternative hypothesis would be that the biological networks convergently became more complex (more connected and larger) during these transitions to novel life forms. Indeed, analyses conducted on a few taxa have reported quantifiable and qualifiable modifications in biological networks (in response to environmental challenges [188], during ecological transitions [189] or as niche specific adaptations [190]). More systematic graph-matching [191,192,193] and motif analyses, comparing the topology of ECNs from multiple species, could likewise be used to test the hypothesis that major lineages are enriched in particular motifs (either modules of colored nodes and edges, or specific topological features, such as feed-forward loops [46] or bow-ties [194]). 
It would also allow identification of functionally equivalent components across species, namely different genes with similar neighbors in different species [176].
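A very reduced illustration of this last idea is given below: genes from two species are compared through the homologous families found among their neighbors, and non-homologous genes with strongly overlapping neighborhoods are reported as candidate functional equivalents. This is a neighborhood-overlap heuristic, not a full graph-matching algorithm; the gene names, family labels, function names and threshold are assumptions.

```python
def neighbor_families(edges, family):
    """Map each gene to the set of homologous families among its neighbors."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(family[v])
        out.setdefault(v, set()).add(family[u])
    return out

def jaccard(a, b):
    """Shared fraction of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_equivalents(edges_1, edges_2, family, threshold=0.5):
    """Pairs of non-homologous genes, one per species, with similar neighborhoods."""
    n1 = neighbor_families(edges_1, family)
    n2 = neighbor_families(edges_2, family)
    pairs = []
    for g1 in sorted(n1):
        for g2 in sorted(n2):
            if family[g1] != family[g2]:  # homologs are excluded on purpose
                score = jaccard(n1[g1], n2[g2])
                if score >= threshold:
                    pairs.append((g1, g2, score))
    return pairs

# Invented networks: s1_x and s2_y are not homologous but share neighbor families.
family = {"s1_a": "F1", "s1_b": "F2", "s1_x": "F3",
          "s2_a": "F1", "s2_b": "F2", "s2_y": "F4"}
species_1 = [("s1_x", "s1_a"), ("s1_x", "s1_b")]
species_2 = [("s2_y", "s2_a"), ("s2_y", "s2_b")]
print(candidate_equivalents(species_1, species_2, family))
```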

While inferences on conserved sets of nodes and edges in ECNs are likely to be robust (since the patterns are observed in multiple species), missing data (missing nodes and edges) constitute a recognized challenge, especially for the interpretation of what will appear in ECN studies as the most versatile (least conserved) parts of the biological networks. The issue of missing data, however, is not specific to network-based evolutionary analyses, and should be tackled, as with other comparative approaches, by the development and testing of imputation methods [195,196,197]. Moreover, issues of missing data can also be addressed by the production of high-coverage -omics datasets in simple systems, allowing for (nearly) exhaustive representations of the entities and their interactions (e.g. PPIs, GCNs and GRNs within a cell, or metabolic networks within a species-poor ecosystem). This kind of data would allow testing for the existence of selected emergent ecosystemic properties (like carbon fixation), as stated by the ITSNTS hypothesis [198]. For instance, deep-coverage time series of metagenomic/metatranscriptomic data, coupled with environmental measures (such as carbon fixation) from a simple microbial ecosystem, could produce enough data to allow the evolutionary coloring of nodes of metabolic networks. Comparing ECNs representing, at each time point, the origin and abundance of the lineages hosting the enzymes involved in carbon fixation could test whether some combinations of lineages are repeated over time, and whether the components (e.g. genes and lineages) vary while carbon fixation is maintained in the ecosystem, which would suggest that this process evolves irrespective of the nature of the interacting components.

Finally, entities from different levels of biological organisation (domains, genes, genomes, lineages, etc.) could also be studied together in a single network framework, by integrating them into multipartite networks [199]. Recently, our studies and others (see [200] and references therein) have demonstrated that various patterns in multipartite graphs can be used to detect and test combinatorial (introgressive) and gradual evolution (by vertical descent) affecting genes and genomes. Decomposing multipartite networks into twins and articulation points could for example then be used to represent and analyze the evolution of complex composite molecular systems, such as CRISPR, or the dynamics of invasions of hairpins in genomes [201].
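To make the two graph patterns concrete, the sketch below finds twins and articulation points in a small bipartite gene family/genome graph. It uses the standard graph-theoretical definitions (twins: nodes with identical neighbor sets; articulation points: nodes whose removal increases the number of connected components), which are assumed here to match the usage in [200]. The brute-force removal test is chosen for readability and suits small graphs only; all node names are invented.

```python
from collections import defaultdict

def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def twins(edges):
    """Groups of nodes that share exactly the same set of neighbors."""
    groups = defaultdict(list)
    for node, neighbors in adjacency(edges).items():
        groups[frozenset(neighbors)].append(node)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

def n_components(adj, removed=None):
    """Count connected components, optionally ignoring one removed node."""
    seen, count = {removed}, 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return count

def articulation_points(edges):
    """Nodes whose removal disconnects the graph (brute-force check)."""
    adj = adjacency(edges)
    base = n_components(adj)
    return sorted(n for n in adj if n_components(adj, removed=n) > base)

# Invented bipartite graph: gene families f1..f3 present in genomes G1..G3.
edges = [("f1", "G1"), ("f1", "G2"), ("f2", "G1"),
         ("f2", "G2"), ("f3", "G2"), ("f3", "G3")]
print(twins(edges))                # families with the same genome distribution
print(articulation_points(edges))  # nodes bridging otherwise separate parts
```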

Further justifications for a shift toward network thinking

Enlargement of evolutionary biology

Focusing evolutionary explanations and theories on collectives of interacting components, which may be under selection, facilitate selection, or condition arrangements through neutral processes [39, 40, 202], and representing these scaffolding relationships using networks with biotic and abiotic components and a diversity of edges representing a diversity of interaction types, would be an enlargement. Enlargements, which express the need to consider structures that are more general than those already in use, have already occurred within evolutionary theory, when simplifications from population genetics were relaxed with respect to the original formalization in the Modern Synthesis [203], to account for within-genome interaction [9], gene–environment covariance [204], parental effects [205], and extended fitness through generations [206]. An enlargement also occurred when reticulations representing introgressions were added to the evolutionary tree.

Interestingly, replacing standard linear models in evolutionary theory with network approaches would transcend several traditional axes structuring the debates in evolutionary biology. For instance, scaffolded evolution, the idea that evolution relies on what came before, is orthogonal to the distinction between vertical and horizontal descent, since both tree-like and introgressive evolution are particular cases of scaffolding. Scaffolded evolution is also orthogonal to the distinction between gradual and saltational evolution. Likewise, scaffolded evolution is orthogonal to the debates about the actual role of adaptations vs neutral processes. Selection is a key mode of evolution of collectives but not the only one. The processes involved in the forming and evolution of collectives are not even restricted to the key processes of the Modern Synthesis (drift, selection, mutation and migration) but embrace interactions such as facilitation (namely antagonistic interactions between two species that allow a third species to prosper by restraining one of its predators or parasites [207]), presuppression [39, 40], etc. Consequently, some evolutionary concepts may become more important than they currently are to explain evolution. For example, contingency, which means the dependence of an evolutionary chain of events upon an event that itself is contingent, in the sense that it cannot be understood as a selective response to environmental changes [18, 208, 209], is often associated with extraordinary events, like mass decimation. Contingency could come to be seen as a less extraordinary mode of evolution in the history of life, since the ordinary course of evolution might include many cases of contingent events, that is, associations of entities in a transient collective, including any scaffolds. Such associations are not necessarily selective responses or the outcomes of processes modeled in population genetics.

Likewise, adopting a broader ontology could affect how evolutionary theorists think about evolution. Population thinking and tree-thinking came after essentialist conceptions of the living world, when populations and lineages were recognized as central objects of evolutionary studies [210]. A shift towards collectives and scaffolded evolution might encourage a similar development: the emergence of an openly pluralistic processual thinking, consistent with Carl Woese’s proposal to reformulate our view of evolution in terms of complex dynamic systems [211].

Further unifying the evolutionary theory

Using a network-based approach to analyse dynamic systems also permits explanations that rely purely on statistical properties [212] or on topological or graph theoretical properties [213, 214] besides standard explanations devoted to unravelling mechanisms responsible for a phenomenon. Moreover, because of the inclusiveness of the network model, disciplines already recognized for their contribution to evolutionary theory (microbiology, ecology, cell biology, genetics, etc.) could become even more part of an interdisciplinary research program on evolution, effectively addressing current issues, consistent with the repeated calls for transdisciplinary collaborations [19,20,21, 215]. Disciplines that were not central in the Modern Synthesis—chemistry, physics, geology, oceanography, cybernetics or linguistics—could aggregate with evolutionary biology. Since a diversity of components gets connected by a diversity of edges in networks featuring collectives, as a result of a diversity of drivers, several explanatory strategies could be combined to analyze evolution. This extension to seemingly foreign fields makes sense when the components/processes studied by these other disciplines are evolutionarily or functionally related to biotic components and processes (either as putative ancestors of biological components and processes, like the use of a proton gradient in cells, which possibly derived from geological processes affecting early life [216], or as descendants of biological systems, e.g. technically synthesized life forms, which have a potential to alter the future course of standard biological evolution).

Remarkably, this mode of unification of diverse scientific disciplines would be original: the integration would not be a unification in the sense of logical positivism [217]—namely reducing a theory to a theory with more basic laws, or a theory with a larger extension. It would be a piecemeal [218] unification. Some aspects would be unified through a specific kind of graph modeling (because some interactions, namely mechanical, chemical, ecological ones, and a range of time scales are privileged in a set of theories), while other theories might be unified by other graph properties (like different types of edges and components). For example, the fermentation hypothesis for mammalian chemical communication could be analyzed in a multipartite network framework, which would involve nodes corresponding to individual mammals, nodes corresponding to microbes, and nodes corresponding to odorous metabolites. Nodes corresponding to mammals could either be colored to reflect an individual’s properties (its lineage, social position, gender, sexual availability), or these nodes could be connected by edges that reflect these shared properties, which defines a first host subnetwork. This host subnetwork can itself be further connected to a second subnetwork, namely the microbial subnetwork in which nodes representing microbes, colored by phylogenetic origins, could be connected to reflect microbial interactions (gene transfer, competition, metabolic cooperation, etc.). Connections between the host and microbial subnetworks could simply be made by drawing edges between nodes representing individual mammals hosting microbes, and nodes representing these microbes. Moreover, nodes representing mammals and nodes representing microbes could be connected to nodes representing odorous metabolites to show what odours are associated with what combinations of hosts and microbes. 
Elaborating this network in a piecemeal fashion would involve cooperation between chemists, microbiologists, zoologists and evolutionary biologists.
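The multipartite network outlined for the fermentation hypothesis can be sketched as a layered graph. In the Python fragment below, every individual, microbe and metabolite name is invented, the three layer labels are assumptions, and the query simply follows host to microbe to metabolite edges to list which odours are associated with which host.

```python
from collections import defaultdict

# Hypothetical nodes, each assigned to one layer of the multipartite network.
layer = {
    "hyena_1": "mammal", "hyena_2": "mammal",
    "microbe_A": "microbe", "microbe_B": "microbe",
    "odour_X": "metabolite", "odour_Y": "metabolite",
}

# Hypothetical edges: hosts carry microbes, microbes produce odorous metabolites.
edges = [
    ("hyena_1", "microbe_A"), ("hyena_1", "microbe_B"),
    ("hyena_2", "microbe_A"),
    ("microbe_A", "odour_X"), ("microbe_B", "odour_Y"),
]

def odour_profiles(layer, edges):
    """For each mammal, collect the metabolites reachable through its microbes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    profiles = {}
    for host, kind in layer.items():
        if kind != "mammal":
            continue
        microbes = {m for m in adj[host] if layer[m] == "microbe"}
        profiles[host] = {o for m in microbes for o in adj[m]
                          if layer[o] == "metabolite"}
    return profiles

print(odour_profiles(layer, edges))
```

Host properties (lineage, social position) and microbial interactions could be added as node colors or as within-layer edges without changing this query.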

Of note, the use of integrated networks could pragmatically address a deep concern for evolutionary studies, by connecting phenomena that occur at different timescales: development and evolution [219] or ecology and evolution [220]. Considering transient collectives (thus processes) as stable entities at a given time-scale, when these collectives change much more slowly than the process in which they take part, amounts to a focus on interactions occurring at a given time scale by treating the slower dynamics as stable edges/nodes. Then, various parts of the networks embody distinct timescales, which may provide a new form of timescale integration, working out the merging of timescales from the viewpoint of the model, and with resources intrinsic to the model itself. The reason for this is that a node in an interaction network Ni, describing processes relevant at a time scale i, can itself be seen as the outcome of another (embedded) interaction network Nj, unfolding at a time scale j. This nestedness typically occurs when the node in Ni represents a collective process, involving components that evolve sufficiently slowly with respect to the system considered at the time scale i to figure as an entity, a node in Ni. In the case of a PPI network Ni, each node conventionally represents a protein, but the evolution of each protein could be further analysed as the result of mutation, duplication, fusion and shuffling events affecting the gene family coding the proteins over time; for instance, each protein could thus be represented as the outcome of interaction between domains in a domain–domain interaction network Nj. Considering these two time-scales, it becomes apparent that gene families enriched in exon shuffling events, a process directly analysable in Nj, have a higher degree in PPI networks represented at the time-scale Ni [221].

Predictions: discovery of co-constructed phenotypes

What possible findings may result from this perspective shift? One can only speculate, but the nature of the potential discoveries is exciting. At the molecular level, the structure and composition of regulatory networks and protein interaction networks could be substantially expanded to include scaffolding elements. Currently, these networks represent interactions within a single individual/species. Yet, viruses are everywhere, viral genes and proteins clearly influence the networks of their hosts, and likely constitute an actual part of their evolution. Thus, virogenetics, a novel transdiscipline, may prosper in an expanded evolutionary theory to show how and to what extent viruses co-construct their hosts, including perhaps reproductive-viruses, allowing their hosts to complete their lifecycles. At the cellular level, new modes of communication [222, 223] could be discovered, as possible viral and microbial languages and communication networks in biofilms would exemplify. At the level of multicellular organisms and holobionts, ‘symbiotic codes’, guiding the preferential association between hosts and symbionts, could be identified. At the level of phyla, hidden evolutionary transitions may be unraveled. While secondary (and tertiary) acquisitions of plastids have been documented [81], it might be shown that mitochondria too have been so acquired in some eukaryotic lineages (alongside the plastid or independently). Secondarily acquired mitochondria may provide their new hosts with additional compartments, where chimeric proteomes could assemble [91, 224] and perform original physiological processes. At the ecosystemic level, evolving networks could be used to model the changes and stases of our planet, grounding biotic lineages and processes in their environment, while highlighting potential regularities in the organisations and dynamics of ecosystems.
What affects the stability of what over the course of evolution could thus become a central theme of an expanded evolutionary theory.

Concluding remarks and open questions

Interactions are not merely a part of biological history; they are what made this history. But evolutionary biologists have certainly not reconstructed the Dynamic Interaction Network of Life (DINol) yet. Undertaking this endeavor, however, would emphasize the importance of processes. Our ancestors were processes. Our descendants and those of other life forms will be processes too. Some one hundred and fifty years after On the Origin of Species, which started a great evolutionary inquiry, evolutionists should prepare to face a larger challenge: expanding evolutionary theory to study the evolution of processes. With the development of -omics and network sciences, the concepts, data and tools for this research program are increasingly available.