Introduction

An improving empirical picture of hominin evolution is creating a growing theoretical challenge, namely that of piecing together fragments of insight from a wide variety of disciplines, methodologies, and contexts into coherent explanations—the stories of our deep prehistory. What we seem to observe is a perfect storm of contingent factors and powerful feedback processes that generated an evolutionary episode that biological theory sorely under-describes (e.g., Andersson et al. 2014; Foley 2016; Fuentes 2016; Whiten et al. 2017). Hominin evolution is thereby turning out to demand dedicated and spirited theoretical development. As Maslin et al. (2015) put it, we must create new “meta-narratives”—new stories about how to tell stories.

In this spirit, we develop here a proposition about what fundamental type of macroevolutionary trajectory we are looking at. We propose that human evolution is best understood as an evolutionary transition in individuality (ETI; e.g., Michod 2007; Leigh 2010; Hanschen et al. 2015) that combines evolutionary patterns familiar from earlier ETI but in a radically new type of substrate.

Key to this new theoretical proposition is the argument that the basic kinetics of early hominin communities closely parallels protocell models of the origin of cellular life: incidentally preadapted chemical vesicles that compartmentalized autocatalytic reaction networks, producing macrolevel evolutionary populations (Gánti 1975, 1997; Hanczyc and Szostak 2004; Rasmussen et al. 2004; Filisetti et al. 2010; Serra and Villani 2017). We argue that hominin communities were preadapted to act (under the right set of ecological circumstances) as “social protocells” with regard to heritable cultural traditions and hominins.

The social protocell model paints a “culture first” scenario where the hallmark deep cooperativity and cumulative culture of Homo originated from mutualistic cooperation between simple atomistic cultural traditions (similar to those observed in Pan today). Homo here emerges as the outcome of mutualistic evolution, as a component of the increasingly organismal organization of the novel type of macroscopic and bio-socio-technical evolutionary individual (EI; Lewontin 1970; Sober and Wilson 1994; Maynard-Smith and Szathmáry 1995; Queller 2000; Michod and Roze 2001) that coalesced within the social protocell. We refer to this emergent EI as a “sociont.”

By making this theoretical connection, our hypothesis takes a macroevolutionary perspective on human origins (see also Szathmáry 2015 and Foley 2016) that, we hold, has the potential to organize the interpretation of empirical evidence and serve as an enveloping theoretical meta-narrative. We see human evolution as fundamentally about the emergence of evolution on a new level of organization, and the evolution of a new set of basic evolutionary mechanisms for heredity, storage, development, and organization. This calls for a shift from “normal evolution” to “transitional evolution” in a new meta-narrative about human evolution (see also Foley 2016).

We also thereby move in the direction of unifying human evolution with the larger issue of major evolutionary transitions in natural history (MET). The dramatic evolutionary, ecological, and environmental impact of the advent of Homo hereby falls more squarely into the larger natural historical pattern of dramatic evolutionary disruptions resulting from bouts of innovation on this fundamental level (e.g., Maynard-Smith and Szathmáry 1995; Calcott and Sterelny 2011; Andersson et al. 2014; Erwin 2015; Szathmáry 2015; West et al. 2015; O’Malley and Powell 2016). [Footnote 1]

The proposed hominin ETI would have begun in early Homo communities ca. 2.6 mya, with large game carnivory, we argue, as the kernel around which the initial mutualistic system of traditions coalesced. The dramatic range expansion across the Old World that took place ca. 1.8 mya (Fleagle et al. 2010; Antón et al. 2014; Foley 2016) signals that the ETI had settled into a functioning new evolutionary machinery capable of macroscopic adaptation in previously unseen ways. The equally dramatic and unusual physiological change that followed in Homo (e.g., Antón and Snodgrass 2012), as well as the increasing organizational complexity of Homo societies, we see as evidence of a shift from micro- (hominin) to macrolevel (community) selection, and to a mutualistic evolutionary trajectory between the hominin organism and the exotic sociont host organism that coalesced around it.

We first introduce the social protocell model of the hominin ETI in the following section. We then argue that the resulting community-level EI would harden into a sociont with an internal bio-socio-technical organization, selected and cumulatively adapted on the macrolevel into an organismal organization (the third section). By early hominins, we mean Australopithecus forms, some of which were ancestral to Homo (before ca. 3.0 mya). By early Homo, we mean pre-erectus forms (e.g., H. habilis; ca. 3.0–1.8 mya).

The Hominin ETI

The community-to-sociont ETI model has two main components:

  (1) The “social protocell” (see “The Social Protocell as a Preadaptation” section and Fig. 1a–c).

  (2) The right kind of cultural tradition for setting off the ETI (see “The IGUT” section and Fig. 1d).

Fig. 1. Overview of the hominin ETI model

We picture the primordial situation (pre-Oldowan; >2.6 mya) as similar in outline to what we see among present-day Pan, with the “social protocell” arising as a side effect of social group dynamics, containing a “primordial soup” of appearing and disappearing cultural traditions. The key cultural tradition would arise from this creative variability, but only under specific ecological circumstances.

The outcome, we argue, was a macroscopic evolutionary population of group-selected communities—socionts—that seamlessly combined and integrated biological, social, and technical components. The socionts were evolutionary individuals (EI) exhibiting phenotypic variation, differential fitness, and heritable fitness (Lewontin 1970; Sober and Wilson 1994; Maynard-Smith and Szathmáry 1995; Michod and Roze 2001; Godfrey-Smith 2007).

By referring to this hypothetical type of EI as socionta, we simultaneously differentiate it from, and indicate its continuity with, bionta (a defunct taxon denoting all living things). We thereby also differentiate the post-ETI community from precursor communities, where lower forms of nonkin cooperation may be adequately understood in simpler evolutionary terms as the outcome of robust individual benefits (e.g., Dugatkin 1997, 2002; Clutton-Brock 2009).

The Social Protocell as a Preadaptation

We begin by describing the protocell model in its original context, moving then to its reapplication to hominin evolution.

Protocells and the Origins of New Channels of Inheritance

The backbone of biological cell interfaces is a phospholipid bilayer membrane. Significantly for explaining the plausible emergence of cellular life, eminently nonliving phospholipid vesicles grow and divide under reasonably lax assumptions (Hanczyc and Szostak 2004). Simply put, (1) if an autocatalytic metabolic process contained within such a vesicle produces phospholipid molecules as a by-product, these will spontaneously become incorporated into the enclosing membrane, causing it to grow. (2) If the vesicle grows too large it will spontaneously undergo fission, resulting in two smaller vesicles (e.g., Filisetti et al. 2010; Terasawa et al. 2012).

The resulting smaller vesicles (which are impermeable to the internal reaction networks) will contain whatever chemical processes the original vesicle contained, so the daughter vesicles will exhibit inheritance of the properties of the parent vesicle (Gánti 1975, 1997).

If the reactions within populations of such vesicles exhibit heritable variation in efficiency, stability, and so on, then vesicle growth and fission rate would be variable and subject to natural selection. They would have fitness. A Darwinian evolutionary exploration of the available cell-level design space would result: fitness would immediately be transferred from the chemical reaction networks to their collective performance on the macroscopic cellular level.
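The selection argument above can be illustrated with a minimal simulation sketch. It is not taken from the cited protocell models: the function name `simulate_vesicles`, the division threshold, the mutation size, and the fixed carrying capacity are all illustrative assumptions, chosen only to show how heritable variation in growth rate translates into selection at the vesicle level.

```python
import random
from statistics import mean


def simulate_vesicles(steps=200, capacity=50, seed=0):
    """Toy model (all parameters hypothetical): vesicles grow at a
    heritable rate and divide in two once they reach a size threshold."""
    rng = random.Random(seed)
    # Each vesicle is a (size, growth_rate) pair; rates vary at the start.
    population = [(1.0, rng.uniform(0.05, 0.15)) for _ in range(capacity)]
    initial_mean = mean(rate for _, rate in population)

    for _step in range(steps):
        next_population = []
        for size, rate in population:
            size += rate  # membrane growth fed by the internal reaction network
            if size >= 2.0:
                # Fission: each daughter inherits the parental rate,
                # with a small random variation (assumed, for illustration).
                for _daughter in range(2):
                    child_rate = max(0.0, rate + rng.gauss(0.0, 0.005))
                    next_population.append((size / 2.0, child_rate))
            else:
                next_population.append((size, rate))
        # Limited resources: random removal keeps the population at capacity.
        rng.shuffle(next_population)
        population = next_population[:capacity]

    final_mean = mean(rate for _, rate in population)
    return initial_mean, final_mean


if __name__ == "__main__":
    before, after = simulate_vesicles()
    print(f"mean growth rate: {before:.3f} -> {after:.3f}")
```

Removal is random with respect to growth rate in this sketch, so any rise in the mean rate comes from faster-growing vesicles dividing more often, which is the sense in which fitness is "transferred" to the vesicle level in the paragraph above.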

In this “fission–fusion” model for the origins of life (Norris and Raine 1998) one may say that portions of a “primordial soup” get “canned” within a structure that just happens to exhibit suitable macrolevel kinetic properties. The structure did not emerge because it had these properties, but is explained as a chemical phenomenon, which is crucial since it means that we do not have to invoke the processes we seek to explain. Compartmentalization furthermore stabilizes the chemical environment, keeping reactants together, and eventually permits the formation of adaptable, organized, and homeostatic inner environments (Maynard-Smith and Szathmáry 1995, pp. 20–23, 52–57, 99–107; Gánti 1997).

Community Lifecycle and “The Social Protocell”

With the protocells discussed above in mind, let us now consider Moffett’s (2013, pp. 239–249) review of the community-level lifecycles of Pan and recent hunter-gatherers: Homo and Pan communities grow as a result of ecological success, and they eventually fission into two separate new communities if they become too large. [Footnote 2]

Moffett notes that this is a much under-researched phenomenon, perhaps due to its relatively long time scale. Only two Pan community splits have been observed: a split of a bonobo community at Wamba (a side observation by Furuichi 1987) and one of a chimpanzee group at Gombe (Goodall 1986). In both cases, the process unfolded on a decadal time scale, and it is only now beginning to be studied in detail (using Goodall’s field notes; Feldblum et al. 2018). An analysis of the ages of chimpanzee communities suggests that the interval between such fission events falls in the range of several centuries to millennia (Langergraber et al. 2014).

Based on the available evidence, Moffett (2013, pp. 240–241) proposes a causal mechanism that he labels subgroup coalescence as responsible for effecting a fission event. Growth causes social instability as group membership increases (Dunbar 1992, 1993, 1998; Hill and Dunbar 2003), increasing the likelihood of a split due to a dynamical redistribution of the focus of community cohesion (Moody and White 2003). At some point in this process of separation, territorial behavior is triggered, causing the emerging compartments to treat each other as social out-groups, which completes the division irreversibly. Moffett (2013, p. 240) proposes that approximately symmetrical splits are likely to be prevalent.

Periodic growth and symmetric division (like a cell) is thereby the most probable macroscopic life cycle pattern of hominin communities on the face-to-face coordinated level throughout human evolution. The societal systems simply grew organically, splitting periodically in two.

Just as in the prebiotic protocells, (1) the daughter communities will inherit “packages” of reproducing units whose performance (2) affects macrolevel growth, fecundity, and mortality, and (3) the “reactants” will be kept together socially and by territorial defense.
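The three parallels just listed can be made concrete with a small sketch. The `Community` type, the `step` function, the size threshold, and the tradition labels are hypothetical names introduced here for illustration only. The sketch assumes an exactly symmetric split and complete inheritance of the parent community's traditions, which is a simplification of the approximately symmetric splits discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Community:
    members: int
    traditions: frozenset  # the "package" of reproducing units


def step(community, growth, max_size):
    """Grow a community; if it exceeds max_size, split it into two daughters."""
    size = community.members + growth
    if size <= max_size:
        return [Community(size, community.traditions)]
    half = size // 2
    # Symmetric fission: both daughters inherit the whole package of traditions.
    return [
        Community(half, community.traditions),
        Community(size - half, community.traditions),
    ]


if __name__ == "__main__":
    parent = Community(95, frozenset({"tradition_a", "tradition_b"}))
    for daughter in step(parent, growth=10, max_size=100):
        print(daughter.members, sorted(daughter.traditions))
```

In a fuller model the `growth` argument would itself depend on the performance of the contained traditions, which is point (2) above; here it is simply passed in.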

Community Permeability and Sociont-Level Inheritance

In the protocell model of the origin of cellular life, a key property of the phospholipid vesicle is its impermeability to the autocatalytic chemical networks that generate it (see above). This is what structures microlevel inheritance into macrolevel inheritance on the cellular level, what protects emerging internal adapted organization, and what limits the options for components that undermine cooperation. The social protocell argument is entirely analogous: the cultural variants that cause differential community growth rates are contained within their communities; see Fig. 2.

Fig. 2. Schematic comparison between social and biological protocell models. We see two systems that could hardly be more different in a material sense, but that, nevertheless, exhibit close dynamical and structural similarities of key importance.

We base our argument on an analogy between Pan and early hominins, assuming that, on the abstract level reflected in the model, early hominin and Pan community kinetics were qualitatively similar (Fig. 2). First, we note that Pan has changed little since the time of divergence from our lineage (see, e.g., Foley and Gamble 2009; MacKinnon and Fuentes 2011; Malone et al. 2012; Read 2012; van Schaik 2016). Second, we note that this type of group dynamics appears largely to persist in our lineage as well, with Homo exhibiting a trend of increasing refinement of a simpler ancestral theme through the appearance of additional levels of social organization (Layton and O’Hara 2010; Grove et al. 2012; Layton et al. 2012).

So why do Pan and, presumably, early hominin communities act as containers? The basic reason is that members of the same community interact intensively, persistently, and amicably, while members of different communities avoid each other or interact agonistically (e.g., Goodall 1986; Wilson and Wrangham 2003; Boesch et al. 2008; Schel et al. 2013). Since culture is transmitted in social networks, and since social networks are thereby partitioned at the borders between communities, communities partition culture. They contain culture.
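The containment argument can be sketched as reachability in a contact network. The network below, the individual labels, and the function name `spread` are hypothetical, and the sketch makes the strong assumption that no transmission at all occurs across the community border; it only illustrates why a tradition that travels along social ties stays within the partition where it arose.

```python
from collections import deque


def spread(first_holder, contacts):
    """Return everyone a tradition can reach from its first holder,
    assuming it is transmitted only along persistent social contacts."""
    reached = {first_holder}
    queue = deque([first_holder])
    while queue:
        person = queue.popleft()
        for other in contacts.get(person, ()):
            if other not in reached:
                reached.add(other)
                queue.append(other)
    return reached


# Two hypothetical communities with no amicable contact across the border.
contacts = {
    "a1": ["a2", "a3"], "a2": ["a1", "a3"], "a3": ["a1", "a2"],
    "b1": ["b2"], "b2": ["b1"],
}

if __name__ == "__main__":
    print(sorted(spread("a1", contacts)))
```

A tradition originating with `a1` reaches all of community A and none of community B, because the breadth-first traversal never finds an edge that crosses the border.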

We can identify four factors that likely cause a robust and persistent partitioning:

  (1) Close and prolonged contact between role models and naïve learners is needed for cultural transmission (as with any social learning). Such conditions apply within but not between communities (see, e.g., Tostevin 2012).

  (2) Enculturated individuals are prevented from moving freely between communities (e.g., Nishida et al. 1979; Pusey 1979; Wrangham 1979; Wilson and Wrangham 2003).

  (3) Even in the cases when enculturated individuals do transfer between communities, isolated individuals are poor vectors of cultural traditions due to conformism (Whiten et al. 2005; van de Waal et al. 2010, 2013; Haun et al. 2012; Luncz and Boesch 2014, 2015).

  (4) Components of integrated cultural systems are coadapted into larger functional wholes. Their fitness is thereby highly context-dependent. This compatibility problem must be expected to have gotten worse rather than better with increasing cultural complexity.

Lycett et al. (2009) conclude that observed patterns of chimpanzee cultural variation are indeed best explained by vertical rather than horizontal transmission of culture, lending yet another line of evidence; i.e., demic diffusion is more prevalent than the diffusion of ideas (Ammerman and Cavalli-Sforza 1984), a model that genetic evidence also increasingly favors in Holocene sedentary populations (Shennan 2013, pp. 302–303).

Although under-researched and challenging to investigate, indirect archaeological evidence based on patterns of traces of behavior (Foley and Lahr 2011; Layton et al. 2012; Blasco et al. 2013) suggests that community boundaries remained a strong barrier to cultural transmission throughout the evolution of Homo.

The IGUT

We now turn to the microlevel of the social protocell, and, in particular, the variety of socially learned traditional behavior that we may assume that it contained. These traditions play a central role in our argument as analogues of the autocatalytic chemical reactions contained in the protobiotic protocell (Fig. 2; see sections above).

A Cultural Tradition That Made a Difference

Extant chimpanzees are widely believed to be qualitatively similar to early hominins also with regard to the capacity to form and maintain cultural traditions (e.g., Whiten et al. 2003; McGrew 2010; van Schaik 2016, p. 78). It is therefore plausible to assume that early hominins maintained cultural traditions at a level and of a type similar to extant wild chimpanzees (e.g., Boesch and Tomasello 1998; Whiten et al. 1999, 2003; Boesch 2003; Lycett et al. 2009; Whiten 2011; Harmand et al. 2015). That is, a diverse and broad range of material and behavioral traditions (see Boesch 2012 for overview), potentially long-lived (Mercader et al. 2002, 2007), with rudimentary cumulative refinement and multi-component sequential tool use (Boesch 2003, pp. 88–89, 2012, pp. 66–72; Whiten et al. 2003; Vale et al. 2017). Some traditions clearly contribute to ecological success (Whiten 2006; Boesch 2012, pp. 47–80) but not in a way that approaches the irreplaceable role that culture plays for humans.

The tinder—social protocells churning with ever-new variants of chimpanzee-grade cultural traditions—was thereby likely in place. What was missing (and remains missing in Pan) was the spark: a cultural tradition that would cause the community to meet all criteria for evolutionary individuality. We conceptualize such a tradition (eventually an integrated system of traditions) in terms of three abstract properties it must have possessed:

  • Importance—Possessing the tradition provides a substantial competitive edge.

  • Generativity—The tradition has room for open-ended improvement: variants conveying steadily increasing adaptive benefit.

  • Universality—The tradition would remain important across large, contiguous geographical areas. It would not be directed at resources or behaviors whose adaptive values were essentially tied to locally occurring conditions.

Importance means that selection would generate a sufficiently strong selective signal to overcome the vagaries of chance. Generativity would ensure that competition would not simply stop at an early point beyond which further elaboration of the tradition would not pay. Universality would ensure that the tradition (and adaptive variants thereof) actually could spread beyond its original range, making space for a sizeable macrolevel population. We refer to such an important, generative, and universal tradition as an IGUT.
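The IGUT definition is a conjunction of three conditions, which can be written out as a simple predicate. Everything in the sketch below is hypothetical: the `Tradition` type, the three numeric scores on a 0 to 1 scale, and the thresholds are illustrative placeholders, not quantities estimated or proposed in the text.

```python
from dataclasses import dataclass


@dataclass
class Tradition:
    name: str
    importance: float    # size of the competitive edge it provides
    generativity: float  # remaining room for adaptive improvement
    universality: float  # share of the surrounding range where it pays off


def is_igut(tradition, threshold=0.5):
    """A tradition counts as an IGUT only if all three properties hold."""
    return (
        tradition.importance >= threshold
        and tradition.generativity >= threshold
        and tradition.universality >= threshold
    )


if __name__ == "__main__":
    # Illustrative scores only; they do not describe any real tradition.
    local_specialty = Tradition("tradition_a", 0.8, 0.7, 0.1)
    candidate = Tradition("tradition_b", 0.8, 0.9, 0.7)
    print(is_igut(local_specialty), is_igut(candidate))
```

The point of the conjunction is that a tradition strong on two properties but weak on the third, such as one tied to locally occurring resources, does not qualify.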

The Actual Spark: Cooperative Tool-Assisted Large Ungulate Carnivory

Cooperative large ungulate carnivory provides a well-supported and plausible candidate IGUT. A likely sequence of steps began in scavenging and continued, with deeper cooperation and increasingly sophisticated social and technical behavior, to confrontational scavenging, and finally to hunting (e.g., Bunn and Ezzo 1993; Domínguez-Rodrigo and Pickering 2003; Fuentes 2017). Tool-assisted and cooperative carnivory is in evidence by 2.0 mya (Ferraro et al. 2013) but likely dates back at least to the beginning of the Oldowan lithic tool tradition ca. 2.6 mya (e.g., Semaw 2000; Plummer 2004; Domínguez-Rodrigo et al. 2005).

Even small quantities of meat can serve an important dietary role by providing essential micronutrients, which for example may permit a more efficient overall diet when these nutrients do not have to be extracted from large volumes of low-quality foods (Tennie et al. 2009). Meat is socially and/or technically hard to obtain, but it has high nutritional value, and it exists widely in sufficiently large quantities to potentially replace most other food sources.

Increasing the intake of meat also remained important over time as an enabler of brain growth (Aiello and Wheeler 1995; Milton 1999, 2003; Snodgrass et al. 2009; Navarrete et al. 2011), and thereby of the technical and social intelligence that was critical to the particular way that hominins obtained meat. Technical and social intelligence, in turn, are widely seen as the two main evolutionary drivers of brain enlargement and general intelligence among primates (Byrne and Bates 2010). Meat would thereby enable more meat.

Large-game carnivory indeed remained the ecological focus throughout the evolution of Homo (e.g., Stiner 2002), and its generativity as a target of socio-technical adaptation is amply documented across our 2.6 my continuous archaeological record of tool production. Efficiency and risk reduction could, bit by bit, be achieved traditionally along multiple axes, such as mobility, social coordination, weaponry, raw material provision, processing, storage, and so on. In these areas, we see specialized and compartmentalized strategies and artifacts, all organized together into a regulatory hierarchy of mutualistic functional wholes. Indeed, we see an unbroken genealogy of carnivory leading all the way to present-day industrial livestock farming.

Strategies for large ungulate carnivory are also relatively easy to readapt to new settings across the world. Large ungulates are present across extensive contiguous tracts of land, and their behavior and defenses do not vary drastically.

The first steps may have had similarities with collaborative hunting of colobus monkeys, with frequent food sharing, seen in some chimpanzee communities (particularly in the Taï forest; see Boesch 2012, p. 90). Although not necessarily fully cooperative (van Schaik 2016, pp. 108–109; in the categories of Boesch and Boesch 1989) these hunts showcase how factors (behavior, motivations, and features scaffolding the behavior) can align to produce a protocooperative behavior even in species that are not strongly adapted for cooperation in general (Tomasello et al. 2012, pp. 674–680). Along with extractive foraging (which is the main area of chimpanzee subsistence tool use), social hunting is a strong candidate precursor of the much more intensive and well-coordinated cooperation we must imagine in the scavenging, hunting, and foraging strategies that we see as a hallmark of Homo (van Schaik 2016, p. 101).

Ecological Context of the Sociont I: Large Carcasses

A key difference between the ancestors of Homo and chimpanzees may simply have been access to large carcasses. Beyond the nutritional qualities of meat, these exhibit a range of key qualities as targets of a coalescence of mutualistic traditions vis-à-vis early Homo, whether as a target of scavenging or hunting:

  1. Marginal cost of sharing food diminishes as food package size increases, since a large carcass cannot be consumed quickly, or at all, by a single individual (Blurton Jones 1984; Winterhalder 1996; Stevens and Gilby 2004). That is, conflict is inherently low, which stimulates cooperation.

  2. Cooperation and coordination greatly increase a group’s effectiveness in obtaining and monopolizing large carcasses (Bickerton 2009; Bickerton and Szathmáry 2011), be it by confrontational scavenging or by hunting. That is, hominin social cooperation and coordination become adaptive.

  3. Getting the most out of a large carcass is effectively an open-ended exercise in tool-assisted extractive foraging. Carcass processing (and storage, utilization, transport, etc.) demands high technical intelligence: this was indeed also the main area of use of early Oldowan tools (e.g., Plummer 2004) and remains a central role of technology throughout. This means that cooperation and coordination between cultural traditions become adaptive.

Open landscapes are associated with large ungulates, and it is widely believed that the dividing line between the Pan and hominin lineages was a split in habitat range (e.g., Fleagle 2013). The former remained in a closed canopy rainforest environment while the latter moved into more varied types of landscapes, including lightly forested and open grassland (Cerling et al. 2011; Potts 2012, p. 302).

Alignment and Export of Fitness

Let us now return to the technicalities of transitional evolution during an ETI. Alignment of the initially separate fitness interests of cooperating microlevel EI (here, traditions and hominins) into a unified macrolevel fitness interest (here, the sociont) plays a key role in models of ETI (e.g., Michod et al. 2006; Folse and Roughgarden 2010; Niklas and Newman 2013). As alignment grows stronger, a second process termed export of fitness kicks into action. Export of fitness entails emerging macrolevel adaptive organization that does not belong to the formerly independent components, but that emerges “between them,” and that they become increasingly dependent on as a source of fitness. In the sociont, this would be the emerging cultural socio-technical system, directed at large ungulate carnivory.

Alignment may result from a variety of causes. For example, the containing properties of the social protocell (see above) align the fitness of the contained micro-components, since they limit the components’ options for avoiding the risks involved in cheating (strong “boomerang effect”; see Mesterton-Gibbons and Dugatkin 1992). In our model, this provides an initial alignment of fitness that creates a favorable setting for cooperation to emerge (between traditions and/or hominins).

The significance of a latent preexisting potential rests largely on it removing the perceived necessity that the adaptiveness of and the potential for cooperation must be explained together, typically as a bottom-up self-reinforcing story.

There is a good reason why it seems that explanations must look like that. Williams’ (1966) demonstration that group-level cooperation is highly sensitive to being undermined by individual-level cheaters convinced evolutionary theorists not to touch explanations that did not refer to kin selection, which, following Hamilton (1964a, b), seemed to be the only way that cooperation would not disintegrate “from below.”

Subsequent work on cooperation may be described as attempts to wiggle out of this “kinship sack”—pushed on by the evident ubiquity of cooperation also among nonkin, not least among humans. How could Homo become hyper-cooperative (Nowak and Highfield 2011; Burkart et al. 2014) despite a lack of close kinship in groups? The focus on cheating is warranted but remains at a level that has been described as “obsessive” (Calcott 2011)—not only in the context of human evolution but generally.

With kinship not an option, the only remaining option has seemed to be that hominin behavior and cognition must have set off a self-reinforcing process whereby increasing cooperation persistently produced a surplus of fitness alignment, which potentiated yet more cooperation. The preferred path has been through increasingly sophisticated forms of reciprocity (Trivers 1971; Fehr et al. 2002; Nowak 2006).

That cooperation and the potential for cooperation must be generated together is, however, a strong constraint on permissible explanations—not least on explanations of origins. While plausible, these bootstrapping explanations are not particularly robust. Many stars must be aligned for the stories to work, and, in the end, many such stories must work together. A more robust source of alignment of fitness interests would clearly be preferable—if one existed.

Our social protocell model may provide such a source. Like kin selection, it explains why fitness interests between cooperators would be aligned at the outset by explaining why cheating would not be adaptive in the first place on the microlevel.

So the social protocell narrative has it that the fitnesses of hominins and their evolving “soup” of traditions were initially aligned by the social protocell dynamics (Figs. 1c, 2), and that this fitness was subsequently exported to adapted organization on the emerging sociont level.

Tomasello et al. (2012) developed a compatible model that also aims to explain the origins of fitness alignment, based on “interdependence kinship” (Roberts 2005). Their model focuses on how a network of interdependence between hominins could have mimicked the aligning effects of genetic kinship. The social protocell depicts a situation that would have been ideal for the formation of strong interdependence networks (again, a strong “boomerang effect”). From our perspective, interdependence kinship provides an example of a mechanism by which integrated adapted complexes of traditions (initially the carnivory IGUT, which is also the kernel of cooperation that they propose) would lead to fitness alignment and vertical export of fitness also among hominins (i.e., not only among traditions). Within the emerging sociont, hominins would need to rely on each other’s ability and propensity to function nominally within these shared adapted cultural systems.

The Culture-First Scenario

Significantly, this potentiates a culture-first scenario where cultural traditions may have led the charge in the evolution of cooperation to (1) scaffold cooperative and coordinated hominin behavior and (2) create selection pressures for improved hominin capacities for cooperation and coordination. The sociont benefited from making us cooperate and coordinate, and nothing prevents traditions from modifying behavior in that direction.

Cooperation and coordination imposed from the top-down in this manner is a central concept in ETI. Michod and Nedelcu (2003, p. 66), for example, argue that cheating, in a context of adaptive cooperation, generally introduces selection for “conflict modifiers”—a specific type of adaptive organization with exported fitness—in this manner. Their effect is to stabilize cooperation and align fitness interests from the top-down—here effectively to impose the new “evolutionary will” of the sociont on traditions and hominins.

The relevance of such a framework is evident. Many of the salient adapted systems that stand out behind the rise of Homo are macroscopic and integrate biological, social, and technical components. They, moreover, frequently play central roles precisely in cooperation and coordination among hominins. While their adaptive benefit is straightforward once established, their gradual bottom-up origins from individual-level and primitive precursors are significantly harder to explain.

Examples of “exported” macroscopic cultural systems, and their biological support components, include language (e.g., Maynard-Smith and Szathmáry 1995; Stout et al. 2008; Ardila 2018), exceptional social and technical intelligence (e.g., Dunbar 1993; Whiten and van Schaik 2007), moral norms (e.g., Tomasello and Vaish 2013), unsolicited prosociality (e.g., Burkart et al. 2014), cultural learning and teaching (Gergely and Csibra 2006; Csibra and Gergely 2009, 2011; Laland 2017), a variety of “mental modules” (e.g., Lotem et al. 2017), executive functions providing function and control (Ardila 2008), a reputation economy and indirect reciprocity (e.g., Nowak 2006), and cheater punishment (e.g., Jensen 2010). See also, for example Herrmann et al. (2007), Vaesen (2012), and Laland (2017).

From the macroevolutionary sociont perspective, such systems (and adaptations to instate and maintain them) would not primarily explain cooperation. They would represent the exploitation of an adaptive potential to realize cooperative systems in the fitness-aligned interior of the sociont. Driven by an expandable and highly adaptive IGUT, a coalescence of traditions into a mutualistically coadapted cultural system of traditions can now be plausibly imagined.

Within such systems, collective functions like distribution, communication, and coordination are plausible—as are other supporting functions, such as ones pertaining to the transmission and storage of culture and conflict modification. These would represent a type of infrastructure of the bio-socio-technical sociont, analogous to, for example, the vascular, endocrine, immune, or neural systems that we find within biological organisms. The adaptive rationale for the sociont to promote internal cooperation and coordination is that this causes the space of permissible microlevel arrangements to grow, which would open up new swathes of design space on the macrolevel.

Organismality and Individuality of the Sociont

In our view cooperation would initially have followed something like the following sequence of phases: (1) Cultural traditions becoming coadapted such that their combined expression, how they prepare the ground for one another, how they stimulate or inhibit the expression of one another, and so on, cause them to perform superiorly together. (2) Larger and more sophisticated adaptive complexes of such cultural traditions taking shape, with (3) “internal” traditions arising whose sole fitness contribution comes from serving (regulating, coordinating, making more efficient, etc.) other traditions, and (4) the formation of modular bio-socio-technical cultural systems.

Notably, we may imagine this without imagining much social coordination and cooperation between hominins at all. We may readily imagine, however, how such a system of traditions would be more powerful if hominins cooperated socially and coordinated their actions—which macroscopic sociont organization could potentially make them do, since culture canalizes behavior and cognition (see above). We may furthermore imagine that as the hominins themselves became adapted to comfortably serve their roles in these systems, this would open the way for even more powerful macroscopic sociont adaptations to arise—adaptations that again stretched the capabilities of the hominins. Eventually, these systems would come to seamlessly integrate biological, social, and technical components—in some cases, such as language, to an extent that culture and nature partly become fused.

Although social cooperation well beyond what we observe in Pan likely arose early in Homo, the sociont may for a long time have remained tightly cohesive and coordinated, yet relatively socially simple compared to recent human societies. More importantly, to assess the sociont hypothesis properly we must broaden our view of cooperation, from one pertaining exclusively to social cooperation between hominins to one that also includes cooperation among traditions.

So it may not be the birth of strong cooperation, coordination, and cohesion per se that we witness in the much more recent evolution of Homo sapiens (e.g., Hare 2017), but an evolution of sophisticated sociont adaptations to maintain a high level of cohesion between more powerful hominin components (see also Read 2012), capable of developing and maintaining more powerful and flexible bio-socio-technical systems. Again, what we witness from this macroevolutionary perspective would be a familiar pattern in transitional evolution: the emergence of top-down adaptations to expand the evolutionary design space on the macrolevel.

Adaptive Functional Organization

Exploring and testing the organismality hypothesis is not a realistic aim for this article. Nevertheless, we wish to offer a preliminary exploration of the territory, led by those questions macroevolutionary concepts and models might call for us to ask (see, e.g., McShea 2000; Queller and Strassmann 2009; Folse and Roughgarden 2010).

From a macroevolutionary view, the sociont would have utilized heritable cultural information to combine the biological, social, and technical domains into powerful functional complexes. This holistic bio-socio-technical system is what we hypothesize is best viewed as a sociont organism: a mutualistic regulatory system of interacting functional parts with an entangled biological, social, and material “physiology” (see also e.g., Hodder 2012; Pradhan et al. 2012; Whiten and Erdal 2012; Andersson et al. 2014; Laubichler and Renn 2015; Stiner and Kuhn 2016; Whiten 2016).

Selection on the sociont level would configure and optimize the sociont-level functionality of this internal system. But it must also be expected to have worked to maintain its flexibility in at least two senses: (1) cultural evolvability, i.e., heritable adaptation, and (2) developmental/behavioral flexibility, i.e., flexible and creative recombination of heritable cultural elements, based on cognition and memory (experience), in response to situations as they arise.

Adapted macroscopic systems—be they social, technical, or biological—are hierarchical and modularized: components are internally integrated and externally separated such that functionality can be achieved with minimal interference among different functional requirements. They are “near-decomposable” (e.g., Simon 1962; Wimsatt 1975, 1994; “near” since integration across the system is needed but minimized). Unless they were, they could not have arisen. Near-decomposability drastically reduces the dimensionality of design spaces and permits their flexible navigation by search processes such as natural selection or creative design. Simply put, near-decomposability permits creative processes to improve systems one component at a time—or at least to minimize the extent to which adjustments must be made across subsystems.
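
The reduction in search effort can be illustrated with simple arithmetic. The following is a minimal sketch under strong simplifying assumptions that are ours, not the cited authors': every component has the same number of discrete states, modules are of equal size, and modules are fully independent (the idealized limit of near-decomposability). The function names are hypothetical.

```python
def exhaustive_search_size(n_components: int, states_per_component: int) -> int:
    """Configurations to examine when every component interacts with every other."""
    return states_per_component ** n_components


def modular_search_size(
    n_components: int, states_per_component: int, n_modules: int
) -> int:
    """Configurations to examine when the system splits into independent,
    equal-sized modules that can each be improved separately (assumption)."""
    if n_components % n_modules != 0:
        raise ValueError("n_components must be divisible by n_modules")
    module_size = n_components // n_modules
    # Each module is searched on its own, so the costs add instead of multiplying.
    return n_modules * states_per_component ** module_size
```

For 12 components with 3 states each, `exhaustive_search_size(12, 3)` gives 531441 configurations, whereas `modular_search_size(12, 3, 4)` gives 108. Real systems are only nearly decomposable, so the actual saving would lie somewhere between these two extremes.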

The most salient axis of macroscopic modularization and hierarchization is that of social units. Modern hunter-gatherer social organization represents a highly derived cultural form of an ancestral fission–fusion style of organization. Several organizational levels—bands, clans, tribes—have been added between and above the basic levels of foraging group and community, yielding a hierarchical system of adapted organizational levels (Grove et al. 2012; Layton et al. 2012; Read 2012; Moffett 2013). These represent macroscopic adaptive organization: they serve several functions, such as optimizing local group sizes depending on task and circumstances, and minimizing mobility requirements in areas with low biomass densities (Rolland 2004; Layton and O’Hara 2010; Grove et al. 2012; Layton et al. 2012).

Direct evidence of sociont-level bio-socio-technical systems does not preserve well: social elements do not preserve at all; physical artifacts preserve differentially depending on material, context, and age; and large-scale physical organization is very hard to reconstruct in the general case. Evidence of their existence—and a shadow of their general outlines—can, however, sometimes be inferred. Some examples include:

  • Transportation and caching of lithic raw material, and transport of carcasses to defendable refuges (ca. 2.5–2 mya; e.g., Blumenschine 1991; Potts 1991; Plummer 2004; Braun et al. 2008a, b; Goldman-Neuman and Hovers 2012), evidence a nascent macroscopic socio-technical regime with multiple adapted subcomponents that would make sense only together.

  • Gesher Benot-Ya’aqov offers a rare glimpse into early (ca. 800 kya) societies, revealing a complex economy that was differentiated spatially and functionally in an even more markedly modular fashion (e.g., Alperson-Afil et al. 2009). Multiple specialized domains of activity (e.g., quarrying and woodworking; Goren-Inbar et al. 1992) imply a high degree of temporal regulation and timing of activities.

  • Analyses of well-preserved Lower Paleolithic javelins (Schöningen, ca. 400 kya) reveal that they were parts of a complex, multi-domain organization of activities across time and space (Haidle 2009) including networks of activities surrounding their production, use, and maintenance.

  • Late Lower and Early Middle Paleolithic evidence at Qesem Cave (400–200 kya; Stiner et al. 2011) shows how increasingly ordered social and technical activities coalesced around hearths in an institutional role.

We think the sociont emerged and thrived because its organizational, hereditary, and developmental mechanisms permitted an entirely new way of adapting—one that no other type of organism could counter. Apart from quantitative benefits, like speed of adaptation, the sociont would not be constrained by having to pack all functionality into a single cohesive bodily system. It would have a superior potential for separation and thereby independence between functional components. We will here refer to this new way of adapting to (or indeed hacking) the ecosystem as hyper-adaptability.

Hominin Functional Differentiation

Had hominins followed the script inferable from other ETI (e.g., Szathmáry 2015, p. 10105), a likely scenario seems to be differentiation into functionally complementary castes, similar to social insects (Oster and Wilson 1978). But with a culturally inherited socio-technical system as a superior source of adaptive flexibility, that is not what happened. The hominin organism would carry and enable sociont-level cultural adaptation rather than compete with it.

It is therefore unsurprising that hominins embarked on a different and wholly unique evolutionary route where differentiation and specialization came to be almost entirely offloaded to the fast cultural channel of inheritance (Lewontin 1972; Foley and Lahr 2011, pp. 1081–1082). We should expect to see what we are actually seeing: the hominin as an increasingly powerful, flexible, general-purpose “platform” for behavior and cognition that can be turned into an exceptionally wide range of specialized forms by being “filled with” cultural content via enculturation (e.g., Han and Ma 2015; Legare 2017; Sherwood and Gómez-Robles 2017).

The Baldwin effect provides a likely model of how sociont and hominin evolution are linked (Jablonka and Szathmáry 1995; Weber and Depew 2003; Bateson 2004). If the sociont continually stretched the cultural capabilities of its hominin components, this would exert a consistent selection pressure for efficient transmission and use of culture. Those that could comfortably serve their roles as parts of this machinery would have a competitive advantage. By “catching up” in this manner, hominin evolution would potentiate further sociont evolution, thus renewing the pressure.

The culturally configurable hominin would be at the core of the immense and flexible design space of the sociont bio-socio-technical system. It would be equipped, for example, with adaptations for coordination, cooperation, and cultural storage and transmission, such as language and a natural pedagogy (see Gergely and Csibra 2006; Csibra and Gergely 2009, 2011; Legare 2017; several other cognitive and psychological adaptations to such a role were mentioned in the “Alignment and Export of Fitness” section). This is the proposed nature and role of hominins as components of the sociont.

The Hominin and its Brain in the Hyper-Adaptable Sociont

Arguably the most enigmatic part of Homo is its brain. Encephalization quotients in Homo grew in the period in which we propose the ETI took place (ca. 2.5–1.8 mya) to exceed those of Pan and Australopithecus decisively (Falk et al. 2000; Williams 2002; Roth and Dicke 2005). During the further evolution of Homo, relative brain size continued to increase steadily to modern levels (e.g., Rightmire 2004). From our perspective, this indicates that the hominin brain was now “mounted” in an adaptive machinery that could not only make better use of the brain but also better shield it from the risks that it incurred (e.g., Han and Ma 2015; Sherwood and Gómez-Robles 2017; Sterelny 2017).

We propose that the sociont organism could leverage intelligence into much larger adaptive benefits than could an individual ape organism. Break-even between the benefits, costs (Aiello and Wheeler 1995; Leonard et al. 2003; Snodgrass et al. 2009; Herculano-Houzel 2012), and risks (e.g., altriciality, obstetric dilemma; Dunsworth et al. 2012) of having a large brain thereby occurred at ever-larger brain sizes as the sociont’s bio-socio-technical design space was evolutionarily explored and expanded over time. The hominin brain was furthermore buffered from risks by a homeostatic regulatory host that maintained conditions within increasingly narrow ranges.
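
The break-even argument can be stated as a minimal formal sketch, read here as a marginal break-even. All symbols are illustrative assumptions of ours and do not come from the cited literature: b is relative brain size, B(b) the adaptive benefit of intelligence, C(b) its metabolic cost, R(b) its developmental and obstetric risk, and λ a leverage factor expressing how strongly the host organization amplifies the benefit of intelligence (λ = 1 for an individual ape organism, λ > 1 for the sociont).

```latex
\begin{align}
  W(b;\lambda) &= \lambda\,B(b) - C(b) - R(b), \\
  \lambda\,B'(b^{*}) &= C'(b^{*}) + R'(b^{*}), \\
  \frac{\mathrm{d}b^{*}}{\mathrm{d}\lambda}
    &= \frac{B'(b^{*})}{C''(b^{*}) + R''(b^{*}) - \lambda\,B''(b^{*})} > 0 .
\end{align}
```

The inequality holds under the assumptions that B is increasing and concave and that C + R is convex, with at least one of these strict. On this reading, greater leverage λ moves the break-even brain size b* upward, and buffering by the sociont, interpreted as a lower marginal risk R'(b), shifts b* in the same direction.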

The Ecological Context of the Sociont II: Variability Selection

Adaptability to environmental variability was already highly developed in early hominins, as in great apes generally (Ungar et al. 2006; Boesch 2012, pp. 47–80; Malone et al. 2012; see also Rendell et al. 2011). We see the evolution of hyper-adaptability as a continuation, by new means, along that same trajectory: a capacity to adapt to variability by creativity and resourcefulness (see also Fuentes 2017; Fogarty et al. 2015).

The emerging picture of the ecological context of the ETI is that the period between ca. 3.0 and 2.0 mya combined three patterns that are associated with the onset of the Pleistocene ice age: (1) general climatic cooling and spread of open landscapes with high concentration and large packages of biomass (Vrba 1988, 1995; deMenocal 1995; see also the “Ecological Context of the Sociont I: Large Carcasses” section); (2) high climatic variability (Potts 1998a, 2012; Maslin and Christensen 2007; Potts and Faith 2015); and (3) a pulsed distribution of this variability (Shultz and Maslin 2013; Maslin et al. 2014, 2015).

Hyper-adaptability fits well into the image of the “variability selection hypothesis” (Potts 1998b, 2012; Grove 2011a, b; Maslin et al. 2014, 2015) in the sense that hyper-adaptability would be particularly adaptive in a context of high environmental variability. Territorial hominins, unable to migrate freely when faced with sudden and radical local biotope and faunal turnover (Leakey and Werdelin 2010, pp. 7–8; Maslin et al. 2015), would benefit greatly from the ability to drastically reconfigure their strategies locally.

The period of environmental variability and high hominin diversity in Southern and East Africa (e.g., Antón et al. 2014; Maslin et al. 2015; Carotenuto et al. 2016) was followed by a dramatic range expansion across the Old World (Fleagle et al. 2010; Antón et al. 2014; Foley 2016). We think hyper-adaptability, driven by variability selection, also poised the sociont for adapting to geographical variability, and thereby for a unique expansion into an entirely new and broad range of biotopes.

Evolutionary Individuality

To qualify as an EI the sociont must not only be sufficiently cohesive to align the fitnesses of its components. It must also meet some basic criteria pertaining to how it behaves as a population (Lewontin 1970; Godfrey-Smith 2007). We conclude that Homo communities as socionts may plausibly have done so:

  • Cultural phenotypic variation exists between extant chimpanzee communities (Whiten et al. 2001; Lycett et al. 2009; Boesch 2012; Luncz and Boesch 2014, 2015). Cultural elements of different Homo communities certainly vary, as do their manifestations as holistic adapted systems.

  • Cultural differences may cause at least slight differential fitness in chimpanzees (Boesch 2012, pp. 47–80; Whiten 2006), and they straightforwardly do in Homo.

  • Traditions can be highly persistent among chimpanzees (Mercader et al. 2002, 2007) and may be assumed to have been so among pre-ETI early hominins as well. Many Homo cultural components are exceptionally persistent and widespread—not least basic lithic designs. As chimpanzee traditions are heritable and may confer fitness, we feel confident stating that cultural fitness is heritable.
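
The joint necessity of the three criteria listed above (variation, differential fitness, heritability) can be illustrated numerically. The sketch below uses the breeder's equation from quantitative genetics, R = h²S, as a stand-in; this tool is our choice for illustration and is not used in the works cited. The function name, the single scalar "trait" per community, and all numbers are hypothetical.

```python
from statistics import fmean


def selection_response(traits, fitnesses, heritability):
    """Expected one-generation change in the mean trait (R = h^2 * S).

    traits       -- one trait value per community (hypothetical cultural phenotype)
    fitnesses    -- one positive fitness value per community
    heritability -- fraction of trait variation transmitted to offspring (0..1)
    """
    if len(traits) != len(fitnesses) or not traits:
        raise ValueError("traits and fitnesses must be non-empty and of equal length")
    mean_trait = fmean(traits)
    mean_fitness = fmean(fitnesses)
    # Population covariance between trait value and fitness.
    covariance = fmean(
        [(z - mean_trait) * (w - mean_fitness) for z, w in zip(traits, fitnesses)]
    )
    # Selection differential S: covariance of the trait with relative fitness.
    selection_differential = covariance / mean_fitness
    return heritability * selection_differential
```

With hypothetical trait values `[1, 2, 3, 4]`, fitnesses `[1, 1, 2, 2]`, and heritability 0.5, the response is about 0.167. It drops to zero if the communities do not vary, if their fitnesses are equal, or if heritability is zero, which is why all three criteria must be met together.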

Discussion

We began by arguing the need for new “meta-narratives” and claiming that some variant of our hypothesis potentially may provide one. Meta-narratives—narratives about narratives—in the context of scientific theory mean, essentially, some general set of theoretical principles that scaffolds the construction of explanations and new ideas. It structures the theoretical and empirical search space and helps maintain a measure of unity so that derivative explanations support rather than contradict one another. In the case of historical sciences, the lack of a meta-narrative means that history tends to look like “one damned thing after another”—as Henry Ford reputedly put it.

A meta-narrative is not sufficient in itself, however: it is too abstract to “reach” all the way down to the empirical level. It must be evaluated through the performance of more specialized theoretical embodiments that extend the meta-narrative while remaining consistent with it—models, theory, and arguments about specific phenomena.

The macroevolutionary search for theory to explain the big patterns in natural history and the phylogeny of life is precisely a search for models that mediate between the foundational Darwinian meta-narrative and the empirical background. A long-standing mystery that neo-Darwinian evolutionary theory says very little about is how major evolutionary transitions have taken place (e.g., Maynard-Smith and Szathmáry 1995; Gould 2002; Szathmáry 2015). How did evolution go from bacteria to protists? From unicellular protists to multicellular organisms? From multicellular organisms to human civilizations?

Such bouts of transitional evolution call for meta-evolutionary theory that explains how these Darwinian principles—embodied in one set of adapted systems—can periodically give rise to sets of new such embodiments and radically new regimes of adapted organization. As we have discussed, a series of evolutionary pathways that explain how such transitions are possible within the constraints of the overarching Darwinian meta-narrative have been identified.

What we do is adapt and combine insights about such pathways to address the human MET—arguably the most dramatic and inherently interesting one of them all. If human evolution kicked off as something that has recognizable precedents in natural history—which the proposed model hypothesizes was the case—then we would be in much better shape to understand human evolution.

First, human evolution would be better unified with our overall evolutionary understanding of natural history. That would bode well for future theoretical development. Second, we would have something to begin constructing theory from—something that produces quite specific predictions about what patterns we should and should not expect to see in human evolution.

Our aim with this article has been to argue for plausibility and likelihood to a level where the proposition may be viewed as worthy of further inquiry, which will entail going deeper into literature that we have only sampled here. The first task ought to be to consider the existing but disparate wealth of causally important factors that have been proposed based on close consideration of the expanding empirical field (such as those reviewed by Maslin et al. (2015), as they called for new meta-narratives). Is our proposition consistent with such models, and does it stimulate a unification among such models?

Furthermore, it remains to be explored how our proposition interacts with other proposed meta-level frameworks. Examples include niche construction theory (NCT; e.g., Laland and Brown 2006; Smith 2007; Laland and O’Brien 2015), dual-inheritance theory (e.g., Cavalli-Sforza and Feldman 1981; Boyd and Richerson 1985; Whiten et al. 2011), models of bottom-up evolution of stable reciprocal cooperation (e.g., Fehr and Fischbacher 2003; van Schaik and Kappeler 2006; Tomasello 2008; MacKinnon and Fuentes 2011; Nowak and Highfield 2011; Tomasello et al. 2012; Wilson 2012; Burkart et al. 2014), and models emphasizing the role of organization, regulation, and development in cultural evolution (e.g., MacKinnon and Fuentes 2011; Wimsatt 2013; Andersson et al. 2014; Fuentes 2015, 2016; Stiner and Kuhn 2016).

We have furthermore identified a number of domains in terms of the ontology of the proposed model where further work to organize and interpret existing evidence will put the hypothesis to the test and permit its elaboration. We have the ETI itself (see “The Hominin ETI” section), which bears on the kinetics of community life cycles and their primate and human social underpinning. We also have the functional organization of Homo communities (see the “Organismality and Individuality of the Sociont” section) and the evolution of Homo itself, where we hint at a mutualistic evolutionary pathway where Homo actually turns into a component of the sociont. Homo would, for many functional purposes, have become enclosed within the sociont, which may suggest that endosymbiosis could be a fruitful model for understanding human evolution, and possibly also domestication. Finally, we have macroscopic historical and geographical evolutionary patterns, which we have not gone into at all in this article.

Since we are invoking MET models that are extensively worked out in their original settings, quite specific predictions may be developed, and the potential for finding strong tests is thereby promising. In other words, our hypothesis quite strongly constrains the sorts of patterns with which it is compatible. We may thereby expect to be able to tell whether this is actually a workable meta-narrative or not.

The only other theoretical framework approaching this scope would be NCT—and it also predicts a structured and organized “layer” between what we traditionally think is the lead character of the story (i.e., the hominin organism) and the backdrop (the ecological and environmental exterior). NCT does appear considerably less theoretically constraining than the sociont model as a meta-narrative; i.e., it is less clear what critical tests would tell us that we are not looking at niche construction, or, for that matter, what tests would point uniquely to niche construction.

That said, the sociont is by no means incompatible with NCT. To the contrary, even if the bio-socio-technical “innards” of the sociont arose top-down through internal adaptive organization (ETI), there are potentially interesting points of connection between ETI and niche construction (e.g., Torday 2016). Moreover, the sociont itself is a potential niche constructor on its level, as it interfaces with, and modifies, the environment (see also Fuentes 2016)—inserting itself, as it were, between the hominin and the constructed niche. Its main novel adaptive capacity—which we describe as hyper-adaptability (see above)—may easily be seen as a capacity that vastly expands the potential for, precisely, constructing niches.

Moving forward also entails something that is easy to forget: evolutionary hypotheses make statements about outcomes of historical dynamics. We are exceptionally poor at seeing through emergence and nonlinear dynamics, and so models must be designed to verify that the hypothesis really predicts what is being argued and tested. Models, moreover, permit us to explore theoretical systems by varying them, and they permit us to discover patterns and phenomena that we never could have guessed were entailed by the theory. Simulating hominin/Pan communities on the behavioral microlevel to reproduce their lifecycles may be a first step, to be followed by adding cultural transmission to simulate the predicted vertical channeling and its outcomes. The ETI may be similarly explored, as may the evolution of simulated hominins reproducing in this system.
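
As a purely illustrative starting point for such modeling, the sketch below implements a toy community lifecycle in which communities grow, fission when they become large, and pass their traditions on to daughter communities with a given fidelity. It is not the model proposed here, and it omits the behavioral microlevel entirely. Every name, parameter, and value (`Community`, `step`, `run`, `fission_size`, `fidelity`, `max_communities`, the founder size of 30, the five founder traditions) is a hypothetical assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Community:
    size: int              # number of individuals
    traditions: frozenset  # identifiers of the cultural traditions carried


def step(communities, rng, fission_size=60, fidelity=0.9, max_communities=50):
    """Advance every community one time step: grow, then fission if large."""
    next_generation = []
    for community in communities:
        grown = community.size + max(1, community.size // 5)  # ~20% growth (assumed)
        if grown >= fission_size:
            # Fission into two daughters. Each daughter inherits each parental
            # tradition independently with probability `fidelity`, a crude
            # stand-in for vertical (parent-to-daughter community) transmission.
            for _ in range(2):
                inherited = frozenset(
                    t for t in sorted(community.traditions) if rng.random() < fidelity
                )
                next_generation.append(Community(grown // 2, inherited))
        else:
            next_generation.append(Community(grown, community.traditions))
    # Random cap on the number of communities, standing in for competition.
    rng.shuffle(next_generation)
    return next_generation[:max_communities]


def run(generations, fidelity, seed=0):
    """Simulate from a single founder community carrying five traditions."""
    rng = random.Random(seed)
    communities = [Community(30, frozenset(range(5)))]
    for _ in range(generations):
        communities = step(communities, rng, fidelity=fidelity)
    return communities
```

Calling `run(20, 1.0)` yields descendant communities that all retain the founder's traditions, whereas `run(20, 0.0)` yields communities that carry none. Intermediate fidelities, differential growth tied to the traditions carried, and horizontal transmission between communities would be natural next extensions.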

We agree fully with Foley’s (2016) view that human evolution has a great potential to push the boundaries of evolutionary theory. In many ways we are looking at a phenomenon that, compared with the rest of natural history, is as exotic as any extraterrestrial type of life could be, and we have it right before us in increasingly fine detail.

From the perspective of anthropology, the proposed model may help to reduce friction between evolutionary and social approaches (Fuentes 2016) by addressing in a fundamental way how an evolutionary view may interface with the immensely complex, integrated, and rich organization of hominin societies. An improved understanding of what would be a correct ontology for understanding how humans, our societies, and nature are related would indeed have very far-reaching consequences.