Introduction 1: the eubacteria-neomura dichotomy in cell structure

Use of ribosomal RNA sequences for phylogeny led to recognition of the important distinction between archaebacteria and eubacteria (Fox et al. 1980). It soon became clear that archaebacteria are more closely related to eukaryotes than to eubacteria and that archaebacteria plus eukaryotes constitute a clade characterised ancestrally by surface N-linked glycoproteins. The archaebacteria/eukaryote clade was called neomura, meaning new walls (Cavalier-Smith 1987c), to contrast it with eubacteria, which typically have walls of murein peptidoglycan instead of N-linked glycoproteins (mycoplasmas, which secondarily lost murein, are the sole exception). From the outset, it was controversial whether archaebacteria are ancestral to eukaryotes (Van Valen and Maiorana 1980; Williams et al. 2013) or are their sisters (Cavalier-Smith 1987c, 2002a), a question still not unambiguously decided (Cavalier-Smith 2014).

The cladistic relationship between eubacteria and neomura has been even more controversial, with three contrasting views (Fig. 1): (a) eubacteria are ancestral to neomura, which are therefore younger (Cavalier-Smith 1987b, c, 2002a, 2014; Lake et al. 2009; Valas and Bourne 2011); (b) they are sisters and thus of roughly equal age, with the root of the universal tree lying between them (Gogarten et al. 1989; Iwabe et al. 1989); (c) neomura, specifically eukaryote-like cells, are ancestral to eubacteria, with the universal root lying within the eukaryote stem or crown and prokaryotes having arisen by secondary simplification (so-called streamlining) (Forterre 1995). Mariscal and Doolittle (2015) lumped 10 disparate speculations together as ‘eukaryote-first’, but all are extremely vague about the overall cellular properties of the last ‘universal’ common ancestor of all life (LUCA). None is explicit enough to be a worthwhile scientific hypothesis about LUCA, and none is truly eukaryote-first (i.e. none posits that LUCA had a nucleus, mitosis, meiosis, syngamy, an ER-Golgi differentiated endomembrane system, and cilia or mitochondria, a logical impossibility!), as they mostly refer only to relatively trivial, mainly genomic, molecular details and ignore most cell biology. Calling them ‘eukaryote-first’ is therefore conceptually misleading, and saying ‘eukaryote-first does not mean Eukarya first’ was obscurantist. Unless we can confidently decide between these three roots, we cannot accurately reconstruct the nature of LUCA or determine the direction of evolution at key transitions.

Fig. 1

Longstanding contradictory interpretations of the universal rRNA tree. On the ‘eubacteria-first’ view (a), eubacteria are the ancestral domain, several times older than neomura, which arose by the neomuran revolution (Cavalier-Smith 1987c, 2002a), a radical cell transformation caused by loss of murein peptidoglycan by a eubacterium, analogous to the origins of mycoplasmas and L-forms from Bacilli. View a is strongly supported by the fossil record, which indicates that neomura are 3–4 times younger (originating between 0.8 and 1.45 Ga, depending on the controversial identification of fossils from this period as ‘stem eukaryotes’ or ‘unusually complex bacteria’: Cavalier-Smith 2006a). Associated changes in cell biology were explained in detail (Cavalier-Smith 2014) on the assumption that the eubacterial ancestor of neomura was a posibacterium (Lake et al. 2009; Valas and Bourne 2011), whereas new evidence presented here favours the more recent idea that it was a planctobacterium (Reynaud and Devos 2011). View a also holds that the long stems at the base of neomura and of eukaryotes on rDNA and RP trees result from episodic hyperacceleration of ribosome evolution, caused respectively by the origins of cotranslational secretion of glycoproteins and of the nucleus (Cavalier-Smith 2002a). The ‘archaea ancient’ view (b) assumes that neomura are as old as eubacteria and that neomuran and eubacterial characters evolved divergently immediately after the origin of life, often assuming that their membranes arose independently by simultaneous separate origins of acyl ester lipids in eubacterial ancestors and isoprenoid ethers in ancestral neomura (this ancient ‘lipid divide’ is now refuted by eubacterial prenyl ether lipids and archaebacterial fatty acids).
View b is based on (1) highly dubious a priori ideas about archaebacteria (Woese and Fox 1977a, b); (2) the false assumption that rDNA nucleotide substitution rates have been largely unchanged since cells began; and (3) uncritical interpretation of the first protein paralogue trees, ignoring the likelihood that these too are temporally distorted by episodic hyperacceleration, causing long-branch artefacts that misroot the three-domain tree in the stretched neomuran stem (Cavalier-Smith 2002a, 2006c). Proponents of b imagined that eukaryotes replaced isoprenoid ethers by α-proteobacterial acyl esters during mitochondrial enslavement (Martin 1999). Variants of a and b exist that assume that archaebacteria are ancestral to, not sisters of, eukaryotes (Williams et al. 2013) but still accept neomura as a clade. In contrast, the prokaryotes-late or eukaryotes-first view (c) (Mariscal and Doolittle 2015) assumes that cells were originally eukaryote-like and that prokaryotes arose by radical simplification (‘streamlining’: Forterre 1995), but never explicitly attempted to explain how; Forterre (2013) now prefers b. Proponents of b and c ignore the fossil record, which refutes both, and largely ignore cell biology, failing to explain how the assumed cell transformations could have occurred (incredible for c; highly implausible selectively and mechanistically for b). Yet b may still be the most widespread assumption despite its serious defects; many remain unaware that paralogue pairs, like fossils, more often favour a eubacterial root. Only a offers a scientifically explicit hypothesis as to the cell structure of LUCA.

Sequence trees alone did not give a generally accepted answer (Gouy et al. 2015; Philippe and Forterre 1999). Though many mistakenly think that paralogue rooting shows Fig. 1b to be correct, that topology was found only in the first two such papers (Gogarten et al. 1989; Iwabe et al. 1989). A majority of later paralogue trees placed the root within eubacteria (Cavalier-Smith 2006c; Zhaxybayeva et al. 2005), in accord with Fig. 1a. Cavalier-Smith (2002a, 2006c) argued that this eubacterial root is probably correct and that paralogue trees suggesting otherwise are misrooted by severe long-branch attraction artefacts resulting from transient ultrafast evolution in neomuran stem lineages. Apparently none favours a eukaryote root (Fig. 1c), so most reject this possibility and accept that neomura are a clade, though few know its name. This conflict between different paralogue trees over roots 1a and 1b, irrespective of its causes, means that evidence from sources other than sequence trees is indispensable for interpreting them correctly (Cavalier-Smith 2006c). Sequence evidence from indels puts the root within eubacteria (Lake et al. 2009; Valas and Bourne 2011). So does fossil evidence that crown eubacteria are 3.5 Ga old whereas eukaryotes are only ~1 Ga old or even younger: mapping their rRNA and ribosomal protein trees onto well-dated palaeontological evidence (fossils, biomarkers, and the date of atmospheric oxygenation) strongly argues that the root is within eubacteria, as the relative dates are incompatible with a root in the interconnecting stem between neomura and eubacteria (Cavalier-Smith 2006c).
An ingenious rooting argument is that eubacterial amino acid usage bias makes it likely that the genetic code evolved in eubacteria, not neomura (Fournier and Gogarten 2010). This analysis does not tell us whether the root is within the eubacterial crown, as Cavalier-Smith (2002a, 2006a, c) argued, or in the neomuran stem (which the authors assumed but their analysis could not justify), but it argues against a root within neomura, and thus against Fig. 1c and all 10 ideas discussed oversympathetically by Mariscal and Doolittle (2015). Two outgroup-free rooting methods applied to the universal rDNA tree gave contradictory results: the one more sensitive to systematic artefacts placed the root in the neomuran stem, whereas the more accurate method put it within eubacteria, implying that archaebacteria evolved from, and are younger than, eubacteria (Williams et al. 2015).
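How a transient rate burst can drag an inferred root into the neomuran stem can be illustrated with midpoint rooting, one of the simplest outgroup-free rooting methods. It is used here purely as an illustration and is not claimed to be either of the methods used by Williams et al. (2015). The sketch assumes a hypothetical four-leaf tree with invented branch lengths (substitutions per site): two eubacterial lineages (E1, E2), archaebacteria (A) and eukaryotes (K), with the true root at node R within eubacteria. With clock-like rates, midpoint rooting recovers R; lengthening only the neomuran stem (edge X–S), as episodic hyperacceleration would, moves the inferred root into that stem.

```python
import math
from itertools import combinations


def build_tree(stem_length):
    """Hypothetical unrooted tree. 'R' marks the true root (within eubacteria);
    'S' joins archaebacteria (A) and eukaryotes (K); edge X-S is the neomuran stem."""
    edges = [
        ("E1", "R", 2.0),         # eubacterial lineage 1
        ("R", "X", 1.0),
        ("X", "E2", 1.0),         # eubacterial lineage 2
        ("X", "S", stem_length),  # neomuran stem (the only length we vary)
        ("S", "A", 0.5),          # archaebacteria
        ("S", "K", 0.5),          # eukaryotes
    ]
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def tree_path(adj, start, goal):
    """Unique node path between two nodes of a tree (depth-first search)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")


def path_length(adj, path):
    return sum(adj[a][b] for a, b in zip(path, path[1:]))


def midpoint_root(adj):
    """Place the root halfway along the longest leaf-to-leaf path."""
    leaves = [n for n in adj if len(adj[n]) == 1]
    u, v = max(combinations(leaves, 2),
               key=lambda pair: path_length(adj, tree_path(adj, *pair)))
    path = tree_path(adj, u, v)
    half = path_length(adj, path) / 2
    walked = 0.0
    for a, b in zip(path, path[1:]):
        w = adj[a][b]
        if math.isclose(walked + w, half):
            return ("node", b)                 # midpoint falls exactly on a node
        if walked + w > half:
            return ("edge", a, b, half - walked)  # distance measured from a
        walked += w


# Clock-like stem: every tip is 2.0 from R, so midpoint rooting recovers R.
print(midpoint_root(build_tree(0.5)))   # ('node', 'R')
# Hyperaccelerated stem: the inferred root moves into the neomuran stem.
print(midpoint_root(build_tree(4.0)))   # ('edge', 'X', 'S', 0.75)
```

Only the stem length differs between the two calls, so the shift of the inferred root is caused entirely by the rate burst, not by any change in branching order.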

Recent evidence from sterane and other fossils implies that neither archaebacteria nor eukaryotes became abundant before ~0.85 Gy ago (Schinteie and Brocks 2017). A lateral gene transfer from chloroplasts to archaebacteria (Petitjean et al. 2012), as explained later in this paper, decisively shows that archaebacteria are at least three times younger than eubacteria, so the root must lie within eubacteria (Cavalier-Smith 2002a). Even 30 years ago, it was clear to those familiar with the microbial fossil record that eukaryotes are several times younger than eubacteria and that sequence trees could be reconciled with the fossil evidence only if archaebacteria also are substantially younger than eubacteria (Cavalier-Smith 1987c). At that time, Woese (1987) considered the possibility that the root of the universal tree may lie within eubacteria, but found that idea ‘intuitively unappealing’ while providing no evidence against it; though he asserted that eubacterial and archaebacterial rDNA evolved at different rates, he misleadingly called rDNA a chronometer, and never discussed fossil evidence for actual dates, from which alone differential rates can be objectively inferred. Chronometer (an exceptionally accurate clock) was an extremely misleading term for a molecule that actually evolved at vastly different rates in different lineages (Cavalier-Smith 2002a; Cavalier-Smith et al. 1996, 2018) and whose rate is often more erratic than that of many proteins. Woese (1987 p. 262) wrote ‘Since archaebacterial 16S rRNA is closer in sequence to both its eubacterial and eucaryotic counterparts than these two are to one another, the archaebacterial version of the molecule must be closer to the common ancestral version than is one or both of the other versions’. That was illogical, as the seemingly intermediate nature of archaebacteria is compatible with all three Fig. 1 root positions (sequence distances between living taxa do not depend on where the root is placed on the tree) and is most simply explained by (a). His drawing of archaebacteria at the base of his tree (his Fig. 4), his earlier progenote ideas (now disproved), his unwarranted belief in the great antiquity of methanogens (Woese and Fox 1977a, b), and his exaggeration of the distinctiveness of archaebacteria apparently prevented him from considering contrary evidence and arguments. Many others have been similarly uncritical and still believe that eubacteria are a clade, despite compelling evidence that they are the sole ancestral ‘domain’ of life, as explained in detail previously (Cavalier-Smith 2002a, 2006a, c).
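Why an apparently intermediate position says nothing about the root can be checked with a toy calculation. The minimal sketch below assumes a hypothetical unrooted star tree with one representative per domain and invented branch lengths. It computes all pairwise path distances, then inserts a root node on each branch in turn. In this collapsed picture, a root on the eubacterial branch covers both Fig. 1a and 1b, a root on the eukaryote branch corresponds to Fig. 1c, and a root on the archaebacterial branch corresponds to Woese's drawing. The distances, and therefore the archaebacteria's apparent intermediacy, are identical for every rooting.

```python
import math
from itertools import combinations

# Hypothetical branch lengths (substitutions/site) for an unrooted star tree
# with one representative per domain; the numbers are illustrative only.
BRANCHES = {"Eubacteria": 3.0, "Archaebacteria": 1.0, "Eukaryotes": 2.5}


def leaf_distances(branches, root_on=None, fraction=0.5):
    """Pairwise leaf distances, optionally after inserting a root node that
    splits the branch leading to `root_on` at the given fraction."""
    adj = {}

    def add_edge(u, v, w):
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    for leaf, w in branches.items():
        if leaf == root_on:
            add_edge(leaf, "root", w * fraction)
            add_edge("root", "centre", w * (1 - fraction))
        else:
            add_edge(leaf, "centre", w)

    def dist(a, b):
        # Depth-first search; in a tree the path found is the only one.
        stack = [(a, None, 0.0)]
        while stack:
            node, parent, d = stack.pop()
            if node == b:
                return d
            for nxt, w in adj[node].items():
                if nxt != parent:
                    stack.append((nxt, node, d + w))
        raise ValueError("leaves not connected")

    return {frozenset(pair): dist(*pair) for pair in combinations(branches, 2)}


def archaebacteria_look_intermediate(d):
    """True if archaebacteria are closer to both other domains than those
    two are to one another (the observation Woese relied on)."""
    ea = d[frozenset({"Eubacteria", "Archaebacteria"})]
    ak = d[frozenset({"Archaebacteria", "Eukaryotes"})]
    ek = d[frozenset({"Eubacteria", "Eukaryotes"})]
    return ea < ek and ak < ek


unrooted = leaf_distances(BRANCHES)
for root_on in BRANCHES:
    rooted = leaf_distances(BRANCHES, root_on=root_on)
    same = all(math.isclose(rooted[k], unrooted[k]) for k in unrooted)
    print(f"root on {root_on} branch: distances unchanged = {same}, "
          f"archaebacteria intermediate = {archaebacteria_look_intermediate(rooted)}")
```

Each printed line reports True for both checks: placing a root only subdivides an existing branch, so no leaf-to-leaf distance can change.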

This paper focuses instead on (1) internal phylogeny of eubacteria and archaebacteria, (2) problems in inferring from RP trees which eubacteria were ancestral to neomura, (3) where the archaebacterial and eukaryotic roots lie, and (4) whether eukaryotes are sisters of all archaebacteria or branch within them. Though a firmibacterial ancestry for neomura (Valas and Bourne 2011) was seemingly strengthened by discovery that some Bacilli have both eu- and archaebacterial type lipids (Guldan et al. 2011), our new site-heterogeneous RP trees (more taxon-rich than hitherto) strongly contradict a posibacterial origin (Cavalier-Smith 1987c), being more compatible with the increasingly discussed idea that neomura arose from Planctobacteria (Reynaud and Devos 2011). Furthermore, Sphingobacteria (=FCB group), which we show here are sisters of Planctobacteria, have all the basic archaebacterial lipid-making enzymes, which actually make such lipids when introduced into Escherichia coli, and Planctobacteria have some of them (Villanueva et al. 2018; Coleman et al. 2019). We therefore critically reassess steadily growing evidence for a planctobacterial origin of neomura, explain why that idea is greatly superior to all its competitors, and correct many previous misinterpretations of the universal tree and cell evolution.

Though mistaken about the tree’s root and archaebacterial antiquity, Woese was probably the first in the sequencing era to suggest that the last eubacterial common ancestor was photosynthetic (Fox et al. 1980). Our improved eubacterial phylogeny, together with other evidence, enables us to confirm this and to provide a stronger basis than hitherto for LUCA having been a photosynthetic eubacterium similar to Chloroflexi (Cavalier-Smith 2006a, d); we demonstrate that vertical inheritance coupled with numerous losses best accounts for the scattered distribution of photosynthesis across the eubacterial tree (Cavalier-Smith 2002a, 2006a, c) and that lateral gene transfer (LGT) was less important than some suggest (e.g. Shih et al. 2017; Ward et al. 2018). We conclude that the murein peptidoglycan wall, eubacterial flagella, and the negibacterial outer membrane (OM) with porins were also demonstrably present in LUCA and multiply lost, but that OM lipopolysaccharide probably originated only after Chloroflexi diverged from other phyla. We also demonstrate a high frequency of losses of respiration and methylotrophy and that (contrary to widespread assumptions) archaebacteria ancestrally inherited aerobic respiration and prenyl diether lipid synthesis from eubacteria. A general conclusion of our synthesis is that multiple losses, evolutionarily easy by independent gene deletions, and secondary simplification have been much more important in prokaryote evolution than commonly assumed, whereas LGT is too often invoked with insufficient phylogenetic evidence or explicitness when vertical inheritance plus losses is a better explanation.

Introduction 2: negibacterial root of eubacteria

Most eubacterial phyla have a complex envelope with an OM traversed by hollow cylindrical porin channels (and other β-barrel proteins) connected to the cytoplasmic membrane (CM) via bridges through the murein wall. Such bacteria are called negibacteria as most have thin walls and so stain Gram-negatively (Cavalier-Smith 1987b, c, 2006a, c), though a few (e.g. Deinococcus) with thicker murein stain Gram-positively. Two groups with thick murein walls stain Gram-positively (Actinobacteria, the high GC Gram-positives; and Clostridiia/Bacilli, the low GC Gram-positives) and were once formally grouped together as division (=phylum) Firmacutes (Gibbons and Murray 1978) (later Firmicutes: Murray 1984) to contrast them with division Mollicutes (mycoplasmas, with neither walls nor OM). The closer grouping of mycoplasmas with Clostridiia/Bacilli than with Actinobacteria or negibacteria made it clear that the absence of the negibacterial OM in mycoplasmas was evolutionarily more fundamental than the absence of murein, and made it likely that mycoplasmas arose degeneratively from Clostridiia/Bacilli by wall loss, analogously to the well-known wall-less L-forms. Therefore, all three were grouped as subkingdom Posibacteria (Cavalier-Smith 1987b, c), which was a clade on the first rDNA trees (Fox et al. 1980), and Endobacteria was introduced as a subphylum name for Clostridiia/Bacilli plus mycoplasmas on the assumption that their last common ancestor had thick walls and endospores (Cavalier-Smith 1998b). The distinction between negibacteria and posibacteria appeared to be the most evolutionarily important ultrastructural dichotomy within eubacteria, and it highlighted a fundamental question about cell membrane evolution. Did posibacteria arise from negibacteria by OM loss (Blobel 1980; Cavalier-Smith 1987c)? Or were posibacteria with just one membrane older, with negibacteria evolving from them by OM addition, as many have assumed, e.g. Gupta (1998b) when proposing the terms monoderm and diderm for cells with one or two bounding membranes?

Gupta’s argument that eubacteria were ancestrally monoderm stemmed from two incorrect beliefs: (a) that the universal tree is rooted between monoderm archaebacteria and the eubacterium Thermotoga and (b) that Thermotoga is also monoderm. Cavalier-Smith for a while accepted Thermotoga as monoderm, so wrongly put it in Posibacteria (Cavalier-Smith 1998b, 2002a), but later excluded it after realising that its ‘toga’ is an unusual negibacterial OM with OmpA porin homologues that secondarily lost lipopolysaccharide (LPS) (Cavalier-Smith 2006c), which recent analyses support (Antunes et al. 2016; Eveleigh et al. 2013). A key to understanding posibacterial evolution was the discovery of endospore-forming bacteria that stain Gram-negatively, but confusion over whether they had an OM (as does Selenomonas) or not (e.g. Heliobacterium, with an S-layer, not an OM) persisted for some years, hampering classification and making the significance of their frequent grouping with Bacilli/Clostridiia on trees ambiguous. It is now clear that two distinct clades of Gram-negative endospore-forming bacteria have genuine negibacterial OMs (Halanaerobiales and ‘Negativicutes’) but are phylogenetically interspersed with several Gram-negative endospore-forming lineages that lack an OM and so are classically posibacterial or monoderm (e.g. Heliobacteriales); sequence trees group both negibacterial clades more closely with the original posibacterial Endobacteria than with Actinobacteria (Campbell et al. 2015; Marchandin et al. 2010). ‘Negativicutes’ (a now invalid name corresponding to the Selenobacteria, originally excluded from Posibacteria because of their OM: Cavalier-Smith 1992b) and Halanaerobiales both have LPS, whose synthesis is vertically inherited in eubacteria; thus, the OM was lost more than once by negibacterial endospore formers to generate posibacterial monoderm phenotypes (Antunes et al. 2016; Poppleton et al. 2017).
Our new RP trees confirm this polyphyly of low-GC Gram-positives and also strongly show that Actinobacteria lost the OM independently of Endobacteria. We conclude that ancestral eubacteria were negibacteria with two membranes, and that monoderm posibacteria evolved from them by several OM losses, not by one loss as first suggested (Cavalier-Smith 1987b, c). The possibility that Actinobacteria represent the ancestral eubacterial state is excluded, as indel analysis puts the root outside them (Servin et al. 2008).
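The inference of several OM losses rests on a no-regain (Dollo-type) assumption, which the text justifies by the vertical inheritance of LPS synthesis. The minimal sketch below makes that logic explicit: if the OM was present at the root and can be lost but never regained, the minimum number of losses equals the number of maximal subtrees whose members all lack it. Both topologies are hypothetical and simplified (one leaf per lineage, invented branching order) and are not our RP trees. They only show that the same OM distribution implies a single loss if posibacteria form a clade but three losses if OM-bearing and OM-lacking lineages are interspersed.

```python
# A node is either a leaf name (str) or a tuple of child nodes.
def leaves(node):
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def dollo_losses(node, has_trait):
    """Minimum losses if the trait was present at `node` and can be lost but
    never regained: each maximal subtree whose leaves all lack the trait
    needs exactly one loss on its stem."""
    if not any(has_trait[leaf] for leaf in leaves(node)):
        return 1
    if isinstance(node, str):
        return 0
    return sum(dollo_losses(child, has_trait) for child in node)


# OM presence per (simplified) lineage, as described in the text.
HAS_OM = {
    "Chloroflexi": True, "Glycobacteria": True, "Actinobacteria": False,
    "Halanaerobiales": True, "Heliobacteriales": False,
    "Negativicutes": True, "Bacilli": False,
}

# Hypothetical topologies rooted beside Chloroflexi (invented for illustration).
POSIBACTERIA_CLADE = ("Chloroflexi",
                      ("Glycobacteria",
                       (("Halanaerobiales", "Negativicutes"),
                        ("Actinobacteria", ("Heliobacteriales", "Bacilli")))))
INTERSPERSED = ("Chloroflexi",
                ("Glycobacteria",
                 ("Actinobacteria",
                  ("Halanaerobiales",
                   ("Heliobacteriales", ("Negativicutes", "Bacilli"))))))

print(dollo_losses(POSIBACTERIA_CLADE, HAS_OM))  # 1
print(dollo_losses(INTERSPERSED, HAS_OM))        # 3
```

Without the no-regain assumption, an unordered parsimony count could equally explain the interspersed pattern by an OM regain, which is why the vertical inheritance of LPS matters to the argument.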

As posibacteria are not a clade, we abandon phylum Posibacteria and henceforth treat Actinobacteria (ancestrally monoderm, mycobacteria secondarily diderm) and Endobacteria (ancestrally diderm, but mostly secondarily and polyphyletically monoderm) as separate phyla, but retain subkingdom Posibacteria to embrace both. ‘Endobacteria’ here refers to the clade comprising all descendants of the endospore-forming last common ancestor of Halanaerobiales, Heliobacteriales, ‘Negativicutes’, Clostridiia/Bacilli and mycoplasmas, irrespective of whether or not they retain the ancestral OM, murein, and endospores. Our new RP trees strongly confirm the monophyly of Endobacteria thus redefined and also show for the first time that mycoplasmas are polyphyletic, having arisen from Bacilli by two separate murein losses. Currently, the nomenclature and classification of clade Endobacteria are confused. Bergey’s Manual and most recent papers (e.g. Ruggiero et al. 2015) do not accept it as a clade but incorrectly treat it as two phyla: Tenericutes, with the single class Mollicutes, which is polyphyletic, and ‘Firmicutes’, which our trees robustly show to be paraphyletic. Though some papers use this phylogenetically unsound classification (e.g. Segata et al. 2013), others contradictorily extended Firmicutes to include Mollicutes/Tenericutes when labelling clades on eubacterial trees (Battistuzzi et al. 2004; Ciccarelli et al. 2006; Hug et al. 2016). Though the latter makes sense cladistically, having two contradictory meanings of Firmicutes in use is confusing, especially as neither corresponds to its original sense or is descriptively meaningful. As Endobacteria refers to the endospore innovation that ancestrally distinguished the clade from all other eubacteria, the name is distinctive and semantically appropriate. Adopting Firmicutes (which originally referred to thick skin, i.e. thick murein walls without an OM) for this group, which includes thin-murein negibacterial basal members and derived murein-free members but excludes the descriptively and originally firmicute Actinobacteria, would be descriptively meaningless and conceptually confusing; so, as before, we avoid the ambiguous term Firmicutes and recommend that others likewise abandon it.

Transition analysis excluded the root of the universal tree from neomura and Posibacteria, concluding that its most likely position is between Chloroflexi and all other organisms (Cavalier-Smith 2006c); that paper regarded Chloroflexi as negibacteria, i.e. as having an acyl ester phospholipid bilayer OM evolutionarily distinct from the secondarily derived mycobacterial OM. Unlike almost all other negibacteria, Chloroflexi lack LPS, so Sutcliffe (2011) argued that their outer layer is an S-layer, not a membrane. Nobody doubts that LPS is absent in Chloroflexi, but, contrary to Sutcliffe’s assumption, that is not evidence for the absence of a phospholipid OM; Keppen et al. (2018) prematurely assumed that the absence of LPS makes the chloroflexan Oscillochloris monoderm, when ultrastructurally it appears plausibly diderm, with a visible OM. New micrographs of Pelolinea submarina convincingly show its outermost layer to be an OM (Imachi et al. 2014 Fig. S1C) with the same trilaminar structure as the CM. Moreover, the outer layers of Flexilinea (Sun et al. 2016) and Thermoflexus (Dodsworth et al. 2014) more closely resemble OMs than S-layers; Nitrolancea, with a thicker envelope, appears to have an OM just outside a thin peptidoglycan layer, plus a thicker external capsule that could be related to an S-layer. For the photosynthetic Chlorobaculum tepidum, cryoelectron tomography, which avoids chemical fixation, sectioning, and staining that might distort structure, shows an OM indistinguishable in appearance from the CM (Kudryashev et al. 2014). These better resolved micrographs show that reassigning Chloroflexi to posibacteria (Cavalier-Smith 2014) on the basis of Sutcliffe’s misinterpretation was incorrect. Numerous Chloroflexi porin homologues are annotated in GenBank (including Chlorobaculum), making it likely that most Chloroflexi have a porin-traversed OM of simpler chemistry than most negibacteria. Though Chloroflexi lack the four core LPS biosynthetic genes (Antunes et al. 2016), many other genes annotated as involved in LPS synthesis are present in GenBank and might be involved in making historical precursors of some LPS components, which must have existed before full-scale LPS synthesis could have evolved in all its complexity. Therefore, the case for Chloroflexi being the earliest diverging negibacteria, prior to the origin of LPS, remains as strong as ever. Figure 2 indicates likely relationships amongst the major kinds of cell; our study aims to test these and to provide a more robust, taxon-richer phylogeny for prokaryotes, especially the extremely diverse and likely ancestral eubacteria. In contrast to the apparent loss of LPS in the thermophilic negibacteria Thermotogales and Caldisericia, and its loss in some spirochaetes, some Hadobacteria, and a few parasitic proteobacteria (none of which has lost the OM, proving several times independently that OMs without LPS exist) (Sutcliffe 2010), LPS absence in Chloroflexi is likely the ancestral state for eubacteria (Cavalier-Smith 2006c).

Fig. 2

The major kinds of cell and likely evolutionary relationships. Cell envelope and chromosome chemistry divide life into ancestral eubacteria, with murein peptidoglycan walls and DNA negatively supercoiled by DNA gyrase without histones, and derived neomura (probably over three times younger), with N-glycoproteins cotranslationally secreted by more complex SRPs and DNA passively negatively supercoiled by histones (some archaebacteria may retain eubacterial DNA gyrase and reverse gyrase, and some have lost histones). Eubacteria exhibit three grades of organisation: Chloroflexi (=Chlorobacteria), unusual negibacteria with an outer membrane (OM) of phospholipids but no lipopolysaccharide (LPS); glycobacteria, the majority of negibacteria (11 phyla), whose OM has an outer leaflet of LPS; and monoderm posibacteria, whose ancestors lost the OM and which make up the majority of phyla Actinobacteria and Endobacteria. We argue that neomura arose after simultaneous loss of murein and OM by a planctobacterial glycobacterium with primitive microtubules; numerous recent discoveries make the older idea, based on OM loss parsimony (Cavalier-Smith 1987c), that they arose from a posibacterium by losing murein only (dashed line) no longer tenable. Eukaryotes kept eubacterial acyl ester lipids, but archaebacteria became hyperthermophiles by largely replacing them by stabler prenyl ether lipids (whose biosynthetic enzymes and diether variants probably arose much earlier in glycobacteria).
Archaebacteria retained prokaryote cell structure and DNA segregation machinery but not microtubules, whereas eukaryotes evolved phagotrophy, which caused the evolution of an endomembrane system with coated vesicle budding and targeted vesicle fusion, leading to the origin of the nucleus, microtubule-based mitosis and consequential radical genetic changes, and enabled intracellular symbiogenesis by enslaving glycobacteria: a chromatophore-bearing α-proteobacterium as mitochondria to make kingdom Protozoa; and later a thylakoid-bearing cyanobacterium as chloroplasts to make kingdom Plantae. Kingdoms Eubacteria and Archaebacteria have non-homologous rotary extracellular flagella, but eukaryotes all descend from an ancestral biciliate protozoan with two immensely more complex microtubule-based intracellular bending cilia that undergo structural transformation once every cell cycle, the younger one losing its juvenile morphology in the second cell cycle. We do not portray the most complex membrane topology of all, found in kingdom Chromista, where chloroplasts, a red algal plasma membrane, and sometimes a relict nucleus are present inside the host ER lumen, having arisen soon after chloroplasts when a biciliate phagotroph enslaved an engulfed red algal symbiont (see Cavalier-Smith 2018). Eukaryogenesis is postulated to have involved three logically distinct stages (asterisks); mitochondria must have preceded spliceosomes and followed the prekaryote phase but might have become symbionts simultaneously with the origins of the nucleus and cilia.

All three negibacterial groups without LPS have OM porins, so homologous OM porins, not LPS, are the distinguishing feature of negibacteria, which had a single origin but gave rise to monoderm posibacteria (by several OM losses) and, as we show here, independently to neomura. Thus, whereas negibacteria are monophyletic, diderm prokaryotes are polyphyletic, having arisen three times: negibacteria soon after the origin of life, the mycolic acid-containing mycobacterial OM long after the origin of Actinobacteria, and the outer cell membrane (OCM) of diether lipids of the wall-less crenarchaeote Ignicoccus (Jahn et al. 2004; Rachel et al. 2002), which, unlike negibacterial OMs, is energised and evolved long after archaebacteria. We discuss the independent origins of these non-homologous diderm membranes.

Relationships amongst the above eubacterial groups and the internal phylogeny of Negibacteria were not unambiguously resolved by rRNA trees, as these lacked basal resolution within the dense eubacterial bush of numerous near-simultaneously diverging phyla (Woese 1987). rDNA trees were very useful for revealing the major gulf between eubacteria and archaebacteria, leading to the concept of three separate ‘domains’ for them and eukaryotes, and also for establishing preliminary phylogenetic clusters that often came to be called ‘phyla’. At present, there are roughly 30 deep-branching rDNA-defined eubacterial clusters, amongst which most relationships were unclear before our study. Though 29 were provisionally accepted as ‘phyla’ in a recent comprehensive classification of life (Ruggiero et al. 2015), it was noted that this number is highly inflated compared with eukaryote phyla: unlike eukaryote phyla, clusters defined by the widely used rule-of-thumb rDNA criterion for phylum rank often do not differ greatly in body plan, and their number in essence just reflects the weak resolution of rDNA trees for the deepest branching patterns (as noted earlier in relation to deep-branching clades known only from environmental DNA sequencing: Cavalier-Smith 2002a).

Multiprotein ribosomal protein (RP) trees now offer markedly higher resolution for prokaryote phylogeny than rDNA (Lasek-Nesselquist and Gogarten 2013; Raymann et al. 2015), giving a chance to resolve some of these issues, especially if evolutionarily more realistic and accurate site-heterogeneous algorithms are used instead of the site-homogeneous algorithms largely used for rRNA trees, as the latter are more prone to long-branch attraction (LBA) artefacts (Lartillot et al. 2007). Previous site-heterogeneous analyses sampled eubacteria too sparsely to answer these questions: the broadest (Raymann et al. 2015), with 67 eubacteria, included only 13 of the 29 ‘phyla’; Lasek-Nesselquist and Gogarten (2013), with 42 eubacteria, included only 18. The 151 eubacteria included here represent all 29, plus other more recently recognised lineages, which we now conclude are better reduced to 14 robust phyla by merging clearly related similar groups, a twofold simplification of eubacterial diversity.

Introduction 3: outstanding key problems in archaebacterial cell evolution

Archaebacteria have many fewer deep divergences than eubacteria or eukaryotes and are divisible into just two (probably sister) clades: Euryarchaeota (most methanogens and halophiles) and Filarchaeota, best ranked as phyla (Ruggiero et al. 2015) or subphyla (Cavalier-Smith 2014). Filarchaeota originally comprised classes ‘Crenarchaeota’, ‘Thaumarchaea’ (including ‘Cenarchaeum’ and ‘Caldiarchaeum’), and ‘Korarchaea’ (Cavalier-Smith 2014) and should now also include the more recently discovered Asgard archaebacteria (Zaremba-Niedzwiedzka et al. 2017) as a fourth class (here informally called Asgardia), as they all share the group-defining eukaryote-like ESCRT III proteins and actin, which are absent in most Euryarchaeota. RP trees suggested that the archaebacterial root may lie within euryarchaeotes (Lasek-Nesselquist and Gogarten 2013; Petitjean et al. 2015; Raymann et al. 2015), but trees for a set of 38 longer, more conserved proteins place the root instead between Euryarchaeota and Filarchaeota (Petitjean et al. 2015), the most likely position on cell evolutionary grounds (Cavalier-Smith 2014). However, these trees did not include any Asgardia and also excluded a group of simplified, genomically reduced, ultrasmall archaebacterial lineages with extra-long branches that on trees sometimes form a clade distinct from both euryarchaeotes and filarchaeotes, called DPANN (i.e. ‘Diapherotrites’, acidophilic ‘Parvarchaeum’ and ‘Micrarchaeum’, ‘Aenigmarchaeota’, ‘Nanoarchaeota’, and halophilic ‘Nanohaloarchaea’: Rinke et al. 2013).

‘Nanohaloarchaea’ are robustly sisters of Halobacteriales on the rDNA tree in the absence of other long-branch DPANNs (Narasingarao et al. 2012), so were put in phylum Euryarchaeota by Ruggiero et al. (2015). A concatenation of 38 conserved genes with 32 RPs strongly confirmed this and showed that ‘Nanoarchaeum’ and ‘Parvarchaeota’ did not group with ‘Nanohaloarchaea’, but that both were separately within euryarchaeotes (Petitjean et al. 2015). That strongly indicates that a DPANN grouping is partly an LBA artefact and that ‘Nanoarchaeum’ is not the earliest branching archaebacterium, as sometimes claimed. When ‘Nanohaloarchaea’ and the other longest branching DPANNs were removed, the remaining DPANNs grouped strongly as one clade within euryarchaeotes, as the second deepest branch (distinct from Halobacteriales), in a 45-protein analysis including some RPs (Williams et al. 2017). Though their trees strongly argued against DPANN being a clade distinct from Euryarchaeota, Williams et al. (2017) presented evidence from a questionable analysis of gene losses and of gains by LGT (which could have been confounded by convergent massive gene loss in DPANN lineages) that the archaebacterial root lies between DPANN and all other archaebacteria, contradicting their earlier outgroup-independent rooting between Filarchaeota and Euryarchaeota/DPANN (Williams et al. 2015). To clarify these controversies, we included representatives of all major DPANN lineages in our 60-taxon archaebacterial RP analyses (selectively favouring those with the shortest branches to reduce LBA), as well as lokiarchaeotes to represent Asgardia. None of our trees placed the root within non-DPANN euryarchaeotes, as did Raymann et al. (2015), or within Filarchaeota, but both the position of DPANNs (which appeared as one or, more often, two clades) and that of the root were sensitive to taxon sampling and method, the root often appearing within or beside DPANNs; we think this is a long-branch artefact and favour a root between Euryarchaeota/DPANN and filarchaeotes, as in the rDNA trees of Williams et al. (2015) and the 70-protein trees of Petitjean et al. (2015).

Introduction 4: long inter-domain stems magnify problems of rooting RP subtrees

It is well known that establishing the root position of a tree by outgroup rooting can be much more difficult than determining the group’s internal branch topology. Rooting is especially difficult when outgroup branches are very long and differ greatly in sequence from ingroups. For universal rRNA trees, the stem at the base of crown eukaryotes is much longer than the entire crown depth. Likewise, the stem at the base of neomura is much longer than the depth of either the archaebacterial or the eukaryote crown radiation. These two hugely stretched stems arise from temporary, episodic hyperacceleration of nucleotide substitution rates just before archaebacteria and eukaryotes diversified (Cavalier-Smith 2002a). Their immense length made it very easy to divide organisms cleanly into three domains. It also made determining the roots of eukaryotes, archaebacteria, and neomura extremely difficult, both because the original information relating to the root position has been multiply overlain by repeated substitutions and because of long-branch artefacts. Therefore, it proved impossible to determine any of these three root positions reliably using site-homogeneous 16S/18S rDNA trees (Cavalier-Smith 2002a). Combined large and small subunit rDNA sequences and improved site-heterogeneous methods have not solved the problem. The apparent positions of the eukaryote and archaebacterial roots on three-domain trees remain so contradictory amongst methods and taxon samples (e.g. Foster et al. 2009; Williams et al. 2012) that none to date is credible. All are contradicted for both the eukaryotic and archaebacterial roots by RP trees (Lasek-Nesselquist and Gogarten 2013; Petitjean et al. 2015; Raymann et al. 2015). RP trees also have extremely stretched eukaryote and neomuran stems (Lasek-Nesselquist and Gogarten 2013). Petitjean et al. (2015) rightly attribute these to temporarily hugely accelerated amino acid substitution. They noted that neomuran stem acceleration was greater for RPs than for the 38 more conserved proteins, proving that RPs cannot be a uniform ‘molecular chronometer’. These long stems show that all components of the ribosome underwent coevolutionary ultrarapid evolution during the origin of the cell nucleus and of the novel neomuran RPs. The reasons were probably those previously partially explained (Cavalier-Smith 2002a). They include coevolution with the novel features of the ribosome-associated neomuran signal-recognition particle (SRP). The SRP underwent more radical changes during the origin of neomura (the neomuran revolution: Cavalier-Smith 2014) than at any other time since the first cells evolved.

This episodic ribosomal evolution during the neomuran revolution and eukaryogenesis would grossly exaggerate the duration of eukaryote and neomuran stem evolution relative to crown evolution if one were erroneously to apply a single molecular clock to any universal ribosomal tree. Foraminifera provide another example of a highly inflated stem at the base of a clade on multigene trees. Their fossil record is so extensive that one can prove that the stem is in fact grossly inflated compared with the crown, as Cavalier-Smith et al. (2018) explained in detail. The fossil record is much poorer for archaebacteria and stem eukaryotes. The inflation of their ribosomal tree stems therefore had to be inferred by more indirect correlation between trees and fossil evidence, and so is not yet appreciated by all. However, though rDNA and RPs clearly coevolved, their relative tempo was not the same during these two evolutionary episodes. For rDNA, the eukaryote stem is much longer than the neomuran stem, whereas for RPs the reverse is true. Thus, RPs were relatively more affected than rRNA during the neomuran revolution, presumably because that involved the greatest change in RP composition in the history of life.

The neomuran stem on the (incorrectly rooted) RP tree of Lasek-Nesselquist and Gogarten (2013) represents an average of 5.4 amino acid substitutions per site. Most RP sites must have been overwritten many times since archaebacteria diverged from eubacteria, so it is not credible that enough sites could have persisted unchanged in neomura since that epoch to allow consistent determination by RP trees where within the roughly 30 deep branching eubacterial clades neomura actually arose. That probably explains why the apparent eubacterial origin point for neomura is completely different in all three previous site-heterogeneous RP analyses (Lasek-Nesselquist and Gogarten 2013; Petitjean et al. 2015; Raymann et al. 2015) and also different from earlier rDNA analyses. Here, we run separate one-domain, two-domain, and three-domain RP trees in order to disentangle the logically distinct problems of the internal phylogeny of each domain (for which we show RP trees provide highly credible solutions) from those of rooting each domain and placing it accurately relative to ancestral domains, for which the highly stretched internal stems make RPs very bad phylogenetic markers. We conclude that widespread underappreciation of this problem has led to an exaggerated trust in the overall conclusions possible from three-domain universal ribosomal molecular trees, which we show suffer from more distortion than do two-domain trees.
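A rough calculation shows why few sites can be expected to keep their ancestral state across such a stem. The Python sketch below is illustrative only. It assumes a single stem of 5.4 expected substitutions per site and asks what fraction of sites would have undergone no substitution at all. It does this first under equal rates (a simple Poisson process) and then under gamma-distributed rates. The shape values α are hypothetical choices for illustration and are not taken from any study. For the gamma case the sketch also reports the mean relative rate of the sites that escaped change, which is α/(α + d).

```python
import math


def p_unchanged_poisson(d: float) -> float:
    """P(no substitution) on a branch of d expected substitutions/site, equal rates."""
    return math.exp(-d)


def p_unchanged_gamma(d: float, alpha: float) -> float:
    """P(no substitution) when site rates ~ Gamma(shape=alpha, mean=1).

    This is E[exp(-r * d)], the gamma moment-generating function evaluated at -d.
    """
    return (1.0 + d / alpha) ** (-alpha)


def mean_rate_given_unchanged(d: float, alpha: float) -> float:
    """Mean relative rate of sites that stayed unchanged (posterior gamma mean)."""
    return alpha / (alpha + d)


STEM = 5.4  # mean substitutions/site on the neomuran RP stem (Lasek-Nesselquist and Gogarten 2013)

print(f"equal rates: {p_unchanged_poisson(STEM):.4f} of sites unchanged")
for alpha in (0.5, 1.0, 2.0):  # hypothetical shape values, for illustration only
    print(f"alpha={alpha}: {p_unchanged_gamma(STEM, alpha):.3f} unchanged, "
          f"mean relative rate {mean_rate_given_unchanged(STEM, alpha):.3f}")
```

Under equal rates only about 0.45% of sites would be expected to escape change. Gamma rate variation raises that fraction, to roughly 29% for α = 0.5. Those unchanged sites are, however, concentrated among the slowest-evolving ones, with a mean relative rate of about 0.085 in that case. The outcome therefore depends strongly on the rate model assumed.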

Introduction 5: Need for more accurate, critically interpreted taxon-rich RP trees

A taxonomically rich maximum likelihood (ML) three-domain tree for 16 RPs from 3,083 taxa using 2,596 amino acids was heralded as ‘a new view of the tree of life’ (Hug et al. 2016). It illustrates the serious pitfalls of massive automated site-homogeneous trees. This is clear if we examine its branching order within eukaryotes, whose phylogeny is much better established than that of prokaryotes by multiple lines of evidence. Though many younger clades are reasonable, problems are greatest amongst the deepest branches. Nine examples:

(1) The apusomonad protozoan Thecomonas trahens appears with 100% support as sister to the apicomplexan Toxoplasma gondii within the alveolate Chromista, a completely different kingdom; they are actually as distantly related as humans and grass. Three lower strongly supported nodes are also all false.
(2) The apusomonad Manchomonas bermudensis groups with another apicomplexan, Theileria annulata, with 98% support. Together they form a false clade that appears wrongly as sister to glaucophytes (kingdom Plantae). That clade is in turn ‘sister’ to another multiply false clade comprising Rhodophyta (Plantae), into which three unrelated lineages from kingdom Protozoa are intruded.
(3) Alveolates are not a clade, partly for these reasons and partly because ciliates are completely misplaced within a cluster of Amoebozoa, which belong to a different kingdom.
(4) Opisthokonts are easily found to be robustly monophyletic on all good multigene trees and on many single-gene trees. Here they are not a clade, because Nuclearia groups in a false deep clade with a metamonad and an amoebozoan; none of these three groups with its true relatives. We ignore the fact that one ‘arthropod’ groups within flowering plants, which must be a mix-up!
(5) Rhizaria do not group with alveolates plus heterokonts, as they do on every good multiprotein tree.
(6) Haptophytes, which form a robust clade on any good single-gene or multiprotein tree, appear polyphyletic.
(7) Amoebozoa wrongly appear polyphyletic, as do other well-established clades.
(8) The parasite Giardia is shown as the deepest branching eukaryote. It is nowhere near its real metamonad relative Trimastix and is two nodes away from its true sister Trichomonas; both should be much higher in the tree.
(9) The second deepest branch is the cryptomonad nucleomorph, an enslaved red algal nucleus that should have grouped with rhodophytes.

In fact, the branching order of all nine deepest branching ‘clades’ within the eukaryote domain is meaningless and false, and many are false clades. These profound errors probably mainly reflect LBA. Long ago, LBA likewise wrongly put Giardia, Trichomonas, and other long branches like Microsporidia at the base of eukaryotes on site-homogeneous three-domain trees, grossly misleading our understanding of early eukaryote evolution (Cavalier-Smith 2002a). This 16-protein tree is even more profoundly misleading than rDNA was. It beautifully exemplifies two criticisms by Gouy et al. (2015). First, studies of relationships amongst the three domains and of the overall root of the tree typically accept much lower phylogenetic technical standards than are de rigueur for eukaryotes. Second, several questions widely assumed to be settled are not.

If the 16-RP basal branching order is completely wrong for eukaryotes in nine serious ways, it may also be completely wrong for archaebacteria and eubacteria. Such errors would be harder to recognise there. Most biologists know no way of cross-checking prokaryote phylogeny other than sequence trees and tend to accept their results uncritically. However, we must not reject RP trees altogether just because some have given ridiculous results. Our present study of 26- and 51-protein RP trees uses a carefully curated data set from 354 taxa. It shows that one can obtain three-domain RP trees without any of the problems just enumerated in the eukaryote subtree. For eukaryotes, our results are congruent with the best independent evidence. They also imply that most of the deep branches within that 16-RP tree for archaebacteria and eubacteria (Hug et al. 2016) are indeed false and totally misleading. We therefore present a genuinely new view of the tree of life with potentially more reliable conclusions.

If eubacteria are the only primary domain of life and neomura are their much more recent descendants, as palaeontology and indel and transition analysis all suggest, then it is important to have a more comprehensive robust eubacterial phylogeny to better understand life’s early evolution. We therefore assembled RP sequence data for all 29 eubacterial ‘phyla’ recognised by Ruggiero et al. (2015) to enable new site-heterogeneous phylogenetic analyses, wherever possible including several or at least two phylogenetically widely distinct representatives of each. We also included a 30th ‘phylum’ (Melainabacteria: Di Rienzi et al. 2013) discovered since Ruggiero et al. (2015). Our trees including 151 eubacteria allow us to conclude that no more than 14 (perhaps only 13) genuine phyla (each robustly supported by both site-heterogeneous and site-homogeneous methods) are needed to encompass the presently known phylogenetic diversity of eubacteria. Our 26-protein trees for the first time establish a robust phylogeny amongst most of them, greatly clarifying these and other phylogenetic questions, and highlight key remaining issues.

Another limitation of earlier three-domain site-heterogeneous RP trees is that they were weakly sampled for eukaryotes: 18 species in Raymann et al. (2015) and 35 in Lasek-Nesselquist and Gogarten (2013). They excluded most protozoan phyla and poorly sampled all five eukaryotic kingdoms. They were also mutually contradictory with respect to the root position and internal phylogeny of eukaryotes, though they were greatly superior to the 16-RP ML tree (with far more taxa) criticised above. The two-domain neomura-only and three-domain trees of Raymann et al. (2015) were also mutually contradictory. Other site-heterogeneous three-domain multiprotein trees, predominantly but not exclusively based on RPs, included still fewer eukaryotes (10). They yielded strongly contradictory eukaryotic phylogenies (Williams and Embley 2014; Williams et al. 2012, 2013). Most of these were clearly wrong in comparison with taxonomically far richer (109–171 taxa) eukaryote multiprotein trees based on 187 conserved proteins (Cavalier-Smith et al. 2014, 2015a, b). There were similar contradictions in eukaryote phylogeny and root between three-domain and neomuran trees (Williams et al. 2012). If these RP trees are clearly wrong for eukaryotes, how reliable are they for prokaryotes? Much experience indicates that taxonomically rich trees are more reliable than sparse ones. We therefore decided to compare taxonomically rich eukaryote RP trees with the now mostly robustly resolved 187-gene trees (Cavalier-Smith et al. 2015a, 2018). To facilitate exact comparison, this study focuses on the 51 RPs from our 187-protein alignments that are shared with archaebacteria. We constructed separate 51-protein trees for 143 eukaryotes representing all major lineages, for 60 archaebacteria, and for 203 neomura, to determine whether inclusion of distant outgroups distorts two-domain trees. We also constructed 26-RP trees for all three groups, as well as 26-RP three-domain trees, to allow critical comparison between one-, two-, and three-domain trees. We constructed site-homogeneous and site-heterogeneous trees using 26 and 51 proteins for all three two-domain combinations, for three domains, and for archaebacteria or eukaryotes only. We also constructed 26-protein trees for eubacteria. For eukaryotes, 51-RP site-heterogeneous trees proved slightly less good than 187-protein trees, and 26-RP trees a little less good still. Both taxon-rich RP trees were nonetheless much more congruent with 187-protein eukaryote trees than were published, more sparsely sampled RP trees. This confirms that richly sampled site-heterogeneous RP trees can be relatively reliable, though site-homogeneous ML trees were more discordant. To better understand the strengths and limitations of RP trees, we compare the largely congruent, but partially conflicting, results of all these trees.

We discuss how our results clarify distortions of single-domain RP trees by foreign-domain outgroups. We also discuss the strengths and limitations of RPs for reconstructing the universal tree of life, and interpret our results in the light of other evidence for rooting the entire tree and each domain. Our taxon-rich RP trees improve eubacterial internal phylogeny substantially. We did not expect them to resolve the exact ancestry of neomura. We did hope that more thorough eubacterial sampling would better define the limitations of RPs for correctly placing neomura within eubacteria. Unsurprisingly, our trees show slightly contradictory positions for neomura within negibacteria. They are, however, most consistent with an origin from Planctobacteria, which several other recent discoveries have favoured (Reynaud and Devos 2011). This agrees with a few previous rDNA trees that excluded faster evolving sites (Brochier and Philippe 2002) or used more accurate site-heterogeneous algorithms (Williams et al. 2012). Both contradicted earlier site-homogeneous rDNA trees that grouped neomura with hyperthermophilic Thermotoga and/or Aquifex, a grouping reasonably attributed to a long-branch artefact. However, these earlier authors overlooked their trees’ evidence for a neomuran relationship with Planctobacteria, because they incorrectly rooted them in the neomuran stem (explained: Cavalier-Smith 2006c).

Actinobacteria and eukaryotes share phosphatidylinositol and proteasomes. This earlier favoured posibacterial actinobacteria as the closest eubacterial relatives of neomura (Cavalier-Smith 1987c, 2006c). Later, isoprenoid ether lipids with the same sn-glycerol-1-phosphate stereochemistry as in archaebacteria were discovered in posibacterial Bacillus (Guldan et al. 2011). That discovery seemed instead to favour endoposibacteria (i.e. monoderm Endobacteria) as the sisters or ancestors of neomura. Endoposibacteria are also more consistent with evidence from indels (Lake et al. 2009; Valas and Bourne 2011) and from signal recognition particle structure (Cavalier-Smith 2010d). However, enzymes making sn-glycerol-1-phosphate were recently discovered to be widespread in both actinobacteria and endobacteria and also in Sphingobacteria. They are more scattered in some members of the vast majority of negibacterial phyla (Coleman et al. 2019). These enzymes therefore no longer specifically favour posibacteria as neomuran ancestors. Our RP trees support neither the idea that neomura arose from any posibacteria nor posibacterial monophyly (Cavalier-Smith 1987c). They also confirm that endoposibacteria are probably polyphyletic. Endoposibacteria must have had a more complex evolutionary history than was previously realised (Yutin and Galperin 2013), and cannot reasonably be placed beside the root of the tree of life as some do (Lake et al. 2009). Instead, RP trees best fit the idea that Planctobacteria are ancestral to neomura. Planctobacteria are the phylum that embraces Planctomycetes, Chlamydiia, and Verrucomicrobia (Cavalier-Smith 1987b, 2002a). This implies that the secondarily wall-less intermediate ancestor of neomura, created by murein loss, simultaneously lost the planctobacterial OM, as we explain. Our trees show it is harder than often supposed to establish the roots of the archaebacterial and eukaryote subtrees. They are nonetheless consistent with two conclusions. (a) The root for archaebacteria lies between Filarchaeota and Euryarchaeota, with differential character loss between them. (b) Eukaryotes are sister to archaebacteria as a whole rather than to Filarchaeota. This interpretation explains numerous character distributions across the three domains better than previous interpretations did (Cavalier-Smith 1987c; Lombard 2016). These include the origins of archaebacterial and eukaryote N-linked glycoprotein synthesis machinery, as we shall explain in a new synthesis of the transitions between the three domains.

The better eubacterial taxon sampling of our trees reveals new relationships for Thermotoga and Aquifex, whose relationship was previously highly controversial (Eveleigh et al. 2013). They belong to two separate ancient, taxon-rich negibacterial thermophilic lineages, the older Synthermota and the younger Aquithermota, both ranked as phyla. This greatly simplifies eubacterial phylogeny. So also does our clear evidence for the unity of Endobacteria and of a broadened Proteobacteria, despite the marked internal morphological diversity of each. Our improved trees allow us to recognise as few as 14 distinctive and robustly monophyletic eubacterial phyla, rather than the hugely inflated 92 ‘phyla’ in the flawed 16-protein analysis (Hug et al. 2016). Furthermore, our site-heterogeneous trees strongly support the relative branching order amongst them, except at one weakly supported node. These taxon-rich site-heterogeneous RP trees therefore provide a firmer basis than before for understanding eubacterial diversification and evolution.

Having strengthened evidence for a planctobacterial origin for neomura, we present a new synthesis for origins of archaebacteria and eukaryotes, which explains better than hitherto how both originated and diverged so radically from each other and their eubacterial ancestors. In so doing, we clarify numerous past confusions and refute many widespread misconceptions about the tree of life. As this necessarily makes the paper very long, readers may first like to read the 26 major conclusions at the end.


From previous alignments used for eukaryote 187-protein trees (Cavalier-Smith et al. 2014, 2015a, b, 2016), we selected the 51 RPs shared with archaebacteria from 143 eukaryotes. These represent all major taxa except three. Microsporidia and Ectoreta were excluded because their exceptionally long branches might confuse trees with distant outgroups. Red algae were excluded because chromists are historically chimaeras of a heterotrophic host and an enslaved red alga. Some red algal genes might have been overlooked and included for some chromist taxa instead of host genes, causing those taxa to attract red algae artefactually on trees (Cavalier-Smith et al. 2015a). From these RPs, we selected the 26 also shared with eubacteria. We then added RPs from 60 archaebacteria and 151 eubacteria to these two core alignments. We started with the prokaryote RPs from Lasek-Nesselquist and Gogarten (2013), then added archaebacterial RPs from Eme et al. (2013) and numerous prokaryote RP sequences from GenBank. For archaebacteria, we added sequences representing the full diversity of DPANN taxa, and lokiarchaeotes to represent Asgardia; both were omitted in previous RP analyses. For eubacteria, we included sequences for all 29 ‘phyla’ recognised in Ruggiero et al. (2015) plus Melainabacteria, the majority of which were not represented in earlier site-heterogeneous multiprotein RP trees. We also included a sample of chloroplast and mitochondrial sequences. Their positions allow relative dating of some eubacterial branches compared with eukaryotes. Alignment was done manually, by eye, using MacGDE.
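The marker selection and concatenation described in this paragraph can be sketched in Python as follows. This is a minimal illustration only. It assumes that each per-protein alignment is held as a dictionary of equal-length aligned sequences, and that taxa lacking a protein are padded with gaps. Gap padding is a common supermatrix convention, not necessarily the one used here. The protein labels and sequences in the usage lines are toy examples, and the real alignments were built manually in MacGDE.

```python
from typing import Dict, Iterable, List, Set


def shared_markers(*marker_sets: Iterable[str]) -> Set[str]:
    """Proteins present in every taxon group (e.g. eukaryote RPs also found in archaebacteria)."""
    return set.intersection(*(set(s) for s in marker_sets))


def concatenate(alignments: Dict[str, Dict[str, str]], taxa: List[str]) -> Dict[str, str]:
    """Join per-protein alignments ({protein: {taxon: aligned sequence}}) into one
    supermatrix row per taxon, padding proteins a taxon lacks with gap characters."""
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for protein in sorted(alignments):  # fixed, reproducible block order
        block = alignments[protein]
        widths = {len(seq) for seq in block.values()}
        if len(widths) != 1:
            raise ValueError(f"{protein}: aligned sequences differ in length")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(block.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy usage (hypothetical marker sets and sequences)
euk = {"uL2", "uL3", "eL8"}
arch = {"uL2", "uL3", "eL8"}
eub = {"uL2", "uL3", "bL9"}
print(sorted(shared_markers(euk, arch)))       # candidate markers for eukaryotes + archaebacteria
print(sorted(shared_markers(euk, arch, eub)))  # subset also shared with eubacteria
print(concatenate({"uL2": {"A": "MK-", "B": "MRL"}, "uL3": {"A": "QQ"}}, ["A", "B"]))
```

Padding rather than dropping taxa keeps every taxon in every analysis, at the cost of missing data in some blocks.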

Phylogenetic analysis by maximum likelihood (ML) used RAxML-MPI v.7.2.8 PROTGAMMALGF with four gamma rate categories and 100 rapid bootstrap replicates. Site-heterogeneous analyses (abbreviated as CAT) used PhyloBayes-MPI v.1.4e GTR-CAT-C-4 rates with at least two chains. This is the most accurate method readily available that can cope with so many taxa. Trees were constructed for each chain, plus a consensus tree for both, after we removed early trees as burnin. The burnin cutoffs and degree of convergence varied amongst datasets, as specified in individual figure legends. ML and CAT trees were constructed for eukaryotes, archaebacteria, and eubacteria separately, for all three domains, and for all combinations of two domains, i.e. 7 distinct taxon samples. Eubacteria-only trees used only 26 RPs; the other six samples were run separately for 26 and 51 RPs. The highly divergent mitochondrial sequences have extremely long branches, so they were omitted from these analyses. We instead ran separate eubacterial analyses including mitochondria, giving 28 separate analyses for overall comparison. Trees were run on 256 processors in parallel. ML trees took under 5 days. CAT trees were run for at least 10 days (up to a maximum of 45), until they fully converged or until we became convinced that one or two branches were so strongly discordant between chains that they would never fully converge. We also ran PhyloBayes-MPI v.1.4e Poisson-CAT-C-4 rate trees for three-domain and one-domain trees for one RP selection. This simpler but less accurate algorithm might allow quicker or more complete convergence.
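The total of 28 analyses follows from this design. The Python sketch below is one reading that reproduces it, under two assumptions that the text does not state explicitly. First, the count covers only the ML and CAT-GTR runs; the extra CAT-Poisson runs are treated as additional. Second, the separate eubacterial analysis including mitochondria used the 26 RPs with both methods.

```python
from itertools import combinations

DOMAINS = ("eukaryotes", "archaebacteria", "eubacteria")
METHODS = ("ML", "CAT-GTR")  # assumption: CAT-Poisson runs are extra, not part of the 28

# Every non-empty combination of domains: 3 one-domain + 3 two-domain + 1 three-domain
samples = [combo for k in (1, 2, 3) for combo in combinations(DOMAINS, k)]


def rp_selections(sample):
    """Eubacteria-only trees used 26 RPs; the other six samples used both 26 and 51."""
    return (26,) if sample == ("eubacteria",) else (26, 51)


datasets = [(sample, n_rps) for sample in samples for n_rps in rp_selections(sample)]
# Assumption: the separate eubacterial analysis with mitochondria used the 26 RPs
datasets.append((("eubacteria+mitochondria",), 26))

analyses = [(dataset, method) for dataset in datasets for method in METHODS]
print(len(samples), len(datasets), len(analyses))  # 7 14 28
```

Under these assumptions the sketch yields 7 taxon samples and 14 dataset and RP-selection combinations, each run by two methods.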

We first consider the single-domain trees, then the two-domain trees, and finally the three-domain trees. Site-heterogeneous trees are more accurate in theory and largely in practice. The figures therefore show the CAT-GTR trees, with support values for CAT-GTR, CAT-Poisson, and ML plotted on them, and major differences are noted in the text. In general (especially for CAT-GTR), there were only a few differences between 51-protein and 26-protein trees for the same taxon sample. We therefore discuss 51-protein trees first and then note differences when fewer proteins were used. Except for eubacteria, 26-protein trees are in the supplementary material. After discussing individual trees, we evaluate their overall implications for establishing a universal tree of life. We then consider what they reveal about prokaryote phylogeny and major steps in cell evolution, especially the origins of neomura, archaebacteria, and eukaryotes.
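The ML support values are bootstrap percentages. The Python below is a minimal sketch of what such a percentage measures; it is not RAxML's own code, whose rapid bootstrap also uses a faster tree search for each replicate. It draws one nonparametric bootstrap pseudoreplicate by resampling alignment columns with replacement. It then scores a split by the percentage of replicate trees that contain it. Tree inference for each replicate is not shown; replicate trees are assumed to have already been reduced to sets of splits. The helper `maximal_by_all` is a hypothetical encoding of the figures' black-blob convention (maximal support by all three methods).

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set

Split = FrozenSet[str]


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap pseudoreplicate: columns drawn with replacement."""
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(alignment[t][i] for i in columns) for t in taxa}


def canonical(side: Iterable[str], all_taxa: FrozenSet[str]) -> Split:
    """Write an unrooted split as the side lacking a fixed reference taxon."""
    side = frozenset(side)
    return all_taxa - side if min(all_taxa) in side else side


def support(split: Split, replicate_splits: List[Set[Split]]) -> float:
    """Bootstrap percentage: share of replicate trees containing the split."""
    hits = sum(split in splits for splits in replicate_splits)
    return 100.0 * hits / len(replicate_splits)


def maximal_by_all(pp_gtr: float, pp_poisson: float, bootstrap_pct: float) -> bool:
    """Hypothetical 'black blob' rule: PP 1.0 from both CAT runs and 100% bootstrap."""
    return pp_gtr == 1.0 and pp_poisson == 1.0 and bootstrap_pct == 100.0


# Toy usage with an invented four-taxon alignment
rng = random.Random(1)
aln = {"A": "MKVL", "B": "MKIL", "C": "MRIL", "D": "QRIV"}
print(bootstrap_replicate(aln, rng))
taxa = frozenset(aln)
cd = canonical({"C", "D"}, taxa)
assert cd == canonical({"A", "B"}, taxa)  # the same unrooted split written two ways
replicates = [{cd}] * 97 + [set()] * 3    # pretend 97 of 100 replicate trees recovered it
print(support(cd, replicates), maximal_by_all(1.0, 1.0, support(cd, replicates)))
```

In this toy case the split gets 97% bootstrap support, so it would not earn a black blob even with posterior probabilities of 1.0.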

Alignments for all 51 RPs and for SMC are in the supplementary material, as are treefiles for Figs. 3–10 and 12.

Fig. 3

Site-heterogeneous PhyloBayes CAT-GTR tree for 51 ribosomal proteins from 143 eukaryotes representing all the most divergent lineages. Support values for bipartitions are, from left to right: posterior probabilities for CAT-GTR, posterior probabilities for CAT-Poisson, and RAxML bootstrap percentages for 100 pseudoreplicates. Black blobs signify maximal support by all methods in this and all other figures. To fit the page, branches for major taxa are collapsed and the number of species included in each is given beside its label; their names are shown on uncollapsed trees in the Supplementary material, e.g. Fig. S1. The CAT-GTR tree summed 103,304 trees after removing 40% as burnin; both chains converged satisfactorily (maxdiff 0.276977). The CAT-Poisson tree summed 201,391 trees after removing 20% as burnin, but its two chains had slightly different topologies (see text; maxdiff 0.96). The tree is rooted within Eozoa between discicristates and jakobids. Their relative branching order compared with Tsukubamonas is left as an unresolved trifurcation, because it is unclear whether Tsukubamonas is more closely related to jakobids or to discicristates, or is the deepest branching lineage (see Cavalier-Smith 2017, 2018). However, it remains controversial whether Eozoa is the basal eukaryote group as shown or whether it is a clade (see text and Fig. 6). In any case, the bifurcation between Eozoa and neokaryotes is the most strongly supported dichotomy on the basal backbone of the RP tree. The deep branches of Eozoa are well spread out and fully resolved by all methods. By contrast, basal branches of neokaryotes form an explosively rapid radiation that is necessarily relatively poorly resolved.

Fig. 4

Site-heterogeneous PhyloBayes CAT-GTR tree for 51 ribosomal proteins from 60 archaebacteria representing all the most divergent lineages. Support values for bipartitions are, from left to right: posterior probabilities for the CAT-GTR chain 1 analysis, posterior probabilities for CAT-Poisson, and RAxML bootstrap percentages for 100 pseudoreplicates. The CAT-GTR chain 1 analysis summed 50,437 trees after removing 40% of trees as burnin. Chain 2 was identical except for the position of ‘Nanohaloarchaea’, which were sister to Aenigmarchaeota/GWA2_AR5 in the position shown by arrow NH2, as also on the ML tree. The CAT-Poisson analysis summed 126,435 trees after removing 20% as burnin (maxdiff 1). Arrow NHP shows the contradictory position of ‘Nanohaloarchaea’ on CAT-Poisson analyses. Asgard archaebacteria are represented only by lokiarchaeotes, as other sublineages were unavailable when our analyses began.

Fig. 5

Site-heterogeneous PhyloBayes CAT-GTR tree for 26 ribosomal proteins from 151 eubacteria representing all the most divergent lineages with cultivated representatives, plus Melainabacteria and chloroplasts. Support values for bipartitions are, from left to right: posterior probabilities (PP) for CAT-GTR, PP for CAT-Poisson, and RAxML bootstrap percentages for 100 pseudoreplicates. To fit the page, branches for some major taxa are collapsed and the number of species included in each is given beside its label; their names are shown on uncollapsed trees in the Supplementary material, e.g. Figs. S9, S10. The CAT-GTR analysis summed 70,629 trees after removing the first 30% as burnin. Even so, the two chains did not converge (maxdiff 1), because of two persistent topological differences within Endobacteria at nodes where PP are shown in red. The CAT-Poisson tree did converge (maxdiff 0.328, after removing 20% as burnin and summing 89,031 trees) on a slightly different topology. That topology also implies five OM losses within Endobacteria, and all 14 phyla were clades. The branching order of phyla was the same, with two exceptions: Hadobacteria and Fusobacteria were sisters (0.6 support), and Sphingobacteria were sisters (0.72) to Spirochaetes, not Planctobacteria. The six probably ancestrally monoderm clades are marked by an open brown oval. All others were ancestrally negibacteria with a porin-bearing OM; the two such clades in Endobacteria are labelled OM. Polyphyletic wall-less mollicutes are marked by a black blob beside their names.
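The maxdiff values quoted in these legends compare bipartition frequencies between PhyloBayes chains. The Python sketch below is a simplified illustration of that comparison, not PhyloBayes's own bpcomp implementation. It discards a burnin fraction from each chain and computes the frequency of each bipartition among the remaining trees. It then returns the largest difference between chains. The toy chains mimic the ‘Nanohaloarchaea’ discordance described for Fig. 4. They show why a bipartition present in every post-burnin tree of one chain but absent from the other forces maxdiff to 1.

```python
from typing import Dict, FrozenSet, Sequence, Set

Split = FrozenSet[str]


def drop_burnin(trees: Sequence[Set[Split]], fraction: float) -> Sequence[Set[Split]]:
    """Discard the first `fraction` of trees sampled by one chain."""
    return trees[int(len(trees) * fraction):]


def split_frequencies(trees: Sequence[Set[Split]]) -> Dict[Split, float]:
    """Fraction of trees in which each bipartition occurs."""
    counts: Dict[Split, int] = {}
    for splits in trees:
        for s in splits:
            counts[s] = counts.get(s, 0) + 1
    return {s: c / len(trees) for s, c in counts.items()}


def maxdiff(chain_a: Sequence[Set[Split]], chain_b: Sequence[Set[Split]],
            burnin: float = 0.3) -> float:
    """Largest absolute difference in bipartition frequency between two chains."""
    fa = split_frequencies(drop_burnin(chain_a, burnin))
    fb = split_frequencies(drop_burnin(chain_b, burnin))
    return max((abs(fa.get(s, 0.0) - fb.get(s, 0.0)) for s in set(fa) | set(fb)),
               default=0.0)


# Toy chains: each tree is reduced to its set of non-trivial splits
halo = frozenset({"Nanohaloarchaea", "Halobacteriales"})
aenig = frozenset({"Nanohaloarchaea", "Aenigmarchaeota"})
chain1 = [{halo}] * 10   # every tree groups Nanohaloarchaea with Halobacteriales
chain2 = [{aenig}] * 10  # every tree groups them with Aenigmarchaeota instead
print(maxdiff(chain1, chain2))  # split fixed in one chain, never sampled by the other
print(maxdiff(chain1, chain1))  # identical chains
```

Running longer does not help when each chain is stuck on a different fixed bipartition. This is consistent with the persistent non-convergence reported for some datasets.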

Eukaryote ribosomal protein trees

As Fig. 3 shows, CAT topology for 51 RPs is remarkably similar to that with 187 proteins (Cavalier-Smith et al. 2014, 2015a, b, 2016). Most clades are maximally supported by CAT, and 69 of these are also maximally supported by ML. Every one of these was also found on previous 187-protein trees, as were all additional clades with at least 95% support by both methods. The least well-supported clades are those at the base of corticates (i.e. Chromista and Plantae). This notably affects the basal branching of Plantae and Hacrobia, neither of which appears as a clade as each should. Basal branching within chromist subkingdom Harosa, however, is as robust as with 187 proteins. The other poorly supported region is the base of scotokaryotes, primarily affecting the basal branching order within and amongst the protozoan phyla Sulcozoa, Neolouka, and Amoebozoa. Corticata are a weakly supported clade by CAT. They are not a clade by ML, because sulcozoan planomonads incorrectly intrude; 187-protein trees show that planomonads are deep-branching scotokaryotes (Cavalier-Smith et al. 2014, 2013a, b).

Almost all topological discordances between CAT and ML relate to the deepest branches in corticates (8 contradictions) and Amoebozoa (8 contradictions). There are only two others, one within Filosporidia in opisthokonts and one within Jakobea in Eozoa. All these contradictions have frequently been noted in multigene eukaryote trees based on over a hundred proteins. They arise because these regions involve numerous extremely closely diverging branches, reflecting explosive early radiations. Even for the difficult phylum Amoebozoa, Fig. 3 CAT topology recovers all seven classes as clades, as well as subphylum Conosa. This matches 187-protein trees (Cavalier-Smith et al. 2016) and even 325-protein trees (Kang et al. 2017). It differs from these only in the insignificantly supported position of the archamoeba Phreatamoeba within Conosa, and in the weakly supported position of Cutosea relative to Tubulinea and Discosea. The position of Cutosea is slightly uncertain even with 325 proteins and was different for 187 proteins. For Amoebozoa, therefore, the 51-RP CAT tree is only slightly less good than trees with 187 or 325 proteins. Discordant branches all have weak support, which encourages caution in interpretation. The ML tree corresponding to Fig. 3 (not shown) had substantially lower support for many bipartitions and a less accurate topology. It misplaced planomonads and also wrongly placed Cutosea within Discosea, making Discosea seem paraphyletic. For Amoebozoa, our CAT-GTR 51-protein tree was markedly superior to a seven-protein tree using the slightly less accurate CAT-Poisson algorithm (Panek et al. 2016). That tree wrongly placed Cutosea within Conosa and Tubulinea within Discosea. It did, however, more correctly place Archamoebae as sisters of Mycetozoa, perhaps because it included eight Archamoebae, not just one.

The main weakness of the 51-protein CAT tree is that it does not resolve the base of Corticata or of scotokaryotes accurately. It also shows scotokaryotes as weakly paraphyletic, not a clade. However, these branches are relatively much closer and more numerous than in any prokaryote trees discussed below. RPs performed well for eukaryotes when taxonomically richly sampled, and perhaps only then, given the discrepancies seen on previous sparser trees. This suggests that similar RP trees for prokaryotes ought to be reasonably reliable, provided taxon sampling is sufficiently comprehensive. The corresponding Poisson tree was very similar but differed in some support values and in a few branching orders for less well-supported clades. In three respects, Poisson was better than CAT (in comparison with the best 147-protein trees). First, the moss Physcomitrella and the pteridophyte Selaginella were correctly successive branches, not sisters (0.98). Second, the opisthosporidian protozoan Rozella was correctly sister to all Fungi and not weakly sister to Allomycota (0.78 support for exclusion from Fungi). Third, Nuclearia was correctly sister to Fungi/opisthosporidia, not holozoa. Poisson was worse in that Corbihelia were scattered (differently on the two chains) rather than forming a clade. Sulcozoan phylogeny differed with Poisson but was not obviously better or worse overall. For example, the deepest branching apparent neokaryote clade was Mantamonas/Collodictyon, not Breviatea/Trimastix, and planomonads were sisters of opisthokonts, swapping position with apusomonads/Mantamonas. Within Amoebozoa, Cutosea wrongly intruded into Discosea. As with CAT, hacrobian lineages intruded into Plantae near Viridiplantae, but the chains were contradictory with respect to the positions of their subclades. Thus, both site-heterogeneous 51-RP trees were good (better than ML) but imperfect in slightly different ways.
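Judging RP trees against the 187-protein reference, as done throughout this section, amounts to asking which clades two trees share. The minimal Python sketch below assumes rooted trees written as nested tuples. It lists every clade as a set of leaf names, then returns the clades common to both trees and the number found in only one of them (a rooted Robinson-Foulds-type count). The schematic trees reduce the Rozella case to four tips. In the reference tree Rozella is sister to all Fungi; in the test tree it is nested beside Allomycota. ‘other_Fungi’ is a hypothetical placeholder, not a real taxon.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """All clades (leaf sets of internal nodes) of a rooted tree written as nested tuples."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def compare(reference: Tree, test: Tree):
    """Shared clades and the number of clades present in only one of the two trees."""
    a, b = clades(reference), clades(test)
    return a & b, len(a ^ b)


# Schematic trees (hypothetical simplification of the Rozella case)
ref = ("Nuclearia", ("Rozella", ("Allomycota", "other_Fungi")))
rp51 = ("Nuclearia", ("other_Fungi", ("Rozella", "Allomycota")))
shared, rf = compare(ref, rp51)
print(rf)
for clade in sorted(shared, key=len):
    print(sorted(clade))
```

Here the two trees share two clades (Rozella plus Fungi, and the whole tree) and differ by two, one clade unique to each.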

However, the 26-protein CAT tree (Fig. S1) is generally somewhat poorer: only 61 instead of 69 clades were maximally supported by both methods, and support for other well-established clades was usually lower. Unlike in Fig. 3, there was no clear bipartition between corticate and scotokaryote clades, as on CAT glaucophytes (Plantae) jumped from corticates into scotokaryotes as sister to the insignificantly supported false clade comprising breviates and Trimastix, whereas on ML trees (Fig. S2) glaucophytes were wrongly sister to breviates alone and planomonads wrongly intruded into corticates, as with 51 genes. As with 51 genes by ML, Cutosea wrongly intruded into Discosea, but with a different overall topology. Despite these deficiencies, the 26-gene RP tree is surprisingly good compared with the 187-protein tree: it correctly reconstructed a large majority of the clades that are well supported on trees using 187 or more proteins and is seriously defective only for those that have been the most difficult of all to establish. In one respect, the CAT-GTR 26-protein tree is better than the 51-protein one: the opisthosporidian Rozella is sister to Fungi and does not incorrectly branch within Fungi, though ML still places Rozella incorrectly with Chytridiomycetes, making Fungi appear paraphyletic. The 26-protein CAT-GTR tree is clearly wrong only for branches that are also wrong, or else rather weakly supported, with 51 proteins. So even 26-RP CAT trees should be quite good for prokaryotes, and better than ML.

Archaebacterial ribosomal protein trees

The 51-protein CAT-GTR tree did not converge fully because of an irresolvable contradiction between the two chains in the position of ‘Nanohaloarchaea’; the individual chain trees otherwise had identical topology. Chain 1 (Fig. 4) was identical to the two-chain consensus tree in placing them as sister of Halobacteriales with maximal support, as strongly shown by the rDNA tree (Narasingarao et al. 2012) and the 70-protein tree of Petitjean et al. (2015). However, chain 2 discordantly placed ‘Nanohaloarchaea’ with 0.97 support as sister to ‘Aenigmarchaeota’ (not included in the analysis of Petitjean et al. (2015)) within a DPANN clade that branched within Euryarchaeota as a sister to all core euryarchaeotes other than Thermococcales (Fig. S3). By contrast, Figure 4 and the consensus tree both show all DPANN other than ‘Nanohaloarchaea’ as a single clade, which we here designate Micrarchaea. Clade ‘Micrarchaea’ had maximal support on chain 2, where nanoarchaeotes and ‘Parvarchaeum’ formed a subclade with 0.97 support that was sister to aenigmarchaeotes with 0.98 support; ‘Micrarchaeum’ grouped with ‘Iainarchaeum’ with insignificant (0.48) support. The bipartition between phyla Euryarchaeota and Filarchaeota was maximally supported by all methods. Within Filarchaeota, class Nitrososphaeria (=thaumarchaeotes, always including aigarchaeotes nested within them, not as a separate group) was strongly supported as sister to Sulfolobia cl. n. by CAT-GTR and weakly by ML; this joint clade was sister to Candidatus ‘Korarchaeum’, and Asgardia were strongly supported as the deepest branch, sister to subphylum Crenarchaeota (i.e. ‘Korarchaeum’ plus Sulfolobia/Nitrososphaeria). Subphylum Crenarchaeota Cavalier-Smith 2002 is the correct formal name for what some later unnecessarily called the TACK clade; TACK stands for the initial letters of four subclade names within subphylum Crenarchaeota, none of them nomenclaturally valid.
Unreasonable rejection (see Tindall 2014) of class Crenarchaeota Cavalier-Smith 2002 means that this longstanding name can never again be legitimately used for a class, so our Taxonomic Appendix creates replacement name Sulfolobia for the class, but subphylum Crenarchaeota is not rejected and remains legitimate. Throughout the rest of this paper, we therefore use Nitrososphaeria to include all thaumarchaeotes and aigarchaeotes, and Crenarchaeota for the whole subphylum (Fig. 4), not just the invalid class; unavoidable invalid names are usually in quotes or lower case.

The ML 51-protein tree had only four topological differences, three in euryarchaeotes and one (Ignicoccus) in Crenarchaeota: (1) ‘Nanohaloarchaea’ moved into Micrarchaea to become sister of aenigmarchaeotes with insignificant (40%) support, forming a DPANN clade with moderate (80%) support; (2) ‘Parvarchaeum’ moved to become sister of ‘Micrarchaeum’ with insignificant (44%) support, almost certainly an LBA artefact as these are the tree’s two longest branches; (3) Methanopyrus moved up a node to be sister to Methanococcales/Methanobacteriales, making class Methanothermea a clade; (4) Ignicoccus moved down one node with scarcely significant (50%) support. Twenty-seven clades had maximal support by both methods, and most clades with less than 100% by ML were strongly supported. Only one ML clade unaffected by movement of these four branches was insignificantly supported. Classes Picrophilea and Protoarchaea (Cavalier-Smith 2002a) were maximally supported by both methods; indeed, all five euryarchaeote classes established by Cavalier-Smith (2002a) were distinct clades by ML, with only Methanothermea weakly supported, so it is odd that most papers ignore classes and label only the more numerous orders (e.g. Raymann et al. 2015). Clearly, these classes captured large-scale euryarchaeote diversity as known before the discovery of Micrarchaea, which deserve to be made a sixth euryarchaeote class once species are described and the name can be validly published by designating types. Although all five were validly published at the time, they were recently unfairly rejected (Tindall 2014); and even had they not been, they would be invalid as incorrectly formed under the new rules, as are all class-level names suggested by Petitjean et al. (2015). Figure 4 therefore uses the new replacement class names established in the Taxonomic Appendix in conformity with current rules.

The 51-RP CAT-Poisson tree differed from CAT-GTR primarily in having a DPANN clade placed within euryarchaeotes as sister to SCGC AAA251-l15, which moved down four nodes so that the joint clade was sister to all euryarchaeotes except Thermococcia. In addition, Lokiarchaea moved up two nodes to lie within Crenarchaeota as sister to Nitrososphaeria. Interestingly, new class Methanocellia was a strongly supported clade by all three methods, whereas previous site-homogeneous trees had often shown it as paraphyletic ancestors of Halobacteriales (Brochier-Armanet et al. 2011; Petitjean et al. 2015).

The 26-protein CAT tree (Fig. S4) converged well between chains (maxdiff 0.0666) and gave a topology broadly similar to the 51-RP tree, but often with somewhat lower support (only 23 clades had maximal support by both methods), and with four differences in topology: (1) Thermococcales moved into Methanobacteriia as maximally supported sister to Methanococcales; (2) Methanomassiliicoccus moved one node to be sister to MBGD_SCGC_AB-539-N05, a change that may be an artefact of low gene representation in these two taxa compared with most others; (3) ‘Nanohaloarchaea’ moved into ‘Micrarchaea’ to become sister of aenigmarchaeotes with strong (0.99) support; (4) the DPANN clade (i.e. ‘Micrarchaea’ plus ‘Nanohaloarchaea’) resulting from (3) was not within euryarchaeotes but separate. The 26-protein ML tree (Fig. S5) differed in three respects: (1) in Filarchaeota, ‘Korarchaeum’ moved up one node to be sister to Sulfolobia; (2) Ignicoccus moved down one node, as with 51 proteins; (3) the methanogens Methanocella and Methanoculleus interchanged positions.

For large-scale phylogeny, the most important difference between 26- and 51-protein trees was the exclusion of DPANN from euryarchaeotes as a single clade, rather than its placement within them as two internal clades. Relationships of DPANN lineages to other archaebacteria are controversial. When the tiny symbiotic ‘Nanoarchaeum’ was discovered, some took its apparent branching outside euryarchaeotes at face value and considered nanoarchaeotes primitive archaebacteria or even the most primitive cells, but others argued that their tiny cells and genomes were secondarily reduced and that such exclusion was an LBA artefact. Fifty-protein RP ML trees strongly placed it outside shorter-branch euryarchaeotes, as did 27-protein large-subunit RP trees, whereas 23-protein small-subunit trees, and 18-protein large-subunit trees that excluded nine proteins with discordant single-gene trees, placed it within euryarchaeotes as sister to Thermococcales (Brochier et al. 2005). A CAT gamma recoded tree that also included ‘Parvarchaeum’ and ‘Micrarchaeum’ grouped ‘Parvarchaeum’ with ‘Nanoarchaeum’ in the same position, but weakly placed ‘Micrarchaeum’ just above Methanothermea (Brochier-Armanet et al. 2011). A site-homogeneous Bayesian tree for 32 RPs plus 38 other proteins also put Nanoarchaeum with Thermococcales, but put ‘Parvarchaeum’ and ‘Micrarchaeum’ as two sister clades to Picrophilea, whereas ‘Nanohaloarchaea’ were maximally supported sisters of Halobacteriales (Petitjean et al. 2015). Thus, there appeared to be four distinct ‘DPANN’ clades within euryarchaeotes on this 70-protein tree using 10,963 sites (their Fig. 4); that tree gave no evidence for DPANN being one clade and placed ‘Nanohaloarchaea’ in the same strongly supported position as our Fig. 4. The other three ‘Micrarchaea’ clades could have been aggregated into one at the same position as in Fig. 2 by each crossing just one node (each such node being weakly or insignificantly supported).
A CAT-GTR tree for 45 archaebacterial proteins, including a few RPs but excluding ‘Nanohaloarchaea’ and the longest-branch DPANNs, had a single maximally supported ‘Micrarchaea’ clade (Williams et al. 2017), with maximal support for its lying within Euryarchaeota in the Fig. 2 position, both when using all 10,738 positions (their Fig. S2) and with a more stringent selection of 5920 (their Fig. S3). In another tree including only 25 genes and the 10 most genomically complete DPANN, including Nanosalina and ‘Micrarchaeum’, DPANN was a single clade but lay within Euryarchaeota, with 0.97 support against euryarchaeotes minus DPANNs being a clade. Six further CAT-GTR trees, each using a separate DPANN subclade, placed it within euryarchaeotes (five with maximal support, one with 0.98 support): three in the Fig. 4 ‘Micrarchaeum’ position and the others all in different positions, with only ‘Nanoarchaeum’ sister to Thermococcales. A tree for 29 proteins rooted on an unspecified eubacterial outgroup also placed a single DPANN clade within euryarchaeotes as sister to all euryarchaeotes other than Thermococcales. There is therefore consistent support from all previous site-heterogeneous trees, and from all cited site-homogeneous ones, for all DPANN clades branching within paraphyletic euryarchaeotes, as shown in Fig. 4. Previous evidence for the position of ‘Nanohaloarchaea’ was more contradictory. Our eubacteria-rooted prokaryote trees (below) more decisively support the conclusion of Petitjean et al. (2015) that ‘Nanohaloarchaea’ are sisters of Halobacteriales, not within ‘Micrarchaea’ as some trees of Williams et al. (2017) implied, but suggest that all other DPANN form a clade, contrary to Petitjean et al. (2015).

Eubacterial ribosomal protein trees

The 26-protein RP tree (Fig. 5) is taxonomically richer than any other, with 151 species representing all major lineages, including many omitted from all previous trees. Both chains converged to exactly the same topology except for one persisting contradiction within Endobacteria, which kept maxdiff at 1. On one chain, Clostridiia sensu stricto were sisters of Bacilli plus mycoplasmas, making Clostridiales/Bacillia a clade with maximal support, but on the other chain Clostridiales s. s. moved down two nodes to join Thermoanaerobacterales. In marked contrast to rDNA trees, all bipartitions in the tree backbone were significantly supported except for that separating Hadobacteria and Fusobacteria, which therefore might really be a single clade (as they are with some taxon samples; see below). In a separate tree that also included five mitochondrial sequences (Fig. S15), mitochondria branched within free-living α-proteobacteria in the position shown by the purple arrow on Fig. 5, not with Rickettsias, suggesting that their grouping with Rickettsias on some published trees is an LBA artefact. Except for the position of Leptospirillum, delimitation of the proteobacterial subphyla is strongly supported by CAT. Adding mitochondria slightly altered the tree backbone by making Hadobacteria and Fusobacteria insignificantly supported sisters and this joint clade weakly sister to Synthermota, but changed no other relationships between the 14 major phyla (though it increased support for Armatimonadetes being sister to Melainabacteria/Cyanobacteria from 0.68 to 0.91, and for Elusimicrobium being sister to other Planctobacteria from 0.8 to 0.95).
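The maxdiff statistic quoted for these PhyloBayes runs measures disagreement between chains: each bipartition's frequency is computed among the post-burnin trees of each chain, and maxdiff is the largest absolute difference over all bipartitions. A single bipartition present in every sampled tree of one chain and absent from every tree of the other therefore forces maxdiff to 1, as with the Clostridiales contradiction here. The Python sketch below illustrates that calculation on toy data. It is not the PhyloBayes bpcomp code; representing each tree as a set of bipartitions, the 40% burnin default, and the taxon labels are simplifying, hypothetical choices.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: a tree is the set of its non-trivial bipartitions,
# each stored as the frozenset of taxa on one side of the split.
Split = FrozenSet[str]
Tree = Set[Split]


def canonical(side: Set[str], taxa: FrozenSet[str]) -> Split:
    """Orient a split consistently by storing the side lacking a fixed reference taxon."""
    ref = min(taxa)
    side = frozenset(side)
    return taxa - side if ref in side else side


def split_frequencies(chain: List[Tree], burnin: float = 0.4) -> Dict[Split, float]:
    """Frequency of each bipartition among the post-burnin trees of one chain."""
    kept = chain[int(len(chain) * burnin):]  # discard the first 40% as burnin by default
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def maxdiff(chain_a: List[Tree], chain_b: List[Tree], burnin: float = 0.4) -> float:
    """Largest absolute difference in bipartition frequency between two chains."""
    fa = split_frequencies(chain_a, burnin)
    fb = split_frequencies(chain_b, burnin)
    return max(abs(fa.get(s, 0.0) - fb.get(s, 0.0)) for s in set(fa) | set(fb))


# Toy example (hypothetical labels): both chains share one split but disagree on
# where "Clo" attaches, mimicking a single persisting contradiction.
taxa = frozenset({"Bac", "Clo", "Hal", "Myc", "The"})
t1 = {canonical({"Bac", "Myc"}, taxa), canonical({"Bac", "Myc", "Clo"}, taxa)}
t2 = {canonical({"Bac", "Myc"}, taxa), canonical({"Clo", "The"}, taxa)}
print(maxdiff([t1] * 10, [t2] * 10))  # -> 1.0
print(maxdiff([t1] * 10, [t1] * 10))  # -> 0.0
```

Chains that sample identical trees give 0; one fully contradictory bipartition gives 1 however well the rest of the tree agrees, which is why a single unresolved branch can prevent formal convergence.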

All 14 phyla were clades (strongly so, except for Planctobacteria) on the converged CAT-Poisson tree, but two moved slightly in position: Hadobacteria and Fusobacteria were sisters (0.6 support), and Sphingobacteria became sister of Spirochaetes, not Planctobacteria (unlike most non-Poisson trees). Within Endobacteria, Sulfobacillales were the deepest branch, not Halanaerobiales as in all other trees, and other subgroups were rearranged. Within Proteobacteria, Leptospirillum moved away from Acidobacteria to become sister of Rhodobacteria.

Topology is largely similar by ML (Fig. S6; only 12 differences, only one affecting the backbone: Aquithermota moved to be sister to Thermocalda with insignificant 29% support), but support for less robust branches tends to be lower. Fourteen major deep-branching clades that may reasonably be considered phyla are strongly supported by both PhyloBayes site-heterogeneous methods, most maximally (Table 1); seven of them have maximal support by all three methods. Additionally, phyla Melainabacteria and Cyanobacteria are maximally supported as sister clades (here jointly made superphylum Oxybacteria, as it is confusing to call Melainabacteria Cyanobacteria: Soo et al. 2017), and the position of chloroplasts within the more advanced cyanobacteria is maximally supported. Table 2 summarises the revised higher eubacterial classification proposed here. Its simplicity, with only 14 phyla, all phylogenetically sound, is enabled by proper use of intermediate categories (subkingdoms, superphyla, subphyla, infraphyla, superclasses), and it is greatly superior to the indigestible 114-phylum system of Parks et al. (2018), which fails to show relationships between the phyla. Later sections explain the most important of its innovations.

Table 1 Support on RP trees for the 14 eubacterial phyla and subphyla
Table 2 Revised higher classification of kingdom Eubacteria* and its four subkingdoms and 14 phyla

Infrakingdom Gracilicutes, comprising four major negibacterial phyla (Proteobacteria, Spirochaetes, Sphingobacteria, Planctobacteria), is well supported by CAT but only weakly by ML. All methods give even stronger support for a broader grouping of eight negibacterial phyla that we treat as new subkingdom Neonegibacteria: Gracilicutes (which include two partially photosynthetic phyla), plus the two major thermophilic phyla Aquithermota and Synthermota, and two minor heterotrophic phyla, Fusobacteria and Hadobacteria. There is weaker support by all methods for a clade comprising Armatimonadetes plus Oxybacteria, which we call subkingdom Eoglycobacteria, as its three phyla form the deepest-branching clade of negibacteria with LPS (if the tree is correctly rooted between them and Chloroflexi, whose OM lacks LPS). Instead of the earlier term Eobacteria, we refer to Chloroflexi plus Eoglycobacteria jointly as Eonegibacteria, arguably the four most ancient negibacterial phyla.

Another important conclusion is that posibacteria are multiply polyphyletic. Actinobacteria and Endobacteria are not sisters. On CAT trees, Endobacteria are maximally supported as sister to Neonegibacteria, whereas Actinobacteria are near-maximally (0.99) supported as sisters of Endobacteria plus Neonegibacteria; these two positions are also significantly (but weakly) supported by ML. Within Endobacteria, the deepest-branching clades are all negibacteria, and nested within them are at least two distinct posibacterial clades with only one membrane. However, internal branching within Endobacteria was inconsistent between CAT, Poisson, and ML, so further work must define the branching order of the major robust subclades, as a later section explains in detail. Irrespective of that uncertainty, our trees strongly indicate that Actinobacteria and at least two subclades of Endobacteria lost the OM independently. Furthermore, mycoplasmas are robustly nested within Bacilli, so there is no justification for continuing to treat Tenericutes and Firmicutes as separate phyla: Endobacteria is definitely a clade, with endospores as its ancestral character, but there have been multiple losses of the OM and (independently) of murein. Our trees robustly group the mycoplasma Mesoplasma with Erysipelothrix and Coprobacillus (both in order Erysipelotrichiales), but group another mollicute clade, comprising Acholeplasma and Haloplasma, with Turicibacter instead, with maximal support. Thus, there appear to be two independent major mycoplasma clades, so reductive evolution has been rampant in Endobacteria.

The PVC group (classical Planctobacteria) is near-maximally supported and consistently groups, with good support, with Elusimicrobium, which is morphologically similar and so is here included in slightly broadened Planctobacteria. Aquithermota, comprising classes Aquificia (=Aquificae) and Thermodesulfobacteriia, is maximally supported as sister to Gracilicutes by CAT, but that joint clade was not found by ML. By contrast, Synthermota, including Synergistetes, Thermotogia, Caldisericia, and Dictyoglomia, is a completely distinct thermophilic CAT clade that branches more deeply than both Fusobacteria and Hadobacteria and is thus the deepest-branching neonegibacterial subclade. By ML, Synthermota splits into two robust subclades, Synergistetes and Thermocalda (new subphylum names proposed here), which do not group together, Thermocalda moving to be insignificantly supported sister to Aquificia. Thus, our CAT trees firmly resolve the long-standing controversy over whether the two hyperthermophilic eubacterial groups (Thermotogales and Aquificales) are directly related (Eveleigh et al. 2013). They clearly are not: each is nested separately within a broader thermophilic group, and these two broader groups are not even sisters. As several authors have argued, Aquithermota are more closely related to Gracilicutes than to Synthermota; the erratic, contradictory groupings of Aquifex and Thermotoga (together or apart, depending on taxon sample) on early ML rDNA trees reflected insufficient taxon sampling and evolutionarily less realistic algorithms plagued by long-branch artefacts.

Neomuran ribosomal protein trees

When eukaryotes and archaebacteria are included in the same 51-protein CAT tree (Fig. 6), the topology of each is only very slightly changed from their single-domain trees. Eukaryotes appear rooted between the maximally supported clade Eozoa (all but one internal branch maximally supported, with Fig. 3 topology) and an insignificantly supported (0.36) neokaryote clade. Corticata, Plantae, Chromista, and Corbihelia are weakly supported clades; opisthokonts, Animalia, Amoebozoa, Alveolata, Heterokonta, and Rhizaria are maximally supported clades. Internal phylogeny of Amoebozoa differed in putting Cutosea within Discosea, not as its sister (judging from the 325-protein tree of Kang et al. (2017), probably neither is correct, but Fig. 3 is nearer). Though the consensus tree is better than the eukaryote-only tree (Fig. 3) in recovering clades Plantae and Chromista, the two chains did not fully converge because of a few contradictions, all within eukaryotes: (1) on chain 2, Trimastix was strongly sister to Breviatea as in Fig. 6, but chain 1 put it alone as the deepest-branching eukaryote, with the root between it and all others; (2) Plantae and Chromista were strongly supported (1, 0.99) clades on chain 1, but on chain 2 a false haptophyte/Picomonas/Telonema ‘clade’ weakly disrupted Plantae and Planomonadida weakly intruded into the remaining chromists; (3) Centroheliozoa moved slightly; (4) deep branching within amoebozoan Discosea differed slightly.

Fig. 6.

Site-heterogeneous PhyloBayes CAT-GTR tree for 51 ribosomal proteins from 203 neomura representing all the most divergent lineages. Support values for bipartitions are posterior probabilities for CAT-GTR (left; 51,189 trees summed after removing 40% as burnin; maxdiff 1, as convergence was prevented by four persisting contradictions deep within neokaryotes) and RAxML bootstrap percentages for 100 pseudoreplicates (right). To fit the page, branches for major taxa are collapsed; all names are shown on uncollapsed trees in Supplementary material, e.g. Fig. S1. Includes all taxa from Figs. 3 and 4.

Archaebacterial topology was identical to Fig. 4, with DPANN within euryarchaeotes, except that ‘Nanohaloarchaea’ were sisters of Aenigmarchaeota, not Halobacteriales, and class Methanobacteriia was a clade, as on 200-protein trees (Petitjean et al. 2015). Thus, both major methanobacterial classes were clades. Eukaryotes were strongly excluded from lokiarchaeotes and did not group with them. If the tree is rooted within archaebacteria between Euryarchaeota (including DPANN) and Filarchaeota, eukaryotes appear to be sisters of ‘Korarchaeum’ with low (0.63) support. This may be artefactual, as distant outgroups are often attracted to such long unbroken branches; eukaryotes would only have to cross this node and another weakly supported one to join archaebacteria between Filarchaeota and Euryarchaeota, which may therefore both really be clades. Basal branches within archaebacteria are much more spread out and on average much longer than in eukaryotes, implying faster evolution; they are also much more variable in length (and would be even more so had we not excluded the longest branches to reduce artefacts). We also excluded the longest-branch eukaryote taxa, but even allowing for that, eukaryote evolutionary rates are generally much more uniform than those of archaebacteria, implying greater evolutionary constraints.

The ML tree differed from the archaebacteria-only tree in showing DPANN as sister to euryarchaeotes, not within them, and in moving ‘Nanohaloarchaea’ to become sister of ‘Parvarchaeum’. Eukaryotes remained outside lokiarchaeotes as insignificantly supported (39%) sisters of ‘Korarchaeum’; they would have to cross only that branch and one other with trivial 30% support to join the tree between Filarchaeota and euryarchaeota/DPANN. Thus, neither tree convincingly supports eukaryotes branching within Filarchaeota. For eukaryotes, ML gives maximal support for paraphyly of Eozoa, with eukaryotes being rooted between Percolozoa and Euglenozoa plus all other eukaryotes, i.e. within discicristates, clearly contradicting the site-heterogeneous tree; the internal topology of Eozoa is unchanged except that Seculamonas and Jakoba are not sisters. For eukaryotes, the ML tree was marginally worse than the eukaryote-only and CAT trees for some of the most weakly placed clades, as Collodictyon intruded into Corticata as insignificant sister of glaucophytes and breviates wrongly grouped with apusomonads. However, adding genetically extremely distant archaebacteria did not grossly distort the eukaryote tree under either method, strongly supported clades remaining the same, though CAT appears slightly more resistant to such perturbation.

With only 26 RPs, the eukaryote CAT tree (Fig. S7) differed in a few respects, but previously strongly supported patterns generally remained strongly supported, notably the internal phylogeny of Eozoa, opisthokonts, Amoebozoa, Viridiplantae, haptophytes, rollomonad cryptists, and Harosa. However, branching at the base of neokaryotes was less conserved: e.g. Plantae were disrupted by Viridiplantae moving into Chromista, and Glaucophyta and Heliozoa moved outside corticates to weakly join Collodictyon and Planomonadida, respectively. These difficult-to-place groups and the sulcozoan lineages also appear misplaced on the 26-gene ML tree. For archaebacteria, 26-RP trees rearranged euryarchaeotes by both CAT and ML, placing Thermococcales within Methanobacteria rather than as the deepest group; both put DPANN as sister to euryarchaeotes, not within them. Both placed the eukaryote root between Percolozoa and other eukaryotes, with strong support for its exclusion from all others (0.92; 98) and thus for paraphyly of Eozoa. However, ML and CAT were contradictory as to where eukaryotes joined archaebacteria: ML still grouped them with ‘Korarchaeum’, but CAT put them as sister to ‘Korarchaeum’ plus all other Filarchaeota except Lokiarchaea, i.e. closer to the base of archaebacteria than in the 51-protein tree. This further emphasises the unreliability of the position of eukaryotes within Filarchaeota. Unlike a neomuran CAT-GTR tree using 55 RPs (Zaremba-Niedzwiedzka et al. 2017), with a greater diversity of Asgardia but an unspecified number of eukaryotes (likely fewer than in ours), none of our neomuran trees grouped eukaryotes with lokiarchaea. Our analyses agree with theirs in having a thaumarchaea/Sulfolobia clade, but theirs effectively have eukaryotes one node lower than in any of ours. Thus, taking their analysis and ours together, eukaryotes appear in three different, contradictory places deep within Filarchaeota.

Contradictions amongst these trees with respect to the root of eukaryotes and where they join archaebacteria are unsurprising given that on Fig. 6 the stretched eukaryote stem that separates them represents a mean of 2.83 substitutions per site. As some sites are invariant and others evolve much faster than average, most variable positions will have been overwritten many times since eukaryotes and archaebacteria diverged; scarcely any will have retained phylogenetically informative signal about where the two ends of the stem historically joined each crown group. Very likely, chance convergences in the most variable positions will overwhelm genuine ancestral phylogenetic signal. The longest included unbroken DPANN branches are even longer, corresponding to a mean of nearly four substitutions per site, so one expects LBA artefacts to be serious for them (as others have convincingly argued: Brochier et al. 2005). This gives reason to disbelieve the exclusion of DPANN from euryarchaeotes and the separation of the two groups of halophilic archaebacteria on some RP trees. By contrast, the mean branch length of crown eukaryotes represents only about 0.656 substitutions per site, so a substantial amount of phylogenetically informative sequence information must remain. But because of explosive radiation at the base of eukaryotes, there was too little time between the deepest branch points for many phylogenetically informative mutations to accumulate, so basal branch order is necessarily less well supported than in the more spread-out deep eubacterial tree. The 26 universal RPs shared with eubacteria underwent almost as much change (mean 2.6 substitutions per site in the eukaryote stem) and show a disparity in rate patterns similar to that of the 51 neomuran ones.
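The overwriting argument can be made roughly quantitative. Under the simplest equal-rates Poisson model, a branch of mean length d substitutions per site leaves a fraction e^(−d) of sites unchanged, and a fraction 1 − e^(−d)(1 + d) receives two or more substitutions. If instead rates vary among sites according to a gamma distribution with shape α and mean 1, the unchanged fraction becomes (1 + d/α)^(−α), which is larger because slow sites contribute most of it, consistent with the point that only the slowest-evolving positions retain signal. The Python sketch below evaluates these expressions for the mean branch lengths quoted above; α = 0.5 is a hypothetical illustrative value, not one estimated in this study, and neither model is the CAT-GTR model used for the trees.

```python
import math


def unchanged_poisson(d: float) -> float:
    """Fraction of sites with no substitution on a branch of d subs/site (equal rates)."""
    return math.exp(-d)


def multiple_hits_poisson(d: float) -> float:
    """Fraction of sites with two or more substitutions (equal rates)."""
    return 1.0 - math.exp(-d) * (1.0 + d)


def unchanged_gamma(d: float, alpha: float) -> float:
    """Fraction of unchanged sites when site rates ~ Gamma(shape=alpha, mean=1)."""
    return (1.0 + d / alpha) ** (-alpha)


# Mean branch lengths (substitutions per site) quoted in the text.
branches = {
    "eukaryote stem, 51 RPs": 2.83,
    "eukaryote stem, 26 RPs": 2.6,
    "longest DPANN branches (approx.)": 4.0,
    "crown eukaryotes (mean)": 0.656,
}

ALPHA = 0.5  # hypothetical gamma shape parameter, for illustration only

for name, d in branches.items():
    print(f"{name:34s} d={d:5.3f}  unchanged={unchanged_poisson(d):.3f}  "
          f">=2 hits={multiple_hits_poisson(d):.3f}  "
          f"unchanged (gamma)={unchanged_gamma(d, ALPHA):.3f}")
```

Under equal rates, roughly 6% of sites would be expected to remain unchanged along the 51-RP eukaryote stem and about three-quarters to be hit more than once, whereas about half of sites would be untouched along an average crown eukaryote branch.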

Prokaryote ribosomal protein trees

There was a marked difference in archaebacterial deep branching and in the apparent position of their root according to whether CAT trees used 51 or 26 archaebacterial RPs. With 51, ‘Nanohaloarchaea’ were sister of Halobacteriales within euryarchaeotes with maximal support on both chains, which converged on the same topology within archaebacteria (Fig. S9), and ‘Micrarchaea’ were weakly sisters of Filarchaeota, so there was no DPANN clade. With only 26 RPs (Fig. 7), by contrast, there was a DPANN clade, and the root appeared between it and other archaebacteria. These trees were also contradictory for a few parts of the eubacterial backbone (but showed all the same major clades). Both showed archaebacteria emerging from eubacteria as weakly supported sisters of Planctochlora, the joint Planctobacteria/Sphingobacteria clade. That is consistent with evidence discussed below that Planctochlora ancestrally had prenyl diether membrane lipids in addition to acyl esters and are thus credible eubacterial ancestors for archaebacteria. With 26 RPs, both chains supported that position. But when 51 archaebacterial and 26 eubacterial RPs are combined in a prokaryote tree (Fig. S9), the chains conflicted: chain 2 put Archaebacteria as sisters to Planctochlora (negligible 0.43 support), whereas chain 1 grouped them weakly (0.6) with Gracilicutes plus Aquithermota. In Fig. 7, the stem joining eu- and archaebacteria has a mean of 4.3 amino acid substitutions per site. Therefore, it is highly improbable that archaebacterial RPs retain enough ancestral-clade-specific information to place this long stem precisely among the roughly 20 major eubacterial lineages.
Its apparent position is almost certainly lower in the tree than its true position, as the faster-evolving parts of its sequences, needed to fix its relationship to more recent eubacterial branches, must have been overwritten; only the slowest-evolving regions could retain useful phylogenetic information, and the others may largely reflect the vagaries of multiple overwriting of ancestral sequences, as previously argued for rDNA (Cavalier-Smith 2002a).
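The scale of this placement problem can be made concrete. An unrooted, fully bifurcating backbone of n lineages has 2n − 3 branches on which another clade could attach, so with the roughly 20 major eubacterial lineages mentioned above there are 37 candidate positions, and a stem placed at random would be correct only about once in 37 tries. The Python sketch below checks the 2n − 3 count by building random trees through stepwise addition. Treating the eubacterial backbone as exactly 20 fully resolved lineages is a simplifying assumption, and the function names are hypothetical.

```python
import random


def attachment_branches(n_lineages: int) -> int:
    """Branches of an unrooted, fully bifurcating tree with n tips (n >= 3)."""
    return 2 * n_lineages - 3


def random_unrooted_tree_edges(n_lineages: int, seed: int = 0) -> list:
    """Build a random unrooted bifurcating tree by stepwise addition; return its edges.

    Each added tip splits an existing edge, adding one internal node and a net two
    edges, so the edge count is 3 + 2 * (n - 3) = 2n - 3.
    """
    rng = random.Random(seed)
    edges = [(0, "i0"), (1, "i0"), (2, "i0")]  # three tips joined at internal node i0
    for tip in range(3, n_lineages):
        u, v = edges.pop(rng.randrange(len(edges)))  # edge on which the new tip lands
        node = f"i{tip - 2}"
        edges += [(u, node), (node, v), (tip, node)]
    return edges


n = 20  # "roughly 20 major eubacterial lineages"; an illustrative round number
k = attachment_branches(n)
assert len(random_unrooted_tree_edges(n)) == k
print(f"{k} candidate branches; chance a random placement is correct: {1 / k:.3f}")
```

The count grows only linearly with the number of lineages, but each additional candidate branch must be distinguished using the few sites that, on a stem of about 4.3 substitutions per site, have escaped repeated overwriting.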

Fig. 7

Site-heterogeneous prokaryote PhyloBayes CAT-GTR tree for 26 ribosomal proteins from 60 archaebacteria and 151 eubacteria representing all the most divergent lineages. Consensus of two chains; support values for bipartitions are posterior probabilities for CAT-GTR (left; 179,537 trees summed after removing 40% as burnin; maxdiff 0.179537) and RAxML bootstrap percentages for 100 pseudoreplicates (right). To fit the page, branches for major taxa are collapsed; their names are on uncollapsed trees in Supplementary material, e.g. Fig. S9. Archaebacteria are strongly excluded from Posibacteria and branch within Neonegibacteria, weakly as sister to Planctochlora.

Despite such multiple overwriting, the addition of the long eubacterial outgroups, and the absence of eubacterial data for 25 of its RPs, the internal phylogeny of archaebacteria on Fig. S9 has scarcely changed from Fig. 4. In both, Filarchaeota are maximally supported as a clade, with identical internal phylogeny except for Ignicoccus, whose position is maximally supported in Fig. S9 but moderately supported, in a slightly different position within this crenarchaeote subclade, in Fig. 4. Within Micrarchaea, basal branching order is identical but much more strongly supported in Fig. S9, which differs only in the basal part of the five-member environmental DNA subclade that is sister to the Nanoarchaeum clade; its Fig. S9 topology is markedly more strongly supported. Within short-branch euryarchaeotes, there are only two differences: (1) Methanomassiliicoccus is maximally supported as sister to an environmental lineage that branches one node lower with very weak support; (2) Thermococcales move up one node to be sister to Methanopyrus. In these respects, except for the position of Thermococcales, the Fig. S9 topology is better supported and more consistent between chains, implying that adding eubacteria, despite their distance, stabilised topology by breaking up the basal stem, perhaps allowing better reconstruction of ancestral states for some branches. These features of Fig. S9 topology may therefore better reflect the internal phylogeny of the three major groups. The only other difference is the position of Micrarchaea: sister to Euryarchaeota in Fig. S9, but within it as sister to all except Thermococcales in Fig. 4. Given the weak support for Micrarchaea being sister to Filarchaeota in Fig. S9 and its long branches, we suggest that adding an extremely distant outgroup may have artefactually pulled it one node away from its position within euryarchaeotes and, through long-branch attraction, misplaced the root by one node towards it.
If so, Micrarchaea should really be within euryarchaeotes one node higher than Thermococcales, as in Fig. 4, and the archaebacterial root should be between Euryarchaeota and Filarchaeota, not sister to DPANN as in Williams et al. (2017) (a possible long-branch attraction (LBA) artefact) or within short-branch Euryarchaeota as in Raymann et al. (2015) (a possible artefact of taxonomic undersampling).

When prokaryote trees are restricted to the 26 RPs shared with eubacteria (Fig. 7), archaebacterial CAT topology unsurprisingly changes slightly. Filarchaeota remain maximally supported with the same internal topology except that Ignicoccus moves one node. Euryarchaeote phylogeny is changed not only by exclusion of ‘Nanohaloarchaea’ (and their grouping with Aenigmarchaeota within Micrarchaea) but also by Thermococcales and Methanopyrus separating and intermingling with the Methanobacteria, and by one change within the problematic environmental DNA clade. We suggest that for archaebacteria the Fig. S9 topology is more reliable, being based on nearly twice as many genes and more concordant with the major euryarchaeote phenotypes. CAT 26-RP trees put the archaebacterial root between DPANN and other archaebacteria, but support is low for the likely artefactual non-DPANN clade (0.63).

ML gives the same archaebacterial topology with 51 as with 26 genes, but often with lower support; both also place the root within DPANN between the Micrarchaeum/Iainarchaeum clade and the rest, but are contradictory as to which is the deepest branch: Micrarchaeum/Iainarchaeum with 51 RPs and other DPANNs with only 26. Support is insignificant for both; both are likely to be artefacts and less accurate than CAT trees. There is no reason to prefer the ML topology to the evolutionarily more realistic CAT ones.

Positioning archaebacteria within eubacteria was also sensitive to gene sampling and method. With 51 proteins, ML put them as sister to a spurious (19%) Sphingobacteria/Spirochaete ‘clade’ (different from the CAT positions) with insignificant (18%) support. With 26 proteins (Fig. S10), ML put them as sister to Sphingobacteria only (insignificant 30%). Thus, ML tends to group archaebacteria with Sphingobacteria, with trivial support, whereas CAT groups them with Sphingobacteria/Planctobacteria, with weak but higher support.

Eukaryote-eubacterial two-domain ribosomal protein trees

If there were no long-stem problems, these two-domain trees should theoretically be as reliable as the two preceding ones for rooting eukaryotes and correctly placing the neomuran stem within eubacteria. In practice, however, one might expect them to be less reliable, as the stem connecting eubacteria and eukaryotes is even longer: from Fig. 8, it has a mean of 10.7 amino acid substitutions per site. In theory, if there were a genuine phylogenetic signal able to show their correct position, eukaryotes should be placed within eubacteria in the same position as archaebacteria. However, eukaryotes appear within Planctobacteria only, as sister to the PVC group (excluding Elusimicrobium); the apparent position of the eukaryote root is within Eozoa between Percolozoa and all other eukaryotes (moderate support: 0.84; ML 76%). ML puts eukaryotes within Planctobacteria as sister to Planctomycetales plus Elusimicrobium (insignificant 18%), a likely false clade. Reducing the eukaryote data to the 26 shared genes (Fig. S11) puts eukaryotes as insignificant sisters of Planctomycetia only (0.49) and the eukaryote root more narrowly within Percolozoa between Naegleria only and all other eukaryotes, both unlikely; the corresponding ML tree (Fig. S12) has the eukaryote root between holophyletic (54%) Percolozoa and the rest (77% for non-percolozoan eukaryotes being a clade) and shows eukaryotes as sister to all Planctobacteria except Elusimicrobium. Thus, these two-domain trees consistently support the theory that eukaryotes evolved from Planctobacteria (Reynaud and Devos 2011). Though the prokaryote trees instead suggest a slightly deeper position as sister to Planctochlora as a whole, both sets are weakly supported, as expected from the inferred degree of substitutional overwriting.
More importantly, both two-domain trees strongly exclude neomura from both Actinobacteria and Endobacteria, and thus clearly contradict a posibacterial origin of neomura (Cavalier-Smith 1987c, 2002a). They strongly indicate that neomuran ancestors were neonegibacteria and, more weakly, that they were most likely gracilicutes of the Planctochlora subclade, rather than any of the deeper-branching hyperthermophilic neonegibacteria (Thermobacteria) as had been suggested by some three-domain rDNA trees. The weakness of the signal for their precise position within Planctochlora is emphasised by the two CAT chains being contradictory: chain 2 grouped eukaryotes with Planctobacteria (0.85) as sister to all except Elusimicrobium (0.68), whereas chain 1 put them as sister (0.55) to all Planctochlora, as were archaebacteria on the prokaryote tree, excluding them from Planctobacteria with insignificant support (0.45).

Fig. 8

Site-heterogeneous 2-domain PhyloBayes CAT-GTR tree for 51 ribosomal proteins from 143 eukaryotes and 26 ribosomal proteins from 151 eubacteria representing all the most divergent lineages. Support values for bipartitions are from left to right: posterior probabilities for the CAT-GTR (left), RAxML bootstrap percentages for 100 pseudoreplicates (right). To fit on the page, branches for major taxa are collapsed; their names are shown on uncollapsed trees in Supplementary material, e.g. Fig. S12. Despite 33,393 trees being summed after removing the first 17,893 as burnin, the two chains did not converge (maxdiff 1) because of a few persistent topological differences (with 0.5 support or less) at the base of neonegibacteria and neokaryotes; both strongly excluded eukaryotes from Posibacteria and placed them within gracilicute Neonegibacteria. The root of eukaryotes beside Percolozoa within Eozoa was the same on both chains; one chain placed eukaryotes within Planctobacteria as on the consensus tree, but more strongly so, whereas the other put them more weakly two nodes more deeply as sister to Planctochlora.

Eukaryote internal phylogeny is no more obviously disturbed on the 51- or 26-RP tree by adding the much more divergent eubacteria than it was by adding archaebacteria, so we shall not describe the eukaryote parts of these trees in detail: they exhibit similar tendencies for corticate, chromist and plant holophyly to be degraded and for planomonads to intrude wrongly into chromists.

Eubacterial internal phylogeny is also very little changed by adding the 51 eukaryotic RPs. The relative branching order of the six deepest-branching phyla (Chloroflexi, the three eoglycobacterial phyla, Actinobacteria and Endobacteria) is identical, and the closer relationship of Actinobacteria to Endobacteria plus Neonegibacteria than to Eoglycobacteria is more strongly supported (0.99 not 0.62). Except for Endobacteria, whose deep branching order was poorly supported in Fig. 5, their internal phylogeny is identical (apart from one minor difference within chloroplasts). Neonegibacteria contain the same major clades with almost identical internal phylogeny, but their relative positions are somewhat altered, probably because the long-stem eukaryotes branch within them as weakly supported (0.63) apparent sisters of Planctobacteria. Thus, eukaryotes do not branch in either of the two positions found for archaebacteria in the prokaryote tree. This conflict suggests that there were too many amino acid substitutions along the stem joining eukaryotes or archaebacteria to eubacteria for their correct position to be consistently determined. Despite eukaryotes branching within Gracilicutes, the relative branching order of all gracilicute subgroups is identical and thus rather stable. However, unlike Fig. 5, where Gracilicutes were strongly supported as a clade (0.99), as were Aquithermota (1), the two chains placed Aquithermota contradictorily, so their position as sister to Proteobacteria in the consensus tree (Fig. 8) is a weakly supported compromise. In chain 2, Aquithermota were a maximally supported clade strongly supported (0.99) as sister to strongly supported (0.98) Proteobacteria, whereas in chain 1, Aquificia separated from Thermodesulfobacteriia and entered Synthermota as weak (0.78) sister to Thermocalda, while Thermodesulfobacteriia entered Gracilicutes as sisters (0.82) of δ-Proteobacteria (now much more weakly supported as a clade: 0.49).
As Aquithermota remain a well-supported (86%) clade outside Gracilicutes by ML, their discordant splitting in one CAT chain is probably artefactual, perhaps caused by the very different eukaryote sequences. The relative branching order of Synthermota, Hadobacteria, and Fusobacteria also differs from Fig. 5.

The ML tree insignificantly groups all three major clades of thermophilic bacteria together, but Synthermota is not a clade. Actinobacteria move up the tree away from Endobacteria, as insignificant sisters of Hadobacteria. Within Endobacteria, Bacilliia plus Clostridiales sensu stricto (i.e. classical posibacterial endobacteria) are weakly (56%) supported as a clade, unlike in Fig. 8. Eukaryotes appear within Gracilicutes but move inside Planctobacteria as sisters (no support: 17%) of a probably false grouping of Elusimicrobium and Planctomycetales.

With only 26 RPs, the CAT tree (Fig. S11) did not fully converge, as the two chains had strongly supported conflicting topologies within eubacteria. One chain gave essentially the same topology as Fig. 8; the other is basally very different, as Melainabacteria/Cyanobacteria moved upwards to become strongly supported sisters of Fusobacteria, and Actinobacteria moved up to be strongly supported sisters of Hadobacteria. Strong support for this aberrant topology made it dominate the consensus tree (Fig. S11). Despite these contradictions, both chains agreed in placing eukaryotes within Planctobacteria as sister to all Planctobacteria other than Elusimicrobium (0.67, 0.68 support) and in putting the eukaryote root within Percolozoa between Naegleria and all other eukaryotes, as in Fig. S11. Thus, although the main eubacterial clades are not altered by adding eukaryotes, the backbone branching pattern of eubacteria is destabilised more by adding 26 eukaryote RPs with eubacterial relatives than by adding 51 eukaryote RPs. It is as if the presence of the 25 neomuran-specific proteins without eubacterial partners prevents the eukaryote sequences from destabilising the eubacterial part of the tree. With ML for 26 RPs (Fig. S12), the Melainabacteria/Cyanobacteria clade remains as in Figs. 5 and 8, but Actinobacteria move up to join Hadobacteria with insignificant (30%) support; the eukaryote root is within Eozoa between Percolozoa and the rest, and eukaryotes are insignificantly (21%) sister of the probably false grouping of Elusimicrobium and Planctomycetales.

Universal three-domain ribosomal protein trees

On both CAT-GTR and ML trees, irrespective of whether 51 or 26 neomuran RPs were used, the apparent eukaryote root was between Percolozoa and all others (Fig. 9), as in the eubacteria-rooted tree (Fig. 8), not between Eozoa and neokaryotes as in the neomuran tree (Fig. 6). However, the position of eukaryotes within archaebacteria and of neomura within eubacteria varied, as did the apparent root of archaebacteria; the branching order of eubacteria was generally more distorted relative to Fig. 5 than in two-domain trees, and eukaryote topology was also worse. Overall, three-domain trees appear notably less trustworthy than single- and two-domain trees, making it unfortunate that they have been almost exclusively relied on in most previous work on the tree of life, except for the comparisons of Raymann et al. (2015).

Fig. 9

Site-heterogeneous universal three-domain PhyloBayes CAT-GTR tree for 26 ribosomal proteins from 143 eukaryotes, 60 archaebacteria, and 151 eubacteria representing all the most divergent lineages. Support values for bipartitions are from left to right: posterior probabilities for the CAT-GTR (left), RAxML bootstrap percentages for 100 pseudoreplicates (right). To fit on the page, branches for major taxa are collapsed; their names are shown on uncollapsed trees in Supplementary material, e.g. Fig. S1. As the chains did not converge, this figure is for chain 2 with ML support values also mapped on to it. After removing the first 20% as burnin, the remaining 19,165 trees were summed. Deep branching order of prokaryote phyla is markedly more disturbed than in 2-domain trees (Figs. 6, 7, and 8).

The 26-RP CAT trees did not converge for the position of neomura, so Fig. 9 (a single chain) and Fig. S13 (a consensus tree) exemplify the two contradictory topologies CAT yielded for 26 RPs. In Fig. 9, neomura are sister to Gracilicutes as a whole, not just to Planctochlora or the subset of Planctobacteria as in two-domain trees. In this tree, internal phylogeny of Gracilicutes is standard but non-gracilicute phyla are drastically rearranged: Aquithermota enter Synthermota as sister to Thermocalda, and Actinobacteria and Melainabacteria/Cyanobacteria move upwards to join Hadobacteria and Fusobacteria respectively, as in the aberrant 26-RP two-domain chain described above. Chain 1, by contrast, had normal positions for Actinobacteria and Melainabacteria/Cyanobacteria but greater aberrations for the thermophiles: Aquithermota are split, with Thermodesulfobacteriaceae entering Gracilicutes as sisters (0.99) of δ-Proteobacteria, whereas Aquificia move to be weakly supported sisters of Thermocalda. Neomura are strongly (0.97) sister to that probably false Thermocalda/Aquificia clade. A broadly similar phylogeny is seen in the consensus tree (Fig. S13), but support for this position of neomura is negligible (0.38) and Hadobacteria are also attracted within Synthermota. Figure 9 put eukaryotes as sister to Lokiarchaea (0.89), whereas Fig. S13 put them within Lokiarchaea, weakly sister to Loki1 (0.48), contradicting neomuran trees that mostly grouped them with ‘Korarchaeum’. Basal branching of eukaryotes was almost completely unresolved, with contradictory but maximally supported branching orders at almost every backbone node, though most eukaryotic subgroups are well supported, apart from the usual problems at the base of Hacrobia and scotokaryotes.

For the 51-neomuran-RP CAT-GTR three-domain trees, we ran four separate chains, which also did not converge, but there were markedly fewer distortions within eubacteria and eukaryotes; none showed the aberrant upwards movement of Actinobacteria and Melainabacteria/Cyanobacteria, but all differed in the position of neomura. Chain 4 put neomura within Gracilicutes as sister to the Spirochaete/Planctochlora clade (0.63), with negligible support for their not being closer to Planctochlora (0.43), and gave maximal support for eukaryotes as sister to all Filarchaeota except lokiarchaeotes; chains 1–3 related neomura in contradictory ways to the rearranged non-gracilicute thermophiles. Chain 1 put neomura as sister (0.44) to Thermocalda and eukaryotes as sister to all Filarchaeota except lokiarchaeotes (0.87); chain 2 put neomura as sister to Thermocalda/Aquithermota (0.58) and eukaryotes as sister to ‘Korarchaeota’ (maximal support); chain 3 put neomura as sister to Thermocalda/Aquificia/Hadobacteria and eukaryotes as sister to Lokiarchaea (maximal support). These four contradictory positions for neomura and three for eukaryotes confirm the conclusion from 26-RP trees that three-domain trees cannot reliably position either. For what it is worth (not much), the consensus tree for all four chains (Fig. 10) puts neomura as insignificant (0.44) sister to Thermocalda/Aquithermota and eukaryotes as weakly (0.64) sister to all Filarchaeota except Lokiarchaea. Figure 10, with 51 neomuran RPs, weakly (0.54) supports DPANN as a clade (including Micrarchaea and ‘Nanohaloarchaea’) and only weakly (0.57) places it as the deepest archaebacterial branch; with only 26 RPs, Figs. 9 and S13 strongly (0.98, 0.99) have DPANN as the deepest archaebacterial branch.
The corresponding CAT-Poisson trees also did not fully converge (maxdiff 1; 40% burnin; 2 chains with 29,287 trees summed), but both chains rooted eukaryotes within Amoebozoa between Tubulinea and other eukaryotes with strong support, put eukaryotes as sister to Loki2/3 with fairly strong support, and rooted archaebacteria within non-DPANN euryarchaeotes in two contradictory places; in all these respects, they contradicted all CAT-GTR trees, which are theoretically more accurate. One chain put neomura as sister to Synthermota, the other within Synthermota as sister to Caldisericum/Coprothermobacter only (0.52), adding two more conflicting positions and thus confirming the inability of RP trees to place neomura or root archaebacteria or eukaryotes consistently amongst methods. Despite all these conflicts, the internal branching order of eubacteria was essentially as in Fig. 5 and that of eukaryotes largely consistent with Fig. 3, indicating that the theoretically inferior reconstructive ability of CAT-Poisson was confused mainly by the hyperaccelerated neomuran and eukaryote stems, not by an inability to reconstruct intradomain branches correctly.
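The convergence figures quoted in these paragraphs and legends (burnin, trees summed, maxdiff) can be made concrete with a small sketch. A bipartition's posterior probability is the fraction of post-burnin sampled trees that contain it, and maxdiff, as reported by PhyloBayes's bpcomp, is the largest between-chain discrepancy in such frequencies; maxdiff 1 therefore means at least one bipartition is present in every retained tree of one chain and absent from every retained tree of another. The code below is a simplified illustration of that idea, not bpcomp's implementation; representing each tree as a set of splits, and the function names, are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence

# Assumed representation: a split (bipartition) is the set of taxa on one side of
# an internal branch, always written for the side that excludes a fixed reference
# taxon (e.g. taxa A-E with reference E), so each split has one canonical form.
Split = FrozenSet[str]
Tree = FrozenSet[Split]


def split_frequencies(trees: Sequence[Tree], burnin: int) -> Dict[Split, float]:
    """Posterior probability of each split: share of post-burnin trees containing it."""
    kept = trees[burnin:]
    if not kept:
        raise ValueError("burnin removes every sampled tree")
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def maxdiff(chains: List[Sequence[Tree]], burnin: int) -> float:
    """Largest between-chain difference in the frequency of any split."""
    freqs = [split_frequencies(chain, burnin) for chain in chains]
    all_splits = set().union(*freqs)  # union of the dictionaries' keys
    return max(
        (max(f.get(s, 0.0) for f in freqs) - min(f.get(s, 0.0) for f in freqs)
         for s in all_splits),
        default=0.0,
    )


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    ac = frozenset({"A", "C"})
    # Two chains each stuck on a different, internally "maximally supported" split.
    chain1 = [frozenset({ab})] * 10
    chain2 = [frozenset({ac})] * 10
    print(maxdiff([chain1, chain2], burnin=2))  # 1.0: the chains have not converged
    print(maxdiff([chain1, chain1], burnin=2))  # 0.0: identical split frequencies
```

The toy chains mimic the situation described above, where each chain is internally confident but the chains disagree; pooling them would give each contested split a consensus support of only 0.5.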

Fig. 10

Site-heterogeneous universal three-domain PhyloBayes CAT-GTR tree for 51 ribosomal proteins from 143 eukaryotes and 60 archaebacteria and 26 ribosomal proteins from 151 eubacteria representing all the most divergent lineages. Support values for bipartitions are from left to right: posterior probabilities for the CAT-GTR (84,962 trees summed from four independent chains after removing 4035 trees as burnin; maxdiff 1), posterior probabilities for the CAT-Poisson (29,287 trees summed after removing 9,872 trees as burnin), RAxML bootstrap percentages for 100 pseudoreplicates. To fit on the page, branches for major taxa are collapsed; their names are shown on uncollapsed trees in Supplementary material, e.g. Fig. S1.

ML places neomura insignificantly (40%) sister to Sphingobacteria, using 26 or 51 RPs, with strongly supported Planctobacteria as the next branch. Eubacterial and eukaryote backbone branching orders are insignificantly supported, but most subclades are as in single-domain CAT trees. For 26 RPs, Fig. S14 has Melainabacteria/Cyanobacteria in the normal position as sister to Armatimonadetes with insignificant (40%) support, and Aquithermota and Thermocalda are both clades; in these respects, the ML tree is less perturbed by the long-branch neomuran and eukaryote stems than CAT-GTR. However, ML shows Actinobacteria as weak (27%) sisters of Hadobacteria, so this likely artefact is consistent between these methods (though not seen on CAT-Poisson). ML put eukaryotes weakly (67%) sister to Lokiarchaeota and rooted archaebacteria inside DPANN beside the ‘Iainarchaeum’/‘Micrarchaeum’ clade (negligible support for this as the deepest clade: 34%). When 51 neomuran RPs are included, ML puts eukaryotes more weakly (59%) sister to lokiarchaeotes, and the archaebacterial root between the ‘Micrarchaeum’/‘Iainarchaeum’ clade and the rest (even weaker support: 24%), almost certainly an LBA artefact. Apart from the inclusion of neomura, Gracilicutes are a clade with the same internal branching as in all other trees. Aquithermota group (insignificant support) within Synthermota as sister to Thermocalda, and Actinobacteria are sisters of Hadobacteria with insignificant support (33%).

Overall pattern and limitations of the universal ribosomal protein tree

The tree is effectively three densely branched multistem bushes (crown eukaryotes, archaebacteria, eubacteria) interconnected by two long unbranched stems. It can be interpreted correctly only by mapping it onto the fossil record and understanding the reasons for the two immensely long bare stems.

The depth of the eubacterial bush corresponds to 3.5 Gy, the age of RuBisCO-based carbon fixation given by isotopic ¹³C/¹²C ratios in Archaean kerogen, which at least as long ago as 3.41 Ga is sometimes associated with plausible morphological microfossils (Wacey et al. 2011) or stromatolites (Tice and Lowe 2004); the depth of the eukaryote bush (the only one certainly a clade) corresponds only to ~ 850 Ma. The earliest generally accepted crown eukaryote cellular fossils are only ~ 760 My old (likely corticate scales and likely scotokaryote amoeba tests; see Cavalier-Smith 2013a). The oldest known steranes, commonly viewed as eukaryote markers even though several disparate eubacteria make simple steranes, are in rocks dated 820–720 Ma (Brocks et al. 2017), suggesting that eukaryotes were not abundant before 820 Ma.

But all early crown eukaryote fossils are neokaryotes; Eozoa, the earliest branch on our RP trees, do not fossilise well, so if Eozoa are older than and ancestral to neokaryotes, as suggested by a majority of our trees that put Percolozoa most deeply (Figs. 8, 9, 10, S7, S8, S11, S12, and S13 using proteins of eukaryote host origin), the last eukaryote common ancestor (LECA) is somewhat older, as previously suggested (Cavalier-Smith 2010b, 2013a, 2014, 2017). However, if Eozoa are a sister clade to neokaryotes, as suggested by one of our neomuran trees (Fig. 6) and a similar outgroup-rooted tree using proteins of eubacterial origin likely derived via mitochondrial symbiogenesis (He et al. 2014), then Eozoa and neokaryotes would be essentially the same age. A later analysis using mitochondria-derived proteins concluded instead that Eozoa are a clade that is sister to Corticata (Derelle et al. 2015); if that were historically correct, Eozoa would be effectively the same age as Corticata (probably ~ 745 Mya based on our RP tree proportions) and thus a little younger than LECA, so the absence of eozoan fossils would not bias the inferred age of LECA. On Fig. 6, LECA appears only marginally older than the neokaryote clade: from the Fig. 6 tree proportions, if neokaryotes were 800 My old, applying a uniform eukaryotic molecular clock would date LECA to ~ 816 Ma; if neokaryotes are only 760 My old, LECA would be only ~ 775 Ma. But if our trees placing the eukaryotic root instead within Eozoa between Percolozoa and all others were correct (which we doubt; see below), the inferred age of LECA would be older: ~ 1.0 Gy from Fig. 9 proportions. This illustrates the importance of knowing the position of the eukaryote root for mapping sequence trees onto the fossil record.
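The LECA dates above follow from simple proportionality: under a uniform clock, the ratio of LECA depth to neokaryote-crown depth read from the tree is fixed, so any calibration for neokaryotes scales directly to LECA. A minimal sketch, assuming a depth ratio of 816/800 (about 1.02) back-calculated from the dates quoted in the text rather than measured from Fig. 6; the function names are hypothetical.

```python
def depth_ratio(leca_age_my: float, neokaryote_age_my: float) -> float:
    """LECA depth relative to neokaryote crown depth implied by one pair of dates."""
    return leca_age_my / neokaryote_age_my


def leca_age(neokaryote_age_my: float, ratio: float) -> float:
    """Under a uniform clock, node ages scale in proportion to root-to-tip depths."""
    return neokaryote_age_my * ratio


if __name__ == "__main__":
    # Ratio back-calculated from the quoted 800 My -> ~816 Ma pair (an assumption,
    # not a measurement of the Fig. 6 branch lengths).
    r = depth_ratio(816, 800)  # ~1.02
    print(round(leca_age(760, r)))  # 775, matching the ~775 Ma quoted in the text
```

The same scaling applied to a deeper root position (as in Fig. 9) simply uses a larger depth ratio, which is why the inferred age of LECA depends so strongly on where the eukaryote root lies.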

On present evidence, it remains unlikely that crown eukaryotes are older than ~ 850 ± 30 My, the same as argued earlier when mapping rRNA trees onto the fossil record (Cavalier-Smith 2002a, 2006a). If the depth of the eubacterial crown represents 3.5 Gy and that of the eukaryote crown ~ 0.85 Gy, their depths should have a ratio of ~ 4.1 if amino acid substitution rates were the same in both. In fact (ignoring the accelerated longer branches of chloroplasts and cellular endoparasites like Rickettsias and mycoplasmas), the ratio is only ~ 1.7, so most eubacterial RPs have evolved about 2.3 times more slowly than most eukaryotic RPs, implying that selection against change is stronger in eubacteria. Figures 3, 4, 5, 6, 7, 8, 9, and 10 omitted mitochondrial RPs, as they evolve far faster than chloroplast RPs and have immensely longer branches, presumably because purifying selection preventing random divergence is weaker. Nonetheless, the point where the mitochondrial stem diverges from within the α-proteobacterial clade (Fig. S15 and arrow on Fig. 5) gives an upper bound to the age of both LECA and stem eukaryotes. Applying a constant eubacterial molecular clock to the Fig. S15 RP tree, we estimated the upper bound of the age of the first mitochondria, and therefore of LECA, to be ~ 1.18 Ga, but the actual age of LECA is likely younger.
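The ~2.3-fold rate difference follows from treating crown depth as rate × time: if two crowns differ in age by a factor of ~4.1 but in depth by only ~1.7, the younger one must evolve faster by roughly their quotient. The sketch below uses the rounded values in the text; with them the quotient comes out near 2.4, and the exact figure is sensitive to how the ~1.7 depth ratio is rounded (a ratio of 1.79 would give 2.3). Function names are illustrative.

```python
def expected_depth_ratio(old_crown_age_gy: float, young_crown_age_gy: float) -> float:
    """Depth ratio two crowns would show if both evolved at the same rate (depth = rate * time)."""
    return old_crown_age_gy / young_crown_age_gy


def rate_ratio(old_crown_age_gy: float, young_crown_age_gy: float,
               observed_depth_ratio: float) -> float:
    """How many times faster the younger crown evolves than the older one.

    depth_old / depth_young = (r_old * T_old) / (r_young * T_young), so
    r_young / r_old = (T_old / T_young) / (depth_old / depth_young).
    """
    return expected_depth_ratio(old_crown_age_gy, young_crown_age_gy) / observed_depth_ratio


if __name__ == "__main__":
    # Rounded values from the text: eubacterial crown 3.5 Gy, eukaryote crown ~0.85 Gy,
    # observed eubacterial/eukaryote crown depth ratio ~1.7.
    print(round(expected_depth_ratio(3.5, 0.85), 1))  # 4.1
    print(round(rate_ratio(3.5, 0.85, 1.7), 1))       # 2.4
```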

The age of archaebacteria is less clear, as they have no morphological fossils and the oldest direct evidence for their age is ~ 820 My old isoprenoids from halophilic archaebacteria, at least some of which were probably methanogens, as indicated by the presence of crocetane (Schinteie and Brocks 2017). Given that some methanogens can be halophilic and that other early archaebacterial clades might have been also, these lipids cannot be regarded as specific markers for the halophilic euryarchaeote clade shown on Fig. 4, which appears to be over 30% younger than the last archaebacterial common ancestor (LACA), which was likely a methanogen if Fig. 4 topology is correct. However, if we assume that these lipids did come from the base of that clade, then we could use them to set an upper bound to LACA's age: 1.17 Ga. In a later section, we use an LGT from viridiplant chloroplasts to ‘Cenarchaeales’ (Petitjean et al. 2012) to date the euryarchaeote/filarchaeote divergence at 1.18 Ga. Thus, three independent phylogenetic/fossil calibrations give the same young ages for eukaryotes and archaebacteria: both are less than 1.2 Gy and more than 0.85 Gy old, i.e. ~ 1.0 ± 0.15 Gy old. Thus, present evidence is compatible with the idea that eukaryotes and archaebacteria are sisters of equal age, as Cavalier-Smith (1987c, 2002a, 2006a, 2014) long argued; but if eukaryotes actually branch within archaebacteria, either within or as sisters to Filarchaeota, archaebacteria would be slightly older. As there is no other credible evidence for the actual age of archaebacteria, there is no reason to think they are as old as eubacteria.
Although all our trees weakly suggest that the eukaryote stem emerges near the base of Filarchaeota (but in several contradictory places), resolution is not good enough to exclude the idea, favoured by many aspects of cell evolution, that eukaryotes and archaebacteria are sisters, and that euryarchaeotes and filarchaeotes diverged from each other at essentially the same time as archaebacteria and eukaryotes, in an unresolvable trifurcation. Certainly, there is no evidence from RP trees or from palaeontology that archaebacteria are substantially older than stem eukaryotes. Neomura are likely about three times younger than eubacteria. Fallacious arguments for greater archaebacterial antiquity rest on methanogenesis and their (non-unique) lipids, whose relatively recent evolution is explained in detail in later sections. The chimaeric origin of reverse DNA gyrase from two eubacterial enzymes has long been evidence that archaebacteria evolved from, and thus are younger than, eubacteria (Cavalier-Smith 2002a); we argue below that their reverse gyrase most likely came from Aquithermota, which must therefore be older than archaebacteria, as must Planctobacteria if neomura evolved from them (as Fig. 8 suggests).
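The ~1.17 Ga upper bound for LACA given above is the same kind of proportional scaling: if the ~820 My halophile lipids are assigned to the base of the halophilic clade, and that clade's depth is taken as 70% of LACA's (the text's ‘over 30% younger’), the LACA age is 820/0.7 My. A minimal sketch, assuming a uniform archaebacterial clock; the function name is hypothetical.

```python
def ancestor_age(clade_age_my: float, clade_depth_fraction: float) -> float:
    """Age of an ancestor, given a dated subclade whose crown depth is a known
    fraction of the ancestor's depth.

    Assumes a uniform clock, so ages are proportional to root-to-tip depths.
    """
    if not 0 < clade_depth_fraction <= 1:
        raise ValueError("subclade depth must be a fraction of the ancestral depth")
    return clade_age_my / clade_depth_fraction


if __name__ == "__main__":
    # ~820 My halophile lipids assigned to the base of the halophilic clade, whose
    # depth is taken as 70% of LACA's depth.
    print(round(ancestor_age(820, 0.70)))  # 1171 My, i.e. ~1.17 Ga
```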

If archaebacteria are of similar age to eukaryotes, their longer branches imply that RPs of shorter-branch archaebacteria evolve ~ 2.5× faster than RPs in most eukaryote lineages and thus about 5.8 times faster than those of most eubacteria. Some archaebacterial lineages, notably many DPANN, evolve much faster still, which makes their accurate placement problematic (see below); RP evolutionary rate disparity within DPANN is greater than shown on our trees, as the longest branches were omitted to reduce long-branch artefacts. In eubacteria, except for mitochondria, we omitted none for that reason, so eubacterial RPs are genuinely more clock-like than archaebacterial ones. A few extra-long eukaryotic branches (notably free-living Foraminifera and genomically reduced intracellular parasitic microsporidia and retarian Mikrocytos) were omitted for the same reason, but most eukaryote lineages have even more uniform branch lengths than eubacteria, indicating that, even though mean amino acid substitution rates are higher than in eubacteria, their relative rates are mostly more constrained.
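The ~5.8-fold figure is the product of two relative rates given in the text, since rate ratios compose multiplicatively (a/b × b/c = a/c). A short sketch with the rounded factors 2.5 and 2.3; the constant names are illustrative, and the result inherits the rounding of its inputs.

```python
import math

# Rounded rate factors quoted in the text (assumed to apply to "most" lineages):
ARCHAEBACTERIA_VS_EUKARYOTES = 2.5  # short-branch archaebacterial RPs vs most eukaryotic RPs
EUKARYOTES_VS_EUBACTERIA = 2.3      # most eukaryotic RPs vs most eubacterial RPs


def chained_rate_ratio(*ratios: float) -> float:
    """Relative rates compose multiplicatively: (a/b) * (b/c) = a/c."""
    return math.prod(ratios)


if __name__ == "__main__":
    ratio = chained_rate_ratio(ARCHAEBACTERIA_VS_EUKARYOTES, EUKARYOTES_VS_EUBACTERIA)
    print(round(ratio, 2))  # 5.75, i.e. the ~5.8-fold figure in the text
```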

The difficulty of deciding whether eukaryotes are sisters of archaebacteria or branch deeply within them proves that archaebacteria cannot be as much as 3–4 times as old as eukaryotes, as eubacteria probably are: if they were, eukaryotes should branch shallowly within them with maximal support, which no sequence trees show. Thus, the relative proportions of ribosomal trees combined with fossil evidence for eukaryote recency have long proved that archaebacteria cannot be as old as eubacteria, as Cavalier-Smith (1987c) first emphasised and later elaborated in detail (Cavalier-Smith 2002a, 2006a, d). Therefore, archaebacteria are much younger than eubacteria. Most evidence indicates that they are also substantially younger than cyanobacteria, which almost certainly evolved before the great oxygenation event (GOE) of 2.4 Gy ago that made the atmosphere oxidising, and which left the best eubacterial morphological fossils (but see later discussion of SMC protein evolution claimed to show cyanobacteria as younger than archaebacteria). Contrary to their name, archaebacteria are the youngest, not the oldest, major bacterial group and are irrelevant to the origin of life. They have often been assumed to be ancestrally anaerobic (Weiss et al. 2016), but a more critical reevaluation of the evolution of aerobic respiratory chains in a later section shows that they were not; they were ancestrally facultative aerobes that evolved a novel kind of methanogenesis different from the likely earlier aerobic version recently discovered in eubacteria (Teikari et al. 2018).

Widespread, but mistaken, beliefs that archaebacteria are as old as eubacteria stem from misinterpreting the significance of the two long bare stems on rRNA and some protein trees (including RPs) located between (1) archaebacteria and eubacteria (called the neomuran stem as it is at the base of the neomuran clade: Cavalier-Smith 2002a) and (2) the ancestral prokaryotes and derived eukaryotes (the eukaryote stem), as well as similar long bare stems that join the subtrees of protein paralogue trees of molecules like protein synthesis elongation factors (EF) (Cavalier-Smith 2002a, 2006c, 2014). EF subtrees also have long bare internal neomuran and eukaryote stems (Baldauf et al. 1996); as in RPs, the neomuran stem is longer than the eukaryote stem, indicating greater sequence change in ribosome-related proteins during the origin of neomura than during the origin of eukaryotes, but the interparalogue stem is longer still. That greater length does not imply a longer time span, but much faster evolution during a brief time than occurred within any of the three terminal bushes. Ultrarapid evolution for a short period followed by deceleration is the general explanation for the greater length of these stems than of the bushes. Episodic hyperacceleration also explains the bareness (no side branches), as ultrarapid evolution was so short-lived that no radically different subgroups evolved before rates returned to the normal low ones maintained by strong purifying selection: for detailed explanation, see Cavalier-Smith (2002a, 2006c). Cavalier-Smith et al. (2018) use Foraminifera, which display similar inflated stems on multiprotein trees but have billions of well-preserved and well-dated fossils, to prove that this explanation of long bare stems applies equally to them.
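A toy calculation makes the rate × time argument explicit. All rates and durations below are hypothetical, chosen only for illustration and not estimated from our trees: a 50-fold burst lasting 100 My produces a stem five times longer than a 1 Gy bush evolving at the background rate, and reading that stem with the background clock would misdate it as 5 Gy long.

```python
from typing import List, Tuple

# Each segment is (rate in substitutions per site per My, duration in My).
# The values used below are illustrative assumptions only.
Segment = Tuple[float, float]


def branch_length(segments: List[Segment]) -> float:
    """Expected substitutions per site accumulated along a lineage: sum of rate * duration."""
    return sum(rate * duration for rate, duration in segments)


def naive_clock_duration(length: float, background_rate: float) -> float:
    """Duration one would infer by (wrongly) assuming the background rate throughout."""
    return length / background_rate


if __name__ == "__main__":
    background = 0.001                  # hypothetical slow rate under strong purifying selection
    bush = [(background, 1000.0)]       # 1 Gy of slow evolution within a bush
    stem = [(50 * background, 100.0)]   # brief 50-fold burst lasting 100 My
    print(round(branch_length(bush), 3), round(branch_length(stem), 3))  # 1.0 5.0
    print(round(naive_clock_duration(branch_length(stem), background)))  # 5000 My, not 100
```

Because branch length is the sum of rate × duration over segments, a single long bare stem cannot by itself distinguish a long period of slow change from a brief period of rapid change; independent, e.g. fossil, calibration is needed.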

The great length of the EF interparalogue stem was caused by rapid adaptive evolution to make two different proteins with substantially different functions (EF-Tu and EF-G); during that divergent adaptation, directional selection for novelty was strong, but once the two distinct GTPase functions were largely perfected, most selection was against further change, so evolutionary rates plummeted to a low level throughout eubacteria (dependent largely on the relative strengths of mutation pressure and purifying/stabilising selection). That divergent change happened before the last universal common ancestor of all life (LUCA), which fossil evidence and sequence trees (summarised above), in conjunction with much cell biology, tell us must have been the same as the last eubacterial common ancestor, not an imaginary ‘progenote’ as postulated by Woese and Fox (1977a, b). By contrast, episodic hyperacceleration in the neomuran stem did not occur close to LUCA, as wrongly assumed without any evidence (Woese and Fox 1977a, b), but ~ 2.5 Gy later, and must have been caused by novel changes during the neomuran revolution, when cotranslational synthesis and secretion of N-linked glycoproteins evolved after eubacterial murein was lost; this entailed coevolutionary changes in the signal recognition particle (SRP) and the evolution of all the neomuran RPs for which homologues are unknown in eubacteria: the most radical change in protein synthesis in the history of life.
The major SRP protein (SRP54/Ffh) and its receptor (SRα/FtsY) also arose by gene duplication and extensive divergence before LUCA, during which Ffh evolved a new C-terminal extension and FtsY a new, non-homologous N-terminal extension (Gribaldo and Cammarano 1998). The SRP/receptor paralogue tree for the shared region also has a longer neomuran than eukaryote stem, but its interparalogue stem is intermediate in length, implying that the ancient pre-LUCA divergent sequence change was less than the far more recent change during the origin of neomura.

Woese and Fox realised that the long neomuran and eukaryote stems must be caused by temporary ultrarapid evolution, much faster than that within the branched bushes, but wrongly assumed that both accelerations took place close to the origin of life, before the basic machinery of translation was perfected and proper cells evolved. Both then and later they ignored fossil evidence that crown eukaryotes are far younger than eubacteria, which shows that this assumption cannot possibly be true and that the long stem for eukaryotes at least must have been caused by radical changes to ribosomes billions of years after LUCA. They expressed the prejudice that such radical change could occur only close to the origin of life, and continued to believe that for most of life’s history ribosomal molecules have been accurate chronometers.

The case of mitochondria shows how radically wrong that was. Their ribosomes evolved from α-proteobacterial ribosomes with only 54 RPs, roughly 2.5 billion years after the first eubacterium, yet before LECA mitoribosomes are inferred to have had 72 RPs, having gained 19 new RPs not found in prokaryotes and probably lost one proteobacterial RP (Desmond et al. 2011). Numerous other major changes occurred in mitoribosomes by loss and addition of RPs in many eukaryote lineages, as well as changes in mt rDNA sequences greater than those differentiating the three domains. In most eukaryote mitochondria SRPs have been lost, and opisthokont mitoribosomes are permanently attached to the inner membrane, largely making membrane proteins. Their 82 RPs are even more numerous than the 79–80 RPs of cytosolic ribosomes (Bieri et al. 2018; Greber and Ban 2016). Thus radical changes in ribosome structure are possible long after LUCA and have occurred several times; even though such changes may cause translational errors, these errors do not prevent conservation of encoded protein sequences because they do not change the DNA germline. Probably more errors can be tolerated in mitochondria (where few different proteins are made), so purifying selection is less stringent for mitoRPs than cytoRPs; e.g. in mammals mitoRPs evolve 13 times as fast. Fig. S15 shows how non-clock-like mitochondrial RPs are compared with the far more slowly evolving eubacterial ones. The situation is even more dramatic than that figure shows, because for most eukaryotes mitoRPs were so much more divergent, and so immensely harder to align, that we omitted them from our analysis; many omitted species would have even longer branches.
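The RP tally just given can be written as a single balance, using only the counts quoted above (the symbols are introduced here for convenience):

$$N_{\mathrm{LECA}} = N_{\mathrm{anc}} - N_{\mathrm{lost}} + N_{\mathrm{gained}} = 54 - 1 + 19 = 72$$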
Figure S15 also emphasises that for mitochondria most acceleration occurs in the crown part of the tree, not in the stem, which is only a quarter to an eighth as long as the crown. This is the exact opposite of the eukaryote and neomuran stems, which are relatively much longer and so will have erased more phylogenetic signal than in mitochondria, whose stem is relatively short. Note that the eukaryote crown is immensely longer for mtDNA than for nuclear DNA even though both must be the same age, proving systematic gross acceleration for mitochondria and deceleration for nuclear RPs since LECA.

Many others appear ignorant both of Woese’s assumption of early rapid acceleration and of Cavalier-Smith’s (2002a) identification of neomuran and eukaryote stem hyperacceleration instead, as well as of the contradictions amongst protein paralogue trees as to the position of the root. They therefore mistakenly (a) place the root in the neomuran stem, and so fundamentally misunderstand early cell evolution, and (b) apply a single clock to the whole tree, leading to absurdly inflated age estimates for archaebacteria and eukaryotes (e.g. Betts et al. 2018; Blank 2009; Sheridan et al. 2003; these and others are criticised in detail in a later section). Gogarten-Boekels et al. (1995) accepted 10-fold acceleration in the neomuran stem (probably an underestimate) but even so imagined that the long neomuran stem represented a billion or so years of evolution, and speculated that the absence of any side branches during that imaginary billion years was caused by meteorite bombardment extinguishing all earlier radiating life except for two lineages that diversified to form eubacteria and neomura a billion or more years after LUCA. That interpretation is incompatible with the accurate dating of cyanobacterial origins from the RP tree, which assumes episodic hyperacceleration in the neomuran and eukaryote stems involving manyfold faster amino acid substitution than the much more nearly clock-like diversification within crown eubacteria. Episodic hyperacceleration by a much greater factor in the neomuran and eukaryote stems explains for both stems at once, and more simply than highly speculative meteorite bombardment for which there is no evidence, why they are bare. Only accepting that stem and crown rates differ by at least two orders of magnitude allows accurate detailed mapping of the whole RP tree onto the fossil record, and only that explains why the eukaryote stem plus crown branch is so much longer than the archaebacterial branch.
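How strongly the inferred stem duration depends on the assumed acceleration factor follows from inverting rate × time: duration = stem length / (crown rate × factor). A minimal sketch, assuming a single constant rate along the whole stem and using hypothetical values for stem length and crown rate (neither is an estimate from our trees):

```python
def implied_duration_my(stem_length: float, crown_rate: float, speedup: float) -> float:
    """Time (My) needed to accumulate `stem_length` substitutions per site if the
    stem evolved throughout at `speedup` times the crown rate (subs/site/My)."""
    if crown_rate <= 0 or speedup <= 0:
        raise ValueError("crown rate and speed-up factor must be positive")
    return stem_length / (crown_rate * speedup)


# Hypothetical inputs: a stem of 1.0 subs/site and a crown rate of 0.0005 per My.
STEM_LENGTH, CROWN_RATE = 1.0, 0.0005

for factor in (1, 10, 100):
    years = implied_duration_my(STEM_LENGTH, CROWN_RATE, factor)
    print(f"{factor:>3}-fold acceleration -> {years:,.0f} My")
```

Each tenfold increase in the assumed factor shortens the inferred duration tenfold, so the same stem read with a 10-fold factor implies a period ten times longer than when read with a 100-fold factor.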
Episodic ultrafast evolution affecting some molecules but not others explains why the relative proportions of the same parts of the universal tree differ so much between molecules. A later section gives a new example of a protein that has undergone local accelerations radically different from those of RPs, but to which a single clock has also been wrongly applied globally. Fundamental misinterpretation of universal trees through the entirely false assumption of a universal molecular clock is a pervasive problem for virtually all sequence trees. Refuting that assumption hundreds of times, as has been done, has sadly had no effect on many who calculate dates by computer while ignoring evidence for more massive rate changes than their algorithms can model, and who so obtain results exemplifying the principle ‘garbage in, garbage out’.

The evidence from mapping first rRNA and now RP trees onto the fossil record shows that the grossly stretched neomuran and eukaryote stems both reflect much more recent episodes of hyperacceleration in ribosomal evolution, which took place billions of years after the origin of life, most likely > 2.5 Gy later. The stretching is so great that it explains why ribosomal trees are so poor at accurately locating the root of eukaryotes and archaebacteria or identifying their precise eubacterial ancestors, even though the much more slowly evolving crown sequences of all three domains make these molecules very good for resolving their internal phylogeny, provided one uses numerous RPs and site-heterogeneous trees. They are worst for basal eukaryote phylogeny because its divergences were more sudden than the equally numerous eubacterial ones, for which we believe Fig. 5 gives the most accurate tree to date.

Unfortunately, Woese’s mistaken assumptions and ill-defined, erroneous notion of a progenote lying midway along the neomuran stem have been so pervasively influential that many archaebacterial researchers, similarly ignorant of the fossil and other evidence against it, still imagine that archaebacteria are ancient. So do some others who refuse to take that evidence seriously (e.g. Koonin: see his comments as a referee of Cavalier-Smith (2006c)) and certain others with axes to grind for interpretations that are entirely untenable and at variance with the evidence summarised here.

Taxon-rich multi-RP trees are reasonably accurate within domains

We expected trees of 51 eukaryote RPs to be less accurate than earlier 187-protein trees, 26-RP trees to be less accurate still, and ML trees to be less accurate than CAT trees. We also expected topological deviations from 187-protein trees to be mainly in areas where numerous branches diverge almost simultaneously, traditionally the hardest to resolve consistently: notably at the base of Hacrobia, Plantae, and scotokaryotes. Our RP trees confirm all four expectations, as explained above. We also expected that the more distant the outgroups, the more likely the internal phylogeny of eukaryotes would be perturbed. As predicted, eukaryote-only or neomuran-only trees were generally more concordant with 187-protein trees than were three-domain trees; the worst trees for eukaryotes, with the lowest basal resolution and the most contradictions between chains, were the three-domain trees for 26 proteins. Yet even these were markedly more accurate for eukaryotes than most previously published three-domain trees, which had much sparser taxon sampling. Internal phylogeny of nearly all major clades was the same on both 51- and 26-RP trees as with 187 proteins, and except in the three difficult regions the relationships amongst those clades were also the same. Most were strongly or maximally supported with 51 RPs, but support was lower for some with only 26 genes. Even so, our trees include a very few seriously wrong placements of major eukaryote branches with high support, though markedly fewer than on previous multidomain trees, all of which were undersampled for eukaryotes.

We conclude that taxon-rich trees of 51 RPs are reasonably accurate for eukaryotes provided site-heterogeneous methods are used, but are not perfect and thus cannot substitute for trees with hundreds of genes. This is primarily because basal eukaryote branches are so numerous and so tightly clustered that only small numbers of phylogenetically informative changes that are still conserved can have occurred in the stems of the deepest branches. It is therefore not worth discussing in detail the few deviations from genically more comprehensive trees. A combination of hundreds of proteins, site-heterogeneous methods, and care to exclude the fastest evolving positions is necessary to establish accurately the most difficult parts of eukaryote branching topology (Kang et al. 2017). The extremely tight clustering of basal eukaryotic lineages on RP trees confirms earlier arguments that the basal eukaryotic radiation was indeed explosive, a pattern not dismissible as an artefact of substitution saturation.

That is strikingly shown by the basal branching of eubacteria, which are about four times as old and whose basal branches are much more spread out, likely because they diverged more gradually. The contrast is clear on Figs. 8, 9, and 10, where, especially for neokaryotes, basal radiation resembles an explosive big bang (as previously emphasised for rDNA: Philippe and Adoutte 1996, 1998) that is necessarily difficult to resolve. Within eubacteria that problem is less severe, for basal branches on Fig. 5 are almost all strongly supported by CAT, though markedly less so by ML. Stronger support for the eubacterial tree backbone stems primarily from basal branches being more spread out in time, so more differences could accumulate between successive branches between phyla than was possible for the basal neokaryote radiation, which probably took only a few tens of millions of years around 800 Ma. A second reason why basal eubacterial branching is highly credible is that RP evolutionary rates must be only about half as fast in eubacteria as in eukaryotes, because the eubacterial crown is on average only about twice as deep as the neokaryote crown despite being about four times older (3.5 Gy). That age is set by ¹³C/¹²C isotopic ratios in ancient hydrocarbons, interpreted as evidence for RuBisCo photosynthetic carbon fixation, which is restricted to eubacteria (specifically Negibacteria). Only one feature of the eubacterial backbone appears doubtful: the relative positions of the non-photosynthetic negibacterial phyla Hadobacteria and Fusobacteria, which are sometimes successive and sometimes sisters. Its overall pattern appears robust, much more so than past rDNA trees, and in places differs distinctly from them, as detailed below.
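The ‘about half as fast’ inference compares depth divided by age for the two crowns. A minimal sketch, assuming each crown evolved at a roughly constant mean rate of its own, with depths in relative units (neokaryote crown = 1) and ages taken from the text:

```python
def relative_rate(depth_a: float, age_a: float, depth_b: float, age_b: float) -> float:
    """Mean rate of crown A relative to crown B, taking each crown's mean
    rate as its root-to-tip depth divided by its age (consistent units)."""
    return (depth_a / age_a) / (depth_b / age_b)


# Rounded ratios from the text: eubacterial crown ~2x as deep and ~4x as old.
print(relative_rate(depth_a=2.0, age_a=4.0, depth_b=1.0, age_b=1.0))  # 0.5

# Same relative depths with the quoted ages: 3.5 Gy (eubacteria), ~0.8 Gy (neokaryotes).
print(round(relative_rate(depth_a=2.0, age_a=3.5, depth_b=1.0, age_b=0.8), 2))  # 0.46
```

With the quoted ages the eubacterial rate comes out a little below half the neokaryote rate, consistent with ‘about half’; the estimate is only as good as the constant-rate-per-crown assumption.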

Robustness of the eubacterial tree allows us to conclude that some eubacterial phyla are much younger than others. For example, given the rooting shown, cyanobacteria (ancestors of chloroplasts) are notably younger than Chloroflexi, Endobacteria, or any gracilicute phylum, assuming a mean molecular clock (a reasonable assumption, as eubacterial branch lengths are broadly similar, differing by less than twofold, unlike those of neomura). Taking the mean of the tip lengths of Gloeobacteria and the somewhat longer ones of subphylum Phycobacteria (i.e. cyanobacteria with thylakoids: Cavalier-Smith 2002a) to represent the present, the proportions of the Fig. 5 tree suggest an age of ~ 1.3 Gy for crown cyanobacteria and ~ 2.3 Gy for stem cyanobacteria. As the latter closely matches the GOE (~ 2.4 Gya), oxygenic photosynthesis likely originated close to the divergence of Cyanobacteria and Melainabacteria. This close agreement between fossil evidence and our RP tree rooted on Chloroflexi itself supports our rooting; there would be no such agreement if (as far too many suppose) the tree were rooted halfway along the neomuran stem. The chloroplast stem emerges from cyanobacteria later, at ~ 1.0 Ga, but that could be an overestimate if its longish branch is artefactually deep because of LBA. Likewise, α-proteobacteria, the ancestors of mitochondria, whose age sets an upper limit to that of eukaryotes, appear to be > 2 Gy younger than negibacteria, consistent with their last common ancestor being aerobic, and give an extreme upper bound of ~ 1.1 Ga for the origin of crown eukaryotes; the position of the mitochondrial stem within proteobacteria on Fig. S15 corresponds to ~ 0.97 Ga. Even this may be a bit too old for crown eukaryotes if the long mitochondrial branch is placed somewhat too low within α-proteobacteria, as can happen by LBA. A slightly younger date would fit the absence of eukaryote-like steranes before 820 My (Brocks et al. 2017), the absence of definite neokaryote cellular fossils before 760 Ma (Cavalier-Smith 2013a), and the idea that neomura date back only to ~ 850 My (Cavalier-Smith 2002a). With a more robust tree, we can map other evolutionary events onto it and better evaluate claims of LGT. For example, later sections argue that the role of LGT has been exaggerated in the evolution of photosynthesis, respiration, and nitrogen fixation, and that LUCA was a negibacterial anaerobic photosynthesiser with nitrogen fixation, respiratory electron transfer abilities, and eubacterial flagella.
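Converting tree proportions into the ages quoted above amounts to scaling node depths under a mean clock. A hedged sketch, assuming the eubacterial root is calibrated at 3.5 Gy as stated earlier and ignoring the averaging of Gloeobacteria and Phycobacteria tip lengths (the helper names are illustrative only):

```python
ROOT_AGE_GY = 3.5  # assumed calibration of the eubacterial crown, from the text


def node_age_gy(node_depth: float, root_depth: float,
                root_age_gy: float = ROOT_AGE_GY) -> float:
    """Age of a node under a strict mean clock: its depth below the present-day
    tips scaled by root age / root depth (depths in any consistent unit)."""
    if root_depth <= 0 or not 0 <= node_depth <= root_depth:
        raise ValueError("need root_depth > 0 and 0 <= node_depth <= root_depth")
    return root_age_gy * node_depth / root_depth


def depth_fraction(age_gy: float, root_age_gy: float = ROOT_AGE_GY) -> float:
    """Inverse mapping: fraction of the root-to-tip depth for a node of given age."""
    return age_gy / root_age_gy


for label, age in [("crown cyanobacteria", 1.3), ("stem cyanobacteria", 2.3)]:
    print(f"{label}: {depth_fraction(age):.2f} of root-to-tip depth")
```

Under these assumptions crown cyanobacteria would sit at roughly 37% and the cyanobacteria/Melainabacteria split at roughly 66% of the root-to-tip depth; `node_age_gy` performs the reverse conversion from a measured depth to an age.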

For basal archaebacteria the RP tree is markedly less well resolved, for three reasons. First, the deep branches are part of an explosive radiation, as in eukaryotes, not well spread out as in eubacteria, so fewer ancient changes can have occurred between them. Second, archaebacterial RPs evolve faster than eubacterial or eukaryote ones: their branches are longer than those of eubacteria and around three times longer than those of eukaryotes, even though the oldest fossil evidence for archaebacterial lipids (820 My ago) suggests that archaebacteria are the same age as eukaryotes (for which fossil steranes complex enough to indicate eukaryotes are no older than 720–820 Ma: Brocks et al. 2017), as does the LGT from chloroplasts noted above. Third, archaebacterial lineages are markedly more unequal in evolutionary rate than eubacterial ones. DPANN lineages (secondarily miniaturised archaebacteria with exceptionally diverse rates, probably because of their simplified genomes) are a nuisance for tree reconstruction, as they have likely lost most of the information that would accurately place them (see below). Despite these difficulties, however, the bipartition between Euryarchaeota/DPANN and Filarchaeota is consistently strong in archaebacteria-only trees, and most of their branching topology other than for DPANNs appears reliable, at least for site-heterogeneous trees (somewhat better for 51 than for 26 proteins).

Major improvements to the eubacterial tree

RP trees agree with rDNA trees in showing, with maximal or near-maximal support, the monophyly and deep distinctiveness of 10 established major groups: the eight phyla Chloroflexi, Armatimonadetes, Cyanobacteria, Hadobacteria (=Deinococcus/Thermus group), Fusobacteria, Spirochaetae, Planctobacteria (largely = PVC group), and Sphingobacteria (largely = FCB group), and the subphyla Actinobacteria and Endobacteria, which we now rank as separate phyla because our RP trees show they are not sisters. Unlike many recent eubacterial ‘phylum’ names in common use, all these taxon names were validly published (Cavalier-Smith 2002a; Tamaki et al. 2011), even though the International Code of Nomenclature of Prokaryotes (ICNP) does not apply to categories ranked above class (Parker et al. 2014). Though Hadobacteria was recently rejected as a class name (Tindall 2014) for unspecified reasons that may be invalid, we use it here at its original, non-rejected phylum rank (Cavalier-Smith 1992b, 1998b), as it is less cumbersome than the three-word ‘group’ name. Our trees confirm that candidate phylum ‘Melainabacteria’ (which lacks cultured representatives (Di Rienzi et al. 2013; Utami et al. 2018) except for predatory Vampirovibrio (Soo et al. 2015b)) is sister to Cyanobacteria, and show for the first time that their joint clade is probably sister to Armatimonadetes. Eoglycobacteria is a suitable new name for this robust clade comprising Cyanobacteria, Melainabacteria, and Armatimonadetes, as it is apparently the earliest branching glycobacterial clade; it is best ranked in formal classification as a subkingdom. Glycobacteria was introduced as the infrakingdom name for all eubacteria with outer membranes containing LPS (Cavalier-Smith 1998b).
Our RP trees also reveal two major, previously unrecognised thermophilic clades (Synthermota, including hyperthermophilic Thermotogales, and Aquithermota, including hyperthermophilic Aquificales; both formally established as new phyla in the Taxonomic Appendix) and confirm that Proteobacteria are phylogenetically much broader than generally appreciated, supporting the broadening of Proteobacteria in the eubacterial classification of Cavalier-Smith (2002a). Contrary to the trees of Yutin et al. (2012) and Boussau et al. (2008b), but in agreement with most rDNA trees, Thermotogia and Aquificia are not sisters. Contrary to Lasek-Nesselquist and Gogarten (2013), Thermotogia, Aquificia, and Synergistetes are not a clade; Raymann et al. (2015) excluded Aquificia, Synergistetes, and Fusobacteria, so could not test this. Our trees also show that Elusimicrobia are better included in Planctobacteria, and Gemmatimonadetes in Sphingobacteria, than treated as separate phyla as in the past. Thus, the whole diversity of major named eubacterial groups can now be placed in just 14 robustly monophyletic phyla, as summarised in Fig. 11 and Table 1, a great simplification compared with the 29 of Ruggiero et al. (2015).

Fig. 11

The 14 eubacterial phyla recognised here. For two exceptionally diverse phyla (Proteobacteria, Sphingobacteria), their three major subbranches, here ranked as subphyla (but often treated as several smaller phyla), are also shown. Support for the monophyly of each (from Fig. 5) is extremely high, as is CAT-tree support from Fig. 5 for their relative branching order, except for the position of Hadobacteria, which sometimes appear as sister to Fusobacteria (dashed arrow). Their branching order is otherwise very stable on site-heterogeneous trees restricted to Eubacteria, but adding one or both highly divergent neomuran groups on multidomain trees makes it less stable, there being a strong tendency for the major thermophilic phyla (Aquithermota, Synthermota) to group together or become partially intermixed with Hadobacteria/Fusobacteria; these changes are likely artefacts. Phyla with some photosynthetic members are in green; the different types of photosynthetic reaction centres (RC and characteristic deletions) and the presence of FMO, chlorins, phycobilisomes (PB), and chlorosomes (cs) are mapped onto the tree. It is unknown whether uncultured Candidatus Palusbacteriales (‘Eremiobacteria’: Ward et al. 2019) have chlorosomes. As they were not in our analyses, their likely position in Armatimonadetes (dashed line) is only weakly established; their discovery increases the likelihood that ancestral eubacteria (i.e. LUCA) had RCII. The position of neomura (dashed line) is based on two-domain RP trees (see text)

Our trees strongly support monophyly of the clade Gracilicutes, established at infrakingdom rank to embrace Proteobacteria, Spirochaetae, Planctobacteria, and Sphingobacteria on the basis of a combination of indels, rDNA trees, and ultrastructure (Cavalier-Smith 2006c), but show that the branching order then deduced by cladistic arguments is almost certainly incorrect. In all our eubacterial trees, Planctobacteria and Sphingobacteria are sisters, forming a clan here designated Planctochlora that is robustly sister to Spirochaetae, with Proteobacteria always sister to Spirochaetae plus Planctochlora. That is precisely the gracilicute branching order that Yutin et al. (2012) found using 50 RPs and FastTree, which is slightly less accurate than the RAxML used here (Price et al. 2010) and must be substantially less accurate than PhyloBayes CAT (their WAG evolutionary model is also less accurate than the LG model used for ML here). This branching order and Gracilicutes as a clade were all strongly supported in the 56-protein eubacterial ML tree of Boussau et al. (2008b), though they did not sample all major planctochloran groups. This gracilicute branching order is conserved in nearly all our multidomain trees, so is robust to inclusion of highly divergent neomuran relatives. (We refer to Planctochlora as a clan not a clade because many multidomain trees imply that neomura evolved from Planctochlora; if that is correct, they are paraphyletic.) The pioneering multidomain site-heterogeneous trees of Lasek-Nesselquist and Gogarten (2013) and Raymann et al. (2015) also found a Planctochlora clade, but in them spirochaetes branched one node lower rather than as its sister, possibly because they included a much narrower range of Proteobacteria than we did.

The non-gracilicute part of our trees differs from that of Yutin et al. (2012) in numerous ways. A highly misleading feature of their results, discordant with almost all other studies (references in Davis et al. 2013), is that Mollicutes grouped with Fusobacteria rather than within Bacilli as in our trees and those of Yutin and Galperin (2013), Boussau et al. (2008b), and Davis et al. (2013). Lasek-Nesselquist and Gogarten (2013) and Raymann et al. (2015) both excluded Mollicutes. Eubacteria-only trees robustly place Aquithermota as sister to Gracilicutes, with Fusobacteria, Hadobacteria, and Synthermota branching successively more deeply; they also robustly show that these five groups collectively form a major clade that we call Neonegibacteria, as it embraces all negibacteria except the deep-branching Eonegibacteria (Chloroflexi, Armatimonadetes, Cyanobacteria, Melainabacteria) and two lineages belonging in Endobacteria. These findings will be discussed individually after considering Endobacteria (often confusingly called Firmicutes), long an evolutionarily and taxonomically problematic group because it includes both negibacterial and posibacterial phenotypes, as our trees strongly confirm. Endobacterial diversity raises so many important evolutionary questions that we treat them in seven sections.

Striking evolutionary diversification of Endobacteria

When Actinobacteria and Endobacteria were established as subdivisions (=subphyla), they were assumed to be sisters, as some but not a majority of rDNA trees had shown, and were grouped together in phylum Posibacteria believed to be ancestrally characterised by a shared thick murein wall (Cavalier-Smith 2002a). Conceptually, Posibacteria (originally ranked as phylum: Cavalier-Smith 1987b) included all eubacteria then believed to lack an outer membrane (OM) (Cavalier-Smith 1987c) and did not refer to their Gram-positive staining as it was clear at the outset that some posibacteria (notably Mollicutes) stained Gram negatively and some negibacteria with OMs had thicker walls and stained Gram-positively (e.g. Deinococcus). It was assumed that endospore-forming bacteria (e.g. Selenomonas) that stain Gram negatively because they lack a thick wall had an outer membrane (OM) with lipopolysaccharide (LPS) and were ancestral to Posibacteria, postulated to have arisen from them by a single loss of murein (Cavalier-Smith 1987b, c), so ‘Selenobacteria’ were excluded from Posibacteria and tentatively grouped (as subphylum) with Fusobacteria and Fibrobacteria as new glycobacterial phylum ‘Eurybacteria’ (Cavalier-Smith 1998b). After it was found that all Heliobacteria made endospores (Kimble-Long and Madigan 2001), their relationship to ‘Selenobacteria’ appeared stronger despite no LPS having been found in Heliobacteria (Beck et al. 1990). As it then appeared that the earlier assumption that ‘Selenobacteria’ had an OM was mistaken, both groups were transferred to the new posibacterial subphylum Endobacteria and placed in class Togobacteria on the assumption that the toga of Thermotogales was an S-layer as the outermost layer of Heliobacteria appeared to be (Cavalier-Smith 2002a). 
Later, evidence accumulated that ‘Selenobacteria’ actually have an OM not an S-layer, so both were removed from Posibacteria and grouped with Fusobacteria as a phylum ‘Eurybacteria’ (Cavalier-Smith 2006d), which though used subsequently (Cavalier-Smith 2009, 2010a, 2014) was never validated nomenclaturally and eventually abandoned as polyphyletic (Ruggiero et al. 2015).

Most ‘Selenobacteria’, including Selenomonas, Sporomusa, and other endospore-forming genera and close relatives clearly having an OM, were recently formally grouped as class Negativicutes (Marchandin et al. 2010). However, that class name is not valid under the latest edition of ICNP, which requires class names to be formed by adding -ia to the stem of the type order of the class (here Selenomonadales). We therefore establish new class Selenomonadia in accord with that rule (Taxonomic Appendix). Genome sequencing confirmed that ‘Negativicutes’ have an OM with LPS (Campbell et al. 2014) and led to their classification into three orders (Campbell et al. 2015). Selenomonadia (=Negativicutes) is invariably a robust clade, always nested within unimembranous groups without an OM. Their sister is Dethiobacteria; the only electron micrograph (Sorokin et al. 2008) is too fuzzy to show whether its outermost dense layer is an OM or an S-layer (which we consider more likely, as we found no genomic evidence in GenBank for OM-related proteins). Genome sequencing gave no evidence for an OM in Heliobacterium, which on our trees groups not with Selenomonadia but strongly with Syntrophothermus (classified with it in Clostridiales) and Carboxydothermus (placed in the separate order Thermoanaerobacterales). Genome sequencing also gave no evidence for an OM in Carboxydothermus or Syntrophothermus (Djao et al. 2010; Wu et al. 2005); ultrastructurally, Carboxydothermus clearly has only a single membrane and thin peptidoglycan (Wu et al. 2005), and Syntrophothermus also appears to have an outer S-layer and thin murein but no OM (Sekiguchi et al. 2000), though its micrographs are fuzzier.
Thus, the maximally supported clade comprising Carboxydothermus, Syntrophothermus, and Heliobacterium (here all grouped in new order Heliobacteriales) appears to be uniformly monoderm in phenotype, without an OM, yet with much thinner murein than the robust clostridial subclade comprising Clostridium, Oscillibacter, and Anaerostipes, which we refer to as Clostridiales sensu stricto (s. s.); as noted below, the heliobacterial clade appears to lack teichoic acids, unlike thick-walled endobacteria. The 50-RP ML tree of Yutin and Galperin (2013), using Treefinder LG+G, also excluded Carboxydothermus, Syntrophomonas, and Heliobacterium from both Clostridiales s. s. and Thermoanaerobacterales, though they did not form one clade. However, in a taxonomically far richer PhyloBayes CAT analysis of 21 RPs from Clostridia only, Carboxydothermus, Syntrophothermus, and Heliobacterium formed a robust clade branching in the same order (Kunisawa 2015); that study also robustly showed Clostridia s. s. as a clade.

Thus, in all three studies, Carboxydothermus does not group with other Thermoanaerobacterales, which on our trees form a completely robust clade within Endobacteria comprising Thermosediminibacter (flagellate Gram-negative thermophilic anaerobes, with no thin-section EM and no mention of OM proteins in the genome report: Pitluck et al. 2010), Thermoanaerobacter (thermophilic anaerobes, some with endospores, and no OM), Caldicellulosiruptor (flagellate asporogenous hyperthermophilic anaerobes with posibacterial-type cell walls), and Caldanaerobacter (anaerobic spore formers). In Kunisawa’s (2015) tree, however, these Thermoanaerobacterales appeared paraphyletic, as Clostridiales s. s. grouped within them. The position of Clostridiales s. s. was inconsistent on our CAT trees: sister either to Thermoanaerobacterales or to Bacilli/Mollicutes. Kunisawa’s analysis is probably more reliable in this respect because of its richer taxon sampling (though he excluded Mollicutes), so we suspect that Clostridiales s. s. and Thermoanaerobacterales form a joint clade, with Thermoanaerobacterales ancestral to Clostridiales. That would be consistent with both having thick murein walls and being anaerobic, whereas thick-walled Bacilli are largely aerobic. Cavalier-Smith (2006c), considering Clostridiales too diverse, published a separate order Heliobacteriales (not yet validated). Now that it is certain that Heliobacterium does not group with Clostridiales s. s. and Carboxydothermus does not group with Thermoanaerobacterales s. s., we expand Heliobacteriales to include Syntrophothermus and Carboxydothermus (Taxonomic Appendix).

The deepest branch in Endobacteria is Gram-negative Halothermothrix (order Halanaerobiales), whose genome reveals a typical glycobacterial OM with lipopolysaccharide (LPS) and typical endosporulation genes. The second deepest branch comprises Symbiobacterium, Thermaerobacter, and Sulfobacillus, whose branching topology is maximally supported, yet which are all also classified in Clostridiales, showing Clostridiales to be deeply paraphyletic (or polyphyletic: see below). Thermaerobacter, which often stains Gram-negatively, lacks an OM (Spanevello et al. 2002) and forms no spores. Sulfobacillus thermophilus is spore forming; neither its genome nor those of five other species gave evidence of an OM or LPS. Symbiobacterium forms endospores (Ueda et al. 2004), but its genome shows no evidence of an OM or LPS. Thus, this clade appears uniformly monoderm in membrane topology; we remove it from Clostridiales as the separate new order Sulfobacillales (Taxonomic Appendix). Kunisawa (2015) included in his analysis Thermodesulfobium and Coprothermobacter, which were then assumed to be Clostridiia (Ludwig et al. 2009b). Our trees all decisively exclude them from Endobacteria and show, with strong support, that they are successive sisters to Caldisericum, often unwisely placed in its own phylum. They further show that this joint clade is robustly sister to Dictyoglomus, also unwisely given its own phylum, and that this wider clade is robustly sister to Thermotogia, forming the thermophilic clade Thermocalda, which on most of our trees is strongly sister to Synergistia, also unnecessarily treated as a separate phylum. This negibacterial clade is here called phylum Synthermota (see Taxonomic Appendix). Our analyses therefore fully confirm for the first time Kunisawa’s suspicion, based on gene order and gene absence, that Thermodesulfobium and Coprothermobacter are neither endobacteria nor sisters of each other, and lie far from Endobacteria on the tree.

Polyphyly of Mollicutes

Our trees strongly show that Mollicutes nest firmly within Bacilli, so must be derived from them by murein loss. The first rDNA trees grouped Mycoplasma with Clostridia/Bacilli but lacked the resolution to pinpoint their origin (Fox et al. 1980). Our trees robustly group the mycoplasma Mesoplasma with Erysipelothrix and Coprobacillus (both in order Erysipelotrichales). However, they group another mollicute clade, comprising Acholeplasma and Haloplasma, with Turicibacter instead, with maximal support. Turicibacter sanguinis is a non-flagellate, anaerobic, walled Gram-positive bacterium (Bosshard et al. 2002). It has genes for sporogenesis, although spores have not been observed (Cuiv et al. 2011), and it was previously found to group with Haloplasma (neither methods nor tree shown) (Auchtung et al. 2016). On 16S rRNA trees, Acholeplasma and Haloplasma did not group together, though Haloplasma did group with Turicibacter and numerous environmental DNA lineages, including the mollicute Candidatus Izimaplasma (Skennerton et al. 2016). Our trees strongly show that Mollicutes are polyphyletic and evolved twice from different Bacilli by two independent wall losses. This was first found by Davis et al. (2013), but slightly less convincingly, as their Acholeplasma/‘Phytoplasma’ clade was isolated and did not group with Turicibacter, a normal walled endosporogenous bacterium. The assertion that Mollicutes are monophyletic (Grosjean et al. 2014) was mistaken; their polyphyly needs to be recognised in future studies of their reductive evolution from Bacilli. The 34-RP ML tree of Davis et al. (2013) showed that Spiroplasma is related to Mycoplasma, but that Acholeplasma and ‘Phytoplasma’ form a separate clade, which however did not group with Turicibacter. On our ML trees, Acholeplasma (with a much longer branch than Haloplasma) likewise failed to group either with Haloplasma or with Turicibacter (but support for that alternative is weak), whereas Haloplasma always grouped with Turicibacter by both methods.
We attribute these ML discrepancies to Acholeplasma-associated long-branch artefacts.

Yutin and Galperin (2013) found that the robust Mesoplasma/Mycoplasma clade was sister to Erysipelothrix plus Clostridium ramosum and C. spiroforme. They correctly argued that both of these species should be excluded from Clostridium (unfortunately their new genus Erysipelatoclostridium seems not yet to be validly published). That is entirely consistent with our trees, where Acholeplasma never groups with Mesoplasma or the Erysipelothrix/C. ramosum subclade but branches more deeply. However, as they did not include Turicibacter, they did not realise that Mollicutes evolved twice from two independent branches of the walled bacterial family Erysipelotrichaceae. Erysipelothrix has distinctive murein peptidoglycan chemistry (Schubert and Fiedler 2001). Cladistically, therefore, mollicutes are secondarily simplified Bacillia and do not merit a separate class Mollicutes, which anyway would be polyphyletic. Still less do they deserve a separate phylum, which was first also called Mollicutes (Gibbons and Murray 1978), but later (confusingly) Tenericutes (Murray 1984). Davis et al. (2013) correctly and strongly criticised separate phylum status. We urge that class Mollicutes and phylum Tenericutes both be abandoned, and that Mycoplasmatales and Acholeplasmatales, their two oldest orders, be placed directly within a here broadened class Bacillia (see Taxonomic Appendix). Here, we group them with their ancestral (paraphyletic) order Erysipelotrichales as a new subclass Erysipelotrichidae embracing all three orders, which together form a strong clade on our RP trees and those of Davis et al. (2013). Erysipelotrichia Ludwig et al. 2010 was established as a class (Ludwig et al. 2009a) to contrast with another new class, Bacilli (Ludwig et al. 2009a). However, it was not then appreciated how shallowly and robustly Erysipelotrichia nest within Bacilli, as shown by our trees and those of Davis et al. (2013).
Excluding Erysipelotrichales from Bacilli and mollicutes from Erysipelotrichia, and splitting the longest established endobacterial class Firmibacteria (=Teichobacteria Cavalier-Smith 2002) into separate classes Clostridia and Bacilli (Ludwig et al. 2009b), unwittingly created three non-holophyletic classes at once: Clostridia, Bacilli, and Erysipelotrichia. Our decision now to abandon Erysipelotrichia and Mollicutes as classes eliminates one polyphyletic and two paraphyletic endobacterial classes. They are replaced by one broadened holophyletic class, here renamed Bacillia to conform with rule 8 of the ICNP, a change that will also prevent confusion with Bacilli excluding mollicutes. Class Clostridia remains non-holophyletic, but was recently made phenotypically more homogeneous by excluding Negativicutes (now called Selenomonadia) as a separate class (Marchandin et al. 2010). Despite rejecting class Mollicutes for formal taxonomy, we recommend retaining ‘mollicutes’ without a capital as a very useful vernacular term for wall-free Endobacteria, an important polyphyletic grade of organisation for which a general term remains necessary. Discontinuing class Mollicutes also solves the problem that this name is not valid, as it contravenes rule 8 of the current ICNP for classes (Parker et al. 2014); the same is true of Bacilli, here maintained informally for walled Bacillia.

Mollicute classification has been confused ever since they were put in a separate order Mycoplasmatales (Freundt 1955). Most recently, five orders have been in use (Ruggiero et al. 2015). However, our trees and those of Davis et al. (2013), Gundersen et al. (1994), and Skennerton et al. (2016) suggest this is excessive, as only three distinct mollicute clades are apparent. Collectively, these trees make two points clear. First, Ureaplasma (sometimes placed in a separate order Ureaplasmatales, not accepted in Bergey’s Manual) and Spiroplasma (often segregated with Mesoplasma in a separate order Entomoplasmatales) belong in the same clade as Mycoplasma. Second, Mycoplasma is itself a polyphyletic genus. We therefore abandon Ureaplasmatales and Entomoplasmatales as separate orders, placing all their genera and families in Mycoplasmatales. That makes Mycoplasmatales a clade and solves the problem of demarcation between Mycoplasmatales and Ureaplasmatales. As our trees robustly show that Haloplasma is related to Acholeplasma, there is no justification for a separate order Haloplasmatales. We here abandon that order, formally transferring Haloplasmataceae Rainey et al. in Antunes et al. 2016 to Acholeplasmatales. As Anaeroplasma is robustly related to Acholeplasma, we also transfer it from Anaeroplasmatales and abandon Anaeroplasmatales. Asteroplasma, formerly in Anaeroplasmatales, is clearly not closely related to any other mollicutes. It likely represents a third independent loss of cell walls, possibly from a deeper branching part of Bacillia rather than from Erysipelotrichales (see Davis et al. 2013; Gundersen et al. 1994). However, it is premature to create a third mollicute order for it until genome data and site-heterogeneous multiprotein trees are available.

New class Halanaerobiia

Halanaerobiales have an OM, unlike all other Clostridia left in the class after the removal of Selenomonadia. This contradicts the original definition of Clostridia, formed for endobacteria with Gram-positive walls and no OM. We therefore establish a new class Halanaerobiia to segregate them from typical Clostridia with no OM (see Taxonomic Appendix). Together with the exclusion of Selenomonadia, this for the first time makes class Clostridiia (spelling here corrected) uniformly bounded by a single membrane, by restricting it to orders Clostridiales, Thermoanaerobacterales, Heliobacteriales ord. n., and Sulfobacillales ord. n. The first two of these orders generally have thick murein walls as in Actinobacteria, whereas the others have thinner walls as in negibacteria. We argue that their thin walls and those of Halanaerobiia are the ancestral condition for Endobacteria. On this view, the thicker walls of non-mollicute Bacillia and Clostridiales/Thermoanaerobacterales are secondarily derived, independently of the thick walls of Actinobacteria. Thus Endobacteria now comprise two classes (Halanaerobiia, Selenomonadia) with typical negibacterial envelopes (OM and thin murein). They also comprise two classes (anaerobic Clostridiia, often aerobic Bacillia) without an OM but with murein that may be thick, thin, or absent. A thin-walled Bacillus mutant shows that even thick-walled endobacteria can exist in a thin-walled state, and that a thin wall is present all around the prespore cell during sporulation (Tocheva et al. 2013). We suggest that endosporulation evolved in a thin-walled ancestral endobacterium similar to Halanaerobiia. The same thin-walled sporulation mechanism would then have persisted after OM losses, and after polyphyletic secondary thickening produced a thick-walled posibacterial state convergently with Actinobacteria.

This four-class classification reflects fundamental endobacterial diversity in cell organisation better than previous schemes. We do not agree with Yutin and Galperin (2013) that the nesting of Selenomonadia within other endobacteria requires their suppression as a class. Their description of sequence-tree results and morphological contrasts as ‘contradictory’ is misleading. Both are informative about different aspects of evolution and can be reconciled with a judicious evolutionary classification, as done here. The widespread Hennig-initiated prejudice against all paraphyletic taxa is evolutionarily illogical (Cavalier-Smith 1998b, 2010a). It should not be a barrier to retaining the ancestral class Clostridiia, if they were truly paraphyletic rather than polyphyletic. Some ancestral groups are taxonomically unavoidable in a sensible taxonomy that aims to classify organisms according to both their common ancestry and phenotypic disparity. This follows because evolution created derived groups from ancestral groups, sometimes radically different, that still survive.

At first sight, the presence of two negibacterial and two posibacterial classes in the same phylum is confusing. How did evolution produce this mixture, in which the two negibacterial clades do not group together? They are instead separated by (probably more than two) posibacterial ones, which also do not all group together. One possibility is that Selenomonadia got their OM by lateral gene transfer (LGT); Campbell et al. (2014) suggested from BLAST results that they may have obtained their OM-related genes by LGT from Proteobacteria. However, a complex OM requires bridges from the cytoplasmic membrane and export machinery to transport LPS, lipids, and proteins to the OM. It is highly unlikely that such an OM could have evolved in one step by LGT of scores of necessary proteins. The frequency of top hits to Proteobacteria is more probably an artefact of the vast numbers of proteobacterial sequences in GenBank. Far fewer are available for Halanaerobiia, the most likely relatives on the standard assumption of vertical inheritance. It is much more likely that the halanaerobial OM is the ancestral condition for Endobacteria and that OMs were independently lost by Clostridiia and Bacillia.

Polyphyletic losses of the endobacterial outer membrane

The number of such evolutionary losses of the OM is not entirely clear, as the relative branching order of clostridial orders and of the clearly holophyletic Bacillia is inconsistent on our CAT RP trees. For example, one chain has Selenomonadia as sister to Bacillia, whereas the other shows Bacillales s. s. as their sister, both with maximal support. There is also another maximally supported contradiction within Clostridiia, so whichever version of the tree is correct, we must postulate four separate losses. But if instead some hypothetical combined version of these trees were correct, the number of losses could be reduced to three or even two. The taxon-rich but site-homogeneous Bayesian tree of Kunisawa (2015) for Firmibacteria (i.e. excluding mollicutes) weakly makes Selenomonadia sister of Bacillia and has Clostridiia as an insignificantly supported clade. If that tree were correct, only two losses would be necessary. There may have been two, three, or (more likely) four OM losses within Endobacteria, or even five if Dethiobacter has no OM. Whatever the number, we must ask why it happened more than once in this phylum, given only two other inferred losses in the history of life (in Actinobacteria and, as argued below, independently in the neomuran ancestor).
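The loss counts in the preceding paragraph rest on two assumptions: the endobacterial ancestor had an OM, and OMs are never regained. On those assumptions, each maximal clade whose members all lack an OM requires exactly one loss. The minimal Python sketch below applies that Dollo-type count to two hypothetical toy topologies. The lineage labels ClostridiiaA, ClostridiiaB, and ClostridiiaC are illustrative placeholders, not taxa or branching orders taken from our RP trees. Interleaving three clostridial lineages between Halanaerobiia and Selenomonadia forces four losses. Making Clostridiia one clade and Selenomonadia sister to Bacillia, as on Kunisawa's (2015) tree, requires only two.

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf name or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def lacks_om_throughout(node: Tree, has_om: Dict[str, bool]) -> bool:
    """True if every leaf below `node` lacks an outer membrane."""
    if isinstance(node, str):
        return not has_om[node]
    return all(lacks_om_throughout(child, has_om) for child in node)


def dollo_om_losses(node: Tree, has_om: Dict[str, bool]) -> int:
    """Minimum number of OM losses, assuming the root had an OM and no OM is regained.

    Each maximal subtree whose leaves all lack an OM is explained by a single
    loss on the branch leading to it.
    """
    if lacks_om_throughout(node, has_om):
        return 1
    if isinstance(node, str):
        return 0  # a leaf that retains its OM needs no loss
    return sum(dollo_om_losses(child, has_om) for child in node)


# Hypothetical tip states and toy topologies (illustration only, not our RP trees).
HAS_OM = {
    "Halanaerobiia": True,
    "Selenomonadia": True,
    "ClostridiiaA": False,
    "ClostridiiaB": False,
    "ClostridiiaC": False,
    "Bacillia": False,
}
TOY_INTERLEAVED = ("Halanaerobiia", ("ClostridiiaA", ("ClostridiiaB", ("ClostridiiaC", ("Selenomonadia", "Bacillia")))))
TOY_MERGED = ("Halanaerobiia", (("ClostridiiaA", ("ClostridiiaB", "ClostridiiaC")), ("Selenomonadia", "Bacillia")))

print(dollo_om_losses(TOY_INTERLEAVED, HAS_OM))  # 4
print(dollo_om_losses(TOY_MERGED, HAS_OM))  # 2
```

The count depends only on topology and tip states. This is why unresolved branching order within Clostridiia leaves the number of losses uncertain.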

The answer, we suggest, lies in the unusual morphogenetic mechanism recently discovered for sporogenesis in Selenomonadia. Cryotomography of sporulating and germinating cells of the selenomonad Acetonema longum shows what happens when the mother cell engulfs the prespore cell during sporogenesis: only its inner cytoplasmic membrane (CM) grows around the prespore cell (Tocheva et al. 2011). The growing lips of this membrane, whilst enwrapping the prespore cell, pass round it within the peptidoglycan layer of the prespore. Being thus inside the OM of the prespore, the growing CM lips exclude the old prespore OM. That OM is therefore not passed on directly to the daughter cell, as it is in all non-endobacterial negibacteria. Instead, Acetonema loses the OM during every sporulation, and a new OM is regenerated from the enwrapping mother cell CM during spore germination. Thus, it remains true that the OM develops by growth and division of a preexisting membrane, in conformity with the universal principle omnis membrana e membrana (Blobel 1980). However, Acetonema provides a clear exception to the idea argued previously that all OMs, including those of mitochondria and chloroplasts, have arisen from preexisting OMs since the origin of life (Cavalier-Smith 1987b, c, d, 2004). We expect the same mechanism to be found in all endosporulating Selenomonadia and Halanaerobiia. We predict that all negibacterial Endobacteria switch the identity of the former mother cell CM to OM at some time after it enwraps the prespore murein but before the final stages of germination. This developmental identity switch could be effected by preexisting prespore bridge proteins and OM protein and lipid export machinery, which have been separated from the old OM by the enwrapping mother cell CM. The topology of enwrapment generates the same OM topology before OM-specific molecules are inserted into it. 
Therefore, although this unique identity switch is an exception to the general rule for OM biogenesis, it adds support to the argument that membrane topology is often primary for membrane heredity, and chemical composition often secondary (Cavalier-Smith 2000, 2004).

The necessity for a CM-to-OM identity switch at every sporulation to maintain the OM into the next generation simply explains why Endobacteria is the only phylum that lost the OM more than once. Identity switching is a complex process, which, like any complex mechanism, cannot be perfect. It must sometimes fail because cross bridges and OM transporters do not insert properly, or insert so slowly that a daughter cell lacking OM-inserting molecules is generated. Occasionally such a developmental accident will survive and produce a viable endobacterium without an OM. Several clades of endobacteria exist with no OM and only a thin murein layer, which shows that such cells are not intrinsically non-viable. We therefore argue that four such losses in Endobacteria are much more likely than LGT of an OM to the ancestor of Selenomonadia. Establishing an LPS-containing OM by LGT would be so much more difficult that it almost certainly never happened in the entire history of life. After LGT, a donor CM with properly assembled export and bridge complexes would not already be there, and a topologically correct OM would not already be present, unlike in endobacterial CM-OM identity switching. Even if LGT of scores of the requisite genes ever did occur (unlikely), it would almost certainly fail to make an OM morphogenetically. Too often, people underestimate how much easier multiple losses of complex characters are than their convergent gain. It is entirely wrong to estimate the probability of such events by parsimony counting alone; one must also evaluate event complexity to gauge their likelihood realistically.
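The objection to simple event counting can be made explicit with weighted (Sankoff) parsimony, in which each kind of change carries its own cost. The minimal Python sketch below uses a hypothetical toy topology, not one of our RP trees, that places Selenomonadia among three OM-lacking lineages. The labels ClostridiiaA to ClostridiiaC are placeholders. The ancestral state is taken as OM-present. The gain cost of 10 is an arbitrary assumption standing in for the far greater complexity of acquiring an LPS-containing OM. With equal costs, the cheapest scenario is one loss plus one OM gain (cost 2). Once gains are weighted as harder, three independent losses (cost 3) become the preferred explanation.

```python
import math
from typing import Dict, List, Tuple, Union

# A tree is either a leaf name or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def sankoff(node: Tree, has_om: Dict[str, int], gain: float, loss: float) -> List[float]:
    """Weighted-parsimony cost of the subtree below `node`.

    Returns [cost if node lacks an OM (state 0), cost if node has an OM (state 1)].
    """
    if isinstance(node, str):
        # A leaf may only take its observed state.
        return [0.0 if state == has_om[node] else math.inf for state in (0, 1)]
    totals = [0.0, 0.0]
    for child in node:
        child_costs = sankoff(child, has_om, gain, loss)
        for parent_state in (0, 1):
            best = math.inf
            for child_state in (0, 1):
                if parent_state == child_state:
                    step = 0.0
                elif child_state == 1:
                    step = gain  # OM acquired (e.g. by LGT)
                else:
                    step = loss  # OM lost
                best = min(best, child_costs[child_state] + step)
            totals[parent_state] += best
    return totals


# Hypothetical toy topology and tip states (1 = OM present); not a published tree.
TOY = ("Halanaerobiia", ("ClostridiiaA", ("ClostridiiaB", ("Selenomonadia", ("ClostridiiaC", "Bacillia")))))
HAS_OM = {"Halanaerobiia": 1, "Selenomonadia": 1, "ClostridiiaA": 0,
          "ClostridiiaB": 0, "ClostridiiaC": 0, "Bacillia": 0}

# Index [1]: cost with the root fixed as OM-present.
print(sankoff(TOY, HAS_OM, gain=1, loss=1)[1])   # 2.0: one loss plus one OM gain
print(sankoff(TOY, HAS_OM, gain=10, loss=1)[1])  # 3.0: three losses, no gain
```

The preferred scenario flips purely because of the relative costs. This illustrates why event complexity, not only event number, has to enter the comparison.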

If our analysis is correct, Clostridiia are polyphyletic and ought eventually to be subdivided into monophyletic units (whether holophyletic or paraphyletic). That cannot be done sensibly until their internal branching order is more firmly established. For that, extremely taxon-rich Endobacteria trees with suitable outgroups and probably over 200 proteins may be necessary.

Though we currently accept OM loss only within Endobacteria and in the independent direct ancestors of Actinobacteria and neomura, we draw attention to the chloroflexan Thermobaculum terrenum. It has an extremely thick Gram-positive murein wall, and its published micrographs have too little contrast to reveal whether or not it has an OM like more typical Gram-negative chloroflexi (Botero et al. 2004). If it has an OM, the question arises how it gets its lipids and proteins across the thick wall. More likely, it and a minority of other chloroflexans are monoderm, some perhaps secondarily.

Rooting the prokaryote tree within monoderm Endobacteria is evolutionarily implausible

There are strong reasons why the eubacterial tree must be rooted within negibacteria, especially those concerning the origin of eubacterial flagellar motors from OM proteins. Nevertheless, there has been a longstanding assumption (dating back at least to the early ideas of Haldane and Oparin) that their ancestor was a simple anaerobic Clostridium- or mycoplasma-like fermenting cell. Many have therefore been reluctant to concede that the cenancestral eubacterium was so complex as to have had two membranes sandwiching a peptidoglycan wall. The evidence that mollicutes are secondarily simplified by multiple losses of the peptidoglycan wall is now overwhelming. Our arguments for a unique ease of OM loss by endobacteria make it highly probable that ancestors of monoderm endobacteria were generated by OM loss. Blobel (1980) first suggested this, and one of us has repeatedly argued it and assembled extensive evidence for it (Cavalier-Smith 1987b, c, 1991a, b, 1992b, 2001, 2002a, b, 2006a, c, 2010a). If one were to place the eubacterial root within monoderm Endobacteria, it would most likely be within Clostridiia, which are mostly anaerobic, not within Bacillia, which appear to be ancestrally aerobic. But wherever within Clostridiia it were placed, one would have to invoke polyphyletic origins of the LPS-containing OM, which we consider evolutionarily incredible. Even one origin of an LPS-containing OM directly from a monoderm posibacterium was previously judged an evolutionarily highly unlikely transition. The origin first of an OM of standard phospholipids, followed later by the evolution of the extremely complex LPS biosynthesis, was considered far more plausible. To invoke two such origins independently is entirely unreasonable. Assuming one origin followed relatively soon afterwards by LGT to create a second within Endobacteria is highly implausible.

Polyphyly of classical Posibacteria

Until recent morphogenetic studies (Tocheva et al. 2011) showed us how easily endosporulating negibacteria can lose the OM, it seemed unjustified to assume more than one loss of the OM. That would have required stronger phylogenetic evidence that Actinobacteria are unrelated to Endobacteria than has existed since some rDNA sequencers first supposed they were not directly related. No rDNA tree convincingly established eubacterial basal topology, and some show Actinobacteria and Endobacteria as sisters (e.g. Mori et al. 2003). Therefore, all Posibacteria were argued to derive from a unimembranous common ancestor (Cavalier-Smith 1987c, 2002a, 2006a, c). Posibacteria have also figured as a supposedly monophyletic eubacterial subkingdom or infrakingdom in several prokaryote classifications (Cavalier-Smith 1989b, 1992b, 1998b, 2002a, 2006d; Ruggiero et al. 2015). Even some site-homogeneous multiprotein RP trees can group Endobacteria and Actinobacteria together as sisters (Lasek-Nesselquist and Gogarten 2013 fig. 8). But this never happened on our more accurate site-heterogeneous RP trees (the most taxon rich) or on that of Lasek-Nesselquist and Gogarten (2013 fig. 5). Raymann et al. (2015), however, found a maximally supported Endobacteria/Actinobacteria clade in their less sampled two-domain tree (their Fig. 3), but not with three domains (their Fig. 5). Our one- and two-domain trees agree substantially and are taxon rich, and our eubacteria-only tree strongly supports basal branching topology. Together these give us enough confidence to conclude that Actinobacteria and Endobacteria are most likely not sisters. Endobacteria are maximally supported as sister of Neonegibacteria, whereas Actinobacteria are near maximally supported as sisters of Endobacteria plus Neonegibacteria. 
That implies that Actinobacteria are somewhat older than Endobacteria (if rooting on Chloroflexi is correct) and that ancestral actinobacteria lost the OM and became monoderm before any Endobacteria did so. Furthermore, our strong evidence that even within Endobacteria there have probably been about four OM losses means that we must accept polyphyly of monoderm Posibacteria. We must either cease to use it as a taxon or modify the concept of posibacteria to include diderm Endobacteria, as did Ruggiero et al. (2015). But it now makes no sense to include Chloroflexi under the term Posibacteria, as was done by Cavalier-Smith (2014) and Ruggiero et al. (2015).

Abandoning Posibacteria as a phylum name entails raising its former subphyla, Endobacteria and Actinobacteria, each a maximally supported clade, to phylum rank (Taxonomic Appendix). As the introduction explained, Firmicutes was invented to embrace Actinobacteria and exclude mollicutes, but is now used in two contradictory senses, which is very confusing, as also is the fact that in neither sense does it refer to a unique ancestral shared character. By contrast Endobacteria as here emended refers to the ancestral character that first distinguished the phylum from all other prokaryotes, making it more distinctive and a better unambiguous name for the phylum than Firmicutes, even though endospores were secondarily lost by some included lineages.

If posibacteria are not a clade, we must explain how Actinobacteria and Bacillia/Clostridiales both share teichoic acids and a sortase for anchoring surface proteins to the wall. Both are unknown in any strictly negibacterial phyla but important for their shared thick wall structure. A possibility we favour is that both arose in their common ancestor after it diverged from its Melainabacteria/Cyanobacteria sister clade and were lost in the common ancestor of neonegibacteria. As both were lost at least twice in mycoplasmas, such loss is clearly possible. We searched GenBank for the teichoic acid synthesising protein TagB and found that it appears non-universal in Actinobacteria and Endobacteria. So either losses occurred in both, or teichoic acids originated in one and key enzymes moved to the other by LGT. Teichoic acids appear general in Bacillia (except mycoplasmas), in Clostridiales sensu stricto, and in most but not all Selenomonadia. They seem absent in the two deepest endobacterial branches and rare in the next two deepest, suggesting frequent losses rather than LGT. It is now confirmed that a variety of Actinobacteria can make teichoic acids (Colagiorgi et al. 2015). Key synthetic glycosylases like TagB and TagF from Actinobacteria and Endobacteria are more closely related to each other than to more distant glycosylases in negibacterial phyla. That teichoic acids can exist in Selenomonadia shows that they are compatible with negibacterial envelopes. They could therefore have evolved before Actinobacteria and Endobacteria lost the OM, making them a preadaptation for wall thickening. That thickening can then be regarded as parallel evolution from similar related ancestors rather than pure convergence. The same may be true of sortases. 
The common ancestor of Actinobacteria and Endobacteria probably had teichoic acids and one sortase, and these two phyla are successive branches on the tree. We can therefore regard them collectively as a monophyletic group characterised by the origin of these two wall properties, and retain a paraphyletic subkingdom Posibacteria, so long as we exclude Chloroflexi (unlike Ruggiero et al. 2015). If, however, Posibacteria were a clade, as on Fig. 3 of Raymann et al. (2015), this would explain their unique sharing of sortases and teichoic acids. This possibility ought to be tested further by site-heterogeneous eubacterial trees of 200-300 proteins.

Though we consider it no longer useful to use Negibacteria as a taxon, negibacteria remains useful as the best vernacular term to refer collectively to all eubacteria with a porin-containing OM. This applies irrespective of whether the OM contains LPS (most phyla) or not (all Chloroflexi; some Synthermota, some Hadobacteria, some spirochaetes, some Proteobacteria). The old term Gram-negative bacteria is not useful in this way and is best reserved for the empirical results of Gram staining. As noted above, some bacteria that stain Gram-negatively are actually posibacteria without an OM (e.g. mollicutes), and some that stain positively are actually negibacteria, e.g. Deinococcus. For clarity, it remains essential to maintain two subtle and too often ignored distinctions: between negibacteria (based on ultrastructure) and Gram-negative bacteria, and between posibacteria (based on ultrastructure) and Gram-positive bacteria. Gram-negativity or positivity describes empirical staining for light microscopy. These descriptors are not equivalent to the ultrastructurally defined terms posibacteria and negibacteria, which were never synonyms for the older terms. Gram staining is useful as an ancillary method for identification but not for large-scale taxonomy, unlike ultrastructure. Using Posibacteria now to include Gram-negative endobacteria makes it even less likely to be confused as a synonym of Gram-positive.

New phylum Aquithermota

The second edition of Bergey's Manual established new order Aquificales and class and phylum Aquificae for highly thermophilic negibacterial chemolithoautotrophs related to the hyperthermophile Aquifex. It also established a new phylum and class Thermodesulfobacteria for another new order (Thermodesulfobacteriales). That order then contained only Thermodesulfobacterium (now including Thermodesulfatator on our trees and four additional genera), a thermophilic negibacterial heterotrophic sulphate reducer ultrastructurally similar to Aquificales. It is curious that two separate phyla are still retained for such similar thermophiles. Sulphate reduction is now known in Aquificales, and there are several genera of chemoautotrophic Thermodesulfobacteriales. Moreover, the latter can group strongly with Aquificales rather than with Thermotogia, Hadobacteria, or Chloroflexi on 16S rDNA trees. Our taxon-rich CAT RP trees invariably place class Aquificia (spelling corrected to conform with ICNP) as sister to class Thermodesulfobacteriia (spelling here corrected to conform with ICNP) with maximal support. Aquificia on these trees includes Thermosulfidibacter, whose inclusion is strongly supported despite being questioned by Gupta and Lali (2013). There is therefore no reason to keep separate phyla for these classes. We therefore establish a new phylum Aquithermota (see Taxonomic Appendix) to group both classes together. We also establish order Thermosulfidibacterales for Thermosulfidibacter, as its previous inclusion in Aquificales made that order (as emended by Gupta and Lali 2013) polyphyletic. Transferring it to the physiologically more similar Thermodesulfobacteriia would instead have made them paraphyletic. Aquificia now has three orders and 16 genera; Thermodesulfobacteriales has just six genera. Thus, Aquithermota have four orders and 22 genera. As they are remarkably homogeneous ultrastructurally and physiologically and certainly a clade, there can be no justification for splitting them into two or more phyla. 
From our RP phylogeny, we deduce Aquithermota were ancestrally anaerobic thermophiles, with hyperthermophilic microaerophilic Aquificales a derived clade.

It has long been controversial whether Aquificales are more closely related to Thermotogia (here corrected spelling for Thermotogae) or to Proteobacteria. Our trees show decisively that they are not specifically related to either. Instead, CAT RP trees give maximal support to Aquithermota as sister to infrakingdom Gracilicutes, which includes Proteobacteria, Spirochaetae, Planctobacteria, and Sphingobacteria. Aquificia are therefore no more closely related to Proteobacteria than are the other three gracilicute phyla. Putting Aquificales in Proteobacteria (Cavalier-Smith 2002a, 2006c) was incorrect. This firm position implies something about the 4-amino-acid insertion shared by Aquifex and all Gracilicutes except Spirochaetae (Cavalier-Smith 2002a). It was an ancestral character of the clade Aquithermota/Gracilicutes, lost secondarily by ancestral spirochaetes, which illustrates the hazard of using single indels alone to group phyla. Thermotogia, which lack that insertion, are robustly phylogenetically more distant, grouping with other thermophiles. Both major branches of Aquithermota have reverse DNA gyrase, just as do Archaebacteria (Brochier-Armanet and Forterre 2007), which suggests their phylogenetic unity and likely ancestral thermophily. Concordantly with Boussau et al. (2008a), the common ancestor of Aquithermota and Thermotogia was not a hyperthermophile. If Fig. 5 is correct, it may not even have been a thermophile, unless neonegibacteria were ancestrally thermophilic and mesophily evolved repeatedly secondarily.

New phylum Synthermota

Thermotogae were also made a separate phylum in Bergey’s 2nd Edition just because they do not group reliably with other clades on rDNA trees. However, a 44-protein neighbour-joining tree (Nishida et al. 2011) showed that they group strongly with three other thermophilic negibacterial groups:

(1) anaerobic hyperthermophilic Dictyoglomus (non-motile chemoorganotrophs of class Dictyoglomia (Patel 2011), often treated as a separate phylum Dictyoglomi);

(2) the thermophilic proteolytic fermenter Coprothermobacter, usually misclassified in Clostridia but recently put in a new class Coprothermobacteria and phylum Coprothermobacteriota (Pavan et al. 2018);

(3) more distantly, class Synergistia, comprising a mostly mesophilic family of amino acid digesters (also often treated as a separate phylum: Jumas-Bilak et al. 2009).

Our CAT trees strongly confirm that this grouping is a clade and show that it also includes two further taxa. The first is Caldisericum, an anaerobic sulphur-compound respirer recently put in class Caldisericia and phylum Caldiserica merely because of divergence on a crude 16S rDNA tree (Mori et al. 2009). The second is Thermodesulfobium, moderately thermophilic chemoautotrophic negibacterial respirers (Mori et al. 2003) currently misclassified in Thermoanaerobacterales in Clostridia. Thus, five groups that form a robust clade on eubacteria-only site-heterogeneous RP trees have been unnecessarily treated as separate phyla, merely because of the poor resolution of single-gene rDNA trees. Given the much greater resolution attainable with RP multiprotein trees, separating them into five phyla was premature. We now group all five ‘phyla’ plus Thermodesulfobiaceae as one new negibacterial phylum, Synthermota. It is divided into two new subphyla, Synergistetes and Thermocalda, each maximally supported as a clade on RP CAT trees. Our trees clearly show that, contrary to Nishida et al. (2011) and Cavalier-Smith (2002a), they are not specifically related to Endobacteria, nor to Fusobacteria (Cavalier-Smith 2006d). 
Instead, Synthermota branch with maximal support one node above Endobacteria, as sister to all other Neonegibacteria. Virtually all except Thermodesulfobium catabolise amino acids, unlike the largely autotrophic Aquithermota.

Largely non-thermophilic Synergistetes has only class Synergistia, with LPS biosynthetic enzymes related to those of Dictyoglomus and ‘Atribacteria’ (Antunes et al. 2016; Sutcliffe 2010), whereas Thermocalda includes four former classes: Thermotogia, whose sheath-like toga (partially separated from the CM by a very wide periplasmic space) and loss of LPS make them distinct enough to retain class rank (now with three orders: Bhandari and Gupta 2014); Dictyoglomia, also morphologically distinct enough to merit class rank; plus Coprothermobacteria and Caldisericia. But Coprothermobacteria, Caldisericia, and Thermodesulfobiaceae are not mutually distinctive enough in morphology, physiology, or chemistry to be separate classes, and invariably form a strongly supported clade on RP trees; therefore, we merge all three into class Caldisericia, chosen as having the shortest name most appropriately descriptive of this robust thermophilic clade (it is also the oldest established of these classes, though ICNP does not require retention of the oldest class name when merging classes, as it does for orders). These three groups all have relatively normal negibacterial cell envelope morphology, unlike Thermotogia and Dictyoglomia; Antunes et al. (2016) found no LPS enzymes in Caldisericum, and the typically weak OM staining in this broadened Caldisericia makes it possible that LPS is absent. Coprothermobacterales (one family, one genus, two species), Caldisericales (one family, one genus, one species), and new order Thermodesulfobiales (in Caldisericia: see Taxonomic Appendix) are sufficiently highly ranked as orders. One does not need a phylum for each genus! Caldisericum is unusual in having an obvious cortical ribosome-free layer inside its CM, which suggests a novel submembrane skeleton (Mori et al. 2009), so it merits separate ordinal rank to reflect this uniqueness, which also emphasises the distinctiveness of cell envelopes in all Thermocalda.

Making these few quite similar species four classes or phyla greatly overrates their distinctiveness and unnecessarily complicates classification, which should be kept as simple as is practicable and phylogenetically sound. The purpose of classification is to simplify biodiversity so that we can readily grasp it intellectually. Candidatus Cryosericum, sister to Caldisericum, has ridiculously been proposed as a new phylum purely because of sequence divergence (Martinez et al. 2018), but it is important in showing that not all Thermocalda are thermophiles. When making Synergistetes a phylum, Jumas-Bilak et al. (2009) suggested that ‘a [eu]bacterial phylum is formed to accommodate a group of bacteria that cannot be aggregated to any taxon except Bacteria’; clearly that does not apply to Synergistia, and it applies even less to the other taxa here aggregated as phylum Synthermota, which all share fundamental similarities in envelope organisation that distinguish them from Aquithermota and Gracilicutes. One of us long ago criticised the widespread practice of making phyla or ‘candidate phyla’ merely because of the low resolution of 16S rDNA trees and predicted that most candidate divisions, especially the thermophilic ones of Hugenholtz et al. (1998a, b), ‘when studied by good multiple-protein trees, will turn out to belong’ to already known phyla (Cavalier-Smith 2002a p. 67). This prediction is now fully borne out: only one of those candidate phyla, Armatimonadetes, turned out to be justifiably separate; the others all group with previously known phyla on our trees.

Dictyoglomales, despite currently having only one genus and two closely related named species, are unique in having an OM well separated from the murein wall and CM by prominent hexagonally arrayed pegs (~ 80 nm long) (Hoppert et al. 2012). These pegs may be related to the 49 nm OMP-α rods that span the periplasmic space of Thermotogia (Lupas et al. 1995). Unlike Thermotogales other than Fervidobacterium (Huber et al. 1990), Dictyoglomus is most unusual in being able to make giant ‘rotund bodies’ by repeatedly dividing its CM/murein without dividing its OM, and it additionally makes intermediary spindle-shaped assemblies of numerous protoplasts within a single membrane (Hoppert et al. 2012). Rotund bodies were first reported in the hadobacterium Thermus, whose OM is also separated from murein by a wide clear periplasmic space across which fine bridges are visible (Brock and Edwards 1970); we suggest these may be distant homologues of OMP-α, as some appear to have globular heads at the murein end like OMP-α. Brock and Edwards (1970) assumed that rotund bodies form by OM fusion of separate cells, but retention of daughter protoplast/murein rods within a single OM, which apparently generates the Dictyoglomus spindle-shaped assemblies, seems more likely to be a shared mechanism for both Thermocalda and Hadobacteria. The ability to make rotund bodies is deep-seated in Hadobacteria, as phylogenetically distant Oceanithermus have them (Mori et al. 2004). But, as Hadobacteria are not sisters of Synthermota and rotund bodies have not been seen in Synergistia, it is unclear whether the ability for partial dissociation of OM and murein, necessary for making them, evolved separately in these two phyla or reflects an ancestral mechanism in their common ancestor (i.e. the ancestral neonegibacterium), e.g. simple mutual attachment dependent only on OMP-α. Some Synergistia have close attachment of murein to the OM, e.g. Acetomicrobium (formerly Anaerobaculum) mobile (Magot et al. 1997), but Dethiosulfovibrio has a wider space in which thin rod-like bridges like OMP-α are visible (Magot et al. 1997), suggesting that such simple coiled-coil bridges may be ancestral for neonegibacteria. Acetomicrobium (=Anaerobaculum) thermoterrenum grown on complex medium makes terminal sheaths bulging away from the protoplast (Rees et al. 1997), similarly to Thermotoga, suggesting that a structurally based potential for local separation of OM from murein may be ancestral for Synthermota as well as Hadobacteria. Possibly, more complex and varied attachments evolved after Hadobacteria and Synthermota diverged from the common ancestor of Fusobacteria (which have typically gracilicute-like envelopes), Aquithermota and Gracilicutes, giving these a more consistently closely attached OM: the gracilicute Bacteroides seems to have both single and double bridges between OM and murein (Ushijima 1967).

The greatly differing envelope properties of Synthermota (nearly always organotrophs, rarely chemoautotrophs) and Aquithermota (nearly always chemoautotrophs, rarely heterotrophs) are consistent with their being separate non-sister phyla. Synthermota, unlike Aquithermota, may not have been ancestrally thermophilic. In agreement with that, a reverse gyrase tree suggests that Dictyoglomus acquired reverse gyrase laterally from Aquificales, whereas that of Thermotoga is less closely related (Brochier-Armanet and Forterre 2007). Given the Fig. 5 phylogeny, it appears that LPS was lost separately in Thermotogales and Caldisericia (Antunes et al. 2016); however, though core LPS-making enzymes were not identified in Caldisericum, its 3-OH fatty acids suggest that LPS may be present (Mori et al. 2009).

Eubacterial origins of hyperthermophily

It was once supposed that hyperthermophily arose in the ancestral archaebacterium and that hyperthermophilic eubacteria such as Thermotogales and Aquificales evolved later by acquiring hyperthermophilic enzymes from them by LGT (Forterre et al. 2000). But the evolution of reverse DNA gyrase, the most characteristic marker for hyperthermophily, does not support that: there is a single weakly supported bipartition between eubacterial and archaebacterial subtrees, with no evidence for LGT between them (Brochier-Armanet and Forterre 2007; Campbell et al. 2009). As reverse gyrase is a chimaera of a eubacterial DNA helicase and a eubacterial-type DNA topoisomerase I (Forterre 1996), it provided one of the strongest early proofs that archaebacteria are evolutionarily younger than, and evolved from, eubacteria (Cavalier-Smith 2002a). In agreement with that, a reverse gyrase distance tree rooted on topoisomerases places the archaebacterial clade within the eubacterial sequences with 66% support (Campbell et al. 2009). As the most divergent reverse gyrase sequences are from Aquithermota, which, as noted above, ancestrally had this enzyme, we argue that hyperthermophily and reverse gyrase most likely first evolved in stem Aquithermota and were acquired by the ancestral archaebacterium and other hyperthermophilic eubacteria by independent LGTs. The archaebacterial tree is consistent with vertical inheritance within archaebacteria (Brochier-Armanet and Forterre 2007), but the eubacterial reverse gyrase tree does not fit the eubacterial tree deduced here from RPs, arguing against vertical inheritance coupled with numerous losses, especially within Endobacteria and Proteobacteria. Contrary to RP trees, reverse gyrase sequences of Thermotogales, ε-proteobacteria, Thermus (Hadobacteria), and hyperthermophilic endobacteria all nest within those of Aquithermota (Brochier-Armanet and Forterre 2007; Campbell et al. 2009).
All Aquithermota on our trees have reverse gyrase (as do numerous other genera). Within Thermocalda, it is present in Dictyoglomus, Caldisericum and Thermotogales but absent from Thermodesulfobium and Coprothermobacter. As it is also absent from most non-thermophilic Synergistia, Synthermota were probably not ancestrally thermophilic, in marked contrast to Aquithermota. The two deeply divergent paralogues in Aquifex are consistent with reverse gyrase having originated in ancestral Aquithermota prior to the origin of archaebacteria; two separate paralogues evolved independently in early crenarchaeotes (Brochier-Armanet and Forterre 2007). The Thermus sequence (on a plasmid, consistent with LGT) is related to the same Aquifex paralogue as that from Dictyoglomus, which is not closely related to those of Thermotoga (Brochier-Armanet and Forterre 2007), suggesting multiple LGTs to Thermocalda from Aquithermota. Not all LGTs need have been directly from Aquithermota to other targets: the close relationship of Thermotoga and endobacterial sequences makes it likely that LGT also occurred directly between these two phyla. Taxon-richer trees might better establish the number and direction of eubacterial LGTs.

It might be argued that, if neomura evolved from Aquithermota/Thermocalda as suggested by some three-domain trees (CAT only, e.g. Fig. S13; not ML, Fig. S14) but not by two-domain trees, reverse gyrase might have been acquired vertically by archaebacteria (rather than by LGT as we propose). However, as explained above, three-domain trees are much more distorted relative to single-domain trees than are two-domain trees and are therefore likely less trustworthy, so we judge that both the grouping of neomura with Aquithermota/Thermocalda and the placement of Aquithermota within Synthermota as sister to Thermocalda on some CAT three-domain trees are likely LBA artefacts. The alternative grouping of neomura with Planctochlora (Planctobacteria, Sphingobacteria) on two-domain trees is technically more credible and is strongly supported by many independent lines of evidence, discussed in detail below, for a likely genuine evolutionary relationship between eukaryotes and Planctobacteria in particular. This relationship greatly clarifies numerous previously poorly understood aspects of eukaryogenesis, as well as the origins of archaebacteria. Before treating this major evolutionary question, we briefly discuss the composition of Planctochlora and Proteobacteria and, in somewhat more detail, the implications of our trees for the eubacterial evolution of photosynthesis.

Phylum Planctobacteria broadened by adding Elusimicrobia

Phylum Elusimicrobia, recently established for tiny, deeply branching fermentative negibacteria, includes just two genera (Elusimicrobium and Endomicrobium from animal guts) assigned respectively to classes Elusimicrobia and Endomicrobia (Geissinger et al. 2009; Herlemann et al. 2009; Zheng and Brune 2015; Zheng et al. 2016). Their affinity was previously unclear, as a 22-protein ML tree weakly grouped them with Synergistia (Herlemann et al. 2009), whereas a 31-protein ML analysis including both genera put them as sister to spirochaetes; neither analysis reported bootstrap support or used a site-heterogeneous algorithm. Our CAT analyses strongly show that Elusimicrobia are sister to Planctobacteria, the phylum initially established for free-living planctomycetes and intracellular parasitic Chlamydia on the assumption that both lacked murein (Cavalier-Smith 1987b, 1998b). Later, the related Verrucomicrobia, which possess murein, were added to Planctobacteria (Cavalier-Smith 2002a), and it was discovered that both planctomycetes and Chlamydia also have peptidoglycan remnants (Pilhofer et al. 2008): planctomycetes and Protochlamydia have complete sacculi, but Chlamydiaceae only a ring at the division septum (Rivas-Marin et al. 2016). Although everyone accepts that these three groups are related, others have ranked each as a separate phylum: Wagner and Horn (2006) grouped them with three other claimed phyla (Lentisphaerae (Cho et al. 2004), which ought to have been made a class within Planctobacteria sensu Cavalier-Smith 2002a; ‘Poribacteria’; and Omnitrophica (=OP3)) as the ‘PVC superphylum’, compositionally equivalent to phylum Planctobacteria, apparently unaware of its earlier establishment.

Superphylum rank was pointless taxonomic inflation that disregards their unifying characters, so we continue to treat Planctobacteria as a phylum. Uniquely in the living world, Planctobacteria share a small RNA-binding protein (sRp) with a folding pattern similar to that of ribosomal protein L30; as L30 is uniquely absent from Planctobacteria, sRp may be a group-specific substitute for it (Lagkouvardos et al. 2014). Gupta et al. (2012) found the same protein throughout Planctobacteria except Poribacteria, which did not group with other PVCs on their 16-protein tree, so they questioned their inclusion in the group. An 83-protein tree for Gracilicutes weakly grouped ‘Poribacteria’ with Candidatus Hydrogenedentes, this clade being sister to Elusimicrobia plus Candidatus Aerophobetes (Kamke et al. 2014), and these four groups together being sister to ‘core PVC’. That well-sampled tree therefore supports Elusimicrobia and Poribacteria being part of the sister clade to core PVCs. BLAST revealed two sRp homologues in Aerophobetes but none in Elusimicrobia. We found no convincing BLAST evidence of L30 in any of them; the few strong hits in chlamydias and the single one in Omnitrophica most likely represent contamination or LGT from other eubacterial phyla. This is consistent with Elusimicrobia and these three environmental groups (too highly ranked as ‘phyla’) being sister to classical Planctobacteria, as our RP trees robustly show for Elusimicrobium, so we now include them all within Planctobacteria as new subphylum Elusimicrobia and establish subphylum Euplancta, with five classes, to embrace classical Planctobacteria. ‘Phylum Kiritimatiellaeota’ (Spring et al. 2016), originally within Verrucomicrobia, would have been more judiciously ranked as a subclass; we rank Verrucomicrobiia as only a class, together with two others, within new infraphylum Opitutae (Table 3).
By reducing the rank of Elusimicrobia to subphylum within phylum Planctobacteria, we now have just one phylum, broadened Planctobacteria (a single robust clade on RP trees), instead of nine separate phyla as before. This clade shares the unique propensity of often having a partially or greatly swollen periplasm through loosening of the attachment of murein to the CM (or loss of the sacculus altogether except at the septum). L30 appears to have been lost or replaced by sRp in their common ancestor, though sRp may not be present (or may be too divergent to recognise) in all members of subphylum Elusimicrobia.

Table 3 Revised Classification of Phylum Planctobacteria Cavalier-Smith 1987 em.

From recent cryotomographic studies of planctobacteria with sacculi, we argue that, in marked contrast to Synthermota, where the OM tends to balloon away from the murein layer to form a sheath, in Planctobacteria the often-inflated periplasm stems from greater weakness of the bridges between murein and the CM. Thus, the often swollen periplasm is not homologous in the two phyla: in Synthermota, the bridges between murein and OM are the structurally weaker link, whereas in Planctobacteria, it is the bridges between murein and CM that have often been broken in evolution. This consistent difference between the two phyla fits earlier arguments that differences in cell envelope organisation are key aspects of eubacterial evolution that merit great weight in higher classification (Cavalier-Smith 2002a). Weakness of the CM/murein bridges probably extends back to the planctobacterial ancestor, as the periplasm is irregularly widened in Elusimicrobium (Geissinger et al. 2009) and especially Endomicrobium (Zheng et al. 2016), unlike its regularly narrow state in Proteobacteria. Weak CM/murein links could have predisposed planctobacteria to their multiple losses of the sacculus and also facilitated the simultaneous losses of murein and the OM during the likely origin of neomura from planctobacteria, as a later section explains. Like Planctomycetes, Elusimicrobia have a cell cycle involving budding, otherwise rare in eubacteria.

Our trees robustly group class Verrucomicrobiia and Lentisphaera as sisters; their joint clade (here new infraphylum Opitutae) is robustly sister to Chlamydia (here in new infraphylum Chlamydiia). Table 3 summarises the new planctobacterial classification. On rDNA trees, ‘Candidatus Omnitrophica’ (a misuse of the term Candidatus, which properly refers only to prospective species (Parker et al. 2014)) is sister to Verrucomicrobia (Spring et al. 2016); whether it should be a subclass of Verrucomicrobia or a third class of Opitutae will depend on its phenotype and on multiprotein CAT trees, but it should certainly not be a phylum.

Phylum Sphingobacteria broadened by adding Gemmatimonadetes

Sphingobacteria was the phylum name given to unite the Bacteroidetes/Flavobacterium clade, many of which have sphingolipids, and Chlorobiales (also with sphingolipids), on the assumption that sphingolipids, gliding motility, and absence (as then thought) of flagella were shared ancestral characters (Cavalier-Smith 1987b); it was formally made a phylum with classes Flavobacteria (including Bacteroidetes and Fibrobacter) and Chlorobea (Cavalier-Smith 1992b). The class names were eventually validly published (Cavalier-Smith 2002a) but later seemingly arbitrarily rejected (Tindall 2014); following Garrity et al. (2005), who invalidly split Flavobacteria into three classes (one confusingly called Sphingobacteria), most authors treat these bacteria as separate phyla Chlorobi and Bacteroidetes (Krieg et al. 2011b), even though they group together on rDNA trees. Our CAT and ML trees all group Chlorobi and Bacteroidetes as sisters with maximal support. The idea of a common ancestry was further substantiated by shared indels, some also present in class Fibrobacteriia, so all three were agreed to have a common ancestry and designated the FCB group (oddly ignoring the then valid phylum Sphingobacteria), which was strongly holophyletic on an RNA polymerase C tree (Gupta 2004). Later, proteins uniquely shared by FCB taxa were identified (Gupta and Lorenzini 2007). Our RP trees do not have an FCB clade. Instead, CAT trees strongly put Gemmatimonadales as sister to Chlorobi/Bacteroidetes, whereas ML puts Gemmatimonadales (ranked too highly as ‘phylum’ Gemmatimonadetes: Zhang et al. 2003) weakly as sister to Fibrobacteriia. Thus, both strongly place Gemmatimonadales within FCB, so FCB alone is not a clade. However, both methods very strongly support holophyly of an FGCB group that also includes Gemmatimonadales.

As this group is extremely robust and clearly one of the four major clades within superphylum Gracilicutes, we redefine phylum Sphingobacteria to include not only Chlorobi/Bacteroidetes (each placed in a new infraphylum within new subphylum Chlorobia) and Fibrobacterales (in new subphylum Fibrobacteria, a small reduction in rank) but also Gemmatimonadales (in new subphylum Gemmatimonadetes, now invalid as a class name) (see Table 2). Sphingobacteria as thus revised is congruent with RP trees and does not overrank its subgroups, as treating them as four separate phyla does. We argue below that the novel form of anaerobic photosynthesis in Gemmatimonas without chlorosomes (Dachev et al. 2017) arose following LGT of photosynthetic genes from Proteobacteria and is the most convincing example of such transfer. Fibrobacterales are non-flagellate, so probably ancestrally lost flagella, but flagellar genes are present in the deep-branching rhodothermian bacteroidetes Rhodothermus and Salinibacter, in Gemmatimonadales and Ignavibacteriia (Ignavibacterium, Melioribacter), and in NCIL-2 of the thermophilic clade OPB56, so it is likely that the main bacteroidete subclade (new superclass Bacteroidia) and Chlorobiales lost flagella separately. Contrary to probably misrooted early rRNA trees, Ignavibacteriia are sister to OPB56, not to Bacteroidetes, and this joint clade is robustly sister to Chlorobea (Hiras et al. 2016), a now rejected class name replaced here by Chlorobiia to conform with the rules. OPB56 should therefore be placed in Ignavibacteriia, a class here thus broadened, which clearly belongs in Chlorobi, which deserves no higher rank than infraphylum (as here) or superclass; making Ignavibacteriae a separate phylum (Podosokorskaya et al. 2013) was unjustified rank inflation.

The ML 83-protein tree of Kamke et al. (2014) and the 43-protein tree (Rinke et al. 2013) both had Gemmatimonadetes within FCB as an insignificantly/weakly supported sister to Fibrobacter, as in our ML trees, and both strongly suggest that the environmental DNA groups ‘Marinimicrobia’, ‘Latescibacteria’ and ‘Cloacimonetes’ (excessively highly ranked as phyla) should also be put in Sphingobacteria; that is clearest for the first two, which branch within Sphingobacteria, but less so for Cloacimonetes, which is their sister. Both strongly support clade Sphingobacteria as here emended (Taxonomic supplement). More richly sampled CAT trees are needed to check whether the alternative topology of our CAT trees is correct, as we predict it will be. Though the 38-protein tree (Rinke et al. 2013) failed to resolve any deep relationships between our 14 phyla (or even show monophyly of Proteobacteria as here defined), the 83-protein tree strongly supports now-broadened Sphingobacteria and Planctobacteria being sisters (i.e. clade Planctochlora). Although most Sphingobacteria have closely parallel OM and CM (e.g. Bacteroidetes (Ushijima 1967) and Ignavibacterium (Iino et al. 2010), like Proteobacteria) and no inflated periplasm, Gemmatimonas has patches of inflated periplasm where the murein layer (incorrectly labelled plasma membrane in their Fig. 1b) is more widely separated from the CM (Zeng et al. 2015); was this propensity present even in the common ancestor of Planctochlora and lost by other Sphingobacteria independently of Proteobacteria?

Krieg et al. (2011a) established phylum Bacteroidetes with three new classes: Bacteroidia, Cytophagia, and Sphingobacteriia. However, discoveries of closer phenotypic similarities between infraphyla Chlorobi and Bacteroidetes than previously supposed and the intermediate character of NCIL-2 (Hiras et al. 2016) make earlier inclusion of both in the single phylum Sphingobacteria, with Fibrobacter, taxonomically superior. All multiprotein trees strongly support inclusion of Fibrobacter and Gemmatimonas in the same phylum as Chlorobia (i.e. the invariably robust Chlorobi/Bacteroidetes clade).

We also add to Sphingobacteria the heterotrophic flagellate Calditrichales (Caldithrix and Calorithrix) as new subphylum Calditrichae, as rDNA and protein trees show they are sister to Chlorobia (Kublanov et al. 2017; Kompantseva et al. 2017). Separate phylum Calditrichaeota (Kublanov et al. 2017) was unnecessary and exaggerates their distinctiveness.

Proteobacteria comprise subphyla Rhodobacteria, Acidobacteria, and Geobacteria

In an earlier classification recognising only seven eubacterial phyla rather than the 14 here, Cavalier-Smith (2002a) considered that, to be monophyletic, Proteobacteria must include many negibacterial groups not previously assigned to that phylum, and therefore subdivided Proteobacteria into three subphyla (Rhodobacteria, Geobacteria, Thiobacteria) so as to include them. This broadened view of Proteobacteria, comprising all predominantly gracilicute groups with a uniformly narrow periplasm that ancestrally had external flagella (not periplasmic ones like spirochaetes), did not become widely accepted, as 16S rDNA lacked the resolution to confirm or refute it. Accordingly, others later made three conjecturally proteobacterial lineages separate rDNA-defined phyla: Deferribacteres, Chrysiogenetes, Acidobacteria. Our RP trees now confirm that all three must be included in Proteobacteria if the phylum is to be monophyletic. This is because ε-proteobacteria are phylogenetically very distant from the other nominal proteobacteria, being sister to Chrysiogenales plus Deferribacterales, whereas Acidobacteria are sisters to α-δ-proteobacteria. Cavalier-Smith (2002a) had grouped ε- and δ-proteobacteria together as Thiobacteria. As our trees confirm earlier evidence that Thiobacteria is not a clade, we now abandon subphylum Thiobacteria and transfer δ-proteobacteria (as new class Myxococcia) to subphylum Rhodobacteria and ε-proteobacteria (as new class Nautiliia) to revised subphylum Geobacteria.

Our trees are fully concordant with a recent 98-protein study showing that Acidithiobacillia merited separation from γ-proteobacteria (Williams and Kelly 2013) and confirm that Acidithiobacillus and Thermithiobacillus are a robust clade (Hudson et al. 2014) that is sister to the β/γ-proteobacterial clade. As class names Betaproteobacteria and Gammaproteobacteria are no longer valid under ICNP, and we still think this joint clade should be one class (Cavalier-Smith 2002a), we establish new class Chromatiia to include β/γ-proteobacteria and Acidithiobacillia as three new subclasses: Acidithiobacillidae, Neisseriidae (β-proteobacteria), and Pseudomonadidae (γ-proteobacteria). The 98-protein tree put Mariprofundus, iron-oxidising lithoautotrophs (Makita et al. 2017), as a distant sister of Chromatiia with strong support and grouped that joint clade with α-proteobacteria with insignificant support (Williams and Kelly 2013). Our CAT and ML trees both maximally support Chromatiia, Mariprofundus (ζ-proteobacteria) and α-proteobacteria (Caulobacteria cl. n.) jointly forming a clade but are contradictory as to their relative branching order. ML agrees with previous ML trees in grouping Mariprofundus with Chromatiia, but much more weakly (64% support), whereas CAT strongly shows Chromatiia and α-proteobacteria (each of which contains photosynthetic purple bacteria) as a clade (i.e. Rhodobacteria sensu Cavalier-Smith 2002a). We suggest that the ML position of the long unbroken Mariprofundus branch is a long-branch artefact, caused by the long-branch α-proteobacterial clade being pulled one node too deep. The likely correct CAT topology would have allowed us to retain Rhodobacteria in its original sense, as a group almost certainly ancestrally photosynthetic (Imhoff et al. 2017), but the strong grouping of Mariprofundus with them and the fairly strong grouping of δ-proteobacteria with their joint clade (on CAT but not ML) make it sensible to broaden Rhodobacteria to include Mariprofundales and δ-proteobacteria as well, even though neither clade is yet known to include purple photosynthetic bacteria. Inclusion of Mariprofundales in Rhodobacteria is robust to method and taxon sampling, but the position of δ-proteobacteria is sensitive to both and requires confirmation by independent evidence, as they often instead group with Leptospirillum as sister to Acidobacteria. However, their grouping as sister to undoubted Rhodobacteria had over 90% support on a MrBayes protein tree and over 70% by ML (Lücker et al. 2013), so is probably correct. Their Bayesian tree equally strongly had Nitrospina as sister to that clade rather than to Acidobacteria. We therefore also include Nitrospina in Rhodobacteria as new class Nitrospinia (despite their ML tree having it weakly sister to Acidobacteria); treating it as a separate phylum Nitrospinae (Lücker et al. 2013) was unwarranted taxonomic inflation that fails to show how it relates to other proteobacteria.

It is undesirable to further extend Rhodobacteria to include the next deepest clade (Acidobacteria plus Leptospirillum), because Acidobacteria now include Chloracidobacterium, a moderately thermophilic, microaerophilic green photoheterotrophic bacterium with chlorosomes (Tank and Bryant 2015), not a purple bacterium. Leptospirillum was included in Geobacteria but appears not to have been formally placed in a family or order. The NCBI classification assigns it, together with Nitrospira, Thermodesulfovibrio and ‘Candidatus Magnetobacterium’, to ‘family Nitrospiraceae’, ‘order Nitrospirales’, ‘class Nitrospira’, and ‘phylum Nitrospirae’, though none of these higher groups appears to have been effectively or validly published. Presumably, the widespread use of ‘phylum Nitrospirae’ is based on the suggestion, from a 16S rDNA tree entirely unresolved at the base, that ‘Leptospirillum’ (then not a valid genus), ‘Nitrospira’, and a clade including Candidatus ‘Magnetobacterium bavaricum’ may form a distinct ‘phylum Nitrospira’ (Ehrich et al. 1995). A 31-protein ML tree yielded maximal support for ‘Nitrospirae’ being a clade excluded from δ-proteobacteria, but only trivial support for its being sister to the sole included acidobacterial species, this clade being insignificantly supported as sister to δ-proteobacteria (Lin et al. 2014). Our CAT trees have moderate support for Leptospirillum (‘Nitrospirae’) being sister to Acidobacteria and stronger support for that joint clade being sister to Rhodobacteria rather than to ε-proteobacteria, but near maximal support for all these taxa forming a clade that also includes Deferribacterales and Chrysiogenales; apart from strongly excluding Aquifex, this clade corresponds exactly to Proteobacteria sensu Cavalier-Smith (2002a). Therefore, ‘Nitrospirae’ cannot reasonably be excluded from Proteobacteria unless ε-proteobacteria are also excluded, which would be an undesirable break with past classifications.
We therefore instead establish new class Nitrospiria and group it with former class Acidobacteria (now Blastocatellia) as new subphylum Acidobacteria within phylum Proteobacteria. This revised classification is as conservative as we could make it, as subphylum Acidobacteria has exactly the same circumscription as former phylum Acidobacteria Thrash and Coates 2012 (Thrash and Coates 2011), just slightly lower rank.

As validly published Acidobacteria Cavalier-Smith 2002a was rejected as a class name for no apparent reason (Parker et al. 2014), we replace it by new class Holophagia in conformity with Rule 8 of ICNP; however, that rejection does not stop continued use of Acidobacteria as a phylum or its use as a subphylum, as we now do. We do not agree that the relatively small additional divergence of Holophagales compared with other Acidobacteria was an adequate reason for separating them as class Holophagae (Fukunaga et al. 2008). Instead, Acidobacteria sensu Cavalier-Smith (2002a) should remain one class, for which we adopt the class name Blastocatellia (Pascual et al. 2015) in a broadened sense, as the older alternative Holophagae is invalid under ICNP rule 8. Order Acidobacteriales Cavalier-Smith 2002a was also nomenclaturally rejected without sound reason (Tindall 2014). These unwise rejections caused taxonomic confusion within Acidobacteria, as Acidobacteriaceae ceased to have a valid order or class. As class Acidobacteria explicitly included Holophaga, Acidobacterium, and Geothrix, which collectively merit only one class, we rectify that problem by establishing new order Terriglobales for former Acidobacteriales, as order Acidobacteriales must no longer be used (but it still is, e.g. Foesel et al. 2016). We also create new family Nitrospiraceae, order Nitrospirales, and class Nitrospiria. As Nitrospira and Thermodesulfovibrio are genetically as divergent as Chromatiia and α-proteobacteria (Lin et al. 2014), and more divergent than any Holophagales are from each other, we establish a separate order Thermodesulfovibriales that also includes ‘Magnetobacterium’. We group Nitrospiria and emended Blastocatellia as the sole classes in new subphylum Acidobacteria of Proteobacteria (see Taxonomic Appendix) and provide a new formal description for Rhodobacteria. Thus revised, Acidobacteria and Rhodobacteria are sister clades.

Class Ferrobacteria and its type order Geovibrionales, validly published by Cavalier-Smith (2002a), were also eventually rejected nomenclaturally without a specific reason (Tindall 2014), so the alternative names Deferribacteres and Deferribacterales (not validly published until later in 2002) are now widely used for the same groups. However, Deferribacteres is now invalid under ICNP rule 8, as is class Chrysiogenetes, with sole order Chrysiogenales of anaerobic arsenate-respiring flagellate negibacteria (Garrity and Holt 2001b). As Chrysiogenales and Deferribacterales are strongly supported sisters on our CAT and ML RP trees and have rather similar phenotypes (the Deferribacterales subclade comprising Denitrovibrio and Seleniivibrio can also respire arsenate: Denton et al. 2013; Rauschenbach et al. 2013), making them separate phyla (Chrysiogenetes, Deferribacteres: Garrity and Holt 2001a, b) was unjustified rank inflation. We therefore group both orders of metal reducers in the same new class Deferribacteria within a re-established proteobacterial subphylum Geobacteria, which comprises Deferribacteria plus ε-proteobacteria; these are fairly strongly supported sisters on CAT trees but are separated on ML RP trees (see Taxonomic Appendix). As Epsilonproteobacteria (Waite et al. 2017) is now an invalid class under rule 9 of ICNP, and Epsilobacteria, published earlier (Cavalier-Smith 2002a), was later rejected, we make new class Nautiliia for ε-proteobacteria, which differ markedly in flagellar structure and motility from the proteobacteria classified in Rhodobacteria (Beeby 2015). Flagella of Deferribacteria, like those of Nautiliia, are polar or bipolar and cells are spiral or curved, Flexistipes being the only non-flagellate filamentous genus (Fiala et al. 1990); their flagellar structure should be studied in detail, as distinctive features of Nautiliia flagella and motility may be characteristic of all Geobacteria. Flagella of Acidobacteria need similar study to see whether this might be the ancestral state for all Proteobacteria, with those of peritrichous Rhodobacteria such as Escherichia coli secondarily simplified.

Defects of ranking prokaryote taxa by arbitrary rDNA divergence

For decades, microbiologists have used rDNA similarity as a practical rule of thumb for assigning new prokaryote ‘species’ to existing orders, classes, and phyla. Commonly, if a new lineage robustly groups on a 16S rDNA tree with an existing clade widely accepted as a phylum, it is assigned to that phylum, but if grouping is uncertain, it is often made the basis for a new phylum. The number of supposed ‘phyla’ has mushroomed, with 39 listed on a popular website as of 19 July 2018. When this practice is (a) formalised by adopting an arbitrary numerical cutoff of 75% 16S rDNA identity as a threshold of divergence claimed to be sufficient reason to split prokaryotes into separate phyla (Yarza et al. 2014) and (b) extended to uncultivated environmental sequences to propose ‘candidate phyla’, supposed phylum numbers explode to 118 (Hug et al. 2016), which is scientifically unsound and taxonomically unwise.
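To make concrete why a fixed identity cutoff confounds evolutionary rate with relatedness, the following minimal Python sketch (ours, not a method of Yarza et al. 2014) compares two sister pairs of identical age, one with equal rates and one in which a lineage evolves three times faster. It assumes the simple Jukes–Cantor substitution model, which real 16S rDNA does not follow, and the rates and helper names are hypothetical; only the 75% phylum cutoff comes from the text.

```python
import math

# The only value taken from the text: 75% 16S rDNA identity as a phylum cutoff.
PHYLUM_IDENTITY_CUTOFF = 0.75


def jc_identity(subs_per_site: float) -> float:
    """Expected fraction of identical sites under the Jukes-Cantor model (an assumption)."""
    p_distance = 0.75 * (1.0 - math.exp(-4.0 * subs_per_site / 3.0))
    return 1.0 - p_distance


def pair_identity(rate_a: float, rate_b: float, time: float) -> float:
    """Identity between two sister lineages that diverged `time` units ago."""
    return jc_identity((rate_a + rate_b) * time)


def split_as_separate_phyla(identity: float) -> bool:
    """Hypothetical helper applying the fixed threshold."""
    return identity < PHYLUM_IDENTITY_CUTOFF


# Hypothetical rates: same divergence time, but one lineage is 3x faster in the second pair.
normal = pair_identity(0.1, 0.1, time=1.0)
accelerated = pair_identity(0.1, 0.3, time=1.0)
print(f"{normal:.3f} {split_as_separate_phyla(normal)}")            # 0.824 False
print(f"{accelerated:.3f} {split_as_separate_phyla(accelerated)}")  # 0.690 True
```

Under these assumptions the accelerated pair falls below the cutoff and would be split into separate ‘phyla’ although its divergence time is unchanged, which is the artefact criticised here.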

The Candidatus concept, when applied to partly studied species whose names are not validly published, is practically useful, but extending it to phyla is seriously harmful to science, nomenclature, and taxonomy, as it tends to formalise ignorance rather than knowledge and to divert attention from the need to evaluate better the reasons for giving high ranks to some taxa. In traditional taxonomy, merely quantitative divergences like size, numbers of bristles on an insect leg, or flowers on a stem were treated as minor differences valuable for distinctions at low ranks only. Phyla were based on major, evolutionarily very stable qualitative differences in shared body plan, as in chordate, arthropod, or molluscan animals, or vascular plants versus green algae. In eukaryotic microbes, the same is done using a combination of ultrastructural and molecular characters, only 8 phyla now being recognised in kingdom Protozoa and 8 in kingdom Chromista (Cavalier-Smith 2018). That conservative approach ought to be applied to prokaryotes too, to make phyla biologically meaningful and practically valuable for grouping definitely related subgroups more clearly and economically. Spirochaetes, Cyanobacteria (excluding Melainabacteria), Actinobacteria, Chloroflexi, and Endobacteria exemplify good, sensible eubacterial phyla, and the eu/archaebacterial distinction an excellent supraphyletic one, each with a body plan very different from the others. But establishing 34 new ‘candidate phyla’ of unknown body plans in a ‘candidate phyla radiation’ (CPR) (Hug et al. 2016) is a reductio ad absurdum of ranking by numerical thresholds (Yarza et al. 2014). As Candidatus by definition is of indeterminate rank, the phrase ‘candidate-phyla’ is a thoughtless self-contradiction.
CPR lineages are all miniaturised eubacteria (ultrastructurally negibacteria) with tiny genomes that are likely to be rapidly evolving (thus exaggerating their apparent divergence) and are likely mutually related as a single clade. The idea that they and convergently miniaturised DPANN archaebacteria (see below) are early primitive life forms (Castelle and Banfield 2018) will probably eventually be shown to be a serious misinterpretation, as was Woese’s similar idea, based just on hyperaccelerated rDNA trees, that microsporidia were the most primitive eukaryotes (Vossbrinck et al. 1987). Subdividing them into phyla merely because many branches exceed the scientifically meaningless 75% identity threshold is taxonomically harmful. In our present state of ignorance, the ranks of CPR cannot be sensibly discussed, but there is no reason yet to think that more than one phylum will ever be needed for CPR as a whole. None of their genomes was available when our analyses began. But that probably does not seriously limit our conclusions, because more likely than not the whole CPR clade really belongs in a well-known phylum, e.g. Proteobacteria, and its separate position on the published ML tree is a long-branch attraction (LBA) artefact pulling it towards neomura, caused by its accelerated evolution. Merely having accelerated sequence evolution through reductive evolution is not a rational reason for subdivision into numerous phyla.

That would be equivalent to subdividing parasitic long-branch microsporidia into many phyla merely because their 16S rDNA evolved so much faster than that of other protozoa (Bass et al. 2018). But protozoologists avoid the mistake of believing that rDNA is a molecular chronometer and that mere differences in evolutionary rate are of any deep evolutionary or taxonomic significance. Examples of major accelerations in rDNA evolution associated with marked cell miniaturisation exist in rhizarian chromists; though temporarily taxonomically confusing, their extreme rDNA divergence was no reason to establish new phyla (Stentiford et al. 2017). It would be ridiculous to set arbitrary levels of sequence divergence for ranking in protists, as some do in prokaryotes. Degree of divergence must be taken into account by taxonomists, but not arbitrarily preset (Yarza et al. 2014) or overvalued compared with biologically more meaningful characters.

The recently described genus Abditibacterium (name effectively but not validly published) exemplifies current low standards of prokaryote higher level taxonomy, as a new ‘phylum Abditibacteriota’ was proposed on the weak basis of rDNA trees including no other cultured bacteria but Armatimonadetes and protein trees including only Armatimonadetes and Chloroflexi (or additionally Deinococcus) (Tahon et al. 2018), and was added uncritically to the NCBI ‘taxonomy’. This lineage may just be a deep-branching member of phylum Armatimonadetes meriting no higher rank than order or class; the more reliable protein trees do not rule that out, and the rRNA trees show abditibacteria and WS1 as long branches that might have been artefactually excluded from Armatimonadetes. The phylum was described thus: ‘defined based on the phylogenetic analysis of 16S rRNA gene sequences. Members of this phylum form a stable lineage separate from candidate lineage WS1 and Armatimonadetes’. That is totally inadequate and almost meaningless, as that ‘description’ could apply to every phylum except Armatimonadetes. It neither specifies the clade included nor gives any characters it possesses. Most papers naming phyla are even worse, as their ‘definitions’ seldom mention any taxa included or excluded! This paper was too recent for us to include its RPs in proper bacteria-wide trees and better evaluate it.

More patience and more knowledge are needed before a stable prokaryote higher taxonomy is possible. The purpose of hierarchical ranking is to simplify classification by keeping the number of highest ranked taxa as low as we reasonably can, so that it is easier for human brains to grasp the big picture of biodiversity (Cavalier-Smith 1998b). Earlier, Cavalier-Smith (1992b) recognised 13 eubacterial and two archaebacterial phyla, later reducing them to 7 and 1 (Cavalier-Smith 2002a) and then modifying that to 12 eubacterial phyla (Cavalier-Smith 2006d). Cavalier-Smith (2002a) argued that many candidate eubacterial phyla then being proposed from environmental DNA sequencing of hot habitats would prove really to belong in known phyla when better resolving multigene trees were available. That has turned out to be true, Armatimonadetes being the only one that still merits that rank. Recent discoveries and the results of our RP trees now mean that we recognise 14 distinct eubacterial phyla (Fig. 11); in archaebacteria, only Euryarchaeota and Filarchaeota should be accepted as phyla, making 16 prokaryote phyla, the same as the number now recognised in kingdoms Protozoa and Chromista collectively. Nine of the 14 current eubacterial phyla were already represented as clades on the classic early 16S rDNA trees, which also showed Chlorobiales plus Bacteroidetes and Planctomycetes plus chlamydiae as clades (Woese 1987). However, the branching order amongst ‘phyla’ on that pioneering rDNA tree was almost entirely wrong, as shown by our more accurate, completely resolved CAT RP tree, except for one thing: Planctobacteria and Sphingobacteria being sisters. Thus, site-heterogeneous RP trees are a much better basis for prokaryote evolutionary taxonomy than were site-homogeneous 16S rDNA trees. But even if a sister relationship exists with high support, as between Melainabacteria and Cyanobacteria, that alone is not a sound reason for lumping such microbes into one phylum (Utami et al. 2018), if distinctions between them are important enough to merit separate phyla, as originally suggested (Di Rienzi et al. 2013).

Many relationships unclear from 16S rDNA are now well established by multiprotein trees. But even these can be confused and seriously biased by hyperaccelerated evolution in secondarily miniaturised cells such as many parasites. Eukaryote examples are microsporidia and Mikrocytos, which belong in phyla Opisthosporidia and Retaria respectively and do not merit separate phyla, as sequence divergence alone would misleadingly tell a mindless computer. A prokaryote example is eubacterial Dependentiae (SM6; often endoparasites of eukaryotes), none of which have validly published names, so they cannot be included in formal taxonomy, but which are unwisely called a ‘candidate phylum’ (McLean et al. 2013; Deeg et al. 2019); they are almost certainly just a branch of phylum Proteobacteria that might eventually deserve class or subclass rank within subphylum Acidobacteria.

Multiprotein sequence divergence ranking: illusory objectivity

Problems of false topology and extremely idiosyncratic rate changes would be reduced, but not eliminated, if multiprotein trees were used rather than rDNA, especially if some correction were made for rate differences across taxa, as in a eubacterial study of 120 proteins (Parks et al. 2018). Their elaborate computer-based approach concluded that the then 65 CPR ‘phyla’ collectively merited no more than one phylum, exactly as we concluded after a few minutes’ thought by one human brain, so their normalisation method to allow for rate differences is clearly superior to the naive rDNA distance approach. However, the method is not as objective as they supposed and has many arbitrary aspects. First, their divergence estimates depend on knowing the root position, the most controversial phylogenetic inference of all; they sidestepped the problem by using midpoint rooting and averaging, which is not evolutionarily or scientifically correct but merely computationally convenient, and necessarily biases estimates. Secondly, linear interpolation of divergence times is arbitrary, as rates undoubtedly change with time in unpredictable ways. Thirdly, the degree of sequence divergence has no particular evolutionary or biological significance, so using it as the sole criterion for ranking is not objective but a subjective choice made for convenience. In fact, they did not objectively choose a specific degree of divergence for establishing phyla, classes, orders, etc. Instead, they calculated the median degree of divergence for existing taxa of a given rank (presumably based on the hodge-podge NCBI taxonomy, though this is not explicitly stated) and used that to assign ranks to clades on their new trees.
Therefore, their supposedly objective method largely perpetuates the errors of judgement made earlier by the erratic rRNA distance ranking criticised above; not precisely, because their trees will be better, they will have been able to reduce polyphyly, and the spread of degrees of sequence divergence associated with different ranks is narrower. Furthermore, they did not apply the results consistently but made various manual adjustments (not individually specified or justified). Phylogenetic computer programmes all rest on partially incorrect or arbitrary assumptions and on subjective choices of input data and algorithms. Reassuringly for the present study, Parks et al. (2018) found that using only 16 RPs gave almost as accurate trees as 120 proteins, whereas 16S rDNA alone was much less accurate. The 26 RPs for eubacteria and 51 RPs for archaebacteria used here should give trees even closer to the 120-protein trees. Our CAT trees are probably better than theirs because of the site-heterogeneous model. All our trees are also likely to be less biased by long-branch artefacts, as we excluded the CPR ‘phylum’, whose presence (together with that of neomura) probably explains why the eubacterial backbone of the Hug et al. (2016) tree has almost no bootstrap support.
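The normalisation criticised above can be illustrated with a minimal Python sketch of relative evolutionary divergence (RED) as we understand it from Parks et al. (2018): the root is fixed at 0, extant tips at 1, and each internal node is placed by linear interpolation, RED = p + (d/u)(1 - p), where p is the parent's RED, d the branch length to the parent, and u the mean path length from the parent to the tips below the node. The sketch simply takes the given root, which is exactly the step they approximated by midpoint rooting and averaging, and omits their manual adjustments; the rank medians, the toy tree, and all names are hypothetical, not values or code from their study.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent (0 for the root)
    children: list[Node] = field(default_factory=list)


def mean_path_to_tips(node: Node) -> float:
    """Average path length from `node` down to every tip below it (0 for a tip)."""
    paths: list[float] = []
    stack = [(node, 0.0)]
    while stack:
        current, dist = stack.pop()
        if not current.children:
            paths.append(dist)
        for child in current.children:
            stack.append((child, dist + child.length))
    return sum(paths) / len(paths)


def relative_evolutionary_divergence(root: Node) -> dict[str, float]:
    """RED by linear interpolation from a fixed root (0) to the tips (1)."""
    red = {root.name: 0.0}

    def visit(parent: Node) -> None:
        p = red[parent.name]
        for child in parent.children:
            d = child.length
            u = d + mean_path_to_tips(child)  # mean parent-to-tip path through this child
            # Degenerate zero-length case is treated like a tip.
            red[child.name] = p + (d / u) * (1.0 - p) if u > 0 else 1.0
            visit(child)

    visit(root)
    return red


# Hypothetical rank medians for illustration only (not values from Parks et al. 2018).
RANK_MEDIANS = {"phylum": 0.30, "class": 0.45, "order": 0.60, "family": 0.75}


def nearest_rank(red_value: float) -> str:
    """Assign the rank whose (hypothetical) median RED is closest."""
    return min(RANK_MEDIANS, key=lambda rank: abs(RANK_MEDIANS[rank] - red_value))


# Toy tree: sister clades A and B diverge at the root; tip b1 evolves fast.
toy_tree = Node("root", children=[
    Node("A", 1.0, [Node("a1", 1.0), Node("a2", 1.0)]),
    Node("B", 0.5, [Node("b1", 3.0), Node("b2", 0.5)]),
])
red = relative_evolutionary_divergence(toy_tree)
for clade in ("A", "B"):
    print(clade, round(red[clade], 3), nearest_rank(red[clade]))
# A 0.5 class
# B 0.222 phylum
```

In this toy example the sister clades A and B split at the same node, yet B receives a lower RED, and hence a higher rank, solely because one of its tips has a long branch, illustrating how rate heterogeneity and the choice of root feed directly into the assigned ranks.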

It is extremely hard to evaluate their higher taxonomy, as Parks et al. (2018) do not even list the excessive 99 eubacterial phyla (114 on their website, plus 11 archaebacterial) in their system, and their website is extremely opaque: I could not find any such list, or any list of which classes are in each phylum comparable to our Table 2, or see how one could assess the effects of their computer output on particular groups of interest. From their Fig. 2, it appears that a given degree of normalised sequence divergence can correspond to two different ranks, and some classes even in their taxonomy have the same degree of divergence as some phyla or some orders, so it is misleading to imply that it applies one objective standard to ranking. However, although more sensible for CPR, it has grossly inflated the number of other taxa at each rank compared with NCBI, which for phyla (and arguably classes) at least is the opposite of what is required for a good, sensible taxonomy that makes things simpler without compromising phylogenetic accuracy. Partly because they wanted to be able to feed every genome into one tree, the phylogenetic methods were chosen for computational speed, not accuracy. It is good to use multiprotein trees rather than rDNA as a guide for establishing higher taxa, but better to use a representative sample, to enable more accurate methods and study of artefacts (e.g. of taxon sampling), and to integrate results with other evidence when deciding on ranking, as here, rather than delegate that important taxonomic function to an arbitrarily programmed computer. Parks et al. (2018) is not a practically useful contribution to taxonomy, as none of their presumably numerous new names is validly published or individually explained.
Such methods may be useful to genome sequencers wishing to assign an unknown genome quickly to approximately the correct place in the tree, but are inadequate as a general reference taxonomy, for which the eclectic classical approach used here is greatly preferable. Of the supraspecific names they used, only 18% are validly published. There is a great risk that ranking by automated methods with opaque assumptions will cause thoughtless splitting, unnecessary name changes, and overcomplications with no clear rationale. For sound higher taxonomy, human thought and expert taxonomic judgement are needed, which should not be pejoratively labelled subjective; such judgement is always based on objective evidence. A posteriori ranking by one brain based on all available evidence is superior to a priori ranking by arbitrary numerical thresholds, which give different results with different algorithms and data samples.

The excessive number of phyla, and rank inflation generally, in Parks et al. (2018) arise because they do not even consider using intermediate-ranked categories such as subphyla, infraphyla, superclasses, subclasses, superorders, and suborders, as is standard in eukaryote taxonomy. Wider adoption of such categories would greatly improve prokaryote classification by drastically reducing the number of phyla and classes, thus increasing comprehensibility and providing a better quick overview of bacterial diversity, as Table 2 exemplifies. We should not lose sight of the primary simplifying purpose of classification, best served by severely limiting the number of highest ranked taxa and keeping numbers relatively small at each higher rank by proper use of intermediate categories, especially in ultradiverse groups like Proteobacteria.

Supraphyletic prokaryote taxa

We here treat Prokaryota as a superkingdom or empire with Eubacteria and Archaebacteria ranked as kingdoms, as in the seven-kingdom system of Cavalier-Smith (1986) and Ruggiero et al. (2015), and therefore now rank Euryarchaeota and Filarchaeota as phyla, the only two in the kingdom. The 14 eubacterial phyla need grouping in higher-level taxa. The earlier subkingdom Unibacteria, grouping posibacteria and archaebacteria (Cavalier-Smith 1998a), is polyphyletic and Negibacteria is multiply paraphyletic, so we abandon them. The most fundamental and likely most ancient contrast is between the primitively LPS-free Chloroflexi, here assigned to new subkingdom Chlorobacteria (originally a phylum: Cavalier-Smith 1992b), and all other eubacteria. The other 13 phyla are here divided into three subkingdoms (two new) whose common ancestor had an OM with LPS, but which multiple character losses made phenotypically heterogeneous. The earliest branching bacteria with LPS form the new subkingdom Eoglycobacteria, the clade comprising Armatimonadetes plus the Cyanobacteria/Melainabacteria subclade, invariably with a murein sacculus and an OM with LPS. They are sisters to a much larger group that is more heterogeneous in envelope structure, here divided into two paraphyletic subkingdoms: Posibacteria (Actinobacteria, Endobacteria) and Neonegibacteria (infrakingdoms Gracilicutes and Thermobacteria infrak. n.). Why evolution makes acceptance of some ancestral (paraphyletic) groups like prokaryotes or Gracilicutes necessary or desirable was explained previously (Cavalier-Smith 1998b, 2010a).

Infrakingdom Gracilicutes (Cavalier-Smith 2006a) is now thoroughly established as monophyletic, but RP trees show that the other two infrakingdoms in that interim classification are polyphyletic, so we abandon them but establish paraphyletic infrakingdom Thermobacteria to embrace the ancestrally or largely thermophilic phyla Aquithermota, Synthermota, and Hadobacteria, plus Fusobacteria, which, though not thermophilic, nest somewhere within them. We establish superphylum Planctochlora for Planctobacteria plus Sphingobacteria, which always form a robust clade on site-heterogeneous eubacteria-only trees.

Battistuzzi and Hedges (2009), using less accurate site-homogeneous methods for 25 proteins, thought they had established two major ‘clades’ of the less thermophilic eubacteria, but our RP trees imply that neither is a clade. Their ‘Terrabacteria’ comprise Chloroflexi, Cyanobacteria, Posibacteria, and Hadobacteria, which never form a clade on our trees. Essentially, they comprise the most basal lineages on our trees plus Hadobacteria, so are polyphyletic. They are not a clade partly because of the inclusion of Hadobacteria (which likely grouped artefactually with Actinobacteria by ML) and also because their tree was misrooted in the neomuran stem; if our rooting is correct, terrabacteria would be paraphyletic even if Hadobacteria were excluded. Their group hydrobacteria is identical to Gracilicutes, but as the name Gracilicutes was proposed three years earlier, they should have used it and cited the indel evidence for it previously explained by Cavalier-Smith (2006a). Their tree had a phylum Sphingobacteria clade but wrongly put both Sphingobacteria and Spirochaetes within Planctobacteria, so failed to show the Planctochlora clade, which is inconsistent not only with our trees but with most other recent multiprotein studies. In their tree, unlike ours, which better sampled deep phylogeny, Thermotogales and Aquificales were sisters and jointly sister to Fusobacteria. Gracilicutes/hydrobacteria are not a clade, because neomura probably evolved from them, as shown by our two-domain trees. GenBank should stop using the polyphyletic Terrabacteria group. Table 2 provides a better higher classification of eubacteria.

Eubacteria were ancestrally photosynthetic

The seven phyletically distinct eubacterial groups possessing photosynthesis are found in five different phyla: Chloroflexi, Cyanobacteria, and Endobacteria, each with only one major kind of photosynthesis, plus Proteobacteria and Sphingobacteria, each with two distinct types. As photosynthetic reaction centres (RCs) are all homologous (Sadekar et al. 2006), photosynthesis evolved only once, so we have to explain why none of these phyla is sister to another, all being interspersed with the nine entirely non-photosynthetic ones. Woese (1987) suggested that the ancestral eubacterium was possibly photosynthetic, and Cavalier-Smith (1987b, 1992b, 2001, 2002a, 2006a, c) argued more strongly that the first eubacterium was photosynthetic. If so, photosynthesis was lost independently by the immediate ancestors of all nine non-photosynthetic phyla. As evolutionary loss is very easy by simple deletion and would often have been selectively advantageous in producing specialised heterotrophs or chemoautotrophs, it is entirely reasonable that numerous losses occurred; nine is far fewer than the number of photosynthesis losses inferred in the eukaryote kingdom Chromista, though additional losses must have occurred within the four non-cyanobacterial phyla just listed. Yet ever since the reality of LGT was demonstrated, many have preferred to invoke LGT as an alternative explanation of the scattered distribution of photosynthesis, but have usually done so with extremely weak evidence, or even without explicitly suggesting a source or sink for the postulated transfers. Figure 11 shows how the two RC types map onto the now robust RP tree.
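The loss-counting argument can be sketched as Dollo-style parsimony: if photosynthesis was present at the root and, as assumed here, never regained, each maximal subtree whose tips all lack it needs exactly one loss on its stem. The Python sketch below applies that rule to a hypothetical toy tree (tips P1–P3 photosynthetic, N1–N5 not), not to the RP tree of Fig. 11, and ignores LGT entirely; the function names are ours.

```python
from __future__ import annotations


def all_absent(tree, has_trait: dict[str, bool]) -> bool:
    """True if no tip in this (sub)tree has the trait."""
    if isinstance(tree, str):
        return not has_trait[tree]
    return all(all_absent(child, has_trait) for child in tree)


def dollo_losses(tree, has_trait: dict[str, bool]) -> int:
    """Minimum losses assuming the trait was present at the root and never regained.

    Each maximal subtree lacking the trait in all its tips costs one loss on its stem.
    Call on a tree in which at least one tip still has the trait.
    """
    if all_absent(tree, has_trait):
        return 1
    if isinstance(tree, str):
        return 0
    return sum(dollo_losses(child, has_trait) for child in tree)


# Hypothetical tree and trait states (True = photosynthetic); not the RP tree.
toy_tree = (("P1", ("N1", "N2")), (("N3", "P2"), ("N4", ("P3", "N5"))))
has_photosynthesis = {tip: tip.startswith("P") for tip in
                      ("P1", "P2", "P3", "N1", "N2", "N3", "N4", "N5")}
print(dollo_losses(toy_tree, has_photosynthesis))  # 4
```

Here five non-photosynthetic tips require only four losses because N1 and N2 are sisters, whereas nine separately nested non-photosynthetic phyla would require nine losses under the same rule.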

If the tree is rooted on Chloroflexi, the cenancestral RC was a heterodimeric type II RC with distinct L/M paralogues, as previously argued (Cavalier-Smith 2002a, c). It would have evolved from a pre-LUCA homodimeric ancestor of L/M with identical subunits having five transmembrane helices, which itself probably evolved from a simpler single-helix protein such as the light-harvesting (LH) antenna of Chloroflexi and Proteobacteria (Olson 2001). RC II proteins are shorter and simpler than RC I proteins and transfer electrons via (bacterio)phaeophytin to quinones, which could have been available prebiotically; they are thus mechanistically more plausible than RC I as ancestral, contrary to a widespread view (Cardona 2017; Martin et al. 2018; Olson 2001). RC I appears to have evolved in an ancestor of cyanobacteria by duplication of RC II followed by a gene fusion linking it with the six transmembrane helices of a CP43-like protein to make the 11 transmembrane helices of RC I (Murray et al. 2006). As 6-helix CP43/CP47-like proteins are restricted to cyanobacteria, they probably arose at least as early as the stem lineage preceding the divergence of Cyanobacteria and the Endobacteria/neonegibacteria branch of the tree. This fusion could have occurred in the precyano/endobacterial stem after it diverged from Armatimonadetes, or one node earlier on the RP tree, in the stem of all glycobacteria after it diverged from Chloroflexi. CP43/CP47 proteins may have played a central role in the evolution of a second reaction centre (pre-RC I) before the gene fusion that made RC I.

The great antiquity of photosynthesis by L/M reaction centres is reinforced by an eighth lineage known only from environmental DNA sequencing (‘Eremiobacterota’ = WPS-2 ‘candidate phylum’: Ji et al. 2017), which is sister to Armatimonadetes on 38-protein trees built with FastTree (less accurate than ML) and apparently has distinctive L/M reaction centres and RuBisCO in four sublineages from boreal mosses (Holland-Moritz et al. 2018; Ward et al. 2019). Its L and M proteins are most closely related to those of Chloroflexi but so distant that we cannot infer LGT from one to the other, making it probable that the common ancestor of Chloroflexi and WPS-2 was photosynthetic and that both lineages multiply lost photosynthesis. Their bacteriochlorophyll synthesis gene bchY evolves rapidly, like those of Heliobacteria and Chlorobi, so their grouping together on the tree is likely a long-branch artefact; in any case, this tree gives no evidence of LGT to or from other phyla and shows deep divergence from all. If this novel anoxygenic photosynthetic lineage were genuinely sister to Armatimonadetes, the second deepest branch after Chloroflexi on our trees, it could with advantage be made a new armatimonad subphylum rather than a novel phylum, which would increase the number of ancestrally photosynthetic phyla to six. Alternatively, as WPS-2 were as close to Chloroflexi as to Armatimonadetes by rDNA, it might just be a highly divergent chloroflexan lineage deserving subphylum rank, as RC trees imply; Chloroflexi were nearly as close as Armatimonadetes on the 38-protein tree. It is vital to culture eremiobacteria to test inferences from metagenomes and study all aspects of their biology, including cell envelope structure and whether they have chlorosomes, as they apparently branch so close to the inferred base of the tree. If they turned out to lack an OM, they would be candidates for a primitively monoderm lineage ancestral to negibacteria, thus the most divergent bacteria of all.

On this view, the ancestral L/M reaction centre was inherited vertically by Proteobacteria but was lost by Heliobacteria, Chlorobi, and Acidobacteria, which kept the new RC I instead (in its original homodimeric form; only in the ancestor of cyanobacteria did RC I undergo duplication and divergence to make heterodimeric (PsaA/B) photosystem I). The only plausible example of RC LGT between phyla to date is for the sphingobacterium Gemmatimonas (in Sphingobacteria: Fig. 5), whose L and M proteins both nest within those of Rhodobacteria on trees, implying that its reaction centre came by LGT from a proteobacterium after Proteobacteria and Sphingobacteria diverged. In contrast, the presence in Chloracidobacterium of an RC I related to that of Chlorobi cannot confidently be attributed to LGT, as it is not nested within Chlorobi but is their sister, just as it would be if inherited vertically from the common ancestor of Gracilicutes, which could have possessed both RC I and RC II; one cannot rule out the possibility that RC I was lost several times within Proteobacteria, as RC II clearly has been. Thus vertical inheritance and lineage sorting by differential loss can explain the present distribution of RCs with minimal LGT. Only cyanobacteria kept both RC I for photosystem I and RC II for oxygenic photosystem II. This interpretation is fully compatible with the distribution of indels in RC proteins, which rules out many theoretically possible LGTs.

The gene fusion had to involve only one of the cenancestral RC proteins L and M, but it is impossible to determine which fused to make RC I, as they are equidistant from RC I on trees. Though cyanobacteria kept RC II, it was not the ancestral L/M version. Sequence phylogeny and RC 3D structure decisively show that cyanobacterial D1 and D2 arose by a duplication of RC II independent of the one that generated L and M (Cardona 2015). It is simplest to suppose that D1 and D2 arose at the same time as RC I, in the same stem lineage. Indeed, the very same RC II duplication that preceded the gene fusion (whether of L or M) could have been serial and yielded four copies: one could have fused with a CP43-like protein to make RC I, and two could have diverged to make D1 and D2. The three structural regions unique to D1 and D2 (Cardona 2015) must have arisen in their immediate common ancestor (likely a secondarily homodimeric intermediate) before the duplicates diverged. D1 and D2 both have homologously attached peripheral chlorophylls, also present in RC I, that allow excitation energy transfer from the core antenna to the RC (Cardona 2017). This sharing of chlorophyll-coordinating histidines in homologous regions of RC I and D1/D2 (but not the ancestral L/M RC II) is simply explained if the same duplicated subunit (whether L or M) was ancestral to both RC I and D1/D2, and this sequence signature arose in their common ancestor after it diverged from L/M in the glycobacterial stem lineage prior to the D1/D2 divergence. Thus, it is not necessary to suppose that photosystem II is chimaeric, as recently argued (Cardona 2017), nor that the last common ancestor of all photosynthesisers had two RCs.

This model for RC evolution is therefore simpler than any previous ones and allows more gradual evolution and successive increases in complexity, as well as later simplifications of non-chloroflexan anoxygenic lineages. This exemplifies how evolution becomes simpler to understand if one has a robust, correctly rooted tree and maps innovations carefully onto it, and how incorrect rooting can make things appear over-complex. Our interpretation implies a period of multiple RC II duplications and mutational divergence in the glycobacterial stem after it diverged from Chloroflexi but before the origin of oxygenic photosynthesis in the cyanobacterial stem, followed by differential losses as glycobacteria radiated.

Though RC evolution was largely vertical, one clear example of LGT exists within Chloroflexi: transfer from the Roseiflexus subclade (suborder Roseiflexineae) of the major exclusively photosynthetic subclade (order Chloroflexales of class Chloroflexia) to an unnamed member (CP2_42A) of the predominantly non-photosynthetic Anaerolineidae. This LGT is particularly convincing, as it involves an unusual secondarily fused L/M gene that evolved at the base of the Roseiflexus/Kouleothrix subclade (Roseiflexineae), and an undisputed, serious mismatch between the RNA polymerase phylogeny, which probably roughly represents organismal and cell lineage evolution, and the RC II tree. However, a second claimed LGT, of an unfused operon from Chloroflexineae into Anaerolineidae involving the common ancestor of Candidatus Rosilinea gracile and JP3_7, is likely a misinterpretation, as discordance between these trees is markedly less: moving the fusion subclade across just one node on the RC tree (for which no statistical support is given) would make it congruent with the organismal/polymerase tree. If our interpretation of vertical inheritance of the Rosilinea RC is correct, photosynthesis would be the ancestral condition for Chloroflexi, in accord with our view that photosynthesis extends back to LUCA. That would imply numerous losses of photosynthesis within Chloroflexia, similar to the numerous losses that most now accept occurred within purple bacteria (subphylum Rhodobacteria) of Proteobacteria. A slightly inaccurate, sparsely sampled RC II tree plus vertical descent seems to us at least as likely as LGT. Though Ward et al. (2018) regarded multiple losses as ‘more complex’, losses are probably mechanistically simpler than LGTs, yet there appears to be a subjective bias towards invoking LGT rather than losses in many bacterial papers.
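How much two trees disagree can be quantified in several ways; the measure sketched here is not one used by the studies discussed above. As a hedged illustration, the following Python sketch computes a rooted Robinson–Foulds distance (the number of tip clusters found in one tree but not the other) for hypothetical five-tip trees encoded as nested tuples. Moving one tip across a single node gives the minimum non-zero distance of 2, whereas relocating it deep in the tree gives a much larger value; like the RC tree discussed above, neither number carries any measure of statistical support.

```python
from __future__ import annotations


def clusters(tree) -> set[frozenset[str]]:
    """Return the tip sets below each internal node of a rooted tree.

    Trees are nested tuples of tip names (a hypothetical encoding). The root
    cluster is dropped because every tree on the same tips shares it.
    """
    found: set[frozenset[str]] = set()

    def tips_below(node) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(tips_below(child) for child in node))
        found.add(below)
        return below

    found.discard(tips_below(tree))
    return found


def rooted_rf(tree_a, tree_b) -> int:
    """Rooted Robinson-Foulds distance: clusters present in exactly one tree."""
    return len(clusters(tree_a) ^ clusters(tree_b))


# Hypothetical trees on the same five tips.
organismal = (((("A", "B"), "C"), "D"), "E")
one_node_move = (((("A", "C"), "B"), "D"), "E")  # B shifted across one node
deep_transfer = ((("A", "C"), "D"), ("B", "E"))  # B placed far from its sisters

print(rooted_rf(organismal, one_node_move))  # 2
print(rooted_rf(organismal, deep_transfer))  # 6
```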

Before chloroflexan RC IIs outside Chloroflexia were known, Shih et al. (2017) claimed to have demonstrated recent LGT of anoxygenic photosynthesis into Chloroflexi, but that claim rested purely on Chloroflexales nesting relatively shallowly within Chloroflexi and on deeper-branching photosynthetic lineages such as ‘Rosilinea’ being then unknown. In other words, it depended on assuming both that such deep-branching lineages never existed and that Chloroflexi never lost RCs. They did not specify a possible donor for that purported LGT, so their LGT assumption was explanatorily empty and devoid of direct evidence. The only known bacteria with L/M RC II that could possibly be donors are Rhodobacteria and Gemmatimonas. If the donor was either, then Chloroflexi RCs should nest clearly within those of Proteobacteria, as do those of Gemmatimonas; indeed, they should nest even more shallowly if the LGT were as recent as 867 Ma as claimed, as crown Rhodobacteria are probably over three times that age (RP trees like Fig. 5 imply they are somewhat older than stem Cyanobacteria). They do not, whether on separate L and M trees or on a concatenated tree (Imhoff et al. 2017), but are invariably distant sisters. This directly rules out both possible sources of LGT for Chloroflexi RCs. On the concatenated tree (Imhoff et al. 2017), the relative lengths of the Chloroflexi and rhodobacterial sister branches are as expected if they diverged early, at the very base of our RP tree.

Furthermore, if Chloroflexi got RCs by LGT, they also would have had to get bacteriochlorophyll synthesis genes; but their BchL and BchX proteins (subunits, respectively, of protochlorophyllide oxidoreductase (POR) and chlorin reductase (CR), the two enzymes that make the bacteriochlorin precursor of bacteriochlorophyll a) are more closely related to those of Chlorobi than to those of Proteobacteria (Gupta 2012). Evolution of bacteriochlorophyll genes is complicated and was often interpreted in terms of ill-specified LGTs (Xiong et al. 2000), but the trees are confused by paralogue rooting, which is extremely unreliable and biased by long-branch attraction (LBA) when stems between paralogues are very long (Cavalier-Smith 2002a, 2006d), as is so for POR and CR (Gupta 2012; Xiong et al. 2000). If POR and CR subunit trees are rooted on Chloroflexi as here, instead of by extremely distant outgroups subject to LBA artefact as before (Gupta 2012; Xiong et al. 2000), all are congruent with the RP trees, so no LGT need be invoked. Cyanobacteria have two very different POR BchL paralogues, one sister to the proteobacterial proteins and one related to Heliobacterium BchL (Gupta 2012; Gupta and Khadka 2016). This implies a BchL duplication in the cyano/endobacterial stem (or one node earlier) and differential loss of one of the two paralogues in subsequent lineages. On that interpretation, the BchL tree is congruent with our RP tree rooted on Chloroflexi. The paralogue shared by clade C cyanobacteria and Proteobacteria has a unique glutamate insertion (Gupta 2012), so the other paralogue (found in Chloroflexi, Chlorobi, and Heliobacteria) must be the ancestral version if Chloroflexi diverged first.
Thus, no LGT is required to explain the patchy distribution of (bacterio)chlorophyll synthesis proteins other than one from Proteobacteria to Gemmatimonas (which has the glutamate insertion: Gupta and Khadka 2016). As for its RC, this could have been mediated by a single transfer of the entire photosynthetic gene cluster, including RC and Bch genes. Previous, more extensive LGT assumptions stem from misrooting the tree and failing to recognise distinct paralogues.

The actinobacterium Rubrobacter, though non-photosynthetic, has two of the three POR proteins (BchN and BchB); trees for both show that they do not nest within any photosynthetic phyla, so give no evidence for LGT (Gupta and Khadka 2016). It lacks CR but has homologues of all three subunits of magnesium chelatase (BchD, H, I), the enzyme that inserts Mg²⁺ into protoporphyrin IX, the first unique step in bacteriochlorophyll synthesis; magnesium chelatase is homologous with the three-subunit cobalt chelatase, not with POR/CR. BchI is not homologous with the other subunits but with the huge and ancient AAA+ ATPase family (Sousa et al. 2013a). BchD is homologous with von Willebrand factor A (vWA), and its tree is congruent with RP trees if rooted on Chloroflexi (not spuriously by vWA as in Sousa et al. 2013a). The presence of these enzymes in one of the deepest actinobacterial branches means that other actinobacteria lost them, and is consistent with the RP trees and with our argument that Actinobacteria and all other non-photosynthetic eubacteria lost photosynthesis secondarily. These proteins may be relics of their inferred eubacterial photosynthetic ancestry, retained by acquiring other uses. BchI phylogeny is complicated by there being two ancient paralogues in Chloroflexi and Chlorobi, but their joint tree was apparently incorrectly rooted (Sousa et al. 2013a); we root it between subclade A, comprising only Chloroflexi, Chlorobi, and Proteobacteria (including Acidobacteria), and subclade B, containing Chloroflexi, Chlorobi, and Heliobacteria (Endobacteria). If A and B are treated as separate clades, A is precisely congruent with the RP tree if rooted on Chloroflexi; B is more complex, having two seemingly paralogous Chlorobi subclades, but if the longer of these (likely to be LBA-sensitive and topologically misleading) is omitted, B topology is also identical to the RP tree. That implies vertical descent of both the A and B BchI paralogues since LUCA, and that Sousa et al. (2013a) misrooted the tree within the B paralogue. We speculate that Rubrobacter lost the BchL subunit of POR and evolved a simplified dimeric enzyme with a different function from trimeric POR. The alternative assumption that the putative BchN/B dimer was a precursor of photosynthesis (Gupta and Khadka 2016), not a relic, could be true only if the universal tree were rooted within Actinobacteria, a rooting that indel evidence strongly rejects (Gao and Gupta 2005, 2012; Gao et al. 2006) and that would require Chloroflexi to have secondarily lost LPS.

All three POR and CR subunits are homologous with those of the three-subunit nitrogenases discussed below and must have a common origin. BchL is also homologous with ParA, the ATPase that segregates chromosomal DNA in all eubacterial phyla and many archaebacteria (Barillà 2016). As ParA function almost certainly evolved preLUCA, and ParA, like POR/CR and nitrogenase, is present in Chloroflexi and works as a simple homodimer, we suggest that it is likely ancestral to BchL and evolved before the trimeric homologues arose by gene duplication preLUCA. On that interpretation, protein-coded photosynthesis and nitrogen fixation both evolved before LUCA.

Chlorosomes are glycosyldiacylglycerol lipid monolayer vesicles containing thousands of molecules of bacteriochlorophyll c, whose self-assembled stacks are exceptionally efficient at harvesting dim light in Chlorobi, some Chloroflexi, and the proteobacterium Chloracidobacterium (Hohmann-Marriott and Blankenship 2007; Orf and Blankenship 2013). They are attached to the cytoplasmic membrane by a homo-oligomeric base-plate protein (CsmA) that contains bacteriochlorophyll a (universal in anoxygenic phototrophs) and transmits excitation energy from its antenna chlorophylls to RCs (Oostergetel et al. 2010). Chlorosomes also contain 10% carotenoids, which differ among the three groups, enhance antenna assembly, and contribute some excitation energy to RCs, plus some quinones that help survival in oxidative conditions and are simplest (just menaquinone) in Chloroflexi. CsmA homology and the unique chlorosome structure show that chlorosomes evolved only once. Contradictory LGT hypotheses were frequently invoked to ‘explain’ their phylogenetically patchy distribution (Olson and Blankenship 2004), but evidence for any of them seems absent.

As multiple losses are mechanistically easy and would be advantageous in lineages specialising in brightly lit habitats, we argue that a preLUCA chlorosome origin and universal vertical inheritance, coupled with numerous losses in chlorosome-free lineages, is a better explanation. Before the ozone layer developed after cyanobacteria made enough oxygen, UV radiation would have been so intense that photosynthetic bacteria were probably confined to deep or extremely well-shaded habitats, where the benefits of chlorosomes would be at a premium. Only after the 2.4 Ga great oxidation event (GOE) could phototrophs invade brighter habitats and polyphyletically evolve new antenna complexes adapted to different light regimes: phycobilisomes of cyanobacteria, bacteriochlorophyll g of Heliobacteria, and novel purple carotenoids of Rhodobacteria. Chloroflexi suborder Roseiflexineae (Gupta et al. 2013), a shallow subclade much younger than the GOE (Shih et al. 2017), arguably lost chlorosomes secondarily. Chlorosomes of suborder Chloroflexineae transfer excitation to the RC via the ring-shaped integral membrane LH complex B808-866, which contains γ-carotene and two polypeptides related to those of the carotene-containing rhodobacterial ring LH (Xin et al. 2005). By contrast, Chlorobi and Chloracidobacterium use the water-soluble Fe/S Fenna–Matthews–Olson (FMO) protein trimer, which must have evolved in their common ancestor and was not transferred between them by LGT. Previous LGT ideas, from or to Chloroflexi (Olson and Blankenship 2004), are incompatible with this dichotomy.
Our well-resolved RP tree enables a simpler interpretation by vertical inheritance: the chloroflexan ring LH is the ancestral state retained by Rhodobacteria, but FMO evolved in the ancestral gracilicute (likely from RC I PscA, which we argue evolved well after Chloroflexi: Olson 2004) and was retained by Chlorobi and Chloracidobacterium with chlorosomes, where FMO replaced the ring LH, whereas Rhodobacteria lost chlorosomes and FMO but kept the ring LH. Chlorobi can also be considered derived, as their chlorosomes often have bacteriochlorophylls d and/or e as well as c, unlike the other two green bacterial groups (Hohmann-Marriott and Blankenship 2011). The idea that cyanobacteria arose by fusing two lineages was never mechanistically plausible, as bacterial cells never fuse (except in some actinomycete filaments within a species). Vertical inheritance, gene duplication in the precyano/endobacterial or glycobacterial stem, and subsequent divergences and losses fully explain their origin, as elaborated above. Sousa et al. (2013a) also refuted the fusion hypothesis.
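The preference argued above for one early origin plus repeated losses over repeated transfers can be made explicit by counting the losses a rooted tree requires. The sketch below assumes a Dollo-like model (trait present at the root, lost any number of times, never regained) and uses a hypothetical tree with hypothetical tip states, not real chlorosome data; the function name `min_losses` is ours. Under that model, the minimum number of losses equals the number of maximal clades whose tips all lack the trait.

```python
from typing import Dict, Tuple, Union

# A rooted tree is either a tip name (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def min_losses(tree: Tree, present: Dict[str, bool]) -> int:
    """Minimum number of losses, assuming presence at the root and no regain.

    Each maximal clade whose tips all lack the trait needs exactly one loss
    on its stem; a clade with any trait-bearing tip keeps the trait on its
    stem but may contain nested losses.
    """

    def walk(node: Tree) -> Tuple[bool, int]:
        # Returns (does any tip in this clade have the trait?, losses needed inside it).
        if isinstance(node, str):
            return present[node], 0
        has_trait, losses = False, 0
        for child in node:
            child_has, child_losses = walk(child)
            if child_has:
                has_trait = True
                losses += child_losses
            else:
                losses += 1  # one loss on the stem of an all-absent child clade
        return has_trait, losses

    root_has, losses = walk(tree)
    if not root_has:
        raise ValueError("no tip has the trait, so presence at the root is untestable")
    return losses


# Hypothetical example: P tips retain the trait, N tips lack it.
tree = ((("P1", "N1"), ("N2", "N3")), ("P2", "N4"))
present = {"P1": True, "P2": True, "N1": False, "N2": False, "N3": False, "N4": False}
print(min_losses(tree, present))  # 3: stems of N1, (N2, N3), and N4
```

Such a count is meaningful only on a correctly rooted tree, and it says nothing about how probable a loss is relative to an LGT; the premise that losses are mechanistically easy comes from the argument in the text, not from the code.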

Molybdenum-dependent nitrogenase evolved before LUCA

Like photosynthesis, evolution of nitrogen fixation has been misinterpreted, and LGT too often invoked, through misrooting the tree and misunderstanding paralogues. Nitrogen fixation is known in euryarchaeotes and 12 eubacterial phyla including Chloroflexi, though our GenBank searches for Nif genes suggest that it is absent from two small phyla (Armatimonadetes, Hadobacteria), so the claim that nitrogenase is not generally found ‘in deeply rooted lineages’ (Boyd and Peters 2013) is mistaken. The nitrogen-fixing enzyme has two parts: a homodimer homologous to BchL and BchX that donates electrons, and a heterotetrameric acceptor with subunits homologous to BchN/Y and BchB/Z. The three related nitrogenase families use different metals: vanadium (V) by Vnf nitrogenases in Cyanobacteria, Endobacteria, Proteobacteria, and Euryarchaeota only; iron (Fe) by Anf nitrogenases in Proteobacteria, Sphingobacteria, and Euryarchaeota only; and molybdenum (Mo) by Nif nitrogenases in all seven phyla. All species with Fe or V nitrogenases also have Mo nitrogenases, which occur additionally in Chloroflexi, Actinobacteria, Aquithermota, Synthermota, and Planctobacteria. As the taxonomically rarer V/Fe nitrogenases always group within Mo nitrogenases on concatenated sequence trees rooted on BchLNB proteins, and have shorter branches, we infer that Mo nitrogenases evolved prior to LUCA and V nitrogenases no later than the cyano/endobacterial stem, whereas Fe nitrogenases evolved from a V-nitrogenase later still, in the gracilicute stem, from which euryarchaeotes inherited them vertically. This is consistent with isotopic evidence for a Mo-based nitrogen cycle going back at least 3.2 Ga (Stüeken et al. 2015), and with the combined sequence phylogenetic and palaeontological evidence that archaebacteria are at least three times younger than photosynthetic negibacteria (see below). However, Boyd et al. (2011a) claimed that non-Mo nitrogenases were ancestral, having first evolved in euryarchaeotes, and that Mo-nitrogenases evolved only later, after transfer by LGT into eubacteria and after the GOE. As we explain below, both conclusions were entirely unjustified phylogenetically; indeed, a few months later, three of the same authors (Boyd et al. 2011b) contradictorily but correctly concluded that Mo enzymes were ancestral, yet still kept the erroneous idea that they began in methanogens (Boyd and Peters 2013). Their errors probably stem partly from supposing that the universal root is between eubacteria and archaebacteria, but especially from misinterpreting paralogue trees, as nitrogenase evolution is complicated by multiple paralogues: e.g. two distinct paralogues occur in Chloroflexi, five in methanogenic archaebacteria, four in Endobacteria, and about four in Proteobacteria. We attribute most paralogues to early duplications and divergence but identify one clear case of LGT.

The metal cofactor of one subclade containing only endobacterial and methanogen paralogues has not been identified, but this subclade groups within the Mo enzymes, so it does not affect our argument that Mo use was ancestral. The long-branch Chloroflexales subclade was likewise left unassigned to a particular metal cofactor (Boyd et al. 2011b), but we argue that it is almost certainly Mo-dependent, as Chloroflexales also have genes for the NifE and NifN subunits of the NifEN heterotetrameric scaffold essential for assembling the FeMo cofactor, which are more closely related to NifD and NifK, respectively, than to Anf or Vnf proteins (Boyd et al. 2011a). Their joint tree strongly suggests that NifD and NifK diverged from each other long before the AnfK/VnfK common ancestor diverged from NifNK (Boyd et al. 2011a, Fig. 1C). That is to be expected if the Mo-nitrogenase DK heterodimer evolved from a preexisting BchYZ that in turn arose before LUCA (as argued above), and if the Anf/VnfK branch arose later in the cyano/endobacterial stem. If the NifN and Nif/Anf/VnfK subtree is considered separately, Anf/VnfK branches within NifK, suggesting that their cofactor assembly scaffold evolved secondarily from the FeMo cofactor scaffold. On the NifD and NifE/Anf/VnfD subtree, Anf/VnfD branches within the homologous NifD, not within NifE, suggesting that the D paralogue of non-Mo nitrogenases also arose secondarily from a Mo-dependent ancestor. VnfE and VnfN branches are very long and group together, not with either NifE or NifN, from which one might have expected them to have evolved. We suggest that their grouping together and the long VnfE/N branch are either an artefact of LBA and ultrarapid evolution of the scaffold associated with V/Fe cofactor assembly, or else that one of these proteins is misannotated and might actually be orthologues of N or of E (one each of Anf and Vnf). Whether the failure to identify AnfE/N homologues stems from even greater divergence or from cells using NifE/N for this function needs investigation.
But the foregoing evidence that Anf and Vnf proteins are both secondarily derived from Nif proteins invalidates the assertion of a ‘“VnfEN” branch near the root of the tree’ (Boyd et al. 2011a).

Applying molecular clock algorithms to an even more highly paralogous tree combining all these Nif proteins with BchNBYZ was extremely unwise and could not possibly have given sensible dates for anything, given the clear evidence from the paralogue trees for hyperaccelerated evolution in most stems of the trees. This error was compounded by misrooting the fundamentally non-clock-like tree of VnfEN, the most recent in-group of all. The absurdity of that pseudo-clock analysis is shown by three things. First, the base of the crown of the supposedly most ancient VnfEN clade was assigned the youngest age (~0.7 Ga, from Fig. 4 of Boyd et al. 2011a), consistent with their having evolved after both Nif and Bch as we argue, but not with the non-Mo scaffold having been the most ancient of these proteins. Second, the base of the crown of BchZ, which must have preceded LUCA as explained above, is dated as only about 1.75 Ga, and no Bch crowns are dated as older than the GOE. Third, all nitrogenase subclades are dated as < 2 Ga, inconsistent with isotopic evidence that Mo-nitrogenase is > 3.2 Ga. None of this makes evolutionary sense. Careful cross-comparison of evidence, as we attempt here, should have revealed the fundamental flaws of that meaningless ‘temporal’ analysis of paralogue trees, which so dramatically flout the oversimplified assumptions of ‘clock’ algorithms. Those algorithms are useful only if applied to relatively uniformly evolving single orthologues and calibrated by fossil dates needing no significant extrapolation beyond the direct evidence (neither true here).

In concatenated nitrogenase HDK trees rooted on BchXYZ, the single V/Fe subclade is maximally supported and nests within ancestral (paralogous) Mo-nitrogenases comprising two ancient paralogues (Boyd et al. 2011b; Boyd and Peters 2013). Within the Fe-nitrogenase subclade, the sole archaebacterial sequence (Methanosarcina) is sister to the gracilicute clade (Chlorobi/Proteobacteria), which does not support their claim that eubacteria got nitrogenase from archaebacteria. Within V-nitrogenases, Methanosarcina nests weakly within eubacteria (Cyanobacteria/Endobacteria/Proteobacteria), thus also not supporting that claim. On one tree, V and Fe nitrogenases are sisters (Boyd and Peters 2013); on the other, V-nitrogenase is weakly ancestral, suggesting that they are of fairly equal age but that the taxonomically restricted Fe form evolved somewhat later. The probably Mo-dependent nitrogenases of Chloroflexales are sister to the well-supported major Mo clade, which comprises two major subclades (here designated A and B), each containing a maximally supported deep-branching endobacterial clade (that does not nest within any other phyla) as well as Proteobacteria and contrasting sets of negibacterial phyla. The dual position of Endobacteria and Proteobacteria cannot reasonably be attributed to LGT and likely represents a gene duplication involving all three proteins before Endobacteria and Proteobacteria diverged. Clade A includes the endobacterial Heliobacterium/Desulfitobacterium subclade (i.e. Peptidococcaceae: Antunes et al. 2016), Cyanobacteria, Aquificales, and four proteobacterial subclades; though Cyanobacteria and Aquificales appear within Proteobacteria (contrary to RP trees), this may reflect poor tree resolution, not LGT. Clade B includes a well-supported eubacterial subclade with Methanosarcina as its sister; its eubacteria comprise a different endobacterial subclade (e.g. Clostridium) that is sister to a maximally supported clade comprising three gracilicute phyla (Planctobacteria, Sphingobacteria, Proteobacteria) plus the chloroflexan Dehalococcoides (Boyd et al. 2011b). The eubacterial part of subclade B implies vertical inheritance plus one relatively late LGT from the sphingobacterial stem to Chloroflexi. Methanosarcina appears to be sister to clade B eubacteria, which is discordant with our prokaryote RP trees, where archaebacteria branch with the gracilicute subclade Planctochlora. This may indicate that it represents a third ancient subclade or that it branches too deeply because of unusually fast evolution and LBA. In a tree omitting Chloroflexales, a methanogen-only Mo-dependent clade (Methanococcus/Methanobacterium) is a maximally supported sister to V/Fe nitrogenase. Contradictorily, their earlier tree placed it as the most divergent of all nitrogenases (without significant support), presumably partly why they clung to the groundless belief that nitrogenase evolved in archaebacterial methanogens. As nitrogenase is unknown in Filarchaeota, we cannot strictly disprove two independent LGTs of Mo-nitrogenase from eubacteria, but the fact that neither nests within any eubacterial phylum makes that unlikely; we therefore suggest that the methanogen-only Mo clade may represent another early diverging, vertically inherited paralogue that diverged from the ancestral V/Fe paralogue before the GOE, but after these clades diverged from Chloroflexales.
The Endobacteria/methanogen subclade of unknown metal cofactor, which from its depth and non-grouping with the Fe/V clade we suspect is Mo-dependent, also nests within the Mo-nitrogenases. Within this subclade, methanogens nest within Endobacteria, suggesting either that archaebacteria evolved from Endobacteria (Valas and Bourne 2011) or, as we suggest given its discordance with our RP trees, that Methanobacteria obtained this paralogue from Endobacteria by LGT (opposite to the LGT direction claimed by Boyd et al. 2011b). The best-sampled tree shows all three methanogen clades nested firmly within different eubacterial paralogue subtrees (Boyd and Peters 2013). Therefore, if their inheritance were vertical, eubacteria would be ancestral to archaebacteria, as all neo- and palaeontological evidence, when correctly interpreted, shows (Cavalier-Smith 2006a, c, 2013a, 2014).

Paralogue trees combining Bch and Nif/Anf/Vnf proteins have been completely misunderstood. Collectively, these comprise not just the 18 proteins with different names but at least 15 more Nif paralogues of non-universal distribution. It is naive to suppose that they can all be rooted by adding a single outgroup such as ParA, the most likely ancestor, as this could join the tree in only one place (if itself a single paralogue), yet in fact each subparalogue has its own subtree and root, and roots will be of different ages depending on where in the tree the duplication generating the younger one occurred. To interpret such trees, one must identify each paralogue and recognise that evolutionary rates are often so much greater in paralogue subtree stems than in crowns that LBA will usually give spurious roots for each on the composite tree, possibly wrong in different ways. No one previously attempted to disentangle these matters as we have done above, so earlier ideas were mutually contradictory and at variance with other evidence. We have inferred that the duplications that generated all three Bch subunits from an ancestor like ParA, as well as the later duplications that generated the five named Nifs homologous with them, must all have occurred before LUCA; the duplications making Vnfs probably postdated the Cyanobacteria/Chloroflexi divergence, and Anfs arose after Endobacteria and Cyanobacteria diverged. Our interpretations are simpler than previous ones, with many fewer LGTs, and are compatible with the RP (likely organismal) tree and with the early Archaean isotopic evidence for RuBisCO-based photosynthesis and Mo-based nitrogenase about a billion years before the GOE and billions of years before archaebacteria evolved.

Not only do nitrogenase and FeMo scaffold phylogenies decisively disprove an archaebacterial ancestry for nitrogenase, but so does the phylogeny of NifB, which is essential for making the FeMo cofactor. NifB is unrelated to nitrogenase; in most eubacteria (Cyanobacteria, Actinobacteria, many Endobacteria, Sphingobacteria, and Proteobacteria), it exists as a gene fusion between an N-terminal domain from the S-adenosyl methionine (SAM) protein family and a C-terminal domain related to the NifX/NafY family (Boyd et al. 2011b). The only eubacteria in which NifB has the presumably ancestral state of separate, unfused SAM- and NifX-related genes are Chloroflexi and Peptidococcaceae (Endobacteria). The simplest interpretation is that the SAM/NifX fusion occurred in the cyano/endobacterial stem after it diverged from Chloroflexi, and that Endobacteria alone initially retained both unfused and fused versions, which were differentially lost in its sublineages, the unfused version being lost independently in Cyanobacteria, Actinobacteria, and Neonegibacteria. Euryarchaeotes also lack NifB fusion proteins: Methanococcus has separate SAM and NifX-like proteins, but only SAM genes were found in Methanosarcina (Boyd et al. 2011b). Rooting the SAM domain tree on the chloroflexan Dehalococcoides would make methanogen sequences branch from the cyano/endobacterial stem, the very point where the major nitrogenase gene duplications occurred, making it possible that they represent an ancient unfused version of NifB that persisted in the backbone of the tree until after all neonegibacterial phyla evolved. Alternatively, the methanogen genes may have evolved faster (as the failure to find NifX suggests) and simply branch too low on the tree. Boyd et al. (2011b) used paralogue rooting with the endobacterial molybdenum biosynthesis protein MoaA as the outgroup, which, being very distant, would likely have caused LBA to misroot the tree within the methanogen subtree, thereby contributing to the misconception that nitrogenase itself came from methanogens despite there being no direct phylogenetic evidence for that.

Planctobacterial origin of Neomura

Our two-domain RP trees disagree about the eubacterial ancestors of neomura. Eukaryotes always appeared within Planctobacteria, but in slightly different places (none strongly supported). Prokaryote trees were less consistent, placing archaebacteria slightly lower, either beside or near the mostly robust Planctobacteria/Sphingobacteria clade: with 26 genes, CAT-GTR put archaebacteria weakly as sister to Planctobacteria/Sphingobacteria, but with 51 genes it did not fully converge, one chain putting them as sister to Planctobacteria/Sphingobacteria, the other more deeply as sisters of Gracilicutes plus Aquithermota. The less accurate ML put archaebacteria within gracilicutes, but with Planctobacteria one node lower: with 26 proteins, archaebacteria appeared as sisters of Sphingobacteria only, and with 51 proteins as sisters of a likely artefactual clade comprising Sphingobacteria and Spirochaetes. All trees therefore placed neomura unambiguously with, and almost all within, Gracilicutes, most with Planctobacteria and/or Sphingobacteria as their sisters. Though such a grouping with Planctochlora was not previously found for RPs, three published three-domain rDNA trees, if correctly rooted beside Chloroflexi, put neomura as sisters of Planctomycetes (Brochier and Philippe 2002; Whitman 2009; Williams et al. 2012); we know of none grouping them with Sphingobacteria. Two of them took more care to avoid LBA than most rDNA trees, which mostly use site-homogeneous methods without excluding the fastest-evolving sites and therefore tend to put neomura with Aquithermota and/or Synthermota.

All two- and three-domain RP trees exclude neomura, with maximal or near-maximal support, from within Actinobacteria or Endobacteria (which collectively include all certainly monoderm eubacteria). All place neomura strongly (CAT) or weakly (ML) within Neonegibacteria (typically with or within Planctochlora on two-domain trees, or on three-domain trees with them or with Aquithermota and/or Thermocalda), not sister to any monoderms. On site-heterogeneous trees, for neomura to group with either posibacterial phylum they would have to cross at least two maximally or near-maximally supported clades. Even on ML trees, archaebacteria do not have to cross any significantly supported nodes to be sister of Planctobacteria, usually only one unsupported node. Even though the huge rate acceleration in neomuran and ribosomal stems means that most of the ancestral information concerning their position must have been lost, our taxonomically extremely comprehensive site-heterogeneous RP trees are the strongest sequence-tree evidence yet that neomura did not evolve from monoderm posibacteria, as was long argued on parsimony grounds to minimise OM losses (Cavalier-Smith 1987c, 2002a, 2014). Instead, they strongly support neomuran origin from gracilicute negibacteria by simultaneous loss of murein and the OM. We therefore now abandon the idea that neomura evolved from posibacteria by loss of murein only, as happened during the polyphyletic origins of mycoplasmas. The OM was therefore lost more often than once supposed. As an endobacterial ancestry is excluded, this loss could not have involved endospores, as did multiple OM losses in Endobacteria. Nor is there any evidence that murein hypertrophied to make an extra-thick wall, as is likely for the ancestral actinobacterium.

Instead, OM loss probably involved mutations breaking or inactivating OM lipid transport mechanisms associated with the bridges linking CM and OM. As there is no cell biological or other reason to regard Sphingobacteria as likely ancestors of neomura, but many arguments for a direct evolutionary link between Planctobacteria and eukaryotes (as a later section explains), we argue that our trees placing eukaryotes within Planctobacteria are likely historically correct, whereas those putting them slightly lower, as sister to Planctobacteria/Sphingobacteria or to Sphingobacteria, or more rarely with Aquithermota/Thermocalda, may be misleading. Besides forming a very robust clade on RP trees (and nearly all other published trees), Planctobacteria (comprising Planctomycetia, Verrucomicrobia, Chlamydiia, Elusimicrobia, and other less studied lineages) share cell envelope features that make them particularly good candidates for simultaneous loss of murein and OM. Their periplasmic space is usually inflated and much thicker than in other negibacteria, so they have many fewer strong direct connections between murein and the CM. Moreover, many have undergone partial loss of murein, which, especially in Planctomycetia and Chlamydia, is so sparse that it was originally thought to be entirely absent (Cavalier-Smith 1987b). Therefore, many planctobacterial cells probably depend less on either murein or the OM for mechanical support than do typical negibacteria, so their simultaneous loss may have been less traumatic than the murein loss envisaged by the original model of neomuran descent from posibacteria (Cavalier-Smith 1987c).

As argued in the next two sections, many features of the eubacterial rod-like cell growth pattern and division mechanism were retained throughout the inferred planctobacterial to archaebacterial transition. As later sections explain, the intermediate almost certainly had cortical microtubules (mts) like those of the verrucomicrobial Prosthecobacter, which additionally would have stabilised stem neomuran cells during evolution of their new glycoprotein walls/surface coats from a preexisting planctobacterial S-layer. Therefore, origin of neomura from a planctobacterial ancestor is mechanistically less traumatic than would have been origin via a posibacterial wall-less L-form in the original model for earliest stem neomura (Cavalier-Smith 1987c). Retention of so many eubacterial features during the transition explains why the archaebacterial cell cycle is so fundamentally similar to that of their eubacterial, specifically planctobacterial, ancestors.

Two shared features of archaebacteria and eukaryotes previously rationalised in terms of an actinobacterial ancestry are proteasomes and serine/threonine (ST) kinases, both crucial for the origin of eukaryotic cell cycle controls. Both can now be explained as well or better by a planctobacterial origin of neomura. ST kinases are even more abundant in Planctobacteria than in Posibacteria but are not restricted to these groups, being found more sparsely in Chloroflexi, the genus Myxococcus of δ-proteobacteria (where their presence led to the mistaken notion of this group being involved in eukaryogenesis by cell fusion), Spirochaetes, and Gemmatimonadia. On an ML tree, neomuran ST kinases (and the sole spirochaete one) group within those of Planctobacteria, whereas Myxococcus and posibacterial ones branch more deeply, closer to Chloroflexi (Arcas et al. 2013), essentially congruently with the RP tree. Eubacterial proteasomes were originally thought to be only in Actinobacteria (Maupin-Furlow 2012) and are still well studied only in them (Becker and Darwin 2017), but the recent explosion of genome sequencing shows 26S proteasome components in every prokaryote phylum except Spirochaetes, so they evolved before LUCA and must have been lost in some proteobacteria, e.g. Escherichia coli. Thus, proteasomes are no longer a reason for singling out actinobacteria as neomuran relatives.

The ubiquitin system that labels proteins for proteasomal digestion was once thought eukaryote-specific. Ubiquitylation is now known in diverse prokaryotes, but it may not be the ancestral protein-tagging mechanism: distinct prokaryotic ubiquitin-like proteins (Pup), used by Actinobacteria, and the related Ubact system, both requiring conjugases different from those of ubiquitin, may be older, being found in Armatimonadetes, a few Proteobacteria, and many Planctobacteria (Lehmann et al. 2017). Archaebacteria and the hadobacterium Thermus have a tagging mechanism whose tags (SAMPs) are distantly related to ubiquitin, but sampylation requires only the E1 enzyme, unlike eukaryotic ubiquitylation, which requires E1, E2, and E3 (Fu et al. 2016). A few filarchaeote archaebacteria (some Asgards, and the thaumarchaeote Candidatus Caldiarchaeum subterraneum) have genuine ubiquitylation (Fuchs et al. 2018); though this was assumed to be ‘ancestral’, ubiquitylation more likely evolved in eubacteria, as E2 homologues abound in Planctobacteria and also occur in Posibacteria, Cyanobacteria, and Myxococcus. Attributing all these eubacterial ubiquitylating enzymes to multiple LGTs from eukaryotes (Arcas et al. 2013) seems simply to reflect the widespread, essentially evidence-free prejudice that the universal root lies in the neomuran stem: in fact, on their own ML tree, the eukaryotic E2s form a clade robustly within paraphyletic eubacteria and closer to Planctomycetia than to most posibacterial and cyanobacterial sequences (Arcas et al. 2013).

Thus, ubiquitylation probably evolved as early as the common ancestor of Cyanobacteria and Actinobacteria and passed vertically to neomura from their planctobacterial ancestor, likely together with the probably younger sampylation (post-neonegibacterial). Their present distribution is explicable by differential losses of one or the other of these functionally equivalent tagging systems in different lineages, e.g. loss of sampylation by eukaryotes and of ubiquitylation by most euryarchaeotes. The scattered distribution of ubiquitylation components across the entire archaebacterial tree (Adam et al. 2017) is likewise best explained by ancestral presence and multiple losses. Early origins, functional redundancies, and differential losses shaped cell evolution much more than is generally recognised.

Apparently unaware of the neomuran theory (Cavalier-Smith 1987c) or of the strong evidence that the universal tree is actually rooted within eubacteria (Cavalier-Smith 2002a, 2006d), Devos and Reynaud (2010) listed numerous planctobacterial characters shared with eukaryotes that they interpreted as evidence that planctobacteria may be phylogenetically closer to neomura than are any other eubacteria. These similarities were all dismissed as superficial convergence (or the results of hypothetical LGTs) and as contrary to phylogenetic evidence (McInerney et al. 2011). On the contrary, our RP two-domain trees provide the first reasonably clear sequence-tree evidence for a planctobacterial ancestry of neomura, especially eukaryotes, as Reynaud and Devos (2011) explicitly suggested. For the first time, we show that a planctobacterial origin is NOT contrary to phylogenetic evidence but fully consistent with it and may actually be correct. In criticising Devos and Reynaud (2010) and the neomuran and other versions of the phagotrophic origin of eukaryotes (Cavalier-Smith 2009; De Duve 2007), McInerney et al. (2011) misleadingly asserted that these ideas do not ‘involve the participation of archaebacteria’ and ‘offer no account of the obvious and extensive sequence similarity that many eukaryotic genes share with archaebacterial homologues’. Both assertions are egregious distortions of neomuran theory. The authors either misunderstood or misrepresented it, perhaps to promote Martin’s phylogenetically discredited hypothesis of mitochondrial origins (Martin and Müller 1998).

From the outset, neomuran theory explicitly explained the origin of shared neomuran characters absent in eubacteria as shared derived characters that arose in the neomuran stem and have been stably inherited ever since (Cavalier-Smith 1987c), as repeatedly explained in great detail (Cavalier-Smith 2002a, c, 2006a, c, 2007a, 2009, 2010d, 2014). It was designed to explain that very sharing. To imply that it denies such sharing is nonsense. Admittedly, Devos and Reynaud (2010) were much less explicit about that, but their paper implicitly recognised a shared neomuran ancestry and did not argue that a possible relationship of eukaryotes with planctobacteria contradicts their long-established relationship with archaebacteria. It does not; that should have been recognised by any fair criticism of their paper, which clearly implied that both archaebacteria and eukaryotes could be related to planctobacteria. If the root of the overall tree of life is within eubacteria, as the neomuran interpretation always explicitly argued, eukaryotes can be both cladistically closer to archaebacteria than to any other prokaryotes and cladistically closer to Planctobacteria than to any other eubacteria, as Reynaud and Devos (2011) explicitly suggested. The rest of our paper highlights major merits of this revised neomuran theory, in which Planctobacteria replace the Posibacteria of the original version as the direct eubacterial ancestors of neomura. This is the best phylogenetic interpretation of the whole tree of life and offers more gradual and mechanistically more comprehensible transitions between the three domains than any previous scenario. The next two sections apply this to archaebacterial diversification and origin, later ones to eukaryotes.

A central feature of neomuran theory was the argument that N-linked glycoproteins arose in stem neomura at the very time of murein loss, and that the key involvement of N-acetylglucosamine (GlcNAc) in oligosaccharide linkage to glycoprotein asparagines and to oligopeptides in peptidoglycan suggests that glycoprotein synthesis partly evolved from relics of murein synthesis when the stem neomuran mutationally lost muramic acid biosynthesis and consequently murein peptidoglycan (Caval