Multigene phylogeny and cell evolution of chromist infrakingdom Rhizaria: contrasting cell organisation of sister phyla Cercozoa and Retaria

Infrakingdom Rhizaria is one of four major subgroups with distinct cell body plans that comprise eukaryotic kingdom Chromista. Unlike other chromists, Rhizaria are mostly heterotrophic flagellates, amoebae or amoeboflagellates, commonly with reticulose (net-like) or filose (thread-like) feeding pseudopodia; uniquely for eukaryotes, cilia have proximal ciliary transition-zone hub-lattices. They comprise predominantly flagellate phylum Cercozoa and reticulopodial phylum Retaria, whose exact phylogenetic relationship has been uncertain. Given even less clear relationships amongst cercozoan classes, we sequenced partial transcriptomes of seven Cercozoa representing five classes and endomyxan retarian Filoreta marina to establish 187-gene multiprotein phylogenies. Ectoreta (retarian infraphyla Foraminifera, Radiozoa) branch within classical Cercozoa as sister to reticulose Endomyxa. This supports recent transfer of subphylum Endomyxa from Cercozoa to Retaria alongside subphylum Ectoreta which embraces classical retarians where capsules or tests subdivide cells into organelle-containing endoplasm and anastomosing pseudopodial net-like ectoplasm. Cercozoa are more homogeneously filose, often with filose pseudopodia and/or posterior ciliary gliding motility: zooflagellate Helkesimastix and amoeboid Guttulinopsis form a strongly supported clade, order Helkesida. Cercomonads are polyphyletic (Cercomonadida sister to glissomonads; Paracercomonadida deeper). Thecofilosea are a clade, whereas Imbricatea may not be; Sarcomonadea may be paraphyletic. Helkesea and Metromonadea are successively deeper outgroups within cercozoan subphylum Monadofilosa; subphylum Reticulofilosa (paraphyletic on site-heterogeneous trees) branches earliest, Granofilosea before Chlorarachnea. 
Our multiprotein trees confirm that Rhizaria are sisters of infrakingdom Halvaria (Alveolata, Heterokonta) within chromist subkingdom Harosa (= SAR); they further support holophyly of chromist subkingdom Hacrobia, and are consistent with holophyly of Chromista as sister of kingdom Plantae. Site-heterogeneous rDNA trees group Kraken with environmental DNA clade ‘eSarcomonad’, not Paracercomonadida. Ectoretan fossil dates evidence ultrarapid episodic stem sequence evolution. We discuss early rhizarian cell evolution and multigene tree coevolutionary patterns, gene-paralogue evidence for chromist monophyly, and integrate this with fossil evidence for the age of Rhizaria and eukaryote cells, and revise rhizarian classification.

Electronic supplementary material: The online version of this article (10.1007/s00709-018-1241-1) contains supplementary material, which is available to authorized users.

For Helkesimastix (grown in ASW1.0 with the addition of diluted LEMCO broth) and Micrometopion (with Procryptobia): cells were harvested as described above but spun down in prechilled 1.5 ml microcentrifuge tubes at 127,000 g for 32 sec at room temperature. Total RNA was then extracted from the cell pellets as above.
For Nudifila and Sandona: cultures were harvested as above but cells were spun down in prechilled 1.5 ml microcentrifuge tubes at 112,000 g for 10 minutes, and the cell pellet was [...]

[...] chromists. This is the ML tree corresponding to the well consensus PhyloBayes CAT tree in Fig. 3, showing support for 100 fast bootstraps. The scale bar represents 0.2 substitutions per site.
Cercomonas clavideferens is now called Neocercomonas clavideferens. Note that bootstrap support drops from 100 to 84% for the Hacrobia/Harosa bipartition compared with Fig [...]

This ML tree uses the same method as Fig. S3 and differs only by excluding the corbihelian microhelid cryptist Microheliella (as did Burki et al. 2016). In marked contrast to Burki et al. (2013, 2016), but as in Cavalier-Smith et al. (2015), Telonema and Picomonas group together as a fairly strongly supported Corbistoma clade that is weakly sister to the other Cryptista (Palpitia plus Rollomonadia). That might partly be because we added numerous sequences for Telonema and Picomonas not included by Burki et al. (2013). Comparison of Figs S2 and S4, which lack Microheliella, with Fig. S3, which included it, shows that excluding Microheliella (as in Burki et al. 2016) also reduces support for the Corbihelia clade and for clades Cryptista sensu Cavalier-Smith et al. (2015) and Haptista, and to a lesser extent for the Hacrobia/Harosa bipartition.

[...] Haptista proteins and 31 Rhizaria. Rhizarian tree rooted between Cercozoa and Retaria as shown by Figs 2, 3, and S1-3 with close outgroups. Note that in this consensus tree for two chains (maxdiff 0.290668; topology identical in both chains) and in Fig. S5, although the bipartition between Cercozoa and Retaria is maximally supported, there are some marked differences in basal branching order within the two phyla compared with trees that include halvarian and hacrobian outgroups (Figs 2, 3, S1-3). In principle, including such outgroups should provide more information for algorithms to reconstruct ancestral sites more accurately, so such trees ought to be more evolutionarily correct so long as outgroups are not too distant or excessively long-branched. Cercomonas clavideferens is now called Neocercomonas clavideferens. Scale bar is 0.3 substitutions per site.

[...] Scoble and Cavalier-Smith 2014).
Two chains were run which converged well to the same topology (Maxdiff 0.190343, meandiff 0.00225341); 38,865 trees were summed after removing the first 3000 as burn-in. This is the most comprehensive rhizarian rDNA tree to date and for Monadofilosa is largely similar to that of Scoble and Cavalier-Smith (2014), which was not rooted correctly because of the absence of Reticulofilosa and Retaria. For the first time it includes Helkesida and Ventricleftida in the same tree; helkesids were excluded by Cavalier-Smith and Scoble (2014) because of their exceptionally long branch. Support values for some major groups are shown in larger type; for these, the posterior probability for the CAT tree is on the left and the support for the corresponding RAxML GTR-Γ tree is on the right (Fig. S9 shows the complete ML tree).

[...] all respects they were the same as Fig. S8. One key difference was that in the 481-taxon consensus tree Radiozoa did not branch within Endomyxa but were sister to Endomyxa plus Filosa: Endomyxa was a clade with 0.24 support (its topology differed by having environmental DNA sequence FN598385 as deepest diverging rather than as sister to the Gromiidea/Ascetosporea subclade). This contradiction confirms that rDNA cannot clearly establish the branching order of these three major rhizarian groups. Novel clades 10 and 12 formed a joint clade on the 481-taxon tree with weak support (0.52) that was weakly sister to Endomyxa with insignificant support (0.18). Previously, using only 148 sequences and 1539 positions, Howe et al. (2011) put this joint clade as weakly sister to Filosa (0.59 MrBayes, 40% RAxML), but that tree included no non-Rhizaria, so its root was simply assumed to be between Radiozoa and Endomyxa, which our multiprotein trees show is probably incorrect. It therefore had no objective way of establishing the rhizarian root and therefore whether clade 10/12 is sister to Filosa (as the rooting of Howe et al. (2011) Fig. 1 assumes) or to Retaria (as our CAT Figs S8 and S10, S11 all strongly show) or to Endomyxa (as in the 416-taxon consensus tree). As support for the position of clade 10/12 is so weak, any of these three could be correct. The most important unresolved questions for the deep phylogeny and early evolution of Rhizaria are what the phenotype of the putative clade 10/12 is and which of these three topologies is correct, for which multigene trees are essential.

[...] are broken into three, not two clades (and Chlorarachnea do not group with any of them but are lower than two). Despite these insignificantly supported contradictions, topological agreement between ML and CAT is generally good even though the basal branching order of both is very weakly supported compared with our protein trees.

[...] Fig. S8 suggests that adding the long-branch Sainouroidea artefactually pushed Verrucomonas one node higher but was otherwise harmless to topology in these very taxon- and position-rich site-[...]

Supplementary Fig. S12. 18S rDNA PhyloBayes CAT-GTR-Γ (4 rate classes) phylogeny for 316 Monadofilosa plus EF024169, an especially short-branch representative of Granofilosea (the closest outgroup) to root the tree, using 1790 positions (5 more than the 1785 in the previously most taxon-rich 273-taxon Monadofilosa-only tree: Scoble and Cavalier-Smith 2014). The two chains converged well (Maxdiff 0.143037); 153,615 trees were summed after removing the first 28,560 as burn-in. Support values are posterior probabilities; for key clades only, bootstrap percentages for the corresponding ML tree (Fig. S14) are added on the right. Fig. S13 is a CAT tree for the same taxa minus Cholamonas, excluded because of its much longer branch.

[...] percentages on the right for a few key clades from the corresponding RAxML tree: Fig. S15). This is the only one of the four Monadofilosa-only trees where Cercomonadida are sisters of Paracercomonadida. In none of them are paracercomonads sister to Krakenida or Krakenia.
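The convergence figures quoted in these captions (maxdiff, meandiff, trees summed after burn-in) can be illustrated with a minimal Python sketch. It assumes that maxdiff is the largest difference between two chains in the posterior frequency of any bipartition, and that meandiff is the mean of those differences, which is how PhyloBayes's bpcomp tool is generally described. The function names, the toy bipartitions and the burn-in of one tree are hypothetical illustrations, not the authors' pipeline or data.

```python
from collections import Counter


def bipartition_frequencies(trees, burn_in=0):
    """Frequency of each bipartition among the trees kept after burn-in.

    `trees` is a list of sampled trees; each tree is a set of bipartitions,
    and each bipartition is a frozenset of the taxa on one side of a branch.
    """
    kept = trees[burn_in:]
    if not kept:
        raise ValueError("no trees left after removing burn-in")
    counts = Counter()
    for tree in kept:
        counts.update(set(tree))  # count each bipartition once per tree
    return {bp: n / len(kept) for bp, n in counts.items()}


def maxdiff(freqs_a, freqs_b):
    """Return (maxdiff, meandiff) between two chains' bipartition frequencies."""
    every_bp = set(freqs_a) | set(freqs_b)
    diffs = [abs(freqs_a.get(bp, 0.0) - freqs_b.get(bp, 0.0)) for bp in every_bp]
    return max(diffs), sum(diffs) / len(diffs)


# Toy bipartitions for five hypothetical taxa A-E (not data from this paper).
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
CE = frozenset({"C", "E"})
AE = frozenset({"A", "E"})

# The first tree of each chain is treated as burn-in and discarded.
chain_1 = [{AE, CD}, {AB, CD}, {AB, CD}, {AB, CD}, {AB, CE}]
chain_2 = [{AE, CE}, {AB, CD}, {AB, CD}, {AB, CE}, {AB, CE}]

freqs_1 = bipartition_frequencies(chain_1, burn_in=1)
freqs_2 = bipartition_frequencies(chain_2, burn_in=1)
max_d, mean_d = maxdiff(freqs_1, freqs_2)
print(f"maxdiff = {max_d:.3f}, meandiff = {mean_d:.3f}")
```

In this toy case the two chains agree fully on bipartition AB but differ by 0.25 on CD and CE, so maxdiff is 0.25. A low maxdiff therefore indicates that both chains assign similar support to every clade, which is the sense in which the captions describe chains as having converged well.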

S9
Imbricatea are a clade by CAT but not by ML. Compared with Fig. S12 where Cholamonas was present, the two long-branch eSarcomonad sequences have jumped into pansomonads to join sequence AY620690, which in the Fig. S15 ML tree jumped out of pansomonads/glissomonads into eSarcomonads, lowering support values for both. That jumping of AY620690 into eSarcomonads was also seen on the ML tree without Cholamonas (Fig. S15) so is a systematic difference between ML and CAT.

[...] (Fig. S12) Imbricatea appear as paraphyletic, not a clade, and Metromonadea are not the deepest clade, both being contradicted by the multiprotein trees, consistent with the general greater accuracy of the evolutionarily more realistic CAT model. Rooted by a short-branch granofilosean as outgroup.

[...] excluding helkesid Cholamonas (next page). Rooted by a short-branch granofilosean as outgroup.

Figure S14
In contrast to the corresponding CAT tree (Fig. S13) [...] weakly raises the possibility that Ventricleftida might be paraphyletic ancestors of Helkesida, that appears to be ruled out by our Monadofilosa-only trees (Figs S12-S15), which appear to be more accurate and less affected by long-branch problems (see discussion below in (b)). In Fig. S8 Verrucomonas does not group with Ventrifissura but one node higher, insignificantly sister to Sarcomonadea alone. When all Helkesida are excluded (Fig. S11) [...] S9) it is still sister to Glissomonadida. It is the major sarcomonad clade with no known phenotype, but our CAT Monadofilosa-only trees suggest that it may be related to Kraken (Figs S12, S13). Three other more sparsely represented deep-branching ventrifilosan environmental DNA lineages (rectangles) were not previously assigned to orders. AB505573, from a moss pillar in an Antarctic lake, is weakly sister to Glissomonadida (Figs S8, S10) or within Ventricleftida (Figs S9, S11); in Monadofilosa-only trees it is sister to Verrucomonas within Ventricleftida (Fig. S12) or sister to Ventricleftida and thus likely a ventricleftid. A freshwater clade from a stream or lake (AY620304, AB695519) is deeper branching, sister to Ventrifilosea plus Helkesea by CAT (Figs S8, S10, S11 and in the 481-taxon tree) so likely of more novel phenotype, but is consistently extremely weakly sister to cercomonads, glissomonads and Discomonas by ML (e.g. Fig. S9).
However, AY695519 was recently found to be a Kraken sequence (Dumack et al. 2016); our Monadofilosa-only trees confirm that and show that AY620304 is also a Kraken (Fig. S12). Finally, AB505500 from a deep-sea cold seep is consistently weakly sister to Thecofilosea by CAT (Figs S8, S10, S11) or to Spongomonadida by ML (Fig. S9); the Monadofilosa trees place it with or within Thecofilosea, likely its correct position. Thus none of the deepest branching cercozoan environmental sequences are likely to represent novel classes, though some might be new orders.
The only major contradiction between the Figs S8-11 18S rDNA trees that include at least one helkesid and our protein trees is that with 18S rDNA all cercomonads (Cercomonadida, Paracercomonadida) form a single near maximally supported clade so Pediglissa is not a clade.
However, the Monadofilosa-only trees (Figs S12-15) mostly do not group both cercomonad orders together and thus are more consistent with the protein trees.
One feature of all these rather comprehensive Cercozoa-wide rDNA trees is that classical imbricates [...] as sister to euglyphids plus Nudisarca (Figs S12, S13), so Table 1 leaves it in Imbricatea.
Ascetosporea invariably branch strongly within Gromiidea as sisters of Gromia.

(b) New Monadofilosa-only trees (Figs S12-S15)
Comparison of Fig. 12 with Fig. S11 that excludes all three Helkesida and Fig. S10 that excludes only the two longest branches (Helkesimastix and Sainouron) shows that the rest of the tree is almost identical whether or not these rapidly evolving taxa are present, so their inclusion is not significantly distorting for PhyloBayes CAT when this many taxa and nucleotide positions are included.
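The kind of comparison described here (checking whether a tree is unchanged apart from the presence of a few long-branch taxa) can be sketched in Python as follows. This is only an illustration under stated assumptions: trees are written as rooted nested tuples, the taxon names and topologies are invented toy examples, and `prune` and `clades` are hypothetical helper names. It does not reproduce the comparison actually made between the supplementary figures.

```python
def prune(tree, drop):
    """Remove the taxa in `drop` from a nested-tuple tree.

    Internal nodes left with a single child are collapsed into that child;
    nodes left with no children disappear (returned as None).
    """
    if isinstance(tree, str):
        return None if tree in drop else tree
    kept = [c for c in (prune(child, drop) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


def clades(tree):
    """Return every clade of the tree as a frozenset of its leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    if tree is not None:
        leaves(tree)
    return found


# Toy rooted trees (hypothetical taxa, not topologies from this paper).
with_long = ("Out", (("A", ("B", "Long1")), (("C", "D"), "Long2")))
without_long = ("Out", (("A", "B"), ("C", "D")))
conflicting = ("Out", (("A", "C"), ("B", "D")))

pruned = prune(with_long, {"Long1", "Long2"})

# Clades found in only one of the two trees; empty means identical topology.
same = clades(pruned) ^ clades(without_long)
different = clades(pruned) ^ clades(conflicting)
print(len(same), len(different))
```

After pruning the two long-branch taxa, the first pair of trees share every clade, whereas the conflicting tree differs in four clades. Counting unshared clades in this way is one simple means of quantifying a statement such as "the rest of the tree is almost identical".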
In marked contrast to Figs S8-11 Rhizaria-wide trees and many published trees that included the long-branch Rhizaria and Chlorarachnea as outgroups, Metromonadea is a robustly supported clade (0.97) in Fig. S12 [...] S14) also gave no support for Kraken being related to paracercomonads but grouped it with insignificant support (10%) with the imbricate order Discomonadida and Metromonadea, whereas eSarco was in a deep unresolved position together with environmental DNA AY620290 as sister to Glissomonadida. As AY620290 consistently branches strongly within pansomonad glissomonads on our site-heterogeneous trees, we suggest its attraction to eSarco artefactually causes this clade to group with Glissomonadida rather than Kraken on ML trees. AY620290 has a third of the molecule missing and may be hard to place for that reason and might even be an unrecognised chimaera of a glissomonad and an eSarco sequence.
Further evidence of such likely misleading attraction caused by conflicting signals in AY620290 is given by Fig. S13, which excluded the helkesid Cholamonas; in that tree Kraken still groups with the original short-branch eSarco sequences (which are essentially full length) but with insignificant support (0.43), whereas the two long-branch eSarco sequences (both with one third missing) move into pansomonads as sister to AY620290 (insignificant 0.47 support). In the corresponding ML tree [...] S15). Without Cholamonas, paracercomonads plus two deep-branching sequences likely also to be paracercomonads (a clade in ML, but not CAT) grouped with Cercomonadida with moderate (0.83) support by CAT only (Fig. S13). Even though the Fig. S12 CAT chains were run for a very long time, they did not stably converge to precisely the same topology but differed in one respect: one was exactly as the Fig. S12 consensus; in the other the cercomonad/helkesid clade moved one node to be sister to paracercomonads. Part way through the Fig. S12 run, maxdiff dropped to a lower value of 0.0997067, at which point the consensus also showed that second topology, with the cercomonad/helkesid clade being insignificantly (0.32) sister to paracercomonads. This means that 18S rDNA has insufficient information to determine the relative positions of paracercomonads and cercomonads confidently. Therefore, the fact that on some trees with some methods and taxon samples they group together is not a good reason for questioning the evidence from the multiprotein trees that they are not sisters, and does not contradict their classification here as two distinct orders.
The fact that in certain respects these Monadofilosa-only trees agree more with the multiprotein trees than do the Rhizaria-wide trees indicates that inclusion of distant long-branch outgroups does distort some weakly supported branching orders. However, the fact that the branching order within Thecofilosea on Figs S12-15 is essentially the same as in Figs [...]

[...] (2007) who provided one of the core alignments used as an initial basis for assembling our alignments (see Cavalier-Smith et al. 2014) this gene was called rpl7A in our previous papers (Cavalier-Smith et al. 2014, 2015a,b, 2016); we change its name here to rpl7 to avoid confusion with a protein most commonly called rpl7ae that used to be called L8 in yeast and confusingly L7A in humans (new universal name eL8 as it is absent from many eubacterial lineages; this protein is not included in any alignments in this paper or in our preceding 4 papers).