Background

Archaea, as a vital component of Earth’s biodiversity, play a significant role in biogeochemical cycles and are a key partner in the evolutionary origin of eukaryotes [1, 2]. Most archaea are difficult to cultivate, limiting our understanding of their physiological and ecological properties [3]. Advances in metagenomics and single-cell genomics have significantly extended our understanding of the diversity and potential functions of uncultured archaea [4]. More than 20 novel uncultured phyla of archaea have been described in the past decade [5], as exemplified by the discovery of the Asgard and DPANN superphyla [6, 7]. A recent study divided the DPANN superphylum into seven Candidatus phyla, including Altiarchaeota, Iainarchaeota, Micrarchaeota, Undinarchaeota, Aenigmarchaeota, Nanohaloarchaeota, and Nanoarchaeota [8]. These lineages are united by very small cell sizes and small genome sizes with limited metabolic potential and are often considered to live in association with other microbes [9,10,11].

Ca. Nanohaloarchaeota represents a typical phylum in the DPANN superphylum that was first discovered in salt lakes and was originally identified as the sister branch of Halobacteria [12]. Using enrichment experiments and transmission electron microscopy, Hamm et al. demonstrated that some Ca. Nanohaloarchaeota cannot survive without Halobacteria partners [13]. A recent study revealed that the degradation of polysaccharides, including glycogen by some Ca. Nanohaloarchaeota, suggested that polysaccharide degradation might sustain a mutually beneficial interaction with its host Halobacteria [14]. Ca. Nanohaloarchaeota were originally thought to be exclusively derived from hypersaline habitats, and the observed taxa in these environments were limited to the order Ca. Nanosalinales [15,16,17,18,19]. However, a recent study discovered a novel Ca. Nanohaloarchaeota group in deep-subsurface marine sediment ecosystems [20], indicating that Ca. Nanohaloarchaeota may inhabit a wider range of habitats than previously understood and possibly harbor new functional niches that allow them to grow in different extreme environments. Thus, the diversity, metabolic potential, symbiotic lifestyles, and evolutionary adaptations to multiple extreme environments of the Ca. Nanohaloarchaeota are not completely understood.

By applying metagenomic sequencing, we reconstructed three Ca. Nanohaloarchaeota metagenome-assembled genomes (MAGs) from stratified salt crust samples. Phylogenomic analyses suggest that they represent a new order within Ca. Nanohaloarchaeota for which the name Nucleotidisoterales is proposed under the nascent SeqCode. Genomic analysis revealed that these MAGs lack genes for polysaccharide catabolism but instead encode complete nucleotide salvage pathways, suggesting that they might occupy a novel ecological niche and have an alternative strategy to interact with symbiotic partners. Ancestral character state reconstructions demonstrated that the last Ca. Nanohaloarchaeota common ancestor was unable to catabolize polysaccharides, and that shifts of energy conservation mechanisms led to the diversification of the Ca. Nanohaloarchaeota into multiple ecologically distinct lineages. This study represents a significant advancement in our understanding of the genomic diversity, ecology, and evolution of Ca. Nanohaloarchaeota.

Results and discussion

Overview of prokaryotic community structure

The Qi Jiao Jing (QJJ) Lake is a discharge playa lake located in Xinjiang Province of China, which has a salinity of more than 30% and is primarily composed of sodium, potassium, and chloride ions [21]. The high salinity forms a natural enrichment for halophiles with low species richness ranging from 143 to 181 ASVs (Fig. 1a). The microbial communities of the eight layers of the salt crust and one water column sample hosted similar microbial communities, but the organisms differed in their relative abundances (Fig. 1a). Archaea were more abundant than bacteria in all samples, with the middle layer (QJJ5) reaching the highest archaeal relative abundance (78.8%). Halobacteria and Ca. Nanohaloarchaeota predominated in all communities with relative abundances up to 60.7% and 15.1%, respectively. Ca. Woesearchaeales (1.7–6%) was the third most abundant group of archaea. Among bacteria, Rhodothermia, Bacteroidia, Gammaproteobacteria, Deltaproteobacteria, and Clostridia were the predominant groups, all of which are commonly found in saltern lakes [22]. Interestingly, the high abundance of Oligosphaeria in layers QJJ8 and QJJ9 suggests that they prefer the salt crust/water interface. Bacteria and archaea that were unclassified at the phylum level were also present in the salt crust community, indicating that there are still unknown microorganisms in high-salt environments.

Fig. 1
figure 1

The reconstructed genomes of novel Ca. Nanohaloarchaeota. a Relative abundance of the major microbial groups in eight layers of a stratified salt crust and underlying water based on amplicon sequencing of the 16S rRNA gene. b Relative abundances of the three Ca. Nanohaloarchaeota MAGs recovered from the present study. c Phylogenetic placements of the novel Ca. Nanohaloarchaeota MAGs. The tree was constructed based on the concatenated alignment of 16 ribosomal proteins using IQ-TREE with the best model of LG + F + R9. Bootstrap values were based on 1000 replicates, and nodes with confidence > 70% are indicated as black circles

Identification of two novel orders of Ca. Nanohaloarchaeota

Three MAGs representing a novel lineage within Ca. Nanohaloarchaeota were reconstructed from the metagenomic datasets. The sizes of the assembled MAGs range from 0.62 to 0.75 Mbp with GC contents ranging from 43.8 to 52.5% (Table 1). They encode an average of 829 genes with an average gene length of 836 bp. The MAGs have high estimated completeness (87.5 to 95.8%; the occurrence frequencies of 48 single-copy marker genes used for genome completeness calculation are recorded in Additional file 1: Table S1) with up to 33 tRNAs and low estimated contamination (< 0.93%). Reads mapped to these MAGs are exclusively from the bottom six layers of the salt crust, with relatively low abundances, suggesting that they represent a rare group within the salt crust community (Fig. 1b). Given the significantly lower abundance of Ca. Nanohaloarchaeota MAGs compared with 16S rRNA-based data, we speculate that some Ca. Nanohaloarchaeota failed to assemble and/or bin into discrete MAGs, particularly for the Ca. Nanosalinales.

Table 1 Genomic features of Ca. Nanohaloarchaeota MAGs reconstructed from the Qi Jiao Jing Lake

To resolve the phylogenetic affiliation of the newly obtained MAGs, two different marker protein sets were used to construct phylogenomic trees using maximum-likelihood methods: (i) 16 ribosomal proteins (Fig. 1c) and (ii) 122 archaeal marker proteins (Additional file 2: Fig. S1). Both phylogenies are well supported and concordant with clustering patterns based on average amino acid identity (Additional file 2: Fig. S2). Based on monophyly with high bootstrap support and calculated relative evolutionary divergence (RED) values (0.57 ± 0.004; Additional file 1: Table S2), these MAGs, along with two unclassified MAGs derived previously from hypersaline environments, could be assigned into a novel order within the previously defined class Ca. Nanosalinia. Three additional publicly available MAGs obtained from the subsurface within a deep-sea hydrothermal vent area can also be classified into this class but represent a third order, herein called Nanohydrothermales. A system of nomenclature was developed under the rules of the SeqCode, with the orders Nucleotidisoterales and Nanohydrothermales proposed to encompass MAGs reconstructed from hypersaline environments and deep-sea hydrothermal vents, respectively. Under the SeqCode, the nomenclatural types for the orders are the genera Nucleotidivindex and Nanohydrothermus. In total, four families encompassing four monospecific genera are proposed. All proposed taxa are supported by phylogenetic concordance and delineated based on RED values and the average nucleotide identity cutoff of < 95% at the species level (Additional file 3).

Nucleotidisoterales may be symbionts of Halobacteria

Analysis of the metabolic potential of Nucleotidisoterales genomes showed that they have genes for DNA replication, transcription, and translation but lack biosynthetic capacity to synthesize nucleotides, amino acids, lipids, and cofactors (Fig. 2). These gaps in essential biosynthetic pathways, particularly cell membrane biosynthesis, imply that they likely have a symbiotic lifestyle [9]. Also, in support of this hypothesis are the extremely small genome sizes of all five Nucleotidisoterales genomes, 0.62 to 0.75 Mbp, which are much smaller than any known free-living prokaryotes, and the general lack of evidence for a free-living lifestyle in any other member of the DPANN superphylum. Based on the abundance and diversity of cohabiting Halobacteria and known examples of symbioses between Ca. Nanohaloarchaeota and Halobacteria, members of the Nucleotidisoterales may be obligate symbionts of Halobacteria; however, this hypothesis would need to be resolved by more incisive experiments.

Fig. 2
figure 2

Overview of metabolic potentials in three new Ca. Nanohaloarchaeota MAGs. Genes related to glycolysis, AMP metabolism, TCA cycle, pyruvate metabolism, oxidative phosphorylation, protein degradation, membrane transporters, and pili are shown. Red solid circles represent genes present in QJJ-5_bin.20. Blue solid circles represent genes present in QJJ-7_bin.66. Purple solid circles represent genes present in QJJ-9_bin.46. Green solid circles represent genes present in at least one of the two NHA MAGs. Hollow circles represent structures where not all protein subunits are present in the genome. Abbreviations: G1P, glucose 1-phosphate; G6P, glucose 6-phosphate; F6P, fructose 6-phosphate; F1,6P2, fructose 1,6-bisphosphate; G3P, glyceraldehyde-3p; 3PG, glycerate-3-phosphate; 2PG, glycerate-2-phosphate; PEP, phosphoenolpyruvate; Oxa, oxaloacetate; Cit, citrate; Iso, isocitrate; 2-Oxo, 2-oxo-glutarate; Succ-CoA, succinate-CoA; Succ, succinate; Fum, fumarate; Mal, malate; NMP, nucleoside 5′-monophosphate; R1,5P, ribose-1,5-bisphosphate; RuBP, ribulose-1,5-disphosphate; MFS, major facilitator superfamily permease; 5,10-CH = THF, 5,10-methylenetetrahydrofolate; Carbamoyl-P, carbamoyl phosphate; phosphoserine, 3-phospho-L-serine

Metabolic potential of novel MAGs

The incomplete TCA cycle and the absence of most electron transport components reveal most members of this lineage are fermentative and anaerobic, in agreement with previous analyses of other Ca. Nanohaloarchaeota genomes and all other members of the DPANN superphylum [9, 11]. Notably, QJJ-5_bin.20 encodes cytochrome c oxidase, coxB, though other subunits of complex IV including coxACD were absent. It has been reported that the presence of cytochrome oxidase encoded by some DPANN genomes may be employed to adapt to oxic or microoxic environments [9, 10]. Genes involved in the oxidative and non-oxidative pentose phosphate pathways and upper glycolytic pathway are almost completely missing (Fig. 2, detailed gene copies are recorded in Additional file 1: Table S3). However, all MAGs encode complete nucleotide salvage pathways, including AMP phosphorylase (deoA), ribose 1,5-bisphosphate isomerase (e2b2), and form-III type of ribulose 1,5-bisphosphate carboxylase (rbcL), suggesting that they could degrade adenosine monophosphate (AMP) or nucleoside 5′-monophosphate (NMP) into 3-phosphoglycerate, which could feed into the lower glycolytic pathway with ATP released [23]. Notably, none of the previously described Ca. Nanohaloarchaeota genomes has been reported to contain this pathway, despite its common presence in other DPANN archaea [9, 24].

Phylogenetic analysis showed that the RbcL proteins of Nucleotidisoterales formed two groups within the form III-B lineage (Fig. 3), both of which are mainly composed of homologs from other DPANN archaea. Given the frequent horizontal gene transfers (HGTs) among DPANN archaea and very limited distribution of RbcL in Ca. Nanohaloarchaeota [24], we infer that the common ancestor of the RbcL homologs within Nucleotidisoterales might be endowed by other DPANN. Additionally, RbcL homologs annotated in several Halobacteria genomes were also divided into two groups, and both are located as sister lineages of RbcL proteins in the Nucleotidisoterales. Based on the wide presence of RbcL proteins in DPANN archaea and the close evolutionary relationship between homologs from Nucleotidisoterales and Halobacteria, we speculate that lineage III-B evolved within DPANN and passed to Halobacteria horizontally with Ca. Nanohaloarchaeota acting as potential donors. Rampant nucleotide scavenging is well-known in Halobacteria [25]. This also exemplifies that the bidirectional genetic exchange between Ca. Nanohaloarchaeota and co-existing, and possibly symbiotic, Halobacteria is probable.

Fig. 3
figure 3

Maximum-likelihood phylogenetic tree of RbcL proteins using IQ-TREE with the best model of LG + F + R10. a Overview of RbcL proteins phylogeny with groups indicated. b and c Detailed phylogeny of two different clades of RbcL proteins found in Nucleotidisoterales; red highlights represent the Nucleotidisoterales MAGs. QJJ-9_bin45, QJJ-9_bin.161, QJJ-8_bin.5, and QJJ-5_bin.106 represent medium- or low-quality MAGs of Nucleotidisoterales reconstructed from the same metagenomic datasets and thus not described in the main text. Bootstrap values were based on 1000 replicates, and nodes with percentages > 70% are indicated as black circles

Nucleotidisoterales may obtain DNA through pili (see below) or by direct transport from other organisms and then degrade it by use of nucleases [26]. We hypothesize that Nucleotidisoterales and Halobacteria may collaborate to degrade DNA, and the resulting oligonucleotides or nucleotides can be recycled by Nucleotidisoterales via the nucleotide salvage pathway. The glycerate-3P produced could flow into the lower glycolytic pathway, leading to pyruvate. This is feasible due to the possession of all genes encoding the lower glycolytic pathway, including a 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (gpmI), enolase (eno), and phosphoenolpyruvate synthase (pps). Then, acetate could be transported into acetotrophic Halobacteria, which may be potential hosts [27]. Nucleotidisoterales lack the archaeal pyruvate reductase (porAB) for the conversion of pyruvate to acetyl-CoA. Instead, the pyruvate dehydrogenase complex (pdhABCD), which is commonly detected in DPANN [9], might alternatively be employed to perform the same function. The presence of acetate-CoA ligase (acdB) suggests the capability to catalyze the reversible conversion of acetyl-CoA and ADP to acetate and ATP, which was shown to be operational in Ca. Nanohalobium constans LC1Nh [14]. Alternatively, aldehyde dehydrogenase might be adopted to produce acetate and NADH by oxidizing acetaldehyde. Thus, we suggest a potential syntrophic relationship between the putative host/symbiont partners. Under this hypothesis, symbionts in the order Nucleotidisoterales may obtain diverse nutrients from hosts, and in turn, they may provide some small molecules, such as acetate, to acetotrophic Halobacteria hosts [27, 28]. Collectively, we hypothesize that Nucleotidisoterales organisms are obligate symbionts with the capability for nucleotide fermentation. Aside from the known co-metabolism of carbohydrates by Ca. Nanosalinales and Halobacteria partners, co-degradation of extracellular DNA by Nucleotidisoterales via the nucleotide salvage pathway, coupled with the lower glycolytic pathway and acetogenesis, might represent another strategy to support mutualistic symbiosis.

Metabolism of polysaccharides and peptides

A previous study demonstrated that the pure culture Halomicrobium sp. LC1Hm is unable to grow with glycogen as the sole carbon source. Instead, the co-cultured symbiont Ca. Nanohalobium constans LC1Nh can assist the degradation of glycogen, and the released glucose supports the growth of Halomicrobium sp. LC1Hm [14]. This interaction serves as an example of how microbial consortia often have a broader metabolic capacity than pure cultures, allowing the survival of partners in the relationship despite resource fluctuations or environmental disturbances. Unlike the co-culture of Ca. Nanohalobium constans LC1Nh and Halomicrobium sp. LC1Hm where the coexistence is maintained by the utilization of different polysaccharides, none of our Nucleotidisoterales MAGs encodes genes for glycoside hydrolases (Additional file 1: Table S4), further suggesting that a novel strategy might be employed to sustain a symbiotic relationship.

Instead of polysaccharides, Nucleotidisoterales may be able to utilize proteins or peptides by use of a variety of protein-degrading enzymes for catabolic and/or anabolic purposes. Specifically, two subunits of the proteasome (psmAB), as well as several molecular chaperones (Fig. 2), were present, indicating that they have the ability to degrade damaged or misfolded proteins into oligopeptides. Different families of peptidases were detected, which could participate in the processing and transport of oligopeptides or protein turnover (Additional file 1: Table S5), such as serine peptidase (e.g., S16 and S26), metallopeptidases (e.g., M26, M43, and M103), aspartic peptidase (e.g., A26), and other peptidases (e.g., U32, T01, and N10). Moreover, six peptidases (S16, S18, M26, M43, M64, and C25) carry signal peptides (Additional file 1: Table S6), suggesting that they could degrade peptides extracellularly into oligopeptides or amino acids that could be transported by the MSF transporter or ABC-2 transporter [14] (Fig. 2). Several pathways for the catabolic use of amino acids and interconversion of amino acids have been identified, implying that proteolysis is an important way of life for Nucleotidisoterales. For example, not only may glutamate be deaminated to generate ammonia but also it may be used as a compatible solute to regulate intracellular osmotic pressure [29]. Aspartate could potentially be converted into oxaloacetate and then used to produce acetate. This indicates that oligopeptides and amino acids are likely to be important substrates for the growth of Nucleotidisoterales.

Cell-surface structures

A pioneering study suggested that a prominent feature of Ca. Nanohaloarchaeota is large genes that encode the so-called SPEARE proteins, which usually contain several domains thought to be involved in the interaction between symbionts and hosts [10]. However, no open reading frames larger than 9000 nucleotides were annotated in any of the Nucleotidisoterales MAGs, yet several small genes encoding SPEARE-like proteins were identified (Additional file 1: Table S6). Therefore, Nucleotidisoterales may use different strategies to promote cell–cell interactions. Studies have shown that pili and archaella, as well as certain surface proteins, likely contribute to cell surface attachment and interaction between symbionts and hosts [30, 31]. Unlike other Ca. Nanohaloarchaeota, none of MAGs in the present study encodes archaella. All Nucleotidisoterales MAGs have at least two type 4 tight adherence (Tad) pilus-encoding gene clusters putatively involved in pilus formation with one cluster being identical in architecture and gene organization to that found in all known Ca. Undinarchaeota (Additional file 2: Fig. S3) [26]. Specifically, VirB11 family ATPases (TadA) are possibly used to energize the assembly and disassembly of pili by hydrolyzing nucleotide triphosphates [32] TadB and TadC could potentially provide a platform for pilus assembly [33] and type 4 prepilin peptidases (TadV) used to modify prepilins. However, no pilins were annotated in these genomes, possibly due to the small size and poor conservation of pilin genes. By comparison, Ca. Undinarchaeota, which also lack annotated pilin genes, have been confirmed to synthesize pili to promote cell–cell interactions [26]. The other two types of gene clusters in the new MAGs contain two copies of kaiC but lack prepilin peptidase. The kaiC genes are possibly involved into the regulatory or modulation of type 4 pilin [34]. Studies have also shown that several DPANN archaea may interact with hosts and import DNA into cells via pili [26, 31]. Consequently, we hypothesize that the pili of Nucleotidisoterales are not only used for communication with the host but also for nutrient transport, similar to all known Ca. Undinarchaeota [26]. In addition, signal-peptide-containing proteins, including LamG domain-containing proteins, glycosyltransferases, S-layer family proteins, and several hypothetical proteins, were identified (Additional file 1: Table S7), which could be transported outside by the Sec-SRP secretion system to promote cell–cell interactions [9]. Metal ion transporters were detected, albeit at low abundance, except for QJJ-5_bin.20, which contains five different metal transporters (Fig. 2). This includes several magnesium transporters, which have been experimentally confirmed to support the cell growth of Ca. Nanohaloarchaeota due to their reliance on high cytoplasmic magnesium [14].

Environmental adaptations

To sustain isoosmosis with the surrounding environment, different strategies might be used to maintain proper intracellular osmotic pressure. Salt-in is a key strategy used by Halobacteria, which leads to the proteomic enrichment of acidic and hydrophilic amino acids [35, 36]. Calculations of the average isoelectric points (pIs) of all protein-encoding genes yielded very low median pI values (average 4.5), confirming the salt-in strategy adopted by Nucleotidisoterales, similar to Ca. Nanosalinales [14] (Fig. 4a). In contrast, Nanohydrothermales MAGs possess more basic proteomes with high median pI values (average 8.94) [37]. Detailed investigation of amino acid usages revealed substantial differences between Nucleotidisoterales and Nanohydrothermales (Fig. 4b). The former possesses a high excess of surficial acidic amino acids (Glu, Asp), which are able to enhance hydration to keep the proteins in solution [38]. Moreover, these negatively charged amino acids bind to specific cations (e.g., Na+ and K+) to maintain structural stability and enzyme activity [39, 40]. Nanohydrothermales MAGs have amino acid compositions similar to those of thermophilic Ca. Aenigmarchaeota (Fig. 4b). The charged amino acid lysine is enriched, but uncharged polar amino acids are in low relative abundances (Ser, Thr, Gln, and Asn) [41]; lysine methylation could enhance protein stability under high-temperature conditions [42].

Fig. 4
figure 4

Genomic differences between Nucleotidisoterales and Nanohydrothermales. Five MAGs from Nucleotidisoterales and three MAGs from Nanohydrothermales are taken into consideration for comparative genomics. a The isoelectric points (pIs) of the proteins of MAGs from the two orders (see “Materials and methods” for detailed calculation of pI). b Amino acid usage of MAGs, as well as thermophilic Ca. Aenigmarchaeota. c The Venn diagram indicates differences between the two orders at the KO level. d Isoelectric points of Nucleotidisoterales-unique proteins. e Isoelectric points of Nanohydrothermales-unique proteins. f Isoelectric points of proteins shared between Nucleotidisoterales and Nanohydrothermales. g Functional distribution of acidic amino acids at the KEGG category level. Blue bars represent unique genes of Nucleotidisoterales. Light blue bars represent shared genes of Nucleotidisoterales. Red bars represent shared genes of Nanohydrothermales

Comparative genomics based on KEGG annotation revealed remarkable differences between the two orders (Fig. 4c). More genes could be assigned to KOs in Nucleotidisoterales compared with Nanohydrothermales, and the former group harbors more unique genes than the latter. In contrast, shared genes comprise 84.6% of Nanohydrothermales genomes. Both unique and shared genes within respective groups exhibited a similar pattern of pI values (Fig. 4 d–f). Interestingly, regardless of whether the genes are unique or not, Nucleotidisoterales MAGs are enriched in acidic amino acids with relatively low molecular weights. Unique genes with low pI in Nucleotidisoterales MAGs are involved in carbohydrate, amino acid, nucleotide, and energy metabolisms. In contrast, shared genes in Nucleotidisoterales exhibiting low pI values were enriched in translation, nucleotide metabolism, and replication and repair (Fig. 4g). Interestingly, most of these shared genes with low pI values in Nucleotidisoterales are basic with pI values > 7 in Nanohydrothermales, demonstrating divergent evolution of these genes to adapt to their distinct habitats. Collectively, the enriched genes and amino acid patterns appear primarily determined by the distinct physicochemical environments harboring these two orders.

Evolution of carbohydrate metabolism in Ca. Nanohaloarchaeota

A reassessment of the evolutionary history of Ca. Nanohaloarchaeota is necessary due to the discovery of Nucleotidisoterales described here and the recently discovered Nanohydrothermales from deep-sea hydrothermal vents. Including all Ca. Nanohaloarchaeota MAGs available in public databases, the reconstructed phylogeny reveals the deep-branching position of the Nanohydrothermales and a sister group encompassing Nucleotidisoterales and Ca. Nanosalinales (Fig. 5). Due to the demonstrated importance of polysaccharide utilization for the establishment of symbiotic relationships in some members of the Ca. Nanosalinales, genes involved in carbohydrate metabolism along with nucleotide metabolism were considered for evolutionary history inference. The reconstructed evolutionary history based on carbohydrate-related metabolisms revealed different evolutionary trajectories among different orders within the Ca. Nanohaloarchaeota that inhabit thermal and high-salt habitats. Specifically, the metabolic characteristics of Nanohydrothermales are distinct from other members of the phylum because of lacking pathways for sugar catabolism. The lower glycolytic pathway appears to be fundamental and widely distributed across both orders inhabiting high-salt environments. Specifically, the eno and pdhABCD genes were likely already present at the ancestral node of Nucleotidisoterales and Ca. Nanosalinales, and other genes, including gpmI and pps, are also found in the early stages of Nucleotidisoterales lineages (Fig. 5). Functional differentiation occurred after the two lineages diversified. In particular, genes encoding proteins involved in polysaccharide metabolism, such as glycogen debranching enzyme (AGL) and alpha amylase (amy), appear to be acquired features in Ca. Nanosalinales, which was critical for the evolution of symbiotic polysaccharide metabolism between the hosts and symbionts as reported [14]. Phylogenetic analysis suggests that Ca. Nanosalinales may have acquired these genes from other DPANN archaea via HGT (Additional file 2: Figs. S4 and S5). Meanwhile, Ca. Nanosalinales evolved the upper glycolysis pathway mostly driven by HGT, facilitating the degradation of polysaccharides, whose products then flow into a complete glycolysis pathway to conserve energy. Genes associated with carbohydrate metabolism were mostly acquired horizontally, suggesting the inability of the last Ca. Nanohaloarchaeota common ancestor to metabolize saccharides to obtain free energy. However, none of our Nucleotidisoterales MAGs harbors genes for the complete polysaccharide metabolism or the upper glycolytic pathway. In contrast, the nucleotide salvage pathway, conferring microbes with the ability to harvest energy by degrading nucleotides, is exclusively detected in Nucleotidisoterales within Ca. Nanohaloarchaeota. The three key genes (rbcL, deoA, and e2b2) involved in this pathway seem to be inherited vertically from their common ancestor, with few HGT events (Fig. 5). However, we cannot rule out the possibility that the common ancestor of Nucleotidisoterales may acquire these genes via inter-phylum HGT. This is probable which could be exemplified by the aforementioned evolution of rbcL gene. Ca. Nanohaloarchaeota thus differentiated into separate branches based on the different strategies for energy conservation. Ca. Nanosalinales conserve energy by degrading starch coupled with complete glycolysis and fermentation, whereas Nucleotidisoterales conserve energy via nucleotide salvage coupled with lower glycolysis. The nucleotide salvage pathway is much more energy efficient than the upper glycolysis pathway because it can produce two more ATPs per reaction. However, a study found that the activity of AMP phosphorylase and ribose-1,5-bisphosphate isomerase increased only at higher substrate concentrations, preventing the excessive degradation of intracellular nucleotides [43]. Additionally, glycolysis is capable of rapidly supplying energy [44], and likely represents a more effective manner to support cell growth. This could manifest in the much higher abundance of Ca. Nanosalinales than Nucleotidisoterales in the community studied here.

Fig. 5
figure 5

Evolutionary history reconstruction regarding carbohydrate metabolism and nucleotide salvage pathway in Ca. Nanohaloarchaeota. a The inferred gain and loss events related to carbohydrate metabolism and nucleotide salvage pathway in Ca. Nanohaloarchaeota. b The presence and absence of genes in each Ca. Nanohaloarchaeota MAG. Originations indicate either de novo gene birth events or inter-phylum HGTs. NHA represents Ca. Nanohaloarchaeota archaeon NHA-2 and Ca. Nanohaloarchaeota archaeon NHA-4

Conclusion

We provide a comprehensive analysis of the potential metabolism, ecology, and evolution of a novel order of Ca. Nanohaloarchaeota, Nucleotidisoterales. Unlike other Ca. Nanohaloarchaeota, Nucleotidisoterales are unable to degrade polysaccharides. Instead, genomic analysis suggests they can recycle and degrade nucleotides and proteins for anabolism and energy conservation, suggesting that they occupy different ecological niches. This is the first description of the nucleotide salvage pathway in Ca. Nanohaloarchaeota genomes, and phylogenetic analysis revealed that Nucleotidisoterales are possible donors of rbcL genes to Halobacteria. Comparative genomic analysis revealed remarkable differences between Nucleotidisoterales and thermophilic Nanohydrothermales, including amino acid usage, suggesting different physicochemical environments selected for distinct proteomes to thrive in different extreme environments. Evolutionary history analysis suggested that the last Ca. Nanohaloarchaeota common ancestor was unable to metabolize polysaccharides for energy conservation, and that later functional differentiation with respect to energy harvesting led to the diversification of Ca. Nanohaloarchaeota. Overall, the findings provide deeper insights into the understanding of the metabolic functions and the evolutionary history of the important but poorly studied phylum Ca. Nanohaloarchaeota.

Materials and methods

Sampling, DNA extraction, 16S rRNA amplicon, and metagenomic sequencing

A stratified salt crust ~ 14 cm thick was sampled from the surface of the Qi Jiao Jing (QJJ) Lake in Xinjiang Province of China (91.5881°E, 43.3806°N), in September 2019. The crust was characterized by distinct colored layers due to photosynthetic pigments and different mineral phases, and the bottom layer was at the interface with the underlying water (Additional file 2: Fig. S6). The salt crust was dissected into eight distinctive layers based on differences in color (QJJ1-8), and one sample from the underlying salt water was also collected (QJJ9). All nine samples were placed into 15 ml sterile tubes and stored in liquid nitrogen during shipment to the lab. DNA was extracted within 48 h as described previously [45]. The standard primer 515F-806R for the V4 region of the 16S rRNA gene was used for DNA amplification [46, 47]. High-throughput sequencing was performed on an Illumina MiSeq 2500 platform to generate 250 bp paired-end reads. Separately, a library with an insert size of ~ 400 bp was constructed from the total genomic DNA and was sequenced using the Illumina HiSeq 2500 instrument, generating ~ 36 Gbp (2 × 150 bp) raw data for each sample.

Analyses of 16S rRNA gene amplicons

The adapters and low-quality reads were removed by cutadapt v1.18 [48]. Clean data were processed according to the recommended tutorial in the QIIME2 program (2020.7) [49]. In brief, sequences were merged into amplicon sequence variants (ASVs) after demultiplexing, joining, filtering, and denoising. ASVs were taxonomically identified using the QIIME2 classifier by searching against the SILVA v132 database [50].

Processing metagenomic reads, assembly, and binning

All metagenomic raw reads were filtered by Sickle v1.33 (https://github.com/najoshi/sickle) with the parameters “-q 15 -l 50.” High-quality reads were assembled using SPAdes v3.12 [51] with the parameters “-k 21, 33, 55, 77, 99 -meta.” Genome binning was conducted on scaffolds with lengths ≥ 2500 bp using MetaBAT2 with default parameters [52]. The taxonomy of MAGs was obtained based on the GTDB database using GTDB-Tk v1.7.0 [53, 54]. Three MAGs belonging to Ca. Nanohaloarchaeota were retained for further analysis. The MAGs were evaluated for completeness by calculating the proportion of detected markers among 48 single-copy genes [55]. Contamination was assessed using CheckM v1.1.3 [56]. To optimize MAG quality, clean reads for each MAG were recruited using BBMap v38.92 (http://sourceforge.net/projects/bbmap/) with the parameters “minid = 0.97, local = t.” Then, MAGs were reassembled by SPAdes v3.12 based on the mapped reads with the following parameters: “–careful -k 21, 33, 55, 77, 99, 127.” To improve the accuracy of genome bins, all bins were manually examined to remove contamination. Specifically, scaffolds were treated as contamination and were discarded if they contained duplicate markers that were phylogenetically discordant with other Ca. Nanohaloarchaeota, and their read depths were discordant with other scaffolds in the same bin. The relative abundance of each MAG in each sample was calculated by calculating the proportion of reads mapped to each MAG against all preliminary genomes generated via MetaBAT2.

Functional annotation of Ca. Nanohaloarchaeota MAGs

All the available MAGs belonging to Ca. Nanohaloarchaeota were downloaded from NCBI, IMG, and other sources (Additional file 1: Table S2) [19]. MAGs with completeness < 50% and contamination > 10% were discarded. Pairwise average amino acid identity among MAGs was calculated as the mean identity of reciprocal best BLAST hits (E-value < 1e-5). rRNAs and tRNAs were identified using RNAmmer v1.2 and tRNAscan-SE v2.0.2, respectively [57, 58]. Putative protein-coding sequences were predicted using Prodigal v2.6.3 with the “ -p single” parameter [59]. Subsequently, predicted genes were annotated against KEGG, NCBI-nr, and eggNOG databases using DIAMOND (E-values < 1e-5) [60]. Carbohydrate-active enzymes were annotated using the carbohydrate-active enzymes (CAZy) database [61]. Peptidases were identified by BLAST searching against the MEROPS database [62]. SignalP-4.1 was used to predict signal peptides and the localization of the enzymes [63]. The isoelectric point of the proteins was calculated using IPC v1.0 [64].

Phylogenetic analyses

Two different marker gene sets were used to analyze relationships between members of the Ca. Nanohaloarchaeota. First, sixteen ribosomal protein sequences (L2, L3, L4, L5, L6, L14, L15, L16, L18, L22, L24, S3, S8, S10, S17, and S19) were selected to reconstruct phylogenomic relationships [65]. These sequences were identified by AMPHORA2 [66] and aligned using MUSCLE v3.8.31 by iterating 100 times [67]. Poorly aligned regions were eliminated using TrimAl v1.4 with the parameters “-gt 0.95 -cons 50” [68]. Then, multiple alignments were concatenated using a Perl script (https://github.com/nylander/catfasta2phyml). Second, a multiple sequence alignment of 122 archaea-specific conserved marker genes generated by GTDB-Tk v1.7.0 [53] was used for phylogenetic analysis. To build a phylogenetic tree of the ribulose 1,5-bisphosphate carboxylase (RbcL), reference protein sequences were obtained from a previous study and the NCBI-nr database [24]. RbcL proteins belonging to Halobacteria were identified from metagenomic data in the present study and were integrated to assess the evolution of this gene. All RbcL protein sequences within Halobacteria were clustered using CD-HIT v4.8.1 with the parameters “-c 0.95 -n 10 -G 0 -aS 0.9 -g 1 -d 0 -T 20” [69]. IQ-TREE v1.6.12 was applied to reconstruct maximal-likelihood phylogenetic trees with the following parameters “-alrt 1000 -bb 1000” [70]. Phylogenetic trees for the glycogen debranching enzyme and alpha amylase were generated similarly. All the tree files were uploaded to iTOL for visualization and annotation [71].

Evolutionary analysis

Protein families were obtained by applying the MCL algorithm (v14–137) to all Ca. Nanohaloarchaeota genomes [72]. Individual phylogenetic analyses for each protein family were constructed using the methods described above. To address the evolutionary history of Ca. Nanohaloarchaeota, gene gain and loss events were inferred by reconciling the topology difference between species tree and protein trees using ALE v1.0 [73]. ALEobserve was used to calculate the conditional clade probabilities from bootstrap samples, and 100 reconciliations with the species tree were sampled by ALEml_undated [74]. We used auxiliary scripts (https://github.com/Tancata/phylo/tree/master/ALE) to parse ALE outputs. A threshold of 0.3 was applied to the raw reconciliation frequencies of ALE output to judge whether an evolutionary event occurred or not [75]. If the gene copy parameter was greater than 0.3, the gene was considered to be present. Since noise from alignments and tree reconstructions can reduce the signal of some true events, this threshold is necessary [75].