Background

Bathyarchaeota, formerly named MCG (Miscellaneous Crenarchaeotal Group) [1], is a recently proposed archaeal phylum within the TACK (Proteoarchaeota) superphylum [2,3,4]. It is a cosmopolitan phylum that inhabits various anoxic environments, such as groundwater, paddy soil, hot springs, salt marsh sediments, estuaries, mangrove sediments, the seafloor, and hydrothermal sediments [5,6,7,8,9,10,11]. It is also one of the most abundant archaeal groups in the marine sub-seafloor, with an estimated 2.0–3.9 × 10^28 cells in the global ecosystem [3, 12]. This ubiquity and high abundance suggest that Bathyarchaeota may play a role in global biogeochemical cycles [13]; however, no pure cultures of Bathyarchaeota have been successfully established. Recently, an enrichment of Bathyarchaeota was obtained, which suggested that subgroup 8 (Bathy-8) uses lignin as an energy source and bicarbonate as a carbon source, yet more metabolisms remain to be explored [14].

Based on the analysis of metagenome-assembled genomes (MAGs) and single-cell genomes (SAGs), Bathyarchaeota has been inferred to have the potential for CO2 fixation via the Wood-Ljungdahl pathway, acetogenesis, methane metabolism, and degradation of peptides, fatty acids, aromatics, and other organic compounds [2, 3, 14,15,16,17], suggesting that Bathyarchaeota may play an important role in the global carbon cycle. At least 25 subgroups have been identified in Bathyarchaeota based on phylogenetic analyses of 16S rRNA genes [13], and many subgroups display distinct environmental preferences, implying diversification and adaptation to specific environmental conditions [6, 18,19,20,21]. Given this diversity, the current information is too limited to comprehensively understand the metabolic capacities of Bathyarchaeota and its role in the geochemical cycle.

Bathyarchaeota are the most abundant archaeal phylum in the mangrove and mudflat sediments of Futian Nature Reserve (Shenzhen, China) and Mai Po Nature Reserve (Hong Kong, China) [6, 22]. Following those studies, the total DNA and RNA of sediment samples from these two sites were sequenced to construct genomes and transcriptomes of Bathyarchaeota, respectively. Together with all available bathyarchaeotal MAGs in the public database (including the dozens of MAGs released recently [23]), we aimed to (1) search for new metabolisms of Bathyarchaeota; (2) compare metabolic potentials among bathyarchaeotal subgroups; and (3) further predict the roles of Bathyarchaeota in the geochemical cycle.

Results and discussion

Genome construction and transcriptome

In total, eight layers in three sediment profiles from two habitats were selected for metagenomic and metatranscriptomic sequencing (Figure S1; details of the samples and sequencing are listed in Table S1). Raw DNA reads were trimmed, de novo assembled, and binned to obtain multiple MAGs. Among them, bathyarchaeotal MAGs were selected and combined with reference bathyarchaeotal genomes to form a database; short DNA reads of Bathyarchaeota were then recovered by remapping the DNA reads of all samples to this genome database. Finally, nine bathyarchaeotal MAGs were constructed by de novo assembly of the bathyarchaeotal reads and subsequent binning. The bathyarchaeotal MAGs ranged from ~ 0.6 to ~ 1.9 Mb in size, had a G+C content of 34.68–58.90%, and had an estimated completeness (based on the presence of single-copy genes) of 58.03–95.33% (Table S2).

Phylogenetic analyses of 16 ribosomal proteins were conducted with all available bathyarchaeotal MAGs (91 reference genomes from the database and 9 MAGs from this study; Fig. 1a) and with high-completeness MAGs only (containing all 16 ribosomal proteins; 22 reference genomes from the database and 6 MAGs from this study; Figure S2). Both trees show a similar structure, supporting the subgroup assignments of the bathyarchaeotal MAGs. Taken together with the phylogenetic analysis of 16S rRNA genes and the average nucleotide identity results (Fig. 1b, c), the nine bathyarchaeotal MAGs in the current study were assigned to Bathy-6 (4 MAGs), -8 (2 MAGs), -15 (2 MAGs), and -17 (1 MAG). These four subgroups were also shown to be the major bathyarchaeotal subgroups in previous reports on the archaeal communities of both mangrove habitats [6, 22].

Fig. 1

Subgroup assignment of bathyarchaeotal genomes by (a) a phylogenetic tree based on 16 ribosomal proteins, (b) a phylogenetic tree of 16S rRNA genes, and (c) average nucleotide identity of genomes. Two red lines represent two bathyarchaeotal MAGs with 16S rRNA genes in the current study. The scale bar indicates the average number of amino acid or nucleotide substitutions per site. The sequences were aligned independently using MUSCLE, and columns with more than 95% gaps were trimmed using trimAL. The maximum likelihood trees of 16S rRNA gene and 16 ribosomal proteins were built using RAxML 8.0, the number of bootstraps was 1000, and the evolutionary models were GTRCAT (for 16S rRNA gene) and LG+GAMMA (for ribosomal protein), respectively

The coverage of each MAG by the metagenome and transcriptome is shown in Figure S3 and Table S3. Similar to the bathyarchaeotal abundance in mangrove and seafloor sediments determined by 16S rRNA gene sequencing [6, 18, 22], the metagenomic coverage of all MAGs increased with sediment depth, with RPKM values ranging from 0 (MF-5.3.1.4 in SZ_1) to 0.058 (MF-3.4 in SZ_1) in the surface layer and from 0.017 (MF-10.5.5.11.1.24 in Maipo-9) to 0.392 (MF-3.4 in Maipo-9) in the deepest layer (Figure S3a and Table S3). However, transcriptomic coverage showed no significant correlation with depth, with the minimal coverage in SZ_2 (MF-10.5.5.11.1.24; RPKM value is 0) and the maximal coverage in Maipo-8 (MF-9.11; RPKM value is 3.049) (Figure S3b and Table S3). These results suggest that the genomic abundance of bathyarchaeotal members does not reflect their actual transcriptional activities in the sediments, and highlight the importance of investigating the transcriptome of the microbial community in future studies of ecological function [24, 25].

Light sensing

Rhodopsins are membrane proteins engaged in light perception and are widespread across the three domains of life. Many organisms use them to generate energy from light [26,27,28]. According to the annotation of bathyarchaeotal MAGs, rhodopsin genes were found in MAGs of Bathy-6 and -8 (Fig. 2). To confirm the type of rhodopsin, a rhodopsin phylogenetic tree was constructed, which clearly showed that the rhodopsins detected in Bathyarchaeota are heliorhodopsins (Fig. 3). Heliorhodopsins are a recently described type of rhodopsin that is abundant and globally distributed [29]. The photocycle of heliorhodopsins (including retinal isomerization and proton transfer, the same as in type-1 and type-2 rhodopsins) is long, a feature that is common in sensory type-1 rhodopsins and favors the interaction between rhodopsins and transducer proteins [29]. This suggests a light-sensory activity of heliorhodopsin, indicating that Bathyarchaeota may sense light. The metatranscriptomic analysis further supported transcriptional activity of rhodopsin genes in Bathy-6 and -8 (Fig. 4), suggesting that members of Bathy-6 and -8 in mangrove sediments might sense light. However, previous studies have revealed that most bathyarchaeotal members prefer the sediment subsurface and that large numbers of Bathyarchaeota occur in the deep biosphere, where visible light can barely reach [6, 18, 22, 30]; thus, Bathyarchaeota may not capture visible light with rhodopsin. Infrared light has been shown to be an available energy source for some plants and bacteria [31,32,33,34,35], and rhodopsin could gain longer-wavelength or even infrared sensitivity by substituting all-trans-retinal (the chromophore in archaeal cells) with 3,4-dehydroretinal [36], retinal A2, 3-methylamino-16-nor-1,2,3,4-didehydroretinal, or other analogs [37].
Previous studies have also shown that retinal-deficient bacterial cells, generated by deleting gene sll1541 (which converts carotenal to retinal), could reconstitute far-red-absorbing rhodopsin in vivo with exogenous retinal analogs (all-trans-3,4-dehydroretinal and 3-methylamino-16-nor-1,2,3,4-didehydroretinal) [38]. In the current study, two bathyarchaeotal MAGs were found to harbor a gene for carotenoid biosynthesis (crtY), and genes encoding retinol dehydrogenase (RDH8, 11, 12, 13, 14) were identified in seven bathyarchaeotal MAGs (Table S4). It is possible that bathyarchaeotal cells utilize exogenous retinal analogs and gain infrared energy. However, crtY and RDH were not found in the same MAG, and the other essential genes for retinal biosynthesis (including the genes encoding carotene dioxygenase and retinoid isomerohydrolase) were still missing; thus, more evidence is needed to support the utilization of retinal (or analogs) by Bathyarchaeota. Another possibility is that Bathyarchaeota use rhodopsin as a photosensitive protein to orient themselves towards the subsurface. Genes for flagella biosynthesis were widespread in bathyarchaeotal MAGs, in agreement with a previous report [15], suggesting that bathyarchaeotal cells are capable of motion. Since rhodopsin can respond to light by delivering electrons, fading, or even breaking down [29, 39], light sensing may be one of the forces that drive Bathyarchaeota towards suitable habitats in subsurface sediments. However, more work is needed to establish the importance of bathyarchaeotal rhodopsin.

Fig. 2

Presence (red) or absence (white) of marker genes within the metabolisms from each bathyarchaeotal genome

Fig. 3

Maximum Likelihood tree of rhodopsin sequences. The scale bar indicates the average number of amino acid substitutions per site. The anchor sequences were from Pushkarev et al. [29]. Sequences were aligned using MUSCLE, columns with more than 95% gaps were trimmed using trimAL. The maximum likelihood tree was built using RAxML 8.0, the number of bootstraps was 1000, and the evolutionary model was LG+GAMMA

Fig. 4

Metabolic pathways of nine bathyarchaeotal MAGs in the current study and the transcript activities of individual genes in each bin. The nine squares represent the nine bathyarchaeotal MAGs in the current study, the colors of the squares represent the subgroups to which the MAGs belong, the absence of a circle on a square indicates that the MAG does not harbor the gene, and the fill color of each circle represents the transcript level of the gene normalized by ribosomal protein S3

In addition, a search for rhodopsin genes in archaeal genomes recovered many archaeal rhodopsin sequences, and phylogenetic analysis implied that heliorhodopsin genes are also harbored by many archaeal phyla, including Euryarchaeota and Asgard archaea (Fig. 3), suggesting that heliorhodopsin may be a common protein for archaea to perceive light [40].

Porphyrin biosynthesis

Porphyrin is an important type of tetrapyrrole for living organisms on Earth. Many biological processes, including photosynthesis, respiration, circulation, and nutrition, depend on compounds derived from it (chlorophylls, coenzyme F430, hemes, and cobalamin, respectively) [41, 42]. The biosynthesis of all these compounds starts with the synthesis of uroporphyrinogen III from glutamate or glycine; different metal ions are then chelated into the porphyrin ring by different chelatases, in a process that involves dozens of enzymes [43]. In the current study, all genes related to anaerobic cobalamin biosynthesis were found in Bathyarchaeota, and some members of Bathy-6, -8, and -20 were found to harbor more than half of them (including the cobalt chelatases cbiK and cbiX), suggesting potential cobalamin biosynthesis by Bathyarchaeota (Fig. 2). Cobalamin, also named vitamin B12, is an essential enzyme cofactor in DNA, fatty acid, and amino acid metabolism for all life [44]. In nature, cobalamin can only be produced by a few bacteria and archaea [45]; thus, eukaryotic organisms and cobalamin-auxotrophic microbes rely on them. Previous studies suggest that some members of the domain Archaea, including Euryarchaeota and Thaumarchaeota, serve as cobalamin producers in natural environments [44, 46,47,48]. To our knowledge, this is the first report to provide genetic evidence of a cobalamin biosynthetic pathway in two subgroups of Bathyarchaeota. This finding suggests that some members of Bathyarchaeota may benefit the growth of other organisms via vitamin B12 production in diverse environments.

Interestingly, the phylogenetic analysis of the chelatase genes in Bathyarchaeota indicated that, besides cobalt chelatase genes (cbiK and cbiX), Bathyarchaeota also harbor many magnesium chelatase genes (clustering with chlD and chlI) (Figure S4), and most bathyarchaeotal MAGs with magnesium chelatase genes (including members of Bathy-1, -3, -15, -20, and -22) did not harbor the genes for cobalamin biosynthesis (Fig. 2). Magnesium chelatase is known to catalyze the first unique step of (bacterio)chlorophyll biosynthesis by inserting a magnesium ion into protoporphyrin IX [49]. Further gene exploration indicated that some genes related to chlorophyll synthesis are also present in bathyarchaeotal MAGs (Bathy-6, -8, -15, and -17 in Fig. 2). The existence of magnesium chelatase genes might therefore support a potential chlorophyll biosynthesis, suggesting metabolic diversity in Bathyarchaeota.

Calvin-Benson-Bassham (CBB) cycle

Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) and phosphoribulokinase (PRK) are two representative enzymes of the CBB cycle [50]. In the current study, among 100 available bathyarchaeotal genomes, 33 genomes within 8 subgroups, including Bathy-6, -8, -15, and -17, harbored RuBisCO genes (Fig. 2), which phylogenetically belonged to Form III (including both Forms III-a and III-b) (Fig. 5). PRK genes were found in the genomic bins of Bathy-15 and -17 (Fig. 2 and Figure S5), with transcript activity in Bathy-17 (Fig. 4). In contrast to the short scaffolds harboring the PRK genes, some RuBisCO genes were located on long scaffolds (> 10 kbp) that also encode ribosomal proteins (in genomes B24, SG8-32-3, MF-5.3.1.4, etc.) and other CBB cycle-related enzymes (in genomes BA1, MF-10.3, MF-4.2.1.10.12.7, etc.), further supporting that Bathyarchaeota may participate in the CBB cycle. Notably, this is the first report of a Form III-a RuBisCO in bathyarchaeotal MAGs. Previously, Form III-a RuBisCO has only been identified in methanogens [51], which employ both PRK and Form III-a RuBisCO for carbon fixation [52]. A previous study has demonstrated that even Escherichia coli could establish a functional CBB cycle when RuBisCO and PRK co-exist [53]. Bathyarchaeotal MAGs harbored all genes of the CBB cycle, including RuBisCO, prk, phosphoglycerate kinase (pgk), glyceraldehyde-3-phosphate dehydrogenase (gapA), triosephosphate isomerase (tpiA), fructose-bisphosphate aldolase (fbaB), fructose-1,6-bisphosphatase (fbp), transketolase (tkt), and ribulose-phosphate 3-epimerase (rpe), and all of these genes showed transcript activity (MF-10.5.5.11.1.24 in Fig. 4). Together, these results suggest a metabolic potential for carbon fixation through the CBB cycle in bathyarchaeotal cells.
Taken together with the potential chlorophyll biosynthesis pathway described above, members of Bathyarchaeota may possess metabolic pathways for both carbon fixation and light sensing (potentially chlorophyll based and/or rhodopsin based). However, whether these two pathways co-exist in Bathyarchaeota and how they are related remain unknown, and more work is needed to verify this.

Fig. 5

Maximum Likelihood tree of RuBisCO sequences. The scale bar indicates the average number of amino acid substitutions per site. The anchor sequences were from Jaffe et al. [51]. Sequences were aligned using MUSCLE, columns with more than 95% gaps were trimmed using trimAL. The maximum likelihood tree was built using RAxML 8.0, the number of bootstraps was 1000, and the evolutionary model was LG+GAMMA

Nitrogen metabolism

Several studies have found genomic evidence that Bathyarchaeota are involved in the nitrogen cycle [13, 15, 54]. In the current study, more nitrogen-related genes, including ammonium transporter (amt), hydroxylamine reductase (hcp), respiratory nitrate reductase (narH), nitrite reductase (nir), nitrogenase iron protein (nifH), and mono/di/trimethylamine methyltransferase (mtmB/mtbB/mttB), were found in bathyarchaeotal MAGs, and different bathyarchaeotal subgroups harbored different ones (Fig. 2). Taken together with the different transcript activities of these genes in different subgroups (Fig. 4), bathyarchaeotal members may be capable of producing ammonium from diverse nitrogen compounds. Genes involved in urea production were also found in bathyarchaeotal MAGs (Fig. 2) with high transcriptional activities (Fig. 4), further suggesting that Bathyarchaeota may convert ammonium to urea. For life in the ocean, nitrogen is a limiting nutrient [55], and the current study suggests that Bathyarchaeota may utilize diverse primary nitrogen sources to produce urea (Fig. 4) and thus may act as a “transfer station” for nitrogen compounds in the global nitrogen cycle.

Moreover, two pathways for urea production, the arginase (rocF) and agmatinase (speB) pathways, were both found in Bathyarchaeota. Whereas speB was widespread in all bathyarchaeotal subgroups, rocF was present only in the MAGs of Bathy-6, -8 and -15 (Fig. 2), and had transcriptional activity only in Bathy-15 (Fig. 4). The rocF gene was previously known only from bacteria and eukaryotes [56]; however, according to the phylogenetic analysis in the current study, rocF genes were found not only in Bathyarchaeota but also in Woesearchaeota and Thorarchaeota, and they formed a distinct clade in the phylogenetic tree (Figure S6), indicating that archaeal arginase evolved independently from those of Bacteria and Eukaryotes.

Sulfur metabolism

Sulfate and sulfite were previously reported as important environmental factors shaping the distribution of bathyarchaeotal subgroups [18, 30, 57], and genomic evidence for dissimilatory sulfate and sulfite reduction via the genes sat-aprAB (sulfate adenylyltransferase-adenylylsulfate reductase) has also been reported [17, 58]. Both lines of evidence suggested that Bathyarchaeota could participate in the global sulfur cycle. In the current study, in contrast to previous studies, diverse genes related to assimilatory sulfur reduction via the genes cysND-cysC-cysH-cysI (sulfate adenylyltransferase-phosphoadenosine phosphosulfate reductase-sulfite reductase) were identified in the bathyarchaeotal genomes (Fig. 2). As with nitrogen metabolism, different subgroups of Bathyarchaeota harbored different parts of the sulfur-reducing metabolism: more than half of the genomes within Bathy-15 and -17 harbored the genes related to sulfate reduction (cysND, cysC, and cysH), whereas cysI was detected in only one Bathy-6 genome, and most genomes within Bathy-6 harbored the gene related to thiosulfate reduction (phs) (Fig. 2). The transcriptional activities of these genes also differed among subgroups (Fig. 4), suggesting that different subgroups of Bathyarchaeota may participate in different parts of the sulfur cycle. In addition, most members of Bathyarchaeota may be able to reduce S0 to sulfide with hydA (hydrogenase/sulfur reductase), consistent with previous reports of a high abundance of Bathyarchaeota in sulfur-rich habitats [12, 20, 59, 60]. All of these results indicate a role of Bathyarchaeota in the global sulfur cycle.

Distinct microoxic lifestyle of Bathy-6

Notably, genes related to oxygen-dependent pathways were found in bathyarchaeotal MAGs, including pyruvate oxidase (poxL) in Bathy-6 and -8, and superoxide dismutase (SOD) in Bathy-1, -6, and -15 (Fig. 2 and Figure S7). In particular, most MAGs of Bathy-6 did not harbor poxL and SOD genes, whereas six reference MAGs within Bathy-6 harbored both genes (Fig. 2), suggesting that some members of Bathy-6 may live aerobically. Further, the phylogenetic analysis of bathyarchaeotal MAGs indicated that the MAGs harboring genes for both cobalamin biosynthesis (more than half of the related genes) and oxygen-dependent pathways clustered together and formed a functionally distinctive lineage within Bathy-6 (Figs. 1 and 2). In addition, rhodopsin was also found in the MAGs within this lineage, suggesting that members of this lineage may be a source of vitamin B12 and prefer microoxic habitats with or without accessible light. This differs markedly from the anoxic lifestyle of the other bathyarchaeotal members, supporting the distinct niche preference of Bathy-6 reported in previous studies [22, 30] and suggesting versatile metabolic abilities and varied lifestyles within Bathy-6.

Conclusions

Previous genomic analyses have suggested that Bathyarchaeota is an important driver of the global carbon cycle. However, many potential metabolisms have been overlooked, so the importance of Bathyarchaeota in global biogeochemical cycles is likely underestimated. In this study, Bathyarchaeota was found for the first time to be potentially involved in rhodopsin and porphyrin biosynthesis, the CBB cycle, and some pathways related to the nitrogen and sulfur cycles. The potential biosynthetic pathways of rhodopsin and chlorophyll-like compounds suggest phototrophy in Bathyarchaeota, the potential biosynthesis of cobalamin indicates possible vitamin B12 production by some Bathyarchaeota, and the pathway for utilizing diverse nitrogen compounds to produce urea implies that Bathyarchaeota might be an important “transfer station” in the marine nitrogen cycle. Moreover, some members of Bathy-6 were found to have a light-sensing, vitamin B12-producing, and microoxic lifestyle, highlighting diverse metabolic abilities among or even within bathyarchaeotal subgroups. Considering that Bathyarchaeota is a widespread and highly abundant phylum in diverse environments, the new knowledge of bathyarchaeotal metabolisms from the current study further highlights the crucial role of Bathyarchaeota in global biogeochemical cycles.

Methods

Sample collection, DNA and RNA extraction, and sequencing

Mangrove wetlands often occur in subtropical coastal regions; they support plenty of plants, animals, meio/macro-fauna, and prokaryotes, and contribute up to 15% of all carbon accumulation in marine settings [61, 62]. Futian Nature Reserve (Shenzhen, China) and Mai Po Nature Reserve (Hong Kong, China) are located on the north and south sides of Shenzhen Bay in southern China, respectively, and their mangrove forests join at the estuarine mouth of the Shenzhen River (Figure S1). As described in recent studies [13, 63], sediment cores were collected from the mangrove and mudflat in both reserves using columnar samplers (Figure S1). Eight samples were selected and kept in an icebox before being taken to the lab. Samples for RNA extraction were preserved in RNAlater (Ambion, China). For each sample, 10 g of sediment was used for DNA and RNA extraction with the PowerSoil DNA Isolation Kit and the RNA PowerSoil Total RNA Isolation Kit (QIAGEN, Germany), respectively. For RNA samples, the Ribo-Zero rRNA removal kit (Illumina, USA) was used to remove rRNA, and reverse transcription of the remaining RNA was conducted using the SuperScript III First Strand Synthesis System (Invitrogen, USA). Subsequently, DNA and cDNA were sequenced on an Illumina HiSeq 4000 (USA) in PE150 mode by BerryGenomics (China).

Metagenome assembly, genome binning, and gene annotation

Raw metagenomic reads were dereplicated (100% identity over 100% length) and trimmed using sickle [64]. The remaining reads of each sample were de novo assembled using IDBA-UD [65] with the parameters -mink 65, -maxk 145, and -steps 10. The binning of scaffolds was conducted using MetaBAT [66] with 12 sets of parameters. The 12 results were then analyzed using Das Tool [67] to obtain the optimized genomic bins. To improve the quality of the bins, the raw reads of all samples were mapped to the scaffolds of the bathyarchaeotal bins and reference genomes using BWA [68], and all mapped reads were reassembled and rebinned as above. Finally, the genomic bins were decontaminated based on the contig-cluster tree using anvi'o v5 (http://merenlab.org/software/#anvio). The completeness and contamination of the MAGs were calculated using CheckM [69]. The taxonomic assignment of the MAGs was conducted with the GTDB-Tk package [70] to ensure that they belong to Bathyarchaeota (Table S5), and subgroup assignment was performed by building phylogenetic trees (see “Phylogenetic analyses and average nucleotide identity” section).
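
The dereplication criterion above (100% identity over 100% length) amounts to removing exact duplicate reads. The following is a minimal sketch of that step only, not the tool used in this study, which is not named in the text. The function name dereplicate_reads and the (name, sequence) tuple format are illustrative assumptions, and paired-end handling and reverse-complement matches are deliberately ignored.

```python
def dereplicate_reads(reads):
    """Keep the first occurrence of every exact read sequence.

    `reads` is an iterable of (name, sequence) tuples (hypothetical format).
    Two reads count as duplicates only if their sequences are identical over
    their full length, i.e. 100% identity over 100% length.
    Assumption: comparison is case-insensitive; reverse complements and
    read pairing are not considered in this sketch.
    """
    seen = set()
    unique = []
    for name, seq in reads:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


if __name__ == "__main__":
    example = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "ACGA")]
    for name, seq in dereplicate_reads(example):
        print(name, seq)
```

Keeping the first occurrence preserves the input order, which makes the result reproducible for a given read file.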

16S rRNA genes were predicted and taxonomically assigned by BLASTn against the SILVA NR99 database (v132) [71]. Genes were called using Prodigal with the parameter -p meta [72]. Genes were annotated using the KEGG Automatic Annotation Server [69] and by BLASTp against the NR database retrieved in December 2017 (e value < 1e−5). To further confirm the annotation of the marker genes related to the Calvin-Benson-Bassham (CBB) cycle, urea cycle, light sensing, porphyrin biosynthesis, and microoxic lifestyle, amino acid sequences of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), phosphoribulokinase (PRK), arginase/agmatinase, rhodopsin, chelatase, and superoxide dismutase (SOD) were downloaded from the UniProt database (accessed July 2019) [73] to build local databases, and the amino acid sequences called from bathyarchaeotal MAGs were searched by BLASTp against these local databases (e value < 1e−5). Finally, phylogenetic trees were built to confirm the annotation of the genes. Details of the related gene annotation are shown in Table S4.
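
The e-value cutoff described above can be illustrated with a short sketch that keeps, for each query protein, the best-scoring hit below the threshold. It assumes BLAST tabular output with the standard 12 columns of -outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the output format actually used in this study is not stated, and the function name best_hits and the choice of bit score as tie-breaker are illustrative assumptions.

```python
def best_hits(blast_tab_lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for the best hit per query.

    Assumption: lines follow the standard 12-column BLAST tabular layout
    (-outfmt 6), where column 11 is the e-value and column 12 the bit score.
    Hits with e-value >= max_evalue are discarded (the text uses e < 1e-5).
    """
    best = {}
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        # Keep the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    demo = [
        "geneA\tP1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t200",
        "geneA\tP2\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t90",
        "geneB\tP3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t30",
    ]
    print(best_hits(demo))
```

A hit that passes this filter is only a candidate annotation; as stated above, the marker genes were additionally checked by phylogenetic placement.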

Metagenomic and transcriptomic abundance of sequences

The gene abundance of each MAG was determined by mapping metagenomic reads to the sequences using BWA with the default settings [68], and the relative abundances were calculated using the RPKM method [74]. The transcript abundance of predicted genes was calculated by mapping non-rRNA transcriptomic reads to the gene sequences as above, and the relative abundance of each gene was normalized by the abundance of ribosomal protein S3, because its transcripts could be detected in all bathyarchaeotal MAGs as a single-copy conserved gene. Details of the transcript levels of the predicted genes are shown in Table S6.
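
The two normalization steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in this study: RPKM is taken in its usual form (reads per kilobase of sequence per million reads), the choice of total_reads as the denominator (all reads in the sample versus only mapped reads) is not specified in the text, and the function names and the gene label "rpS3" are hypothetical.

```python
def rpkm(mapped_reads, length_bp, total_reads):
    """Reads per kilobase per million reads.

    RPKM = mapped_reads / (length_bp / 1e3) / (total_reads / 1e6)
         = mapped_reads * 1e9 / (length_bp * total_reads)
    Assumption: `total_reads` is the library size chosen by the analyst.
    """
    if length_bp <= 0 or total_reads <= 0:
        raise ValueError("length_bp and total_reads must be positive")
    return mapped_reads * 1e9 / (length_bp * total_reads)


def normalize_to_reference(gene_rpkm, reference_gene="rpS3"):
    """Express each gene's transcript level relative to a reference gene.

    `gene_rpkm` maps gene names to RPKM values within one MAG.
    Returns None for every gene if the reference has no detectable
    transcripts, because the ratio is undefined in that case.
    """
    ref = gene_rpkm.get(reference_gene, 0.0)
    if ref == 0:
        return {gene: None for gene in gene_rpkm}
    return {gene: value / ref for gene, value in gene_rpkm.items()}


if __name__ == "__main__":
    levels = {
        "rpS3": rpkm(500, 1000, 10_000_000),
        "rbcL": rpkm(300, 1200, 10_000_000),
    }
    print(normalize_to_reference(levels))
```

Dividing by a single-copy housekeeping gene from the same MAG makes transcript levels comparable between MAGs whose overall activity differs, which is the stated reason for choosing ribosomal protein S3.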

Phylogenetic analyses and average nucleotide identity

The phylogenetic tree of the 16S rRNA gene was built with all 16S rRNA gene sequences from bathyarchaeotal MAGs and the reference sequences from Zhou et al. [13]. Phylogenetic analysis of genomes was conducted with 16 ribosomal protein data sets (ribosomal proteins L2, L3, L4, L5, L6, L14, L15, L16, L18, L22, L24, S3, S8, S10, S17, and S19) [75] predicted by CheckM [69]. The phylogenetic trees of the functional proteins were built with sequences from the MAGs and anchor sequences from Jaffe et al. [51] (RuBisCO and PRK), Pushkarev et al. [29] (rhodopsin), Novák et al. [76] (agmatinase and arginase), or the sequences of the local databases mentioned above (chelatase and SOD), respectively. All trees were constructed as follows: sequences were aligned independently using MUSCLE [77], and columns with more than 95% gaps were trimmed using trimAL [78]. Before tree building, the 16 ribosomal protein alignments were concatenated, and taxa with less than 50% of the alignment columns were removed. The maximum likelihood trees of the 16S rRNA gene, 16 ribosomal proteins, and functional proteins were built using RAxML 8.0 [79] on the CIPRES Science Gateway [80] with 1000 bootstraps, and the evolutionary models were GTRCAT (for nucleotides) and LG+GAMMA (for amino acids), respectively. The trees were then visualized on the iTOL web server [81].
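
The alignment-processing rules above (trim columns with more than 95% gaps, concatenate the per-protein alignments, drop taxa covering less than 50% of the columns) can be sketched in plain Python. This is only an illustration of the logic; in this study the trimming was done with trimAL, and the function names, the dict-of-strings alignment format, and the use of "-" as the only gap character are assumptions.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.95):
    """Remove columns whose gap fraction exceeds `max_gap_fraction`.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    if not seqs:
        return {}
    keep = [
        i for i in range(len(seqs[0]))
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def concatenate_alignments(alignments):
    """Join per-protein alignments; taxa missing a protein are padded with gaps."""
    taxa = sorted(set().union(*[aln.keys() for aln in alignments]))
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def drop_sparse_taxa(alignment, min_fraction=0.5):
    """Keep taxa with non-gap characters in at least `min_fraction` of columns."""
    return {
        taxon: seq for taxon, seq in alignment.items()
        if seq and (len(seq) - seq.count("-")) / len(seq) >= min_fraction
    }


if __name__ == "__main__":
    l2 = trim_gappy_columns({"MAG1": "A-C", "MAG2": "A-G"})
    s3 = trim_gappy_columns({"MAG1": "TTTT"})
    supermatrix = concatenate_alignments([l2, s3])
    print(drop_sparse_taxa(supermatrix))
```

In the toy example, MAG2 lacks the second protein and therefore covers only two of six concatenated columns, so it is removed by the 50% rule.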

The pairwise average nucleotide identity between bathyarchaeotal genomes was calculated and plotted using the get_homologues package [82] with default parameters.