Diversities and potential biogeochemical impacts of mangrove soil viruses
Mangroves are ecologically and economically important forests of the tropics. As one of the most carbon-rich biomes, mangroves account for 11% of the total input of terrestrial carbon into oceans. Although viruses are considered to significantly influence local and global biogeochemical cycles, little information is available regarding the community structure, genetic diversity and ecological roles of viruses in mangrove ecosystems.
Here, we utilised viral metagenomics sequencing and virome-specific bioinformatics tools to study viral communities in six mangrove soil samples collected from different mangrove habitats in Southern China.
Mangrove soil viruses were found to be largely uncharacterised. Phylogenetic analyses of the major viral groups demonstrated extensive diversity and previously unknown viral clades and suggested that global mangrove viral communities possibly comprise evolutionarily close genotypes. Comparative analysis of viral genotypes revealed that mangrove soil viromes are mainly affected by marine waters, with less influence coming from freshwaters. Notably, we identified abundant auxiliary carbohydrate-active enzyme (CAZyme) genes from mangrove viruses, most of which participate in biolysis of complex polysaccharides, which are abundant in mangrove soils and organism debris. Host prediction results showed that viral CAZyme genes are diverse and probably widespread in mangrove soil phages infecting diverse bacteria of different phyla.
Our results showed that mangrove viruses are diverse and probably directly manipulate carbon cycling by participating in biomass recycling of complex polysaccharides, providing knowledge essential for revealing the ecological roles of viruses in mangrove ecosystems.
Keywords: Mangrove soil, Viruses, Viromes, Carbon cycling, Auxiliary metabolic genes
AMG: Auxiliary metabolic gene
TOC: Total organic carbon
PCR: Polymerase chain reaction
MDA: Multiple displacement amplification
NCBI: National Center for Biotechnology Information
ORF: Open reading frame
NCBI NR database: NCBI non-redundant protein database
TerL: Terminase large subunit
COG: Clusters of orthologous groups
eggNOG: Evolutionary genealogy of genes: Non-supervised Orthologous Groups
Viruses are the most abundant biological entities on earth; they are present in virtually all ecosystems [1, 2]. By lysing their hosts, viruses control host abundance and affect the structure of host communities. Viruses also influence the diversity and evolution of their hosts through horizontal gene transfer, selection for resistance and manipulation of bacterial metabolism [4, 5, 6, 7]. Importantly, viruses affect local and global biogeochemical cycles by releasing substantial amounts of organic carbon and nutrients from their hosts, and they assist microbes in driving biogeochemical cycles through auxiliary metabolic genes (AMGs) [8, 9, 10, 11].
The presence of AMGs in viruses has been described previously; AMGs are presumed to augment the metabolism of virus-infected hosts and facilitate the production of new viruses [4, 12]. AMGs have been most extensively explored in marine cyanophages and include genes involved in photosynthesis, carbon turnover, phosphate uptake and stress response [13, 14, 15, 16]. Cultivation-independent metagenomic analysis of viral communities has identified additional AMGs that are involved in motility, central carbon metabolism, photosystem I, energy metabolism, iron–sulphur clusters, anti-oxidation and sulphur and nitrogen cycling [10, 17, 18, 19, 20, 21, 22]. Interestingly, a recent analysis of Pacific Ocean Virome data identified niche-specialised AMGs that contribute to depth-stratified host adaptations. Given that microbes drive global biogeochemical cycles and that a large fraction of microbes is infected by viruses at any given time, virus-encoded AMGs are likely to play important roles in global biogeochemistry and microbial metabolic evolution.
Mangroves are the only woody halophytes that live in salt water along the world’s subtropical and tropical coastlines. Mangrove forests are among the most productive and ecologically important ecosystems on earth. Their rates of primary production equal those of tropical humid evergreen forests and coral reefs. As a globally relevant component of the carbon cycle, mangroves sequester approximately 24 million metric tons of carbon each year [25, 26]. Most mangrove carbon is stored in soil and in sizable belowground pools of dead roots, aiding in the conservation and recycling of nutrients beneath forests. Although mangroves cover only 0.5% of the earth’s coastal area, they account for 10–15% of the coastal sediment carbon storage and 10–11% of the total input of terrestrial carbon into oceans. The disproportionate contribution of mangroves to carbon sequestration is now perceived as an important means to counterbalance greenhouse gas emissions.
Despite the ecological importance of mangrove ecosystems, our knowledge of mangrove biodiversity is notably limited. Previous reports mainly investigated the biodiversity of mangrove fauna, flora and bacterial communities [29, 30, 31]. In particular, little information is available about viral communities and their roles in mangrove soil ecosystems [32, 33]. In view of the importance of viruses in structuring and regulating host communities and mediating the biogeochemical cycling of elements, exploring viral communities in mangrove ecosystems is essential. Additionally, the intermittent flooding by sea water and the resulting sharp transitions in mangrove environments may lead to genetic and functional diversity of bacterial and viral communities in mangrove soils that differs substantially from that of other systems. Therefore, in this study, we utilised high-depth sequencing and virome-specific bioinformatics tools to explore viral communities and their possible roles in mangrove ecosystems.
Overview of mangrove soil viromes
Morphological and taxonomic diversities of mangrove soil viral communities
Examination of purified virus particles via TEM revealed four major morphological types of mangrove soil viruses, that is, non-tailed spherical viruses and three types of tailed viruses (myoviruses with contractile tails, siphoviruses with long and non-contractile tails and podoviruses with short tails) (Additional file 1: Figure S2).
A total of 2082 viral species were identified from mangrove soil viromes. Comparative analysis showed that the mangrove soils had similar viral composition profiles, as 35.5% of viral species were shared by all six viromes and 88% of viral species were shared by at least two viromes (Additional file 1: Figure S3). However, we also identified some environment-specific viral species. For example, five environmental halophages and nineteen mycobacteriophages were unique to bay viromes; they were found in all bay viromes but not in the river and port viromes. Several circo-like viruses (e.g., cyclovirus and avian orthoreovirus) and enterobacteria phages were unique to the river virome, whereas no unique viral species was identified in the port virome.
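The sharing statistics above can be illustrated with a minimal sketch. It assumes that each virome has already been reduced to a set of species labels; the function name `sharing_profile` and the toy sample and species names are hypothetical and are not taken from the study.

```python
from collections import Counter


def sharing_profile(viromes):
    """Return (fraction of species found in every virome,
    fraction of species found in at least two viromes)."""
    # Count in how many viromes each species occurs.
    counts = Counter(
        species for members in viromes.values() for species in set(members)
    )
    total = len(counts)
    if total == 0:
        return 0.0, 0.0
    n_viromes = len(viromes)
    in_all = sum(1 for c in counts.values() if c == n_viromes)
    in_two_plus = sum(1 for c in counts.values() if c >= 2)
    return in_all / total, in_two_plus / total


# Toy input: placeholder sample and species names, for illustration only.
toy_viromes = {
    "bay_1": {"v1", "v2", "v3"},
    "bay_2": {"v1", "v2"},
    "river": {"v1", "v4"},
}
core_fraction, shared_fraction = sharing_profile(toy_viromes)
print(core_fraction, shared_fraction)
```

In this toy input, one of four species occurs in every virome and two of four occur in at least two viromes. Species unique to one habitat would be those whose count equals the number of viromes from that habitat and which are absent elsewhere.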
Genetic diversity of mangrove soil viral communities
To assess the diversity and genetic distance among the viruses in the six mangrove soils, phylogenetic analyses were performed on the major viral families of the six viromes by using different marker genes. Virome contigs homologous to each marker (> 65% similarity and > 300 bp aligned nucleic acids) were selected and further clustered (> 95% similarity and 90% coverage) to generate unique viral contigs for phylogenetic analysis of the targeted viral groups.
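As a hedged illustration of the contig selection step, the sketch below filters BLAST tabular output (the standard 12-column `-outfmt 6` layout) using the two thresholds stated above. It assumes that percent identity (`pident`) stands in for "similarity" and that the aligned nucleotide length is derived from the query coordinates; neither choice is confirmed by the text. The function name `select_marker_contigs` and the toy rows are hypothetical.

```python
def select_marker_contigs(blast_lines, min_similarity=65.0, min_aligned_nt=300):
    """Return the IDs of contigs with at least one hit above both thresholds.

    Expects BLAST tabular rows: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    selected = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig = fields[0]
        pident = float(fields[2])  # assumption: identity used as similarity
        qstart, qend = int(fields[6]), int(fields[7])
        # Query coordinates are in nucleotides for translated searches;
        # abs() handles hits on the reverse strand.
        aligned_nt = abs(qend - qstart) + 1
        if pident > min_similarity and aligned_nt > min_aligned_nt:
            selected.add(contig)
    return selected


# Toy rows for illustration only; identifiers are placeholders.
toy_rows = [
    "contig_1\tRep_ref\t72.5\t120\t33\t0\t10\t369\t5\t124\t1e-40\t180",
    "contig_2\tRep_ref\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-30\t150",
    "contig_3\tRep_ref\t90.0\t50\t5\t0\t300\t151\t1\t50\t1e-20\t100",
]
marker_contigs = select_marker_contigs(toy_rows)
print(sorted(marker_contigs))
```

Only `contig_1` passes here: `contig_2` falls below the similarity threshold and `contig_3` aligns over only 150 nucleotides. The subsequent clustering at 95% similarity and 90% coverage is not reproduced in this sketch.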
Abundant auxiliary carbohydrate metabolic genes in mangrove viruses
Table 1 Annotated auxiliary CAZymes from mangrove soil viruses
Viral auxiliary CAZymes (CAZy family in parentheses)
Alpha-amylase (GH13, GH57)
Glucan 1,3-alpha-glucosidase (GH15)
Glucan 1,3-beta-glucosidase (GH55)
1,3-beta-glucanase (GH16, GH17)
Sugar isomerase (GH58)
Glucose-6-phosphate isomerase (GH99)
Polysaccharide deacetylase (CE4)
Pectate lyase (PL3)
Quinoprotein glucose dehydrogenase (AA12)
l-sorbosone dehydrogenase (AA12)
Glycerate 2-kinase (GT16)
Alpha,alpha-trehalose-phosphate synthase (GT20)
Alpha-glucan phosphorylase (GT35)
Glycogen synthase (GT5)
Table 2 Host prediction of CAZyme-containing viruses
Columns: Viral auxiliary CAZymes; Predicted hosts (phyla)
Identification of newly described viral clades
In contrast to our current understanding of marine viral communities, the soil virome and its function in terrestrial ecosystems remain relatively understudied. Among approximately 100 distinct viromes reported to date, only eight soil viromes have been described in the literature, and none of them was derived from mangrove soils, suggesting that viral diversity in soil ecosystems, especially in mangrove soils, is severely undersampled and underestimated [33, 38]. Indeed, our study showed that mangrove soil viruses are largely unidentified: a large proportion of mangrove soil virome sequences had poor taxonomic affiliation (Fig. 2a), and most of the predicted ORFs had no homologues in public databases (Fig. 5a). Such a high proportion of unknown viral sequences possibly reflects the specificity of viruses from mangrove soil ecosystems, which are scarcely represented in current databases [10, 32]. To the best of our knowledge, no genome of a mangrove soil-derived virus has been sequenced to date, highlighting the lack of knowledge and reference sequences for viruses of mangrove environments.
Phylogenetic analyses of the major viral groups (Circo-like viruses and Caudovirales) highlighted remarkable diversity and previously unknown viral clades. Notably, we identified two new mangrove clades that were clearly separated from known references and environmental metagenomic sequences in the Circo-like virus phylogenetic tree constructed from the Rep protein (Fig. 3). Thus far, the astounding diversity of Circo-like viruses has been uncovered by metagenomic studies [39, 40, 41, 42]. However, the exact evolutionary relationships among these viruses remain obscure. A previous study demonstrated that chimeric ssRNA and ssDNA viruses (CHIVs) blur the evolutionary borders between the major groups of eukaryotic ssDNA viruses by capturing the capsid protein gene from RNA viruses and replacing Rep genes with distant counterparts from diverse ssDNA viruses. Interestingly, the two new mangrove clades were distant from known references and CHIVs and were distinct from each other. As indicated in the phylogenetic tree, the mangrove clades are intermediate between Circoviridae, Nanoviridae and Geminiviridae and may provide additional clues to the exact evolutionary relationships among these viruses. Judging from the genetic distance to references and from the animal and plant species inhabiting mangroves, viruses of mangrove clade 1 most probably infect mangrove trees, whereas those of clade 2 probably infect mangrove animals, such as crabs, shrimps, fishes and birds. Phylogenetic trees drawn from TerL also highlighted the significant diversity and novelty of mangrove Caudovirales (Fig. 4). Although mangrove Caudovirales are widely distributed among the three tailed-phage families, most of the mangrove TerL sequences formed three novel clades with high internal diversity within the Siphoviridae and Podoviridae families, highlighting an important uncharacterised diversity of Caudovirales in mangrove soils.
Mangrove soil viruses may share a common genetic pool
Despite the diverse and far-from-reference characteristics of the mangrove virome sequences, the protein sequences of marker genes are relatively similar among the six mangrove soil viromes. In addition, all the newly identified mangrove clades of Circo-like viruses and Caudovirales contained sequences from nearly every mangrove sample (Figs. 3 and 4, respectively), regardless of the significant differences in environmental factors (Additional file 1: Table S1) and bacterial community structures (Additional file 1: Figure S6) between these samples. These mangrove viral communities are thus possibly composed of evolutionarily close viral genotypes and differ primarily in community composition. In contrast to the mixing and connectivity of aquatic systems, soil habitats are intrinsically heterogeneous and diverse. The spatial heterogeneity of soil structure and the resulting lack of connectivity between individual ‘island’ microbiomes within soil aggregates promote parallel microbial evolutionary trajectories. Such parallel evolutionary events are bound to increase local microbial diversity [43, 44]. However, mangrove soils are intermittently flooded with marine water. Consequently, the ‘island’ microbiomes can, to a certain extent, mix, which diminishes the effects of parallel evolution. Therefore, we speculate that the similar genetic features of mangrove soil viruses are probably due to the mixing and connectivity effects of marine tides; this assumption is also consistent with the earlier hypothesis that viral diversity can reach high levels on a local scale but is relatively limited globally [45, 46, 47].
Co-influences of marine and fresh water on mangrove soil viral community
Previous studies have shown that soil viromes are distinct from aquatic viromes [48, 49]. Consistently, we identified a number of typical soil viruses in mangrove soils. For example, nine phages infecting Rhizobium were widely distributed in the six mangrove soil viromes. Rhizobium are important for soil ecosystems, as they form endosymbiotic nitrogen-fixing associations with plant roots in soils. Notably, unlike other soil systems, mangrove soils occupy the interface of terrestrial and marine systems. In particular, mangrove ecosystems possess the unique feature of intermittent flooding with seawater. Mangroves also constantly receive fresh water from river outflow and sanitary wastewaters. Therefore, mangrove soil viromes are possibly shaped jointly by soil, marine and fresh water systems. Deeper inspection of viral genotypes revealed notable marine signatures. Phages known to infect typical marine bacteria were widely present in the mangrove soil samples. For example, four dsDNA Pelagibacter phages were identified at high relative abundances in the six mangrove soil viromes (the actual relative abundance of dsDNA viruses may be higher, as they are not over-amplified by MDA), including pelagiphage HTVC010P, which infects ‘Candidatus Pelagibacter ubique’ of the SAR11 clade, the most abundant bacterium in surface seawater around the world. In a more extensive metagenomic investigation including samples from coastal and open ocean areas, pelagiphage HTVC010P was proposed to represent one of the most abundant virus subfamilies in marine environments, with 38.8% of successfully assigned reads assigned to HTVC010P. A high relative abundance of Celeribacter phage P12053L was also found in the mangrove soil viromes. Celeribacter phage P12053L, a lytic dsDNA phage infecting bacteria of the Roseobacter clade, was isolated from seawater collected off the coast of the Yellow Sea, providing further evidence of the impact of marine water on mangrove soil viromes.
In general, relatively little influence of freshwater was observed in the mangrove soil viromes, which agrees with the general environmental characteristics of our sampling sites. The high salinity indicated that the sampling sites were more similar to typical marine environments than to freshwater environments. Nevertheless, numerous phages infecting enterobacteria and other mammalian pathogens were identified in the mangrove soil viromes. As with their pathogenic hosts, the occurrence of these non-marine phages in mangrove soil viromes probably results from freshwater input from the river or sanitary wastewaters; such a process is highly influenced by intense human activities. Collectively, our results showed that mangrove soil viromes are mainly affected by marine waters, with less influence coming from freshwater.
Mangrove soil viruses may directly manipulate mangrove carbon cycling
Thus far, most viral AMGs have been identified from marine environments, and limited information is available about viral AMGs in soils [32, 33]. As soil and ocean are disparate ecosystems with unique ecological drivers, soil viral AMGs may be distinct from those identified in marine viruses. However, to the best of our knowledge, only three viral AMGs (trzN, phoH and RNR) have been reported to date in soil ecosystems. trzN, a gene encoding a chlorohydrolase required for atrazine catabolism, was identified from phages in atrazine-contaminated soils. trzN possibly improves the capacity of viruses to produce more progeny under resource limitation (e.g., where atrazine may be an alternative or sole C and N source). The phoH gene (a phosphate regulation gene) and the RNR gene (encoding ribonucleotide reductase) were found in the virome of Namib Desert hypoliths. The prevalence of phoH in the hypolithic system suggested a significant function of phoH in phosphate acquisition, whereas the high abundance of the RNR gene may be advantageous for viruses in nutrient-limited environments.
Viral AMGs for carbon metabolism have been extensively investigated in marine environments; most of them are involved in central carbon metabolism and facilitate viral replication. The current paradigm from marine studies is that central carbon AMGs shift host microbial metabolism to mimic a state of starvation. In this model, virally encoded glycogen synthase disrupts host glycolysis and directs host metabolism away from amino acid biosynthesis and towards pathways favouring phage replication [22, 56]. Recently, carbon AMGs relevant to carbohydrate metabolism were identified in bovine rumen viromes; these AMGs include five glycosidic hydrolases (beta-glucosidase, alpha-glucosidase, alpha-amylase, maltooligosyltrehalose trehalohydrolase and endoglucanase). Although the role of virus-encoded glycosidic hydrolases in virus–host interactions remains unknown, it has been proposed that rumen virus-encoded glycosidic hydrolases augment the breakdown of complex carbohydrates to increase energy production and boost host metabolism during viral infection. Consistently, all five rumen virus-encoded glycosidic hydrolases (beta-glucosidase, alpha-glucosidase, alpha-amylase, trehalase and endoglucanase) were also identified in mangrove soil viruses in our study. Moreover, we identified further novel auxiliary carbohydrate metabolism genes in mangrove viruses, including glycoside hydrolases, glycosyl transferases, polysaccharide lyases and carbohydrate esterases (Table 1). This study is the first to report such viral AMGs in soil; most of them have never been reported in viruses before. Interestingly, most viral carbohydrate metabolic genes belong to CAZymes with glycoside hydrolase activities (Fig. 5b), indicating that mangrove soil viruses primarily participate in the decomposition of organic carbon.
Mangroves are among the most carbon-rich biomes, accounting for 11% of the total input of terrestrial carbon into oceans. In mangrove ecosystems, a large proportion of the organic carbon is stored as large pools in soils, dead plants and animals in the form of complex carbohydrates. Complex carbohydrates or polysaccharides, such as cellulose, xylan, pectin, starch, alginate, mannan and chitin, are major components of plant cell walls, crustacean shells and intercellular spaces and are highly difficult to degrade. Thus, the biolysis of complex polysaccharides in soils and organism debris is essential for mangrove biomass recycling and critical in local and global carbon cycles. The biodegradation of polysaccharides is a complex process that requires the participation of multiple enzymes [57, 58]. Significantly, mangrove soil viruses encode abundant CAZyme genes, including core hydrolysis enzymes (cellobiosidase, xylanase, chitinase, alpha-amylase, mannanase and endoglucanase) and auxiliary enzymes (polysaccharide deacetylase, pectinesterase and pectate lyase), that are essential for the degradation of various polysaccharides, indicating the broad capabilities of mangrove soil viruses in the biolysis of complex polysaccharides (Table 1). Molecular evolutionary studies of virus-encoded photosynthesis AMGs showed that viruses obtain and maintain AMGs from within their known host ranges for their own fitness advantage. Here, phylogenetic analysis showed that viral CAZyme genes are diverse and probably derive from phages infecting distinct hosts of different phyla, suggesting that auxiliary carbohydrate metabolic genes may be widespread in mangrove soil viruses (Table 2).
In marine environments, several of the most important and well-described viral AMGs are photosystem I and II genes that have been acquired by phages infecting photosynthetic marine cyanobacteria [59, 60]. The expression of these genes during infection boosts the photosynthetic output of infected cells and plays important roles in marine nutrient and biogeochemical cycles. In contrast to marine microbes, most soil microbes are heterotrophic and acquire carbon and energy by decomposing complex organic matter [32, 33]. Therefore, viral carbohydrate AMGs possibly help hosts decompose and utilise complex carbohydrates and thus boost viral replication in soil ecosystems, similar to the function of photosynthetic AMGs in marine environments.
Methodological considerations and limitations
The development of metagenomic approaches provides a powerful tool for the cultivation-independent investigation of viral communities across ecosystems. However, the procedure from sample collection and sequencing preparation to bioinformatics analysis of the virome is experimentally and computationally challenging, which may introduce biases when exploring viral communities.
Conventional approaches for obtaining sufficient viral DNA for metagenomic analyses include MDA and linker amplification (LA). However, both methods seriously distort the proportions of dsDNA and ssDNA viruses recovered. The LA method is based on the ligation of a dsDNA linker to sheared DNA. As ligation occurs between dsDNA fragments, ssDNA viruses are inefficiently recovered by this method. MDA is biased towards over-amplifying circular ssDNA, which renders the resulting metagenomes non-quantitative. Recently, a new virome library preparation protocol has been developed that incorporates an adaptase step prior to linker ligation and amplification, making the process efficient for both ssDNA and dsDNA templates. Fortunately, the major findings of our study should be largely unaffected by the known bias of the MDA approach, because the most significant parts of our results (i.e., phylogenetic analysis, AMG characterisation and virus–host linkage) rely on non-quantitative descriptions of viral genes. In addition, given that MDA over-amplifies ssDNA viruses, the actual relative abundance of dsDNA viruses may be even higher than that observed in the viromes, which makes the finding that mangrove soil viromes are co-influenced by marine and fresh water even more robust.
A major concern in investigating viral AMGs is excluding contamination by cellular DNA in viromes. To this end, we made every effort to remove cellular DNA during sample preparation. However, even the most thorough laboratory processing can still yield contaminated viromes. Thus, a second, in silico filtration is required to identify and remove any non-viral signal. Currently, VirSorter is the most frequently utilised informatics tool for ensuring that only viral genomic data are included within a virome. As a virome decontaminator, the VirSorter pipeline features a precision higher than 98.99%, but its performance in detecting short viral contigs (< 3 kb) is extremely limited, as only about 13% of bona fide viral contigs are recovered when the contig length is 1–2 kb. As the average contig length of our assembled viromes is relatively short (616–687 bp, Additional file 1: Table S2), VirSorter may not be a good choice for detecting viral signals. Instead, an alternative filtering step was applied in which only reads and contigs identified as viral or unknown were retained. This method removes as much contamination as possible but may also lose a considerable portion of the viral signal, as certain viral reads and contigs are similar to microbial DNA or contain portions of microbial DNA. Genomic linkage analysis was further utilised to verify that the predicted auxiliary CAZymes are bona fide viral sequences (Fig. 6). Collectively, these experimental and informatics methods help ensure that virtually no known cellular signal was considered in our analyses.
In summary, we systematically explored the viral communities in mangrove soil for the first time. The results revealed extensive diversity and previously unknown viral clades in mangrove soils. Comparative analysis of viral genotypes revealed that mangrove soil viromes are mainly affected by marine waters, with less influence coming from freshwaters. Remarkably, we identified abundant auxiliary CAZyme genes from mangrove soil viruses. Given the global relevance of mangroves in the carbon cycle and the probable wide distribution of carbohydrate metabolic genes in mangrove viruses, the role of viral carbohydrate AMGs in the global carbon cycle may be highly significant. Collectively, our results showed that mangrove soil viruses may directly manipulate carbon cycling through the biolysis of complex polysaccharides, implying that environmental viruses play more important and diverse roles than previously suspected.
Sampling site descriptions and sample collection
Samples were collected from mangrove soils in Guangxi and Hainan Provinces, China (Fig. 1 and Additional file 1: Table S1). The Guangxi mangrove sampling sites (GX_15_bay, GX_16_bay_1 and GX_16_bay_2) are located in Beibu Bay near Beihai City. These sites feature typical regular diurnal tide patterns and a subtropical oceanic monsoon climate. A contiguous and well-preserved mangrove ecosystem, in which Aegiceras corniculatum and Kandelia candel are the dominant plant communities, is found in this area owing to minimal human activity. The Hainan mangrove sampling sites (HN_17_bay, HN_17_river and HN_17_port) are located in Sanya City, north of the South China Sea, and exhibit irregular diurnal tide patterns and a tropical oceanic monsoon climate. The HN_17_river sample was collected from mangrove soil on the bank of the Sanya River, which flows through the urban district of Sanya City. The dominant plant communities there include Avicennia marina and Rhizophora apiculata. The HN_17_bay sample was collected from mangrove soil in Yalong Bay, the best-preserved mangrove in Sanya City, where Ceriops tagal, Rhizophora stylosa and Lumnitzera racemosa dominate the ecosystem. The HN_17_port sample was collected from the mangrove soil of Tielu Port of Sanya City. The main plant communities there comprise R. apiculata, A. marina and L. racemosa.
Mangrove soil samples were collected in different mangrove habitats between 2015 and 2017 (Additional file 1: Table S1). Three soil sample replicates were collected from each site at 5–10 cm depth. To prevent human contamination, the outer 1–1.5 cm of each soil core was carefully removed from all sides. Then, the soil sample replicates were combined and divided into two fractions. One fraction was frozen on dry ice and stored at −80 °C in the laboratory for about 1–6 months until metagenomic analysis. The other fraction was stored at 4 °C for further physicochemical analyses. Temperature, salinity and pH were measured directly in mangrove soils at a depth of 5–10 cm with sensors. Nutrient concentrations, including total nitrite, ammonia nitrogen, total carbon and TOC, were determined at the Qingdao Science Standard Testing platform (Qingdao, China) by using standard methods.
Viruses were purified from soil samples according to the methods described by Williamson et al., with specific modifications. Briefly, 30 g of soil per sample was first thawed on ice for 6–12 h. Then, the thawed soil sample was suspended in 100 mL of SM solution (100 mM NaCl, 8 mM MgSO4·7H2O and 50 mM Tris/HCl; pH 7.5), shaken for 30 min at room temperature and centrifuged at 3000×g for 15 min at room temperature to precipitate soil particles. The supernatant was harvested and filtered sequentially through 0.45 and 0.22 μm filters. Then, virus particles in the filtrate were enriched using 100 kDa centrifugal ultrafiltration tubes by centrifugation at 4000×g until the final sample volume was less than 1 mL. Virus samples were then examined for purity and morphology under a transmission electron microscope (TEM, JEOL 100 CXII).
Viral DNA extraction and virome sequencing
Prior to viral DNA extraction, virus concentrates were treated with DNase I (Sangon Biotech, China) at 37 °C for 2 h to remove free external DNA fragments. The absence of free and contaminating bacterial DNA was validated via PCR amplification of the bacterial 16S rRNA gene with universal primers 27F/1492R. Encapsidated viral DNA was extracted as described by Thurber et al. To obtain the quantity of viral DNA needed for high-throughput sequencing, MDA was employed to amplify the total viral DNA in triplicate by using the illustra Ready-To-Go GenomiPhi V3 DNA Amplification kit (GE, USA); the resulting products were pooled for sequencing. For Illumina sequencing, viral DNA was first fragmented to approximately 300 bp by an Ultrasonic Cell Disruptor (M220, Covaris) and used as a template to create a metagenome library using the TruSeq DNA Sample Prep Kit (Illumina, San Diego, CA, USA). The prepared DNA library was then sequenced on an Illumina HiSeq 4000 platform at Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China) to generate 150 bp paired-end reads. The viromes are available in the NCBI Sequence Read Archive database with accession numbers SRX3777329, SRX3777330, SRX3777331, SRX3777332, SRX3777333 and SRX3777334 for the HN_17_bay, HN_17_port, GX_15_bay, GX_16_bay_1, GX_16_bay_2 and HN_17_river samples, respectively.
After high-throughput sequencing, read ends were first trimmed with SeqPrep (https://github.com/jstjohn/SeqPrep) to improve read quality. Then, low-quality reads were removed with Sickle (https://github.com/najoshi/sickle) to obtain clean reads. Read filtering was performed to remove reads associated with contamination by cellular genomic fragments. Clean reads were first compared using BLAST against the NCBI NR database, IMG/VR and the RefseqVirus database (thresholds of 10⁻³ on E-value and 50 on bit score) to identify viral, bacterial, archaeal or eukaryotic sequences (Additional file 1: Table S3). Reads annotated as ‘viral sequences’ or ‘unknown’ (without significant similarity to any database sequence) were selected as the filtered read set for further analysis. Each sample was then independently assembled from the filtered clean reads using SOAPdenovo. Contigs were further filtered by removing non-viral signals (thresholds of 10⁻³ on E-value and 50 on bit score), and ORFs were predicted from the filtered contigs with MetaGene. For viral taxonomic affiliation, each predicted gene was compared using Blastp against the NCBI NR database and the RefseqVirus database with thresholds of 10⁻³ on E-value and 50 on bit score.
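The read filtering rule described above can be sketched as follows. This is only an illustration under stated assumptions: each read is taken to have at most one best hit, already labelled with a broad category; the thresholds are treated as inclusive; and the names `is_significant`, `filter_reads` and the toy records are hypothetical rather than part of the published pipeline.

```python
def is_significant(evalue, bitscore, max_evalue=1e-3, min_bitscore=50.0):
    """A hit counts only if it passes both thresholds (assumed inclusive)."""
    return evalue <= max_evalue and bitscore >= min_bitscore


def filter_reads(read_ids, best_hits):
    """Keep reads whose best hit is viral, or that have no significant hit."""
    kept = []
    for read_id in read_ids:
        hit = best_hits.get(read_id)
        if hit is None:
            kept.append(read_id)  # no hit at all: 'unknown'
            continue
        category, evalue, bitscore = hit
        if not is_significant(evalue, bitscore):
            kept.append(read_id)  # hit too weak to trust: 'unknown'
        elif category == "viral":
            kept.append(read_id)  # significant viral hit
        # significant bacterial, archaeal or eukaryotic hits are dropped
    return kept


# Toy data for illustration only: (category, E-value, bit score) per read.
toy_hits = {
    "read_1": ("viral", 1e-10, 80.0),
    "read_2": ("bacterial", 1e-12, 95.0),
    "read_4": ("bacterial", 0.5, 30.0),
}
filtered_reads = filter_reads(["read_1", "read_2", "read_3", "read_4"], toy_hits)
print(filtered_reads)
```

Here `read_2` is discarded as a confident bacterial match, whereas `read_4` is retained because its only hit is not significant. This mirrors the trade-off noted in the text: the rule is conservative towards cellular signal but can also discard viral reads that resemble microbial DNA.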
Phylogenetic analysis of mangrove viruses
Phylogenetic trees of mangrove viruses were generated based on viral group-specific markers, i.e., the replication protein Rep for Circo-like viruses and TerL for Caudovirales [42, 69]. The assembled virome contigs were first compared against the reference marker sequences with blastx, and only contigs with > 65% similarity and > 300 bp of aligned nucleotides were selected for phylogenetic analysis. Owing to the large number of aligned virome sequences, the selected sequences were further clustered with CD-Hit at thresholds of 95% similarity and 90% coverage to obtain representative sequences. All target contigs were translated into amino acid sequences and aligned using MUSCLE. Then, gaps and ambiguously aligned positions were deleted. After alignment, the phylogenetic tree was constructed with the maximum-likelihood method by using MEGA with 100 bootstrap replicates. The trees used in the figures were manually edited using iTOL version 4.
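The gap deletion step can be illustrated with a short sketch of complete deletion, in which any alignment column containing a gap in at least one sequence is dropped. This is an assumption about how gaps were handled; detecting ambiguously aligned positions requires dedicated tools and is not attempted here. The function name `strip_gap_columns` and the toy alignment are hypothetical.

```python
def strip_gap_columns(alignment, gap_chars="-."):
    """Remove every column that contains a gap in any aligned sequence."""
    sequences = list(alignment.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns that are gap-free in every sequence.
    keep = [
        i for i in range(length)
        if all(seq[i] not in gap_chars for seq in sequences)
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy protein alignment for illustration only.
toy_alignment = {"seq_a": "MK-LV", "seq_b": "MKTLV", "seq_c": "M-TLV"}
trimmed = strip_gap_columns(toy_alignment)
print(trimmed)
```

In the toy alignment, the second and third columns each contain a gap, so every sequence is reduced to the three gap-free columns. Complete deletion is simple but can remove many informative sites when a few sequences are short.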
Identification of auxiliary carbohydrate metabolic genes
COG functional annotation of virome proteins was determined by BLASTp comparison of predicted ORFs against the eggNOG database (http://eggnog.embl.de/) with a threshold of 10⁻⁵. ORFs affiliated with the COG functional class 'carbohydrate transport and metabolism' were further clustered with CD-Hit to generate unique carbohydrate metabolic ORFs. Subsequently, CAZymes among these viral ORFs were identified on the dbCAN web server based on CAZyme family-specific HMMs. ORFs related to carbohydrate metabolism and CAZymes were compared with the NCBI NR and Pfam databases to determine the best annotation and similarity for each ORF.
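The selection of carbohydrate-related ORFs can be sketched as follows. In the COG scheme, the class 'carbohydrate transport and metabolism' carries the one-letter code G. The input mapping from ORF ID to COG category letters is a hypothetical structure, assumed to have been parsed beforehand from the annotation results; the subsequent CD-Hit clustering is not reproduced here.

```python
CARBOHYDRATE_CLASS = "G"  # COG class: carbohydrate transport and metabolism


def carbohydrate_orfs(orf_to_categories):
    """orf_to_categories: {orf_id: string of COG category letters}.
    An ORF may carry several letters (e.g. 'GM'); it is kept if any is 'G'."""
    return sorted(
        orf_id
        for orf_id, categories in orf_to_categories.items()
        if CARBOHYDRATE_CLASS in categories
    )


# Hypothetical annotations (not from the study).
example_annotations = {
    "orf1": "G",    # carbohydrate transport and metabolism
    "orf2": "S",    # function unknown
    "orf3": "GM",   # multi-class assignment that includes G
    "orf4": "",     # no COG assignment
}

selected_orfs = carbohydrate_orfs(example_annotations)
```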
For contig map analysis, contigs containing CAZymes were retrieved, and ORFs were identified with MetaGene. The ORFs were then compared with NCBI and Pfam databases for functional and taxonomic annotation based on protein sequences.
Identification of putative hosts of AMG-containing viruses
Putative hosts of AMG-containing viruses were predicted based on AMG phylogenetic trees. Molecular evolutionary studies of virus-encoded AMGs have shown that viruses obtain and maintain AMGs from within their known host ranges. Therefore, AMG phylogeny serves as a powerful approach to predict putative hosts [10, 73]. Viral AMG sequences were compared with the NCBI NR database (BLASTp, thresholds of 50 for bit score and 10⁻³ for E-value) to retrieve relevant reference sequences. Protein sequences were aligned with ClustalW, and gaps and ambiguously aligned positions were deleted. The MEGA program was used to generate the phylogenetic tree according to the maximum-likelihood method.
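The deletion of gapped positions before tree building can be illustrated with a small Python sketch of complete deletion, in which every alignment column containing a gap in any sequence is removed. This is only an approximation of the step described: the text does not state how ambiguously aligned positions were identified, so the sketch additionally drops columns containing the unknown-residue symbol X as a stand-in, which is an assumption. The sequences are hypothetical.

```python
def strip_columns(alignment, drop_chars="-X"):
    """alignment: {name: aligned sequence}, all sequences of equal length.
    Returns a new dict in which every column containing any character
    from drop_chars (in any sequence) has been removed."""
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_columns = lengths.pop()
    keep = [
        i
        for i in range(n_columns)
        if all(seq[i] not in drop_chars for seq in alignment.values())
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment (not from the study).
example_alignment = {
    "viral_AMG": "MK-LTX",
    "host_A":    "MKALTA",
    "host_B":    "MRALSA",
}

trimmed = strip_columns(example_alignment)
```

In the toy alignment, the third column (a gap in the viral sequence) and the sixth column (an X in the viral sequence) are removed from all three sequences.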
Multiple alignment of polysaccharide deacetylases
Multiple alignment of viral and bacterial polysaccharide deacetylase proteins was performed using COBALT. Conserved motifs were identified by searching against the NCBI Conserved Domain Database.
This work was financially supported by China Ocean Mineral Resources R&D Association (DY135-B-04), the National Natural Science Foundation of China (41606144), the Natural Science Foundation of Fujian Province, China (2016 J05098), the Scientific Research Foundation of Third Institute of Oceanography, SOA (2015019) and Public Science and Technology Research Funds Projects of Ocean (201505026-2).
Availability of data and materials
Raw sequencing data of mangrove soil viromes in this study were deposited in the NCBI SRA database under the accession numbers SRX3777329, SRX3777330, SRX3777331, SRX3777332, SRX3777333 and SRX3777334 for HN_17_bay, HN_17_port, GX_15_bay, GX_16_bay_1, GX_16_bay_2 and HN_17_river samples, respectively. Virome contigs used for phylogenetic analysis were deposited in GenBank under the accession numbers MK527110-MK527169. The complete phage genome assembled from viromes was deposited in GenBank under the accession number MK557849.
MJ and RYZ designed the experiments. MJ and XG performed the experiments. MJ, WQ and BL analysed the data. MJ and RZ wrote the manuscript. All authors reviewed the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 17. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W. The Sorcerer II Global Ocean Sampling Expedition: expanding the universe of protein families. In: International Conference on Application of Information and Communication Technologies; 2007. p. 1–4.
- 31. Ricklefs RE, Latham RE, Ricklefs RE, Schluter D. Global patterns of diversity in mangrove floras. Species diversity in ecological communities: historical and geographical perspectives; 1993. p. 215–29.
- 45. Roux S, Enault F, Ravet V, Colombet J, Bettarel Y, Auguet JC, Bouvier T, Lucas‐Staat S, Vellet A, Prangishvili D, Forterre P, Debroas D, Sime-Ngando T. Analysis of metagenomic data reveals common features of halophilic viral communities across continents. Environmental Microbiology. 2016;18(3):889–903.
- 50. Young JM, Kuykendall LD, Martínez-Romero E, Kerr A, Sawada H. A revision of Rhizobium Frank 1889, with an emended description of the genus, and the inclusion of all species of Agrobacterium Conn 1942 and Allorhizobium undicola de Lajudie et al. 1998 as new combinations: Rhizobium radiobacter, R. rhizogenes, R. rubi. Int J Syst Evol Microbiol. 2001;51:89–103.
- 64. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk KC, Enge B. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014;24(7):1180–92.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.