Comparative genomic analysis of Parageobacillus thermoglucosidasius strains with distinct hydrogenogenic capacities
The facultatively anaerobic thermophile Parageobacillus thermoglucosidasius produces hydrogen gas (H2) by coupling CO oxidation to proton reduction in the water-gas shift (WGS) reaction via a carbon monoxide dehydrogenase–hydrogenase enzyme complex. Although little is known about the hydrogenogenic capacities of different strains of this species, these organisms offer a potentially viable process for the synthesis of this alternative energy source.
The WGS-driven H2 production capacities of four distinct P. thermoglucosidasius strains were determined by cultivation and gas analysis. Three strains (DSM 2542T, DSM 2543 and DSM 6285) were hydrogenogenic, while the fourth strain (DSM 21625) was not. Furthermore, in one strain (DSM 6285) H2 production commenced earlier in the cultivation than in the other hydrogenogenic strains. Comparative genomic analysis of the four strains identified extensive differences in the protein complement encoded on the genomes, some of which are postulated to contribute to the different hydrogenogenic capacities of the strains. In addition, polymorphisms and deletions in the CODH-NiFe hydrogenase loci may contribute towards this variable phenotype.
Disparities in the hydrogenogenic capacities of different P. thermoglucosidasius strains were identified, which may be correlated to variability in their global proteomes and genetic differences in their CODH-NiFe hydrogenase loci. The data from this study may contribute towards an improved understanding of WGS-catalysed hydrogenogenesis by P. thermoglucosidasius.
Keywords: Biohydrogen production, Parageobacillus thermoglucosidasius, Water-gas shift reaction, Comparative genomics, DSM 6285
Abbreviations
ANI: Average nucleotide identity
CODH: Carbon monoxide dehydrogenase
COGs: Conserved orthologous groups
GGDC: Genome-to-genome distance calculator
HUS: Hemicellulose utilization system
OD600: Optical density at 600 nm
SNPs: Single nucleotide polymorphisms
TFBSs: Transcription factor binding sites
Members of the genus Parageobacillus are Gram-positive, facultatively anaerobic thermophiles belonging to the family Bacillaceae and the phylum Firmicutes. They are readily isolated from a wide range of high-temperature environments including hot springs, deep oil wells and desert soils. The thermophilic nature of this genus has resulted in considerable interest in Parageobacillus as a source of a broad range of industrially relevant, thermostable enzymes, such as lipases, proteases and hemicellulases [3, 4, 5]. Furthermore, there has been increasing interest in the use of Parageobacillus spp. as whole cell biocatalysts in a broad range of biotechnological applications, such as the production of bioethanol, the biorefinement of linen fibres and the bioremediation of environmental pollutants [6, 7, 8].
The biotechnological value of Parageobacillus spp. can partly be attributed to the expansive metabolic capacities of members of this genus, which effectively utilize a broad range of substrates for growth, including complex polysaccharides and oligosaccharides such as hemicellulose, as well as hydrocarbons and aromatic compounds [3, 9, 10]. Furthermore, Parageobacillus spp. can grow anaerobically, where they produce lactate, formate, acetate, ethanol and succinate using mixed acid fermentation pathways. Recently, we showed that the type strain of Parageobacillus thermoglucosidasius is also able to produce hydrogen gas (H2) in the anaerobic phase following aerobic growth, concomitant with the consumption of carbon monoxide (CO). Genomic analysis linked this capacity to a genetic locus comprising three genes coding for a carbon monoxide dehydrogenase (CODH) and 12 genes coding for a NiFe group 4a hydrogenase. This CODH-NiFe hydrogenase complex catalyses the water-gas shift (WGS) reaction, where CO is oxidized by CODH with the resultant electrons being used for the reduction of protons by the NiFe hydrogenase, resulting in production of H2 (CO + H2O → CO2 + H2) [11, 12].
In recent decades, the need to reduce the use of conventional energy sources and to adopt so-called ‘green power’ has received increasing attention [13, 14]. Hydrogen gas has been extensively studied as an alternative energy source as it carries the highest energy per unit mass, can be stored easily and its combustion results in the release of water vapour, making it cost effective and environmentally ‘friendly’ [15, 16]. However, a significant hurdle for the use of H2 as an alternative energy carrier is the currently available production practice. H2 is largely produced by industrial means, including steam reformation of methane, coal gasification and electrolysis of water, all of which are costly and are often detrimental to the environment. There has thus been increasing interest in the development of biological H2 production processes. Bacteria using the WGS reaction show considerable promise in this regard, given that CO is a component of “syngas” (comprising primarily CO, CO2 and H2) resulting from a wide range of industrial processes, including the gasification of coal. However, the majority of bacteria using the WGS reaction are strictly anaerobic, implying that oxygen (O2) would first have to be removed from the gas mixture, at high cost. P. thermoglucosidasius would be a promising candidate for further exploration as it can grow aerobically and, once O2 has been consumed, can shift to the anaerobic WGS reaction [11, 12]. In the current study, the ability of four distinct P. thermoglucosidasius strains to produce H2 via the WGS reaction was analysed. Three of the four strains were hydrogenogenic and the hydrogenogenic strains showed differences in the time taken to start H2 production. Comparative genomic approaches were applied to identify the potential molecular basis for the variable hydrogenogenic capacities.
P. thermoglucosidasius strains vary in their ability to produce hydrogen
Only minor differences were observed in terms of the H2 produced and CO consumed after 84 h, for DSM 2542T (H2 produced: 2.470 ± 0.149 mmol; CO consumed: 2.280 ± 0.11 mmol), DSM 2543 (H2 produced: 2.389 ± 0.083 mmol; CO consumed: 2.512 ± 0.106 mmol) and DSM 6285 (H2 produced: 2.637 ± 0.058 mmol; CO consumed: 2.552 ± 0.058 mmol), with an average yield of 1.02 H2/CO (Additional file 1). There was, however, an observable difference in the time taken by the hydrogenogenic strains to start utilizing CO and producing H2. Whereas DSM 2542T and DSM 2543 initiated H2 production after ~ 36 h, H2 production by DSM 6285 commenced ~ 16 h after inoculation (i.e., the lag phase between growth phase and H2 production was substantially shorter for P. thermoglucosidasius DSM 6285). In order to further characterise the different hydrogenogenic capacities of the P. thermoglucosidasius strains, and the faster onset of H2 production by P. thermoglucosidasius DSM 6285 compared to the other two hydrogenogenic strains, the genomes of the four strains were sequenced and compared using in silico methodologies.
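The reported average yield can be checked from the per-strain means given above. The following is a minimal sketch, assuming the average was taken over the three per-strain H2/CO ratios (the text does not state how it was computed); the function name h2_per_co is illustrative only.

```python
# Mean H2 produced and CO consumed (mmol) after 84 h, as reported in the text.
reported = {
    "DSM 2542T": {"h2": 2.470, "co": 2.280},
    "DSM 2543": {"h2": 2.389, "co": 2.512},
    "DSM 6285": {"h2": 2.637, "co": 2.552},
}


def h2_per_co(h2_mmol, co_mmol):
    """Molar yield: mmol H2 produced per mmol CO consumed."""
    return h2_mmol / co_mmol


# Assumption: the average yield is the mean of the per-strain ratios.
yields = {strain: h2_per_co(v["h2"], v["co"]) for strain, v in reported.items()}
mean_yield = sum(yields.values()) / len(yields)

for strain, value in yields.items():
    print(f"{strain}: {value:.2f} H2/CO")
print(f"mean: {mean_yield:.2f} H2/CO")
```

A yield close to 1 is what the WGS stoichiometry (CO + H2O → CO2 + H2) predicts when CO is converted almost exclusively via this reaction.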
Comparative genomics reveals substantial genome diversification among the compared P. thermoglucosidasius strains
DSM 6285 harbours one plasmid while the other three strains have two plasmids. Between 4329 (DSM 2543) and 4433 (DSM 21625) proteins are encoded on the genomes. The genomic relatedness of the four strains was determined by calculating the digital DNA-DNA hybridization (GGDC) and OrthoANI values for each paired combination of strains. This showed that P. thermoglucosidasius DSM 2542T and DSM 2543, isolated from the same environmental source, were most closely related, while DSM 21625 was the most distinct strain on the basis of these two genomic values (Fig. 3). However, both the GGDC (> 70%) and ANI (> 95%) values exceed the thresholds that delineate distinct species, confirming that all four strains belong to the species P. thermoglucosidasius (Additional file 2).
Differences in the proteome may contribute to the variable H2 production capacities of the P. thermoglucosidasius strains
The core and accessory protein datasets of the four P. thermoglucosidasius strains were compared to assess whether the distinctive H2 production capacities might be correlated to differences in their protein complement.
A total of 383 protein families are unique to the non-hydrogenogenic strain (DSM 21625), while 112 protein families are restricted to the hydrogenogenic strains (DSM 2542T, DSM 2543 and DSM 6285) (Fig. 4; Additional file 3). Functional annotation and classification according to Conserved Orthologous Groups (COGs) showed that in both cases the datasets largely comprise proteins belonging to the COG functional category S (function unknown), with 73.63% (282 proteins) and 76.79% (86 proteins) of the proteins in the non-hydrogenogenic and hydrogenogenic datasets, respectively, belonging to this category (Additional file 3). Most of the remaining proteins unique to the non-hydrogenogenic P. thermoglucosidasius DSM 21625 are involved in carbohydrate transport and metabolism (G – 9.14%), DNA replication, recombination and repair (L – 5.22%) and transcription (K – 3.39%) (Additional file 3). The majority of proteins in COG category G are encoded by the hemicellulose utilization system (HUS) locus, which has previously been identified as a highly variable locus among members of the genera Geobacillus and Parageobacillus, encoding a broad range of enzymes and metabolic pathways for the degradation of distinct hemicellulose polymers. Proteins linked to the COG category L include phage primases, endonucleases and terminases, reflecting the large number of unique phage elements in this strain. Proteins that form part of an L-arabinose transporter (AraFGH), located within the HUS locus, as well as a branched-chain amino acid transporter (LivFGMHJ), were unique to the hydrogenogenic strains.
The shorter H2 production lag phase for P. thermoglucosidasius DSM 6285 suggests that this strain reaches the metabolic state suitable for the WGS reaction sooner than the other hydrogenogenic strains. Analysis of the unique protein family complement of this strain indicated that the majority of the 468 proteins not shared with DSM 2542T and DSM 2543 belong to the COG category S (function unknown – 76.50%). Considering the proteins in other COG categories, only 24 proteins are involved in metabolic functions, including carbohydrate (G; 5 proteins), amino acid (E; 8 proteins) and inorganic ion transport and metabolism (P; 7 proteins), secondary metabolite biosynthesis, transport and catabolism (Q; 2 proteins) and energy production and conversion (C; 3 proteins) (Additional file 3). Among these metabolic proteins, four are components of an inorganic ion ABC transporter (NCBI Acc. # DV713_01765–01780). The presence of conserved domains in DV713_01765 (CD08492: PBP2_NikA_DppA_OppA_like_15; E-value: 0e+00), DV713_01770 (TIGR02789: NikB; E-value: 4.52e-77), DV713_01775 (TIGR02790: NikC; E-value: 2.38e-67) and DV713_01780 (TIGR02770: NikD; E-value: 2.93e-79) suggests that this may represent a nickel transport system. Nickel is pivotal for the functioning of both anaerobic CODH and NiFe hydrogenases, forming part of the metallocenter of both these enzymes.
Also unique to this strain are three proteins involved in the biogenesis of cytochrome caa3 oxidase. Cytochrome caa3 oxidase is the major oxidase involved in the last stages of the respiratory electron transport chain in B. subtilis grown under aerobic conditions, transferring electrons from the cytochrome c in the respiratory chain to the terminal electron acceptor, oxygen [24, 25]. Deletion of the structural genes for cytochrome caa3 oxidase in B. subtilis showed that this enzyme is not essential for growth. The unique presence of orthologues of three proteins which are central to cytochrome c oxidase biosynthesis in P. thermoglucosidasius DSM 6285 may imply that this strain could more efficiently oxidise cytochrome c and reduce O2 to H2O, thereby reaching the critical oxygenic limits for functioning of the anaerobic CODH-hydrogenase enzymes faster than the other strains. However, comparison of the O2-consumption rates of the hydrogenogenic strains did not show any substantial difference in terms of the time taken until O2 reached its minimum. Differences at the gene level, particularly in the CODH-NiFe hydrogenase loci, may also contribute to the disparity in the hydrogenogenic capacities of the P. thermoglucosidasius strains.
Variation in the CODH-hydrogenase locus of hydrogenogenic and non-hydrogenogenic P. thermoglucosidasius strains
To determine whether mutations within the CODH-NiFe hydrogenase genes might be responsible for the difference observed in the hydrogenogenic capacities of the four P. thermoglucosidasius strains, the nucleotide sequences for each of the genes in the CODH-NiFe hydrogenase loci of the four strains were aligned and compared. In total, 72 Single Nucleotide Polymorphisms (SNPs) were identified across the 15 genes, with an average of 4.8 SNPs per gene. SNPs were interspersed across the genes rather than clustered together (Additional file 4). More SNPs were observed in the cooC (10 SNPs) and cooS (11 SNPs) genes, coding for the CODH maturation factor and CODH catalytic subunit, respectively, as well as phcA (8 SNPs), phcB (13 SNPs) and phcF (9 SNPs), which encode the NiFe group 4a hydrogenase component B, membrane subunit and large subunit, respectively. When comparing the different strains, 45 SNPs (62.5% of the total SNPs) were restricted to the non-hydrogenogenic P. thermoglucosidasius DSM 21625, with most of these occurring in the cooC (10 SNPs), cooS (9 SNPs) and phcF (8 SNPs) genes (Additional file 4). A further 14 SNPs were found in the genes of both DSM 21625 and DSM 6285, while 14 SNPs were found only in the hydrogenogenic P. thermoglucosidasius DSM 6285. When the proteins encoded by each of the genes were compared, it was observed that the SNPs resulted in only 29 non-synonymous mutations at the amino acid level (Additional file 4), the majority of which occurred in the proteins of DSM 21625 (19 mutations; 65.52% of the total non-synonymous mutations), with most occurring in CooC (6 mutations) and PhcF (4 mutations). Six distinct non-synonymous mutations were also observed in DSM 6285, which initiates H2 production more rapidly than the other two hydrogenogenic strains.
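The summary proportions in this paragraph follow directly from the stated counts. A minimal sketch of the arithmetic, using only counts given in the text (the constant and function names are illustrative):

```python
# Counts taken from the text for the CODH-NiFe hydrogenase locus.
TOTAL_SNPS = 72
GENES_IN_LOCUS = 15            # 3 CODH genes + 12 hydrogenase genes
SNPS_ONLY_IN_DSM21625 = 45
NONSYN_TOTAL = 29
NONSYN_IN_DSM21625 = 19


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


snps_per_gene = TOTAL_SNPS / GENES_IN_LOCUS
share_snps_dsm21625 = percent(SNPS_ONLY_IN_DSM21625, TOTAL_SNPS)
share_nonsyn_dsm21625 = percent(NONSYN_IN_DSM21625, NONSYN_TOTAL)

print(f"SNPs per gene: {snps_per_gene:.1f}")
print(f"SNPs restricted to DSM 21625: {share_snps_dsm21625:.1f}%")
print(f"Non-synonymous mutations in DSM 21625: {share_nonsyn_dsm21625:.2f}%")
```

With 19 of 29 non-synonymous mutations, the share for DSM 21625 evaluates to roughly 65.5%.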
Average amino acid identity values were calculated for the CODH-NiFe hydrogenase protein datasets. The three hydrogenogenic strains share an average amino acid identity of 99.87% across the 15 proteins. The proteins of the non-hydrogenogenic P. thermoglucosidasius DSM 21625 shared 99.50% average amino acid identity with those of the hydrogenogenic strains, indicating that this strain was the most divergent. The highest divergence was observed for CooC, where the DSM 21625 protein shared 97.64% average amino acid identity with the orthologous protein in the other three strains, across 254 amino acids.
Alignment of the entire locus using Mauve v2.3.1 revealed the presence of two deletions associated with the intergenic regions of the CODH-hydrogenase locus of DSM 21625, which are not observed in the loci of the three hydrogenogenic strains (Fig. 5). A 22 nucleotide deletion occurs in the intergenic region between cooC and cooS, 14 nucleotides downstream of the stop codon of cooC. The second deletion of 17 nucleotides occurs 115 nucleotides upstream of the start codon of cooC (and thus upstream of the CODH-NiFe hydrogenase locus). Putative transcription factor binding sites (TFBSs) were identified in a 500 base pair window upstream of the cooC start codon using the TFSITESCAN tool. One predicted TFBS shared homology with the binding site for the B. subtilis transition state regulator Hpr [30, 31, 32]. Alignment of the flanking regions of the P. thermoglucosidasius CODH-NiFe hydrogenase loci showed that this predicted binding site lies between 139 and 129 bp upstream of cooC and that the last three nucleotides of this TFBS form part of the 17 nucleotide deletion in P. thermoglucosidasius DSM 21625 (Fig. 5). The deletion within the Hpr binding site might thus explain the lack of H2 production in this strain. However, further laboratory analysis is required to identify the regulon for the CODH-NiFe hydrogenase locus to confirm this hypothesis.
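The three-nucleotide overlap between the deletion and the predicted Hpr site can be reproduced with simple interval arithmetic. This is a sketch under two assumptions not stated explicitly in the text: the reported position (115 nt upstream) marks the edge of the deletion closest to cooC, and all coordinates are inclusive distances upstream of the cooC start codon.

```python
def upstream_interval(nearest_edge, length):
    """Closed interval (nearest, farthest), in nt upstream of the start codon."""
    return (nearest_edge, nearest_edge + length - 1)


def overlap_length(a, b):
    """Number of positions shared by two closed intervals."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return max(0, high - low + 1)


# Assumption: the 17 nt deletion starts 115 nt upstream and extends away from cooC.
deletion = upstream_interval(115, 17)
# Predicted Hpr binding site, 129 to 139 bp upstream of cooC (from the text).
tfbs = (129, 139)

shared = overlap_length(deletion, tfbs)
print(f"deletion spans {deletion[0]}-{deletion[1]} nt upstream")
print(f"overlap with TFBS: {shared} nt")
```

Under these assumptions the deletion spans positions 115 to 131 upstream, so it removes positions 129 to 131 of the predicted site, in line with the three nucleotides described above.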
The ability of four different P. thermoglucosidasius strains to produce hydrogen via the WGS reaction was evaluated. Our analysis revealed extensive differences in the hydrogenogenic capacities of the strains. In particular, P. thermoglucosidasius DSM 21625 was unable to produce H2 even though a CODH-NiFe hydrogenase locus was shown to be present on the genome. This suggests that the ability to produce H2 via the WGS reaction is not a universal trait among P. thermoglucosidasius strains. We identified one strain, P. thermoglucosidasius DSM 6285, with ‘superior’ hydrogenogenic capacity, with the initiation of H2 production after a shorter lag phase than for the other hydrogenogenic strains.
Comparative genomic analyses revealed a number of key differences at the molecular level that may underlie the distinct hydrogenogenic capacities observed for the different P. thermoglucosidasius strains. These include an extensive protein set which was unique to the hydrogenogenic strains, and differences in the protein complement of DSM 6285 and the other hydrogenogenic strains. The lack of clear phenotypic differences that can be linked to the variation at the protein level suggests that there may be other factors underlying the differences observed in H2 production times for DSM 6285, DSM 2542T and DSM 2543. For example, it is possible that some of the proteins assigned to COG category S (unknown function) play a role in these variable phenotypes. Similarly, proteins of unknown function among the protein families unique to the non-hydrogenogenic P. thermoglucosidasius DSM 21625 and unique to the hydrogenogenic strains DSM 2542T, DSM 2543 and DSM 6285 may also have an effect on the ability of the different strains to produce hydrogen.
Furthermore, SNPs in the CODH-NiFe hydrogenase loci, and the associated amino acid mutations and deletions in and adjacent to this locus, may also be responsible for the difference in hydrogenogenic phenotype. In particular, a deletion was observed in the binding site for the transition state regulator Hpr upstream of the CODH-NiFe hydrogenase locus on the non-hydrogenogenic strain P. thermoglucosidasius DSM 21625. In B. subtilis, Hpr has been shown to play a role in the up- and down-regulation of a range of genes involved in post-exponential phase processes such as motility, extracellular enzymes synthesis, antibiotic production and sporulation [30, 31, 32]. As the consumption of CO and production of H2 by the three H2-producing P. thermoglucosidasius strains occurs in the post-exponential phase, a role for an Hpr-like regulator in the control of this capacity is plausible.
It cannot be excluded that factors other than observable genetic differences may underlie these distinct phenotypes. For example, the shorter lag phase between aerobic growth and the WGS-driven H2 production may be due to differences in the O2 sensitivity of the CODH-hydrogenase complex of the hydrogenogenic strains. Proteomic, gene expression and biochemical analyses could shed further light on the phenotypic differences observed in this study.
P. thermoglucosidasius strains differ in their capacity to produce H2 via the CODH-NiFe hydrogenase-catalyzed WGS reaction. This may be correlated to extensive differences we observed in terms of the proteins encoded on the genomes of the strains, as well as to SNPs in the CODH-NiFe hydrogenase loci. Further gene expression, proteomic and physiological characterization will be undertaken to elucidate the factors underlying the distinct hydrogenogenic phenotypes. This data will be crucial in the selection of P. thermoglucosidasius strains and the optimization of fermentation conditions for incorporation in bioindustrial hydrogen production strategies.
Bacterial strains and culturing conditions
To verify the production of H2 by different P. thermoglucosidasius strains, four strains (P. thermoglucosidasius DSM 2542T, DSM 2543, DSM 6285 and DSM 21625) were grown in the presence of CO. All strains were obtained from the DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig, Germany).
The cultivation of the tested strains was conducted as previously described. Briefly, pre-cultures and experimental cultures were grown in mLB medium (modified Luria-Bertani): tryptone (1% w/v), yeast extract (0.5% w/v), NaCl (0.5% w/v), 1.25 ml/L NaOH (10% w/v), and 1 ml/L of each of the filter-sterilized stock solutions: 1.05 M nitrilotriacetic acid, 0.59 M MgSO4.7H2O, 0.91 M CaCl2.2H2O and 0.04 M FeSO4.7H2O. A first set of pre-cultures was grown aerobically at 60 °C and 120 rpm (24 h). A second pre-culture was inoculated to an OD600 = 0.1 from pre-culture 1 and incubated aerobically for 12 h. The cultivations were conducted in serum bottles (250 ml) with 50 ml medium and an initial gas atmosphere consisting of 50% CO and 50% air at 1 bar atmospheric pressure. The bottles were inoculated with 1 ml of the second pre-culture. All cultivations were undertaken at 60 °C and 120 rpm in an Infors Thermotron (Infors AG, Bottmingen, Switzerland). The experiments ran for 84 h and were performed in quadruplicate in stoppered bottles.
The gas compositions and culture growth were monitored at nine different time points during the experimental cultivation. For monitoring the growth, 1 ml of the culture was measured at OD600 using an Ultrospec 1100 pro spectrophotometer (Amersham Biosciences, USA). The gas composition was monitored at each time point using a 300 Micro GC gas analyzer (Inficon, Bad Ragaz, Switzerland) with the columns Molsieve and PLOT Q. Before and after taking the liquid and gas samples the pressure in the serum bottles was measured using a manometer (GDH 14 AN, Greisinger electronic, Regenstauf, Germany). Gas analysis and calculation of the gas composition were performed as previously described.
Genome sequencing, assembly and annotation
P. thermoglucosidasius DSM 2543, DSM 6285 and DSM 21625 were grown aerobically in mLB medium (60 °C; 120 rpm) to mid-log phase. Total DNA was extracted using the Quick-DNA™ Fungal/Bacterial Miniprep Kit (Zymo Research, Irvine, CA, USA). The genome of P. thermoglucosidasius DSM 2542T was sequenced previously (NCBI Acc. #: CP012712.1). Genome sequencing of the other three strains was conducted using the Illumina HiSeq platform at GATC Biotech (Konstanz, Germany). A total of 9,152,896 (1.38 Gb: ~ 353x coverage), 9,467,702 (1.43 Gb: ~ 362x coverage) and 9,684,759 (1.46 Gb: ~ 369x coverage) paired reads were generated for P. thermoglucosidasius DSM 2543, DSM 6285 and DSM 21625, respectively. De novo genome assembly was undertaken using SPAdes genome assembler v3.11.1 and the resulting contigs were further assembled (scaffolded) with the aid of Medusa v1.6 and CSAR using all available complete genome sequences of P. thermoglucosidasius as reference. The plasmids of P. thermoglucosidasius DSM 2542T were missing from the available complete genome sequence but were obtained from a second available draft genome of this strain (NCBI Acc. # LAKX01000000).
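The relationship between sequenced bases and coverage can be made explicit: coverage is approximately the total number of sequenced bases divided by the genome size. Because genome sizes are not stated in this section, the sketch below works backwards from the reported figures to the genome size they imply. It assumes that 1 Gb means 10^9 bases; the helper names are illustrative.

```python
# Read counts, yield (bases) and approximate coverage as reported in the text.
runs = {
    "DSM 2543": {"reads": 9_152_896, "bases": 1.38e9, "coverage": 353},
    "DSM 6285": {"reads": 9_467_702, "bases": 1.43e9, "coverage": 362},
    "DSM 21625": {"reads": 9_684_759, "bases": 1.46e9, "coverage": 369},
}


def mean_read_length(bases, reads):
    """Average number of bases per read."""
    return bases / reads


def implied_genome_size(bases, coverage):
    """Genome size (bp) implied by coverage = bases / genome size."""
    return bases / coverage


summary = {
    strain: {
        "read_length": mean_read_length(r["bases"], r["reads"]),
        "genome_size": implied_genome_size(r["bases"], r["coverage"]),
    }
    for strain, r in runs.items()
}

for strain, s in summary.items():
    print(f"{strain}: ~{s['read_length']:.0f} bp per read, "
          f"~{s['genome_size'] / 1e6:.2f} Mb implied genome size")
```

The reported values are rounded, so the implied sizes are approximate and serve only as a consistency check between read yield and coverage.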
The high quality draft genome sequences of all four strains were structurally and functionally annotated using the Rapid Annotation using Subsystems Technology (RAST v. 2.0) server. Putative integrated bacteriophages were identified using the PHAST server. The genomic relatedness of the four strains was determined using the Genome-to-Genome Distance Calculator (GGDC 2.0) and OrthoANI 0.93.
Comparative genomic analyses
The protein datasets predicted by RAST for all four strains were compared using OrthoFinder 1.1.4 with default parameters. This allowed for the identification of protein families (orthologous proteins) found in all four strains (core), shared by two or three strains or unique to individual comparator strains (accessory). Both the core and accessory protein family datasets were functionally annotated by comparison against the EggNOG database (v. 4.5.1) using eggnog-mapper and the NCBI Conserved Domain Database using Batch CD-search [38, 39].
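The core/accessory partition described here can be sketched as a classification of each protein family by the set of strains in which it occurs. The example below is illustrative only: it does not parse real OrthoFinder output, the orthogroup identifiers and memberships are hypothetical, and treating "restricted to the hydrogenogenic strains" as presence in all three of those strains is an assumption.

```python
STRAINS = frozenset({"DSM 2542T", "DSM 2543", "DSM 6285", "DSM 21625"})
HYDROGENOGENIC = frozenset({"DSM 2542T", "DSM 2543", "DSM 6285"})


def classify(strains_present):
    """Label a protein family by the set of strains that encode it."""
    present = frozenset(strains_present)
    if not present or not present <= STRAINS:
        raise ValueError(f"unexpected strain set: {sorted(present)}")
    if present == STRAINS:
        return "core"
    if present == HYDROGENOGENIC:
        return "accessory (hydrogenogenic strains only)"
    if len(present) == 1:
        return f"accessory (unique to {next(iter(present))})"
    return "accessory (shared by 2-3 strains)"


# Hypothetical orthogroups, for illustration only.
orthogroups = {
    "OG_A": ["DSM 2542T", "DSM 2543", "DSM 6285", "DSM 21625"],
    "OG_B": ["DSM 2542T", "DSM 2543", "DSM 6285"],
    "OG_C": ["DSM 21625"],
    "OG_D": ["DSM 6285", "DSM 21625"],
}

labels = {og: classify(members) for og, members in orthogroups.items()}
for og, label in labels.items():
    print(og, "->", label)
```

Counting the families per label would then give totals comparable to those reported for the strain-specific and hydrogenogenic-restricted sets.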
To identify variation in the CODH-NiFe group 4a hydrogenase loci of the four compared strains, these regions were extracted from the genome sequences and compared using Mauve v2.3.1. SNPs in the genes in this locus were identified by pair-wise alignment of each gene using ClustalW in BioEdit v. 7.2.6 [40, 41]. The operon structures of the CODH-NiFe group 4a hydrogenase loci were determined in silico using FgenesB. Further, transcription factor binding sites (TFBSs) were identified using the TFSITESCAN tool.
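The logic of calling SNPs from a pair-wise gene alignment and sorting them into synonymous and non-synonymous changes can be illustrated as follows. This is a simplified sketch, not the ClustalW/BioEdit workflow used in the study: it assumes two already aligned, gap-free, in-frame coding sequences of equal length, and the two short sequences are invented for illustration.

```python
from itertools import product

# Standard genetic code in TCAG order (amino acid assignments are the same
# in the bacterial translation table 11).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snps(ref, alt):
    """List SNPs between two aligned coding sequences.

    Returns tuples of (1-based position, ref base, alt base, effect).
    The effect is judged per codon, so two SNPs in one codon share a label.
    """
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and equal in length")
    snps = []
    for start in range(0, len(ref), 3):
        ref_codon, alt_codon = ref[start:start + 3], alt[start:start + 3]
        same_aa = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
        for offset in range(3):
            if ref_codon[offset] != alt_codon[offset]:
                effect = "synonymous" if same_aa else "non-synonymous"
                snps.append((start + offset + 1, ref_codon[offset],
                             alt_codon[offset], effect))
    return snps


# Hypothetical sequences: Met-Ala-Lys-Asp versus Met-Ala-Lys-Glu.
reference = "ATGGCTAAAGAT"
variant = "ATGGCCAAAGAA"
snps = classify_snps(reference, variant)
for snp in snps:
    print(snp)
```

In this toy example the change at position 6 leaves the alanine codon unchanged in meaning, whereas the change at position 12 replaces aspartate with glutamate.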
We acknowledge support by Deutsche Forschungsgemeinschaft and Open Access Publishing Fund of Karlsruhe Institute of Technology.
TM was supported by the Federal Ministry of Education and Research (grant #031B0180). HA is supported by the Alexander von Humboldt Foundation.
Availability of data and materials
All data related to this research are included in this publication and can be found within the article and its additional files. The genome sequences of P. thermoglucosidasius DSM 2543, DSM 6285 and DSM 21625 have been made publicly available on the NCBI database under the GenBank Accession numbers [BioProject] QQOJ00000000 [PRJNA482718], QQOK00000000 [PRJNA482719] and QQOL00000000 [PRJNA482720], respectively.
TM, HA, AN, DC and PDM designed the study. TM, HA, RK, MZ and PDM conducted the experiments and genomic analyses. TM and PDM wrote the initial manuscript. TM, HA, RK, MZ, DC, AN and PDM edited the manuscript. All authors read and approved the final version of the manuscript.
Ethics approval and consent to participate
Consent for publication
PDM is an Associate Editor for BMC Genomics (Prokaryote microbial genomics). The authors declare that they have no further competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 5. Thebti W, Riahi Y, Belhadj O. Purification and characterization of a new thermostable, haloalkaline, solvent stable, and detergent compatible serine protease from Geobacillus toebii strain LBT 77. BioMed Res Int. 2016;2016:1–8.
- 21. Suzuki Y, Kishigami T, Inoue K, Mizoguchi Y, Eto N, Takagi M, Abe S. Bacillus thermoglucosidasius sp. nov., a new species of obligately thermophilic bacilli. Int J Syst Evol Microbiol. 1983;4:487–95.
- 27. Solovyev V, Salamov A. Automatic annotation of microbial genomes and metagenomic sequences. In: Li RW, editor. Metagenomics and its applications in agriculture, biomedicine and environmental studies. New York: Nova Science Publishers; 2011. p. 61–78.
- 29. Tfsitescan: http://www.ifti.org/Tfsitescan. Accessed 20 Apr 2018.
- 33. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;5:455–77.
- 36. Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, Edwards RA, Gerdes S, Parrello B, Shukla M, Vonstein V, Wattam AR, Xia F, Stevens R. The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST). Nucleic Acids Res. 2014;42(Database issue):D206–14.
- 38. Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei T, Mende DR, Sunagawa S, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016;44(D1):D286–93.
- 41. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–8.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.