Introduction

Pyrobaculum oguniense TE7T (= DSM 13380 = JCM 10595) was originally isolated from the Tsuetate hot spring in Oguni-cho, Kumamoto Prefecture, Japan [1], and was subsequently found to grow heterotrophically at an optimal temperature near 94°C, pH 7.0 (at 25°C), and in the presence or absence of oxygen. Under anaerobic conditions, it can utilize sulfur-containing compounds (sulfur, thiosulfate, L-cystine and oxidized glutathione), but not nitrate or nitrite, as terminal electron acceptors.

Initial 16S ribosomal DNA sequence analysis [1] placed Pyrobaculum oguniense TE7T in the Pyrobaculum clade and closest to P. aerophilum and Thermoproteus neutrophilus (recently renamed to Pyrobaculum neutrophilum [2]). DNA hybridization studies were conducted with P. aerophilum IM2, P. islandicum GEO3, P. organotrophum H10 and T. neutrophilus (P. neutrophilum) V24Sta, showing little genomic similarity to those species. P. arsenaticum PZ6T [3], P. sp. 1860 [4] and P. calidifontis VA1 [5] were not available at that time.

The genus Pyrobaculum is known for its range of respiratory capabilities [6]. Three of the currently known members of the genus can respire oxygen: P. aerophilum is a facultative micro-aerobe, while P. calidifontis and P. oguniense can utilize atmospheric oxygen. P. aerophilum [7], P. calidifontis, and four other metabolically unique Pyrobaculum species have been fully sequenced; by adding P. oguniense, we sought to further broaden the understanding of this important hyperthermophilic group. Pairwise whole-genome alignments of previously sequenced Pyrobaculum species reveal many structural rearrangements. With the availability of high-throughput sequencing, we were able to further explore rearrangements that occur between species, and our use of a not-quite-clonal population allowed exploration of rearrangements within a single species.

Classification and features

Figure 1 and Table 1 summarize the phylogenetic position and characteristics of Pyrobaculum oguniense TE7 relative to other members of the Pyrobaculum genus, respectively.

Figure 1.

Phylogenetic tree of the known Pyrobaculum species based on 16S ribosomal RNA sequence. Accession numbers and associated culture collection identifiers (when available) for 16S ribosomal RNA genes are: Pyrobaculum aerophilum (NC_003364.1, DSM 7523); P. calidifontis (NC_009073.1, DSM 21063); P. islandicum (NC_008701.1, DSM 4184); P. arsenaticum (NC_009376.1, DSM 13514); P. oguniense (CP003316, DSM 13380); Thermoproteus neutrophilus (NC_010525.1, DSM 2338); P. sp. 1860 (CP003098.1); P. organotrophum (AB304846.1, DSM 4185); P. sp. CBA1503 (HM594679.1); P. sp. M0H (AB302407.1); P. sp. AQ1.S2 (DQ778007.1); P. sp. WIJ3 (AJ277125.1); ‘P. neutrophilum’ (X81886). Sequences were aligned using MAFFT v.6 [8], followed by manual curation [9] to remove 16S ribosomal introns and all terminal gap columns caused by missing sequence. The maximum likelihood tree was constructed using Tree-Puzzle v. 5.2 [10] with exact parameter estimates, 10,000 quartets and 1,000 puzzling steps. Thermoproteus tenax Kra1 (NC_016070.1, DSM 2078) was included as an outgroup. Numbered branches show bootstrap percentages and branch lengths depict nucleotide mutation rate (see scale bar upper right).

Table 1. Classification and general features of Pyrobaculum oguniense according to the MIGS recommendations [11].

Genome sequencing information

Genome project history

Table 2 presents the project information and its association with MIGS version 2.0 compliance [23].

Table 2. Project information

Growth conditions and DNA isolation

The initial culture was obtained in 2003 from the Leibniz Institute-German Collection of Microorganisms and Cell Cultures (DSMZ), and grown anaerobically in stoppered, 150ml glass culture bottles at 90°C. This culture was stored at 4°C for an extended period (six years) before being sampled for this study.

A set of ten-fold dilutions of an actively growing culture (∼10⁸ cells/ml) was carried out and growth was monitored over a five-day period. All cultures were grown at 90°C without shaking in 200 ml modified DSM 390 medium, using 1 g tryptone, 1 g yeast extract, pH 7, supplemented with 10 mM Na2S2O3 in 1 L flasks under a headspace of nitrogen. At day four of growth, a new 400 ml aerobic culture was inoculated with 20 ml from the penultimate member of the dilution series (10⁻⁸), supplemented with 10 mM Na2S2O3 and shaken at 100 rpm; this culture was subsequently used for sequencing. We note that at day five, turbid growth was seen in the final member of the dilution series (10⁻⁹ initial dilution). This implies that the initial 10⁻⁸ inoculum used for sequencing likely included more than 10 cells.
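The reasoning behind the founder-cell estimate can be written out as a small calculation. The sketch below is only illustrative: it assumes each step of the series is an exact tenfold dilution and that visible growth requires at least one viable cell, and it ignores sampling noise. The function name and its arguments are hypothetical, not part of the original protocol.

```python
def min_founders(growth_exponent: int, target_exponent: int, fold: int = 10) -> int:
    """Lower bound on viable cells seeded at a given dilution step.

    Dilutions are given as positive exponents (9 means a 10^-9 dilution).
    Assumption: if the flask at `growth_exponent` grew, it received at least
    one viable cell, so each less-dilute step received `fold` times more.
    """
    if target_exponent > growth_exponent:
        raise ValueError("target must be no more dilute than the growth-positive step")
    steps = growth_exponent - target_exponent
    return fold ** steps


# Growth was observed at 10^-9; the sequenced culture descends from 10^-8.
print(min_founders(growth_exponent=9, target_exponent=8))
```

Under these assumptions the 10⁻⁸ flask was founded by roughly ten or more cells, which is why the sequenced population is described as not strictly clonal.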

Cell pellets were obtained from the 400 ml aerobic culture, frozen at −80°C and suspended in 15 ml SNET II lysis buffer (20 mM Tris-Cl pH 8, 5 mM EDTA, 400 mM NaCl, 1% SDS) supplemented with 0.5 mg/ml Proteinase K and incubated at 55°C for four hours. DNA was extracted from this digest using an equal volume of Tris-buffered (pH 8) PCI (phenol:chloroform:isoamyl alcohol, 25:24:1). Following phase separation (3,220 g, 10 min. at 4°C), the resulting aqueous phase was treated with RNase A (25 µg/ml) for 30 minutes at 37°C. This reaction was PCI-extracted a second time, followed by CHCl3 extraction of the resulting aqueous phase and a final phase separation as before.

DNA was precipitated in an equal volume of isopropyl alcohol at −20°C overnight, followed by centrifugation (3,220 g, 15 min. at 4°C). The resulting pellet was washed in 70% EtOH, pelleted (3,220 g, 30 min. at 4°C) and aspirated to remove the supernatant. The final DNA pellet was suspended in 1 ml TE (50 mM Tris-Cl pH 8, 1 mM EDTA) overnight at room temperature, yielding a final DNA concentration of 0.77 µg/µl.

Genome sequencing and assembly

Sequencing was performed by the UCSC genome sequencing center using both Roche/454 GS/FLX Titanium pyrosequencing and the ABI SOLiD system (mate-pair). Pyrosequencing reads were assembled with 59X coverage exceeding Q40 over 99.95% (2,449,310 bases) of the genome, producing 20 contigs at an N50 of 467,815 bp. This assembly included 24 Sanger reads generated by primer-walking across four of the five encoded CRISPR repeat regions. The resulting maximal base-error rate (<Q40) is 25 in 50,000.
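For readers unfamiliar with the assembly statistics quoted above, the following sketch shows how an N50 value and a Phred-scaled error probability are conventionally computed. It is not the pipeline used for this assembly; the contig lengths in the usage lines are made-up placeholders, and the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50: the contig length at which the running sum of
    lengths, taken from longest to shortest, first reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical contig lengths, for illustration only.
print(n50([10, 8, 5, 3, 2]))
# Q40 corresponds to one expected error in 10,000 bases.
print(phred_to_error(40))
# Bases below Q40: 0.05% of positions, i.e. 25 per 50,000.
print(round((1 - 0.9995) * 50_000))
```

The last line reproduces the arithmetic behind the quoted bound: if 99.95% of positions exceed Q40, at most 25 positions in every 50,000 fall below that threshold.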

Contigs were assembled into a single scaffold using the mate-pair library generated for use on the ABI SOLiD sequencer. The library was produced with an insert size range of 1,000–3,500 bp, and final sequencing yielded 30,631,205 read pairs of 25 bp read length. Those read pairs were mapped to the 20 pyrosequencing-derived contigs to produce a From::To table of uniquely mapping read pairs, accumulated for each of the 20×20 contig-pair assignments in each of the three possible relative contig orientations (same, converging or diverging). The scaffold closed easily with these data and yielded a single main chromosome with three major inversions and an extra-chromosomal element.
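A From::To table of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the `Hit` record, the function names, and in particular the rule used to label a pair as converging or diverging from the two mapped strands are hypothetical conventions chosen for the example.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One uniquely mapped read of a mate pair (hypothetical record)."""
    contig: str
    strand: str  # '+' or '-'


def orientation(a: Hit, b: Hit) -> str:
    """Classify a pair by relative strand.

    Assumed convention: equal strands are 'same'; '+' then '-' is
    'converging'; '-' then '+' is 'diverging'.
    """
    if a.strand == b.strand:
        return "same"
    return "converging" if a.strand == "+" else "diverging"


def link_table(pairs: Iterable[Tuple[Hit, Hit]]) -> Counter:
    """Count read pairs per (from_contig, to_contig, orientation) cell."""
    table = Counter()
    for a, b in pairs:
        table[(a.contig, b.contig, orientation(a, b))] += 1
    return table


# Toy input: three read pairs linking two contigs.
pairs = [
    (Hit("contig01", "+"), Hit("contig02", "-")),
    (Hit("contig01", "+"), Hit("contig02", "-")),
    (Hit("contig01", "-"), Hit("contig02", "-")),
]
for key, count in sorted(link_table(pairs).items()):
    print(key, count)
```

Cells with high counts suggest which contig ends are adjacent and in which relative orientation; a contig pair with substantial counts in more than one orientation is the kind of signal that could point to an inversion present in part of the population.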

Genome annotation

Gene prediction and annotation were prepared using the IMG/ER service of the Joint Genome Institute [24], where protein-coding genes were identified using Prodigal [25]. RNase P RNA [26], SRP RNA and ribosomal RNA (5S, 16S, 23S) were identified by homology to the currently described Pyrobaculum members using the UCSC Archaeal Genome Browser (archaea.ucsc.edu) [27]. Annotation of transfer RNA (tRNA) genes was established using tRNAscan-SE [28], supplemented with manual curation of non-canonical introns. C/D box sRNA genes were identified computationally using Snoscan [29] with extensions supported by transcriptional sequencing [30]. H/ACA-like sRNA genes were identified using transcriptionally-supported homology modeling of experimentally validated sRNA transcripts [31]. CRISPR repeats were identified using CRT [32] or CRISPR-finder [33], with strandedness established by transcriptional sequencing.

Genome properties

The properties and overall statistics of the genome are summarized in Table 3, Table 4, Table 5, Table 6, and Table 7. The single main chromosome (55.08% GC content) has a total size of 2,436,033 bp. Ultra-deep mate-pair sequencing has revealed three regions of the genome that are present in an inverted orientation within a minority of the population (Table 7). The genome also includes an extra-chromosomal element of 16,887 bp (50.58% GC) that encodes 35 predicted protein-coding genes. Of those genes, seven have an annotated function and the remaining 28 are annotated as hypothetical proteins. Of the seven annotated genes, three are annotated with viral functions [35].
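As a reminder of how a GC percentage such as those quoted above is defined, the sketch below computes it from a nucleotide string. It is a generic illustration, not the tool used for this annotation; the function name is hypothetical, and the choice to exclude ambiguous bases (such as N) from the denominator is an assumption.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


# Toy sequences, for illustration only.
print(gc_percent("GGCCAATT"))
print(gc_percent("gcNNat"))
```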

Table 3. Nucleotide content and gene count levels of the main chromosome
Table 4. Number of genes associated with the 25 general COG functional categories
Table 5. Sixteen largest regions present in Pyrobaculum oguniense and absent in P. arsenaticum.
Table 6. Summary of genome: one chromosome and one extra-chromosomal element
Table 7. Genomic inversions present within the sampled population

The majority of the P. oguniense genome is structurally syntenic to the genome of P. arsenaticum, and genes found in both species show an average of approximately 96% nucleotide identity. The P. oguniense genome is approximately 15% larger than that of P. arsenaticum, with the former encoding 536 more open reading frames (ORFs) predicted to be genes (2,835 versus 2,299). Vast stretches of sequence are syntenic between the two species (Figure 2, regions in blue), broken by relatively few regions that appear to arise from either gene loss in P. arsenaticum or genomic expansion in P. oguniense, possibly a result of the numerous paREP elements present in these genomes (Figure 2). These repetitive regions are difficult to assemble, and some are putative transposons (paREP2b, for example).

Figure 2.

Genomic alignment of P. oguniense with P. arsenaticum. Outer ring: P. oguniense (+ strand); inner ring: P. arsenaticum (- strand). Inter-species alignment blocks are shown in light blue and gold (inverted orientation). Intra-species P. oguniense genomic inversions are shown as arcs of different colors along the outer ring: C8 inversion (red); glutamate dehydrogenase (GluDH) inversion (green); RAMP/paREP inversion (blue). Positions of paREP elements are shown as ticks inside the outer ring: paREP1 (red); paREP2b (blue); paREP7 (green). Positions of selected genes that are present in P. oguniense and missing in P. arsenaticum are shown in text inside the outer ring: thiamine biosynthesis genes (ThiW and ThiC); CRISPR cassette (CAS); cobalamin cluster; CO dehydrogenase (COdh); and the aerobic cytochrome clusters (Cyto-c). Aligned regions smaller than 500 nucleotides have been removed for clarity.

We can identify specific genes and gene clusters that are present in P. oguniense but are missing in P. arsenaticum. Notably, the cobalamin synthetic cluster and two thiamine synthetic genes (ThiW and ThiC) are absent in P. arsenaticum. The terminal cytochrome cluster associated with aerobic respiration [36] is also absent in P. arsenaticum, as expected for an obligate anaerobe. Among the 16 largest deletions in P. arsenaticum (relative to P. oguniense), four are associated with paREP2 genes, six with paREP1/8, and one with paREP6 (Table 5).

Conclusion

Genomic sequencing and assembly of Pyrobaculum oguniense has yielded a complete genome and an extra-chromosomal element. The main chromosome is largely syntenic to Pyrobaculum arsenaticum and contains a number of gene clusters that are absent in that species. This is of particular interest considering that these species were isolated on opposite sides of the Eurasian continent; P. oguniense was isolated in Japan, while P. arsenaticum was isolated in an arsenic-rich anaerobic pool in Italy.

The synteny that has been retained between the genomes of P. oguniense and P. arsenaticum allows a close examination of gene gain or loss events in the genetic history of these two species. P. arsenaticum is missing the gene clusters that support cobalamin and thiamine synthesis, and it is missing the aerobic cytochrome cluster. Given that P. oguniense and the next closest member in the clade, P. aerophilum, have both retained these capabilities, the most parsimonious explanation is gene loss in P. arsenaticum. Because these genes are located at disparate positions in the P. oguniense genome, it would further appear that these losses are the result of multiple events in the evolutionary history of P. arsenaticum.

Within this genome, 145 non-coding RNA genes are described. These include a single operon encoding 16S and 23S ribosomal RNA, the associated 5S rRNA, the 7S signal recognition particle (SRP) RNA, and the RNase P RNA. There are 47 annotated tRNA genes, plus a single tRNA pseudogene. Also included are 83 predicted C/D box sRNA genes and nine additional H/ACA-like sRNAs, each of which has been transcriptionally validated [31]. The non-coding RNA content of the P. oguniense genome is now the most extensively annotated among crenarchaeal genomes to date.

The use of a not-quite-clonal cell population for DNA isolation, coupled with ultra-deep sequencing, has provided a view of three major inversions that are each present in over 17% of the sample population. The boundaries of one of these inversions are defined by an inverted repeat encoding a duplication of glutamate dehydrogenase (GluDH). Notably, this duplication appears to be present in each of the currently sequenced Pyrobaculum members, suggesting that those genomes may also host similar inversions. A second inversion has at its termini another inverted duplication, encoding a gene associated with one of the paREP members and a CRISPR-associated gene. It remains unclear whether these common structural variants impart a physiological advantage, and if so, how the variation provides utility to its host. Based on our expanded genome diversity observations, we suggest that avoiding the use of a strictly clonal population for sequencing purposes can significantly improve understanding of both the biology of the host and the genome dynamics of the species.