Introduction

Strain YNP1T (= ATCC BAA-798 = CCMEE 7001) is the proposed type strain of the not yet validly published species ‘Thermobaculum terrenum’, which represents the type species of the not yet validly published genus ‘Thermobaculum’ [1]. The strain was cultivated from a moderately acidic (pH 3.9) extreme thermal soil in Yellowstone National Park (YNP), Wyoming (USA), and a thorough chemotaxonomic characterization of the strain was published by Botero et al. in 2004 [1]. Although the biological characteristics of the novel strain fulfill all criteria required for the type strain of a novel genus, the proposed name ‘Thermobaculum terrenum’ (= hot small rod belonging to earth/soil) has not yet been validly published, that is, included in one of the updates of the Validation List that is regularly published in Int J Syst Evol Microbiol. The obstacle is Rule 30 (3a) of the Bacteriological Code (1990 Revision), which requires that as of 1st January 2001 the description of a new species [...] must include the designation of a type strain, and a viable culture of that strain must be deposited in at least two publicly accessible service collections in different countries from which subcultures must be available [2]. Strain YNP1T is currently deposited only in two US culture collections and therefore does not meet this requirement. Here we present a summary classification and a set of features for ‘T. terrenum’ strain YNP1T, together with the description of the complete genomic sequencing and annotation.

Classification and features

Based on analyses of 16S rRNA gene sequences, strain YNP1T is the sole cultured representative of the genus ‘Thermobaculum’. It has no close relatives among the validly described species within the Chloroflexi. The type strain of Sphaerobacter thermophilus [3] shares the highest pairwise similarity (84.9%), followed by Thermoleophilum album and T. minutum [4–6], the only two members of the actinobacterial order Thermoleophilales [7], with 83.6% sequence identity, and three type strains from the clostridial genus Thermaerobacter (83.2–83.5%) [8], which are currently not placed within a named family. Only four uncultured bacterial clones in GenBank share a higher degree of sequence similarity with strain YNP1T than the type strain of the ‘closest’ related species, S. thermophilus. These are clone DRV-SSB031 from rock varnish in the Whipple Mountains, California (92.1%) [9], and clones AY6_14 (FJ891044), AY6_27 (FJ891057) and AY6_18 (FJ891048) from quartz substrates in the hyperarid core of the Atacama Desert (86.9–87.9%). No phylotypes from environmental screening or metagenomic surveys could be linked to ‘T. terrenum’, indicating a rather rare occurrence in the habitats screened thus far (as of September 2010). A representative genomic 16S rRNA sequence of ‘T. terrenum’ YNP1T was compared using BLAST with the most recent release of the Greengenes database [10], and the relative frequencies of taxa and keywords, weighted by BLAST scores, were determined. The three most frequent genera were Thermobaculum (81.2%), Sphaerobacter (10.3%) and Conexibacter (8.4%). The five most frequent keywords within the labels of environmental samples which yielded hits were ‘microbial’ (3.6%), ‘waste’ (3.3%), ‘soil’ (3.3%), ‘simulated’ (3.2%) and ‘level’ (3.1%).
The five most frequent keywords within the labels of environmental samples which yielded hits of a higher score than the highest scoring species were ‘soil’ (4.5%), ‘structure’ (3.3%), ‘simulated’ (3.2%), ‘level/site/waste’ (2.9%) and ‘core’ (2.1%).
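
The score-weighted frequencies reported above can be illustrated with a minimal sketch. It assumes that each BLAST hit contributes its score to the label (genus or keyword) it carries and that a label's frequency is its share of the summed scores; the exact weighting used for the analysis is not described here, and the function name and the hit list below are hypothetical.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return each label's share of the total BLAST score, in percent.

    `hits` is an iterable of (label, score) pairs. This is an assumed
    weighting scheme, not necessarily the one used for the analysis.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical (genus, score) pairs, for illustration only.
example_hits = [
    ("Thermobaculum", 2400.0),
    ("Sphaerobacter", 900.0),
    ("Thermobaculum", 600.0),
    ("Conexibacter", 100.0),
]

frequencies = weighted_frequencies(example_hits)
for genus, pct in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {pct:.1f}%")
```

The same function applies to keywords if each hit is expanded into one (keyword, score) pair per word in its sample label.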

Figure 1 shows the phylogenetic neighborhood of ‘T. terrenum’ strain YNP1T in a 16S rRNA based tree. The sequences of the two identical 16S rRNA gene copies in the genome do not differ from the previously published 1,333 nt long partial sequence generated from ATCC BAA-798 (AF391972).

Figure 1.

Phylogenetic tree highlighting the position of ‘T. terrenum’ strain YNP1T relative to the type strains of the other species within the phylum Chloroflexi. The tree was inferred from 1,316 aligned characters [11,12] of the 16S rRNA gene sequence under the maximum likelihood criterion [13] and rooted in accordance with the current taxonomy. The branches are scaled in terms of the expected number of substitutions per site. Numbers above the branches are support values from 1,000 bootstrap replicates [14] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [15] are shown in blue, published genomes [16] and GenBank records [CP000804, CP000875, CP000909, CP001337] in bold, e.g. the GEBA genome S. thermophilus [17].

The cells of strain YNP1T are non-motile rods, 1–1.5 µm wide and 2–3 µm long (Figure 2 and Table 1), enveloped by a thick cell wall external to a cytoplasmic membrane [1]. YNP1T cells occur singly or in pairs, stain Gram-positive in the exponential growth phase, are obligately aerobic, and are non-spore-forming [1]. Colonies are pink-colored and growth occurs best at pH 6–8 (pHopt 7) and 67°C, with a possible temperature range of 41–75°C [1]. Culture doubling time at Topt was 4 hours and increased sharply above 70°C, whereas growth at the temperature extremes was relatively poor [1]. Cells grow best in complex media containing 0.5% NaCl and yeast extract (for growth factors) [1], but also grow on sucrose, fructose, glucose, ribose, xylose, sorbitol, and xylitol [1]. Strain YNP1T was positive for catalase, urease, and nitrate reduction, but tested negative for oxidases, and was also negative for fermentation of glucose or lactose [1]. No anaerobic growth was observed in the presence of sulfate, nitrate, ferric iron, or arsenate as possible electron acceptors [1]. No chemolithoautotrophic growth was observed in an experimental matrix that included the electron donors H2, H2S, or S0 with oxygen as the electron acceptor. Surprisingly, the in vitro pH optimum of strain YNP1T (pH 7) is much higher than the pH of the soil from which it was isolated (pH 4–5) [1]. In pure culture, strain YNP1T failed to grow at such low pH values, suggesting that the thermal soil habitat is not optimal for the strain [1].

Figure 2.

Transmission electron micrograph of ‘T. terrenum’ strain YNP1T, scale bar 0.1 µm

Table 1. Classification and general features of ‘T. terrenum’ strain YNP1T according to the MIGS recommendations [18].

Chemotaxonomy

Murein is present in large amounts, which is consistent with the observed thick (approximately 34 nm) cell walls and with a muramic acid content closer to that of Bacillus subtilis than to that of Escherichia coli [1]. The muramic acid content of strain YNP1T was roughly one quarter of that measured for B. subtilis but almost 40-fold greater than in E. coli [1]. Lipopolysaccharide (LPS) was not detected [1]. The fatty acid profile was dominated by straight and branched chain saturated acids: C18:0 (27.0%); iso-C17:0 (11.6%); iso-C19:0 (12.9%); anteiso-C18:0 (12.5%); C20:0 (16.5%) and C19:0 (6.6%). The pink pigment associated with strain YNP1T exhibited significant absorption at wavelengths of 267, 326, 399, 483, 511, and 549 nm [1].

Genome sequencing and annotation

Genome project history

This organism was selected for sequencing on the basis of its phylogenetic position [26], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [27]. The genome project is deposited in the Genome OnLine Database [15] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information

Growth conditions and DNA isolation

‘T. terrenum’ strain YNP1T, ATCC BAA-798, was grown in ATCC medium 1981 (M-R2A medium) [28] at 60°C. The culture used to prepare genomic DNA (gDNA) for sequencing was only two transfers from the original deposit. The purity of the culture was determined by growth on general maintenance media under both aerobic and anaerobic conditions. Cells were harvested after 24 hours by centrifugation and gDNA was extracted from lysozyme-treated cells using CTAB and phenol-chloroform. The purity, quality and size of the bulk gDNA preparation were assessed according to DOE-JGI guidelines. Amplification and partial sequencing of the 16S rRNA gene confirmed the isolate as ‘T. terrenum’. The quantity of the DNA was determined on a 1% agarose gel using mass markers of known concentration supplied by JGI. The average fragment size of the purified gDNA was determined to be ~43 kb by pulsed-field gel electrophoresis.

Genome sequencing and assembly

The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website (http://www.jgi.doe.gov/). Pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 3,926 overlapping fragments of 1,000 bp and entered into assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher or transposon bombing of bridging clones [29]. A total of 432 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher [30]). The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Sanger and 454 sequencing platforms provided 10.0× coverage of the genome. The final assembly contains 32,920 Sanger reads.
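
The step in which large Newbler contigs are broken into overlapping 1,000 bp pseudo-reads can be sketched as follows. Only the fragment length is given in the text; the overlap of 100 bp, the function name, and the toy contig are assumptions made for illustration, and the sketch does not reproduce the quality-score adjustments described above.

```python
def shred_contig(contig, fragment_len=1000, overlap=100):
    """Split a contig into overlapping fragments (pseudo-reads).

    fragment_len follows the 1,000 bp stated in the text; the overlap
    value is an assumption, since the text does not specify it.
    """
    if fragment_len <= 0 or not 0 <= overlap < fragment_len:
        raise ValueError("need fragment_len > 0 and 0 <= overlap < fragment_len")
    if not contig:
        return []
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(contig[start:start + fragment_len])
        # Stop once the current fragment reaches the end of the contig.
        if start + fragment_len >= len(contig):
            break
        start += step
    return fragments


# Toy 2,500 bp contig, for illustration only.
toy_contig = "ACGT" * 625
pseudo_reads = shred_contig(toy_contig)
print([len(read) for read in pseudo_reads])
```

Every base of the contig is covered by at least one fragment, and consecutive fragments share `overlap` bases, which is what lets the downstream assembler rejoin them.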

Genome annotation

Genes were identified using Prodigal [31] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [32]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [33].

Genome properties

The genome consists of two chromosomes: the low G+C (48%) 2,026,947 bp long chromosome 1, and the high G+C (64%) 1,074,634 bp long chromosome 2 (Table 3, Figure 3, Figure 4). Of the 2,930 genes predicted (1,935 on chromosome 1 and 995 on chromosome 2), 2,872 were protein-coding genes and 58 were RNA genes; 41 pseudogenes were also identified. The majority of the protein-coding genes (73.4%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
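
As a quick consistency check on the figures above, the per-chromosome values can be combined into genome-wide totals. The sketch below uses only numbers stated in the text; the length-weighted G+C value is an approximation, because the per-chromosome percentages are rounded, and the variable names are illustrative.

```python
# Per-chromosome figures as stated in the text.
chromosomes = {
    "chromosome 1": {"length_bp": 2_026_947, "gc_percent": 48.0, "genes": 1_935},
    "chromosome 2": {"length_bp": 1_074_634, "gc_percent": 64.0, "genes": 995},
}
protein_coding_genes = 2_872
rna_genes = 58

total_length = sum(c["length_bp"] for c in chromosomes.values())
total_genes = sum(c["genes"] for c in chromosomes.values())

# Length-weighted mean G+C; approximate, since the inputs are rounded.
weighted_gc = (
    sum(c["length_bp"] * c["gc_percent"] for c in chromosomes.values())
    / total_length
)

print(f"Total genome size: {total_length:,} bp")
print(f"Total genes: {total_genes:,}")
print(f"Approximate genome-wide G+C: {weighted_gc:.1f}%")
```

The gene counts per chromosome and the split into protein-coding and RNA genes both sum to the stated total of 2,930.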

Figure 3.

Graphical circular map of the 2 Mb low-G+C chromosome 1. From outside to the center: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Figure 4.

Graphical circular map of the 1 Mb high-G+C chromosome 2. From outside to the center: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.
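
The two innermost rings of the circular maps, GC content and GC skew, are conventionally computed in windows along the sequence, with GC skew defined as (G − C)/(G + C). The following is a minimal sketch of that calculation; the window and step sizes, the function name, and the toy sequence are assumptions, since the parameters used to draw the maps are not given here.

```python
def gc_content_and_skew(seq, window=10_000, step=None):
    """Return (start, gc_fraction, gc_skew) for each window of `seq`.

    gc_fraction = (G + C) / window length
    gc_skew     = (G - C) / (G + C), or 0.0 when the window has no G or C
    Window and step sizes are illustrative assumptions.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if step is None:
        step = window
    if step <= 0:
        raise ValueError("step must be positive")
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        gc_fraction = (g + c) / len(chunk)
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_fraction, gc_skew))
    return results


# Toy sequence, for illustration only: each 4 bp window is "GGCA".
toy_windows = gc_content_and_skew("GGCA" * 3, window=4)
for start, gc_fraction, gc_skew in toy_windows:
    print(start, round(gc_fraction, 2), round(gc_skew, 2))
```

Applied separately to each chromosome, such a calculation would be expected to reflect the difference in G+C content between chromosome 1 and chromosome 2 described in the text.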

Table 3. Genome Statistics
Table 4. Number of genes associated with the general COG functional categories