Introduction

Strain G-20T (= DSM 43160 = ATCC 25078 = JCM 3152) is the type strain of the species Geodermatophilus obscurus, the type species of the genus Geodermatophilus, which in turn is the type genus of the family Geodermatophilaceae [1,2]. The species name derives from the Latin word ‘obscurus’, meaning dark, obscure, indistinct, or unintelligible [1]. The genus Geodermatophilus and the family Geodermatophilaceae were originally proposed in 1968 by Luedemann [1]. The genus was first described as closely related to the genus Dermatophilus but isolated from soil, as indicated by the prefix ‘geo’, which derives from Greek ‘Gea’, meaning Earth [1]. In contrast, members of the genus Dermatophilus, whose name means ‘skin-loving’, originated from skin lesions of cattle, sheep, horses, deer, and man [3]. Yet, on the basis of 16S rRNA gene sequences, Geodermatophilus proved to be only distantly related to Dermatophilus [4] and was thus included in 1989 in the family Frankiaceae [5], together with the genera Blastococcus and Frankia. In 1996, the genera Geodermatophilus and Blastococcus were excluded again from the family Frankiaceae [6] and were finally formally combined with the genus Modestobacter in the family Geodermatophilaceae [2]. G. obscurus is the only validly described species in the genus Geodermatophilus [7]; four subspecies have been proposed [1], but these have never been validly published [8].

The type strain G-20T, together with other strains, was isolated from soil in the Amargosa Desert of Nevada, USA [3]. Further Geodermatophilus strains were isolated from limestone [8,9] and rock varnish [10] in the Negev Desert, Israel, from marble in Delos, Greece [8,9], from chestnut soil in Gardabani, Central Georgia [11], from rock varnish in the Whipple Mountains, California, USA [12], from orange patina of calcarenite in Noto, Italy [13], from gray to black patinas on marble in Ephesus, Turkey [13], and from high-altitude Mount Everest soils [14,15]. Here we present a summary classification and a set of features for G. obscurus G-20T, together with the description of the complete genome sequencing and annotation.

Classification and features

Cells of Geodermatophilus produce densely packed cell aggregates [8], which are described as a muriform, tuber-shaped, noncapsulated, holocarpic thallus consisting of masses of cuboid cells averaging 0.5 to 2.0 µm in diameter (Table 1 and Figure 1) [1]. The thallus breaks up, liberating cuboid or coccoid nonmotile cells and elliptical to lanceolate zoospores [1]. Single cells can differentiate further into polar flagellated motile zoospores [15]. Thus, cells of Geodermatophilus may express a morphogenetic growth cycle in which they switch between a thalloid C-form and a motile zoosporic R-form [15]. It has been supposed that tryptose (Difco) contains an unidentified factor, M, which controls morphogenesis in Geodermatophilus [15], though others could not observe the motile, budding zoospores of the R-form [8]. Colonies of Geodermatophilus strains usually exhibit a dark brownish, greenish, or black pigmentation with a smooth to rough surface and in most cases a solid consistency, with minor variations in colony shape [8]. Young colonies are almost colorless and have smooth edges, which become distorted and lobed in older colonies, where the colony consistency becomes somewhat crumbly [8]. The colonies become darkly pigmented as soon as they start to protrude upwards into the space above the agar [8]. Geodermatophilus does not produce hyphae, vesicles, outer membranous spore layers or capsules [5].

Figure 1.
Scanning electron micrograph of G. obscurus G-20T

Table 1. Classification and general features of G. obscurus G-20T according to the MIGS recommendations [16]

Strain G-20T utilizes l-arabinose, d-galactose, d-glucose, glycerol, inositol, d-levulose, d-mannitol, sucrose, and d-xylose as single carbon sources for growth, but not d-arabinose, dulcitol, β-lactose, melezitose, α-melibiose, raffinose, d-ribose, or ethanol [1,23]. Growth on l-rhamnose is poor [1]. Strain G-20T is negative for β-hemolysis on blood agar (10% human blood) [1]. Nitrate reduction occurs only sporadically, in either inorganic or organic nitrate broth [1]. Strain G-20T hydrolyses starch, is weakly positive for gelatin liquefaction, and is negative for casein utilization [23].

Strain G-20T showed a remarkable production of extracellular functional bacterial amyloid (FuBA), which is accessible to WO2 antibodies without saponification [24]. The WO2 antibody has been shown to bind only to amyloid and not to other kinds of protein aggregates [20,24]. One strain of G. obscurus was described as having a lytic activity on yeast cell walls [12]. Another strain from rock varnish was shown to exhibit very strong resistance to UV-C light (220 J·m⁻²) [12]. Two strains from rock varnish in the Negev Desert were able to oxidize manganese [10].

Only three G. obscurus isolates have 16S rRNA gene sequences with >98% sequence similarity to strain G-20T: isolate G18 from Namibia, 99.1% [2]; isolate 06102S3-1 from deep-sea sediments of the East Pacific and Indian Ocean (EU603760), 98.5%; and G. obscurus subspecies utahensis DSM 43162, 98.03% [8]. The highest degree of sequence similarity in environmental metagenomic surveys, 93.3%, was reported from a marine metagenome (AACY020064011) from the Sargasso Sea [25] (status as of January 2010).

Figure 2 shows the phylogenetic neighborhood of G. obscurus G-20T in a 16S rRNA based tree. The sequences of the three 16S rRNA gene copies in the genome of G. obscurus G-20T do not differ from each other, but differ by 24 nucleotides from the previously published 16S rRNA sequence obtained from DSM 43160 (X92356). These considerable discrepancies are most likely due to sequencing errors in the latter sequence. GenBank accession L40620, which was obtained from ATCC 25078, differs by only a single nucleotide from the 16S rRNA gene copies in the genome obtained from DSM 43160.
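The nucleotide differences quoted above are the kind of figure obtained by comparing two sequences column by column after alignment. The following is a minimal sketch of such a count, assuming the two sequences have already been aligned to equal length with `-` as the gap character; the function names and the decision to count gap columns as differences are illustrative assumptions, not the procedure used by the authors.

```python
def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Count alignment columns in which two pre-aligned sequences differ.

    Assumption: a gap ('-') opposite a nucleotide counts as one difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    return sum(1 for x, y in pairs if x != y)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical columns over the full alignment length."""
    if not aligned_a:
        raise ValueError("alignment is empty")
    mismatches = count_differences(aligned_a, aligned_b)
    return 100.0 * (len(aligned_a) - mismatches) / len(aligned_a)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real 16S rRNA data.
    a = "ACGTTGCA-ACGT"
    b = "ACGTAGCATACGT"
    print(count_differences(a, b), round(percent_identity(a, b), 1))
```

Whether gap columns and ambiguous bases should count as differences changes the result, so any comparison against published values needs the same convention as the original analysis.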

Figure 2.
Phylogenetic tree highlighting the position of G. obscurus G-20T relative to the other type strains within the suborder Frankineae. The tree was inferred from 1,364 aligned characters [26,27] of the 16S rRNA gene sequence under the maximum likelihood criterion [28] and rooted with the type strain of the order Actinomycetales. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 350 bootstrap replicates [29] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [30], such as the GEBA organism Nakamurella multipartita [31], are shown in blue. Important non-type strains are shown in green [32], and published genomes in bold.

Chemotaxonomy

The major fatty acids of strain G-20T are iso-C15:0 (19.0%), iso-C16:0 (16.2%), C16:1 cis9 (13.0%), C17:1 (10.4%), C18:1 cis9 (6.6%), and anteiso-C15:0 (5.7%). All other fatty acids (iso-C14:0, C15:0, C15:1, C16:0, C17:0, iso-C17:0, and anteiso-C17:0) were each below 4% [33]. Qualitatively, these values are largely congruent with other sources [4,34]. Strain G-20T contains tetrahydro-menaquinones with nine isoprene units [MK-9(H4)] as sole component [4]. No whole cell wall sugar was found in strain G-20T, which contains only small amounts of galactose, glucose, and ribose [4,35]. The cell wall type is IIIC, and contains meso-2,6-diaminopimelic acid [35].

Genome sequencing and annotation

Genome project history

This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [30] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information

Growth conditions and DNA isolation

G. obscurus G-20T, DSM 43160, was grown in DSMZ medium 65 [36] at 28°C. DNA was isolated from 0.5–1 g of cell paste using the Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany) with a modified protocol for cell lysis (procedure st/L) and a one-hour incubation at 37°C, according to Wu et al. [37].

Genome sequencing and assembly

The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website (http://www.jgi.doe.gov/). 454 Pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 5,725 overlapping fragments of 1,000 bp and entered into assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher or transposon bombing of bridging clones [38]. A total of 1,530 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher). The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Sanger and 454 sequencing platforms provided 29.8× coverage of the genome. The final assembly contains 48,209 Sanger reads and 353,553 pyrosequencing reads.
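The step in which large Newbler contigs are broken into overlapping 1,000 bp pseudo-reads can be illustrated with a short sketch. The text gives the fragment length but not the amount of overlap, so the `step` value below (500 bp, i.e. 50% overlap) and the function name `shred_contig` are purely hypothetical; this is not the JGI pipeline code.

```python
from typing import List


def shred_contig(contig: str, frag_len: int = 1000, step: int = 500) -> List[str]:
    """Cut a contig into overlapping fragments of frag_len bases.

    Assumptions (not from the source): consecutive fragments start `step`
    bases apart, and a final fragment is anchored at the contig end so that
    no trailing bases are lost.
    """
    if frag_len <= 0 or not 0 < step <= frag_len:
        raise ValueError("require frag_len > 0 and 0 < step <= frag_len")
    if not contig:
        return []
    if len(contig) <= frag_len:
        return [contig]
    starts = list(range(0, len(contig) - frag_len + 1, step))
    # Add a last window flush with the contig end if the tail is uncovered.
    if starts[-1] + frag_len < len(contig):
        starts.append(len(contig) - frag_len)
    return [contig[s:s + frag_len] for s in starts]


if __name__ == "__main__":
    toy_contig = "ACGT" * 675  # 2,700 bp toy contig, not real data
    fragments = shred_contig(toy_contig)
    print(len(fragments), [len(f) for f in fragments])
```

Because neighbouring pseudo-reads overlap, each contig base is represented more than once, which is presumably why the quality scores are adjusted for overlap redundancy as described above.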

Genome annotation

Genes were identified using Prodigal [39] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [40]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [41].

Genome properties

The genome is 5,322,497 bp long and comprises one main chromosome with a 74.0% GC content (Table 3 and Figure 3). Of the 5,219 genes predicted, 5,161 were protein-coding genes and 58 were RNA genes. In addition, 350 pseudogenes were identified. The majority of the protein-coding genes (69.8%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
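The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the count of genes with a putative function is back-calculated from the rounded 69.8% and is therefore approximate, so Table 3 remains the authoritative source. The helper `gc_content` is a generic illustration of how a GC percentage is computed, not the tool used for the annotation.

```python
# Values quoted in the text above.
GENOME_BP = 5_322_497
TOTAL_GENES = 5_219
PROTEIN_CODING = 5_161
RNA_GENES = 58
FRACTION_WITH_FUNCTION = 0.698  # rounded percentage from the text


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("sequence is empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# The two gene classes should add up to the reported total.
assert PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Approximate counts derived from the rounded percentage (an estimate only).
with_function = round(FRACTION_WITH_FUNCTION * PROTEIN_CODING)
hypothetical = PROTEIN_CODING - with_function

# Gene density in genes per kilobase of genome.
genes_per_kb = TOTAL_GENES / (GENOME_BP / 1000)

if __name__ == "__main__":
    print(f"~{with_function} genes with putative function, ~{hypothetical} hypothetical")
    print(f"{genes_per_kb:.2f} genes per kb")
    print(f"GC of toy sequence: {gc_content('GCGCATAT'):.1f}%")
```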

Figure 3.
Graphical circular map of the genome. From outside to the center: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Table 3. Genome Statistics
Table 4. Number of genes associated with the general COG functional categories

Comparison with closest related genomes

Table 5 provides an overall comparison of the genome of G. obscurus strain G-20T with the closest available genomes, that is, those of Acidothermus cellulolyticus 11BT, Frankia alni ACN14A and N. multipartita Y-104T. The total length of (non-overlapping) high-scoring segment pairs (HSPs) and the number of identical base pairs within these HSPs were determined using the GGDC web server [42] by directly applying NCBI Blastn to the genomes represented as nucleotide sequences [43].
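The notion of a total length of non-overlapping HSPs can be made concrete with a small interval-merging sketch. It assumes HSPs are given as half-open `(start, end)` coordinates on one genome and simply takes the union of those intervals; the function names and the toy coordinates are hypothetical, and this is not a reimplementation of the GGDC server, whose exact treatment of overlapping HSPs may differ.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


def merge_intervals(hsps: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a non-overlapping set."""
    merged: List[List[int]] = []
    for start, end in sorted(hsps):
        if end < start:
            raise ValueError("interval end must not precede start")
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_length(hsps: Iterable[Interval]) -> int:
    """Total number of positions covered by at least one HSP."""
    return sum(e - s for s, e in merge_intervals(hsps))


def covered_fraction(hsps: Iterable[Interval], genome_len: int) -> float:
    """Share of the genome covered by HSPs, between 0 and 1."""
    if genome_len <= 0:
        raise ValueError("genome_len must be positive")
    return covered_length(hsps) / genome_len


if __name__ == "__main__":
    toy_hsps = [(0, 100), (50, 150), (200, 300)]  # illustrative coordinates
    print(merge_intervals(toy_hsps), covered_length(toy_hsps))
```

Counting each covered position once keeps a repeated region that attracts many HSPs from inflating the similarity between two genomes.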

Table 5. Percent-wise 16S rRNA sequence divergence 1

The number and proportion of shared homologs were determined using the ‘Phylogenetic Profiler’ function of the IMG system [41] with default values. While the relative order of 16S rRNA difference does not correspond to the genomic similarities, the four genome-based measures uniformly indicate that N. multipartita Y-104T possesses the genome most similar to that of G. obscurus G-20T, followed by F. alni ACN14A and A. cellulolyticus 11BT.