Introduction

Strain 134T (DSM 20109 = ATCC 482 = JCM 1489) is the type strain of the species Cellulomonas flavigena and was isolated from soil and first described in 1912 by Kellerman and McBeth [1], followed by a description in the first edition of Bergey’s Manual in 1923 [2].

Because of the absence of a definite proof linking the deposited strains to the original description Stackebrandt and Kandler proposed in 1979 C. flavigena and six other Cellulomonas strains as neotype strains of their respective species [3]. Here C. flavigena cells are reported as Gram-positive, non-motile and coryneform with snapping divisions [3].

In addition to the type species C. flavigena, the five Cellulomonas species, C. biazotea, C. cellasea, C. gelida, C. fimi and C. uda have been members of the genus since their original description in the first edition of Bergey’s Manual in 1923 [2]. Because of the phenetic resemblance of the different species to each other C. flavigena was recognized as the only species in the genus Cellulomonas in the eighth edition of Bergey’s Manual. This reduction to a single species was questioned by Braden and Thayer based on serological studies in 1976 [4] and by Stackebrandt and Kandler based on DNA reassociation studies in 1979 [3]. In 1980 the Approved Lists of Bacterial Names already listed six species: C. flavigena, C. biazotea, C. gelida, C. uda, C. fimi and C. cellasea [5]. Currently, 17 species belonging to the genus Cellulomonas are noted in the actual version of the List of Procaryotic names with Standing in Nomenclature [6]. Due to the cellulolytic activity of these organisms, their preferred habitats are cellulose enriched environments such as soil, bark, wood, and sugar fields, but they were also successfully isolated from rumen and from activated sludge. Here we present a summary classification and a set of features for C. flavigena 134T, together with the description of the complete genomic sequencing and annotation.

Classification and features

The 16S rRNA genes of the 16 other type strains in the genus Cellulomonas share between 92.2% (C. bogoriensis [7]) and 98.1% (C. persica [8]) sequence identity with strain 124T, whereas the other type strains from the family Cellulomonadaceae, which belong to the genera Actinotalea, Oerskovia, Paraoerskovia and Tropheryma, share less than 95.6% sequence identity [9]. Cultivated strains with highest sequence similarity include a so far unpublished strain 794 (Y09565) from human clinical specimen (99.7% sequence identity) and Everest-gws-44 (EU584517) from glacial meltwater at 6,350 m height on Mount Everest (98.1% sequence identity). The only reported uncultured clone with high sequence similarity (98.5%) originated from a diet-related composition of the gut microbiota of the earthworm Lumbricus rubellus [10]. Metagenomic surveys and environmental samples based on 16S rRNA gene sequences delivered no indication for organisms with sequence similarity values above 93–94% to C. flavigena, indicating that members of this species are not abundant in the so far screened habitats. The majority of these 16S rRNA gene sequences with similarity between 88% and 93% originate from marine metagenomes (status June 2010).

Figure 1 shows the phylogenetic neighborhood of C. flavigena 134T in a 16S rRNA based tree. The sequences of the two 16S rRNA gene copies in the genome differ by two nucleotides from each other and by up to four nucleotides from the previously published sequence generated from NCIMB 8073 (Z79463).

Figure 1.
figure 1

Phylogenetic tree highlighting the position of C. flavigena 134T relative to the other type strains within the family Cellulomonadaceae. The tree was inferred from 1,393 aligned characters [11,12] of the 16S rRNA gene sequence under the maximum likelihood criterion [13] and rooted with the type strain of the suborder Micrococcineae. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates [14] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [15] are shown in blue, published genomes in bold.

Cells of C. flavigena stain Gram-positive with a very fast rate of decolorization [3]. Cells in young broth cultures are typically coryneform with a snapping division (Table 1). In week old cultures a transformation to short rods can occur (Figure 2) [3]. On yeast extract-glucose agar C. flavigena forms smooth, glistening, yellow colonies about 5 mm in diameter. C. flavigena is described as non-motile [3,28], but according to Thayer et al. (1984) C. flavigena cells possess polar multitrichous flagella [31] (not visible in Figure 2). C. flavigena grows under aerobic conditions with an optimal growth temperature of 30°C [2] and an optimal pH of 7 [32].

Figure 2.
figure 2

Scanning electron micrograph of C. flavigena 134T.

Table 1. Classification and general features of C. flavigena 134T according to the MIGS recommendations [16].

Strain 134T is able to ferment glucose, maltose, sucrose, xylose and dextrin, but no fermentation of mannitol was observed [3]. While ribose, acetate and gluconate are utilized, there is no utilization of raffinose and L(+)-lactate [3]. It was shown by Kim et al. (1987) that gluconate is catabolized via the Entner-Doudoroff pathway and hexose monophosphate shunt [33]. C. flavigena produces catalase but no urease [3]. Esculin and gelatin are hydrolyzed and nitrate is not reduced to nitrite [3].

Chemotaxonomy

The peptidoglycan of C. flavigena contains as the diagnostic amino acid in position 3 of the peptide subunit ornithine with the interpeptide bridge containing D-aspartic acid. The major cell wall sugar is rhamnose, whereas mannose and ribose occur in minor amounts [34]. The major components of the fatty acid profile of C. flavigena are 12-methyltetradecanoic (ai-C15:0) and hexadecanoic (C16:0) acids; i-C15:0, ai-C17:0, C14:0 and C15:0 occur in lower amounts [35]. Menaquinone MK-9(H4) is the predominant isoprenoid quinone; minor amounts of MK-9(H2), MK-8(H4) and MK-7(H4) were detected [36]. The polar lipids consist of diphosphatidylglycerol, phosphatidylinositol and two so far unidentified phosphoglycolipids [37].

Genome sequencing and annotation

Genome project history

This organism was selected for sequencing on the basis of its phylogenetic position [38], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [39]. The genome project is deposited in the Genome OnLine Database [15] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information

Growth conditions and DNA isolation

C. flavigena 134T, DSM 20109, was grown in DSMZ medium 92 (Trypticase-Soy-Yeast Extract Medium) [40] at 30°C. DNA was isolated from 0.5–1 g of cell paste using Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany) following the standard protocol as recommended by the manufacturer.

Genome sequencing and assembly

The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website (http://www.jgi.doe.gov/). Pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 4,499 overlap ping fragments of 1,000 bp and entered into assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and adjust inflated q-scores. A hybrid 454/Sanger assembly was made using PGA assembler. Possible mis-assemblies were corrected and gaps between contigs were closed by primer walks off Sanger clones and bridging PCR fragments and by editing in Consed. A total of 704 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. 12,171,379 Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher [41]). The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Sanger and 454 sequencing platforms provided 65.38× coverage of the genome. The final assembly contains 46,659 Sanger reads and 601,307 pyrosequencing reads.

Genome annotation

Genes were identified using Prodigal [42] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [43]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation was performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [44].

Genome properties

The genome is 4,123,179 bp long and comprises one main circular chromosome with a 74.3% G+C content (Table 3 and Figure 3). Of the 3,788 genes predicted, 3,735 were protein-coding genes, and 53 RNAs; 57 pseudogenes were also identified. The majority of the protein-coding genes (71.1%) were assigned a putative function while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.

Figure 3.
figure 3

Graphical circular map of the genome. From outside to the center: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Table 3. Genome Statistics
Table 4. Number of genes associated with the general COG functional categories

Insights from genome sequence

A closer look on the genome sequence of C. flavigena revealed a set of genes which are probably responsible for the yellowish color of C. flavigena cells by encoding enzymes that are involved in the synthesis of carotenoids. Carotenoids are produced by the action of geranylgeranyl pyrophosphate synthase (Cfla_2893), squalene/phytoene synthase (Cfla_2892), phytoene desaturase (Cfla_2891), lycopene cyclase (Cfla_2890, Cfla_2889) and lycopene elongase (Cfla_2888). Cfla_2893 is declared as a pseudo gene, but when ignoring the frame shift the deduced amino acid sequence shows significant similarity to geranylgeranyl pyrophosphate synthases. Geranylgeranyl pyrophosphate synthases start the biosynthesis of carotenoids by combining farnesyl pyrophosphate with C5 isoprenoid units to C20-molecules, geranylgeranyl pyrophosphate. The phytoene synthase catalyzes the condensation of two geranylgeranyl pyrophosphate molecules followed by the removal of diphosphate and a proton shift leading to the formation of phytoene. Sequential desaturation steps are conducted by the phytoene desaturase followed by cyclisation of the ends of the molecules catalyzed by the lycopene cyclase [45]. It is remarkable that the genes belonging to the putative carotenoid biosynthesis clusters of Beutenbergia cavernae (Bcav_3492-Bcav_3488) [46], Leifsonia xyli subsp. xyli (crtE, crtB, crtI, crtYe, lctB, crtEb) and Sanguibacter keddieii (Sked_12750-Sked_12800) [47] have a similar size and show the same organization as in the genome of C. flavigena.

In the eighth edition of Bergey’s manual the members of the genus Cellulomonas are described as motile by one or a few flagella or non-motile, even within the genus both characteristics occur [32]. Regarding the motility of C. flavigena there are different observations described. Thayer et al. (1984) report the existence of polar multitrichous flagella [31], whereas Stackebrandt et al. (1979) and Schaal (1986) reported C. flavigena as non-motile [3,48]. In contrast to Thayer’s observation we found no genes coding proteins belonging to the category ‘flagellum structure and biogenesis’ in the genome sequence. Kenyon et al. (2005) report for the genus Cellulomonas a coherency between the production of curdlan, a β-1,3-glucan, and non-motility. They observed that the production of curdlan EPS by the non-motile C. flavigena leads to a closer adherence to cellulose and hemicellulose. In contrast, cells of the motile Cellulomonas strain C. gelida produce no curdlan EPS and are not directly attached to the cellulose fibers [28]. The production of curdlan by C. flavigena is consistent with the observation of 17 glycosyl transferases (GT) belonging to family 2, as β-1,3-glucan synthases are often found in this GT family.

The characteristic attribute of C. flavigena and the other members of the genus Cellulomonas is the ability to degrade cellulose, xylan and starch. The most molecular work has been done on cellulase and xylanase genes from C. fimi, but also cellulases, xylanases and chitinases of C. flavigena were identified and characterized [4952]. The genome sequence and the subsequent annotation revealed that 9.6% of encoded proteins are classified into the COG category ‘carbohydrate transport and metabolism’. Among them several genes coding for xylan degrading enzymes; 14 genes coding for putative endo-1,4-β-xylanases belonging to glycoside hydrolase family 10 and five genes encoding β-xylosidases. For the hydrolysis of cellulose the concerted action of endo-1,4-β-glucanases, 1,4-β-cellobiohydrolases and β-glucosidases is necessary. Endo-1,4-β-glucanases randomly cleave within the cellulose molecule and increase the number of non-reducing ends which are attacked by 1,4-β-cellobiohydrolases. The released cellobiose is cleaved by β-glucosidases. In the genome of C. flavigena two genes coding endo-1,4-β-glucanases (Cfla_0016, Cfla_1897), three genes encoding 1,4-β-cellobiohydrolases (Cfla_1896, Cfla_2912, Cfla_2913) and three genes coding β-glucosidases (Cfla_1129, Cfla_3027, Cfla_2913) were identified.