Introduction

Strain MP5ACTX9T (=ATCC BAA-1859T =DSM 23138T) is the type strain of Granulicella tundricola [tun.dri.co’la. N.L. n. tundra, tundra, a cold treeless region; L. masc. suffix -cola (from L. n. incola) dweller; N.L. n. tundricola tundra dweller]. It was isolated from soil at the Malla Nature Reserve, Kilpisjärvi, Finland (69°01′N, 20°50′E) and described along with other species of the genus Granulicella isolated from tundra soil [1].

Acidobacteria is a phylogenetically and physiologically diverse phylum [2,3], the members of which are ubiquitously found in diverse habitats and are abundant in most soil environments [4,5], including Arctic tundra soils [6,7]. Acidobacteria are relatively difficult to cultivate, as they have slow growth rates. To date, only subdivisions 1, 3, 4, 8, 10 and 23 Acidobacteria are defined by taxonomically characterized representatives [8–23], as well as three ‘Candidatus’ taxa [24,25]. The phylogenetic diversity, ubiquity and abundance of this group suggest that they play important ecological roles in soils. The abundance of Acidobacteria correlates with soil pH [26,27] and carbon [28,29], with subdivision 1 Acidobacteria being most abundant in slightly acidic soils. Acidobacteria, including members of the genera Granulicella and Terriglobus, dominate the acidic tundra heaths of northern Finland [26,30–32]. Using selective isolation techniques, we have been able to isolate several slow-growing and fastidious strains of Acidobacteria [1,11]. On the basis of phylogenetic, phenotypic and chemotaxonomic data, including 16S rRNA and rpoB gene sequence similarity and DNA-DNA hybridization, strain MP5ACTX9T was classified as a novel species of the genus Granulicella [1]. Here, we summarize the physiological features together with the complete genome sequence, annotation and data analysis of Granulicella tundricola strain MP5ACTX9T.

Classification and features

Within the genus Granulicella, eight species are described with validly published names: G. mallensis MP5ACTX8T, G. tundricola MP5ACTX9T, G. arctica MP5ACTX2T and G. sapmiensis S6CTX5AT, isolated from Arctic tundra soil [1], and G. paludicola OB1010T, G. paludicola LCBR1, G. pectinivorans TPB6011T, G. rosea TPO1014T and G. aggregans TPB6028T, isolated from sphagnum peat bogs [2].

Strain MP5ACTX9T shares 95.5–97.2% 16S rRNA gene identity with the tundra soil strains G. mallensis MP5ACTX8T (95.5%), G. arctica MP5ACTX2T (96.9%) and G. sapmiensis S6CTX5AT (97.2%), and 95.2–97.7% identity with the sphagnum peat bog strains G. pectinivorans TPB6011T (97.7%), G. rosea TPO1014T (97.2%), G. aggregans TPB6028T (96.8%), G. paludicola LCBR1 (95.9%) and G. paludicola OB1010T (95.3%). Phylogenetic analysis based on the 16S rRNA gene of taxonomically classified strains of the family Acidobacteriaceae placed G. rosea type strain T4T (AM887759) as the closest taxonomically classified relative of G. tundricola strain MP5ACTX9T (Table 1, Figure 1).
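The pairwise identity values above can be illustrated with a minimal sketch. The exact procedure used to compute the reported identities is not stated here, so the treatment of gaps (columns with a gap in either sequence are skipped) is an assumption, and the two aligned fragments are made-up toy strings, not real 16S rRNA gene sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the comparison; other
    conventions (e.g. counting gaps as mismatches) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 9 ungapped columns, 8 of them identical.
toy_a = "ACGTACGT-A"
toy_b = "ACGTTCGTGA"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

In practice the input would be a pairwise or multiple alignment of near full-length 16S rRNA gene sequences rather than short fragments.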

Figure 1.

Phylogenetic tree highlighting the position of G. tundricola MP5ACTX9T (shown in bold) relative to the other type strains within subdivision 1 Acidobacteria. The maximum likelihood tree was inferred from 1,361 aligned positions of the 16S rRNA gene sequences and derived based on the Tamura-Nei model using MEGA 5 [42]. Bootstrap values >50 (expressed as percentages of 1,000 replicates) are shown at branch points. Bar: 0.01 substitutions per nucleotide position. The corresponding GenBank accession numbers are displayed in parentheses. Strains whose genomes have been sequenced are marked with an asterisk: G. mallensis MP5ACTX8T (CP003130), G. tundricola MP5ACTX9T (CP002480), T. saanensis SP1PR4T (CP002467), T. roseus KBS63T (CP003379) and A. capsulatum ATCC 51196T (CP001472). Bryobacter aggregatus MPL3 (AM162405) in SD3 Acidobacteria was used as an outgroup.

Table 1. Classification and general features of G. tundricola strain MP5ACTX9T

Morphology and physiology

G. tundricola cells are Gram-negative, non-motile, aerobic rods, approximately 0.5 µm wide and 0.5–1.8 µm long. Colonies on R2A agar are pink, circular, convex and smooth. Growth occurs at 4–28°C and at pH 3.5–6.5, with an optimum at 21–24°C and pH 5 (Fig. 2). Genotypic analyses, including low rpoB gene sequence similarity, and phenotypic characteristics clearly distinguished strain MP5ACTX9T from other Granulicella species and strains, leading us to conclude that MP5ACTX9T represents a novel species of the genus Granulicella, for which the name Granulicella tundricola sp. nov. was proposed [1].

Figure 2.

Electron micrograph of G. tundricola MP5ACTX9T

Strain MP5ACTX9T hydrolyzed a range of complex to simple carbon substrates [1], including polysaccharides such as aesculin, pectin, laminarin, starch and pullulan, but not gelatin, cellulose, lichenan, sodium alginate, xylan, chitosan or chitin. Strain MP5ACTX9T also utilized the following growth substrates: D-glucose, maltose, cellobiose, D-fructose, D-galactose, lactose, lactulose, D-mannose, sucrose, trehalose, D-xylose, raffinose, N-acetyl-D-glucosamine, glutamate and gluconic acid. Enzyme activities reported for strain MP5ACTX9T include acid phosphatase, esterase (C4 and C8), leucine arylamidase, valine arylamidase, α-chymotrypsin, trypsin, naphthol-AS-BI-phosphohydrolase, α- and β-galactosidases, α- and β-glucosidases, N-acetyl-β-glucosaminidase, β-glucuronidase, α-fucosidase and α-mannosidase; the strain was negative for alkaline phosphatase and lipase (C14). Strain MP5ACTX9T is resistant to ampicillin, erythromycin, chloramphenicol, neomycin, streptomycin, tetracycline, gentamicin, bacitracin, polymyxin B and penicillin, but susceptible to rifampicin, kanamycin, lincomycin and novobiocin.

Chemotaxonomy

The major cellular fatty acids in G. tundricola are iso-C15:0 (46.4%), C16:1ω7c (35.0%) and C16:0 (6.6%). The cellular fatty acid composition of strain MP5ACTX9T was similar to that of other Granulicella strains with fatty acids iso-C15:0 and C16:1ω7c being most abundant in all strains. Strain MP5ACTX9T contains MK-8 as the major quinone and also contains 4% of MK-7.

Genome sequencing and annotation

Genome project history

G. tundricola strain MP5ACTX9T was selected for sequencing in 2009 by the DOE Joint Genome Institute (JGI) community sequencing program. The Quality Draft (QD) assembly and annotation were completed on May 24, 2010. The GenBank Date of Release was February 2, 2011. The genome project is deposited in the Genomes On-Line Database (GOLD) [43] and the complete genome sequence of strain MP5ACTX9T is deposited in GenBank (CP002480.1). Table 2 presents the project information and its association with MIGS version 2.0 [44].

Table 2. Project information.

Growth conditions and genomic DNA extraction

G. tundricola MP5ACTX9T was cultivated on R2 medium as previously described [1]. Genomic DNA (gDNA) of high sequencing quality was isolated using a modified CTAB method and evaluated according to the Quality Control (QC) guidelines provided by the DOE Joint Genome Institute [45].

Genome sequencing and assembly

The finished genome of G. tundricola MP5ACTX9T (JGI ID 4088693) was generated at the DOE Joint Genome Institute (JGI) using a combination of Illumina [46] and 454 technologies [47]. For this genome we constructed and sequenced an Illumina GAii shotgun library, which generated 42,620,699 reads totaling 3,239 Mb; a 454 Titanium standard library, which generated 146,119 reads; and three paired-end 454 libraries with an average insert size of 9.3 kb, which generated 178,757 reads, totaling 154.3 Mb of 454 data. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website [45]. The 454 Titanium standard data and the 454 paired-end data were assembled with Newbler, version 2.3. Illumina sequencing data were assembled with Velvet, version 0.7.63 [48]. The 454 Newbler consensus shreds, the Illumina Velvet consensus shreds and the read pairs in the 454 paired-end library were integrated using parallel phrap, version SPS - 4.24 (High Performance Software, LLC) [49]. The software Consed [50] was used in the finishing process. The Phred/Phrap/Consed software package [51] was used for sequence assembly and quality assessment in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI (Alla Lapidus, unpublished). Possible misassemblies were corrected using gapResolution (Cliff Han, unpublished), Dupfinisher [52] or sequencing cloned bridging PCR fragments with sub-cloning. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR (J-F Cheng, unpublished) primer walks. The final assembly is based on 29.1 Mb of 454 draft data, which provides an average 20× coverage of the genome, and 975 Mb of Illumina draft data, which provides an average 274× coverage of the genome.

Genome annotation

Genes were identified using Prodigal [53] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [54]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG [55,56] and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [57], RNAmmer [58], Rfam [59], TMHMM [60] and SignalP [61]. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [62].

Genome properties

The genome is 5,503,984 bp in size and comprises the 4,309,151 bp chromosome and five plasmids, pACIX901 (0.48 Mbp), pACIX902 (0.3 Mbp), pACIX903 (0.19 Mbp), pACIX904 (0.12 Mbp) and pACIX905 (0.12 Mbp), with a GC content of 59.9 mol%. There are 52 RNA genes (Figures 3 and 4, and Table 3). Of the 4,758 predicted genes, 4,706 are protein-coding genes (CDSs) and 163 are pseudogenes. Of the total CDSs, 68.8% are assigned to COG functional categories and 27.5% contain signal peptides. The distribution of genes into COG functional categories is presented in Figure 3, Table 4 and Table 5.
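As a quick consistency check on the figures above, the arithmetic can be sketched as follows. All input numbers are taken from this section; the variable names are illustrative, and because the plasmid sizes are quoted only to rounded Mbp values, their sum is expected to agree with the exact total only approximately.

```python
# Values quoted in the text.
genome_bp = 5_503_984
chromosome_bp = 4_309_151
total_genes = 4_758
protein_coding = 4_706
pseudogenes = 163

# Plasmid sizes as quoted (rounded, in Mbp).
plasmids_mbp = {
    "pACIX901": 0.48,
    "pACIX902": 0.30,
    "pACIX903": 0.19,
    "pACIX904": 0.12,
    "pACIX905": 0.12,
}

# RNA genes are the predicted genes that are not protein-coding.
rna_genes = total_genes - protein_coding

# Combined plasmid length, once from the exact totals and once from
# the rounded per-plasmid values.
plasmid_bp_exact = genome_bp - chromosome_bp
plasmid_mbp_rounded = sum(plasmids_mbp.values())

pseudogene_pct = 100.0 * pseudogenes / total_genes

print(f"RNA genes: {rna_genes}")
print(f"Plasmids (exact): {plasmid_bp_exact / 1e6:.2f} Mbp")
print(f"Plasmids (sum of rounded sizes): {plasmid_mbp_rounded:.2f} Mbp")
print(f"Pseudogenes: {pseudogene_pct:.1f}% of predicted genes")
```

The difference between the two plasmid totals (about 0.015 Mbp) is within what rounding of the individual plasmid sizes would produce.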

Figure 3.

Circular representation of the chromosome of G. tundricola MP5ACTX9T displaying relevant genome features. From outside to center: genes on forward strand (colored by COG categories), genes on reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content and GC skew.
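The two innermost tracks of such a circular map, GC content and GC skew, are typically computed over sliding windows along the sequence. The following is a minimal sketch, assuming the common definition of GC skew as (G − C)/(G + C); the window and step sizes used for the figure are not stated here, and the sequence below is a made-up toy string rather than genome data.

```python
from typing import Iterator, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """GC skew, assumed here to be (G - C) / (G + C)."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full window along a linear sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Hypothetical toy sequence and window parameters, for illustration only.
toy_seq = "GGGCATGCCCATGGGA"
profile: List[Tuple[int, float, float]] = [
    (start, gc_content(win), gc_skew(win))
    for start, win in sliding_windows(toy_seq, window=8, step=4)
]

for start, gc, skew in profile:
    print(f"start={start:2d}  GC={gc:.3f}  skew={skew:+.3f}")
```

For a circular replicon, windows would also wrap around the origin; this sketch treats the sequence as linear for simplicity.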

Figure 4.

Circular representation of the plasmids of G. tundricola MP5ACTX9T displaying relevant genome features. From outside to center: genes on forward strand (colored by COG categories), genes on reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content and GC skew. Order and size from left to right: pACIX901, 0.48 Mbp; pACIX902, 0.3 Mbp; pACIX903, 0.19 Mbp; pACIX904, 0.12 Mbp; pACIX905, 0.12 Mbp.

Table 3. Summary of genome: one chromosome and five plasmids
Table 4. Genome statistics.
Table 5. Number of genes associated with general COG functional categories.

Discussion

Granulicella tundricola MP5ACTX9T is a tundra soil strain with a genome consisting of a circular chromosome and five megaplasmids ranging in size from 1.1 × 10⁵ to 4.7 × 10⁵ bp, for a total genome size of 5.5 Mbp. The G. tundricola genome also contains close to twice as many pseudogenes, and a large number of mobile genetic elements, compared with Granulicella mallensis and Terriglobus saanensis, two other Acidobacteria isolated from the same habitat [29]. A large number of genes were assigned to COG functional categories for transport and metabolism of carbohydrates (6.9%) and amino acids (6.5%), cell envelope biogenesis (8%) and transcription (6.9%). Further genome analysis revealed an abundance of gene modules encoding functional activities within the carbohydrate-active enzymes (CAZy) families [63,64] involved in breakdown, utilization and biosynthesis of carbohydrates. G. tundricola hydrolyzed complex carbon polymers, including CMC, pectin, lichenin, laminarin and starch, and utilized sugars such as cellobiose, D-mannose, D-xylose and D-trehalose. Genome predictions for CDSs encoding enzymes such as cellulases, pectinases, alginate lyases, trehalase and amylases are in agreement with the biochemical activities of strain MP5ACTX9T. However, the genome of G. tundricola contained many CDSs encoding GH18 chitinases, although no chitinase activity was detected after a 10-day incubation with chitinazure [29]. In addition, the G. tundricola genome contained a cluster of genes in close proximity to the cellulose synthase gene (bcsAB), which included a cellulase (bcsZ; endoglucanase Y) of family GH8, a cellulose synthase operon protein (bcsC) and a cellulose synthase operon protein (yhjQ) involved in cellulose biosynthesis. We previously reported a detailed comparative genome analysis of G. tundricola MP5ACTX9T with other Acidobacteria strains for which finished genomes are available [29]. The data suggest that G. tundricola is involved in hydrolysis and utilization of stored carbohydrates and in biosynthesis of exopolysaccharides from organic matter and plant-based polymers in the soil. Therefore, G. tundricola may be central to carbon cycling processes in Arctic and boreal soil ecosystems.