Introduction

Strain IPP 1246T (= DSM 20469 = ATCC 33793 = JCM 10300) is the type strain of the species Atopobium parvulum and was first described by Weinberg et al. in 1937 as ‘Streptococcus parvulus’ (basonym) [1]. In 1992 it was reclassified as A. parvulum [2]. A. parvulum is of high interest because it has frequently been isolated from the human oral cavity, especially from the tongue dorsum, where it has been associated with patients suffering from halitosis (oral malodor) [3,4]. In general, the malodorous compounds are volatile sulfur compounds, the most frequent being hydrogen sulfide, methyl mercaptan, and dimethyl sulfide, which are produced by bacterial metabolism of the sulfur-containing amino acids cysteine and methionine [3,4]. However, for A. parvulum itself, the production of these substances has not yet been studied. A. parvulum has not been found to be significantly associated with chronic periodontitis, though its participation in periodontitis cannot be fully excluded [5]. Nevertheless, A. parvulum has been associated with odontogenic infections, e.g., dental implants, but also with the saliva of healthy subjects [6]. Here we present a summary classification and a set of features for A. parvulum IPP 1246T, together with the description of the complete genomic sequencing and annotation.

Classification and features

Phylotypes with significant 16S sequence similarity to strain IPP 1246T were observed from intubated patients (EF510777) and from metagenomic human skin surveys (GQ081350) [7]. No significant similarities were found in human gut metagenomes (highest similarity is 92%, BABE01011651) [8], or in marine metagenomes (87%, AACY020304192) [9] (status June 2009).

Figure 1 shows the phylogenetic neighborhood of A. parvulum strain IPP 1246T in a 16S rRNA based tree. The sequence of the sole copy of the 16S rRNA gene in the genome is identical with the previously published sequence generated from ATCC 33793 (AF292372), but differs by four nucleotides from the sequence used for the last taxonomic emendation (X67150) [2].

Figure 1.

Phylogenetic tree of A. parvulum strain IPP 1246T, all other type strains of the genus Atopobium and the type strains of all other genera within the Coriobacteriaceae, inferred from 1345 aligned characters [10,11] of the 16S rRNA gene sequence under the maximum likelihood criterion [12]. The tree was rooted with the type strains of the genera within the subclass Rubrobacteridae. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1000 bootstrap replicates if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [13] are shown in blue, published genomes in bold, including two that are reported in this issue of SIGS [14,15].

The cells are cocci (approximately 0.3 to 0.6 µm in diameter) that occur singly, in pairs, in clumps, and in short chains, occasionally with central swelling [16,17] (Table 1 and Figure 2). The strains are non-motile and obligately anaerobic. Growth is substantially stimulated by 0.02% (vol/vol) Tween 80 and by 10% (vol/vol) rabbit serum added to culture media [16]. Strain IPP 1246T is susceptible to chloramphenicol (12 µg/ml), clindamycin (1.6 µg/ml), erythromycin (3 µg/ml), penicillin G (2 U/ml), and tetracycline (6 µg/ml) [17].

Figure 2.

Scanning electron micrograph of A. parvulum IPP 1246T

Table 1. Classification and general features of A. parvulum IPP 1246T according to the MIGS recommendations [18].

Strain IPP 1246T produces acid (final pH < 4.7) from cellobiose, esculin, fructose, galactose, glucose, inulin, lactose, maltose, mannose, salicin, sucrose, and trehalose; erythritol and xylose are weakly fermented; no acid is produced from amygdalin, arabinose, glycerol, glycogen, inositol, mannitol, melezitose, melibiose, pectin, raffinose, rhamnose, ribose, sorbitol, or starch. Esculin is hydrolyzed; neither starch nor hippurate is hydrolyzed. Nitrate is not reduced. Indole is not formed. A solid acid curd forms in milk; neither milk, gelatin, nor meat is digested. Neither catalase, urease, deoxyribonuclease, lecithinase, nor lipase was detected [17]. As determined using the API system, the strain is positive for acid phosphatase, alanine arylamidase, arginine arylamidase, β-galactosidase, leucine arylamidase, pyroglutamic acid arylamidase, glycine arylamidase, and tyrosine arylamidase, but negative for arginine dihydrolase, histidine arylamidase, proline arylamidase, and serine arylamidase [24].

Chemotaxonomy

The chemotaxonomy of A. parvulum IPP 1246T has been little studied. No data are available on the polar lipids. The strain possesses cell-wall peptidoglycan of type A4α, L-Lys-D-Asp (type A11.31 according to the DSMZ catalogue of strains; http://www.dsmz.de/microorganisms/main.php?content_id=35) [25]. The major cellular fatty acids (FAME: fatty acid methyl ester; DMA: dimethyl acetal) are C18:1 cis-9 (38.2%, FAME), C18:1 cis-9 (24.1%, DMA), C16:1 cis-9 (5.0%, FAME), C17:1 cis-8 (5.0%, FAME), C18:1 c11/t9/t6 (5.0%, FAME), C18:1 cis-11 (3.9%, DMA), C14:0 (3.4%, FAME), and C10:0 (3.0%, FAME) [16].

Genome sequencing and annotation

Genome project history

This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [13] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Table 2. Genome sequencing project information

Growth conditions and DNA isolation

A. parvulum strain IPP 1246T, DSM 20469, was grown anaerobically in DSMZ medium 104 (modified PYG medium [26]) at 37°C. DNA was isolated from 0.5–1 g of cell paste using the JGI CTAB procedure with a modified protocol for cell lysis as described in Wu et al. [27].

Genome sequencing and assembly

The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found on the JGI website. 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 1,716 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [28] or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walking or PCR amplification. A total of 125 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. Together, all sequence types provided 51.2× coverage of the genome. The final assembly contains 12,842 Sanger and 359,479 pyrosequence reads.
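The shredding of large contigs into overlapping 1,000 bp pseudo-reads can be illustrated with a minimal sketch. This is not the JGI pipeline code. The overlap between neighboring fragments is not stated in the text, so the 500 bp default used here is an assumption, as are the function and parameter names.

```python
def shred_contig(contig, fragment_len=1000, overlap=500):
    """Split a contig into overlapping fragments ("pseudo-reads").

    fragment_len matches the 1,000 bp fragment size given in the text.
    overlap is a hypothetical value; the text does not report it.
    Returns a list of (start_offset, fragment_sequence) tuples.
    """
    if fragment_len <= 0 or not 0 <= overlap < fragment_len:
        raise ValueError("need fragment_len > 0 and 0 <= overlap < fragment_len")
    if not contig:
        return []

    step = fragment_len - overlap  # distance between fragment starts
    fragments = []
    start = 0
    while True:
        end = start + fragment_len
        # The final fragment may be shorter than fragment_len.
        fragments.append((start, contig[start:end]))
        if end >= len(contig):
            break
        start += step
    return fragments


if __name__ == "__main__":
    toy_contig = "ACGT" * 625  # 2,500 bp toy sequence, not real data
    for offset, fragment in shred_contig(toy_contig):
        print(offset, len(fragment))
```

Each fragment shares its trailing bases with the start of the next one, so that an overlap-based assembler such as phrap can rejoin the pseudo-reads with the Sanger reads.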

Genome annotation

Genes were identified using Prodigal [29] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [30]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [31].

Genome properties

The genome is 1,543,805 bp long and comprises one main circular chromosome with a 45.7% GC content (Table 3 and Figure 3). Of the 1,419 genes predicted, 1,369 were protein-coding genes and 50 were RNA genes. Sixteen pseudogenes were also identified. The majority of the genes (74.5%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
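The proportions implied by these counts can be checked with a short calculation. The sketch below uses only the figures quoted in the paragraph above. It assumes that the 74.5% figure refers to all 1,419 predicted genes, which the text suggests but does not state explicitly; the variable and function names are illustrative.

```python
# Values quoted in the text.
GENOME_LENGTH_BP = 1_543_805
TOTAL_GENES = 1419
PROTEIN_CODING_GENES = 1369
RNA_GENES = 50
FUNCTION_ASSIGNED_FRACTION = 0.745  # assumed to be relative to all genes


def summarize_gene_counts():
    """Derive simple proportions from the reported gene counts."""
    # Sanity check: the two gene classes should add up to the total.
    if PROTEIN_CODING_GENES + RNA_GENES != TOTAL_GENES:
        raise ValueError("gene classes do not sum to the total")
    return {
        "protein_coding_pct": 100 * PROTEIN_CODING_GENES / TOTAL_GENES,
        "rna_pct": 100 * RNA_GENES / TOTAL_GENES,
        # Estimated count of genes with a putative function.
        "genes_with_function_est": round(FUNCTION_ASSIGNED_FRACTION * TOTAL_GENES),
        # Gene density in genes per megabase.
        "genes_per_mb": TOTAL_GENES / (GENOME_LENGTH_BP / 1_000_000),
    }


if __name__ == "__main__":
    for key, value in summarize_gene_counts().items():
        print(f"{key}: {value:.2f}")
```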

Figure 3.

Graphical circular map of the genome. From outside to the center: Genes on forward strand (color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.
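The two innermost tracks of such a map, GC content and GC skew, are commonly computed over sliding windows along the chromosome. The following sketch uses the usual definitions, GC content = (G + C) / (A + C + G + T) and GC skew = (G − C) / (G + C). The window and step sizes used for Figure 3 are not given in the text, so the defaults below are assumptions, and the function names are illustrative.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def gc_profile(seq, window=1000, step=500):
    """Return (start, gc_content, gc_skew) for each window along seq.

    window and step are hypothetical defaults, not values from the paper.
    The sequence is treated as linear; wrap-around at the origin of a
    circular chromosome is not handled in this sketch.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [
        (start, gc_content(seq[start:start + window]),
         gc_skew(seq[start:start + window]))
        for start in range(0, last_start, step)
    ]


if __name__ == "__main__":
    toy = "GGGCAAAA"  # toy sequence, not real data
    print(gc_profile(toy, window=4, step=4))
```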

Table 3. Genome Statistics
Table 4. Number of genes associated with the general COG functional categories