Introduction

Parvibaculum lavamentivorans strain DS-1T (DSM13023 = NCIMB13966) was isolated for its ability to degrade linear alkylbenzenesulfonate (LAS), a major laundry surfactant with a worldwide use of 2.5 million tons per annum [1]. Strain DS-1T was difficult to isolate, is difficult to cultivate, and represents a novel genus in the Alphaproteobacteria [2,3]. Strain DS-1T catalyzes the degradation not only of LAS, but also of 16 other commercially important anionic and non-ionic surfactants (hence the species name lavamentivorans = consuming [chemicals] used for washing [3]). The initial degradation catalyzed by strain DS-1T involves the activation and shortening of the alkyl chain of the surfactant molecules and the excretion of short-chain degradation intermediates. These intermediates are then completely utilized by other bacteria in the community [4,5]. P. lavamentivorans DS-1T is therefore an example of a first-tier member of a two-step process that mineralizes environmentally important surfactants.

Other representatives of the novel genus Parvibaculum have been recently isolated. Parvibaculum sp. strain JP-57 was isolated from seawater [6] and is also difficult to cultivate [3]. Parvibaculum indicum sp. nov. was also isolated from seawater, via an enrichment culture that degraded polycyclic aromatic hydrocarbons (PAH) and crude oil [7]. Another Parvibaculum sp. strain was isolated from a PAH-degrading enrichment culture, using river sediment as inoculum [8]. Parvibaculum species were also reported in a study on marine alkane-degrading bacteria [9]. Parvibaculum species are frequently detected by cultivation-independent methods, predominantly in habitats or settings with hydrocarbon degradation. These include a bacterial community on marine rocks polluted with diesel oil [10], a bacterial community from diesel-contaminated soil [11], a petroleum-degrading bacterial community from seawater [12], an oil-degrading cyanobacterial community [13] and biofilm communities in pipes of a district heating system [14]. Parvibaculum species have also been detected in denitrifying, linear-nonylphenol (NP) degrading enrichment cultures from NP-polluted river sediment [15] and in groundwater that had been contaminated by linear alkyl benzenes (LABs; non-sulfonated LAS) [16]. Additionally, Parvibaculum species were detected in biofilms that degraded polychlorinated biphenyls (PCBs) using pristine soil as inoculum [17], and in a PAH-degrading bacterial community from deep-sea sediment of the West Pacific [18]. Finally, Parvibaculum species were detected in an autotrophic Fe(II)-oxidizing, nitrate-reducing enrichment culture [19], as well as in Tunisian geothermal springs [20]. The widespread occurrence of Parvibaculum species in habitats or settings related to hydrocarbon degradation implies an important function and role of these organisms in environmental biodegradation, even though they are difficult to cultivate in the laboratory.

Here we present a summary classification and a set of features for P. lavamentivorans DS-1T, together with the description of a complete genome sequence and annotation. The genome sequencing and analysis was part of the Microbial Genome Program of the DOE Joint Genome Institute.

Classification and features

P. lavamentivorans DS-1T is a Gram-negative, non-pigmented, very small (approx. 1.0 × 0.2 µm), slightly curved rod-shaped bacterium that can be motile by means of a polar flagellum (Figure 1, Table 1). Strain DS-1T grows very slowly on complex medium (e.g. on LB- or peptone-agar plates) and forms pinpoint colonies only after more than two weeks of incubation. The organism can be quickly overgrown by other organisms. Larger colonies are obtained when the complex medium is supplemented with a surfactant, e.g. Tween 20 (see DSM-medium 884 [29]) or LAS [3]. When cultivated in liquid culture with mineral-salts medium, strain DS-1T grows within one week with the single carbon sources acetate, ethanol, or succinate, or alkanes, alkanols and alkanoates (C8–C16); no sugars tested were utilized [3].

Figure 1.

Scanning electron micrograph of P. lavamentivorans DS-1T. Cells derived from a liquid culture that grew in acetate/mineral salts medium.

Table 1. Classification and general features of Parvibaculum lavamentivorans DS-1T.

To allow for growth in liquid culture with most of the 16 different surfactants at high concentrations (e.g. for LAS, >1 mM; see [3]), the culture fluid needs to be supplemented with a solid surface, e.g. polyester fleece or glass fibers [2,3]. The additional solid surface is believed to support biofilm formation, especially in the early growth phase when the surfactant concentration is high; during the later growth phase, the organism grows as single, suspended, non-motile cells. Growth with a substrate that is not membrane-toxic (e.g. acetate) is independent of a solid surface and yields suspended, single, motile cells. We presume that the biofilm formation by strain DS-1T is a protective response to the exposure to membrane-solubilizing agents (cf. [30]).

Based on the 16S rRNA gene sequence, strain DS-1T was described as the novel genus Parvibaculum, which was originally placed in the family Phyllobacteriaceae within the order Rhizobiales of Alphaproteobacteria [3,31]. The nearest well-described organism to strain DS-1T is Afifella marina (formerly Rhodobium marinum) (92% 16S rRNA gene sequence identity), a photosynthetic purple non-sulfur bacterium. The genus Rhodobium was later re-classified as a member of the novel family Rhodobiaceae [26,32], together with two novel genera of other photosynthetic purple non-sulfur bacteria (Afifella and Roseospirillum), as well as with two novel genera of heterotrophic aerobic bacteria, represented by the red-pigmented Anderseniella baltica (gen. nov., sp. nov.) [33,34] and non-pigmented Tepidamorphus gemmatus (gen. nov., sp. nov.) [35,36]. A phylogenetic tree (Figure 2) was constructed with the 16S rRNA gene sequence of P. lavamentivorans DS-1T and that of (i) other isolated Parvibaculum strains, (ii) representatives of other genera within the family Rhodobiaceae, (iii) representatives of the genera in the family Phyllobacteriaceae, and (iv) representatives of other families within the order Rhizobiales. The phylogenetic tree now places the Parvibaculum species within the family Rhodobiaceae and shows that the Parvibaculum sequences cluster as a distinct evolutionary lineage within this family (Figure 2). This classification of Parvibaculum has been adopted in the Ribosomal Database Project (RDP) and SILVA rRNA Database Project, but not in the GreenGenes database. The family Rhodobiaceae has also not been included in the NCBI-taxonomy, IMG-taxonomy, and GOLD databases.

Figure 2.

Phylogenetic tree of 16S rRNA gene sequences showing the position of P. lavamentivorans DS-1T relative to other type strains within the families Rhodobiaceae, Phyllobacteriaceae and other families in the order Rhizobiales (see the text). Strains within the Rhodobiaceae and Phyllobacteriaceae shown in bold have genome projects underway or completed. The corresponding 16S rRNA gene accession numbers (or draft genome sequence identifiers) are indicated. The sequences were aligned using the GreenGenes NAST alignment tool [37]; neighbor-joining tree building and visualization involved the CLUSTAL and DENDROSCOPE software [38]. Caulobacterales sequences were used as outgroup. Bootstrap values >30 % are indicated; bar, 0.01 substitutions per nucleotide position.

Currently, 360 genome sequences of members of the order Rhizobiales of Alphaproteobacteria have been made available (GOLD database; August 2011), and within the family Phyllobacteriaceae there are 21 genome sequences available (Chelativorans sp. BNC1, Hoeflea phototrophica DFL-43, and 18 Mesorhizobium strains). No genome sequences currently exist for a representative of the novel family Rhodobiaceae, except for the genome of P. lavamentivorans DS-1T.

Chemotaxonomy

Examination of the respiratory lipoquinone composition of strain DS-1T showed that ubiquinones are the sole respiratory quinones present, and the major lipoquinone is ubiquinone 11 (Q11) [3]. The fatty acids of P. lavamentivorans found in membrane fractions are straight-chain saturated and unsaturated fatty acids, as well as ester- and amide-linked hydroxy-fatty acids [3]. The major polar lipids are phosphatidyl glycerol, diphosphatidyl glycerol, phosphatidyl ethanolamine, phosphatidyl choline, and two unidentified aminolipids; the presence of the two additional aminolipids appears to be distinctive of the organism [3]. The G+C content of the DNA was determined to be 64% [3], which corresponds well to the G+C content observed for the complete genome sequence (see below).
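The sequence-derived G+C value that the chemotaxonomic estimate is compared against is a simple base count. The following is a minimal sketch of that calculation, not the pipeline used for this genome; the function name `gc_content` and the toy sequence are hypothetical, and ambiguous symbols such as N are assumed to be excluded from the denominator.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


# Toy sequence for illustration only (not taken from the DS-1T genome).
toy = "GGCCATGCGCNNAT"
toy_gc = gc_content(toy)
print(f"G+C content of toy sequence: {toy_gc:.2f}%")
```

Applied to the finished chromosome sequence, the same count would give the genome-wide value reported under "Genome properties".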

Genome sequencing information

Genome project history

The genome was selected for sequencing as part of the U.S. Department of Energy - Microbial Genome Program 2006. The DNA sample was submitted in April 2006 and the initial sequencing phase was completed in October 2006. The genome finishing and assembly phases were completed in June 2007, and the genome was released for public access in December 2007; a modified version was released in February 2011. Table 2 presents the project information and its association with MIGS version 2.0 compliance [39].

Table 2. Project information

Growth conditions and DNA isolation

P. lavamentivorans DS-1T was grown on LB agar plates (2 weeks) and pinpoint colonies were transferred into selective medium (1 mM LAS/minimal salts medium; with glass-fiber supplement, 5-ml scale [3]). This culture was sub-cultivated to larger scale (100-ml and 1-liter scale) in 30 mM acetate/minimal salts medium; cell pellets were stored frozen until DNA preparation. DNA was prepared following the JGI DNA Isolation Bacterial CTAB Protocol [40].

Genome sequencing and assembly

The genome of P. lavamentivorans DS-1T was sequenced at the Joint Genome Institute (JGI) using a combination of 3.5 kb, 9 kb and 37 kb DNA libraries. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website [41]. Draft assemblies were based on 76,870 reads. Combined, the reads from all three libraries provided 16-fold coverage of the genome. The Phred/Phrap/Consed software package [42] was used for sequence assembly and quality assessment [43-45]. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [46], PCR amplification, or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI, USA). Gaps between contigs were closed by editing in Consed, custom primer walks or PCR amplification (Roche Applied Science, Indianapolis, IN, USA). A total of 24 primer walk reactions were necessary to close gaps and to raise the quality of the finished sequence. The completed genome assembly contains 76,885 reads, achieving an average of 16-fold sequence coverage per base with an error rate of less than 5 in 100,000.
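As a rough consistency check on these figures, the sketch below relates read count, coverage and genome size, and converts the stated error bound to the Phred scale. It assumes that coverage is the simple ratio of total assembled read bases to genome length, and it uses the chromosome length of 3,914,745 bp reported under "Genome properties"; the variable names are illustrative and the resulting read length is an implied average, not a value stated in the text.

```python
import math

reads = 76_885          # reads in the completed assembly (from the text)
coverage = 16           # reported average fold coverage
genome_bp = 3_914_745   # chromosome length from "Genome properties"

# coverage = reads * mean_read_length / genome_size, solved for read length
mean_read_len = coverage * genome_bp / reads

# Upper bound on the error rate, expressed as a Phred score: Q = -10 * log10(p)
error_rate = 5 / 100_000
phred_q = -10 * math.log10(error_rate)

print(f"Implied mean read length: {mean_read_len:.0f} bp")
print(f"Error bound on the Phred scale: Q{phred_q:.1f}")
```

Under these assumptions the implied average is roughly 800 bases of assembled sequence per read, and the error bound corresponds to a consensus quality of about Q43.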

Genome annotation

Genes were identified using a combination of Critica [47] and Glimmer [48] as part of the genome annotation pipeline at Oak Ridge National Laboratory (ORNL), Oak Ridge, TN, USA, followed by a round of manual curation. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases; miscellaneous features were predicted using TMHMM [49] and SignalP [50]. These data sources were combined to assert a product description for each predicted protein. The tRNAscan-SE tool [51] was used to find tRNA genes, whereas ribosomal RNAs were found by using BLASTn against the ribosomal RNA databases. The RNA components of the protein secretion complex and the RNaseP were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [52]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [41] developed by the Joint Genome Institute, Walnut Creek, CA, USA [53].

Genome properties

The genome of P. lavamentivorans DS-1T comprises one circular chromosome of 3,914,745 bp (62.33% GC content) (Figure 3), for which a total of 3,714 genes were predicted. Of these predicted genes, 3,654 are protein-coding genes; 2,723 of the protein-coding genes were assigned a putative function and the remainder were annotated as hypothetical proteins. In addition, 18 pseudogenes were identified. A total of 60 RNA genes and one rRNA operon are predicted; the latter is reflective of the slow growth of P. lavamentivorans DS-1T [54,55]. Furthermore, one Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) element, including associated protein genes, was predicted. The properties and the statistics of the genome are summarized in Table 3, and the distribution of genes into COGs functional categories is presented in Table 4.
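The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the derived values (number of hypothetical proteins, share of genes with a putative function, genes per kilobase) are computed here for illustration and assume that every protein-coding gene without a putative function was annotated as hypothetical.

```python
total_genes = 3_714
protein_coding = 3_654
rna_genes = 60
with_function = 2_723
genome_bp = 3_914_745

# Protein-coding and RNA genes should add up to the total gene count.
assert protein_coding + rna_genes == total_genes

hypothetical = protein_coding - with_function
pct_function = 100 * with_function / protein_coding
genes_per_kb = total_genes / (genome_bp / 1000)

print(f"Hypothetical proteins: {hypothetical}")
print(f"Protein-coding genes with putative function: {pct_function:.1f}%")
print(f"Gene density: {genes_per_kb:.2f} genes per kb")
```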

Figure 3.

Graphical circular map of the genome of P. lavamentivorans DS-1T. From outside to center: Genes on forward strand (color by COG categories), genes on reverse strand (color by COG categories), RNA genes (tRNA, green; rRNA, red; other RNAs, black), GC content, GC skew.

Table 3. Nucleotide and gene count levels of the genome of P. lavamentivorans DS-1T
Table 4. Number of genes associated with the general COG functional categories in P. lavamentivorans DS-1T

Metabolic features

The genome of P. lavamentivorans encodes complete pathways for synthesis of all proteinogenic amino acids and essential co-factors, and the central metabolism is represented by a complete pathway for the citrate cycle, glycolysis/gluconeogenesis, and the non-oxidative branch of the pentose-phosphate pathway; no candidate genes for the oxidative branch of the pentose-phosphate pathway or for the Entner-Doudoroff pathway are predicted.

P. lavamentivorans DS-1T does not grow on D-glucose, D-fructose, maltose, D-mannitol, D-mannose, or N-acetylglucosamine [3,7], and there are no valid candidate genes predicted in the genome for ATP-dependent sugar uptake systems or for D-glucose uptake via a phosphotransferase system. Similarly, no valid candidate genes were predicted for ATP-dependent amino-acid and di/oligo-peptide transport systems or for other amino-acid/peptide transporters, which is consistent with the poor growth of strain DS-1T in complex medium (LB-medium).

For the assimilation of acetyl-CoA from the degradation of alkanes and surfactants [2,3,5], or during growth with acetate, the genome of P. lavamentivorans encodes the glyoxylate cycle (isocitrate lyase, Plav_0592; malate synthase, Plav_0593) to generate succinate for the synthesis of carbohydrates. The genome also encodes the complete ethyl-malonyl-CoA pathway to assimilate acetate [56]. This observation, i.e. glyoxylate cycle and ethyl-malonyl-CoA pathway in the same organism, has been made before [57], and these two pathways in P. lavamentivorans DS-1T might be differentially expressed under varying environmental conditions.

For the degradation of alkanes and surfactants through abstraction of acetyl-CoA [54], the genome contains a wealth of candidate genes for the entry into alkyl-chain degradation (omega-oxygenation to activate the chain) supplemented by a variety of genes predicted for omega-oxidations (to generate the corresponding fatty-acids) and fatty-acid beta-oxidations (to excise acetyl-CoA units). We are currently exploring this high abundance of genes for alkane/alkyl-utilization in strain DS-1T by transcriptional and translational analysis [unpublished]. For example, at least nine cytochrome-P450 (CYP) alkane monooxygenase (COG2124), 44 alcohol dehydrogenase (COG1028), 11 aldehyde dehydrogenase (COG1012), 20 acyl-CoA synthetase (COG0318), 40 acyl-CoA dehydrogenase (COG1960), 31 enoyl-CoA hydratase (COG1024), 14 acyl-CoA acetyl-transferase (COG0183), six thioesterase (COG0824), and 17 putative long-chain acyl-CoA thioester hydrolase (PF03061) candidate genes are predicted in the genome.
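To give a sense of scale, the candidate-gene counts listed above can be tallied and related to the 3,654 predicted protein-coding genes. This is an illustrative sum only: it assumes the listed families do not overlap, and several of the COG families are broad, so the total is an upper estimate of candidates rather than a count of confirmed alkane/alkyl-degradation genes. The dictionary keys are labels chosen here for readability.

```python
# Candidate-gene counts per family, as listed in the text.
candidates = {
    "cytochrome-P450 alkane monooxygenase (COG2124)": 9,
    "alcohol dehydrogenase (COG1028)": 44,
    "aldehyde dehydrogenase (COG1012)": 11,
    "acyl-CoA synthetase (COG0318)": 20,
    "acyl-CoA dehydrogenase (COG1960)": 40,
    "enoyl-CoA hydratase (COG1024)": 31,
    "acyl-CoA acetyl-transferase (COG0183)": 14,
    "thioesterase (COG0824)": 6,
    "long-chain acyl-CoA thioester hydrolase (PF03061)": 17,
}

protein_coding = 3_654  # predicted protein-coding genes in the genome

total_candidates = sum(candidates.values())
share = 100 * total_candidates / protein_coding

print(f"Candidate genes listed: {total_candidates}")
print(f"Share of protein-coding genes: {share:.1f}%")
```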

Other predicted oxygenase genes comprise three putative Baeyer-Villiger-type FAD-binding monooxygenase genes (COG2072). Cyclohexanone and hydroxyacetophenone, which are putative substrates for such oxygenases (e.g. [58,59]), as well as cycloalkanes (C6, C8, C12), were tested as carbon sources for growth of strain DS-1T; none supported growth. The terpenoids camphor (for the involvement of a cytochrome-P450 oxygenase in the degradation pathway [60]) and geraniol, citronellol, linalool, menthol and eucalyptol (for the involvement of acyl-CoA interconversion enzymes in the degradation pathways) were also tested as growth substrates, and none supported growth.

In contrast to the high abundance of genes for aliphatic-hydrocarbon degradation, the genome contains few genes for aromatic-hydrocarbon degradation. One gene set for an aromatic-ring dioxygenase component (Plav_1761 and 1762; BenAB-type), three aromatic-ring monooxygenase component genes (Plav_1541 and 0131, MhpA-type; Plav_1785, HpaB-type), and three valid candidate genes for extradiol ring-cleavage dioxygenase (Plav_1539 [61] and 1787, BphC-type; Plav_0983, LigB-type) were predicted in the genome. Strain DS-1T did not grow with benzoate, protocatechuate, phenylacetate, phenylpropionate, or phenylalanine and tyrosine as carbon source when tested.

Finally, P. lavamentivorans DS-1T is predicted to store carbon in the form of intracellular polyhydroxyalkanoate/butyrate (PHB), as its genome encodes a PHB-synthase (PhbC) gene (Plav_1129), a PHB-depolymerase (PhaZ) gene (Plav_0012), and a PHB-synthesis repressor (PhaR) gene (Plav_1572).