Introduction

The genus Lampropedia, a member of the family Comamonadaceae [1], was established by Schroeter in 1886 [2] with the description of the square, tablet-forming cells of Lampropedia hyalina. Since then, strains of L. hyalina have been isolated from pond water [2], liquid manure of a dairy farmyard [3], a fistulated heifer [4] and activated sludge [5]. The activated sludge isolate was tested for its phosphate removal capabilities and was classified in the functional group of polyphosphate-accumulating microorganisms [5]. Another species, L. cohaerens strain CT6T [6], was isolated from arsenic-rich microbial mats of a Himalayan hot water spring at Manikaran, India, as part of our continuing efforts to explore the culturable [7–10] and unculturable [11] diversity of this hot spring and to understand the role played by niche-specific genetic determinants in shaping the genomes of organisms inhabiting this stressed niche. L. cohaerens, a biofilm-forming and arsenic-tolerating bacterium [6], showed limited carbohydrate assimilation potential but could utilize some organic acids. Currently, the genus Lampropedia is represented by three species: L. hyalina ATCC 11041T [12], “L. puyangensis 2-binT” (not validly published) [13] and L. cohaerens CT6T [6]. The description of the genus was emended accordingly [6]; however, the genomic potential of this small group remains unresolved. The genome of strain CT6T, the type strain of Lampropedia cohaerens, was sequenced in order to supplement the phenotypic taxonomic observations with genetic data and to obtain genomic insights into the heavy metal resistance and metabolic potential of this microbial mat dweller. Here, we describe the summary classification, properties, genome sequencing, assembly and annotation of L. cohaerens CT6T (DSM 100029T = KCTC 42939T).

Organism information

Classification and features

L. cohaerens was characterized using a polyphasic approach that integrated genotypic, phenotypic and chemotaxonomic methods [6]. This Gram-stain-negative, aerobic bacterial strain forms white, smooth colonies with irregular margins on LB agar [6]. Transmission electron microscopy (TEM) revealed coccoid, unflagellated cells approximately 0.62 μm × 0.39 μm in dimension (Fig. 1). Summary characteristics are given in Table 1. The slightly thermophilic and arsenic-tolerant L. cohaerens strain CT6T can tolerate temperatures in the range 20–55 °C and arsenic trioxide up to 80 parts per billion [6]. Strain CT6T tolerated NaCl at 1–3 % (w/v) and a pH range of 6–9. Biofilm formation was observed in LB medium, which inspired the species epithet. On the basis of 16S rRNA gene sequences, L. cohaerens showed the closest phylogenetic similarity to “L. puyangensis 2-binT” (96.4 %) and L. hyalina ATCC 11041T (95.4 %). A maximum-likelihood [14] phylogenetic tree, constructed in MEGA version 6 [16] under the Jukes-Cantor model [15] from 16S rRNA gene sequences of closely related members of the family Comamonadaceae identified by BLASTN [17], placed strain CT6T with the members of the genus Lampropedia with a bootstrap [18] confidence value of 98 % (Fig. 2). Positive biochemical tests included the hydrolysis of Tween 20, Tween 80 and starch and the utilization of capric acid, malic acid, citric acid, xanthine and hypoxanthine [6]. The catalase test was positive, whereas the oxidase test was negative [6]. The most prominent fatty acid methyl esters were C16:0, summed feature 8 (C18:1 ω7c/C18:1 ω6c), C14:0, C19:0 ω8c cyclo and summed feature 3 (C16:1 ω7c/C16:1 ω6c) [6]. The major polar lipids detected in strain CT6T were phosphatidylethanolamine, phosphatidylglycerol and a glycolipid [6]. Strain CT6T contained putrescine, 2-hydroxyputrescine and spermidine as the major polyamines and ubiquinone-8 as the major quinone [6].
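To make the Jukes-Cantor model mentioned above more concrete, the following is a minimal sketch of the standard Jukes-Cantor distance correction, d = -(3/4) ln(1 - (4/3) p), applied to the 16S rRNA similarities reported in the text. It illustrates the substitution model only and is not the maximum-likelihood tree inference performed in MEGA; the function name and the dictionary of similarities are illustrative choices, not part of the original analysis.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    # The correction is only defined while fewer than 75 % of sites differ.
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# 16S rRNA gene similarities (%) to strain CT6T, as reported in the text.
similarities = {
    "L. puyangensis 2-bin": 96.4,
    "L. hyalina ATCC 11041": 95.4,
}

for name, similarity in similarities.items():
    p = 1.0 - similarity / 100.0  # observed proportion of differing sites
    d = jukes_cantor_distance(p)
    print(f"{name}: p = {p:.3f}, Jukes-Cantor distance = {d:.4f}")
```

The corrected distance is always slightly larger than the raw proportion of differences because it accounts for multiple substitutions at the same site.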

Fig. 1 TEM of Lampropedia cohaerens strain CT6T cells. Length of bar = 0.5 μm

Table 1 Classification and general features of Lampropedia cohaerens CT6T [39, 40]

Fig. 2 Maximum-likelihood phylogenetic tree based on 16S rRNA gene sequences of L. cohaerens strain CT6T and its nearest phylogenetic neighbours based on BLASTN similarity. All phylogenetic neighbours belong to the family Comamonadaceae. The tree was computed using the Jukes and Cantor model. Bootstrap values (>70 %) calculated for 1000 subsets are shown at branch points. Bar, 2 substitutions per 100 nucleotide positions. *Not validly published

Genome sequencing information

Genome project history

Whole genome sequencing was performed at Beijing Genomics Institute Technology Solutions, Hong Kong, China using Illumina HiSeq 2000 technology. Sequencing was done using 500 bp and 2 kbp paired-end libraries. Raw data were generated over a period of 3 months. De novo assembly was performed in-house at the University of Delhi. The draft genome sequence was submitted to NCBI under the accession number LBNQ00000000 (version 1 LBNQ01000000). The sequences were also submitted to the IMG-JGI portal under GOLD Analysis Project ID Ga0079366. Sequencing project information in compliance with MIGS version 2.0 is given in Table 2.

Table 2 Project information

Growth conditions and genomic DNA preparation

Genomic DNA was isolated from a 25 ml culture grown in LB medium at 37 °C. The mid-logarithmic phase culture (O.D. 0.6) was harvested and cells were lysed in TE25S buffer (25 mM Tris-HCl pH 8.0, 25 mM EDTA, 0.3 M sucrose, 1.0 mg/ml lysozyme), followed by removal of proteins with 1.0 % SDS and 1.0 mg/ml proteinase K at 55 °C. DNA was then purified using phenol : chloroform : isoamyl alcohol (25 : 24 : 1) and chloroform : isoamyl alcohol (24 : 1) and precipitated using 0.6 volume of isopropanol. After washing with 70 % ethanol, DNA was dissolved in 5 mM Tris-EDTA. Sample concentration was estimated as 347.1 ng/μl by microplate reader, and integrity was checked by agarose gel electrophoresis prior to sequencing. The 260/280 and 260/230 purity ratios were 1.89 and 1.91, respectively.

Genome sequencing and assembly

Genomic DNA was sequenced using 500 bp and 2 kbp paired-end libraries. Raw read filtering and adapter removal were carried out at BGI Technology Solutions Co. Limited, China. A total of 7.5 Gb of raw data was generated, yielding 33,961,144 clean reads encompassing 3,056,502,960 clean bases. De novo assembly of the reads using ABySS version 1.3.5 [19] at a k-mer size of 51 generated 41 contigs longer than 500 bp, with an N50 value of 165,853 bp. The assembly was validated by aligning reads onto the finished contigs using Burrows-Wheeler Aligner version 0.7.9a [20], followed by visual inspection using Tablet version 1.14.04.10 [21]. The final draft was assembled into 41 contigs with a mean contig size of 77,047 bp. The assembled genome had 3,158,922 bases with 63.5 % G+C content.
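The assembly figures above can be cross-checked with a few lines of arithmetic. The sketch below, written under stated assumptions, shows how an N50 value is computed from a list of contig lengths and derives the mean read length, mean contig size and a nominal sequencing depth from the numbers reported in the text. The toy contig lengths are hypothetical and are not the CT6T contigs, and the nominal depth is simply clean bases divided by assembly size, not a coverage value taken from the text.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig list is empty")


# Hypothetical toy contigs, used only to illustrate the N50 definition.
toy_contigs = [100, 200, 300, 400, 500]
print("toy N50:", n50(toy_contigs))

# Figures reported in the text for strain CT6T.
clean_reads = 33_961_144
clean_bases = 3_056_502_960
assembly_size = 3_158_922
n_contigs = 41

mean_read_length = clean_bases / clean_reads   # bases per clean read
mean_contig_size = assembly_size / n_contigs   # bp per contig
nominal_depth = clean_bases / assembly_size    # clean bases per assembled base

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"mean contig size: {mean_contig_size:.0f} bp")
print(f"nominal depth: {nominal_depth:.0f}x")
```

The mean contig size obtained this way rounds to the 77,047 bp quoted above, which is a quick consistency check between the contig count and the total assembly size.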

Genome annotation

For initial annotation, sequences were submitted to the NCBI Prokaryotic Genomes Annotation Pipeline. Additionally, the sequences were uploaded to the Integrated Microbial Genomes pipeline of the Joint Genome Institute [22]. Coding sequences were predicted using Prodigal V2.6.2 [23]. rRNA operons were predicted using RNAmmer version 1.2 [24]. tRNAs and tmRNAs were predicted using ARAGORN [25]. Phage Search Tool [26] was used to find phages in the genome. CRISPRs were identified using the CRISPR finder online server [27]. Signal peptides and transmembrane domains were predicted using the SignalP 4.1 server [28] and the TMHMM server v. 2.0 [29], respectively. COG category assignment and Pfam domain prediction were done using the WebMGA server [30].

Genome properties

The final draft genome consists of 41 contigs with a total of 3,158,922 bp and a G+C content of 63.5 mol%. A total of 2909 coding sequences were predicted, accounting for a coding density of 88.92 %. Of the total coding sequences, 83.84 % were assigned functions. Protein-coding genes numbered 2823 and comprised 97.04 % of the total; RNA-coding genes numbered 86, and 56 tRNAs were detected. Five rRNA operons with complete 5S-16S-23S rRNA genes were predicted (Fig. 3). Three confirmed CRISPRs were detected, one on contig 13 and two on contig 33. Two incomplete phages were also detected, each having a phage integrase and an attR site for integration. Pfam domains were detected for 2539 genes, 238 genes were found to code for proteins harbouring signal peptides, and 665 genes coded for proteins with transmembrane domains (Table 3). Of the total genes, 2713 (92.09 %) were assigned to COG categories. The COG categories with the most genes were general function prediction only (10.62 %), amino acid transport and metabolism (10.31 %), inorganic ion transport and metabolism (6.92 %) and energy production and conversion (6.21 %). A further 6.24 % of genes were placed in the function unknown category, whereas 7.91 % of genes were not assigned to any COG (Table 4).

Fig. 3 A graphical circular map of the genome generated with the CGView Comparison Tool [49]. From outside to centre, rings 1 and 2 show protein-coding genes on the forward and reverse strands; ring 3 shows the G+C% content plot, and ring 4 shows the GC skew
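For readers unfamiliar with the two inner rings of the genome map, the following minimal sketch shows how G+C content and GC skew, conventionally defined as (G - C)/(G + C), can be computed over windows of a sequence. The window size, step and example sequence are hypothetical, and the exact windowing and normalization applied by the CGView Comparison Tool are assumed here rather than taken from its documentation.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A, C, G and T bases of seq."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    a, t = seq.count("A"), seq.count("T")
    total = a + c + g + t
    return (g + c) / total if total else 0.0


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); zero when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, G+C fraction, GC skew) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


# Hypothetical example sequence, not taken from the CT6T genome.
example = "GGGGCCCCATATGGGC"
for start, gc, skew in sliding_windows(example, window=4, step=4):
    print(start, round(gc, 2), round(skew, 2))
```

In a circular map, these per-window values are plotted around the ring, and a change in the sign of the skew is commonly used as a hint to the positions of the replication origin and terminus.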

Table 3 Genome statistics
Table 4 Number of genes associated with general COG functional categories

Insights from the genome sequence

Consistent with the limited metabolic potential of L. cohaerens, the genome sequence was found to lack hexokinase and glucokinase, key enzymes involved in glycolysis. Additionally, the absence of the pentose phosphate pathway genes encoding glucose-6-phosphate 1-dehydrogenase and 6-phosphogluconolactonase may account for the organism’s inability to utilize carbohydrates. However, genes involved in the Entner-Doudoroff (ED) pathway and the non-phosphorylated ED (nED) pathway were identified. The nED pathway enzyme D-gluconate dehydratase (EC 4.2.1.39), which converts D-gluconate to 2-keto-3-deoxy-D-gluconate (KDG) [31], was identified, along with the conventional ED pathway enzyme 2-keto-3-deoxy-6-phosphogluconate (KDPG) aldolase (EC 4.1.2.14), which converts KDPG (generated after the first step of the ED pathway) to pyruvate and glyceraldehyde-3-phosphate [32]. Although L. cohaerens possesses enzymes involved in both the ED and nED pathways, a link between the two could not be established, as the enzyme KDG kinase, which converts KDG to KDPG, could not be identified.
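The pathway gap described above can be expressed as a small reachability check. The sketch below encodes only the three reactions named in the text, with a flag recording whether the corresponding enzyme was identified in the genome, and asks which metabolites can be reached from a given starting compound. The data structure and function names are illustrative assumptions and do not represent a genome-scale metabolic model.

```python
# Each step: (substrate, product, enzyme, identified in the genome per the text).
STEPS = [
    ("D-gluconate", "KDG", "D-gluconate dehydratase (EC 4.2.1.39)", True),
    ("KDG", "KDPG", "KDG kinase", False),
    ("KDPG", "pyruvate + glyceraldehyde-3-phosphate",
     "KDPG aldolase (EC 4.1.2.14)", True),
]


def reachable(start, steps):
    """Return the metabolites reachable from start using only identified enzymes."""
    reached = {start}
    changed = True
    while changed:
        changed = False
        for substrate, product, _enzyme, present in steps:
            if present and substrate in reached and product not in reached:
                reached.add(product)
                changed = True
    return reached


def missing_links(steps):
    """List the enzymes of the encoded steps that were not identified."""
    return [enzyme for _sub, _prod, enzyme, present in steps if not present]


print("from D-gluconate:", sorted(reachable("D-gluconate", STEPS)))
print("from KDPG:", sorted(reachable("KDPG", STEPS)))
print("missing:", missing_links(STEPS))
```

Starting from D-gluconate, the search stops at KDG, which mirrors the observation that the nED and ED segments cannot be joined without KDG kinase.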

L. cohaerens CT6T was isolated from hot spring microbial mats, which are known to be rich in heavy metal sulfides. Microbiota present at hot springs have developed resistance mechanisms to withstand and survive high heavy metal concentrations. Accordingly, the genome of L. cohaerens contains a repertoire of heavy metal resistance genes. Among genes imparting resistance against arsenic, the arsenate reductase gene arsC (AAV94_10615), the arsenic resistance gene arsH (AAV94_10620), the arsenic transporter gene ACR3 (AAV94_10610) and the transcriptional regulator gene arsR (AAV94_10600, AAV94_10605) were found. Two arsenic resistance clusters were found on contig 33, harbouring two copies of arsR, a copy of ACR3 and a copy of the arsenate reductase gene arsC. In one of the clusters, an additional gene, arsH, coding for an arsenical resistance protein, was found. Additionally, a gene arsB coding for an arsenic efflux pump protein was identified. Copper, a trace element, is taken up by living cells and incorporated into a number of enzymes, particularly cytochrome oxidases; in excess, however, it becomes toxic to the cells. Copper resistance mechanisms in bacteria involve the cus system, the cue system and the pco system [33]. Excess copper is removed either by efflux of the cations or by periplasmic detoxification. The cue system genes copA, encoding an ion-translocating ATPase; cueO [34], encoding a periplasmic multicopper oxidase; and cueR, encoding a copper-responsive metalloregulatory protein that regulates both copA and cueO [35], were identified. The cus system genes cusA and cusB, both coding for cation efflux proteins, are harboured by L. cohaerens. Additionally, the pco system genes copC and copD were present in its genome. Genes imparting resistance to other heavy metals were also identified, including the cobalt, zinc and cadmium efflux system genes czcA, czcD, czcB and czcC, which code for outer membrane transporter efflux proteins. The magnesium and cobalt transport protein-encoding genes corA and corC were identified. Transcriptional regulators of the merR family were found in six copies. MerR transcriptional factors are known to respond to various environmental stimuli, particularly high concentrations of heavy metals and oxidative stress [36].

Biofilm formation in bacteria is a genetically complex process that depends on the modulation of expression of a number of genes, mainly those involved in adhesion and autoregulation [37]. The PGA operon comprises genes for the synthesis of the secreted polysaccharide poly-β-1,6-N-acetyl-D-glucosamine (PGA), which is responsible for cell-cell and cell-surface adhesion in biofilms. Strain CT6T formed biofilms in vitro, and the genes responsible were found in its genome. The PGA operon genes pgaA (outer membrane protein for PGA secretion), pgaB (PGA synthesis deacetylase) and pgaC (PGA synthesis N-glycosyltransferase) were found as a single operon in the genome.

Members of the family Comamonadaceae have been shown to possess a mineral phosphate solubilisation (MPS) phenotype. Genes associated with the MPS phenotype include a glucose dehydrogenase (GDH) and a pyrroloquinoline-quinone (PQQ) synthase system. PQQ, a small molecule that serves as a redox cofactor in several enzymes including GDH, is produced by Pseudomonas fluorescens, Enterobacter intermedium and many other bacteria. PQQ production has been shown to be involved in plant growth promoting effects in soil-dwelling bacteria. Additionally, PQQ production has been associated with higher tolerance to radiation and free oxygen radicals, pointing to a free radical scavenging role in bacteria [38]. PQQ-dependent enzymes such as GDH make insoluble phosphates available to plants, thus contributing to the MPS phenotype. The MPS phenotype contributes significantly to the mineralization of phosphates and plays a key role in the geochemical cycling of the element. In the genome of L. cohaerens, three copies of the PQQ-dependent glucose dehydrogenase gene were found, along with the PQQ synthase genes pqqB, pqqD and pqqE. Further, genes coding for the α and β subunits of isoquinoline 1-oxidoreductase, corresponding to the isoquinoline degradation system, were found. Isoquinoline 1-oxidoreductase catalyzes the hydroxylation of isoquinoline, the first step in the degradation of this N-heterocyclic compound, which is commonly associated with coal gasification, shale oil, coal tar and crude oil contaminated sites.

Conclusions

The genome of L. cohaerens strain CT6T, a biofilm-forming and arsenic-tolerating bacterium, was found to harbour the genes necessary for arsenic tolerance and biofilm formation. Genes related to the transport and efflux of copper, cobalt, zinc and cadmium were identified. The limited metabolic potential was attributed to the lack of key glycolysis and pentose phosphate pathway genes. A metabolically unusual combination of genes from both the ED pathway and the nED pathway was encountered. Pyrroloquinoline-quinone synthesis genes were identified along with PQQ-requiring glucose dehydrogenases. This was consistent with the phosphate removal phenotype of Lampropedia from sewage sludge samples [5]. L. cohaerens, which harbours genes imparting the MPS phenotype, can be considered to belong to the group of MPS bacteria, which are used to enhance the fertility of soil by making trapped phosphates available to plants. The genetic repertoire of L. cohaerens points towards the capability to survive in diverse stressed niches. The genes harboured by L. cohaerens enable the organism to survive in the heavy metal rich microbial mats of the hot spring, and biofilm formation may be considered a niche-specialised strategy adapted to survival in hot spring waters that form microbial mats. The isoquinoline degradation genes may provide a supplemental benefit for survival at oil contaminated sites and make L. cohaerens a potential candidate for bioremediation of such sites. Further experiments can be performed to link the genetic determinants of L. cohaerens with its actual functional potential.