Introduction

The C. saccharolyticum species group is a poorly described and taxonomically confusing clade in the Lachnospiraceae, a family within the Clostridiales that includes members of clostridial cluster XIVa [1]. This group includes C. indolis, C. sphenoides, C. methoxybenzovorans, C. celerecrescens, and Desulfotomaculum guttoideum, none of which are well studied (Figure 1). C. saccharolyticum has gained attention because its saccharolytic capacity was shown to be syntrophic with the cellulolytic activity of Bacteroides cellulosolvens in co-culture, enabling the conversion of cellulose to ethanol in a single step [6,7]. Some members of this group, such as C. celerecrescens, are themselves cellulolytic [8], and others are known to degrade unusual substrates such as methylated aromatic compounds (C. methoxybenzovorans) [9] and the insecticide lindane (C. sphenoides) [10]. C. indolis was targeted for whole genome sequencing to provide insight into the genetic potential of this taxon, which could then direct experimental efforts to understand its physiology and ecology.

Figure 1.

Phylogenetic tree based on 16S rRNA gene sequences highlighting the position of Clostridium indolis relative to other type strains (T) within the Lachnospiraceae. The strains and their corresponding NCBI accession numbers (and, when applicable, draft sequence coordinates) for 16S rRNA genes are: Desulfotomaculum guttoideum strain DSM 4024T, Y11568; C. sphenoides ATCC 19403T, AB075772; C. celerecrescens DSM 5628T, X71848; C. indolis DSM 755T, pending release by JGI: 1620643–1622056; C. methoxybenzovorans SR3, AF067965; C. saccharolyticum WM1T, NC_014376: 18567–20085; C. algidixylanolyticum SPL73T, AF092549; C. hathewayi DSM 13479T, ADLN00000000: 202–1639; Eubacterium eligens L34420T, L34420; Ruminococcus gnavus ATCC 29149T, X94967; R. torques ATCC 27756T, L76604; E. rectale L34627T; Roseburia intestinalis L1-82T, AJ312385; R. hominis A2-183T, AJ270482; C. jejuense HY-35-12T, AY494606; C. xylanovorans HESP1T, AF116920; C. phytofermentans ISDgT, CP000885: 15754–17276. The sequences were aligned with MUSCLE, and the tree was inferred using the Neighbor-Joining method [2]. The optimal tree, with a sum of branch lengths of 0.50791241, is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches [3]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [4] and are in units of the number of base substitutions per site. Evolutionary analyses were conducted in MEGA 5 [5]. C. stercorarium ATCC 35414T, CP003992: 856992–858513, was used as an outgroup.

Classification and features

The general features of Clostridium indolis DSM 755T are listed in Table 1. C. indolis DSM 755T was originally named for its ability to hydrolyze tryptophan to indole, pyruvate, and ammonia [23] in the classic indole test used to distinguish bacterial species. It has been isolated from soil [24], feces [25], and clinical samples from infections [27]. Despite its prevalence, C. indolis is not well characterized, and there are conflicting reports about its physiology. It is described as a sulfate reducer with the ability to ferment some simple sugars, pectin, pectate, mannitol, and galacturonate, and to convert pyruvate to acetate, formate, ethanol, and butyrate [28]. According to this source, neither lactate nor citrate is utilized; however, other studies demonstrate that fecal isolates closely related to C. indolis may utilize lactate [29], and that the type strain DSM 755T utilizes citrate [30]. It is unclear whether C. indolis is able to use a wider range of sugars or break down complex carbohydrates; however, growth is reported to be stimulated by fermentable carbohydrates [28].

Table 1. Classification and general features of Clostridium indolis DSM 755T

Genome sequencing information

Genome project history

The genome was selected based on the relatedness of C. indolis DSM 755T to C. saccharolyticum, an organism with interesting saccharolytic and syntrophic properties. The genome sequence was completed on May 2, 2013, and presented for public access on June 3, 2013. Quality assurance and annotation were performed by the DOE Joint Genome Institute (JGI) as described below. Table 2 presents a summary of the project information and its association with MIGS version 2.0 compliance [31].

Table 2. Project information

Growth conditions and DNA isolation

C. indolis DSM 755T was cultivated anaerobically on GS2 medium as described elsewhere [32]. DNA for sequencing was extracted using the DNA Isolation Bacterial Protocol available through the JGI (http://www.jgi.doe.gov). The quality of the extracted DNA was assessed by gel electrophoresis and NanoDrop (ThermoScientific, Wilmington, DE) according to the JGI recommendations, and the quantity was measured using the Quant-iT™ PicoGreen assay kit (Invitrogen, Carlsbad, CA) as directed.

Genome sequencing and assembly

The draft genome of C. indolis was generated at the DOE Joint Genome Institute (JGI) using a hybrid of the Illumina and Pacific Biosciences (PacBio) technologies. An Illumina standard (std) shotgun library and a long insert mate pair library were constructed and sequenced using the Illumina HiSeq 2000 platform [33]. A total of 16,165,490 reads totaling 2,424.8 Mb were generated from the std shotgun library, and 26,787,478 reads totaling 2,437.7 Mb were generated from the long insert mate pair library. A PacBio SMRTbell™ library was constructed and sequenced on the PacBio RS platform. The 99,448 raw PacBio reads yielded 118,743 adapter-trimmed and quality-filtered subreads totaling 330.2 Mb. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI that removes known Illumina sequencing and library preparation artifacts [34]. Filtered Illumina and PacBio reads were assembled using AllpathsLG (PrepareAllpathsInputs: PHRED_64=1 PLOIDY=1 FRAG_COVERAGE=50 JUMP_COVERAGE=25; RunAllpathsLG: THREADS=8 RUN=std_pairs TARGETS=standard VAPI_WARN_ONLY=True OVERWRITE=True) [35]. The final draft assembly contained 1 contig in 1 scaffold. The total size of the genome is 6.4 Mb. The final assembly is based on 2,424.6 Mb of Illumina Std PE, 2,437.6 Mb of Illumina CLIP PE, and 330.2 Mb of PacBio post-filtered data, which provide an average 759.7× Illumina coverage and 51.6× PacBio coverage of the genome, respectively.
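The reported coverage values follow directly from the post-filter data volumes and the genome size given above. The short Python sketch below reproduces that arithmetic as a consistency check; the variable and function names are illustrative and are not part of the JGI pipeline.

```python
# Post-filter data volumes (Mb) and genome size (Mb) as reported in the text.
ILLUMINA_STD_PE_MB = 2424.6
ILLUMINA_CLIP_PE_MB = 2437.6
PACBIO_MB = 330.2
GENOME_SIZE_MB = 6.4


def fold_coverage(data_mb, genome_mb):
    """Average fold coverage = total sequenced bases / genome size."""
    return data_mb / genome_mb


# Illumina coverage pools the standard and CLIP paired-end libraries.
illumina_cov = fold_coverage(ILLUMINA_STD_PE_MB + ILLUMINA_CLIP_PE_MB, GENOME_SIZE_MB)
pacbio_cov = fold_coverage(PACBIO_MB, GENOME_SIZE_MB)

print(f"Illumina: {illumina_cov:.1f}x, PacBio: {pacbio_cov:.1f}x")
```

Both values agree with the coverage figures stated in the text when the rounded genome size of 6.4 Mb is used as the denominator.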

Genome annotation

Genes were identified using Prodigal [36], followed by a round of manual curation using GenePRIMP [9] for finished genomes and draft genomes in fewer than 10 scaffolds. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAScanSE tool [37] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [38]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [39]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [40] developed by the Joint Genome Institute, Walnut Creek, CA, USA [41]. Information in the tables below reflects the gene information in the JGI annotation on the IMG website [40].

Genome properties

The genome of C. indolis DSM 755 consists of a 6,383,701 bp circular chromosome with a GC content of 44.93% (Table 3). Of the 5,903 genes predicted, 5,802 were protein-coding genes and 101 were RNA genes; 170 pseudogenes were also identified. A putative function was assigned to 81.21% of the genes, with the remainder annotated as hypothetical proteins. The genome summary and the distribution of genes into COG functional categories are listed in Tables 3 and 4.
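As a quick sanity check on the gene counts quoted above, the following Python sketch verifies that the protein-coding and RNA gene counts sum to the total and derives each class as a share of all predicted genes. The derived percentages are simple ratios computed here for illustration; they are not values taken from Table 3.

```python
# Gene counts as reported in the text.
TOTAL_GENES = 5903
PROTEIN_CODING = 5802
RNA_GENES = 101
PSEUDOGENES = 170


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The two gene classes should account for every predicted gene.
counts_consistent = (PROTEIN_CODING + RNA_GENES == TOTAL_GENES)

protein_coding_pct = percent(PROTEIN_CODING, TOTAL_GENES)
rna_pct = percent(RNA_GENES, TOTAL_GENES)
pseudogene_pct = percent(PSEUDOGENES, TOTAL_GENES)

print(f"counts consistent: {counts_consistent}")
print(f"protein-coding: {protein_coding_pct:.2f}%  RNA: {rna_pct:.2f}%  "
      f"pseudogenes: {pseudogene_pct:.2f}%")
```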

Table 3. Nucleotide content and gene count levels of the genome of C. indolis DSM 755
Table 4. Number of genes in C. indolis DSM 755 associated with the 25 general COG functional categories
Table 5. Number of genes in each of the 25 general COG functional categories found in C. indolis DSM 755T but not in closely related species

Carbohydrate transport and metabolism

Plant biomass is a complex composite of fibrils and sheets of cellulose, hemicellulose, waxes, pectin, proteins, and lignin. Bacteria from soil and the gut generally possess a variety of genes to degrade and transport the diverse substrates encountered in these plant-rich environments. The genome of C. indolis includes 910 genes (17.65% of total protein-coding genes) in this COG group, including glycoside hydrolases with the potential to degrade complex carbohydrates such as starch, cellulose, and chitin (Table 6), as well as an abundance of carbohydrate transporters (Figure 2). Almost 8% of the protein-coding genes in the genome of C. indolis were found to be associated with carbohydrate transport, represented by two main strategies. ABC (ATP-binding cassette) transporters tend to carry oligosaccharides and have less affinity for hexoses [43,44], while PTS (phosphotransferase system) transporters carry many different mono- and disaccharides, especially hexoses [45]. PTS systems provide a means of regulation via catabolite repression [46], and are thought to enable bacteria living in carbohydrate-limited environments to utilize and compete for substrates more efficiently [46]. Both C. indolis and its near relatives are more highly enriched in ABC than PTS transporters (Figure 2); however, nearly a third of C. indolis and C. saccharolyticum transporters are PTS genes, suggesting a preference for hexoses as well as an adaptation to more marginal environments. C. indolis also possesses ten genes associated with all three components of the TRAP-type C4-dicarboxylate transport system, which transports C4-dicarboxylates such as formate, succinate, and malate [47], as well as six putative malate dehydrogenases and two putative succinate dehydrogenases, suggesting that C. indolis may have the potential to utilize both of these short chain fatty acids.

Figure 2.

Distribution of ABC and PTS transporters in the genomes of C. indolis and related species, determined from Integrated Microbial Genomes (IMG) annotation [40] and shown as (a) total number of COGs and (b) percentage of genes in the genome.

Table 6. Selected carbohydrate active genes in the C. indolis DSM 755T genome

Energy production and conversion

The genome of C. indolis contains 261 genes in COG category (C) Energy production and conversion, 28 of which are not found in the near relatives analyzed, including genes for citrate utilization (Table 7) and nitrogen fixation (Table 8).

Table 7. Selection of C. indolis DSM 755 genes related to citrate utilization.
Table 8. Selection of C. indolis DSM 755 genes related to nitrogen fixation.

Citrate utilization

Citrate is a metabolic intermediate found in all living cells. In aerobic bacteria, citrate is utilized as part of the tricarboxylic acid (TCA) cycle. In anaerobes, citrate is fermented to acetate, formate, and/or succinate. The first step is the conversion of citrate to acetate and oxaloacetate in a reaction catalyzed by citrate lyase (EC:4.1.3.6) [48]. C. sphenoides, a close relative of C. indolis that does not yet have a sequenced genome, has been shown to utilize citrate [49], but there is conflicting evidence as to whether this phenotype is present in C. indolis [28,30]. The genome of C. indolis reveals a group of seven citrate genes organized in a cluster similar to operons found in other bacterial species [48,50] (Figure 3), including CitD, CitE, and CitF, the three subunits of citrate lyase [48]; CitG and CitX, which have been shown to be necessary for citrate lyase function [50]; CitMHS, a citrate transporter; and a putative two-component system similar to citrate regulatory mechanisms in other bacteria [51].

Figure 3.

Citrate utilization genes are in a single gene cluster on K401DRAFT_scaffold0000.1.1, including the citrate transporter CitMHS, and a putative two-component system.

Nitrogen Fixation

Nitrogen fixation has been observed in other clostridia [52,53] but has not been demonstrated in the C. saccharolyticum species group. It has been suggested that the capacity to fix nitrogen confers a selective advantage to cellulolytic microbes that live in nitrogen-limited environments such as many soils [52]. The functional summary suggests that C. indolis can fix nitrogen. The C. indolis genome reveals 22 nitrogenase-related genes in four gene clusters (Table 8), none of which are found in the near relatives analyzed in this study. A minimum set of six genes encoding structural and biosynthetic components of a functional nitrogenase complex has been hypothesized [54]. The genes needed for the nitrogenase structural component proteins (nifH, nifD, and nifK) are present in C. indolis, but one of the three genes required to synthesize the nitrogenase iron-molybdenum cofactor (nifN) was not identified. Follow-up experiments are needed to determine whether C. indolis can fix nitrogen as predicted by the genome analysis.
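The comparison against a minimum nitrogenase gene set can be expressed as a simple set difference. The Python sketch below is illustrative only: the text names nifH, nifD, nifK, and nifN, while the identities of the other two cofactor biosynthesis genes (taken here to be nifE and nifB) and the example list of annotated genes are assumptions made for this sketch, not data from Table 8.

```python
# Structural genes named in the text.
STRUCTURAL_GENES = {"nifH", "nifD", "nifK"}
# FeMo-cofactor biosynthesis genes: nifN is named in the text;
# nifE and nifB are ASSUMED here to complete the set of three.
COFACTOR_GENES = {"nifE", "nifN", "nifB"}
MINIMUM_SET = STRUCTURAL_GENES | COFACTOR_GENES


def missing_from_minimum_set(annotated_genes):
    """Return the minimum-set genes absent from an annotation, sorted by name."""
    return sorted(MINIMUM_SET - set(annotated_genes))


# HYPOTHETICAL annotation summary consistent with the text:
# all minimum-set genes identified except nifN.
annotated = ["nifH", "nifD", "nifK", "nifE", "nifB"]

missing = missing_from_minimum_set(annotated)
print("missing from minimum set:", missing)
```

A non-empty result flags the genome as lacking part of the hypothesized minimum set, which is why experimental confirmation of nitrogen fixation remains necessary.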

Lactate utilization

The genome of C. indolis includes both D- and L-lactate dehydrogenases, which convert lactate to pyruvate. Additionally, there is a lactate transporter, suggesting that C. indolis is able to utilize exogenous lactate (Table 9).

Table 9. Selection of C. indolis DSM 755 genes related to lactate utilization.

Bacterial microcompartments (BMC)

The C. indolis genome contains genes associated with bacterial microcompartment shell proteins. Bacterial microcompartments (BMCs) are proteinaceous organelles involved in the metabolism of ethanolamine, 1,2-propanediol, and possibly other metabolites (reviewed in [55–57]). BMCs are often encoded by a single operon or contiguous stretch of DNA. The different metabolic types of BMCs can be distinguished by a key enzyme (e.g., ethanolamine lyase or propanediol dehydratase) related to their metabolic function. While the other associated genes in the operon can vary, they frequently include an alcohol dehydrogenase, an aldehyde dehydrogenase, an aldolase, and an oxidoreductase.

In C. indolis there are two separate genetic loci that code for BMCs (Tables 10 and 11 and Figure 4). One C. indolis locus (Table 10) contains a gene (K401DRAFT_2189) with sequence similarity to a B12-independent propanediol dehydratase found in Roseburia inulinivorans and Clostridium phytofermentans [58,59] (both members of the Lachnospiraceae). This enzyme has been shown to be involved in the metabolism of fucose and rhamnose [58,59], and its microcompartment was subsequently categorized as the glycyl radical prosthetic group-based (grp) BMC [60]. The glycyl radical family of enzymes was recently expanded to include a choline trimethylamine lyase activity that is part of a microcompartment locus in Desulfovibrio desulfuricans [61]. The corresponding C. indolis enzymes (K401DRAFT_2189 and K401DRAFT_2190) are more similar to the D. desulfuricans protein, but there are differences in the gene content of the microcompartment loci. Further work is needed to determine the physiological role of this microcompartment.

Figure 4.

CoAT BMC operon found in C. indolis, Caldalkalibacillus thermarum, C. sticklandii, C. saccharolyticum, and Bacillus selenitireducens. Gene details are found in Table 11.

Table 10. grp-BMC genes found in the C. indolis genome.
Table 11. CoAT BMC genes found in the C. indolis genome.

The second C. indolis BMC locus (Table 11 and Figure 4) is even more enigmatic. This locus contains the shell proteins, alcohol dehydrogenase, aldehyde dehydrogenase, aldolase, and oxidoreductase commonly found in microcompartments, but it lacks a known key enzyme. Homologs of this operon were found in four other bacterial species (Figure 4). They are all missing a known key enzyme and contain two genes annotated as CoA-transferase. We propose that the C. indolis genome and these other bacteria contain a novel type of microcompartment, designated the CoAT BMC. It is not clear that the function of the two annotated CoA-transferase genes is as predicted, and further research is needed to demonstrate the physiological role of this BMC.

Secondary metabolites biosynthesis, transport and catabolism

Protocatechuate and other aromatics are intermediates in the degradation of lignin in plant-rich environments [62]. The genome of C. indolis contains two protocatechuate dioxygenases and an aromatic hydrolase, revealing the potential for utilizing aromatic compounds (Table 12).

Table 12. Selection of C. indolis DSM 755T genes related to degradation of aromatics.

Conclusion

The genomic sequence of C. indolis reported here reveals the metabolic potential of this organism to utilize a wide assortment of fermentable carbohydrates and intermediates including citrate, lactate, malate, succinate, and aromatics, and points to potential ecological roles in nitrogen fixation and ethanolamine utilization. Further culture-based characterization is necessary to confirm the metabolic activity suggested by this genomic analysis, and to expand the description of C. indolis.