Background

Clostridium difficile is a Gram-positive, endospore-forming obligate anaerobe and currently the leading cause of antibiotic-associated diarrhea (AAD) in hospital settings worldwide [1]. C. difficile infections (CDI) are estimated to account for 15–25% of all AAD cases [2]. CDI onset can be triggered when broad-spectrum antibiotic treatment disrupts the host's gut microbiota. Aging, prolonged stays in health care settings, and proton-pump inhibitor use all increase the risk of CDI [3]. Although C. difficile has been characterized for decades, it first gained prominence in 2003, when an outbreak in North America was found to be caused by a strain capable of toxin hyperproduction [4]. This strain, designated NAP1/BI/027 (PCR ribotype 027, or RT027) because the same strain was characterized by different typing methods, has spread rapidly and caused outbreaks worldwide, although fewer cases have been reported in Asia and Latin America than in Europe and North America.

According to a previous case report, NCKUH-21 was isolated from the first severe RT027 CDI case in Taiwan, and it carries an 18-base-pair deletion and a truncating mutation (D117A) in tcdC [5]. To better understand the relationship between NCKUH-21 and other RT027 strains, including historic and hypervirulent strains, we determined the genome sequence of C. difficile strain NCKUH-21 (accession numbers BDSN01000001–BDSN01000094) and compared it with other sequenced RT027 strains. We assessed the NCKUH-21 genome for the presence of virulence and antibiotic resistance genes. We also compared the genome sequence of NCKUH-21 with those of its close relatives to investigate genome synteny, reconstruct the phylogenetic tree, and identify NCKUH-21 strain-specific genes.

Methods

Genome sequencing, assembly, and annotation for the strain NCKUH-21, as well as comparative genomics of nine C. difficile strains (Table 1), were performed as described in Additional file 1: Materials and methods.

Table 1 Analysis of the genomic features of Clostridium strains

Quality assurance

Genomic DNA was purified from a pure culture of a single bacterial isolate of NCKUH-21. A BLAST search against a nonredundant database revealed no potential contamination of the genomic libraries.

Results and discussion

Genomic features

Illumina MiSeq sequencing was used to determine the genome sequence of C. difficile strain NCKUH-21. The de novo assembly comprised 94 contigs with a total length of 4,217,149 bp, a G+C content of 28.4%, and a sequencing coverage of 1611×. Genome annotation yielded a total of 3810 protein-coding sequences (CDSs).

Among the C. difficile strains analyzed in this paper, genome size ranged from 4.05 to 4.46 Mb, G+C content from 28.4 to 29.2%, and CDS number from 3485 to 4128 (Table 1). The general genomic features of the NCKUH-21 strain were thus similar to those of the other C. difficile strains.
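The summary statistics reported above (number of contigs, total assembly length, and G+C content) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which is described in Additional file 1; the function name assembly_stats and the toy contigs are hypothetical and serve only to show how the quantities are defined.

```python
def assembly_stats(contigs):
    """Return (contig count, total length in bp, G+C percentage).

    `contigs` is a list of nucleotide strings; this helper is a
    hypothetical illustration, not part of the published workflow.
    """
    total = sum(len(seq) for seq in contigs)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    gc_percent = 100.0 * gc / total if total else 0.0
    return len(contigs), total, gc_percent


# Toy contigs (made up; not NCKUH-21 data)
toy_contigs = ["ATGCATAT", "AATTAAGC", "TTTTAAAA"]
n_contigs, total_bp, gc_percent = assembly_stats(toy_contigs)
print(n_contigs, total_bp, round(gc_percent, 1))
```

Applied to a real draft assembly, the same three numbers would correspond to the contig count, assembly length, and G+C content columns of Table 1.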

Phylogeny

C. difficile strains of the same PCR ribotype have been reported to cluster together in phylogenetic trees of conserved genes [6]. The Roary pipeline produced a total of 8775 homologous groups of genes (“pan-genome”), of which 69 were shared by all the strains used in this study (“core-genome”). The core-genome phylogeny indicated that the RT027 strains (R20291, CD196, NCKUH-21, BI1, and 2007855) formed a monophyletic group (clade), joined by the Z31 and 630 strains, followed by the M68 strain, and finally the M120 strain (Fig. 1).
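The distinction between the pan-genome and the core genome can be sketched in a few lines of Python. This is only a conceptual illustration under the assumption that each strain is represented by the set of homologous gene groups it contains; it does not reproduce Roary's clustering, and the strain names and group identifiers below are hypothetical.

```python
def pan_and_core(presence):
    """Return (pan-genome, core genome) as sets of gene-group identifiers.

    `presence` maps a strain name to the set of gene groups found in it.
    The pan-genome is the union over strains; the core genome is the
    intersection, i.e. groups shared by every strain.
    """
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets) if gene_sets else set()
    return pan, core


# Toy presence/absence data (hypothetical)
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g5"},
}
pan, core = pan_and_core(toy)
print(len(pan), len(core))
```

Because the core genome is an intersection, adding a divergent genome to the comparison can only keep it the same size or shrink it.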

Fig. 1 Phylogenetic tree obtained from a concatenated nucleotide sequence alignment of the core genes for the Clostridium difficile strains. The horizontal bar at the base of the figure represents 0.002 substitutions per nucleotide site. The FastTree branch support values are indicated.

Synteny

The Mauve Contig Mover (http://darlinglab.org/mauve/user-guide/reordering.html) was used to reorder the NCKUH-21 contigs relative to the complete genome of C. difficile CD196. The genomes of the nine C. difficile strains were then aligned using progressiveMauve, and the alignment was visualized using genoPlotR to investigate genomic rearrangements (Fig. 2). Genome synteny was conserved in all strains except Z31, which showed a large-scale genomic rearrangement that had not been previously reported [7].

Fig. 2 Genome alignment of Clostridium difficile strains performed using progressiveMauve (http://darlinglab.org/mauve/user-guide/progressivemauve.html) and visualized using genoPlotR (http://genoplotr.r-forge.r-project.org)

Antibiotic resistance and virulence genes

Antibiotic resistance and virulence genes were searched for using ABRicate. Homologous DNA sequences for the binary toxin genes cdtA and cdtB listed in the Virulence Factors Database (accessions AAF81760 and AAF81761, respectively) were detected in the NCKUH-21 genome [8]. Homologous DNA sequences for the antibiotic resistance genes cdeA, vanRG, and vanG listed in the Comprehensive Antibiotic Resistance Database (accessions AJ574887.1:371–1697, DQ212986:2259–2967, and DQ212986:5985–7035, respectively) were also detected. Although NCKUH-21 thus has the genetic potential to become resistant to antibiotics, the strain was shown to be susceptible to moxifloxacin (minimum inhibitory concentration 0.5 μg/mL), metronidazole (0.094 μg/mL), and vancomycin (0.5 μg/mL) [5].

The genetic organization of the pathogenicity locus (PaLoc) of the CD630 strain is tcdR-tcdB-tcdE-orf-tcdA-tcdC (locus_tag: CD630_06590, CD630_06600, CD630_06610, CD630_06620, CD630_06630, and CD630_06640) [9]. This gene order was conserved in the NCKUH-21 genome (accession number BDSN01000011; locus_tag: NCKUH21_00647, NCKUH21_00648, NCKUH21_00649, NCKUH21_00650, NCKUH21_00651, and NCKUH21_00652). Moreover, another sequence similar to tcdE (CD630_06610) was found in the NCKUH-21 genome (locus_tag: NCKUH21_03847) with 83% amino acid identity. The genes tcdB and tcdA of the CD630 PaLoc, which encode Toxin B and Toxin A (locus_tag: CD630_06600 and CD630_06630; 2366 and 2710 amino acids in length), respectively, were homologous to each other with 48% amino acid identity. In addition, these two genes partly matched a sequence encoding “N-acetylmuramoyl-L-alanine amidase LytC” (accession number BDSN01000021; locus_tag: NCKUH21_02692; 644 amino acids in length) in the NCKUH-21 genome, with alignment lengths of 177 and 226 and amino acid identities of 32 and 34%, respectively. The PaLoc gene homologues may contribute to the virulence and pathogenicity of C. difficile strain NCKUH-21.

NCKUH-21 strain-specific genes

To identify NCKUH-21 strain-specific genes, we searched for homologues of the NCKUH-21 proteins in the genome sequences of all the strains by using the gene screen method with TBLASTN in the large-scale blast score ratio (LS-BSR) pipeline. Of the 3810 protein-coding genes identified in NCKUH-21, 3579 were conserved in all the other RT027 strains (R20291, CD196, BI1, and 2007855), and 2832 were conserved in all the C. difficile strains used in this study. The number of conserved NCKUH-21 genes was largest in the RT027 strains (R20291, CD196, BI1, and 2007855), ranging from 3592 to 3655, followed by the other C. difficile strains (Z31, 630, M68, and M120), ranging from 3153 to 3431, and finally the outgroup LM2 (761).
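The idea behind a blast score ratio (BSR) screen can be sketched as follows: the bit score of a query gene's best hit in a target genome is divided by the bit score of the gene aligned against itself, so values near 1 indicate a conserved gene and values near 0 indicate absence. The Python sketch below is a simplified illustration rather than the LS-BSR implementation; the function names, the toy bit scores, and the absence cutoff of 0.4 are assumptions that are not stated in this paper.

```python
def blast_score_ratio(hit_bits, self_bits):
    """Ratio of a gene's best-hit bit score in a genome to its self-hit bit score."""
    if self_bits <= 0:
        raise ValueError("self bit score must be positive")
    return hit_bits / self_bits


def strain_specific(self_bits, hit_bits, absent_below=0.4):
    """Return genes whose BSR falls below `absent_below` in every other genome.

    self_bits: gene -> bit score of the gene aligned against itself
    hit_bits:  genome -> {gene -> best bit score in that genome}
    `absent_below` is an assumed cutoff, not a value taken from the paper.
    """
    specific = []
    for gene, own in self_bits.items():
        ratios = [
            blast_score_ratio(hits.get(gene, 0.0), own)  # no hit counts as score 0
            for hits in hit_bits.values()
        ]
        if all(r < absent_below for r in ratios):
            specific.append(gene)
    return specific


# Toy bit scores (hypothetical)
self_bits = {"geneA": 500.0, "geneB": 300.0, "geneC": 200.0}
hit_bits = {
    "genome1": {"geneA": 495.0, "geneB": 60.0},
    "genome2": {"geneA": 480.0, "geneC": 190.0},
}
print(strain_specific(self_bits, hit_bits))
```

Normalizing by the self score makes the ratio comparable across genes of different lengths, which raw bit scores are not.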

A total of 140 protein-coding genes were present in the NCKUH-21 strain but absent in the other strains (Additional file 2: Table S1). These strain-specific genes could have been gained on the branch leading to NCKUH-21 and could thus be linked to its specific phenotype (e.g., virulence and pathogenicity). Of the 140 NCKUH-21 strain-specific genes, 50 were encoded on a 40,525-bp contig of the NCKUH-21 genome (accession number BDSN01000034), which showed a 99% identity match to the Clostridium phage φCD38-2 (GenBank accession: HM568888). The genomic region highly similar to the Clostridium phage φCD38-2 was designated the prophage φNCKUH-21.

Prophage φNCKUH-21

The presence of the prophage φNCKUH-21 detected in the draft genome of C. difficile strain NCKUH-21 was further confirmed by phage induction and electron microscopy imaging (data not shown). A previous study suggested that lysogenic φCD38-2 replicates as a circular plasmid and boosts toxin production in C. difficile [10]. The high sequence identity between φNCKUH-21 and φCD38-2 suggests that these prophages play a similar role in C. difficile.

Reports have shown that bacteriophages tend to have a lower G+C content than their hosts, although viral G+C content broadly tracks that of the host [11, 12]; this includes the C. difficile bacteriophage φCD119 [13]. To characterize base composition, the relative frequency of G+C at third codon positions (GC3) was calculated for each NCKUH-21 gene. The median GC3 value of the prophage φNCKUH-21 genes (0.21) was higher than that of the other genes in the NCKUH-21 genome (0.14), and a Wilcoxon rank sum test comparing the GC3 values of the two gene groups was highly significant (P < 2.2e−16). This suggests that the prophage φNCKUH-21 does not match the base composition of the host genome and, under the genome amelioration hypothesis, may have been acquired by horizontal transfer [14].
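The GC3 comparison described above can be illustrated with a short Python sketch. It computes GC3 for each coding sequence, compares group medians, and derives the Mann-Whitney U statistic that underlies the Wilcoxon rank sum test; it stops short of a P value, which would normally come from a statistics package. The function names and toy sequences are hypothetical and are not data from this study.

```python
from statistics import median


def gc3(cds):
    """Fraction of G or C at third codon positions of a coding sequence."""
    third = cds.upper()[2::3]  # third base of each complete codon
    if not third:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Toy coding sequences (hypothetical)
prophage_cds = ["ATGGCCAAGTAA", "ATGGCGACCTGA"]
other_cds = ["ATGAAATTTTAA", "ATGAATATATAA"]

prophage_gc3 = [gc3(s) for s in prophage_cds]
other_gc3 = [gc3(s) for s in other_cds]
u = mann_whitney_u(prophage_gc3, other_gc3)
print(median(prophage_gc3), median(other_gc3), u)
```

Third codon positions are used because they are the least constrained by the encoded protein, so they tend to reflect mutational bias more directly than overall G+C content.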

Concluding remarks

From 2013 to 2014, three RT027 C. difficile strains were isolated from patients in Taiwan [5, 15, 16]. Among them, NCKUH-21 is the first to have a whole-genome sequence available for genome comparison. Whether the other two RT027 isolates also carry a complete prophage, how they are phylogenetically related to NCKUH-21, and how toxin production levels compare among the three isolates are all topics for further research.