Background

Vibrio cholerae is the causative agent of cholera, a life-threatening diarrheal disease. Based on the somatic O antigens, more than two hundred serogroups of V. cholerae have been identified [1], among which O1 and O139 are recognized as the two major agents of cholera epidemics. V. cholerae serogroup O1 has two biotypes and is the causative agent of the two most recent cholera pandemics: the classical biotype was dominant in the 6th pandemic and the El Tor biotype in the 7th [2]. In 1992, a new non-O1 strain of V. cholerae, designated serogroup O139, was identified in a cholera epidemic in India and Bangladesh [3, 4]. Since then, V. cholerae O139 has been frequently isolated in other Asian countries where cholera epidemics have occurred. In China, V. cholerae O139 strains are the dominant contributors to cholera and have been continually isolated since they first appeared in 1993 [5].

Previous studies have established that the major virulence factor of V. cholerae O1/O139 is encoded by a lysogenic bacteriophage (CTX prophage) integrated in the V. cholerae genome. Several other genetic elements, such as the toxin-linked cryptic (TLC) element, the RS1 element, and the pre-CTX prophage (VSK), are also known to lie adjacent to the CTX prophage [6]. The CTX prophage in toxigenic V. cholerae usually consists of two functionally distinct gene clusters, the core and the RS2 regions [7]. The core region includes the ctxAB genes encoding cholera toxin (CT) and five other genes encoding components necessary for phage morphogenesis. The RS2 region encodes proteins involved in phage replication (RstA), integration (RstB) and regulation of site-specific recombination (RstR). Another noteworthy element in V. cholerae is the SXT/R391 family integrative conjugative element (ICE), which was first identified in a V. cholerae O139 clinical isolate in 1993 [8]. The SXT/R391 ICE usually contributes to the resistance phenotype of V. cholerae, encoding resistance to several antibiotics, such as sulfamethoxazole and trimethoprim, that had previously been used for cholera treatment.

Although great efforts have been made to understand and control this pathogen, V. cholerae has continued to cause occasional cholera outbreaks in recent years [9-11]. To date, 9 complete and nearly 200 draft genomes of V. cholerae are accessible in the NCBI genome projects. However, to elucidate the evolution and adaptation mechanisms of this pathogen, detailed analysis of the genomic diversity of new clinical isolates from different areas and time periods is still needed. Here, we report the genome sequence of V. cholerae O139 strain E306, which we recently isolated from a cholera patient in Beijing, China. This genome will help to clarify the endemicity of cholera in North China.

Methods

Strain isolation

V. cholerae O139 strain E306 was isolated from the stool sample of a cholera case in Beijing, China, on May 30, 2013. After enrichment in alkaline peptone broth, the strain was identified as serogroup O139 by combining the results of its 16S rRNA gene sequence, a serum agglutination test and biochemical reactions (Vitek 2 compact, BioMerieux Corp.). This research was approved by the Research Ethics Committee of the Institute of Microbiology, Chinese Academy of Sciences, and informed consent was obtained from the patient. The strain reported here is available from The 306th Hospital of PLA, Beijing, China.

Genome sequencing

The whole genome was sequenced using a shotgun sequencing strategy on the Illumina Genome Analyzer platform. The DNA library was constructed using the TruSeq sample preparation kit according to the manufacturer's instructions. Briefly, genomic DNA was sheared by sonication and then end repaired. After ligation of paired-end adapters by the TA cloning method, the resulting DNA fragments were size selected on a 2% agarose gel. The final DNA library was produced by PCR amplification of the selected ligation products of ~500 bp in length. The DNA library (5 pM) was then loaded onto the sequencing chip, and clusters were generated using the Illumina cluster generation kit. After sequencing, image analysis and base calling were carried out using the Illumina GA Pipeline software. In total, 6,112,322 paired-end reads were generated.

Genome assembly and annotation

The raw paired-end sequences were quality filtered using the DynamicTrim and LengthSort Perl scripts provided in the SolexaQA suite [12]. After filtering, short reads were assembled using SOAPdenovo (http://soap.genomics.org.cn) and the gaps were closed using SOAP GapCloser (http://soap.genomics.org.cn). Glimmer 3.02 [13] was used for prediction of open reading frames, while tRNAscan-SE [14] and RNAmmer [15] were used for tRNA and rRNA identification, respectively. The genome was further annotated with the RAST program (Rapid Annotation using Subsystem Technology) [16]. The annotation results were then checked by comparison with the NCBI-NR (http://www.ncbi.nlm.nih.gov/), COG [17], and KEGG [18] databases. To identify antibiotic resistance genes, the protein-coding sequences were searched with BLAST against the Antibiotic Resistance Genes Database (ARDB) [19], using the similarity thresholds recommended by ARDB.
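The quality-filtering step can be illustrated with a minimal Python sketch. It is not the SolexaQA code: it only mimics the general idea of keeping the longest contiguous high-quality segment of each read and then discarding segments that are too short. The Phred cutoff (20), the minimum length (25) and the quality offset (33) are assumptions chosen for illustration, since the settings actually used are not stated here.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (offset is an assumption)."""
    return [ord(ch) - offset for ch in quality_string]


def dynamic_trim(sequence, quality_string, min_q=20, offset=33):
    """Keep the longest contiguous run of bases with Phred score >= min_q."""
    scores = phred_scores(quality_string, offset)
    best_start, best_len = 0, 0
    run_start = None
    # A sentinel below any real score closes a run that reaches the read end.
    for i, q in enumerate(scores + [float("-inf")]):
        if q >= min_q:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    end = best_start + best_len
    return sequence[best_start:end], quality_string[best_start:end]


def length_filter(reads, min_len=25):
    """Discard trimmed reads shorter than min_len (hypothetical cutoff)."""
    return [(seq, qual) for seq, qual in reads if len(seq) >= min_len]


if __name__ == "__main__":
    # Toy read, invented for illustration: 'I' is Phred 40, '#' is Phred 2.
    trimmed = dynamic_trim("ACGTACGTAC", "II##IIIII#")
    print(trimmed)
    print(length_filter([trimmed], min_len=3))
```

In a paired-end setting, a read whose mate is discarded would normally be handled separately; that bookkeeping is omitted from this sketch.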

Comparative genomics

For comparative analysis, reference genome sequences of the closest genetic relatives of V. cholerae O139 strain E306 and of representative strains belonging to important serogroups were downloaded from the NCBI website. These included V. cholerae O1 biovar El Tor str. N16961 (GenBank accession numbers AE003852 and AE003853), B33 (ACHZ00000000), V. cholerae RC9 (ACHX00000000), V. cholerae MO10 (AAKF03000000), V. cholerae MJ-1236 (CP001485 and CP001486), V. cholerae O1 classical O395 (CP000626 and CP000627), V. cholerae CIRS101 (ACVW00000000), V. cholerae IEC224 (CP003330 and CP003331), and V. cholerae O1 str. 2010EL-1786 (CP003069 and CP003070). Whole-genome alignments and SNP identification were performed using Progressive Mauve [20]. Concatenated SNPs with a total length of 23,648 bp were used to calculate the genetic distances, and a phylogenetic tree based on these SNPs was constructed using the neighbor-joining method in MEGA5 [21]. The stability of the phylogenetic relationships was assessed by bootstrapping (1000 replicates). The results were confirmed by read alignment with BWA [22] followed by SNP calling with SAMTools [23]. The genome similarities based on phylogenomic distances were analyzed using the Gegenees software [24].
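As an illustration of how genetic distances can be derived from concatenated SNPs, the following Python sketch computes pairwise p-distances (the proportion of differing sites). The substitution model used in MEGA5 is not stated here, so the p-distance is an assumption made for simplicity, and the strain names and sequences are invented toy data rather than the 23,648 bp alignment.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(snps):
    """Return a symmetric {(name_a, name_b): distance} dictionary."""
    names = sorted(snps)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(snps[a], snps[b])
        dist[(a, b)] = d
        dist[(b, a)] = d
    return dist


if __name__ == "__main__":
    # Hypothetical concatenated SNP strings for three unnamed strains.
    toy_snps = {
        "strain_A": "ACGTACGT",
        "strain_B": "ACGTACGA",
        "strain_C": "TCGAACGA",
    }
    matrix = distance_matrix(toy_snps)
    for pair in sorted(matrix):
        print(pair, matrix[pair])
```

A matrix of this form is the input that a neighbor-joining implementation would take; bootstrapping would repeat the calculation on SNP columns resampled with replacement.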

Quality assurance

The genomic DNA used for sequencing was isolated from a pure culture of V. cholerae O139 strain E306. The 16S rRNA gene from the draft genome sequence was further confirmed to be the 16S rDNA of V. cholerae by BLAST against the NCBI database. Sequence contamination was also assessed using the RAST annotation system.

Initial findings

Genome characteristics and phylogenetic analysis

The genome of V. cholerae O139 strain E306 was sequenced on the Illumina Genome Analyzer IIx platform. A total of 6,112,322 raw reads with a mean read length of 116 bp, corresponding to approximately 170-fold coverage of the genome, were generated. After assembly, a total of 51 scaffolds with an N50 length of 442,144 bp were obtained, and 9 gaps were spanned by 7 scaffolds, resulting in a total length of 879,788 bp. The final assembled draft genome sequence is 4,165,057 bp with a mean G + C content of 47.7%. The genome contains 3,861 predicted coding DNA sequences (CDSs) and 82 RNA genes (4 rRNA genes and 78 tRNA genes). RAST annotation of the whole genome indicated the presence of 534 SEED subsystems (Figure 1A). The phylogenetic tree (Figure 1B) based on whole-genome SNPs showed that the closest relative of O139 strain E306 was V. cholerae MO10, which is also a member of the O139 serogroup and was isolated during the cholera outbreak in India and Bangladesh in 1992 [3, 4]. A detailed comparison of the subsystems in V. cholerae O139 strain E306 and V. cholerae O139 strain MO10 is shown in Figure 1A.
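The coverage figure follows directly from the read count, mean read length and assembly size reported above, and the N50 statistic has a simple definition. The short Python sketch below reproduces the coverage arithmetic and shows one common way to compute N50; the scaffold lengths in the example are invented, because the individual scaffold lengths of the E306 assembly are not listed here.

```python
def fold_coverage(n_reads, mean_read_len, genome_len):
    """Sequenced bases divided by genome length."""
    return n_reads * mean_read_len / genome_len


def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


if __name__ == "__main__":
    # Values reported in the text for strain E306.
    coverage = fold_coverage(6_112_322, 116, 4_165_057)
    print(f"coverage: {coverage:.1f}x")

    # Hypothetical scaffold lengths, for illustration only.
    print("N50 of toy assembly:", n50([10, 8, 6, 4, 2]))
```

With the reported values, 6,112,322 reads of 116 bp give about 709 Mb of sequence, which is roughly 170 times the 4,165,057 bp assembly.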

Figure 1

Gene distribution and phylogenetic analysis. A. Comparison of the distribution of genes assigned to SEED subsystems between V. cholerae O139 strain E306 and MO10. Outer circle and inner circle represent V. cholerae O139 strain E306 and MO10, respectively. Genes with less than 2% attribution are not labeled. B. Phylogenetic relationships (based on SNPs) of 11 V. cholerae strains and their genomic distance analysis. Bootstrap values less than 50% are not shown. The heat-plot of the similarity matrices is based on fragmented alignments with settings 500/500.

Cholera toxin prophage

Interestingly, although V. cholerae O139 strain E306 is very closely related to V. cholerae O139 strain MO10, the gene organization of its cholera toxin-encoding CTX prophage is identical to that in the O1 strains CIRS101, 2010EL-1786, and El Tor N16961. It is noteworthy that the genomic arrangement of the CTX prophage and the RS1 element in O139 strain E306, CIRS101, and 2010EL-1786 is opposite to that in V. cholerae O1 El Tor N16961 (Figure 2). Overall, in contrast to its closest neighbor, V. cholerae MO10, O139 strain E306 harbors the El Tor-specific RS1 element and has no VSK adjacent to the core region.

Figure 2

Schematic representation of the structure of the CTX prophage and associated elements in V. cholerae O139 strain E306 and four other reference strains. The transcription direction of each gene is indicated by an arrow, and different genes are shaded in different colors. TLC: toxin-linked cryptic; VSK: pre-CTX prophage. The TLC and VSK elements are not drawn to scale.

Integrative conjugative elements (ICEs)

Based on integrase gene similarity, an SXT/R391 family ICE inserted at the prfC locus was identified in V. cholerae O139 strain E306. The general organization of this ICE was found to be highly similar to that of ICEVchInd4 and SXTMO10, which were previously identified in V. cholerae O139 strains. Detailed alignment indicated that ICEE306 and ICEVchInd4 differed by only 3 SNPs (3 SNPs in 3 coding regions), and that ICEE306 and SXTMO10 differed by 26 SNPs (17 SNPs in 14 coding regions) (Table 1); no large sequence changes such as deletions or insertions were observed. These results, consistent with another study [25], suggest that these ICEs in V. cholerae are very stable over time. Because of the high degree of similarity, dissemination of ICE-carrying V. cholerae strains between different regions cannot be excluded.

Table 1 SNPs identified in SXTMO10 and ICEVchInd4 compared with ICEE306

Antibiotic resistance genes

We compared all the predicted protein-coding genes from the 11 V. cholerae strains with known antibiotic resistance genes (BLASTp against the ARDB database [19]), yielding 50 matches to antibiotic resistance genes, mainly aminoglycoside and tetracycline resistance genes (Table 2). The most common resistance gene type, catb5, which encodes a group B chloramphenicol acetyltransferase, is present in 9 of the 11 genomes. Interestingly, V. cholerae O139 strain E306 has 9 resistance genes, whereas no resistance gene was identified in O395 and only one was found in N16961. These results imply that different V. cholerae strains have different resistance profiles; the new isolate, V. cholerae O139 strain E306, appears to have accumulated more antibiotic resistance genes in an environment where drug resistance is increasing rapidly [26].
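A search of this kind typically ends with a filtering step over tabular BLAST output. The Python sketch below shows one way to keep the best-scoring hit per query protein above an identity cutoff, assuming the standard 12-column tabular BLAST format (query, subject, percent identity, ..., bit score). The single 80% cutoff and all gene names are hypothetical; ARDB recommends type-specific thresholds, which are not reproduced here.

```python
def best_hits(blast_lines, min_identity=80.0):
    """Return {query: (subject, identity, bitscore)} for the best hit per query.

    Expects 12-column tab-separated BLAST output. min_identity is a single
    illustrative cutoff, not the per-type ARDB thresholds.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])   # column 3: percent identity
        bitscore = float(fields[11])  # column 12: bit score
        if identity < min_identity:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


if __name__ == "__main__":
    # Invented example rows; identifiers are placeholders.
    rows = [
        "gene1\tcatb5_ref\t98.5\t210\t3\t0\t1\t210\t1\t210\t1e-120\t420",
        "gene1\tother_ref\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-60\t250",
        "gene2\ttet_ref\t60.0\t150\t60\t0\t1\t150\t1\t150\t1e-10\t80",
    ]
    for query, hit in best_hits(rows).items():
        print(query, hit)
```

Counting the retained hits per genome and per resistance type would then give a summary of the kind reported in Table 2.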

Table 2 Antibiotic resistance genes in the V. cholerae strains predicted by using the antibiotic resistance genes database

Future directions

Compared with the epidemic lineages of V. cholerae serogroup O1, our understanding of the genomic properties and diversity of V. cholerae serogroup O139 is very limited. In this study, we sequenced the whole genome of a newly isolated strain of V. cholerae O139. This strain carries an El Tor-specific RS1 element previously found in V. cholerae serogroup O1, as well as more antibiotic resistance genes than the other sequenced strains examined, which suggests a strong ability to adapt to new environments and a risk of causing new cholera epidemics. Moreover, this genome will be of great interest for future comparative genomics of V. cholerae.

Availability of supporting data

This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AWWA00000000. The version described in this paper is version AWWA01000000.