Objective

Vibrio cholerae is the causative agent of the human diarrheal disease cholera and poses a significant threat to global health [1, 2]. V. cholerae is naturally found in aquatic systems [1, 3,4,5], and its persistence in this environment is attributed to specific stress response and adaptation mechanisms, including biofilm formation on an array of surfaces, survival under varying environmental conditions, and interaction with other organisms in the same environment [2]. V. cholerae is also a foodborne pathogen that can be acquired by consuming undercooked or raw seafood or through direct contact with polluted water [4, 5]. As part of a larger study that assesses the microbial community profile and tracks antimicrobial-resistant and pathogenic clones of bacteria in aquatic systems [6,7,8,9], we isolated V. cholerae strain NB-183 from a recreational freshwater lake. The objective of this study is to report the characterization of V. cholerae NB-183 using a whole-genome sequencing-based approach. We also provide an extensive genetic background and gene content analysis of this strain.

A water sample was collected in the fall of 2023 from a recreational freshwater Kettle Lake in Ontario, Canada (43.9486°N, 79.4352°W). To concentrate and detect bacteria in the water sample, 3 mL of nanobeads (Ceres Nanosciences) was added to 5 L of lake water and stirred at room temperature for 30 min. The beads were then collected using a 5-micron sock filter. An aliquot (100 µL) was spread onto MacConkey agar plates and incubated overnight at 37 °C. Pure colonies of differing morphologies were transferred onto a fresh tryptic soy agar (TSA) plate. Phenotypic identification was performed using VITEK® (bioMérieux Canada).

Data description

DNA extraction and whole-genome sequencing of a pure colony were performed as previously described [6, 7]. Briefly, genomic DNA was extracted using the DNeasy blood and tissue kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. DNA libraries were prepared using the DNA prep tagmentation kit and IDT for DNA/RNA unique dual (UD) indexes from Illumina. Paired-end sequencing (2 × 150 bp) was performed on the Illumina MiniSeq system. Raw reads were quality-checked with FastQC v0.11.9 (https://github.com/s-andrews/FastQC) and trimmed using Trimmomatic v0.39 [10]. Reads with Phred scores of ≥ 30 were assembled de novo using SKESA v2.4.0 [11]. Assembly quality and genome completeness were assessed using QUAST v5.2 [12] and BUSCO v5.5 [13], respectively. Sequence type (ST) assignment was performed using the multilocus sequence type (MLST) database [14]. Genome annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline v6.6 [15]. The resistome and virulome were identified using CARD [16] and VFDB [17, 18], respectively, with a minimum coverage of 70% and a minimum identity of 90%. Plasmids were identified using MOB-suite v3.1 [19], while PHASTER [20] was used to detect phage regions in the draft genome. Biosynthetic gene clusters were assessed using the web-based antiSMASH v7 [21]. Default parameters were used for all bioinformatics pipelines except where otherwise stated.
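The coverage and identity cut-offs applied to the CARD and VFDB searches can be illustrated with a minimal Python sketch. Only the 70% coverage and 90% identity thresholds come from the text; the hit records, the field names, and the `filter_hits` helper are hypothetical and do not reproduce the output format of any specific tool.

```python
from typing import Dict, List, Union

Hit = Dict[str, Union[str, float]]

MIN_COVERAGE = 70.0  # percent coverage threshold stated in the text
MIN_IDENTITY = 90.0  # percent identity threshold stated in the text


def filter_hits(hits: List[Hit],
                min_coverage: float = MIN_COVERAGE,
                min_identity: float = MIN_IDENTITY) -> List[Hit]:
    """Keep only hits that meet both thresholds (inclusive)."""
    return [
        h for h in hits
        if h["coverage"] >= min_coverage and h["identity"] >= min_identity
    ]


# Hypothetical hit records, for illustration only (not data from the study).
example_hits: List[Hit] = [
    {"gene": "gene_A", "coverage": 100.0, "identity": 99.5},  # passes both
    {"gene": "gene_B", "coverage": 65.0, "identity": 98.0},   # coverage too low
    {"gene": "gene_C", "coverage": 95.0, "identity": 85.0},   # identity too low
    {"gene": "gene_D", "coverage": 70.0, "identity": 90.0},   # exactly at threshold
]

kept = filter_hits(example_hits)
print([h["gene"] for h in kept])  # ['gene_A', 'gene_D']
```

Treating the thresholds as inclusive is an assumption of this sketch; the text describes them as minimum values, so a hit at exactly 70% coverage and 90% identity is retained here.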

A total of 1,477,996 paired-end reads were obtained from sequencing isolate NB-183 (Dataset 1, [22]). Although the VITEK result was inconclusive, isolate NB-183 was identified as V. cholerae by k-mer-based species identification with Kraken2 [23] and by refseq_masher, which uses Mash MinHash (https://github.com/phac-nml/refseq_masher). Whole-genome sequencing and assembly of isolate NB-183 yielded 99 contigs (N50 = 125,765 bp) with a total assembly size of 4,112,549 bp, genome coverage of 96×, G + C content of 47.42%, and BUSCO (v5.5.0) single-copy completeness of 99.2% [13] (Table 1; Datafile 1–2 [24, 25]). Whole-genome comparison using the OrthoANI program [26] showed that NB-183 had an average nucleotide identity (ANI) of 98% with V. cholerae N16961 (Datafile 1 [24]). In addition, MLST using the PubMLST database showed that NB-183 had a novel pyrC allele and a unique allele profile, which was assigned the new sequence type 1668 (Datafile 3, [27]).
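For readers unfamiliar with the assembly metrics reported above, the following minimal Python sketch shows how N50 and G + C content are conventionally computed from a set of contigs. It is an illustration using toy sequences, not the procedure used by QUAST or the actual NB-183 contigs; the function names `n50` and `gc_percent` are hypothetical.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the N50: the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def gc_percent(contigs: List[str]) -> float:
    """Return the percentage of G and C bases across all contigs."""
    total = sum(len(c) for c in contigs)
    if total == 0:
        return 0.0
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return 100.0 * gc / total


# Toy contigs for illustration only (lengths 30, 16, 8, and 2 bp).
toy_contigs = ["ATGCGC" * 5, "ATAT" * 4, "GGCC" * 2, "AT"]

print(n50([len(c) for c in toy_contigs]))  # 30
print(gc_percent(toy_contigs))             # 50.0
```

In the toy example the longest contig alone (30 bp) already covers at least half of the 56 bp total, so it defines the N50.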

Table 1 Overview of datafiles/datasets

V. cholerae strain NB-183 contained 3,645 protein-coding sequences (CDS), 66 pseudogenes, and 60 RNAs (Dataset 2, [28]). Among the CDS were the antibiotic resistance gene blaCARB-7 and the almEFG operon, which encode resistance to penams and polymyxin [16, 29], respectively. In addition, 127 virulence genes were identified in the genome, including genes encoding essential components of the type II secretion system (T2SS) (epsCDEFGHIJKLMN), type VI secretion system (T6SS)-associated genes (hcp-1, hcp-2), and toxins (rtxBCD, toxA), among others. Of note, the cholera toxin structural genes (ctxA and ctxB) and the toxin co-regulated pilus gene (tcpA) were absent in V. cholerae NB-183 (Datafile 4, [30]). No plasmid was detected, but two intact phages (NBp1 and NBp2) were identified in the NB-183 genome (Datafile 5, [31]). One of these phages (NBp2, size = 6.8 Kbp) was highly similar (97% coverage and nucleotide identity) to Vibrio phage VCY-NC_016162.1 [32]. In addition, six different biosynthetic gene clusters (BGCs) were identified, of which only two showed high homology (Blastp = 100%) to BGCs previously identified in Vibrio species (vibriobactin and piscibactin) (Datafile 6, [33]). The remaining four BGCs had low homology (Blastp = 0–33%) to those in the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) database.

Limitations

This data note is limited to the description of the draft genome of a V. cholerae strain isolated from a freshwater sample. Further analysis of a larger collection is needed to attribute the source of the strain and to assess the distribution and significance of the unique biosynthetic gene clusters identified.