Background

With the rapid expansion of aquaculture, vibriosis has become one of the most serious diseases threatening sustainable aquaculture development [1, 2]. Vibrio harveyi, a halophilic, luminescent, Gram-negative γ-proteobacterium, is an important causative agent of vibriosis [3,4,5]. V. harveyi has been extensively studied for more than a decade and is reported to be a serious pathogen for a range of marine vertebrates and invertebrates [4,5,6,7]. The earliest report of V. harveyi causing serious infections was in common snook (Centropomus undecimalis) in Florida, USA [8]. Subsequently, V. harveyi was shown to cause disease in Penaeus monodon and Penaeus japonicus in Thailand [10], brown-spotted grouper in Kuwait [9], common dentex (Dentex dentex) in Spain [10], Holothuria scabra (Holothuroidea, Echinodermata) in Toliara, Madagascar [11], Asian seabass (Lates calcarifer) in Malaysia [12], and Epinephelus spp. in China [13,14,15].

The wide use of antibiotics, increased environmental pollution and global climate change are increasing drug resistance in Vibrio spp., including V. harveyi. Several studies show that drug resistance and multidrug resistance (MDR) are occurring in aquaculture worldwide, including in shrimp ponds in Thailand, Malaysia, India, and China [16,17,18,19]; shellfish farms in Malaysia, Korea, USA, Poland and China [20,21,22,23,24]; and fish farms in Italy, Korea, and China [25,26,27]. Enhanced drug resistance is accompanied by stronger virulence, making V. harveyi infection difficult to prevent and treat [15, 16]. For example, Nakayama et al. [28] found that gradually increasing antibiotic concentration and frequent subculturing enhance V. harveyi antibiotic resistance and elevate its toxicity. Virulence and drug-resistance genes underlie the pathogenicity and drug resistance of V. harveyi. Therefore, studying these genes will provide an important foundation for determining pathogenic and drug-resistance mechanisms.

Although V. harveyi is a recognized pathogen of marine animals, different strains vary in their ability to cause disease [29]. The main virulence factors of V. harveyi are extracellular proteases, outer membrane proteins, hemolysins, esterases, phospholipases, exotoxins and secretion systems [29, 30]. Production of enzymes that inactivate antibacterial drugs is an important resistance mechanism [31]. Inactivating enzymes can be expressed by genes on plasmids or chromosomes [32]. Virulence factors and antibiotic-resistance genes (ARGs) can be transferred vertically and can also spread via horizontal gene transfer (HGT) through mobile genetic elements (MGEs) such as plasmids, bacteriophages, transposons, integrative and conjugative elements (ICEs), and genomic islands (GIs) [33]. Park et al. [34] found that MGEs such as prophages, GIs and pathogenicity islands carry different combinations of virulence factors that promote immune evasion, as well as superantigens that contribute to serious Staphylococcus aureus infection. Le et al. [35] reported that 0.92% (36 of 38,895) analyzed proteins from 31 Rickettsiales genomes showed strong bootstrap support for HGT, with functions including ATPases, aldolases, transporter activities, cystathionine beta-lyases, sugar phosphate permeases, growth inhibitors and antitoxin activities. To be transferred horizontally, exogenous DNA must evade bacterial immune systems such as restriction-modification systems (RM systems) and CRISPR-Cas systems [36, 37].

Serious infection and multidrug resistance create an urgent need for whole-genome sequences of different V. harveyi strains to determine pathogenic and antimicrobial-resistance mechanisms. Evidence is growing that HGT is an important driving force for prokaryotic evolution, affecting pathogenicity and drug resistance [34, 38]. Coastal urbanization has recently intensified, resulting in the release of large amounts of antibiotics, heavy metals, and nutrient pollutants (Additional file 1: Table S1), which may increase the pathogenicity and drug resistance of V. harveyi by affecting HGT [39, 40]. V. harveyi is the dominant species that causes serious infection and mortality of farmed fish in Guangdong, Southern China [14, 15]. However, little information is available on the mechanisms of V. harveyi pathogenesis and drug resistance.

The predominant strain V. harveyi 345 is multidrug resistant to ampicillin, rifampicin, tetracycline, pediatric compound sulfamethoxazole tablets, vancomycin, doxycycline, trimethoprim, streptomycin, kanamycin, sulfamethoxazole, furazolidone, cefixime, and chloramphenicol. This strain was isolated from a diseased E. oanceolutus in Shenzhen, Southern China. V. harveyi 345 is suspected to cause kidney enlargement and softening, spleen enlargement, and anal bleeding in E. oanceolutus and has a median lethal dose (LD50) of 9.83 × 10^5 CFU·g^−1 [41]. Therefore, its complete genome information and HGT events are helpful for clarifying V. harveyi pathogenesis and drug resistance, controlling V. harveyi disease and reducing economic losses. In this study, we present the entire genome sequence of V. harveyi 345 together with a comparative genomic analysis of its pathogenesis, antimicrobial resistance and genome expansion caused by HGT.

Results

General features of V. harveyi 345

A total of 1189 Mb of clean data and 98,602 subreads were generated and assembled into two contigs with an average genome coverage of 41.71×. After error correction, GATK analysis and gap filling, a genome of 6,185,822 bp was obtained, which was smaller than those of the comparable strains KC13.17.5, E385, and 74F and larger than those of 27 other comparable strains (Additional file 2: Table S2). The chromosomal G + C content (44.76%) of 345 agreed with values from 30 comparable strains (44.60–45.60%) (Additional file 2: Table S2). Combined sequencing analysis revealed that the complete genome of V. harveyi 345 contained two circular chromosomes, named ChI (3,713,225 bp, 44.81% G + C, CP025537) and ChII (2,220,396 bp, 44.81% G + C, CP025538), and two circular megaplasmids, named p345–185 (185,327 bp, 43.48% G + C, CP025539) and p345–67 (66,874 bp, 43.48% G + C, CP025540) (Table 1 and Fig. 1).
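The replicon figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the variable and function names are illustrative, and the calculation assumes the rounded per-replicon G + C percentages given in the text.

```python
# Replicon lengths (bp) and G + C percentages as quoted in the text.
# The dictionary layout and function names are illustrative assumptions.
replicons = {
    "ChI": (3_713_225, 44.81),
    "ChII": (2_220_396, 44.81),
    "p345-185": (185_327, 43.48),
    "p345-67": (66_874, 43.48),
}

def genome_size(reps):
    """Sum of all replicon lengths in bp."""
    return sum(length for length, _ in reps.values())

def weighted_gc(reps):
    """Length-weighted mean G + C content (percent) over all replicons."""
    total = genome_size(reps)
    return sum(length * gc for length, gc in reps.values()) / total

total_bp = genome_size(replicons)
gc_percent = weighted_gc(replicons)
print(total_bp, round(gc_percent, 2))
```

Under these assumptions the four replicons sum to the reported 6,185,822 bp, and the length-weighted G + C content rounds to 44.76%.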

Table 1 Genome features of V. harveyi 345
Fig. 1 Circular representation of the V. harveyi 345 genome. Inner to outer: genome size, all annotated genes, forward-strand genes, reverse-strand genes, tRNA, rRNA, sRNA, GC and GC-skew

Gene annotation

A total of 5678 genes (fewer than ZJ0603 and more than the 29 other strains) (Additional file 2: Table S2) were predicted, with an average length of 929 bp and a mean GC content of 45.68%, accounting for 85.29% of the genome. The three most frequent gene-length classes were 400–500 bp, 600–700 bp, and 300–400 bp, and the three least frequent were 0–100 bp, 1900–2000 bp, and 1800–1900 bp (Fig. 2a). In addition, 128 tRNA, 31 rRNA and 13 sRNA genes were identified with confidence. A total of 89 tandem repeats, including 44 minisatellite DNA and 35 microsatellite DNA, were predicted (Table 1 and Fig. 1).

Fig. 2 Coding sequence length distribution and COG functional classifications. Coding sequence length distribution of V. harveyi 345 (a). COG functional classifications of V. harveyi 345 coding sequences (b)

Predicted open reading frames (ORFs) were further classified into Clusters of Orthologous Groups (COG) (4586 ORFs in total, accounting for 80.77%; Fig. 2b). According to COG categorization analysis, the top five groups in abundance were amino acid transport and metabolism (361 ORFs), transcription (350 ORFs), general function prediction only (343 ORFs), signal transduction mechanisms (312 ORFs) and carbohydrate transport and metabolism (312 ORFs) (Fig. 2b). In addition, 3247 genes were classified into 40 functional Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Additional file 3: Figure S1). Some genes were involved in more than one KEGG pathway; the top five pathways were replication and repair (784 ORFs), carbohydrate metabolism (633 ORFs), signal transduction (506 ORFs), infectious diseases (463 ORFs) and membrane transport (432 ORFs). Furthermore, 3453 genes were classified into 40 functional Gene Ontology (GO) classifications (Additional file 4: Figure S2). Some genes were involved in more than one GO classification; the top five classifications were metabolic process (1914 ORFs), cellular process (1881 ORFs), catalytic activity (1785 ORFs), single-organism process (1526 ORFs) and binding (1494 ORFs).

Twelve RM system genes were annotated in V. harveyi strain 345 (Table 2). Ten belonged to type I RM systems, encoding three R subunits (hsdR), two S subunits (hsdS) and five M subunits (hsdM). The other two belonged to a type III RM system, encoding a methyltransferase (mod) and a restriction enzyme (res).

Table 2 The RM system of V. harveyi 345

Virulence factors

A total of 487 putative virulence factors were predicted (Table 3 and Additional file 5: Table S3), involving adherence (type IV pili, lipooligosaccharide LOS, and OmpU), motility (polar flagellar proteins, chemotaxis proteins, lateral flagellin proteins, and motor proteins), regulation (autoinducer-2 and cholerae autoinducer-1), secretion systems (type II/III/IV/VI secretion system proteins, T2/3/4/6SS proteins), and iron uptake. Many genes were most similar to genes from genera other than Vibrio, such as Escherichia, Pseudomonas, Salmonella, Aeromonas, and Mycobacterium. T3SS1, which is widely reported to be involved in Vibrio pathogenesis, was found on ChI. T6SSs are implicated in cell targeting and virulence, and are believed to mediate antibacterial activity [42]. Six T6SS proteins on Hcp secretion island-1 were detected. The tlh gene encoding a thermolabile hemolysin (TLH) was on ChII. In addition, four virulence genes were on plasmids: CU052_28670 on p345–185, and CU052_00025, CU052_00190, and CU052_00400 on p345–67.

Table 3 Virulence factors predicted with VFDB database

Antimicrobial-resistance genes

We identified 25 genes predicted to have > 40% identity to well-characterized ARGs (Table 4), including genes for aminoglycoside (golc, adeb, emre), penicillin (pbp2, pbp1a, pbp1b, pbp2), tetracycline (tet34, tetm, tetb), chloramphenicol (adeb, ceob, mdtl) and trimethoprim resistance (dfra26). In addition, five genes (CU052_28095, CU052_28120, CU052_28525, CU052_28540, and CU052_29140) were on the plasmid p345–185. These results were consistent with antibiotic-susceptibility assay data for strain 345 showing resistance to streptomycin, ampicillin, tetracycline, chloramphenicol, trimethoprim, and sulfamethoxazole.

Table 4 Antimicrobial resistance genes predicted with the ARDB database

Genomic islands, prophages and CRISPR-Cas systems

Forty-seven GIs were detected on ChI (Additional file 6: Table S4) and 24 on ChII (Additional file 7: Table S5) in V. harveyi 345. The GIs of ChI encoded 40 transposases belonging to the IS110, IS21, IS256, IS3, IS5/IS1182, IS66, IS91, ISL3, and ISNCY families, seven of which were assigned to two GIs. Another eight genes encoded IS66 family insertion sequence hypothetical proteins. Three integrases and one serine recombinase were also encoded. Three genes were predicted to encode type III secretion proteins and one was predicted to encode another virulence factor (CU052_RS12915). In the GIs of ChII, nine genes encoding transposases belonging to the IS3 and ISNCY families were detected, four of which were assigned to more than one GI. We also found three genes encoding integrases and 13 genes on three GIs predicted to encode type VI secretion system (T6SS) proteins.

Two intact prophage sequences were identified (Additional file 8: Table S6) and no CRISPR elements were predicted.

Phylogenetic analysis, pan-core genes, dispensable and strain-specific genes

Phylogenetic trees showed that V. harveyi 345 was closely related to other V. harveyi strains, especially V. harveyi VHJR7 and V. harveyi CAIM463 (Fig. 3).

Fig. 3 Phylogenetic relationship of V. harveyi 345 and 30 compared V. harveyi strains. Results are based on the SNP matrix of the 31 strains, Treebest-1.9.2 software and the maximum-likelihood method

The pan-genes are all genes found across the 31 strains, while the core-genes are the orthologous genes shared by all of them. Dispensable genes were absent from at least one of the 31 strains, and strain-specific genes were present in only one strain. Using 31 genomes (Additional file 2: Table S2), 9878 pan-genes with a total size of 2,764,797 amino acids (aa) and 3338 core-genes with a total size of 1,133,465 aa were identified (Fig. 4a-c). A dispensable gene heatmap showed that each strain contained strain-specific genes (Fig. 4d). A total of 217 genes were specific to V. harveyi 345, with 94 on ChI, 17 on ChII, 96 on p345–185, and 10 on p345–67 (Additional file 9: Table S7, Fig. 5a). Twenty-nine genes were annotated against the Non-Redundant (NR) database and the others were hypothetical protein-encoding genes (Additional file 9: Table S7, Fig. 5b). Four annotated genes were MGE-related: two encoding type IV conjugative transfer proteins (CU052_28715 and CU052_28750) and two encoding recombinases (CU052_29120 and CU052_00050), located on p345–185 or p345–67. Four (CU052_13455, CU052_13535, CU052_13525, and CU052_13445) were phage-related genes on ChI. Four genes (CU052_14535, CU052_14240, CU052_23640, and CU052_13020) on ChI encoded membrane proteins. One beta-lactamase class C-encoding gene (CU052_00920) was on ChI. Two DNA methylase-encoding genes (CU052_28385 and CU052_28885) belonging to the bacterial immune system were on p345–185. One gene encoding virulence-associated protein D (CU052_13320) was on ChI (Additional file 9: Table S7).

Fig. 4 Pan-core genes, strain-specific and dispensable genes. Pan-gene dilution curve of 31 V. harveyi strains (a), core-gene dilution curve of the 31 strains (b), flower plot of strain-specific genes and core-genes (c), and heat map after removal of core-genes (dispensable heatmap) (d)

Fig. 5 Strain-specific genes of V. harveyi 345. Locations of strain-specific genes (a). Gene number of hypothetical proteins and proteins annotated by NR Database (b). Composition and relative abundance of strain-specific genes (c)

Strain-specific gene family

A strain-specific gene family was one present in only one of the 31 strains. Seven gene families were specific to V. harveyi 345, with > 50% identity to Pseudovibrio spp., Photobacterium spp., Escherichia spp., and other Vibrio spp. (V. parahaemolyticus), and were involved in recombination, peptide synthesis, conjugation and integration (Table 5).

Table 5 Strain-specific gene families of V. harveyi 345

HGT candidates on chromosomes

CU052_00920 in V. harveyi 345 encoded a beta-lactamase class C consisting of 486 amino acids. It shared no similarity with any protein of V. harveyi, Vibrio spp. or even γ-proteobacteria presently known. Instead, it shared more than 40% identity with the protein in four Pseudovibrio spp. (58.76–59.21% identity) suggesting recent acquisition of this gene from a Pseudovibrio strain (Fig. 6a and d). Two proteins were annotated, one as a serine hydrolase and the other as a class A beta-lactamase-related serine hydrolase.

Fig. 6 Evolutionary analysis of CU052_00920, CU052_13320, and CU052_14535. Multisequence alignment (a-c) and phylogenetic analysis (d-f) of CU052_00920, CU052_13320, and CU052_14535 with amino acids. Multisequence alignment was by ClustalW and the phylogenetic tree was constructed with MEGA 6. Bar substitutions per sequence position: 0.05 (d), 0.01 (e), and 0.01 (f)

CU052_13320 encoded a virulence-associated protein D of 94 amino acids. It shared 40–75.53% identity with proteins from Pseudomonas spp., Aeromonas spp., Neisseria spp., Haemophilus spp., Rodentibacter spp., Alysiella spp., Kingella spp., and Halomonas spp. No similarity with any V. harveyi or known Vibrio spp. proteins was detected (data not shown). The top 15 similar strains were selected for multiple alignment and phylogenetic analysis (Fig. 6b and e). CU052_13320 was closest to a virulence factor in Pseudomonas aeruginosa (73.40% identity) and Aeromonas sobria (75.53% identity), suggesting HGT of this gene from P. aeruginosa or A. sobria (Fig. 6b and e).

The CU052_14535 gene, encoding a 167 amino acid membrane protein in V. harveyi 345, had many homologs annotated as OmpA family proteins. CU052_14535 showed more than 40% identity with proteins from many strains of Photobacterium spp., Salinivibrio spp., Aeromonas spp., Ferrimonas spp., Grimontia spp., Aliivibrio spp., and other Vibrio species, but not with V. harveyi (data not shown). The top 15 similar strains were selected for multiple alignment and phylogenetic analysis (Fig. 6c and f). CU052_14535 was closest to an outer membrane beta-barrel protein in V. splendidus (97.04% identity) and an OmpA family protein in V. crassostreae (97.04% identity), suggesting recent acquisition of this gene from V. splendidus or V. crassostreae (Fig. 6c and f).

HGT candidates on plasmids

Five antibiotic resistance genes and one virulence gene, CU052_28670, involved in immune evasion were detected on plasmid p345–185. Basic local alignment search tool (BLAST) searches at the nucleotide level (Blastn) showed high homology (query cover > 40% and identity > 40%) of p345–185 with plasmids pVPS62, V36, pVPS114, pVPH2, and pVPS91 from V. parahaemolyticus; pVAS114 and pVAS19 from V. alginolyticus; and pAQU1 from Photobacterium damselae, but no homology with other V. harveyi plasmids (Fig. 7a). Plasmid p345–185 showed 99.97% identity (query cover = 99%) with pVPS62. CU052_29140 (sul2) was not found on the 345 chromosomes and showed 100% amino acid identity to sul2 in Acinetobacter baumannii, Escherichia coli, Actinobacillus pleuropneumoniae, Histophilus somni, and Salmonella enterica (Fig. 7b).

Fig. 7 Evolutionary analysis of p345–185 and CU052_29140. Nucleotide phylogenetic analysis of p345–185 (a) and amino acid multisequence alignment of CU052_29140 (b). The phylogenetic tree was constructed with the neighbor-joining method, max seq difference = 0.75, using BLAST pairwise alignments. Multisequence alignment was by ClustalW

Nucleotide sequence accession numbers

The complete genome sequence of V. harveyi 345 was deposited in GenBank with accession numbers CP025537 (ChI), CP025538 (ChII), CP025539 (plasmid p345–185) and CP025540 (plasmid p345–67).

Discussion

V. harveyi 345 was isolated and studied because of its multidrug resistance and serious virulence to E. oanceolutus in Shenzhen, Southern China. The complete genome sequence of V. harveyi 345 was determined and compared with 30 V. harveyi strains. V. harveyi 345 was assembled into two circular chromosomes and two plasmids. It had a larger genome than all of the other strains except KC13.17.5, E385, and 74F, and its G + C content was consistent with that of the 30 other strains. V. harveyi 345 also had more predicted genes than any of the 30 other strains except ZJ0603. In total, 487 virulence genes, encoding proteins involved in flagella, iron uptake, pili, LPS, CPS, chemotaxis and type II/III/IV/VI secretion that are implicated in clinical fish disease, were identified in V. harveyi 345. In addition, 25 ARGs including genes for resistance to aminoglycoside (golc, adeb, emre), penicillin (pbp2, pbp1a, pbp1b, pbp2), tetracycline (tet34, tetm, tetb), chloramphenicol (adeb, ceob, mdtl) and trimethoprim (dfra26) were found in V. harveyi 345, consistent with its resistance to streptomycin, ampicillin, tetracycline, chloramphenicol, trimethoprim, and sulfamethoxazole.

Virulence genes and antimicrobial-resistance genes are acquired by vertical inheritance during bacterial replication and by HGT. Horizontal transfer is the exchange of genetic material within species without any sexual mechanism [43]. This phenomenon is widely documented in bacteria and has a role in bacterial evolution and adaptation [43]. Comparative genomics of 31 V. harveyi strains was conducted to analyze HGT events by identifying strain-specific genes and strain-specific gene families. Considering the average gene number of 5087 for the 31 V. harveyi strains, the 3338 core genes represented approximately 66% of the total genome, meaning that approximately two-thirds of the genome was conserved among all strains. However, flower plots and dispensable heatmaps showed that each strain contained strain-specific genes, probably obtained by HGT. A total of 217 genes were specific to V. harveyi 345, more than in any other strain except ZJ0603. This result suggested that a large number of HGT events had occurred in 345. In addition, seven gene families were specific to V. harveyi 345. Most (80.65%, 175/217) of the strain-specific genes had > 40% identity to genes of other Vibrio species. The remainder appeared to come from other genera such as Shewanella, Photobacterium, Pseudovibrio, and Escherichia. We focused on the characterization of three strain-specific genes, CU052_00920, CU052_13320, and CU052_14535. CU052_00920 encoded a class C beta-lactamase and was probably transferred from Pseudovibrio spp. Beta-lactamase enzymes are reported to inactivate beta-lactam antibiotics by hydrolyzing the amide bond of the characteristic four-membered beta-lactam ring, rendering the antibiotic ineffective [44]. This result suggested that CU052_00920 contributed to antibiotic resistance, especially ampicillin resistance, of V. harveyi 345. CU052_13320 encoded a virulence-associated protein D and was closest to a virulence factor in Pseudomonas aeruginosa (73.40% identity) and Aeromonas sobria (75.53% identity).
CU052_13320 probably contributed to the virulence of 345. CU052_14535 encoded an OmpA family protein and was probably acquired from V. splendidus or V. crassostreae. Among pathogenic bacteria, OmpA proteins are important for pathogenesis, including bacterial adhesion, invasion, intracellular survival, evasion of host defenses and stimulation of proinflammatory cytokine production [45]. This result suggested that CU052_14535 may contribute to the virulence of V. harveyi 345. DNA can also be horizontally transmitted between bacteria through plasmids, phages or uptake of naked DNA from the environment [46]. Homology analysis showed that p345–185 had 99.97% identity (query cover = 99%) to pVPS62 in V. parahaemolyticus. pVPS62 is reported to be a pAQU-type plasmid, an emerging group of MDR conjugative plasmids among important pathogens [47]. Five antimicrobial-resistance genes (tetm, tetb, qnrs, dfra17, and sul2) involved in resistance to tetracycline, fluoroquinolone, trimethoprim, and sulfonamide, and one virulence gene (CU052_28670) involved in immune evasion were located on plasmid p345–185. These genes could be horizontally transferred, promoting drug resistance and virulence. In particular, CU052_29140 (sul2), which showed 100% amino acid identity to sul2 in Acinetobacter baumannii, Escherichia coli, Actinobacillus pleuropneumoniae, Histophilus somni, and Salmonella enterica, likely contributes to resistance to pediatric compound sulfamethoxazole tablets and sulfamethoxazole. Similarly, Klein et al. [48] found that resistance to chloramphenicol, oxytetracycline and chlortetracycline could be successfully transferred by R-plasmids.

In addition to plasmids, HGT involves a variety of genetic units, collectively known as MGEs [49], including phages, GIs, and integrating conjugative elements (ICEs). A total of 47 GIs on ChI and 24 on ChII, encoding many transposases, integrases, and recombinases, and two incomplete prophage sequences were identified in V. harveyi 345. These genetic units probably acted as HGT delivery tools, contributing to pathogenesis, drug resistance and environmental adaptation of V. harveyi 345. Several type III/VI secretion proteins were predicted on the same GIs as IS family transposases and could therefore be easily transmitted. Dissemination of these genes could further complicate Vibrio infections, limiting treatment options. These results further indicated that HGT contributed to the virulence and antibiotic resistance of V. harveyi 345.

The 345 strain was isolated from a diseased E. oanceolutus in Shenzhen, Southern China. Increasing aquaculture density has recently led to frequent disease outbreaks and increased use of antibiotics in Southern China, resulting in serious antibiotic residues and pollution [50]. Intensification of coastal urbanization has led to massive discharge of pollutants (heavy metals, nutrients, biocides), and intensification of human activities has caused climate change, especially global warming [51]. Many studies show that the wide use of antibiotics; pollution by heavy metals, nutrients and biocides; and global climate change regulate HGT by affecting plasmid replication, changing phage activity, adjusting HGT-related enzyme activity, and damaging immune systems. These factors regulate bacterial resistance and pathogenicity [52,53,54]. For example, Beaber et al. [53] found that ciprofloxacin induces transfer of SXT, an ICE derived from V. cholerae that encodes genes conferring resistance to chloramphenicol, sulphamethoxazole, trimethoprim and streptomycin. The mechanism is increased expression of SXT activators, which enhances the spread of drug resistance. Similarly, high environmental temperatures in combination with UV irradiation accelerate the spread of stx (Shiga toxin) genes by enhancing Stx prophage induction and Stx phage-mediated gene transfer [52]. Therefore, environmental pollution and climate change may increase the dissemination of virulence genes and drug-resistance genes by affecting HGT. The result would be enhanced toxicity and drug resistance of V. harveyi, affecting the ecological safety of aquaculture ecosystems.

Conclusions

We present the first complete genome of the serious disease-causing V. harveyi 345, which has a multidrug-resistant phenotype. We performed comparative genomics between this strain and 30 other V. harveyi strains. The genome was determined to study V. harveyi 345 virulence and antimicrobial resistance. Multiple virulence factors and resistance genes were predicted in this strain, consistent with the results of Jiang et al. [38]. We searched for evidence of HGT and evaluated genomic traits relating to HGT. The high-quality complete genome sequence generated in this study will aid further studies aimed at a deeper understanding of the mechanisms of Vibrio pathogenesis and antibiotic resistance. These studies could improve seafood quality and reduce economic loss. We recommend that more research be done on how HGT responds to environmental change and antibiotic use. This research will be an important scientific basis for predicting outbreaks and controlling V. harveyi disease. As environmental pollution, including the use of antibiotics, and climate change enhance the virulence and drug resistance of pathogens, research should explore management approaches, for example the use of bacteriophages, to control the occurrence of V. harveyi in the environment [55].

Methods

DNA extraction

V. harveyi 345 was grown overnight in 2216E medium (BD, USA) at 28 °C with vigorous shaking. Overnight cultures were inoculated with a 1:1000 dilution into 20 ml fresh 2216E medium and grown until OD600 = 0.5. Cells were collected by centrifugation and genomic DNA extracted using the CTAB method [56]. The quantity and quality of genomic DNA were evaluated using a Qubit Fluorometer (Invitrogen, Carlsbad, CA, USA) and 1% agarose gel electrophoresis. DNA was stored at − 80 °C and prepared for genome sequencing.

Whole genome sequencing and assembly

A PacBio RSII 20-kb template library was constructed from at least 10 μg genomic DNA at the Beijing Genomics Institute (BGI). The library was subjected to quality control (QC) with a Qubit Fluorometer (Invitrogen, Carlsbad, CA, USA) and a Nanodrop 2000 (Thermo Fisher Scientific, Wilmington, DE, USA) to check the concentration, purity and integrity of library templates. Libraries were sequenced on Illumina HiSeq 4000 and PacBio RSII sequencing platforms. Genomes were primarily assembled with a variety of software for short sequence assembly based on clean data from the Illumina HiSeq 4000 platform. RS_HGAP Assembly3 of SMRT Analysis v.2.3.0 was used to assemble data from the PacBio RSII platform. Contigs were error corrected successively by soapSNP and soapIndel software and GATK analysis. Gaps between contigs were filled by PCR.

General annotation

ORFs were predicted by Glimmer v.3.02 [57], and genome annotation was completed using the NCBI prokaryotic genome automatic annotation pipeline with identity ≥ 40% and E-value ≤ 10^−5 (BLAST and annotation for all study sets used the same identities and E-values). COG, KEGG, GO and NR databases were used to search domain architecture. The tRNA genes were identified by tRNAscan-SE v.1.23 [58] and rRNA genes with RNAmmer v.1.2 [59]. The sRNAs were predicted by blasting nucleic acid against the Rfam database [60]. Tandem repeats were predicted by Tandem Repeat Finder v.4.04 [61].

Identification of virulence factors and ARGs

Putative virulence factors of V. harveyi 345 were predicted by blasting against the Virulence Factor Database (VFDB) (http://www.mgc.ac.cn/VFs/main.htm) [62] and ARGs by blasting against the Antibiotic Resistance Genes Database (ARDB) (http://ardb.cbcb.umd.edu/) [63].

Prediction of GIs, prophages, and CRISPR-Cas systems

GIs, prophages, and CRISPR-Cas systems were respectively identified using online tools IslandViewer 4, PHAge Search Tool (PHAST), and CRISPR Finder software [64,65,66].

Comparative genomics

Comparative genome analysis was conducted among 31 V. harveyi strains with V. harveyi VH2 as the reference strain and an average gene number of 5087 (Additional file 2: Table S2) to assess V. harveyi evolution. A phylogenetic tree was constructed based on the single nucleotide polymorphism (SNP) matrix of the 31 strains with the maximum-likelihood method, bootstrapped 1000 times via Treebest-1.9.2 software [67].

Extraction of pan-, core-, dispensable and strain-specific genes

Core-pan genes were analyzed using a BLAST method [68]. Genes from the reference genome V. harveyi VH2 were taken as the initial gene pool. Genes predicted in each of the other 30 strains (Query samples) were aligned by BLAST against the gene pool, and results were filtered by length and identity (> 40%). BLAST coverage ratios (BCR) of genes from the gene pool and from the Query samples were calculated separately (reference BCR = match/reference length × 100%; query BCR = match/query length × 100%, where match is the aligned length used for BLAST). If the BCR values from the reference and the Query sample were smaller than the setting value (40%), the reference gene was considered to have no homology with the Query gene. Nonhomologous genes from the Query genome were added to the gene pool. All Query samples were processed in this way, and the final gene pool was used as the pan-gene pool. The nonhomologous genes from the reference genome were the strain-specific genes relative to that Query sample. These nonhomologous genes were then aligned by BLAST against the gene pool of another Query sample; this was repeated for all samples, and the final nonhomologous gene pool was used as the strain-specific gene pool.
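The coverage-ratio test and the growth of the gene pool can be illustrated with a short sketch. This is not the pipeline used in the study: the class, function and gene names are hypothetical, and the code assumes that a pair is treated as nonhomologous when identity does not exceed 40% or when both BCR values fall below the 40% setting.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One BLAST hit between a query gene and a gene already in the pool."""
    query_id: str
    ref_id: str
    match_len: int    # aligned length used by BLAST
    identity: float   # percent identity

def bcr(match_len, gene_len):
    """BLAST coverage ratio in percent: match / gene length x 100."""
    return 100.0 * match_len / gene_len

def is_homolog(hit, ref_len, query_len, min_identity=40.0, min_bcr=40.0):
    # Assumption: nonhomologous if identity is too low, or if BOTH the
    # reference BCR and the query BCR are below the setting value.
    if hit.identity <= min_identity:
        return False
    ref_bcr = bcr(hit.match_len, ref_len)
    query_bcr = bcr(hit.match_len, query_len)
    return not (ref_bcr < min_bcr and query_bcr < min_bcr)

def extend_pool(pool_lengths, query_lengths, hits):
    """Return a new pool with the query genes that have no homolog added."""
    matched = set()
    for h in hits:
        if h.ref_id in pool_lengths and h.query_id in query_lengths:
            if is_homolog(h, pool_lengths[h.ref_id], query_lengths[h.query_id]):
                matched.add(h.query_id)
    new_pool = dict(pool_lengths)
    for gene, length in query_lengths.items():
        if gene not in matched:
            new_pool[gene] = length
    return new_pool

# Toy data: gene names and lengths are invented for illustration only.
pool = {"ref_geneA": 300, "ref_geneB": 500}
query = {"q_gene1": 310, "q_gene2": 200}
hits = [
    Hit("q_gene1", "ref_geneA", 290, 85.0),  # high coverage and identity
    Hit("q_gene2", "ref_geneB", 60, 55.0),   # 12% and 30% coverage
]
pan_pool = extend_pool(pool, query, hits)
print(sorted(pan_pool))
```

In this toy run q_gene1 is absorbed by its homolog in the pool, whereas q_gene2 is added as a new pan-gene. Calling extend_pool once per Query sample mirrors the iterative pool growth described above.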

After alignment to the genes from each sample, BCR values of genes from the pan-gene pool were calculated for each sample, and a coverage array was generated for this pool. If the BCR value of a gene was larger than the setting value in every sample, the gene was a core gene. If the gene was predicted from an assembled result, BLAST results were filtered and the sequence was removed if the proportion of N (uncertain nucleotides) in the gene was larger than the setting (30%). Dispensable genes were those included in the pan-gene pool but not in the core-gene pool.
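The classification from the coverage array can be sketched as follows. The function name and toy values are hypothetical; the sketch assumes a gene counts as present in a strain when its BCR exceeds the 40% setting, and it reports strain-specific genes as the dispensable genes present in exactly one strain.

```python
def classify_genes(coverage, min_bcr=40.0):
    """Split pan-genes into core, dispensable and strain-specific sets.

    coverage maps gene -> {strain: BCR percent}. The 40% default is the
    setting value quoted in the text; treating a BCR above it as presence
    is an assumption of this sketch.
    """
    core, dispensable, specific = [], [], {}
    for gene, per_strain in coverage.items():
        present = [s for s, value in per_strain.items() if value > min_bcr]
        if len(present) == len(per_strain):
            core.append(gene)             # present in every strain
        else:
            dispensable.append(gene)      # missing from at least one strain
            if len(present) == 1:
                specific[gene] = present[0]
    return core, dispensable, specific

# Toy coverage array for three invented strains.
coverage = {
    "geneA": {"s1": 98.0, "s2": 95.0, "s3": 91.0},
    "geneB": {"s1": 88.0, "s2": 0.0, "s3": 76.0},
    "geneC": {"s1": 0.0, "s2": 0.0, "s3": 99.0},
}
core, dispensable, specific = classify_genes(coverage)
print(core, dispensable, specific)
```

Here geneA is core, geneB and geneC are dispensable, and geneC is specific to strain s3.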

Construction of gene families

Gene families were constructed using genes from the reference strain and the target bacterium. Protein sequences were aligned with the basic local alignment search tool for proteins (Blastp) and redundancy was eliminated. Gene family TreeFam clustering was carried out on the alignment results with Hcluster_sg software. After multiple sequence alignment of the clustered gene families with Muscle software [69], the protein alignments were converted into multiple sequence alignments of the CDS regions. Gene family trees were then constructed from the multiple sequence alignment results using the neighbor-joining (NJ) method with Treebest software [67].

Characterization of putative HGT

Of the strain-specific genes, we extracted CU052_00920, encoding a beta-lactamase class C; CU052_13320, encoding a virulence-associated protein D; and CU052_14535 encoding an OmpA family protein for further analysis on contribution to the antibiotic resistance or virulence of V. harveyi 345. The genes were compared to the NR protein database of NCBI using the Blastp tool. The results were filtered manually to find significant hits to proteins belonging to species other than V. harveyi. Multiple sequence alignment was conducted using ClustalW [70]. An alignment phylogenetic tree was constructed from aligned sequences using the Kimura 2-parameter model [71] with the neighbor-joining method, bootstrapped 1000 times, with MEGA6.0 software [72].
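The filtering of Blastp hits described above was done manually in the study; the sketch below only illustrates the same idea programmatically. The record layout, function name and all example values are hypothetical, and the thresholds (identity ≥ 40%, E-value ≤ 1e-5) are taken from the General annotation section.

```python
def filter_foreign_hits(hits, own_species="Vibrio harveyi",
                        min_identity=40.0, max_evalue=1e-5):
    """Keep significant hits from species other than own_species.

    Each hit is a dict with 'species', 'identity' (percent) and 'evalue'.
    Results are sorted by identity, highest first.
    """
    kept = [
        h for h in hits
        if h["identity"] >= min_identity
        and h["evalue"] <= max_evalue
        and not h["species"].startswith(own_species)
    ]
    return sorted(kept, key=lambda h: h["identity"], reverse=True)

# Invented records for illustration; these are not results from the study.
hits = [
    {"species": "Vibrio harveyi", "identity": 99.0, "evalue": 1e-50},
    {"species": "Genus_d species_w", "identity": 73.4, "evalue": 1e-38},
    {"species": "Genus_a species_x", "identity": 75.5, "evalue": 1e-40},
    {"species": "Genus_b species_y", "identity": 35.0, "evalue": 1e-8},
    {"species": "Genus_c species_z", "identity": 62.0, "evalue": 0.01},
]
foreign = filter_foreign_hits(hits)
print([h["species"] for h in foreign])
```

Only the two records that pass both thresholds and come from another species remain, ordered by identity. The top-ranked hits from such a list would then go forward to multiple alignment and tree construction.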

In addition, the entire sequence of p345–185 was compared to the NR nucleotide database of NCBI using a Blastn tool. A phylogenetic tree was constructed with the neighbor-joining method using max seq difference = 0.75 and BLAST pairwise alignments. The characterization method was the same for CU052_29140 (sul2) on p345–185 as for CU052_00920, CU052_13320, and CU052_14535.