Background

Gamma-proteobacteria of the genus Providencia (Enterobacterales; Morganellaceae) have been identified in diverse environmental samples as well as in association with both vertebrate and invertebrate animals and humans. Six out of 10 currently recognized Providencia species, namely P. rettgeri, P. stuartii, P. alcalifaciens, P. rustigianii, P. heimbachae and P. huaxiensis [1], were isolated from clinical samples [2,3,4] and comprise opportunistic human pathogens typically causing diarrhea [5,6,7] and urinary tract infections [8] that are often clinically complicated by multidrug resistance of the pathogen [9,10,11,12]. Moreover, bacteria assigned to the recognized Providencia species P. vermicola, P. rettgeri, P. alcalifaciens, P. sneebia, and P. burhodogranariea [1] together with the recently proposed new species P. entomophila [13] have been found associated with or pathogenic to insects such as honeybees [14], house and blow flies [15, 16], the fly-parasitic wasp Nasonia vitripennis [17], and a diverse range of fruit flies [13, 18,19,20,21,22,23].

The species P. vermicola is conspicuous among these as the putative insecticidal agent carried by entomoparasitic nematodes of the genera Steinernema [24], Butlerius [25] or Rhabditis [26]. However, further nematode-associated bacterial entomopathogens were identified as Providencia sp. [27] or P. rettgeri [28, 29]. The host-vector-pathogen relationship of these Providencia strains is functionally reminiscent of closely related Xenorhabdus and Photorhabdus bacteria (Enterobacterales; Morganellaceae) [30, 31]. A possible contribution of nematode-associated Providencia bacteria to insect biocontrol has been evaluated [25, 28, 29].

Moreover, P. vermicola has been reported to be a fish pathogen [32,33,34]. Several Providencia species, including P. vermicola [35, 36], have been described as remarkably resistant to high concentrations of metals [37,38,39,40] and to several types of antibiotics [35, 36, 41]. During the past decade, a multidrug resistance (MDR) phenomenon linked to the spread of integrons carrying antibiotic resistance genes including the New Delhi metallo-beta-lactamase gene (ndm1) has gained clinical importance globally. The ndm1 encoded carbapenemase enables pathogenic enterobacteria, including P. rettgeri and P. stuartii, to hydrolyze a wide spectrum of β-lactam antibiotics [41, 42].

An increasing number of Providencia genome sequences have been published during the past decade, including the recent publication of the genomes (i.e. assemblies GCA_014396895.1, GCA_010748935.1 and GCA_016618195.1) of three Providencia strains assigned to the species P. vermicola, namely strain G1 isolated from fish in Algeria, strain P8538 obtained from a clinical sample in Congo and strain LLDRA6 isolated from contaminated soil in China [43]. Comparative approaches have been used to explore the genomes of both clinical [44] and insect derived [45] Providencia bacteria. However, the species P. vermicola has not been covered by these studies, mainly due to the lack of a reliable reference genome.

This study reports the genome sequence of the nomenclatural type strain Providencia vermicola DSM_17385 and presents the first complete genome analysis for the taxonomic species P. vermicola. Strain DSM_17385 has previously been isolated from infective juveniles of the entomoparasitic nematode Steinernema thermophilum collected in soils at New Delhi, India, and has been recognized as the type strain of a new taxonomic species on the basis of 16S rRNA gene comparisons, restriction pattern based ribo-printing, metabolic property analyses, and a DNA-DNA relatedness value of 30% with respect to the P. rettgeri type strain as determined by physical DNA-DNA reassociation [24].

Results

General characteristics of the P. vermicola DSM_17385 genome

The genome of the P. vermicola type strain DSM_17385 was sequenced in this study and assembled into 5 scaffolds and 13 contigs, starting from 765,178 paired-end reads (MiSeq 2 × 250 bp). Basic genome information is given in Table 1. The accumulated scaffold and contig length of this assembly is 4.23 Mb (Fig. 1). The GC content, N50, and L50 of the DSM_17385 genome were 41.1%, 344,020 and 5, respectively. Annotation with GLIMMER resulted in 3969 protein-encoding genes, 220 (i.e. app. 6%) hypothetical proteins and 74 structural RNA encoding genes, more exactly three 5S rRNA, one 16S rRNA, one 23S rRNA, and 69 tRNA genes. As sequence gaps can cause interruption or deletion of ORFs, and as the DSM_17385 genome was not optically mapped, inferred annotations might be incomplete.
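The N50 and L50 statistics quoted above follow a standard definition: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. The following minimal Python sketch illustrates this definition only; the function name `n50_l50` and the example lengths are hypothetical and are not the actual DSM_17385 scaffold or contig lengths.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold/contig lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the assembly defines N50 (its length) and L50 (its rank).
        if running >= half_total:
            return length, count
    raise ValueError("at least one sequence length is required")


# Hypothetical lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(example_lengths)
print(n50, l50)
```

An L50 of 5, as reported for DSM_17385, therefore means that the five longest sequences of the assembly already account for at least half of its total length.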

Table 1 Genomic features of P. vermicola strain DSM_17385
Fig. 1

Circular maps of the P. vermicola DSM_17385 genome (A) and of plasmid pPVER1 (B). A The scaffolds and contigs are ordered and oriented for maximum synteny with the P. rettgeri Dmel1 genome sequence. Scaffolds (S) and contigs (C) are not positioned in a complete circle because their order and orientation are not empirically known. The size of the gaps between scaffolds is unknown. Rings from the outermost to the center: genes on the forward strand (red), genes on the reverse strand (blue), tRNA and rRNA genes (orange), genes unique to P. vermicola (green). B Physical map of plasmid pPVER1: ORFs on the forward strand (red), genes on the reverse strand (blue)

We performed a GO-term annotation analysis of all protein-encoding genes identified by GLIMMER. Through this analysis, we were able to assign functional annotations to 3749 (app. 95%) of the predicted genes (Fig. 2a). Coding sequences were assigned to putative super-functional and functional categories using the clusters of orthologous groups of proteins (COG) database [46]. Approximately 40% of the predicted genes were dedicated to metabolic functions. Roughly one-third was evenly split between cellular process/signaling functions and information storage/processing functions. The functions of the remaining 20% of predicted genes were either poorly characterized or uncharacterized (Fig. 2a).

Fig. 2

Functional annotation of P. vermicola DSM_17385 genes. A COG categories of predicted P. vermicola genes on the inner and COG subcategories on the outer ring. Each category or subcategory is graphed as a percentage of the total number of genes. M.: metabolism; P.C.: poorly characterized; U.: uncharacterized; I.S.P.: information storage and processing; C.P.S.: cellular processes and signaling. B Numbers of shared and unique protein-encoding genes when comparing P. vermicola DSM_17385 to P. rettgeri Dmel1. Numbers are the gene counts within each sector of the Venn diagram

Multilocus sequence typing and phylogenetic reconstruction

Phylogenetic reconstruction from three independent datasets was performed in order to assess the systematic position of P. vermicola DSM_17385. The complete 16S ribosomal RNA encoding gene, 1528 nucleotides in length, was employed at a first level of phylogenetic analysis, extending the previous 16S rRNA based phylogenetic analysis [24] to a larger set of reference strains. The second dataset, referred to as “hMLST”, was composed of five housekeeping genes encoding translation elongation factor EF-G (fusA), DNA gyrase subunit B (gyrB), isoleucyl-tRNA synthetase (ileS), translation elongation factor EF-4 (lepA), and leucyl-tRNA synthetase (leuS) that have been used previously in molecular taxonomy studies of Providencia bacteria [13, 19]. The third dataset, referred to as “rMLST”, comprised the full set of ribosomal rpl, rpm and rps proteins employed in bacterial rMLST approaches [47]. The full coding sequences of the 16S rRNA, the five hMLST and 53 rMLST markers were identified in the genome sequence under study (Additional files 1 and 2).

Comparison of the 16S rRNA genes from a set of 31 bacterial strains representing the major groups of Providencia bacteria including the nomenclatural type strains of all recognized Providencia species gave rise to a phylogenetic tree (Suppl. Figure 1) generally characterized by insufficiently bootstrap supported clades. P. vermicola DSM_17385 together with supposed P. vermicola strain G1 and several strains assigned to the species P. rettgeri were grouped together in a clade with branches receiving from 20 to 52% bootstrap support. The two further supposed P. vermicola strains P8538 and LLDRA6 together with the P. sneebia type strain were loosely, i.e. with bootstrap support values between 16 and 43%, associated with an optimally supported clade comprising all P. stuartii strains together with the P. thailandensis type strain. Insufficient resolution of 16S rRNA phylogenies at the level of Providencia species delineation had already been stated previously [19, 24].

Concatenation of the identified hMLST and rMLST marker genes resulted in combined meta-gene sequences of 11,619 bp and 21,267 bp in length, respectively. Comparison with the concatenated orthologs from 31 representative Providencia genomes gave rise to phylogenies of essentially identical tree topology (Figs. 3 and 4). In particular, both the hMLST and rMLST based phylogenies coincided i) in placing the P. vermicola type strain DSM_17385 into a sister clade position to clades A and B of the P. rettgeri complex, well delineated from the type strains of all further Providencia species, and ii) in not co-locating the presumed P. vermicola whole genome sequences from Providencia strains P8538, LLDRA6 and G1 with the P. vermicola type strain. Whereas Providencia strain G1 was firmly, i.e. with 100% bootstrap support in both phylogenies, positioned within the P. rettgeri clade B, strains P8538 and LLDRA6 appeared loosely related to each other and to an optimally bootstrap supported clade comprising the nomenclatural type strains of both the species P. stuartii and P. thailandensis. Concerning the systematic position of Providencia strains P8538 and LLDRA6, both these hMLST and rMLST phylogenies essentially reproduced the results obtained from 16S rRNA gene comparisons.

Fig. 3

Neighbor Joining (NJ) phylogeny of Providencia bacteria as reconstructed from concatenated complete fusA, gyrB, ileS, lepA and leuS gene sequences. Terminal branches are labelled by genus, species and strain designations as well as GenBank accession numbers; “TYPE” indicates nomenclatural type strains of the respective taxonomic species. Bacterial strains that have been assigned to the species P. vermicola are in bold type. Numbers on branches indicate bootstrap support percentages. P. rettgeri clades A and B are indicated at the right margin. The size bar corresponds to 1% sequence divergence; the length of dashed lines is not true to scale. The concatenation of orthologous sequences from the closely related bacterium Proteus mirabilis has been used as outgroup

Fig. 4

Neighbor Joining (NJ) phylogeny of Providencia bacteria as reconstructed from 53 concatenated ribosomal protein encoding genes. Terminal branches are labelled by genus, species and strain designations as well as GenBank accession numbers; “TYPE” indicates nomenclatural type strains of the respective taxonomic species. Bacterial strains that have been assigned to the species P. vermicola are in bold type. Numbers on branches indicate bootstrap support percentages. P. rettgeri clades A and B are indicated at the right margin. The size bar corresponds to 1% sequence divergence; the length of dashed lines is not true to scale. The concatenation of orthologous sequences from the closely related bacterium Proteus mirabilis has been used as outgroup

When phylogenetic reconstruction from the concatenated hMLST marker set was extended to all 195 Providencia genomes currently available in the GenBank database (Suppl. Figure S2), both strains P8538 and LLDRA6 formed 100% bootstrap supported clades with a small number of strains assigned to the species P. stuartii. However, these clades were well delineated from the “main” P. stuartii clade comprising at 100% bootstrap support the vast majority of all strains assigned to this species together with the nomenclatural type strains of both P. stuartii and P. thailandensis.

Ribosomal typing of P. vermicola DSM_17385 and the three presumed P. vermicola genome strains gave rise to highly diverse rMLST based taxonomic assignments (Table 2). When compared to the PubMLST ribosomal protein encoding gene database, exactly matching alleles were identified for only five out of the 53 rMLST marker genes identified in the P. vermicola type strain genome, providing too scarce a statistical basis for reliable ribosomal sequence type (rST) assignment. The rMLST based taxonomic assignment to either P. rettgeri or P. burhodogranariea remained ill-supported and inconclusive. Ribosomal typing of Providencia strain P8538, in contrast, resulted in an apparently highly conclusive outcome, with 53/53 exact allele matches giving rise to an unequivocal rST identification and a maximally supported taxonomic assignment to the species P. vermicola. However, as the P8538 genome currently serves as reference for the species P. vermicola in the PubMLST database, this result is circular and cannot be judged meaningful. On the basis of ribosomal typing, presumed P. vermicola strains LLDRA6 and G1 were assigned with low to intermediate support to the species P. stuartii and P. rettgeri (clade B), respectively (Table 2).

Table 2 rMLST results for presumed P. vermicola genome sequencing strains

Phylogenetic reconstruction based on the concatenated hMLST marker set from the 195 Providencia genomes available in the GenBank database demonstrated that P. vermicola DSM_17385 is, in molecular taxonomic terms, most closely related to a single genome strain, namely Providencia strain MR4 that has previously been assigned to the species P. rettgeri (Suppl. Figure S2). In both the concatenated hMLST and rMLST marker based phylogenies P. vermicola DSM_17385 and strain MR4 form a maximally bootstrap supported clade with comparatively long branches indicating considerable sequence divergence (Figs. 3 and 4). However, orthologs of only 4/5 hMLST markers (all but gene lepA) and 50/53 rMLST markers (all but genes rplM, rpsI and rpsL) were identified in the published MR4 genome sequence. With respect to the hMLST data set, the respective clade was supported in the gyrB, ileS and leuS, but not in the fusA single marker phylogenies. With respect to the rMLST data set, the p-distance matrix based pair-wise sequence similarity for the concatenation of 50 rMLST marker alleles from strains DSM_17385 and MR4 was calculated to be 97.8%. This was comparable to the pair-wise sequence similarities of the P. vermicola type strain to the P. rettgeri (clade A) type strain DSM_4542 (97.9%), the P. huaxiensis type strain WCHPr000369 (97.8%) or strains making up P. rettgeri clade B (range 97.4–97.5%), with sequence similarities to the type strains representing further Providencia species being 97.0% (P. alcalifaciens) or considerably lower (Suppl. Table S1). In contrast, sequence similarities within P. rettgeri clades A and B were generally above 99%. However, analogous pairwise sequence similarity percentages calculated from the hMLST data set are consistent with a comparatively closer phylogenetic relationship of P. vermicola DSM_17385 and strain MR4 (Suppl. Table S1). Ribosomal typing of Providencia strain MR4 identified 49/50 exact matches and gave rise to a maximally supported taxonomic assignment to the species P. rettgeri and to three rSTs representing this species (Table 2).
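The p-distance underlying such pair-wise similarity percentages is simply the proportion of aligned sites at which two sequences differ, and the similarity is its complement. The following Python sketch illustrates the calculation under the assumption that gap columns are excluded pair-wise; this gap treatment, the function names and the toy sequences are illustrative assumptions and do not reproduce the software settings used in this study.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (assumed
    pair-wise deletion; other gap treatments are possible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def similarity_percent(seq_a, seq_b):
    """Pair-wise sequence similarity in percent, i.e. 100 * (1 - p)."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


# Toy alignment, for illustration only: one mismatch in ten compared sites.
print(similarity_percent("ACGTACGTAC-", "ACGTACGTATA"))
```

On this scale, a similarity of 97.8% over a concatenation of 50 marker genes corresponds to roughly 22 differing sites per 1000 aligned positions.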

Digital DNA-DNA hybridization analysis

Among the methods for evolutionary distance assessment between bacterial species based on digital whole genome comparison, average nucleotide identity (ANI) is one of the most powerful approaches [48]. Therefore, pair-wise genome-wide average nucleotide identity by orthology (OrthoANI) has been determined for two subsets of 10 Providencia strains. The main results obtained were as follows (Fig. 5): i) pair-wise ANI values for the P. vermicola DSM_17385 genome and each of the nine Providencia type strains representing the further recognized species ranged between 77 and 81%, i.e. were in the range of values normally found for pair-wise ANI percentages across this set of specific type strains; ii) pair-wise ANI values for comparisons of strain DSM_17385 with each of the three further supposed P. vermicola strains G1, P8538 and LLDRA6 were in this same range, more exactly 81% for strain G1 and 77% for both strains P8538 and LLDRA6; iii) pair-wise ANI values for comparisons of supposed P. vermicola strains G1, P8538 and LLDRA6 with Providencia strains assigned to other species were found to be considerably higher than the above percentage range, for instance in the order of magnitude of 99% for the pair-wise comparisons of strain G1 with P. rettgeri strain RB151 (representing P. rettgeri clade B), of strain P8538 with P. stuartii strain PRV00010, and of strain LLDRA6 with P. stuartii strain Crippen.

Fig. 5

Heatmaps showing pair-wise average nucleotide identity by orthology (OrthoANI) percentages for two sets of Providencia genomes as calculated using the OAT software. Providencia strains are labelled by species and strain designations; “TYPE” indicates nomenclatural type strains of the respective taxonomic species

Under the assumption that ANI values of 95–96% indicate bacterial species boundaries [49], these results are consistent with the following statements: i) P. vermicola strain DSM_17385 is not more closely related to any of the other specific type strains recognized within the genus Providencia than the latter are to one another and is, therefore, correctly considered the type strain of an independent species; ii) none of the three supposed P. vermicola strains G1, P8538 and LLDRA6 belongs to the same species as the P. vermicola type strain; iii) instead, Providencia strains G1, P8538 and LLDRA6 should at the species level be assigned to the same taxon as strains P. rettgeri RB151, P. stuartii PRV00010 and P. stuartii Crippen, respectively.
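The reasoning behind these three statements can be restated as a simple threshold test on the pair-wise ANI values. The Python sketch below uses the rounded percentages given in the text (the values of "in the order of magnitude of 99%" are entered as 99.0) and takes the lower end of the 95–96% range as the species boundary; the names `ANI_PERCENT`, `same_species` and `conspecific_pairs` are hypothetical and the sketch is not part of the OAT software.

```python
# Lower end of the 95-96% ANI range assumed to mark species boundaries.
SPECIES_BOUNDARY = 95.0

# Rounded pair-wise OrthoANI percentages as quoted in the text (approximate).
ANI_PERCENT = {
    ("DSM_17385", "G1"): 81.0,
    ("DSM_17385", "P8538"): 77.0,
    ("DSM_17385", "LLDRA6"): 77.0,
    ("G1", "P. rettgeri RB151"): 99.0,
    ("P8538", "P. stuartii PRV00010"): 99.0,
    ("LLDRA6", "P. stuartii Crippen"): 99.0,
}


def same_species(ani_percent, boundary=SPECIES_BOUNDARY):
    """True if an ANI value lies at or above the assumed species boundary."""
    return ani_percent >= boundary


def conspecific_pairs(table, boundary=SPECIES_BOUNDARY):
    """Return the strain pairs whose ANI meets the species boundary."""
    return sorted(pair for pair, ani in table.items() if same_species(ani, boundary))


for pair in conspecific_pairs(ANI_PERCENT):
    print(pair)
```

Because the observed values fall either far below (77 to 81%) or far above (about 99%) the boundary, the outcome is the same whether 95% or 96% is used as the cut-off.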

Comparative genomic analysis

A comparative genomics approach was employed to identify orthologous proteins in P. vermicola DSM_17385 and the insect-derived P. rettgeri (clade A) strain Dmel1. Dmel1 was originally isolated from wild-caught Drosophila melanogaster and has been demonstrated to be a fruit fly pathogen [20]. Moreover, the genome of strain Dmel1 is well annotated and has been compared to genomes of fruit fly associated Providencia strains falling under the species P. alcalifaciens, P. sneebia, and P. burhodogranariea [45].

Based on this comparison, 3127 bona fide orthologous pairs were identified (Fig. 2b) with the large majority being present as single copies. This core genome represents app. 78% of the total genes identified in each of the two genomes, and the corresponding orthologous gene pairs were distributed over 86% of the P. vermicola and 83% of the P. rettgeri genome. The corresponding genes together covered an analyzed genomic region of 3.6 Mb in both P. vermicola and P. rettgeri. The absolute number of unique genes, i.e. those not assigned to any orthologous pair, was very similar in both genomes, with 847 and 859 unique genes being detected in P. vermicola and P. rettgeri, respectively (Fig. 2b). Thus, unique single-copy genes represent app. 21% of the total gene content for both species.
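The percentages given above can be checked from the Venn diagram counts alone. The short Python sketch below does so under the simplifying assumption that the gene total of each genome equals the number of orthologous pairs plus the number of unique genes, i.e. that every orthologous pair contributes exactly one gene per genome; the function name `genome_shares` is hypothetical, and totals derived in this way may differ slightly from the annotation counts.

```python
def genome_shares(shared, unique):
    """Return total gene count and shared/unique percentages for one genome.

    Assumes total = shared + unique (one gene per orthologous pair).
    """
    total = shared + unique
    return {
        "total": total,
        "shared_pct": 100.0 * shared / total,
        "unique_pct": 100.0 * unique / total,
    }


ORTHOLOGOUS_PAIRS = 3127  # shared between both genomes (Fig. 2b)

p_vermicola = genome_shares(ORTHOLOGOUS_PAIRS, unique=847)
p_rettgeri = genome_shares(ORTHOLOGOUS_PAIRS, unique=859)

for name, shares in (("P. vermicola", p_vermicola), ("P. rettgeri", p_rettgeri)):
    print(f"{name}: {shares['shared_pct']:.1f}% shared, "
          f"{shares['unique_pct']:.1f}% unique of {shares['total']} genes")
```

Both genomes yield a shared fraction between 78 and 79% and a unique fraction between 21 and 22%, in line with the approximate values stated in the text.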

The scaffolds and contigs of DSM_17385 were ordered and oriented so that they were as similar to the P. rettgeri Dmel1 genomic orientations as possible, assuming the most parsimonious evolution of genome arrangements. The global identity estimated from LASTZ was 82.3% (Fig. 6a). Genomic rearrangements are highlighted on the physical synteny map reported in Fig. 6b. It is in principle possible that any of the P. vermicola contigs could be inverted or rearranged relative to their positions on our comparative syntenic plot, but only if the rearrangement breakpoints lie at contig breakpoints.

Fig. 6

Comparison of strain P. vermicola DSM_17385 and P. rettgeri Dmel1 genome sequences. A Syntenic dotplot between P. vermicola DSM_17385 scaffolds or contigs (x-axis) and the P. rettgeri Dmel1 chromosome (y-axis). Syntenic regions were derived from orthologous blocks. Deflections of segments along either axis indicate insertions of DNA sequence. Blue dots represent homologous regions in the same, red dots in opposite orientation in both genome pairs. B Identification of collinear blocks in syntenic genomic regions. Collinear blocks are labelled with the same color and connected by lines. Block boundaries indicate breakpoints of genome rearrangements

The genomic rearrangements associated with speciation of P. vermicola and P. rettgeri from a common ancestor have partially preserved the location and organization of several homologous genomic regions. A total of 15 collinear blocks were identified between the P. vermicola and P. rettgeri genomes (Fig. 6b). Many small rearrangements and two larger inversions were apparent across both genomes, the latter (in red in Fig. 6a) involving a genomic region of app. 16 kb on contig 8. In particular, a large genomic inversion of app. 800 kb including a type III secretion system (T3SS-1) encoding gene cluster that has been identified previously in P. sneebia [45] is syntenically oriented in the P. vermicola genome with respect to P. rettgeri Dmel1.

Type III secretion systems

A single gene cluster encoding a type III secretion system (T3SS) or “injectisome”, i.e. a needle-like apparatus involved in protein secretion across a host cell membrane, was identified in the P. vermicola DSM_17385 genome. The cluster comprising 22 kb and 23 ORFs was highly similar (app. 65% pairwise identity) in gene orientation and putative gene function to a T3SS-1 island of P. rettgeri Dmel1 (Fig. 7); in particular, both clusters comprised a gene encoding an InvA-type ATPase and were located in a region of synteny that is inverted in P. sneebia. In contrast, no region of significant similarity to a second T3SS-2 island comprising a Ysc-type ATPase gene that is present in insect-associated P. sneebia and P. burhodogranariea bacteria was identified in the genome sequence under study.

Fig. 7

Alignment of type III secretion systems (T3SS-1) of P. vermicola DSM_17385 and P. rettgeri Dmel1. The graph shows the pairwise identity (sliding window size of 100 nucleotides) between T3SS-1 sequences. Average pairwise identity across the full length sequence (64.5%) is indicated by the dashed red line. Colored arrows indicate individual genes and their orientation. The aligned genomic regions indicate the approximate boundaries of the T3SS-1 islands based on gene annotation
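A sliding-window identity profile of the kind plotted in Fig. 7 can be derived from any pair-wise alignment by counting identical columns within a window that is moved along the alignment. The Python sketch below illustrates the principle; the function name `sliding_identity`, the step size and the treatment of gap columns as non-identical are assumptions for illustration and do not describe the tool actually used to draw the figure.

```python
def sliding_identity(aln_a, aln_b, window=100, step=1, gap="-"):
    """Percent identity per window along two aligned sequences.

    A column counts as identical only if both residues match and
    neither is a gap (assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    matches = [a == b and a != gap
               for a, b in zip(aln_a.upper(), aln_b.upper())]
    profile = []
    for start in range(0, len(matches) - window + 1, step):
        identical = sum(matches[start:start + window])
        profile.append(100.0 * identical / window)
    return profile


# Toy alignment with a small window, for illustration only.
profile = sliding_identity("AAAAAAAAAA", "AAAAATTTTT", window=5, step=5)
print(profile)                       # per-window identity
print(sum(profile) / len(profile))   # mean over non-overlapping windows
```

With a window of 100 nucleotides, each plotted value is the percentage of identical positions among 100 consecutive alignment columns, so that local dips below the overall average highlight the more divergent genes of the island.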

Plasmids

No sequences corresponding to the 5.6 kb plasmid pPRET1 previously detected in P. rettgeri Dmel1, to the plasmids known from other D. melanogaster associated Providencia species [45] or to the multi-drug resistance plasmids of P. rettgeri and P. stuartii [50] were identified in the P. vermicola DSM_17385 genome. However, one small non-transmissible plasmid was identified. This plasmid, pPVER1, was 3682 bp in length and contained 10 ORFs (Fig. 1b). ORF2, ORF3 and ORF4 encoded hypothetical proteins comprising deduced sequences of, respectively, 66, 146, and 147 amino acids with > 90% similarity to gene products encoded by a family of qnrD-carrying plasmids that have been described previously for several strains of P. rettgeri (plasmids pDIJ09–518a, pGHS09–09a, pAB213, pYPR25–3), P. stuartii (pMF1A) and P. alcalifaciens (pBT169) and further Morganellaceae bacteria [51] as well as to (partially truncated) gene products encoded by plasmid p3–000369 of P. huaxiensis [4]. ORF1b of pPVER1 encoded a hypothetical protein of 88 amino acids with lower similarity to presumed orthologs in the genomes of Klebsiella, Enterobacter and Citrobacter bacteria (app. 60% similarity) and of sporadic presence in genomes of several P. rettgeri (51%), P. alcalifaciens (51%), P. stuartii (39%) and P. heimbachae (37%) strains, whereas the partially overlapping ORF1a encoded a hypothetical gene product of 170 amino acids with no significant similarities identified across the GenBank database. Moreover, five short (< 100 bp) ORFs named ORF5 through ORF9 with no significant similarity hits across GenBank were found located up- and downstream of ORF1a/b. Two 24 bp imperfect inverted repeats with eight mismatches delineated the region comprising colinear ORF1a, ORF1b and ORF2 and consistently defined a mobile insertion cassette (mic) of 2663 bp bracketed by both copies of a presumed duplicated CA insertion site.
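Whether two terminal sequences form an inverted repeat, and with how many mismatches, can be assessed by comparing one repeat with the reverse complement of the other. The Python sketch below shows this comparison; the function names and the 8 bp toy sequences are hypothetical and are not the actual 24 bp repeats of pPVER1.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_mismatches(left, right):
    """Count mismatches between `left` and the reverse complement of `right`.

    Zero means a perfect inverted repeat; larger values indicate an
    increasingly imperfect one.
    """
    if len(left) != len(right):
        raise ValueError("repeats must have equal length")
    return sum(1 for a, b in zip(left.upper(), reverse_complement(right))
               if a != b)


# Toy repeats, for illustration only.
print(inverted_repeat_mismatches("AAGGCCTA", "TAGGCCTT"))  # perfect repeat
print(inverted_repeat_mismatches("AAGGCCTA", "TAGGCCAA"))  # imperfect repeat
```

In these terms, the repeats delineating the pPVER1 cassette match at 16 of their 24 positions.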

Antibiotic resistance genes

Antibiotic resistance factors encoded in the P. vermicola DSM_17385 genome operate by four resistance mechanisms with antibiotic efflux being the predominant one, followed by antibiotic target alteration, antibiotic inactivation and reduced antibiotic uptake (Table 3, Suppl. Table S2, Additional file 3).

Table 3 Putative antibiotic resistance-associated factors (ARF) encoded by the P. vermicola DSM_17385 genome

Among the different types of efflux pumps identified, orthologs of the SMR-type efflux pump EmrE [52], the peptide-potassium antiporter RosAB [53], the MFS-type efflux pumps MdtG [54], KpnEF [55] and KpnGH-TolC [56, 57] as well as the ABC-type transporter MacAB-TolC [58, 59] are ubiquitously distributed across the genus Providencia, whereas the SMR-type efflux pump AbeS [60] potentially confers macrolide and aminocoumarin resistance to Providencia bacteria belonging to the species P. stuartii and P. rettgeri clade A and the MFS-type transporter Tet(59) [61] is mainly responsible for widespread tetracycline resistance observed in P. rettgeri, P. alcalifaciens and P. heimbachae bacteria [44]. Further identified efflux pump orthologs such as the TolC-dependent transporters MdtABC [62], AcrAB [63] and EmrAB [64], the OpmD-dependent RND-type pump MexGHI [65], and the MFS-type systems MdtH [66] and Bcr1/2 [67] appear sporadically across the genus Providencia. The identified ortholog of the transcriptional acrAB operon repressor AcrR [68] has been found to carry two known mutations (Y114F, V165I) and one previously undescribed mutation (M109L) potentially conferring or increasing resistance to a spectrum of antibiotics including tetracycline, phenicols, penam, triclosan and fluoroquinolones [69]. Orthologs of further potentially resistance-relevant regulators such as the carbon storage regulator protein CsrA [70, 71] and the cAMP-activated global transcriptional repressor CRP [72] have expectedly been identified, but will most likely not have an immediate role in antibiotic resistance regulation in P. vermicola as their known respective targets, i.e. the efflux pumps MexEF-OprN [73] and MdtEF [74, 75], respectively, are lacking. Moreover, an ortholog of the alternative porin OmpK37 [76] that reduces permeability of the cell envelope for a range of beta-lactams and is present in almost all sequenced Providencia genomes [44] has been identified in P. vermicola.

Among those antibiotic resistance conferring factors that act by antibiotic inactivation or molecular target alteration and are virtually ubiquitous across published Providencia genomes [44], the P. vermicola DSM_17385 genome encodes an ortholog of an SRT-2 type beta-lactamase [77] conferring resistance to cephalosporins and comprises two identical genes encoding orthologs of the lipid A modifying phosphoethanolamine transferase PmrE [78] that confers resistance to antimicrobial peptides and polymyxin. An ortholog of the NmcR regulator of the class A beta-lactamase NmcA [79] has been found encoded in the P. vermicola genome; however, as no gene encoding an NmcA ortholog has been identified, its relevance for beta-lactam resistance is unclear. Moreover, P. vermicola comprises orthologs of both chloramphenicol acetyltransferase CAT-III [80] and aminoglycoside acetyltransferase AAC(2′)-Ia [81]; both factors are widespread in mostly clinical strains of P. rettgeri and P. stuartii [44].

Among the resistance genes occurring sporadically across the genus, the P. vermicola DSM_17385 genome comprises orthologs of the phosphomycin thiol transferase FosA7 [82, 83], the ErmX-type rRNA methyltransferase RsmA [84, 85] and the vancomycin resistance protein VanW [86, 87]. Several proteins involved in basic cellular processes such as DNA gyrase subunits A and B [88, 89], the RNA polymerase beta subunit [90], translation elongation factor Tu [91] and UDP-N-acetylglucosamine enolpyruvyl transferase MurA [92] carry one or several point mutations conferring resistance to antibiotics such as phosphomycin, fluoroquinolones, elfamycin or rifamycin.

All antibiotic resistance factors identified in P. vermicola DSM_17385 appear to be chromosomally encoded; no antibiotic or multidrug resistance (MDR) plasmids such as those found in P. rettgeri or P. stuartii were identified. Moreover, operons comprising both widespread and sporadically occurring resistance genes are widely distributed over the P. vermicola genome. In particular, neither class 1 or 2 integrons nor SXT elements, which have been described previously to carry accumulated resistance genes in a Providencia isolate assigned to the species P. vermicola [35], appear to be present, as IntI1, IntI2 or SXT integrases and qacE, qacEdelta1 or sul1 genes were not identified in the genome under study.

Discussion

Prior to sequencing of the genome of the P. vermicola type strain DSM_17385, three genome sequences from Providencia strains assigned to the species P. vermicola had been published. The present comparative analysis using ribosomal typing, phylogenetic reconstruction and digital DNA-DNA hybridization has revealed that the four presumed P. vermicola strains are correctly assigned to the genus Providencia, but do not belong to the same taxonomic species.

Phylogenetic reconstruction from 16S rRNA encoding sequences has been found not sufficiently phylogenetically informative to provide sound species delineation in the present context, a problem reported earlier with respect to the genus Providencia [19, 24]. In contrast, phylogenetic reconstruction from housekeeping gene (hMLST) and ribosomal protein encoding gene (rMLST) datasets comprising marker sequences from the nomenclatural type strains of all currently recognized Providencia species has clearly demonstrated that strain DSM_17385 is not closely related to any of the other specific type strains and therefore, as expected, rightfully represents the independent species P. vermicola. In particular, results obtained from the ribosomal marker dataset corroborate the respective earlier conclusions from hMLST based studies [13, 19]. Both phylogenies (Figs. 3 and 4) indicate that P. vermicola shares a common ancestor with P. rettgeri clade A (comprising strain Dmel1), P. rettgeri clade B (proposed to be organized into an independent species named P. entomophila) and P. huaxiensis before it is phylogenetically related to P. alcalifaciens (comprising strain Dmel2) or the still more distantly related Drosophila melanogaster derived P. sneebia and P. burhodogranariea type strains. These results have been fully corroborated by digital DNA-DNA hybridization analysis. Moreover, this systematic situation is mirrored in the apparently “inconclusive” outcome when strain DSM_17385 is typed across 53 ribosomal marker alleles: as the P. vermicola type strain is not closely related to any of the Providencia species represented in the PubMLST database, no existing rST has been assigned to the DSM_17385 genome.

Unexpectedly in view of their previous taxonomic assignment, none of the three presumed P. vermicola genome strains appeared closely related to the P. vermicola type strain. The outcomes of phylogenetic reconstruction from hMLST and rMLST data sets, digital DNA-DNA hybridization and ribosomal typing are in line with the assignment of Providencia strain G1 to the species P. rettgeri (clade B), whereas strains LLDRA6 and P8538 appear most closely related to the P. stuartii / P. thailandensis species complex. The perfect ribosomal typing match of strain P8538 to the species P. vermicola can be judged a bioinformatic artefact as the P8538 genome itself currently serves, both erroneously and misleadingly, as the sole reference for this species in the PubMLST database. Taking these results together, Providencia strains G1, LLDRA6 and P8538 appear inconsistently assigned to the taxonomic species P. vermicola as represented at the genomic level by the genome sequence of the nomenclatural type strain DSM_17385.

Among the 195 Providencia genomes currently available from the Genbank database, only that of strain MR4, assigned to the species P. rettgeri, displayed a comparatively closer molecular taxonomic relationship to P. vermicola DSM_17385 (Suppl. Figure S2): Providencia strain MR4 might, therefore, be considered a possible candidate for a further P. vermicola genome strain. Interestingly, strain MR4 was isolated from medicinal plant material, more precisely from stem tissue of the Indian mallow, Abutilon indicum, in India (Genbank BioSample SAMN03646990) and is, therefore, geographically related to strain DSM_17385. However, with only 2132 ORFs identified on as many as 697 contigs, assembly of the MR4 whole genome shotgun is currently rather incomplete, hampering systematic comparative genomics or OrthoANI analyses. In previous studies, comparisons of pair-wise sequence similarities from the hMLST marker set have been employed to critically evaluate the distinct species status of P. vermicola, P. rettgeri clades A and B and P. huaxiensis [13]. Extending this approach to the systematic relationship of strains DSM_17385 and MR4 has demonstrated that results from both the hMLST and rMLST datasets are consistent with P. vermicola DSM_17385 and the presumed P. rettgeri strain MR4 belonging to different rather than to the same taxonomic species.

Taking the above results together, the sequence reported here represents not only the genome of the nomenclatural type strain, but also the only well-supported P. vermicola genome to date. Comparative genomics at the infra-specific level, e.g., the identification of P. vermicola-specific genes, is not feasible with the currently available genome data set.

Within the limits set by the mode of genome assembly, orientation of the 18 scaffolds making up the P. vermicola genome sequence alongside the genome of P. rettgeri strain Dmel1 revealed a very high degree of genomic synteny. In particular, two important genomic rearrangements that have been described earlier for the P. sneebia genome in comparison to the genomes of further insect-derived Providencia strains from the species P. rettgeri, P. alcalifaciens and P. burhodogranariea [45] were absent from the genome of P. vermicola DSM_17385.

The genome of P. rettgeri Dmel1 was found to share 3127 orthologous genes (78%) with that of P. vermicola DSM_17385. This is a higher number and fraction of common orthologs than in comparison with insect-associated P. alcalifaciens (2672, 70%), P. burhodogranariea (2654, 70%) or P. sneebia (2211, 58%) strains [45], reflecting the closer phylogenetic relationship of P. vermicola and P. rettgeri. Given the lack of large block insertions, deletions or rearrangements when comparing both genomes (Fig. 6), genetic speciation understood as the generation of a set of unique genes appears to be the result of many small scale gene gains or losses rather than few large scale events.

No plasmid related to those of insect-associated Providencia bacteria has been identified when assembling the P. vermicola genome sequence data, a finding that is in line with the generally high variability in identity, conservation and putative functional designation documented for these genetic elements [45]. The plasmid identified, pPVER1, belongs to a class of small non-conjugative qnrD-plasmids found mainly in Morganellaceae bacteria, including strains from several Providencia species [51, 93, 94]. Expression of the qnrD gene produces a pentapeptide repeat protein that confers resistance to (fluoro-)quinolone antibiotics by protecting the cellular targets, namely bacterial DNA gyrase and topoisomerase IV, from quinolone binding [95]. In addition to the qnrD gene, the 2683 bp qnrD-plasmids from P. rettgeri, P. alcalifaciens and P. stuartii typically contain three colinear open reading frames, termed ORF2 through ORF4, encoding hypothetical proteins of unknown function. The qnrD gene and ORF2 are located on a putative mic element delineated by 24 bp imperfect inverted repeats. qnrD-plasmids from P. rettgeri, P. stuartii and P. alcalifaciens are highly homologous, with pair-wise nucleotide sequence similarities typically ranging from 99.7% to 100%.

pPVER1 from the P. vermicola DSM_17385 genome shares the same basic structure found in these qnrD-type plasmids with the important difference that 747 bp from within the putative mic element comprising the qnrD gene itself have been replaced by 1887 bp of a sequence comprising the partially overlapping ORFs 1a and 1b that encode two hypothetical proteins of unknown function. Pairwise sequence similarities between the conserved 1795 bp long segment of pPVER1 carrying ORF2 through ORF4 and the homologous region of Providencia qnrD-type plasmids are high, ranging from 91.7 to 93.0%, whereas no significant similarity can be detected between the ORF1a/b and qnrD regions. Most probably, pPVER1 has already been described previously when strain DSM_17385 was used as reference strain in a study investigating qnrD genes of Proteeae bacteria [96], but no respective DNA sequence has been published from these studies.

A further qnrD-type plasmid, named p3–000369, has been identified in the nomenclatural type strain of the Providencia species P. huaxiensis [4]. Interestingly, the qnrD gene region in this plasmid is replaced by a 979 bp long DNA sequence carrying three colinear ORFs that encode two subunits of a predicted helix-turn-helix transcriptional regulator and a hypothetical protein of unknown function. This region displays no significant similarity to the Providencia qnrD-type plasmids or pPVER1, but an almost identical region comprising three orthologous ORFs is encoded by the large (> 200 kb) plasmid of P. rettgeri strain BML2526. Moreover, plasmid p3–000369 contains important deletions in both ORF3 and ORF4.

It appears, therefore, that within the genus Providencia the region carrying the quinolone resistance gene in small qnrD-type plasmids underwent rearrangements including recombination events with both the bacterial chromosome and other plasmids. The rearranged region is part of a putative mic element defined by inverted repeats IRR and IRL carrying, respectively, one and two SNPs when compared for P. vermicola pPVER1, P. rettgeri pDIJ09–518a, P. stuartii pMF1A, P. alcalifaciens pBT169, and P. huaxiensis p3–000369. However, as only part of the putative mic element is rearranged and as no known mobilization structures as, e.g., mob genes were identified on these plasmids, the mechanism leading to these rearrangements is currently unclear. It appears most parsimonious to suppose that rearrangements within qnrD-type plasmids of the genus Providencia have occurred subsequently to the appearance of a precursor plasmid carrying qnrD within a mic element.

With respect to the nature and organization of identified antibiotic resistance determinants, P. vermicola DSM_17385 appears generally similar to non-clinical strains from other Providencia species including P. rettgeri and P. stuartii. Striking features are both the complete absence of multi drug resistance plasmids and integrons and the related absence of the range of beta-lactamases found in clinical P. rettgeri and P. stuartii strains that are the main sources of the MDR phenotype occurring in Providencia [41]. These findings are fully in line with the invertebrate pathogen P. vermicola being in its natural environment efficiently excluded from the propagation routes of MDR carrying genetic elements operating between human pathogens. Susceptibility to MDR plasmid acquisition will likely become a major criterion in the evaluation of P. vermicola for potential applications in biological pest control.

Gram-negative bacterial pathogens use type III secretion systems (T3SS) or “injectisomes” in a wide variety of physiological contexts to translocate effector proteins simultaneously across their own cell envelope and a eukaryotic host cell or vacuolar membrane. There are different T3SS families named according to the type of ATPase that is part of the injectisome. T3SS structural components, but not the effectors translocated by them, are encoded by gene clusters that have been transferred between bacteria by horizontal gene transfer [97]. T3SS gene clusters are widespread in Providencia bacteria [44].

The P. vermicola DSM_17385 genome contains a single T3SS island encoding an injectisome of the Inv-Spa family (Fig. 7). This type of T3SS, termed “T3SS-1”, is generally associated with host cell invasion, i.e. paradigmatically bacterial uptake by nonphagocytic cells triggered by induced actin reorganization, or intracellular survival as, e.g., in the case of the tsetse fly endosymbiont Sodalis glossinidius [98, 99] or the primary endosymbiont (SZPE) of the maize weevil, Sitophilus zeamais [100]. The T3SS-1 gene cluster is almost ubiquitously distributed across genomes of Providencia and related Proteus bacteria [44] and has likely been acquired prior to speciation. In particular, a similar T3SS-1 gene cluster has been identified in insect-derived Dmel strains of the species P. rettgeri, P. alcalifaciens and P. sneebia, but not in P. burhodogranariea [45]. Moreover, P. sneebia and P. burhodogranariea genomes carry a T3SS-2 injectisome of the Ysc family that is generally associated with the extracellular localization of pathogens and appears to be absent from other Providencia genomes including those of strains P. rettgeri Dmel1, P. alcalifaciens Dmel2 and the P. vermicola strain under study. As T3SS-1 is likely non-functional in P. sneebia due to disruption of the Inv-type ATPase ORF by a premature stop codon [45], it appears that insect-associated P. sneebia and P. burhodogranariea operate a T3SS-2 for extracellular localization, whereas insect-associated P. vermicola, P. rettgeri and P. alcalifaciens operate a T3SS-1 for intracellular localization. As Inv-Spa-type injectisomes are widespread across the genus Providencia, but Ysc-type injectisomes appear to date limited to two closely related Providencia species, it is most parsimonious to assume that a common ancestor of P. sneebia and P. burhodogranariea has acquired a T3SS-2 island by, for instance, horizontal gene transfer, with subsequent mutational inactivation (in P. sneebia) or, in P. burhodogranariea, loss of the original T3SS-1 gene cluster. Interestingly, a Ysc-type injectisome (T3SS-2) is present in the nematode-associated entomopathogenic bacterium Photorhabdus luminescens where it is involved in bacterial survival in the insect hemocoel and resistance to phagocytosis by macrophages [101].

Conclusions

The genome of the nomenclatural type strain DSM_17385 of the nematode-associated insect-pathogenic enterobacterial species P. vermicola has been sequenced and analyzed. The sequence reported represents the first well-supported published genome for the taxonomic species P. vermicola to be used as reference in further comparative genomics studies on Providencia bacteria. Genomic analysis has confirmed a closer phylogenetic relationship of P. vermicola to the P. rettgeri species complex, including the recently proposed species P. huaxiensis and P. entomophila, than to further Providencia species. The genome shows a high degree of synteny when compared to P. rettgeri strain Dmel1 isolated from D. melanogaster, with 78% of the identified genes being present in both genomes. Like most Providencia strains sequenced to date, P. vermicola DSM_17385 carries a type III secretion system (T3SS-1) with probable function in host cell invasion or intracellular survival and might therefore differ fundamentally in its mechanism of pathogenesis from insect-pathogenic P. sneebia or P. burhodogranariea bacteria carrying a different type of injectisome. Genes potentially associated with antibiotic resistance, comprising numerous efflux pumps and point-mutated house-keeping genes, have been identified across the P. vermicola DSM_17385 genome. However, no plasmids or mobile genetic elements carrying antibiotic resistance genes, such as those causing MDR phenomena in clinical Providencia strains, have been found. The only identified plasmid, pPVER1, is derived from a fluoroquinolone resistance plasmid family, but has lost the qnrD resistance gene by recombination from within the plasmid-encoded mic element. We conclude that the invertebrate pathogen P. vermicola is in its natural environment efficiently excluded from the propagation routes of MDR carrying genetic elements operating between human pathogens.

Methods

Bacterial cultivation and DNA extraction

The nomenclatural type strain P. vermicola DSM_17385 (= CIP_108829 = OP1) was obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ; https://www.dsmz.de). The strain had originally been isolated from surface-sterilized infective juveniles of entomoparasitic Steinernema thermophilum nematodes extracted from larvae of the greater wax moth (Galleria mellonella) [24]. For DNA extraction, bacteria were grown to late log phase in LB medium (10 g/l Tryptone, 5 g/l Yeast Extract, 5 g/l sodium chloride, pH = 7.0) containing 50 μg/ml tetracycline. DNA was extracted using the DNeasy Blood & Tissue kit protocol for Gram-negative bacteria as provided by the manufacturer (Qiagen). Genomic DNA was eluted in 10 mM TrisCl (pH 8.5). DNA quality and quantity were controlled electrophoretically and using a NanoDrop NT-1000 UV spectrophotometer.

Genomic sequencing, assembly and gene annotation

Whole genome sequencing was performed (SEQ-IT, Kaiserslautern, Germany) on the Illumina MiSeq platform (Illumina, Inc., San Diego, CA), producing 2 × 250-bp paired-end reads, with a total of 1,530,356 reads and ~ 90× coverage. Trimmomatic (version 0.36) [102] was used to trim all generated reads, and their quality was evaluated with in-house scripts using FastQC (version 0.11.9) [103], BedTools (version 2.25.0) [104], and SAMtools (version 1.3.1) [105]. High-quality filtered reads were subsequently de novo assembled using the SPAdes assembler (version 3.7.1) [106].
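
The reported sequencing depth can be cross-checked with simple arithmetic: mean coverage is approximately (number of reads × read length) / genome size. The following minimal Python sketch assumes that 1,530,356 is the total number of individual reads and that every read has the full nominal length of 250 bp before trimming. The function names are illustrative only, and the genome size is left as a parameter because it is reported in Table 1 rather than in this paragraph.

```python
def sequenced_bases(n_reads: int, read_length: int) -> int:
    """Total number of sequenced bases, assuming full-length reads."""
    return n_reads * read_length


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth of coverage for a genome of the given size (bp)."""
    return sequenced_bases(n_reads, read_length) / genome_size


def implied_genome_size(n_reads: int, read_length: int, coverage: float) -> float:
    """Genome size (bp) that would yield the stated coverage."""
    return sequenced_bases(n_reads, read_length) / coverage


# Values taken from the text; 90x is the approximate coverage reported.
N_READS = 1_530_356
READ_LENGTH = 250

total = sequenced_bases(N_READS, READ_LENGTH)
size_mb = implied_genome_size(N_READS, READ_LENGTH, 90.0) / 1e6
print(f"{total:,} bp sequenced")                       # 382,589,000 bp sequenced
print(f"genome size implied by 90x: {size_mb:.2f} Mb")  # 4.25 Mb
```

Because trimming and quality filtering remove bases, the effective coverage of the assembled genome will be somewhat lower than this upper-bound estimate.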

The GLIMMER (version 3) prokaryotic genome automatic annotation software was used to annotate this genome [107]. The size, GC content, number of contigs, N50, L50, average coverage, as well as the number of RNAs, tRNAs, and protein-coding sequences obtained for our isolate can be found in Table 1. The structural RNA encoding genes were identified using tRNAscan-SE version 2.0.7 [108] and RNAmmer version 1.2 [109]. Finally, the circular multi-track plot was generated using the ‘RCircos’ R software package [110].

An in-house pipeline was developed to annotate P. vermicola antibiotic resistance genes using the BLASTp algorithm (E-value < 1e-10 and %identity ≥50%). The queried DSM_17385 genes were identified by mapping protein sequences to the CARD database [111] and tBlastN similarity searches across annotated Providencia genomes. Blast2GO was used to provide automatic high-throughput annotation, gene ontology mapping and functional categorization of P. vermicola ORFs identified by GLIMMER. Finally, orthologous genes were evaluated using clusters of orthologous genes (COGs) and eggNOG [112, 113].
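
The in-house pipeline itself is not published here, so the following Python sketch only illustrates how the stated thresholds (E-value < 1e-10, identity ≥ 50%) could be applied to BLASTp results. It assumes that hits are available in the standard 12-column BLAST+ tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function names, the choice to keep only the best-scoring hit per query, and the example rows are assumptions made for illustration.

```python
import csv
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST+ tabular output (default 12 columns of -outfmt 6)."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        yield Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))


def filter_hits(hits: Iterable[Hit],
                max_evalue: float = 1e-10,
                min_identity: float = 50.0) -> Dict[str, Hit]:
    """Keep, per query, the highest-bitscore hit passing both thresholds."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= max_evalue or hit.pident < min_identity:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


# Invented example rows; identifiers are hypothetical, not real CARD entries.
example = [
    "orf_0001\tcard_A\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-50\t200",
    "orf_0001\tcard_B\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t250",
    "orf_0002\tcard_C\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-40\t150",
    "orf_0003\tcard_D\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-05\t50",
]
for query, hit in filter_hits(parse_outfmt6(example)).items():
    print(query, hit.subject, hit.pident, hit.evalue)
```

In this toy input only `orf_0001` is retained: `orf_0002` fails the identity threshold and `orf_0003` fails the E-value threshold.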

The complete plasmid assembly of pPVER1 was performed using plasmidSPAdes (SPAdes v3.7.1) software with minor manual curation [109].

Prediction of Providencia orthologs and digital DNA-DNA hybridization analysis

Orthology analysis was conducted on proteomes of P. vermicola DSM_17385 and P. rettgeri Dmel1 using Inparanoid software with default parameters [114]. We used a confidence score threshold = 1 to directly estimate orthology relationships between the identified protein-encoding genes.

Orthologous Average Nucleotide Identity Tool (OAT) software v0.9.31 (https://www.ezbiocloud.net/tools/orthoani) [49] was used for calculation of pair-wise OrthoANI values for a set of published genomes comprising the four Providencia strains previously assigned to the species P. vermicola, the nomenclatural type strains representing all currently recognized taxonomic Providencia species (with the notable exception of the still insufficiently assembled genome of the P. stuartii type strain NCTC_11800), and four strains selected on the basis of the hMLST based phylogenetic analysis shown in Suppl. Figure S2 as the most closely related well annotated genomes with respect to the P. stuartii type strain (i.e. surrogate type genome of P. stuartii strain FDA-ARGOS 645) and to the three presumed P. vermicola strains G1, P8538 and LLDRA6.

Whole-genome alignment

The genomic DNA sequences of P. vermicola DSM_17385 and P. rettgeri Dmel1 (NZ_CM001774) were aligned using LASTZ (Large-Scale Genome Alignment Tool) (version 1.02.00) with default parameters [115]. Syntenic chromosomal regions were identified using the MAUVE (Multiple Alignment of Conserved Genomic Sequence with Rearrangements) software package. To determine a reasonable value for the Min Locally Collinear Blocks (LCBs), we performed an initial alignment at the default value and then used the LCB weight slider in the MAUVE graphical user interface (GUI) to fix parameters empirically eliminating all spurious rearrangements. Sequences were then realigned using the manually determined weight value. The T3SS-1 sequences of P. vermicola DSM_17385 and P. rettgeri Dmel1 (NZ_CM001774) were aligned using ClustalW (version 2.1) with default settings [116].

Multilocus sequence typing and phylogenetic reconstruction

The 16S rRNA, hMLST and rMLST marker genes identified in the annotated P. vermicola DSM_17385 genome (Additional files 1 and 2) were used as query in separate BlastN searches [117] across completed genome and whole genome shotgun entries of the Genbank database assigned to the genus Providencia (taxid 586). Orthologous genes from the type strain DSM_4479 (= ATCC_29906) of the related enterobacterium Proteus mirabilis were concomitantly identified to serve as outgroup for phylogenetic reconstruction. A whole genome shotgun sequence assigned to ‘Candidatus Providencia siddallii’ was not considered for sequence typing as preliminary analysis revealed that sequences were too highly divergent to be relevant for the problem under study. For each hMLST and rMLST marker, a set of orthologs from reference genomes representing the 10 recognized Providencia species was generated, each species being represented by its nomenclatural type strain and, if available, further strains spanning the known range of diversity included under this taxon.

Marker alignment and phylogenetic reconstruction were performed using the MEGA software tool [118] at the level of hMLST and rMLST meta-genes comprising concatenations of all respective single marker sequences. Phylogenies were reconstructed using a p-distance matrix-based Neighbor Joining (NJ) method as implemented in MEGA. Tree topology confidence limits were explored in non-parametric bootstrap analyses over 1,000 pseudo-replicates.
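
To make the distance measure concrete: the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The Python sketch below computes a pairwise p-distance matrix and draws one non-parametric bootstrap pseudo-replicate by resampling alignment columns with replacement. It is not the MEGA implementation and does not build the Neighbor Joining tree; skipping gapped sites pair by pair is an assumption, as the gap treatment used in MEGA is not stated, and the toy alignment is invented.

```python
import random
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites; sites with a gap in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped sites (assumption)
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Upper-triangle p-distance matrix keyed by sorted name pairs."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for i, m in enumerate(names) for n in names[i + 1:]}


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Invented toy alignment of three concatenated "meta-genes".
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACG-AT"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 4))
replicate = bootstrap_replicate(toy, random.Random(1))
print(distance_matrix(replicate))
```

In a full analysis, a tree would be inferred from each of the 1,000 resampled matrices, and the bootstrap support of a clade would be the fraction of replicate trees in which it appears.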

For rMLST, P. vermicola DSM_17385 genome data and the three published genomes assigned to this species were compared to the PubMLST database [47]. The rMLST typing tool compares each rps, rpl and rpm gene sequence of the query genome to an allele-specific reference database, identifies the closest ribosomal sequence type or types (rST) in the database, and translates this rST similarity into a taxonomic assignment.
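
The typing logic described above can be illustrated with a simplified Python sketch: each query gene is looked up in a per-locus allele table, and the resulting allele profile is compared to known rST profiles to find the closest match or matches. This is a conceptual sketch only and not the PubMLST implementation; exact-sequence matching, the scoring by number of shared alleles, the function names, and all sequences, allele numbers and rST identifiers are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

AlleleDB = Dict[str, Dict[str, int]]  # locus -> {allele sequence: allele id}
Profiles = Dict[int, Dict[str, int]]  # rST id -> {locus: allele id}


def call_alleles(query: Dict[str, str], allele_db: AlleleDB) -> Dict[str, Optional[int]]:
    """Assign an allele id per locus by exact sequence match (None if novel)."""
    return {locus: allele_db.get(locus, {}).get(seq.upper())
            for locus, seq in query.items()}


def closest_rsts(calls: Dict[str, Optional[int]], profiles: Profiles) -> Tuple[List[int], int]:
    """Return the rST(s) sharing the most alleles with the query, and that count."""
    scores = {rst: sum(1 for locus, allele in profile.items()
                       if calls.get(locus) == allele)
              for rst, profile in profiles.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return [], 0  # no allele in common with any known rST
    return sorted(rst for rst, score in scores.items() if score == best), best


# Invented toy database with three loci instead of the full ribosomal marker set.
allele_db = {
    "rpsA": {"ATGGCA": 1, "ATGGCT": 2},
    "rplB": {"ATGAAA": 1},
    "rpmA": {"ATGCGT": 1, "ATGCGC": 2},
}
profiles = {101: {"rpsA": 1, "rplB": 1, "rpmA": 1},
            102: {"rpsA": 2, "rplB": 1, "rpmA": 2}}

known = {"rpsA": "ATGGCT", "rplB": "ATGAAA", "rpmA": "ATGCGC"}
novel = {"rpsA": "ATGGGG", "rplB": "ATGTTT", "rpmA": "ATGCCC"}
print(closest_rsts(call_alleles(known, allele_db), profiles))  # ([102], 3)
print(closest_rsts(call_alleles(novel, allele_db), profiles))  # ([], 0)
```

The second call mirrors the situation in which a genome is not closely related to any species represented in the database, so that no existing rST can be assigned.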