Introduction

The Pectobacterium genus [1] comprises important plant pathogens that cause soft rot disease on a wide variety of plant species [2]. Because they cause disease on major crops such as potato, Pectobacterium spp. have mainly been isolated from diseased plants during initial outbreaks or sustained epidemics, and descriptions of these bacteria outside the agricultural context are scarce [3].

The classification of the Pectobacterium genus has been extensively revised over the last decade. The genus is currently subdivided into seven species, P. carotovorum [1], P. atrosepticum [4], P. betavasculorum [4], P. wasabiae [4], P. aroidearum [5], P. polaris [6] and P. parmentieri [7], together with the recently proposed species “P. peruviense” [8]. The species P. carotovorum is heterogeneous and is currently subdivided into the recognized subspecies P. carotovorum subsp. carotovorum [9, 10] and P. carotovorum subsp. odoriferum [9, 10], and the proposed subspecies “P. carotovorum subsp. actinidiae” [11] and “P. carotovorum subsp. brasiliense” [12]. This heterogeneity has led to many Pectobacterium isolates being assigned to P. carotovorum. One example is strain UGC32 (also named IFB5232, SCRI179, LMG30269 and PCM2893), which was initially described as P. carotovorum subsp. carotovorum and is now the proposed type strain of the species “P. peruviense” [8, 13]. All strains described so far in “P. peruviense” were isolated in Peru in the 1970s from potato plants cultivated at high altitude (2400–3800 m). Here we describe the draft genome sequences of two strains, A97-S13-F16 and A350-S18-N16, isolated in February and November 2016 at different altitudes in the Durance river basin in France.

Organism information

Classification and features

Strain A97-S13-F16 was isolated in February 2016 from fresh water sampled in the river Durance, while strain A350-S18-N16 was isolated in November 2016 from fresh water sampled in the river Bléone, close to its confluence with the river Durance. The water parameters measured at the sampling times for A97-S13-F16 and A350-S18-N16, respectively, were: temperature 6.4 °C and 10.4 °C; turbidity 2.69 NTU and 145 NTU; conductivity 629 μS/cm and 629 μS/cm. Following sampling, 500 ml of fresh water was filtered through 0.2 μm pore filters (Sartorius cellulose acetate filters), the bacteria retained on the filters were suspended in 1 ml of sterile distilled water, and 100 μl of the suspension was plated onto semi-selective modified single-layer CVPAG366 plates (same medium as described in [14] except that tryptone was omitted; hereafter called CVP). After 2 days of growth at 28 °C, two strains forming pits on CVP medium were purified, named A97-S13-F16 and A350-S18-N16, and stored in 40%/60% glycerol/LB liquid medium (10 g tryptone, 5 g yeast extract, 10 g NaCl per liter of medium) at −80 °C.
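
The isolation protocol implies a simple volume calculation: filtering 500 ml and resuspending the retained cells in 1 ml concentrates them 500-fold, so each 100 μl plate samples the equivalent of 50 ml of river water. The Python sketch below makes this explicit. The function names and the colony count are hypothetical, and the sketch assumes complete recovery of cells from the filter, which is an idealisation.

```python
# Volumes taken from the isolation protocol described above.
FILTERED_WATER_ML = 500.0  # river water passed through the 0.2 um filter
RESUSPENSION_ML = 1.0      # sterile water used to resuspend retained cells
PLATED_ML = 0.1            # 100 ul spread on one CVP plate


def water_volume_per_plate_ml(filtered_ml=FILTERED_WATER_ML,
                              resuspension_ml=RESUSPENSION_ML,
                              plated_ml=PLATED_ML):
    """Volume of original river water represented by one plate."""
    concentration_factor = filtered_ml / resuspension_ml  # 500-fold here
    return plated_ml * concentration_factor


def cfu_per_litre(colony_count, **volumes):
    """Convert a colony count on one plate into CFU per litre of river water.

    Assumes every filter-retained cell ends up in the resuspension.
    """
    return colony_count * 1000.0 / water_volume_per_plate_ml(**volumes)


print(water_volume_per_plate_ml())  # 50.0
print(cfu_per_litre(3))             # 60.0 (hypothetical count of 3 colonies)
```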

Cells of both strains are rod-shaped and approximately 2 μm long in the exponential growth phase on LB medium (Fig. 1), and both strains macerate potato tubers (Additional file 1: Figure S1). They form isolated colonies after 24 h at 28 °C on LB medium solidified with 15 g/l agar and after 48 h at 28 °C on TSA 10% medium (1.5 g tryptone, 0.5 g soy peptone, 0.5 g NaCl, 15 g agar per liter of medium), and they induce pits on CVP medium after 48 h at 28 °C.
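
The TSA 10% recipe above corresponds to one tenth of the nutrient components of standard tryptic soy agar (15 g tryptone, 5 g soy peptone, 5 g NaCl per liter), with the agar kept at full strength. A minimal Python sketch of that scaling, assuming the standard TSA formulation as the starting point; the helper name is illustrative.

```python
# Full-strength tryptic soy agar, grams per litre (standard formulation).
FULL_TSA_G_PER_L = {
    "tryptone": 15.0,
    "soy peptone": 5.0,
    "NaCl": 5.0,
    "agar": 15.0,
}


def diluted_recipe(strength, recipe=FULL_TSA_G_PER_L, fixed=("agar",)):
    """Scale nutrient components by `strength`; keep gelling agents unchanged."""
    return {
        component: grams if component in fixed else round(grams * strength, 3)
        for component, grams in recipe.items()
    }


print(diluted_recipe(0.10))
# {'tryptone': 1.5, 'soy peptone': 0.5, 'NaCl': 0.5, 'agar': 15.0}
```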

Fig. 1

Photomicrographs of Gram-stained, exponentially growing “P. peruviense” cells. (a) strain A97-S13-F16, (b) strain A350-S18-N16. A light microscope with 100× magnification was used. These photomicrographs show the rod shape of both strains. The scale bar represents 5 μm

Amplification and sequencing of the gapA housekeeping gene was recently described as a rapid means of characterizing the different Pectobacterium species [15]. The gapA sequences of strains A97-S13-F16 and A350-S18-N16 clustered with that of the proposed “P. peruviense” type strain (Fig. 2A), and the clustering of both strains with “P. peruviense” was confirmed by phylogenetic analysis of concatenated protein sequences from the full genomes (Fig. 2B).

Fig. 2

Phylogenetic trees of “P. peruviense” strains and strains of other Pectobacterium species and subspecies. a Phylogenetic tree constructed from the gapA nucleotide sequences. Sequences were aligned using the MUSCLE software [24] and the alignments were filtered using the program GBLOCKS [25]. The tree was computed using PHYML [26]. One hundred bootstrap replicates were performed to assess the statistical support of each node. Bootstrap support values (percentages) are indicated if above 95%. gapA sequences were retrieved from the full genomes of type strains (accession numbers are indicated in Fig. 2b) or obtained from the sequenced gapA amplicons for strains A97-S13-F16 and A350-S18-N16. b Phylogenetic tree constructed from the concatenated sequences of 1266 homologous proteins. Before concatenation, the homologous sequences of each gene were aligned using the MUSCLE software [24] and the alignments were filtered using the program GBLOCKS [25]. The tree was computed using PHYML [26]. One hundred bootstrap replicates were performed to assess the statistical support of each node. Bootstrap support values (percentages) are shown if less than 100%. The accession number of each genome is indicated in brackets after the strain name. Dickeya solani RNS08.23.3.1.A was used as outgroup. Type strains are marked with T after the strain name
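
The legend describes a concatenation (supermatrix) approach: each gene is aligned and filtered separately, the per-gene alignments are joined end to end for every strain, and node support is estimated by resampling alignment columns with replacement. The Python sketch below illustrates those two steps only. The toy sequences are hypothetical, the gap padding for a strain missing a gene is an assumption (the 1266 proteins used here may all be present in every genome), and in the actual analysis the bootstrap was run within PHYML rather than by code like this.

```python
import random


def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments end to end into one supermatrix.

    gene_alignments: list of dicts {taxon: aligned_sequence}; sequences in one
    dict are already aligned and filtered, so they share the same length.
    A taxon absent from a gene is padded with gaps (assumption of this sketch).
    """
    pieces = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            pieces[taxon].append(alignment.get(taxon, "-" * length))
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}


def bootstrap_replicate(supermatrix, rng):
    """Draw one pseudo-alignment by sampling columns with replacement."""
    length = len(next(iter(supermatrix.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in supermatrix.items()}


# Hypothetical toy alignments for two genes
genes = [
    {"A97-S13-F16": "MKV", "A350-S18-N16": "MKV", "UGC32": "MRV"},
    {"A97-S13-F16": "LLA", "A350-S18-N16": "LLS"},  # gene missing in UGC32
]
matrix = concatenate_alignments(genes, ["A97-S13-F16", "A350-S18-N16", "UGC32"])
print(matrix["UGC32"])  # MRV---

rng = random.Random(42)
replicates = [bootstrap_replicate(matrix, rng) for _ in range(100)]
print(len(replicates))  # 100
```

Each replicate would then be passed to the tree-building program, and a node's support is the percentage of replicate trees that recover it.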

General features of A97-S13-F16 and A350-S18-N16 are given in Table 1.

Table 1 Classification and general features of strains A97-S13-F16 and A350-S18-N16

Genome sequencing information

Genome project history

The aim of the project was to describe Pectobacterium spp. isolated from environmental samples outside the agricultural context. Fresh water was sampled in the river Durance and its tributaries in 2016. Among the isolated strains, A97-S13-F16 and A350-S18-N16, isolated at different locations and in different months in the river system, were selected for sequencing after amplification and sequencing of their gapA housekeeping gene, because phylogenetic analysis positioned both gapA sequences close to that of UGC32, the recently proposed “P. peruviense” type strain [8, 13, 15].

Growth conditions and DNA isolation

After isolation from fresh water in 2016, strains A97-S13-F16 and A350-S18-N16 were stored in 40%/60% glycerol/LB medium at −80 °C. For genomic DNA preparation, the strains were first grown overnight at 28 °C on solid LB medium. A single colony was then picked and grown overnight in 2 ml of liquid LB medium at 28 °C with shaking at 120 rpm. Bacterial cells were harvested by centrifugation (5 min at 12,000 rpm) and DNA was extracted with the Wizard® genomic DNA extraction kit (Promega) according to the manufacturer's instructions. DNA was resuspended in 100 μl of sterile distilled water, and its quantity and quality were assessed by NanoDrop measurement, spectrophotometry and agarose gel electrophoresis.
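
Spectrophotometric DNA quantification relies on the conventional conversion that an A260 of 1.0 (1 cm path) corresponds to about 50 ng/μl of double-stranded DNA, and an A260/A280 ratio near 1.8 is usually taken to indicate protein-free DNA. The Python sketch below applies these rules of thumb to hypothetical absorbance readings and computes the volume needed for a 50 ng library input; it is not the NanoDrop software's own calculation, and the function names are illustrative.

```python
# Conventional conversion: A260 = 1.0 (1 cm path) ~ 50 ng/ul double-stranded DNA
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are usually read as protein-free DNA."""
    return a260 / a280


def volume_for_input_ul(target_ng, concentration_ng_per_ul):
    """Volume of extract needed to supply a given DNA mass."""
    return target_ng / concentration_ng_per_ul


# Hypothetical readings for one extract
conc = dna_concentration_ng_per_ul(a260=2.0)
print(conc)                              # 100.0
print(round(purity_ratio(2.0, 1.1), 2))  # 1.82
print(volume_for_input_ul(50, conc))     # 0.5
```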

Genome sequencing and assembly

Genome sequencing was performed at the next-generation sequencing core facility of the Institute for Integrative Biology of the Cell, Bât. 21, Avenue de la Terrasse, 91190 Gif-sur-Yvette Cedex, France. Nextera DNA libraries were prepared from 50 ng of high-quality genomic DNA. Paired-end 2 × 75 bp sequencing was performed on an Illumina NextSeq 500 instrument with a High Output 150-cycle kit.

CLC Genomics Workbench (version 9.5.2, Qiagen Bioinformatics) was used to assemble 30,066,500 reads (mean length 53 bp) and 8,174,334 reads (mean length 52 bp) for strains A97-S13-F16 and A350-S18-N16, respectively. Final sequencing coverages were 331× and 86×, with 61 and 73 scaffolds, for strains A97-S13-F16 and A350-S18-N16, respectively (Table 2).
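
The reported coverages can be approximated as total sequenced bases divided by genome size (the average depth used in Lander-Waterman theory), taking the draft genome sizes reported under Genome properties. A minimal Python sketch: this naive estimate gives about 334× and 87×, slightly above the reported 331× and 86×, which is expected if not every read contributes to the final assembly. How CLC Genomics Workbench computed the reported values is not stated here.

```python
def mean_coverage(n_reads, mean_read_length_bp, genome_size_bp):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length_bp / genome_size_bp


# Read counts, mean read lengths and draft genome sizes reported in this paper
strains = {
    "A97-S13-F16": (30_066_500, 53, 4_775_191),
    "A350-S18-N16": (8_174_334, 52, 4_871_019),
}
for name, (reads, length, size) in strains.items():
    print(name, round(mean_coverage(reads, length, size)))
# A97-S13-F16 334
# A350-S18-N16 87
```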

Table 2 Genome sequencing project information

Genome annotation

Coding sequences were predicted using the RAST server [16] with the Glimmer 3 prediction tool [17]. COG assignments and Pfam domain predictions were performed with the Web CD-Search Tool [18]. CRISPRFinder [19] was used to detect CRISPRs. Signal peptides were detected with the SignalP 4.1 server [20], and transmembrane helices were predicted with TMHMM [21].
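
Per-genome counts such as the number of proteins carrying a transmembrane helix are typically obtained by tallying the predictor output. The Python sketch below assumes TMHMM's one-line-per-protein “short” output format with a `PredHel=<n>` field; the protein identifiers and values in the example are hypothetical, and the helper is illustrative rather than part of TMHMM.

```python
def count_tm_proteins(tmhmm_short_lines):
    """Count proteins with at least one predicted transmembrane helix.

    Assumes TMHMM 'short' output: one tab-separated line per protein whose
    fields after the identifier are key=value pairs, including PredHel=<n>.
    """
    count = 0
    for line in tmhmm_short_lines:
        fields = line.rstrip("\n").split("\t")[1:]
        values = dict(field.split("=", 1) for field in fields if "=" in field)
        if int(values.get("PredHel", "0")) > 0:
            count += 1
    return count


# Hypothetical output lines
example = [
    "prot_0001\tlen=312\tExpAA=45.10\tFirst60=20.33\tPredHel=2\tTopology=i12-34o49-71i",
    "prot_0002\tlen=150\tExpAA=0.02\tFirst60=0.01\tPredHel=0\tTopology=o",
]
print(count_tm_proteins(example))  # 1
```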

Genome properties

The “P. peruviense” A97-S13-F16 draft genome contains 4,775,191 bp with a GC content of 51%. In total, 4503 genes were predicted, of which 4459 are protein-coding genes and 44 are RNA genes. The final assembly comprised 61 scaffolds. Among the predicted genes, 72.21% have a predicted function, 79.91% were assigned to COGs and 85.40% have a predicted Pfam domain. Among the predicted proteins, 392 have a predicted signal peptide and 1090 contain at least one predicted transmembrane helix. Three CRISPR arrays were detected in this genome.

The “P. peruviense” A350-S18-N16 draft genome contains 4,871,019 bp with a GC content of 51.1%. In total, 4635 genes were predicted, of which 4487 are protein-coding genes and 48 are RNA genes. The final assembly comprised 73 scaffolds. Among the predicted genes, 72.01% have a predicted function, 78.77% were assigned to COGs and 85.09% have a predicted Pfam domain. Among the predicted proteins, 395 have a predicted signal peptide and 1095 contain at least one predicted transmembrane helix. Two CRISPR arrays were detected in this genome.
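
GC content is the fraction of G and C among sequenced bases. A minimal Python sketch computing it over a set of scaffolds; it assumes that ambiguous bases, such as the N runs that can occur in scaffolds, are excluded from the denominator, a choice that can shift the value slightly on draft assemblies. The sequences in the example are hypothetical.

```python
def gc_content(scaffolds):
    """Percent G+C among unambiguous A/C/G/T bases across all scaffolds."""
    gc = acgt = 0
    for seq in scaffolds:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


# Hypothetical scaffolds; the N run is ignored in the denominator
print(gc_content(["ATGC", "ggNNca"]))  # 62.5
```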

The properties and the statistics of the two draft genomes are summarized in Tables 3 and 4.

Table 3 Genome statistics
Table 4 Number of genes associated with the 25 COG functional categories

Insight from genome sequences

Genome comparison of A97-S13-F16 and A350-S18-N16 with representative species of the Pectobacterium genus

A phylogenetic tree constructed from the concatenated sequences of 1266 homologous proteins clustered strains A97-S13-F16 and A350-S18-N16 together, close to UGC32, the proposed “P. peruviense” type strain (Fig. 2B). Average nucleotide identity values based on BLAST (ANIb) were then calculated between the genomes of strains A97-S13-F16 and A350-S18-N16 and the genomes of described Pectobacterium species and subspecies (Additional file 2: Table S1). Pairwise ANIb values between the three “P. peruviense” genomes (A97-S13-F16, A350-S18-N16 and UGC32) were above 97.5%, whereas pairwise ANIb values between these three genomes and the genomes of other Pectobacterium species and subspecies were below 94%, that is, below the commonly used 95–96% species boundary. Digital DNA–DNA hybridization (dDDH) is an in silico method designed to approximate the wet-lab DDH method as closely as possible [22]. dDDH values were calculated between the genomes of A97-S13-F16 and A350-S18-N16 and Pectobacterium genomes representative of known species and subspecies (Additional file 2: Table S1). dDDH values between the A350-S18-N16 and A97-S13-F16 genomes and the genome of the proposed “P. peruviense” type strain UGC32 were above 79%, well above the 70% species boundary. When pairwise calculations were performed between these three genomes and those of known Pectobacterium species and subspecies, the estimated dDDH values dropped below 54%, well below the species boundary. These results confirm that A97-S13-F16 and A350-S18-N16 belong to the species “P. peruviense”.
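
The ANIb and dDDH comparisons above can be illustrated with a short Python sketch. It assumes the commonly used fragment-based ANIb definition (the query genome is cut into 1020-nt fragments, each fragment is searched with BLASTN against the other genome, and only hits with more than 30% identity over more than 70% of the fragment length are averaged) and the conventional 95% ANI and 70% dDDH species cut-offs. The BLASTN search itself is not shown, the hit values are hypothetical, the function names are illustrative, and the sketch is not meant to reproduce the exact software behind Additional file 2: Table S1.

```python
from statistics import mean

FRAGMENT_LEN = 1020  # assumed query fragment size


def fragment_genome(sequence, size=FRAGMENT_LEN):
    """Cut a query genome into consecutive fragments (the last may be shorter)."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def anib(hits, min_identity=30.0, min_aligned_fraction=0.70):
    """One-directional ANIb from best BLASTN hits.

    hits: iterable of (percent_identity, alignment_length, fragment_length),
    one tuple per query fragment that produced a hit.
    Returns the mean identity of hits passing both cut-offs, or None.
    """
    kept = [
        identity
        for identity, aln_len, frag_len in hits
        if identity > min_identity and aln_len / frag_len > min_aligned_fraction
    ]
    return mean(kept) if kept else None


def same_species(ani_pct, dddh_pct, ani_cutoff=95.0, dddh_cutoff=70.0):
    """Apply conventional species cut-offs to an ANI / dDDH pair."""
    return ani_pct >= ani_cutoff and dddh_pct >= dddh_cutoff


fragments = fragment_genome("ACGT" * 625)  # 2500 nt toy genome
print([len(f) for f in fragments])         # [1020, 1020, 460]

hypothetical_hits = [(98.0, 1020, 1020), (97.0, 900, 1020),
                     (25.0, 1020, 1020), (99.0, 500, 1020)]
print(anib(hypothetical_hits))             # 97.5
print(same_species(97.5, 79.0))            # True
print(same_species(94.0, 54.0))            # False
```

Because ANIb is computed from one genome's fragments against the other, values in the two directions can differ slightly; tools often report both.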

Genome comparison between the “P. peruviense” strains

The phylogenetic trees (Fig. 2) indicate that strains A97-S13-F16 and A350-S18-N16 are more closely related to each other than to the “P. peruviense” type strain UGC32. To gain further insight into the relationships between the three “P. peruviense” strains, we looked for shared and unique genes among the genomes of strains A97-S13-F16, A350-S18-N16 and the type strain UGC32 (Fig. 3). Strains A97-S13-F16, A350-S18-N16 and UGC32 contain 292, 414 and 346 specific genes, respectively. The slightly larger pool of specific genes in strain A350-S18-N16 could be partly related to its higher content of mobile genetic elements, as shown in Table 4. Indeed, we observed three clusters of phage-related genes in strain A350-S18-N16, only one of which was also detected in strain A97-S13-F16. The Venn diagram shows that 4129 genes are shared between strains A97-S13-F16 and A350-S18-N16, whereas only 3757 and 3765 genes are shared between the type strain UGC32 and A97-S13-F16 and A350-S18-N16, respectively. This confirms that the A97-S13-F16 and A350-S18-N16 genomes are more closely related to each other than to the genome of the proposed type strain UGC32.

Fig. 3

Venn diagram of shared and unique genes between the genomes of “P. peruviense” strains A97-S13-F16 and A350-S18-N16 and the proposed “P. peruviense” type strain UGC32. Orthology was inferred using a threshold of 80% identity over at least 80% of the protein length
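
The Venn counts in Fig. 3 follow from set operations once orthologous genes have been grouped. The Python sketch below applies the 80% identity over 80% of protein length criterion from the Fig. 3 legend to a single hit, then counts unique and shared groups. The group identifiers are hypothetical, and the sketch assumes that pairwise “shared” counts include genes shared by all three genomes, which the figure may partition differently.

```python
def is_ortholog_hit(identity_pct, aligned_len, protein_len,
                    min_identity=80.0, min_coverage=0.80):
    """Orthology criterion: >= 80% identity over >= 80% of the protein length."""
    return identity_pct >= min_identity and aligned_len / protein_len >= min_coverage


def venn_counts(a, b, c):
    """a, b, c: sets of orthologous-group identifiers, one set per genome."""
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b": len(a & b),  # includes groups also present in c
        "a_and_c": len(a & c),
        "b_and_c": len(b & c),
        "all_three": len(a & b & c),
    }


# Hypothetical orthologous-group IDs per genome
a97 = {"og1", "og2", "og3", "og4"}
a350 = {"og2", "og3", "og4", "og5"}
ugc32 = {"og3", "og4", "og6"}
print(venn_counts(a97, a350, ugc32))
# {'only_a': 1, 'only_b': 1, 'only_c': 1, 'a_and_b': 3, 'a_and_c': 2, 'b_and_c': 2, 'all_three': 2}
```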

Conclusions

In this study we present the draft genome sequences of two “P. peruviense” strains isolated from river fresh water in France. The species “P. peruviense” has recently been proposed and, until our study, all described strains belonging to it had been isolated from potato tubers in the Peruvian altiplano [8]. The presence of “P. peruviense” strains in two independent environmental samples in France indicates that the geographic distribution of this species is likely to be wider than previously anticipated. Both French strains are able to rot potato tubers, like the proposed type strain UGC32. The two French isolates are more closely related to each other than to the type strain UGC32. Whether this reflects geographic origin (France vs Peru) or niche of origin (water vs diseased plants) is unknown.