Introduction

Since the application of Next Generation Sequencing (NGS) in microbiology, thousands of bacterial genomes have been sequenced. Among these analyzed species is Pseudomonas syringae, and at present ca. 180 genomic sequences have been assembled for P. syringae (NCBI 2017), the plant pathogenic bacteria species that cause diseases in many agriculturally important crops and which has been divided into different pathovars.

One of the pathovars of P. syringae, namely lachrymans, is mainly a pathogen of the cucumber (Cucumis sativus L.), to which it causes serious damage and yield loss due to the presence of water-soaked lesions on the leaves that later become necrotic, thus reducing the photosynthetic capacity of the infected foliage (Olczak-Woltman et al. 2008; Lamichhane et al. 2015). The disease caused by this pathogen, i.e. bacterial angular leaf spot, is distributed worldwide and appears on other cucurbit species as well. Novel haplotypes of P. syringae were observed to be common on multiple cucurbit hosts, thus illustrating this species’ large ecological diversity (Newberry et al. 2016). Moreover, pathovar lachrymans is particularly detrimental because it can facilitate infection by Pseudoperonospora cubensis, which is the most destructive cucumber pathogen that causes downy mildew (Olczak-Woltman et al. 2008). Recently, outbreaks of angular leaf spot were reported in several Chinese provinces, where the disease affected 15–50% of growing fields, causing between 30% and 50% of yield reduction (Meng et al. 2016). This is an economically important pathogen, as cucumber is grown over an area of 2.1 million hectares with total production at 71.3 million tons, mainly in China but also in the EU and USA (FAO 2017).

Current genomic technologies provide the means not only for efficient genome sequencing but also for comparative genome analyses from which structural, phylogenetic or evolutionary conclusions can be drawn. Genome sequencing is expected to provide relevant tools in bacterial taxonomy and in an in-depth characterization of bacterial pathogens. To date, the genomes of seven strains belonging to pathovar lachrymans have been sequenced and are available as drafts or early drafts (Baltrus et al. 2011; Jeong et al. 2015; Mott et al. 2016; NCBI 2017). However, among the seven genomes, only two strains, MAFF301315 and MAFF302278, were described in detail and aligned with other representative strains of P. syringae by Baltrus et al. (2011). These strains, unlike other P. syringae strains, have only a low percentage of novel Type Three Effectors (TTEs), although MAFF301315 possessed a relatively higher number of TTEs. Moreover, MAFF301315 possessed a megaplasmid pMPPla107, approximately 1 Mb in size encoding 776 hypothetical proteins. This megaplasmid was found to be present in the very closely related strain N7512 but was absent in other lachrymans strains. It was inferred that the plasmid was a recent acquisition since the collected pathovar lachrymans strains possess nearly identical sequences at their MLST loci and only these two strains possess the megaplasmid (Baltrus et al. 2011). It was later shown that pMPPla107 is self-transmissible across a variety of diverse pseudomonad strains with conjugation dependent on a Type Four Secretion System (T4SS). However, its role in virulence remains elusive (Romanchuk et al. 2014). Recently, Baltrus et al. (2017) described four different Type Three Secretion Systems (T3SS) in P. syringae pathovars: canonical, rhizobial, single and atypical. The canonical system present in P. syringae pv. tomato DC3000 is required for virulence in planta in every pathogenic strain investigated so far, and its presence is strongly correlated with pathogenic potential on agriculturally relevant plants. On the other hand, the rhizobial system does not appear to be required for virulence in planta but was acquired via multiple horizontal gene transfers by strains within the P. syringae complex (Baltrus et al. 2017).

In this paper we report the genome sequence of pathovar lachrymans strain 814/98 which is highly virulent to cucumber. This sequence was compared with other P. syringae strains with a special focus on the strains of pathovar lachrymans.

Material and methods

The bacterial strain

Both virulence and genetic diversity of the strains collected at the Department of Plant Genetics, Breeding and Biotechnology of WULS were described previously (Olczak-Woltman et al. 2007; Słomnicka et al. 2015). Based on those studies, pathovar lachrymans strain 814/98, recognized as the most virulent strain to cucumber, was chosen for genome sequencing. The strain is of Dutch origin and was obtained from a collection maintained in the Pathogen Bank of the Institute of Plant Protection, Poland.

Bacterial growth and DNA isolation methods

The bacterial culture of strain 814/98 was initiated from a single colony and grown for 24 h in Luria Broth liquid medium on a rotary shaker at 28 °C and 200 rpm. Total genomic DNA was extracted using the DNA Genomic-tips 100/G kit (Qiagen, Germany), as per the manufacturer’s instructions. The DNA concentration was estimated by using a NanoDrop2000 spectrophotometer (Thermo Scientific, USA) and by electrophoresis on agarose gel stained with ethidium bromide. Finally, quality of the sample was verified by chip electrophoresis using the Experion™ Automated Electrophoresis System (Bio-Rad, USA); ca. 70 μg DNA of high purity was provided for sequencing.

Whole genome sequencing

An Illumina HiSeq 2000 platform was used for sequencing. Briefly, two types of DNA paired-end libraries with an insert size of 500 bp and 6500 bp were constructed according to manufacturer’s recommendations (Illumina, USA) to generate >100× genome coverage (Table S1). The DNA was sonicated, end-repaired and ‘A’ was added at 3′ ends using T4 polynucleotide kinase. Further adapters were ligated, size-selected DNA was enriched by PCR and used for library preparation. Later, commercial sequencing was performed at BGI Tech Solutions (Hong Kong, China).

De novo genome assembly and structural annotation

Raw Illumina reads were filtered to remove adapters and low quality bases. Clean data as FastQ files were assembled using SOAPdenovo (Li et al. 2008) into contigs and scaffolds (assembly P814h, NCBI GenBank Accession NBLF00000000, BioProject PRJNA380232, raw sequence read archive SRA SUB2542424). These were structurally analyzed and the number and length of the contigs and scaffolds (Table S2) as well as repetitive fragments (Table S3) were described. Tandem repeats were identified using a Tandem Repeat Finder (Benson 1999). Minisatellite and microsatellite DNA were classified based on the number and length of repeat units (15–65 bp for minisatellite DNA and 2–10 bp for microsatellite DNA). Sequences flanking microsatellite loci (100 bp up- and downstream) were compared using the BLASTX algorithm to sequences deposited at: NCBI (http://www.ncbi.nlm.nih.gov/), JGI (http://jgi.doe.gov/) and Pseudomonas Genome DB (http://www.pseudomonas.com/). The BLASTX hits were identified (E-value < 1e−5 with similarities of at least 85%) and the summary results are presented in Table S3.

Gene prediction and functional annotation

Genes encoding proteins were predicted from the genome assembly using Glimmer v.3.02 (Delcher et al. 2007). Genes encoding rRNA and tRNA were identified using RNAmmer v.1.2 (Lagesen et al. 2007) and tRNAscan-SE v.1.3.1 (Lowe and Eddy 1997). sRNA genes were predicted using the Rfam database (http://rfam.xfam.org/). Functional gene annotation was done by analyzing protein sequences (Table S4). Genes were aligned with several databases to obtain their corresponding annotations. The following were searched: KEGG v.59 (Kanehisa et al. 2006) and COG v.20090331 (Tatusov et al. 2003); Swiss-Prot v.2011_10_19 and GO v.1.419 (Bard and Winter 2000). To ensure biological meaning, high-quality alignment results were chosen for gene annotation.

Phylogenetic and comparative analyses

A dataset of 26 sequenced Pseudomonas spp. strains was constructed in order to perform comparative analysis (Table 1). It consisted of strain 814/98, other previously sequenced strains of pathovar lachrymans and published in NCBI, sequenced strains belonging to other P. syringae pathovars, and one strain each of: P. aeruginosa, P. cichorii, P. fluorescens and P. putida. The constructed dataset of sequenced Pseudomonas spp. strains was enriched with 15 P. syringae strains derived from our collection in order to perform MLST analysis. Of the analyzed MLST loci: cts, gapA, pgi, rpoD, gyrB, pfk and acn (Sankar and Gutman, 2004; Hwang et al. 2005), three sequences, i.e. acn, gyrB and pgi were chosen for phylogenetic analysis because the set of sequences without unknown nucleotides for all of the analyzed strains was found only for these genes. The corresponding nucleotide sequences were extracted for each of the genomes (Table S5a-c). Genome sequences were examined for the full length of the three gene sequences without the unknown nucleotides. Furthermore, sequences set of concatenated partial genes sequences were processed with BLASTclust (http://toolkit.tuebingen.mpg.de/blastclust) with clustering level of 100% sequence identity, then the obtained sequences were assembled with ClustalX and manually trimmed in CLC Genomic Workbench v.9.0 (CLC Bio, Denmark). Final block alignment was prepared using GBlocksServer with a less stringent selection option (http://molevol.cmima.csic.es/castresana/Gblocks_server.html). In the final alignment, concatenated sequences showing no differentiation were removed, except for sequences that belonged to strains of different species, such as P. fluorescens ICMP7711, P. savastanoi pv. neri ICMP16943 and P. savastanoi pv. savastanoi NCPPB3335. Nucleotide sequences of P. aeruginosa, P. cichorii, P. fluorescens and P. putida were used as the out-group. Final sequence alignment for 40 selected strains consisted of 1672 nucleotides of concatenated partial genes sequences and partial genes alignments consisted of 521, 672, 475 nucleotides for acn, gyrB, and pgi, respectively. The substitution model, nucleotide frequencies and substitution values were estimated with the jModel Test v.0.1.1 (Darriba et al. 2012) and with the AICc selection criterion (model GTR + I + G). Metropolis-coupled Markov chain Monte Carlo (MCMCMC) analysis was performed with two runs for 5 million generations with four chains, with the heating coefficient λ = 0.1 with MrBayes v.3.2.2 × 64 (Ronquist et al. 2012).

Table 1 Pseudomonas spp. strains, their NCBI accession numbers and references used in the bioinformatic and phylogenetic analyses. Details are listed in Supplementary Table S5a-c

Plasmid identification

Plasmid identification was performed using several different methods. Bioinformatic identification using CLC Genomic Workbench v.9.0 was done after mapping the raw clean reads on already assembled contigs and scaffolds. The calculated coverage was the basis for plasmid identification because short and multicopy plasmid sequences have higher coverage than chromosome sequences (CLC Genomics Manual) (Table S2). The other method was to search for the ori sequence. In order to perform this approach, proper sequences identified in other strains were downloaded from NCBI and Pseudomonas DB databases (Table S6). Finally, the scaffolds and contigs of 814/98 were surveyed with BLASTN for the presence of other structural genes or unique sequences associated with P. syringae plasmids. A whole scaffold or contig with a BLAST hit (E-value = 0,0; similarity length at least 1 kb) identifying the fragment as coming from a plasmid was compared to the NCBI database, and plasmid sequences available in the NCBI database were compared to the 814/98 scaffolds and contigs (BLAST and reciprocal-BLAST). The BLAST results are summarized in Table S6. BLAST results for gene sequences present on the plasmids and their descriptions are presented in Table S7. Plasmid identification on Eckhardt gels was performed previously (Słomnicka et al. 2015).

Identification of TTEs

A constructed dataset of 26 sequenced Pseudomonas spp. strains was used to identify the presence of known TTEs (Table 1). Additionally, strain B728a of P. syringae pv. syringae was added. The genome sequence of each strain was surveyed with the TBLASTN algorithm in CLC Genomic Workbench v.9.0. The set of known effectors was constructed based on an extended table of the Baltrus et al. (2011) concept and on validated TTE family members deposited on the PPI website (http://www.pseudomonas-syringae.org/). Each strain was considered to possess TTEs if a majority of the protein sequences had significant BLAST hits (<1e−5) with an identity of at least 80%. The binary matrix of either the presence or absence of TTEs for the Pseudomonas spp. collection was created (Table S8). Finally, this matrix was converted into a genetic distance matrix in NTSYSpc 2.1 software (Exeter Software, USA) using the Jaccard coefficient, and the dendrogram was constructed in Mega 7.0 software (Center for Evolutionary Medicine and Informatics, USA) using the NJ clustering method. Additionally, protein sequences were evaluated using Effective T3 database v.1.0.1 (http://effectivedb.org/).

Results

Strain 814/98’s genome sequencing and de novo assembly

In order to sequence the genome of strain 814/98, paired-end de novo sequencing on an Illumina HiSeq2000 platform was conducted to produce 1056 Mb sequence reads from short and long insert libraries, resulting in 160× genome coverage (Table S1). Sequence reads were de novo assembled into 102 contigs and 35 scaffolds. The longest scaffold was 1,437,579 bp in length and the longest contig was 443,518 bp in length with N50 1,124,118 bp for scaffolds and 141,450 bp for contigs (Table 2). Nineteen short contigs/scaffolds were identified to be either duplicates or almost identical to parts of longer contigs/scaffolds. Some of them were duplicated several times. After rejection of short duplicated contigs and scaffolds, the total number of scaffolds was reduced to 22. The main six scaffolds with a size of over 400 kb constituted the core of the 814/98 chromosome. The size of the other six scaffolds ranged from 10 kb to 100 kb. Only 10 scaffolds were shorter than 2 kb. The entire genome size was estimated to be 6,579,377 bp in length and the GC content was 57.97% (Table 2).

Table 2 Pseudomonas syringae pv. lachrymans strain 814/98’s genome assembly summary. The number of core scaffolds is presented in brackets

Functional annotation of the 814/98 genome

In total, 6024 genes encoding proteins were identified in strain 814/98 (Table 3, S4). The average gene length was estimated to be 934 bp and 400 genes were longer than 2 kb. In total, 92 genes encoding RNAs were identified: 62 genes encoding tRNAs, 16 genes encoding rRNAs (including 7 rrn5, 5 rrn16 and 4 rrn23) and 14 genes encoding sRNAs. The length of the non-coding RNAs was estimated to be 838,126 bp. The total length of the gene-encoding sequences was 5,629,212 bp, which constituted 85.56% of the genome, with the intergenic regions constituting 14.44%. GC content within the genic regions was 58.84%.

Table 3 Functional annotation summary of Pseudomonas syringae pv. lachrymans strain 814/98’s genome

The gene functional annotation was done by aligning the protein predictions with selected databases. The consistent results were obtained using KEGG and COG databases. Alignment with the COG database allowed for differentiation of 22 general gene classes. The largest number of genes was associated with membrane transport and metabolism. The predicted functions of 304 and 145 genes were related to transcription and posttranslational modification, and protein turnover, respectively (Fig. S1A). The KEGG database allowed for identification of 885 genes involved in membrane transport. A high number of the identified genes was also involved in metabolism, i.e. 503, 420 and 191 genes were related to amino acid, carbohydrate and energy metabolism, respectively (Fig. S1B). Moreover, a similar number of genes involved in DNA replication, recombination and repair processes and the signal transduction mechanism were identified in both databases (276 and 274 genes, respectively).

Tandem repeats in the 814/98 genome

Genome analysis of 814/98 revealed 208 tandem repeats (TR) – structural genomic components (repeat length from 4 to 1860 bp). These represented ca. 50 kb, i.e. less than 1% of the genome, and consisted of three classes of TR: long tandem repeats, minisatellites and microsatellites (Table 3). Out of 208 TR, 77 were classified as minisatellite DNA and 21 as microsatellite DNA. No CRISPR repeats or spacers were found. The repeat size of minisatellite DNA was from 15 to 63 bp and the total length was about 7 kb (about 0.1% of the genome). Microsatellite DNA possessed repeat size from 4 to 10 bp and a total length of 703 bp (about 0.01% of the genome). Microsatellite DNA may have different repeat unit size and repeat frequency, so it can be useful in molecular diagnostics (Table S3). BLAST analysis was performed for sequences flanking the microsatellite loci, and similarities to protein sequences for most of the loci were found (Table S3).

Phylogenetic placement of strain 814/98

A phylogenetic dendrogram of P. syringae strains, with P. aeruginosa, P. cichorii, P. fluorescens and P. putida species used as outgroups, consists of three main clusters (Fig. 1). The first large cluster, which corresponds to phylogroup 3 according to Hwang et al. (2005) and genomospecies 2 (Gardan et al. 1999), includes strains that primarily belong to P. syringae and P. savastanoi species, and it is divided further into subclusters. This large cluster united strains belonging to woody plant pathogens of Pseudomonas (pathovars nerii, savastanoi, fraxini, aesculi, ulmi and mori) and herbaceous plant pathogens (pathovars glycinea, phaseolicola, tabaci, sesami and lachrymans). The five strains of pathovar lachrymans, namely 107, 814/98, YM7902, 98-744A and MAFF301315, grouped in phylogroup 3, showed high genetic similarity to one another. The second main cluster corresponded to phylogroup 2 and genomospecies 1 and contained mainly strains of P. syringae pv. syrinage. The third main cluster corresponded to phylogroup 1 and genomospecies 3 and included strains belonging to pathovars tomato, morsprunorum, actinidiae, maculicola and several lachrymans strains; here were grouped lachrymans strains 3988, BG966, and LMG5070.

Fig. 1
figure 1

Phylogenetic relationship among Pseudomonas spp. strains based on MLST analysis. A Bayesian dendrogram was constructed based on three housekeeping gene fragments, i.e. acn, gyrB and pgi genes. Nucleotide sequences of P. aeruginosa, P. cichorii, P. fluorescens, and P. putida were used as the out-group. Final sequence alignment for 40 selected strains consisted of 1672 nucleotides with concatenated partial gene sequences (acn, gyrB, and pgi) and partial gene alignments consisted of 521, 672, 475 nucleotides for acn, gyrB, and pgi, respectively. The phylogroups (genomospecies) are indicated in colors. Strains indicated by asterisk (*) were classified into genomospecies 8 by Marcelletti and Scortichini (2014), which is closely-related to genomospecies 3

Comparison of distinct genomes of pathovar lachrymans strains

Phylogenetic reconstruction of sequence data clearly showed that strains assigned to P. syringae pv. lachrymans are grouped into two genetically divergent groups, i.e. phylogroups 3 and 1 (Fig. 1). This divergence is visible in the virulence test on susceptible cucumber accession line B10 (Fig. 2). Strain 814/98 (phylogroup 3) produced necrotic angular leaf spot symptoms (a), whereas strain BG 966 (b) placed in phylogroup 1 produced only weak symptoms. The existence of two distinct clusters within pathovar lachrymans was subsequently confirmed at both the nucleotide and amino acid level. 814/98 contigs and scaffolds were subsequently compared to all sequenced genomes of pathovar lachrymans, i.e. 98A-744, 107, YM7902, MAFF301315, MAFF302278, 3988 and ICMP3507. We found that the 814/98 genome exhibited the highest similarity to strain 98A-744, than to strains 107 and YM7902, and to MAFF301315 (all in phylogroup 3). In contrast, there was limited similarity to strains MAFF302278, ICMP3507 and 3988 which belonged to phylogroup 1 (data not shown). At the amino acid level the genome of strain 814/98 was compared with lachrymans genomes belonging to phylogroup 3 (MAFF301315) and phylogroup 1 (MAFF302278). The analysis detected the evolution of homologous genomes as indicated by variation in the location of gene clusters with similar function. This analysis of representative genomes confirmed the existence of two strain types. Strain 814/98 was very similar to MAFF301315 (phylogroup 3), except for the mega-plasmid sequence which was absent in 814/98 (Fig. 3).

Fig. 2
figure 2

Differentiation in virulence of P. syringae pv. lachrymans strains 814/98 (a) and BG 966 (b) on susceptible cucumber line B10 leaves 7 days after inoculation

Fig. 3
figure 3

Linear synteny of P. syringae pv. lachrymans genomes at the amino acid level. Strain 814/98 was compared with strains MAFF301315 (phylogroup 3) and MAFF302278 (phylogroup 1). In the box of sequences, the orange region stands for the amino acid sequence in the forward chain of the genome sequence and the blue region stands for the amino acid sequence in the reverse chain of this genome sequence. The orange lines stand for forward alignment of two sequences. Higher synteny is observed between pathovar lachrymans strains 814/98 and MAFF301315 than between 814/98 and MAFF302278. Box pMPPla107 indicates 1 Mb megaplasmid present in MAFF301315

Plasmid identification in strain 814/98

After mapping raw reads on the assembled contigs, a higher than average level of coverage with reads (180–520×) that characterize plasmids was identified for 21 contigs as compared to the average (140–160×) (Table S2). Five groups of contigs were formed according to coverage level. These groups could represent either potential plasmids or repetitive regions, therefore they were further investigated and surveyed for the presence of structural genes or unique sequences associated with P. syringae plasmids (Table S6). This survey revealed similarities to plasmids of pathovars actinidiae, tomato, maculicola, syringae, phaseolicola and P. fluorescens. Scaffold 7 showed similarity to the plasmid of P. fluorescens. Scaffold 8 showed similarity to the plasmid of P. syrinagae pv. actinidiae ICMP 18884, to the large plasmid of pathovar phaseolicola strain 1448A, and to both plasmids of the tomato DC3000 strain. Up to 20% of scaffold 8’s length displayed 100% sequence similarity to plasmids in the bacteria as listed above. Scaffold 9 exhibited 93–100% similarity in 40% of its length to several plasmids, namely to the small plasmid of pathovar phaseolicola strain 1448A and to the plasmids of strains belonging to pathovar maculicola and actinidiae (Table S6). The results obtained in the BLAST search were confirmed by reciprocal BLAST. Subsequently, after surveying the genome against ori sequences deposited in the databases, a similarity of over 90% was shown on scaffolds 8 and 9, confirming the existence of at least two plasmids in the 814/98 genome. Further blasting against the unique plasmid repA gene confirmed that the sequence of plasmid repA is present on scaffolds 8 and 9. The length of repA was ca. 1300 bp on both scaffolds and similarity was 91% and 87%, respectively. In total, 203 genes were found on the plasmids (Table S7). Most of the genes were identified on scaffold 7 (87) than on scaffold 8 (69) and scaffold 9 (47). However, only single type III effector gene hopAF1 was found on scaffold 9, where also repA, genes encoding plasmid stability protein StbB, conjugal transfer and VirB proteins related to T4SS and GntR were found. The plasmid genes on scaffold 8 showed similarity to repA, parA, parB, mobA, mobB, mobC, gntR, MFS transporter, several tra genes and also to conjugal transfer protein genes. This indicates that scaffolds 8 and 9 are probably conjugative plasmids as they possess many genes encoding T4SS and conjugation-related proteins. Genes on scaffold 7 showed similarity to genes encoding proteins related to pillus formation and hypothetical plasmid proteins (Table S7); thus BLAST analysis confirmed the existence of three plasmids in the 814/98 genome.

Identification of type III effector proteins (TTEs)

The genomes of the Pseudomonas spp. strains collected in our dataset (Table 1) were characterized for the presence of TTEs by using the approach of Baltrus et al. (2011). Of the 90 examined TTEs, 78 were identified in 22 of the analyzed genomes and 24 of them were identified in the 814/98 genome (Table S8). The same effectors were also present in the sequenced genomes of three pathovar lachrymans strains, i.e. 98A-744, 107 and YM7902. A different set of TTEs was present in lachrymans strains 3998 and MAFF302278. None of the effectors was found in P. aeruginosa PAO1, P. fluorescens SBW25, P. cichorii JBC1 or P. putida KT2440. A dendrogram consisting of two major clusters was built based on the TTEs’ presence (Fig. 4). The main cluster included strains belonging primarily to the P. syringae and P. savastanoi species. This cluster was divided into subclusters and contained strains belonging to woody plant pathogens of Pseudomonas spp. (pathovars neri, mori, fraxini, aesculi, ulmi and savastanoi) and pathovars which infect a diverse group of plant species (actinidiae, maculicola, morsprunorum, sesami, tabaci as well as phaseolicola and glycinea), including a subcluster of five lachrymans strains: 814/98, 107, 98A-744, YM7902 and MAFF301315 (phylogroup 3). A second major cluster includes pathovars tomato, syringae, pisi and two lachrymans strains from phylogroup 1, namely 3988 and MAFF302278.

Fig. 4
figure 4

Relationships among phytopathogenic strains of Pseudomonas spp. with a focus on pathovar lachrymans, based on the presence of TTEs. The division into phylogroups is indicated in orange and blue. The binary matrix of the TTEs’ presence or absence, which was the basis for construction of the tree, is presented in Table S8

After the TTEs analysis in many strains representing different pathovars, we concluded that a core set of eight TTEs was conserved in many of the P. syringae strains, including three fully sequenced genomes of P. syringae pv. tomato DC3000, P. syringae pv. syringae B728a and P. syringae pv. phaseolicola 1488A (bolded in Fig. 5). In addition to this conserved core set, each strain contained several unique TTEs. We compared the TTEs of 814/98 and 3988, the two strains of P. syringae pv. lachrymans which are located in different phylogroups and showed that besides the core set there were seven additional TTEs common for these strains (Fig. 5). An additional nine TTEs were present only in 814/98. A significantly different TTEs profile was found in strain 3988, as this lachrymans strain contained an additional 17 TTEs which are also present in P. syringae pv. tomato DC3000 (Fig. 5, Table S8). Interestingly, there were two TTEs, i.e. hopAW1 and hopBD1, which were present in lachrymans strains belonging to both phylogroups but were absent in the strains of other pathovars and therefore are possibly unique type III effectors for the lachrymans pathovar irrespectively of the phylogroup location.

Fig. 5
figure 5

TTEs present in two strains of P. syringae pv. lachrymans, namely in strains 814/98 and 3988, which represent two distinct phylogroups. TTEs indicated and bolded in the center of the diagram are conserved among all of the analyzed strains (core set of TTEs). The diagram was constructed based on the TTEs’ binary matrix (Table S8)

Discussion

Rapid technological progress and cost reduction achieved in DNA sequencing technology have both been instrumental in causing the increase in the number of drafts and complete genome sequences. In particular, a plethora of draft genomes were published recently for plant pathogenic bacteria (NCBI 2017). This abundance of information allows to make even more efficient comparisons of related strains, although the quality of the genomes varies. A draft genome sequence of strain 814/98 of P. syringae pv. lachrymans is presented here. Strain 814/98 is well-characterized phenotypically as highly virulent with the ability to cause large, water-soaked lesions on cucumber leaves that become necrotic after a few days (Fig. 2). The symptoms caused by strain 814/98 on cucumber leaves never failed to develop in every single test over the course of many years of testing. This strain did not produce fluorescent pigment on King’s medium B but displayed the typical phenotypic and biochemical characteristics of P. syringae in LOPAT tests (Olczak-Woltman et al. 2007; Olczak-Woltman et al. 2008; Słomnicka et al. 2015).

The genome size of strain 814/98 was estimated to be 6.58 Mb in length and the GC content was 57.97%. A genome size of ca. 6–6.5 Mb is typical for the whole P. syringae genome, and also typical for pathovar lachrymans (Baltrus et al. 2011). The de novo assembly of 814/98 seems to be one of the most complete among the sequenced pathovar lachrymans strains so far, as the whole genome is represented in 22 scaffolds. The number of scaffolds in the sequenced genome is dependent on the method used in sequencing and assembling. It is often larger than 30 and increases up to several hundred, whereas the number of contigs increases even up to 50 hundred (Baltrus et al. 2011; Martínez-García et al. 2015; NCBI 2017). NGS assembly often suffers from short reads and repetitive fragments which influence the analysis of genome coverage with reads. Paired-end sequencing combined with two types of libraries (500 bp and 6500 bp inserts) was used as a solution as it partly compensated for the lack of long reads (Zhang et al. 2011). A hybrid de novo assembly method using a combination of long reads and Illumina short reads data was shown to reduce the contig number after assembly (Boetzer and Pirovano 2014). Different technologies used in genome sequencing and assembly influence sequence quality and cause comparative analysis results to not be convincingly conclusive, e.g. miss-assembly or gaps in the genome sequence of MAFF302278 meant that we were not able to place it on the phylogenetic tree (Fig. 1).

An adequate strain, pathovar and species classification is an important goal. We previously attempted to describe the collected cucumber strains (Olczak-Woltman et al. 2007; Słomnicka et al. 2015); however, a more precise methodology and new data are beginning to challenge old conclusions. Baltrus (2016) proposed a sequence-based classification system that is unambiguous and would enable microbial classification without abandoning previous taxonomic systems. Establishing such a system is imperative because the existence of distinct clusters of strains within pathovars is reported often after genome sequencing, which forces nomenclature revision. Here we present strong evidence that there are two significantly divergent clusters of strains within pathovar lachrymans and grouped into different phylogroups. Strains from phylogroup 3 (Fig. 1) form a “necrotic type” of lachrymans strains capable of producing symptoms of angular leaf spot disease on cucumber leaves as exemplified by strain 814/98 (Fig. 2a). On the other hand, strains from phylogroup 1 produce only weak symptoms (Fig. 2b). It is interesting to note that Newberry et al. (2016) demonstrated that some strains associated with angular leaf spot in cucurbits are phylogenetically distinct from pathovar lachrymans. Strain classification is also an issue in other pathovars. Gironde and Manceau (2012) identified the pathovar tomato as a controversial one because of the phenotypic diversity of the strains, particularly at the level of pathogenicity. There are two populations within that pathovar that are pathogenic on different host species and should probably be classified as different pathovars. Moreover, strain DC3000, currently in the pathovar tomato, should probably be reclassified into pathovar maculicola, or the two pathovars should be grouped together (Gironde and Manceau 2012). Similarly, two very distinct clusters of strains in P. avellanae led to the claim that nomenclatural revision should be made (Scortichini et al. 2013). The genetic differences within P. syringae pv. actinidiae and differences in pathogenicity between strains were sufficient to define a new pathovar called P. syringae pv. actinidifoliorum (Cunty et al. 2015). The results as listed above indicate that the strains should be classified carefully because of genetic and phenotypic diversity. Here we present results which clearly show that there are two groups of P. syringae pv. lachrymans strains and that perhaps a new pathovar should also be defined. However, we believe that more evidence is needed to define the new pathovar. The strains representing the two lachrymans groups should be carefully tested on different cucurbit species and more genomes of lachrymans strains have to be analyzed.

Bacterial genomes consist of a chromosome and may contain one or more plasmids, and strains may vary in the number and size of the plasmids (Buell et al. 2003; Feil et al. 2005; Joardar et al. 2005; Zhao et al. 2005; Baltrus et al. 2011). In order to identify plasmids in strain 814/98, we simultaneously incorporated various methodologies, e.g. a search for similarity to known plasmids, bioinformatic analysis of structural plasmid genes, genome coverage with reads and laboratory plasmid isolation (Słomnicka et al. 2015). The bioinformatic analysis showed that strain 814/98 contains at least three plasmids. A small one, approximately 40–60 kb in size, was formed by scaffold 9. It has almost 400× coverage with reads, indicating that it is a medium-copy-number plasmid. A plasmid of this size was previously suggested in strain 814/98 based on electrophoretic plasmid separation using Eckhardt gels (Słomnicka et al. 2015). This plasmid shows sequence similarities to several other plasmids, including the small plasmid of pathovar phaseolicola strain 1448A (Joardar et al. 2005), the plasmid of pathovar syringae strain UMAF0158, which contains rulAB, repA, virB and virD genes, and to scaffold 29 of strain MAFF301315 (Cazorla et al. 2008; Baltrus et al. 2011; Martínez-García et al. 2015). The second, large plasmid of 814/98 is represented by scaffold 8 and possibly other smaller contigs of similar cover with reads (200–250×) that together are ca. 200 kb in length. This size corresponds to the largest plasmid detected by Słomnicka et al. (2015). This plasmid shows high similarity to plasmids of pathovar tomato strain DC3000 and to the large plasmid of pathovar phaseolicola strain 1448A (Buell et al. 2003; Joardar et al. 2005). On the other hand, it shows negligible similarity to strain MAFF301315 scaffolds (Baltrus et al. 2011). These two plasmids, represented by scaffolds 8 and 9, possess a large number of genes encoding proteins connected with T4SS and conjugation (Table S7). The presence of plasmids related to conjugation might have resulted in effective acquisition of virulence genes by strain 814/98. The third plasmid detected on Eckhardt gels by Słomnicka et al. (2015) was ca. 100 kb in size and corresponded to scaffold 7, which showed similarity to P. fluorescens SBW25 plasmids (Silby et al. 2009) carrying genes connected with pillus formation (Table S7) and no similarity to any sequences of already sequenced pathovar lachrymans genomes. This suggests that it may be a unique, relatively large, low-copy-number plasmid of strain 814/98. Unfortunately, we were not able to find the ori sequence on scaffold 7. The largest and best described plasmid family in P. syringae is pPT23A, also called PFP (Zhao et al. 2005; Ma et al. 2007). PFP plasmids appear to originate from a common ancestor and share homologous RepA-PFP related to ColE2 (Bardaji et al. 2017). The PFP plasmids are from 35 to over 100 kb in size. Based on our results, we concluded that the 814/98 plasmids most likely belong to this family.

The functional analysis of the 814/98 genome revealed similarity to B728a, 1448A and DC3000 strains, although a slightly smaller number of genes was identified. A high degree of conservation between 814/98 and the other strains belonging to the P. syringae complex was observed with respect to both the gene and the TTE numbers and classes (Martínez-García et al. 2015). The analysis of TTEs in the 814/98 genome resulted in identification of a total of 24 effectors. The identified TTEs are members of different TTE families as described by Baltrus et al. (2011), belonging either to the core TTEs found in all pathogenic P. syringae strains or to TTEs that are diverse in sequence and are present in a wide variety of genomic locations. The TTE analysis of the P. syringae genome presented here is in agreement with previously published results and confirms TTE-based phylogenetic strain grouping (Baltrus et al. 2011; O’Brien et al. 2011; Lindeberg et al. 2012). Moreover, the phylogenetic grouping conducted here corresponds well with phylogroups according to Hwang et al. (2005) and genomospecies according to Marcelletti and Scortichini (2014) (Fig. 4). Almost all of the strains in phylogroup 1 (genomospecies 3), represented by the pathovars actinidie, morsprunorum, tomato and a few lachrymans strains, carry at least 30 functional TTEs. In phylogroup 2 (genomospecies 1 – represented by pathovar syringae) and phylogroup 3 (genomospecies 2), containing, among others, “necrotic type” lachrymans strains, the number of effectors ranges between 19 and 30. Phylogenetic analysis of the TTEs suggests that some of the effectors were acquired, lost and reacquired (O’Brien et al. 2011; Lindeberg et al. 2012). This pattern indicates a heterogeneous distribution of TTEs within P. syringae. This observation may support the idea that there is strong selection pressure for the loss of effectors that can be recognized by host plants. In the case of pathovar lachrymans, the existence of two different strain sub-clusters was suggested previously (Słomnicka et al. 2015). Huge differences in lachrymans strains clustered in different phylogroups indicate that they may have evolved separately from different species.