Introduction

The enterobacterium Erwinia amylovora is the causal agent of fire blight, a disease that threatens global pome fruit production (i.e., apple, pear) and a wide variety of Rosaceae (Spiraeoideae, Maloideae, Rubus) species, including ecological cornerstone species of forest, landscape and rural ecosystems. The pathogen was first described in the late 1790s and originated in North America. From there it dispersed relatively recently to New Zealand in the late 1910s, to the United Kingdom and Northern Europe in the late 1950s and to the Middle East in the mid-1960s (Bonn and van der Zwet 2000). Since the first reports of fire blight in Europe, the pathogen has continued to spread across the continent (Jock et al. 2002) and now threatens Central Asia, the germplasm region of origin for apple and pear. The quarantine status of E. amylovora in many countries imposes further economic losses, both through phytosanitary control measures and as a highly charged trade issue (Calvin and Krissoff 1998).

The general epidemiology of fire blight is well understood and is the basis for current control strategies. E. amylovora lacks enzymatic means for penetrating healthy host tissues and infects plants through natural openings (e.g., floral nectaries, leaf hydathodes) and via wounds (e.g., from hail or insect damage). Once inside a host, the pathogen can spread through the vascular system (Billing 2011), so aggressive sanitation is key to removing inoculum reservoirs (e.g., tree removal) and preventing further advance within infected hosts (e.g., pruning well beyond visible disease symptoms). Depending on the infected plant part, the disease develops as flower, shoot or rootstock blight. Typical symptoms are flower necrosis, fruit rot, shepherd’s crook in shoots, bacterial ooze and cankers in woody tissue. The disease develops under defined weather conditions, which has enabled the deployment of fire blight forecast models. Flowers are the main infection court, and pathogen establishment on and rapid infection of flowers (i.e., blossom blight) is the main epidemic driver of fire blight and thus the primary focus of preventative disease control efforts.

Control options against blossom blight include antibiotic and biocontrol agent application during bloom to reduce the epiphytic population of E. amylovora. Resistance to streptomycin (Chiou and Jones 1995), the most effective antibiotic against E. amylovora, and regulatory restrictions on antibiotic use in plant agriculture (McManus et al. 2002) demand the development of novel control measures with comparable efficacy. Natural epiphytic bacteria of the closely related species Pantoea agglomerans and Pantoea vagans have proven to be among the most reliable and effective antagonists of E. amylovora when applied during bloom (Stockwell et al. 2010). P. agglomerans strains have been isolated from various environments (e.g., soil, plant, water) (Gavini et al. 1989; Rezzonico et al. 2009), reflecting their potential to compete successfully with indigenous microbial populations. The growth inhibition induced by P. agglomerans strains might result from nutrient competition, active site exclusion, antibiotic production or a combination of these processes (Kearns and Mahanty 1998; Vanneste et al. 1992; Wodzinski et al. 1994; Pusey et al. 2008).

Although important insights into this phytopathogenic bacterium have been gained (Table 1), much remains uncertain about the evolutionary genetics of E. amylovora. The recent sequencing of eight Erwinia genomes (i.e., three E. amylovora, two Erwinia pyrifoliae, Erwinia sp., Erwinia billingiae and Erwinia tasmaniensis) (Sebaihia et al. 2010; Smits et al. 2010a, b; Kube et al. 2008, 2010; Park et al. 2011) and three genomes of the closely related genus Pantoea (i.e., P. vagans, P. agglomerans and Pantoea ananatis) (Smits et al. 2010c; De Maayer et al. 2010) provides a solid genomic foundation for inferring species evolution within these genera.

Table 1 Genes and gene clusters assessed for their potential impact on pathogenicity

Erwinia amylovora genomics

The genome of the Crataegus isolate E. amylovora CFBP 1430 was sequenced and consists of a 3.8 Mb chromosome and the 28 kb plasmid pEA29 (Smits et al. 2010b). In total, 3,736 CDS were automatically assigned using GenDB (Meyer et al. 2003) and manually curated. The annotated genome allowed the identification of known E. amylovora virulence factor genes [e.g., hypersensitivity response and pathogenicity (hrp), amylovoran biosynthesis gene cluster] (Oh and Beer 2005), and also revealed genes encoding several new putative factors, such as two additional Inv/Spa-type type III secretion systems (T3SSs), a second flagellum and the complete desferrioxamine E biosynthesis cluster (Table 1) (Smits et al. 2010b).

Currently, two further E. amylovora genome sequences are available: the Malus isolate Ea273 (ATCC 49946) and the Rubus isolate ATCC BAA-2158 (Powney et al. 2011). Comparison of the Ea273 genome with that of CFBP 1430 clearly shows that two large rearrangements must have occurred within the rRNA regions (Fig. 1) (Smits et al. 2010b), which could explain the differences in PFGE patterns observed previously (Zhang and Geider 1997; Jock et al. 2002). Several other, relatively small differences, mainly in the ITS regions, make the chromosome of Ea273 301 bp larger in total than that of CFBP 1430. Nevertheless, both genomes share over 99.99% sequence identity over their complete length, indicating only minimal evolution since the geographical dispersal. Plasmid pEA72, identified in the genome of E. amylovora Ea273 (Sebaihia et al. 2010), is absent in E. amylovora CFBP 1430.

Fig. 1

Synteny plot of E. amylovora strains CFBP 1430 and ATCC 49946 generated using EDGAR (Blom et al. 2009). The position of each CDS given on the X axis is plotted against the position of its ortholog in the other chromosome given on the Y axis. Identical gene organization on the chromosomes results in a diagonal plot; inversions and chromosomal rearrangements are plotted perpendicular to it
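The reading of such a plot can be illustrated with a minimal Python sketch. It assumes that ortholog pairs are already available as (position in genome 1, position in genome 2) tuples; the function name `inverted_blocks` and the `min_len` threshold are hypothetical and are not part of EDGAR. Runs in which the second coordinate decreases while the first increases correspond to the anti-diagonal segments of the plot.

```python
def inverted_blocks(pairs, min_len=3):
    """Return (first_index, last_index) ranges, in genome-1 order, where the
    ortholog positions in genome 2 run in decreasing order (anti-diagonal).

    `pairs` is a list of (pos_genome_1, pos_genome_2) tuples, one per
    orthologous CDS. `min_len` is an assumed threshold that keeps a single
    backward jump (e.g., a translocation boundary) from being reported.
    """
    pairs = sorted(pairs)  # order by position in genome 1
    blocks = []
    start = None
    for i in range(1, len(pairs)):
        decreasing = pairs[i][1] < pairs[i - 1][1]
        if decreasing and start is None:
            start = i - 1  # the run begins at the previous point
        elif not decreasing and start is not None:
            if i - start >= min_len:
                blocks.append((start, i - 1))
            start = None
    # close a run that extends to the end of the chromosome
    if start is not None and len(pairs) - start >= min_len:
        blocks.append((start, len(pairs) - 1))
    return blocks


# Toy example: CDS 3 to 6 of genome 1 map to an inverted segment in genome 2.
toy_pairs = list(zip(range(1, 9), [1, 2, 6, 5, 4, 3, 7, 8]))
toy_blocks = inverted_blocks(toy_pairs)
```

This is a deliberate simplification: it ignores the circularity of the chromosome and treats every sufficiently long decreasing run as an inversion.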

The draft genome of the closely related but genetically distinct E. amylovora strain ATCC BAA-2158, whose pathogenicity is restricted to Rubus spp., was recently published (Powney et al. 2011). Here too, collinearity was observed over large regions of the chromosome. In addition to pEA29, this strain carries two small plasmids (pEAR4.3, pEAR5.2) (Table 2). A total of 373 singletons in this strain may provide clues to its restricted host range (Powney et al. 2011).

Table 2 Summary of sequenced genomes of Erwinia and Pantoea

An obvious difference between the currently sequenced E. amylovora genomes is the presence of (cryptic) plasmids (Smits et al. 2010b; Sebaihia et al. 2010; Powney et al. 2011). Plasmids appear to be a major factor influencing the pan-genome of E. amylovora. However, although several plasmids of different sizes have been detected in isolates of this species (Chiou and Jones 1991; Foster et al. 2004; Laurent et al. 1989; McGhee et al. 2002; Steinberger et al. 1990), knowledge of this extra-chromosomal material is limited to a few strains and plasmids.

Erwinia inter-species genomics

Five further Erwinia genomes were recently sequenced and are available for inter-species comparisons: two E. pyrifoliae strains (DSM 12163T and Ep1/96) (Smits et al. 2010a; Kube et al. 2010), Erwinia sp. Ejp617 (Park et al. 2011), E. tasmaniensis Et1/99 (Kube et al. 2008) and E. billingiae Eb661 (Kube et al. 2010).

E. pyrifoliae, a close relative of E. amylovora, is primarily a pathogen of Asian or Nashi pear (Pyrus pyrifolia) with a geographical distribution restricted to East Asia (Kim et al. 1999). Disease symptoms caused by E. pyrifoliae are almost indistinguishable from those of E. amylovora (Rhim et al. 1999). Another pathogenic Erwinia species, causing bacterial shoot blight of pear in Japan, was first described as E. amylovora due to similar disease symptoms and was later found to be more closely related to E. pyrifoliae than to E. amylovora (Mizuno et al. 2000; Matsuura et al. 2007; Geider et al. 2009). E. tasmaniensis strains were isolated from apple and pear as non-pathogenic epiphytic bacteria (Geider et al. 2006; Powney et al. 2011). The non-pathogenic E. billingiae was isolated as non-pigmented E. herbicola and later reclassified (Billing and Baker 1963; Mergaert et al. 1999).

Chromosomal collinearity of E. amylovora with the closely related E. pyrifoliae strains, Erwinia sp. Ejp617 and E. tasmaniensis was observed, although large-scale chromosomal rearrangements were also detected. These ecologically distinct species harbor distinct plasmids with low sequence similarity. The genome sizes of the Erwinia species described above range from approximately 3.8 to 4 Mb, with the exception of E. billingiae, which has a genome size of 5.4 Mb (Table 2). The genomes of the sequenced pathogenic Erwinia spp. are markedly smaller than other enterobacterial genomes, likely as an effect of genome erosion (Georgiades and Raoult 2011). The loss of epiphytic fitness factors due to genome size reduction and the acquisition of the Hrp-type T3SS and of genes needed for the biosynthesis of the exopolysaccharides levan and amylovoran are potentially a result of adaptation to the pathogenic lifestyle.

Differential gene content can be displayed in Venn diagrams, calculated by reciprocal BLAST of all CDS against all the input sequences. Each area of such a Venn diagram represents a subset of the compared genomes, and the number of genes in that subset is indicated. The core genome (Tettelin et al. 2005) consists of CDS shared by all genome sequences, which usually include metabolic and cellular functions. CDS shared by two or several species are present in overlapping regions. Areas not shared with any other pool of CDS (singletons) include accessory genes, which may provide additional functions (virulence factors, host-range determinants, metabolic processes) and contribute to species variability (Blom et al. 2009).
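The partition into core genome, shared areas and singletons can be sketched in Python as plain set operations. The sketch assumes that ortholog assignment (e.g., from reciprocal BLAST hits) has already been done, so that each genome is represented by a set of ortholog-group identifiers; the function names, genome labels and group identifiers are hypothetical and do not reproduce the EDGAR implementation or any published counts.

```python
def venn_regions(genomes):
    """Map each combination of genomes to the ortholog groups found in
    exactly that combination (one entry per non-empty Venn area).

    `genomes` maps a genome label to a set of ortholog-group identifiers.
    """
    regions = {}
    all_groups = set().union(*genomes.values())
    for group in all_groups:
        members = frozenset(
            name for name, groups in genomes.items() if group in groups
        )
        regions.setdefault(members, set()).add(group)
    return regions


def core_genome(genomes):
    """Ortholog groups present in every genome."""
    return set.intersection(*genomes.values())


def singletons(genomes, name):
    """Ortholog groups present only in the genome called `name`."""
    others = set().union(*(g for n, g in genomes.items() if n != name))
    return genomes[name] - others


# Toy input with made-up identifiers, not real gene content.
toy = {
    "genome_A": {"og1", "og2", "og3", "og4", "og5"},
    "genome_B": {"og1", "og2", "og3", "og4"},
    "genome_C": {"og1", "og2", "og3", "og6"},
    "genome_D": {"og1", "og2", "og6"},
}
toy_regions = venn_regions(toy)
```

In this toy input, `og3` falls in the area shared by genomes A, B and C but not D, which is the kind of area the text below interprets as a candidate set of lifestyle-specific genes.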

The core genome of E. amylovora CFBP 1430, E. tasmaniensis Et1/99, E. pyrifoliae DSM 12163T and the non-pathogenic E. billingiae Eb661 comprises 2,414 genes shared between these species. The genes shared only by E. amylovora CFBP 1430, E. tasmaniensis Et1/99 and E. pyrifoliae DSM 12163T, in different combinations, include the most important virulence factors, e.g., T3SSs and the exopolysaccharides amylovoran and levan (Table 3). These genes are absent in E. billingiae Eb661 and might therefore represent the “pathogenic core” of disease-eliciting Erwinia species (Fig. 2a). Virulence determinants that set host range and specificity are most likely among the singletons of the broad-host-range E. amylovora and absent from the genomes of Erwinia species with restricted host ranges (Fig. 2a).

Table 3 Selected factors analyzed by comparative genomic approaches
Fig. 2

Venn diagram of Erwinia spp. (a) and Pantoea spp. including E. billingiae (b) generated using EDGAR. The number of CDS is indicated for each area. Overlapping areas indicate shared CDS. The “pathogenic core” (a) and the “biocontrol core” (b) are indicated by dotted lines

Selected features clarified using comparative genomics

Type III secretion systems

T3SSs are part of the “pathogenic core” and are absent in E. billingiae Eb661. The Hrp T3SS genes were identified in E. pyrifoliae DSM 12163T (Smits et al. 2010a) and E. tasmaniensis Et1/99 (Kube et al. 2008) showing differences in gene content compared to E. amylovora CFBP 1430 (Smits et al. 2010b). E. tasmaniensis Et1/99 lacks the HAE region present in the other two species, as well as ORFU1 and ORFU2 that are only present in E. amylovora CFBP 1430 (Fig. 3a).

Fig. 3

Comparison of the Hrp (a) and Inv/Spa-type (b) Type III Secretion Systems in E. amylovora CFBP 1430, E. pyrifoliae DSM 12163T and E. tasmaniensis Et1/99. Related genes are shaded grey

Proteins of E. amylovora secreted by the Hrp T3SS have been demonstrated to be essential for pathogenicity on host plants (Table 1) (Oh and Beer 2005). The Hrp T3SS gene cluster is located on pathogenicity island 1 (PAI-1) and consists of the hrp/hrc region, flanked by the Hrp effectors and elicitors (HEE) region and the Hrp-associated enzymes (HAE) region. The hrp/hrc region contains regulatory genes as well as genes encoding secreted proteins. DspA/E and HrpN, encoded by genes in the HEE region, are secreted proteins and essential pathogenicity factors of E. amylovora (Gaudriault et al. 1997; Wei et al. 1992). The products of the hrp-associated systemic virulence genes (hsv) encoded in the HAE region are required for systemic infection of host plants (Oh et al. 2005).

Additional T3SSs (Inv/Spa-1, Inv/Spa-2), which differ in gene content, were identified in the genomes of the pathogenic Erwinia spp. While the inv/spa-2 gene cluster is present in E. amylovora CFBP 1430, E. pyrifoliae DSM 12163T and E. tasmaniensis Et1/99, the inv/spa-1 gene cluster is absent in E. pyrifoliae DSM 12163T and only partially present in E. tasmaniensis Et1/99 (Fig. 3b). The inv/spa-type T3SSs are located in low-G+C regions on the chromosomes of the three Erwinia spp. and might therefore be an acquired trait. The inv/spa-type T3SSs are similar to the Salmonella pathogenicity island SPI1 T3SS of Salmonella typhimurium LT-2 (McClelland et al. 2001) and the inv/spa T3SS of the insect endosymbiont Sodalis glossinidius (Dale et al. 2001) and are not directly implicated in virulence on host plants (Zhao et al. 2009).
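Regions of atypically low G+C content, often taken as a hint of horizontal acquisition, can be located with a simple sliding-window scan. The Python sketch below is only an illustration of that idea: the function names, the window and step sizes and the `margin` threshold are assumptions chosen for the example, not parameters used in the cited studies.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def low_gc_windows(seq, window=1000, step=500, margin=0.05):
    """Return (start, end, gc) for windows whose G+C fraction lies more than
    `margin` below the G+C fraction of the whole sequence.

    `window`, `step` and `margin` are assumed values for illustration.
    """
    overall = gc_fraction(seq)
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        frac = gc_fraction(seq[start:start + window])
        if frac < overall - margin:
            hits.append((start, start + window, frac))
    return hits


# Toy sequence: a GC-rich backbone with a 40 bp AT-only insert at position 100.
toy_seq = "GC" * 50 + "AT" * 20 + "GC" * 50
toy_hits = low_gc_windows(toy_seq, window=40, step=20)
```

Overlapping windows that are flagged in succession, as in the toy example, would in practice be merged into a single candidate island before further inspection.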

Exopolysaccharides

The E. amylovora CFBP 1430 and E. pyrifoliae DSM 12163T exopolysaccharide gene clusters differ from the respective clusters in E. tasmaniensis Et1/99 and E. billingiae Eb661 by an exchange of two glycosyltransferases (Fig. 4), potentially resulting in the production of amylovoran. Amylovoran is a specific virulence factor in E. amylovora and E. pyrifoliae, as reflected by the fact that deletion or mutagenesis of specific biosynthesis genes renders the pathogens avirulent (Bellemann and Geider 1992; Kim et al. 2002). The exopolysaccharide gene clusters of the two species that do not produce amylovoran, and of Pantoea spp., which direct the production of a capsular polysaccharide (CPS), might represent the ancestral state.

Fig. 4

Comparison of the exopolysaccharide gene clusters of Erwinia spp. and Pantoea spp. Related genes are shaded

The additional exopolysaccharide levan is produced only by E. amylovora and E. tasmaniensis, but not by E. billingiae and E. pyrifoliae. The gene encoding the levansucrase, which is responsible for the synthesis of levan, was most likely acquired by a common ancestor of E. amylovora, E. tasmaniensis and E. pyrifoliae. The gene was retained by E. amylovora and E. tasmaniensis but lost by E. pyrifoliae.

Siderophores

All Erwinia genomes sequenced so far contain the desferrioxamine E siderophore biosynthesis gene cluster (Kube et al. 2010; Smits et al. 2010a, b), whereas the enterobactin gene cluster, which produces the catecholate siderophore enterobactin and is found in the genomes of many enterobacteria, is absent.

Iron is an essential nutritional factor, required as a cofactor for many proteins. In iron-deprived environments, high-affinity iron uptake siderophores are secreted to gain access to this limited factor by removing it from minerals and organic substances. Erwinia spp. produce the hydroxamate siderophore desferrioxamine E (DFO E) (Feistner et al. 1993; Kachadourian et al. 1996) and the specific TonB-dependent ferrioxamine receptor FoxR, both of which are involved in iron uptake. Mutation of these genes leads to colonization defects of E. amylovora on flowers (Dellagi et al. 1998), and DFO E might also protect against oxidative conditions (Venisse et al. 2003).

Phylogenomic applications

A core genome tree was constructed (Fig. 5) that displays the phylogeny of the genus Erwinia and is in accordance with trees based on gyrB sequences. In both trees, E. billingiae groups with the Erwinia spp. and is the most distantly related Erwinia species sequenced so far. The position of E. billingiae seems to be close to the genus delineation between Erwinia and Pantoea. The marked difference in genome size, approximately 5.4 Mb for E. billingiae Eb661 versus nearly 4 Mb for the other Erwinia spp., might be the result of genome size reduction during specialization to plant pathogenicity in the latter species. The Hrp, Inv/Spa-1 and Inv/Spa-2 T3SSs, absent in the genome of E. billingiae, might have been acquired by the pathogenic Erwinia ancestor prior to the divergence of E. amylovora, E. pyrifoliae and E. tasmaniensis.

Fig. 5

Phylogenetic tree based on the 2022 core genes of Erwinia and Pantoea spp. generated using EDGAR. Percent divergence is indicated by the scale bar
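As a rough illustration of what a percent-divergence scale expresses, the Python sketch below computes the uncorrected pairwise difference (p-distance) between two aligned sequences, skipping gapped columns. This is an assumed, simplified measure chosen for the example; the distance model actually used to build the tree is not stated here, and the function name is hypothetical.

```python
def percent_divergence(aln_a, aln_b):
    """Uncorrected percent difference between two aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


# Toy alignment: one mismatch in ten ungapped columns.
toy_divergence = percent_divergence("ACGTACGTAC", "ACGTTCGTAC")
```

For a core genome tree, such a calculation would be applied to the concatenated alignments of all core genes rather than to a single short sequence.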

Erwinia billingiae

The genera Erwinia and Pantoea are closely related (Hauben et al. 1998), which is supported by the core genome tree (Fig. 5) and the fact that gene synteny across the genus border is retained for large regions. E. billingiae was first isolated and described as non-pigmented E. herbicola (Billing and Baker 1963) [now P. agglomerans (Gavini et al. 1989)] and later reassigned to E. billingiae (Mergaert et al. 1999).

The exopolysaccharides amylovoran and levan are not produced by E. billingiae, but a capsular polysaccharide (CPS) similar to that of Pantoea spp. is formed. Compared to Pantoea spp., the CPS gene cluster in E. billingiae Eb661 has an inversion of two genes that is distinctive for members of the genus Erwinia. The core genome that E. billingiae shares with the other Erwinia species (Fig. 2a) is smaller (by approximately 500 CDS) than the core genome calculated when E. billingiae is grouped with the Pantoea species (Fig. 2b). The genes shared with Pantoea include many carbohydrate uptake and utilization pathways as well as phosphonate utilization, all of which are absent in other Erwinia spp. These genes, possibly involved in epiphytic fitness, were lost in the course of genome size reduction towards a more specific plant-associated or pathogenic lifestyle.

Pantoea biocontrol agent genomics

The genome of the biocontrol agent P. vagans C9-1 (Ishimaru et al. 1988; Rezzonico et al. 2009, 2010) was recently sequenced and consists of a 4.025 Mb circular chromosome and the three plasmids pPag1, pPag2 and pPag3 (Smits et al. 2010c). A total of 4,619 CDS were assigned using GenDB (Meyer et al. 2003) and manually annotated. Genome sequencing of P. agglomerans E325 (Pusey et al. 2008; Pusey 1997), a second biocontrol agent, is in progress (Smits and Duffy, unpublished). The genome sequences of both Pantoea spp. lack known enterobacterial virulence determinants such as T3SSs, toxins and pectolytic enzymes. A large repertoire of carbohydrate metabolic pathways, epiphytic fitness genes (e.g., AI-1 quorum sensing genes, IAA and carotenoid biosynthesis) and the biosynthetic genes for the antibacterial metabolite pantocin A were identified in the genome sequence of P. vagans C9-1 (Smits et al. 2010c).

Pantoea comparative genomics

In addition to the two biocontrol strains, the genome of P. ananatis LMG 20103, the causative agent of Eucalyptus blight and dieback, was sequenced (De Maayer et al. 2010) and was therefore included in the comparative genomic analysis. Although described as a plant pathogen (Goszczynska et al. 2006, 2007), its genome sequence lacks T3SSs, a major virulence factor in many plant-associated pathogens, making this organism an unusual plant pathogen (Coutinho and Venter 2009).

The calculated core genome of P. ananatis LMG 20103, P. vagans C9-1, P. agglomerans E325 and E. billingiae Eb661 (Fig. 2b) includes, in addition to the genes of the Erwinia core genome, carbohydrate metabolic pathways for maltose, rhamnose, glucarate, xylose, uronate, L-lactate and acetate, as well as phosphonate utilization and many transporters of all types. Genes absent in the genome of the plant pathogen P. ananatis LMG 20103 but shared by the two other Pantoea spp. and/or E. billingiae might represent the “biocontrol core” (Fig. 2b), because genes potentially contributing to epiphytic fitness (e.g., nitrate assimilation, enterobactin synthesis genes) are common to these groups. Other factors implicated in biocontrol efficacy, such as antibiotic biosynthesis, are not shared between the Pantoea spp., since they produce different antibiotics (Pusey et al. 2008; Coutinho and Venter 2009; Ishimaru et al. 1988). The pantocin A biosynthesis genes, for example, are present only in P. vagans C9-1 and absent in P. ananatis LMG 20103, P. agglomerans E325 and E. billingiae Eb661. In P. vagans C9-1, these biosynthesis genes are located on a low-G+C genomic island of about 29 kb, which was probably acquired by horizontal gene transfer (Smits et al. 2010c, d). The exopolysaccharide gene clusters of the Pantoea spp. are similar, whereas the E. billingiae cluster differs by an inversion of two genes (Fig. 4).

Perspectives

The available sequenced Erwinia genomes from different hosts enable the identification of virulence, host-specificity and metabolic determinants involved in pathogenicity by comparative genomic analysis. These analyses could yield the information needed to develop novel control measures for fire blight. Genome sequencing and analysis of Pantoea spp. will reveal their potential and, for the biocontrol agents already used successfully, uncover the factors (metabolism, antibiotic production) responsible for effective biocontrol. As more Erwinia and Pantoea genomes are sequenced, they can be used to consolidate the current data and to refine our understanding of evolutionary aspects.