Introduction

Erwinia amylovora, the plant pathogen responsible for fire blight disease in Rosaceae, still causes huge losses in apple and pear orchards worldwide (Vrancken et al. 2013). E. amylovora ranks among the top ten plant pathogens for its impact on global apple and pear production (Mansfield et al. 2012).

In E. amylovora, three main molecular systems are required for a successful infection: the type three secretion system (T3SS), which is necessary to secrete effector proteins and suppress the host's defenses (Vrancken et al. 2013); the exopolysaccharide biosynthetic pathway, which produces amylovoran and levan, required to protect the bacteria and responsible for wilting of shoots and blight symptoms (Langlotz et al. 2011; Gross et al. 1992); and the siderophore-mediated iron uptake system, which scavenges from the host the iron required for pathogen metabolism (Smits and Duffy 2011). Besides these, a series of virulence and pathogenicity factors are necessary for the pathogen to thrive (Borruso et al. 2017).

For a better understanding of the molecular basis regulating the biology and pathogenicity of an organism, it is worthwhile to study in vitro the proteins involved in its life cycle. Genome analysis and knock-out mutants are an excellent starting point for identifying potential targets for further investigation. 3D structure determination coupled with biochemical characterization provides relevant details for studying the structure-function relationship of proteins and enzymes. Different experimental techniques are used to elucidate the structure of proteins; Nuclear Magnetic Resonance (NMR), X-ray crystallography, and Cryo-Electron Microscopy are still the most used (Cavalli et al. 2007; Muench et al. 2019). The availability of sequenced genomes has allowed the development of “structural genomics” strategies. Structural genomics consortia have been created to explore and elucidate unknown protein folds, infer protein function from structure by comparison, and deposit their results in the Protein Data Bank (PDB) using a high-throughput approach (Grabowski et al. 2016). Integrative structural biology, which aims at modelling the structure of biological systems, makes use of data from different techniques (NMR, X-ray, homology modelling, molecular docking, Small Angle X-ray Scattering, etc.), providing information that helps to understand cell biology and is also useful in drug discovery (Rout and Sali 2019). The availability of the E. amylovora genome (Smits et al. 2010) has made it possible to study the fire blight pathogen using different techniques: from genome analysis and comparison to identify genes (and therefore proteins) that are crucial in developing the disease, to the study of knock-out mutants and phenotype analysis, to the cloning of genes and production of proteins to be studied in vitro.

This review describes the progress in the structural and functional characterization of the proteins from the plant pathogen E. amylovora whose structures have been determined to date, presented in chronological order of publication and release by the PDB.

The Protein Data Bank

The Protein Data Bank (PDB; http://www.rcsb.org/) (Berman et al. 2003; Burley et al. 2019) collects, validates and offers public access to experimental data on the 3D structures of biological macromolecules (proteins, DNA, and RNA). The availability of 3D structures enables researchers from different fields to study the function and, by comparison, the evolution of biological macromolecules, and to enhance knowledge in fundamental biology, biomedicine, and biotechnology. The PDB currently archives 165,117 entries (06/15/2020), and Table 1 lists the entries related to E. amylovora.

Table 1 E. amylovora 3D structures currently available at the PDB (06/15/2020)

RcsB

The gene of the regulator of exopolysaccharide synthesis RcsB from E. amylovora was first cloned by Bereswill and Geider in 1997 to investigate its role in modulating exopolysaccharide synthesis. Overexpression of rcsB from high-copy-number plasmids caused increased amylovoran production while suppressing levansucrase activity (Bereswill and Geider 1997). It has been demonstrated that RcsB forms a heterodimer with RcsA, and that the two proteins act cooperatively in a protein-DNA complex that binds the promoter of the ams operon responsible for amylovoran biosynthesis. More precisely, the RcsA-RcsB complex binds specifically to a region upstream of the translation initiation codon of amsG, the first gene in the ams operon in E. amylovora (Kelm et al. 1997; Wehland et al. 1999). The importance of the Rcs phosphorelay system in regulating amylovoran production and its influence on virulence and pathogenicity were demonstrated by gene knockout experiments and site-directed mutagenesis (Wang et al. 2009; Ancona et al. 2015). The 24 kDa RcsB protein is composed of two domains: the N-terminal domain contains a phosphorylation motif with three aspartic acid residues (Asp10, Asp11 and Asp56), while the C-terminal domain features a helix-turn-helix (HTH) DNA-binding motif (residues 151–194) homologous to LuxR-type regulators (Henikoff et al. 1990). The solution structure of the C-terminal DNA-binding domain of RcsB (C-RcsB, residues 129–215) was determined by heteronuclear nuclear magnetic resonance (NMR) spectroscopy in 2003. Besides the structure determination, Pristovsek et al. characterized the interaction between RcsB and one of its DNA targets, the RcsAB box, and identified the residues involved by NMR chemical shift perturbation mapping (Pristovsek et al. 2003). Twenty energy-minimized NMR structures were selected to represent the ensemble of possible conformations in the native state in solution.
The structure of C-RcsB features five helices (α6 to α10), of which the central ones, α8 and α9, are part of the HTH motif and interact with α7, while α10 completes the hydrophobic core of the domain (Fig. 1). The three helices α7 to α9 are held together by a cluster of hydrophobic side chains (Val158, Ile171, Leu175, Ile182). The residues of C-RcsB involved in binding the RcsAB box DNA, identified by NMR chemical shift perturbation mapping, show specific but weak interactions (Lys153, Val158, Leu168, Thr170, Arg177, Ser178, Ile179, Thr181, Ile182). The identified residues are mostly located in the HTH motif (α8 to α9) and α7 (Fig. 1). The overall fold of the protein was not perturbed by the interaction with the specific DNA, and the affinity of RcsB for the RcsAB box was confirmed to be one order of magnitude lower in the absence of RcsA. In conclusion, heterodimerization of RcsB with RcsA is not necessary for DNA interaction, although formation of the RcsB-RcsA heterodimer substantially increases DNA binding affinity for the RcsAB box (Pristovsek et al. 2003).

Fig. 1

Cartoon representation of the NMR structure of the RcsB DNA-binding motif. Helices are in red and loops in green. Highlighted in cyan are the residues identified by NMR chemical shift perturbation mapping (Lys153, Val158, Leu168, Thr170, Arg177, Ser178, Ile179, Thr181, Ile182); these residues are mostly located in the HTH motif (α8 to α9) and α7

Levansucrase

E. amylovora produces amylovoran and levan, two exopolysaccharides that are components of its protective biofilm (Koczan et al. 2009). The proteins involved in the biosynthesis of amylovoran, considered one of the major pathogenicity factors, are encoded in the ams operon (Langlotz et al. 2011), while the synthesis of levan, proposed to be a virulence factor of the bacterium, is carried out by the enzyme levansucrase (Lsc, EC 2.4.1.10) (Zhao et al. 2005; Geier and Geider 1993). At low sucrose concentrations, Lsc catalyzes the hydrolysis of sucrose to fructose and glucose. At high sucrose concentrations, such as those encountered in the nectaries during flower infection, a transfructosylation reaction occurs, in which a fructosyl moiety is transferred from an enzyme-fructosyl intermediate to an acceptor in a ping-pong mechanism (Chambert et al. 1974). The acceptor can be either sucrose or fructose, giving rise to short-chain fructo-oligosaccharides (FOS) or to long-chain levan.

E. amylovora Lsc (EaLsc) was recombinantly produced and crystallized, and its crystal structure was solved (Caputi et al. 2013a; Wuerges et al. 2015). EaLsc predominantly produces short-chain FOS of three to six units (Caputi et al. 2013b), while SacB from the Gram-positive Bacillus megaterium (BmSacB) produces long-chain levan (Strube et al. 2011). EaLsc may have evolved towards shorter FOS production for two main reasons: to help the pathogen survive the high sucrose concentration in the nectar during flower infection, and to provide glucose as a building block for the biosynthesis of the pathogenicity factor amylovoran. EaLsc shows the typical fold of glycoside hydrolase families 32 and 68 (Lammens et al. 2009), featuring a five-bladed β-propeller. Four twisted strands make up each β-sheet, with the outer β-strand oriented almost perpendicularly to the inner strand. A forty-residue-long ‘clamp’ begins at the N-terminus and stretches up to the first blade, wrapping around half the circumference of the propeller (Fig. 2).

Fig. 2

Cartoon representation of the crystal structure of E. amylovora levansucrase. Helices are depicted in red, loops in green and β-strands in yellow. The products of sucrose hydrolysis, glucose and fructose, are shown as sticks in the active site cavity (carbon in grey and oxygen in red)

The EaLsc crystal structure was compared with those of fructosyltransferases from Gram-positive and Gram-negative bacteria with the aim of identifying determinants of the diverse product spectra. Together with a structural alignment of the available Lscs, Wuerges et al. performed a multiple sequence alignment of levansucrases from different Erwinia species and strains to identify differences between pathogenic and non-pathogenic species (Wuerges et al. 2015). While in E. amylovora deletion of the levansucrase gene lsc reduces the virulence of the bacterium (Geier and Geider 1993), Erwinia tasmaniensis expresses Lsc from its lsc gene yet is epiphytic and non-pathogenic (Kube et al. 2010). A potentially significant deviation in EaLsc is Phe198, which is replaced by Tyr198 in E. tasmaniensis levansucrase (EtLsc), the same amino acid found in the long-chain levan-producing BmSacB (Strube et al. 2011). The recently published comparison between EaLsc and EtLsc supports the hypothesis that the role of Lsc in both bacteria is mostly related to survival and metabolism rather than strictly connected to the virulence or pathogenicity of E. amylovora, as the two enzymes share the same 3D structure and a very similar reaction product profile (Polsinelli et al. 2019b).

AmsI

One of the most important pathogenicity factors of E. amylovora is amylovoran, an exopolysaccharide characterized by repeating units containing galactose, glucose, glucuronic acid and pyruvate (Nimtz et al. 1996). Amylovoran biosynthesis requires the concerted action of 12 genes (from amsA to amsL) clustered in the 16 kb ams region of the E. amylovora genome (Bugert and Geider 1995). The AmsI protein, encoded by the amsI gene, is a key enzyme of amylovoran metabolism. AmsI is a protein tyrosine phosphatase (PTP) involved in signaling (Bugert and Geider 1997). The kinase-phosphatase dual system has a critical role in bacterial virulence and cell signaling: PTPs catalyse the dephosphorylation of phospho-tyrosine residues of their cognate kinase, which in E. amylovora is AmsA. AmsI overproduction leads to inhibition of amylovoran synthesis due to increased dephosphorylation of the cognate kinase (Bugert and Geider 1997). AmsI is a member of the low molecular weight protein tyrosine phosphatase family (LMW-PTPs, with a Mr of about 18 kDa), whose members are found in both Prokarya and Eukarya. LMW-PTPs share the consensus sequence CXGNXCRSP, where X can be leucine, isoleucine, threonine or phenylalanine, which is part of a conserved P-loop motif (Caselli et al. 2016). LMW-PTPs share the overall fold, the P-loop at the N-terminal region and the catalytic residues. Variability is observed among the residues forming the protein surface around the active site, which are important for the interaction with the cognate kinase. The interaction between Escherichia coli Wzb (analog of AmsI) and Wzc (analog of AmsA, the cognate kinase of AmsI) has been studied in detail (Temel et al. 2013). In LMW-PTPs the active site residues form a cradle that provides hydrogen-bonding interactions with the phosphate group of the substrate (Evans et al. 1996).
The catalytic mechanism of LMW-PTPs starts with the formation of a covalent phospho-enzyme intermediate involving as a nucleophile the Sγ-atom of a cysteine. This first step is assisted by the protonation of the leaving group by a conserved aspartate residue. Substrate binding and stabilization of the reaction intermediate are favored by a conserved arginine. In the second step of the reaction the phospho-cysteine intermediate is attacked by a water molecule regenerating the free enzyme and releasing inorganic phosphate (Denu and Dixon 1995; Denu et al. 1996).
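As an illustration of how the consensus sequence quoted above can be used in practice, the following minimal Python sketch scans a protein sequence for the CXGNXCRSP motif, with X restricted to Leu, Ile, Thr or Phe as stated in the text. The function name and the toy sequence are assumptions introduced here for illustration only; the toy sequence is not the AmsI sequence.

```python
import re

# Consensus quoted in the text: CXGNXCRSP, where X = Leu, Ile, Thr or Phe.
LMW_PTP_MOTIF = re.compile(r"C[LITF]GN[LITF]CRSP")


def find_p_loop_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for every hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group()) for m in LMW_PTP_MOTIF.finditer(seq)]


# Hypothetical toy sequence for illustration, NOT the real AmsI sequence.
toy_sequence = "MKSILVVCLGNICRSPMAEA"
print(find_p_loop_motifs(toy_sequence))  # [(8, 'CLGNICRSP')]
```

In this sketch the first cysteine of a hit corresponds to the nucleophilic cysteine of the P-loop and the arginine to the conserved residue that stabilizes the reaction intermediate, so a hit can serve as a quick first check before a proper sequence or structure comparison.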

In order to investigate the properties of AmsI, Salomone-Stagni et al. solved its crystal structure to a resolution of 1.57 Å and studied its activity by steady-state kinetic analysis (Salomone-Stagni et al. 2016). The AmsI active site contains a water molecule that bridges the catalytic arginine and a sulfate ion derived from the crystallization condition (Fig. 3). The sulfate mimics the phosphate group of the substrate, while the water molecule, nestled within the P-loop in close proximity to both the sulfate and the catalytic cysteine, was proposed to attack the phospho-cysteine intermediate during the second step of the reaction.

Fig. 3

Cartoon representation of AmsI. Helices are in cyan, strands in magenta and connecting loops in salmon. The sulfate molecule bound in the active site is shown as sticks, with sulfur in yellow and oxygen in red

The kinetic study of AmsI showed that citrate at pH 5.5 is the optimal buffer, with activity decreasing towards basic pH. To complete the investigation, homology modelling and molecular docking were used to propose a model of the interaction between AmsI and AmsA. Putative interaction patches on the surface of the two proteins, to be validated by mutational studies, were identified and reported in the supplementary information of the published paper (Salomone-Stagni et al. 2016).

AmyR

AmyR is the protein product of the amylovoran repressor gene amyR, a member of the ybjN gene family (Wang et al. 2012). In E. coli, ybjN has been described as a stress-related gene, as its expression depends on the temperature and the bacterial growth stage (Chen et al. 2006). Studies have demonstrated that over-expression of AmyR in E. amylovora negatively regulates the production of amylovoran and levan, and that the amyR knockout produces eight-fold more amylovoran than the control (Wang et al. 2012).

Bartho et al. reported the crystal structure of E. amylovora AmyR, the first representative structure of a YbjN protein from an Enterobacteriaceae species (Bartho et al. 2017). The biologically functional assembly of AmyR is a homodimer. AmyR shows the highest structural similarity to proteins belonging to class I T3C (Type 3 secretion chaperones), with some deviations in the length and orientation of peripheral loops and secondary structure elements. Despite this structural similarity, AmyR does not act as a secretion chaperone. The genomic location of amyR does not conform to that of class I T3C genes, which are located near the genes of the corresponding type 3 effectors (Lohou et al. 2013). A sequence alignment of 100 YbjN family proteins from a selection of species showed that the most conserved region is the stretch of residues 75–88, which is part of a hydrophobic dimerization interface. The second conserved region, comprising residues 13–24, could represent a site for protein-protein interaction, similarly to class I T3Cs, besides plausibly stabilizing the protein fold (Fig. 4).

Fig. 4

Left panel: cartoon representation of the biologically relevant dimer from the crystal structure of AmyR. One subunit is represented in cyan while the other one in green. Right panel: AmyR conserved regions are depicted in red and the surface in transparent grey

The gene regulation effect of AmyR was shown to be a direct result of the presence of the AmyR protein, and not just of co-regulation of the amyR promoter, since overexpression of AmyR under a non-native promoter changed the transcriptional levels of other genes (Wang et al. 2012). Bartho et al. proposed that AmyR could influence transcriptional regulation through an inhibitory binding, a stabilising binding, or a combination of the two. One hypothesis is that the variations observed between the wild-type, the amyR knockout and the over-expression mutant arise because proteins of regulatory pathways are bound and sequestered by AmyR in a dose-dependent manner (Bartho et al. 2017).

HsvA

Most E. amylovora pathogenicity-related genes are located in the Hrp pathogenicity island of its genome. Within the pathogenicity island, two operons are predicted to contain genes encoding proteins with enzymatic properties. They are controlled by the alternative σ factor HrpL during infection, suggesting a role in bacterial virulence (Oh et al. 2005). The amidinotransferase HsvA is the product of the gene hsvA, which together with hsvB and hsvC is found in the region of Hrp-associated enzymes required for full virulence (Oh et al. 2005). The reaction mechanism of amidinotransferases involves three key active-site residues, cysteine, histidine and aspartic acid, in a double displacement (ping-pong) mechanism, as the active site cannot accommodate the amidino donor and acceptor simultaneously during catalysis (Humm et al. 1997a, b). A biochemical characterization and protein structure determination were carried out in 2017 with the aim of understanding the mechanism of action and the substrate specificity, and of suggesting a possible role in virulence (Shanker et al. 2017). The amidinotransferase activity was investigated on a panel of 16 candidate acceptors using arginine as the amidino donor and measuring the amount of ornithine released during a 30 min reaction. The most efficient acceptor was putrescine (with a chain of four carbons between the primary amines), while the activity towards other polyamines with chains of two to six carbons varied. Considering that the E. amylovora genome contains a gene encoding a spermidine synthase (SpeE), the authors tested spermidine and found that HsvA was far more efficient with spermidine than with putrescine as the acceptor. The X-ray crystal structure showed that HsvA is a homodimer, with each monomer folded as an α/β propeller domain. The α-helices and β-strands are organized into five ββαβ modules around a pseudo 5-fold symmetry axis, resembling a basket.
The active site contains the conserved amino acids Cys351, His249, and Asp200, forming a catalytic arrangement similar to that of cysteine proteases. These amino acids are at the base of a long narrow channel running from the protein surface to the active site cavity. Three glutamate residues are located at the entrance of the channel on the protein surface, conferring a negative charge that could help position positively charged substrates (i.e., polyamines) in the channel. The arginine binding site is characterized by six amino acids in the same arrangement as in the human L-arginine:glycine amidinotransferase (Humm et al. 1997b). The preference for putrescine and spermidine is explained by the presence of Glu244, located just below the opening of the channel leading to the active site and about 11 Å from the catalytic cysteine, a distance compatible with the length of putrescine and spermidine (Fig. 5).

Fig. 5

Cartoon representation of the biologically relevant homodimer of HsvA. Chain A is in green while chain B in cyan. Highlighted in red is Glu244 located just below the opening of the channel leading to the active site and regulating substrate selectivity

The role of HsvA in E. amylovora pathogenicity was proposed to be that of regulating the level of polyamines, possibly enhancing the fitness of the bacterium during infection or supporting the production of virulence factors (Shanker et al. 2017).

GalU

The biosynthesis, regulation and secretion of the E. amylovora exopolysaccharide amylovoran require the concerted action of the proteins encoded by the twelve genes in the ams operon (Bugert and Geider 1995). Galactose, the main amylovoran building block, is provided by enzymes of glucose and galactose metabolism. Glucose-1-phosphate uridylyltransferase (GalU; UDP-glucose pyrophosphorylase, EC 2.7.7.9) catalyses the reaction between α-D-glucose 1-phosphate (Glc-1P) and UTP to give UDP-glucose and pyrophosphate. UDP-glucose is then converted by the epimerase GalE into UDP-galactose, the main building block of amylovoran (Metzger et al. 1994; Wagstaff et al. 2015).

The reaction by which GalUs produce UDP-glucose starts with the binding of UTP to the enzyme in the presence of a Mg2+ ion, followed by the binding of the substrate Glc-1P. The phosphate group of Glc-1P is positioned to favor its nucleophilic attack on the α-phosphate of UTP. Kim et al. proposed a mechanism for Helicobacter pylori GalU (HpGalU) in which the substrates bind in sequential order, first Mg2+ and UTP followed by Glc-1P, to form the reaction complex in a sequential Bi Bi mechanism, giving UDP-glucose and PPi as final products (Kim et al. 2010). E. amylovora GalU (EaGalU) may represent a good target to block amylovoran biosynthesis, as its gene is located upstream of the ams gene cluster and is therefore crucial for amylovoran production. Besides the importance of this class of enzymes for bacterial survival, the potential use of EaGalU in biotechnology and in chemo-enzymatic synthesis makes it a good candidate for further studies. The structure, enzymatic activity and substrate specificity of EaGalU have been determined for a better understanding of its structure-function relationship (Benini et al. 2017). EaGalU was recombinantly expressed in E. coli and crystallized (Toccafondi et al. 2014). The EaGalU structure, determined by X-ray crystallography, shows two EaGalU molecules in the asymmetric unit that, by crystallographic symmetry, form the dimer of dimers giving the biologically functional homotetramer observed in bacterial GalUs (Fig. 6). The EaGalU tertiary structure is characterized by a central open β-sheet surrounded by two small outlying β-sheets and α-helices, similar to the Rossmann fold characteristic of nucleotide-binding proteins (Unligil and Rini 2000).

Fig. 6

Cartoon representation of the biologically relevant EaGalU homotetramer, a dimer of dimers, colored according to secondary structure elements. In one dimer the helices are colored in red, the strands in yellow and the turns in green; in the other dimer the helices are in cyan, the strands in magenta and the turns in salmon

Kinetic analysis in the synthetic direction for EaGalU revealed Km values in the μM range (7.7 ± 0.9 μM for Glc-1P and 27.3 ± 2.6 μM for UTP), comparable with those of the GalU enzyme from E. coli (EcGalU) (Wagstaff et al. 2015). The substrate specificity of EaGalU was investigated using a range of sugar 1-phosphates and UTP, and the reaction products were analysed by 1H NMR. EaGalU showed a clear preference for its natural substrate α-D-glucose 1-phosphate (Glc-1P), with 100% conversion to UDP-glucose. The other gluco-configured molecules tested were also substrates and were converted to the corresponding UDP-sugar derivatives. A conversion of 100% was observed for α-D-glucosamine 1-phosphate (GlcN-1P) and for the pentose α-D-xylose 1-phosphate (Xyl-1P), while N-acetyl-α-D-glucosamine 1-phosphate (GlcNAc-1P) showed a 74% conversion. The C2 epimer of the natural substrate, α-D-mannose 1-phosphate (Man-1P), is still converted to 70%. The enzymatic performance is much lower with galacto-configured substrates (C4 epimers): α-D-galactose 1-phosphate (Gal-1P) is converted very slowly to 28%, and no conversion was observed for α-D-galactosamine 1-phosphate (GalN-1P) or α-D-galacturonic acid 1-phosphate (GalA-1P). In silico molecular docking computations were employed to rationalize the different measured enzymatic activities and to understand the mechanism by which the different sugar 1-phosphates bind to the enzyme and initiate the reaction. The binding mode of all the sugar 1-phosphates used was simulated using the EaGalU structure. The formation of an H-bond between Lys202 Nζ and the phosphate group of the sugar 1-phosphate, together with an H-bond between Glu201 Oε and O2 on the sugar ring, stabilizes the interaction between EaGalU and the cognate sugar 1-phosphates, and was proposed to help position the substrate, thus favoring the nucleophilic attack of the phosphate on UTP.
These results were proposed to serve as a guide for selecting sugar 1-phosphates with the ideal configuration to obtain UDP-sugar derivatives by EaGalU (Benini et al. 2017).
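To give a feel for what a Km in the low μM range means, the sketch below evaluates the fraction of maximal velocity, v/Vmax = [S]/(Km + [S]), using the Km reported above for Glc-1P. This is a simplification and an assumption of this example: GalU follows a sequential Bi Bi mechanism, so the single-substrate Michaelis-Menten form applies only approximately, when the second substrate is saturating. The function name and the chosen concentrations are illustrative.

```python
def fractional_velocity(substrate_um: float, km_um: float) -> float:
    """Return v/Vmax for a single-substrate Michaelis-Menten enzyme.

    Both arguments are concentrations in micromolar (uM).
    """
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


KM_GLC1P_UM = 7.7  # Km for Glc-1P reported in the text (uM)

# Illustrative substrate concentrations (assumed, not from the paper).
for conc_um in (KM_GLC1P_UM, 10 * KM_GLC1P_UM, 1000.0):
    frac = fractional_velocity(conc_um, KM_GLC1P_UM)
    print(f"[Glc-1P] = {conc_um:7.1f} uM -> v/Vmax = {frac:.2f}")
```

By definition the enzyme runs at half its maximal rate when the substrate concentration equals Km, and at ten times Km it already exceeds 90% of Vmax, which is why a Km of a few μM indicates that the enzyme is close to saturation at modest substrate concentrations.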

Desferrioxamine pathway, DfoJ, DfoA, DfoC

Iron is an essential nutrient for almost every living organism. The siderophore desferrioxamine E (DFO-E) plays an important role in E. amylovora pathogenesis, as iron bioavailability in the host is very low. DfoJ, DfoA and DfoC are the enzymes responsible for desferrioxamine biosynthesis starting from lysine (Fig. 7).

Fig. 7

Schematic view of the desferrioxamine E biosynthetic pathway

In E. amylovora a single gene cluster (the dfoJAC operon) encodes the three proteins that perform the four enzymatic reactions necessary for the biosynthesis of DFO-E (Polsinelli et al. 2019a). Mutants of E. amylovora CFBP1430 defective in siderophore biosynthesis (disrupted dfoA gene) or siderophore uptake (defective ferrioxamine receptor FoxR) showed growth on apple flowers reduced by two orders of magnitude compared to the wild-type (Dellagi et al. 1998). Desferrioxamines (DFOs) consist of diamine and dicarboxylic acid building blocks linked by amide bonds. Desferrioxamine E is the major product of DFO biosynthesis in E. amylovora, while other DFOs (D2, X1–7 and G1–2) are produced in lower quantities (Feistner et al. 1993; Feistner 1995).

Besides scavenging iron from the host, DFO-E protects the bacterium against the oxidative burst triggered by the plant in response to the infection, and could also enhance the oxidative stress induced by harpins, causing an increase in electrolyte leakage from the host plant (Dellagi et al. 1998).

With the complete structural characterization of the whole pathway, Salomone-Stagni et al. confirmed that E. amylovora DFO-E biosynthesis begins with the decarboxylation of lysine by DfoJ, a lysine decarboxylase (group II pyridoxal-dependent enzymes, PFAM: PF00282), to give cadaverine. Cadaverine is immediately monooxygenated by DfoA, generating N-hydroxyl-cadaverine. In the last steps, DfoC first catalyzes the condensation of N-hydroxyl-cadaverine with a succinyl moiety to give N-5-aminopentyl-N-(hydroxyl)-succinamic acid, which is then promptly processed by the DfoC synthetase domain, through trimerization and cyclization, to give desferrioxamine E. Besides the structural characterization, the in vitro activity of DfoA and DfoJ was also tested in the same work to confirm the role of these enzymes in the pathway (Salomone-Stagni et al. 2018b).
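The four reactions described above can be summarized as an ordered list of steps. The following Python sketch is only a compact restatement of the pathway as given in this review; the `Step` type, the reaction labels and the helper function are names introduced here for illustration, and the assignment of the succinylation step to the N-terminal domain of DfoC follows the description of DfoC given later in this section.

```python
from typing import NamedTuple


class Step(NamedTuple):
    enzyme: str
    reaction: str
    substrate: str
    product: str


INTERMEDIATE = "N-5-aminopentyl-N-(hydroxyl)-succinamic acid"

# Steps as described in the text; labels are illustrative shorthand.
DFO_E_PATHWAY = [
    Step("DfoJ", "decarboxylation", "lysine", "cadaverine"),
    Step("DfoA", "N-hydroxylation", "cadaverine", "N-hydroxyl-cadaverine"),
    Step("DfoC (succinyl transferase domain)", "succinylation",
         "N-hydroxyl-cadaverine", INTERMEDIATE),
    Step("DfoC (synthetase domain)", "trimerization and cyclization",
         INTERMEDIATE, "desferrioxamine E"),
]


def is_linear(pathway: list) -> bool:
    """Check that each step consumes the product of the previous step."""
    return all(a.product == b.substrate for a, b in zip(pathway, pathway[1:]))


for step in DFO_E_PATHWAY:
    print(f"{step.enzyme}: {step.substrate} -> {step.product} ({step.reaction})")
print("linear pathway:", is_linear(DFO_E_PATHWAY))
```

Listing the steps this way makes explicit that three proteins carry out four reactions, because DfoC contributes two distinct catalytic domains.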

DfoJ is composed of three domains: the N-terminal domain (NTD), the PLP-binding domain and the C-terminal domain (CTD). The NTD is composed of three α-helices and a long loop connecting it to the large PLP-binding domain, which is formed by a seven-stranded β-sheet surrounded by α-helices. The CTD is small and formed by a four-stranded anti-parallel β-sheet and three α-helices located opposite the large PLP-binding domain. DfoJ forms stable, biologically relevant dimers whose subunits are related by crystallographic symmetry, as observed in the crystal and confirmed by PDBePISA (Krissinel 2010). The PLP binding site is located at the DfoJ dimer interface, and residues from both subunits make up the active site and are involved in the reaction mechanism.

DfoA is a FAD- and NADPH-dependent N-hydroxylating monooxygenase/hydroxylase (NMO) belonging to the IucD family of enzymes (PFAM PF13434) (Olucha and Lamb 2011). The structure of DfoA, solved in its holo form to 2.8 Å resolution, contains the ternary complex of DfoA with both FAD and NADP+. DfoA belongs to the class B flavoprotein monooxygenases (van Berkel et al. 2006), contains two Rossmann-fold dinucleotide-binding domains for binding FAD and NADP(H), and forms biologically functional homotetramers. Homotetrameric oligomerization is also observed in other bacterial and fungal flavoprotein monooxygenases belonging to the NMO subgroup, such as the ornithine hydroxylases from Kutzneria sp. 744 (Setser et al. 2014) and from Aspergillus fumigatus (Franceschini et al. 2012).

KtzI from Kutzneria sp. 744 (Setser et al. 2014) is the structure most similar to DfoA, as shown by a PDB search carried out with PDBeFOLD (Krissinel and Henrick 2004). To identify the determinants of substrate selectivity, a cadaverine molecule was manually placed in the DfoA active site using the position of ornithine in KtzI as a guide. In DfoA, Ala60 and Leu238 allow hydrophobic interactions with the aliphatic portion of cadaverine, the backbone oxygen of Tyr268 may bind to cadaverine N1, and Gln55 contributes to the formation of the cadaverine binding pocket. The side chain of DfoA Asp392 is well positioned to bind the cadaverine N7 amino group. Furthermore, Thr241(Oγ) may form a hydrogen bond with the cadaverine N7 amino group (Salomone-Stagni et al. 2018b).

Among siderophore biosynthesis enzymes, DfoC is an unusual example, as the succinyl transferase and siderophore synthetase domains are part of a single protein. The DfoC N-terminal domain is responsible for the transfer of a succinyl moiety from a succinyl-CoA donor to the primary amine of N-hydroxyl-cadaverine and is therefore a succinyl transferase domain. The C-terminal domain belongs to the type-C siderophore synthetases (Non-Ribosomal Peptide Synthetase (NRPS)-Independent Siderophore (NIS) protein family (Challis and Naismith 2004)). Within the synthetase domain, the nucleotide triphosphate (NTP) required for the reaction binds at the base of the substrate binding pocket. Access to the NTP binding site would be blocked when the N-5-aminopentyl-N-(hydroxyl)-succinamic acid substrate is loaded. The authors identified a “channel” that would allow NTP recycling without requiring the major substrates to be released (Salomone-Stagni et al. 2018b). The DfoC structure features a large cavity bridging the DfoC dimerization interface that could allow the ‘head’ of N-5-aminopentyl-N-(hydroxyl)-succinamic acid to enter the binding pocket of one subunit and the ‘tail’ to enter the binding pocket of the other subunit, suggesting a cooperative synthesis of the DFO-E ring structure (Salomone-Stagni et al. 2018a, b). The expression of the N-terminal succinyl transferase and the C-terminal siderophore synthetase domains in a single protein might suggest cooperativity or coordinated substrate transfer, but the locations of the active sites of the N- and C-terminal domains do not support this. The expression of these two enzymes as a single protein may still improve substrate trafficking between their active sites even if the substrate is not actively transported. The N-terminal domains may move as ‘balls on a string’ thanks to the linker loops connecting them to the dimerized C-terminal domains.
This movement would bring the active sites of the two domains closer to each other, enabling faster and more efficient substrate delivery from the N-terminal to the C-terminal domains (Salomone-Stagni et al. 2018b).

SrlD

Sorbitol is used as a transport carbohydrate in plants belonging to the Amygdaloideae subfamily, including apple, pear and quince (Zimmermann and Ziegler 1975), which are hosts for E. amylovora (Vanneste 1995). E. amylovora uses sorbitol as a carbon source thanks to the sorbitol operon (srlA, E, B, D, M, R genes), which encodes the six proteins required for sorbitol uptake. SrlD is a member of the short-chain dehydrogenases/reductases (SDRs), which catalyze NAD(P)(H)-dependent redox reactions. It is a sorbitol-6-phosphate 2-dehydrogenase (S6PDH; EC 1.1.1.140) that catalyzes the interconversion of D-sorbitol 6-phosphate (glucitol 6-phosphate; S6P) and D-fructose 6-phosphate (F6P). Mutations in the sorbitol operon of E. amylovora impair sorbitol utilization in apple shoots, thus preventing efficient colonization of host plant tissue (Aldridge et al. 1997). It has been proposed that differences in host specificity between the Spiraeoideae-infecting strains and the Rubus-infecting strains might be due to the lack of a complete sorbitol operon in the Rubus-infecting strains (Borruso et al. 2017). To better understand the mechanism of action of SrlD, its activity and 3D structure were determined, together with structural and sequence comparisons (Salomone-Stagni et al. 2018a). The structure of SrlD was solved to a maximum resolution of 1.84 Å. It features a seven-stranded (βA-βG) central β-sheet surrounded by 10 α-helices, a typical Rossmann fold characteristic of the SDR superfamily (Oppermann et al. 2003; Persson and Kallberg 2013). The quaternary structure of SrlD is a homotetramer. Each monomer has two interaction surfaces that establish contacts with equivalent surfaces of two neighboring monomers, with about 25% of the monomer surface buried in the contact areas. The binding site for the cofactor (NAD+/NADH) and the substrate (S6P) is evident on the protein surface as a large pocket extending from the protein core and ending in a small channel. The SrlD active site features the conserved catalytic tetrad (Asn112, Ser141, Tyr154, Lys158) previously determined for bacterial SDRs (Philippsen et al. 2005). Lys142, located in the active site, is relevant for selectivity towards S6P rather than unphosphorylated sorbitol, towards which the protein is inactive. Moreover, the authors propose that the substrate S6P would enter the active site keeping the phosphate group near the protein surface, where it would interact with four lysines (Lys142, Lys149, Lys203 and Lys218) that would form H-bonds with it (Salomone-Stagni et al. 2018a) (Fig. 8).

Fig. 8

The entrance to the SrlD active site, showing the four lysines (Lys142, Lys149, Lys203 and Lys218) located at the entrance of the active site channel and proposed by Salomone-Stagni et al. to be involved in substrate selectivity and orientation

AvrRpt2

To suppress the host organism’s defenses, E. amylovora secretes several proteins into the host cytoplasm via the Type 3 Secretion System (T3SS) (Vrancken et al. 2013; Borruso et al. 2017). The cysteine protease AvrRpt2 is one of the type III effector proteins secreted by E. amylovora (AvrRpt2EA) (Zhao et al. 2006; Khan et al. 2012; Schropfer et al. 2018). AvrRpt2 is activated by peptidyl-prolyl cis/trans isomerization after secretion into the host cell by the T3SS (Coaker et al. 2006). Prolyl cis/trans isomerization of Pseudomonas syringae AvrRpt2 (AvrRpt2PS) depends on the eukaryotic cyclophilin family of peptidyl-prolyl cis/trans isomerases (PPIases) (Aumuller et al. 2010) and occurs at four conserved GPxL motifs (Coaker et al. 2006). AvrRpt2PS and AvrRpt2EA are both activated in a cyclophilin-dependent manner, as they share high sequence similarity and conservation of the GPxL motifs. Upon secretion into the host cell, cyclophilin-dependent autocatalytic cleavage of an N-terminal fragment occurs between the conserved residues G45 and G46 (AvrRpt2EA) or G71 and G72 (AvrRpt2PS) (Jin et al. 2003; Zhao et al. 2006). Cleavage of the N-terminal fragment has no effect on proteolytic activity (Chisholm et al. 2005; Aumuller et al. 2010).
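The notation GPxL denotes glycine, proline, any residue, then leucine. As a minimal sketch of how such motifs can be located in a protein sequence, the Python snippet below scans a string with a regular expression. The sequence used is a hypothetical toy string written for illustration only, not the sequence of AvrRpt2EA or AvrRpt2PS, and the function name is likewise an assumption.

```python
import re

# Hypothetical toy sequence for illustration only; NOT a real AvrRpt2 sequence.
TOY_SEQUENCE = "MKGPALTTSGPDLAAGPIMFKKGPQL"


def find_gpxl_motifs(sequence):
    """Return (1-based start position, motif) for every Gly-Pro-x-Leu match.

    A lookahead is used so that overlapping matches would also be reported.
    """
    pattern = re.compile(r"(?=(GP[A-Z]L))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


motifs = find_gpxl_motifs(TOY_SEQUENCE)
for position, motif in motifs:
    print(position, motif)
```

On the toy string, the segment GPIMF is not reported because its fourth residue is not leucine. A strict GPxL pattern may therefore need to be relaxed when a motif deviates from the consensus.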

The secretion of AvrRpt2EA into the host cell can increase virulence but is not essential for E. amylovora infection (Zhao et al. 2006). In Malus × robusta 5, the presence of AvrRpt2EA is recognised by FB_MR5, the only identified resistance protein in a Malus species, thus preventing E. amylovora infection (Vogt et al. 2013). E. amylovora strains carrying an AvrRpt2EA mutation of cysteine 156 (Cys156) to serine (Ser156) are able to overcome this resistance. This naturally occurring mutation was found in several E. amylovora strains in North America and was reproduced in AvrRpt2EA deletion strains by Vogt et al. (2013). The induced expression of AvrRpt2EA in transgenic plants of the fire blight-susceptible cultivar Pinova caused shoot necrosis, browning of older leaves and increased accumulation of salicylic acid (Schropfer et al. 2018).

The structure of AvrRpt2EA70–222 (from residue 70 to the C-terminus) was solved by X-ray crystallography, providing insights into the cyclophilin-mediated maturation of AvrRpt2 and the mechanism of recognition by FB_MR5 (Bartho et al. 2019). AvrRpt2EA70–222 consists of a six-stranded anti-parallel β-sheet flanked on both sides by five α-helices. AvrRpt2EA70–222 shares a conserved fold with other protease-like protein domains despite highly divergent sequences and different biological functions. The active site is located in a cleft formed by the connecting loops between β-strands. Besides the residues responsible for proteolytic cleavage (Cys88, His173 and Asp191), the cleft contains a negatively charged internal area and, at its edge, a positively charged arginine side chain (Arg194) that possibly contributes to substrate binding and specificity. The four conserved cyclophilin-binding motifs (CBM1–CBM4) are present in the AvrRpt2EA70–222 structure in the trans configuration, as the crystallized protein was never exposed to a eukaryotic cyclophilin. The four CBMs are located in loops (CBM1, CBM3) or next to secondary structure elements (CBM2, CBM4). It was hypothesized that the structural changes induced by proline isomerization at CBM4 (residues 195–198, GPDL) would reposition the catalytic residue Asp191 within the active site, thus altering the surrounding substrate binding cleft and possibly modifying the substrate binding area and substrate specificity (Bartho et al. 2019). The structure of AvrRpt2EA70–222 shows that Cys156, implicated in recognition by the FB_MR5 resistance protein of Malus × robusta 5, is on the opposite side of the protein with respect to the catalytic triad (Cys88, His173 and Asp191). Cys156 is thus not involved in substrate binding at the catalytic site or in determining the substrate specificity of AvrRpt2EA. In the structure of AvrRpt2EA70–222, Cys156 is co-located with CBM3 (residues 159–163, GPIMF; Fig. 9). A post-translational modification of Cys156, or even the formation of a disulfide bridge, may therefore interfere with cyclophilin binding, blocking cis/trans isomerization of the Gly-Pro site and preventing AvrRpt2EA from being activated or from interacting with plant host factors. Such modifications would not be possible on a serine, which would explain the different behaviour of the two genetic variants (Cys156 vs Ser156) (Bartho et al. 2019). The resistance of Malus × robusta 5 to E. amylovora strains featuring AvrRpt2EA-Cys156 may be due either to a direct interaction of FB_MR5 with Cys156 of AvrRpt2EA, or to a post-translational modification of Cys156 that would not be possible with a serine in place of the cysteine. Another hypothesis proposed by Bartho et al. is that the FB_MR5 resistance protein may recognize both the AvrRpt2EA-Cys156 and AvrRpt2EA-Ser156 forms, but only the latter would be correctly processed and able to suppress the resistance-signaling pathway triggered by FB_MR5 (Bartho et al. 2019).

Fig. 9

Cartoon representation showing that in AvrRpt2EA70–222 Cys156 is co-located with CBM3 (residues 159–163, GPIMF)

Conclusions

The availability of sequenced genomes and current technology allow the structure and function of proteins and enzymes to be studied at a high level of detail. The structural data deposited in the PDB enable researchers from different fields and backgrounds to use the results obtained by structural biologists and to gain a better understanding of the structure-function relationships of macromolecules. Structural and functional characterization of proteins can provide not only a deeper understanding of the biology of the organism under investigation, but also starting points for developing molecules that modulate or inhibit enzymatic activity and help control disease.

This review reports examples of the wealth of information provided by structural biology for a better understanding of E. amylovora biology. The study of the transcriptional regulator RcsB allowed the identification and location of the residues involved in DNA binding. The structure of EaLsc in complex with glucose and fructose represents a snapshot of the reaction mechanism, showing the products of sucrose hydrolysis trapped in the active site. In the crystal structure of AmsI it was possible to identify the water molecule thought to attack the phospho-cysteine intermediate in the second step of the AmsI reaction. In the case of AmyR, the comparison of its crystal structure against the PDB was crucial to discover the 3D similarity with T3Cs and to locate the conserved residues, as bioinformatic analysis of its amino acid sequence alone gave no clue about it. Determination of the HsvA structure, besides confirming its role as a polyamine amidinotransferase, allowed not only the characterization of the active site but also the identification of the residue (Glu244) that determines the substrate specificity of the enzyme. The structure of EaGalU was used as a model for molecular docking calculations to explain the different enzymatic efficiency towards a series of phosphorylated sugars. The structural characterization of the whole desferrioxamine E biosynthetic pathway confirmed the role of the enzymes involved, as well as providing unprecedented structural information for DfoC. The structure of DfoC is the first of a siderophore synthetase coupling an acyltransferase domain at its N-terminus and a Non-Ribosomal Peptide Synthetase (NRPS)-Independent Siderophore (NIS) domain at its C-terminus. With the identification of four lysines (Lys142, Lys149, Lys203 and Lys218) around the entrance of the active site, the structure of SrlD provided an explanation for its enzymatic selectivity. SrlD is indeed active only towards phosphorylated sorbitol and completely inactive on sorbitol. The four lysines would favor binding of phosphorylated sorbitol in the active site for catalysis, thus providing the required selectivity. Finally, the structure of AvrRpt2EA70–222 provided crucial information about the location of the CBM motifs and of Cys156. The latter is exposed on the surface of the protein and is therefore prone to protein-protein interactions (e.g., by disulfide bridge formation) or post-translational modifications. This information could not have been obtained from sequence comparison or in vivo experiments alone.

In conclusion, structural biology is an important tool for understanding the molecular mechanisms by which biomolecules operate and interact with each other, thus pushing forward the frontier of biochemical science.