Introduction

The mission of the Seattle Structural Genomics Center for Infectious Disease (SSGCID) is to supply the scientific community with structural information that can facilitate structure-based design of new chemotherapeutics to combat pathogenic organisms [1, 2]. As of this writing, our gene-to-structure pipeline has generated nearly 300 protein structures from Category A, B and C microbial organisms designated by the National Institute of Allergy and Infectious Diseases (NIAID). The SSGCID directs effort to obtain ligand-bound structures for select high-value targets, for which we have adapted fragment-based screening methods to our structural genomics pipeline. Fragment screening with a diverse, metabolite-based compound library allows both the testing of proven chemical moieties and the discovery of new ones that bind to a protein [3–6]. The size and reduced complexity of typical fragment molecules (≤300 Da) allow for a diverse sampling of chemical space with a compound library small enough (~1,500 compounds) for practical applications in crystallographic screening [7–10]. With adequate sampling of the natural metabolome, a single, all-purpose fragment library can retrieve small molecule binders for protein targets from a diverse array of infectious disease organisms.

In this study, we describe fragment screening methods using our Fragments of Life™ (FOL) library to generate co-crystal structures for an infectious disease drug target from Burkholderia pseudomallei (Bp). We conducted fragment screens with 2C-methyl-D-erythritol 2,4-cyclodiphosphate (MECP) synthase from B. pseudomallei (BpIspF) using nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography. This bacterium causes melioidosis, a disease with a current mortality rate of 20–50% which requires months-long antibiotic regimens to clear the host system [11]. These factors, along with the ability of B. pseudomallei to frequently evade immune response, have led to its designation as an emerging pathogen and a potential bioterrorism agent by the NIAID [12–14]. MECP synthase is part of the methyl-erythritol isoprenoid (MEP) biosynthetic pathway, an alternative metabolic pathway for isoprene synthesis not present in humans [15, 16]. Previous studies have shown the MEP pathway to be essential for certain bacteria as well as species of Plasmodium and other protozoans, with clinical efficacy demonstrated for drugs targeting the IspC enzyme, upstream of MECP synthase (IspF) in the pathway [16–21]. Ongoing gene deletion studies with B. pseudomallei and B. thailandensis indicate a likelihood that every non-duplicated gene product from the MEP pathway is essential for bacterial growth [22]. Using an iterative fragment-based approach to screening followed by complex structure determination, we have deposited over a dozen ligand-bound structures of MECP synthase. This ensemble of ligand-bound complexes now serves to guide medicinal chemists and other researchers in developing novel antibacterial agents to treat B. pseudomallei infection and other pathogenic organisms for which the MEP pathway is essential.

Materials and methods

Protein expression and purification

2C-methyl-D-erythritol-2,4-cyclodiphosphate synthase (E.C. 4.6.1.12) from Burkholderia pseudomallei (BpIspF; target database ID: BupsA.00122.a) was expressed in E. coli using BL21(DE3)R3 Rosetta cells and autoinduction media in a LEX bioreactor. Starter cultures of lysogeny broth with appropriate antibiotics were grown for ~18 h at 37°C. Antibiotics were added to 2 L bottles of sterile ZYP-5052 auto-induction media and the bottles inoculated with overnight cultures. Inoculated bottles were then placed into a LEX bioreactor and cultures grown for ~24 h at 25°C. The temperature was then reduced to 15°C and the cultures were grown for an additional ~60 h. To harvest, the media was centrifuged at 4,000 RCF for 20 min at 4°C. Cell paste was flash frozen in liquid nitrogen and stored at −80°C prior to purification. Frozen cells were re-suspended in lysis buffer (25 mM HEPES (pH 7.0), 500 mM NaCl, 5% (v/v) glycerol, 30 mM imidazole, 0.025% (w/v) sodium azide, 0.5% (w/v) CHAPS, 10 mM MgCl2, 1 mM TCEP, 250 ng/mL AEBSF, and 0.05 μg/mL lysozyme) and disrupted on ice for 30 min with a Virtis sonicator using alternating on/off cycles of 15 s. Cell debris was incubated with 20 μL of Benzonase nuclease (25 U/mL) at room temperature for 45 min, and clarified by centrifugation on a Sorvall SLA-1500 at 29,700 RCF for 75 min at 4°C.

Protein for X-ray crystallography was purified from clarified cell lysate by immobilized metal affinity chromatography. We used a His Trap FF 5 mL column (GE Healthcare) equilibrated with binding buffer (25 mM HEPES (pH 7.0), 500 mM NaCl, 5% (v/v) glycerol, 30 mM imidazole, 0.025% (w/v) sodium azide, 1 mM TCEP). The protein was eluted in the same buffer with 250 mM imidazole added. Size exclusion chromatography (SEC) was done using a HiLoad 26/60 Superdex 75 column (GE Healthcare) equilibrated in SEC buffer (20 mM HEPES (pH 7.0), 300 mM NaCl, 2 mM DTT, and 5% (v/v) glycerol). Pure fractions were collected and pooled from a single peak in the chromatogram, and concentrated using Amicon Ultra centrifugal filters. The final protein was concentrated to approximately 27 mg/mL, aliquoted into 100 μL tubes, flash frozen in liquid nitrogen and stored at −80°C.

Protein for NMR spectroscopy was purified using the above protocol but with removal of the affinity tag by incubation with His-tagged 3C protease. This was done after the first His Trap column purification, and was followed by gravity-flow purification on a Ni–NTA packed column to remove the tag, the 3C protease, and any uncleaved BpIspF. The tagless protein was collected in the flow-through and further resolved using the same SEC purification method as the first batch. The protein was concentrated using Amicon Ultra filters to approximately 30 mg/mL, aliquoted into 100 μL tubes, flash-frozen in liquid nitrogen and stored at −80°C.

Crystallization and fragment screening by X-ray crystallography

Robust, well-diffracting crystals of BpIspF protein were grown by sitting drop vapor diffusion over 1–2 days in trays incubated at 16°C. Drops for initial crystal formation of the uncleaved protein contained 0.5 μL protein solution (20 mg/mL of BpIspF in SEC buffer) mixed with 0.5 μL crystallization buffer (200 mM NaCl, 100 mM Tris–HCl (pH = 8.0), 20% (w/v) PEG 4000, 5 mM ZnCl2), with reservoirs containing 80 μL of crystallization buffer. Fragment soaking trays were prepared by adding 1.0 μL methanol drops containing up to 8 fragments at 6.25 mM each to individual crystal tray wells; the drops were allowed to evaporate to a dry film. Compounds were then resuspended in 1.0 μL crystallization buffer, and 2 or 3 apo crystals transferred to each well, with 20 μL of crystallization buffer added to the reservoir [3]. Preliminary testing with several cytosine derivatives indicated that up to 3 weeks of soak time is necessary to obtain bound ligands with sufficient occupancy for structure determination. Therefore, crystals of BpIspF were allowed to soak in primary FOL mixture pools for approximately 3 weeks prior to harvesting and data collection. Cryo-protectant solutions were prepared by resuspending dry film fragment pools in drops consisting of 0.3 μL ethylene glycol and 0.7 μL crystallization buffer. To deconvolute fragment mixtures and confirm small molecule identity, individual follow-up soaks were conducted in the same manner with individual fragments at 25 mM in the soak drop. Focused soaking trials with pools and individual molecules identified from NMR-based fragment screening were conducted in the same manner, with fragments at 10–25 mM in the soak drop.
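The drop compositions described above follow from simple dilution arithmetic. The short Python sketch below reproduces it using only the volumes and concentrations quoted in the text. The function names are illustrative, and two assumptions are made that the text does not state: the dry film is taken to redissolve completely, and the protein concentration is the value immediately after mixing, before any vapor-diffusion equilibration.

```python
def resuspended_conc_mM(stock_mM, stock_uL, final_uL):
    """Concentration after a dried film is redissolved in final_uL.

    Assumes the film redissolves completely (an assumption, not a
    measured result).
    """
    return stock_mM * stock_uL / final_uL


def percent_v_v(component_uL, other_uL):
    """Volume percent of one component in a two-component drop."""
    return 100.0 * component_uL / (component_uL + other_uL)


# Soak drop: 1.0 uL of methanol stock (6.25 mM per fragment) is dried,
# then redissolved in 1.0 uL of crystallization buffer.
per_fragment_mM = resuspended_conc_mM(6.25, 1.0, 1.0)

# Upper bound on total fragment load for a pool of 8 fragments.
max_total_fragment_mM = 8 * per_fragment_mM

# Crystallization drop: 0.5 uL of 20 mg/mL protein + 0.5 uL buffer
# (value at the moment of mixing only).
protein_mg_per_mL = 20.0 * 0.5 / (0.5 + 0.5)

# Cryo-protectant drop: 0.3 uL ethylene glycol + 0.7 uL buffer.
ethylene_glycol_pct = percent_v_v(0.3, 0.7)

print(f"per-fragment soak concentration: {per_fragment_mM:.2f} mM")
print(f"maximum total fragment load:     {max_total_fragment_mM:.1f} mM")
print(f"protein at mixing:               {protein_mg_per_mL:.1f} mg/mL")
print(f"ethylene glycol in cryo drop:    {ethylene_glycol_pct:.0f}% (v/v)")
```

Under these assumptions each fragment in a primary pool is present at the same 6.25 mM as in the methanol stock, so the 10–25 mM follow-up soaks represent a roughly two- to fourfold increase per compound.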

Fragment screening by NMR spectroscopy

NMR samples were prepared by diluting tagless, concentrated BpIspF protein in SEC buffer to 20 μM trimer (60 μM monomer) with NMR buffer (10 mM K-Phos (pH 7.8), 50 mM NaCl, 10% (v/v) 2H2O). Fragment pools were assayed at ligand concentrations of 400 μM, with 400 μM cytidine and 20 μL deuterated dimethyl sulfoxide (d6-DMSO) present in a 500 μL sample volume. All experiments were conducted on a 600-MHz Bruker AV spectrometer with TCI cryoprobe set to 280 K. Screening was done using ligand-observe, proton-based one-dimensional saturation transfer difference nuclear magnetic resonance (STD-NMR) [23] and two-dimensional nuclear Overhauser effect spectroscopy (NOESY) [24], according to previously published methods [25]. Briefly, 32 scans and 32,000 points were acquired over a 14 ppm sweep width for STD-NMR data, with a total recycle delay of 4.0 s for each mixture. A low-power 30-ms spin-lock pulse was added to filter out low-level protein peaks, and a WATERGATE sequence added to suppress bulk water signal [26]. STD-NMR pre-saturation was done using a 3.0 s-long train of Gaussian-shaped pulses with a spectral width of 600 Hz focused at −1.0 ppm, with reference irradiation set to 30 ppm. For NOESY experiments, 2,048 × 160 points were collected with a mixing time of 500 ms and a recycle delay of 2.0 s, with WATERGATE solvent suppression for each mixture. A total of 390 compounds in a dozen mixtures were tested; typical data from NMR screening are shown for Mixture #8 (Fig. 1).
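For readers setting up comparable samples, the composition described above can be checked with a few lines of Python. This is a minimal sketch using only numbers quoted in the text; the variable names are illustrative, and it assumes one binding site per monomer, so that the monomer concentration equals the active-site concentration.

```python
TRIMER_UM = 20.0          # BpIspF trimer concentration in the NMR sample
SUBUNITS_PER_TRIMER = 3   # homotrimer
LIGAND_UM = 400.0         # each fragment (and cytidine) in the sample
DMSO_UL = 20.0            # d6-DMSO volume
SAMPLE_UL = 500.0         # total sample volume
N_COMPOUNDS = 390         # compounds screened by NMR
N_MIXTURES = 12           # "a dozen mixtures"

# Monomer (assumed equal to active-site) concentration.
monomer_uM = TRIMER_UM * SUBUNITS_PER_TRIMER

# Molar excess of each ligand over binding sites.
ligand_excess = LIGAND_UM / monomer_uM

# Co-solvent content of the sample.
dmso_pct = 100.0 * DMSO_UL / SAMPLE_UL

# Average pool size across the mixtures.
avg_per_mixture = N_COMPOUNDS / N_MIXTURES

print(f"monomer concentration: {monomer_uM:.0f} uM")
print(f"ligand excess per site: {ligand_excess:.1f}-fold")
print(f"d6-DMSO content: {dmso_pct:.1f}% (v/v)")
print(f"average compounds per mixture: {avg_per_mixture:.1f}")
```

The average of about 32 compounds per mixture is consistent with the 34 fragments reported for Mixture #8 in Fig. 1.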

Fig. 1

(a) STD-NMR and 2D NOESY spectra with (b) zoomed-in region for a sample containing BpIspF, cytidine and 34 small molecule fragments from primary screening. Peaks visible in the STD spectrum (purple) and positive crosspeaks (red) generated by negative NOE signals are indicative of small molecule binders. (c) Chemical structures for all compounds in this mixture; binders are boxed and labeled, including FOL535 which was previously observed to bind BpIspF by X-ray crystallography (PDB: 3K14). Figure generated using iNMR software (http://www.inmr.net)

Structure determination by X-ray crystallography

Diffraction data for the apo structure of BpIspF (PDB ID: 3F0D) were collected on beamline 23-ID-D at the Argonne National Laboratory. Data for the 12 ligand-bound complexes discussed in this work (PDB IDs: 3F0G, 3IEQ, 3IEW, 3IKE, 3IKF, 3JVH, 3K14, 3K2X, 3MBM, 3P0Z, 3P10 and 3QHD) were collected in-house using a Rigaku SuperBright FR-E+ X-ray generator with Osmic VariMax HF optics and a Saturn 944+ CCD detector. Diffraction data were reduced and scaled either with HKL2000 [27] or XDS/XSCALE [28]. Each structure was solved by molecular replacement and rigid body refinement using pre-existing structures of IspF (Table 1). Small molecule structures were built manually using Sketcher, and ligand–protein complex models refined using REFMAC5 [29], both part of the CCP4 program suite [30]. Refinement included TLS parameters with up to 8 TLS groups per chain in each monomer [31]. Final structures were obtained after numerous rounds of REFMAC5 [29] refinement and manual model-building using the Crystallographic Object-Oriented Toolkit (Coot) [32]. Each structure was evaluated using the MolProbity webserver [33] and internally peer reviewed, prior to validation and deposition with the Protein Data Bank [34, 35]. Diffraction data and refinement statistics for each structure are listed below (Table 1).

Table 1 Ligand binding stoichiometries, X-ray diffraction collection data, and refinement statistics for one apo and a dozen ligand-bound structures of MECP synthase from B. pseudomallei (BpIspF)

Results and discussion

Primary fragment screening of BpIspF by X-ray crystallography

Fragment screening by X-ray crystallography has the advantage of obtaining structural data on the complex directly from screening efforts [3, 36, 37]. Preliminary crystallographic trials with BpIspF revealed this enzyme capable of forming relatively large, solid crystals of the stable trimeric holoenzyme which did not dissolve under fragment soaking conditions. The BpIspF protein used for all crystallographic experiments retains a histidine-based affinity tag which was not visible by X-ray diffraction, and does not appear to interfere with ligand binding by X-ray or NMR analysis. We therefore conducted a fragment screen of BpIspF with our Fragments of Life™ (FOL) library [3] by X-ray crystallography, exposing the target to approximately 1,500 different small molecules. To conduct the screen, pre-formed apo crystals of BpIspF were soaked in pools containing up to 8 different small molecules resuspended in crystallization buffer (see “Methods”) [3, 38]. Observing electron density for bound small molecules requires high concentrations, due to the relatively weak-binding affinity of most fragment-sized molecules, and the high concentration of protein (and active sites) present in the crystal [3, 36, 37]. Preliminary time-course experiments with BpIspF further indicated that 2–3 weeks was sometimes necessary to observe zinc-coordinated fragments in the active site, even with high (25 mM) concentrations of ligand added (data not shown). This is likely due to competition with a thermodynamically stable water molecule bound to the catalytic coordination site of the metal ion. Therefore, crystals were allowed to soak for at least 3 weeks per FOL mixture, to provide sufficient time for high occupancy of fragments in the crystal.

Diffraction analysis of BpIspF crystals soaked in FOL pools revealed two distinct categories of fragment hits (Fig. 2). One set consisted entirely of cytosine derivatives which recapitulate the cytidylyl moiety of the native substrate while bound (Fig. 3) [39]. Uracil, thymine, and other pyrimidine nucleotide analogs contained within the FOL library were not recovered from crystal soaking experiments, demonstrating the specificity of cytosine binding for this sub-pocket. A second set of FOL hits comprised a variable series of heteroaromatic compounds which bound in an adjacent sub-pocket of the active site. The main mode of binding for these fragments is through heteroaromatic nitrogen coordination to the catalytic zinc ion of the protein (Fig. 3). No other sites on the trimeric holoenzyme were observed to bind fragments from primary screening by X-ray crystallography. Molecules of Tris, glycerol, and acetate appear along the threefold axis of the trimeric protein in some BpIspF structures (Fig. 4), but no FOL molecules were observed to bind in this region. Altogether, structures of 6 unique cytosine derivatives and 3 different zinc-binding ligands were recovered from a single FOL library screen by X-ray crystallography (Table 1). Furthermore, these ligands appear to bind all three active sites in their respective crystal structures, resulting in a 3:1 ligand:homotrimer binding stoichiometry (Table 1). Some differences between active sites have been observed, suggestive of phosphate hydrolysis in the crystal (see PDB ID: 3IEW). This is likely due to low-level activity of BpIspF, whose native function includes metal-activated cleavage of a phosphodiester bond [15, 39]. Nevertheless, the three active sites of a given homotrimer appear fully occupied by each fragment hit identified from primary screening by FOL soaks and X-ray diffraction.

Fig. 2

Chemical structures and PDB codes for (a) cytosine pocket, (b) zinc-site, and (c) external site BpIspF-binding fragments

Fig. 3

Small molecules which bind BpIspF fall into three distinct categories: (a) cytidine pocket binders, including cytosine (yellow), cytidine (cyan), 5′-iodo-cytidine (green), CMP (navy), CDP (magenta) and CTP (white); (b) zinc-site binders FOL535 (magenta), FOL717 (navy), FOL8395 (cyan) and FOL955 (white); and (c) external site binders FOL694 (magenta) and FOL795 (cyan). A single protein crystal structure (PDB ID: 3P10) is depicted for clarity. Key interactions are illustrated in panel (a) between cytidine, D48 of one monomer, and A102, P105 and A108 of the opposite monomer (black dashes). Cytidine and FOL955 (white) are illustrated in the active site for panel (c). Figure generated using PyMol [57]

Fig. 4

Structure ensemble of fragments bound to 2C-methyl-D-erythritol-2,4-cyclo-diphosphate synthase from Burkholderia pseudomallei (PDB IDs 3IEQ, 3IEW, 3IKE, 3IKF, 3JVH, 3K14, 3K2X, 3MBM, 3P0Z, 3P10 and 3QHD). The holoenzyme possesses three active sites, located in a solvent-exposed groove along each monomer–monomer interface (green, cyan, magenta). Each active site also contains a catalytic zinc ion (yellow spheres). Molecules of Tris, glycerol, and acetate were observed to bind in the center of the trimeric protein, but no FOL compounds were found in this site. For clarity, a single protein crystal structure (PDB ID: 3P10) is viewed along the threefold trimer axis. Figure generated using PyMol [57]

Primary fragment screening by NMR spectroscopy

In searching for additional fragment binders to complement our BpIspF structure ensemble, we applied NMR spectroscopy to screen a subset of the FOL library. Fragment screening by NMR spectroscopy can quickly distinguish binding from non-binding ligands in solution, and is often used for initial screens prior to crystallographic efforts [40–42]. Preliminary studies showed BpIspF to remain a stable trimer in low-salt buffer and amenable to ligand-observe NMR-based screening. We therefore screened a druglike set of 390 compounds contained within the FOL library by preparing twelve samples of BpIspF, each containing cytidine and 30–40 compounds (see “Methods”). Cytidine was added to specifically probe for inter-ligand NOE signals (ILOEs) [24, 43] between itself and fragments with affinity for the adjacent binding site. Fragments were selected which did not contain a cytosine ring structure, so as not to compete for the cytidine pocket in solution. Unfortunately, no ILOE signals were detected in the NOESY spectra, perhaps because of too few non-exchangeable proton resonances close to the zinc-binding site to serve as ILOE probes. Nevertheless, fragment screening by NMR spectroscopy led to the identification of 61 putative hits for BpIspF by STD-NMR, of which 56 were also observed to bind by transfer NOESY.

Among the hits identified from NMR screening were FOL535 and FOL717, ligands previously known to bind BpIspF through crystallographic fragment screening (PDB: 3K14 and 3IKF, respectively). However, none of the remaining hits had been detected through primary screening by crystal soaking and X-ray diffraction, even though they were all part of the original FOL library. The size, shape and chemical properties of these NMR hits covered a slightly larger chemical space than the fragments discovered through direct crystal soaking experiments. However, the majority of NMR hits were equally small and chemically quite similar to the FOL compounds discovered by X-ray crystallography, suggesting nothing to make them incapable of binding to BpIspF crystals. A separate analysis demonstrated no particular trend in crystal contacts or solvent channel diameters for the apo crystal of BpIspF which would prevent any fragment-sized molecule from binding [44]. Since we did not acquire specificity data during primary screening, it is possible that some of the NMR hits were non-specific binders, and thus would never appear bound to a protein crystal by X-ray diffraction. It is also possible that the NMR hits bind BpIspF too weakly to obtain sufficient occupancy for X-ray diffraction experiments. We therefore conducted additional soaks at higher concentrations with our NMR hits, to see if this would lead to new fragments and additional complex structures.

Follow-up crystal trials with NMR-based fragment hits

Ligand-observe NMR data on primary screening mixtures cannot distinguish specific from non-specific binders without competitive titration or other follow-up experiments [45–47]. Since our main goal was the generation of new complex structures, we immediately used these putative NMR hits in crystal soaks without first confirming their binding site specificity (or lack thereof) with additional experiments. All 61 NMR hits (including FOL535 and FOL717) were grouped into pools of 4–5 compounds based on shape dissimilarity and dissolved in drops to 10 and 25 mM for crystal soaking trials. The same pools were used in parallel soaks with either cytosine or cytidine added to replicate in some manner the conditions used for NMR screening. The soak time was also increased to give weakly-binding zinc-site fragments more time to displace the coordinated water molecule. After 4–8 weeks, none of the NMR hits soaked in the absence of cytosine or cytidine resulted in complex BpIspF structures, with the exception of FOL535 and FOL717. From the pools containing cytosine, only FOL717 appeared to possess high enough affinity and occupancy to generate a reasonable dataset for structure determination (PDB: 3MBM). However, pools containing NMR fragment hits and cytidine generated density for 3 new fragments which had not previously been observed to bind by X-ray crystallography (Fig. 3). Two of these (FOL694 and FOL795) appear to bind a hydrophobic region external to the active site in an often disordered loop above the catalytic zinc ion (PDBs: 3P10 and 3QHD). The third (FOL955) binds within the active site and coordinates to the catalytic zinc ion with an aromatic nitrogen, in a manner similar to that of other zinc-binding fragments (PDB: 3P0Z).
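To put the two screening routes side by side, the counts reported in this work can be turned into approximate hit rates. The Python sketch below is illustrative only. It uses the approximate library size of 1,500 compounds, so the crystallographic rate is an estimate, and it counts FOL535, FOL717, FOL694, FOL795 and FOL955 as the NMR hits that ultimately yielded complex structures. The helper name `percent` and the variable names are ours, not part of any published analysis.

```python
import math


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Primary crystallographic screen (library size is approximate).
LIBRARY_SIZE = 1500
xray_primary_hits = 6 + 3      # cytosine derivatives + zinc-site binders

# Primary NMR screen.
NMR_SCREENED = 390
std_hits = 61                  # putative hits by STD-NMR
noesy_confirmed = 56           # also observed by transfer NOESY

# NMR hits that gave complex structures in follow-up soaks:
# FOL535 and FOL717 (known) plus FOL694, FOL795 and FOL955 (new).
nmr_hits_with_structures = 2 + 3

# Number of follow-up pools implied by 4-5 compounds per pool.
pools_if_five_each = math.ceil(std_hits / 5)
pools_if_four_each = math.ceil(std_hits / 4)

xray_hit_rate = percent(xray_primary_hits, LIBRARY_SIZE)
std_hit_rate = percent(std_hits, NMR_SCREENED)
noesy_agreement = percent(noesy_confirmed, std_hits)
structure_yield = percent(nmr_hits_with_structures, std_hits)

print(f"X-ray primary hit rate:  ~{xray_hit_rate:.1f}%")
print(f"STD-NMR hit rate:        ~{std_hit_rate:.1f}%")
print(f"STD hits seen by NOESY:  ~{noesy_agreement:.0f}%")
print(f"NMR hits with structure: ~{structure_yield:.0f}%")
print(f"follow-up pools needed:  {pools_if_five_each}-{pools_if_four_each}")
```

These rates are not directly comparable, since the NMR subset was a selected druglike set rather than a random sample of the library, but they illustrate how many solution-phase hits failed to translate into crystal structures.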

The identities of all 3 NMR-derived fragment hits have been confirmed by individual soaks of BpIspF crystals with cytidine, and all three exhibit a cytidine dependence for binding. Repeated attempts to soak these fragments with BpIspF crystals, either alone or in the presence of cytosine, failed to generate high-resolution complex data. Thus the binding of cytidine to pre-formed apo crystals of BpIspF appears to enhance the affinities of these other small molecules for the target. The native substrate of IspF branches across the BpIspF active site, making key interactions to residues on both monomers [39]. Some of these interactions are conserved when cytidine (but not cytosine) is bound, forming hydrogen bonds which mimic those of the substrate-bound conformation. The 4-amino protons and carbonyl oxygen of cytidine contact the backbone of one monomer at A102, P105 and A108, while the 2′- and 3′-ribosyl hydroxyls make contact with the D58 side chain of the other monomer (Fig. 3). This provides a structural basis for cytidine causing an increase in the structural integrity of the active site, hence the dependency of certain fragments on its presence for binding to BpIspF.

In addition to their dependence on cytidine, another unique aspect of the NMR-based hits is their binding stoichiometry. Unlike all of the hits obtained directly through crystal soaking, FOL694, FOL795 and FOL955 were only observed in one active site per homotrimer of BpIspF (Table 1). Repeat experiments show this to be the case for multiple crystal trials, as it was for FOL717 and FOL535 in binding all three active sites (data not shown). In most BpIspF structures bound to fragments, the loop which generates the binding site for FOL694 and FOL795 is too disordered to properly model, suggestive of a high degree of flexibility in this region. However, this does not explain the 1:3 stoichiometry of FOL955, which only binds to a single active site of homotrimeric BpIspF, in a structure with three clearly-defined pockets. In all of our BpIspF structures, the active site appears solvent accessible, and crystal contacts do not occlude any one active site more than the other two [44]. One possible explanation is differential affinity across the three active sites in the crystal form used for soaking. None of the crystals in our ensemble display perfect threefold crystallographic symmetry along the trimer interface, making each monomer (and each active site) slightly different. However, the presence of 2Fo–Fc density near the catalytic zinc ion in each active site of these structures suggests that 3:1 binding does occur, but without sufficient density to model fragments into all three active sites. Future studies with close chemical variants of these fragments and solution-state binding affinity measurements will hopefully shed additional light on the underlying causes for this asymmetric binding stoichiometry.

Conclusion

In searching for ways to rapidly generate multiple ligand-bound structures for a given target, we have found fragment-based screening offers a variety of solutions. Biophysical methods such as ligand-observe NMR spectroscopy and X-ray crystallography are broadly applicable to macromolecular targets with minimal target-specific requirements, and are therefore well-suited to a high throughput pipeline. We propose that our compound library derived from the natural metabolome will maximize chances for complexation with disparate targets, while minimizing the need for target-specific fine-tuning over the course of many fragment screening campaigns. In conducting our studies on BpIspF, small molecules which were not observed by one screening method were found by another. This outcome underlines the importance of implementing two or more orthogonal methods in fragment-based screening to identify and verify hits as well as increase the potential pool of candidates [40]. Results from this study further revealed the influence one fragment can have on the binding affinity of another for a static target in crystal form. With strategic preparation of focused fragment mixtures for secondary experiments, the effect one fragment has on another can be exploited to generate ternary and even higher-order complexes.

The wealth of ligand-binding data generated by this approach now serves as a starting point for structure-based drug design. Using structural information on fragment-bound structures to chemically elaborate hits into more potent molecules has previously been demonstrated in our laboratory [3, 48] and elsewhere in the literature [49–54]. In collaboration with medicinal chemists outside the SSGCID, we are pursuing a similar approach with BpIspF and other enzymes from the MEP pathway, using our ensemble of fragment-bound complexes as an initial blueprint. With the pipeline for new antibacterial drugs and other medicines dwindling, there will be increased pressure to develop novel antimicrobials in the coming years to combat drug-resistant strains and other emerging threats to human health [55, 56]. Fragment-based screening coupled with structure-based drug design is a tandem approach which can rapidly generate novel leads for both known and novel biologically relevant targets. We have demonstrated one target from the SSGCID to be amenable to biophysical fragment screening, and are now refining these methods to increase our fragment screening throughput. We intend to generate similar structure ensembles for other MEP targets, from B. pseudomallei and a variety of pathogenic organisms as part of our structural genomics efforts.