Introduction

Many biomolecular interactions proceed via lowly populated, transient intermediates, which are increasingly recognized as important determinants of macromolecular recognition and association kinetics (Schreiber et al. 2009; Ubbink 2009). Due to a low population and inherent dynamics, this minor species is invisible to most structural and biophysical methods, which severely thwarts its experimental characterization. Recent advances in paramagnetic relaxation enhancement (PRE) nuclear magnetic resonance (NMR) spectroscopy have enabled direct visualization of transient intermediates in protein–protein (Tang et al. 2006; Volkov et al. 2006; Bashir et al. 2010; Xu et al. 2008) and protein-DNA complexes (Iwahara and Clore 2006) and biomolecular self-association (Tang et al. 2008a, b; Hartl et al. 2010). These and other studies (reviewed in refs. Schreiber et al. 2009; Ubbink 2009) have confirmed a long-held view that formation of a protein complex proceeds via a short-lived encounter state, which enables proteins to undergo reduced-dimensionality search of the optimal binding geometry, thereby accelerating molecular association as compared to 3D diffusion (Adam and Delbruck 1968).

Recent experimental work on weak, transient protein interactions has revealed that the population of the encounter state (defined as the percentage of time spent in this state relative to the total lifetime of the complex) varies over a wide range, spanning predominantly single-orientation systems (Tang et al. 2006; Volkov et al. 2006; Bashir et al. 2010) and highly dynamic, pure encounter complexes (Worrall et al. 2002; Xu et al. 2008). Moreover, it was shown that the amount of the encounter state in a protein–protein complex can be broadly modulated by interfacial point mutations (Volkov et al. 2010), suggesting an intriguing possibility of adjusting the population of the minor species.

PRE is caused by magnetic dipolar interactions between a protein nucleus and unpaired electrons of a paramagnetic probe (Clore 2008; Clore and Iwahara 2009), which can be introduced into the molecular frame by bioconjugation techniques. Due to the large magnetic moment of the unpaired electron and \( \langle r^{ - 6} \rangle \) distance dependence, protein nuclei located close to the paramagnetic center experience very large PREs, so that even lowly populated species can give rise to a measurable effect. This exquisite sensitivity makes PRE NMR spectroscopy a suitable tool for the study of transient intermediates in biomolecular interactions. For protein complexes in the fast exchange regime, the measured PRE is a population-weighted average of the contributions from all protein–protein orientations and, as such, contains the information on both the specific binding form and the encounter state (Clore 2008). In principle, PREs contain both temporal (population) and spatial (distances from the paramagnetic center) information on the minor species, which in favorable cases can be decomposed into separate contributions.

One of the systems studied by PRE NMR spectroscopy is a complex of cytochrome c (Cc) and cytochrome c peroxidase (CcP). Both proteins come from the inter-membrane space of yeast mitochondria, where CcP catalyses the reduction of peroxides using the electrons donated by Cc, an important process mitigating oxidative stress (Chance et al. 1967). In our earlier work we showed that the interaction between Cc and CcP comprises a well-defined Cc–CcP form and a combination of non-specific protein–protein orientations (Volkov et al. 2006). The latter constitute an encounter state with a total population of 30% (Bashir et al. 2010). Here we present an analysis of the recently published, extended PRE dataset (Bashir et al. 2010). First, we describe a simple, general spatiotemporal mapping approach that provides a reliable estimate of the experimental coverage and, at higher coverage levels, allows delineation of the conformational space sampled by the minor species. Further, we use PRE-based ensemble simulations to refine the encounter state and show that encounter ensembles are distributed predominantly in a region encompassing the dominant form of the complex, providing experimental proof for the results of classical theoretical simulations (Northrup et al. 1988). The combination of the methods used here is superior to a low-resolution, geometric analysis employed in our earlier work (Volkov et al. 2006) and offers a detailed visualization of the encounter state.

Materials and methods

Encounter state PREs

The transverse paramagnetic relaxation enhancement, Γ2, is given by the Solomon–Bloembergen equation (Eq. 1; Solomon 1955; Solomon and Bloembergen 1956):

$$ \Upgamma_{2} = \frac{1}{15}\left( {{\frac{{\mu_{0} }}{4\pi }}} \right)^{2} \gamma_{1}^{2} g^{2} \mu_{B}^{2} S(S + 1)r^{ - 6} \left( {4\tau_{c} + {\frac{{3\tau_{c} }}{{1 + \omega_{h}^{2} \tau_{c}^{2} }}}} \right) $$
(1)

where r is the distance between the paramagnetic center and the observed proton, μ0 is the permeability of vacuum, γ1 is the proton gyromagnetic ratio, g is the electron g-factor, μB is the electron Bohr magneton, S is the electron spin number, τc is the rotational correlation time, and ωh is the proton Larmor frequency. The rotational correlation time is defined as \( \tau_{c} = \left( {\tau_{r}^{ - 1} + \tau_{s}^{ - 1} } \right)^{ - 1} \), where \( \tau_{r} \) is the rotational correlation time of the protein complex (equal to 16 ns for Cc–CcP; Volkov et al. 2006) and \( \tau_{s} \) is the effective electron relaxation time. For a nitroxide SL used in this work, \( \tau_{s} \gg \tau_{r} \) so that \( \tau_{c} \approx \tau_{r} \) (Clore and Iwahara 2009).
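As a rough numerical illustration of Eq. 1, the sketch below evaluates Γ2 for a nitroxide (S = 1/2) in Python. The function name `gamma2`, the 600 MHz proton frequency and the free-electron g value are assumptions made for this example and are not taken from this work; only τc = 16 ns comes from the text.

```python
import math

# Physical constants (SI units)
MU0_OVER_4PI = 1.0e-7        # mu_0 / (4 pi), T m / A
GAMMA_H = 2.6752218744e8     # proton gyromagnetic ratio, rad s^-1 T^-1
G_E = 2.0023                 # free-electron g-factor (assumed for the nitroxide)
MU_B = 9.2740100783e-24      # Bohr magneton, J / T


def gamma2(r_m, tau_c=16e-9, larmor_hz=600e6, spin=0.5):
    """Transverse PRE (s^-1) from Eq. 1 for an electron-proton distance r_m in metres.

    tau_c is the correlation time in seconds; larmor_hz is the proton Larmor
    frequency in Hz (600 MHz is an assumed value, not stated in the text).
    """
    omega_h = 2.0 * math.pi * larmor_hz
    prefactor = (1.0 / 15.0) * MU0_OVER_4PI**2 * GAMMA_H**2 * G_E**2 \
        * MU_B**2 * spin * (spin + 1.0)
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)
    return prefactor * r_m**-6 * spectral


if __name__ == "__main__":
    for r_angstrom in (10, 15, 20, 30):
        print(r_angstrom, round(gamma2(r_angstrom * 1e-10), 2))
```

Under these assumptions a proton 15 Å from the paramagnetic center gives a Γ2 of roughly 70 s⁻¹, and doubling the distance reduces the effect 64-fold, which illustrates why nearby, lowly populated orientations can still produce measurable PREs.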

For each Cc backbone amide (i), the observed \( \left( {\Upgamma_{2,i}^{\text{obs}} } \right) \) is the sum of the population-weighted contributions of the specific form \( \left( {\Upgamma_{2,i}^{\text{specific}} } \right) \) and the encounter state \( \left( {\Upgamma_{2,i}^{ *} } \right) \):

$$ \Upgamma_{2,i}^{\text{obs}} = p_{\text{tot}} \Upgamma_{2,i}^{*} + (1 - p_{\text{tot}} )\Upgamma_{2,i}^{\text{specific}} $$
(2)

where \( p_{\text{tot}} \) is the total population of the encounter state, defined as the percentage of time spent in this state relative to the total lifetime of the complex. The \( \Upgamma_{2,i}^{\text{specific}} \) values were back-calculated from the crystal structure of the complex (PDB 2PCC; Pelletier and Kraut 1992) using the prePot module (Iwahara et al. 2004) in Xplor-NIH (Schwieters et al. 2003, 2006). To account for the mobility of the attached SL, the calculated effects were averaged over an ensemble of 150 SL conformers generated by simulated annealing in torsion angle space (Iwahara et al. 2004). For the Cc residues exhibiting no PREs (i.e. \( I_{\text{para}} /I_{\text{dia}} > 0.8 \), where \( I_{\text{para}} \) and \( I_{\text{dia}} \) are peak intensities in the HSQC spectra of the spin-labeled complex and a diamagnetic control, respectively; Bashir et al. 2010), \( \Upgamma_{2,i}^{\text{obs}} \) were set to \( 5\,{\text{s}}^{ - 1} \) and the errors derived from \( I_{\text{para}} /I_{\text{dia}} \) values as reported before (Bashir et al. 2010). Otherwise \( \Upgamma_{2,i}^{\text{obs}} \) and their errors were taken from previous work (Bashir et al. 2010). For each of the 10 SL conjugation sites, the \( \Upgamma_{2,i}^{ *} \) values were obtained from Eq. 2 and used in further analysis.
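Obtaining the encounter-state contribution from Eq. 2 amounts to a one-line rearrangement. A minimal sketch is given below; the function name `encounter_pre` is hypothetical, and the numbers in the usage example are invented for illustration rather than taken from the dataset.

```python
def encounter_pre(gamma_obs, gamma_specific, p_tot=0.3):
    """Solve Eq. 2 for the encounter-state PRE of one residue.

    gamma_obs      -- observed PRE (s^-1)
    gamma_specific -- PRE back-calculated for the specific form (s^-1)
    p_tot          -- total population of the encounter state (0 < p_tot <= 1)
    """
    if not 0.0 < p_tot <= 1.0:
        raise ValueError("p_tot must lie in (0, 1]")
    return (gamma_obs - (1.0 - p_tot) * gamma_specific) / p_tot


if __name__ == "__main__":
    # Invented values: 19 s^-1 observed, 10 s^-1 expected from the specific form.
    print(encounter_pre(19.0, 10.0, p_tot=0.3))
```

Note that this rearrangement can return negative values when the specific form alone over-predicts the observed PRE; how such cases were treated is not described here, so the sketch leaves them untouched.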

Generating the conformational space grid

All molecular simulations were performed in Xplor-NIH (Schwieters et al. 2003, 2006). The coordinates of Cc–CcP complex were taken from the X-ray structure (PDB 2PCC; Pelletier and Kraut 1992) and oriented such that centers of mass (CMs) of CcP and Cc appeared at the origin of the coordinate system and on the positive z axis, respectively. The position of CcP was fixed, while Cc molecule was systematically rotated around x and z axes, corresponding to θ and φ rotations around CcP in the spherical coordinate space (Fig. 1b). The rotation increments δθ and δφ determine the desired spatial resolution, which in our case was set to 1 Å separation between neighboring Cc CMs. To emulate the rotational freedom, Cc was rotated around orthogonal χ, ψ, ξ axes originating at its CM (Fig. 1b). By systematically varying the rotational coordinates χ, ψ, ξ (0 ≤ χ ≤ 2π, 0 ≤ ψ ≤ 2π, 0 ≤ ξ < π) in the increments of δχ = δψ = δξ = π/3, a set of 108 non-redundant Cc rotamers was produced at each (θ, φ) position. For every (θ, φ, χ, ψ, ξ) combination, the intermolecular van der Waals (vdW) energy term was calculated, with vdW potential set to zero for protein sidechain atoms extending beyond Cβ. Cc was then translated along the vector joining Cc and CcP CMs in steps of 1 Å until the vdW energies reached the values between zero and a chosen cut-off, thus either relieving intermolecular steric clashes or bringing together separated molecules in a rigid-body mimic of a protein complex. The distance between protein CMs at each (θ, φ, χ, ψ, ξ) defines the other translational coordinate, r. In this way, we explored the entire conformational space available to the interacting proteins (0 ≤ θ ≤ π, 0 ≤ φ < 2π), sampling 12,205 (θ, φ) positions and producing a total of 1,318,140 Cc–CcP orientations at varying (θ, φ, r, χ, ψ, ξ).
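The orientation counts quoted above can be checked with a short enumeration. The sketch below assumes half-open ranges for χ and ψ (six values each, from 0 to 5π/3) and three values for ξ, which is one reading that yields the stated 108 non-redundant rotamers; the variable names are illustrative only.

```python
import itertools
import math

STEP = math.pi / 3  # rotation increment for chi, psi and xi

# Assumed sampling: chi and psi take 6 values in [0, 2*pi), xi takes 3 values in [0, pi)
chi_values = [k * STEP for k in range(6)]
psi_values = [k * STEP for k in range(6)]
xi_values = [k * STEP for k in range(3)]

# Every (chi, psi, xi) combination is one Cc rotamer at a given (theta, phi) position
rotamers = list(itertools.product(chi_values, psi_values, xi_values))

N_POSITIONS = 12205  # number of (theta, phi) grid positions reported in the text
total_orientations = N_POSITIONS * len(rotamers)

if __name__ == "__main__":
    print(len(rotamers), total_orientations)
```

With these assumptions the enumeration gives 108 rotamers per grid position and 12,205 × 108 = 1,318,140 orientations in total, matching the numbers in the text.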

Fig. 1

Specific form of Cc–CcP complex. a Crystallographic Cc–CcP binding orientation (Pelletier and Kraut 1992). Cc and CcP are in grey and yellow, with heme groups in sticks. Cα atoms of CcP residues used for spin-labeling are shown as spheres, colored according to whether the attached SL exhibits intermolecular PREs (red) or not (blue). One SL position (K97, blue) is not seen in this view. b Definition of the spherical coordinates used in this work. Proteins’ centers of mass are shown as grey spheres, with CcP at the origin of the coordinate system. Cc possesses three translational (θ, φ, r) and three rotational (χ, ψ, ξ) degrees of freedom. For the definition of the rotational axes see “Materials and methods”. All protein representations in this work are visualized with PyMOL (DeLano 2002)

Mapping the encounter state

For each of 1,318,140 (θ, φ, r, χ, ψ, ξ) orientations, the expected PREs (Γ2,i ) were back-calculated as described above, and the maximal population (p max) at which no violations of the experimental \( \Upgamma_{2,i}^{ *} \) restraints occurred was obtained (Eq. 3):

$$ p_{\max } = \left\{ {\begin{array}{ll} {\mathop {\min }\limits_{i} \left( {\Upgamma_{2,i}^{*} /\Upgamma_{2,i} } \right),} & {\mathop {\min }\limits_{i} \left( {\Upgamma_{2,i}^{*} /\Upgamma_{2,i} } \right) < p_{\text{tot}} } \\ {p_{\text{tot}} ,} & {\mathop {\min }\limits_{i} \left( {\Upgamma_{2,i}^{*} /\Upgamma_{2,i} } \right) \ge p_{\text{tot}} } \\ \end{array} } \right. $$
(3)

To visualize the results, the largest p max of 108 χ, ψ, ξ Cc rotamers at each (θ, φ) position [p max(θ, φ), Eq. 4] was noted, and p max(θ, φ) values were color-coded onto the interaction grid isosurface θ, φ, r (χ,ψ,ξ=0), thus producing the spatiotemporal map shown in Fig. 2b.

Fig. 2

Spatiotemporal analysis of Cc–CcP encounter state. a Interaction grid isosurface θ, φ, r (χ,ψ,ξ=0) consisting of 12,205 Cc CMs (blue dots). b Isosurface in (a) coloured according to p max(θ, φ) (Eq. 4), ranging from 0 (blue) to 0.3 (red). The white curve limits an area around the dominant form of the complex that contains Cc orientations contributing ≥5 Hz to \( \Upgamma_{2}^{*} \). CcP (grey surface) and Cc (cartoon) in the top panels are in the same orientation as in Fig. 1a. The middle and bottom views are obtained by 180° rotation of the top-panel representations around, respectively, z and y axes. For each SL the oxygen atoms of 150 conformers used for ensemble averaging are space filled and colored red and blue to indicate, respectively, the presence and absence of the measured paramagnetic effects. The cyan sphere shows Cc CM in the dominant complex

$$ p_{ \max } (\theta ,\varphi ) = \mathop {\max }\limits_{\chi ,\psi ,\xi } [p_{\max } (\chi ,\psi ,\xi )_{{\theta ,\varphi = {\text{const}}}}] $$
(4)
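Equations 3 and 4 can be written compactly as a capped minimum followed by a maximum over rotamers. The sketch below is a plain-Python illustration with hypothetical function names; it is not the script provided in the Supplementary Material. Skipping residues with a zero back-calculated PRE is an assumption made here to avoid division by zero, and experimental errors on the restraints are ignored.

```python
def p_max_orientation(gamma_star, gamma_calc, p_tot=0.3):
    """Eq. 3: largest population at which one orientation violates no restraint.

    gamma_star -- experimental encounter PREs, one per residue (s^-1)
    gamma_calc -- PREs back-calculated for this orientation, same order (s^-1)
    """
    # A residue with no calculated PRE cannot be violated, so it is skipped (assumption).
    ratios = [gs / gc for gs, gc in zip(gamma_star, gamma_calc) if gc > 0.0]
    if not ratios:
        return p_tot
    return min(p_tot, min(ratios))


def p_max_position(rotamer_p_max):
    """Eq. 4: largest p_max among all rotamers sharing one (theta, phi) position."""
    return max(rotamer_p_max)


if __name__ == "__main__":
    # Invented PREs for two residues and two rotamers at a single grid position.
    restraints = [5.0, 10.0]
    rotamer_a = p_max_orientation(restraints, [50.0, 20.0])
    rotamer_b = p_max_orientation(restraints, [10.0, 20.0])
    print(rotamer_a, rotamer_b, p_max_position([rotamer_a, rotamer_b]))
```

In this invented example the first rotamer is limited by the first residue, while the second rotamer is capped at the total encounter population, so the grid position would be colored according to the second value.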

To delineate the area containing protein–protein orientations contributing to \( \Upgamma_{2}^{*} \) (restricted by the white curve in Fig. 2b), we defined a set of encounter PRE restraints for Cc residues that exhibit violations of \( \Upgamma_{2,i}^{\text{obs}} \) in the specific Cc–CcP complex (i.e. highlighted areas in Fig. 4c) with \( \left( {\Upgamma_{2,i}^{\text{obs}} - \delta \Upgamma_{2,i}^{\text{obs}} } \right) - \Upgamma_{2,i}^{\text{specific}} \, > \,5\,{\text{s}}^{ - 1} \), where \( \delta \Upgamma_{2,i}^{\text{obs}} \) is the error of \( \Upgamma_{2,i}^{\text{obs}} \), and selected all Cc molecules that contribute at least 5 Hz to these encounter restraints at a given p. The scripts for the encounter state mapping are provided in Supplementary Material.

Ensemble refinement against intermolecular PREs

Using the \( \Upgamma_{2}^{*} \) dataset obtained from all 10 SL conjugation sites (see above), the rigid-body simulated annealing refinement of the Cc–CcP encounter state was carried out in Xplor-NIH (Schwieters et al. 2003, 2006) following the published procedure (Tang et al. 2006). Briefly, the position of CcP was fixed, and multiple copies of Cc molecules, representing ensembles with N = 1–20, were docked to minimize the energy function consisting of the PRE target term, vdW repulsion term to prevent atomic overlap between Cc and CcP, and a weak radius-of-gyration restraint used to encourage intermolecular Cc–CcP contacts (Tang et al. 2006). Note that this procedure allows for the atomic overlap among Cc molecules constituting an ensemble. As a rule, 100 independent refinement runs were performed.

To assess the agreement between the observed PREs and the PREs back-calculated from Cc ensembles generated in each run, we calculated a Q factor (Eq. 5):

$$ Q = \sqrt {{{\sum\limits_{j} {\sum\limits_{i} {\left( {\Upgamma_{2,ij}^{\text{obs}} - \Upgamma_{2,ij}^{\text{calc}} } \right)^{2} } } } \mathord{\left/ {\vphantom {{\sum\limits_{j} {\sum\limits_{i} {\left( {\Upgamma_{2,ij}^{\text{obs}} - \Upgamma_{2,ij}^{\text{calc}} } \right)^{2} } } } {\sum\limits_{j} {\sum\limits_{i} {\left( {\Upgamma_{2,ij}^{\text{obs}} } \right)^{2} } } }}} \right. \kern-\nulldelimiterspace} {\sum\limits_{j} {\sum\limits_{i} {\left( {\Upgamma_{2,ij}^{\text{obs}} } \right)^{2} } } }}} $$
(5)

where j = 1–3 runs over the three SL positions showing paramagnetic effects (N38C, N200C and T288C; Fig. 1a) and \( \Upgamma_{2,ij}^{\text{calc}} \) is given by Eq. 6:

$$ \Upgamma_{2,ij}^{\text{calc}} = {\frac{{p_{\text{tot}} }}{N}}\sum\limits_{k = 1}^{N} {\Upgamma_{2,ijk}^{*} } + (1 - p_{\text{tot}} )\Upgamma_{2,ij}^{\text{specific}} $$
(6)

where p tot is the total population of the encounter state, N is the size of the encounter ensemble, \( {\Upgamma_{2,ijk}^{*} } \) is the PRE from SL (j) back-calculated for the residue (i) of the Cc ensemble member (k), and \( \Upgamma_{2,ij}^{\text{specific}} \) is the PRE back-calculated from SL (j) for the residue (i) of Cc in the dominant form of the complex. The reported Q e is the average Q factor obtained from the ensembles generated in repeated refinement runs, while Q ee is the ‘ensemble of ensembles average’ (Tang et al. 2006) calculated by using the average \( \Upgamma_{2,ij}^{\text{calc}} \) computed from all n ensembles (Eq. 7):

$$ \Upgamma_{2,ij}^{\text{calc}} = {\frac{{p_{\text{tot}} }}{nN}}\sum\limits_{m = 1}^{n} {\sum\limits_{k = 1}^{N} {\Upgamma_{2,ijkm}^{*} } } + (1 - p_{\text{tot}} )\Upgamma_{2,ij}^{\text{specific}} $$
(7)
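To make the bookkeeping in Eqs. 5 and 6 concrete, a minimal Python sketch follows. The function names `calc_pre` and `q_factor` are hypothetical, the observed and calculated PREs are assumed to be flattened over all residue (i) and SL (j) pairs, and the numbers in the usage example are invented.

```python
import math


def calc_pre(encounter_pres, gamma_specific, p_tot=0.3):
    """Eq. 6 for one residue/SL pair.

    encounter_pres -- PREs back-calculated for each of the N ensemble members (s^-1)
    gamma_specific -- PRE back-calculated for the dominant form (s^-1)
    """
    n_members = len(encounter_pres)
    return p_tot / n_members * sum(encounter_pres) + (1.0 - p_tot) * gamma_specific


def q_factor(gamma_obs, gamma_calc):
    """Eq. 5: normalized root-sum-square deviation between observed and calculated PREs."""
    numerator = sum((o - c) ** 2 for o, c in zip(gamma_obs, gamma_calc))
    denominator = sum(o**2 for o in gamma_obs)
    return math.sqrt(numerator / denominator)


if __name__ == "__main__":
    # Invented data: two residues, an ensemble of N = 2 members.
    observed = [19.0, 8.0]
    calculated = [calc_pre([40.0, 40.0], 10.0), calc_pre([0.0, 20.0], 5.0)]
    print(calculated, q_factor(observed, calculated))
```

Averaging the member PREs over all n ensembles before calling `q_factor`, rather than averaging the per-ensemble Q values, would correspond to the Q ee quantity defined by Eq. 7.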

Results

Mapping the encounter state

To map out the conformational space occupied by the Cc–CcP encounter state, we have analyzed the intermolecular paramagnetic effects exerted on Cc nuclei by an unpaired electron of a nitroxide spin-label (SL) placed at ten different positions, one at a time, on the surface of CcP (Fig. 1a). As reported before (Volkov et al. 2006; Bashir et al. 2010), three SLs located close to the crystallographic binding site (N38C, N200C and T288C, shown as red spheres in Fig. 1a) give rise to PREs, while SLs attached to any of the other seven positions (blue spheres in Fig. 1a) show no effects. Most of the observed PREs arise from the dominant form of Cc–CcP complex; however, several Cc regions experience additional paramagnetic effects (highlighted in Fig. 4c), originating from protein–protein orientations constituting the encounter state (Volkov et al. 2006). By subtracting the effects of the dominant orientation from the observed PREs (Eq. 2 in “Materials and methods”), we obtained a set of the encounter state’s PRE contributions \( \left( {\Upgamma_{2,i}^{ *} } \right) \). This dataset, together with the information from the SLs exhibiting no measurable effects, was used in the subsequent analysis.

First, using a rigid-body sampling procedure (see “Materials and methods”), we generated a grid of Cc–CcP orientations, corresponding to the entire conformational space available to the interacting proteins (Fig. 2a). Note that each dot in Fig. 2a represents the centre of mass (CM) of Cc orientations with the same (θ, φ) coordinates, produced by non-redundant rotations around χ, ψ, ξ axes (see Fig. 1b for axes definition and “Materials and methods” for details). Second, for each of the generated orientations, we back-calculated the expected PREs \( \left( {\Upgamma_{2,i} } \right) \) and obtained the maximal population (p max) at which no violations of the experimental \( \Upgamma_{2,i}^{ *} \) restraints occurred (Eq. 3). Finally, for each grid point in Fig. 2a, the largest p max of χ, ψ, ξ Cc rotamers [p max(θ, φ), Eq. 4] was noted, and all p max(θ, φ) values were color-coded onto the interaction isosurface, providing a spatial (location) and temporal (population) map of the encounter state distribution (Fig. 2b and Supplementary Movie S1). The spatiotemporal map in Fig. 2b delineates the extent of the conformational space accessible to Cc–CcP orientations with populations p. In other words, the map shows regions of space where all solutions for a given p are to be found. This all-inclusiveness of p solutions is a salient feature of the spatiotemporal encounter maps and an important achievement afforded by the present approach.

However, there are several drawbacks associated with the current analysis. First, such a ‘zero-resolution’ approach provides no molecular-level details on protein–protein geometries constituting the encounter state. Second, the introduced spatiotemporal maps outline the areas that can be, but are not necessarily, populated in the encounter state. Thus, to pin down the actual region occupied by an encounter ensemble, adequate experimental coverage of the entire conformational space is essential. To illustrate this point, imagine that no experimental PRE data on the Cc–CcP complex have yet been collected. Following our reasoning, the entire interaction surface in Fig. 2b can be painted red, with p max = p tot for all grid points. In other words, with no a priori assumptions, encounter ensemble members can be located anywhere in the conformational space, and their populations range from zero to the total encounter population, p tot. To continue our thought experiment, imagine that the first PRE dataset has been collected and, for simplicity’s sake, the introduced SL exhibited no paramagnetic effect. This would allow us to color an area next to the SL in blue, indicating that only protein–protein orientations with very low populations, if any, can be found there, thereby restricting the effective conformational space available to the encounter. Addition of more experimental data from SLs placed at other surface positions would restrict the red area even further, bringing us closer to the actual region encompassed by the encounter state. Note that at this stage the SLs exhibiting no effects are as valuable as those showing PREs, because they allow for large portions of no-go space to be carved out. Ultimately, with adequate experimental coverage, we end up with a warm-color area found only around the SLs showing \( \Upgamma_{2}^{*} \) effects, which indicates the true location of the encounter space.

Going back to Fig. 2b, we notice that a large part of the isosurface is composed of warm-color grid points. Most of these correspond to Cc molecules that contribute to the experimental \( \Upgamma_{2}^{*} \) restraints (the region above the white curve in Fig. 2b), thus defining the extent of the encounter space at the current level of experimental coverage. However, many warm-color points lie outside this area, which indicates incomplete PRE sampling. To assess the experimental coverage of the Cc–CcP encounter state, we mapped out the regions containing Cc molecules contributing to (red) or violating (blue) the experimental \( \Upgamma_{2}^{*} \) restraints at different p values (Fig. 3a, b and Supplementary Fig. S1). In these views, the conformational space not covered by the effects from the introduced SLs is shown in white. As expected, the white areas increase with decreasing p values, implying that progressively more experimental input is required to track down more lowly populated species.

Fig. 3

Experimental coverage of the conformational space of Cc–CcP encounter state. Cc CMs for the orientations that, respectively, violate experimental PREs (blue) or contribute ≥5 Hz to \( \Upgamma_{2}^{*} \) (red) at a p = 0.01 and b p = 0.1. Combination of the blue and red areas defines the total conformational space covered by the effects from the introduced SLs. The same isosurface as in Fig. 2a, b is shown. See the legend to Fig. 2 for more details. c Plot of experimental coverage, or surface area of CcP covered by PREs from a single SL, versus the population of the minor species

Integration over red and blue areas in Fig. 3a, b provides a simple means of quantifying the extent of the experimental coverage. In our case, there is a good, log-scale correlation between p and the calculated coverage (Fig. 3c). Thus, about one half of the conformational space is probed by PREs at p = 0.01, increasing to nearly 80% at p = 0.1 (Table 1, cf. Fig. 3a, b). We estimate that 13–20 SLs per CcP (corresponding to one SL per 190–300 Å² of the total surface area) are required to provide adequate coverage at p = 0.1–0.01 (Table 1).
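The quoted range of 13–20 SLs is consistent with a simple proportional estimate, sketched below. This is only a plausibility check under the assumption that every SL covers an equal, non-overlapping share of the conformational space; whether the published estimate was derived this way is not stated, and the coverage fractions are rounded from the text ("about one half", "nearly 80%") rather than taken from Table 1. The function name `labels_needed` is hypothetical.

```python
import math


def labels_needed(n_labels_used, coverage_fraction):
    """Number of SLs needed for full coverage, assuming coverage scales linearly with SL count."""
    return math.ceil(n_labels_used / coverage_fraction)


if __name__ == "__main__":
    N_LABELS = 10  # SL conjugation sites in the present dataset
    for p, coverage in ((0.01, 0.5), (0.1, 0.8)):
        print(p, labels_needed(N_LABELS, coverage))
```

With 10 SLs in the current dataset, this gives 20 labels at p = 0.01 and 13 labels at p = 0.1.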

Table 1 Experimental coverage of the conformational space

Ensemble refinement of the encounter state

Direct use of \( \Upgamma_{2}^{*} \) restraints in an ensemble-based, rigid-body simulated annealing structure calculation protocol—pioneered by Clore and co-workers (Iwahara et al. 2004; Schwieters et al. 2006; Tang et al. 2006)—provides an alternative, potentially more informative, means of refining the encounter state. In this approach, multiple copies of Cc are docked simultaneously to CcP by minimizing the difference between the combination of PREs from all Cc molecules and the experimental \( \Upgamma_{2}^{*} \) values (see “Materials and methods” for details).

We performed multiple structure calculations with ensemble sizes (N) varying from 1 to 20 and the encounter state population p tot = 0.3, determined in our earlier work (Bashir et al. 2010, also see below). To assess the quality of solutions, we calculated a Q factor (Iwahara et al. 2003, 2004), which is a measure of agreement with the experimental data (the smaller the Q factor, the better the agreement; Eq. 5). In Fig. 4a, Q e (an average Q factor of the individual ensembles) and Q ee (a Q factor calculated by averaging PREs of Cc molecules in all ensembles; Tang et al. 2006) are plotted as a function of N. The Q factors diminish with the increasing ensemble size, leveling off at N = 10–20. As can be seen from Fig. 4a, Q ee is systematically smaller than Q e, which is due to the stochastic rather than unique combination of protein–protein orientations within each ensemble, such that averaging over all ensembles leads to a better agreement with the data (Tang et al. 2006). By randomly omitting 10% of \( \Upgamma_{2}^{*} \) restraints and verifying how well these ‘free’ PREs are predicted by the remaining, ‘working’ data set (i.e. 90% included in the refinement), we performed a complete cross-validation (Brünger et al. 1993), with Q free as a measure of the fit. The calculated Q free values (Fig. 4a) indicate that N = 10–20 is the optimal size of the Cc ensemble required to satisfy the experimental restraints and that the improvement in the Q factors is not due to over-fitting (Tang et al. 2006).
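The partitioning of restraints for cross-validation can be sketched as follows. The example assumes that "complete" cross-validation means every restraint is left out exactly once across ten disjoint 10% subsets; the function name, the fixed random seed and the restraint count in the usage example are illustrative, and the refinement itself (performed in Xplor-NIH) is not reproduced.

```python
import random


def cross_validation_folds(n_restraints, n_folds=10, seed=0):
    """Split restraint indices into (working, free) pairs with disjoint free sets.

    Each restraint index appears in exactly one free set, so repeating the
    refinement once per fold leaves every restraint out exactly once.
    """
    indices = list(range(n_restraints))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list yields n_folds disjoint subsets.
    free_sets = [indices[k::n_folds] for k in range(n_folds)]
    all_indices = set(indices)
    return [(sorted(all_indices - set(free)), sorted(free)) for free in free_sets]


if __name__ == "__main__":
    for working, free in cross_validation_folds(50):
        # Each fold would be refined against `working`; Q free is computed on `free`.
        print(len(working), len(free))
```

For each fold, the ensemble would be refined against the working set only, and Q free evaluated with Eq. 5 over the omitted restraints.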

Fig. 4

PRE-based ensemble simulations of the Cc–CcP encounter state. a Intermolecular Q factors: Q e (black), Q ee (red), and Q free (blue). See text for the definitions. b Correlation between the observed \( \left( {\Upgamma_{2}^{\text{obs}} } \right) \) and calculated \( \left( {\Upgamma_{2}^{\text{calc}} } \right) \) PREs for the dominant form of the complex alone (N = 0, top) or in combination with the simulated encounter ensemble (N = 10, bottom). c Observed and calculated PREs for Cc–CcP-SL complexes, with SLs attached at position N38C (top), N200C (middle) and T288C (bottom). Experimental \( \Upgamma_{2}^{\text{obs}} \) (black; Volkov et al. 2006; Bashir et al. 2010), \( \Upgamma_{2}^{\text{calc}} \) for the specific orientation (blue), and \( \Upgamma_{2}^{\text{calc}} \) for the combination of the specific form and an encounter ensemble (N = 10, red). Crosses indicate the value of \( \Upgamma_{2} \ge 125\,{\text{s}}^{ - 1} \) for the calculated PREs or identify the residues whose resonances disappear in the paramagnetic spectrum. The errors are standard deviations

As can be seen from \( \Upgamma_{2}^{\text{obs}} \) vs. \( \Upgamma_{2}^{\text{calc}} \) plots (Fig. 4b) and PRE profiles (Fig. 4c), a combination of PREs from the refined encounter ensemble and the dominant, crystallographic Cc–CcP orientation provides a good agreement with the experimental data. Clearly, most of the encounter PRE restraints are now satisfied (highlighted regions in Fig. 4c). To visualize the distribution of Cc molecules in the encounter state, we use a reweighted atomic probability density map (Schwieters and Clore 2002), calculated from 100 independently generated ensembles with N = 10 (Fig. 5a). Most of the minor species are found in an area surrounding the dominant form of the complex, and a small, low-density patch of solutions is located at the back of CcP (see below).

Fig. 5

Spatial distribution of the encounter ensembles. a Reweighted atomic probability density maps for the overall distribution of Cc molecules obtained from 100 PRE-based ensemble calculations (N = 10, plotted at a threshold of 20% maximum). In the bottom view, SL atoms are removed for clarity. b Overlay of the Cc–CcP interaction isosurface coloured according to the experimental PREs at p = 0.03 (see the legend to Fig. 3 for details) and CMs of Cc molecules from 100 PRE-based ensemble simulations (N = 10, green spheres)

Note that the atomic probability density maps in Fig. 5a are derived from all Cc atoms, while spatiotemporal maps in Figs. 2, 3 show only the CMs. Thus, to enable a direct comparison of the two representations, we plotted the CMs of Cc molecules from 100 generated ensembles (N = 10) together with the interaction grid isosurface contoured at p = p tot/N = 0.03 (Fig. 5b and Supplementary Movie S2). With a few exceptions, Cc CMs are located in the allowed regions of the encounter maps, indicating a good agreement between the two methods. It should be noted that, unlike in the spatiotemporal mapping approach, small violations are tolerated in the simulated annealing ensemble refinement: a slight violation of one restraint, accompanied by concomitant satisfaction of several others, can provide a better agreement with the experimental data than a good solution for the same restraint coming at a price of multiple bad solutions for others.

Despite making no individual contributions to the observed PREs, Cc molecules found in the white regions of the encounter maps nevertheless influence the ensemble-averaged \( \Upgamma_{2}^{*} \) values obtained in the refinement procedure. The presence of such non-contributing solutions (e.g. in a low-density region at the back of CcP, Fig. 5a) could signify: (1) excessive ensemble size, (2) incorrect population of the encounter state used in the calculations, or (3) insufficient experimental coverage of the conformational space. In our case, the first of these possible causes can be dismissed as decreasing the ensemble size from N = 10 to N = 5 to N = 3 does not completely eliminate the non-contributing solutions and steadily increases the Q factor (Fig. 4a). Moreover, the complete cross-validation of \( \Upgamma_{2}^{*} \) dataset ruled out a possible over-fitting at higher N values (see above). To test the second possibility, we repeated calculations at different p tot for ensembles with N = 5 and N = 10 (Supplementary Fig. S2). In both cases, the Q factors fall sharply from p = 0 to p = 0.3 and then level off at p = 0.3–0.5, confirming that the value of p tot = 0.3, determined in a recent study (Bashir et al. 2010) and used throughout this work, is correct. Finally, to explore the third option, we performed control runs in which the number of Cc molecules in the ensemble was varied from N = 5 to N = 9 but their individual populations kept constant at p i = 0.03, so that Σ i p i < p tot. In this way, we assessed whether a subset of binding geometries with the combined population of 0.15 ≤ Σ i p i ≤ 0.27 can account for \( \Upgamma_{2}^{*} \) effects of the entire encounter state (p tot = 0.3). In the control runs, decrease in Σ i p i is accompanied by only a small increase in Q factors (Supplementary Fig. S3), and the overall distribution of encounter ensembles remains essentially the same as that shown in Fig. 5a, except that the low-density patch at the back of CcP is steadily reduced with decreasing Σ i p i (e.g. compare the views for Σ i p i = 0.21 in Supplementary Fig. S4 and Σ i p i = p tot = 0.3 in Fig. 5a). These results indicate that the experimental \( \Upgamma_{2}^{*} \) values can be accounted for by a limited subset of protein–protein orientations, suggesting that Cc ensemble members found in the white regions of the encounter maps might represent a minor sub-population of the encounter state, not reported upon by the SLs introduced so far.
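The population bookkeeping behind the control runs is simple arithmetic, reproduced here as a short check. The per-member population is taken as p tot/N for the reference ensemble with N = 10, as stated in the text; the variable names are illustrative.

```python
P_TOT = 0.3                # total encounter-state population
P_MEMBER = P_TOT / 10      # population per ensemble member in the N = 10 reference runs

# Combined population of the truncated ensembles used in the control runs (N = 5 to 9)
combined_population = {n: round(n * P_MEMBER, 2) for n in range(5, 10)}

if __name__ == "__main__":
    for n_members, total in combined_population.items():
        print(n_members, total)
```

The combined populations run from 0.15 (N = 5) to 0.27 (N = 9), with 0.21 corresponding to N = 7.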

Discussion

Experimental description of the encounter state

The spatiotemporal mapping approach presented here is superior to a simple, geometric analysis of the encounter state employed in our earlier work (Bashir et al. 2010; Volkov et al. 2006) in that it uses protein structures and realistic van der Waals potentials, rather than spheres and uniform cut-off values, to sample the conformational space; relies on explicit \( \Upgamma_{2}^{*} \) data, instead of uniform estimates, for calculation of allowed p values; and utilizes extensive ensemble-averaging of the PRE effects over multiple SL conformers, thus accounting for the mobility of the attached paramagnetic probes. When applied to an extended experimental dataset spanning 10 SL positions, these methodological advances result in a more informative and detailed encounter map compared to our earlier, roughly shaped “clouds” drawn from the effects of 5 SLs (Volkov et al. 2006).

The main advantage offered by the encounter maps is that they include all possible spatial solutions for Q → 0 at Σ i p i = p tot. However, this comes at a price of providing no molecular-level details on the protein–protein orientations constituting the encounter state. To overcome the ‘zero-resolution’ limitation of the mapping, the encounter space was further refined by restrained ensemble simulations, affording a more detailed description of the minor species. The generated solutions reproduce the experimental data well (Fig. 4c); however, the Q factor (Q ee = 0.32 for N = 10–20) is slightly higher than those obtained in PRE NMR studies of other biomolecular interactions (Iwahara and Clore 2006; Tang et al. 2006). This can be attributed to large errors on the experimental \( \Upgamma_{2}^{\text{obs}} \) values (Volkov et al. 2006; Bashir et al. 2010), obtained from intensity analysis of HSQC spectra (Battiste and Wagner 2000). In principle, longer spectral acquisition or the use of a two-point \( \Upgamma_{2} \) measurement scheme (Iwahara et al. 2007; Clore and Iwahara 2009) could increase both accuracy and precision of the data. In practice, however, the instability of the Cc–CcP complex (caused by autoreduction of Cc (Young and Caughey 1987) occurring on a time scale of several hours; A.N.V., M.U. unpublished observations) severely restricts the effective experimental time, precluding the use of two-point \( \Upgamma_{2} \) measurements. Still, despite the practical limitations inherent in our system, the PRE NMR analysis provides a meaningful picture of the Cc–CcP encounter state.

There is a certain overlap between the concepts of the two approaches used here to analyze the encounter state. For instance, the number of Cc molecules included in the ensemble simulations could be thought of as defining the size of a brush used to paint the encounter space map, imbuing it with spatial resolution. Concerning the temporal resolution, although all ensemble members are uniformly populated at \( p = p_{\text{tot}}/N \), allowing for the overlap of Cc molecules during the simulations effectively reproduces the non-uniform populations captured in the encounter maps. The major conceptual difference between these methods is that the encounter mapping is inherently negative (or exclusive, i.e. it relies on carving out the regions of space that cannot be populated at a given p), while ensemble simulations are essentially positive (or inclusive, i.e. they find the solutions that satisfy given restraints). As a result, the former benefits from SLs exhibiting no PRE effects and is sensitive to the extent of the experimental coverage, while the latter relies on the observed PREs and is more tolerant of incomplete experimental sampling.
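The equivalence between overlapping, uniformly populated ensemble members and non-uniform populations can be illustrated with a short sketch. It assumes a two-state, fast-exchange picture in which the dominant form carries the population 1 − p_tot and each of the N ensemble members carries p_tot/N; the function name and the PRE values are hypothetical.

```python
def ensemble_pre(gamma_dominant, gamma_members, p_tot):
    """Population-weighted PRE for one observable in fast exchange.

    Assumption: the dominant form has population (1 - p_tot) and each of
    the N encounter-ensemble members has the same population p_tot / N.
    """
    n = len(gamma_members)
    if n == 0:
        raise ValueError("the ensemble must contain at least one member")
    if not 0.0 <= p_tot <= 1.0:
        raise ValueError("p_tot must lie between 0 and 1")
    p_each = p_tot / n
    return (1.0 - p_tot) * gamma_dominant + p_each * sum(gamma_members)

# Hypothetical PREs (s^-1): two members overlap in orientation A (90 s^-1),
# one member occupies orientation B (30 s^-1).
uniform = ensemble_pre(0.0, [90.0, 90.0, 30.0], p_tot=0.3)

# The same result written with explicit non-uniform populations:
# orientation A at 0.2, orientation B at 0.1.
non_uniform = 0.7 * 0.0 + 0.2 * 90.0 + 0.1 * 30.0
print(uniform, non_uniform)
```

Two overlapping members thus act as a single orientation with twice the population, which is how a uniformly weighted ensemble can mimic the uneven populations of the encounter map.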

The narrower distribution of Cc CMs in the N = 10 ensembles, compared to the red area of the encounter map (Fig. 5b), indicates that only a limited subset of allowed solutions has been found in the ensemble simulations. This can be due to incomplete experimental coverage of the encounter maps or insufficient sampling during the refinement procedure. The former can be improved by introducing more SLs to further restrict the encounter space, while the latter may be remedied by a more aggressive search. Alternatively, to tease out encounter ensembles directly from the spatiotemporal maps, one could sample multiple combinations of allowed orientations in search of the ones reproducing the experimental \( \Upgamma_{2}^{*} \) data, using a suitable algorithm (e.g. a metaheuristic search; A.N.V. work in progress).

We would like to stress that the spatiotemporal map presents \( p_{\text{max}} \) values for individual Cc–CcP orientations, some of which could populate the encounter state (enclosed by the white curve in Fig. 2b). Reconstitution of the analogous map for the entire encounter state is a non-trivial, multivariate problem. For instance, it is conceivable that a combination of protein–protein orientations, each of which is allowed individually at a given p, will collectively yield a prohibitively high \( \Upgamma_{2}^{*} \) value, violating the experimental PRE. Thus, a good sampling of multiple combinations of allowed orientations would be required to glean the total encounter state map. One way to approach this problem is offered by the metaheuristic search mentioned above, which is currently under investigation in our laboratory.
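The distinction between individually allowed orientations and an allowed combination can be sketched as follows. The sketch assumes that each observable has an upper PRE limit and that an orientation populated at p contributes p times its intrinsic PRE; treating the experimental data as simple upper limits, as well as all names and numbers below, are assumptions made for illustration rather than the procedure used in this work.

```python
def p_max(gamma_orient, gamma_limit):
    """Largest population of one orientation that keeps p * PRE within
    the limit for every observable (capped at 1)."""
    ratios = [lim / g for g, lim in zip(gamma_orient, gamma_limit) if g > 0]
    return min([1.0] + ratios)

def violates(populations, gamma_orients, gamma_limit):
    """True if the population-weighted sum over orientations exceeds the
    limit for at least one observable."""
    for j, lim in enumerate(gamma_limit):
        total = sum(p * g[j] for p, g in zip(populations, gamma_orients))
        if total > lim:
            return True
    return False

# Hypothetical upper limits and intrinsic PREs (s^-1) for two observables.
limit = [5.0, 5.0]
orient_a = [40.0, 10.0]
orient_b = [30.0, 20.0]

print(p_max(orient_a, limit), p_max(orient_b, limit))
# Each orientation alone is allowed at p = 0.1 ...
print(violates([0.1], [orient_a], limit), violates([0.1], [orient_b], limit))
# ... but the two together exceed the limit for the first observable.
print(violates([0.1, 0.1], [orient_a, orient_b], limit))
```

In this toy case both orientations pass the individual test at p = 0.1, yet their summed contribution to the first observable exceeds its limit, which is why combinations rather than single orientations have to be sampled.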

In its use of \( p_{\text{max}} \), the maximal population of a particular orientation compatible with the experiment, the present approach is akin to the method of maximum allowed probabilities, developed to characterize flexible, partially independent protein domains from residual dipolar couplings and pseudocontact shifts (Gardner et al. 2005; Longinetti et al. 2006; Bertini et al. 2007) and recently extended to small-angle X-ray scattering data (Bertini et al. 2010). Here we show that a similar idea can be successfully applied to the characterization of protein–protein interactions by PRE NMR spectroscopy.

Comparison with theoretical simulations

The interaction between the oppositely charged Cc and CcP has been studied previously by theoretical simulations employing Poisson–Boltzmann electrostatic potentials (Fig. 6a, c; Northrup et al. 1988; Gabdoulline and Wade 2001; Bashir et al. 2010). As shown in our recent work (Bashir et al. 2010), the ensemble of protein–protein geometries generated by an electrostatics-based Monte Carlo (MC) protocol provides a good description of the Cc–CcP encounter state. In Fig. 6b, a typical MC ensemble is visualized using a reweighted atomic probability density map, revealing good agreement with the results of a classical Brownian dynamics study (Fig. 6c; Northrup et al. 1988). In particular, the four energy minima shown in the latter are also present in the MC simulations. In our case, the energy minimum around D148 is shallower, possibly due to the difference in the electrostatic potentials of the Cc molecules used in the simulations (horse heart Cc in those of Northrup et al. 1988 and yeast iso-1 Cc in our case).

Fig. 6

Comparison of the theoretical and experimental simulations of the Cc–CcP encounter state. a Crystallographic Cc–CcP orientation. To facilitate the comparison with the published data, three aspartates of CcP are labeled and shown as orange sticks. b Overall distribution of 1,701 Cc molecules in the simulated Cc–CcP encounter complex (Bashir et al. 2010), displayed as a reweighted atomic probability density map (Schwieters and Clore 2002; plotted at a threshold of 0.5% maximum, blue). The surfaces of a Cc and b CcP are colored by the electrostatic potential calculated at ±5 \( k_{\text{B}}T \) (red, negative; blue, positive) with APBS (Baker et al. 2001). c The Boltzmann-averaged total electrostatic potential energy of interaction between CcP and horse Cc in units of \( k_{\text{B}}T \) as a function of Cc CM. This panel is taken from ref. (Northrup et al. 1988) with permission from Science. d The blue and green meshes indicate reweighted atomic probability density maps for the overall distribution of Cc molecules obtained from, respectively, Monte Carlo simulations (same as in b) and PRE-based ensemble calculations (same as in Fig. 5a), both plotted at a threshold of 20% maximum. The dominant form of the complex is in the same orientation as in Fig. 1a. In the bottom panel, SL atoms are removed for clarity

The density maps generated from the theoretical (MC) and experimental (PRE-based) encounter ensembles encompass an area around the dominant form of the Cc–CcP complex and broadly overlap (Fig. 6d). Despite a similar location, the MC and PRE ensembles exhibit different Q factors [\( Q_{\text{ee}} \) = 0.54 and 0.32 (N = 10), respectively], indicating that the latter reproduce the experimental \( \Upgamma_{2}^{\text{obs}} \) better. This is further evidenced by comparison of the corresponding PRE profiles (Fig. 4c here and Fig. 4a–c in Bashir et al. 2010). In this light, the somewhat broader distribution of the PRE ensembles could suggest that, in addition to electrostatics, other intermolecular forces may contribute to protein–protein interactions in the encounter state.

It should be noted that the MC solutions are the result of theoretical simulations, while the encounter ensembles described in this work are the product of direct refinement against the measured PREs, which explains the better agreement of the latter with the experimental data. Still, as shown in our earlier work (Bashir et al. 2010) and confirmed here by PRE-based ensemble refinement at different p values (see above), MC simulations provide a good representation of the encounter state and offer a robust estimate of its population. However, to obtain a more detailed description of the encounter state’s conformational space, further refinement of the MC solutions appears to be necessary.

Practical considerations for the analysis of protein encounters by PRE NMR spectroscopy

As shown here, good experimental coverage of the entire conformational space available to the interacting proteins is essential for an accurate description of the encounter state. In our case, the experimental sampling was achieved by varying the conjugation site of the paramagnetic probe on the surface of CcP. We estimate that 13–20 uniformly spaced SL attachment positions, located outside the crystallographic binding site, are required to provide adequate PRE coverage for the Cc–CcP complex at p = 0.1–0.01 (Table 1). This means that, in addition to our dataset, at least 3–10 extra SLs would be needed to complete the encounter map. (In practice, this number is expected to be higher due to the non-uniform distribution of the already introduced SLs.)

To transpose our findings to other systems involving globular proteins, we note that attachment of one SL per 190–300 Å² of the total surface area (SA), or per 160–250 Å² of the SA excluding the binding site, is required for complete coverage at p = 0.01–0.1 (Table 1). In other words, a SL should be placed every 5–10 protein surface residues (defined as those with solvent-accessible SA > 10 Å²). The introduced SL must not perturb the biomolecular interaction studied (e.g. steric clashes with the dominant binding form or substitutions of charged residues altering electrostatic potentials should be avoided), which limits the choice of the SL attachment locations. In practical terms, introduction of a SL at each site necessitates preparation of the corresponding single-cysteine protein variant. Thus, a comprehensive, SL-based PRE NMR encounter mapping requires a significant experimental effort, approaching that of labor-intensive EPR studies (Crane et al. 2005, 2006, 2008).
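As a rough planning aid, the area-per-label figures quoted above can be turned into an estimate of the number of attachment sites for another protein. The sketch below simply divides a surface area by the area covered per SL and rounds up; the function name and the example surface area of 6,000 Å² are hypothetical and do not refer to any protein discussed here.

```python
import math

def spin_labels_needed(surface_area_A2, area_per_label_A2):
    """Number of SL sites needed to cover a surface, rounded up.

    Both arguments are in square angstroms; uniform spacing of the
    attachment sites is assumed.
    """
    if surface_area_A2 <= 0 or area_per_label_A2 <= 0:
        raise ValueError("areas must be positive")
    return math.ceil(surface_area_A2 / area_per_label_A2)

# Hypothetical protein with 6,000 A^2 of total surface area.
total_sa = 6000.0
n_low = spin_labels_needed(total_sa, 300.0)   # one SL per 300 A^2 (p = 0.1)
n_high = spin_labels_needed(total_sa, 190.0)  # one SL per 190 A^2 (p = 0.01)
print(n_low, n_high)
```

Because real attachment sites are rarely uniformly spaced and some positions are excluded by the requirement not to perturb the interaction, such an estimate is best read as a lower bound.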

Several strategies can be followed to expedite the analysis. First, if possible, paramagnetic labeling of the smaller protein should be carried out, as it requires fewer conjugation sites for good experimental coverage. In the case of Cc–CcP, 5–7 SLs attached to Cc (i.e. approximately one-third of those needed for CcP; Table 1) would be enough for complete coverage at p = 0.1–0.01. Further, paramagnetic tagging of both interacting proteins, one at a time, would allow one to decrease the number of attachment sites even more. The main limiting factors of this approach are the quality of the NMR spectra and the availability of backbone assignments for the larger protein, which have thwarted its application to the Cc–CcP complex. Second, the use of stronger paramagnetic labels (e.g. an EDTA-Mn chelate or lanthanide-containing probes; Clore and Iwahara 2009; Su and Otting 2010) would significantly decrease the number of conjugation sites required for good experimental coverage and could offer a number of additional advantages. For example, pseudocontact shifts originating from the introduced lanthanide atoms could provide an independent means of verifying the structure of the dominant binding form in solution, and the use of rigid, two-armed paramagnetic tags (Keizers et al. 2007, 2008) would obviate the need for extensive ensemble averaging of the measured PREs, thus allowing for a more accurate description of the encounter state.

Conclusions

The spatiotemporal mapping approach presented here provides a reliable estimate of the experimental coverage and, at higher coverage levels, allows one to delineate the conformational space sampled by the minor species. As shown in recent studies of Cc–CcP (Bashir et al. 2010) and other complexes formed by charged proteins (Kim et al. 2008), electrostatics-based MC simulations afford a robust estimate of the encounter state population, which is further confirmed by the ensemble refinement performed in this work. However, to obtain an accurate description of the encounter state’s conformational space, further refinement of the MC solutions appears to be necessary. The combination of methods employed here for the analysis of the Cc–CcP encounter state illustrates a general approach for comprehensive visualization of transient species in biomolecular systems.