Validation of macromolecular flexibility in solution by small-angle X-ray scattering (SAXS)
The dynamics of macromolecular conformations are critical to the action of cellular networks. Solution X-ray scattering studies, in combination with macromolecular X-ray crystallography (MX) and nuclear magnetic resonance (NMR), strive to determine complete and accurate states of macromolecules, providing novel insights describing allosteric mechanisms, supramolecular complexes, and dynamic molecular machines. This review addresses theoretical and practical concepts, concerns, and considerations for using these techniques in conjunction with computational methods to productively combine solution-scattering data with high-resolution structures. I discuss the principal means of direct identification of macromolecular flexibility from SAXS data, followed by critical concerns about the methods used to calculate theoretical SAXS profiles from high-resolution structures. The SAXS profile is a direct interrogation of the thermodynamic ensemble, and techniques such as minimal ensemble search (MES) enhance interpretation of SAXS experiments by describing the SAXS profiles as population-weighted thermodynamic ensembles. I discuss recent developments in computational techniques used for conformational sampling, and how these techniques provide a basis for assessing the level of flexibility within a sample. Although these approaches sacrifice atomic detail, the knowledge gained from ensemble analysis is often appropriate for developing hypotheses and guiding biochemical experiments. Examples of the use of SAXS and combined approaches with X-ray crystallography, NMR, and computational methods to characterize dynamic assemblies are presented.
Keywords: Small-angle X-ray scattering (SAXS); Macromolecular flexibility; Rigid-body modeling; Ensemble analysis
SAXS profile as an indicator of flexibility
SAXS profiles provide more accurate atomic-level information about structures in solution without crystallographic constraints
Methods of analysis based on the concept of a single conformer cannot provide a complete three-dimensional model of dynamic proteins. Using a single “best” conformer to represent the ensemble at most provides a model representing an average of the conformations that exist in solution. Such a “best” single model of the macromolecular state can still be informative by helping guide a hypothesis regarding the macroscopic conformational state (Hammel et al. 2002; Iyer et al. 2008; Jain et al. 2009; Pascal et al. 2004; Williams et al. 2009). For example, if the crystal structure of a macromolecular assembly is known, a theoretical scattering profile can be calculated from the atomic coordinates. This provides the opportunity to evaluate several user-generated models (Fig. 1). If an extended conformer fits SAXS data better than a compact crystal structure, then an opening of the assembly in solution may be assumed (Nagar et al. 2006; Pascal et al. 2004; Yamagata and Tainer 2007).
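The calculation of a theoretical profile from atomic coordinates can be illustrated with the Debye formula, I(q) = Σi Σj fi fj sin(q rij)/(q rij). The following is a minimal sketch only: the function name `debye_profile` is hypothetical, the form factors are assumed to be q-independent, and no excluded-solvent or hydration-layer term is included, so it is not a substitute for the dedicated programs used in the studies cited here.

```python
import math


def debye_profile(coords, q_values, form_factors=None):
    """Scattering intensity I(q) of a set of point scatterers (Debye formula).

    coords       : list of (x, y, z) tuples, in Angstrom
    q_values     : list of momentum-transfer values, in 1/Angstrom
    form_factors : optional per-scatterer weights; assumed q-independent here
                   (a simplification), defaulting to 1.0 for every scatterer
    """
    n = len(coords)
    if form_factors is None:
        form_factors = [1.0] * n

    # Pair distances do not depend on q, so compute them once.
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = math.dist(coords[i], coords[j])
            pairs.append((form_factors[i] * form_factors[j], r_ij))

    self_term = sum(f * f for f in form_factors)

    profile = []
    for q in q_values:
        total = self_term
        for weight, r_ij in pairs:
            x = q * r_ij
            # sin(x)/x tends to 1 as x tends to 0
            sinc = 1.0 if x < 1e-12 else math.sin(x) / x
            total += 2.0 * weight * sinc
        profile.append(total)
    return profile
```

Profiles computed this way for a compact and an extended model can then be compared against the same experimental curve; the double sum scales with the square of the number of scatterers, which is one reason faster approximations are used in practice.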
Crystal packing forces are a selective pressure on an ensemble that typically promotes a single conformer within the crystal lattice. Differences between crystal and solution states often reflect the presence of crystal packing forces (Cotner-Gohara et al. 2010; Datta et al. 2009; Duda et al. 2008; Nishimura et al. 2009; Stoddard et al. 2010) that can be used to gain new insights into a protein's flexibility (Nishimura et al. 2009). Direct comparisons of different conformational states with model SAXS profiles calculated from atomic-resolution structures have been quite successful in identifying and decomposing the relative fractions of conformers of a sample in solution, such as with the archaeal secretion ATPase GspE. The MX structure of the hexameric ring revealed a mixture of open and closed states of the individual subunits (Yamagata and Tainer 2007). In contrast, SAXS studies of GspE suggested a much different conformational state in solution. In the presence of the non-hydrolyzable ATP analogue, AMP-PNP, SAXS experiments suggest the enzyme’s subunits assume an all-closed state. In the next step of the catalytic cycle, the ADP-bound state, SAXS experiments suggest GspE exists as a mixture of all-closed and all-open states. The original crystal structure of alternating open–closed states in a ring failed to explain the SAXS experiments and raises significant questions regarding the proper biological state of the crystallized GspE. Crystal packing forces are structurally selective (Nishimura et al. 2009; Stoddard et al. 2010); consequently, a structural biology approach solely dependent on MX will be limited in scope.
Accurate computation of SAXS profiles
Fitting theoretical models to SAXS profiles requires that a measure be established for determining the agreement between two scattering curves. I am not convinced that a “best” measure of assessing agreement between experimental and theoretical curves has been adequately developed. The standard χ² clearly weights the lowest-resolution data most strongly. The χ² values become less informative when high-resolution, low-noise SAXS profiles are used to fit atomistic models. For additional assessment of the quality of model-data agreement I suggest displaying the discrepancy as the ratio I_experiment/I_model. This residual ratio clearly displays discrepancies in the important small-q region, whereas the standard presentation of log10 I(q) versus q frequently does not (Figs. 3a and 5).
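The two measures discussed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name `compare_profiles` is hypothetical, the model is placed on the experimental scale by a single least-squares scale factor c, and χ² is normalized by N − 1; other conventions exist, and the source does not specify one.

```python
def compare_profiles(i_exp, sigma, i_model):
    """Return (scale, chi2, ratio) for an experimental and a model profile.

    i_exp, sigma : experimental intensities and their standard errors
    i_model      : model intensities evaluated at the same q values
    scale        : least-squares factor c minimizing sum(((I_exp - c*I_model)/sigma)^2)
    chi2         : sum of squared, error-weighted residuals divided by (N - 1)
    ratio        : I_exp / (c * I_model) at each q; ideally close to 1 everywhere
    """
    if not (len(i_exp) == len(sigma) == len(i_model)):
        raise ValueError("profiles must be sampled on the same q grid")

    numerator = sum(e * m / (s * s) for e, m, s in zip(i_exp, i_model, sigma))
    denominator = sum(m * m / (s * s) for m, s in zip(i_model, sigma))
    scale = numerator / denominator

    chi2 = sum(
        ((e - scale * m) / s) ** 2 for e, m, s in zip(i_exp, i_model, sigma)
    ) / (len(i_exp) - 1)

    ratio = [e / (scale * m) for e, m in zip(i_exp, i_model)]
    return scale, chi2, ratio
```

Plotting `ratio` against q on a linear axis makes a systematic deviation from 1 at small q visible even when the logarithmic intensity plot looks acceptable.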
Modeling of the conformational space
Although comparison of model SAXS profiles with the experimental data is one of the most straightforward applications of SAXS, the uniqueness of arrangements of atomic resolution structures that fit SAXS data must also be evaluated. The determination of multidomain or subunit assemblies using rigid-body modeling in conjunction with SAXS data involves preparing a large number of possible atomic models and comparing them with experimental data. The models can either be refined directly against experimental data (Petoukhov and Svergun 2005) or prepared independently using the SAXS data as a filter to select the “best fit” model(s) (Boehm et al. 1999; Forster et al. 2008). The biggest challenge in trying to model flexible multidomain systems using SAXS data is to avoid over-fitting. Most commonly, over-fitting can be detected by visually inspecting the selected models and examining them for large unfolded regions or unrealistic inter-domain distances. Extremely elongated or partially unfolded structures may contribute to inappropriate “successful fits” of experimental data derived from aggregated or heterogeneous samples (reviewed by Putnam et al. 2007). For example, studies of mammalian lipoxygenase illustrate the need for establishing monodispersity of the sample in cases where domain flexibility is proposed (Dainese et al. 2005; Hammel et al. 2004b; Shang et al. 2011). In early studies the discrepancy between the experimental curve of mammalian lipoxygenase and the profile calculated from the atomic coordinates was interpreted in terms of a very large movement of the N-terminal domain (Hammel et al. 2004b). In a recent study, however, Shang et al. (2011) found that mammalian lipoxygenase, besides its flexible N-terminal domain, forms a transient dimer that also leads to an elongated SAXS signal. Therefore, samples that are suspected of possessing intrinsic flexibility must be carefully characterized to ensure monodispersity before SAXS modeling (Rambo and Tainer 2010b).
Conformational sampling may also be performed with simplified coarse-grain (CG) models, where amino-acid residues are represented as spherical beads centered at the corresponding Cα atom positions (Rozycki et al. 2011; Yang et al. 2010). Although extremely simplified, CG incorporates the main generic features and folding data of the protein under investigation. The CG models are used not only to speed up the production phase of conformational sampling but also to speed up the SAXS calculation. However, CG models are coarse representations, and it has been shown that full atomistic models are required for accurate calculation of SAXS profiles (Grishaev et al. 2010) (Fig. 3). Particularly for modeling flexible assemblies, the atomistic representation is essential for accurate representation in solution when the particles deviate from a canonical globular shape (Fig. 3).
Distance constraints in rigid-body modeling
Accurate assignment of the flexible regions is crucial to realistic conformational sampling. In most cases, analysis of high-resolution structures can indicate plausible regions of structural flexibility (Chen et al. 2010). Missing electron density (Bernstein et al. 2009; Biersmith et al. 2011; Hammel et al. 2007a; Hammel et al. 2010b) or regions with a high isotropic atomic displacement factor (ADF, also called the B-factor) (Duda et al. 2008; Williams et al. 2011) are useful indicators of flexible regions. Empirical determination of flexible regions can be achieved by hydrogen–deuterium exchange mass spectrometry (HDX), which specifically follows changes in conformational states of proteins. For example, HDX clearly assigned the flexible region in the complement C3b molecule after its activation (Hammel et al. 2007b). This HDX experiment guided SAXS-based rigid-body modeling used to visualize the C3b molecule as a highly dynamic system. SAXS modeling also revealed that C3b flexibility may be affected by an allosteric inhibitor, for example the extracellular fibrinogen-binding protein (Efb) from Staphylococcus aureus. This is the first reported evidence that the system is controlled by allosteric inhibitors and supports new views in which modulators may stabilize preexisting intrinsic conformations rather than inducing completely new domain arrangements (Chen et al. 2010) (Fig. 4).
Furthermore, realistic models may be derived by incorporating additional information about the system in question, for example known distance constraints. Techniques that provide local distance and angle information, for example Förster resonance energy transfer (FRET) (Rochel et al. 2011) and NMR (Bertini et al. 2008; Mareuil et al. 2007), may provide useful restrictions on inter-domain movement and guide conformational sampling. The rigid body/torsion/Cartesian simulated annealing strategy developed by Grishaev et al. (Grishaev et al. 2005; Mittag et al. 2010) integrated both NMR and SAXS observations into a unique synergistic method for atomistic modeling. From NMR, residual dipolar coupling (RDC) data were used to orient the symmetrically related protein domains relative to the symmetry axis of the protein core, whereas translational, shape, and size information was provided by SAXS (Schwieters et al. 2010). FRET in combination with SAXS guided rigid-body modeling to aid elucidation of the structural basis of the role of DNA in the spatial organization of nuclear hormone receptors in complex with co-activators (Rochel et al. 2011). Distance restraints may also be generated from simple biochemical techniques, for example site-directed mutagenesis. For example, integrated site-directed mutagenesis and SAXS combined with conformational sampling of DNA binding sites were used to determine the DNA-binding properties of mPNK (Bernstein et al. 2009) and reveal the intramolecular metal ion transfer between flexibly-linked domains of mercury ion reductase (Johs et al. 2011) (Fig. 5).
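As an illustration of how distance information can restrict a pool of sampled conformers before any SAXS comparison, the sketch below discards models that violate a set of pairwise distance bounds. Everything in it is hypothetical: the function names, the residue labels, and the representation of a model as a mapping from residue label to a single coordinate are assumptions made for the example, not part of any of the cited methods.

```python
import math


def satisfies_restraints(model, restraints):
    """Check one model against pairwise distance bounds.

    model      : dict mapping a residue label to an (x, y, z) tuple in Angstrom
    restraints : list of (label_a, label_b, d_min, d_max) tuples in Angstrom
    """
    for label_a, label_b, d_min, d_max in restraints:
        distance = math.dist(model[label_a], model[label_b])
        if not (d_min <= distance <= d_max):
            return False
    return True


def filter_models(models, restraints):
    """Return the names of the models that satisfy every restraint."""
    return [
        name
        for name, model in models.items()
        if satisfies_restraints(model, restraints)
    ]
```

A FRET-derived distance, for instance, would enter as a (d_min, d_max) window whose width reflects the experimental uncertainty; only the surviving models would then be scored against the SAXS profile.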
The conformational ensemble
Although exhaustive conformational sampling significantly increases the number of realistic models to be used for modeling experimental SAXS data, a single best-fit conformation may be incapable of explaining the observed SAXS profile. The lack of convergence of a single best-fit conformation has been shown to correlate with conformational disorder rather than a limitation of the search space algorithm (Pelikan et al. 2009). In the case of scattering from a heterogeneous population, the measured scattering is derived from the population-weighted thermodynamic ensemble, and the interpretation of dynamic systems requires analysis beyond “best fit” conformations (Figs. 4 and 5). In recent years, new SAXS modeling techniques have been developed to describe dynamic systems in terms of ensembles of structures (Bernado et al. 2007; Pelikan et al. 2009; Rozycki et al. 2011; Yang et al. 2010). Four promising approaches for modeling the ensemble are pushing SAXS in an exciting new direction: the ensemble optimization method (EOM) (Bernado et al. 2007), minimal ensemble search (MES) (Pelikan et al. 2009), ensemble refinement of SAXS (EROS) (Rozycki et al. 2011), and basis-set supported SAXS (BSS-SAXS) (Yang et al. 2010). Because of the nearly infinite number of conformations that can be adopted by flexible proteins in silico, obtaining meaningful models requires the development of robust statistical approaches that determine the probability that a particular multi-conformational equilibrium will exist (Bertini et al. 2010). Again, a common problem with multi-conformational analysis is over-fitting, which occurs when an ensemble model describes noise or aggregation in the experimental system, rather than the desired underlying relationship. MES avoids over-fitting by asserting the minimum number of states that could be distinguished from SAXS data. In addition, to avoid over-fitting the data with multiple conformations (Bernado et al. 2007), a quantitative description of the ensemble also requires the weighting of each conformer's distribution (Pelikan et al. 2009; Yang et al. 2010). For the purpose of avoiding over-fitting of raw data, Rozycki et al. constructed a pseudo free energy scheme to refine the statistical weights attributed to configurations generated by simulation (Rozycki et al. 2011). These SAXS ensemble methods seem enormously successful on the basis of analysis of several key biological systems: identification of the correct subunit positions for full-length Ku (Hammel et al. 2010b), demonstration of the flexibility in full-length polynucleotide kinase (Bernstein et al. 2009), establishment of the configurational space of Lys-63 linked tetraubiquitin (Datta et al. 2009), elucidation of the flexibility mode in a Ubiquitin-PCNA complex involved in DNA replication and repair (Tsutakawa et al. 2011), and description of the partially unfolded state of XRCC4 (Hammel et al. 2010a) and XRCC4-like proteins (Hammel et al. 2011).
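The idea of describing the measured profile as a population-weighted sum of conformer profiles, while keeping the number of conformers as small as the data justify, can be sketched as follows. This is a deliberately simplified stand-in and not the published MES algorithm: it searches single conformers and pairs exhaustively rather than with a genetic algorithm, and the acceptance rule (a pair is kept only if it lowers χ² below a chosen fraction of the best single-conformer χ²) and the name `minimal_ensemble` are assumptions made for illustration.

```python
import itertools


def _wdot(u, v, sigma):
    """Error-weighted dot product of two profiles."""
    return sum(a * b / (s * s) for a, b, s in zip(u, v, sigma))


def _chi2(i_exp, i_fit, sigma):
    """Squared, error-weighted residuals divided by (N - 1)."""
    total = sum(((e - f) / s) ** 2 for e, f, s in zip(i_exp, i_fit, sigma))
    return total / (len(i_exp) - 1)


def _fit_single(i_exp, sigma, p):
    scale = _wdot(i_exp, p, sigma) / _wdot(p, p, sigma)
    return _chi2(i_exp, [scale * x for x in p], sigma)


def _fit_pair(i_exp, sigma, pa, pb):
    """Fit I_exp ~ a*pa + b*pb; return (chi2, fraction_a) or None.

    None is returned when the unconstrained solution has a non-positive
    coefficient, i.e. when the pair is not a genuine two-state mixture.
    """
    aa, bb, ab = _wdot(pa, pa, sigma), _wdot(pb, pb, sigma), _wdot(pa, pb, sigma)
    ea, eb = _wdot(i_exp, pa, sigma), _wdot(i_exp, pb, sigma)
    det = aa * bb - ab * ab
    if det <= 1e-12 * aa * bb:  # profiles are (nearly) proportional
        return None
    a = (ea * bb - eb * ab) / det
    b = (eb * aa - ea * ab) / det
    if a <= 0.0 or b <= 0.0:
        return None
    fit = [a * x + b * y for x, y in zip(pa, pb)]
    return _chi2(i_exp, fit, sigma), a / (a + b)


def minimal_ensemble(i_exp, sigma, pool, improvement=0.8):
    """Pick the smallest ensemble (one or two conformers) that explains i_exp.

    improvement : assumed threshold; a pair is accepted only if its chi2 is
                  below improvement * (best single-conformer chi2)
    """
    best_idx = min(range(len(pool)), key=lambda k: _fit_single(i_exp, sigma, pool[k]))
    best = {
        "members": (best_idx,),
        "weights": (1.0,),
        "chi2": _fit_single(i_exp, sigma, pool[best_idx]),
    }
    single_chi2 = best["chi2"]
    for i, j in itertools.combinations(range(len(pool)), 2):
        result = _fit_pair(i_exp, sigma, pool[i], pool[j])
        if result is None:
            continue
        chi2, frac = result
        if chi2 < improvement * single_chi2 and chi2 < best["chi2"]:
            best = {"members": (i, j), "weights": (frac, 1.0 - frac), "chi2": chi2}
    return best
```

The threshold plays the role of a guard against over-fitting: adding a second conformer always lowers χ² somewhat, so the extra state is accepted only when the improvement is substantial. The appropriate criterion for real data is a statistical question that this sketch does not settle.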
Conclusions and prospects
Structural biology now recognizes that partially populated states are crucial to biological function. The single conformation description of a macromolecule is only a snapshot of a macromolecular ensemble. We have seen that integrative methods that utilize NMR and MX with SAXS are proving to be essential for providing a larger description of the macromolecular ensemble. Using SAXS data as a source of experimental restraints for modeling macromolecular flexibility is an exciting and relatively underdeveloped discipline. SAXS data can provide important experimental feedback, and can be extended to include dynamic conformational changes characterized by time-resolved experiments. Time-resolved measurements require very high X-ray flux and fast detectors designed for rapid electronic shuttering. Both are now available, and SAXS, unlike traditional NMR and fluorescence experiments, is not affected by molecular rotation times, so time-resolved SAXS can be performed in an equivalent manner to the traditional static experiments. The development of approaches for characterizing highly fluctuating conformational equilibria on the basis of traditional static experiments is becoming essential in the description of intrinsically dynamic biomolecular systems (Bernado and Blackledge 2010). Macromolecular machines with flexible and unstructured regions are now tractable to direct structural investigation (Bernado 2010; Bernado and Svergun 2012). These are some of the reasons why SAXS-based solution structure modeling of flexible macromolecular assemblies is gaining popularity and will be used in the future to elucidate the roles of dynamic equilibrium in biological processes (Rambo and Tainer 2011).
A natural complement to the global shape and conformation from SAXS will be residue-level information from advancing techniques of enhanced hydrogen–deuterium exchange mass spectrometry, which can approach single-residue resolution as shown for the photocycle changes of photoactive yellow protein (Brudler et al. 2006). Thus, SAXS is well positioned to become an important technique, alongside new weak-field aligned NMR and fluorescence experiments that can probe samples in the biologically interesting millisecond time frame. With appropriate resources for directed efforts, SAXS can provide complementary experimental data on flexibility in macromolecular interactions with widespread effects.
The author thanks Robert Rambo and David Shin (Lawrence Berkeley National Laboratory) for insightful discussions and careful reading of the manuscript. The author is supported in part by the DOE program Integrated Diffraction Analysis Technologies (IDAT) under Contract Number DE-AC02-05CH11231 with the U.S. Department of Energy and National Cancer Institute grant Structural Biology of DNA Repair (SBDR) CA92584.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
- Brudler R, Gessner CR, Li S, Tyndall S, Getzoff ED, Woods VL Jr (2006) PAS domain allostery and light-induced conformational changes in photoactive yellow protein upon I2 intermediate formation, probed with enhanced hydrogen/deuterium exchange mass spectrometry. J Mol Biol 363:148–160
- Hammel M, Yu Y, Mahaney BL, Cai B, Ye R, Phipps BM, Rambo RP, Hura GL, Pelikan M, So S, Abolfath RM, Chen DJ, Lees-Miller SP, Tainer JA (2010b) Ku and DNA-dependent protein kinase dynamic conformations and assembly regulate DNA binding and the initial non-homologous end joining complex. J Biol Chem 285:1414–1423
- Hammel M, Rey M, Yu Y, Mani RS, Classen S, Liu M, Pique ME, Fang S, Mahaney BL, Weinfeld M, Schriemer DC, Lees-Miller SP, Tainer JA (2011) XRCC4 protein interactions with XRCC4-like factor (XLF) create an extended grooved scaffold for DNA ligation and double strand break repair. J Biol Chem 286:32638–32650
- Hura GL, Menon AL, Hammel M, Rambo RP, Poole FL 2nd, Tsutakawa SE, Jenney FE Jr, Classen S, Frankel KA, Hopkins RC, Yang SJ, Scott JW, Dillard BD, Adams MW, Tainer JA (2009) Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat Methods 6:606–612
- Porod G (1982) General Theory. In: Glatter O, Kratky O (eds) Small angle X-ray scattering. Academic Press, London, pp 17–51
- Rochel N, Ciesielski F, Godet J, Moman E, Roessle M, Peluso-Iltis C, Moulin M, Haertlein M, Callow P, Mely Y, Svergun DI, Moras D (2011) Common architecture of nuclear receptor heterodimers on DNA direct repeat elements with different spacings. Nat Struct Mol Biol 18:564–570
- Schwieters CD, Suh JY, Grishaev A, Ghirlando R, Takayama Y, Clore GM (2010) Solution structure of the 128 kDa enzyme I dimer from Escherichia coli and its 146 kDa complex with HPr using residual dipolar couplings and small- and wide-angle X-ray scattering. J Am Chem Soc 132:13026–13045
- Tsutakawa SE, Van Wynsberghe AW, Freudenthal BD, Weinacht CP, Gakhar L, Washington MT, Zhuang Z, Tainer JA, Ivanov I (2011) Solution X-ray scattering combined with computational modeling reveals multiple conformations of covalently bound ubiquitin on PCNA. Proc Natl Acad Sci USA 108:17672–17677