Understanding biomolecular motion, recognition, and allostery by use of conformational ensembles
- 1.7k Downloads
We review the role conformational ensembles can play in the analysis of biomolecular dynamics, molecular recognition, and allostery. We introduce currently available methods for generating ensembles of biomolecules and illustrate their application with relevant examples from the literature. We show how, for binding, conformational ensembles provide a way of distinguishing the competing models of induced fit and conformational selection. For allostery we review the classic models and show how conformational ensembles can play a role in unravelling the intricate pathways of communication that enable allostery to occur. Finally, we discuss the limitations of conformational ensembles and highlight some potential applications for the future.
KeywordsDynamic ensembles NMR Conformational selection Induced fit Allostery
Molecular recognition is of paramount importance in biology—without it life would not exist. Before the first 3D structures of biomolecules were determined (Watson and Crick 1953; Kendrew et al. 1958; Muirhead and Perutz 1963), the lock and key model of molecular recognition in the binding events associated with enzymatic catalysis had already been proposed (Fischer 1894). Over time, an appreciation of the structural changes that can occur upon binding led to the related induced fit (Koshland 1958) and fluctuation fit (Straub and Szabolcsi 1964) models. At about the same time two complementary models for describing allostery, a key biological mechanism that is responsible for information transfer, were also proposed, the concerted model (Monod et al. 1965) and the sequential model (Koshland et al. 1966). These models were proposed before the development of molecular dynamics (MD) simulations (McCammon et al. 1977), a method that enables atomic-level understanding of the types, amplitudes, and timescales of the motion of macromolecules and has influenced current views of how both molecular recognition and allostery occur (Karplus 2003; van Gunsteren and Dolenc 2008).
In addition to being important for fundamental reasons, theoretical or hybrid theoretical–experimental methods for characterisation of structural heterogeneity can be very important in structure-based drug discovery. In the near future, as recently demonstrated by the Al-Hashimi laboratory (Stelzer et al. 2011), it will indeed be possible to improve in-silico drug screening by use of conformational ensembles, because these contain the inherent functional motion of the therapeutic target. Here we review the advances that have been made in the understanding of the motion, molecular recognition, and allostery of biomolecules by use of conformational ensembles and discuss how these powerful techniques will continue to guide our understanding of these and related important biological phenomena.
Normal mode analysis, a theoretical tool that is complementary to MD, and its coarse-grained equivalents, the Gaussian network model (Bahar et al. 2010) and the anisotropic network model (Zheng 2010), have been useful to determine the motion that occurs on potentially slower time scales (low-frequency modes). Normal mode analysis produces projections of the modes rather than conformational ensembles but can, in principle, probe longer timescales. Although relating the frequencies of motion of biomolecules to biological function has remained contentious (Kamerlin and Warshel 2010; Karplus 2010) the motion of biomolecules is likely to be related to their function.
The experimental evidence for protein motion comes from a wide range of techniques covering different timescales. Faure et al. (1994) were able to directly model the motion of lysozyme in the crystalline form by using atomic fluctuations around the mean atomic positions that give rise to diffuse scattering of the beam in diffraction experiments. Neutron scattering has also been used, in a similar manner, to detect motion in myoglobin (Kneller and Smith 1994; Frauenfelder and Mezei 2010), as has Mossbauer spectroscopy, by use of which it was observed that fluctuations of the solvent cause internal protein motion (Frauenfelder et al. 2009). Further evidence of motion from X-ray crystallography data comes in the form of the multiple conformations of proteins and nucleic acids that are obtained when the crystallization process is carried out several times. These conventional experiments can be complemented by single-molecule techniques that enable the observation of single states of a given molecule. Single molecule fluorescence energy transfer (FRET) spectroscopy, for example, enables, in principle, observation of distance distributions rather than average distances. These methods have provided very solid evidence that, because of macromolecular dynamics, inter-atomic distances are not fixed, either in proteins (Deniz et al. 2000) or in nucleic acids (Deniz et al. 1999).
The most detailed and exhaustive experimental studies of protein motion have been conducted with NMR spectroscopy. This technique has the ability to probe structural motion with atomic detail over the entire range of timescales from picoseconds to seconds (Fig. 2). Since the first comprehensive study of fast protein motion by NMR (Allerhand et al. 1971) it is has become routine to characterize the motion of proteins (Kay et al. 1989) that is faster than rotational diffusion by use of heteronuclear relaxation rates. For a recent review of the application of NMR to the study of the rapid motion of biomolecules and their complexes, the reader is directed to Jarymowycz and Stone (2006). In addition, Kay et al. have shown that it is possible to use the Carr–Purcell–Meiboom–Gill (CPMG) NMR measurements developed by Palmer et al. (2001) and Loria et al. (1999) to identify and characterize conformations that are present with a very low population (Korzhnev et al. 2010) when they are in relatively slow (μs–ms) exchange with the most stable conformation of the macromolecule (Mittermaier and Kay 2006). Structural fluctuations occurring on the nanosecond to microsecond timescale can be probed by measurement of residual dipolar coupling (RDC) (Salmon et al. 2011).
Biomolecular dynamics from ensembles
Since the first simulations of biomolecules MD has become the most common method for studying the motion of proteins and nucleic acids (Karplus 2003). Simulations are started from a known configuration of the molecule derived from experimental (X-ray or NMR) data or determined by homology modelling. Use of the ensembles generated from MD to analyse the function of motion for proteins and nucleic acids is too extensive to discuss in detail here; we therefore highlight some of the newer applications. In conventional MD, the starting configuration evolves in discrete time steps to produce a trajectory that simulates the fate of the molecule under a force field that models the internal energy of the system as the sum of simple potentials (Fig. 4a). In selection methods the agreement of such trajectories with experiment is improved by reassigning the statistical weights of the snapshots of the trajectory (Fig. 4b). Structure calculations using experimental data often restrain the structure at each step, as is shown in Fig. 4c, and do not report on the structural heterogeneity caused by dynamics. Time and ensemble restraining schemes in Fig. 4d, e can, instead, lead to ensembles that accurately capture the structural heterogeneity of the system as reflected by the experimental data.
The motion of bacteriophage T4 lysozyme (B4L) and adenylate kinase (AdK) has been studied by MD for a number of years. B4L is a hydrolytic enzyme composed of two domains and its dominant motion changes the orientation between them. In the crystal the protein is in a closed conformation but MD studies indicate that in solution it is likely to be more open than the crystal structures suggest (de Groot et al. 1998). This open state of the molecule has also been observed experimentally (Goto et al. 2001) and it is likely that this protein passes between the open and closed states via collective opening of the active site (Hub and de Groot 2009). One question that remains unanswered, however, is what the populations of the open and closed states are for this molecule. For AdK, it is well known that motion is highly important in biological function (Olsson and Wolf-Watz 2010). This protein is known to undergo large conformational changes in both the presence and absence of substrates in solution (Beckstein et al. 2009). It seems that the binding and release of substrate, which are rate-limiting for this enzyme, are related to these conformational changes. The link between catalysis and the structural changes that this protein undergoes is however hotly debated.
Shaw et al. (2010) have recently shown the power of MD simulations by developing specific hardware and software for this purpose. By running microsecond-long MD simulations of the Kc1.2 pore domain they were able to show that potassium channel gating is a result of solvent expulsion from the active site, and confirmed the previously predicted model of ion transport (Hodgkin and Keynes 1955). Simulations of DNA and RNA are more limited, because of the relatively lower quality of the force fields, but are now maturing (Perez et al. 2007b), making it possible to explore the functional motion of nucleic acids. Comparison of the motion of double-stranded DNA and RNA, for example, has recently shown that RNA duplexes are more rigid than their DNA counterparts (Perez et al. 2007a; Faustino et al. 2010). It is well-established that MD suffers from limitations (van Gunsteren and Dolenc 2008) but, despite this, MD descriptions of macromolecular motion are quickly increasing in quality and this technique will continue to be a useful approach to determining ensembles.
Correlation with the experimental backbone order parameters (S 2) for unrestrained MD and for the selection method (SAS)
Correlation for unrestrained MD
Correlation for SAS
βARK1 PH domain
Time-averaged and ensemble-averaged ensembles can be seen as equivalent although they are generated using MD by using different approaches. In restrained time-averaged MD, which has been used to great effect and is reviewed elsewhere (Scott et al. 1998), the simulated molecule experiences a potential that biases the trajectory to be consistent with the time-averaged experimental observable (Fig. 4d); its key variable is the averaging time. Recent examples of its use include analysis of information about structural heterogeneity encoded in NMR data, for example NOEs and 3J couplings (Tonelli et al. 2003; Allison and van Gunsteren 2009; Missimer et al. 2010), and RDCs (Hess and Scheek 2003). Tonelli et al. (2003), for example, used restrained time-averaged MD to determine an ensemble of structures for an RNA/DNA hybrid duplex which enabled them to characterise the helix flexibility required for the binding of the duplex to ribonuclease A.
An alternative to time-averaging is restrained ensemble-averaged MD, in which multiple configurations are simulated in parallel using a potential that biases the ensemble-averaged NMR parameters to agree with the experimental observable at each step of the simulation (Fig. 4e). The characterisation of structural heterogeneity from ensemble restrained MD was initially hampered by the inability of NOE to correctly identify the correct distribution of inter-proton distances (Bonvin and Brunger 1996) because of the non-linear averaging of this NMR parameter. This problem has now been overcome to a significant extent by using more appropriate NMR data, enabling this approach to generate quite realistic representations of the structural heterogeneity of proteins and nucleic acids (Clore and Schwieters 2004a; Lindorff-Larsen et al. 2005a; Richter et al. 2007). Initially these methods used S 2 and NOE data, however observables such as scalar couplings (Lindorff-Larsen et al. 2005b), trans hydrogen bond scalar couplings (Gsponer et al. 2006), and RDCs (Clore and Schwieters 2004a; De Simone et al. 2009; Fenwick et al. 2010, 2011) can also be used.
The function of biomolecules is intimately linked to molecular recognition and, as a result, the purpose of much current research is to describe kinetically and structurally the interactions of biomolecules. As ensemble approaches encode information that is not present in a single structural snapshot (Chaudhury and Gray 2008) they offer clear advantages over conventional methods of structure determination.
The study of molecular recognition is important not only to gain an appreciation of the underlying mechanisms of essential biological function, but also to aid the development and generation of new drugs (Lane 2001; Aleksandrov and Simonson 2010). Four mechanistic models have been proposed for molecular recognition; these are the lock and key, induced fit, fluctuation fit, and conformational selection models. Recent reviews of the models themselves are already available in the literature (Ma et al. 1999; Kumar et al. 2000; Grunberg et al. 2004; Hammes et al. 2009; Zhou 2010; Vertessy and Orosz 2011). Here we will describe how conformational ensembles can be used to determine the extent to which the different models are appropriate for understanding the mechanism of molecular recognition for a given system.
The first of the models was proposed under the assumption that molecules were mainly static. In the model proposed by Fisher—the lock and key model—two molecules fit together because of their complementary shapes (Fischer 1894). The second model—the inducted fit model—proposed instead that one or both of the molecules changed conformation concomitantly with formation of the complex (Koshland 1958). These definitions distinguish the two possible mechanisms of binding for static molecules but they are not immediately applicable to most biomolecules because, as discussed above, these are appreciably dynamic.
Straub and Szabolcsi (1964) recast the lock and key model to accommodate the dynamic properties of biomolecules. In their new model, the fluctuation fit model, dynamic molecules pass through different conformational states and form a complex in a lock and key fashion when two complementary configurations occur (Straub and Szabolcsi 1964; Vertessy and Orosz 2011).
A unified model for molecular recognition has been proposed in which recognition proceeds via a three-step process (Grunberg et al. 2004), though in some cases some of the steps can be negligible. The first step is diffusion, followed by conformational selection, and the last step is induced fit. The authors argue that this mechanism fits the energetics and kinetics of complex formation better than conformational selection or induced fit alone. Recently the relative weights of conformational selection and induced fit were analysed for ubiquitin binding (Wlodarski and Zagrovic 2009; Long and Brüschweiler 2011a); in agreement with this unified model of molecular recognition both conformational selection and induced fit were observed to be involved in the binding mechanism.
In the literature, the induced fit model is more prevalent than the conformational selection mechanism, because of the common observation that the crystal structure of the bound protein is different to that of the free molecule. Others have noted, and we stress here again, that a conclusion of induced fit because of different snapshots of the free and bound proteins is not warranted (Boehr et al. 2009). Similarly, the inability to detect low populated states does not mean that they do not exist. Recent advances enable the detection and characterisation of near-invisible, low populated stable states in the native ensemble (Korzhnev et al. 2010). It will be seen below that structural heterogeneity can sometimes account for the observed differences without the need to invoke an induced fit mechanism. Discrimination of recognition mechanisms purely on the basis of static structures is not possible and the use of methods that provide information on structural heterogeneity can indeed be of great use.
Various mechanisms have been proposed to describe the binding behaviour of the three-domain protein AdK. In this protein, the two side domains need to close over the central domain for catalysis to occur. Early X-ray structures indicated that in the free form the molecule exists in an open state whereas binding to substrates or substrate analogues results in a closed state (Pai et al. 1977). This original report described the binding as induced fit but in a later NMR study it was observed that the side domains of AdK undergo substantial microsecond to millisecond conformational exchange in the absence of substrate (Shapiro et al. 2002) and that the substrate-bound form of the protein is compact. However, these results neither prove nor disprove the hypothesis that free AdK visits the closed state with any appreciable frequency. To determine whether conformational selection or induced fit best represent the mechanism by which AdK binds its substrates, a conformational ensemble is potentially a very powerful tool. If induced fit is the dominant mechanism for this protein the conformational ensemble would not contain closed configurations; if, instead, closed configurations can be found in the ensemble of the free protein it is then possible that AdK assumes the bound configuration by conformational selection.
Arora and Brooks (2007) performed MD simulations of AdK and observed that the protein does, indeed, assume the closed state, with only minor structural rearrangement required to obtain the bound configuration. Moreover, experimental evidence for AdK visiting the closed state in the absence of substrate comes from three configurations of a thermophilic AdK that were crystallised and are intermediate structural snapshots between the open and closed states (Henzler-Wildman et al. 2007). It would therefore seem that the dominant mechanism whereby AdK reaches the bound state, in terms of gross structural rearrangement, is conformational selection. It is clear from this example of AdK that conformational ensembles of the free state provide a simple way to determine which mechanism is dominant for a given system. Despite the clarity of the models above, the number of cases in which a given mechanism has been clearly demonstrated is limited. In the next section, we highlight some representative examples and discuss how they can be interpreted.
Biomolecular recognition from ensembles
The first examples of conformational selection mechanisms are starting to appear in the literature. MD simulations were used by Salsas-Escat and Stultz (2010) to generate ensembles of type III collagen that describe its conformational heterogeneity. They observed that collagen could adopt in the free state the configuration that it adopts in the active site of proteolytic enzyme CMMP8 that cleaves disordered peptides. Collagen contains additional cleavage sites for the proteolytic enzyme CMMP8 but these sites do not sample the bound configuration. These experiments suggest that proteolytic degradation of collagen is controlled by the formation of the correct collagen conformation, and the results are consistent with the experimentally observed cleavage pattern (Fields 1991).
Conformational selection has also been shown to be a viable molecular recognition model for nucleic acids. TAR RNA, for example, selects its ligands by conformational selection (Zhang et al. 2006). This RNA contains a bulge, which is a source of structural heterogeneity in the molecule. The motion of the bulge was analysed by use of RDCs with the help of MD simulations to generate a conformational ensemble that captures the dynamic sub-states of the system. Analysis of the ensemble revealed that the internal motion of TAR RNA was equivalent to the structural changes observed in X-ray structures of TAR RNA in complexes with its ligands.
A combination of conformational selection and induced fit has been observed in ubiquitin binding. More than 40 structures of ubiquitin in complex with binding partners are available and show that ubiquitin can adopt different configurations upon binding. Lange et al. (2008) determined a conformational ensemble of unbound ubiquitin from RDC and NOE data and were able to identify an ensemble member that was within ~0.8 Å of each of the bound structures of ubiquitin. This result showed that sampling the free state was sufficient to reach the configuration of the bound state in a conformational selection mechanism, with only minor induced fit changes to the backbone and side chain rotameric states. Wlodarski and Zagrovic (2009) further studied ubiquitin using global multidimensional scaling analysis to determine the relative weights of induced fit and conformational sampling, and concluded that the role of induced fit was only marginally lower than that of conformational selection. We recently generated a new conformational ensemble of ubiquitin using RDCs and NOEs and identified ensemble members that are extremely similar to the bound states of ubiquitin (~0.5 Å); this result suggests that conformational selection is likely to play a larger role than induced fit in the molecular recognition of ubiquitin (Fenwick et al. 2011). Recently Long and Brüschweiler have used a reweighting method to study the interplay between ubiquitin and UIM during complex formation. Their innovate technique gives further evidence of the role of conformational selection in the binding of ubiquitin partners via a population redistribution i.e. population shift mechanism (Long and Brüschweiler 2011a). We highlight the differences between this example of conformational selection with that of induced fit for TIS11d in Fig. 7. It can be seen that in the case of conformational selection the bound and free states have low and similar RMSD profiles, whereas for the induced fit case sampling of the bound and free states of TIS11d is quite different.
Gaspari et al. (2010) have shown that the canonical serine protease inhibitor lock and key model can be explained by conformational selection. They generated conformational ensembles from S 2 and NOE data for two peptide proteases inhibitors (SGCI and SGTI) and found in both cases that the conformational ensemble of the free states contained configurations corresponding to the structures of the inhibitors bound to the proteases (Gaspari et al. 2010).
For RNA–protein interactions, it has been proposed that induced fit makes the largest contribution to molecular recognition because in many cases the solution structure of the RNA is markedly different from that observed in the complexes with RNA-binding proteins (Williamson 2000). Despite this, counter examples can be found for which the dominant mechanism seems to be lock and key. Wright and co-workers show that a lock and key mechanism can explain the binding of finger 4 of the transcription factor IIIA to 5S RNA. A second report of lock and key for RNA has also become available in which one of the molecules is static and does not change its conformation upon binding. In this case the lock and key hypothesis was based on the solution structure of 7SK-SL RNA free and in complex with argininamide (Dethoff and Al-Hashimi 2010; Durney and D’Souza 2010), which showed no evidence of reorganisation upon binding. Moreover, mutations that freeze out motion of the related TAR RNA have been shown to increase the binding affinity (Stelzer et al. 2010). These examples highlight the possibility that the static lock and key mechanism may apply in some cases, although examples are limited.
Allostery and correlated motion
In the above examples, we have ignored the role of allostery in molecular recognition. Allostery is the process by which the affinity of a binding site for a ligand is affected by the binding of a second ligand in a different, distant, site. This process requires the transfer of structural and/or dynamic information across the macromolecule through, potentially, correlated conformational changes. Simple contact models can characterize networks of this type for some biomolecules but others seem to operate via more complex coupling mechanisms (Daily et al. 2008) that can include changes in quaternary structure (Daily and Gray 2009).
The Nussinov group recently emphasised the importance of allostery in signal transduction and transcriptional control (Ma et al. 2010). In their opinion, all interactions can potentially have allosteric consequences, because of the nature of population shift, and can result in highly complex and redundant networks. Using these ideas, they were able to rationalise the complexity of biomolecular interaction networks that operate in transcriptional regulation (Pan et al. 2009). This work has recently been reviewed (Pan et al. 2010).
In the absence of experimentally validated conformational ensembles, double mutant cycles are one of the strongest experimental validations of allosteric channels and are routinely used in their investigation. These types of data can be useful in that they can determine if cooperative channels and mechanisms are present. In some cases they can indicate which residues are involved in such channels. Determination of the underlying mechanisms is more difficult, however, and requires atomic level descriptions of biomolecular motion. Istomin et al. (2008) have used this type of analysis and observed clear channels of communication between residues separated by non-sequential residues in many different proteins. In an NMR approach Mayer et al. (2003) combined NMR relaxation data for ten mutants of the B1 IgG binding domain of streptococcal protein G and observed that the backbone order parameters varied less than would be expected if the residues fluctuated independently, suggesting widespread concerted motion of pairs of residues. This claim is controversial and could not be substantiated by MD simulations of this and other proteins (Lange et al. 2005).
Biomolecular allostery from ensembles
Descriptions of allostery from ensembles can be indicative of correlated motion. Because of the challenges in directly determining time-resolved coordinates from experiments, detection of correlations between the motion of residues distant in sequence has remained elusive. As described above, conformational ensembles are most often used for this purpose. They combine experimental (NMR) data with MD, and enable protein motion to be characterized at atomic resolution.
Analysis of MD simulations has become commonplace in Cartesian space, with examples covering a large range of proteins. Dihydrofolate reductase has non-additive behaviour in double mutant cycles and MD simulations (Agarwal et al. 2002; Rajagopalan et al. 2002), in which the affect of mutations can be rationalised in terms of the small structural changes and specific rearrangements of the hydrogen bonds (Rod et al. 2003). Antibody conformations in the free and bound forms are also significantly different. MD simulations of the unbound antibody were analysed to determine which motion is correlated between the various regions. It was observed that intra-domain correlations exist on timescales of 40 ps and above, and that inter-domain correlations between the heavy and light chains were not present below 60 ps (Viswanathan et al. 2000). These results suggest that allostery exists between the heavy and light chains of the antibody and may be important in communicating the bound signal through the antibody to the Fc region, which leads to the correct immune response.
Recently an MD study on the ferredoxin protein motif has revealed that mutations in a loop region distant (~20 Å) from the active site have an appreciable affect on the redox potential at the iron–sulphur cluster-binding region (Nechushtai et al. 2011). The loop truncation was observed to have limited effect on the structure of the molecule as determined by X-ray crystallographic studies; the authors propose that changes in motion, propagated via a channel, are responsible for the change in redox potential. MD simulations were used to characterise the motion of the system and the ensembles generated were analysed to uncover correlated motion. The simulations predicted the propagation of motion between strands of the β-sheet, consistent with the β-lever motion (Fenwick et al. 2011) that has been observed to couple motion across β-strands.
Illustration of correlations in dihedral space has primarily been applied to understanding of the local motion in the backbones of proteins. Indeed the first MD simulations reported for BPTI showed backbone correlations from analysis of the trajectory in dihedral space (McCammon et al. 1977). These correlations were observed to minimise the structural displacements that would otherwise be caused by large-amplitude bond rotation in the backbone. The crankshaft motion is caused by the rigidity of the peptide plane and couples ψ i−1 with ϕ i . The authors also observed that the crankshaft motion also operates between the χ angles of aromatic residues to enable motion of the side chain without flipping of the aromatic side chain (McCammon et al. 1977). These correlations have also been described by use of modern very long MD simulations with state of the art force fields (Fitzgerald et al. 2007; Li et al. 2009).
Ensembles generated with experimental data are also known to reproduce short and medium-range correlated motion. Clore and Schwieters (2004a) were able to obtain direct evidence of the existence of crankshaft motion from ensemble MD simulations of the B3 IgG binding domain of streptococcal protein G restrained by RDCs. A second study showed that a description of the dynamics of the same protein using the 3D Gaussian axial fluctuation model, that was fit to RDCs, gave better agreement with experimental NMR results when correlated motion between residues in opposite strands was invoked (Bouvignies et al. 2005). This study provided evidence that hydrogen bonds transfer motion between strands, which presumably could also operate to couple motion up and down helices.
Our analysis of the conformational ensemble of ubiquitin revealed that correlated backbone motion both conserves the structure of ubiquitin and provides a pathway for transfer of structural and dynamic information. The pathway is very detailed in the sense that the combination of experimental data with the MD force field gives rise to a pathway of dihedral rotations that can be connected in a linear sequence of events (Fenwick et al. 2011).
Challenges in characterizing molecular recognition and allostery
In the preceding sections, we have detailed the use of ensembles to elucidate mechanisms of binding and how the use of ensembles can help to identify the model of molecular recognition. We have also presented the current view of allostery with relevant examples to show that ensembles can play a pivotal role in dissecting the mechanisms that govern signal transduction. The use of the ensembles is not limited to globular folded proteins and nucleic acids but also finds applications in disordered and highly flexible biomolecules. Despite recent theoretical developments, limitations in ensemble generation still limit the resolution and quality of ensembles for these later systems. The current limit is not with the methods of analysis, but rather relate to the difficulty in determining ensemble representations.
One limitation is the sampling problem. In the applications above, which used unrestrained MD simulations to generate the ensembles, the timescale of the simulation required is generally 10 times longer than the process of interest that one is trying to study. For small globular proteins, with high-frequency motion only, MD simulations are now able to capture the required motion (Showalter and Brüschweiler 2007; Li and Brüschweiler 2009; Lange et al. 2010). However, for large or extended molecules, i.e. molecules with multiple domains, for which large amplitude motion is possible, conventional MD does not sample enough space in the practical timescales of these simulations. Advanced sampling techniques are now being used to reduce this limitation and have been shown to improve the agreement between simulations and experimental data (Lange et al. 2006; Allison and van Gunsteren 2009; Markwick et al. 2009). Ensemble restrained methods do not suffer from this problem as much as unrestrained MD, because simulated annealing is often used to explore conformational space, and the experimental restraints can be seen as generating a system-specific force field.
NMR data are not the only data that can be used to generate ensembles for study of molecular recognition. Some recent ensembles have been generated by use of small-angle X-ray scattering (SAXS) (Bernado et al. 2010). Other experimental techniques are not so limited by size, or improve in resolution as size increases. Thus, the use of FRET distance distributions, SAXS, small-angle neutron scattering (SANS), and other techniques may be of greater benefit than NMR for these systems. Indeed SAXS and NMR seem to give a consistent view for systems even as challenging as unfolded proteins (Bernado and Blackledge 2009). These methods tend to be used when the sizes of the complexes are too large to be studied by NMR. NMR is still favoured in many cases because of the detailed site-specific information that can be extracted by use of this technique.
Theoretical interpretation of the data can still be a problem. One example that comes from our laboratory is the use of RDCs in the presence of flexibility. We have shown that conventional methods for determining motion from RDCs cannot be used to interpret the motion of multi-domain biomolecules. Until recently, only simple models were available for domain motion (Ryabov and Fushman 2006). Current research is providing solutions to this problem and will enable the motion of these systems to be characterised at atomic resolution (Esteban-Martin et al. 2010; Huang and Grzesiek 2010).
The generation of ensembles for unfolded states is very difficult. Statistical coil models, in which ensembles were generated by randomly selecting dihedrals from loop libraries, had some success (Bernado et al. 2005; Jha et al. 2005; Bernado and Blackledge 2009). Despite the local properties being approximately correct for these ensembles, there is room for improvement (Nodet et al. 2009). Refining these ensembles with RDCs enabled the local refinement of these structures for ubiquitin (Esteban-Martin et al. 2010) whereas some corrections for longer-range contacts were still needed to generate ensembles of α-synuclein, although progress is being made (Bernado et al. 2005; Salmon et al. 2010).
Two remaining challenges are to accurately determine the population weights of the ensemble members and the timescale of their interconversion. It is very challenging to determine relevant population weights with certainty from experimental data but possibility of generating a free energy landscape from distances measured using FRET was recently demonstrated (Schuetz et al. 2010). Advances of this type together with improvements in ensemble methods will lead to determination of ensembles of larger or elaborate biomolecules.
How do biomolecules recognise their partners and lead to biological responses? We have in this review provided an account of how conformational ensembles determined by combining NMR data with molecular simulations can be used to determine the mechanism by which molecular recognition and its consequences occur in biological macromolecules. As we imply above, the field is still in its infancy and there are many types of structural heterogeneity in macromolecules that are challenging to experimentally characterize, because of difficulties in interpretation of the experimental data or because of the absence of algorithms for efficient generation of conformational ensembles at high resolution. There is no doubt, however, that as better experimental and computational methods become available the combination of experimental data with molecular simulations will enable us to better understand the mechanism of molecular recognition and binding allostery and to exploit this new knowledge for discovery of better and less expensive drugs by structure and dynamics-based drug design.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Allerhand A, Doddrell D, Glushko V, Cochran DW, Wenkert E, Lawson PJ, Gurd FR (1971) Conformation and segmental motion of native and denatured ribonuclease A in solution. Application of natural-abundance carbon-13 partially relaxed Fourier transform nuclear magnetic resonance. J Am Chem Soc 93:544–546PubMedCrossRefGoogle Scholar
- Deniz AA, Laurence TA, Beligere GS, Dahan M, Martin AB, Chemla DS, Dawson PE, Schultz PG, Weiss S (2000) Single-molecule protein folding: diffusion fluorescence resonance energy transfer studies of the denaturation of chymotrypsin inhibitor 2. Proc Natl Acad Sci USA 97:5179–5184PubMedCrossRefGoogle Scholar
- Eigen M (1967) Kinetics of reaction control and information transfer in enzymes and nucleic acids. Nobel Symp 5:333–369Google Scholar
- Fenwick RB, Esteban-Martin S, Richter B, Lee D, Walter KFA, Milovanovic D, Becker S, Lakomek NA, Griesinger C, Salvatella X (2011) Weak long-range correlated motions in a surface patch of ubiquitin involved in molecular recognition. J Am Chem Soc 133:10336–10339Google Scholar
- Karplus M (2010) Role of conformation transitions in adenylate kinase. Proc Natl Acad Sci USA 107:E71 (author reply E72)Google Scholar
- Long D, Brüschweiler R (2011b) Atomistic kinetic model of population shift and allostery in biomolecules. J Am Chem Soc. doi: 10.1021/ja208813t
- Nechushtai R, Lammert H, Michaeli D, Eisenberg-Domovich Y, Zuris JA, Luca MA, Capraro DT, Fish A, Shimshon O, Roy M, Schug A, Whitford PC, Livnah O, Onuchic JN, Jennings PA (2011) Allostery in the ferredoxin protein motif does not involve a conformational switch. Proc Natl Acad Sci USA 108:2240–2245PubMedCrossRefGoogle Scholar
- Straub FB, Szabolcsi G (1964) Remarks on the dynamic aspect of enzyme structure. In: Braunstein AE (ed) Molecular biology: problems and perspectives. Nauka, Moscow, pp 182–187Google Scholar