Physicochemical bases for protein folding, dynamics, and protein-ligand binding

Proteins are essential parts of living organisms and participate in virtually every process within cells. As the genomic sequences for an increasing number of organisms are completed, research into how proteins can perform such a variety of functions has become much more intensive because the value of the genomic sequences relies on the accuracy of understanding the encoded gene products. Although the static three-dimensional structures of many proteins are known, the functions of proteins are ultimately governed by their dynamic characteristics, including the folding process, conformational fluctuations, molecular motions, and protein-ligand interactions. In this review, the physicochemical principles underlying these dynamic processes are discussed in depth based on the free energy landscape (FEL) theory. Questions of why and how proteins fold into their native conformational states, why proteins are inherently dynamic, and how their dynamic personalities govern protein functions are answered. This paper will contribute to the understanding of the structure-function relationship of proteins in the post-genome era of life science research.

Proteins are important bio-macromolecules that are composed of one or more chains of amino acids. They perform a vast variety of functions in vivo and participate in virtually every process within a cell, including catalytic reactions in metabolism, response to stimuli during immune response, replicating DNA during cell division, and other crucial roles in signal transduction, molecular transport, cell adhesion, cell cycle, gene-expression regulation, and structural or mechanical functions in muscle and cytoskeleton. Therefore, understanding the functions of proteins is vital in the field of life sciences.
The central dogma of protein structural biology is that the amino acid sequence contains all the information necessary for a protein to fold into its three-dimensional structure under the proper physiological/experimental environment, and that the structure is essential for protein function [1][2][3]. High-resolution X-ray crystallography has brought about a revolution in the field of structural biology by producing numerous static structures, resulting in a surge of studies into the structure-function relationship of proteins [4]. However, because proteins are dynamic entities, the static unique structure or "folded state" of a protein obtained from such experimental techniques cannot provide the final answer to its function. Indeed, the biological functions of proteins are rooted in their physical motions in the cell [5,6]. The physical motions consist of (i) the protein folding process that results in the natively folded state; (ii) protein dynamics and molecular motions that occur within the folded conformational state; and (iii) the more specific function-related dynamics that occur during protein-ligand recognition and interaction.
Many methods have been developed to investigate protein dynamics on a variety of timescales at different resolutions [4]. Experimental approaches such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, Laue X-ray diffraction, cryo-electron microscopy, and small-angle X-ray scattering provide atomic-resolution snapshots or structural ensembles, which can be used to infer and/or obtain information about dynamics on slow timescales. Other classical biophysical techniques such as circular dichroism (CD), fluorescence resonance energy transfer (FRET), infrared spectroscopy, Raman spectroscopy, and electron paramagnetic resonance, despite being low-resolution and local-site methods, provide kinetic information (i.e., information about conformational transitions) over a large range of timescales. Computational approaches such as molecular dynamics (MD) simulation [7,8] and its derivative methods, including simulated annealing and replica exchange Monte Carlo techniques [9], force-field simplification models such as Gaussian network models [10] and Gō models [11], and enhanced sampling techniques such as metadynamics [12][13][14], umbrella sampling [15][16][17][18] and targeted and steered simulations [19][20][21][22], can provide not only complete information about protein dynamics at the atomic level (i.e., the precise positions of individual particles as a function of time), but also the underlying forces and corresponding energies that dictate such motions. Of particular note is that the enhanced sampling methods of MD can be used to calculate free energy differences and then reconstruct the FEL of a protein-solvent system.
Computational simulations, coupled with improved computer power and advanced algorithms [23][24][25], have greatly facilitated protein dynamics studies, including studies on the protein folding mechanism, conformational space sampling, molecular motions, and conformational transitions, which in turn enhance significantly our understanding of the relationship between protein dynamics and functions.
The experimental and computational methods used for investigating protein folding and dynamics, as well as specific case studies of the dynamics-function relationship of individual proteins, have been reviewed extensively elsewhere [4,6][26][27][28][29], and therefore are not covered in this review. Here we focus mainly on the fundamental physicochemical principles and/or mechanisms underlying the physical motions of proteins in an attempt to address the following questions: (i) Why can proteins fold rapidly into their native conformational states? What are the forces that drive the folding process? How do they do so? (ii) Why are proteins inherently dynamic? What are the factors that dictate the protein dynamics? (iii) What are the driving forces for protein-ligand binding? How do they drive the recognition and binding processes? To address these questions, the concepts of enthalpy, entropy, free energy, and the FEL for protein-solvent systems need to be introduced.

Basic concepts of a protein-solvent thermodynamic system
A protein-solvent system is a thermodynamic system composed of the solute (the protein molecules), liquid water, buffer ions, and other substances such as small molecular compounds, cofactors, and ligands. Very complex interactions and energy/heat exchange exist among these substances; therefore, the macroscopic conformational states of a protein and a change in system properties are a result of the average behavior of a very large number of the microscopic constituents. The relationship between these substances and how heat transfer is related to various energy changes within a system are dictated by the laws of thermodynamics [30].
The entropy in a protein-solvent system can be regarded as a measure of how evenly the thermal energy (also called heat, heat energy or kinetic energy in physics) can be distributed over the entire system. Such a concept is central to the second law of thermodynamics, which states that thermal energy always flows spontaneously from regions of higher temperature to regions of lower temperature. Because the tendency to distribute energy as evenly as possible will reduce the state of order of the initial system, entropy is generally treated as an expression of disorder or randomness of the system [6]. When a protein-solvent system is under constant temperature and pressure, the origin of entropy is the heat energy stored in atoms or molecules within the system. This energy makes atoms jostle around and bump into one another, and this leads to further vibrations, fluctuations, and motions of groups (e.g., amino acid residue side chains), residues, and molecules (e.g., water, protein, and ligands). In summary, entropy reflects the spontaneous transfer and diffusion of thermal energy among the various system constituents. Although this is a macroscopic process, it originates from the microscopic degrees of freedom of atoms and groups and leads ultimately to an energy distribution that is as homogeneous as possible over the system and, as a result, to maximal randomness of the system. Entropy cannot be measured directly, but entropy change can be measured quantitatively when it is considered as a change in the degrees of freedom of a system. Entropy change can be further divided into changes in conformational, rotational, and translational entropies of the protein and other solutes, as well as the changes in solvent entropies of the liquid water and ions [5].
Enthalpy is a measure of the total energy of a protein-solvent system. It is the sum of the internal energy (i.e., the energy required to create the system) and the amount of energy required to make room for the system by displacing its environment and establishing its volume and pressure (calculated as the product of the system volume multiplied by the pressure). Enthalpy is a quantifiable state function but the total enthalpy of a protein-solvent system cannot be measured directly. However, the change in enthalpy can be measured when it is expressed as the change in a system's energy. The formations of noncovalent bonds between atoms within a protein and between a protein and the solvent are exothermic processes, and therefore the change in enthalpy is negative. In contrast, breakages of noncovalent bonds are endothermic reactions and therefore lead to a positive enthalpy change of a system. It should be noted that the enthalpy change is a global property of the entire system, which arises from the combined effect of the formations and breakages of various noncovalent bonds (van der Waals contacts, hydrogen bonds, ion pairs, and other polar and nonpolar interactions) between atoms from both the solute and solvent [6].
The thermodynamic free energy is the energy in a physical system that can be converted to do work. For a protein-solvent system, the free energy is generally called the Gibbs free energy [31], which can be regarded as an approximation of the chemical potential, a thermodynamic quantity used to measure the change in the energy of a system. Gibbs free energy is a state function of the system and only when the change in free energy is negative can the state transition process occur spontaneously. The free energy change (∆G) is expressed as
$$\Delta G = \Delta H - T\Delta S$$
where ∆H and ∆S are the change in enthalpy and entropy of the system, respectively, and T is the temperature in Kelvin. An important principle of thermodynamics is that a system at constant pressure and temperature evolves spontaneously only in the direction of negative ∆G, and when it reaches equilibrium its Gibbs free energy is minimized (the free energy minimum). The corresponding system state at the free energy minimum is called the free energy minimum state. Conversely, no spontaneous process will occur if ∆G is positive, unless the system constituents and system conditions are changed, or additional work is done. For a protein-solvent system, the Gibbs free energy is described further as the FEL [32,33], which is a very useful and vivid way to characterize the states of a protein-solvent system with their corresponding free energy values. Dill [34] defined the FEL of a protein-solvent system as
$$F(x) = F(x_1, x_2, \ldots, x_n)$$
where F(x) is the free energy at a state "x", which is further characterized by variables $x_1, x_2, \ldots, x_n$ that represent a variety of system states. A system state is the integral state that results from complex interactions between the solute and solvent, including not only the states of the protein and other small molecules (such as ligand, cofactor, and compound), but also those of water and ions.
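As a minimal numerical sketch of the relation ∆G = ∆H − T∆S, the following Python snippet evaluates ∆G for a hypothetical process and reports whether it could proceed spontaneously. The function names and the values of ∆H and ∆S are illustrative assumptions, not data for any real protein.

```python
def gibbs_free_energy_change(delta_h, delta_s, temperature):
    """Return dG = dH - T*dS.

    delta_h: enthalpy change in kcal/mol
    delta_s: entropy change in kcal/(mol*K)
    temperature: absolute temperature in K
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive (Kelvin)")
    return delta_h - temperature * delta_s


def is_spontaneous(delta_g):
    """A process can proceed spontaneously only when dG is negative."""
    return delta_g < 0


# Hypothetical values: bond formation releases heat (dH < 0) while the
# system loses some entropy (dS < 0).
dg_300 = gibbs_free_energy_change(-10.0, -0.02, 300.0)
dg_600 = gibbs_free_energy_change(-10.0, -0.02, 600.0)
print(round(dg_300, 3), is_spontaneous(dg_300))
print(round(dg_600, 3), is_spontaneous(dg_600))
```

With these assumed numbers the same process is favorable at 300 K and unfavorable at 600 K, because the entropic penalty −T∆S grows with temperature.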
In protein folding and protein-ligand binding studies, researchers generally focus on the conformational state of the protein, and for simplicity, $x_1, x_2, \ldots, x_n$ are regarded as variables that specify only the microscopic states of the protein [34]. These can include all the dihedral angles of the protein chain, the eigenvector projections derived from essential dynamics analysis of an MD trajectory [6,35,36], the number of native contacts, the end-to-end distance of the peptide chain, an order parameter that describes the similarity of the protein structure to the native or other states [37], or any other degree of freedom of the protein [34].
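One common way to estimate a one-dimensional free energy profile along such a variable is Boltzmann inversion of its sampled probability distribution, F(x) = −k_B T ln P(x). This relation is standard statistical mechanics but is not stated in the text above, so the sketch below is an illustration under that assumption. The histogram counts are invented and the function name is hypothetical.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_profile(counts, temperature=298.15):
    """Convert histogram counts along one variable into relative free energies.

    Uses F_i = kT * ln(c_max / c_i), i.e. -kT ln(p_i) shifted so that the
    most populated bin has F = 0. Bins never visited get F = +infinity.
    """
    if not counts or max(counts) <= 0:
        raise ValueError("histogram is empty")
    kt = K_B_KCAL * temperature
    c_max = max(counts)
    profile = []
    for c in counts:
        if c == 0:
            profile.append(math.inf)
        else:
            profile.append(kt * math.log(c_max / c))
    return profile


# Invented counts for five bins of some order parameter (e.g. fraction of
# native contacts): one dominant well, one shallower well, one unvisited bin.
counts = [50, 1000, 20, 0, 200]
profile = free_energy_profile(counts)
for i, f in enumerate(profile):
    print(i, "inf" if math.isinf(f) else round(f, 2))
```

Only free energy differences between bins are meaningful here; the zero is set arbitrarily at the most populated bin, and poorly sampled bins carry large statistical uncertainty.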

The protein folding problem
The protein folding problem can be divided into two main problems [38,39]: the kinetic question of why the protein can fold so fast, known as Levinthal's paradox [40]; and the thermodynamic question of how the native structural states result from inter-atomic forces acting on an amino acid residue sequence, also called the folding code. The first question was originally posed by Levinthal [40] in 1968 because he noted that although proteins have enormous conformational spaces, they can search, converge and fold quickly to their native states within a microsecond to second timescale.
To address the kinetic question, the forces that drive and guide the protein folding process have to be ascertained and elucidated, thus invoking the thermodynamic question, the folding code and its action mechanism. Clearly, these two questions are closely related and both need to be solved to obtain a complete understanding of the protein folding mechanism [6,41].
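The scale of the kinetic problem can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, a 100-residue chain with 3 conformations per residue and a sampling rate of 10^13 conformations per second; these numbers are assumptions and are not taken from [40].

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_time(n_residues, states_per_residue, samples_per_second):
    """Time in seconds to visit every conformation once by random search."""
    n_conformations = states_per_residue ** n_residues  # exact Python integer
    return n_conformations / samples_per_second


# Assumed, illustrative numbers: 100 residues, 3 conformations per residue,
# and 1e13 conformations sampled per second.
seconds = exhaustive_search_time(100, 3, 1e13)
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.2e} s, about {years:.2e} years")
```

Under these assumptions an exhaustive search would take far longer than the microsecond to second timescale on which proteins actually fold, which is the essence of the paradox.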

Kinetics of protein folding
Zwanzig established a mathematical model and solved the kinetic question of protein folding by showing that a small and physically reasonable energy bias against locally unfavorable configurations was enough to reduce Levinthal's time to a biologically significant level [42]. In other words, only when a reasonable free energy gradient exists can the protein fold rapidly into its native state. Under denaturing conditions, namely, the addition of strong acid/base, a concentrated inorganic salt, or an organic solute (e.g., alcohol, chloroform or urea) to the protein-solvent system, the FEL of the system becomes relatively flat, and the protein loses the tertiary and secondary structures that were present in the native state [43]. If the denaturing condition remains unchanged, the unfolded protein molecules wander randomly around the flat free energy surface, resulting in the extensive distribution of a huge number of conformational states over this endless surface. Therefore, the protein molecules are unable to find an outlet to become folded. If the renaturation condition is restored, the denatured protein molecules fold rapidly back into their native state, as shown by Anfinsen's famous experiment that was cited by the Nobel Prize Committee [3,44]. In essence, the recovery of the optimal folding condition changes the shape of the FEL and thereby establishes a free energy gradient. Such a change is substantially larger than the small free energy fluctuations that occur on the FEL surface under the denaturing conditions. Small free energy fluctuations produce a large number of small and shallow energy traps/wells within which different random coiled states are transiently trapped, thus making it easy for conformational transition to occur between these states.
Only a substantial change in the shape of the FEL, which allows for the establishment of a steep enough slope, can trigger the folding process irreversibly towards the global free energy minimum state, although numerous traps/wells may also exist within the free energy slope.
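The effect of a free energy gradient on search time can be illustrated with a toy Monte Carlo search. The sketch below is loosely inspired by the idea of an energy bias against incorrect local configurations; it is not a reproduction of the model in [42], and the chain size, number of incorrect states, penalty, and function names are all assumptions chosen for illustration.

```python
import math
import random


def steps_to_fold(n_bonds, n_wrong_states, penalty_kt, rng, max_steps=10**6):
    """Count Monte Carlo steps until every bond is in its correct state.

    Each step picks one bond and proposes one of its (1 + n_wrong_states)
    states uniformly. A move from the correct state to an incorrect one is
    accepted with probability exp(-penalty_kt); every other move is accepted.
    """
    correct = [False] * n_bonds  # start fully "unfolded"
    n_correct = 0
    p_propose_correct = 1.0 / (1 + n_wrong_states)
    p_accept_wrong = math.exp(-penalty_kt)
    for step in range(1, max_steps + 1):
        i = rng.randrange(n_bonds)
        proposes_correct = rng.random() < p_propose_correct
        if correct[i] and not proposes_correct:
            if rng.random() < p_accept_wrong:
                correct[i] = False
                n_correct -= 1
        elif not correct[i] and proposes_correct:
            correct[i] = True
            n_correct += 1
        if n_correct == n_bonds:
            return step
    return max_steps  # search did not finish within the step budget


def mean_steps(penalty_kt, trials=20, seed=0):
    """Average first-passage time over several independent searches."""
    rng = random.Random(seed)
    total = sum(steps_to_fold(8, 2, penalty_kt, rng) for _ in range(trials))
    return total / trials


flat = mean_steps(penalty_kt=0.0)    # flat landscape: unbiased random search
biased = mean_steps(penalty_kt=2.0)  # 2 kT penalty per incorrect bond
print(f"flat: {flat:.0f} steps, biased: {biased:.0f} steps")
```

Even for this tiny 8-bond chain, a modest per-bond penalty shortens the search substantially, and the gap widens rapidly as the chain grows.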

Funnel-like shape of the FEL for protein folding
When a steep enough slope in the FEL is established under the proper solvent condition for folding, unfolded peptide chains begin to roll down the free energy slope, just like an ensemble of skiers jumping and skiing down a steep icy slope. This process essentially involves a lowering of the free energy of the protein-solvent system, including the rapid hydrophobic collapse stage [29,45,46] and subsequent slow bottleneck stages [47]. With a decrease in the system free energy, the protein gradually loses its degrees of freedom because of the squeezing effect of the water solvent (which is a result of the solvent entropy maximization) and the formation of noncovalent bonds within the protein and between the protein and solvent. The lower the system free energy is, the more conformational entropy the protein loses, and the more ordered conformational states the protein obtains. Thus, the lowering of system free energy accompanied by the reduction in protein conformational freedom makes the shape of the FEL of protein folding look like a funnel [47,48] (Figure 1).
Although the FEL of a protein-solvent system is highly multi-dimensional, it is pictured generally as a three-dimensional surface because it is difficult to draw a multi-dimensional space. In the canonical depiction of the folding funnel [47,49] (Figure 1), the depth of the funnel (i.e., the difference in free energy between the initial and final states) represents the energetic stability of the native versus the denatured states, and the funnel width represents the conformational space/entropy of the protein. The narrow bottom of the funneled landscape is the global free energy minimum region where the natively folded states of the protein reside. The region outside the funnel is described as a relatively flat surface with many small free energy wells distributed over this surface.

Folding process and pathways
Although protein folding problems are complex, several simplified models have been developed to explore and describe the folding process. They include models of diffusion collision [50,51], framework [52], nucleation condensation [53,54], zipping-and-assembly [55][56][57], jigsaw puzzle [58], stoichiometry [59,60], hydrophobic collapse [46], and folding funnel [47,48]. These models are not independent and mutually exclusive but are inextricably intertwined and are commonly applied to help understand different aspects of the protein folding process (for details, see [5]). The folding funnel model is based on the FEL theory and embodies most of the aspects characterized by the other models listed above. Therefore, the folding funnel model is the model that is most widely accepted in describing the protein folding process.

Figure 1 Funnel-like FEL for protein folding. A, Schematic three-dimensional representation of the FEL with hills, traps/wells and free energy barriers. Five different potential folding paths are indicated by black lines with arrows showing the downhill directions from the surface outside the funnel (where the denatured states of protein molecules are located) towards the global free energy minimum (where the native states/substates are located). The rough region between the two red lines represents the location of the molten globule state resulting from the hydrophobic collapse process. B, Schematic two-dimensional representation of the folding funnel. The width of the funnel represents the conformational entropy, and the depth represents the energetic stabilization/conformational similarity of the denatured state versus the native state. The arrows indicate the free energy downhill directions and different stages in the folding process. The regions of molten globule states, transition states, glass transition states and native states are color-coded from light blue to dark blue with increasing color depth. A and B have been modified from Dill and Chan [47], and Onuchic et al. [61,62] and Wolynes et al. [63], respectively.
Under the folding funnel model, protein folding is viewed as going down a funnel-like FEL via multiple parallel pathways from a vast variety of individual unfolded states to the native states located around the region of the global free energy minimum (Figure 1A) [47]. At any stage the protein molecules exist as an ensemble of conformational states that can be trapped transiently in many local free energy minima because of the ruggedness of the landscape. The rapid equilibrium among these folding intermediates can be disrupted to guarantee a "continuous" downhill movement along different routes as the result of the overall trend in lowering the system free energy. In other words, there may be numerous events during the downhill process that lead to the formation of a variety of folding intermediates such as the molten globule state, transition state, glass transition state, and other relatively unstable states or substates. The detailed folding process for a small globular protein is shown in Figure 1B [61][62][63]. The first stage is the hydrophobic collapse process, during which an ensemble of unfolded polypeptide chains collapses rapidly into a compact conformational ensemble called the molten globule. The molten globule is an important folding intermediate in which the hydrophobic amino acid side chains are squeezed together into the interior of the protein, some transient secondary structures and nonspecific tertiary interactions are formed, and the protein surface area is minimized. Nevertheless, many native contacts, or close residue-residue interactions present in the native state, have yet to form [62]. A recently published paper [29] reviewed the experimental evidence for (i) the existence of collapsed states under certain conditions (e.g., low concentration of a chemical denaturant) and (ii) the rapid collapse that occurs on a submicrosecond timescale.
Each unfolded peptide chain collapses along a different pathway, down the very steep and relatively smooth upper part of the funnel, as shown in Figure 1A, and subsequently enters a rugged region filled with hills, valleys and traps. Molten globule states are therefore trapped and accumulated in these traps/wells/valleys located between hills/ridges for a relatively long time. The subsequent processes are relatively slow steps, because the trapped molten globules need to climb an uphill slope to reach a mountain pass before continuing to the next downhill search [47]. Such cycles of transient trapping, uphill climbing, and downhill movement can repeat many times before the next outlet is found, thereby slowing exploration of the routes towards the native state. Therefore, these repetitive processes are regarded as a bottleneck for folding. The ensemble of conformations located at the hilltop is considered a "transition state" [47] to highlight the rate aspect of the process, but not the specific structures. Furthermore, a non-complementary change between the entropy and enthalpy can still lower the system free energy, ultimately leading the protein molecules to a so-called glass transition state [32,62]. This trapping state resembles the way a liquid becomes a glass when cooled, remaining fixed in one of many structures and being hard to reconfigure to the lowest free energy state [63]. Therefore, the transition process from the glass transition state to the native state is very slow, requiring a sufficient overall slope of the FEL so that the numerous valleys flow in a funnel towards the native state. In addition, when the folding process reaches the glass transition state, the intermediates have only a few paths towards the native structure [61].
It is worth noting that although the folding funnel model was established originally based on studies of small and fast-folding proteins, it can also be used to describe and explain the folding process of large and complex proteins [64].

Thermodynamics of protein folding
The folding-driving force, regarded as the folding code, is encoded in the amino acid sequence because different/similar sequences fold into different/similar structures, and only a fraction of all possible sequences (i.e., the limited number of amino acid sequences found in nature) in the sequence space can fold into functional structural states. The protein folding code has traditionally been seen as a sum of many individual small interactions including hydrogen bonds, ion pairs, van der Waals attractions, and other polar and nonpolar interactions [39,65]. The idea behind this viewpoint is that the primary sequences encode the protein secondary structures, which then encode the tertiary structures. Intuitively, local interactions lead first to the formation of simple local structures, and then interactions or collisions among these simple structures result in the formation of more complex tertiary structures, which is similar to the hierarchical folding process described by the diffusion collision and framework models [50][51][52]. However, viewing the sum of many different interactions as the driving force may be one-sided because (i) a dominant component to the folding-driving force, the solvent entropy effect, is ignored [6]; (ii) the statistical mechanical model demonstrated that the folding code is distributed both locally and non-locally in the sequence [38]; and (iii) the local interactions are to a large extent a consequence but not a cause of the folding forces [39].
From a global viewpoint, the folding-driving force is a lowering of total Gibbs free energy of the protein-solvent system, which involves contributions from both the entropy (including changes in the solvent entropy and protein conformational entropy) and enthalpy (including formations and breakages of noncovalent bonds in the protein interior and between the protein and the solvent) [5]. Furthermore, because the protein folding process is very complex, changes in entropy and enthalpy dominate different stages of this process by contributing differentially to the lowering of the system free energy [41].

Why is the solvent entropy effect (the hydrophobic force) important?
There are large numbers of hydrogen bonds and van der Waals contacts and a relatively small number of ion pairs in already folded proteins. This has led to the idea that these interactions play a dominant role in driving protein folding. Hydrogen bonds are important because all possible hydrogen-bonding interactions are generally satisfied in the native structures, and hydrogen bonds between backbone amide and carbonyl groups are key components of protein secondary structures [39]. Similarly, tight packing in natively folded proteins implies that van der Waals interactions are also important. Compared with the hydrogen bonds and van der Waals contacts, the ion pairs/salt bridges may be less important because most proteins have only a limited number of electrostatically charged residues and many of them are exposed on the solvent accessible surface of proteins. We can conclude that these interactions are clearly important in stabilizing the native conformational states because these noncovalent bonds are observed only in the already folded native structures. However, to what extent they could drive protein folding needs to be examined carefully.
At first glance, the secondary structures are likely to be constrained and stabilized by their key components, the backbone hydrogen bonds. However, isolated secondary structures are seldom stable in solution on their own [39], and the "chameleon" sequences in natural proteins can assume either α-helical or β-strand conformations, suggesting that the stability of the secondary structures relies on their tertiary context [66,67], namely, the compact packing of proteins, which is a consequence of the solvent entropy effect that causes hydrophobic collapse during the protein folding process.
The solvent entropy effect, also called hydrophobic force or hydrophobic interaction, is now widely believed to play a crucial role in protein folding. Direct evidence, as described by Dill [39], is that (i) all natural proteins have hydrophobic cores, implying that the hydrophobic residue side chains are driven to be hidden from water; (ii) transferring a hydrophobic side chain from water into oil-like media requires energy of 1-2 kcal mol⁻¹ [68]; (iii) proteins are denatured readily in nonpolar solvents; and (iv) sequences with jumbled amino acid residues but correct hydrophobic and polar patterning can fold to their expected native states [69][70][71].
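To give a sense of scale for a free energy difference of 1-2 kcal mol⁻¹ per side chain, the sketch below converts it into a Boltzmann population ratio, exp(|∆G|/RT), at room temperature. The use of the Boltzmann factor and the temperature of 298.15 K are assumptions added for illustration; the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def population_ratio(delta_g_kcal, temperature=298.15):
    """Boltzmann ratio exp(|dG| / RT) between two states separated by dG."""
    return math.exp(abs(delta_g_kcal) / (R_KCAL * temperature))


for dg in (1.0, 2.0):
    print(f"{dg:.0f} kcal/mol -> ratio of about {population_ratio(dg):.1f}")
```

A difference of this size per side chain shifts the relative populations by roughly 5-fold to 30-fold, and the effect compounds over the many hydrophobic residues in a protein.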
All of the above evidence implies that solvent entropy maximization contributes substantially to the folding-driving force, i.e., the lowering of the total Gibbs free energy of the protein-solvent system. For example, the requirement to maximize the solvent entropy will exclude nonpolar amino acids from water, leading to a minimized protein surface area and a maximized exposure of polar and electrostatically charged residues on the surface. The hydrophobic side chain-solvent system will reach equilibrium, the global free energy minimum state, through solvent entropy maximization and formation of a water shell around the hydrophobic side chain group so as to guarantee the maximal entropy of other water molecules outside the shell. Therefore, energy is required to disrupt the bonds within the water shell when transferring a hydrophobic side chain into the oil-like nonpolar solvent, which is an enthalpically unfavorable process. Interestingly, the process of mixing a hydrophobic group with a nonpolar solvent is both enthalpically and entropically favorable because the interactions between solute and solvent are easy to establish and the mixture is more disordered than when the hydrophobic group and solvent are not mixed. This process will lead finally to an equilibrium that has a lower system free energy than that of the hydrophobic group-water solvent system. Similarly, the protein denaturation process in nonpolar solvents is also driven by the lowering of the total Gibbs free energy of the protein-nonpolar solvent system, in which both the entropy gain and enthalpy reduction make favorable contributions to the lowering of the system free energy, although the final shape of the FEL exhibits a relatively flat surface [41].
This, from the opposite side, shows that liquid water is an ideal solvent that allows the formation and stabilization of the functional conformational states of proteins through its entropy maximization and enthalpy effect (e.g., the formation of hydrogen bonds between the exposed electrostatically charged/polar side chains and the water molecules). The observation that a jumbled amino acid residue sequence retaining the correct hydrophobic and polar patterning can fold into the expected native state suggests that the hydrophobic collapse caused by the solvent entropy maximization is the most important stage in protein folding. The resulting molten globule states provide a prerequisite for further sculpting into the native state through conformational adjustments.

Distinct contribution of entropic and enthalpic components to lowering the system free energy
The statistical mechanical model of the "solvent-induced" force created by Ben-Naim [65] demonstrates that a small conformational change in a group (e.g., the side chain of an amino acid residue) can establish a gradient in the solvation free energy of the protein, which in turn will exert a force on the group, forcing it to move along the direction of the force. In fact, the free energy is a systemic concept that approximates the overall chemical potential of the protein-solvent system but not merely the solvation free energy of the protein. Furthermore, not only can the conformational change in the protein (i.e., the conformational entropy change of the protein) lead to a change in system free energy, but also the solvent entropy change and the formations and breakages of noncovalent bonds between atoms within the protein and between the protein and solvent (i.e., the enthalpy change), or any change of the solvent condition, e.g., the addition of acid, base, denaturant, cofactor, ligand, substrate or other compounds into the solution, can change the system free energy [41]. The steeper the free energy gradient is, the stronger the force exerted on the protein will be, and the higher the probability that the protein will move towards the lower region of the FEL of the protein-solvent system.
The denaturation process is in essence the process of altering the shape of the FEL from the funnel to a relatively flat surface using denaturants or by changing the system temperature. The renaturation process, as discussed above, is the reverse process of establishing a steep enough free energy gradient that recovers the funneled shape of the FEL through restoring the "normal" solvent condition. The establishment of a steep enough and smooth slope on the upper part of the funnel-like FEL triggers a rapid rolling down of the unfolded protein molecules, resulting finally in the collapse of random coiled states into molten globule states. Traditionally, the hydrophobic interaction was thought to drive such a collapse [38,39,45,46]. However, we consider that the term "hydrophobic interaction" may not be appropriate because it is not a proactive interaction driven by mutual attraction between hydrophobic groups and thus may not be responsible for bringing these groups into the interior of the protein. Indeed, aggregation of hydrophobic side chains into the interior of the natively folded proteins is, as mentioned above, the consequence of the solvent entropy maximization, which is dictated by the second law of thermodynamics.
In the protein-solvent system, the water molecules have an absolute advantage in both quantity and mass compared to the protein molecules [41]. The requirement for the solvent entropy maximization will retain as much as possible the highly dynamic hydrogen bonds among water molecules. This will exclude the hydrophobic residue side chains from contacts with water molecules, thus sequestering them into the interior of the collapsed entities by minimizing the solvent accessible surface area of the entities. During the hydrophobic collapse process, energy is required to strip water molecules from the surface of the denatured protein, which is a positive enthalpy change that opposes the lowering of the system free energy; the conformational entropy of the protein is reduced because of the loss of degrees of freedom during the collapse, which also opposes the lowering of the system free energy. Nevertheless, the solvent entropy gain is overwhelming compared with the enthalpy increase and conformational entropy loss, and therefore makes a substantial contribution to lowering the total Gibbs free energy of the protein-solvent system. Clearly, the solvent entropy effect is a dominant component of the folding code, and the hydrophobic interactions are the consequence of the hydrophobic collapse rather than the cause for driving protein folding; however, the enthalpy reduction that arises from these hydrophobic contacts can make a minor contribution to lowering the system free energy.
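The free energy bookkeeping of the hydrophobic collapse described above can be summarized in a single expression. This is only a schematic sketch: the subscripted symbols (desolv, contact, solv, conf) are labels introduced here for illustration and do not appear in the cited works, and the signs simply restate the qualitative argument of the text.

```latex
\Delta G_{\mathrm{sys}}
  = \underbrace{\Delta H_{\mathrm{desolv}}}_{>0}
  + \underbrace{\Delta H_{\mathrm{contact}}}_{<0,\ \text{minor}}
  - T\Bigl(
      \underbrace{\Delta S_{\mathrm{solv}}}_{>0,\ \text{dominant}}
    + \underbrace{\Delta S_{\mathrm{conf}}}_{<0}
    \Bigr)
  < 0
```

The collapse is spontaneous only if the solvent entropy term outweighs the sum of the desolvation enthalpy and the conformational entropy penalty, which is the sense in which the solvent entropy gain is called dominant.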
The molten globule that results from the hydrophobic collapse is, as mentioned above, an ensemble of relatively stable folding intermediate states, which are compactly packed and contain some transient secondary structures and tertiary contacts. The molten globule is important because it provides a structural environment for forming not only the local interactions, but also the global tertiary interactions that are present in the natively folded states. As described above, the transition from the molten globule to the native states is a slow process that involves two main folding intermediates: the transition state and the glass transition state. This bottleneck process is slow because of the existence of hills, ridges, valleys, traps/wells, but is still driven by the decrease in total Gibbs free energy of the protein-solvent system. The process of uphill climbing is triggered by the nature of the molten globules to increase their entropy. In other words, the local and/or tertiary interactions are not strong enough to constrain the globules to a particular state. Therefore, the trend to increase the conformational entropy may initially cause the weakest noncovalent bonds to break, disrupting certain structural regions and triggering conformation adjustment, in which noncovalent bonds between atoms are broken and new noncovalent bonds between different atoms are formed. These competitive interactions may repeat many times, until bonds strong enough to counteract the effect of conformational entropy increase are formed. We consider that the conformational entropy increase of a protein may drive the uphill climbing process, while the enthalpy reduction drives the downhill conformational search. As more and more stable noncovalent bonds are formed, the negative enthalpy change will overcompensate for the conformational entropy loss of the protein, thus contributing substantially to the lowering of the system free energy. 
In addition, the water network formed around the protein surface also makes a favorable contribution to lowering the system free energy. In conclusion, the slow transition process from the molten globule to the native states is still driven by an overall trend in lowering the total Gibbs free energy of the protein-solvent system. Here the negative enthalpy change is a dominant component of the folding code, which overcompensates for the loss of protein conformational entropy and solvent entropy, ultimately leading the protein molecules to arrive at the global free energy minimum region.

Physicochemical basis of protein dynamics
When a protein folds into its native structure, the protein molecules can be regarded as residing at the bottom of the funnel-like FEL of the protein-solvent system. However, the global free energy minimum is not a smooth region and still contains free energy wells/traps and hills/barriers between wells. It is the ruggedness of the FEL bottom that allows the coexistence of different conformational states/substates of protein molecules in an ensemble manner. For example, natively folded protein molecules coexist in equilibrium as ensembles of different conformational states/substates within different free energy wells. The depth and the width of the free energy wells determine the relative populations, probabilities or lifetimes of the conformational states or substates. This is the thermodynamic property of protein dynamics. The height of the free energy barrier that separates two adjacent wells determines the conformational transition rates between the two states. This is the kinetic property of the dynamics.
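The thermodynamic property described above can be illustrated with a Boltzmann weighting of the wells. The following is a minimal sketch, assuming that each well is summarized by a single free energy value (so that its depth and width are already folded into that number); the function name and the example free energies are hypothetical and are not taken from the cited studies.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def boltzmann_populations(free_energies_kj_mol, temperature_k=298.15):
    """Return equilibrium populations of states from their free energies."""
    rt = R * temperature_k
    # Shift by the minimum so the largest weight is exactly 1 (numerical safety).
    g_min = min(free_energies_kj_mol)
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies_kj_mol]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: state B lies 5 kJ/mol above state A.
p_a, p_b = boltzmann_populations([0.0, 5.0])
print(f"p_A = {p_a:.3f}, p_B = {p_b:.3f}")
```

Under these assumptions, the lower-lying state is always the more populated one, and the population ratio depends only on the free energy difference and the temperature.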
The conformations within individual large free energy wells are not static, but can fluctuate around an average/equilibrium conformational state, resulting in a large ensemble of closely related conformational substates. Because such fluctuations last for a relatively long time (timescales of microseconds to seconds), they are regarded as equilibrium fluctuations or equilibrium dynamics, which are generally thought to govern the biological functions of proteins [4,6]. On the contrary, the non-equilibrium fluctuations, which occur during the transition process between two conformational states, are thought to have a minimal effect on the overall rates of biological processes [4]. Because conformational transitions need to overcome the free energy barrier and the transition conformational states have transient lifetimes, transition states and the non-equilibrium fluctuations are hard to detect experimentally [5,6].

Hierarchical dynamics of proteins
As discussed in a number of studies [4,72], protein dynamics have an important fluctuation hierarchy feature, which means that the different structural components in a protein fluctuate with different amplitudes and directionalities on different timescales. The bottom of the funnel-like FEL of a protein-solvent system is shown in Figure 2. The conformational states/substates of a protein reside within individual wells of free energy minimum, with the depth and width of the well determining the amplitude and timescale of the dynamics. The transition states, however, are located on the free energy barriers that separate the wells, with the barrier height determining the directionality of the fluctuations and further, the conformational interconversion rates between two states. The amplitude and timescale of protein dynamics are described below.
Figure 2 Conformational states/substates are considered to reside within different free energy wells. For example, the tier-0 states A and B reside within the two largest-and-deepest tier-0 wells, respectively, with the free energy difference ΔG_AB determining the difference in population distributions of the states A and B (e.g., population p_A is larger than p_B). The difference in free energy barrier between the states A and B (ΔG‡) determines the conformational interconversion rates (e.g., k_AB is smaller than k_BA). The tier-0 dynamics involve the whole protein molecule and occur on timescales ranging from microseconds (μs) to milliseconds (ms). Although the tier-0 states A and B coexist in equilibrium with different population distributions, a change in system condition will alter the FEL (from the dark line to the gray line), thus leading to the redistributions of the states A and B. The tier-1 and tier-2 substates reside within tier-1 and tier-2 wells, respectively, and the corresponding fluctuations around these substates involve loops and side chains and occur on faster timescales, e.g., nanoseconds (ns) and picoseconds (ps) for the tier-1 and tier-2 dynamics, respectively. This figure has been modified from Henzler-Wildman and Kern [4] and Ansari et al. [72].
The bottom of the FEL shown in Figure 2 contains two large-and-deep free energy wells, the tier-0 wells. Within the tier-0 wells, there exist relatively small or even smaller free energy wells, i.e., two relatively small tier-1 wells exist within the left-side tier-0 well, and four and three smaller tier-2 wells (from left to right) within these two tier-1 wells, respectively. It is such a nested organization of free energy wells that determines the multiple hierarchies of protein dynamics on distinct timescales. For example, the two major conformational states, the tier-0 states A and B, reside within the two largest free energy wells, respectively, and therefore have large populations, long lifetimes or high probabilities (p_A, p_B). Fluctuations around each tier-0 state (the tier-0 dynamics) are large amplitude motions that involve the whole protein molecule or an entire domain and occur on timescales ranging from microseconds to milliseconds. Such equilibrium fluctuations are often referred to as large-scale concerted motions or collective motions. The states A and B coexist in equilibrium and the difference in population distributions is determined by their free energy difference ΔG_AB. For example, the conformational transition from state A to B needs to overcome a higher barrier ΔG‡(k_AB) than the transition from state B to A (ΔG‡(k_BA)). Therefore, the transition rate k_BA will be faster than k_AB, leading to a larger population/higher probability of state A than state B. The fluctuations within the tier-1 wells are fast dynamics (tier-1 dynamics) that occur on a timescale of nanoseconds. Such fluctuations involve the loops and turns that connect the secondary structural elements, thus resulting in an ensemble of closely related substates within each tier-0 well. Even faster fluctuations (tier-2 dynamics) occur within the smaller tier-2 wells located at the bottom of the tier-1 wells. 
Such fluctuations mainly involve amino acid side chain rotations on the timescale of picoseconds. These rotations originate from the fastest fluctuations (tier-3 dynamics) such as the bond vibrations that occur on femtosecond timescale.
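The link between barrier heights, interconversion rates and populations for the two tier-0 states can be sketched numerically. The example below assumes an Eyring-type (transition state theory) rate expression with a transmission coefficient of 1, which is only a rough approximation for conformational transitions in proteins; the barrier values and function names are hypothetical and chosen purely for illustration.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)

def eyring_rate(barrier_j_mol, temperature_k=298.15):
    """Eyring-type rate (1/s) for a given barrier; transmission coefficient assumed 1."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_j_mol / (R * temperature_k))

def two_state_populations(k_ab, k_ba):
    """Equilibrium populations of A and B for A <-> B (k_ab: A to B, k_ba: B to A)."""
    total = k_ab + k_ba
    return k_ba / total, k_ab / total

# Hypothetical barriers: 60 kJ/mol for A to B, 55 kJ/mol for B to A.
k_ab = eyring_rate(60e3)
k_ba = eyring_rate(55e3)
p_a, p_b = two_state_populations(k_ab, k_ba)
print(f"k_AB = {k_ab:.3g} 1/s, k_BA = {k_ba:.3g} 1/s, p_A = {p_a:.3f}, p_B = {p_b:.3f}")
```

In this sketch the higher barrier for the A to B transition makes k_AB smaller than k_BA, so state A ends up more populated, and the population ratio equals the ratio of the two rates.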

Relationship between crystal structures and conformational states within the hierarchical free energy wells
A protein structure determined by the X-ray crystallographic method is in essence an average or equilibrium structure under the crystallization condition. Therefore such a structure could be regarded as a conformational state trapped in a tier-0 free energy well. Crystallographic B-factors provide information about the spatial distribution or equilibrium fluctuations around the tier-0 state, indicating that a crystal structure also contains information about the tier-1 and tier-2 states. Therefore, the crystal snapshots and, more importantly, their B-factors are useful for probing the structure-function relationship of a protein.
In the PDB database (http://www.pdb.org), there may be several or many crystal structures for the same protein. The conformational differences between these structures are largely minor, because the most pronounced differences are usually in the loop regions rather than in the structural core. This observation implies that (i) the crystallization conditions used by different laboratories were to a large extent similar and therefore trapped similar tier-1 states within the same tier-0 well; and (ii) most proteins may have only one tier-0 free energy well within which the equilibrium fluctuations that govern the biological functions are found.
A well-known example of a protein having at least two tier-0 states is the HIV-1 gp120 envelope glycoprotein. Large conformational differences are observed between the two states of gp120, the unliganded state and the CD4-bound state, implying that these two states exist within different tier-0 wells. Most of the gp120 structures determined using X-ray crystallographic methods [73–78], either in the presence or in the absence of the CD4 molecules, are in the CD4-bound state, suggesting that this state is stably trapped in a large and deep tier-0 well. However, the unliganded conformation of gp120 is hard to crystallize and, therefore, only one unliganded state of SIV gp120 was deposited in the PDB database [79]. This suggests that the unliganded gp120 state is unstable and resides in another tier-0 well that has a higher free energy level than the well within which the CD4-bound state resides. Moreover, the higher conformational flexibility of the unliganded state compared to the CD4-bound state [80,81] suggests that the tier-0 well of the unliganded gp120 may have a more rugged bottom than that of the CD4-bound gp120. In the context of the functional viral spike in vivo, however, the unliganded conformation of gp120 can be constrained by interactions with gp41 and other subunits, thus providing advantages for HIV to evade immune surveillance [78]. The conformational stability enhanced by the interactions of gp120 with other molecules indicates that the FEL of the protein-solvent system is not static, but rather, dynamic, with the width and depth of the free energy wells and the height of the barriers being variable under the influence of solvent conditions (Figure 2).

Dynamic nature of the FEL
The free energy wells and barriers shown in Figure 2 are a static manifestation of the dynamic FEL of a protein-solvent system. When the solvent conditions, e.g., the temperature, pressure, pH, ionic strength, and the constituents of the system (including the solute and solvent components) are constant, the rugged shape of the landscape bottom will last for a relatively long time period, exhibiting a stable distribution of the hills/barriers and valleys/wells. However, such stability is relative, because the atomic thermal motions and bond vibrations within molecules as well as the Brownian collisions among molecules will inevitably give rise to free energy changes. Such changes will make the rugged profile of the FEL bottom fluctuate around an equilibrium level, similar to the relatively static water ripples over the lake when a breeze is blowing. The existence of stable free energy wells and barriers allows for a stable trap of conformational states, equilibrium fluctuations around the stable states, and stable conformational transition rates between these states, which lead ultimately to a stable distribution of ensembles of conformational states/substates over the rugged bottom of the FEL.
However, any factor capable of perturbing the free energy of the protein-solvent system would likely break such equilibrium and re-establish a new equilibrium, thus leading to the redistribution of the conformational states of the protein molecules [4,41]. Factors that are able to alter the system free energy include external and internal ones. External factors include temperature, pressure, pH, ionic strength, presence of denaturants, and addition of ligands, cofactors, substrates, compounds or any other molecules into the solution. Internal factors include the amino acid mutations, the effects of protein conformational entropy and solvent entropy, and competitive interactions between protein residues and between residues and solvent. As discussed above, high temperature and denaturants can cause substantial changes in the shape of the FEL and thus lead to protein denaturation. These factors can also cause moderate changes in the shape of the FEL (Figure 2, gray line), thus leading to the redistribution of protein conformational states. For example, the occasional collision of a ligand with the protein will displace the water molecule network around the surfaces of the two partners, leading to an increase of both the solvent entropy and enthalpy. Further interactions will lead to the formation of noncovalent bonds between the protein and ligand, resulting in the loss of the system enthalpy and the rotational and translational entropies of the two partners. These non-complementary changes between the entropy and enthalpy will cause the free energy fluctuations of the system, namely, the changes in width and depth of the free energy wells and in height of the free energy barriers.
Non-complementary changes between the entropy and enthalpy originate essentially from the tendency to distribute the heat energy as evenly as possible over the system, that is, the entropy maximization of the protein-solvent system. Atomic thermal energy causes the harmonic oscillations of atoms along the covalent bonds and further, the vibration of the bond connecting two atoms (i.e., the fastest tier-3 dynamics that occur on the femtosecond timescale mentioned above). The accumulation of the bond vibrations produces large kinetic energy and the release of such energy causes Brownian motions of water molecules and rotational motions of residue side chains of the protein (i.e., the faster tier-2 dynamics on the picosecond timescale). The Brownian collisions of the water molecules can break the hydrogen bonds between two adjacent water molecules and then lead to the formation of new bonds between another two water molecules. The repetition of breakages and formations of hydrogen bonds distributes the water kinetic energy over the system, which ultimately leads to the largest possible number of dynamic hydrogen bonds, that is, the maximization of the solvent entropy. The accumulation of the rotational motions of the protein side chains will break the local noncovalent bonds in regions where the structural constraints are weak (such as the surface-exposed loops and turns). This provides the opportunity to form new noncovalent bonds between protein residues and between residues and water molecules. If the new bonds are not strong enough, the tendency to increase conformational entropy can cause these bonds to break, triggering a new round of competitive interactions. This process leads to loop/turn motions on the surface of the protein (i.e., the tier-1 dynamics on the nanosecond timescale). 
Such loop/turn motions are transmitted either through the water molecule network that is formed around the protein surface or through specific structural components (such as the hinge-bending regions) over the entire protein structure, which results in the slow tier-0 dynamics, the collective motions of the protein.
Taken together, the macroscopic dynamics of proteins is a consequence of cascade amplification of the microscopic motions of atoms and atomic groups, for which the entropy originating from atomic thermal energy is most fundamental. Under a constant solvent condition, the trend to increase the system entropy can, to a large extent, be compensated for by an enthalpy increase. Thus, only minor free energy fluctuations occur and this in turn determines the relatively stable distribution of and transition between the different conformational states of a protein. Drastic alterations of the solvent condition, for example the addition of a denaturant into the solution, can introduce large positive entropy that cannot be compensated for by the positive enthalpy. This causes a substantial change in the shape of the FEL and, thus, leads ultimately to protein denaturation. On the contrary, the addition of a ligand or other compounds into the solution usually causes a moderate non-complementary change between the entropy and enthalpy, which leads to the redistribution of the protein conformational states by alterations in the height of free energy barriers and the size of free energy wells.

Physicochemical basis underlying protein-ligand binding
Proteins participate in biological processes and realize their functions mainly through interactions with proteins and peptides, nucleic acids, cofactors, ligands, substrates, small molecule compounds, and other small molecules such as oxygen or metal ions [5,6,82]. Therefore, a detailed understanding of the biological functions of proteins requires an in-depth understanding of the mechanisms underlying protein-ligand recognition and binding and this, in turn, will contribute greatly to drug discovery and design in the field of medicinal chemistry.

Process of protein-ligand binding and its driving force
The protein-ligand recognition and binding process, like the protein folding process, is driven by the decrease in the total Gibbs free energy of the protein-ligand-solvent system, and this is dictated by a delicately balanced mechanism that involves both the entropy and enthalpy contributions [5,41,83].
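The balance between enthalpy and entropy contributions can be made concrete with the standard relations ΔG = ΔH − TΔS and ΔG° = RT ln(Kd). The sketch below is illustrative only: splitting the entropy into solvent, conformational and rotational/translational terms follows the discussion in this review, but the function names and all numerical values are hypothetical, and a 1 M standard state is assumed.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def binding_free_energy(delta_h, delta_s_solvent, delta_s_conf,
                        delta_s_rot_trans, temperature_k=298.15):
    """Return dG = dH - T * (sum of entropy terms); kJ/mol and kJ/(mol*K)."""
    total_entropy = delta_s_solvent + delta_s_conf + delta_s_rot_trans
    return delta_h - temperature_k * total_entropy

def dissociation_constant(delta_g, temperature_k=298.15):
    """Return Kd in mol/L from the standard binding free energy (1 M standard state)."""
    return math.exp(delta_g / (R * temperature_k))

# Hypothetical values: favorable enthalpy, solvent entropy gain,
# and entropy penalties for the protein and the two binding partners.
dg = binding_free_energy(delta_h=-20.0, delta_s_solvent=0.10,
                         delta_s_conf=-0.03, delta_s_rot_trans=-0.04)
kd = dissociation_constant(dg)
print(f"dG = {dg:.2f} kJ/mol, Kd = {kd:.2e} M")
```

A more negative ΔG corresponds to a smaller Kd, so either a larger solvent entropy gain or a more negative enthalpy change tightens binding in this simple picture.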
Traditionally, the process of protein-ligand binding was described by the "lock-and-key" [84] and "induced-fit" [85] models. The former assumes that both the protein (the lock) and ligand (the key) are rigid and their binding interfaces are perfectly matched so that the ligand binds to the protein like a key being inserted into a lock (Figure 3A). The latter assumes that the binding interfaces between the protein and ligand are not ideally matched and that the binding-site regions of the protein are flexible. Therefore, the binding of the ligand induces a conformational change in the protein binding site (Figure 3B). These two models have been widely applied to interpret the recognition and interaction mechanisms of enzyme-substrate, target protein-drug, and receptor-ligand binding. The difference in entropy and enthalpy contributions to the binding-driving force between the lock-and-key and induced-fit models is discussed below.
The first step in the protein-ligand binding process is diffusion followed by collisions between the protein and ligand molecules in the solvent. This process is often ignored when binding mechanisms are discussed in the literature. However, it is important because the initial contacts/collisions between two molecules are a prerequisite for further interactions to occur. As discussed above, the release of the water molecule kinetic energy causes Brownian motions of individual water molecules. These motions, on the one hand, can lead to entropy maximization of the water solvent itself or, on the other hand, can cause rotational, translational or diffusion motions of the protein and ligand molecules. It should be kept in mind that there is a large quantity of water solvent that surrounds the solute molecules, which can result in strong Brownian bombardments that cause the molecules to wander and subsequently make accidental contacts/collisions between protein and ligand molecules. We emphasize that it is the solvent entropy maximization originating from atomic thermal motions that drives this process. The higher the solute concentrations are, the higher the probability of protein-ligand contact is, which makes it more likely to establish further interactions. This is true for both the lock-and-key and induced-fit models.
Figure 3 Schematic representations of models that describe the protein-ligand binding mechanisms. A, Lock-and-key model. B, Induced-fit model. C, Conformational selection model. This figure has been modified from Tobi and Bahar [88].
The subsequent steps are different for these two models. For the lock-and-key model, if the initial collisions occur between the complementary interfaces of the protein and ligand, a large number of water molecules will be displaced. Prior to collision, the water molecules formed a well-defined network around the surfaces of the protein and the ligand to suit the requirement for solvent entropy maximization while simultaneously making a favorable contribution to the lowering of the system free energy via the formation of the intermolecular hydrogen bonds (i.e., bonds between the protein and the water and between different water molecules). The initial collision between the protein and ligand will break some of these hydrogen bonds; in this process, the positive entropy change that originates from the molecular kinetic energy compensates for the positive enthalpy (or energy) that is stored within the water network. This process triggers the overall water displacement and ultimately leads to the maximization of the solvent entropy upon ligand binding. The ideally complementary interfaces will allow for the displacement of a large number of water molecules, thereby producing a large amount of solvent entropy (the solvent entropy gain) that overcompensates for not only the positive enthalpy change when breaking the hydrogen bonds of the water network, but also the negative entropy change caused by the loss of rotational and translational entropies of the protein and ligand. We conclude that the binding process described by the lock-and-key model is mainly an entropy-driven process, in which the solvent entropy gain makes a substantial contribution to the lowering of the system free energy, just as the hydrophobic collapse does during the protein folding process. When the water molecules are displaced, noncovalent bonds will be formed between the protein and ligand via a negative enthalpy change, which further contributes to the lowering of the system free energy. 
This process is similar to the slow bottleneck process during protein folding, in which the competitive interactions sculpt the molten globule into the native states. Because the perfect surface complementarity between binding partners is a prerequisite for achieving the large solvent entropy gain, the lock-and-key model can be used to explain the specificity of ligand binding.
For the induced-fit model, the tentative collisions between partners may repeat many times until an appropriate match between the interacting sites is found [86]. Here the "appropriate match" means that, although the contact interfaces are not perfectly complementary, they are well enough matched to provide the initial complex with enough strength and longevity to allow further interactions, which can induce conformational changes in the binding partners. This process will also exclude water molecules from the contact interfaces and thus contribute to the lowering of the system free energy, although this contribution is not as substantial as it is in lock-and-key binding because of the imperfectly complementary interfaces in the induced-fit model. However, the high conformational flexibility of the binding-site regions allows for the subsequent conformational adjustments to suit the needs of the incoming ligand and, ultimately, to establish full contact between the interacting partners. The negative enthalpy change originating from the formation of the noncovalent bonds can overcompensate for not only the loss of the protein conformational entropy, but also the loss of the rotational and translational entropies of the two binding partners, thereby making a substantial contribution to the lowering of the system free energy. Therefore, we conclude that the binding process described by the induced-fit model is mainly an enthalpy-driven process. Because the large number of non-bonded interactions resulting from the induced fit can stabilize the ligand within the binding site for a long time, we consider that the induced-fit model can be used to explain the ligand binding affinity.

Conformational selection mechanism of protein-ligand binding
The lock-and-key and induced-fit models can be used to interpret the binding of ligands to rigid and flexible receptor proteins, respectively. These two idealized models focus mainly on the binding process at a single molecule level but not at the population level. As discussed above, protein molecules coexist as ensembles of different conformational states/substates around the rugged FEL bottom of the system. Furthermore, these states/substates can interconvert, with the transition rate determining the probability distribution and population size of these states. Based on the thermodynamic and kinetic properties of proteins, a more "real" model, the conformational selection model [87][88][89] was proposed to explain the protein-ligand binding mechanism at the molecular population level.
In this model, the ligands can bind selectively to the ensemble of conformational states/substates that has a complementary surface best suiting the ligands, and this is followed by conformational adjustments of the protein, thus shifting the equilibrium towards the complexed/bound state ( Figure 3C). These two consecutive steps are similar to the two processes described by the lock-and-key and induced-fit models, respectively, and thus are driven by both solvent entropy gain and system enthalpy loss. In the conformational selection model, unliganded and complexed states (or approximately complexed states) coexist in equilibrium at the bottom of the FEL, while selective binding to the complexed states disrupts such equilibrium by altering the height of the free energy barrier (or the conformational transition rates), thus leading to a redistribution of these states with altered population size. In the binding process described by the conformational selection model, selective binding can not only efficiently lower the entropy penalty (e.g., the inevitable loss of the protein conformational entropy in the induced-fit model) but can also gain as much as possible the solvent entropy (e.g., as in the lock-and-key model). In nature, most proteins are flexible, especially in their ligand-binding regions. This allows protein molecules to exist as an ensemble of closely related conformational substates, and also allows conformational adjustments to occur to establish full contact between the two binding partners.
Thus, the conformational selection model, which describes the binding process as a selective binding followed by an induced-fit, is driven by both the solvent entropy gain and system enthalpy loss. Because this model (i) includes the binding processes described by the lock-and-key and induced-fit models, (ii) is based on the FEL theory and hence is capable of describing the binding process at the molecular population level, and (iii) can interpret both the ligand-binding specificity and affinity, we consider that it is a more comprehensive and realistic model for explaining and describing protein-ligand binding.
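The population shift at the heart of the conformational selection model can be illustrated with a three-state equilibrium. This is a deliberately simplified sketch and not the formulation used in the cited works: it assumes that the protein interconverts between a non-competent and a binding-competent state with equilibrium constant k_conf, that the ligand binds only to the competent state with dissociation constant kd, and that the free ligand concentration is fixed. All names and values are hypothetical.

```python
def state_fractions(k_conf, kd, ligand_conc):
    """Return fractions of (non-competent, competent unbound, bound) protein.

    k_conf: [competent] / [non-competent] in the absence of ligand.
    kd: dissociation constant of the ligand for the competent state (mol/L).
    ligand_conc: free ligand concentration (mol/L), assumed constant.
    """
    w_other = 1.0                           # statistical weight of non-competent state
    w_competent = k_conf                    # weight of competent, unbound state
    w_bound = k_conf * ligand_conc / kd     # weight of competent, ligand-bound state
    z = w_other + w_competent + w_bound     # sum of weights
    return w_other / z, w_competent / z, w_bound / z

def apparent_kd(k_conf, kd):
    """Ligand concentration at which half of all protein molecules are bound."""
    return kd * (1.0 + k_conf) / k_conf

# Hypothetical example: only about 9% of the free protein is binding-competent.
for conc in (0.0, 1e-6, 1e-5, 1e-3):
    other, competent, bound = state_fractions(0.1, 1e-6, conc)
    print(f"[L] = {conc:.0e} M: non-competent {other:.3f}, "
          f"competent {competent:.3f}, bound {bound:.3f}")
```

In this sketch, increasing the ligand concentration depletes the non-competent state even though the ligand never binds to it directly, which is the redistribution of populations that the model describes. The rarer the competent state, the larger the apparent dissociation constant.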

"Misfolding disease" is in fact "protein-protein binding" disease
Alzheimer's disease and prion diseases such as "Mad Cow", scrapie, and Creutzfeld-Jakob diseases [90] are called "misfolding diseases" because the symptoms are believed to be caused by the misfolding of relevant proteins and the aggregation of these misfolded proteins in the brain and other tissues.
Experimental evidence has shown that certain structural regions of the amyloid beta (A) protein and the prion protein can convert from their "native" -helical conformation to a "natively unfolded", prone-to-plaque/scrapie-formation -stranded conformation [91,92]. This implies that the bottoms of the FELs of these two proteins are rugged, allowing for the co-existence of their native and natively unfolded states [93]. The natively unfolded states may reside in a relatively smaller free energy well with a higher free energy level than the well within which the native states reside. Therefore, the natively unfolded states have a smaller population with shorter lifetimes than the native states. Nevertheless, the flexible surfaces of the natively unfolded conformers provide an opportunity for recognition and binding between these conformers, thereby forming seed for fibril growth. Once the seed is formed, the fibril will grow rapidly as the result of a large free energy decrease that arises from the solvent entropy gain. The formation of noncovalent bonds between the binding interfaces will contribute further to the lowering of the free energy, ultimately deepening and widening the free energy wells of the natively unfolded states, and shifting the equilibrium towards aggregation complex, as seen in the plaque observed in the tissues of patients. Therefore, the "misfolded" conformations that are finally observed in the plaque are more a consequence than a cause of the protein aggregation.
We speculate that accumulated mutations may loosen the structural constraints in certain regions of the Aβ and prion proteins, thereby altering the shape of the FEL through the increased conformational entropy, which in turn results in the coexistence of at least two conformational states (i.e., two tier-0 states) of the proteins. Aggregation is merely a side effect of protein binding. In other words, the higher-energy, less populated, conformationally altered monomeric states can recognize and bind selectively to one another to form the lower-energy, highly polymorphic aggregate species [94]. Although the misfolding diseases are caused by the accumulation of misfolded proteins that cannot be cleared effectively by the protein quality control system in cells, we emphasize that it is the inter-molecular binding/interaction between proteins that stabilizes the misfolded states and protects them from the protein quality control system. Interestingly, because of the dynamic variability of the FEL, manipulating the solvent conditions or other factors that affect the shape of the FEL may help the aggregated proteins "jump out of" the free energy wells, thus shedding light on possible ways to treat the "misfolding diseases" [41].

Protein folding funnel and binding funnels: concluding remarks
Protein folding and protein-ligand binding are similar processes, both being driven by the decrease in total Gibbs free energy of the systems. The only difference between them is the chain connectivity of the system components, giving rise to two different terms: intramolecular and intermolecular recognition and binding [5,95,96]. However, whether the recognition and binding occur within a molecule or between different molecules, the decrease in system free energy, which originates from the non-complementary change between the enthalpy and the entropy, drives the formation of stable, natively folded conformational states of a protein or a protein-ligand complex [97].
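As a schematic restatement of this driving force, the standard Gibbs relation at constant temperature and pressure can be written as follows. The split of the entropy term into solvent, conformational, and rotational/translational parts is a bookkeeping assumption made here for illustration, using the contributions named in this review; it is not a formula taken from the cited works.

```latex
\Delta G = \Delta H - T\,\Delta S_{\mathrm{total}}, \qquad
\Delta S_{\mathrm{total}} \approx \Delta S_{\mathrm{solv}} + \Delta S_{\mathrm{conf}} + \Delta S_{\mathrm{rot/trans}}
```

Folding or binding proceeds when $\Delta G < 0$. On this reading, a "non-complementary" change means that $\Delta H$ and $T\,\Delta S_{\mathrm{total}}$ do not cancel exactly, so a net free energy change remains.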
For protein folding, the lowering of the system free energy, coupled with the gradual reduction in the conformational degrees of freedom of the folding intermediates, determines that the FEL for protein folding must be funnel-like. Although protein-ligand associations occur around the rugged bottom of the FEL, the exclusion of water from the binding interfaces and the formation of noncovalent bonds between the two partners can still lower the system free energy. In conjunction with the loss of the rotational and translational degrees of freedom of the binding partners and the loss of the conformational entropy of the protein, these processes could merge, deepen, and further widen the free energy wells within which the protein-ligand binding process takes place, thereby making them look like a funnel, which we call the binding funnel [97].
Based on the FEL theory, Nussinov et al. proposed a building block model to demonstrate the similarity between protein folding and protein-ligand binding [95,98]. In this model, the protein or the ligand was divided into a set of building blocks that were located within different microfunnel-like free energy wells. Then, the processes of protein folding or protein-ligand binding were considered as the recognition and association among these building blocks that are driven by fusing microfunnels into a higher dimensional funnel, regardless of the chain connectivity. Therefore, the essence of the building block model is a series of microfunnel fusion events, which lower the total Gibbs free energy of the system and lead to the global free energy minimum states of the protein or the protein-ligand complex.
The free energy downhill processes for protein folding and protein-ligand binding are also similar. In the protein folding process, the first stage is the hydrophobic collapse driven by the solvent entropy maximization, resulting in the molten globule intermediate in which some of the native secondary structural elements and tertiary contacts may have been formed, while many native contacts or close residue-residue interactions present in the native state have yet to form. We consider that the molten globule states are important because they provide a structural environment for further conformational adjustments through conformational entropy increase and competitive interactions between protein residues and between residues and water molecules. In the protein-ligand binding process, the first step is also driven by the solvent entropy effect. This includes two consecutive sub-steps: (i) the trend to increase the solvent entropy causes solute molecules to wander and promotes the subsequent collisions between them, and (ii) the requirement to maximize the entropy displaces the water molecule network around the collision interfaces of the two partners. Therefore, these two sub-steps are also driven by the solvent entropy maximization, ultimately resulting in an initial complex within which some noncovalent bonds have been established, although the two partners are still loosely associated. The initial complex is analogous to the molten globule because it provides the structural environment for further sculpting into the final compactly packed/tightly bound states. The second step is driven mainly by the negative enthalpy change through competitive interactions triggered by conformational entropy increase. 
Therefore, whether in the protein folding or in the protein-ligand binding process, both the entropy-driven first step and the enthalpy-driven second step contribute to the lowering of the system free energy, which results in the folding and binding FEL being funnel-like.
The non-complementary change between entropy and enthalpy will bring about free energy fluctuations that lead to ruggedness either in the funnel wall or around the bottom of the funnel. For protein folding, the overall trend in free energy reduction overcomes the negative effect of the ruggedness in the funnel wall, making it possible for the unfolded protein to roll down towards the global free energy minimum region. When protein molecules arrive at the bottom of the funneled FEL, the ruggedness allows for the coexistence in equilibrium of ensembles of different conformational states/substates with relatively stable population distributions (the thermodynamic property of the protein dynamics) and conversion rates (the kinetic property). However, alterations of the solvent conditions (e.g., the addition of a ligand) or internal changes in the protein (e.g., an amino acid mutation) will perturb the system free energy and disrupt this equilibrium, finally resulting in redistributions of the conformational states/substates.
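The link between well depth and population can be illustrated with a minimal sketch, assuming that each conformational substate is assigned a single free energy and that populations follow Boltzmann weighting at equilibrium. The function name, the two-substate landscape, and the free energy values below are hypothetical and chosen only for illustration; they are not data from the cited studies.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_populations(free_energies, temperature=298.15):
    """Equilibrium populations of substates from their free energies (kcal/mol)."""
    rt = R_KCAL * temperature
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical rugged funnel bottom: substate 0 sits in the deeper well.
before = boltzmann_populations([0.0, 1.5])

# Hypothetical perturbation (e.g., ligand addition) that lowers the free
# energy of substate 1 by 3 kcal/mol, so that it becomes the deeper well.
after = boltzmann_populations([0.0, -1.5])

print("before:", before)
print("after: ", after)
```

In this toy landscape the higher-energy substate is only sparsely populated before the perturbation and becomes the dominant one afterwards, which is the kind of population redistribution described above. Conversion rates are not modeled here, since they depend on barrier heights rather than on well depths alone.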
Although the fluctuations/perturbations of the system free energy come from the non-complementary change between the entropy and enthalpy, the trend to increase the entropy is the primary and critical factor in triggering the free energy change of the system [99]. This is because (i) during the protein denaturation process, the denaturant molecules or the increasing temperature initially attack/melt the most weakly constrained regions of the protein that have large conformational entropy [100], e.g., the surface-exposed loops and turns; (ii) during the protein folding and protein-ligand binding processes, the requirement for solvent entropy maximization establishes a free energy gradient that is steep enough to trigger the hydrophobic collapse and promote the formation of the initial complex; furthermore, the tendency to increase the conformational entropy of the protein triggers the conformational adjustments via competitive interactions; and (iii) for the natively folded protein molecules, the tendency to increase the conformational entropy makes the less well constrained regions around the protein surface fluctuate, resulting in a large ensemble of closely related conformational substates. These fluctuations, together with the fluctuations of the water network around the protein surface (which originate from the requirement to maximize the solvent entropy), can be transmitted over the entire protein molecule, leading to the large concerted motions that are most relevant to function [6,36,80,81,101-103]. Interestingly, using the relative entropy as the objective function to be minimized has been shown to be an effective approach for protein design, indicating the importance of entropy in both folding and inverse folding of proteins [104].
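For readers unfamiliar with the term, the relative entropy (Kullback-Leibler divergence) between two discrete probability distributions can be computed as in the sketch below. This shows only the general quantity; it is not the design procedure of ref. [104], and the function name and the three-state distributions are hypothetical.

```python
import math


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of states")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # q forbids a state that p populates
        total += pi * math.log(pi / qi)
    return total


# Hypothetical target distribution over three conformational states,
# and two hypothetical candidate distributions to compare against it.
target = [0.7, 0.2, 0.1]
candidate_a = [0.6, 0.3, 0.1]
candidate_b = [0.2, 0.3, 0.5]

print(relative_entropy(target, candidate_a))
print(relative_entropy(target, candidate_b))
```

A smaller value means that the candidate distribution reproduces the target more closely, so a minimization over candidates would prefer `candidate_a` in this toy case.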
In summary, we consider that the tendency to maximize the entropy of the protein-solvent system that originates from the atomic thermal energy is the most fundamental driving factor for protein folding, binding and dynamics. However, the enthalpy reduction, an opposing factor that tends to make the system become ordered, can compensate for the effect of entropy, which allows the system to reach equilibrium at the global or local free energy minimum.