This study aimed to evaluate the performance of the more general and well-established conformational analysis methods MCMM and MTLMOD in comparison with the new specialized macrocycle sampling techniques MD/LLMOD and PRIME-MCS. Given the importance of macrocycle-protein modelling in drug discovery, we envisaged that a systematic study of both classical and recent specialized methods would provide guidance for other practitioners within the field. In addition to assessing the relative performance of these conformational search methods, we also wanted to address the challenge of performing conformational analysis of large macrocyclic structures with many rotatable bonds. This included studying the degree of conformational space covered in a conformational search. The energy differences between the conformers most similar to the X-rayppw conformation and the lowest energy conformation identified were also studied. However, the default settings of the general methods have not necessarily been optimized to perform well on macrocycles [32]. Therefore, using 10 macrocycles as a representative subset of the full data set, we first investigated whether small changes to the MCMM and MTLMOD methods could enhance conformational sampling of these challenging ring systems and improve the X-rayppw reproduction accuracy. Thereafter, we used the full data set with macrocycles extracted from 44 crystal structures and compared the two general methods (with and without enhanced settings) with the two more specialized methods MCS-PRIME and MD/LLMOD.
Conformational sampling can be run using many different settings. To enable fair comparison of the current work with previous studies, we opted to employ 10,000 search steps per structure, an amount that should be feasible for most modelling projects [33, 34, 41, 64]. To further align our work with the literature, an energy window of 15 kcal mol−1, [44] and up to 50,000 minimization steps [32] was used. For all methods except MD/LLMOD and MCMM-Exhaustive, three runs with different seeds were performed to assess how the stochastic element of the searches affected the results [44]. Finally, we ran an exhaustive conformational search using the MCMM-Enhanced settings and 1,000,000 search steps per structure to compare with the results obtained from the searches using only 10,000 steps search per structure.
It should be noted that when the MD/LLMOD method was developed it was trained on about two-thirds of the macrocycles used in this study, which could potentially bias the results [32].
Data set selection
In general, macrocycle conformational analysis studies have used structures obtained from both the PDB and the Cambridge Structural Database (CSD) [65], with the majority of the structures retrieved from the latter. A significant difference between these databases is that reported macrocycle structures are typically crystallised either with or without protein partners in the PDB and CSD, respectively. Whilst reported conformations of macrocycles reported in both the “free” and protein-bound state can be similar, they may also diverge significantly [36]. Given our emphasis on biologically relevant macrocycles, we directed efforts towards X-ray conformations extracted from the PDB as these protein–ligand complexes can be considered as the bioactive, bound-state conformations. This allowed us to exclusively study if the conformational analysis methods could produce conformations similar to the protein-bound macrocycle conformations.
Currently, there are several macrocycle data sets publicly available. Two of the most frequently used were published by Chen and Foloppe [44] and Watts et al. [32]. In the present study we used the 47 protein-macrocycle complexes previously published by Alogheli et al. [40] Briefly, the Alogheli et al. data set originates from the 150 structures collected by Watts et al. However, all 83 structures obtained from the CSD were removed. The remaining 67 PDB structures were further filtered where structures with either a ring size below 10 atoms, an overall resolution above 2.5 Å, poor ligand resolution, uncertain stereochemistry, predominantly solvent exposed ligands, extensive ligand-ligand interactions and structures that did not overlap with the binding site produced by SiteMap [66], were removed. After this, 31 structures remained. Subsequently, using the same criteria, 16 PDB structures containing macrocycles were added, which gave 47 structures altogether. As described in the Methods section, there are three duplicate structures in the Alogheli data set (Erythromycin in 2IYF/3FRQ, Tacrolimus in 1FKJ/4NNR, and Geldanamycin in 1YET/2ESA). To avoid duplicate sampling, 2IYF, 1FKJ and, 1YET were removed from the present data set. Thus, the number of unique macrocycles in our data set is 44 and the number of unique ring system is 38.
The macrocycle ring sizes varied between 11 and 29 ring atoms (median 16), and the number of rotatable bonds ranged from 8 to 47 (median 23), see Table 1 for 2D structures and Table 2 for a summary of some characteristics of the data set.
Table 2 Characteristics of the Full Data set Consisting of 44 Macrocycles Generating a non-biased starting conformation
In a conformational analysis study, it is less of a challenge to generate a conformation close to the X-ray conformation if the starting conformation is geometrically similar. Therefore, to ensure a starting geometry sufficiently dissimilar from the X-ray structure, it is typical to convert the X-ray structure to a 2D SMILES string and then convert it back to a 3D structure (keeping stereochemical information). In this study we used an alternative approach where conformational ensembles generated via a 10,000 step MCMM search were generated for each macrocycle and compared to the corresponding X-rayppw conformation. The conformer with the highest atomic RMSD to the X-rayppw conformation was then chosen as the starting conformation for all subsequent conformational analysis studies. Using this approach no starting conformers had RMSD values below 1 Å to the X-rayppw conformation, see Table 3 (“starting conformer”) and Table S1 for detailed results. This should be compared with the generally accepted procedure of converting SMILES strings to 3D structures, which had four structures below the 1 Å RMSD threshold. However, it is well-known that one can obtain a high RMSD between two structures by altering only one or a few torsional angles, leaving all other geometric parameters unchanged. Therefore, to further evaluate the similarity between the starting conformers and the X-ray conformation, the torsional RMSD values were calculated.
Table 3 Comparing different strategies to generate non-biased starting conformers Comparing the torsional RMSD values between the two different approaches, the median torsional RMSD were 64.3° and 68.9° for the SMILES generated conformers and the conformational ensemble generated starting conformers, respectively (Table S1). For comparison, the energy minimized X-rayppw conformations had a median torsional RMSD of 8.9°. We also compared the number of torsional angles that differed by more than 120° in comparison to the X-rayppw conformation, which were similar for the two approaches. The median number of torsional angles differing by more than 120° was five for both the starting conformer and the SMILES generated conformer. The median number of torsional angles differing by more than 60° was slightly higher for the starting conformers (16 torsional angles) compared to the SMILES conformers (14 torsional angles). Thus, no major difference was observed between the two approaches for generating a starting conformation of sufficient dissimilarity to the X-rayppw structure. Taken together, we used the most dissimilar structure based on atomic RMSD as the starting conformation in the conformational analysis studies. Overall, this conformation was more dissimilar to the X-rayppw conformation as compared to the SMILES generated conformer, since four of the SMILES generated conformers had atomic RMSD values below the 1 Å threshold. As a general note, evaluating the dissimilarity between starting and X-rayppw conformations is advisable irrespective of the generating method.
Exploring energy minimization method and ring closure settings for MCMM and MTLMOD on a diverse subset of 10 macrocycles
Energy minimization method
Chen and Foloppe have shown that the settings for sampling macrocycles can be enhanced for improved search performance [44]. As it was observed that a major part of the conformational searches was spent on energy minimization of the generated conformations, the choice of minimization method was investigated using the diverse 10 macrocycle subset. The conformational search methods in MacroModel offer a wide variety of minimization methods, where the Polak-Ribiere Conjugated Gradient method (PRCG) is the default method while the truncated-Newton conjugate-gradient (TNCG) is described as a superb method for flexible structures [63]. Newton based minimization methods have also been frequently employed [30, 41, 67]. The PRCG and TNCG minimization methods were compared for the two conformational analysis methods MCMM and MTLMOD. To align all minimizations, up to 50,000 minimization steps were chosen since this is default setting for the MD/LLMOD method (the minimization terminates when the convergence criterion is met). In summary, applied on the diverse macrocycle subset, MCMM and MTLMOD ran 1.6 times and, 7.5 times faster, respectively, using TNCG instead of PRCG, see Table 4. Therefore, all minimizations with the MCMM and MTLMOD methods were run with TNCG instead of PRCG in this study.
Table 4 Computational times used in the conformational analysis of the ten macrocycles in the diverse subset using two different minimization methods Ring closure criterion for MCMM and MTLMOD conformational searches
The MCMM and MTLMOD search methods are implemented in MacroModel. The MTLMOD method uses either LowMode steps or MCMM steps [67]. To generate a new conformation for ring-containing compounds using the MCMM or MTLMOD methods, the ring needs to be temporarily opened, thus, a cleavage site must be identified. Using ths default settings the acceptance criteria for ring closure after torsional variation is 0.5–2.5 Å. To evaluate the performance if all opened rings are re-closed, a very wide ring closure criteria between 0 and 100 Å was investigated. Similar wide ring closure distances have been used in previous studies (e.g., 0.1–30 Å) [30]. Cleavage sites adjacent to a stereocenter can potentially present complications as reconnecting the two atoms after the torsional movement may induce inversion of the original stereocenter. Accordingly, a chirality check is used to reject conformations where this occurs. Therefore, avoiding stereocenters as ring closure atoms might increase the number of generated conformations by reducing the amount rejected due to altered stereochemistry. Applied on the 10 diverse macrocycles, the performance for both MCMM and MTLMOD when changing the ring opening width and ring-opening placement were evaluated. As expected, avoiding ring-closures adjacent to chiral centers and increasing the ring opening width provided the highest number of unique ring conformations (Table 5). This combination also generated at least one conformer within 2 Å RMSD to the X-rayppw conformation for all 10 macrocycles. We reasoned that the ability to generate many ring conformations when analysing macrocycles is of key importance and therefore these settings were used in all subsequent studies (termed MCMM-Enhanced and MTLMOD-Enhanced). A more detailed walkthrough of the modified parameters is described in section “Method Optimization Using a Diverse Subset of 10 Macrocycles” in the supporting information.
Table 5 Summary of conformational analysis settings and results of the 10 macrocycles in the diverse subset Comparing all search methods using the full data set of 44 macrocycles
To compare the general conformational search methods (MCMM, MTLMOD) with the more specialized macrocycle-sampling methods (MD/LLMOD, PRIME-MCS) we applied these methods on all macrocycles contained in the full data set. As the MCMM-Enhanced and MTLMOD-Enhanced methods performed well for the diverse subset, these were also included in the comparison study. Methods were evaluated based on the following criteria; the ability to identify (i) unique conformers, (ii) unique macrocycle ring conformations, (iii) the global energy minimum, and (iv) the methods’ computational speed and (v) the ability to identify conformers similar to the X-rayppw conformation, and (vi) to the X-rayppw ring conformation. To evaluate how well the different conformational search methods performed and to get an approximation of the search efficiency, it would also be interesting to compare the generated conformational ensembles with the complete set of all possible conformers. However, as the number of conformers increases almost exponentially with the number of rotatable bonds, generation of such complete ensembles is difficult [64]. To at least address this challenge, the conformational search methods examined in this work were benchmarked against an exhaustive MCMM run using 1,000,000 search steps. The MCMM method was used in this study because it is an efficient method for generating conformers and the most efficient in generating ring conformations. MCMM, MCMM-Enhanced, MTLMOD, MTLMOD-Enhanced and PRIME-MCS were run three times independently using different seeds. The settings used for the different methods are summarized in Table 6 and all results are summarized in Table 7. For those methods where 3 different seeds were used, the Max, Min and Average results for each method are presented.
Table 6 Summary of the conformational analysis settings for the evaluated methods and literature protocols Table 7 Summary of conformational analysis results for the full data set of 44 macrocycles PRIME-MCS employs, in comparison with other methods that use 15 kcal mol−1, a much wider energy window of 100 kcal mol−1 for saving conformers, which may result in a larger number of conformers generated. Furthermore, PRIME-MCS minimizations are performed in vacuum as compared to the GB/SA water solvation model that is used by the other methods. Therefore, whilst the results of PRIME-MCS are not directly comparable with the MCMM, MTLMOD and MD/LLMOD methods, they are still included as a comparison in all results except in the search for the global energy minimum.
Total number of conformers generated
For each of the three runs using different seeds, the sum of all conformers identified for all 44 macrocycles was calculated for each search method. Since all methods except MD/LLMOD were run three times, the average number of conformers per run will be presented to allow a comparison between the methods. This number was calculated as the sum of identified conformers for each method divided by the number of runs that were made for that method. Across all search methods, MCMM generated the highest average number of conformers over all 44 macrocycles (155,296), see Table 7 and Table S9. MCMM-Enhanced identified 149,831 conformers on average followed by MTLMOD-Enhanced (134,396), MTLMOD (117,490), MD/LLMOD (45,917), and PRIME-MCS (31,118).
In the 10,000 step runs, the variation in number of identified conformations using the three different seeds did not vary considerably within the different methods for each macrocycle. The largest observed difference was for MCMM-Enhanced with a variation of 7% between the highest and lowest number of generated conformations (see max/min in Table 7). For example, for 1BXO, which contains 24 rotatable bonds, MCMM-Enhanced generated 6141/5345/5963 conformations out of 10,000 possible for each run.
With 155,296 identified conformers, MCMM produces a new conformer within the given energy window approximately every third iteration (440,000 possible conformers if every Monte Carlo step would generate a new conformer). Using 100 times as many search steps, the MCMM-Exhaustive search identified about 48 times more conformers in total (7,528,356), as compared to the shorter MCMM protocol (discussed further in section “Conformational coverage—are we reaching convergence”) However, the MCMM-Exhaustive searches were intended to serve as a benchmark in this study and generating large conformational ensembles can be problematic in terms of disk space, data handling and, downstream processing such as visual inspection, docking, pharmacophore modelling and quantum mechanical optimizations, etc.
Number of unique ring conformations generated
As not all conformational sampling software support macrocyclic ring sampling and since ring sampling in itself is not always easily performed, [36] we evaluated the different methods’ ability to generate unique ring conformations. This was defined as the sum of identified ring conformations for each method divided by number of runs that were made for that method. Whilst one could hypothesize that generating more conformers would also provide more ring conformations, the sum of all ring conformations identified by each method did not parallel the total number of generated conformers for the full data set. Instead, MCMM-Enhanced produced the largest number of ring conformations (49,324) followed by MTLMOD-Enhanced (41,040), PRIME-MCS (24,953), MTLMOD (23,367), MD/LLMOD (19,189) and MCMM (18,950) (Table 7 and Table S10). Thus, running MCMM and MTLMOD using adjusted settings regarding the ring opening bond (the enhanced settings) increased the number of generated ring conformations drastically. The MCMM-Exhaustive searches generated 967,844 ring conformations in total showing that with increased sampling more ring conformations could be found.
Identifying the global energy minimum
The ability of the methods to identify the global energy minimum was also evaluated. As minimization uses a GB/SA solvation model in all methods except for PRIME-MCS (vacuum), this method was not evaluated in this section. As MCMM-Exhaustive (1,000,000 search steps) identified the lowest energy conformer for all macrocycles, this was considered as the global minimum. For the other methods, the global minimum energy was considered identified if a conformer within 1 kJ mol−1 of the global energy minima was generated. We also set out to investigate if these conformers were geometrically identical to the global energy minima conformer. This was determined by first analyzing whether the global energy ring conformation was identified and, secondly, whether the whole conformer was identified. The global energy conformer and ring conformation were considered geometrically identical if the RMSD between the two conformers were below 0.1 Å RMSD when using either all heavy atoms present or just those of the macrocyclic ring, respectively. The mirror-image conformers of the global energy minimum were considered identical in this analysis.
To compare the ability of different methods to identify the global energy minimum using 10,000 search steps, the percentage of runs in which these conformations were identified was calculated. By only considering the energy, MD/LLMOD identified the global energy minima in 64% of the conformational searches (28/44), whereas MTLMOD-Enhanced and MCMM-Enhanced identified a conformer within 1 kJ mol−1 from the global energy minimum for 60% (79/132) and 59% (78/132) of the runs, respectively, see Table 7 and Table S11. MCMM and MTLMOD found the global energy minimum for and 57% (75/132), 56% (74/132) of the runs, respectively. Consequently, when using the enhanced settings, both MCMM and MTLMOD performed slightly better. Since the success rate of finding a conformer within 1 kJ mol−1 from global minimum ranged between 56 and 64%, this suggests that the 10,000 search steps might not be adequate for finding the global minimum for macrocycles. However, considering the number of rotatable bonds a macrocycle may have and, consequently, the large number of conformers that may exist, the probability of generating the global energy minimum should be rather low. Thus, the low success rates of generating the global energy minimum are not surprising.
By requiring that the ring conformation should be identical to that of the global energy minimum, MD/LLMOD instead identified the global energy minima in 61% (27/44) of the conformational searches. This was followed by MCMM-Enhanced 57% (75/132), MTLMOD-Enhanced 55% (73/132), MTLMOD 54% (71/132) and, MCMM 53% (70/132). Using the strictest definition were the whole (all heavy atoms) global energy minima conformer needs to be identical, MD/LLMOD identified the global energy minima in 55% (24/44) of the conformational searches, followed by MCMM-Enhanced 52% (68/132), MTLMOD-Enhanced 50% (66/132), MTLMOD 48% (64/132) and, MCMM 48% (63/132). In summary, MD/LLMOD generated the global energy minima most frequently of the evaluated methods across all three definitions of the global energy minima.
The challenge of identifying the global energy minimum for a given macrocycle can be estimated by the number of times it is found over 13 different runs (3 MCMM runs, 3 MCMM-Enhanced runs, 3 MTLMOD runs, 3 MTLMOD-Enhanced runs and 1 MD/LLMOD run) [44]. For each macrocycle, the relationship between the number of times the methods found the global energy minimum (using the all heavy atoms definition) and the number of rotatable bonds is shown in Fig. 2 below. For example, all 13 methods identified the global energy minimum for 1S9D (15 rotatable bonds), while none of the methods found the global energy minima for 1NSG (46 rotatable bonds). As expected, the methods are more successful for identifying the global energy minimum for less flexible macrocycles (less than 20 rotatable bonds, dotted line in Fig. 2). In contrast, this is more challenging for flexible macrocycles with more than 33 rotatable bonds (grey area in Fig. 2). This suggests that if the aim is to generate the global energy minimum for a macrocycle with many rotatable bonds, more extensive conformational sampling than 10 000 search steps is required. For nine macrocycles, none of the methods except MCMM-Exhaustive were able to identify the global energy minimum (1FKD, 1FKI, 1NMK, 1NSG, 1TPS, 2ASP, 2DG4, 3BXS and 4NNR). These compounds contain some of the largest ring structures in the data set except 3BXS. In 3BXS none of the methods identified the correct valine side-chain orientation.
Computational speed
The computational speed per method is given as the average CPU-time to sample the 44 macrocycles (total CPU time/number of seeds). The fastest method overall, was MCMM (3740 min) followed by MTLMOD (4125 min), MD/LLMOD (4163) MTLMOD-Enhanced (4182 min), MCMM-Enhanced (4642 min) and PRIME-MCS (9791 min), see Table 7 and Table S12. PRIME-MCS (9791 min) was thereby roughly twice as slow in comparison to the second slowest method MCMM-Enhanced (5345 min). MCMM-Exhaustive ran for 925,026 min, i.e. almost 2 years, in total (median 17,412 min, approximately 12 days per macrocycle). Other conformational sampling methodologies requiring up to 7 days on a 100 CPU cluster have been published [37]. However, this amount of conformational sampling is rarely described. Thus, using MCMM-Exhaustive is probably unreasonably lengthy for most modelling projects in the field of drug discovery.
Generating a conformation similar to the X-rayppw conformation
The ability to generate conformers similar to the bioactive conformation in the X-ray structure (best-fit conformation in Table 7) was analysed by calculating the RMSD between all conformers for each macrocycle and the X-ray ligand after Protein Preparation Wizard treatment (the X-rayppw conformation). Structures downloaded from the PDB are interpretations of electron densities and therefore models that may contain errors, especially in the ligand structures [68,69,70,71]. Therefore, we allowed for “adjusting” distorted bond angles and lengths, etc., in protein–ligand complexes and thereby aligning the structures to the force field. Thus, we used the X-rayppw conformation as the X-ray structure in all comparisons. When available, the fit to the electron density was evaluated for the protein-macrocycle complexes after the restrained energy minimization. The conformer most similar to the X-rayppw conformation was grouped in one of the following categories based on the RMSD value; below 1 Å, between 1 Å–2 Å and greater than 2 Å. All non-hydrogen atoms were considered for the RMSD calculations. Furthermore, since one of the complexes (3BXS) contains the macrocycle bound to two separate binding sites with different conformation, 45 instead of 44 macrocycles extracted from protein-macrocycle complexes were evaluated in this section. The median RMSD value between the X-ray ligand structure before and after protein preparation wizard treatment was 0.19 Å. For methods run in triplicate, the mean value is presented.
Furthermore, to explore the local minimum closest to the X-rayppw conformation we energy minimized the X-rayppw structure and compared it with the X-rayppw conformation (“energy minimized X-rayppw ligand” in Table 7). 40 out of the 45 macrocycles had a local minimum with RMSD values below 1 Å when superimposed on the X-rayppw structure, and five energy minimized X-rayppw conformers had RMSD values between 1 and 2 Å, see also Table S13.
The different methods’ ability to generate conformers similar to the X-rayppw conformation can be seen in Table 7 (“best fit conformation”), Table S13 and Table S14. Analysing the results below 1 Å RMSD, the MCMM-Exhaustive searches were able to identify such conformers for 44 out of 45 macrocycles. The RMSD for 1TPS was 1.22 Å and this macrocycle contained the highest number of rotatable bonds (47), which could be a reason for this. The second best search method was MCMM-Enhanced (40 of 45 macrocycles), followed by MTLMOD-Enhanced (36 out of 45 macrocycles). The more specialized methods MD/LLMOD and PRIME-MCS searches generated a conformer below 1 Å RMSD for 31 and 24 macrocycles, respectively. The results from the MCMM-Exhaustive search imply that conformers very close to the X-rayppw conformations can be generated if the sampling is sufficiently increased. Surprisingly, when comparing the MCMM-Enhanced and the MTLMOD-Enhanced searches with the MCMM-Exhaustive search the results are not dramatically different. Thus, 10,000 search steps seems to be adequate for generating a conformer close to the X-rayppw conformation for the enhanced methods. Overall, the general methods seemed more efficient at generating conformers close to the X-rayppw conformations in comparison to the more specialized methods. A visual overview of how the different methods performed is depicted in Fig. 3.
Generating the X-rayppw ring conformation
Most of the published macrocycle sampling studies have focused on the RMSD to the macrocyclic ring atoms as found in the X-ray. Thus, as a final comparison, we wanted to evaluate if the methods examined here could identify a ring conformation similar to the X-rayppw ring conformation (RMSDRING). These results are summarized in Table 7 and are presented in greater detail in Tables S15 and S16. For 3BXR, which consists of two macrocyclic rings, an RMSDRING value was calculated for each ring separately. Therefore, 46 RMSD values instead of 45 will be presented in this section. For methods run in triplicates, the mean value is presented.
Ideally, a conformational search method should be able to identify a conformer below 0.5 Å RMSDRING (a commonly used threshold [33, 36, 38]). Analysing the results of the benchmarking methods MCMM-Exhaustive and the energy minimized X-rayppw structures, these methods were able to identify a conformer below 0.5 Å RMSDRING in 46 and 45 cases out of the 46 macrocyclic rings, respectively. Analysing the results obtained with the other methods, they all had a median RMSDRING below 0.5 Å. However, MCMM-Enhanced and MTLMOD-Enhanced most accurately regenerated the macrocyclic ring conformation and both methods identified a conformer below 0.5 Å RMSDRING in 44 out of the 45 cases. For comparison, the third best method MTLMOD identified such a conformer for 39 out of the 46 macrocyclic rings. Compared to the other methods in Fig. 4, MCMM-Enhanced and MTLMOD-Enhanced seem to be more efficient at generating a ring conformations close to the X-rayppw ring conformation.
Comparing the results with prior work in the field
Several recent publications have studied the conformational preference of macrocycles and presented new conformational sampling methods. However, a direct comparison of these methods with those presented in this study is difficult since the datasets differ and the energies, number of conformers and ring conformations, may change significantly depending on, for example, what force field has been used. Another challenge with such comparisons is that small changes in programme settings can change the internal rankings between the methods. Despite this, we compared the ability of the methods presented in each paper to reproduce the X-ray structure (X-ray accuracy). Specifically, we analyzed the methods ability to re-generate the X-ray conformation with respect to all heavy-atoms RMSD, as well as the backbone (ring) atoms RMSDRING (see Table 8). As it is not meaningful to directly compare the results between different studies, we only looked at the internal order of the different methods in each paper with respect to RMSD values. The comparison starts with the well-cited work from Chen and Foloppe [44] where MD/LLMOD and MTLMOD, among other methods were evaluated, followed by an analysis of the publications where the MD/LLMOD [32] and PRIME-MCS [33] methods were published. Finally, we included the publications that introduces the new methods BRIKARD [35], PLOP [36], and ForceGen [34]. In all the publications above, with the exception of ForceGen, the MD/LLMOD method has been included, which allows it to serve as a reference method in the current analysis (the results of MD/LLMOD are shown in bold in Table 8).
Table 8 Summary of the X-ray accuracy reported in the literature Chen and Foloppe evaluated several different settings on a series of conformational sampling methods with the aim of optimizing search efficiency. The optimal settings derived are presented with the prefix “CF-” below and in Table 6. Unfortunately, not all heavy-atoms RMSD values were reported in this study but instead the number of X-ray structures that were reproduced within an RMSD of 1 Å. The CF-MTLMOD method was the most accurate in generating conformers below 1 Å RMSD (79%) followed by CF-MD/LLMOD (77%), MOE LowModeMD (72%), and MOE stochastic search (53%) [44].
Watts et al. presented the MD/LLMOD method and a macrocycle data set consisting of 67 PDB structures (150 structures in total) [32]. This data set, as well as the MD/LLMOD method, has been used in several other studies. Applied on the 67 PDB structures, Watts et al. reported a median heavy-atom RMSD of 1.14 Å (values reported in supporting information in ref [32]) compared to the X-ray structure).
Sindhikara et al. used 60 out of the 67 macrocycles in the Watts et al. data set and included the MD/LLMOD method, a molecular dynamics simulation method (simulations were run for 24 ns), and MOE LowModeMD [31] as reference methods when they presented and evaluated the PRIME-MCS method [33]. Applied on those 60 PDB structures, Sindhikara et al. reported the median all heavy-atom RMSD values to the X-ray structure to be the lowest for MD/LLMOD (1.10 Å), followed by PRIME-MCS (1.49 Å), MOE LowModeMD (1.69 Å), and molecular dynamics (1.89 Å) (values reported in supporting information in ref [33]). The median RMSDRING values followed a pattern analogues to the all heavy-atom RMSD values. The lowest median RMSDRING value was obtained for MD/LLMOD (0.38 Å) followed by PRIME-MCS (0.40 Å), MOE LowModeMD (0.41 Å), and molecular dynamics (0.56 Å) (values reported in supporting information in ref [33]). Thus, in both cases, the MD/LLMOD method had the best accuracy for reproducing the X-ray conformation.
In a study by Coutsias et al., a new method called BRIKARD was presented [35]. To evaluate BRIKARD, Coutsias et al. collected a data set of 67 structures, of which 39 originated from the PDB. BRIKARD was benchmarked against MD/LLMOD, as well as the optimized methods CF-MD/LLMOD and CF-LowModeMD from the work of Chen and Foloppe (see Table 6 for settings). Using all 67 structures, BRIKARD had a median RMSDRING value of 0.47 Å followed by CF-MD/LLMOD (0.54 Å), MD/LLMOD (0.63 Å) and CF-LowModeMD (0.64 Å) (values calculated from supporting information in ref [35]). Thus, BRIKARD seems to reproduce the ring conformation slightly better than MD/LLMOD. However, when comparing the X-ray ring accuracy for the PDB structures only, the differences between the methods were smaller. For the 39 structures originating from the PDB, the median RMSDRING was 0.42 Å for BRIKARD, followed by CF-MD/LLMOD (0.47 Å), MD/LLMOD (0.49 Å), and CF-LowModeMD (0.54 Å) (median values calculated from supporting information in ref [35]).
Wang et al. developed the PLOP method based on 37 macrocycles originating from both the PDB and CSD [36]. In their study, MD/LLMOD and PLOP were compared based on how well the backbone (ring atoms) were reproduced. The optimized protocol of PLOP was able to reproduce the crystal structure within 0.50 Å RMSDRING for 31 out of 37 macrocycles with a median RMSDRING value of 0.25 Å. Wang et al. concluded that the performance of PLOP was not statistically different compared to MD/LLMOD.
Cleves and Jain compared the performance of ForceGen, CF-LowModeMD and CF-MTLMOD using the Chen and Foloppe data set (30 PDB structures) [34]. Applied on those macrocycles, CF-LowModeMD showed equivalent reproduction of the X-ray conformation (all heavy-atoms) compared to ForceGen, whereas CF-MTLMOD showed marginally better results compared to ForceGen.
To summarize, as shown from the studies above, the specialized macrocycle sampling method MD/LLMOD is able to reproduce the X-ray structures accurately, generating better or comparable results to other methods in prior publications. Interestingly, we have shown that by using slightly tweaked versions of the general methods, such as MCMM-Enhanced and MTLMOD-Enhanced, X-rayppw accuracy results comparable to, or even better than, MD/LLMOD may be obtained at least for the data set and parameters used in this study. Therefore, to further explore the general methods abilities, including MCMM-Enhanced and MTLMOD-Enhanced in future method comparison studies could be of interest.
Conformational coverage—are we reaching convergence
The absolute degree of conformational space covered in a conformational search is often difficult to describe [72, 73]. In the literature, parameters such as the number of conformations identified, the number of times the lowest energy conformation is visited, the range of compactness/extendedness covered by the conformations as described by the radius of gyration, and the number of visited 3D pharmacophore points have been considered [31, 44]. Full conformational coverage can also be defined as when all possible conformers within a specified energy window have been found. As macrocycles are said to be conformationally restricted, we aimed to explore the conformational space in a more exhaustive way than is typical.
Since the shape of the energy hypersurface is force field dependent, the number of possible low-energy conformers varies between the force fields. Quantum mechanical methods would also most likely change the number of possible low-energy conformers. However, since most drug design projects are carried out in a molecular mechanics force field environment, we were interested to explore how many conformers that energy hypersurface contains. Therefore, we ran the MCMM-Enhanced search with 1,000,000 search steps (MCMM-Exhaustive) for the full data set (44 macrocycles). The results were visualized by plotting the number of search steps against the number of conformations generated within 15 kcal mol−1 from the global energy minimum (Fig. 5a depicts the 10 examples, chosen to represent the structurally diverse range of macrocycles). Our results show that the number of conformers generated increases steadily for all macrocycles, with the exception of 3JRX and 2HFK/2J9M, which are on top of each other. These three macrocycles have relatively small macrocyclic ring systems (12, 14 and 15, respectively) and are not extensively substituted. As these macrocycles have the smallest number of torsional angles they are therefore, expected to have fewer conformations, for example, in comparison with 1S22. This macrocycle contains a much larger ring system, substituted with a large flexible side-chain thereby increasing the degrees or torsional freedom. Thus, after 1,000,000 search steps, full conformational coverage has, as expected, not been reached for the majority of the ten displayed macrocycles. For the full data set, MCMM produced 155,846 conformers whereas MCMM-Exhaustive identified 7,528,356, corresponding to roughly 48 times more conformers than MCMM.
As expected, when examining the number of generated conformations within a narrower energy window (e.g. 10 and 5 kcal mol−1, in Fig. 5b and c, respectively) the rate of conformer generation decreases for most of the macrocycles. As seen in Fig. 5b, for six out of ten macrocycles (2HFK, 2J9M, 2DG4, 2XBK, 3I6O and, 3JRX) the rate of conformer generation decreases. Within 5 kcal mol−1, all but one (1S22) of the ten conformational searches asymptotically approaches full coverage (Fig. 5c).
Looking further at the energy distribution for all conformers of the 10 macrocycles, the vast majority of the conformers have a relative energy above 10 kcal mol−1. For a more thorough discussion about the energy distribution, see section “Conformational coverage and conformational energy distribution.” in the supporting information.
Energy window for sampling macrocycles
Numerous studies have been performed to understand the conformational energy cost when a ligand binds to its target. Some argue that the acceptable conformational energy penalty is relatively low (below 3 kcal mol−1 [74, 75], mostly below 5 kcal mol−1 [76], between 4 and 6 kcal mol−1 [77], and mostly below 6 kcal mol−1 [78]) However, energies, above 9 kcal mol−1 [76], around 15.9 ± 11.5 kcal mol−1 [79], between 0 – 25 kcal mol−1 [80], and even up to 27 kcal mol−1 [81], have been suggested as feasible for protein-bound ligands. For conformational sampling of macrocycles, Chen and Foloppe noticed an improved reproduction of the X-ray conformation for MTLMOD using an increased energy window of up to 15 kcal mol−1 [44]. Also, Alogheli et al. used a 15 kcal mol−1 energy window and found that the conformer closest to the energy minimized X-ray structure averaged around 5 kcal mol−1 from the global minimum with the largest difference being 10.8 kcal mol−1 (almost the same data set as in this study and energies were calculated with OPLS-2005) [40].
Calculating the conformational energy penalty upon binding has been discussed rigorously in the literature and has recently been summarized by Peach [82]. Thus, no attempts tackling this subject will be made herein. However, as many modeling methods utilize an energy window for generating conformers, the width of this window is of great importance. Therefore, two energy differences were calculated and analyzed. The first between the global energy minimum and the energy minimized X-rayppw structure and the second between the global energy minimum and the MCMM-Exhaustive generated conformer closest to the X-rayppw structure. As mentioned in section "Generating a Conformation Similar to the X-rayppw Conformation", the conformer derived by energy minimizing the X-rayppw conformation should correspond to the minimum closest to the X-rayppw conformation. Therefore, with the aim of generating conformers closest to the bioactive conformation, the energy difference to this conformation could serve as an upper cut-off value for keeping conformers. The energy difference between the minimized X-rayppw conformation and the global minimum varied between 0 and 13.7 kcal mol−1, with a median value of 5.6 kcal mol−1 (Table S17). Therefore, the energy window of 15 kcal mol−1 used herein seems appropriate. It should be noted that only five out of the 45 macrocycles had energy differences exceeding 10 kcal mol−1.
As previously mentioned, the MCMM-Exhaustive searches were able to generate conformations similar to the X-rayppw structure (< 1 Å) for all but one of the 45 protein-macrocycle complexes, see Tables 7 and S13. When the energy difference between these conformations and the corresponding global energy minimum (generated by any method) was analyzed it varied between 0 and 21.4 kcal mol−1 with a median of 6.6 kcal mol−1 (see Table S17). The conformation closest to the X-rayppw conformation was found within 5 and 10 kcal mol−1 for 19 and 33 of the macrocycles, respectively.
Considering both energy differences, a 15 kcal mol−1 energy window for keeping conformers seems appropriate. However, there are many cases where an energy window of 10 kcal mol−1 or even 5 kcal mol−1, could be used instead.