Introduction

Simulations with molecular dynamics (MD) or Monte Carlo provide structural and dynamic information of chemical systems at high resolution and thus are essential complements to wet-lab experiments [1, 2]. The usefulness of such simulations is to a large extent determined by the underlying molecular mechanics force fields, and it is therefore essential to quantify the accuracy of the force field. A basic requirement is that the force field should correctly describe the solvation thermodynamics of small molecules, such as amino-acid analogues or drug-fragments. This has been the strategy to benchmark force fields in numerous publications [38]. The ability to truly predict solvation free energies has been assessed by several blind challenges under the SAMPL label [912]. The previous four challenges have consisted of a set of hydration free energies, whereas the current challenge is the first one to consider the partitioning between two phases, viz. water and cyclohexane [13].

Molecular simulations are not only limited by the accuracy of the force field, but also the timescales that can be reached [14]. An all-atom (AA) force field, describing each atom individually cannot reach the long time-scales relevant for many biochemical applications unless acceleration techniques [15, 16] or special-purpose hardware [17] is employed. A popular solution to reach longer time-scales is coarse-graining (CG), i.e. grouping atoms into pseudo-particles or beads [18, 19]. This reduces the number of particles that need to be simulated and increases the diffusion rate of the molecules. The CG models are inherently less accurate than AA models: especially CG models of proteins and small molecules currently have a limited usefulness [20]. To remedy this, a hybrid all-atom/coarse-grained model was recently developed, where the most essential part of the system, e.g. a protein or a small molecule, is described with an AA model and the rest of the system, e.g. solvent molecules, are described with a CG model [21]. This model has been used to study small molecules and proteins in water and membrane environments [21, 22]. It has also been used to estimate water/hexane and water/octanol partition coefficients [23]. In this paper, we describe the performance of this hybrid model in the SAMPL5 distribution coefficient challenge.

Methods

Solvent models

The solvents, water and cyclohexane, were described with the Elba coarse-grained (CG) model [24]. The Elba water model has been described and extensively benchmarked previously [25]. In Elba, a single water molecule is modelled as a point dipole attached to a Lennard-Jones site (see Figure S1), i.e. a Stockmayer model. The cyclohexane model was developed for the SAMPL5 challenge, with a similar approach to the models of hexane and octane described previously [23]. A single cyclohexane molecule is described by three connected, uncharged, Lennard-Jones sites as shown in Figure S1. The beads have the same parameters as the non-polar bead used to describe lipid tails, except that σ and ε are multiplied by a factor of 0.9. This is a similar reduction applied to ring beads in the MARTINI force field [26]. Therefore, σ = 0.41 nm and ε = 3.19 kJ mol−1. The bond length and bond force constant are 0.405 nm and 1269 kJ mol−1 nm−2, respectively. The validation of this model is discussed further in the Supplementary Material.

Compound setup

The inputs provided by the organisers for LAMMPS were used as a starting point. The general Amber force field [27, 28] and coordinates of the compounds were retained, whereas the all-atom solvent molecules were coarse-grained using in-house scripts; the all-atom water molecules were replaced by Elba water beads which were positioned at the respective oxygen atom, and the cyclohexane molecules were replaced by Elba cyclohexane molecules with beads placed on the first, third and fifth carbon atom. The system was minimized with 1000 steps of steepest descent and equilibrated for 1.2 ns in the NPT ensemble. A multiple timestep integrator was used [21], propagating the CG–CG non-bonded forces with a 6 fs timestep and all other forces with a 2 fs timestep. The CG–CG non-bonded interactions are a combination of a shifted-force dipole–dipole potential and Lennard-Jones potential. The CG beads interact with the atoms through shifted-force charge–dipole and Lennard-Jones potentials [21]. The cut-off was in all cases 12 Å. The atom–atom non-bonded interactions combine a Lennard-Jones potential with a cut-off at 12 Å and particle–particle particle-mesh Ewald [29] with a 12 Å real-space cut-off. SHAKE [30] was used to constrain covalent bonds involving hydrogen atoms in the compounds. The solvent and compound were coupled to two different Langevin thermostats [31] with a 6 ps coupling constant, keeping the temperature fixed at 298 K. The pressure was kept at 1 atm with a weak-coupling algorithm [32] and a 6 ps coupling constant.

Free energy simulations

The free energy simulations follow to a large extent a previously outlined method [23]. The Gibbs free energy of solvation was estimated using thermodynamic integration (TI) [33], by coupling the system energy, U to a parameter λ. At λ = 0, the compound is fully interacting with the solvent, and at λ = 1, it is completely decoupled, i.e. behaves as a gas-phase molecule. U is scaled with a fourth-power function \(f(\lambda ) = (1 - \lambda )^{4}\) and twenty-five equally spaced values of λ from 0 to 0.96 were simulated, whereas λ = 1 was estimated by linear extrapolation. The integration was carried out using the trapezium rule. One long simulation was carried out and the value of λ was changed step-wise every 4.8 ns and the initial 1.2 ns at each step was discarded as equilibration. The sampling frequency of the energies for TI was 0.6 ps. In some cases, each value of λ was simulated for 3.6 with 1.2 ns discarded as equilibration, further discussed in the text. For the simulations in water, ten independent repeats were initiated by assigning different starting velocities. For the simulations in cyclohexane, only five independent repeats were used.

Quality analysis

The quality of the predictions was quantified by the mean absolute deviation (MAD), mean signed deviation (MSD), root-mean-squared deviation (RMSD), Pearson’s correlation coefficient (R) and the percentage of correctly predicted signs.

Systematic deviations due to the presence of specific chemical groups were analysed using an established procedure [4]. The BEDROC (Boltzmann-enhanced discrimination of receiver-operating characteristic) metric [34] was computed for the different chemical groups as described previously. The checkmol program [35] (version 0.5) was used to identify the chemical groups, and the BEDROC analysis was performed with the CROC python package [36] (version 1.1). The uncertainty of the BEDROC metric was estimated by 500 bootstrap iterations. A Student’s t-test was performed on the absolute deviation for the different groups compared to the entire population of absolute errors.

Results and discussion

We present predictions for the SAMPL5 distribution coefficient challenge. The predictions were produced by computing the solvation free energy, ΔG solv, in water and cyclohexane, using molecular dynamics employing an inexpensive hybrid all-atom/coarse-grained (AA/CG) model. The solvent was described with the Elba CG model and the compounds with the general Amber force field. We did not attempt to estimate the solvation free energy of each possible protonation state of the compounds, or even the most likely; rather we computed the solvation free energy of the neutral compound in the tautomeric state given by the organizers and thus approximate the distribution coefficient with the partition coefficient

$$\log\ D \approx \log\ P = \frac{{\Delta G_{\text{solv}} ({\text{water}}) - \Delta G_{\text{solv}} ({\text{cyclohexane}})}}{2.3RT}$$
(1)

where R is the gas constant and T the absolute temperature. This is motivated by two considerations: (1) the accurate prediction of ΔG for multiple tautomers of a compound would probably be prohibitively expensive, and (2) the estimation of the solvation free energy of ionic compounds is challenging with molecular dynamics simulations. The second consideration is especially true with CG models, which generally do not employ long-range electrostatics.

Submitted predictions

The ΔG solv as well as log D are listed in Table 1 for the 53 compounds in the challenge. The standard error of the ΔG solv estimates is generally good, between 0.02 and 1.0 kJ/mol for the estimates in cyclohexane and 0.06 and 2.1 kJ/mol for the estimates in water. We used five and ten independent repeats for the cyclohexane and water estimates, respectively, which was deemed necessary after computing estimates for all compounds using only two repeats and only 3.6 ns sampling at each value of λ. It would be prohibitively expensive to reduce the standard error further for some of the estimates in water. The larger standard error of the estimates in water stems from the need to decouple electrostatic interactions (charge–dipole) in this phase, whereas the cyclohexane CG model is uncharged. The submitted predictions were based on 4.8 ns sampling at each value of λ, with 1.2 ns discarded as equilibration. To check that the simulations were converged, we also computed free energies for all compounds in both water and cyclohexane with only 3.6 ns sampling. These estimates are given in the Supplementary Material. The solvation free energies in cyclohexane changed by at most 2.5 kJ/mol when increasing sampling by 1.2 ns, but by only 0.3 kJ/mol on average over all compounds. For only three compounds (63, 83 and 92) the estimate of the solvation free energy changes by more than 1 kJ/mol, and therefore, we submitted the predictions based on 4.8 ns sampling. The solvation free energies in water changed by at most 1.8 kJ/mol when increasing the sampling by 1.2 ns, and by 0.3 kJ/mol on average. For only four compounds (37, 67, 83, and 84), the free energy changed by more than 1.0 kJ/mol when increasing the sampling, and thus we consider these estimates to be converged and we submitted the predictions based on 4.8 ns sampling.

Table 1 Submitted estimates of log D as well as solvation free energies in kJ/mol in water and cyclohexane

The correlation between the predictions and experiments is fair as seen in Fig. 1a, with a correlation coefficient, R of 0.64, which is statistically significant (p-value < 0.001). For 77 % of the compounds the prediction of log D has the correct sign, and if we exclude predictions or experiments where log D is not significantly different from zero (determined by a t-test with a 95 % confidence level), the percentage of correctly predicted signs is 82 %. The correlation with experiment and percentage of correctly predicted signs are on a par with previously published predictions of water/hexane partition coefficients but slightly worse than predictions of water/octanol partition coefficients [23]. The deviations of the predictions range from 0.0 to 8.1 log units; the largest deviation is observed for compound 74. This is also the only outlier in the error distribution as seen in the boxplot in Fig. 1b. The second largest deviation, 5.4 log units is observed for compound 75. The mean absolute deviation (MAD) is 1.8 log units, which contains only a small systematic component, as the mean signed deviation (MSD) is only 0.3 log units. The root-mean-squared deviation (RMSD) is 2.4 log units. Compared to previous estimates of partition coefficients with the hybrid model [23], the MAD is significantly larger. For instance, hexane/water and octanol/water partition coefficients were predicted with MADs of 0.86 and 0.66 log units, respectively, i.e. about 1 log unit better than the cyclohexane log D values. There are of course many possible reasons for this, but two of the arguably most significant factors are the larger size of compounds in the SAMPL5 set and the fact that we are here trying reproduce experimental log D values rather than comparing to log P values as in the previous study. However we still compute log P values, and hence neglect the effects of tautomers and ionization.

Fig. 1
figure 1

a Experimental versus predicted log D and b boxplot of absolute deviations compared to experiments. The vertical line in the middle of the box shows the median and the box covers the interquartile range. The whiskers extend to 1.5 of the interquartile range and the cross outside is considered an outlier

To analyze the predictions further, we divided the compounds based on the chemical groups they contain. The objective is to see if compounds with specific moieties lead to significantly worse estimates than the other compounds. We used the checkmol program [35] to classify the compounds and could identify ten groups that contained at least five compounds and at most 47. All of them are listed in Table 2. The largest group is heterocyclic compounds, to which 47 compounds belong. The group of carboxylic acids and phenols only contains five compounds each. For these ten groups, we list in Table 2 the BEDROC metric, the p-value for a t-test of the absolute deviations for the group compared to the total population and the MSD. The analytical BEDROC value, assuming a uniform predictive power of all chemical groups, is listed as well, and serves as a yardstick to determine if the observed BEDROC value of a chemical group indicates a systematic deviation. We observe some BEDROC values that are larger than that expected from a uniform distribution, e.g. amines have an observed value of 0.65 compared to 0.50 for a uniform distribution. However, none of the differences between observed and uniform BEDROC value are significant at the 95 % confidence level, indicating that no particular chemical group is producing worse predictions than the other groups. This is also confirmed by the p-value of the absolute deviations that is larger than 0.05 for all groups; the smallest p-value is found for halogen derivatives, 0.08. Finally, the MSD for many groups is less than 1 log unit, also indicating a lack of systematic error. The largest MSDs are found for ethers, 1.9 log units and phenols, 1.8 log units. Thus, we can conclude that the deviations of the predictions compared to experiments are most likely random in nature.

Table 2 Analysis of the deviation between hybrid predictions and experiment for different chemical groups

Comparison with all-atom predictions

Arguably the main approximation of the submitted predictions lies in the simple CG model of the solvent molecules. Fortunately, we can make a rough quantification of the effect of this approximation by comparing to submissions that utilized all-atom solvents. There were several such submissions, but here we will only compare to a submission from the Mobley lab [37]. They used the same force field for the compounds and the same starting conformations as we used. There are some differences in the free energy methodology, but the length of the simulations is largely similar. Therefore, we consider this to be the closest all-atom submission to the hybrid AA/CG submission presented herein. The Mobley lab was also kind enough to provide the individual solvation free energies, which enables further analysis.

There are clear differences between the AA and AA/CG predictions as seen in Table 3. For ΔG solv in water the absolute deviations range from 0.2 to 41.2 kJ/mol, with a MAD of 12.7 kJ/mol. The differences are systematic as the MAD is almost as large as the MSD, and in general the hybrid estimates of the hydration free energies are more negative than AA. The same holds true for the estimates in cyclohexane, but in this medium the deviations are smaller; the absolute deviations range from 0.8 to 13.5 kJ/mol with a MAD of 4.8 kJ/mol. For log D the deviations range from 0.1 to 6.2 log units, with a MAD of 1.7 log units. Thus it is clear that the deviations between the AA/CG and AA log D values are of similar magnitude as the deviations between the hybrid predictions and experiments (see Table 1). However, the correlation between the AA and hybrid predictions, R = 0.86 is stronger than the correlation between the hybrid predictions and experiment, R = 0.64. In fact, the correlation between an AA and AA/CG is stronger for the estimates of ΔG solv, but because the slope is different in the two media this correlation does not translate to log D.

Table 3 Statistics on the deviation between hybrid and all-atom estimates

The predictions of ΔG solv for compound 74 differ by 41.2 and 5.6 kJ/mol in water and cyclohexane, respectively. Thus, it is clear that the difference between the AA and AA/CG models is manifested differently in the two media. We investigated this further by computing the BEDROC metric of the same ten groups used above, but here we analyze the difference between the AA and AA/CG estimates of ΔG solv and log D. For the predictions of ΔG solv in water, we observe a BEDROC metric that is significantly larger than expected from a uniform distribution for aromatic amines, carboxylic acids, heterocyclic compounds and phenols (see Table 4). For all of these groups, except phenols, the significantly larger BEDROC values are also observed with log D. For the predictions in cyclohexane, we only observe significantly larger BEDROC values for amines and ethers, which is not translated to the log D estimates. Thus, we see that compounds with some chemical groups give large differences in water, and compounds with other groups give large differences in cyclohexane. Whether these differences also give large differences in log D depends on the individual compounds. It is also striking that there is no apparent trend among the groups that show large differences. For instance, it is not immediately clear why we observe a significantly larger BEDROC value for aromatic amines in water, but not for all amines, whereas the opposite is true in cyclohexane.

Table 4 BEDROC metric of the deviation between hybrid and all-atom predictions for different chemical groups

Conclusion

We have presented a submission to the SAMPL5 challenge on distribution coefficients. Our methodology is simple and efficient: we approximate the distribution coefficient by the partition coefficient through the estimation of solvation free energies in water and cyclohexane, employing a hybrid all-atom/coarse-grained model. Such an approach is at least ten times faster than a corresponding all-atom approach [21, 22]; a solvation free energy in water and cyclohexane is computed in 13 and 7 CPU hours on average, respectively on 12 cores of a Cray XC30 machine. We have previously used this hybrid model to produce hexane/water and octanol/water predictions with high accuracy both in comparison to experiment and to a more expensive all-atom solvent model [23]. The SAMPL5 predictions presented herein are a further testament to the accuracy and robustness of this computationally inexpensive model. We obtain a mean absolute deviation of 1.8 log units and a significant correlation coefficient, R of 0.64. In addition, 84 % of the predictions had the correct sign, which is arguably the most important quality for a model predicting partitioning. The estimates seem to be without any systematic bias, and neither is the model more sensitive to a particular chemical group. This observed quality of the AA/CG predictions is on a par with or better than the other submissions employing a simulation approach with a fixed-charged atomistic force field [37]. However, the deviations to experiments are larger than what was expected from previous estimates of log P [23] and there are several possible reasons for this: The compounds in the SAMPL5 challenge are larger, which is also seen in the increased uncertainty of the estimates. Furthermore, we compare to experimental log D, and hence neglect the contribution from all but one tautomer and the possible ionization in the water phase. The much better quality of cyclohexane/water log P values for 79 compounds from the Minnesota database [38] presented in the Supplementary Material, is a clear indication of this. Thus, it seems that the logical place to start on improvements is to add corrections to the log P estimates accounting for different tautomers and ionization effects. However, such corrections are far from accurate or complete [37], and therefore we argue that corrections have to be the subject of future investigations. Other possible error sources include the neglect of a finite water concentration in the cyclohexane phase, compound dimerization, and experimental setup. Even so, the results herein clearly show that a majority of the physics involved in the partitioning of small molecules between water and cyclohexane is captured with a simple CG solvent model.