Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge
In the recent SAMPL5 challenge, participants submitted predictions for cyclohexane/water distribution coefficients for a set of 53 small molecules. Distribution coefficients (log D) replace the hydration free energies that were a central part of the past five SAMPL challenges. A wide variety of computational methods were represented by the 76 submissions from 18 participating groups. Here, we analyze submissions by a variety of error metrics and provide details for a number of reference calculations we performed. As in the SAMPL4 challenge, we assessed the ability of participants to evaluate not just their statistical uncertainty but also their model uncertainty, that is, how well they can predict the magnitude of their model or force field error for specific predictions. Unfortunately, this remains an area where prediction and analysis need improvement. In SAMPL4 the top performing submissions achieved a root-mean-squared error (RMSE) around 1.5 kcal/mol. If we anticipate accuracy in log D predictions to be similar to the hydration free energy predictions in SAMPL4, the expected error here would be around 1.54 log units. Only a few submissions had an RMSE below 2.5 log units in their predicted log D values. However, distribution coefficients introduced complexities not present in past SAMPL challenges, including tautomer enumeration, that are likely to be important in predicting biomolecular properties of interest to drug discovery; some decrease in accuracy would therefore be expected. Overall, the SAMPL5 distribution coefficient challenge provided great insight into the importance of modeling a variety of physical effects. We believe these types of measurements will be a promising source of data for future blind challenges, especially in view of the relatively straightforward nature of the experiments and the level of insight provided.
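The abstract does not spell out how an RMSE of about 1.5 kcal/mol in hydration free energies translates into an expected error of about 1.54 log units. One plausible route is sketched below as an assumption, not as the authors' stated procedure: treat log D as a transfer free energy divided by RT ln 10, and treat that transfer free energy as the difference of two solvation free energies whose 1.5 kcal/mol errors are independent and add in quadrature. The function name, the temperature of 298.15 K, and the independence assumption are all illustrative choices.

```python
import math

# Gas constant in kcal/(mol K).
R_KCAL = 1.987204e-3


def free_energy_error_to_log_units(err_kcal, temperature=298.15, n_terms=2):
    """Convert a per-solvent free energy error (kcal/mol) to log units.

    Assumes (hypothetically) that the transfer free energy is built from
    `n_terms` solvation free energies with independent errors of equal
    size, so the errors combine in quadrature.
    """
    combined_error = math.sqrt(n_terms) * err_kcal
    # One log unit corresponds to RT ln(10) in free energy.
    kcal_per_log_unit = R_KCAL * temperature * math.log(10)
    return combined_error / kcal_per_log_unit


estimate = free_energy_error_to_log_units(1.5)
print(f"Estimated log D error: {estimate:.2f} log units")
```

Under these assumptions the estimate lands close to the value quoted in the abstract; the exact figure depends on the temperature and rounding used.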
Keywords: SAMPL, Distribution coefficient, Blind challenge, Free energy, Alchemical, Molecular simulation
D.L.M. and C.C.B. appreciate financial support from the National Institutes of Health (1R01GM108889-01) and the National Science Foundation (CHE 1352608), and computing support from the UCI GreenPlanet cluster, supported in part by NSF Grant CHE-0840513. This work was made possible in part by NIH grant U01 GM111528 for the Drug Design Data Resource, which supported the SAMPL workshop. M.K.G. thanks the National Institutes of Health for Grant GM061300. The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. M.K.G. has an equity interest in and is a cofounder and scientific advisor of VeraChem LLC. We would also like to acknowledge John Shelley, Art Bochevarov, Robert Abel, and Mats Svensson from Schrödinger for their help with pKa and tautomer enumeration calculations. We also thank all the SAMPL5 participants and D3R Workshop attendees, and we especially appreciate valuable discussions with John Chodera (MSKCC), Ariën Rustenburg (MSKCC), Andreas Klamt (COSMOLogic), Christopher Fennell (Oklahoma State University), Samuel Genheden (Gothenburg University), and Frank Pickard (National Institutes of Health).