Abstract
Quantitative structure-activity relationship (QSAR) modeling is a method for predicting properties of chemical compounds, e.g. solubility or toxicity, using machine learning techniques. QSAR is in widespread use within the pharmaceutical industry to prioritize compounds for experimental testing or to alert for potential toxicity during the drug discovery process. However, the confidence or reliability of predictions from a QSAR model is difficult to assess accurately. We frame the application of QSAR to preclinical drug development in an off-line inductive conformal prediction framework and apply it prospectively to historical data collected from four different assays within AstraZeneca over a time course of about five years. The results indicate weakened validity of the conformal predictor due to violations of the randomness assumption. The validity can be strengthened by adopting semi-off-line conformal prediction. The non-randomness of the data prevents exactly valid predictions, but comparisons with the results of a traditional QSAR procedure applied to the same data indicate that conformal predictions are highly useful in the drug discovery process.
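As a rough illustration of the off-line inductive conformal prediction setting described above, the following minimal Python sketch trains a model once, computes nonconformity scores on a held-out calibration set, and derives a prediction interval for new examples. Everything in it is an assumption made for illustration: the function names `icp_halfwidth` and `fit_line` are hypothetical, the least-squares line merely stands in for a QSAR model, the absolute residual is one common choice of nonconformity score, and the data are synthetic rather than the AstraZeneca assay data.

```python
import numpy as np


def icp_halfwidth(calibration_residuals, significance):
    """Half-width of an inductive conformal prediction interval.

    Nonconformity score (an assumption of this sketch): the absolute
    residual of the model on a held-out calibration set.
    """
    scores = np.sort(np.abs(np.asarray(calibration_residuals, dtype=float)))
    n = scores.size
    # Use the ceil((1 - significance) * (n + 1))-th smallest calibration score.
    k = int(np.ceil((1.0 - significance) * (n + 1)))
    if k > n:
        # Too few calibration examples for this significance level.
        return float("inf")
    return float(scores[k - 1])


def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the underlying QSAR model."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda x_new: slope * np.asarray(x_new) + intercept


# Synthetic, exchangeable data (hypothetical; not the assay data).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=600)
y = 2.0 * x + rng.normal(0.0, 1.0, size=600)

# Off-line setting: train once, calibrate once, then predict the test set.
x_train, y_train = x[:300], y[:300]
x_cal, y_cal = x[300:400], y[300:400]
x_test, y_test = x[400:], y[400:]

model = fit_line(x_train, y_train)
halfwidth = icp_halfwidth(y_cal - model(x_cal), significance=0.2)

# Fraction of test labels inside [prediction - halfwidth, prediction + halfwidth].
coverage = float(np.mean(np.abs(y_test - model(x_test)) <= halfwidth))
```

When the data are exchangeable, as in this synthetic example, `coverage` should be close to `1 - significance`. With time-ordered data of the kind the abstract describes, the calibration and test examples may no longer be exchangeable, and the empirical coverage can fall short of that level.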
Cite this article
Eklund, M., Norinder, U., Boyer, S. et al. The application of conformal prediction to the drug discovery process. Ann Math Artif Intell 74, 117–132 (2015). https://doi.org/10.1007/s10472-013-9378-2
DOI: https://doi.org/10.1007/s10472-013-9378-2