
The application of conformal prediction to the drug discovery process

Annals of Mathematics and Artificial Intelligence


QSAR modeling is a method for predicting properties of chemical compounds, e.g. solubility or toxicity, using machine learning techniques. QSAR is in widespread use within the pharmaceutical industry to prioritize compounds for experimental testing or to flag potential toxicity during the drug discovery process. However, the confidence or reliability of predictions from a QSAR model is difficult to assess accurately. We frame the application of QSAR to preclinical drug development in an off-line inductive conformal prediction framework and apply it prospectively to historical data collected from four different assays within AstraZeneca over a period of about five years. The results indicate weakened validity of the conformal predictor due to violations of the randomness assumption. The validity can be strengthened by adopting semi-off-line conformal prediction. The non-randomness of the data prevents exactly valid predictions, but comparisons with the results of a traditional QSAR procedure applied to the same data indicate that conformal predictions are highly useful in the drug discovery process.
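To make the idea of off-line inductive conformal prediction concrete, the following is a minimal sketch of the generic technique for regression, not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic, a least-squares linear model stands in for the QSAR model, the nonconformity score is the absolute residual, and the names `fit_linear`, `predict_linear`, and `icp_half_width` are hypothetical helpers.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with intercept; a stand-in for the QSAR model (assumption)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Point predictions from the fitted coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def icp_half_width(cal_residuals, epsilon):
    """Interval half-width at significance level epsilon.

    Uses absolute residuals on the calibration set as nonconformity scores
    and returns the ceil((1 - epsilon) * (n + 1))-th smallest score.
    """
    scores = np.sort(np.abs(cal_residuals))
    n = len(scores)
    k = int(np.ceil((1 - epsilon) * (n + 1)))
    if k > n:
        # Too few calibration examples for this epsilon: interval is unbounded.
        return np.inf
    return scores[k - 1]


# Synthetic, exchangeable data (assumption): 5 descriptors, noisy linear response.
rng = np.random.default_rng(0)
n_total, n_features = 1500, 5
X = rng.normal(size=(n_total, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + rng.normal(scale=0.5, size=n_total)

# Proper training set, calibration set, and test set.
X_train, y_train = X[:500], y[:500]
X_cal, y_cal = X[500:1000], y[500:1000]
X_test, y_test = X[1000:], y[1000:]

coef = fit_linear(X_train, y_train)
half_width = icp_half_width(y_cal - predict_linear(coef, X_cal), epsilon=0.2)

pred = predict_linear(coef, X_test)
lower, upper = pred - half_width, pred + half_width
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"half-width={half_width:.3f}, empirical coverage={coverage:.3f}")
```

On exchangeable data such as this, the empirical coverage should be close to 1 - epsilon. With time-ordered assay data the test compounds may differ systematically from the calibration compounds, which is the kind of violation of the randomness assumption that the abstract reports as weakening validity.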




Author information

Corresponding author

Correspondence to Martin Eklund.


About this article

Cite this article

Eklund, M., Norinder, U., Boyer, S. et al. The application of conformal prediction to the drug discovery process. Ann Math Artif Intell 74, 117–132 (2015).


