The application of conformal prediction to the drug discovery process

  • Martin Eklund
  • Ulf Norinder
  • Scott Boyer
  • Lars Carlsson


QSAR (quantitative structure-activity relationship) modeling is a method for predicting properties of chemical compounds, e.g. solubility or toxicity, using machine learning techniques. QSAR is in widespread use within the pharmaceutical industry to prioritize compounds for experimental testing or to flag potential toxicity during the drug discovery process. However, the confidence or reliability of predictions from a QSAR model is difficult to assess accurately. We frame the application of QSAR to preclinical drug development in an off-line inductive conformal prediction framework and apply it prospectively to historical data collected from four different assays within AstraZeneca over a period of about five years. The results indicate weakened validity of the conformal predictor due to violations of the randomness assumption. The validity can be strengthened by adopting semi-off-line conformal prediction. The non-randomness of the data prevents exactly valid predictions, but comparisons with the results of a traditional QSAR procedure applied to the same data indicate that conformal predictions are highly useful in the drug discovery process.
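
To make the off-line inductive setting concrete, the following is a minimal sketch of inductive conformal regression on simulated data, not the procedure or data used in the paper. It makes several assumptions: a one-dimensional least-squares line stands in for the QSAR learner, the nonconformity score is the absolute residual, and the examples are drawn i.i.d., so the randomness assumption holds by construction. All function names (fit_line, icp_halfwidth, simulate) are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; a stand-in for the QSAR model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def icp_halfwidth(model, cal_x, cal_y, significance):
    """Interval half-width taken from the sorted calibration residuals."""
    scores = sorted(abs(y - model(x)) for x, y in zip(cal_x, cal_y))
    n = len(scores)
    # Rank of the calibration score that gives the requested confidence.
    k = math.ceil((1 - significance) * (n + 1))
    if k > n:
        return math.inf  # too few calibration examples for this level
    return scores[k - 1]


def simulate(seed=0, n_train=200, n_cal=99, n_test=2000, significance=0.2):
    """Train once, calibrate once, then measure the test error rate."""
    rng = random.Random(seed)

    def draw(n):
        # Assumed toy data: a noisy linear response, i.i.d. by construction.
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
        return xs, ys

    train_x, train_y = draw(n_train)
    cal_x, cal_y = draw(n_cal)
    test_x, test_y = draw(n_test)

    model = fit_line(train_x, train_y)
    half = icp_halfwidth(model, cal_x, cal_y, significance)
    # An error is a true value outside [prediction - half, prediction + half].
    errors = sum(abs(y - model(x)) > half for x, y in zip(test_x, test_y))
    return half, errors / n_test


if __name__ == "__main__":
    half, error_rate = simulate()
    print(f"half-width: {half:.3f}, empirical error rate: {error_rate:.3f}")
```

Under these assumptions the empirical error rate should lie close to the chosen significance level. If the test examples instead drift away from the training and calibration data over time, as the abstract reports for the assay data, this agreement is no longer guaranteed.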


Keywords

QSAR, Conformal prediction, Drug discovery, Temporal model updating

Mathematics Subject Classifications (2010)

62-07 92-08 68T05 68U20 68U07 62H99 



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Martin Eklund (1, 2), corresponding author
  • Ulf Norinder (3)
  • Scott Boyer (2)
  • Lars Carlsson (2)

  1. Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
  2. AstraZeneca Research and Development, Mölndal, Sweden
  3. H. Lundbeck A/S, Valby, Denmark
