Abstract
Quantitative structure-activity relationship (QSAR) modeling predicts properties of chemical compounds, such as solubility or toxicity, using statistical learning techniques. QSAR is widely used in the pharmaceutical industry to prioritize compounds for experimental testing and to flag potential toxicity. However, predictions from a QSAR model are difficult to assess when their prediction intervals are unknown. In this paper we introduce conformal prediction into the QSAR field to address this issue. To demonstrate its usefulness in QSAR modeling, we apply support vector machine regression in combination with two nonconformity measures to five datasets of different sizes. One of the nonconformity measures yields prediction intervals whose width is almost the same as the size of the QSAR models’ prediction errors, showing that the prediction intervals obtained by conformal prediction are efficient and useful.
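To make the idea of a conformal prediction interval concrete, the following is a minimal sketch of the split (inductive) variant for regression. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the two nonconformity measures, so the absolute residual is assumed here, and the function names and toy numbers are hypothetical. The predictions could come from any regression model, for example the support vector machine regression mentioned above.

```python
import math


def conformal_half_width(cal_true, cal_pred, confidence=0.8):
    """Half-width of a split-conformal prediction interval.

    Assumed nonconformity score: the absolute residual |y - y_hat| on a
    calibration set that was not used to fit the underlying model.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Rank of the calibration score that gives the requested coverage
    # when calibration and test examples are exchangeable.
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        # Too few calibration examples for this confidence level.
        return math.inf
    return scores[rank - 1]


def predict_interval(y_hat, half_width):
    """Symmetric interval around a point prediction."""
    return (y_hat - half_width, y_hat + half_width)


# Hypothetical calibration data: measured values and model predictions.
cal_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_pred = [1.1, 1.8, 3.3, 3.6, 5.5, 5.4, 7.7, 7.2, 9.9]

half_width = conformal_half_width(cal_true, cal_pred, confidence=0.8)
low, high = predict_interval(5.0, half_width)
print(f"80% interval for a prediction of 5.0: [{low:.2f}, {high:.2f}]")
```

With this score every compound receives an interval of the same width. A measure that scales the residual by an estimate of how hard each compound is to predict would give compound-specific widths, which is one way an interval can track the model's actual prediction error more closely.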
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Eklund, M., Norinder, U., Boyer, S., Carlsson, L. (2012). Application of Conformal Prediction in QSAR. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H., Karatzas, K., Sioutas, S. (eds) Artificial Intelligence Applications and Innovations. AIAI 2012. IFIP Advances in Information and Communication Technology, vol 382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33412-2_17
DOI: https://doi.org/10.1007/978-3-642-33412-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33411-5
Online ISBN: 978-3-642-33412-2
eBook Packages: Computer Science (R0)