Abstract
This chapter describes quantitative tools for analyzing chemical structures and relating them to assay results using statistical models. The focus is on making predictions for new compounds, as well as on exploratory analysis and data mining of large compound databases. Practical issues that arise in applying these analytical methods are also discussed.
Acknowledgements
Thanks to Scot Mente and David Potter for providing feedback on draft versions of this manuscript.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Kuhn, M. (2016). Quantitative-Structure Activity Relationship Modeling and Cheminformatics. In: Zhang, L. (eds) Nonclinical Statistics for Pharmaceutical and Biotechnology Industries. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-319-23558-5_6
DOI: https://doi.org/10.1007/978-3-319-23558-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23557-8
Online ISBN: 978-3-319-23558-5
eBook Packages: Mathematics and Statistics (R0)