Quantitative-Structure Activity Relationship Modeling and Cheminformatics

Kuhn, Max

doi:10.1007/978-3-319-23558-5_6

Part of the book series: Statistics for Biology and Health (SBH)

Abstract

This chapter describes quantitative tools for analyzing chemical structures and relating them to assay results using statistical models. The focus is on prediction of new compounds as well as the exploratory analysis and data mining of large compound databases. Other issues related to how these analytical methods are used are discussed.

Acknowledgements

Thanks to Scot Mente and David Potter for providing feedback on draft versions of this manuscript.

Author information

Authors and Affiliations

Pfizer Global R&D, Groton, CT, USA
Max Kuhn

Corresponding author

Correspondence to Max Kuhn.

Editor information

Editors and Affiliations

Nonclinical Statistics, Abbvie Inc, North Chicago, Illinois, USA
Lanju Zhang

About this chapter

Cite this chapter

Kuhn, M. (2016). Quantitative-Structure Activity Relationship Modeling and Cheminformatics. In: Zhang, L. (eds) Nonclinical Statistics for Pharmaceutical and Biotechnology Industries. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-319-23558-5_6

DOI: https://doi.org/10.1007/978-3-319-23558-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23557-8
Online ISBN: 978-3-319-23558-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
