
Quantitative-Structure Activity Relationship Modeling and Cheminformatics

Chapter in: Nonclinical Statistics for Pharmaceutical and Biotechnology Industries

Part of the book series: Statistics for Biology and Health (SBH)


Abstract

This chapter describes quantitative tools for analyzing chemical structures and relating them to assay results using statistical models. The focus is on predicting results for new compounds, as well as on the exploratory analysis and data mining of large compound databases. Practical issues in how these analytical methods are applied are also discussed.



Acknowledgements

Thanks to Scot Mente and David Potter for providing feedback on draft versions of this manuscript.

Author information

Correspondence to Max Kuhn.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kuhn, M. (2016). Quantitative-Structure Activity Relationship Modeling and Cheminformatics. In: Zhang, L. (ed) Nonclinical Statistics for Pharmaceutical and Biotechnology Industries. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-319-23558-5_6
