Part of the book series: Statistics for Biology and Health (SBH)

Abstract

This chapter gives a broad overview of the drug discovery process and the areas where statistical input can have a key impact. The focus is primarily on a few key areas: target discovery, compound screening and optimization, and the characterization of important properties. Special attention is paid to working with assay data and phenotypic screens. A discussion of important skills for a nonclinical statistician supporting drug discovery concludes the chapter.

Notes

  1. http://www.pngu.mgh.harvard.edu/purcell/plink/.

  2. The term “Mendelian Randomization” refers to the notion that we are randomized at birth to the genetic “treatment” of the SNP.

  3. Thankfully, the academic community has been highly co-operative with one another in creating large consortia to produce meta-analyses from many smaller GWAS studies that total to hundreds of thousands of subjects.

  4. http://www.iconplc.com.

  5. http://www.certara.com.

  6. http://www.simcyp.com/.

  7. http://www.simulations-plus.com/.

  8. http://www.mbswonline.com.

  9. http://bit.ly/1qilzvh.

Acknowledgements

We would like to thank David Potter and Bill Pikounis for providing feedback on a draft of this chapter.

Author information

Correspondence to Max Kuhn.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kuhn, M., Yates, P., Hyde, C. (2016). Statistical Methods for Drug Discovery. In: Zhang, L. (ed.) Nonclinical Statistics for Pharmaceutical and Biotechnology Industries. Statistics for Biology and Health. Springer, Cham. https://doi.org/10.1007/978-3-319-23558-5_4
