Statistics and Decision Making in High-Throughput Screening

High Throughput Screening

Part of the book series: Methods in Molecular Biology ((MIMB,volume 565))

Abstract

Screening is about making decisions on the modulating activity of a particular compound on a biological system. When a compound-testing experiment is repeated under the same conditions, or as close to the same conditions as possible, the observed results are never exactly the same: the system under study carries an apparently random and uncontrolled source of variability. Nevertheless, randomness is not haphazard, and statistics can be seen as the science of decision making under uncertainty. Statistical tools are therefore the right approach to interpreting screening data, making them meaningful and converting them into valuable information that supports sound decision making.

In the HTS workflow, there are at least three stages where key decisions have to be made based on experimental data: (1) assay development (i.e. assessing whether the assay is good enough to be put into screening production for the identification of modulators of the target of interest), (2) the HTS campaign process (i.e. monitoring that the screening process is performing at the expected quality and detecting patterns in the experimental response that may bias and mislead hit identification) and (3) data analysis of primary HTS data (i.e. flagging which compounds give a positive response in the assay, namely hit identification).

In this chapter we focus on how some statistical tools can help with these three aspects. Assessment of assay quality is reviewed in other chapters, so Section 1 only adds some brief further considerations. Section 2 reviews statistical process control, Section 3 covers methodologies for detecting and dealing with HTS patterns and Section 4 describes approaches for statistically guided selection of hits in HTS.


Abbreviations

  • EDA: Exploratory Data Analysis
  • IQR: Inter-Quartile Range
  • M: Mean
  • MSR: Minimum Significant Ratio
  • PR: Pattern Recognition
  • QA: Quality Assurance
  • QC: Quality Control
  • QSAR: Quantitative Structure Activity Relationship
  • SD: Standard Deviation
  • SDI: Standard Deviation of Inactives
  • SEL: Systematic Error Level
  • SPC: Statistical Process Control
  • SQC: Screening Quality Control
  • uHTS: ultra-High-Throughput Screening
  • VEP: Variance Explained by the Patterns

Acknowledgements

The authors are greatly indebted to Ricardo Macarron, Mike Snowden, Mark Lennon, Gavin Harper, Martin Everett, Liz Clark, Glenn Hofmann, Geoff Mellor, Chris Molloy, Andy Vines, Dave Bolton and Javier Sanchez-Vicente for all the productive discussions about how to best implement statistical methodologies in the HTS process at GlaxoSmithKline. Likewise, we would like to thank many other colleagues in IT and Screening for their ideas and experimental data. SQC software has been the result of a joint collaborative effort with Tessella. We are also grateful to Robert Hertzberg, Stephen Pickett and Emilio Diez for their support in the writing of this manuscript.

Appendix 1: Estimation of the data centre in ASDIC

For the results of an HTS campaign, we use the following notation:

  • \(n\): size of the sample (number of compounds or pools)
  • \((x_1, x_2, \ldots, x_n)\): activity values
  • \((x_{1:n}, x_{2:n}, \ldots, x_{n:n})\): ordered activity values
  • \(x_{i:n}\): \(i\)th value in the ordered sample
  • \(\hat\theta\): location estimator
  • \(r_i = x_i - \hat\theta\): residuals
  • \(\left( r^2 \right)_{i:n}\): ordered squared residuals

  • The mean is the LS (least squares) estimator, because it minimises the expression

    $$\min_{\hat\theta} \sum_{i=1}^{n} r_i^2$$
  • The LMS (least median squares) estimator minimises the expression

    $$\min_{\hat\theta} \left( \operatorname*{median}_{i=1,\ldots,n} \left( r_i^2 \right) \right)$$
  • The LTS (least trimmed squares) estimator minimises the expression

    $$\min_{\hat\theta} \sum_{i=1}^{h} \left( r^2 \right)_{i:n}$$

    where \(h = \left[ n/2 \right] + 1\) is the half sample size.

  • The LTSq (least trimmed squares quarter) estimator minimises the expression

    $$\min_{\hat\theta} \sum_{i=1}^{q} \left( r^2 \right)_{i:n}$$

    where \(q = \left[ n/4 \right] + 1\) is the quarter sample size.
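
As an illustration only, the four estimators above can be sketched in plain Python for a one-dimensional sample. This is not the ASDIC implementation, which is not described here. The sketch rests on several assumptions: the square brackets denote the integer part; the median of the squared residuals is taken as the \(h\)th ordered squared residual, following the usual convention for LMS; and the function names and the toy activity values are hypothetical. It relies on two standard results for univariate location: the trimmed sum of squares is minimised by the mean of the window of consecutive ordered values with the smallest within-window sum of squares, and the LMS estimate is the midpoint of the shortest window of \(h\) consecutive ordered values.

```python
from statistics import fmean


def _best_window(x, k, cost):
    """Return the window of k consecutive ordered values that minimises cost."""
    xs = sorted(x)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("window size must be between 1 and the sample size")
    windows = (xs[j:j + k] for j in range(n - k + 1))
    return min(windows, key=cost)


def _sum_of_squares(window):
    """Sum of squared deviations of a window about its own mean."""
    m = fmean(window)
    return sum((v - m) ** 2 for v in window)


def ls_location(x):
    """LS estimator: the arithmetic mean."""
    return fmean(x)


def lts_location(x, k=None):
    """LTS estimator with k retained values (default h = [n/2] + 1)."""
    if k is None:
        k = len(x) // 2 + 1
    return fmean(_best_window(x, k, _sum_of_squares))


def ltsq_location(x):
    """LTSq estimator: LTS with q = [n/4] + 1 retained values."""
    return lts_location(x, len(x) // 4 + 1)


def lms_location(x):
    """LMS estimator: midpoint of the shortest window of h = [n/2] + 1 values."""
    h = len(x) // 2 + 1
    w = _best_window(x, h, lambda win: win[-1] - win[0])
    return (w[0] + w[-1]) / 2


# Toy activity values (made up for illustration): a bulk of low responses
# plus two large responses standing in for active compounds.
activities = [1.0, 2.0, 2.4, 3.0, 5.0, 100.0, 110.0]

for name, estimator in [("LS", ls_location), ("LMS", lms_location),
                        ("LTS", lts_location), ("LTSq", ltsq_location)]:
    print(name, estimator(activities))
```

On this toy sample the mean is pulled towards the two large values, whereas the LMS, LTS and LTSq estimates stay within the bulk of low responses. The exhaustive window search costs roughly \(n \times k\) operations, which is adequate for a sketch but would need a running-sum update for campaign-sized data.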

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Coma, I., Herranz, J., Martin, J. (2009). Statistics and Decision Making in High-Throughput Screening. In: Janzen, W., Bernasconi, P. (eds) High Throughput Screening. Methods in Molecular Biology, vol 565. Humana Press. https://doi.org/10.1007/978-1-60327-258-2_4

  • DOI: https://doi.org/10.1007/978-1-60327-258-2_4

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-257-5

  • Online ISBN: 978-1-60327-258-2

  • eBook Packages: Springer Protocols
