Multiple Testing in Large-Scale Genetic Studies

  • Protocol
Data Production and Analysis in Population Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 888)

Abstract

Recent advances in molecular biology and improvements in microarray and sequencing technologies have led biologists toward high-throughput genomic studies. These studies aim to find associations between genetic markers and a phenotype, and they involve conducting many statistical tests on these markers. Such a wide investigation of the genome makes genomic studies attractive but also leads to a major shortcoming: among the markers detected as associated with the phenotype, a nonnegligible proportion are not truly associated (false positives), and true associations can be missed (false negatives). A main cause of these spurious associations is the multiple-testing problem, inherent to conducting numerous statistical tests. Several approaches exist to work around this issue. These multiple-testing adjustments define new statistical confidence measures that are controlled to guarantee that the outcomes of the tests are pertinent. The most natural correction was introduced by Bonferroni and aims at controlling the family-wise error rate (FWER), that is, the probability of having at least one false positive. Another approach is based on the false discovery rate (FDR) and considers the proportion of significant results that are expected to be false positives. Finally, the local FDR focuses on the actual probability that a given marker is associated with the phenotype or not. These strategies are widely used, but one has to be careful about when and how to apply them. In this chapter, we discuss the multiple-testing issue and the main approaches that take it into account. We aim to provide a theoretical and intuitive definition of these concepts, along with practical advice to guide researchers in choosing the most appropriate multiple-testing procedure for the purposes of their studies.
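
As a rough illustration of the difference between FWER and FDR control described above, the following Python sketch applies a Bonferroni threshold and the Benjamini-Hochberg step-up procedure to a small set of p-values. It is a minimal sketch and not the chapter's own code: the function names, the example p-values, and the level `alpha = 0.05` are all assumptions chosen for illustration, and the Benjamini-Hochberg guarantee is stated here for independent tests. The local FDR is not covered by this sketch.

```python
from typing import List


def bonferroni_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject marker i when p_i <= alpha / m, which controls the FWER at alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure (FDR control at alpha, independent tests)."""
    m = len(p_values)
    # Indices of the markers sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values, reported in the original order.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values for six markers (illustrative only, not real data).
p_values = [0.001, 0.008, 0.020, 0.030, 0.27, 0.60]
bonf = bonferroni_reject(p_values)
bh = benjamini_hochberg_reject(p_values)
print("Bonferroni:", bonf)
print("Benjamini-Hochberg:", bh)
```

On these made-up values, Bonferroni keeps only the two smallest p-values while Benjamini-Hochberg keeps four. This is the usual trade-off: FDR control tolerates a small expected proportion of false positives among the discoveries in exchange for fewer false negatives.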

Notes

  1. http://pngu.mgh.harvard.edu/~purcell/plink
  2. http://cran.r-project.org
  3. http://www.statmethods.net
  4. http://stat.genopole.cnrs.fr/sg/software/kerfdr

Corresponding author

Correspondence to Mickaël Guedj.

Copyright information

© 2012 Springer Science+Business Media New York

About this protocol

Cite this protocol

Bouaziz, M., Jeanmougin, M., Guedj, M. (2012). Multiple Testing in Large-Scale Genetic Studies. In: Pompanon, F., Bonin, A. (eds) Data Production and Analysis in Population Genomics. Methods in Molecular Biology, vol 888. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-870-2_13

  • DOI: https://doi.org/10.1007/978-1-61779-870-2_13

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-869-6

  • Online ISBN: 978-1-61779-870-2

  • eBook Packages: Springer Protocols
