Multiple Testing in Large-Scale Genetic Studies

Bouaziz, Matthieu; Jeanmougin, Marine; Guedj, Mickaël

doi:10.1007/978-1-61779-870-2_13

Part of the book series: Methods in Molecular Biology (MIMB, volume 888)

Abstract

Recent advances in molecular biology and improvements in microarray and sequencing technologies have led biologists toward high-throughput genomic studies. These studies aim at finding associations between genetic markers and a phenotype and involve conducting many statistical tests on these markers. Such a wide investigation of the genome not only renders genomic studies quite attractive but also leads to a major shortcoming: among the markers detected as associated with the phenotype, a nonnegligible proportion is not truly associated (false-positives), and true associations can also be missed (false-negatives). A main cause of these spurious associations is the multiple-testing problem, inherent to conducting numerous statistical tests. Several approaches exist to work around this issue. These multiple-testing adjustments aim at defining new statistical confidence measures that are controlled to guarantee that the outcomes of the tests are pertinent. The most natural correction was introduced by Bonferroni and aims at controlling the family-wise error-rate (FWER), that is, the probability of having at least one false-positive. Another approach is based on the false-discovery-rate (FDR) and considers the proportion of significant results that are expected to be false-positives. Finally, the local-FDR focuses on the actual probability for a marker of being associated or not with the phenotype. These strategies are widely used, but one has to be careful about when and how to apply them. We propose in this chapter a discussion of the multiple-testing issue and of the main approaches to take it into account. We aim at providing a theoretical and intuitive definition of these concepts along with practical advice to guide researchers in choosing the most appropriate multiple-testing procedure for the purposes of their studies.
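
The difference between FWER and FDR control can be illustrated with a minimal Python sketch. It is not taken from the chapter: the p-values are invented for illustration, the function names are hypothetical, and the FDR procedure shown is the Benjamini-Hochberg step-up adjustment, chosen here as one common way to control the FDR. The local-FDR is not covered because it requires estimating a mixture model.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: min(1, m * p), which controls the FWER."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for eight markers (illustration only).
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
alpha = 0.05

fwer_hits = [p for p in bonferroni(pvalues) if p <= alpha]
fdr_hits = [p for p in benjamini_hochberg(pvalues) if p <= alpha]

print("Unadjusted hits:", sum(p <= alpha for p in pvalues))
print("Bonferroni hits:", len(fwer_hits))
print("Benjamini-Hochberg hits:", len(fdr_hits))
```

On this toy input, five markers pass the unadjusted 0.05 threshold, one survives the Bonferroni correction, and two survive the Benjamini-Hochberg adjustment, which reflects the usual trade-off: FWER control is stricter, while FDR control tolerates a bounded expected proportion of false-positives among the discoveries.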

References

Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG (2001) Replication validity of genetic association studies. Nat Genet 29:306–309.
Page GP, George V, Go RC, Page PZ, Allison DB (2003) “Are we there yet?”: Deciding when one has demonstrated specific genetic causation in complex diseases and quantitative traits. Am J Hum Genet 73:711–719.
Balding DJ (2006) A tutorial on statistical methods for population association studies. Nat Rev Genet 7:781–791.
Rice TK, Schork NJ, Rao DC (2008) Methods for handling multiple testing. Adv Genet 60:293–308.
Moskvina V, Schmidt KM (2008) On multiple-testing correction in genome-wide association studies. Genet Epidemiol 32:567–573.
van den Oord EJCG (2008) Controlling false discoveries in genetic studies. Am J Med Genet B Neuropsychiatr Genet 147B:637–644.
Noble WS (2009) How does multiple testing correction work? Nat Biotechnol 27:1135–1137.
Chen JJ, Roberson PK, Schell MJ (2010) The false discovery rate: a key concept in large-scale genetic studies. Cancer Control 17:58–62.
Fisher RA (1925) Statistical methods for research workers, 11th edn. (rev.). Oliver & Boyd, Edinburgh.
Bonferroni C (1935) Studi in Onore del Professore Salvatore Ortu Carboni, chapter Il calcolo delle assicurazioni su gruppi di teste. pp. 13–60.
Bonferroni C (1936) Teoria statistica delle classi e calcolo delle probabilita. Publicazioni del R Instituto Superiore de Scienze Economiche e Commerciali de Firenze 8:3–62.
Sidak Z (1967) Rectangular confidence region for the means of multivariate normal distributions. J Am Stat Assoc 62:626–633.
McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York.
Jung SH (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics 21:3097–3104.
Wang SJ, Chen JJ (2004) Sample size for identifying differentially expressed genes in microarray experiments. J Comput Biol 11:714–726.
Pounds S, Morris SW (2003) Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 19:1236–1242.
McLachlan G, Bean R, Ben-Tovim Jones L (2006) A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays. Bioinformatics 22:1608–1615.
Markitsis A, Lai Y (2010) A censored beta mixture model for the estimation of the proportion of non-differentially expressed genes. Bioinformatics 26:640–646.
Mosig MO, Lipkin E, Khutoreskaya G, Tchourzyna E, Soller M, et al. (2001) A whole genome scan for quantitative trait loci affecting milk protein percentage in Israeli-Holstein cattle, by means of selective milk DNA pooling in a daughter design, using an adjusted false discovery rate criterion. Genetics 157:1683–1698.
Scheid S, Spang R (2004) A stochastic downhill search algorithm for estimating the local false discovery rate. IEEE/ACM Trans Comput Biol Bioinform 1:98–108.
Langaas M, Lindqvist BH, Ferkingstad E (2005) Estimating the proportion of true null hypotheses, with application to DNA microarray data. J R Stat Soc Ser B 67:555–572.
Lai Y (2007) A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data. Biostatistics 8:744–755.
Liao JG, Lin Y, Selvanayagam ZE, Weichung JS (2004) A mixture model for estimating the local false discovery rate in DNA microarray analysis. Bioinformatics 20:2694–2701.
Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100:9440–9445.
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSSB 57:289–300.
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–1188.
Wojcik J, Forner K (2008) Exactfdr: exact computation of false discovery rate estimate in case-control association studies. Bioinformatics 24:2407–2408.
Efron B, Tibshirani R (2002) Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol 23:70–86.
Allison DB, Gadbury G, Heo M, Fernandez J, Lee CK, et al. (2002) Mixture model approach for the analysis of microarray gene expression data. Comput Statist Data Anal 39:1–20.
Robin S, Bar-Hen A, Daudin JJ, Pierre L (2007) A semi-parametric approach for mixture models: Application to local false discovery rate estimation. Comput Statist Data Anal 51:5483–5493.
Broet P, Lewin A, Richardson S, Dalmasso C, Magdelenat H (2004) A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics 20:2562–2571.
Newton MA, Noueiry A, Sarkar D, Ahlquist P (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5:155–176.
Guedj M, Robin S, Celisse A, Nuel G (2009) Kerfdr: a semi-parametric kernel-based approach to local false discovery rate estimation. BMC Bioinformatics 10:84.
Strimmer K (2008) A unified approach to false discovery rate estimation. BMC Bioinformatics 9:303.
Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517.

Author information

Authors and Affiliations

Department of Biostatistics, Pharnext, Paris, France
Matthieu Bouaziz, Marine Jeanmougin & Mickaël Guedj
Statistics and Genome laboratory, UMR CNRS 8071, USC INRA, University of Evry, Val d’Essonne, France
Matthieu Bouaziz & Marine Jeanmougin

Corresponding author

Correspondence to Mickaël Guedj.

Editor information

Editors and Affiliations

Laboratoire d'Ecologie Alpine, Université Grenoble I, CNRS-UMR5553, rue de la Piscine 2233, Grenoble Cedex 09, 38041, France
François Pompanon
Laboratoire d'Ecologie Alpine, Université Grenoble 1, Rue de la Piscine 2233, Grenoble, 38041, France
Aurélie Bonin

About this protocol

Cite this protocol

Bouaziz, M., Jeanmougin, M., Guedj, M. (2012). Multiple Testing in Large-Scale Genetic Studies. In: Pompanon, F., Bonin, A. (eds) Data Production and Analysis in Population Genomics. Methods in Molecular Biology, vol 888. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-870-2_13

DOI: https://doi.org/10.1007/978-1-61779-870-2_13
Published: 24 April 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-869-6
Online ISBN: 978-1-61779-870-2
eBook Packages: Springer Protocols
