Abstract
Recent advances in molecular biology and improvements in microarray and sequencing technologies have led biologists toward high-throughput genomic studies. These studies aim at finding associations between genetic markers and a phenotype and involve conducting many statistical tests on these markers. Such a wide investigation of the genome not only renders genomic studies quite attractive but also leads to a major shortcoming: among the markers detected as associated with the phenotype, a nonnegligible proportion is not truly associated (false-positives), and true associations can also be missed (false-negatives). A main cause of these spurious associations is the multiple-testing problem, inherent to conducting numerous statistical tests. Several approaches exist to work around this issue. These multiple-testing adjustments aim at defining new statistical confidence measures that are controlled to guarantee that the outcomes of the tests are pertinent. The most natural correction was introduced by Bonferroni and aims at controlling the family-wise error-rate (FWER), that is, the probability of having at least one false-positive. Another approach is based on the false-discovery-rate (FDR) and considers the proportion of significant results that are expected to be false-positives. Finally, the local-FDR focuses on the actual probability for a marker of being associated or not with the phenotype. These strategies are widely used, but one has to be careful about when and how to apply them. In this chapter, we discuss the multiple-testing issue and the main approaches to take it into account. We aim at providing a theoretical and intuitive definition of these concepts along with practical advice to guide researchers in choosing the most appropriate multiple-testing procedure for the purposes of their studies.
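To make the contrast between FWER and FDR control concrete, the following is a minimal sketch rather than the chapter's own code. It assumes the Bonferroni rule (reject when p <= alpha/m) for the FWER and, as one common FDR-controlling choice not named in the abstract, the Benjamini-Hochberg step-up procedure. The function names and the ten p-values are hypothetical and purely illustrative.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 when p <= alpha / m; controls the FWER at level alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule; controls the FDR at level alpha
    when the tests are independent."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten markers (illustrative only, not real data).
pvals = [0.0001, 0.004, 0.012, 0.019, 0.03, 0.2, 0.35, 0.5, 0.7, 0.9]
n_bonf = sum(bonferroni(pvals))
n_bh = sum(benjamini_hochberg(pvals))
print(n_bonf, n_bh)  # 2 4
```

On these illustrative values, Bonferroni declares two markers significant and Benjamini-Hochberg declares four. This reflects the usual trade-off: FWER control is more conservative, while FDR control tolerates a controlled proportion of false-positives among the discoveries in exchange for more detections.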
Copyright information
© 2012 Springer Science+Business Media New York
About this protocol
Cite this protocol
Bouaziz, M., Jeanmougin, M., Guedj, M. (2012). Multiple Testing in Large-Scale Genetic Studies. In: Pompanon, F., Bonin, A. (eds) Data Production and Analysis in Population Genomics. Methods in Molecular Biology, vol 888. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-870-2_13
DOI: https://doi.org/10.1007/978-1-61779-870-2_13
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-869-6
Online ISBN: 978-1-61779-870-2
eBook Packages: Springer Protocols