Abstract
In genome-wide association studies, hundreds of thousands of genetic features (genes, proteins, etc.) in a given case-control population are tested for an association between each genetic marker and a specific disease. A popular approach is to estimate the local false discovery rate (LFDR), the posterior probability that the null hypothesis is true given an observed test statistic. However, existing LFDR estimation methods are usually complicated. Assuming a chi-square model with one degree of freedom, which covers many situations in genome-wide association studies, we use the method of moments to introduce a simple, fast and efficient approach to LFDR estimation. We conduct simulations under two different strategies and compare the performance of the proposed approach with that of three popular LFDR estimation methods. We also examine the practical utility of the proposed method by analyzing a comprehensive 1000 Genomes-based genome-wide association data set containing approximately 9.4 million single nucleotide polymorphisms, and a microarray data set consisting of gene expression levels for 6033 genes in prostate cancer patients. The R package implementing the proposed method is available on CRAN at https://cran.r-project.org/web/packages/LFDR.MME.
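The abstract does not spell out the estimator, so the following Python sketch only illustrates how a method-of-moments LFDR fit could look under a chi-square model with one degree of freedom. It assumes a two-component mixture: a central chi-square null with proportion pi0 and a single noncentral chi-square alternative with noncentrality parameter ncp. The function names (simulate, estimate_mixture, lfdr) and the fallback rules for out-of-range estimates are hypothetical, and the sketch is not claimed to reproduce the LFDR.MME package.

```python
import math
import random


def simulate(n, pi0, ncp, seed=0):
    """Draw n statistics from the assumed mixture (illustration only)."""
    rng = random.Random(seed)
    delta = math.sqrt(ncp)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # With probability 1 - pi0 the statistic comes from the alternative.
        shift = delta if rng.random() < 1.0 - pi0 else 0.0
        out.append((z + shift) ** 2)
    return out


def estimate_mixture(stats):
    """Method-of-moments estimates (pi0, ncp) from the first two sample moments.

    Under the assumed mixture, with p1 = 1 - pi0:
        E[X]   = 1 + p1 * ncp
        E[X^2] = 3 + p1 * ncp * (6 + ncp)
    Solving gives ncp = (m2 - 3) / (m1 - 1) - 6 and p1 = (m1 - 1) / ncp.
    """
    n = len(stats)
    m1 = sum(stats) / n
    m2 = sum(x * x for x in stats) / n
    if m1 <= 1.0:
        return 1.0, 0.0  # no evidence of non-null features (assumed fallback)
    ncp = (m2 - 3.0) / (m1 - 1.0) - 6.0
    if ncp <= 0.0:
        return 1.0, 0.0  # estimate outside the parameter space (assumed fallback)
    p1 = min(1.0, (m1 - 1.0) / ncp)
    return 1.0 - p1, ncp


def lfdr(x, pi0, ncp):
    """Posterior probability of the null given statistic x under the mixture.

    For one degree of freedom the density ratio alternative/null equals
    exp(-ncp / 2) * cosh(sqrt(ncp * x)); it is evaluated on the log scale
    to avoid overflow for very large statistics.
    """
    if pi0 >= 1.0 or ncp <= 0.0:
        return 1.0
    if pi0 <= 0.0:
        return 0.0
    t = math.sqrt(ncp * x)
    log_ratio = -ncp / 2.0 + t + math.log1p(math.exp(-2.0 * t)) - math.log(2.0)
    log_odds = math.log((1.0 - pi0) / pi0) + log_ratio
    if log_odds > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(log_odds))


if __name__ == "__main__":
    data = simulate(200000, pi0=0.9, ncp=9.0, seed=1)
    pi0_hat, ncp_hat = estimate_mixture(data)
    print("pi0_hat =", round(pi0_hat, 3), "ncp_hat =", round(ncp_hat, 3))
    for x in (0.5, 4.0, 10.0, 25.0):
        print(x, round(lfdr(x, pi0_hat, ncp_hat), 4))
```

Because both estimates come from closed-form expressions in the first two sample moments, the fit takes a single pass over the data, which is the kind of simplicity and speed the abstract emphasizes. Small samples can produce moment estimates outside the parameter space, which is why the sketch falls back to pi0 = 1 in those cases.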
References
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc, Series B 57:289–300
Bickel DR (2013) Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions. Stat Appl in Genet Mol Biol 12(4):529–543
Bukszár J, McClay JL, van den Oord EJ (2009) Estimating the posterior probability that genome-wide association findings are true or false. Bioinform 25(14):1807–1813
Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT (2011) Basic statistical analysis in genetic case-control studies. Nat Protoc 6(2):121–133
Efron B (2004) Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J Am Stat Assoc 99:96–104
Efron B (2007) Correlation and large-scale simultaneous significance testing. J Am Stat Assoc 102:93–103
Efron B (2012) Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, New York
Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96(456):1151–1160
Efron B, Turnbull BB, Narasimhan B (2011) locfdr: Computes local false discovery rates. Reference Manual, R package version 1.1-7
Hochberg Y (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75(4):800–802
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70
Karimnezhad A, Bickel DR (2020) Incorporating prior knowledge about genetic variants into the analysis of genetic association data: An empirical Bayes approach. IEEE/ACM Trans Comput Biol and Bioinform 17(2):635–464
Harris D, Mátyás L (1999) Introduction to the generalized method of moments estimation. In: Mátyás L (ed) Generalized method of moments estimation. Cambridge University Press, New York, pp 3–30
Muralidharan O (2010) An empirical Bayes mixture method for effect size and false discovery rate estimation. Ann Appl Stat 4(1):422–438
Nikpay M, Goel A, Won HH, Hall LM, Willenborg C, Kanoni S et al (2015) A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet 47(10):1121–1130
Padilla M, Bickel DR (2012) Estimators of the local false discovery rate designed for small numbers of tests. Stat Appl Genet Mol Biol 11(5) Art. 4
Pan W, Lin J, Le CT (2003) A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genom 3(3):117–124
Slatkin M (2008) Linkage disequilibrium-understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9(6):477–485
Shao J (2007) Mathematical statistics, 2nd edn. Springer-Verlag, New York
Šidák Z (1968) On multivariate normal probabilities of rectangles: their dependence on correlations. Ann Math Stat 39(5):1425–1434
Šidák Z (1971) On probabilities of rectangles in multivariate Student distributions: their dependence on correlations. Ann Math Stat 42(1):169–175
Simes RJ (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73(3):751–754
Storey JD (2002) A direct approach to false discovery rates. J Royal Stat Soc, Series B 64:479–498
Yang Y, Aghababazadeh FA, Bickel DR (2013) Parametric estimation of the local false discovery rate for identifying genetic associations. IEEE/ACM Trans Comput Biol and Bioinform 10:98–108
Yang Y, Padilla M, Ali A, Leckett K, Yang Z, Li Z (2015) LFDR.MLE. Reference Manual, R package version 1.1-10
Zhao Z, Wang W, Wei Z (2013) An empirical Bayes testing procedure for detecting variants in analysis of next generation sequencing data. Ann Appl Stat 7(4):2229–2248
Zheng G, Yang Y, Zhu X, Elston RC (2012) Analysis of genetic association studies. Springer Science and Business Media, New York
Acknowledgements
The author is grateful to two anonymous reviewers for their constructive comments. The prostate data are available online at http://statweb.stanford.edu/ckirby/brad/LSI/datasets-and-programs/datasets.html. The data on coronary artery disease were contributed by the CARDIoGRAMplusC4D investigators and were downloaded from www.CARDIOGRAMPLUSC4D.ORG. The HB- and ML-based LFDR estimates were computed using the locfdr (Efron et al. 2011) and LFDR.MLE (Yang et al. 2015) packages, respectively.
Cite this article
Karimnezhad, A. A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis. Stat Methods Appl 31, 159–180 (2022). https://doi.org/10.1007/s10260-021-00560-y
DOI: https://doi.org/10.1007/s10260-021-00560-y