A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis

  • Original Paper
Statistical Methods & Applications

Abstract

In genome-wide association studies, hundreds of thousands of genetic features (genes, proteins, etc.) in a given case-control population are tested for association with a specific disease. A popular approach is to estimate the local false discovery rate (LFDR), the posterior probability that the null hypothesis is true given an observed test statistic. However, existing LFDR estimation methods are usually complicated. Assuming a chi-square model with one degree of freedom, which covers many situations in genome-wide association studies, we use the method of moments to introduce a simple, fast and efficient approach to LFDR estimation. We use two different simulation strategies to compare the performance of the proposed approach with that of three popular LFDR estimation methods. We also examine the practical utility of the proposed method by analyzing a comprehensive 1000 Genomes-based genome-wide association data set containing approximately 9.4 million single nucleotide polymorphisms, and a microarray data set consisting of gene expression levels for 6033 genes in prostate cancer patients. The R package implementing the proposed method is available on CRAN at https://cran.r-project.org/web/packages/LFDR.MME.
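
The abstract does not state the estimator itself, so the following Python sketch only illustrates the general idea under assumptions that are not taken from the paper: a two-group mixture in which null statistics follow a central chi-square distribution with one degree of freedom and non-null statistics follow a noncentral chi-square distribution with one degree of freedom and a single common noncentrality parameter `lam`. Matching the first two sample moments then gives closed-form estimates of the null proportion `pi0` and of `lam`. All function names are hypothetical and are not the API of the LFDR.MME package.

```python
import math
import random


def mme_from_moments(m1, m2):
    """Solve two moment equations for (pi0, lam).

    Assumed mixture: pi0 * chi2_1 + (1 - pi0) * noncentral chi2_1(lam), so
        E[X]   = 1 + (1 - pi0) * lam
        E[X^2] = 3 + (1 - pi0) * lam * (lam + 6)
    """
    a = m1 - 1.0  # equals (1 - pi0) * lam
    b = m2 - 3.0  # equals (1 - pi0) * lam * (lam + 6)
    if a <= 0.0 or b <= 6.0 * a:
        # No admissible solution (lam would be non-positive):
        # fall back to the all-null fit.
        return 1.0, 0.0
    lam = b / a - 6.0
    pi0 = min(1.0, max(0.0, 1.0 - a / lam))
    return pi0, lam


def estimate(stats):
    """Method-of-moments fit from a list of chi-square test statistics."""
    n = len(stats)
    m1 = sum(stats) / n
    m2 = sum(x * x for x in stats) / n
    return mme_from_moments(m1, m2)


def lfdr(x, pi0, lam):
    """Posterior probability of the null given statistic x (assumed model)."""
    if lam == 0.0 or pi0 >= 1.0:
        return 1.0
    # For one degree of freedom the density ratio has a closed form:
    # f1(x) / f0(x) = exp(-lam / 2) * cosh(sqrt(lam * x)).
    # It is evaluated on the log scale to avoid overflow in cosh.
    t = math.sqrt(lam * x)
    log_ratio = -lam / 2.0 + t + math.log1p(math.exp(-2.0 * t)) - math.log(2.0)
    if log_ratio > 700.0:
        return 0.0  # LFDR is numerically zero; exp would overflow
    return pi0 / (pi0 + (1.0 - pi0) * math.exp(log_ratio))


def simulate(n, pi0, lam, seed=0):
    """Draw n statistics from the assumed mixture (illustration only)."""
    rng = random.Random(seed)
    shift = math.sqrt(lam)
    out = []
    for _ in range(n):
        mu = 0.0 if rng.random() < pi0 else shift
        out.append((rng.gauss(0.0, 1.0) + mu) ** 2)
    return out
```

In this sketch, `estimate` would be called once on all test statistics and `lfdr` would then be evaluated for each marker, with small values flagging candidate associations. Because the fit uses only two sample averages, its cost is linear in the number of markers, which is one plausible reason a moment-based approach suits data sets with millions of SNPs.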

References

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc, Series B 57:289–300

  • Bickel DR (2013) Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions. Stat Appl Genet Mol Biol 12(4):529–543

  • Bukszár J, McClay JL, van den Oord EJ (2009) Estimating the posterior probability that genome-wide association findings are true or false. Bioinformatics 25(14):1807–1813

  • Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT (2011) Basic statistical analysis in genetic case-control studies. Nat Protoc 6(2):121–133

  • Efron B (2004) Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J Am Stat Assoc 99:96–104

  • Efron B (2007) Correlation and large-scale simultaneous significance testing. J Am Stat Assoc 102:93–103

  • Efron B (2012) Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, New York

  • Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96(456):1151–1160

  • Efron B, Turnbull BB, Narasimhan B (2011) locfdr: Computes local false discovery rates. Reference Manual, R package version 1.1-7

  • Hochberg Y (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75(4):800–802

  • Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70

  • Karimnezhad A, Bickel DR (2020) Incorporating prior knowledge about genetic variants into the analysis of genetic association data: An empirical Bayes approach. IEEE/ACM Trans Comput Biol and Bioinform 17(2):635–646

  • Harris D, Mátyás L (1999) Introduction to the generalized method of moments estimation. In: Mátyás L (ed) Generalized method of moments estimation. Cambridge University Press, New York, pp 3–30

  • Muralidharan O (2010) An empirical Bayes mixture method for effect size and false discovery rate estimation. Ann Appl Stat 4(1):422–438

  • Nikpay M, Goel A, Won HH, Hall LM, Willenborg C, Kanoni S et al (2015) A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet 47(10):1121–1130

  • Padilla M, Bickel DR (2012) Estimators of the local false discovery rate designed for small numbers of tests. Stat Appl Genet Mol Biol 11(5) Art. 4

  • Pan W, Lin J, Le CT (2003) A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genom 3(3):117–124

  • Slatkin M (2008) Linkage disequilibrium-understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9(6):477–485

  • Shao J (2007) Mathematical statistics, 2nd edn. Springer-Verlag, New York

  • Šidák Z (1968) On multivariate normal probabilities of rectangles: their dependence on correlations. Ann Math Stat 39(5):1425–1434

  • Šidák Z (1971) On probabilities of rectangles in multivariate Student distributions: their dependence on correlations. Ann Math Stat 42(1):169–175

  • Simes RJ (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73(3):751–754

  • Storey JD (2002) A direct approach to false discovery rates. J Royal Stat Soc, Series B 64:479–498

  • Yang Y, Aghababazadeh FA, Bickel DR (2013) Parametric estimation of the local false discovery rate for identifying genetic associations. IEEE/ACM Trans Comput Biol and Bioinform 10:98–108

  • Yang Y, Padilla M, Ali A, Leckett K, Yang Z, Li Z (2015) LFDR.MLE. Reference Manual, R package version 1.1-10

  • Zhao Z, Wang W, Wei Z (2013) An empirical Bayes testing procedure for detecting variants in analysis of next generation sequencing data. Ann Appl Stat 7(4):2229–2248

  • Zheng G, Yang Y, Zhu X, Elston RC (2012) Analysis of genetic association studies. Springer Science and Business Media, New York

Acknowledgements

The author is grateful to two anonymous reviewers for their constructive comments. The prostate data are available online at http://statweb.stanford.edu/ckirby/brad/LSI/datasets-and-programs/datasets.html. The data on coronary artery disease were contributed by the CARDIoGRAMplusC4D investigators and were downloaded from www.CARDIOGRAMPLUSC4D.ORG. The HB- and ML-based LFDR estimates were computed using the locfdr (Efron et al. 2011) and LFDR.MLE (Yang et al. 2015) packages, respectively.

Author information

Corresponding author

Correspondence to Ali Karimnezhad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Karimnezhad, A. A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis. Stat Methods Appl 31, 159–180 (2022). https://doi.org/10.1007/s10260-021-00560-y

  • DOI: https://doi.org/10.1007/s10260-021-00560-y
