A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis

Karimnezhad, Ali

doi:10.1007/s10260-021-00560-y

Original Paper
Published: 30 April 2021

Volume 31, pages 159–180 (2022)
Statistical Methods & Applications

Ali Karimnezhad ORCID: orcid.org/0000-0002-5340-1858¹

Abstract

In genome-wide association studies, hundreds of thousands of genetic features (genes, proteins, etc.) in a given case-control population are tested for an association between each genetic marker and a specific disease. A popular approach in this setting is to estimate the local false discovery rate (LFDR), the posterior probability that the null hypothesis is true given an observed test statistic. However, existing LFDR estimation methods are usually complicated. Assuming a chi-square model with one degree of freedom, which covers many situations in genome-wide association studies, we use the method of moments to introduce a simple, fast and efficient approach to LFDR estimation. Using two different simulation strategies, we compare the performance of the proposed approach with three popular LFDR estimation methods. We also examine the practical utility of the proposed method by analyzing comprehensive 1000 Genomes-based genome-wide association data containing approximately 9.4 million single nucleotide polymorphisms, and a microarray data set consisting of gene expression levels for 6033 genes from prostate cancer patients. The R package implementing the proposed method is available on CRAN at https://cran.r-project.org/web/packages/LFDR.MME.
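The abstract does not spell out the estimator, so the following is only a minimal sketch under a stated assumption: each test statistic T follows the two-component mixture T ~ π0·χ²₁ + (1 - π0)·χ²₁(λ), where χ²₁(λ) is a noncentral chi-square with one degree of freedom and noncentrality λ. This model and the helper names `mme_chisq1` and `lfdr_chisq1` are hypothetical illustrations, not necessarily the paper's exact formulation or the API of the LFDR.MME package. Under that assumption E[T] = 1 + (1 - π0)λ and E[T²] = 3 + (1 - π0)(6λ + λ²), so equating these to the first two sample moments yields closed-form estimates of π0 and λ. For one degree of freedom the density ratio simplifies to f1(t)/f0(t) = exp(-λ/2)·cosh(√(λt)), which gives the LFDR directly.

```python
import math
import random
from typing import Sequence, Tuple


def mme_chisq1(stats: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (pi0, lam), ASSUMING the mixture
    T ~ pi0 * chi2_1 + (1 - pi0) * noncentral chi2_1(lam)."""
    n = len(stats)
    m1 = sum(stats) / n                  # sample mean of T
    m2 = sum(t * t for t in stats) / n   # sample second moment of T
    a = m1 - 1.0                         # estimates (1 - pi0) * lam, since E[T] = 1 + (1 - pi0) * lam
    if a <= 0.0:
        return 1.0, 0.0                  # no excess over the null mean: treat every test as null
    # E[T^2] = 3 + (1 - pi0) * lam * (6 + lam)  =>  lam = (m2 - 3) / a - 6
    lam = (m2 - 3.0 - 6.0 * a) / a
    if lam <= 0.0:
        return 1.0, 0.0
    pi0 = min(1.0, max(0.0, 1.0 - a / lam))  # hypothetical safeguard: clip pi0 to [0, 1]
    return pi0, lam


def lfdr_chisq1(t: float, pi0: float, lam: float) -> float:
    """LFDR(t) = pi0 f0(t) / (pi0 f0(t) + (1 - pi0) f1(t)) for t >= 0, using the
    one-degree-of-freedom identity f1(t) / f0(t) = exp(-lam / 2) * cosh(sqrt(lam * t))."""
    if pi0 >= 1.0:
        return 1.0
    if pi0 <= 0.0:
        return 0.0
    if lam <= 0.0:
        return pi0                       # f1 == f0, so the LFDR equals pi0 everywhere
    x = math.sqrt(lam * t)
    log_cosh = x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)  # overflow-safe log(cosh(x))
    s = math.log((1.0 - pi0) / pi0) - lam / 2.0 + log_cosh        # log posterior odds against H0
    if s > 0.0:                          # LFDR = 1 / (1 + exp(s)), evaluated without overflow
        e = math.exp(-s)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(s))


if __name__ == "__main__":
    # Hypothetical simulation: 10% non-null tests, each with lam = 16 (mean shift 4 before squaring).
    random.seed(1)
    sims = [(random.gauss(0.0, 1.0) + (4.0 if random.random() < 0.1 else 0.0)) ** 2
            for _ in range(200_000)]
    pi0_hat, lam_hat = mme_chisq1(sims)
    print(pi0_hat, lam_hat, lfdr_chisq1(25.0, pi0_hat, lam_hat))
```

Because the sketch needs only the first two sample moments, estimation is a couple of linear passes over the statistics, in line with the speed the abstract emphasizes. The clipping of π0 to [0, 1] is an illustrative choice; results from this sketch should be checked against the LFDR.MME package before being relied on.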

References

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Royal Stat Soc, Series B 57:289–300
Bickel DR (2013) Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions. Stat Appl Genet Mol Biol 12(4):529–543
Bukszár J, McClay JL, van den Oord EJ (2009) Estimating the posterior probability that genome-wide association findings are true or false. Bioinformatics 25(14):1807–1813
Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT (2011) Basic statistical analysis in genetic case-control studies. Nat Protoc 6(2):121–133
Efron B (2004) Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J Am Stat Assoc 99:96–104
Efron B (2007) Correlation and large-scale simultaneous significance testing. J Am Stat Assoc 102:93–103
Efron B (2012) Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, New York
Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96(456):1151–1160
Efron B, Turnbull BB, Narasimhan B (2011) locfdr: Computes local false discovery rates. Reference Manual, R package version 1.1-7
Hochberg Y (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75(4):800–802
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70
Karimnezhad A, Bickel DR (2020) Incorporating prior knowledge about genetic variants into the analysis of genetic association data: An empirical Bayes approach. IEEE/ACM Trans Comput Biol and Bioinform 17(2):635–464
Harris D, Mátyás L (1999) Introduction to the generalized method of moments estimation. In: Mátyás L (ed) Generalized method of moments estimation. Cambridge University Press, New York, pp 3–30
Muralidharan O (2010) An empirical Bayes mixture method for effect size and false discovery rate estimation. Ann Appl Stat 4(1):422–438
Nikpay M, Goel A, Won HH, Hall LM, Willenborg C, Kanoni S et al (2015) A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet 47(10):1121–1130
Padilla M, Bickel DR (2012) Estimators of the local false discovery rate designed for small numbers of tests. Stat Appl Genet Mol Biol 11(5) Art. 4
Pan W, Lin J, Le CT (2003) A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genom 3(3):117–124
Slatkin M (2008) Linkage disequilibrium-understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9(6):477–485
Shao J (2007) Mathematical statistics, 2nd edn. Springer-Verlag, New York
Sidák Z (1968) On multivariate normal probabilities of rectangles: their dependence on correlations. Ann Math Stat 39(5):1425–1434
Sidák Z (1971) On probabilities of rectangles in multivariate Student distributions: their dependence on correlations. Ann Math Stat 42(1):169–175
Simes RJ (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73(3):751–754
Storey JD (2002) A direct approach to false discovery rates. J Royal Stat Soc, Series B 64:479–498
Yang Y, Aghababazadeh FA, Bickel DR (2013) Parametric estimation of the local false discovery rate for identifying genetic associations. IEEE/ACM Trans Comput Biol and Bioinform 10:98–108
Yang Y, Padilla M, Ali A, Leckett K, Yang Z, Li Z (2015) LFDR.MLE. Reference Manual, R package version 1.1-10
Zhao Z, Wang W, Wei Z (2013) An empirical Bayes testing procedure for detecting variants in analysis of next generation sequencing data. Ann Appl Stat 7(4):2229–2248
Zheng G, Yang Y, Zhu X, Elston RC (2012) Analysis of genetic association studies. Springer Science and Business Media, New York

Acknowledgements

The author is grateful to two anonymous reviewers for their constructive comments. The prostate data are available online at http://statweb.stanford.edu/ckirby/brad/LSI/datasets-and-programs/datasets.html. The coronary artery disease data were contributed by the CARDIoGRAMplusC4D investigators and downloaded from www.CARDIOGRAMPLUSC4D.ORG. The HB- and ML-based LFDR estimates were computed using the locfdr (Efron et al. 2011) and LFDR.MLE (Yang et al. 2015) packages, respectively.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada
Ali Karimnezhad

Corresponding author

Correspondence to Ali Karimnezhad.

About this article

Cite this article

Karimnezhad, A. A simple yet efficient method of local false discovery rate estimation designed for genome-wide association data analysis. Stat Methods Appl 31, 159–180 (2022). https://doi.org/10.1007/s10260-021-00560-y

Accepted: 25 February 2021
Published: 30 April 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10260-021-00560-y
