Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models

Cui, Tingting; Wang, Pengfei; Zhu, Wensheng

doi:10.1007/s11749-020-00746-8

Original Paper
Published: 02 January 2021

TEST, volume 30, pages 737–757 (2021)

Abstract

Accounting for the dependence structure among multiple tests is increasingly important, especially in genome-wide association studies (GWAS). Existing procedures, such as the local index of significance (LIS) and the pooled local index of significance (PLIS), were proposed to test hidden Markov model (HMM)-dependent hypotheses under the framework of compound decision theory and have been successfully applied to GWAS. However, the etiology of complex diseases involves not only genetic effects but also environmental factors. Failure to account for covariates in multiple testing can bias the association of interest or reduce testing efficiency. In this paper, we develop a covariate-adjusted multiple testing procedure, called the covariate-adjusted local index of significance (CALIS), which accounts for the effects of environmental factors via a factorial hidden Markov model. The theoretical results show that our procedure controls the false discovery rate (FDR) at the nominal level and has the smallest false non-discovery rate (FNR) among all valid FDR procedures. We further demonstrate the advantage of our procedure over the existing procedures through simulation studies and a real data analysis.
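To make the role of LIS-type statistics concrete, the following is a minimal sketch of the step-up thresholding rule commonly paired with them: sort the statistics in increasing order and reject the hypotheses with the k smallest values, where k is the largest index at which the running mean stays at or below the nominal FDR level. It assumes the LIS values (posterior probabilities that each hypothesis is null, given all the data) have already been computed. It does not fit an HMM or a factorial HMM and is not the authors' CALIS implementation. The function name `lis_step_up` and the example values are hypothetical.

```python
from typing import List


def lis_step_up(lis: List[float], alpha: float) -> List[bool]:
    """Return one rejection decision per hypothesis.

    lis   : precomputed LIS-type statistics (assumed given, not estimated here)
    alpha : nominal FDR level
    """
    # Indices of the hypotheses, ordered from smallest to largest statistic.
    order = sorted(range(len(lis)), key=lambda i: lis[i])

    # Find the largest k whose running mean of sorted statistics is <= alpha.
    running_sum = 0.0
    k = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        if running_sum / rank <= alpha:
            k = rank

    # Reject the k hypotheses with the smallest statistics.
    reject = [False] * len(lis)
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical statistics for five SNPs (illustrative values only).
example_lis = [0.01, 0.90, 0.03, 0.40, 0.02]
decisions = lis_step_up(example_lis, alpha=0.10)
print(decisions)
```

With these illustrative values the running means of the sorted statistics are 0.01, 0.015, 0.02, 0.115 and 0.272, so at level 0.10 the three smallest statistics are rejected.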

Acknowledgements

The authors are grateful to the editor, the associate editor, and two anonymous reviewers for their constructive comments, which helped us improve the article substantially. This work is supported in part by the National Natural Science Foundation of China (nos. 11771072 and 11371083) and the Science and Technology Development Plan of Jilin Province (no. 20191008004TC). The authors also thank WTCCC for permission to use the GWAS data.

Author information

Authors and Affiliations

Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, China
Tingting Cui, Pengfei Wang & Wensheng Zhu

Corresponding author

Correspondence to Wensheng Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

11749_2020_746_MOESM1_ESM.pdf

Supplementary Material. Additional information for this article is available online. The supplementary material provides detailed proofs for Sect. 2.3 and Theorems 3–5 in Sect. 2.4, an explanation of the conservativeness of the LIS procedures, and the results of the additional real data analysis. The CALIS procedure is implemented in R, and all core code is freely accessible on GitHub (https://github.com/wszhustat/CALIS-via-FHMM). (pdf 226 KB)

About this article

Cite this article

Cui, T., Wang, P. & Zhu, W. Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models. TEST 30, 737–757 (2021). https://doi.org/10.1007/s11749-020-00746-8

Received: 17 June 2019
Accepted: 26 November 2020
Published: 02 January 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11749-020-00746-8
