Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models

  • Original Paper
  • Journal: TEST

Abstract

Accounting for the dependence structure among multiple tests is increasingly important, especially in genome-wide association studies (GWAS). Existing procedures, such as the local index of significance (LIS) and the pooled local index of significance (PLIS), were proposed to test hidden Markov model (HMM)-dependent hypotheses under the framework of compound decision theory and have been successfully applied to GWAS. However, the etiology of complex diseases involves not only genetic effects but also environmental factors. Failing to account for covariates in multiple testing can bias the estimated association of interest or reduce testing efficiency. In this paper, we develop a covariate-adjusted multiple testing procedure, called the covariate-adjusted local index of significance (CALIS), which accounts for the effects of environmental factors via a factorial hidden Markov model. The theoretical results show that our procedure controls the false discovery rate (FDR) at the nominal level and has the smallest false non-discovery rate (FNR) among all valid FDR procedures. We further demonstrate the advantage of our procedure over existing procedures through simulation studies and a real data analysis.
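
To make the idea of an LIS-type procedure concrete, the following minimal sketch shows the step-up thresholding rule commonly associated with LIS (Sun and Cai 2009): hypotheses are ranked by their local index, and the largest set whose average index does not exceed the nominal level is rejected. The abstract does not spell out this rule, so the sketch is an illustration under that assumption and is not the authors' R implementation. The function name `lis_step_up` and the example values are hypothetical, and the local indices (posterior probabilities of the null given all observations) are assumed to have been computed already, for example by a forward-backward pass over the fitted HMM.

```python
import numpy as np

def lis_step_up(lis, alpha=0.1):
    """Return a boolean rejection mask for an LIS-style step-up rule.

    lis   : local indices, i.e. posterior null probabilities, one per
            hypothesis (assumed to be given; not computed here)
    alpha : nominal FDR level
    """
    lis = np.asarray(lis, dtype=float)
    order = np.argsort(lis)  # most significant hypotheses first
    # Running mean of the sorted indices estimates the FDR of rejecting
    # the k most significant hypotheses.
    running_mean = np.cumsum(lis[order]) / np.arange(1, lis.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    reject = np.zeros(lis.size, dtype=bool)
    if passing.size > 0:
        k = passing[-1] + 1  # largest k whose running mean is <= alpha
        reject[order[:k]] = True
    return reject

# Hypothetical local indices for six SNPs.
lis_values = [0.01, 0.60, 0.03, 0.90, 0.20, 0.05]
rejected = lis_step_up(lis_values, alpha=0.1)
print(rejected)
```

With these illustrative values, the four smallest indices (0.01, 0.03, 0.05, 0.20) average 0.0725, which is below 0.1, while adding 0.60 pushes the average above the level, so four hypotheses are rejected. A covariate-adjusted procedure such as CALIS would change how the indices are computed, not necessarily this ranking-and-thresholding step; that reading is an assumption based on the abstract alone.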

Acknowledgements

The authors are grateful to the editor, the associate editor, and two anonymous reviewers for their constructive comments, which helped us improve the article substantially. This work is supported in part by the National Natural Science Foundation of China (nos. 11771072 and 11371083) and by the Science and Technology Development Plan of Jilin Province (no. 20191008004TC). The authors also thank WTCCC for permission to use the GWAS data.

Author information

Corresponding author

Correspondence to Wensheng Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

11749_2020_746_MOESM1_ESM.pdf

Supplementary Material: Additional information for this article is available online. The supplementary material provides detailed proofs of the results in Sect. 2.3 and of Theorems 3–5 in Sect. 2.4, an explanation of the conservativeness of the LIS procedures, and the results of the additional real data analysis. We implement the CALIS procedure in R. All core code for the CALIS procedure is freely accessible on GitHub (https://github.com/wszhustat/CALIS-via-FHMM). (pdf 226 KB)

About this article

Cite this article

Cui, T., Wang, P. & Zhu, W. Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models. TEST 30, 737–757 (2021). https://doi.org/10.1007/s11749-020-00746-8

  • DOI: https://doi.org/10.1007/s11749-020-00746-8
