Abstract
Accounting for the dependence structure among multiple tests is increasingly important, especially in genome-wide association studies (GWAS). Existing procedures, such as the local index of significance (LIS) and the pooled local index of significance (PLIS), were proposed to test hidden Markov model (HMM)-dependent hypotheses under the framework of compound decision theory and have been successfully applied to GWAS. However, the etiology of complex diseases depends not only on genetic effects but also on environmental factors. Failing to account for covariates in multiple testing can bias the estimated association of interest or reduce testing efficiency. In this paper, we develop a covariate-adjusted multiple testing procedure, called the covariate-adjusted local index of significance (CALIS), which accounts for the effects of environmental factors via a factorial hidden Markov model. Our theoretical results show that the procedure controls the false discovery rate (FDR) at the nominal level and attains the smallest false non-discovery rate (FNR) among all valid FDR procedures. We further demonstrate the advantage of the procedure over existing ones through simulation studies and a real data analysis.
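The abstract does not spell out the decision rule, so the following is only a minimal sketch of how a local-index procedure of the LIS type is commonly turned into rejections. It assumes a step-up rule: sort the statistics in increasing order and reject the largest initial group whose running mean stays at or below the nominal level. Whether CALIS is thresholded in exactly this way is an assumption here. The function name `step_up_reject`, the level `alpha`, and the toy values are hypothetical and are not taken from the authors' R code.

```python
def step_up_reject(local_index, alpha=0.1):
    """Step-up rule (assumed): reject the k smallest statistics, where k is the
    largest rank at which the running mean of the sorted values is <= alpha."""
    # Indices of the hypotheses, ordered from smallest to largest statistic.
    order = sorted(range(len(local_index)), key=lambda i: local_index[i])
    reject = [False] * len(local_index)
    running_sum = 0.0
    k = 0
    for rank, i in enumerate(order, start=1):
        running_sum += local_index[i]
        # The running mean of ascending values never decreases, so the last
        # rank that passes is the largest k satisfying the constraint.
        if running_sum / rank <= alpha:
            k = rank
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical statistics for five hypotheses (smaller = stronger evidence).
toy = [0.01, 0.90, 0.05, 0.40, 0.02]
decisions = step_up_reject(toy, alpha=0.1)
print(decisions)
```

With these toy values the three smallest statistics are rejected, because their mean is below 0.1 while adding the fourth pushes the running mean above it.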
Acknowledgements
The authors are grateful to the editor, the associate editor, and two anonymous reviewers for their constructive comments, which helped us improve the article substantially. This work is supported in part by the National Natural Science Foundation of China (nos. 11771072 and 11371083) and the Science and Technology Development Plan of Jilin Province (no. 20191008004TC). The authors also thank WTCCC for permission to use the GWAS data.
Electronic supplementary material
Below is the link to the electronic supplementary material.
11749_2020_746_MOESM1_ESM.pdf
Supplementary Material. Additional information for this article is available online. The supplementary material provides detailed proofs for Sect. 2.3 and for Theorems 3–5 in Sect. 2.4, an explanation of the conservativeness of the LIS procedures, and the results of the additional real data analysis. The CALIS procedure is implemented in R, and all core code is freely accessible on GitHub (https://github.com/wszhustat/CALIS-via-FHMM). (pdf 226 KB)
Cite this article
Cui, T., Wang, P. & Zhu, W. Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models. TEST 30, 737–757 (2021). https://doi.org/10.1007/s11749-020-00746-8
DOI: https://doi.org/10.1007/s11749-020-00746-8
Keywords
- Factorial hidden Markov model
- Covariate adjustment
- Multiple hypotheses testing
- False discovery rate
- GWAS