Abstract
Accounting for the dependence structure among multiple tests is increasingly important, especially in genome-wide association studies (GWAS). Existing procedures, such as the local index of significance (LIS) and the pooled local index of significance (PLIS), were proposed to test hidden Markov model (HMM)-dependent hypotheses under the framework of compound decision theory and have been successfully applied to GWAS. However, the etiology of complex diseases involves not only genetic effects but also environmental factors. Failure to account for covariates in multiple testing can bias the association of interest or reduce testing efficiency. In this paper, we develop a covariate-adjusted multiple testing procedure, called the covariate-adjusted local index of significance (CALIS), to account for the effects of environmental factors via a factorial hidden Markov model. The theoretical results show that our procedure controls the false discovery rate (FDR) at the nominal level and has the smallest false non-discovery rate (FNR) among all valid FDR procedures. We further demonstrate the advantage of our procedure over the existing procedures through simulation studies and a real data analysis.
Introduction
Large-scale multiple testing problems arise in many scientific applications. For instance, in genome-wide association studies (GWAS), one needs to perform tens of thousands of tests to identify the single nucleotide polymorphisms (SNPs) associated with a complex disease. Other examples include neuroimaging data analysis (Shu et al. 2015), microarray data analysis (Liang and Nettleton 2010; Liang et al. 2018), and spatial data analysis (Sun et al. 2015). To date, a number of multiple testing procedures have been proposed for various scientific fields. However, several issues remain in large-scale multiple testing. First, the growing availability of high-throughput data requires us to conduct tens of thousands of tests simultaneously. Multiple testing procedures based on traditional control criteria, such as the family-wise error rate (FWER), are overly conservative and have low power when testing a large number of hypotheses. Second, the hypotheses in multiple testing often exhibit complex dependence in practice. For example, in gene category testing problems, one needs to test hundreds of null hypotheses that correspond to nodes in a gene ontology graph (Liang and Nettleton 2010; Liang et al. 2018). Ignoring the dependence structure among hypotheses may reduce testing efficiency. Finally, the statistics yielded by multiple tests may be affected by external covariates. In genetic association analyses, we usually need to evaluate the association between genetic factors and disease variables of interest while adjusting for covariate effects. Failure to account for covariate effects may bias the estimated association or reduce testing efficiency (Zhu et al. 2012).
To address the aforementioned issues, a collection of large-scale multiple testing approaches has been proposed. In a seminal paper, Benjamini and Hochberg (1995) proposed a novel control criterion, the false discovery rate (FDR), and reported that multiple testing procedures based on the FDR tend to achieve higher testing efficiency. The FDR is defined as the expected proportion of false rejections among all rejections. Correspondingly, the false non-discovery rate (FNR; Genovese and Wasserman 2002), an alternative measure of Type II error, is defined as the expected proportion of false non-rejections among all non-rejections. In general, we wish to develop a multiple testing procedure that controls the FDR at a prespecified level \(\alpha \) and has the smallest FNR among all FDR procedures at level \(\alpha \). The traditional p-value-based FDR procedures (Benjamini and Hochberg 1995, 2000; Storey 2002; Genovese and Wasserman 2002) essentially try to find a cutoff along the ranked p-values. Typically, these procedures assume that the tests are independent. In reality, however, this assumption rarely holds because of the complex dependence structures among tests. It is both worthwhile and challenging to develop a multiple testing procedure that properly exploits the dependence among hypotheses. Sun and Cai (2009) suggested using the hidden Markov model (HMM) to characterize the local dependence among tests and proposed a novel multiple testing procedure, referred to as the LIS procedure hereafter. Because disease-associated SNPs tend to be clustered and dependent, the HMM, which can model the correlation among adjacent SNPs, has been successfully applied in GWAS. Wei et al. (2009) assumed that each whole chromosome follows an HMM and extended the LIS procedure to allow for ranking LIS values across all chromosomes. Xiao et al. (2013) pointed out that different chromosome regions should follow different HMMs and suggested using a modified LIS procedure based on region-specific HMMs in GWAS. Other extensions of the LIS procedure can be found in Kuan and Chiang (2012), Shu et al. (2015) and Liu et al. (2016).
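For reference, the idea of finding a cutoff along the ranked p-values can be sketched with the step-up rule of Benjamini and Hochberg (1995). This is a minimal illustration only; the function name `bh_reject` and the example p-values are assumptions introduced here, not part of the original study.

```python
def bh_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up rule.

    Finds the largest k such that the k-th smallest p-value is at most
    k * alpha / m, and rejects the hypotheses with the k smallest p-values.
    Returns the (sorted) indices of the rejected hypotheses.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank  # keep the largest rank that satisfies the bound
    return sorted(order[:k])


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = bh_reject(example_p, alpha=0.05)
```

The rule treats the tests as exchangeable and uses no information about their ordering along the genome, which is the limitation that the HMM-based procedures discussed above aim to address.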
It is important to note that none of these existing procedures takes covariate effects into account in multiple testing. It has been reported that the etiology of complex diseases (such as asthma, atherosclerosis, bipolar disorder and alcoholism) depends not only on genetic effects but also on covariates, such as environmental factors (Wang et al. 2006; Jiang and Zhang 2011; Zhu et al. 2012). An appropriate covariate adjustment in GWAS is crucial for identifying genetic effects of interest. So far, only a handful of methods have been proposed for multiple testing with covariate effects. Zablocki et al. (2014) suggested leveraging locus-specific covariates, such as functional annotations, to improve gene discovery in GWAS. Zablocki et al. (2017) further proposed a semiparametric procedure for covariate-modulated multiple testing. More recently, Lei and Fithian (2018) proposed a multiple testing procedure that adaptively selects the p-value rejection threshold by using generic side information as covariates. However, little work has been done to address the covariate and dependence issues together. It is therefore necessary to explore new multiple testing procedures that allow for covariate adjustment while exploiting the dependence among tests.
To adjust for the effects of covariates, we propose a factorial HMM for large-scale multiple testing. The factorial HMM was named by Ghahramani and Jordan (1997) and is a generalization of the HMM in which the hidden state is factored into multiple state variables. Specifically, we factor each hidden state of the HMM into two state variables: one indicates the state (null or non-null) of the hypothesis of primary interest, and the other denotes the existence status (presence or absence) of covariate effects. Following the fundamental work of Sun and Cai (2009), we further assume that the existence status of covariate effects follows a Markov chain. This assumption is natural and reasonable in many applications. For example, in GWAS, SNP-level functional annotations may affect the distribution of the test statistics of interest (Schork et al. 2013). Since adjacent genomic loci tend to co-segregate in meiosis, it is reasonable to assume that SNPs with similar functional annotations are clustered and locally dependent.
Essentially, most multiple testing procedures involve two steps: ranking the hypotheses based on a suitable multiple testing statistic, such as p-values (Benjamini and Hochberg 1995), Lfdr values (local false discovery rate; Efron et al. 2001) or LIS values (local index of significance; Sun and Cai 2009), and then choosing a cutoff along the rankings. Consequently, there are two fundamental problems to solve, namely, deriving an optimal multiple testing statistic based on the factorial HMM and then setting a suitable threshold. We address these problems in several steps. First, we demonstrate that the optimal statistics in the weighted classification problem are equivalent to those in the multiple testing problem under the generalized monotone ratio condition (GMRC). Second, we derive the optimal statistics in the weighted classification problem and then define the corresponding multiple testing statistic (termed the covariate-adjusted local index of significance, CALIS). Finally, following derivations similar to those in Sun and Cai (2009), Genovese and Wasserman (2004) and Newton et al. (2004), we obtain a suitable cutoff along the ranked CALIS values. Based on the factorial HMM, the resulting multiple testing procedure (called the CALIS procedure hereafter) can not only leverage the dependence among tests but also accommodate covariate effects. Furthermore, both simulations and a real data analysis illustrate that the CALIS procedure is valid and achieves higher efficiency in multiple testing by adjusting for covariate effects.
The rest of this paper is organized as follows. In Sect. 2, we first give a brief review of the LIS procedure under an HMM (Sun and Cai 2009). Building on their fundamental work, we introduce the framework of covariate-adjusted multiple testing under the factorial HMM and illustrate the connection between the weighted classification problem and the covariate-adjusted multiple testing problem. Then, we propose optimal covariate-adjusted multiple testing procedures for the cases in which the factorial HMM parameters are known and unknown, respectively (i.e., the oracle CALIS procedure and the data-driven CALIS procedure). The theoretical results and detailed implementations of these procedures are discussed at the end of the section. In Sect. 3, we carry out extensive simulations to evaluate the performance of our oracle and data-driven CALIS procedures. A real data analysis of bipolar disorder is presented in Sect. 4. Finally, we conclude with some discussion and suggestions for future work in Sect. 5.
Statistical methods
The LIS procedure under an HMM
Suppose that there are m hypotheses of interest \(\{H_i\}^m_{i=1}\) to test simultaneously. Let \(\{\theta _i\}^m_{i=1}\) be the underlying states of the m hypotheses, where \(\theta _i=1\) implies that hypothesis \(H_i\) is non-null and \(\theta _i=0\) otherwise. Sun and Cai (2009) suggested using an HMM to model the dependence structure among hypotheses and assumed that \(\{\theta _i\}^m_{i=1}\) follows a stationary, irreducible and aperiodic Markov chain. They further assumed that the observations \(\{z_i\}^m_{i=1}\) are conditionally independent given the hidden states \(\{\theta _i\}^m_{i=1}\), i.e.,
\[P(\{z_i\}^m_{i=1}\mid \{\theta _i\}^m_{i=1})=\prod ^m_{i=1}P(z_i\mid \theta _i),\]
where the observed value \(z_i\) can be a z-value (Wei et al. 2009) or a test statistic (Liu et al. 2016) corresponding to the ith hypothesis. The LIS is defined as
\[LIS_i=P_{\varphi }(\theta _i=0\mid {\mathbf {z}}),\quad {\mathbf {z}}=\{z_i\}^m_{i=1},\]
where \(\varphi \) represents the parameters of the HMM. Denote by \(LIS_{(1)},LIS_{(2)},\dots ,LIS_{(m)}\) the ordered LIS values and \(H_{(1)},H_{(2)},\dots ,H_{(m)}\) the corresponding null hypotheses. Then the LIS procedure can be described as follows:
\[\text{Let } l=\max \left\{ i:\frac{1}{i}\sum ^i_{j=1}LIS_{(j)}\le \alpha \right\} ;\ \text{then reject all } H_{(i)},\ i=1,\dots ,l.\]
It is worth noting that the LIS procedure implicitly assumes that the observations depend only on the states of the main-effect hypotheses, namely, \(\{\theta _i\}^m_{i=1}\). In practice, however, the observations are usually affected by external covariates. To account for the covariate effects, we introduce a factorial HMM in the following section.
The covariate-adjusted multiple testing via the factorial HMM
Let \(\{\gamma _i\}^m_{i=1}\) be the existence status of covariate effects, where \(\gamma _i=1\) indicates that \(z_i\) is affected by external covariate effects and \(\gamma _i=0\) otherwise. Assume that \(\{\theta _i\}^m_{i=1}\) and \(\{\gamma _i\}^m_{i=1}\) are mutually independent and that each follows a stationary, irreducible and aperiodic Markov chain. The corresponding transition probabilities are
\[a_{ij}=P(\theta _s=j\mid \theta _{s-1}=i),\quad i,j=0,1, \tag{1}\]
and
\[b_{ij}=P(\gamma _s=j\mid \gamma _{s-1}=i),\quad i,j=0,1. \tag{2}\]
Moreover, assume that the observations \(\{z_i\}^m_{i=1}\) are conditionally independent given the hidden states \(\{\theta _i\}^m_{i=1}\) and \(\{\gamma _i\}^m_{i=1}\), namely,
\[P(\{z_i\}^m_{i=1}\mid \{\theta _i\}^m_{i=1},\{\gamma _i\}^m_{i=1})=\prod ^m_{i=1}P(z_i\mid \theta _i,\gamma _i). \tag{3}\]
Following the two-component mixture model (Sun and Cai 2009; Wei et al. 2009; Xiao et al. 2013), we further assume that the random variable \(Z_i\) (corresponding to \(z_i\)) follows a four-component mixture model:
\[Z_i\mid (\theta _i,\gamma _i)\sim (1-\theta _i)(1-\gamma _i)F_{0,0}+(1-\theta _i)\gamma _iF_{0,1}+\theta _i(1-\gamma _i)F_{1,0}+\theta _i\gamma _iF_{1,1}, \tag{4}\]
where \(F_{0,0}\), \(F_{0,1}\), \(F_{1,0}\), and \(F_{1,1}\) are the conditional distributions of \(Z_i\) given \((\theta _i, \gamma _i)=(0,0), (0,1), (1,0)\), and (1, 1), respectively. The dependence model (1)–(3) is called a factorial hidden Markov model and was also discussed by Ghahramani and Jordan (1997). The structure of the factorial HMM can be visualized as a directed graph (Fig. 1).
In practice, it is natural to assume that \(F_{0,0}\) is the standard normal distribution N(0, 1), and that \(F_{0,1}\), \(F_{1,0}\) and \(F_{1,1}\) are normal mixtures. Since our main aim is to adjust for covariate effects in multiple testing, for simplicity we assume that only \(F_{1,1}\) is a normal mixture, while \(F_{0,1}\) and \(F_{1,0}\) are normal distributions. Extending to the setting in which all alternatives are normal mixtures is straightforward but requires additional computation. Note that the number of components in the normal mixture, denoted by L, is usually unknown. As Sun and Cai (2009) suggested, we can use likelihood-based criteria, such as the Akaike or Bayesian information criterion (AIC or BIC), to select an appropriate L. Denote by \({\mathcal {A}}=(a_{ij})_{2\times 2}\) and \({\mathcal {B}}=(b_{ij})_{2\times 2}\) the transition probability matrices of \(\{\theta _i\}^m_{i=1}\) and \(\{\gamma _i\}^m_{i=1}\), where \(a_{ij}=P(\theta _s=j\mid \theta _{s-1}=i)\) and \(b_{ij}=P(\gamma _s=j\mid \gamma _{s-1}=i)\) for \(i=0, 1\), \(j=0, 1\). Let \(\pi =(\pi _0, \pi _1)\) and \({\tilde{\pi }}=({\tilde{\pi }}_0,{\tilde{\pi }}_1)\), where \(\pi _j=P(\theta _s=j)\) and \({\tilde{\pi }}_j=P(\gamma _s=j)\) are the stationary distributions of \(\{\theta _i\}^m_{i=1}\) and \(\{\gamma _i\}^m_{i=1}\), respectively. For convenience, let \({\mathcal {F}}=\{F_{0,0}, F_{0,1}, F_{1,0}, F_{1,1}\}\), and denote by \(\vartheta =({\mathcal {A}}, {\mathcal {B}}, \pi , {\tilde{\pi }}, {\mathcal {F}})\) the parameters of the covariate-adjusted multiple testing problem under a factorial HMM.
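Because the two hidden chains are assumed to be mutually independent, the pair \((\theta _s,\gamma _s)\) is itself a four-state Markov chain whose transition matrix is the Kronecker product of \({\mathcal {A}}\) and \({\mathcal {B}}\), and the stationary distribution of each two-state chain has a closed form. The sketch below illustrates this in Python; the function names are hypothetical, and the numerical values of the two matrices are simply those used later in Scenario 1.

```python
def stationary_2state(P):
    """Stationary distribution of a two-state chain with transition matrix P."""
    p01, p10 = P[0][1], P[1][0]
    return (p10 / (p01 + p10), p01 / (p01 + p10))


def joint_transition(A, B):
    """Transition matrix of the pair (theta, gamma) for independent chains.

    Joint states are ordered (0,0), (0,1), (1,0), (1,1); the entry for
    (i,k) -> (j,l) is a_ij * b_kl, i.e. the Kronecker product of A and B.
    """
    return [[A[i][j] * B[k][l] for j in (0, 1) for l in (0, 1)]
            for i in (0, 1) for k in (0, 1)]


A = [[0.95, 0.05], [0.10, 0.90]]   # transition matrix of theta
B = [[0.90, 0.10], [0.05, 0.95]]   # transition matrix of gamma
pi = stationary_2state(A)          # (pi_0, pi_1)
pi_tilde = stationary_2state(B)    # (pi~_0, pi~_1)
J = joint_transition(A, B)         # 4 x 4 joint transition matrix
```

Working with the four-state joint chain means that standard HMM recursions can be reused with only a change of state space, at the cost of a larger transition matrix.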
The relationship between covariate-adjusted multiple testing and weighted classification
Sun and Cai (2009) developed a compound decision-theoretic framework for both the weighted classification problem and the multiple testing problem in an HMM. They showed that, under the monotone ratio condition (MRC), the optimal statistic in the weighted classification problem is equivalent to that in the multiple testing problem. Inspired by their fundamental work, we extend the MRC to the generalized monotone ratio condition (GMRC) for our covariate-adjusted multiple testing under a factorial HMM. It can be shown that the optimal statistics in the weighted classification problem and the covariate-adjusted multiple testing problem are still equivalent under the GMRC.
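For simplicity, let \({\mathbf {z}}=\{z_i\}^m_{i=1}\). Let \(\lambda \) be the relative cost of a false positive to a false negative. Consider the weighted classification problem with the loss function
\[L_{\lambda }(\theta ,\delta )=\frac{1}{m}\sum ^m_{i=1}\left\{ \lambda (1-\theta _i)\delta _i+\theta _i(1-\delta _i)\right\} ,\]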
where \(\delta _i=1_{\{T_i({\mathbf {z}})<t\}},i=1,\dots ,m,\) is a classification rule and \(T_i({\mathbf {z}})\) is a classification statistic. Let \(G^{jk}_i(t)=P(T_i({\mathbf {z}})<t\mid \theta _i=j,\gamma _i=k)\) be the conditional cumulative distribution functions (CDFs) of \(T_i({\mathbf {z}})\) for \(j=0,1\) and \(k=0,1\). Let \(G^{jk}(t)=\frac{1}{m}\sum \limits ^m_{i=1} G^{jk}_i(t)\) be the average conditional CDFs of \(T_i({\mathbf {z}})\) and \(g^{jk}(t)=(d/dt)G^{jk}(t)\) be the average conditional probability density functions (PDFs) of \(T_i({\mathbf {z}})\). Define the generalized monotone ratio condition:
The GMRC can be viewed as a generalized version of the MRC in Sun and Cai (2009). It is easy to see that the GMRC reduces to the MRC when no covariate affects the statistic. The detailed derivations are deferred to the Supplementary Material.
For clarity, denote by \({\mathcal {T}}\) the collection of statistics that satisfy the GMRC. The following theorem shows that the GMRC is a suitable condition for inference in the factorial HMM.
Theorem 1
Consider the factorial hidden Markov model (1)–(3). Let \(\delta \) be a decision rule of the form \(\delta ({\mathbf {T}},c)=(1_{(T_i({\mathbf {z}})<c)}:i=1,\dots ,m)\) with \(T_i({\mathbf {z}})\in {\mathcal {T}}\). Then

(a) mFDR of \(\delta ({\mathbf {T}},c)\) is strictly increasing in the threshold c.
(b) mFNR of \(\delta ({\mathbf {T}},c)\) is strictly decreasing in the threshold c.
(c) In the weighted classification problem, the optimal cutoff c that minimizes the classification risk is strictly decreasing in \(\lambda \).
The form of the optimal classification statistic \(\varvec{\Lambda }\) in the weighted classification problem is given by the following theorem. Moreover, if \(\varvec{\Lambda } \in {\mathcal {T}}\), it is also the optimal testing statistic in the covariateadjusted multiple testing problem.
Theorem 2
Consider the factorial hidden Markov model (1)–(3). Suppose that the true parameters \(\vartheta \) are known. Then the optimal classification rule that minimizes the expectation of the loss function in the weighted classification problem is \(\delta (\varvec{\Lambda },c)=(\delta _1,\dots ,\delta _m)\), where
\[\Lambda _i({\mathbf {z}})=\frac{P_{\vartheta }(\theta _i=0\mid {\mathbf {z}})}{P_{\vartheta }(\theta _i=1\mid {\mathbf {z}})}\]
and \(\delta _i=1_{(\Lambda _i({\mathbf {z}})<1/\lambda )}\) for \(i=1,\dots ,m\). Moreover, if \(\varvec{\Lambda } \in {\mathcal {T}}\), then \(\varvec{\Lambda }\) is also the optimal statistic in the covariate-adjusted multiple testing problem in the sense that, for each mFDR level \(\alpha \), there is a unique \(c(\alpha )\) such that \(\delta (\varvec{\Lambda },c(\alpha ))\) controls mFDR at level \(\alpha \) with the smallest mFNR among all \(\alpha \)-level testing rules whose testing statistics satisfy the GMRC.
Since the proofs of Theorems 1 and 2 are analogous to those in Sun and Cai (2007), we omit them here. The next theorem shows that the optimal classification statistic \(\varvec{\Lambda }\) indeed belongs to the GMRC class \({\mathcal {T}}\).
Theorem 3
Consider the optimal classification statistic \(\varvec{\Lambda }\) in Theorem 2. Let \(G^{jk}_i(t)=P(\Lambda _i({\mathbf {z}})<t\mid \theta _i=j,\gamma _i=k),\) \(G^{jk}(t)=\frac{1}{m}\sum \limits ^m_{i=1} G^{jk}_i(t)\), and \(g^{jk}(t)=(d/dt)G^{jk}(t)\) for \(j=0,1\) and \(k=0,1\). Then we have
namely, \(\varvec{\Lambda }\) belongs to the GMRC class.
The CALIS procedure under a factorial HMM
The previous section showed that the optimal classification statistic \(\varvec{\Lambda }\) is also optimal for the covariate-adjusted multiple testing problem. Note that \(\Lambda _i({\mathbf {z}})\) is increasing in \(P_{\vartheta }(\theta _i=0\mid {\mathbf {z}})\). Hence, an optimal covariate-adjusted multiple testing rule in a factorial HMM can be written in the form \(\delta _i=1_{\{P_{\vartheta }(\theta _i=0\mid {\mathbf {z}})<t\}},i=1,\dots ,m\). We define the covariate-adjusted local index of significance (CALIS) for hypothesis \(H_i\) as
\[CALIS_i=P_{\vartheta }(\theta _i=0\mid {\mathbf {z}}).\]
Although the definitions of CALIS and LIS may seem similar on the surface, there are fundamental differences between them in both interpretation and calculation. First, the different subscripts (\(\varphi \) and \(\vartheta \)) indicate that LIS and CALIS are based on different dependence models (the HMM and the factorial HMM). As illustrated in the previous section, the factorial HMM, which can accommodate covariate adjustment, is more reasonable and flexible. Second, CALIS can be expressed as the sum of two parts, namely,
\[CALIS_i=P_{\vartheta }(\theta _i=0,\gamma _i=0\mid {\mathbf {z}})+P_{\vartheta }(\theta _i=0,\gamma _i=1\mid {\mathbf {z}}),\]
where each term on the right-hand side can be calculated efficiently by using a modified forward–backward algorithm. This implies that we indeed adjust for the effects of covariates when calculating CALIS, which is exactly the goal of this paper for large-scale multiple testing under dependence.
Given the optimal covariate-adjusted multiple testing statistic, CALIS, we next need to derive a suitable cutoff for the CALIS values. Since the derivations of the cutoff are the same as those in Sun and Cai (2009), Genovese and Wasserman (2004) and Newton et al. (2004), we omit the details.
Given the parameters \(\vartheta \) of the factorial HMM, denote by \(CALIS_{(1)},CALIS_{(2)},\dots ,CALIS_{(m)}\) the ordered CALIS values and \(H_{(1)},H_{(2)},\dots ,H_{(m)}\) the corresponding null hypotheses. The oracle CALIS procedure (\(\vartheta \) known) operates as follows:
\[\text{Let } l=\max \left\{ i:\frac{1}{i}\sum ^i_{j=1}CALIS_{(j)}\le \alpha \right\} ;\ \text{then reject all } H_{(i)},\ i=1,\dots ,l. \tag{7}\]
The next theorem shows that the oracle CALIS procedure is valid, namely, it controls FDR at the prespecified level.
Theorem 4
Consider the factorial hidden Markov model (1)–(3). Then the oracle testing procedure (7) controls FDR at \(\alpha \).
In reality, the parameters \(\vartheta \) of the factorial HMM are typically unknown. In the data-driven procedure, we use the plug-in statistics \({\widehat{CALIS}}_i, i = 1,\dots ,m\), obtained by replacing \(\vartheta \) with its MLE \({\hat{\vartheta }}\). The data-driven CALIS procedure with unknown parameters \(\vartheta \) operates as follows:

Calculate the plug-in value \({\widehat{CALIS}}_i=P_{{\hat{\vartheta }}}(\theta _i=0\mid {\mathbf {z}})\), where \({\hat{\vartheta }}\) can be obtained by using the EM algorithm.

Rank the plug-in \({\widehat{CALIS}}\) values. Denote by \({\widehat{CALIS}}_{(1)},\dots ,{\widehat{CALIS}}_{(m)}\) the ordered \({\widehat{CALIS}}\) values and \(H_{(1)},\dots ,H_{(m)}\) the corresponding null hypotheses.

Let \(l=\max \left\{ i:\frac{1}{i}\sum ^i_{j=1}{\widehat{CALIS}}_{(j)}\le \alpha \right\} \). Then reject all \(H_{(i)}\), for \(i=1,\dots ,l\).
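The ranking and thresholding steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes the (plug-in) CALIS values have already been computed; the function name `calis_step_up` and the example values are hypothetical.

```python
def calis_step_up(calis, alpha):
    """Apply the step-up rule to a list of (plug-in) CALIS values.

    Rejects the l hypotheses with the smallest values, where l is the
    largest i such that the mean of the i smallest values is <= alpha.
    Returns the (sorted) indices of the rejected hypotheses.
    """
    order = sorted(range(len(calis)), key=lambda i: calis[i])
    running_sum, l = 0.0, 0
    for rank, idx in enumerate(order, start=1):
        running_sum += calis[idx]
        if running_sum / rank <= alpha:
            l = rank  # running mean of the `rank` smallest values
    return sorted(order[:l])


# Hypothetical values, for illustration only.
rejected = calis_step_up([0.01, 0.9, 0.05, 0.3, 0.02], alpha=0.1)
```

Because the running mean of sorted values is nondecreasing, the loop could stop at the first rank that violates the bound; the full scan is kept here to mirror the definition of l directly.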
Next, we will show that the oracle CALIS procedure and the datadriven CALIS procedure are asymptotically equivalent under some standard assumptions on the factorial HMM. A detailed illustration of these assumptions can be found in the literature (Bickel et al. 1998; Leroux 1992).
Assumption 1
The hidden states \(\{\theta _i\}_{i=1}^m\) and \(\{\gamma _i\}_{i=1}^m\) are mutually independent, and each follows an irreducible, aperiodic and stationary Markov chain. They are characterized by \(\vartheta _0=({\mathcal {A}}_0, {\mathcal {B}}_0, {\pi }_0, \tilde{{{\pi }}}_0, {\mathcal {F}}_0)\), and \(\vartheta _0\) is an interior point of the parameter space \(\varTheta \).
Assumption 2
Denote by \({\mathcal {A}}_{\vartheta }=(a_{ij}(\vartheta ))\) and \({\mathcal {B}}_{\vartheta }=(b_{ij}(\vartheta ))\) the transition matrices and by \(\pi _\vartheta =(\pi _0(\vartheta ), \pi _1(\vartheta ))\) and \({\tilde{\pi }}_{\vartheta }=({\tilde{\pi }}_0(\vartheta ),{\tilde{\pi }}_1(\vartheta ))\) the stationary distributions of the hidden states \(\{\theta _i\}_{i=1}^m\) and \(\{\gamma _i\}_{i=1}^m\), respectively. There are \(\gamma >0\) and \(\epsilon _0>0\) such that, for all \(|\vartheta -\vartheta _0|<\gamma \) and all \(i,j=0,1,\) \(a_{ij}(\vartheta )\ge \epsilon _0>0\), \(b_{ij}(\vartheta )\ge \epsilon _0>0\), \(\pi _i(\vartheta )\ge \epsilon _0>0\) and \({\tilde{\pi }}_i(\vartheta )\ge \epsilon _0>0\).
Assumption 3
There is a \(\gamma >0\) such that \(P(\rho _0(Z_1)\theta _1=j)<1\) for all j, where
Assumption 4
\({\widehat{\vartheta }}\) is a consistent estimate of \(\vartheta _0\).
Assumption 5
\(\sum \nolimits ^1_{k=0}P_{\vartheta }(Z_1=z\mid \theta _{1}=j,\gamma _{1}=k){\tilde{\pi }}_k\) for \(j=0,1\) are continuous and positive over the real line, and
for all \(|\vartheta -\vartheta _0|<\gamma \).
Theorem 5
Consider the factorial HMM (1)–(3). Let \(FDR_{CALIS}^{OR}\), \(FDR_{CALIS}\), and \(FNR_{CALIS}^{OR}\), \(FNR_{CALIS}\) be the FDR levels and FNR levels yielded by the oracle CALIS and data-driven CALIS procedures, respectively. If Assumptions 1–5 hold, then \(FDR_{CALIS}^{OR}-FDR_{CALIS}\rightarrow 0\) as \(m\rightarrow \infty \). In addition, if at least a fixed proportion of hypotheses are not rejected, then \(FNR_{CALIS}^{OR}-FNR_{CALIS}\rightarrow 0\) as \(m\rightarrow \infty \).
The forward–backward algorithm for computing CALIS
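According to its definition, CALIS can be expressed as
\[CALIS_k=\frac{\sum ^1_{q=0}\alpha _k(0,q)\beta _k(0,q)}{\sum ^1_{p=0}\sum ^1_{q=0}\alpha _k(p,q)\beta _k(p,q)},\]
where the forward variable is \(\alpha _k(p,q)=P_{\vartheta }(\{z_i\}^k_{i=1},\theta _k=p,\gamma _k=q)\) and the backward variable is \(\beta _k(p,q)=P_{\vartheta }(\{z_i\}^m_{i=k+1}\mid \theta _k=p,\gamma _k=q)\). With a few minor modifications of the forward–backward algorithm (Baum et al. 1970), we can obtain
\[\alpha _{k+1}(p,q)=\sum ^1_{u=0}\sum ^1_{v=0}\alpha _k(u,v)\,a_{up}\,b_{vq}\,f_{p,q}(z_{k+1})\]
and
\[\beta _k(p,q)=\sum ^1_{u=0}\sum ^1_{v=0}\beta _{k+1}(u,v)\,a_{pu}\,b_{qv}\,f_{u,v}(z_{k+1}),\]
where \(f_{p,q}\) is the probability density corresponding to \(F_{p,q}\), for \(p,q=0,1\).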
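A forward–backward pass over the four joint states \((\theta _k,\gamma _k)\) can be sketched as follows. This is an illustrative implementation under simplifying assumptions, not the authors' code: each \(F_{p,q}\) is taken to be a single normal distribution (the \(L=1\) case), the chains start from the supplied initial distributions, and the forward and backward variables are rescaled at every step to avoid numerical underflow. All function and argument names are hypothetical.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def calis_forward_backward(z, A, B, pi, pi_tilde, means, sds):
    """Return P(theta_k = 0 | z) for every k under a factorial HMM.

    Joint states (p, q) are flattened to s = 2*p + q, so s in {0, 1}
    corresponds to theta = 0. means[s] and sds[s] define the normal
    emission density of joint state s.
    """
    m, S = len(z), 4
    trans = [[A[s // 2][t // 2] * B[s % 2][t % 2] for t in range(S)]
             for s in range(S)]
    init = [pi[s // 2] * pi_tilde[s % 2] for s in range(S)]
    dens = [[normal_pdf(zk, means[s], sds[s]) for s in range(S)] for zk in z]

    # Forward pass (rescaled so that each alpha[k] sums to one).
    alpha = [[0.0] * S for _ in range(m)]
    for k in range(m):
        for t in range(S):
            prior = init[t] if k == 0 else sum(
                alpha[k - 1][s] * trans[s][t] for s in range(S))
            alpha[k][t] = prior * dens[k][t]
        c = sum(alpha[k])
        alpha[k] = [a / c for a in alpha[k]]

    # Backward pass (also rescaled; the scaling cancels in the ratio below).
    beta = [[1.0] * S for _ in range(m)]
    for k in range(m - 2, -1, -1):
        for s in range(S):
            beta[k][s] = sum(trans[s][t] * dens[k + 1][t] * beta[k + 1][t]
                             for t in range(S))
        c = sum(beta[k])
        beta[k] = [b / c for b in beta[k]]

    calis = []
    for k in range(m):
        post = [alpha[k][s] * beta[k][s] for s in range(S)]
        calis.append((post[0] + post[1]) / sum(post))  # sum over gamma = 0, 1
    return calis
```

The cost is linear in the number of hypotheses m, with a constant factor determined by the four joint states.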
The EM algorithm for calculating parameters of the factorial HMM
In this section, we give a detailed EM algorithm for estimating the parameters of the factorial HMM when the number of components is \(L = 1\); see Table 1. The extension to \(L \ge 2\) is straightforward but requires more complex notation.
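As a rough illustration of one M-step component, and not necessarily the exact update listed in Table 1, the standard Baum–Welch-style update of a \(2\times 2\) transition matrix divides the expected number of transitions from state i to state j by the expected number of transitions out of state i. The sketch below assumes that the pairwise posterior probabilities have already been produced by an E-step; the name `update_transition` and the toy inputs are hypothetical.

```python
def update_transition(pair_post):
    """M-step update for a 2 x 2 transition matrix.

    pair_post[k][i][j] is the posterior probability, given all observations,
    that the chain is in state i at position k and in state j at position k+1.
    Returns the row-normalised expected transition counts.
    """
    counts = [[0.0, 0.0], [0.0, 0.0]]
    for xi in pair_post:
        for i in (0, 1):
            for j in (0, 1):
                counts[i][j] += xi[i][j]
    return [[counts[i][j] / sum(counts[i]) for j in (0, 1)] for i in (0, 1)]


# Toy pairwise posteriors for a chain of length 3 (two transitions).
toy = [[[0.5, 0.1], [0.1, 0.3]],
       [[0.4, 0.2], [0.1, 0.3]]]
A_new = update_transition(toy)
```

Under the factorial HMM, the same update would be applied separately to \({\mathcal {A}}\) and \({\mathcal {B}}\), using the pairwise posteriors of \(\theta \) and \(\gamma \) obtained by marginalising the joint posterior over the other chain.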
Simulation studies
In this section, we conduct a series of simulation studies to evaluate the numerical performance of our CALIS procedures, including the oracle CALIS procedure and the data-driven CALIS procedure. The simulations are divided into two scenarios according to the mechanism used to generate the observed values. In Scenario 1, the observed values are generated from the factorial HMM described in Sect. 2.2. In Scenario 2, to simulate a more realistic dependence structure among SNPs, we simulate genotypes based on the HapMap3 dataset, generate case-control subjects via a logistic regression model, and calculate the observed value of each genetic locus using existing test methods. Note that the number of components L is known in the settings of Scenario 1, whereas L is unknown in Scenario 2 and is selected by BIC.
Scenario 1: the factorial HMM dependence structure
In this scenario, the simulation results are based on 100 replications and the number of hypotheses is set to 3000. Consider the factorial HMM described in Sect. 2.2. The states of the primary hypotheses \(\{\theta _i\}^{3000}_{i=1}\) are generated with the transition matrix \({\mathcal {A}}=(0.95,0.05;0.1,0.9)\) and the initial distribution (0.95, 0.05). Similarly, the existence statuses of covariate effects \(\{\gamma _i\}^{3000}_{i=1}\) are generated with the transition matrix \({\mathcal {B}}=(0.9,0.1;0.05,0.95)\) and the initial distribution (0.8, 0.2). The observations \(\{z_i\}_{i=1}^{3000}\) are generated from the four-component mixture model (4) described in Sect. 2.2, where \(F_{0,0}\sim N(0, 1)\), \(F_{0,1}\sim N(1, 1)\), \(F_{1,0}\sim N(1, 1)\), and \(F_{1,1}\) is a normal mixture. Here we assume that the number of components L in \(F_{1,1}\) is known, and we divide this simulation into two cases according to the number of components in the normal mixture (\(L=1\) and 2). In essence, the case \(L=1\) means that all alternatives (\(F_{0,1}, F_{1,0}\) and \(F_{1,1}\)) are normal distributions.
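One replication of this data-generating mechanism can be sketched as follows for the \(L=1\) case. The transition matrices, initial distributions and component means are those stated above; the function names, the random seed and the particular value chosen for the mean of \(F_{1,1}\) are assumptions made only for illustration.

```python
import random


def simulate_chain(P, init, m, rng):
    """Simulate m steps of a two-state Markov chain with transition matrix P."""
    states = [0 if rng.random() < init[0] else 1]
    for _ in range(m - 1):
        prev = states[-1]
        states.append(0 if rng.random() < P[prev][0] else 1)
    return states


def simulate_factorial_hmm(m, A, B, init_theta, init_gamma, means, rng):
    """Draw (theta, gamma, z) from a factorial HMM with unit-variance normals."""
    theta = simulate_chain(A, init_theta, m, rng)
    gamma = simulate_chain(B, init_gamma, m, rng)
    z = [rng.gauss(means[(t, g)], 1.0) for t, g in zip(theta, gamma)]
    return theta, gamma, z


rng = random.Random(2024)          # arbitrary seed, for reproducibility only
A = [[0.95, 0.05], [0.10, 0.90]]
B = [[0.90, 0.10], [0.05, 0.95]]
mu1 = 3.0                          # mean of F_{1,1}; an illustrative choice
means = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): mu1}
theta, gamma, z = simulate_factorial_hmm(
    3000, A, B, (0.95, 0.05), (0.8, 0.2), means, rng)
```

Repeating this draw 100 times and applying a testing procedure to each replicate gives the kind of Monte Carlo summary reported in the figures.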
Case 1 (\(L=1\)): \(F_{1,1}\sim N(\mu _1, 1)\)
In this case, we vary \(\mu _1\) from 1 to 5 in increments of 0.5 and show the simulation results in Fig. 2.
In Fig. 2, panel (a) shows that all four procedures control the FDR at the prespecified level 0.1. However, the LIS procedures (LIS.or and LIS.dd; the oracle and data-driven LIS procedures) are always conservative, with an FDR level around 0.02. The results in panel (b) illustrate that: (1) the FNR values yielded by the CALIS procedures (CALIS.or and CALIS.dd; the oracle and data-driven CALIS procedures) are almost the same; (2) the same holds true for the LIS procedures; (3) the FNR values of the LIS procedures are much higher than those of the CALIS procedures; (4) the FNR of the CALIS procedures decreases as \(\mu _1\) increases from 1 to 5. These results indicate that our CALIS procedures are valid and substantially outperform their competitors by properly exploiting the information on covariate effects.
In addition, we provide an explanation of the conservativeness of the LIS procedures under the preceding model setting by focusing on the special case in which the tests are independent (Efron et al. 2001); for the sake of coherence, the detailed explanation is given in the Supplementary Material.
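For a single replicate, the realised false discovery and false non-discovery proportions can be computed as in the sketch below; averaging them over replications gives Monte Carlo estimates of the FDR and FNR. The function name `fdp_fnp` and the toy inputs are hypothetical.

```python
def fdp_fnp(theta, rejected):
    """False discovery and false non-discovery proportions for one replicate.

    theta: list of true states (1 = non-null, 0 = null).
    rejected: iterable of indices of the rejected hypotheses.
    A proportion is set to 0 when its denominator is 0.
    """
    rejected = set(rejected)
    m = len(theta)
    false_rejections = sum(1 for i in rejected if theta[i] == 0)
    false_acceptances = sum(
        1 for i in range(m) if i not in rejected and theta[i] == 1)
    n_rejected, n_accepted = len(rejected), m - len(rejected)
    fdp = false_rejections / n_rejected if n_rejected else 0.0
    fnp = false_acceptances / n_accepted if n_accepted else 0.0
    return fdp, fnp


# Toy example: five hypotheses, three rejections.
fdp, fnp = fdp_fnp([0, 1, 1, 0, 1], [1, 2, 3])
```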
Case 2 (\(L=2\)): \(F_{1,1}\sim 0.5N(3,1)+0.5N(\mu _2,1)\)
In this case, we vary \(\mu _2\) from 1 to 3 in increments of 0.25 and show the simulation results in Fig. 3.
In Fig. 3, panel (a) shows that all four procedures control the FDR at the nominal level 0.1 approximately. Although the data-driven CALIS procedure attains the largest FDR, it is still acceptable (FDR = 0.107). The LIS procedures are conservative, with an FDR around 0.05. Panel (b) shows that the FNR curves of the CALIS procedures nearly overlap and are uniformly lower than those of the LIS procedures. When \(\mu _2\) is relatively large, the data-driven LIS procedure outperforms the oracle LIS procedure, which may be due to the higher FDR level of the data-driven LIS procedure. By and large, the numerical results agree with those in Case 1.
It is worth noting that the higher power of our procedures is not gained at the cost of a higher FDR level. To verify this point, we further evaluate the sensitivities yielded by these procedures at different FDR levels for fixed \(\mu _1\) under the setting of Case 1, where the sensitivities are calculated as the average proportions of correctly identified SNPs over 100 replications. The results are shown in Fig. 4. We can observe that: (1) the sensitivity curves of the CALIS procedures almost overlap; (2) the same holds true for the LIS procedures except when \(\mu _1=1\); (3) the sensitivity values of our procedures are consistently superior to those of the LIS procedures. These results imply that our CALIS procedures enjoy higher multiple testing efficiency than the LIS procedures at the same FDR level.
Scenario 2: the more realistic SNP-dependence structure
To further compare the numerical performance of our CALIS procedure (CALIS; the CALIS procedure using the covariate-adjusted observed values) with that of the LIS procedures using the covariate-unadjusted and covariate-adjusted observed values (the LIS procedure and the LIS.cov procedure, respectively) under more realistic LD patterns among SNPs, we generate a genotype pool by randomly matching the 340 haplotypes from the JPT (Japanese in Tokyo, Japan) and CHB (Han Chinese in Beijing, China) subjects collected by HapMap3. For a fair comparison, we also include the CALIS procedure using the covariate-unadjusted observed values (the CALIS.uncov procedure). Here the covariate-adjusted and covariate-unadjusted observed values are calculated using the covariate-adjusted association test (Jiang and Zhang 2011) and the covariate-unadjusted association test (Zhang et al. 2010), respectively. To focus on the main points, we restrict attention to a region of the first chromosome consisting of 1000 SNPs. Four SNPs are selected as the disease-associated SNPs (with relative risk 1.5), among which two SNPs are far apart and the other two are close (separated by 3 SNPs). In addition, we consider two environmental factors, namely, a continuous covariate \(E_{co}\) generated from N(0, 1) and a categorical covariate \(E_{ca}\) generated from the binomial distribution B(1, 0.5). The phenotype Y is generated according to a logistic regression model:
\[\text{logit}\,P(Y=1\mid G,E_{co},E_{ca})=\beta _0+\beta ^{T}G+\gamma _1E_{co}+\gamma _2E_{ca},\]
where \(\beta ={(\beta _1,\beta _2,\beta _3,\beta _4)}^{T}, G={(G_1,G_2,G_3,G_4)}^{T}\) and \(G_i\) is the genotype of the ith causal SNP. We set \(\beta _1=\beta _2=\beta _3=\beta _4= \log (1.5)\) and consider the following settings with different \(\gamma _1\) and \(\gamma _2\).

(a) Setting 1: \(\gamma _1=3.5,\gamma _2=3.5\).
(b) Setting 2: \(\gamma _1=4,\gamma _2=4\).
(c) Setting 3: \(\gamma _1=4.5,\gamma _2=4.5\).
Correspondingly, \(\beta _0\) is set to \(-11.45, -12.75\), and \(-14\) so that the prevalence of the disease is controlled at 0.02. For each setting, we repeatedly generate the disease status for each individual until we obtain 1000 cases and 1000 controls. The four disease-associated SNPs are removed from our simulated data set. Then, the twenty-one SNPs which comprise the 3 adjacent SNPs on each side of the 4 disease-causal SNPs are defined as relevant SNPs. We evaluate the performance of a testing procedure by its selection rate of relevant SNPs. As mentioned earlier, the number of components L is unknown in this scenario and is selected by BIC. The simulation is repeated 100 times and the results for the above model settings are displayed in Figs. 5, 6 and 7, respectively.
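The case-control sampling step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in our simulations: the function name `simulate_case_control` and the minor allele frequencies `maf` are hypothetical, and the causal genotypes are drawn independently from a binomial distribution rather than from the HapMap3 haplotype pool.

```python
import numpy as np

def simulate_case_control(beta0, beta, gamma1, gamma2, maf,
                          n_cases=1000, n_controls=1000,
                          batch=50_000, seed=0):
    """Draw individuals from the logistic model until both quotas are filled.

    Returns two arrays (cases, controls) whose columns are the causal
    genotypes followed by E_co and E_ca.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    maf = np.asarray(maf, dtype=float)
    cases, controls = [], []
    n_case = n_ctrl = 0
    while n_case < n_cases or n_ctrl < n_controls:
        # Assumption: independent genotypes in {0, 1, 2}; the paper instead
        # matches HapMap3 haplotypes to preserve realistic LD.
        G = rng.binomial(2, maf, size=(batch, maf.size))
        e_co = rng.normal(size=batch)             # continuous covariate, N(0, 1)
        e_ca = rng.binomial(1, 0.5, size=batch)   # categorical covariate, B(1, 0.5)
        eta = beta0 + G @ beta + gamma1 * e_co + gamma2 * e_ca
        y = rng.random(batch) < 1.0 / (1.0 + np.exp(-eta))  # disease status
        X = np.column_stack([G, e_co, e_ca])
        cases.append(X[y])
        controls.append(X[~y])
        n_case += int(y.sum())
        n_ctrl += int((~y).sum())
    # Keep exactly the requested numbers of cases and controls.
    return np.vstack(cases)[:n_cases], np.vstack(controls)[:n_controls]

# Setting 1 with hypothetical allele frequencies of 0.3 for all causal SNPs.
cases, controls = simulate_case_control(
    beta0=-11.45, beta=[np.log(1.5)] * 4, gamma1=3.5, gamma2=3.5, maf=[0.3] * 4)
```

Because the disease is rare, most simulated individuals are controls, so the loop is driven by the time needed to collect enough cases.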
From Figs. 5a, 6a and 7a, we can see that the sensitivity (defined as the percentage of true positives that are selected among the top K SNPs) increases as K varies from 0 to 300. The sensitivity yielded by the CALIS procedure is uniformly larger than those of the LIS procedures and the CALIS.uncov procedure. This indicates that our CALIS procedure achieves higher ranking efficiency and can discover more true positives at the same number of rejections. It is interesting to note that the difference in sensitivity between the CALIS procedure and the LIS procedures increases with the values of \(\gamma _1\) and \(\gamma _2\). This illustrates that covariate adjustment is especially helpful when the covariate effect is large. We can also observe that the sensitivity yielded by the CALIS procedure dominates that of the CALIS.uncov procedure. Between the two LIS procedures (LIS.cov and LIS), the LIS.cov procedure is preferable. This reveals that taking the covariate effect into account is helpful in multiple testing.
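For concreteness, the top K sensitivity used in panels (a) can be computed as in the following sketch. The function name `top_k_sensitivity` and the toy inputs are hypothetical; we assume that a smaller value of the test statistic (a CALIS or LIS value) indicates stronger evidence, and that ties are broken by position.

```python
import numpy as np

def top_k_sensitivity(scores, relevant, ks):
    """Fraction of relevant SNPs found among the K smallest scores, for each K."""
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    order = np.argsort(scores, kind="stable")        # most significant first
    # hits[K] = number of relevant SNPs among the top K (hits[0] = 0).
    hits = np.concatenate([[0], np.cumsum(relevant[order])])
    return hits[np.asarray(ks)] / relevant.sum()

# Toy example: five SNPs, three of which are relevant.
sens = top_k_sensitivity(scores=[0.01, 0.9, 0.2, 0.05, 0.7],
                         relevant=[1, 0, 0, 1, 1],
                         ks=[0, 2, 3, 5])
```

Averaging such curves over the replications gives a curve of the kind reported in the figures.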
From Figs. 5b, 6b and 7b, we can see an alternative measure of ranking efficiency, namely, the ROC curve. Here, the ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at different threshold settings over 100 replications. It is apparent that the ROC curves yielded by the CALIS procedure dominate those of its competitors, and the results are broadly consistent with those in Scenario 1.
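A minimal sketch of how one such ROC curve can be traced from a ranked list is given below. The names `roc_points` and `auc` are hypothetical, ties between scores are not treated specially, and the area under the curve is added here only as a convenient one-number summary; it is not reported in the figures.

```python
import numpy as np

def roc_points(scores, relevant):
    """Return (FPR, TPR) as the rejection threshold sweeps through the ranking."""
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    order = np.argsort(scores, kind="stable")        # smaller score = rejected first
    tp = np.concatenate([[0], np.cumsum(relevant[order])])
    fp = np.concatenate([[0], np.cumsum(~relevant[order])])
    return fp / (~relevant).sum(), tp / relevant.sum()

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr = roc_points(scores=[0.01, 0.9, 0.2, 0.05, 0.7],
                      relevant=[1, 0, 0, 1, 1])
area = auc(fpr, tpr)
```

One curve dominating another, in the sense used above, means that its TPR is at least as large at every FPR.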
To evaluate the numerical performance of the CALIS procedure when there is no covariate effect, we conduct additional simulation studies in the setting where \(\gamma _1=\gamma _2=0\). We set \(\beta _1=\beta _2=\beta _3=\beta _4= \log (1.5)\) and \(\beta _0=-4.8\) so that the prevalence of the disease is controlled at 0.02. Due to the absence of covariate effects, we choose the LIS procedure using the covariate-unadjusted association test (Zhang et al. 2010) as a benchmark. The simulation is repeated 100 times, and the results are displayed in Fig. 8. We can observe that the CALIS procedure performs well when the top K or FPR is small, although it is somewhat conservative overall for \(\gamma _1=\gamma _2=0\). Interestingly, the sensitivity and the ROC curve of the CALIS.uncov procedure are not much different from those of the LIS procedure. This suggests that the CALIS.uncov procedure can serve as an alternative when there is no covariate effect.
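The choice of \(\beta _0\) for a target prevalence can be illustrated by a simple Monte Carlo calibration. The sketch below is an assumption-laden illustration rather than the procedure we used: `prevalence` and `calibrate_intercept` are hypothetical names, the causal genotypes are drawn independently with hypothetical allele frequencies, and the resulting intercept therefore need not match the value reported above, which depends on the actual genotype pool.

```python
import numpy as np

def prevalence(beta0, beta, gamma1, gamma2, maf, n=200_000, seed=0):
    """Monte Carlo estimate of P(Y = 1) under the logistic model."""
    rng = np.random.default_rng(seed)   # fixed seed: same draws for every beta0
    maf = np.asarray(maf, dtype=float)
    G = rng.binomial(2, maf, size=(n, maf.size))   # assumed independent genotypes
    e_co = rng.normal(size=n)
    e_ca = rng.binomial(1, 0.5, size=n)
    eta = beta0 + G @ np.asarray(beta, dtype=float) + gamma1 * e_co + gamma2 * e_ca
    return float(np.mean(1.0 / (1.0 + np.exp(-eta))))

def calibrate_intercept(target, beta, gamma1, gamma2, maf,
                        lo=-30.0, hi=5.0, tol=1e-4):
    """Bisection on beta0; the estimated prevalence is increasing in beta0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prevalence(mid, beta, gamma1, gamma2, maf) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# No covariate effect (gamma1 = gamma2 = 0), hypothetical allele frequencies.
beta = [np.log(1.5)] * 4
b0 = calibrate_intercept(0.02, beta, 0.0, 0.0, maf=[0.3] * 4)
```

Reusing the same random draws for every candidate intercept makes the estimated prevalence a monotone function of \(\beta _0\), which is what allows bisection to be used.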
Application to bipolar disorder datasets
Bipolar disorder (BD) is a manic-depressive illness that causes periods of depression and periods of elevated mood. There is a body of evidence for substantial genetic and environmental contributions to the risk of BD (Merikangas et al. 1998). However, the pathogenic mechanism of BD is not clearly understood. With the purpose of identifying SNPs associated with BD while adjusting for covariates, we apply our new procedure to an analysis of BD datasets. The datasets were collected by the Wellcome Trust Case Control Consortium (WTCCC) and contain 1998 cases and 3004 controls, of which 1504 control samples are from the 1958 Birth Cohort (58C) and 1500 control samples are from the UK Blood Service (UKBS). The study subjects were genotyped using GeneChip 500k arrays at the Affymetrix Services Lab. In addition, we consider gender and age at recruitment as covariates.
A series of quality control (QC) procedures is performed before the real data analysis. We exclude 130 samples from the BD cohort, 24 samples from the 58C cohort and 42 samples from the UKBS cohort owing to high missing rate, overall heterozygosity and non-European ancestry. In addition, we eliminate the SNPs in accordance with the exclusion list provided by WTCCC. Furthermore, the SNPs with minor allele frequency less than 5% are also excluded.
It has been reported that fifteen SNPs are associated with BD: thirteen SNPs show moderate evidence of association with BD according to Burton et al. (2007), and the other two (rs7680321 and rs11089599; Krystal et al. 2002 and Vawter et al. 2002) are related to GABA neurotransmission in GABRB1 and synaptic function in SYN3, respectively. Note that all of the suspected SNPs are located on ten different chromosomes. To illustrate the main point, we perform our novel procedure only on these ten chromosomes separately and make a comparison with the classical LIS procedure (the LIS.cov procedure). As before, we use the BIC to select the number of components L. The observed values are calculated by using the covariate-adjusted association test proposed by Jiang and Zhang (2011). It has been shown that ranking LIS.cov values across all chromosomes can achieve higher testing efficiency (Wei et al. 2009). Hence, we first calculate chromosome-specific CALIS values and LIS.cov values, and then rank the CALIS values and LIS.cov values across all ten chromosomes. The detailed results are shown in Table 2. The FDR level is set to \(1\times 10^{-7}\) for both the CALIS procedure and the LIS.cov procedure. There are 249 SNPs identified by the CALIS procedure and 202 SNPs identified by the LIS.cov procedure from a total of 182,072 SNPs. Furthermore, among the 15 suspected SNPs described above, six are identified by the CALIS procedure while only one (rs1344484) is identified by the LIS.cov procedure. Note that a smaller CALIS value or LIS.cov value indicates that the SNP is more likely to be associated with the disease. Specifically, in Table 2, we can observe that the CALIS value is uniformly smaller than the LIS.cov value for each identified SNP. This implies that, by taking the covariate adjustment into account, the CALIS procedure provides stronger evidence of association between the suspected SNPs and BD.
We can also see that the CALIS values uniformly have smaller ranks than the LIS.cov values. This illustrates that the CALIS procedure ranks the SNPs that are associated with BD more efficiently.
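The pooling and thresholding step can be sketched as follows, assuming that the pooled CALIS values are thresholded by the same step-up rule as the LIS procedure of Sun and Cai (2009): sort the pooled values in increasing order and reject the k smallest, where k is the largest index at which the running average does not exceed the nominal level. The function name `pooled_step_up` and the toy values are hypothetical.

```python
import numpy as np

def pooled_step_up(values_by_chrom, alpha):
    """Pool chromosome-specific values and apply the LIS-type step-up rule.

    values_by_chrom maps a chromosome label to an array of LIS/CALIS-type
    values. Returns the rejected SNPs as (chromosome, index) pairs, ordered
    from the strongest to the weakest evidence.
    """
    keys = [(c, j) for c, v in values_by_chrom.items() for j in range(len(v))]
    values = np.concatenate(
        [np.asarray(v, dtype=float) for v in values_by_chrom.values()])
    order = np.argsort(values, kind="stable")
    # Running average of the sorted values; nondecreasing in the index.
    running_mean = np.cumsum(values[order]) / np.arange(1, values.size + 1)
    passed = np.nonzero(running_mean <= alpha)[0]
    k = int(passed[-1]) + 1 if passed.size else 0
    return [keys[i] for i in order[:k]]

# Toy example with two chromosomes and a nominal level of 0.01.
rejected = pooled_step_up({"chr1": [0.001, 0.6, 0.02],
                           "chr2": [0.9, 0.004]}, alpha=0.01)
```

In this toy example the third rejected value (0.02) exceeds the nominal level on its own; it is rejected because the rule controls the average of the rejected values rather than each value individually.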
We further apply the LIS procedure and the CALIS procedure with the covariates (gender and age) excluded in this real data analysis; these are denoted by the LIS procedure and the CALIS.uncov procedure, respectively. The observed values in this case are calculated by employing the covariate-unadjusted association test (Zhang et al. 2010). There are 66 SNPs identified by the CALIS.uncov procedure and 64 SNPs identified by the LIS procedure. Furthermore, among the 15 suspected SNPs, one is identified by the CALIS.uncov procedure, while none is identified by the LIS procedure. The detailed results are listed in the Supplementary Material. Based on the results of Table 2 and Table S1 in the Supplementary Material, we can conclude that accommodating covariate adjustment in multiple testing can improve the chance of identifying disease-related SNPs.
Discussion
In this paper, we propose a covariate-adjusted multiple testing procedure based on a factorial HMM. The new procedure can adjust for covariate effects when detecting the effects of main interest associated with the outcomes. The theoretical results show that our procedure is valid and optimal when taking covariate effects into account. Simulations and real data analysis show that the efficiency of multiple testing can be substantially improved by employing our new procedure.
Our novel procedure can be extended in several ways. First, the assumption that the transition probabilities are invariant might be too strong. Kuan and Chiang (2012) developed a multiple testing procedure based on the nonhomogeneous HMM and allowed for exogenous information to be incorporated systematically. This approach may offer a promising way to address this issue. However, a new problem arises when using a nonhomogeneous HMM to characterize the dependence structure in the tests: to the best of our knowledge, the consistency of the estimates of the nonhomogeneous HMM has not been investigated. Second, in practice, we would like to discover the SNPs together with the environmental factors and the SNP-environment interactions that are truly associated with the disease. It is meaningful to develop a multiple testing procedure which can test these effects simultaneously while allowing for the dependence structure.
Another interesting question is how to relax or remove GMRC for our CALIS procedure, which would make it more widely applicable in practice. Although GMRC is a straightforward extension of the MRC proposed by Sun and Cai (2009), preserving the equivalence between the optimal statistics in the weighted classification problem and the covariate-adjusted multiple testing problem without GMRC is very challenging for multiple testing under dependence. Recently, Cai et al. (2019) developed the covariate-assisted ranking and screening (CARS) procedure for large-scale two-sample inference. They have shown that the CARS procedure controls the mFDR with the largest expected number of true positives (ETP) without MRC, which seems to open up a possible approach to the above problem. However, as Cai et al. (2019) pointed out, the theoretical results of CARS cannot be applied to dependent tests directly. We plan to pursue this problem in future research.
References
Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41(1):164–171
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol) 57:289–300
Benjamini Y, Hochberg Y (2000) On the adaptive control of the false discovery rate in multiple testing with independent statistics. J Educ Behav Stat 25(1):60–83
Bickel PJ, Ritov YA, Ryden T (1998) Asymptotic normality of the maximum-likelihood estimator for general hidden Markov models. Ann Stat 26(4):1614–1635
Burton PR, Clayton DG, Cardon LR, Craddock N, Deloukas P, Duncanson A, Kwiatkowski DP, McCarthy MI, Ouwehand WH, Samani NJ (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature 447(7145):661–678
Cai TT, Sun W, Wang W (2019) Covariate-assisted ranking and screening for large-scale two-sample inference. J R Stat Soc Ser B (Methodol) 81(2):187–234
Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96(456):1151–1160
Genovese C, Wasserman L (2002) Operating characteristics and extensions of the false discovery rate procedure. J R Stat Soc Ser B (Methodol) 64(3):499–517
Genovese C, Wasserman L (2004) A stochastic process approach to false discovery control. Ann Stat 32(3):1035–1061
Ghahramani Z, Jordan MI (1997) Factorial hidden Markov models. Mach Learn 29(2–3):245–273
Jiang Y, Zhang H (2011) Propensity score-based nonparametric test revealing genetic variants underlying bipolar disorder. Genet Epidemiol 35(2):125–132
Krystal JH, Sanacora G, Blumberg H, Anand A, Charney DS, Marek G, Epperson CN, Goddard A, Mason GF (2002) Glutamate and GABA systems as targets for novel antidepressant and mood-stabilizing treatments. Mol Psychiatry 7(1):S71
Kuan PF, Chiang DY (2012) Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation. Biometrics 68(3):774–783
Lei L, Fithian W (2018) AdaPT: an interactive procedure for multiple testing with side information. J R Stat Soc Ser B (Methodol) 80(4):649–679
Leroux BG (1992) Maximum-likelihood estimation for hidden Markov models. Stoch Process Their Appl 40(1):127–143
Liang K, Nettleton D (2010) A hidden Markov model approach to testing multiple hypotheses on a tree-transformed gene ontology graph. J Am Stat Assoc 105(492):1444–1454
Liang K, Du C, You H, Nettleton D (2018) A hidden Markov tree model for testing multiple hypotheses corresponding to gene ontology gene sets. BMC Bioinf 19(1):107
Liu J, Zhang C, Page D (2016) Multiple testing under dependence via graphical models. Ann Appl Stat 10(3):1699–1724
Merikangas KR, Mehta RL, Molnar BE, Walters EE, Swendsen JD, Aguilar-Gaziola S, Bijl R, Borges G, Caraveo-Anduaga JJ, Dewit D (1998) Comorbidity of substance use disorders with mood and anxiety disorders: results of the international consortium in psychiatric epidemiology. Addict Behav 23(6):893–907
Newton MA, Noueiry AO, Sarkar D, Ahlquist P (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5(2):155–176
Schork AJ, Thompson WK, Phillip P, Ali T, Cooper J, R, Sullivan PF, Kelsoe JR, O’Donovan MC, Helena F, Schork NJ (2013) All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs. PLoS Genet 9(4):e1003449
Shu H, Nan B, Koeppe R (2015) Multiple testing for neuroimaging via hidden Markov random field. Biometrics 71(3):741–750
Storey JD (2002) A direct approach to false discovery rates. J R Stat Soc Ser B (Methodol) 64(3):479–498
Sun W, Cai TT (2007) Oracle and adaptive compound decision rules for false discovery rate control. J Am Stat Assoc 102(479):901–912
Sun W, Cai TT (2009) Large-scale multiple testing under dependence. J R Stat Soc Ser B (Methodol) 71(2):393–424
Sun W, Reich BJ, Cai TT, Guindani M, Schwartzman A (2015) False discovery control in large-scale spatial multiple testing. J R Stat Soc Ser B (Methodol) 77(1):59–83
Vawter MP, Thatcher L, Usen N, Hyde TM, Kleinman JE, Freed WJ (2002) Reduction of synapsin in the hippocampus of patients with bipolar disorder and schizophrenia. Mol Psychiatry 7(6):571
Wang X, Ye Y, Zhang H (2006) Family-based association tests for ordinal traits adjusting for covariates. Genet Epidemiol 30(8):728–736
Wei Z, Sun W, Wang K, Hakonarson H (2009) Multiple testing in genome-wide association studies via hidden Markov models. Bioinformatics 25(21):2802–2808
Xiao J, Zhu W, Guo J (2013) Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models. BMC Bioinf 14(1):282
Zablocki RW, Schork AJ, Levine RA, Andreassen OA, Dale AM, Thompson WK (2014) Covariate-modulated local false discovery rate for genome-wide association studies. Bioinformatics 30(15):2098–2104
Zablocki RW, Levine RA, Schork AJ, Xu S, Wang Y, Fan CC, Thompson WK (2017) Semiparametric covariate-modulated local false discovery rate for genome-wide association studies. Ann Appl Stat 11(4):2252–2269
Zhang H, Liu CT, Wang X (2010) An association test for multiple traits based on the generalized Kendall’s tau. J Am Stat Assoc 105(490):473–481
Zhu W, Jiang Y, Zhang H (2012) Nonparametric covariate-adjusted association tests based on the generalized Kendall’s tau. J Am Stat Assoc 107(497):1–11
Acknowledgements
The authors are grateful to the editor, the associate editor, and two anonymous reviewers for their constructive comments that helped us improve the article substantially. This work is supported in part by the National Natural Science Foundation of China (nos. 11771072 and 11371083) and the Science and Technology Development Plan of Jilin Province (no. 20191008004TC). The authors also thank WTCCC for permission to use the GWAS data.
Cite this article
Cui, T., Wang, P. & Zhu, W. Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models. TEST (2021). https://doi.org/10.1007/s11749020007468
Keywords
 Factorial hidden Markov model
 Covariate adjustment
 Multiple hypotheses testing
 False discovery rate
 GWAS
Mathematics Subject Classification
 62M02
 62P10
 62E20