Abstract
Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariates on both the disease condition and the network. To address this issue, we propose a general covariate-adjusted test for differential network analysis. Our method assesses differential network connectivity by testing the null hypothesis that the network is the same for individuals who have identical covariates and only differ in disease status. We show empirically in a simulation study that the covariate-adjusted test exhibits improved type-I error control compared with naïve hypothesis testing procedures that do not account for covariates. We additionally show that there are settings in which our proposed methodology provides improved power to detect differential connections. We illustrate our method by applying it to detect differences in breast cancer gene co-expression networks by subtype.
Introduction
Complex diseases are often associated with aberrations in biological networks, such as gene regulatory networks and brain functional or structural connectivity networks (Barabási et al. 2011). Performing differential network analysis, or identifying connections in biological networks that change with disease condition, can provide insights into the disease mechanisms and lead to the identification of network-based biomarkers (Ideker and Krogan, 2012; de la Fuente, 2010).
Probabilistic graphical models are commonly used to summarize the conditional independence structure of a set of nodes in a biological network. A common approach to differential network analysis is to first estimate the graph corresponding to each disease condition and then assess between-condition differences in the graph. For instance, when using Gaussian graphical models, one can learn the network by estimating the inverse covariance matrix using the graphical LASSO (Friedman et al. 2008); one can then identify changes in the inverse covariance matrix associated with disease condition (Zhao et al. 2014; Xia et al. 2015; He et al. 2019). Alternatively, the condition-specific networks can be estimated using neighborhood selection (Meinshausen and Bühlmann, 2006); in this approach, partial correlations among nodes are estimated by fitting a series of linear regressions in which one node is treated as the outcome, and the remaining nodes are treated as regressors. Changes in the network can then be delineated from differences in the regression coefficients by disease condition (Belilovsky et al. 2016; Xia et al. 2018). More generally, the condition-specific networks are often modeled using exponential family pairwise interaction models (Lin et al. 2016; Yang et al. 2015; Yu et al. 2019; Yu et al. 2020).
The approaches to differential network analysis described above may lead to the detection of between-group differences in biological networks that are not necessarily meaningful, particularly when the condition-specific networks depend on covariates (e.g., age and sex). This is because between-group network differences can be induced by confounding variables, i.e., variables that are associated with both the within-group networks and the disease condition. In such cases, the network differences by disease condition may only reflect the association between the confounding variable and the disease. It is therefore important to account for the relationship between covariates and biological networks when performing differential network analysis.
In this paper, we propose a two-sample test for differential network analysis that accounts for within-group dependence of the networks on covariates. More specifically, we propose to perform covariate-adjusted inference using a class of pairwise interaction models for the within-group networks. Our approach treats each condition-specific network as a function of the covariates. It then performs a hypothesis test for equivalence of these functions. To accommodate the high-dimensional setting, in which the number of nodes in the network is large relative to the number of samples collected, we propose to estimate the networks using a regularized estimator and to perform hypothesis testing using a bias-corrected version of the regularized estimate (van de Geer, 2016).
Our proposal is related to existing literature on modeling networks as functions of a small number of variables. For example, there are various proposals for estimating high-dimensional inverse covariance matrices, conditional upon continuous low-dimensional features (Zhou et al. 2010; Wang and Kolar, 2014). Also related are methods for regularized estimation of high-dimensional varying coefficient models, wherein the regression coefficients are functions of a small number of covariates (Wang and Xia, 2009). Our method is similar but places a particular emphasis on hypothesis testing in order to assess the statistical significance of observed changes in the network. Our approach lays the foundation for a general class of graphical models and is the first, to the best of our knowledge, to perform covariate-adjusted hypothesis tests for differential network analysis.
The rest of the paper is organized as follows. In Section 2, we begin with a broad overview of our proposed framework for covariate-adjusted differential network analysis in pairwise interaction exponential family models and introduce some working examples. In the following sections, we specialize our framework by considering two different approaches for estimation and inference: In Section 3, we describe a method that uses neighborhood selection (Meinshausen and Bühlmann, 2006; Chen et al. 2015; Yang et al. 2015), and in Section 4, we discuss an alternative estimation approach that utilizes the score matching framework of Hyvärinen (2005, 2007). We assess the performance of our proposed methodology on synthetic data in Section 5 and apply it to a breast cancer data set from The Cancer Genome Atlas (TCGA) (Weinstein et al. 2013) in Section 6. We conclude with a brief discussion in Section 7.
Overview of the Proposed Framework
Differential Network Analysis without Covariate Adjustment
To formalize our problem, we begin by introducing some notation. We compare networks between two groups, labeled by g ∈{I,II}. We obtain measurements of p variables \(X^{g} = \left ({X^{g}_{1}},\ldots ,{X_{p}^{g}}\right )^{\top }\), corresponding to nodes in a graphical model (Maathuis et al. 2018), on n^{I} subjects in group I and n^{II} subjects in group II. We define \(\mathcal {X}\) as the sample space of X^{g}. Let \(X^{g}_{i,j}\) denote the data for node j for subject i in group g, and let \(\mathbf {X}_{j}^{g} = (X^{g}_{1,j}, \ldots , X^{g}_{n^{g},j})^{\top }\) be an n^{g}-dimensional vector of measurements on node j for group g.
Our objective is to determine whether the association between variables X_{j} and X_{k}, conditional upon all other variables, differs by group. Our approach is to specify a model for X^{g} such that the conditional dependence between any two nodes \({X^{g}_{j}}\) and \({X^{g}_{k}}\) can be represented by a single scalar parameter \(\beta ^{g,*}_{j,k}\). If the association between nodes j and k is the same in both groups I and II, then \(\beta ^{\mathrm {I},*}_{j,k} = \beta ^{\text {II},*}_{j,k}\). Conversely, if \(\beta ^{\mathrm {I},*}_{j,k} \neq \beta ^{\text {II},*}_{j,k}\), we say nodes j and k are differentially connected. We assess differential connectivity by performing a test of the null hypothesis
We consider a general class of exponential family pairwise interaction models. For x = (x_{1},…,x_{p})^{⊤}, we assume the density function for X^{g} takes the form
where ψ_{j,k} and μ_{j} are fixed and known functions, β^{g,∗} is a p × p matrix with elements \(\beta ^{g,*}_{j,k}\), and U(β^{g,∗}) is the log-partition function. The dependence between \({X^{g}_{j}}\) and \({X^{g}_{k}}\) is measured by \(\beta ^{g,*}_{j,k}\), and nodes j and k are conditionally independent in group g if and only if \(\beta ^{g,*}_{j,k} = 0\).
This class of exponential family distributions is rich and includes several models that have been studied previously in the graphical modeling literature. One such example is the Gaussian graphical model, perhaps the most widely used graphical model for continuous data. For \(x \in \mathbb {R}^{p}\), the density function for mean-centered Gaussian random vectors can be expressed as
and is thus a special case of Eq. 2 with ψ_{j,k} = −x_{j}x_{k} and μ_{j} = 0. The nonnegative Gaussian density, which takes the form of Eq. 3 with the constraint that x takes values in \([0,\infty )^{p}\), also belongs to the exponential family class. Another canonical example is the Ising model, commonly used for studying conditional dependencies among binary random variables. For x ∈{0,1}^{p}, the density function for the Ising model can be expressed as
Additional examples include the Poisson model, the exponential graphical model, and conditionallyspecified mixed graphical models (Yang et al. 2015; Chen et al. 2015).
When asymptotically normal estimates of \(\beta ^{\mathrm {I},*}_{j,k}\) and \(\beta ^{\text {II},*}_{j,k}\) are available, one can perform a calibrated test of \(H^{0}_{j,k}\) based on the difference between the estimates. In many cases, asymptotically normal estimates can be obtained using well-established methodology. For instance, when the log-partition function U(β^{g,∗}) is available in closed form and is tractable, one can obtain estimates via (penalized) maximum likelihood. This is a standard approach in the Gaussian setting, in which case the log-partition function is easy to compute. However, this is not the case for other exponential family models. Likelihood-based estimation strategies are thus generally difficult to implement. In this paper, we consider two alternative strategies that have been proposed to overcome these computational challenges and are more broadly applicable.
The first approach we discuss is neighborhood selection (Chen et al. 2015; Meinshausen and Bühlmann, 2006; Yang et al. 2015). Consider a subclass of exponential family graphical models for which the conditional density function for any node \({X^{g}_{j}}\) given the remaining nodes belongs to a univariate exponential family model. Because the log-partition function in univariate exponential family models is available in closed form, it is computationally feasible to estimate each conditional density function. By estimating the conditional density functions, one can identify the neighbors of node j, that is, the nodes upon which the conditional distribution of \({X^{g}_{j}}\) depends. This approach was first proposed as an alternative to maximum likelihood estimation for Gaussian graphical models (Meinshausen and Bühlmann, 2006). To describe our approach, we focus on the Gaussian case, though neighborhood selection is more widely applicable and can be used for modeling dependencies among, e.g., Poisson, binomial, and exponential random variables as well (Chen et al. 2015; Yang et al. 2015).
In Gaussian graphical models, the dependency of node j on all other nodes can be determined based on the linear model
The regression coefficients \(\beta ^{g,*}_{j,k}\) measure the strength of linear association between nodes j and k conditional upon all other nodes and are zero if and only if nodes j and k are conditionally independent; \(\beta ^{g,*}_{j,0}\) is an intercept term and is zero if all nodes are mean-centered. (We acknowledge a slight abuse of notation here, as the regression coefficients in Eq. 4 are not equivalent to the parameters in Eq. 2. However, either estimand fully characterizes conditional independence.) In the low-dimensional setting (i.e., p ≪ n^{g}), statistically efficient and asymptotically normal estimates of the regression coefficients can be readily obtained via ordinary least squares. In high dimensions (i.e., p ≥ n^{g}), the ordinary least squares estimates are not well-defined, so to obtain consistent estimates we typically rely upon regularized estimators such as the LASSO and the elastic net (Tibshirani, 1996; Zou and Hastie, 2005). Regularized estimators are generally biased and have intractable sampling distributions, and as such, are unsuitable for performing formal statistical inference. However, several methods have recently emerged for obtaining asymptotically normal estimates by correcting the bias of regularized estimators (Javanmard and Montanari, 2014; van de Geer et al. 2014; Zhang and Zhang, 2014).
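As a concrete illustration (a minimal simulated sketch, not the paper's implementation), the following recovers the neighborhood of a node in a small Gaussian graphical model by ordinary least squares; the population coefficient on node k when regressing node j on the remaining nodes is −Θ_{jk}/Θ_{jj}, where Θ is the precision matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-node Gaussian graphical model with precision matrix Theta:
# nodes 1 and 2 are conditionally dependent; node 3 is isolated.
Theta = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])
Sigma = np.linalg.inv(Theta)
n = 50_000
X = rng.multivariate_normal(np.zeros(3), Sigma, size=n)

# Neighborhood selection: regress node 1 on the remaining nodes by OLS.
# The population coefficient on node k is -Theta[0, k] / Theta[0, 0],
# which is zero iff nodes 1 and k are conditionally independent.
y, Z = X[:, 0], X[:, 1:]
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)  # approx. [-0.5, 0.0]
```

Repeating this regression with each node as the outcome recovers the full zero pattern of Θ.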
The second computationally efficient approach we consider is to estimate the density function using the score matching framework of Hyvärinen (2005, 2007). Hyvärinen derives a loss function for estimation of density functions for continuous random variables that is based on the gradient of the logdensity with respect to the observations. As such, the score matching loss does not depend on the logpartition function in exponential family models. Moreover, when the joint distribution for X^{g} belongs to an exponential family model, the loss is quadratic in the unknown parameters, allowing for efficient computation. In low dimensions, the minimizer of the score matching loss is consistent and asymptotically normal. In high dimensions, one can obtain asymptotically normal estimates by minimizing a regularized version of the score matching loss to obtain an initial estimate (Lin et al. 2016; Yu et al. 2019) and subsequently correcting for the bias induced by regularization (Yu et al. 2020).
Covariate-Adjusted Differential Network Analysis
We now consider the setting in which the within-group networks depend on covariates. We denote by W^{g} a q-dimensional random vector of covariate measurements for group g, and we define \(\mathcal {W}\) as the sample space of W^{g}. Let \(W^{g}_{i,r}\) refer to the value of covariate r for subject i in group g, and let \({W^{g}_{i}} = (W^{g}_{i,1},\ldots ,W^{g}_{i,q})^{\top }\) be a q-dimensional vector containing all covariates for subject i in group g. We assume the number of covariates is small relative to the sample size (i.e., q ≪ n^{g}).
To study the dependence of the within-group networks on the covariates, we specify a model for the nodes X^{g} given the covariates W^{g} that allows the inter-node dependencies to vary as a function of W^{g}. The model defines a function \(\eta ^{g,*}_{j,k}\) that takes as input a vector of covariates w and returns a measure of association between nodes j and k for subjects in group g with covariates equal to w. One can interpret \(\eta ^{g,*}_{j,k}\) as a conditional version of \(\beta ^{g,*}_{j,k}\), given the covariates.
We assume that \(\eta ^{g,*}_{j,k}\) can be written as a low-dimensional linear basis expansion in W^{g} of dimension d — that is,
where \(\phi : \mathcal {W} \to \mathbb {R}^{d}\) is a map from a vector of covariates to its basis expansion, \(\alpha _{j,k}^{g,*}\) is a d-dimensional vector, and 〈⋅,⋅〉 denotes the vector inner product. Let ϕ_{c}(w) refer to the cth element of ϕ(w). One can take the simple approach of specifying ϕ as a linear basis, \(\phi (w) = \left (1, w_{1}, \ldots , w_{q}\right )\) for \(w \in \mathcal {W}\) (so that d = q + 1), though more flexible choices such as polynomial or B-spline bases can also be considered. It may be preferable to specify ϕ so that \(\eta ^{g,*}_{j,k}\) is an additive function of the covariates. This allows one to easily assess the effect of any specific covariate on the network by estimating the subvector of \(\alpha ^{g,*}_{j,k}\) that is relevant to the covariate of interest.
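A minimal sketch of two possible choices of ϕ (the function names and the quadratic basis are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def phi_linear(w):
    # Linear basis phi(w) = (1, w_1, ..., w_q); dimension d = q + 1.
    return np.concatenate(([1.0], np.asarray(w, dtype=float)))

def phi_poly2(w):
    # Additive quadratic basis (1, w_1, w_1^2, ..., w_q, w_q^2): each
    # covariate contributes its own sub-vector of coefficients, so its
    # effect on an edge can be read off separately.
    w = np.asarray(w, dtype=float)
    return np.concatenate(([1.0], np.column_stack([w, w ** 2]).ravel()))

print(phi_linear([0.5, 2.0]))  # (1, 0.5, 2)
print(phi_poly2([0.5, 2.0]))   # (1, 0.5, 0.25, 2, 4)
```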
When the association between nodes j and k does not depend on group membership, \(\eta ^{\mathrm {I,*}}_{j,k}(w) = \eta ^{\mathrm {II,*}}_{j,k}(w)\) for all w, and \(\alpha ^{\mathrm {I,*}}_{j,k} = \alpha ^{\mathrm {II,*}}_{j,k}\). In other words, if one subject from group I and another subject from group II have identically valued covariates, the corresponding measure of association between nodes j and k is also the same. In the covariate-adjusted setting, we say that nodes j and k are differentially connected if there exists w such that \(\eta ^{\mathrm {I,*}}_{j,k}(w) \neq \eta ^{\mathrm {II,*}}_{j,k}(w)\), or equivalently, if \(\alpha ^{\mathrm {I,*}}_{j,k} \neq \alpha ^{\mathrm {II,*}}_{j,k}\). We can thus assess differential connectivity between nodes j and k by testing the null hypothesis
Similar to the unadjusted setting, when asymptotically normal estimates of \(\alpha ^{\mathrm {I},*}_{j,k}\) and \(\alpha ^{\text {II},*}_{j,k}\) are available, a calibrated test can be constructed based on the difference between the estimates.
We now specify a form for the conditional distribution of X^{g} given W^{g} as a generalization of the exponential family pairwise interaction model Eq. 2. We assume the conditional density for X^{g} given W^{g} can be expressed as
where w = (w_{1},…,w_{q})^{⊤}, and the proportionality is up to a normalizing constant that does not depend on x. Above, ζ_{j,c} is a fixed and known function, and the main effects of the covariates on X^{g} are represented by the scalar parameters \(\theta ^{g,*}_{j,c}\). The conditional dependence between nodes j and k, given all other nodes and given that W^{g} = w, is quantified by \(\eta ^{g,*}_{j,k}(w)\), and \(\eta ^{g,*}_{j,k}(w) = 0\) if and only if nodes j and k are conditionally independent at w. One can thus view \(\eta ^{g,*}_{j,k}\) as a conditional version of \(\beta ^{g,*}_{j,k}\) in Eq. 2.
Either of the estimation strategies introduced in Section 2.1 can be used to perform covariate-adjusted inference. When the conditional distribution of each node given the remaining nodes and the covariates belongs to a univariate exponential family model, the covariate-dependent network can be estimated using neighborhood selection because the node conditional distributions can be estimated efficiently with likelihood-based methods. Alternatively, we can estimate the conditional density function in Eq. 7 using score matching.
As a working example, we again consider estimation of covariate-dependent Gaussian networks using neighborhood selection. Suppose the conditional distribution of X^{g} given W^{g} takes the form
Then the dependencies of node j on all other nodes can be determined based on the following varying coefficient model (Hastie and Tibshirani, 1993):
The varying coefficient model is a generalization of the linear model that treats the regression coefficients as functions of the covariates. In Eq. 9, \(\eta ^{g,*}_{j,k}(w)\) returns a regression coefficient that quantifies the linear relationship between nodes j and k for subjects in group g with covariates equal to w. Then \({X^{g}_{j}}\) and \({X^{g}_{k}}\) are conditionally independent given all other nodes and given W^{g} = w if and only if \(\eta ^{g,*}_{j,k}(w) = 0\). The varying coefficients \(\eta ^{g,*}_{j,k}\) can thus be viewed as a conditional version of the regression coefficients in Eq. 4. (We have again abused the notation, as the varying coefficient functions in Eq. 9 are not equal to the parameters in Eq. 8, though both functions are zero for the same values of w). The intercept term \(\eta ^{g,*}_{j,0}\) accounts for the main effect of W^{g} on \({X^{g}_{j}}\). We can remove this main effect term by first centering the nodes \({X^{g}_{j}}\) about their conditional mean given W^{g} (which can be estimated by performing a linear regression of \({X^{g}_{j}}\) on ϕ(W^{g})).
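To make the varying coefficient model concrete, the following simulation sketch (with illustrative data-generating values of our own, not the paper's) fits the model under the linear basis ϕ(w) = (1, w) by regressing X_j on the interaction features X_k·ϕ_c(W):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data-generating values: the coefficient linking neighbor
# X_k to X_j varies with a scalar covariate W as eta(w) = 0.5 + 1.0 * w,
# i.e., alpha = (0.5, 1.0) under the basis phi(w) = (1, w).
n = 20_000
W = rng.uniform(0.0, 1.0, size=n)
Xk = rng.normal(size=n)
Xj = (0.5 + 1.0 * W) * Xk + rng.normal(scale=0.5, size=n)

# Fitting the varying coefficient model reduces to OLS on the
# interaction features (Xk * phi_1(W), Xk * phi_2(W)) = (Xk, Xk * W).
V = np.column_stack([Xk, Xk * W])
alpha_hat, *_ = np.linalg.lstsq(V, Xj, rcond=None)
print(alpha_hat)  # approx. [0.5, 1.0]
```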
In Sections 3 and 4, we discuss construction of asymptotically normal estimators of \(\alpha ^{g,*}_{j,k}\) in the low- and high-dimensional settings using neighborhood selection and score matching. Before proceeding, we first examine the connection between the null hypotheses \(H^{0}_{j,k}\) and \(G^{0}_{j,k}\).
The Relationship between Hypotheses \(H^{0}_{j,k}\) and \(G^{0}_{j,k}\)
Hypotheses \(H^{0}_{j,k}\) in Eq. 1 and \(G^{0}_{j,k}\) in Eq. 6 are related but not equivalent. It is possible that \(H^{0}_{j,k}\) holds while \(G^{0}_{j,k}\) fails and vice versa. We provide an example below. Suppose we are using neighborhood selection to perform differential network analysis in the Gaussian setting, so we are making a comparison of linear regression coefficients between the two groups. Suppose further that the within-group networks depend on a single scalar covariate W^{g}, and the nodes are centered about their conditional mean given W^{g}. One can show that the regression coefficients \(\beta ^{g,*}_{j,k}\) are equal to the average of their conditional versions \(\eta ^{g,*}_{j,k}(W^{g})\). That is, \(\beta ^{g,*}_{j,k} = \mathbb {E}\left \{\eta ^{g,*}_{j,k}(W^{g})\right \}\). Now, suppose \(G^{0}_{j,k}\) holds. If W^{I} and W^{II} do not share the same distribution (e.g., the covariate tends to take higher values in group I than in group II), the average conditional inter-node association may differ, and \(H^{0}_{j,k}\) may not hold. Although the conditional association between nodes, given the covariate, does not differ by group, the average conditional association does differ, as illustrated in Fig. 1a. In such a scenario, the difference in the average conditional association is induced by the dependence of the covariate on group membership and the dependence of the inter-node association on the covariate. Thus, inequality of \(\beta ^{\mathrm {I},*}_{j,k}\) and \(\beta ^{\text {II},*}_{j,k}\) does not necessarily capture a meaningful association between the network and group membership. Similarly, when \(H^{0}_{j,k}\) holds, it is possible that \(\eta ^{\mathrm {I},*}_{j,k} \neq \eta ^{\text {II},*}_{j,k}\). For instance, suppose that the distribution of the covariate is the same in both groups, and the average conditional association \(\mathbb {E}\left \{\eta ^{g,*}_{j,k}(W^{g})\right \}\) is the same in both groups. If the between-node association depends more strongly upon the covariates in one group than the other, \(G^{0}_{j,k}\) will be false. This example is depicted in Fig. 1b.
In this scenario, adjusting for covariates should provide improved power to detect differential connections. We note that for other distributions, it does not necessarily hold that \(\beta ^{g,*}_{j,k} = \mathbb {E}\left \{\eta ^{g,*}_{j,k}(W^{g})\right \}\), but regardless, there is generally no equivalence between hypotheses \(H^{0}_{j,k}\) and \(G^{0}_{j,k}\).
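The confounding scenario of Fig. 1a can be mimicked in a few lines (an illustrative simulation of our own, not the paper's experiment): the conditional association η(w) = w is identical in both groups, yet the unadjusted coefficients β = E{η(W^g)} differ because the covariate distributions differ by group.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_group(w_mean, n=20_000):
    # Both groups share the SAME conditional association eta(w) = w;
    # only the covariate distribution differs between groups.
    W = rng.normal(loc=w_mean, scale=0.2, size=n)
    Xk = rng.normal(size=n)
    Xj = W * Xk + rng.normal(scale=0.5, size=n)
    return Xk, Xj

Xk1, Xj1 = simulate_group(1.0)   # group I:  covariate centered at 1
Xk2, Xj2 = simulate_group(2.0)   # group II: covariate centered at 2

# The unadjusted neighborhood regression recovers beta = E[eta(W)] = E[W],
# so a naive test flags a "differential connection" (about 1 vs. 2)
# even though eta is identical in the two groups.
beta1 = np.linalg.lstsq(Xk1[:, None], Xj1, rcond=None)[0][0]
beta2 = np.linalg.lstsq(Xk2[:, None], Xj2, rcond=None)[0][0]
print(beta1, beta2)  # approx. 1.0 and 2.0
```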
Covariate-Adjusted Differential Network Analysis Using Neighborhood Selection
In this section, we describe in detail an approach for covariate-adjusted differential network analysis using neighborhood selection. To simplify our presentation, we focus on Gaussian graphical models, though this strategy is generally applicable to graphical models for which the node conditional distributions belong to univariate exponential family models.
Covariate Adjustment via Neighborhood Selection in Low Dimensions
We first discuss testing the unadjusted null hypothesis \(H^{0}_{j,k}\) in Eq. 1, where the \(\beta ^{g,*}_{j,k}\) are the regression coefficients in Eq. 4. Suppose, for now, that we are in the low-dimensional setting, so the number of nodes p is smaller than the sample sizes n^{g}, g ∈{I,II}.
It is well-known that the regression coefficients can be characterized as the minimizers of the expected least squares loss — that is,
One can obtain an estimate \(\hat {\boldsymbol {\beta }}_{j}^{g} = (\hat {\beta }^{g}_{j,1},\ldots ,\hat {\beta }^{g}_{j,p})\) of \(\boldsymbol {\beta }^{g,*}_{j} = (\beta ^{g,*}_{j,1},\ldots ,\beta ^{g,*}_{j,p})\) by minimizing the empirical average of the least squares loss, taking
where ∥⋅∥_{2} denotes the ℓ_{2} norm. The ordinary least squares estimate \(\hat {\boldsymbol {\beta }}^{g}_{j}\) is available in closed form and is easy to compute. The estimates \(\hat {\beta }^{g}_{j,k}\) are unbiased, and, under mild assumptions, are approximately normally distributed for sufficiently large n^{g} — that is,
with \(\tau ^{g}_{j,k} > 0\) (though \(\tau ^{g}_{j,k}\) can be calculated in closed form, we omit the expression for brevity).
We construct a test of \(H_{j,k}^{0}\) based on the difference between the estimates of the group-specific regression coefficients, \(\hat {\beta }^{\mathrm {I}}_{j,k}  \hat {\beta }^{\text {II}}_{j,k}\). When \(H^{0}_{j,k}\) holds, \(\hat {\beta }^{\mathrm {I}}_{j,k}  \hat {\beta }^{\text {II}}_{j,k}\) is normally distributed with mean zero and variance \(\tau ^{\mathrm {I}}_{j,k} + \tau ^{\text {II}}_{j,k}\). Given a consistent estimate \(\hat {\tau }^{g}_{j,k}\) of the variance, we can use the test statistic
which follows a chi-squared distribution with one degree of freedom under the null for n^{I} and n^{II} sufficiently large. A p-value for \(H^{0}_{j,k}\) can be calculated as
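For concreteness, a minimal sketch of this unadjusted Wald test (the estimates and variances below are illustrative placeholders; for one degree of freedom the chi-squared survival function has the closed form erfc(√(T/2)), so no statistics library is needed):

```python
import math

def unadjusted_test(b1, b2, tau1, tau2):
    # Wald statistic for H0: beta_I = beta_II; chi-squared(1) under the null.
    T = (b1 - b2) ** 2 / (tau1 + tau2)
    # For one degree of freedom, P(chi2_1 > T) = erfc(sqrt(T / 2)).
    p_value = math.erfc(math.sqrt(T / 2.0))
    return T, p_value

# Illustrative estimates and variance estimates (placeholders, not data):
T, p = unadjusted_test(b1=0.8, b2=0.3, tau1=0.01, tau2=0.02)
print(T, p)  # T approx. 8.33, p approx. 0.004
```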
In the low-dimensional setting, performing a covariate-adjusted test is similar to performing the unadjusted test. We can obtain an estimate \(\hat {\boldsymbol {\alpha }}^{g}_{j} = \left ((\hat {\alpha }^{g}_{j,1})^{\top },\ldots ,(\hat {\alpha }^{g}_{j,p})^{\top }\right )^{\top }\) of \(\boldsymbol {\alpha }^{g,*}_{j} = \left ((\alpha ^{g,*}_{j,1})^{\top },\ldots ,(\alpha ^{g,*}_{j,p})^{\top }\right )^{\top }\) by minimizing the empirical average of the least squares loss
To simplify the presentation, we introduce additional notation that allows us to rewrite Eq. 10 in a condensed form. Let \(\mathcal {V}^{g}_{k}\) be the n^{g} × d matrix
We can now equivalently express Eq. 10 as
Again, \(\hat {\alpha }^{g}_{j,k}\) is unbiased and approximately normal for sufficiently large n^{g}, satisfying
where \({{\varOmega }}^{g}_{j,k}\) is a positive definite matrix of dimension d × d (though a closed form expression is available, we omit it here for brevity).
We construct a test of \(G^{0}_{j,k}\) based on \(\hat {\alpha }^{\mathrm {I}}_{j,k}  \hat {\alpha }^{\text {II}}_{j,k}\). Under the null hypothesis, \(\hat {\alpha }^{\mathrm {I}}_{j,k}  \hat {\alpha }^{\text {II}}_{j,k}\) follows a normal distribution with mean zero and variance \({{\varOmega }}^{\mathrm {I}}_{j,k} + {{\varOmega }}^{\text {II}}_{j,k}\). Given a consistent estimate \(\hat {{{\varOmega }}}^{g}_{j,k}\) of \({{\varOmega }}^{g}_{j,k}\), we can test \(G^{0}_{j,k}\) using the test statistic
Under the null, the test statistic follows a chi-squared distribution with d degrees of freedom, and a p-value can therefore be calculated as
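A sketch of the corresponding covariate-adjusted test for d = 2, e.g., ϕ(w) = (1, w) with one scalar covariate (the estimates and variance matrices are illustrative placeholders; the closed-form survival function used is valid for even d, and a library routine such as scipy.stats.chi2.sf would serve in general):

```python
import math
import numpy as np

def chi2_sf_even(T, d):
    # Closed-form survival function of chi-squared with EVEN df d:
    # P(chi2_d > T) = exp(-T/2) * sum_{i=0}^{d/2-1} (T/2)^i / i!.
    s, term = 0.0, 1.0
    for i in range(d // 2):
        if i > 0:
            term *= (T / 2.0) / i
        s += term
    return math.exp(-T / 2.0) * s

def covariate_adjusted_test(a1, a2, Om1, Om2):
    # Wald statistic for G0: alpha_I = alpha_II; chi-squared(d) under the null.
    diff = np.asarray(a1) - np.asarray(a2)
    T = float(diff @ np.linalg.inv(np.asarray(Om1) + np.asarray(Om2)) @ diff)
    return T, chi2_sf_even(T, len(diff))

# Illustrative d = 2 example (placeholder values, not real estimates):
a1 = np.array([0.5, 1.0])    # hypothetical estimate, group I
a2 = np.array([0.5, 0.4])    # hypothetical estimate, group II
Om1 = 0.02 * np.eye(2)
Om2 = 0.02 * np.eye(2)
T, p = covariate_adjusted_test(a1, a2, Om1, Om2)
print(round(T, 1), round(p, 4))  # 9.0 0.0111
```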
Covariate Adjustment via Neighborhood Selection in High Dimensions
The methods described in Section 3.1 are only appropriate when the number of nodes p is small relative to the sample size. The model in Eq. 9 has (p − 1)d parameters, so the least squares estimator of Section 3.1 provides stable estimates as long as n^{I} and n^{II} are larger than (p − 1)d. However, in the high-dimensional setting, where the number of parameters exceeds the sample size, the ordinary least squares estimates are not well-defined.
To fit the varying coefficient model Eq. 9 in the high-dimensional setting, we use a regularized estimator that relies upon an assumption of sparsity in the networks. The sparsity assumption requires that within each group only a small number of nodes are partially correlated, meaning that in Eq. 9, only a few of the vectors \(\alpha ^{g,*}_{j,k}\) are nonzero. To leverage the sparsity assumption, we propose to use the group LASSO estimator (Yuan and Lin, 2006):
where λ > 0 is a tuning parameter. The group LASSO provides a sparse estimate and sets some \(\tilde {\alpha }_{j,k}\) to be exactly zero, resulting in networks with few edges. The level of sparsity of \(\tilde {\boldsymbol {\alpha }}^{g}_{j}\) is determined by λ, with higher λ values forcing more \(\tilde {\alpha }_{j,k}\) to zero. We discuss selection of the tuning parameter in Section 5.1.
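The group LASSO objective can be minimized by proximal gradient descent, where the proximal step is block-wise soft-thresholding; the following is a minimal numpy sketch on simulated data (the step size rule, iteration count, and data-generating values are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def block_soft_threshold(z, t):
    # Proximal operator of t * ||.||_2 applied to one coefficient block.
    nrm = np.linalg.norm(z)
    return np.zeros_like(z) if nrm <= t else (1.0 - t / nrm) * z

def group_lasso(V, y, d, lam, n_iter=500):
    # Proximal gradient (ISTA) for
    #   (1 / 2n) ||y - V a||^2 + lam * sum_k ||a_k||_2,
    # with coefficient blocks a_k of size d.
    n, pd = V.shape
    a = np.zeros(pd)
    step = 1.0 / np.linalg.eigvalsh(V.T @ V / n).max()
    for _ in range(n_iter):
        grad = -V.T @ (y - V @ a) / n
        z = a - step * grad
        for k in range(0, pd, d):     # block-wise proximal step
            a[k:k + d] = block_soft_threshold(z[k:k + d], step * lam)
    return a

rng = np.random.default_rng(3)
n, p, d = 200, 10, 2
V = rng.normal(size=(n, p * d))
a_true = np.zeros(p * d)
a_true[:d] = [1.0, -1.0]              # only the first block is active
y = V @ a_true + 0.1 * rng.normal(size=n)

a_hat = group_lasso(V, y, d, lam=0.1)
active = [k // d for k in range(0, p * d, d)
          if np.linalg.norm(a_hat[k:k + d]) > 1e-8]
print(active)  # [0]: only the truly active block survives
```

The block proximal operator zeroes out entire d-dimensional blocks at once, which is what produces edge-level (rather than coefficient-level) sparsity.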
Though the group LASSO provides a consistent estimate of \(\boldsymbol {\alpha }^{g,*}_{j}\), the estimate is not approximately normally distributed. The group LASSO estimate of \({\alpha }^{g,*}_{j,k}\) retains a bias that diminishes at the same rate as the standard error. As a result, the group LASSO estimator has a nonstandard sampling distribution that cannot be derived analytically and is therefore unsuitable for hypothesis testing.
We can obtain approximately normal estimates of \(\alpha ^{g,*}_{j,k}\) by correcting the bias of \(\tilde {\alpha }^{g}_{j,k}\), as was first proposed to obtain normal estimates for the classical ℓ_{1}-penalized version of the LASSO (van de Geer et al. 2014; Zhang and Zhang, 2014). These “debiased” or “desparsified” estimators can be shown to be approximately normal with moderately large samples even in the high-dimensional setting; they are therefore suitable for hypothesis testing. Our approach is to use a debiased version of the group LASSO. Bias correction in group LASSO problems is well studied (van de Geer, 2016; Honda, 2019; Mitra and Zhang, 2016), so we are able to perform covariate-adjusted inference by applying previously developed methods.
The bias of the group LASSO estimate can be written as
where \(\delta ^{g}_{j,k}\) is a nonzero ddimensional vector (recall d is the dimension of \(\alpha ^{g,*}_{j,k}\)). Our approach is to obtain an estimate of the bias \(\tilde {\delta }_{j,k}\) and to use a debiased estimator, defined as
For a suitable choice of \(\tilde {\delta }_{j,k}\), the bias-corrected estimator is approximately normal for a sufficiently large sample size n^{g} under mild conditions, i.e.,
where the variance \({{\varOmega }}^{g}_{j,k}\) is a positive definite matrix, for which we obtain an estimate \(\check {{{\varOmega }}}^{g}_{j,k}\). We provide a derivation for the bias correction and the form of our variance estimate in Appendix ??.
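For orientation, a generic form of such a bias correction, in the spirit of the desparsified LASSO of van de Geer et al. (2014), is sketched below (the notation and the surrogate inverse \(\hat{\Theta}\) are illustrative, with \(\mathcal{V}\) the column-concatenation of the matrices \(\mathcal{V}^{g}_{k}\); the exact construction used in this paper is the one given in the Appendix):

```latex
% Generic debiasing step: \hat\Theta is a surrogate inverse of the Gram
% matrix \hat\Sigma = \mathcal{V}^\top \mathcal{V} / n^{g} of the
% interaction features; the second term estimates (minus) the bias.
\check{\boldsymbol{\alpha}}^{g}_{j}
  = \tilde{\boldsymbol{\alpha}}^{g}_{j}
  + \frac{1}{n^{g}}\, \hat{\Theta}\, \mathcal{V}^{\top}
    \left( \mathbf{X}^{g}_{j} - \mathcal{V}\, \tilde{\boldsymbol{\alpha}}^{g}_{j} \right)
```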
Similar to Section 3.1, we test the null hypothesis \(G^{0}_{j,k}\) in Eq. 6 using the test statistic
The test statistic asymptotically follows a chi-squared distribution with d degrees of freedom under the null hypothesis.
Covariate-Adjusted Differential Network Analysis Using Score Matching
In this section, we discuss covariate adjustment using the score matching framework introduced in Section 2. We first describe the score matching estimator in greater detail and then specialize the framework to estimation of pairwise exponential family graphical models in the low- and high-dimensional settings. As shown later in this section, for exponential family distributions with continuous support, the score matching loss function is a quadratic function of the parameters, providing a computationally efficient framework for estimating graphical models.
The Score Matching Framework
We begin by providing a brief summary of the score matching framework (Hyvärinen, 2005; 2007). Let \(Z = (Z_{1},\ldots ,Z_{p})^{\top }\) be a random vector taking values in \(\mathcal {Z}\), generated from a distribution with density function h^{∗}. For any candidate density h, we denote the gradient and Laplacian of the log-density by
The score matching loss L is defined as a measure of divergence between a candidate density function h and the true density h^{∗}:
It is apparent that the score matching loss is minimized when h = h^{∗}. A natural approach to constructing an estimator for h^{∗} would then be to minimize the empirical score matching loss given observations Z_{1},…,Z_{n}, defined as
Because the score matching loss function takes as input the gradient of the log density function, the loss does not depend on the normalizing constant. This makes score matching appealing when the normalizing constant is intractable.
The empirical loss seemingly depends on prior knowledge of h^{∗}. However, if h(z) and ∥∇h(z)∥_{2} both tend to zero as z approaches the boundary of \(\mathcal {Z}\), a partial integration argument can be used to show that the score matching loss can be expressed as
where ‘const.’ is a term that does not depend on h. We can therefore estimate h^{∗} by minimizing an empirical version of the score matching loss that does not depend on h^{∗}. We can express the empirical loss as
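As a minimal illustration of this empirical loss (a univariate example of our own, not the paper's estimator), consider a mean-zero Gaussian candidate h(z) ∝ exp(−θz²/2): the loss is quadratic in θ, and its minimizer recovers the precision without ever computing a normalizing constant.

```python
import numpy as np

rng = np.random.default_rng(4)

# Candidate density h(z) proportional to exp(-theta * z**2 / 2), so
#   grad log h(z)      = -theta * z
#   laplacian log h(z) = -theta
# and the empirical score matching loss becomes
#   L(theta) = mean(-theta + (theta * z)**2 / 2) = theta**2 * m2 / 2 - theta,
# with m2 = mean(z**2). No normalizing constant is needed.
z = rng.normal(scale=2.0, size=100_000)   # true precision = 1 / 4
m2 = np.mean(z ** 2)

def empirical_loss(theta):
    return theta ** 2 * m2 / 2.0 - theta

theta_hat = 1.0 / m2   # closed-form minimizer of the quadratic loss
print(theta_hat)       # approx. 0.25, the true precision
```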
The score matching loss is particularly appealing for exponential family distributions with continuous support, as it leads to a quadratic optimization function (Lin et al. 2016). However, when Z is nonnegative, the arguments used to express Eq. 18 as Eq. 19 fail because h(z) and ∥∇h(z)∥_{2} do not approach zero at the boundary. We can overcome this problem by instead considering the generalized score matching framework (Yu et al. 2019; Hyvärinen, 2007) as an extension that is suitable for nonnegative data. Let \(v_{1},\ldots ,v_{p}\) be positive and differentiable functions, let \(v(z) = \left (v_{1}(z_{1}),\ldots ,v_{p}(z_{p})\right )^{\top }\), let \(\dot {v}_{j}\) denote the derivative of v_{j}, and let ∘ denote the elementwise product operator. The generalized score matching loss is defined as
and is also minimized when h = h^{∗}. As for the original score matching loss in Eq. 18, the generalized score matching loss seemingly depends on prior knowledge of h^{∗}. However, under mild technical conditions on h and v (see Appendix ??), the loss in Eq. 20 can be rewritten as
The generalized score matching loss thus no longer depends on h^{∗}, and an estimator can be constructed by minimizing the empirical version of Eq. 21 with respect to h. To this end, the original generalized score matching estimator considered \(v_{j}(z_{j}) = {z_{j}^{2}}\) (Hyvärinen, 2007). In this case, it becomes necessary to estimate higher-order moments, leading to poor performance of the estimator. It has been shown that by instead taking v to be a slowly increasing function, such as \(v_{j}(z_{j}) = \log (1 + z_{j})\), one obtains improved theoretical results and better empirical performance (Yu et al. 2019).
Covariate Adjustment in High-Dimensional Exponential Family Models via Score Matching
In this subsection, we discuss construction of asymptotically normal estimators for the parameters of the exponential family pairwise interaction model Eq. 7 using the generalized score matching framework. To simplify our presentation, we consider the setting in which we are only interested in studying the connectedness between one node \({X^{g}_{j}}\) and all other neighboring nodes in the network. To this end, it suffices to estimate the conditional density of \({X^{g}_{j}}\) given all other nodes and the covariates W^{g}. A similar approach to the one we describe below can also be used to estimate the entire joint density in Eq. 7. For simplicity, we assume that in Eq. 7, there exist functions ψ and ζ such that ψ = ψ_{j,k} for all (j,k) and ζ_{j,c} = ζ for all (j,c), and that μ_{j} = 0. For x = (x_{1},…,x_{p})^{⊤} and w = (w_{1},…,w_{q})^{⊤} the conditional density can thus be expressed as
where the density is up to a normalizing constant that does not depend on x_{j}.
We first explicitly define the score matching loss for the conditional density function in Eq. 22. Let \(\boldsymbol {\alpha }^{g,*}_{j} = \left ((\alpha ^{g,*}_{j,1})^{\top }, {\ldots } ,(\alpha ^{g,*}_{j,p})^{\top }\right )^{\top }\), and similarly let \(\boldsymbol {\theta }^{g,*}_{j} = (\theta ^{g,*}_{j,1}, {\ldots } ,\theta ^{g,*}_{j,d})^{\top }\). Let \(\dot {\psi }\) and \(\ddot {\psi }\) denote the first and second derivatives of ψ with respect to x_{j}, and similarly, let \(\dot {\zeta }\) and \(\ddot {\zeta }\) denote the first and second derivatives of ζ with respect to x_{j}. We define a nonnegative and differentiable function \(v_{j}\), and let \(\dot {v}_{j}\) denote its first derivative. Then for candidate parameters \(\boldsymbol {\alpha }_{j} = \left (\alpha ^{\top }_{j,1},\ldots ,\alpha ^{\top }_{j,p}\right )^{\top }\) and 𝜃_{j} = (𝜃_{j,1},…,𝜃_{j,d})^{⊤}, the empirical generalized score matching loss for the conditional density of \({X_{j}^{g}}\) given all other nodes and the covariates can be expressed as
The true parameters \(\boldsymbol {\alpha }^{g,*}_{j}\) and \(\boldsymbol {\theta }^{g,*}_{j}\) can be characterized as the minimizers of the population generalized score matching loss, as discussed in Section 4.1.
The loss function in Eq. 23 is quadratic in the parameters \(\boldsymbol {\alpha }^{g}_{j}\) and \(\boldsymbol {\theta }_{j}^{g}\) and can thus be minimized efficiently. When the sample size n^{g} is much larger than the number of unknown parameters (p + 1)d, one can estimate \(\boldsymbol {\alpha }^{g,*}_{j}\) and \(\boldsymbol {\theta }_{j}^{g,*}\) by simply minimizing \(L^{g}_{n,j}\) with respect to the unknown parameters. Moreover, we can readily establish asymptotic normality of the parameter estimates using results from classical M-estimation theory (van der Vaart, 2000). To avoid including cumbersome notation, we reserve the details for Appendix ??.
When the sample size is smaller than the number of parameters, the minimizer of \(L^{g}_{n,j}\) is no longer well-defined. Similar to Section 3.2, we use regularization to obtain a consistent estimator in the high-dimensional setting. We define the ℓ_{2}-regularized generalized score matching estimator as
where λ > 0 is a tuning parameter. Similar to the group LASSO estimator in Eq. 13, the regularization term in Eq. 24 induces sparsity in the estimate \(\tilde {\boldsymbol {\alpha }}_{j}^{g}\) and sets some \(\tilde {\alpha }^{g}_{j,k}\) to be exactly zero. The tuning parameter controls the level of sparsity, with more vectors \(\tilde {\alpha }^{g}_{j,k}\) set to zero for larger λ. In Appendix ??, we establish consistency of the regularized score matching estimator assuming sparsity of \(\boldsymbol {\alpha }^{g,*}_{j}\) and some additional regularity conditions.
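The block-wise sparsity induced by penalties of this form is easiest to see through the proximal operator of the group norm. The sketch below is a generic illustration (not the paper's solver): block soft-thresholding shrinks each coefficient block toward zero and zeroes any block whose ℓ_{2} norm falls below the threshold, which is exactly how whole vectors \(\tilde{\alpha}^{g}_{j,k}\) are set to zero.

```python
import numpy as np

def block_soft_threshold(blocks, lam):
    """Proximal operator of lam * sum_k ||alpha_k||_2 applied to a list of
    coefficient blocks: each block is shrunk toward zero, and a block is set
    exactly to zero when its l2 norm is at most lam."""
    out = []
    for a in blocks:
        nrm = np.linalg.norm(a)
        if nrm <= lam:
            out.append(np.zeros_like(a))       # entire block zeroed
        else:
            out.append((1.0 - lam / nrm) * a)  # block-wise shrinkage
    return out

# A block with norm 5 is shrunk; a block with norm ~0.14 is zeroed at lam = 1.
res = block_soft_threshold([np.array([3.0, 4.0]), np.array([0.1, 0.1])], 1.0)
```

Larger λ zeroes more blocks, matching the role of the tuning parameter described above.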
As is the case for the group LASSO estimator, the regularized score matching estimator has an intractable limiting distribution because its bias and standard error diminish at the same rate. We can obtain an asymptotically normal estimate by subtracting from the initial estimate an estimate of the bias. In Appendix ??, we construct such a bias-corrected estimate \(\check {\alpha }^{g}_{j,k}\) that, for sufficiently large n^{g}, satisfies
for a positive definite matrix \({{\varOmega }}^{g}_{j,k}\). Given bias-corrected estimates \(\check {\alpha }^{g}_{j,k}\) and a consistent estimate \(\check {{{\varOmega }}}^{g}_{j,k}\) of \({{\varOmega }}^{g}_{j,k}\), we can test the null hypothesis in Eq. 6 using the test statistic
Under the null hypothesis, the test statistic follows a chi-squared distribution with d degrees of freedom.
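A minimal sketch of the resulting edge-wise test, assuming a Wald-type quadratic form in the difference of the two groups' bias-corrected estimates (the function name and argument layout below are hypothetical, not from the paper):

```python
import numpy as np
from scipy.stats import chi2

def diff_edge_test(a1, a2, Omega1, Omega2, n1, n2):
    """Wald-type test of G0: alpha^I_{j,k} = alpha^II_{j,k} for one edge.
    a1, a2: bias-corrected d-vectors for groups I and II;
    Omega1, Omega2: estimated asymptotic covariance matrices;
    n1, n2: group sample sizes.
    Returns (statistic, p-value) under the chi-squared(d) null."""
    diff = np.asarray(a1, dtype=float) - np.asarray(a2, dtype=float)
    cov = np.asarray(Omega1) / n1 + np.asarray(Omega2) / n2
    stat = float(diff @ np.linalg.solve(cov, diff))
    return stat, float(chi2.sf(stat, df=diff.size))

stat, p = diff_edge_test([1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                         np.eye(3), np.eye(3), 1, 1)
```

When the two estimates coincide, the statistic is zero and the p-value is one; large statistics relative to the chi-squared(d) quantiles indicate a differential connection.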
Numerical Studies
In this section, we examine the performance of our proposed test in a simulation study. We consider the neighborhood selection approach described in Section 3. Our simulation study has three objectives: (1) to assess the stability of our estimators for the covariate-dependent networks, (2) to examine the effect of sample size on statistical power and type I error control, and (3) to illustrate that failing to adjust for covariates can in some settings result in poor type I error control or reduced statistical power.
Implementation
We first discuss implementation of the neighborhood selection approach. The group LASSO estimate in Eq. 13 does not exist in closed form, in contrast to the ordinary least squares estimate in Eq. 12. To solve Eq. 13, we use the efficient algorithm implemented in the publicly available R package gglasso (Yang and Zou, 2015).
The group LASSO estimator requires selection of a tuning parameter λ, which controls the sparsity of the estimate. We select the tuning parameter by performing K-fold cross-validation with K = 10 folds. Since the selection of λ is sensitive to the scale of the columns of \(\mathcal {V}_{k}^{g}\) in Eq. 11, we scale the columns by their standard deviations prior to cross-validating. After fitting the group LASSO with the selected tuning parameter, we convert the estimates back to their original scale by dividing them by the standard deviations of the columns of \(\mathcal {V}_{k}^{g}\).
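The scale-then-rescale bookkeeping can be illustrated with ordinary least squares, for which the back-transformation recovers the original-scale fit exactly; for a penalized fit such as the group LASSO the scaled and unscaled fits differ, which is precisely why the columns are standardized before cross-validation. A small sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Design matrix with columns on very different scales.
V = rng.normal(size=(50, 4)) * np.array([1.0, 10.0, 0.1, 2.0])
y = V @ np.array([1.0, -0.5, 2.0, 0.0]) + 0.1 * rng.normal(size=50)

sd = V.std(axis=0)
V_scaled = V / sd                                  # standardize columns
b_scaled, *_ = np.linalg.lstsq(V_scaled, y, rcond=None)
b = b_scaled / sd                                  # back to the original scale

# For OLS the rescaled coefficients match a direct original-scale fit.
b_direct, *_ = np.linalg.lstsq(V, y, rcond=None)
assert np.allclose(b, b_direct)
```

The same divide-by-standard-deviation step applies to any fitted coefficient vector; only the penalized fit itself is scale-sensitive.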
Simulation Setting
In what follows, we describe our simulation setting. In short, we generate data from the varying coefficient model in Eq. 9, where we treat nodes 1 through (p − 1) as predictors, and treat node p as the response. We first randomly generate data for nodes 1 through (p − 1) in groups I and II from the same multivariate normal distribution. We then construct \(\eta ^{g,*}_{j,k}\) and generate data for two covariates \({W_{i}^{g}} = (W_{i,1}^{g}, W_{i,2}^{g})^{\top }\) so that one covariate acts as a confounding variable, and the other covariate should improve statistical power to detect differential associations after adjustment.
To simulate data for nodes 1 through (p − 1), we first generate a random graph with (p − 1) nodes and an edge density of .05 from a power law distribution with power parameter 5 (Newman, 2003). Denoting the edge set of the graph by E, we generate the (p − 1) × (p − 1) matrix Θ as
with Θ_{j,k} = Θ_{k,j}. Denoting by a^{∗} the smallest eigenvalue of Θ, we set Σ = (Θ − (a^{∗}− .1)I)^{− 1}, where I is the identity matrix. We then draw \((X_{i,1}^{g},\ldots ,X_{i,p-1}^{g})^{\top }\) from a multivariate normal distribution with mean zero and covariance Σ for i = 1,…,n^{g} in each group g.
We generate \(W^{\mathrm {I}}_{i,1}\) from a Beta(3/2,1) distribution and \(W^{\text {II}}_{i,1}\) from a Beta(1,3/2) distribution. We center and scale both \(W^{\mathrm {I}}_{i,1}\) and \(W^{\text {II}}_{i,1}\) to the (− 1,1) interval. We generate \(W^{\mathrm {I}}_{i,2}\) and \(W_{i,2}^{\text {II}}\) each from a Uniform(− 1,1) distribution.
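A sketch of this data-generating scheme is below. For simplicity it draws an Erdős–Rényi graph in place of the power-law graph and uses a hypothetical edge weight of .3 for the off-diagonal entries of Θ; the eigenvalue shift and the covariate distributions follow the description above.

```python
import numpy as np

rng = np.random.default_rng(2025)
p, n = 40, 160

# Random graph on p-1 nodes with edge density ~.05 (Erdos-Renyi stands in
# for the power-law graph used in the paper).
A = rng.random((p - 1, p - 1)) < 0.05
A = np.triu(A, k=1)
E = A | A.T

# Hypothetical edge weight .3 on edges; unit diagonal.
Theta = np.where(E, 0.3, 0.0)
np.fill_diagonal(Theta, 1.0)

# Shift eigenvalues so the matrix is positive definite with smallest
# eigenvalue .1, then invert: Sigma = (Theta - (a* - .1) I)^{-1}.
a_star = np.linalg.eigvalsh(Theta).min()
Sigma = np.linalg.inv(Theta - (a_star - 0.1) * np.eye(p - 1))

# Nodes 1..(p-1): multivariate normal with covariance Sigma.
X = rng.multivariate_normal(np.zeros(p - 1), Sigma, size=n)

# Covariates: group-I Beta(3/2, 1) draw rescaled to (-1, 1), and Uniform(-1, 1).
W1 = 2 * rng.beta(1.5, 1.0, size=n) - 1
W2 = rng.uniform(-1, 1, size=n)
```

The eigenvalue shift guarantees that the shifted matrix is invertible and that Σ is a valid (positive definite) covariance matrix.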
We consider two different choices for the varying coefficient functions \(\eta ^{g,*}_{j,k}\):

Linear Polynomial:
$$ \begin{array}{@{}rcl@{}} \eta^{\mathrm{I,*}}_{p,1}(w_{1}, w_{2}) = .5 + .5w_{1}; &&\eta^{\mathrm{II,*}}_{p,1}(w_{1}, w_{2}) = .5 + .5w_{1} \\ \eta^{\mathrm{I,*}}_{p,2}(w_{1}, w_{2}) = .5 + .25w_{2}; &&\eta^{\mathrm{II,*}}_{p,2}(w_{1}, w_{2}) = .5 + .75w_{2} \\ \eta^{\mathrm{I,*}}_{p,3}(w_{1}, w_{2}) = 0; &&\eta^{\mathrm{II,*}}_{p,3}(w_{1}, w_{2}) = .5, \end{array} $$and \(\eta ^{g,*}_{p,k} = 0\) for k ≥ 4.

Cubic Polynomial:
$$ \begin{array}{@{}rcl@{}} \eta^{\mathrm{I,*}}_{p,1}(w_{1}, w_{2}) = .5 + .5\left( w_{1} + {w_{1}^{2}} + {w_{1}^{3}}\right); &&\eta^{\mathrm{II,*}}_{p,1}(w_{1}, w_{2}) = .5 + .5\left( w_{1} + {w_{1}^{2}} + {w_{1}^{3}}\right) \\ \eta^{\mathrm{I,*}}_{p,2}(w_{1}, w_{2}) = .5 + .25\left( w_{2} + {w_{2}^{3}}\right); &&\eta^{\mathrm{II,*}}_{p,2}(w_{1}, w_{2}) = .5 + .75\left( w_{2} + {w_{2}^{3}}\right) \\ \eta^{\mathrm{I,*}}_{p,3}(w_{1}, w_{2}) = 0; &&\eta^{\mathrm{II,*}}_{p,3}(w_{1}, w_{2}) = .5, \end{array} $$and \(\eta ^{g,*}_{p,k} = 0\) for k ≥ 4.
The first covariate \(W_{i,1}^{g}\) confounds the association between nodes p and 1: the distribution of \(W_{i,1}^{g}\) depends on group membership, and \(W_{i,1}^{g}\) affects the association between nodes p and 1. However, \(\eta ^{\mathrm {I},*}_{p,1}(w) = \eta ^{\text {II},*}_{p,1}(w)\) for all w. Thus, \(G^{0}_{p,1}\) in Eq. 6 holds while \(H^{0}_{p,1}\) in Eq. 1 fails, as depicted in Fig. 1a. Failing to adjust for \(W_{i,1}^{g}\) should therefore result in an inflated type I error rate for the hypothesis \(G^{0}_{p,1}\). Adjusting for the second covariate \(W_{i,2}^{g}\) should improve the power to detect the differential connection between nodes p and 2. We have constructed \(\eta ^{g,*}_{p,2}\) so that the association between nodes p and 2 is the same in both groups on average, though it depends more strongly on W^{g} in group II than in group I. Thus, \(H^{0}_{p,2}\) holds while \(G^{0}_{p,2}\) fails, as depicted in Fig. 1b. The association between nodes p and 3 does not depend on either covariate, though the association differs by group. Thus, one should be able to identify a differential connection using either the adjusted or unadjusted test. Node p is conditionally independent of all other nodes in both groups.
For i = 1,…,n^{g}, we generate \(X^{g}_{i,p}\) as
where \({\epsilon ^{g}_{i}}\) follows a normal distribution with zero mean and unit variance. We use balanced sample sizes n^{I} = n^{II} = n and consider n ∈{80,160,240}. We set the number of nodes p = 40. The graph for nodes 1 through (p − 1) contains 15 edges. Leaving Σ fixed, we generate 400 random data sets following the above approach.
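Under the linear-polynomial setting for group II, the response node can be generated as follows (a sketch; the function names are ours, not the paper's):

```python
import numpy as np

def varying_coef_mean(X, W1, W2):
    """Conditional mean of node p under the group-II linear setting:
    (.5 + .5 W1) X_1 + (.5 + .75 W2) X_2 + .5 X_3, with eta_{p,k} = 0
    for k >= 4."""
    return ((0.5 + 0.5 * W1) * X[:, 0]
            + (0.5 + 0.75 * W2) * X[:, 1]
            + 0.5 * X[:, 2])

def generate_node_p(X, W1, W2, rng):
    """Draw X_{i,p} = sum_k eta_{p,k}(W_i) X_{i,k} + eps_i with
    standard normal errors eps_i."""
    return varying_coef_mean(X, W1, W2) + rng.normal(size=X.shape[0])

rng = np.random.default_rng(0)
xp = generate_node_p(np.ones((4, 39)), np.zeros(4), np.zeros(4), rng)
```

Swapping in the group-I coefficients or the cubic-polynomial functions only changes `varying_coef_mean`.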
We consider two choices of the basis expansion ϕ:

1.
Linear basis: \(\phi (w_{1}, w_{2}) = \begin {pmatrix} 1 & w_{1} & w_{2} \end {pmatrix}^{\top }\);

2.
Cubic polynomial basis: \(\phi (w_{1}, w_{2}) = \begin {pmatrix} 1 & w_{1} & {w_{1}^{2}} & {w_{1}^{3}} & w_{2} & {w_{2}^{2}} & {w_{2}^{3}} \end {pmatrix}^{\top }\).
With the linear basis, d = 3 and the model in Eq. 9 has 117 parameters; with the cubic polynomial basis, d = 7 and there are 273 parameters.
We compare our proposed methodology with the approach for differential network analysis without covariate adjustment described in Section 3.1. In the unadjusted analysis, ordinary least squares estimation is justified because although (p − 1)d is large with respect to n, (p − 1) is smaller than n.
Simulation Results
Figure 2 shows the Monte Carlo estimates of the expected ℓ_{2} error of the debiased group LASSO estimates \(\check {\alpha }^{g}_{p,k}\) for k = 1,…,(p − 1). We only report the ℓ_{2} error when the basis ϕ is correctly specified for the varying coefficient function \(\eta ^{g,*}_{p,k}\), i.e., when ϕ is the linear basis and \(\eta ^{g,*}_{p,k}\) is a linear function, or when ϕ is the cubic basis and \(\eta ^{g,*}_{p,k}\) is a cubic function. In both the linear and cubic polynomial settings, the average ℓ_{2} estimation error for \(\alpha ^{g,*}_{p,k}\) decreases with the sample size for all k, as expected. We also find that in small samples, the estimation error is substantially lower in the linear setting than in the cubic setting, suggesting that estimates are less stable in more complex models.
In Table 1, we report Monte Carlo estimates of the probability of rejecting \(G^{0}_{p,k}\), the null hypothesis that nodes p and k are not differentially connected given W^{g}, for k = 1, k = 2, k = 3, and k ≥ 4, using both the adjusted and unadjusted tests at the significance level κ = .05. As the purpose of the simulation study is to examine the behavior of the edgewise test, we do not perform a multiple testing correction.
For k = 1 (i.e., when \(H^{0}_{p,k}\) fails but \(G^{0}_{p,k}\) holds), the unadjusted test is anticonservative, and the probability of falsely rejecting \(G^{0}_{p,k}\) increases with the sample size. When an adjusted test is performed using a linear basis and \(\eta ^{g,*}_{p,1}\) is linear, the type I error rate is slightly inflated but appears to approach the nominal level of .05 as the sample size increases. However, when \(\eta ^{g,*}_{p,1}\) is a cubic function and the linear basis is misspecified, the type I error rate is inflated, though it is still slightly lower than that of the unadjusted test. For both specifications of \(\eta ^{g,*}_{p,1}\), the covariate-adjusted test controls the type I error rate near the nominal level when a cubic polynomial basis is used. For k = 2 (i.e., when \(H^{0}_{p,k}\) holds but \(G^{0}_{p,k}\) fails), the unadjusted test exhibits low power to detect differential associations. The adjusted test provides greatly improved power with either a linear or cubic basis. For k = 3 (i.e., when both \(H^{0}_{p,k}\) and \(G^{0}_{p,k}\) fail), the unadjusted test and both adjusted tests are well-powered against the null. For k ≥ 4 (i.e., when nodes p and k are conditionally independent in both groups), the unadjusted test and the adjusted test with a linear basis both control the type I error near the nominal level. However, the covariate-adjusted test is conservative when a cubic basis is used.
The simulation results corroborate our expectations and suggest that there are potential benefits to covariate adjustment. We find that when the sample size is large, the covariate-adjusted test behaves reasonably well with either choice of basis function. However, in small samples, the covariate-adjusted test is somewhat imprecise, and the type I error rate can be slightly above or below the nominal level. Practitioners should therefore exercise caution when using our proposed methodology in very small samples.
Data Example
Breast cancer classification based on expression of the estrogen receptor (ER) is prognostic of clinical outcomes. Breast cancers can be classified as estrogen receptor positive (ER+) or estrogen receptor negative (ER−), with approximately 70% of breast cancers being ER+ (Lumachi et al. 2013). In ER+ breast cancer, the cancer cells require estrogen to grow; this has been shown to be associated with positive clinical outcomes compared with ER− breast cancer (Carey et al. 2006). Identifying differences between the biological pathways of ER+ and ER− breast cancers can be helpful for understanding the underlying disease mechanisms.
It has been shown that age is associated with ER status and that age can be associated with gene expression (Khan et al. 1998; Yang et al. 2015). This warrants consideration of age as an adjustment variable in a comparison of gene coexpression networks between ER groups.
We perform an age-adjusted differential analysis of the ER+ and ER− breast cancer networks, using publicly available data from The Cancer Genome Atlas (TCGA) (Weinstein et al. 2013). We obtain clinical measurements and gene expression data from a total of 806 ER+ patients and 237 ER− patients. We consider the set of p = 145 genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) breast cancer pathway (Kanehisa and Goto, 2000) and adjust for age as our only covariate. The average age in the ER+ group is 59.3 years (SD = 13.3), and the average age in the ER− group is 55.9 years (SD = 12.4). We use a linear basis for covariate adjustment. In the ER+ group, the sample size is considerably larger than the number of parameters, so we can fit the varying coefficient model in Eq. 9 using ordinary least squares. We use the debiased group LASSO to estimate the network for the ER− group because its sample size is smaller than the number of model parameters. We compare the results from the covariate-adjusted analysis with the unadjusted approach described in Section 3.1.
To assess differential connectivity between any two nodes j and k, we can treat either node j or node k as the response in the varying coefficient model in Eq. 9. We can then test either of the hypotheses \(G^{0}_{j,k}:\alpha ^{\mathrm {I},*}_{j,k} = \alpha ^{\text {II},*}_{j,k}\) or \(G^{0}_{k,j}:\alpha ^{\mathrm {I},*}_{k,j} = \alpha ^{\text {II},*}_{k,j}\). Our approach is to set the p-value for the test of differential connectivity between nodes j and k as the minimum of the p-values for the tests of \(G^{0}_{j,k}\) and \(G^{0}_{k,j}\), though we acknowledge that this strategy is anticonservative.
Our objective is to identify all pairs of differentially connected genes, so we need to adjust for the fact that we perform a separate hypothesis test for each gene pair. We account for multiplicity by controlling the false discovery rate at the level κ = .05 using the Benjamini–Yekutieli method (2001).
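Multiplicity correction of this kind can be carried out with any standard implementation (e.g., `p.adjust` in R or `statsmodels` in Python); the step-up procedure itself is short enough to spell out. The sketch below inflates the Benjamini–Hochberg thresholds by the harmonic-sum factor c(m), as the Benjamini–Yekutieli method prescribes for arbitrary dependence:

```python
import numpy as np

def benjamini_yekutieli(pvals, q=0.05):
    """Benjamini-Yekutieli step-up procedure: controls the FDR at level q
    under arbitrary dependence by comparing the sorted p-values p_(i) to
    q * i / (m * c(m)), where c(m) = sum_{i=1}^{m} 1/i.
    Returns a boolean rejection mask aligned with the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= threshold_i
        reject[order[: k + 1]] = True      # reject all hypotheses up to rank k
    return reject
```

For the edge-wise analysis, the input would be the vector of minimum p-values over all gene pairs.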
The differential networks obtained from the unadjusted and adjusted analyses are substantially different. We report 106 differentially connected edges from the adjusted analysis (shown in Fig. 3), compared with only two such edges from the unadjusted analysis. This suggests that the relationship between the gene coexpression network and age may differ by ER group.
Discussion
In this paper, we have addressed challenges that arise when performing differential network analysis (Shojaie, 2020) in the setting where the network depends on covariates. Using both synthetic and real data, we showed that accounting for covariates can result in better control of type I error and improved power.
We propose a parsimonious approach for covariate adjustment in differential network analysis. A number of improvements and extensions can be made to our current work. First, while this paper focuses on differential network analysis in exponential family models, our framework can be applied to other models where conditional dependence between any pair of nodes can be represented by a single scalar parameter. This includes semiparametric models such as the nonparanormal model (Liu et al. 2009), as well as distributions defined over complex domains, which can be modeled using the generalized score matching framework (Yu et al. 2021). Additionally, we only discuss testing edgewise differences between the networks, though testing differences between subnetworks may also be of interest. When the subnetworks are low-dimensional, one can construct a chi-squared test using test statistics similar to those presented in Sections 3 and 4, because joint asymptotic normality of a low-dimensional set of the estimators \(\check {\alpha }^{g}_{j,k}\) can be readily established. Such an approach is not applicable to high-dimensional subnetworks, but it may be possible to construct a calibrated test using recent results on simultaneous inference in high-dimensional models (Zhang and Cheng, 2017; Yu et al. 2020). We can also improve the statistical efficiency of the network estimates by considering joint estimation procedures that borrow information across groups (Guo et al. 2011; Danaher et al. 2014; Saegusa and Shojaie, 2016). Finally, we assume that the relationship between the network and the covariates can be represented by a low-dimensional basis expansion. Investigating nonparametric approaches that relax this assumption could be a fruitful area of research.
Data Availability
The findings of this paper are supported by data from The Cancer Genome Atlas, which are accessible using the publicly available R package RTCGA.
Code availability
An implementation of the proposed methodology is available at https://github.com/awhudson/CovDNA.
References
Barabási, A.L., Gulbahce, N. and Loscalzo, J. (2011). Network medicine: A networkbased approach to human disease. Nat. Rev. Genet. 12, 56–68.
Belilovsky, E., Varoquaux, G. and Blaschko, M.B. (2016). Testing for differences in Gaussian graphical models: Applications to brain connectivity. In: Advances in neural information processing systems, vol. 29. Curran Associates Inc., New York.
Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188.
Breheny, P. and Huang, J. (2009). Penalized methods for bilevel variable selection. Stat. Interf. 2, 369.
Bühlmann, P. and van de Geer, S. (2011). Statistics for highdimensional data: Methods, theory and applications. Springer Science & Business Media, Berlin.
Carey, L.A., Perou, C.M., Livasy, C.A., Dressler, L.G., Cowan, D., Conway, K., Karaca, G., Troester, M.A., Tse, C.K., Edmiston, S. et al. (2006). Race, breast cancer subtypes, and survival in the carolina breast cancer study. J. Am. Med. Assoc. 295, 2492–2502.
Chen, S., Witten, D.M. and Shojaie, A. (2015). Selection and estimation for mixed graphical models. Biometrika 102, 47–64.
Danaher, P., Wang, P. and Witten, D.M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. J. R. Stat. Soc. Series B 76, 373–397.
Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.
de la Fuente, A. (2010). From ‘differential expression’ to ‘differential networking’–identification of dysfunctional regulatory networks in diseases. Trends Genet. 26, 326–333.
van de Geer, S. (2016). Estimation and testing under sparsity. Lect. Notes Math. 2159.
van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for highdimensional models. Ann. Stat. 42, 1166–1202.
Guo, J., Levina, E., Michailidis, G. and Zhu, J. (2011). Joint estimation of multiple graphical models. Biometrika 98, 1–15.
Hastie, T. and Tibshirani, R. (1993). Varyingcoefficient models. J. R. Stat. Soc. Series B 55, 757–779.
He, H., Cao, S., Zhang, J.G., Shen, H., Wang, Y.P. and Deng, H. (2019). A statistical test for differential network analysis based on inference of Gaussian graphical model. Scientif. Rep. 9, 1–8.
Honda, T. (2019). The debiased group lasso estimation for varying coefficient models. Ann. Inst. Stat. Math. 1–27.
Hyvärinen, A. (2005). Estimation of nonnormalized statistical models by score matching. J. Mach. Learn. Res. 6, 695–709.
Hyvärinen, A. (2007). Some extensions of score matching. Comput. Stat. Data Anal. 51, 2499–2512.
Ideker, T. and Krogan, N.J. (2012). Differential network biology. Molecular Systems Biology 8(1).
Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for highdimensional regression. J. Mach. Learn. Res. 15, 2869–2909.
Kanehisa, M. and Goto, S. (2000). Kegg: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30.
Khan, S.A., Rogers, M.A., Khurana, K.K., Meguid, M.M. and Numann, P.J. (1998). Estrogen receptor expression in benign breast epithelium and breast cancer risk. J. Natl. Cancer Inst. 90, 37–42.
Lin, L., Drton, M. and Shojaie, A. (2016). Estimation of highdimensional graphical models using regularized score matching. Electron. J. Stat. 10, 806–854.
Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10, 2295–2328.
Lumachi, F., Brunello, A., Maruzzo, M., Basso, U. and Mm Basso, S. (2013). Treatment of estrogen receptorpositive breast cancer. Curr. Med. Chem. 20, 596–604.
Maathuis, M., Drton, M., Lauritzen, S. and Wainwright, M. (2018). Handbook of graphical models. CRC Press, Boca Raton.
Meinshausen, N. and Bühlmann, P. (2006). Highdimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462.
Mitra, R. and Zhang, C.H. (2016). The benefit of group sparsity in group inference with debiased scaled group lasso. Electron. J. Stat. 10, 1829–1873.
Negahban, S.N., Ravikumar, P., Wainwright, M.J. and Yu, B. (2012). A unified framework for highdimensional analysis of mestimators with decomposable regularizers. Stat. Sci. 27, 538–557.
Newman, M.E. (2003). The structure and function of complex networks. SIAM Rev. 45, 167–256.
Saegusa, T. and Shojaie, A. (2016). Joint estimation of precision matrices in heterogeneous populations. Electron. J. Stat. 10, 1341–1392.
Shojaie, A. (2020). Differential network analysis: A statistical perspective. Wiley Interdisciplinary Reviews: Computational Statistics e1508.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B 58, 267–288.
van der Vaart, A.W. (2000). Asymptotic statistics, 3. Cambridge University Press, Cambridge.
Wang, H. and Xia, Y. (2009). Shrinkage estimation of the varying coefficient model. J. Am. Stat. Assoc. 104, 747–757.
Wang, J. and Kolar, M. (2014). Inference for sparse conditional precision matrices. arXiv:1412.7638.
Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.R.M., Ozenberger, B.A., Ellrott, K., Shmulevich, I., Sander, C. and Stuart, J.M. (2013). The cancer genome atlas pancancer analysis project. Nat. Genet. 45, 1113–1120.
Xia, Y., Cai, T. and Cai, T.T. (2015). Testing differential networks with applications to the detection of genegene interactions. Biometrika 102, 247–266.
Xia, Y., Cai, T. and Cai, T.T. (2018). Twosample tests for highdimensional linear regression with an application to detecting interactions. Stat. Sin. 28, 63–92.
Yang, E., Ravikumar, P., Allen, G.I. and Liu, Z. (2015). Graphical models via univariate exponential family distributions. J. Mach. Learn. Res. 16, 3813–3847.
Yang, J., Huang, T., Petralia, F., Long, Q., Zhang, B., Argmann, C., Zhao, Y., Mobbs, C.V., Schadt, E.E., Zhu, J. et al. (2015). Synchronized agerelated gene expression changes across multiple tissues in human and the link to complex diseases. Sci. Rep. 5, 1–16.
Yang, Y. and Zou, H. (2015). A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput. 25, 1129–1141.
Yu, M., Gupta, V. and Kolar, M. (2020). Simultaneous inference for pairwise graphical models with generalized score matching. J. Mach. Learn. Res. 21, 1–51.
Yu, S., Drton, M. and Shojaie, A. (2019). Generalized score matching for nonnegative data. J. Mach. Learn. Res. 20, 1–70.
Yu, S., Drton, M. and Shojaie, A. (2021). Generalized score matching for general domains. Information and inference: A Journal of the IMA.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Series B 68, 49–67.
Zhang, C.H. and Zhang, S.S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Series B 76, 217–242.
Zhang, X. and Cheng, G. (2017). Simultaneous inference for highdimensional linear models. J. Am. Stat. Assoc. 112, 757–768.
Zhao, S.D., Cai, T.T. and Li, H. (2014). Direct estimation of differential networks. Biometrika 101, 253–268.
Zhou, S., Lafferty, J. and Wasserman, L. (2010). Time varying undirected graphs. Mach. Learn. 80, 295–319.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B 67, 301–320.
Funding
The authors gratefully acknowledge the support of the NSF Graduate Research Fellowship Program under grant DGE1762114 as well as NSF grant DMS1561814 and NIH grant R01GM114029. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Debiased Group LASSO Estimator
In this appendix, we derive a debiased group LASSO estimator. Our construction is essentially the same as the one presented in van de Geer (2016).
With \(\mathcal {V}_{j}\) as defined in Eq. 11, let \(\mathcal {V}_{j}^{g} = \left (\mathcal {V}^{g}_{1},\ldots ,\mathcal {V}^{g}_{j-1}, \mathcal {V}^{g}_{j+1},\ldots , \mathcal {V}^{g}_{p}\right )\) be an n × (p − 1)d dimensional matrix. For candidate parameters \(\boldsymbol {\alpha }_{j} = \left (\alpha _{j,1}^{\top }, \ldots , \alpha _{j,p}^{\top }\right )^{\top }\), let \(\mathcal {P}_{j}\left (\boldsymbol {\alpha }_{j} \right ) = {\sum }_{k \neq j} \left \| \alpha _{j,k} \right \|_{2}\), and let \(\nabla \mathcal {P}_{j}\) denote the subgradient of \(\mathcal {P}_{j}\). We can express the subgradient as \(\nabla \mathcal {P}_{j}(\boldsymbol {\alpha }_{j}) = \left ((\nabla \|\alpha _{j,1}\|_{2})^{\top }, \ldots , (\nabla \|\alpha _{j,p}\|_{2})^{\top } \right )^{\top }\), where ∇∥α_{j,k}∥_{2} = α_{j,k}/∥α_{j,k}∥_{2} if ∥α_{j,k}∥_{2}≠ 0, and ∇∥α_{j,k}∥_{2} is otherwise a vector with ℓ_{2} norm at most one. The KKT conditions for the group LASSO imply that the estimate \(\tilde {\boldsymbol {\alpha }}^{g}_{j}\) satisfies
With some algebra, we can rewrite this as
Let Σ_{j} be defined as the matrix
and let \(\tilde {M}_{j}\) be an estimate of \({{\varSigma }}_{j}^{-1}\). We can write \(\tilde {\boldsymbol {\alpha }}^{g}_{j} - \boldsymbol {\alpha }^{g,*}_{j}\) as
The first term (i) in Eq. A.1 is an approximation of the bias of the group LASSO estimate. This term is a function only of the observed data and not of any unknown quantities, and can therefore be used to directly correct the initial estimate \(\tilde {\boldsymbol {\alpha }}_{j}^{g}\). If \(\tilde {M}_{j}\) is a consistent estimate of \({{\varSigma }}_{j}^{-1}\), the second term (ii) is asymptotically equivalent to
Thus, (ii) is asymptotically equivalent to a sample average of mean-zero i.i.d. random variables. The central limit theorem can then be applied to establish convergence in distribution to the multivariate normal distribution at an n^{1/2} rate for any low-dimensional subvector. The third term (iii) will also be asymptotically negligible if \(\tilde {M}_{j}\) is an approximate inverse of \((n^{g})^{-1}\left (\mathcal {V}_{j}^{g}\right )^{\top }\mathcal {V}^{g}_{j}\). This would suggest that an estimator of the form
will be asymptotically normal for an appropriate choice of \(\tilde {M}_{j}\).
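As a sanity check on this construction, note that in the low-dimensional case one may take \(\tilde{M}_j\) to be the exact inverse of \((n^{g})^{-1}(\mathcal{V}^{g}_{j})^{\top}\mathcal{V}^{g}_{j}\), in which case the one-step correction maps any initial estimate to the ordinary least squares fit. The sketch below (a generic linear-model illustration, not the paper's code) verifies this:

```python
import numpy as np

rng = np.random.default_rng(7)
n, q = 100, 5
V = rng.normal(size=(n, q))
y = V @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(size=n)

alpha_init = np.zeros(q)            # any initial estimate, e.g. a lasso fit
M = np.linalg.inv(V.T @ V / n)      # exact inverse of the scaled Gram matrix

# One-step bias correction: alpha_init + M V'(y - V alpha_init) / n.
alpha_check = alpha_init + M @ V.T @ (y - V @ alpha_init) / n

# With the exact Gram inverse, the corrected estimate is exactly OLS.
alpha_ols, *_ = np.linalg.lstsq(V, y, rcond=None)
assert np.allclose(alpha_check, alpha_ols)
```

In the high-dimensional setting the exact inverse is unavailable, and \(\tilde{M}_j\) is instead built by the nodewise regressions described below.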
Before describing our construction of \(\tilde {M}_{j}\), we find it helpful to consider an alternative expression for \({{\varSigma }}^{1}_{j}\). We define the d × d matrices \({{\varGamma }}^{*}_{j,k,l}\) as
We also define the d × d matrix \({C}^{*}_{j,k}\) as
It can be shown that \({{\varSigma }}^{-1}_{j}\) can be expressed as
We can thus estimate \({{\varSigma }}_{j}^{-1}\) by performing a series of regressions to estimate each matrix \({{\varGamma }}^{*}_{j,k,l}\).
Following the approach of van de Geer et al. (2014), we use a group LASSO variant of the nodewise LASSO to construct \(\tilde {M}_{j}\). To proceed, we require some additional notation. For any d × d matrix Γ = (γ_{1},…,γ_{d}) with d-dimensional columns γ_{c}, let \(\|{{\varGamma }}\|_{2,*} = {\sum }_{c = 1}^{d} \|\gamma _{c}\|_{2}\). Let \(\nabla \|{{\varGamma }}\|_{2,*} = \left (\gamma _{1}/\|\gamma _{1}\|_{2},\ldots ,\gamma _{d}/\|\gamma _{d}\|_{2} \right )\) be the subgradient of ∥Γ∥_{2,∗}. We use the group LASSO to obtain estimates \(\tilde {{{\varGamma }}}_{j,k,l}\) of \({{\varGamma }}^{*}_{j,k,l}\):
We then estimate \(C^{*}_{j,k}\) as
Our estimate \(\tilde {M}_{j}\) takes the form
With this construction of \(\tilde {M}_{j}\), we can establish a bound on the remainder term (iii) in Eq. A.1. To show this, we make use of the following lemma, which states a special case of the dual norm inequality for the group LASSO norm \(\mathcal {P}_{j}\) (see, e.g., Chapter 6 of van de Geer (2016)).
Lemma 1.
Let a_{1},…,a_{p} and b_{1},…,b_{p} be d-dimensional vectors, and let \(\mathbf {a} = \left (a_{1}^{\top },\ldots ,a_{p}^{\top }\right )^{\top }\) and \(\mathbf {b} = \left (b_{1}^{\top },\ldots ,b_{p}^{\top }\right )^{\top }\) be pd-dimensional vectors. Then
$$\left| \mathbf{a}^{\top} \mathbf{b} \right| \leq \left( \sum\limits_{k=1}^{p} \left\| a_{k} \right\|_{2} \right) \max\limits_{1 \leq k \leq p} \left\| b_{k} \right\|_{2}.$$
The KKT conditions for Eq. A.3 imply that for all l≠j,k
Lemma 1 and Eq. A.4 imply that
where \(\|\cdot \|_{\infty }\) is the \(\ell _{\infty }\) norm. With \(\omega \asymp \left \{\log (p)/n\right \}^{1/2}\), \(\tilde {M}_{j}\) can be shown to be consistent under sparsity of \({{\varGamma }}^{*}_{j,k,l}\) (i.e., only a few matrices \({{\varGamma }}^{*}_{j,k,l}\) have nonzero columns) and some additional regularity conditions. Additionally, it can be shown under sparsity of α^{g,∗} (i.e., very few vectors \(\alpha ^{g,*}_{j,k}\) are nonzero) and some additional regularity conditions that \(\mathcal {P}_{j}\left (\tilde {\boldsymbol {\alpha }}_{j}^{g} - \boldsymbol {\alpha }_{j}^{g,*} \right ) = O_{P}\left (\left \{\log (p)/n \right \}^{1/2}\right )\). Thus, a scaled version of the remainder term (iii) is o_{P}(n^{− 1/2}) if \(n^{-1/2}\log (p) \to 0\). We refer readers to Chapter 8 of Bühlmann and van de Geer (2011) for a more comprehensive discussion of assumptions required for consistency of the group LASSO.
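The dual norm inequality of Lemma 1 bounds an inner product by the group ℓ_{1}/ℓ_{2} norm of one vector times the maximal blockwise ℓ_{2} norm of the other. A quick numerical check (block size and dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
p, d = 6, 3
a = rng.standard_normal(p * d)
b = rng.standard_normal(p * d)

# Split the pd-vectors into p blocks of length d and compute blockwise norms
a_blocks = a.reshape(p, d)
b_blocks = b.reshape(p, d)
group_l1 = np.linalg.norm(a_blocks, axis=1).sum()   # sum_k ||a_k||_2
max_l2 = np.linalg.norm(b_blocks, axis=1).max()     # max_k ||b_k||_2

# Dual norm inequality: |a' b| <= (sum_k ||a_k||_2) * (max_k ||b_k||_2)
assert abs(a @ b) <= group_l1 * max_l2
```

The bound follows from Cauchy–Schwarz within each block and Hölder's inequality across blocks.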
We now express the debiased group LASSO estimator for \(\alpha ^{g,*}_{j,k}\) as
We have established that \(\check {\alpha }^{g}_{j,k}\) can be written as
As stated above, the central limit theorem implies asymptotic normality of \(\check {\alpha }^{g}_{j,k}\).
We now construct an estimate for the variance of \(\check {\alpha }^{g}_{j,k}\). Suppose the residual \(\mathbf {X}^{g}_{j} - \mathcal {V}^{g}_{j} \boldsymbol {\alpha }^{g,*}_{j}\) is independent of \(\mathcal {V}^{g}\), and let \({\tau _{j}^{g}}\) denote the residual variance
We can approximate the variance of \(\check {\alpha }^{g}_{j,k}\) as
As \({\tau _{j}^{g}}\) is typically unknown, we instead use the estimate
where \(\widehat {df}\) is an estimate of the degrees of freedom for the group LASSO estimate \(\tilde {\boldsymbol {\alpha }}_{j}^{g}\). In our implementation, we use the estimate proposed by Breheny and Huang (2009). Let \(\tilde {\alpha }^g_{j,k,l}\) be the lth element of \(\tilde {\alpha }^g_{j,k}\), and let \(\mathcal {V}^g_{k,l}\) denote the lth column of \(\mathcal {V}^g_k\). We then define
and estimate the degrees of freedom as
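As a rough illustration of the degrees-of-freedom adjustment, the residual variance can be estimated by dividing the residual sum of squares by n minus an estimated df. In the sketch below, df is simply the count of active coefficients, which is a cruder stand-in for the Breheny–Huang estimate used in the paper; the thresholded fit standing in for the group LASSO is likewise illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 6
V = rng.standard_normal((n, p))
y = V @ np.array([1.0, 0.0, -1.5, 0.0, 0.0, 2.0]) + 0.5 * rng.standard_normal(n)

alpha_tilde = np.linalg.lstsq(V, y, rcond=None)[0]
alpha_tilde[np.abs(alpha_tilde) < 0.2] = 0.0   # crude sparsification stand-in

resid = y - V @ alpha_tilde
df_hat = np.count_nonzero(alpha_tilde)          # naive df: number of active coefficients
tau_hat = resid @ resid / (n - df_hat)          # df-adjusted residual variance estimate

assert 1 <= df_hat <= p and tau_hat > 0
```

Without the df adjustment in the denominator, the variance estimate would be biased downward, since the residuals are computed from a fitted model.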
Appendix B: Generalized Score Matching Estimator
In this section, we establish consistency of the regularized score matching estimator and derive a bias-corrected estimator.
B.1 Form of Generalized Score Matching Loss
Below, we restate Theorem 3 of Yu et al. (2019), which provides conditions under which the score matching loss in Eq. 20 can be expressed as Eq. 21.
Theorem 1.
Assume the following conditions hold:
where the prime symbol denotes the elementwise derivative. Then Eqs. 20 and 21 are equivalent up to an additive constant that does not depend on h.
B.2 Generalized Score Matching Estimator in Low Dimensions
In this section, we provide an explicit form for the generalized score matching estimator in the low-dimensional setting and state its limiting distribution. We first introduce some additional notation below that allows the generalized score matching loss to be written in a condensed form. Recall the form of the conditional density for the pairwise interaction model in Eq. 22. We define
For \(j \in \{1,\ldots,p\}\), let \(\boldsymbol {\alpha }_{j} = \left (\alpha _{j,1}^{\top }, \ldots ,\alpha _{j,p}^{\top }\right )^{\top }\) and 𝜃_{j} = (𝜃_{j,1},…,𝜃_{j,d})^{⊤}. We can express the empirical score matching loss Eq. 23 as
We write the gradient of the risk function as
Thus, the minimizer \((\hat {\boldsymbol {\alpha }}^{g}_{j}, \hat {\boldsymbol {\theta }}^{g}_{j})\) of the empirical loss takes the form
By applying Theorem 5.23 of van der Vaart (2000),
where the matrices A and B are defined as
We estimate the variance of \((\hat {\boldsymbol {\alpha }}^{g}_{j}, \hat {\boldsymbol {\theta }}^{g}_{j})\) as \(\hat {{{\varOmega }}}^{g}_{j} = \left (n^{g}\right )^{-1}\hat {A}^{-1} \hat {B} \hat {A}^{-1}\), where
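The variance estimate above has the standard M-estimation sandwich structure: an inverse Hessian on the outside and an outer product of scores in the middle. For least squares this reduces to the heteroskedasticity-robust (White) estimator, sketched below in numpy under that standard form (names illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 300, 3
Z = rng.standard_normal((n, q))                 # stacked features (alpha and theta parts)
beta_star = np.array([0.5, -1.0, 2.0])
y = Z @ beta_star + rng.standard_normal(n)

beta_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
resid = y - Z @ beta_hat

# Sandwich pieces: A_hat is the Hessian of the loss, B_hat the score outer product
A_hat = Z.T @ Z / n
B_hat = (Z * resid[:, None]).T @ (Z * resid[:, None]) / n
A_inv = np.linalg.inv(A_hat)
Omega_hat = A_inv @ B_hat @ A_inv / n           # estimated variance of beta_hat

# The estimate is a symmetric positive semidefinite q x q matrix
assert Omega_hat.shape == (q, q)
assert np.allclose(Omega_hat, Omega_hat.T)
assert np.all(np.linalg.eigvalsh(Omega_hat) >= -1e-12)
```

The middle factor, rather than the plain inverse Hessian, is what makes the estimate valid when the model's variance assumptions fail.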
B.3 Consistency of Regularized Generalized Score Matching Estimator
In this subsection, we argue that the regularized generalized score matching estimators \(\tilde {\boldsymbol {\alpha }}^{g}_{j}\) and \(\tilde {\boldsymbol {\theta }}^{g}_{j}\) from Eq. 24 are consistent. Let \(\mathcal {P}_{j}(\boldsymbol {\alpha }_{j}) = {\sum }_{k \neq j} \|\alpha _{j,k}\|_{2}\). We establish convergence rates of \(\mathcal {P}_{j}\left (\tilde {\boldsymbol {\alpha }}_{j}^{g} - \boldsymbol {\alpha }_{j}^{g,*} \right )\) and \(\left \|\tilde {\boldsymbol {\theta }}^{g}_{j} - \boldsymbol {\theta }_{j}^{g,*} \right \|_{2}\). Our approach is based on proof techniques described in Bühlmann and van de Geer (2011).
Our result requires a notion of compatibility between the penalty function \(\mathcal {P}_{j}\) and the loss \(L^{g}_{n,j}\). Such notions are commonly assumed in the highdimensional literature. Below, we define the compatibility condition.
Definition 1 (Compatibility Condition).
Let S be a set containing indices of the nonzero elements of \(\boldsymbol {\alpha }_{j}^{g,*}\), and let \(\bar {S}\) denote the complement of S. Let be a (p − 1)d-dimensional vector whose rth element is one if r ∈ S, and zero otherwise. The group LASSO compatibility condition holds for the index set S ⊂ {1,…,p} and for constant C > 0 if for all ,
where ∘ is the elementwise product operator.
Theorem 2.
Let \(\mathcal {E}\) be the set
for some λ_{0} ≤ λ/2. Suppose the compatibility condition also holds. Then on the set \(\mathcal {E}\),
Proof Proof of Theorem 2.
The regularized score matching estimator \(\tilde {\boldsymbol {\alpha }}_{j}^{g}\) necessarily satisfies the following basic inequality:
With some algebra, this inequality can be rewritten as
By Lemma 1, on the set \(\mathcal {E}\) and using λ ≥ 2λ_{0}, we get
On the left hand side, we apply the triangle inequality to get
On the right hand side, we observe that
We then have
Now,
where we use the compatibility condition for the first inequality, and for the second inequality use the fact that
for any . The conclusion follows immediately. □
If the event \(\mathcal {E}\) occurs with probability tending to one, Theorem 2 implies
We select λ so that the event \(\mathcal {E}\) occurs with high probability. For instance, suppose the elements of the matrix
are subGaussian, and consider the event
where \(\|\cdot \|_{\infty }\) is the \(\ell _{\infty }\) norm. Observing that \(\bar {\mathcal {E}} \subset \mathcal {E}\), it suffices to show that \(\bar {\mathcal {E}}\) holds with high probability. It is shown in Corollary 2 of Negahban et al. (2012) that there exist constants u_{1},u_{2} > 0 such that with \(\lambda _{0} \asymp \{\log (p)/n\}^{1/2}\), \(\bar {\mathcal {E}}\) holds with probability at least \(1 - u_{1}p^{-u_{2}}\). Thus, \(\mathcal {E}\) occurs with probability tending to one as \(p \to \infty \). For distributions with heavier tails, a larger choice of λ may be required (Yu et al. 2019).
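The rate \(\lambda _{0} \asymp \{\log (p)/n\}^{1/2}\) reflects the maximal fluctuation of p averages of mean-zero sub-Gaussian variables. A seeded simulation illustrating the scaling (Gaussian entries as the sub-Gaussian example; the constant 2 in the bound is a generous illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 500, 1000
W = rng.standard_normal((n, p))       # sub-Gaussian (here Gaussian) entries

# Each column average is approximately N(0, 1/n); the maximum over p columns
# concentrates around {2 log(p) / n}^{1/2}
max_avg = np.abs(W.mean(axis=0)).max()
bound = 2.0 * np.sqrt(2.0 * np.log(p) / n)

assert max_avg <= bound
```

Choosing λ of this order thus dominates the noise term on an event of high probability, which is exactly what the event \(\bar {\mathcal {E}}\) formalizes.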
B.4 Debiased Score Matching Estimator
The KKT conditions for the regularized score matching loss imply that the estimator \(\tilde {\boldsymbol {\alpha }}^{g}_{j}\) satisfies
With some algebra, we can rewrite the KKT conditions as
Now, let Σ_{j,n} be the matrix
let \({{\varSigma }}_{j}\) denote its population counterpart, and let \(\tilde {M}_{j}\) be an estimate of \({{\varSigma }}_{j}^{-1}\). We can now rewrite the KKT conditions as
As is the case for the debiased group LASSO in Appendix A, the first term (i) in Eq. B.1 depends only on the observed data and can be directly subtracted from the initial estimate. The second term (ii) is asymptotically equivalent to
if \(\tilde {M}_{j}\) is a consistent estimate of \({{\varSigma }}_{j}^{-1}\). Using the fact that , it can be seen that Eq. B.2 is an average of i.i.d. random quantities with mean zero. The central limit theorem then implies that any low-dimensional subvector is asymptotically normal. The last term (iii) is asymptotically negligible if \(\tilde {M}_{j}\) is an approximate inverse of Σ_{j,n} and if \((\tilde {\boldsymbol {\alpha }}_{j}^{g}, \tilde {\boldsymbol {\theta }}_{j}^{g})\) is consistent for \((\boldsymbol {\alpha }_{j}^{g,*}, \boldsymbol {\theta }_{j}^{g,*})\). Thus, for an appropriate choice of \(\tilde {M}_{j}\), we expect asymptotic normality of an estimator of the form
Before constructing \(\tilde {M}_{j}\), we first provide an alternative expression for \({{\varSigma }}_{j}^{-1}\). We define the d × d matrices \({{\varGamma }}^{*}_{j,k,l}\) and \({{\varDelta }}^{*}_{j,k}\) as
We also define the d × d matrices \({{\varLambda }}^{*}_{j,k}\) as
Additionally, we define the d × d matrices \(C^{*}_{j,k}\) and \(D^{*}_{j}\) as
It can be shown that \({{\varSigma }}_{j}^{-1}\) can be expressed as
We can thus estimate \({{\varSigma }}_{j}^{-1}\) by estimating each of the matrices \({{\varGamma }}^{*}_{j,k,l}\), \({{\varLambda }}^{*}_{j,k}\), and \({{\varDelta }}^{*}_{j,k}\).
Similar to our discussion of the debiased group LASSO in Appendix A, we use a group-penalized variant of the nodewise LASSO to construct \(\tilde {M}_{j}\). We estimate \({{\varGamma }}^{*}_{j,k,l}\) and \({{\varDelta }}^{*}_{j,k}\) as
where ω_{1},ω_{2} > 0 are tuning parameters, and ∥⋅∥_{2,∗} is as defined in Appendix A. We estimate \({{\varLambda }}^{*}_{j,k}\) as
Additionally, we define the d × d matrices \(\tilde {C}_{j,k}\) and \(\tilde {D}_{j}\) as
We then take \(\tilde {M}_{j}\) as
When \({{\varGamma }}^{*}_{j,k,l}\), \({{\varDelta }}^{*}_{j,k}\), and \({{\varLambda }}^{*}_{j,k}\) satisfy appropriate sparsity conditions and some additional regularity assumptions, \(\tilde {M}_{j}\) is a consistent estimate of \({{\varSigma }}_{j}^{-1}\) for \(\omega _{1} \asymp \{\log (p)/n\}^{1/2}\) and \(\omega _{2} \asymp \{\log (p)/n\}^{1/2}\) (see, e.g., Chapter 8 of Bühlmann and van de Geer (2011) for a more comprehensive discussion). Using the same argument presented in Appendix A, we are able to obtain the following bound on a scaled version of the remainder term (iii):
The remainder is o_{P}(n^{− 1/2}) and hence asymptotically negligible if n^{1/2} \(\max \limits \{\omega _{1}, \omega _{2}\} \lambda \to 0\), where λ is the tuning parameter for the regularized score matching estimator (see Theorem 2).
The debiased estimate \(\check {\alpha }^{g}_{j,k}\) of \(\alpha ^{g,*}_{j,k}\) can be expressed as
The difference between the debiased estimator \(\check {\alpha }^{g}_{j,k}\) and the true parameter \(\alpha ^{g,*}_{j,k}\) can be expressed as
As discussed above, the central limit theorem implies asymptotic normality of \(\check {\alpha }^{g}_{j,k}\). We can estimate the asymptotic variance of \(\check {\alpha }^{g}_{j,k}\) as
where we define
Hudson, A., Shojaie, A. CovariateAdjusted Inference for Differential Analysis of HighDimensional Networks. Sankhya A 84, 345–388 (2022). https://doi.org/10.1007/s13171021002525
Keywords
 Differential network
 Confounding
 Highdimensional
 Penalized likelihood
 Debiased LASSO
 Exponential family
PACS Nos
 62H22 (primary); 62J07 (secondary)