Covariate-Adjusted Inference for Differential Analysis of High-Dimensional Networks


Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariates on both the disease condition and the network. To address this issue, we propose a general covariate-adjusted test for differential network analysis. Our method assesses differential network connectivity by testing the null hypothesis that the network is the same for individuals who have identical covariates and only differ in disease status. We show empirically in a simulation study that the covariate-adjusted test exhibits improved type-I error control compared with naïve hypothesis testing procedures that do not account for covariates. We additionally show that there are settings in which our proposed methodology provides improved power to detect differential connections. We illustrate our method by applying it to detect differences in breast cancer gene co-expression networks by subtype.


Data Availability

The findings of this paper are supported by data from The Cancer Genome Atlas, which are accessible using the publicly available R package RTCGA.

Code availability

An implementation of the proposed methodology is available at


  • Barabási, A.L., Gulbahce, N. and Loscalzo, J. (2011). Network medicine: A network-based approach to human disease. Nat. Rev. Genet. 12, 56–68.

  • Belilovsky, E., Varoquaux, G. and Blaschko, M.B. (2016). Testing for differences in Gaussian graphical models: Applications to brain connectivity. In: Advances in Neural Information Processing Systems, vol. 29. Curran Associates Inc., New York.

  • Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188.

  • Breheny, P. and Huang, J. (2009). Penalized methods for bi-level variable selection. Stat. Interf. 2, 369.

  • Bühlmann, P. and van de Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Springer Science & Business Media, Berlin.

  • Carey, L.A., Perou, C.M., Livasy, C.A., Dressler, L.G., Cowan, D., Conway, K., Karaca, G., Troester, M.A., Tse, C.K., Edmiston, S. et al. (2006). Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. J. Am. Med. Assoc. 295, 2492–2502.

  • Chen, S., Witten, D.M. and Shojaie, A. (2015). Selection and estimation for mixed graphical models. Biometrika 102, 47–64.

  • Danaher, P., Wang, P. and Witten, D.M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. J. R. Stat. Soc. Series B 76, 373–397.

  • Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.

  • de la Fuente, A. (2010). From ‘differential expression’ to ‘differential networking’–identification of dysfunctional regulatory networks in diseases. Trends Genet. 26, 326–333.

  • van de Geer, S. (2016). Estimation and testing under sparsity. Lect. Notes Math. 2159.

  • van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 42, 1166–1202.

  • Guo, J., Levina, E., Michailidis, G. and Zhu, J. (2011). Joint estimation of multiple graphical models. Biometrika 98, 1–15.

  • Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. J. R. Stat. Soc. Series B 55, 757–779.

  • He, H., Cao, S., Zhang, J.G., Shen, H., Wang, Y.P. and Deng, H. (2019). A statistical test for differential network analysis based on inference of Gaussian graphical model. Sci. Rep. 9, 1–8.

  • Honda, T. (2019). The de-biased group lasso estimation for varying coefficient models. Ann. Inst. Stat. Math. 1–27.

  • Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. J. Mach. Learn. Res. 6, 695–709.

  • Hyvärinen, A. (2007). Some extensions of score matching. Comput. Stat. Data Anal. 51, 2499–2512.

  • Ideker, T. and Krogan, N.J. (2012). Differential network biology. Molecular Systems Biology 8(1).

  • Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15, 2869–2909.

  • Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30.

  • Khan, S.A., Rogers, M.A., Khurana, K.K., Meguid, M.M. and Numann, P.J. (1998). Estrogen receptor expression in benign breast epithelium and breast cancer risk. J. Natl. Cancer Inst. 90, 37–42.

  • Lin, L., Drton, M. and Shojaie, A. (2016). Estimation of high-dimensional graphical models using regularized score matching. Electron. J. Stat. 10, 806–854.

  • Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10, 2295–2328.

  • Lumachi, F., Brunello, A., Maruzzo, M., Basso, U. and Mm Basso, S. (2013). Treatment of estrogen receptor-positive breast cancer. Curr. Med. Chem. 20, 596–604.

  • Maathuis, M., Drton, M., Lauritzen, S. and Wainwright, M. (2018). Handbook of graphical models. CRC Press, Boca Raton.

  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436–1462.

  • Mitra, R. and Zhang, C.H. (2016). The benefit of group sparsity in group inference with de-biased scaled group lasso. Electron. J. Stat. 10, 1829–1873.

  • Negahban, S.N., Ravikumar, P., Wainwright, M.J. and Yu, B. (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat. Sci. 27, 538–557.

  • Newman, M.E. (2003). The structure and function of complex networks. SIAM Rev. 45, 167–256.

  • Saegusa, T. and Shojaie, A. (2016). Joint estimation of precision matrices in heterogeneous populations. Electron. J. Stat. 10, 1341–1392.

  • Shojaie, A. (2020). Differential network analysis: A statistical perspective. Wiley Interdisciplinary Reviews: Computational Statistics e1508.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B 58, 267–288.

  • van der Vaart, A.W. (2000). Asymptotic statistics, 3. Cambridge University Press, Cambridge.

  • Wang, H. and Xia, Y. (2009). Shrinkage estimation of the varying coefficient model. J. Am. Stat. Assoc. 104, 747–757.

  • Wang, J. and Kolar, M. (2014). Inference for sparse conditional precision matrices. arXiv:1412.7638.

  • Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.R.M., Ozenberger, B.A., Ellrott, K., Shmulevich, I., Sander, C. and Stuart, J.M. (2013). The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45, 1113–1120.

  • Xia, Y., Cai, T. and Cai, T.T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika 102, 247–266.

  • Xia, Y., Cai, T. and Cai, T.T. (2018). Two-sample tests for high-dimensional linear regression with an application to detecting interactions. Stat. Sin. 28, 63–92.

  • Yang, E., Ravikumar, P., Allen, G.I. and Liu, Z. (2015). Graphical models via univariate exponential family distributions. J. Mach. Learn. Res. 16, 3813–3847.

  • Yang, J., Huang, T., Petralia, F., Long, Q., Zhang, B., Argmann, C., Zhao, Y., Mobbs, C.V., Schadt, E.E., Zhu, J. et al. (2015). Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases. Sci. Rep. 5, 1–16.

  • Yang, Y. and Zou, H. (2015). A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput. 25, 1129–1141.

  • Yu, M., Gupta, V. and Kolar, M. (2020). Simultaneous inference for pairwise graphical models with generalized score matching. J. Mach. Learn. Res. 21, 1–51.

  • Yu, S., Drton, M. and Shojaie, A. (2019). Generalized score matching for non-negative data. J. Mach. Learn. Res. 20, 1–70.

  • Yu, S., Drton, M. and Shojaie, A. (2021). Generalized score matching for general domains. Information and inference: A Journal of the IMA.

  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Series B 68, 49–67.

  • Zhang, C.H. and Zhang, S.S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Series B 76, 217–242.

  • Zhang, X. and Cheng, G. (2017). Simultaneous inference for high-dimensional linear models. J. Am. Stat. Assoc. 112, 757–768.

  • Zhao, S.D., Cai, T.T. and Li, H. (2014). Direct estimation of differential networks. Biometrika 101, 253–268.

  • Zhou, S., Lafferty, J. and Wasserman, L. (2010). Time varying undirected graphs. Mach. Learn. 80, 295–319.

  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B 67, 301–320.



The authors gratefully acknowledge the support of the NSF Graduate Research Fellowship Program under grant DGE-1762114 as well as NSF grant DMS-1561814 and NIH grant R01-GM114029. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Aaron Hudson.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix A: De-biased Group LASSO Estimator

In this subsection, we derive a de-biased group LASSO estimator. Our construction is essentially the same as the one presented in van de Geer (2016).

With \(\mathcal {V}_{j}\) as defined in Eq. 11, let \(\mathcal {V}_{-j}^{g} = \left (\mathcal {V}^{g}_{1},\ldots ,\mathcal {V}^{g}_{j-1}, \mathcal {V}^{g}_{j+1},\ldots , \mathcal {V}^{g}_{p}\right )\) be an n × (p − 1)d dimensional matrix. For d-dimensional vectors \(\alpha _{j,k}\), let \(\boldsymbol {\alpha }_{j} = \left (\alpha _{j,1}^{\top }, \ldots , \alpha _{j,p}^{\top }\right )^{\top }\), let \(\mathcal {P}_{j}\left (\boldsymbol {\alpha }_{j} \right ) = {\sum }_{k \neq j} \left \| \alpha _{j,k} \right \|_{2}\), and let \(\nabla \mathcal {P}_{j}\) denote the sub-gradient of \(\mathcal {P}_{j}\). We can express the sub-gradient as \(\nabla \mathcal {P}_{j}(\boldsymbol {\alpha }_{j}) = \left ((\nabla \|\alpha _{j,1}\|_{2})^{\top }, \cdots , (\nabla \|\alpha _{j,p}\|_{2})^{\top } \right )^{\top }\), where \(\nabla \|\alpha _{j,k}\|_{2} = \alpha _{j,k}/\|\alpha _{j,k}\|_{2}\) if \(\|\alpha _{j,k}\|_{2} \neq 0\), and \(\nabla \|\alpha _{j,k}\|_{2}\) is otherwise a vector with \(\ell _{2}\) norm at most one. The KKT conditions for the group LASSO imply that the estimate \(\tilde {\boldsymbol {\alpha }}^{g}_{j}\) satisfies

$$ \left( n^{g}\right)^{-1}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \left( \mathbf{X}_{j}^{g} - \mathcal{V}_{-j}^{g} \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) = \lambda \nabla \mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right). $$

With some algebra, we can rewrite this as

$$ \left( n^{g}\right)^{-1}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \mathcal{V}_{-j}^{g} \left( \tilde{\boldsymbol{\alpha}}_{j}^{g}- \boldsymbol{\alpha}^{g,*}_{j}\right) = -\lambda \nabla \mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) + \left( n^{g}\right)^{-1}\left( \mathcal{V}^{g}_{-j}\right)^{\top} \left( \mathbf{X}_{j}^{g} - \mathcal{V}_{-j}^{g} \boldsymbol{\alpha}^{g,*}_{j} \right). $$
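
The algebra behind this step is a single substitution, sketched here in the notation above: the residual at the estimate splits into the residual at the true parameter minus the fitted difference, so that

$$ \left( n^{g}\right)^{-1}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \left( \mathbf{X}_{j}^{g} - \mathcal{V}_{-j}^{g} \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) = \left( n^{g}\right)^{-1}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \left( \mathbf{X}_{j}^{g} - \mathcal{V}_{-j}^{g} \boldsymbol{\alpha}^{g,*}_{j} \right) - \left( n^{g}\right)^{-1}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \mathcal{V}_{-j}^{g} \left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right). $$

Equating the right-hand side with the penalty term in the KKT conditions and moving the Gram-matrix term to the left isolates \(\tilde {\boldsymbol {\alpha }}^{g}_{j} - \boldsymbol {\alpha }^{g,*}_{j}\).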

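Let \({{\varSigma }}_{j}\) be defined as the matrix

$$ {{\varSigma}}_{j} = \mathbb{E}\left\{ \left( n^{g}\right)^{-1}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \mathcal{V}_{-j}^{g} \right\}, $$

and let \(\tilde {M}_{j}\) be an estimate of \({{\varSigma }}_{j}^{-1}\). We can write \(\left (\tilde {\boldsymbol {\alpha }}^{g}_{j} - \boldsymbol {\alpha }^{g,*}_{j}\right )\) as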

$$ \begin{array}{@{}rcl@{}} \left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}_{j}^{g,*} \right) &= &\underset{\mathrm{(i)}}{\underbrace{-\lambda \tilde{M}_{j}\nabla \mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right)}} + \underset{\mathrm{(ii)}}{\underbrace{\left( n^{g}\right)^{-1}\tilde{M}_{j}\left( \mathcal{V}^{g}_{-j}\right)^{\top} \left( \mathbf{X}_{j}^{g} - \mathcal{V}_{-j}^{g} \boldsymbol{\alpha}^{g,*}_{j} \right)}} + \\ &&\underset{\mathrm{(iii)}}{\underbrace{\left\{I - \left( n^{g}\right)^{-1}\tilde{M}_{j}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \mathcal{V}_{-j}^{g} \right\} \left( \tilde{\boldsymbol{\alpha}}_{j}^{g} - \boldsymbol{\alpha}^{g,*}_{j}\right)}}. \end{array} $$

The first term (i) in Eq. A.1 is an approximation for the bias of the group LASSO estimate. Because this term is a function only of the observed data and not of any unknown quantities, it can be removed directly from the initial estimate \(\tilde {\boldsymbol {\alpha }}_{j}^{g}\). If \(\tilde {M}_{j}\) is a consistent estimate of \({{\varSigma }}_{j}^{-1}\), the second term (ii) is asymptotically equivalent to

$$ \left( n^{g}\right)^{-1} {{\varSigma}}^{-1}_{j} \left( \mathcal{V}^{g}_{-j}\right)^{\top} \left( \mathbf{X}_{j}^{g} - \mathcal{V}_{-j}^{g} \boldsymbol{\alpha}^{g,*}_{j} \right). $$

Thus, (ii) is asymptotically equivalent to a sample average of mean zero i.i.d. random variables. The central limit theorem can then be applied to establish convergence in distribution to the multivariate normal distribution at an \(n^{1/2}\) rate for any low-dimensional sub-vector. The third term (iii) will also be asymptotically negligible if \(\tilde {M}_{j}\) is an approximate inverse of \((n^{g})^{-1}\left (\mathcal {V}_{-j}^{g}\right )^{\top }\mathcal {V}^{g}_{-j}\). This would suggest that an estimator of the form

$$ \check{\boldsymbol{\alpha}}_{j}^{g} = \tilde{\boldsymbol{\alpha}}_{j}^{g} + \lambda \tilde{M}_{j} \nabla \mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) $$

will be asymptotically normal for an appropriate choice of \(\tilde {M}_{j}\).
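
As a sanity check on this construction, consider the simplest case in which the scaled Gram matrix \((n^{g})^{-1}\left (\mathcal {V}_{-j}^{g}\right )^{\top }\mathcal {V}^{g}_{-j}\) is the identity, so that \(\tilde {M}_{j} = I\) is an exact inverse. The sketch below is a toy illustration, not the authors' implementation. It assumes the group LASSO objective is half the mean squared error plus λ times the sum of group norms, for which the estimate is a group-wise soft-thresholding of z = n⁻¹VᵀX. The function names and inputs are hypothetical.

```python
import math

def group_soft_threshold(z_groups, lam):
    """Group LASSO estimate when the scaled Gram matrix is the identity.

    z_groups: list of groups, each a list of floats (z_k = n^{-1} V_k^T X).
    lam:      penalty level, assumed strictly positive.
    Returns (estimate, subgradient), both grouped like z_groups.
    """
    estimate, subgradient = [], []
    for z in z_groups:
        norm = math.sqrt(sum(v * v for v in z))
        if norm > lam:
            shrink = 1.0 - lam / norm
            est = [shrink * v for v in z]
            sub = [v / norm for v in z]   # alpha / ||alpha||_2 for a nonzero group
        else:
            est = [0.0 for _ in z]
            sub = [v / lam for v in z]    # norm <= 1, chosen to satisfy the KKT conditions
        estimate.append(est)
        subgradient.append(sub)
    return estimate, subgradient

def debias(estimate, subgradient, lam):
    """De-biased estimate alpha + lam * M * subgradient, with M = identity."""
    return [[a + lam * s for a, s in zip(est, sub)]
            for est, sub in zip(estimate, subgradient)]

z = [[3.0, 4.0], [0.3, 0.4]]
alpha_tilde, grad = group_soft_threshold(z, lam=1.0)
alpha_check = debias(alpha_tilde, grad, lam=1.0)
```

In this idealized setting the correction undoes the shrinkage exactly: the first group is shrunk from (3, 4) to (2.4, 3.2), the second is set to zero, and the de-biased estimate returns z in both cases. With a non-identity Gram matrix the recovery is only approximate, which is why the remainder term (iii) has to be controlled.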

Before describing our construction of \(\tilde {M}_{j}\), we find it helpful to consider an alternative expression for \({{\varSigma }}^{-1}_{j}\). We define the d × d matrices \({{\varGamma }}^{*}_{j,k,l}\) as


We also define the d × d matrix \({C}^{*}_{j,k}\) as

It can be shown that \({{\varSigma }}^{-1}_{j}\) can be expressed as

$$ {{\varSigma}}^{-1}_{j} = \begin{pmatrix} \left( C_{j,1}^{*}\right)^{-1} & {\cdots} & \mathbf{0} \\ {\vdots} & {\ddots} & \vdots \\ \mathbf{0} & {\cdots} & \left( C_{j,p}^{*}\right)^{-1} \end{pmatrix} \begin{pmatrix} I & -{{\varGamma}}^{*}_{j,1,2} & {\cdots} & -{{\varGamma}}^{*}_{j,1,p} \\ -{{\varGamma}}^{*}_{j,2,1} & I & {\cdots} & -{{\varGamma}}^{*}_{j,2,p} \\ {\vdots} & {\vdots} & {\ddots} & \vdots \\ -{{\varGamma}}^{*}_{j,p,1} & -{{\varGamma}}^{*}_{j,p,2} & {\cdots} & I \end{pmatrix} . $$

We can thus estimate \({{\varSigma }}_{j}^{-1}\) by performing a series of regressions to estimate each matrix \({{\varGamma }}^{*}_{j,k,l}\).

Following the approach of van de Geer et al. (2014), we use a group LASSO variant of the nodewise LASSO to construct \(\tilde {M}_{j}\). To proceed, we require some additional notation. For any d × d matrix \({{\varGamma }} = (\gamma _{1},\ldots ,\gamma _{d})\) with d-dimensional columns \(\gamma _{c}\), let \(\|{{\varGamma }} \|_{2,*} = {\sum }_{c = 1}^{d} \|\gamma _{c}\|_{2}\). Let \(\nabla \| {{\varGamma }} \|_{2,*} = \left (\gamma _{1}/\|\gamma _{1}\|_{2},\ldots ,\gamma _{d}/ \|\gamma _{d}\|_{2} \right )\) be the subgradient of \(\|{{\varGamma }}\|_{2,*}\). We use the group LASSO to obtain estimates \(\tilde {{{\varGamma }}}_{j,k,l}\) of \({{\varGamma }}^{*}_{j,k,l}\):


We then estimate \(C^{*}_{j,k}\) as

$$ \tilde{C}_{j,k} = \left( n^{g}\right)^{-1} \left( \mathcal{V}^{g}_{k} - {\sum}_{l \neq k,j} \mathcal{V}^{g}_{l} \tilde{{{\varGamma}}}_{j,k,l} \right)^{\top}\left( \mathcal{V}_{k}^{g}\right). $$

Our estimate \(\tilde {M}_{j}\) takes the form

$$ \tilde{M}_{j} = \begin{pmatrix} \tilde{C}^{-1}_{j,1} & {\cdots} & \mathbf{0} \\ {\vdots} & {\ddots} & \vdots \\ \mathbf{0} & {\cdots} & \tilde{C}^{-1}_{j,p} \end{pmatrix} \begin{pmatrix} I & -\tilde{{{\varGamma}}}_{j,1,2} & {\cdots} & -\tilde{{{\varGamma}}}_{j,1,p} \\ -\tilde{{{\varGamma}}}_{j,2,1} & I & {\cdots} & -\tilde{{{\varGamma}}}_{j,2,p} \\ {\vdots} & {\vdots} & {\ddots} & \vdots \\ -\tilde{{{\varGamma}}}_{j,p,1} & -\tilde{{{\varGamma}}}_{j,p,2} & {\cdots} & I \end{pmatrix} . $$

With this construction of \(\tilde {M}_{j}\), we can establish a bound on the remainder term (iii) in Eq. A.1. To show this, we make use of the following lemma, which states a special case of the dual norm inequality for the group LASSO norm \(\mathcal {P}_{j}\) (see, e.g., Chapter 6 of van de Geer (2016)).

Lemma 1.

Let \(a_{1},\ldots ,a_{p}\) and \(b_{1},\ldots ,b_{p}\) be d-dimensional vectors, and let \(\mathbf {a} = \left (a_{1}^{\top },\ldots ,a_{p}^{\top }\right )^{\top }\) and \(\mathbf {b} = \left (b_{1}^{\top },\dots ,b_{p}^{\top }\right )^{\top }\) be pd-dimensional vectors. Then

$$ \langle \mathbf{a}, \mathbf{b}\rangle \leq \left( {\sum}_{j=1}^{p} \|a_{j}\|_{2} \right) \max_{j} \left\| b_{j} \right\|_{2}. $$
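
The inequality follows by applying the Cauchy–Schwarz inequality within each block and then bounding every block norm of \(\mathbf {b}\) by the largest one:

$$ \langle \mathbf{a}, \mathbf{b}\rangle = {\sum}_{j=1}^{p} \langle a_{j}, b_{j} \rangle \leq {\sum}_{j=1}^{p} \|a_{j}\|_{2} \|b_{j}\|_{2} \leq \left( {\sum}_{j=1}^{p} \|a_{j}\|_{2} \right) \max_{j} \left\| b_{j} \right\|_{2}. $$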

The KKT conditions for Eq. A.3 imply that for all \(l \neq j, k\),

$$ \left( n^{g}\right)^{-1}\left( \mathcal{V}^{g}_{l}\right)^{\top}\left( \mathcal{V}^{g}_{k} - {\sum}_{r \neq k,j} \mathcal{V}^{g}_{r} \tilde{{{\varGamma}}}_{j,k,r}\right) = \omega \nabla \left\| \tilde{{{\varGamma}}}_{j,k,l} \right\|_{2,*}. $$

Lemma 1 and Eq. A.4 imply that

$$ \left\| \begin{pmatrix} \tilde{C}_{j,1} & {\cdots} & \mathbf{0} \\ {\vdots} & {\ddots} & \vdots \\ \mathbf{0} & {\cdots} & \tilde{C}_{j,p} \end{pmatrix} \left\{I - \left( n^{g}\right)^{-1}\tilde{M}_{j}\left( \mathcal{V}_{-j}^{g}\right)^{\top} \mathcal{V}_{-j}^{g} \right\} \left( \tilde{\boldsymbol{\alpha}}_{j}^{g} - \boldsymbol{\alpha}^{g,*}_{j}\right) \right\|_{\infty} \leq \omega \mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}_{j}^{g} - \boldsymbol{\alpha}^{g,*}_{j}\right), $$

where \(\|\cdot \|_{\infty }\) is the \(\ell _{\infty }\) norm. With \(\omega \asymp \left \{\log (p)/n\right \}^{1/2}\), \(\tilde {M}_{j}\) can be shown to be consistent under sparsity of \({{\varGamma }}^{*}_{j,k,l}\) (i.e., only a few matrices \({{\varGamma }}^{*}_{j,k,l}\) have some nonzero columns) and some additional regularity conditions. Additionally, it can be shown under sparsity of \(\boldsymbol {\alpha }^{g,*}_{j}\) (i.e., very few vectors \(\alpha ^{g,*}_{j,k}\) are nonzero) and some additional regularity conditions that \(\mathcal {P}_{j}\left (\tilde {\boldsymbol {\alpha }}_{j}^{g} - \boldsymbol {\alpha }_{j}^{g,*} \right ) = O_{P}\left (\left \{\log (p)/n \right \}^{1/2}\right )\). Thus, a scaled version of the remainder term (iii) is \(o_{P}(n^{-1/2})\) if \(n^{-1/2}\log (p) \to 0\). We refer readers to Chapter 8 of Bühlmann and van de Geer (2011) for a more comprehensive discussion of assumptions required for consistency of the group LASSO.
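
The rate condition can be read off by multiplying the two rates just stated. With \(\omega \asymp \left \{\log (p)/n\right \}^{1/2}\) and \(\mathcal {P}_{j}\left (\tilde {\boldsymbol {\alpha }}_{j}^{g} - \boldsymbol {\alpha }_{j}^{g,*} \right ) = O_{P}\left (\left \{\log (p)/n \right \}^{1/2}\right )\), the bound on the scaled remainder satisfies

$$ \omega \mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}_{j}^{g} - \boldsymbol{\alpha}^{g,*}_{j}\right) = O_{P}\left( \frac{\log(p)}{n} \right) = O_{P}\left( n^{-1/2} \cdot n^{-1/2}\log(p) \right), $$

which is of smaller order than \(n^{-1/2}\) exactly when \(n^{-1/2}\log (p) \to 0\).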

We now express the de-biased group LASSO estimator for \(\alpha ^{g,*}_{j,k}\) as

$$ \check{\alpha}^{g}_{j,k} = \tilde{\alpha}^{g}_{j,k} + \left( n^{g}\right)^{-1} \tilde{C}^{-1}_{j,k} \left( \mathcal{V}^{g}_{k} - {\sum}_{l \neq j, k} \mathcal{V}_{l}^{g} \tilde{{{\varGamma}}}_{j,k,l} \right)^{\top} \left( \mathbf{X}^{g}_{j} - \mathcal{V}^{g}_{-j} \tilde{\boldsymbol{\alpha}}^{g}_{j} \right). $$

We have established that \(\check {\alpha }^{g}_{j,k}\) can be written as

$$ \tilde{C}_{j,k} \left( \check{\alpha}^{g}_{j,k} - \alpha^{g,*}_{j,k}\right) = \left( n^{g}\right)^{-1} \left( \mathcal{V}^{g}_{k} - {\sum}_{l \neq j, k} \mathcal{V}_{l}^{g} {{\varGamma}}^{*}_{j,k,l} \right)^{\top} \left( \mathbf{X}^{g}_{j} - \mathcal{V}^{g}_{-j} \boldsymbol{\alpha}^{g,*}_{j} \right) + o_{P}(n^{-1/2}). $$

As stated above, the central limit theorem implies asymptotic normality of \(\check {\alpha }^{g}_{j,k}\).

We now construct an estimate for the variance of \(\check {\alpha }^{g}_{j,k}\). Suppose the residual \(\mathbf {X}^{g}_{j} - \mathcal {V}^{g}_{-j} \boldsymbol {\alpha }^{g,*}_{j}\) is independent of \(\mathcal {V}^{g}\), and let \({\tau _{j}^{g}}\) denote the residual variance

We can approximate the variance of \(\check {\alpha }^{g}_{j,k}\) as

$$ \check{{{\varOmega}}}^{g}_{j,k} = \left( n^{g}\right)^{-2}{\tau_{j}^{g}} \tilde{C}^{-1}_{j,k} \left( \mathcal{V}^{g}_{k} - {\sum}_{l \neq j, k} \mathcal{V}_{l}^{g} \tilde{{{\varGamma}}}_{j,k,l} \right)^{\top} \left( \mathcal{V}^{g}_{k} - {\sum}_{l \neq j, k} \mathcal{V}_{l}^{g} \tilde{{{\varGamma}}}_{j,k,l} \right) \left( \tilde{C}^{-1}_{j,k}\right)^{\top}. $$

As \({\tau _{j}^{g}}\) is typically unknown, we instead use the estimate

$$ \tilde{\tau}_{j}^{g} = \frac{\left\| \mathbf{X}^{g}_{j} - \mathcal{V}^{g}_{-j} \tilde{\boldsymbol{\alpha}}^{g}_{j} \right\|_{2}^{2}}{n - \widehat{df}}, $$

where \(\widehat {df}\) is an estimate of the degrees of freedom for the group LASSO estimate \(\tilde {\boldsymbol {\alpha }}_{j}^{g}\). In our implementation, we use the estimate proposed by Breheny and Huang (2009). Let \(\tilde {\alpha }^g_{j,k,l}\) be the l-th element of \(\tilde {\alpha }^g_{j,k}\), and let \(\mathcal {V}^g_{k,l}\) denote the l-th column of \(\mathcal {V}^g_k\). We then define

$$ \begin{array}{@{}rcl@{}} \bar{\alpha}^g_{j,k,l} = \frac{\langle \mathbf{X}^g_{j} - \mathcal{V}^g_{-j}\tilde{\boldsymbol{\alpha}}^g_j + \mathcal{V}^g_{k,l}\tilde{\alpha}^g_{j,k,l}, \mathcal{V}^g_{k,l}\rangle }{\langle \mathcal{V}^g_{k,l} , \mathcal{V}^g_{k,l} \rangle}, \end{array} $$

and estimate the degrees of freedom as

$$ \begin{array}{@{}rcl@{}} \widehat{df} = {\sum}_{k \neq j}{\sum}_{l=1}^{d} \frac{\tilde{\alpha}^{g}_{j,k,l}}{\bar{\alpha}^{g}_{j,k,l}}. \end{array} $$
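
The residual-variance and degrees-of-freedom formulas above translate directly into code. The following is a minimal sketch rather than the authors' implementation. The function name, the plain-list data layout (design matrix stored as a list of columns, one coefficient per column), and the toy inputs are assumptions made for illustration. The denominator uses the number of rows of the design matrix for n.

```python
def residual_variance(x, columns, coefs):
    """Return (tau_hat, df_hat) following the displays above.

    x:       response, list of n floats.
    columns: design matrix as a list of columns, each a list of n floats.
    coefs:   fitted coefficient for each column (e.g. a group LASSO fit).
    """
    n = len(x)
    fitted = [sum(col[i] * c for col, c in zip(columns, coefs)) for i in range(n)]
    resid = [xi - fi for xi, fi in zip(x, fitted)]

    df_hat = 0.0
    for col, c in zip(columns, coefs):
        if c == 0.0:
            continue  # zero coefficients contribute nothing to the sum
        # Univariate least squares fit of the partial residual on this column.
        partial = [r + v * c for r, v in zip(resid, col)]
        bar = sum(p * v for p, v in zip(partial, col)) / sum(v * v for v in col)
        df_hat += c / bar

    rss = sum(r * r for r in resid)
    return rss / (n - df_hat), df_hat

# Toy data with orthogonal columns and unshrunk coefficients.
x = [2.0, 3.0, 1.0, 1.0]
columns = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
tau_hat, df_hat = residual_variance(x, columns, [2.0, 3.0])
```

With orthogonal columns and unshrunk coefficients each ratio equals one, so the estimated degrees of freedom equal the number of columns (2 here) and the variance estimate reduces to the residual sum of squares divided by n − 2. Shrinkage toward zero makes each ratio smaller than one and lowers the estimated degrees of freedom accordingly.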

Appendix B: Generalized Score Matching Estimator

In this section, we establish consistency of the regularized score matching estimator and derive a bias-corrected estimator.

B.1 Form of Generalized Score Matching Loss

Below, we restate Theorem 3 of Yu et al. (2019), which provides conditions under which the score matching loss in Eq. 20 can be expressed as Eq. 21.

Theorem 1.

Assume the following conditions hold:

where the prime symbol denotes the element-wise derivative. Then Eqs. 20 and 21 are equivalent up to an additive constant that does not depend on h.

B.2 Generalized Score Matching Estimator in Low Dimensions

In this section, we provide an explicit form for the generalized score matching estimator in the low-dimensional setting and state its limiting distribution. We first introduce some additional notation below that allows for the generalized score matching loss to be written in a condensed form. Recall the form of the conditional density for the pairwise interaction model in Eq. 22. We define

$$ \begin{array}{@{}rcl@{}} &&\mathcal{V}^{g}_{j,k,1} = \begin{pmatrix} v_{j}^{1/2}\left( X^{g}_{1,j}\right)\dot{\psi}\left( X^{g}_{1,j}, X^{g}_{1,k}\right) \times \phi\left( {W^{g}_{1}}\right) \\ \vdots \\ v_{j}^{1/2}\left( X^{g}_{n^{g},j}\right) \dot{\psi}\left( X^{g}_{n^{g},j}, X^{g}_{n^{g},k}\right) \times \phi\left( W^{g}_{n^{g}}\right) \end{pmatrix}, \\ \\ &&\mathcal{V}^{g}_{j,2} = \begin{pmatrix} v_{j}^{1/2}\left( X^{g}_{1,j}\right) \times \left\{ \dot{\zeta}\left( X^{g}_{1,j}, \phi_{1}({W^{g}_{1}})\right),\cdots,\dot{\zeta}\left( X^{g}_{1,j}, \phi_{d}({W^{g}_{1}})\right) \right\} \\ \vdots \\ v_{j}^{1/2}\left( X^{g}_{n^{g},j}\right) \times \left\{ \dot{\zeta}\left( X^{g}_{n^{g},j}, \phi_{1}(W^{g}_{n^{g}})\right),\cdots,\dot{\zeta}\left( X^{g}_{n^{g},j}, \phi_{d}(W^{g}_{n^{g}})\right) \right\} \end{pmatrix}, \\ \\ &&\mathcal{U}^{g}_{j,k,1} = \begin{pmatrix} \left\{\dot{v}_{j}\left( X^{g}_{1,j}\right)\dot{\psi}\left( X^{g}_{1,j}, X^{g}_{1,k}\right) + v_{j}\left( X^{g}_{1,j}\right)\ddot{\psi}\left( X^{g}_{1,j}, X^{g}_{1,k}\right) \right\} \times \phi\left( {W^{g}_{1}}\right) \\ \vdots \\ \left\{\dot{v}_{j}\left( X^{g}_{n^{g},j}\right)\dot{\psi}\left( X^{g}_{n^{g},j}, X^{g}_{n^{g},k}\right) + v_{j}\left( X^{g}_{n^{g},j}\right)\ddot{\psi}\left( X^{g}_{n^{g},j}, X^{g}_{n^{g},k}\right) \right\} \times \phi\left( W^{g}_{n^{g}}\right) \end{pmatrix}, \\ \\ &&\mathcal{U}^{g}_{j,2} = \begin{pmatrix} v_{j}\left( X_{1,j}^{g}\right) \ddot{\zeta}\left( X^{g}_{1,j}, \phi_{1}({W^{g}_{1}})\right) & {\cdots} & v_{j}\left( X_{1,j}^{g}\right) \ddot{\zeta}\left( X^{g}_{1,j}, \phi_{d}({W^{g}_{1}})\right) \\ {\vdots} & {\ddots} & \vdots \\ v_{j}\left( X_{n^{g},j}^{g}\right) \ddot{\zeta}\left( X^{g}_{n^{g},j}, \phi_{1}(W^{g}_{n^{g}})\right) & {\cdots} & v_{j}\left( X_{n^{g},j}^{g}\right) \ddot{\zeta}\left( X^{g}_{n^{g},j}, \phi_{d}(W^{g}_{n^{g}})\right) \end{pmatrix} \\ \\ &&\quad\quad\quad +\begin{pmatrix} \dot{v}_{j}\left( X_{1,j}^{g}\right) \dot{\zeta}\left( X^{g}_{1,j}, \phi_{1}({W^{g}_{1}})\right) & {\cdots} & \dot{v}_{j}\left( X_{1,j}^{g}\right) \dot{\zeta}\left( X^{g}_{1,j}, \phi_{d}({W^{g}_{1}})\right) \\ {\vdots} & {\ddots} & \vdots \\ \dot{v}_{j}\left( X_{n^{g},j}^{g}\right) \dot{\zeta}\left( X^{g}_{n^{g},j}, \phi_{1}(W^{g}_{n^{g}})\right) & {\cdots} & \dot{v}_{j}\left( X_{n^{g},j}^{g}\right) \dot{\zeta}\left( X^{g}_{n^{g},j}, \phi_{d}(W^{g}_{n^{g}})\right) \end{pmatrix}, \\ \\ &&\mathcal{V}^{g}_{j,1} = \begin{pmatrix} \mathcal{V}^{g}_{j,1,1} \\ {\vdots} \\ \mathcal{V}^{g}_{j,p,1} \end{pmatrix}; \quad \mathcal{U}^{g}_{j,1} = \begin{pmatrix} \mathcal{U}^{g}_{j,1,1} \\ {\vdots} \\ \mathcal{U}^{g}_{j,p,1} \end{pmatrix}. \end{array} $$

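Let \(\boldsymbol {\alpha }_{j} = \left (\alpha _{j,1}^{\top }, \ldots ,\alpha _{j,p}^{\top }\right )^{\top }\), where each \(\alpha _{j,k}\) is a d-dimensional vector, and let \(\boldsymbol {\theta }_{j} = (\theta _{j,1},\ldots ,\theta _{j,d})\). We can express the empirical score matching loss in Eq. 23 as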

$$ L^{g}_{n,j}(\boldsymbol{\alpha}_{j}, \boldsymbol{\theta}_{j}) = \left( 2n^{g}\right)^{-1} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j} + \mathcal{V}^{g}_{j,2} \boldsymbol{\theta}_{j} \right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}+ \mathcal{V}^{g}_{j,2} \boldsymbol{\theta}_{j} \right) + \left( n^{g}\right)^{-1}\mathbf{1}^{\top} \left( \mathcal{U}^{g}_{j,1} \boldsymbol{\alpha}_{j} + \mathcal{U}^{g}_{j,2} \boldsymbol{\theta}_{j} \right). $$

We write the gradient of the empirical loss as

$$ \nabla L^{g}_{n,j}(\boldsymbol{\alpha}_{j}, \boldsymbol{\theta}_{j}) = \left( n^{g}\right)^{-1} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix} \begin{pmatrix} \boldsymbol{\alpha}_{j} \\ \boldsymbol{\theta}_{j} \end{pmatrix} + \left( n^{g}\right)^{-1} \begin{pmatrix} \left( \mathcal{U}_{j,1}^{g}\right)^{\top}\mathbf{1} \\ \left( \mathcal{U}_{j,2}^{g}\right)^{\top}\mathbf{1} \end{pmatrix}. $$

Thus, the minimizer \((\hat {\boldsymbol {\alpha }}^{g}_{j}, \hat {\boldsymbol {\theta }}^{g}_{j})\) of the empirical loss takes the form

$$ \begin{pmatrix} \hat{\boldsymbol{\alpha}}^{g}_{j} \\ \hat{\boldsymbol{\theta}}^{g}_{j} \end{pmatrix} = - \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix}^{-1} \begin{pmatrix} \left( \mathcal{U}_{j,1}^{g}\right)^{\top}\mathbf{1} \\ \left( \mathcal{U}_{j,2}^{g}\right)^{\top}\mathbf{1} \end{pmatrix}. $$
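
Because the loss is quadratic, computing the estimator amounts to solving one linear system. The sketch below illustrates this on a deliberately simple case that is not taken from the paper: a single-parameter zero-mean Gaussian with log-density θ·(−x²/2) plus a constant, with the weight function taken to be identically one. The helper names are hypothetical, and the matrices V and U are passed as lists of rows.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    q = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(q):
        pivot = max(range(col, q), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Gram matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(q):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][q] / aug[i][i] for i in range(q)]

def score_matching_minimizer(v_rows, u_rows):
    """Return -(V^T V)^{-1} U^T 1 for n-by-q matrices V and U given as rows."""
    q = len(v_rows[0])
    gram = [[sum(row[s] * row[t] for row in v_rows) for t in range(q)]
            for s in range(q)]
    u_sum = [sum(row[s] for row in u_rows) for s in range(q)]
    return solve_linear_system(gram, [-value for value in u_sum])

# Toy check: the derivative of -x^2/2 is -x (entries of V) and its second
# derivative is -1 (entries of U), because the weight function is constant.
data = [1.0, -1.0, 2.0, -2.0]
theta_hat = score_matching_minimizer([[-x] for x in data], [[-1.0] for _ in data])
```

In this toy case the estimate is n divided by the sum of squared observations, 4/10 = 0.4, which is the reciprocal of the sample second moment.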

By applying Theorem 5.23 of van der Vaart (2000),

$$ \left( n^{g}\right)^{1/2} \begin{pmatrix} \hat{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}_{j}^{g,*} \\ \hat{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \end{pmatrix} \to_{d} N\left( 0, \begin{pmatrix} A B A \end{pmatrix} \right), $$

where the matrices A and B are defined as

We estimate the variance of \((\hat {\boldsymbol {\alpha }}^{g}_{j}, \hat {\boldsymbol {\theta }}^{g}_{j})\) as \(\hat {{{\varOmega }}}^{g}_{j} = \left (n^{g}\right )^{-1}\hat {A} \hat {B} \hat {A}\), where

$$ \begin{array}{@{}rcl@{}} &&\hat{A} = n^{g} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix}^{-1}, \\ &&\hat{B} = \left( n^{g}\right)^{-1}\hat{\xi}^{\top}\hat{\xi}, \quad \hat{\xi} = \begin{pmatrix} \text{diag}\left( \mathcal{V}_{j,1}^{g}\hat{\boldsymbol{\alpha}}^{g}_{j} + \mathcal{V}_{j,2}^{g} \hat{\boldsymbol{\theta}}^{g}_{j} \right)\mathcal{V}_{j,1}^{g} \\ \text{diag}\left( \mathcal{V}_{j,1}^{g}\hat{\boldsymbol{\alpha}}^{g}_{j} + \mathcal{V}_{j,2}^{g} \hat{\boldsymbol{\theta}}^{g}_{j} \right) \mathcal{V}_{j,2}^{g} \end{pmatrix} + \begin{pmatrix} \mathcal{U}_{j,1}^{g} \\ \mathcal{U}_{j,2}^{g} \end{pmatrix}. \end{array} $$
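
For a single parameter the sandwich formula collapses to scalar arithmetic, which makes the roles of the two factors easy to see. The sketch below is an illustration under assumptions that are not part of the paper: one parameter, V and U each a single column, and a toy zero-mean Gaussian example with log-density θ·(−x²/2) and constant weight function. The function name is hypothetical.

```python
def sandwich_variance_scalar(v, u, theta_hat):
    """Scalar version of n^{-1} * A_hat * B_hat * A_hat.

    v, u:      lists of length n holding the single columns of V and U.
    theta_hat: the fitted scalar parameter.
    """
    n = len(v)
    a_hat = n / sum(vi * vi for vi in v)
    # Per-observation gradient of the loss: (v_i * theta) * v_i + u_i.
    xi = [(vi * theta_hat) * vi + ui for vi, ui in zip(v, u)]
    b_hat = sum(x * x for x in xi) / n
    return a_hat * b_hat * a_hat / n

data = [1.0, -1.0, 2.0, -2.0]
v = [-x for x in data]
u = [-1.0 for _ in data]
theta_hat = -sum(u) / sum(vi * vi for vi in v)  # closed-form minimizer in the scalar case
omega_hat = sandwich_variance_scalar(v, u, theta_hat)
```

Here the inverse scaled Gram term is 0.4, the average squared per-observation gradient is 0.36, and the resulting variance estimate is 0.4 × 0.36 × 0.4 / 4 = 0.0144.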

B.3 Consistency of Regularized Generalized Score Matching Estimator

In this subsection, we argue that the regularized generalized score matching estimators \(\tilde {\boldsymbol {\alpha }}^{g}_{j}\) and \(\tilde {\boldsymbol {\theta }}^{g}_{j}\) from Eq. 24 are consistent. Let \(\mathcal {P}_{j}(\boldsymbol {\alpha }_{j}) = {\sum }_{k \neq j} \|\alpha _{j,k}\|_{2}\). We establish convergence rates of \(\mathcal {P}_{j}\left (\tilde {\boldsymbol {\alpha }}_{j}^{g} - \boldsymbol {\alpha }_{j}^{g,*} \right )\) and \(\left \|\tilde {\boldsymbol {\theta }}^{g}_{j} - \boldsymbol {\theta }_{j}^{g,*} \right \|_{2}\). Our approach is based on proof techniques described in Bühlmann and van de Geer (2011).

Our result requires a notion of compatibility between the penalty function \(\mathcal {P}_{j}\) and the loss \(L^{g}_{n,j}\). Such notions are commonly assumed in the high-dimensional literature. Below, we define the compatibility condition.

Definition 1 (Compatibility Condition).

Let S be a set containing indices of the nonzero elements of \(\boldsymbol {\alpha }_{j}^{g,*}\), and let \(\bar {S}\) denote the complement of S. Let be a (p − 1)d-dimensional vector where the r-th element is one if r ∈ S, and zero otherwise. The group LASSO compatibility condition holds for the index set S ⊂ {1,…,p} and for constant C > 0 if for all ,

where ∘ is the element-wise product operator.

Theorem 2.

Let \(\mathcal {E}\) be the set

$$ \begin{array}{@{}rcl@{}} \mathcal{E} &=& \left\{ \max_{k \neq j} \left\{ \left\| \left( \mathcal{V}_{j,k,1}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,k,1}\right)^{\top} \mathbf{1} \right\|_{2}\right\} \leq n^{g}\lambda_{0} \right\} \cap \\ &&\left\{ \left\| \left( \mathcal{V}_{j,2}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,2}\right)^{\top} \mathbf{1} \right\|_{2} \leq n^{g}\lambda_{0} \right\} \end{array} $$

for some \(\lambda_{0} \leq \lambda/2\). Suppose the compatibility condition also holds. Then on the set \(\mathcal {E}\),

$$ \mathcal{P}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right) + \| \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \|_{2} \leq \frac{\lambda 4 |S|}{C^{2}} . $$

Proof of Theorem 2.

The regularized score matching estimator \(\tilde {\boldsymbol {\alpha }}_{j}^{g}\) necessarily satisfies the following basic inequality:

$$ L^{g}_{n,j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j}, \tilde{\boldsymbol{\theta}}^{g}_{j}\right) + \lambda\mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) \leq L^{g}_{n,j}\left( \boldsymbol{\alpha}^{g,*}_{j}, \boldsymbol{\theta}^{g,*}_{j}\right) + \lambda\mathcal{P}_{j}\left( \boldsymbol{\alpha}^{g,*}_{j} \right). $$
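The basic inequality holds simply because the estimator minimizes the penalized objective, so its objective value cannot exceed the value at the true parameters. The sketch below checks this numerically on simulated data, under several assumptions that are not from the source: `Sigma` stands in for \((n^{g})^{-1}\mathcal{V}^{\top}\mathcal{V}\), `u` for \((n^{g})^{-1}\mathcal{U}^{\top}\mathbf{1}\), the last two coordinates play the role of the unpenalized \(\boldsymbol{\theta}\), and a generic proximal gradient solver is used in place of whatever optimizer the authors employ.

```python
import numpy as np

def objective(beta, Sigma, u, groups, lam):
    """Quadratic loss 0.5*b'Sigma b + u'b plus lam times the group penalty."""
    pen = sum(np.linalg.norm(beta[g]) for g in groups)
    return 0.5 * beta @ Sigma @ beta + u @ beta + lam * pen

def prox_group(beta, groups, t):
    """Group soft-thresholding; coordinates outside `groups` stay unpenalized."""
    out = beta.copy()
    for g in groups:
        nrm = np.linalg.norm(beta[g])
        out[g] = 0.0 if nrm <= t else (1.0 - t / nrm) * beta[g]
    return out

def fit(Sigma, u, groups, lam, n_iter=2000):
    """Proximal gradient descent with step 1 / (largest eigenvalue of Sigma)."""
    step = 1.0 / np.linalg.eigvalsh(Sigma)[-1]
    beta = np.zeros(len(u))
    for _ in range(n_iter):
        grad = Sigma @ beta + u
        beta = prox_group(beta - step * grad, groups, step * lam)
    return beta

rng = np.random.default_rng(0)
n = 200
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]  # penalized blocks
beta_star = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.5, -0.5])
V = rng.normal(size=(n, beta_star.size))
Sigma = V.T @ V / n
u = -Sigma @ beta_star + 0.05 * rng.normal(size=beta_star.size)
lam = 0.2

beta_hat = fit(Sigma, u, groups, lam)
obj_hat = objective(beta_hat, Sigma, u, groups, lam)
obj_star = objective(beta_star, Sigma, u, groups, lam)
```

Comparing `obj_hat` with `obj_star` reproduces the two sides of the basic inequality for this simulated instance.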

With some algebra, this inequality can be rewritten as

$$ \begin{array}{@{}rcl@{}} &&\!\!\!\!\!\!\!\!\!\!\!\left( 2n^{g}\right)^{-1} \begin{pmatrix} \left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right)^{\top} & \left( \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}^{g,*}_{j}\right)^{\top} \end{pmatrix} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix}\\ &&\times\begin{pmatrix} \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \\ \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}^{g,*}_{j} \end{pmatrix} + \lambda\mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) \!\leq\! -\left( n^{g}\right)^{-1} \begin{pmatrix} \left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right)^{\top} & \left( \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}^{g,*}_{j}\right)^{\top} \end{pmatrix}\\ &&\times\begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,1}\right)^{\top} \mathbf{1} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,2}\right)^{\top} \mathbf{1} \end{pmatrix}\ + \lambda\mathcal{P}_{j}\left( \boldsymbol{\alpha}^{g,*}_{j} \right). \end{array} $$

By Lemma 1, on the set \(\mathcal {E}\) and using \(\lambda_{0} \leq \lambda/2\), we get

$$ \begin{array}{@{}rcl@{}} && \left( n^{g}\right)^{-1} \begin{pmatrix} \left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right)^{\top} & \left( \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}^{g,*}_{j}\right)^{\top} \end{pmatrix} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix}\\&& \times\begin{pmatrix} \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \\ \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}^{g,*}_{j} \end{pmatrix} + 2\lambda \mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) \leq \lambda\left\|\tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}^{g,*}_{j} \right\|_{2} + 2\lambda \mathcal{P}_{j}\left( \boldsymbol{\alpha}^{g,*}_{j} \right) + \lambda\mathcal{P}_{j}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right). \end{array} $$

On the left hand side, we apply the triangle inequality to get

On the right hand side, we observe that

We then have


where we use the compatibility condition for the first inequality, and for the second inequality use the fact that

$$ ab \leq b^{2} + a^{2} $$

for any real numbers \(a\) and \(b\). The conclusion follows immediately. □

If the event \(\mathcal {E}\) occurs with probability tending to one, Theorem 2 implies

$$ \mathcal{P}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right) + \| \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \|_{2} = O_{P}\left( \lambda\right). $$

We select λ so that the event \(\mathcal {E}\) occurs with high probability. For instance, suppose the elements of the matrix

$$ \begin{array}{@{}rcl@{}} \xi = \begin{pmatrix} \text{diag}\left( \mathcal{V}_{j,1}^g\boldsymbol{\alpha}^{g,*}_j + \mathcal{V}_{j,2}^g \boldsymbol{\theta}^{g,*}_j \right)\mathcal{V}_{j,1}^g + \mathcal{U}_{j,1}^g \\ \text{diag}\left( \mathcal{V}_{j,1}^g\boldsymbol{\alpha}^{g,*}_j + \mathcal{V}_{j,2}^g \boldsymbol{\theta}^{g,*}_j \right) \mathcal{V}_{j,2}^g + \mathcal{U}_{j,2}^g \end{pmatrix} \end{array} $$

are sub-Gaussian, and consider the event

$$ \begin{array}{@{}rcl@{}} \bar{\mathcal{E}} &=& \left\{ \left\| \begin{pmatrix} \left( \mathcal{V}_{j,1}^g\right)^{\top} \left( \mathcal{V}_{j,1}^g \boldsymbol{\alpha}_j^{g,*} +\mathcal{V}_{j,2}^g\boldsymbol{\theta}_j^{g,*} \right) + \left( \mathcal{U}^g_{j,1}\right)^{\top} \mathbf{1} \\ \left( \mathcal{V}_{j,2}^g\right)^{\top} \left( \mathcal{V}_{j,1}^g \boldsymbol{\alpha}_j^{g,*} +\mathcal{V}_{j,2}^g\boldsymbol{\theta}_j^{g,*} \right) + \left( \mathcal{U}^g_{j,2}\right)^{\top} \mathbf{1} \end{pmatrix} \right\|_{\infty} \leq \frac{n^{g}\lambda_{0}}{d} \right\}, \end{array} $$

where \(\|\cdot \|_{\infty }\) is the \(\ell _{\infty }\) norm. Observing that \(\bar {\mathcal {E}} \subset \mathcal {E}\), it suffices to show that \(\bar {\mathcal {E}}\) holds with high probability. It is shown in Corollary 2 of Negahban et al. (2012) that there exist constants \(u_{1}, u_{2} > 0\) such that with \(\lambda _{0} \asymp \{\log (p)/n\}^{1/2}\), \(\bar {\mathcal {E}}\) holds with probability at least \(1 - u_{1}p^{-u_{2}}\). Thus, \(\mathcal {E}\) occurs with probability tending to one as \(p \to \infty \). For distributions with heavier tails, a larger choice of λ may be required (Yu et al. 2019).
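The link between the two events rests on an elementary norm comparison: if every entry of a vector is at most \(n^{g}\lambda_{0}/d\) in absolute value, then every length-d block has Euclidean norm at most \(d^{1/2} \cdot n^{g}\lambda_{0}/d \leq n^{g}\lambda_{0}\). The sketch below checks this numerically; the dimensions, the random vector `v`, and the helper `group_l2_norms` are illustrative assumptions only.

```python
import numpy as np

def group_l2_norms(v, d):
    """Euclidean norms of consecutive length-d blocks of v."""
    return np.linalg.norm(np.asarray(v, dtype=float).reshape(-1, d), axis=1)

rng = np.random.default_rng(0)
d, n_groups, n, p = 4, 50, 500, 51
lam0 = np.sqrt(np.log(p) / n)   # rate lambda_0 of order {log(p)/n}^{1/2}
bound = n * lam0                # threshold n * lambda_0 on each block norm

# A vector whose sup-norm is at most bound / d ...
v = rng.uniform(-bound / d, bound / d, size=n_groups * d)
# ... has every block norm at most sqrt(d) * bound / d, which is below bound.
max_group_norm = group_l2_norms(v, d).max()
```

The factor 1/d is conservative: a threshold of \(n^{g}\lambda_{0}/d^{1/2}\) on the entries would already be enough for this comparison.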

B.4 De-biased Score Matching Estimator

The KKT conditions for the regularized score matching loss imply that the estimator \(\tilde {\boldsymbol {\alpha }}^{g}_{j}\) satisfies

$$ \begin{array}{@{}rcl@{}} \nabla L_{n,j}(\tilde{\boldsymbol{\alpha}}^{g}_{j}, \tilde{\boldsymbol{\theta}}^{g}_{j}) &=& \left( n^{g}\right)^{-1} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix} \begin{pmatrix} \tilde{\boldsymbol{\alpha}}_{j}^{g} \\ \tilde{\boldsymbol{\theta}}_{j}^{g} \end{pmatrix}\\ &&+ \left( n^{g}\right)^{-1} \begin{pmatrix} \left( \mathcal{U}_{j,1}^{g}\right)^{\top}\mathbf{1} \\ \left( \mathcal{U}_{j,2}^{g}\right)^{\top}\mathbf{1} \end{pmatrix} = \begin{pmatrix} \lambda \nabla P\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) \\ \mathbf{0} \end{pmatrix}. \end{array} $$

With some algebra, we can rewrite the KKT conditions as

$$ \begin{array}{@{}rcl@{}} &&\left( n^{g}\right)^{-1} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix} \begin{pmatrix} \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}_{j}^{g,*} \\ \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \end{pmatrix} = \\ &&\lambda \begin{pmatrix} \nabla P\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) \\ \mathbf{0} \end{pmatrix} - \left( n^{g}\right)^{-1} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,1}\right)^{\top} \mathbf{1} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,2}\right)^{\top} \mathbf{1} \end{pmatrix}. \end{array} $$

Now, let \({{\varSigma }}_{j,n}\) be the matrix

$$ {{\varSigma}}_{j,n} = \left( n^{g}\right)^{-1} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix}, $$

let \({{\varSigma }}_{j} = E\left [{{\varSigma }}_{j,n}\right ]\), and let \(\tilde {M}_{j}\) be an estimate of \({{\varSigma }}_{j}^{-1}\). We can now rewrite the KKT conditions as

$$ \begin{array}{@{}rcl@{}} \begin{pmatrix} \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}_{j}^{g,*} \\ \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \end{pmatrix} &=& \underset{(\mathrm{i})}{\underbrace{\lambda \tilde{M}_{j} \begin{pmatrix} \nabla P\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) \\ \mathbf{0} \end{pmatrix}}} - \underset{(\text{ii})}{\underbrace{\left( n^{g}\right)^{-1} \tilde{M}_{j} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,1}\right)^{\top} \mathbf{1} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,2}\right)^{\top} \mathbf{1} \end{pmatrix} }} + \\ &&\quad\quad\quad \underset{(\text{iii})}{ \underbrace{\left( n^{g}\right)^{-1} \left\{ I - {{\varSigma}}_{j,n} \tilde{M}_{j} \right\} \begin{pmatrix} \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}_{j}^{g,*} \\ \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \end{pmatrix} }}. \end{array} $$

As is the case for the de-biased group LASSO in Appendix ??, the first term (i) in Eq. B.1 depends only on the observed data and can be directly subtracted from the initial estimate. The second term (ii) is asymptotically equivalent to

$$ \left( n^{g}\right)^{-1}{{\varSigma}}_{j}^{-1} \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,1}\right)^{\top} \mathbf{1} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top} \left( \mathcal{V}_{j,1}^{g} \boldsymbol{\alpha}_{j}^{g,*} + \mathcal{V}_{j,2}^{g}\boldsymbol{\theta}_{j}^{g,*} \right) + \left( \mathcal{U}^{g}_{j,2}\right)^{\top} \mathbf{1} \end{pmatrix}, $$

if \(\tilde {M}_{j}\) is a consistent estimate of \({{\varSigma }}_{j}^{-1}\). Using the fact that the expected gradient of the score matching loss is zero at the true parameters \((\boldsymbol {\alpha }_{j}^{g,*}, \boldsymbol {\theta }_{j}^{g,*})\), it can be seen that Eq. B.2 is an average of i.i.d. random quantities with mean zero. The central limit theorem then implies that any low-dimensional sub-vector is asymptotically normal. The last term (iii) is asymptotically negligible if \(\tilde {M}_{j}\) is an approximate inverse of \({{\varSigma }}_{j,n}\) and if \((\tilde {\boldsymbol {\alpha }}_{j}^{g}, \tilde {\boldsymbol {\theta }}_{j}^{g})\) is consistent for \((\boldsymbol {\alpha }_{j}^{g,*}, \boldsymbol {\theta }_{j}^{g,*})\). Thus, for an appropriate choice of \(\tilde {M}_{j}\), we expect asymptotic normality of an estimator of the form

$$ \begin{pmatrix} \check{\boldsymbol{\alpha}}^{g}_{j} \\ \check{\boldsymbol{\theta}}^{g}_{j} \end{pmatrix} = \begin{pmatrix} \tilde{\boldsymbol{\alpha}}^{g}_{j} \\ \tilde{\boldsymbol{\theta}}^{g}_{j} \end{pmatrix} - \lambda \tilde{M}_{j} \begin{pmatrix} \nabla P\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} \right) \\ \mathbf{0} \end{pmatrix}. $$
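By the KKT conditions stated earlier, \(\lambda \nabla P\) coincides with the gradient of the loss at the initial estimate, so the correction can be computed from that gradient directly. The sketch below illustrates the one-step correction for a generic quadratic loss, under assumptions that are not from the source: `Sigma_n` stands in for \({{\varSigma}}_{j,n}\), `u_n` for \((n^{g})^{-1}\mathcal{U}^{\top}\mathbf{1}\), and `beta_tilde` is an arbitrary vector playing the role of the stacked initial estimate. In this low-dimensional example the exact inverse is available, and the corrected estimator then equals the unpenalized minimizer, which serves as a sanity check.

```python
import numpy as np

def debias(beta_tilde, Sigma_n, u_n, M):
    """One-step correction beta_tilde - M @ grad L(beta_tilde) for a quadratic loss."""
    grad = Sigma_n @ beta_tilde + u_n
    return beta_tilde - M @ grad

rng = np.random.default_rng(1)
n, q = 200, 5
V = rng.normal(size=(n, q))
Sigma_n = V.T @ V / n
u_n = rng.normal(size=q)
beta_tilde = rng.normal(size=q)      # stands in for a shrunken initial estimate

# With the exact inverse, the correction removes all dependence on beta_tilde.
M_exact = np.linalg.inv(Sigma_n)
beta_check = debias(beta_tilde, Sigma_n, u_n, M_exact)
beta_unpenalized = -M_exact @ u_n    # minimizer of 0.5*b'Sigma_n b + u_n'b
```

In high dimensions \({{\varSigma}}_{j,n}\) is not invertible, which is why an approximate inverse \(\tilde{M}_{j}\) is needed.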

Before constructing \(\tilde {M}_{j}\), we first provide an alternative expression for \({{\varSigma }}_{j}^{-1}\). We define the d × d matrices \({{\varGamma }}^{*}_{j,k,l}\) and \({{\varDelta }}^{*}_{j,k}\) as

We also define the d × d matrices \({{\varLambda }}^{*}_{j,k}\) as

Additionally, we define the d × d matrices \(C^{*}_{j,k}\) and \(D^{*}_{j}\)

It can be shown that \({{\varSigma }}_{j}^{-1}\) can be expressed as

$$ {{\varSigma}}^{-1}_{j} = \begin{pmatrix} \left( C^{*}_{j,1}\right)^{-1} & {\cdots} & \mathbf{0} & \mathbf{0} \\ {\vdots} & {\ddots} & {\vdots} & \vdots \\ \mathbf{0} & {\cdots} & \left( C^{*}_{j,p}\right)^{-1} & \mathbf{0} \\ \mathbf{0} & {\cdots} & \mathbf{0} & \left( D^{*}_{j}\right)^{-1} \end{pmatrix} \begin{pmatrix} I & -{{\varGamma}}^{*}_{j,1,2} & {\cdots} & -{{\varGamma}}^{*}_{j,1,p} & - {{\varDelta}}^{*}_{j,1} \\ -{{\varGamma}}^{*}_{j,2,1} & I & {\cdots} & -{{\varGamma}}^{*}_{j,2,p} & - {{\varDelta}}^{*}_{j,2} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} & \vdots \\ -{{\varGamma}}^{*}_{j,p,1} & -{{\varGamma}}^{*}_{j,p,2} & {\cdots} & I & - {{\varDelta}}^{*}_{j,p} \\ -{{\varLambda}}^{*}_{j,1} & -{{\varLambda}}^{*}_{j,2} & {\cdots} & -{{\varLambda}}^{*}_{j,p} & I \end{pmatrix} . $$

We can thus estimate \({{\varSigma }}_{j}^{-1}\) by estimating each of the matrices \({{\varGamma }}^{*}_{j,k,l}\), \({{\varLambda }}^{*}_{j,k}\), and \({{\varDelta }}^{*}_{j,k}\).

Similar to our discussion of the de-biased group LASSO in Appendix ??, we use a group-penalized variant of the nodewise LASSO to construct \(\tilde {M}_{j}\). We estimate \({{\varGamma }}^{*}_{j,k,l}\) and \({{\varDelta }}^{*}_{j,k}\) as

where \(\omega_{1}, \omega_{2} > 0\) are tuning parameters, and \(\|\cdot\|_{2,*}\) is as defined in Appendix ??. We estimate \({{\varLambda }}^{*}_{j,k}\) as


Additionally, we define the d × d matrices \(\tilde {C}_{j,k}\) and \(\tilde {D}_{j}\)

$$ \begin{array}{@{}rcl@{}} &&\tilde{C}_{j,k} = \left( n^{g}\right)^{-1}\left( \mathcal{V}^{g}_{j,k,1}\right)^{\top} \left( \mathcal{V}_{j,k,1}^{g} - {\sum}_{l \neq k,j} \mathcal{V}_{j,l,1}^{g} \tilde{{{\varGamma}}}_{j,k,l} - \mathcal{V}^{g}_{j,2}\tilde{{{\varDelta}}}_{j,k} \right) \\ &&\tilde{D}_{j} = \left( n^{g}\right)^{-1}\left( \mathcal{V}^{g}_{j,2}\right)^{\top} \left( \mathcal{V}_{j,2}^{g} - {\sum}_{k \neq j} \mathcal{V}_{j,k,1}^{g} \tilde{{{\varLambda}}}_{j,k} \right). \end{array} $$

We then take \(\tilde {M}_{j}\) as

$$ \tilde{M}_{j} = \begin{pmatrix} \tilde{C}^{-1}_{j,1} & {\cdots} & \mathbf{0} & \mathbf{0} \\ {\vdots} & {\ddots} & {\vdots} & \vdots \\ \mathbf{0} & {\cdots} & \tilde{C}^{-1}_{j,p} & \mathbf{0} \\ \mathbf{0} & {\cdots} & \mathbf{0} & \tilde{D}^{-1}_{j} \end{pmatrix} \begin{pmatrix} I & -\tilde{{{\varGamma}}}_{j,1,2} & {\cdots} & -\tilde{{{\varGamma}}}_{j,1,p} & - \tilde{{{\varDelta}}}_{j,1} \\ -\tilde{{{\varGamma}}}_{j,2,1} & I & {\cdots} & -\tilde{{{\varGamma}}}_{j,2,p} & - \tilde{{{\varDelta}}}_{j,2} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} & \vdots \\ -\tilde{{{\varGamma}}}_{j,p,1} & -\tilde{{{\varGamma}}}_{j,p,2} & {\cdots} & I & - \tilde{{{\varDelta}}}_{j,p} \\ -\tilde{{{\varLambda}}}_{j,1} & -\tilde{{{\varLambda}}}_{j,2} & {\cdots} & -\tilde{{{\varLambda}}}_{j,p} & I \end{pmatrix} . $$
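The block structure of \(\tilde{M}_{j}\) mirrors the classical nodewise regression identity for an inverse Gram matrix. The sketch below illustrates this in a simplified setting that departs from the text in three ways, all of which are assumptions made for illustration: groups are scalar (d = 1), there is no separate \(\boldsymbol{\theta}\) block, and ordinary least squares replaces the group-penalized regressions. In that setting the assembled matrix is the exact inverse of the Gram matrix.

```python
import numpy as np

def nodewise_inverse(X):
    """Assemble M = diag(1/C_k) @ (I - Gamma) from least-squares nodewise regressions."""
    n, q = X.shape
    B = np.eye(q)                     # identity with -gamma_{k,l} off the diagonal
    C = np.empty(q)
    for k in range(q):
        others = [l for l in range(q) if l != k]
        # Regress column k on all remaining columns.
        gamma, *_ = np.linalg.lstsq(X[:, others], X[:, k], rcond=None)
        B[k, others] = -gamma
        resid = X[:, k] - X[:, others] @ gamma
        C[k] = X[:, k] @ resid / n    # scalar analogue of C-tilde_{j,k}
    return B / C[:, None]             # scale row k by 1 / C_k

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
X[:, 1] += 0.5 * X[:, 0]              # introduce some correlation between columns
M = nodewise_inverse(X)
Sigma_n = X.T @ X / X.shape[0]
```

With penalized regressions the product of `M` and `Sigma_n` is only approximately the identity, and the size of that discrepancy is what the tuning parameters control.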

When \({{\varGamma }}^{*}_{j,k,l}\), \({{\varDelta }}^{*}_{j,k}\), and \({{\varLambda }}^{*}_{j,k}\) satisfy appropriate sparsity conditions and some additional regularity assumptions, \(\tilde {M}_{j}\) is a consistent estimate of \({{\varSigma }}_{j}^{-1}\) for \(\omega _{1} \asymp \{\log (p)/n\}^{1/2}\) and \(\omega _{2} \asymp \{\log (p)/n\}^{1/2}\) (see, e.g., Chapter 8 of Bühlmann and van de Geer (2011) for a more comprehensive discussion). Using the same argument presented in Appendix ??, we are able to obtain the following bound on a scaled version of the remainder term (iii):

$$ \begin{array}{@{}rcl@{}} &&\!\!\!\!\!\left\| \begin{pmatrix} \tilde{C}_{j,1} & {\cdots} & \mathbf{0} & \mathbf{0} \\ {\vdots} & {\ddots} & {\vdots} & \vdots \\ \mathbf{0} & {\cdots} & \tilde{C}_{j,p} & \mathbf{0} \\ \mathbf{0} & {\cdots} & \mathbf{0} & \tilde{D}_{j} \end{pmatrix} \left\{ I - \left( n^{g}\right)^{-1}\!\! \begin{pmatrix} \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,1}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \\ \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,1}^{g} & \left( \mathcal{V}_{j,2}^{g}\right)^{\top}\mathcal{V}_{j,2}^{g} \end{pmatrix} \tilde{M}_{j} \right\} \begin{pmatrix} \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}_{j}^{g,*} \\ \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \end{pmatrix} \right\|_{\infty} \leq \\ &&\max\{\omega_{1}, \omega_{2} \} \left\{ \mathcal{P}\left( \tilde{\boldsymbol{\alpha}}^{g}_{j} - \boldsymbol{\alpha}^{g,*}_{j} \right) + \| \tilde{\boldsymbol{\theta}}^{g}_{j} - \boldsymbol{\theta}_{j}^{g,*} \|_{2} \right\}. \end{array} $$

The remainder is \(o_{P}(n^{-1/2})\) and hence asymptotically negligible if \(n^{1/2} \max \{\omega _{1}, \omega _{2}\} \lambda \to 0\), where λ is the tuning parameter for the regularized score matching estimator (see Theorem 2).
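To make this condition concrete, suppose, as an assumption for illustration, that λ is taken of the same order as \(\lambda_{0} \asymp \{\log(p)/n\}^{1/2}\) and that \(\omega_{1}, \omega_{2} \asymp \{\log(p)/n\}^{1/2}\) as above. Then

$$ n^{1/2} \max\{\omega_{1}, \omega_{2}\} \lambda \asymp n^{1/2} \cdot \frac{\log(p)}{n} = \frac{\log(p)}{n^{1/2}}, $$

so under these choices the remainder is negligible whenever \(\log(p) = o(n^{1/2})\).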

The de-biased estimate \(\check {\alpha }^{g}_{j,k}\) of \(\alpha ^{g,*}_{j,k}\) can be expressed as

$$ \begin{array}{@{}rcl@{}} \check{\alpha}^{g}_{j,k} &=& \tilde{\alpha}^{g}_{j,k} - \left( n^{g}\right)^{-1} \tilde{C}^{-1}_{j,k} \left( \mathcal{V}^{g}_{j,k,1} - {\sum}_{l \neq j, k} \mathcal{V}_{j,l,1}^{g} \tilde{{{\varGamma}}}_{j,k,l} \right)^{\top} \\ &&\left( \mathcal{V}^{g}_{j,1} \tilde{\boldsymbol{\alpha}}^{g}_{j} + \mathcal{V}^{g}_{j,2} \tilde{\boldsymbol{\theta}}_{j}^{g} + \left( \mathcal{U}_{j,1}^{g}\right)^{\top} \mathbf{1} \right). \end{array} $$

The difference between the de-biased estimator \(\check {\alpha }^{g}_{j,k}\) and the true parameter \(\alpha ^{g,*}_{j,k}\) can be expressed as

$$ \begin{array}{@{}rcl@{}} \tilde{C}_{j,k}\left( \check{\alpha}^{g}_{j,k} - \alpha^{g,*}_{j,k}\right) &=&\!\!\!\! -\left( n^{g}\right)^{-1} \left( \mathcal{V}^{g}_{j,k,1} - {\sum}_{l \neq j, k} \mathcal{V}_{j,l,1}^{g} {{\varGamma}}^{*}_{j,k,l} \right)^{\top} \left( \mathcal{V}^{g}_{j,1} \boldsymbol{\alpha}^{g,*}_{j} + \mathcal{V}^{g}_{j,2} \boldsymbol{\theta}_{j}^{g,*} + \left( \mathcal{U}_{j,1}^{g}\right)^{\top} \mathbf{1} \right) + \\ &&\!\!\!\!\left( n^{g}\right)^{-1} \left( \mathcal{V}^{g}_{j,2} {{\varDelta}}^{*}_{j,k}\right)^{\top} \left( \mathcal{V}^{g}_{j,1} \boldsymbol{\alpha}^{g,*}_{j} + \mathcal{V}^{g}_{j,2} \boldsymbol{\theta}_{j}^{g,*} + \left( \mathcal{U}_{j,2}^{g}\right)^{\top} \mathbf{1} \right) + o_{P}\left( n^{-1/2}\right). \end{array} $$

As discussed above, the central limit theorem implies asymptotic normality of \(\check {\alpha }^{g}_{j,k}\). We can estimate the asymptotic variance of \(\check {\alpha }^{g}_{j,k}\) as

$$ \left( n^{g}\right)^{-2}\tilde{C}_{j,k}^{-1}\tilde{M}_{j,k}\tilde{\xi}^{\top}\tilde{\xi}\tilde{M}^{\top}_{j,k} \left( \tilde{C}_{j,k}^{-1}\right)^{\top}, $$

where we define

$$ \begin{array}{@{}rcl@{}} \tilde{\xi} &=& \begin{pmatrix} \text{diag}\left( \mathcal{V}_{j,1}^{g} \tilde{\boldsymbol{\alpha}}_{j}^{g} + \mathcal{V}_{j,2}^{g}\tilde{\boldsymbol{\theta}}_{j}^{g} \right)\mathcal{V}_{j,1}^{g} + \mathcal{U}^{g}_{j,1} \\ \text{diag}\left( \mathcal{V}_{j,1}^{g} \tilde{\boldsymbol{\alpha}}_{j}^{g} + \mathcal{V}_{j,2}^{g}\tilde{\boldsymbol{\theta}}_{j}^{g} \right)\mathcal{V}_{j,2}^{g} + \mathcal{U}^{g}_{j,2} \end{pmatrix} \\ \tilde{M}_{j,k} &=& \begin{pmatrix} -\tilde{{{\varGamma}}}_{j,k,1} & {\cdots} & -\tilde{{{\varGamma}}}_{j,k,k-1} & I & -\tilde{{{\varGamma}}}_{j,k,k+1} & {\cdots} & -\tilde{{{\varGamma}}}_{j,k,p} & - \tilde{{{\varDelta}}}_{j,k} \end{pmatrix}. \end{array} $$
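The variance estimate above has the usual sandwich form. The sketch below evaluates it for placeholder inputs, assuming that \(\tilde{\xi}\) is arranged with one row per observation (so that \(\tilde{\xi}^{\top}\tilde{\xi}\) is square), that `M_k` plays the role of \(\tilde{M}_{j,k}\), and that `C_k` plays the role of \(\tilde{C}_{j,k}\). The random inputs and the function name `sandwich_variance` are illustrative and do not come from the authors' code.

```python
import numpy as np

def sandwich_variance(xi, M_k, C_k):
    """n^{-2} C^{-1} M xi' xi M' C^{-T}, with one row of xi per observation."""
    n = xi.shape[0]
    A = np.linalg.solve(C_k, M_k)    # C_k^{-1} M_k without forming the inverse
    return A @ (xi.T @ xi) @ A.T / n**2

rng = np.random.default_rng(3)
n, q, d = 300, 7, 2
xi = rng.normal(size=(n, q))         # placeholder per-observation score terms
M_k = rng.normal(size=(d, q))        # placeholder for the k-th block row
C_k = np.array([[1.0, 0.2],
                [0.1, 0.9]])         # placeholder invertible d x d matrix
var_hat = sandwich_variance(xi, M_k, C_k)
```

The result is a symmetric positive semi-definite d × d matrix, as required of a covariance estimate.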

About this article

Cite this article

Hudson, A., Shojaie, A. Covariate-Adjusted Inference for Differential Analysis of High-Dimensional Networks. Sankhya A 84, 345–388 (2022).


  • Differential network
  • Confounding
  • High-dimensional
  • Penalized likelihood
  • De-biased LASSO
  • Exponential family


  • 62H22 (primary); 62J07 (secondary)