Advertisement

Springer Nature is making Coronavirus research free. View research | View latest news | Sign up for updates

Bayesian sparse covariance decomposition with a graphical structure

Abstract

We consider the problem of estimating covariance matrices of a particular structure that is a summation of a low-rank component and a sparse component. This is a general covariance structure encountered in multiple statistical models including factor analysis and random effects models, where the low-rank component relates to the correlations among variables coming from the latent factors or random effects and the sparse component displays the correlations of the remaining residuals. We propose a Bayesian method for estimating the covariance matrices of such structures by representing the covariance model in the form of a factor model with an unknown number of latent factors. We introduce binary indicators for factor selection and rank estimation for the low-rank component, combined with a Bayesian lasso method for the estimation of the sparse component. Simulation studies show that our method can recover the rank as well as the sparsity of the two respective components. We further extend our method to a latent-factor Markov graphical model, with a focus on the sparse conditional graphical model of the residuals as well as selecting the number of factors. We show through simulations that our Bayesian model can successfully recover both the number of latent factors and the Markov graphical model of the residuals.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

References

  1. Aguilar, O., West, M.: Bayesian dynamic factor models and variance matrix discounting for portfolio allocation. J. Bus. Econ. Stat. 18, 338–357 (2000)

  2. Armstrong, H., Carter, C., Wong, K., Kohn, R.: Bayesian covariance matrix estimation using a mixture of decomposable graphical models. Stat. Comput. 19(3), 303–316 (2009)

  3. Atay-Kayis, A., Massam, H.: The marginal likelihood for decomposable and non-decomposable graphical Gaussian models. Biometrika 92, 317–335 (2005)

  4. Baladandayuthapani, V., Talluri, R., Ji, Y., Coombes, K.R., Hennessy, B.T., Davies, M.A., Mallick, B.K.: Bayesian sparse graphical models for classification with application to protein expression data. Ann. Appl. Stat. 8(3), 1443–1468 (2014)

  5. Bhattacharya, A., Dunson, D.: Sparse Bayesian infinite factor models. Biometrika 98, 291–306 (2011)

  6. Bickel, P.J., Levina, E.: Covariance regularization by thresholding. Ann. Stat. 36, 2577–2604 (2008a)

  7. Bickel, P.J., Levina, E.: Regularized estimation of large covariance matrices. Ann. Stat. 36, 199–227 (2008b)

  8. Bien, J., Tibshirani, R.J.: Sparse estimation of a covariance matrix. Biometrika 98, 807–820 (2011)

  9. Bonato, V., Baladandayuthapani, V., Broom, B.M., Sulman, E.P., Aldape, K.D., Do, K.: Bayesian ensemble methods for survival prediction in gene expression data. Bioinformatics 27, 359–367 (2010)

  10. Brooks, S., Giudici, P., Roberts, G.: Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. J. R. Stat. Soc. 65, 39 (2003)

  11. Cai, T., Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106, 672–684 (2011)

  12. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM, 58, Art. 11, 37 (2011)

  13. Carvalho, C.M., Chang, J., Lucas, J.E., Nevins, J., Wang, Q., West, M.: High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008)

  14. Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21, 572–576 (2011)

  15. Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–1967 (2012)

  16. Chiu, T.Y.M., Leonard, T., Tsui, K.W.: The matrix-logarithmic covariance model. J. Am. Stat. Assoc. 91, 198–210 (1996)

  17. Dawid, A.P.: Conditional independence in statistical theory (with Discussion). J. R. Stat. Soc. 41, 1–31 (1979)

  18. Dawid, A.P., Lauritzen, S.L.: Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 21(3), 1272–1317 (1993)

  19. Deng, X., Tsui, K.W.: Penalized covariance matrix estimation using a matrix-logarithm transformation. J. Comput. Gr. Stat. 22, 494–512 (2013)

  20. Fabrigar, L.R., Wegener, D.T., MacCallum, R.C., Strahan, E.J.: Evaluating the use of exploratory factor analysis in psychological research. Psychol. Methods 3, 272–299 (1999)

  21. Fan, J., Fan, Y., Lv, J.: High dimensional covariance matrix estimation using a factor model. J. Econom. 147, 186–197 (2008)

  22. Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)

  23. Gelfand, A.E., Sahu, S.: Identifiability, Improper priors, and Gibbs sampling for generalized linear models. J. Am. Stat. Assoc. 94, 247–253 (1999)

  24. Geweke, J.F., Zhou, G.: Measuring the pricing error of the arbitrage pricing theory. Rev. Financ. Stud. 9, 557–587 (1996)

  25. Ghosh, J., Dunson, D.: Bayesian model selection in factor analytic models. In: Dunson, D. (ed.) Random Effect and Latent Variable Model Selection. Wiley, New York (2008)

  26. Giudici, P.: Bayesian inference for graphical factor analysis models. Psychometrika 66(4), 577–592 (2001)

  27. Giudici, P., Green, P.J.: Decomposable graphical Gaussian model determination. Biometrika 86, 785–801 (1999)

  28. Glorfeld, L.W.: An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educ. Psychol. Meas. 55, 377–393 (1995)

  29. Grzebyk, M., Wild, P., Chouanière, D.: On identification of multi-factor models with correlated residuals. Biometrika 91(1), 141–151 (2004)

  30. Hoff, P.: Model averaging and dimension selection for the singular value decomposition. J. Am. Stat. Assoc. 102, 674–685 (2007)

  31. Hoff, P., Niu, X.: A covariance regression model. Stat. Sin. 22, 729–753 (2012)

  32. Huang, J., Liu, N., Pourahmadi, M., Liu, L.: Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93, 85–98 (2006)

  33. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104, 682–693 (2009)

  34. Knowles, D., Ghahramani, Z.: Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Stat. 5, 1534–1552 (2011)

  35. Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)

  36. Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88(2), 365–411 (2004)

  37. Lee, S., Song, X.: Bayesian selection on the number of factors in a factor analysis model. Behaviormetrika 29, 2339 (2002)

  38. Leonard, T., Hsu, J.S.J.: Bayesian inference for a covariance matrix. Ann. Stat. 20, 1669–1696 (1992)

  39. Levina, E., Rothman, A.J., Zhu, J.: Sparse estimation of large covariance matrices via a nested lasso penalty. Ann. Appl. Stat. 1, 245–263 (2008)

  40. Lindley, D.: Bayesian Statistics: A Review. SIAM, Philedelphia, PA (1971)

  41. Liu, R.Z., Graham, K., Glubrecht, D.D., Germain, D.R., et al.: Association of FABP5 expression with poor survival in triple-negative breast cancer: implication for retinoic acid therapy. Am. J. Pathol. 178(3), 997–1008 (2011)

  42. Liu, W., Luo, X.: High-dimensional sparse precision matrix estimation via sparse column inverse operator. Technical report. arXiv:1203.3896 (2012)

  43. Lopes, H.F., Carvalho, C.M.: Factor stochastic volatility with time varying loadings and Markov switching regimes. J. Stat. Plan. Inference 137, 3082–3091 (2007)

  44. Lopes, H.F., West, M.: Bayesian model assessment in factor analysis. Stat. Sin. 14, 41–67 (2004)

  45. Luo, X.: High dimensional low rank and sparse covariance matrix estimation via convex minimization. arXiv:1111.1133v1 (2013)

  46. Pati, D., Bhattacharya, A., Pillai, N.S., Dunson, D.: Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 42(3), 1102–1130 (2014)

  47. Poirier, D.J.: Revisiting beliefs in non-identified models. Technical report, University of Toronto, (1998)

  48. Roverato, A.: Hyper-inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand. J. Stat. 29, 391–411 (2002)

  49. Rowe, D.B.: Multivar. Bayesian Stat. Chapman & Hall, London (2003)

  50. Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., et al.: Genome-wide associations of gene expression variation in humans. PLoS Genet. 1(6), e78 (2005)

  51. Stanghellini, E.: Identification of a single-factor model using graphical Gaussian rules. Biometrika 84(1), 241–244 (1997)

  52. Wang, H.: Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal. 7, 867–886 (2012)

  53. Wong, F., Carter, C., Kohn, R.: Efficient estimation of covariance selection models. Biometrika 90, 809–830 (2003)

  54. Wu, W.B., Pourahmadi, M.: Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90, 831–844 (2003)

  55. Wu, W.B., Pourahmadi, M.: Banding sample auto covariance matrices of stationary processes. Stat. Sin. 19, 1755 (2010)

  56. Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19–35 (2007)

Download references

Author information

Correspondence to Lin Zhang.

Appendices

Appendix 1: Derivation of the conditional densities for the elements in \(S\)

The conditional densities for the diagonal elements in S

For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).

For each diagonal element \(S_{jj}\) in S, the full conditional density is

$$\begin{aligned} p(S_{jj} | \cdot ) \propto&|S|^{-\frac{n}{2}} e^{- (tr(S^{-1} \varLambda ) + \lambda S_{jj})/2 } \mathcal {I} \{S \in \mathcal {S}^+ \}. \end{aligned}$$

Without loss of generality, suppose that \(j=p\). Let \(S = \left[ \begin{array}{cc} S_{-j,-j} &{} S_{-j,j} \\ S_{j,-j} &{} S_{jj} \end{array} \right] \). With the property of matrices, we have

$$\begin{aligned} |S|&= \det (S_{-j,-j}) \cdot \det (S_{jj}-S_{j,-j}S^{-1}_{-j,-j}S_{-j,j}),\\&\propto (S_{jj}-c), \text { where } c = S_{j,-j}S^{-1}_{-j,-j}S_{-j,j}. \end{aligned}$$
$$\begin{aligned} S^{-1}&= \left[ \begin{array}{cc} S^{-1}_{-j,-j} &{} 0 \\ 0 &{} 0 \end{array} \right] + \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] \cdot (S_{jj}-c)^{-1} \\&\times \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] . \end{aligned}$$
$$\begin{aligned} tr(S^{-1}\varLambda )&= tr \left( \left[ \begin{array}{cc} S^{-1}_{-j,-j} &{} 0 \\ 0 &{} 0 \end{array} \right] \varLambda \right) + tr \left( \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] \right. \\&\left. \times (S_{jj}-c)^{-1} \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] \varLambda \right) ,\\&= tr \left( \left[ \begin{array}{cc} S^{-1}_{-j,-j} &{} 0 \\ 0 &{} 0 \end{array} \right] \varLambda \right) + d(S_{jj}-c)^{-1} ,\\ \text { where } d&= \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j}&-1 \end{array} \right] \varLambda \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] . \end{aligned}$$

Hence, we have

$$\begin{aligned} p(S_{jj} | \cdot ) \propto (S_{jj}-c)^{-\frac{n}{2}} e^{-\{ d(S_{jj}-c)^{-1} +\lambda S_{jj} \}/2 } \mathcal {I} \{S_{jj}>c\}. \end{aligned}$$

The conditional densities for the off-diagonal elements in S

The conditional density of the off-diagonal element of S, \(S_{jj^{\prime }}\), is

$$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto&\, |S|^{-\frac{n}{2}} e^{tr(S^{-1}\varLambda )/2} \Big [ (1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+ \}. \end{aligned}$$

Without loss of generality, suppose that \(j=p-1\) and \(j^{\prime }=p\). Let \(S = \left[ \begin{array}{cc} S_{-(jj^{\prime })} &{} S_{-(jj^{\prime }),jj^{\prime }} \\ S_{jj^{\prime },-(jj^{\prime })} &{} S_{jj^{\prime },jj^{\prime }} \end{array} \right] \), where \(S_{jj^{\prime },jj^{\prime }}=\left[ \begin{array}{cc} S_{jj} &{} S_{jj^{\prime }} \\ S_{jj^{\prime }} &{} S_{j^{\prime }j^{\prime }} \end{array} \right] \).

With the property of matrices, we have

$$\begin{aligned} |S|&\propto \det (S_{jj^{\prime },jj^{\prime }}-B), \end{aligned}$$
$$\begin{aligned} S^{-1} =&\left[ \begin{array}{cc} S_{-(jj^{\prime })}^{-1} &{} 0 \\ 0 &{} 0 \end{array} \right] + \left[ \begin{array}{c} S_{-(jj^{\prime })}^{-1}S_{-(jj^{\prime }),jj^{\prime }} \\ -I_2 \end{array} \right] \\&\times (S_{jj^{\prime }}-B)^{-1} \left[ \begin{array} {cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] , \end{aligned}$$
$$\begin{aligned} tr(S^{-1}\varLambda ) =&\,tr\left( \left[ \begin{array}{cc} S_{-(jj^{\prime })}^{-1} &{} 0 \\ 0 &{} 0 \end{array} \right] S\right) \\&+ tr( (S_{jj^{\prime },jj^{\prime }}-B)^{-1}D), \end{aligned}$$

where

$$\begin{aligned} B=&S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime }),jj^{\prime }}, \\ D =&\left[ \begin{array} {cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] \varLambda \left[ \begin{array}{c} S_{-(jj^{\prime })}^{-1}S_{-(jj^{\prime }),jj^{\prime }} \\ -I_2 \end{array} \right] . \end{aligned}$$

Hence, we have

$$\begin{aligned} p(S_{jj^{\prime }}|\cdot )&\propto |S_{jj^{\prime },jj^{\prime }}-B|^{-\frac{n}{2}} e^{-tr((S_{jj^{\prime }}-B)^{-1}D)/2 } \\&\times p(S_{ij}|\lambda ,\rho _{jj^{\prime }}) \mathcal {I}\{S \in \mathcal {S}^+\},\\&\propto \left\{ 1-\frac{(S_{jj^{\prime }}\!-\!B_{12})^2}{(S_{jj}\!-\!B_{11})(S_{j^{\prime }j^{\prime }} \!-\!B_{22})}\right\} ^{-\frac{n}{2}} \\&\times \exp \Big [ -\frac{1}{2} \Big \{(S_{j^{\prime }j^{\prime }}\!-\!B_{22})D_{11}+(S_{jj} \!-\!B_{11})D_{22} \\&-2(S_{jj^{\prime }}\!-\!B_{12})D_{12}) \Big \} \Big / \Big \{ (S_{jj}\!-\!B_{11}) (S_{j^{\prime }j^{\prime }}-B_{22}) \\&-(S_{jj^{\prime }}\!-\!B_{12})^2 \Big \} \Big ] \\&\times \Big [ (1-\rho _{jj^{\prime }}) \mathcal {I} \{S_{jj^{\prime }}=0\} \!+\! \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\times \mathcal {I} \left\{ (S_{jj^{\prime }}-B_{12})^2<(S_{jj}\!-\!B_{11})(S_{j^{\prime }j^{\prime }}\!-\!B_{22}) \right\} . \end{aligned}$$

Appendix 2: Posterior inference for the Bayesian covariance decomposition model

In this section, we present the full conditional posterior distributions of the proposed Bayesian covariance decomposition model with the assumption of an sparse covariance graphical model of the residuals. We also present here a framework to carry out Markov chain Monte Carlo (MCMC) calculations. Note that the full conditional distributions of most parameters are available in a closed form, allowing for a straightforward Gibbs sampling algorithm, except for the off-diagonal elements in the sparse matrix \(S\). In this case, we employ an independent Metropolis–Hastings (MH) sampler within the Gibbs sampler to generate posterior samples of the off-diagonal elements in \(S\).

  1. 1.

    Sampling the factor loading matrix \(M\): Let \(\mathbf {M}_k=(m_{1k},\ldots ,m_{qk})^T\) be the \(k\)th column vector of \(M\), and \(M_{(-k)}\) be the matrix of \(M\) excluding the \(k\)th column. The full conditional distribution of \(\mathbf {M}_k\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned} \mathbf {M}_k|y,M_{(-k)},f,S&\sim \mathcal {N}_q(\varvec{\mu }^M_k,\varSigma ^M_k), \end{aligned}$$

    where

    $$\begin{aligned} \varSigma ^M_k =&\,\Big \{ S^{-1}\left( \sum ^n_{i=1}f^2_{ki} \right) + qI_q \Big \}^{-1}, \\ \varvec{\mu }^M_k =&\, \varSigma ^M_k S^{-1}(y-M_{ (-k)}f_{(-k)\cdot })\mathbf {f}_{k\cdot }^T. \end{aligned}$$

    Hence, we can draw samples of each column of \(M\) from a multivariate Gaussian distribution.

  2. 2.

    Sampling the random factors \(f\): Let \(f\) be the \(r \times n\) matrix with \(f_{ki}\) to be the value of \(k\)th factor in \(i\)th replicate. Then \(\mathbf {f}_{k\cdot } = (f_{k1},\ldots ,,f_{kn})\) is the \(k\)th row vector of \(f\), and \(f_{(-k)\cdot }\) denotes the matrix of \(f\) excluding the \(k\)th row. Note that \(\mathbf {f}_{k\cdot }\) can be viewed as the unobserved values of the random factor \(k\) in the \(n\) replicates. The full conditional distribution of the transpose of \(\mathbf {f}_{k\cdot }\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned}&\mathbf {f}^T_{k\cdot } | y,M, f_{(-k)\cdot }, S,z_k,\tau ^2_k \sim (1-z_k)\mathcal {N}_q(0,\tau _k^2 I_n)\\&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + z_k \mathcal {N}_n(\varvec{\mu }^f_k, \sigma ^f_k I_n), \end{aligned}$$

    where

    $$\begin{aligned} \sigma ^f_k =&\,\Big ( M^T_k S^{-1} M_k +\tau ^{-2}_k \Big ) ^{-1}, \\ \varvec{\mu }^{f}_k =&\,\sigma ^{f}_k \Big \{ M^T_kS^{-1}(y-M_{(-k)}f_{(k)\cdot }) \Big \}^T. \end{aligned}$$

    Hence we can draw samples of each row of \(f\) from a mixture of two multivariate Gaussian distributions.

  3. 3.

    Sampling the binary diagonal matrix \(Z\): The full conditional of each diagonal element of \(Z\), \(z_k\), is a Bernoulli distribution:

    $$\begin{aligned} z_k | y,M,f,S,\tau ^2_k,p_k \sim&Bernoulli(p^*_k), \end{aligned}$$

    where \(p^*_k = 1- (1-p_k)/\Big \{ 1-p_k+p_k\cdot \Big ( \frac{\sigma ^f_k}{\tau ^2_k} \Big )^{\frac{n}{2}} \times \exp \Big ( - \frac{(\varvec{\mu }^f_k)^T \varvec{\mu }^f_k}{2\sigma ^f_k} \Big ) \Big \}\).

  4. 4.

    Sampling the probabilities \(p_k\) and \(\pi \): The full conditional of \(p_k\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned} p_k | z_k=1&\sim Beta ( a_p+1, b_p ), \\ p_k | z_k=0&\sim (1-\pi ^*) \mathcal {I}\{p_k=0\} + \pi ^*Beta(a_p, b_p+1), \end{aligned}$$

    where \(\pi ^* = \frac{\pi b_p}{a_p+b_p-\pi a_p}\). The full conditional of \(\pi \) is

    $$\begin{aligned} Beta \left( a_{\pi }+\sum _k \mathcal {I}\{p_k=0\}, b_{\pi }+ \sum _k \mathcal {I}\{p_k \ne 0 \} \right) . \end{aligned}$$
  5. 5.

    Sampling the positive diagonal matrix \(D_\tau \): The full conditional of each diagonal element of \(D_\tau \), \(\tau ^2_k\), for \(k=1,\ldots ,r\) is

    $$\begin{aligned} (1-z_k) \mathrm{IG}(a_\tau ,b_\tau ) + z_k \mathrm{IG} \left( a_\tau +\frac{n}{2}, b_\tau +\frac{\mathbf {f}_{k \cdot } \mathbf {f}^T_ {k\cdot }}{2} \right) . \end{aligned}$$
  6. 6.

    Sampling the diagonal elements \(S_{jj}\): The full posterior conditional density of the diagonal element \(S_{jj}\), for \(j=1,\ldots ,q\), is

    $$\begin{aligned} p(S_{jj} | \cdot ) \propto \, |S|^{-\frac{n}{2}} e^ {-\frac{tr(\varLambda S^{-1})}{2}-\frac{\lambda S_{jj} }{2} } \mathcal {I} \{S \in \mathcal {S}^+ \}, \end{aligned}$$

    where \(\varLambda =(y-Mf)(y-Mf)^T\). Without loss of generality, suppose that \(j=q\). By the properties of the matrix inverse and the matrix determinant, we have

    $$\begin{aligned} p(S_{jj} | \cdot ) \propto \, (S_{jj}-c)^{-\frac{n}{2}} e^{-\frac{d}{2}(S_{jj}-c)^{-1} -\frac{\lambda }{2} S_{jj} } \mathcal {I} \{S_{jj}>c\}, \end{aligned}$$

    where \(c = S_{j,-j} S^{-1}_{-j,-j} S_{-j,j}\), and \(d \!=\! [\begin{array}{lr} S_{j,-j}S^{-1}_{-j,-j}&\!-\!1 \end{array} ] \cdot \varLambda \cdot [\begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&\!-\!1 \end{array} ]^T\). Note that the indicator function \(\mathcal {I} \{S_{jj}>c\}\) ensures the positive definiteness of \(S\) conditional on the other elements in \(S\). The derivation of the full conditional distributions for the elements of \(S\) is detailed in the Appendix. The above distribution is in a closed form. We now transform \(S_{jj}\) to \(\nu =S_{jj}-c\), then the conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot ) \propto \, \nu ^{-\frac{n}{2}} \exp \{-(d/\nu +\lambda \nu )/2\} \mathcal {I} \{\nu >0\}, \end{aligned}$$

    which is a generalized inverse Gaussian (GIG) distribution with parameters \((1-n/2,d,\lambda )\). Therefore, at each MCMC iteration, we can draw a sample of \(\nu \) from the GIG distribution, and obtain \(S_{jj}=\nu +c\).

  7. 7.

    is upper triangular. ThenSampling the off-diagonal elements \(S_{jj^{\prime }}\): The full posterior conditional density of the off-diagonal element \(S_{jj^{\prime }}\), \(j<j^{\prime }\) is

    $$\begin{aligned} p(S_{jj^{\prime }} | \cdot ) \propto \,&|S|^{-\frac{n}{2}} e^{-tr(\varLambda S^{-1})/2 } \Big [(1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+\} . \end{aligned}$$

    Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). By the properties of the matrix inverse and the matrix determinant, we have

    $$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto \,&(1\!-\!\frac{\nu ^2}{ab})^{\!-\!\frac{n}{2}} \exp \left\{ \!-\!\frac{bD_{11}\!+\!aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \right\} \\&\times \Big [ (1-\rho _{jj^{\prime }})\mathcal {I} \{S_{jj^{\prime }}=0\} + \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\times \mathcal {I} \{\nu ^2<ab\} \end{aligned}$$

    where

    $$\begin{aligned} \nu =&S_{jj^{\prime }}-B_{12}, \\ a=&S_{jj}-B_{11}, \\ b=&S_{j^{\prime }j^{\prime }}-B_{22}, \\ B=&S_{jj^{\prime }-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime })jj^{\prime }} \\ D =&[S_{jj^{\prime } -(jj^{\prime })}S^{-1}_{-(jj^{\prime })}\text { } -I_2] \varLambda [S_{jj^{\prime } -(jj^{\prime })}S^{-1}_{-(jj^{\prime })}\text { } -I_2]^T. \end{aligned}$$

    We make variable transformation, and the conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot ) \propto&\left( 1-\rho _{jj^{\prime }}\right) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab\} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g(\nu ), \end{aligned}$$

    where

    $$\begin{aligned} g(\nu ) =&\, (1-\frac{\nu ^2}{ab})^{-\frac{n}{2}} \exp \Big \{ -\frac{bD_{11}+aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \\&-\lambda |\nu +B_{12}| \Big \} \mathcal {I} \{\nu ^2<ab\}. \end{aligned}$$

    Note that the positive definite constraint on \(S\) is ensured by the indicator \(\mathcal {I} \{\nu ^2<ab\}\). That is, the posterior distribution of \(\nu \) is only over the region \((-\sqrt{ab},\sqrt{ab})\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Furthermore, \(g(\nu )\) is nonconcave and therefore the sampler may be trapped locally if we use the random-walk MH algorithm within the Gibbs sampling. Since \(g(\nu )\) has density only over \((-\sqrt{ab},\sqrt{ab})\) and is zero elsewhere, we construct a piecewise uniform proposal distribution that approximates \(g(\nu )\). We choose \(\kappa -1\) equally spaced grids between (\(-\sqrt{ab}\), \(\sqrt{ab})\), \(-\sqrt{ab}=\nu _0<\nu _1<\cdots <\nu _{\kappa }=\sqrt{ab}\), which divide the domain of \(\nu \) into \(\kappa \) intervals of width \(2\sqrt{ab}/\kappa \). The piecewise uniform distribution is as follows:

    $$\begin{aligned} g_a(\nu ) = \left\{ \begin{array}{ll} g\left( \frac{\nu _0 + \nu _1}{2}\right) &{} \quad \text{ if } \, \nu _0 < \nu \le \nu _1 \\ g\left( \frac{\nu _1 + \nu _2}{2}\right) &{} \quad \text{ if } \, \nu _1 < \nu \le \nu _2 \\ \cdots &{} \\ g\left( \frac{\nu _{\kappa -1} + \nu _\kappa }{2}\right) &{} \quad \text{ if } \nu _{\kappa -1} < \nu < \nu _\kappa . \end{array} \right. \end{aligned}$$

    The independent MH proposal distribution for generating \(\nu =S_{jj^{\prime }}-B_{12}\) is given by

    $$\begin{aligned} q(\nu |\cdot ) \propto&\, (1-\rho _{jj^{\prime }}) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab \} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g_a(\nu ). \end{aligned}$$

    Samples of \(\nu \) can be generated by using an inverse-cumulative distribution function (CDF) method from the proposal distribution, and the proposal \(\nu ^*\) is accepted with the probability

    $$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$

    where \(\nu ^c = S^c_{jj^{\prime }}-B_{12}\) denotes the current state of \(\nu \). The piecewise uniform proposal distribution avoids the local-trap problem and can be sampled easily using an inverse-CDF method. Furthermore, \(q(\nu | \cdot )\) approximates the distribution \(p(\nu |\cdot )\) more accurately with an increased number of grids. Based on our simulations, \(100\) grids are enough for a fast convergence of \(S_{jj^{\prime }}\).

  8. 8.

    Sampling the shrinkage parameter \(\lambda \): The full conditional of \(\lambda \) is

    $$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|S_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q S_{jj}\right) , \end{aligned}$$

    where \(m\) equals the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, S_{jj^{\prime }} \ne 0 \}\).

  9. 9.

    Sampling the selection parameters \(\rho _{jj^{\prime }}\): The full conditional of \(\rho _{jj^{\prime }}\) for \(j<j^{\prime }\) is

    $$\begin{aligned} \mathrm{Beta}(a_\rho +\mathcal {I} \{S_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ S_{jj^{\prime }}=0 \}). \end{aligned}$$

Appendix 3: Posterior inference for the latent-factor conditional graphical model

We derive the full conditionals for all the parameters and perform posterior inference using a Gibbs sampling algorithm. Note that the prior specification for the parameters \(\{M, z_k, \tau _k\}\) in the graphical factor analysis models parallels that in the hierarchical model in formulas (9), so the full conditionals of these parameters for a Gibbs algorithm are similar to those in Sect. 1. In this section, we present the sampling algorithm of the parameter set \(\{C,\lambda ^C,\varvec{\rho }^C\}\) for the graphical model of the residuals.

Sampling \(\{C,\lambda ^C,\rho ^C\}\) for the graphical model of the residuals

For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).

  1. 1.

    Sampling the diagonal elements of \(C\), \(C_{jj}\), for \(j=1,\ldots ,q\): The full conditional density of \(C_{jj}\) is

    $$\begin{aligned} p(C_{jj} | \cdot ) \propto&\,|C|^{\frac{n}{2}} e^{ -(\varLambda _{jj}C_{jj} + \lambda ^C C_{jj})/2 } \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$

    Without loss of generality, suppose that \(j=q\). Let \(C=R^{\prime }R\) be the Cholesky decomposition of \(C\) where the matrix \(R=(R_{jj^{\prime }})\) is upper triangular. Then

    $$\begin{aligned} p(C_{jj} | \cdot ) \propto&\, (C_{jj}-c)^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) C_{jj} } \mathcal {I} \{C_{jj}>c \}, \end{aligned}$$

    where \(c=\sum ^{q-1}_{j=1} R_{j,q}^2\) does not depend on \(C_{jj}\), and the indicator function \(\mathcal {I} \{C_{jj}>c \}\) ensures that \(C\) is positive definite. Let \(\nu =(C_{jj}-c)\), then the conditional distribution of \(\nu \) is

    $$\begin{aligned} p(\nu | \cdot ) \propto \,&\,\nu ^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) \nu } \mathcal {I} \{\nu >0 \}, \end{aligned}$$

    which follows \(\mathrm{Gamma}(\frac{n}{2}+1,\frac{\varLambda _{jj}+\lambda ^C}{2})\). Hence, we can draw samples of \(\nu \) from the Gamma distribution first, and obtain \(C_{jj}=\nu +c\).

  2. 2.

    Sampling the off-diagonal elements of \(C\), \(C_{jj^{\prime }}\), for \(j<j^{\prime }\): The complete conditional density of \(C_{jj^{\prime }}\) is

    $$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&|C|^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }} } p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \\&\times \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$

    Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). Then using Lemma 2 of Wong et al. (2003),

    $$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&\Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{ -\varLambda _{jj^{\prime }}C_{jj^{\prime }}} \\&\times p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \} \\ \propto&\Big [ (1-\rho ^C_{jj^{\prime }}) \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} \mathcal {I} \{C_{jj^{\prime }}=0 \} \\&+ \rho ^C_{jj^{\prime }} \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }}-\lambda ^C |C_{jj^{\prime }}| }\\&\times \mathcal {I} \{C_{jj^{\prime }} \ne 0 \} \Big ] \cdot \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \}, \end{aligned}$$

    where \(a=\sum ^{q-2}_{j=1} R_{j,q-1}R_{j,q}\), \(b=R_{q-1,q-1}\), and \(c=R_{q-1,q}^2+R_{q,q}^2\) do not depend on \(C_{jj^{\prime }}\). Transform \(C_{jj^{\prime }}\) to \(\nu = (C_{jj^{\prime }}-a)/(b\sqrt{c})\), and let \(\kappa =-a/(b\sqrt{c})\). The full conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot )&\propto \left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&\quad \quad \,\, + \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g(\nu ), \end{aligned}$$

    where \(g(\nu )=(1-\nu ^2)^{n/2} \exp \{ -\varLambda _{jj^{\prime }}(\nu b\sqrt{c} +a) -\lambda ^C|\nu b\sqrt{c} +a| \} \mathcal {I} \{|\nu |<1\}\). The positive definite constraint on \(C\) is ensured by the indicator function \( \mathcal {I} \{|\nu |<1\}\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Since \(g(\nu )\) has density only over \((-1,1)\) and is zero elsewhere, we can use an independent MH algorithm as the sampling algorithm for \(S_{jj^{\prime }}\) in Sect. 1. The details of the independent MH algorithm are explained in Sect. 1. Briefly, we choose \(\kappa -1\) equally spaced grids between (-1,1), \(-1=\nu _0<\nu _1<\cdots <\nu _{\kappa }=1\), which divide the domain of \(\nu \) into \(\kappa \) intervals of width \(2/\kappa \), and construct a piecewise uniform distribution \(g_a(\nu )\) approximating \(g(\nu )\). The independent MH proposal for generating \(\nu \) is then given by

    $$\begin{aligned} q(\nu |\cdot ) \propto&\left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&+ \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g_a(\nu ). \end{aligned}$$

    Samples of \(\nu \) can be generated using an inverse-CDF method from the proposal distribution, and the proposal \(\nu ^*\) is accepted with the probability

    $$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$

    where \(\nu ^c = (C^c_{jj^{\prime }}-a)/(b\sqrt{c})\) denotes the current state of \(\nu \). Samples of \(C_{jj^{\prime }}\) are obtained as \(C_{jj^{\prime }}=\nu b\sqrt{c} +a\).

  3. 3.

    Sampling the shrinkage parameter \(\lambda ^C\): The full conditional of \(\lambda \) is

    $$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|C_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q C_{jj}\right) \!, \end{aligned}$$

    where \(m\) is equal to the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, C_{jj^{\prime }} \ne 0 \}\).

  4. 4.

    Sampling the selection parameters \(\rho ^C_{jj^{\prime }}\): The full conditional of \(\rho ^C_{jj^{\prime }}\) for \(j<j^{\prime }\) is

    $$\begin{aligned} \;\,\mathrm{Beta}(a_\rho +\mathcal {I} \{C_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ C_{jj^{\prime }}=0 \}). \end{aligned}$$

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Zhang, L., Sarkar, A. & Mallick, B.K. Bayesian sparse covariance decomposition with a graphical structure. Stat Comput 26, 493–510 (2016). https://doi.org/10.1007/s11222-014-9540-7

Download citation

Keywords

  • Bayesian graphical lasso
  • Covariance estimation
  • Factor graphical model
  • Factor analysis
  • Low-rank-plus-sparse decomposition