Bayesian sparse covariance decomposition with a graphical structure

Abstract

We consider the problem of estimating covariance matrices with a particular structure: the sum of a low-rank component and a sparse component. This general covariance structure arises in multiple statistical models, including factor analysis and random effects models, where the low-rank component captures the correlations among variables induced by latent factors or random effects and the sparse component captures the correlations of the remaining residuals. We propose a Bayesian method for estimating covariance matrices of this structure by representing the covariance model as a factor model with an unknown number of latent factors. We introduce binary indicators for factor selection and rank estimation of the low-rank component, combined with a Bayesian lasso method for estimating the sparse component. Simulation studies show that our method can recover the rank as well as the sparsity of the two respective components. We further extend our method to a latent-factor Markov graphical model, with a focus on the sparse conditional graphical model of the residuals as well as on selecting the number of factors. We show through simulations that our Bayesian model can successfully recover both the number of latent factors and the Markov graphical model of the residuals.


Figs. 1–5

References

  1. Aguilar, O., West, M.: Bayesian dynamic factor models and variance matrix discounting for portfolio allocation. J. Bus. Econ. Stat. 18, 338–357 (2000)

  2. Armstrong, H., Carter, C., Wong, K., Kohn, R.: Bayesian covariance matrix estimation using a mixture of decomposable graphical models. Stat. Comput. 19(3), 303–316 (2009)

  3. Atay-Kayis, A., Massam, H.: The marginal likelihood for decomposable and non-decomposable graphical Gaussian models. Biometrika 92, 317–335 (2005)

  4. Baladandayuthapani, V., Talluri, R., Ji, Y., Coombes, K.R., Hennessy, B.T., Davies, M.A., Mallick, B.K.: Bayesian sparse graphical models for classification with application to protein expression data. Ann. Appl. Stat. 8(3), 1443–1468 (2014)

  5. Bhattacharya, A., Dunson, D.: Sparse Bayesian infinite factor models. Biometrika 98, 291–306 (2011)

  6. Bickel, P.J., Levina, E.: Covariance regularization by thresholding. Ann. Stat. 36, 2577–2604 (2008a)

  7. Bickel, P.J., Levina, E.: Regularized estimation of large covariance matrices. Ann. Stat. 36, 199–227 (2008b)

  8. Bien, J., Tibshirani, R.J.: Sparse estimation of a covariance matrix. Biometrika 98, 807–820 (2011)

  9. Bonato, V., Baladandayuthapani, V., Broom, B.M., Sulman, E.P., Aldape, K.D., Do, K.: Bayesian ensemble methods for survival prediction in gene expression data. Bioinformatics 27, 359–367 (2010)

  10. Brooks, S., Giudici, P., Roberts, G.: Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. J. R. Stat. Soc. Ser. B 65, 3–39 (2003)

  11. Cai, T., Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106, 672–684 (2011)

  12. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3), Art. 11 (2011)

  13. Carvalho, C.M., Chang, J., Lucas, J.E., Nevins, J., Wang, Q., West, M.: High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Stat. Assoc. 103, 1438–1456 (2008)

  14. Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21, 572–576 (2011)

  15. Chandrasekaran, V., Parrilo, P.A., Willsky, A.S.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40(4), 1935–1967 (2012)

  16. Chiu, T.Y.M., Leonard, T., Tsui, K.W.: The matrix-logarithmic covariance model. J. Am. Stat. Assoc. 91, 198–210 (1996)

  17. Dawid, A.P.: Conditional independence in statistical theory (with discussion). J. R. Stat. Soc. Ser. B 41, 1–31 (1979)

  18. Dawid, A.P., Lauritzen, S.L.: Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 21(3), 1272–1317 (1993)

  19. Deng, X., Tsui, K.W.: Penalized covariance matrix estimation using a matrix-logarithm transformation. J. Comput. Graph. Stat. 22, 494–512 (2013)

  20. Fabrigar, L.R., Wegener, D.T., MacCallum, R.C., Strahan, E.J.: Evaluating the use of exploratory factor analysis in psychological research. Psychol. Methods 3, 272–299 (1999)

  21. Fan, J., Fan, Y., Lv, J.: High dimensional covariance matrix estimation using a factor model. J. Econom. 147, 186–197 (2008)

  22. Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)

  23. Gelfand, A.E., Sahu, S.: Identifiability, improper priors, and Gibbs sampling for generalized linear models. J. Am. Stat. Assoc. 94, 247–253 (1999)

  24. Geweke, J.F., Zhou, G.: Measuring the pricing error of the arbitrage pricing theory. Rev. Financ. Stud. 9, 557–587 (1996)

  25. Ghosh, J., Dunson, D.: Bayesian model selection in factor analytic models. In: Dunson, D. (ed.) Random Effect and Latent Variable Model Selection. Wiley, New York (2008)

  26. Giudici, P.: Bayesian inference for graphical factor analysis models. Psychometrika 66(4), 577–592 (2001)

  27. Giudici, P., Green, P.J.: Decomposable graphical Gaussian model determination. Biometrika 86, 785–801 (1999)

  28. Glorfeld, L.W.: An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educ. Psychol. Meas. 55, 377–393 (1995)

  29. Grzebyk, M., Wild, P., Chouanière, D.: On identification of multi-factor models with correlated residuals. Biometrika 91(1), 141–151 (2004)

  30. Hoff, P.: Model averaging and dimension selection for the singular value decomposition. J. Am. Stat. Assoc. 102, 674–685 (2007)

  31. Hoff, P., Niu, X.: A covariance regression model. Stat. Sin. 22, 729–753 (2012)

  32. Huang, J., Liu, N., Pourahmadi, M., Liu, L.: Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93, 85–98 (2006)

  33. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104, 682–693 (2009)

  34. Knowles, D., Ghahramani, Z.: Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Stat. 5, 1534–1552 (2011)

  35. Lauritzen, S.L.: Graphical Models. Oxford University Press, Oxford (1996)

  36. Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88(2), 365–411 (2004)

  37. Lee, S., Song, X.: Bayesian selection on the number of factors in a factor analysis model. Behaviormetrika 29, 23–39 (2002)

  38. Leonard, T., Hsu, J.S.J.: Bayesian inference for a covariance matrix. Ann. Stat. 20, 1669–1696 (1992)

  39. Levina, E., Rothman, A.J., Zhu, J.: Sparse estimation of large covariance matrices via a nested lasso penalty. Ann. Appl. Stat. 1, 245–263 (2008)

  40. Lindley, D.: Bayesian Statistics: A Review. SIAM, Philadelphia, PA (1971)

  41. Liu, R.Z., Graham, K., Glubrecht, D.D., Germain, D.R., et al.: Association of FABP5 expression with poor survival in triple-negative breast cancer: implication for retinoic acid therapy. Am. J. Pathol. 178(3), 997–1008 (2011)

  42. Liu, W., Luo, X.: High-dimensional sparse precision matrix estimation via sparse column inverse operator. Technical report. arXiv:1203.3896 (2012)

  43. Lopes, H.F., Carvalho, C.M.: Factor stochastic volatility with time varying loadings and Markov switching regimes. J. Stat. Plan. Inference 137, 3082–3091 (2007)

  44. Lopes, H.F., West, M.: Bayesian model assessment in factor analysis. Stat. Sin. 14, 41–67 (2004)

  45. Luo, X.: High dimensional low rank and sparse covariance matrix estimation via convex minimization. arXiv:1111.1133v1 (2013)

  46. Pati, D., Bhattacharya, A., Pillai, N.S., Dunson, D.: Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 42(3), 1102–1130 (2014)

  47. Poirier, D.J.: Revisiting beliefs in non-identified models. Technical report, University of Toronto (1998)

  48. Roverato, A.: Hyper-inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand. J. Stat. 29, 391–411 (2002)

  49. Rowe, D.B.: Multivariate Bayesian Statistics. Chapman & Hall, London (2003)

  50. Stranger, B.E., Forrest, M.S., Clark, A.G., Minichiello, M.J., Deutsch, S., et al.: Genome-wide associations of gene expression variation in humans. PLoS Genet. 1(6), e78 (2005)

  51. Stanghellini, E.: Identification of a single-factor model using graphical Gaussian rules. Biometrika 84(1), 241–244 (1997)

  52. Wang, H.: Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal. 7, 867–886 (2012)

  53. Wong, F., Carter, C., Kohn, R.: Efficient estimation of covariance selection models. Biometrika 90, 809–830 (2003)

  54. Wu, W.B., Pourahmadi, M.: Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90, 831–844 (2003)

  55. Wu, W.B., Pourahmadi, M.: Banding sample autocovariance matrices of stationary processes. Stat. Sin. 19, 1755–1768 (2009)

  56. Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19–35 (2007)


Author information


Correspondence to Lin Zhang.

Appendices

Appendix 1: Derivation of the conditional densities for the elements in \(S\)

The conditional densities for the diagonal elements in S

For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).

For each diagonal element \(S_{jj}\) in S, the full conditional density is

$$\begin{aligned} p(S_{jj} | \cdot ) \propto&|S|^{-\frac{n}{2}} e^{- (tr(S^{-1} \varLambda ) + \lambda S_{jj})/2 } \mathcal {I} \{S \in \mathcal {S}^+ \}. \end{aligned}$$

Without loss of generality, suppose that \(j=p\). Let \(S = \left[ \begin{array}{cc} S_{-j,-j} &{} S_{-j,j} \\ S_{j,-j} &{} S_{jj} \end{array} \right] \). By standard properties of partitioned matrices, we have

$$\begin{aligned} |S|&= \det (S_{-j,-j}) \cdot \det (S_{jj}-S_{j,-j}S^{-1}_{-j,-j}S_{-j,j}),\\&\propto (S_{jj}-c), \text { where } c = S_{j,-j}S^{-1}_{-j,-j}S_{-j,j}. \end{aligned}$$
$$\begin{aligned} S^{-1}&= \left[ \begin{array}{cc} S^{-1}_{-j,-j} &{} 0 \\ 0 &{} 0 \end{array} \right] + \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] \cdot (S_{jj}-c)^{-1} \\&\times \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] . \end{aligned}$$
$$\begin{aligned} tr(S^{-1}\varLambda )&= tr \left( \left[ \begin{array}{cc} S^{-1}_{-j,-j} &{} 0 \\ 0 &{} 0 \end{array} \right] \varLambda \right) + tr \left( \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] \right. \\&\left. \times (S_{jj}-c)^{-1} \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] \varLambda \right) ,\\&= tr \left( \left[ \begin{array}{cc} S^{-1}_{-j,-j} &{} 0 \\ 0 &{} 0 \end{array} \right] \varLambda \right) + d(S_{jj}-c)^{-1} ,\\ \text { where } d&= \left[ \begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} \right] \varLambda \left[ \begin{array}{c} S^{-1}_{-j,-j}S_{-j,j} \\ -1 \end{array} \right] . \end{aligned}$$

Hence, we have

$$\begin{aligned} p(S_{jj} | \cdot ) \propto (S_{jj}-c)^{-\frac{n}{2}} e^{-\{ d(S_{jj}-c)^{-1} +\lambda S_{jj} \}/2 } \mathcal {I} \{S_{jj}>c\}. \end{aligned}$$
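The block-matrix identities used above (the Schur-complement factorization of \(|S|\) and the decomposition of \(tr(S^{-1}\varLambda )\)) can be checked numerically. The following sketch uses random positive-definite stand-ins for \(S\) and \(\varLambda \); it is an illustration, not part of the sampler itself.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 5
A = rng.normal(size=(q, q))
S = A @ A.T + q * np.eye(q)        # stand-in for the sparse matrix S
B = rng.normal(size=(q, q))
Lam = B @ B.T                      # stand-in for Lambda = (y - Mf)(y - Mf)^T

j = q - 1                          # take j = p (the last index)
idx = np.arange(q - 1)
S11 = S[np.ix_(idx, idx)]          # S_{-j,-j}
s12 = S[idx, j]                    # S_{-j,j}
c = s12 @ np.linalg.solve(S11, s12)

# |S| = det(S_{-j,-j}) * (S_jj - c)
lhs_det = np.linalg.det(S)
rhs_det = np.linalg.det(S11) * (S[j, j] - c)

# tr(S^{-1} Lambda) = tr(block term) + d * (S_jj - c)^{-1}
u = np.append(np.linalg.solve(S11, s12), -1.0)
d = u @ Lam @ u
block = np.zeros((q, q))
block[np.ix_(idx, idx)] = np.linalg.inv(S11)
lhs_tr = np.trace(np.linalg.solve(S, Lam))
rhs_tr = np.trace(block @ Lam) + d / (S[j, j] - c)
```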

The conditional densities for the off-diagonal elements in S

The conditional density of the off-diagonal element of S, \(S_{jj^{\prime }}\), is

$$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto&\, |S|^{-\frac{n}{2}} e^{-tr(S^{-1}\varLambda )/2} \Big [ (1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+ \}. \end{aligned}$$

Without loss of generality, suppose that \(j=p-1\) and \(j^{\prime }=p\). Let \(S = \left[ \begin{array}{cc} S_{-(jj^{\prime })} &{} S_{-(jj^{\prime }),jj^{\prime }} \\ S_{jj^{\prime },-(jj^{\prime })} &{} S_{jj^{\prime },jj^{\prime }} \end{array} \right] \), where \(S_{jj^{\prime },jj^{\prime }}=\left[ \begin{array}{cc} S_{jj} &{} S_{jj^{\prime }} \\ S_{jj^{\prime }} &{} S_{j^{\prime }j^{\prime }} \end{array} \right] \).

By standard properties of partitioned matrices, we have

$$\begin{aligned} |S|&\propto \det (S_{jj^{\prime },jj^{\prime }}-B), \end{aligned}$$
$$\begin{aligned} S^{-1} =&\left[ \begin{array}{cc} S_{-(jj^{\prime })}^{-1} &{} 0 \\ 0 &{} 0 \end{array} \right] + \left[ \begin{array}{c} S_{-(jj^{\prime })}^{-1}S_{-(jj^{\prime }),jj^{\prime }} \\ -I_2 \end{array} \right] \\&\times (S_{jj^{\prime },jj^{\prime }}-B)^{-1} \left[ \begin{array} {cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] , \end{aligned}$$
$$\begin{aligned} tr(S^{-1}\varLambda ) =&\,tr\left( \left[ \begin{array}{cc} S_{-(jj^{\prime })}^{-1} &{} 0 \\ 0 &{} 0 \end{array} \right] \varLambda \right) \\&+ tr( (S_{jj^{\prime },jj^{\prime }}-B)^{-1}D), \end{aligned}$$

where

$$\begin{aligned} B=&S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime }),jj^{\prime }}, \\ D =&\left[ \begin{array} {cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] \varLambda \left[ \begin{array}{c} S_{-(jj^{\prime })}^{-1}S_{-(jj^{\prime }),jj^{\prime }} \\ -I_2 \end{array} \right] . \end{aligned}$$

Hence, we have

$$\begin{aligned} p(S_{jj^{\prime }}|\cdot )&\propto |S_{jj^{\prime },jj^{\prime }}-B|^{-\frac{n}{2}} e^{-tr((S_{jj^{\prime },jj^{\prime }}-B)^{-1}D)/2 } \\&\quad \times p(S_{jj^{\prime }}|\lambda ,\rho _{jj^{\prime }}) \mathcal {I}\{S \in \mathcal {S}^+\},\\&\propto \left\{ 1-\frac{(S_{jj^{\prime }}-B_{12})^2}{(S_{jj}-B_{11})(S_{j^{\prime }j^{\prime }}-B_{22})}\right\} ^{-\frac{n}{2}} \\&\quad \times \exp \Big [ -\frac{1}{2} \Big \{ (S_{j^{\prime }j^{\prime }}-B_{22})D_{11}+(S_{jj}-B_{11})D_{22} -2(S_{jj^{\prime }}-B_{12})D_{12} \Big \} \Big / \Big \{ (S_{jj}-B_{11})(S_{j^{\prime }j^{\prime }}-B_{22}) -(S_{jj^{\prime }}-B_{12})^2 \Big \} \Big ] \\&\quad \times \Big [ (1-\rho _{jj^{\prime }}) \mathcal {I} \{S_{jj^{\prime }}=0\} + \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\quad \times \mathcal {I} \left\{ (S_{jj^{\prime }}-B_{12})^2<(S_{jj}-B_{11})(S_{j^{\prime }j^{\prime }}-B_{22}) \right\} . \end{aligned}$$

Appendix 2: Posterior inference for the Bayesian covariance decomposition model

In this section, we present the full conditional posterior distributions for the proposed Bayesian covariance decomposition model under the assumption of a sparse covariance graphical model for the residuals, together with a framework for carrying out the Markov chain Monte Carlo (MCMC) computations. The full conditional distributions of most parameters are available in closed form, allowing for a straightforward Gibbs sampling algorithm; the exception is the off-diagonal elements of the sparse matrix \(S\), for which we employ an independent Metropolis–Hastings (MH) sampler within the Gibbs sampler.

  1.

    Sampling the factor loading matrix \(M\): Let \(\mathbf {M}_k=(m_{1k},\ldots ,m_{qk})^T\) be the \(k\)th column vector of \(M\), and \(M_{(-k)}\) be the matrix of \(M\) excluding the \(k\)th column. The full conditional distribution of \(\mathbf {M}_k\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned} \mathbf {M}_k|y,M_{(-k)},f,S&\sim \mathcal {N}_q(\varvec{\mu }^M_k,\varSigma ^M_k), \end{aligned}$$

    where

    $$\begin{aligned} \varSigma ^M_k =&\,\Big \{ S^{-1}\left( \sum ^n_{i=1}f^2_{ki} \right) + qI_q \Big \}^{-1}, \\ \varvec{\mu }^M_k =&\, \varSigma ^M_k S^{-1}(y-M_{ (-k)}f_{(-k)\cdot })\mathbf {f}_{k\cdot }^T. \end{aligned}$$

    Hence, we can draw samples of each column of \(M\) from a multivariate Gaussian distribution.
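A minimal sketch of this update follows, assuming \(y\) is stored as a \(q \times n\) matrix, \(M\) as \(q \times r\), and \(f\) as \(r \times n\); the helper name `sample_M_column` and the precomputed `S_inv` are illustrative conventions, not part of the paper.

```python
import numpy as np

def sample_M_column(k, y, M, f, S_inv, rng):
    """Draw the k-th column of the loading matrix M from its full
    conditional N(mu_k, Sigma_k), as in step 1.  Assumed shapes:
    y is q x n, M is q x r, f is r x n; S_inv = S^{-1} is precomputed."""
    q = y.shape[0]
    f_k = f[k, :]                                    # k-th factor scores
    resid = y - np.delete(M, k, axis=1) @ np.delete(f, k, axis=0)
    Sigma_k = np.linalg.inv(S_inv * np.sum(f_k ** 2) + q * np.eye(q))
    mu_k = Sigma_k @ (S_inv @ resid @ f_k)
    return rng.multivariate_normal(mu_k, Sigma_k)

rng = np.random.default_rng(1)
q, r, n = 4, 2, 30
y = rng.normal(size=(q, n))
M = rng.normal(size=(q, r))
f = rng.normal(size=(r, n))
new_col = sample_M_column(0, y, M, f, np.eye(q), rng)
```

In a full sampler this draw would be repeated for \(k=1,\ldots ,r\) within each Gibbs sweep.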

  2.

    Sampling the random factors \(f\): Let \(f\) be the \(r \times n\) matrix with \(f_{ki}\) the value of the \(k\)th factor in the \(i\)th replicate. Then \(\mathbf {f}_{k\cdot } = (f_{k1},\ldots ,f_{kn})\) is the \(k\)th row vector of \(f\), and \(f_{(-k)\cdot }\) denotes the matrix \(f\) excluding the \(k\)th row. Note that \(\mathbf {f}_{k\cdot }\) can be viewed as the unobserved values of the random factor \(k\) in the \(n\) replicates. The full conditional distribution of the transpose of \(\mathbf {f}_{k\cdot }\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned}&\mathbf {f}^T_{k\cdot } | y,M, f_{(-k)\cdot }, S,z_k,\tau ^2_k \sim (1-z_k)\mathcal {N}_n(0,\tau _k^2 I_n)\\&\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + z_k \mathcal {N}_n(\varvec{\mu }^f_k, \sigma ^f_k I_n), \end{aligned}$$

    where

    $$\begin{aligned} \sigma ^f_k =&\,\Big ( M^T_k S^{-1} M_k +\tau ^{-2}_k \Big ) ^{-1}, \\ \varvec{\mu }^{f}_k =&\,\sigma ^{f}_k \Big \{ M^T_kS^{-1}(y-M_{(-k)}f_{(-k)\cdot }) \Big \}^T. \end{aligned}$$

    Hence we can draw samples of each row of \(f\) from a mixture of two multivariate Gaussian distributions.
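A sketch of this step, under the same assumed shapes as before (conditioning on \(z_k\) selects the prior or the updated Gaussian component); the helper name is hypothetical.

```python
import numpy as np

def sample_f_row(k, y, M, f, S_inv, z_k, tau2_k, rng):
    """Draw the k-th row of the factor-score matrix f (step 2).
    Given z_k = 0 the row is drawn from its prior; otherwise each
    entry is Gaussian with common variance sigma_k."""
    n = f.shape[1]
    if z_k == 0:
        return rng.normal(0.0, np.sqrt(tau2_k), size=n)
    M_k = M[:, k]
    resid = y - np.delete(M, k, axis=1) @ np.delete(f, k, axis=0)
    sigma_k = 1.0 / (M_k @ S_inv @ M_k + 1.0 / tau2_k)
    mu_k = sigma_k * (M_k @ S_inv @ resid)      # length-n mean vector
    return rng.normal(mu_k, np.sqrt(sigma_k))   # elementwise draws

rng = np.random.default_rng(2)
q, r, n = 4, 2, 25
y = rng.normal(size=(q, n))
M = rng.normal(size=(q, r))
f = rng.normal(size=(r, n))
row = sample_f_row(0, y, M, f, np.eye(q), z_k=1, tau2_k=1.0, rng=rng)
```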

  3.

    Sampling the binary diagonal matrix \(Z\): The full conditional of each diagonal element of \(Z\), \(z_k\), is a Bernoulli distribution:

    $$\begin{aligned} z_k | y,M,f,S,\tau ^2_k,p_k \sim&Bernoulli(p^*_k), \end{aligned}$$

    where \(p^*_k = 1- (1-p_k)/\Big \{ 1-p_k+p_k\cdot \Big ( \frac{\sigma ^f_k}{\tau ^2_k} \Big )^{\frac{n}{2}} \times \exp \Big ( - \frac{(\varvec{\mu }^f_k)^T \varvec{\mu }^f_k}{2\sigma ^f_k} \Big ) \Big \}\).
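For large \(n\) the factor \((\sigma ^f_k/\tau ^2_k)^{n/2}\exp (\cdot )\) can underflow in floating point, so it is safer to evaluate \(p^*_k\) in log space. A sketch of this computation as printed above, assuming \(p_k \in (0,1)\) (the helper name is illustrative):

```python
import numpy as np

def sample_z(mu_f, sigma_f, tau2, p_k, n, rng):
    """Draw z_k (step 3).  The term t = (sigma_f/tau2)^{n/2}
    * exp(-mu_f'mu_f / (2 sigma_f)) is handled in log space so that
    large n does not underflow; p_k is assumed to lie in (0, 1)."""
    log_t = 0.5 * n * (np.log(sigma_f) - np.log(tau2)) \
        - (mu_f @ mu_f) / (2.0 * sigma_f)
    # p*_k = p_k * t / ((1 - p_k) + p_k * t), computed stably
    log_num = np.log(p_k) + log_t
    p_star = np.exp(log_num - np.logaddexp(np.log1p(-p_k), log_num))
    return rng.binomial(1, p_star), p_star

rng = np.random.default_rng(3)
mu_f = rng.normal(size=50)
z, p_star = sample_z(mu_f, sigma_f=0.5, tau2=1.0, p_k=0.5, n=50, rng=rng)
```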

  4.

    Sampling the probabilities \(p_k\) and \(\pi \): The full conditional of \(p_k\) for \(k=1,\ldots ,r\) is

    $$\begin{aligned} p_k | z_k=1&\sim Beta ( a_p+1, b_p ), \\ p_k | z_k=0&\sim (1-\pi ^*) \mathcal {I}\{p_k=0\} + \pi ^*Beta(a_p, b_p+1), \end{aligned}$$

    where \(\pi ^* = \frac{\pi b_p}{a_p+b_p-\pi a_p}\). The full conditional of \(\pi \) is

    $$\begin{aligned} Beta \left( a_{\pi }+\sum _k \mathcal {I}\{p_k=0\}, b_{\pi }+ \sum _k \mathcal {I}\{p_k \ne 0 \} \right) . \end{aligned}$$
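The point-mass/Beta mixture for \(p_k\) can be sampled by first flipping a coin with probability \(\pi ^*\) and then drawing from the appropriate component. A sketch (helper name illustrative):

```python
import numpy as np

def sample_pk(z_k, pi, a_p, b_p, rng):
    """Draw p_k (step 4): a Beta(a_p + 1, b_p) when z_k = 1, and a
    point-mass/Beta mixture with slab weight pi* when z_k = 0."""
    if z_k == 1:
        return rng.beta(a_p + 1, b_p)
    pi_star = pi * b_p / (a_p + b_p - pi * a_p)
    if rng.uniform() < pi_star:
        return rng.beta(a_p, b_p + 1)    # slab component
    return 0.0                           # spike at zero

rng = np.random.default_rng(9)
draws = [sample_pk(0, pi=0.5, a_p=1.0, b_p=1.0, rng=rng) for _ in range(200)]
```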
  5.

    Sampling the positive diagonal matrix \(D_\tau \): The full conditional of each diagonal element of \(D_\tau \), \(\tau ^2_k\), for \(k=1,\ldots ,r\) is

    $$\begin{aligned} (1-z_k) \mathrm{IG}(a_\tau ,b_\tau ) + z_k \mathrm{IG} \left( a_\tau +\frac{n}{2}, b_\tau +\frac{\mathbf {f}_{k \cdot } \mathbf {f}^T_ {k\cdot }}{2} \right) . \end{aligned}$$
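Conditional on \(z_k\), this is a single inverse-gamma draw; an IG\((a,b)\) variate can be obtained as the reciprocal of a Gamma\((a,\text{rate}=b)\) variate. A sketch (helper name illustrative):

```python
import numpy as np

def sample_tau2(z_k, f_k, a_tau, b_tau, rng):
    """Draw tau^2_k (step 5): the inverse-gamma parameters are updated
    by the factor scores only when the factor is active (z_k = 1)."""
    n = f_k.shape[0]
    if z_k == 1:
        shape, rate = a_tau + n / 2.0, b_tau + 0.5 * (f_k @ f_k)
    else:
        shape, rate = a_tau, b_tau          # prior retained when z_k = 0
    return 1.0 / rng.gamma(shape, 1.0 / rate)   # IG draw via a Gamma draw

rng = np.random.default_rng(8)
f_k = rng.normal(size=30)
tau2 = sample_tau2(1, f_k, a_tau=2.0, b_tau=1.0, rng=rng)
```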
  6.

    Sampling the diagonal elements \(S_{jj}\): The full posterior conditional density of the diagonal element \(S_{jj}\), for \(j=1,\ldots ,q\), is

    $$\begin{aligned} p(S_{jj} | \cdot ) \propto \, |S|^{-\frac{n}{2}} e^ {-\frac{tr(\varLambda S^{-1})}{2}-\frac{\lambda S_{jj} }{2} } \mathcal {I} \{S \in \mathcal {S}^+ \}, \end{aligned}$$

    where \(\varLambda =(y-Mf)(y-Mf)^T\). Without loss of generality, suppose that \(j=q\). By the properties of the matrix inverse and the matrix determinant, we have

    $$\begin{aligned} p(S_{jj} | \cdot ) \propto \, (S_{jj}-c)^{-\frac{n}{2}} e^{-\frac{d}{2}(S_{jj}-c)^{-1} -\frac{\lambda }{2} S_{jj} } \mathcal {I} \{S_{jj}>c\}, \end{aligned}$$

    where \(c = S_{j,-j} S^{-1}_{-j,-j} S_{-j,j}\) and \(d = [\begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} ] \cdot \varLambda \cdot [\begin{array}{cc} S_{j,-j}S^{-1}_{-j,-j}&-1 \end{array} ]^T\). Note that the indicator function \(\mathcal {I} \{S_{jj}>c\}\) ensures the positive definiteness of \(S\) conditional on the other elements of \(S\). The derivation of the full conditional distributions for the elements of \(S\) is detailed in Appendix 1. The above density is available in closed form after a change of variables: transforming \(S_{jj}\) to \(\nu =S_{jj}-c\), the conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot ) \propto \, \nu ^{-\frac{n}{2}} \exp \{-(d/\nu +\lambda \nu )/2\} \mathcal {I} \{\nu >0\}, \end{aligned}$$

    which is a generalized inverse Gaussian (GIG) distribution with parameters \((1-n/2,d,\lambda )\). Therefore, at each MCMC iteration, we can draw a sample of \(\nu \) from the GIG distribution, and obtain \(S_{jj}=\nu +c\).
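This GIG draw can be performed with SciPy's `geninvgauss`, whose standardized density is proportional to \(x^{p-1}e^{-b(x+1/x)/2}\); setting the scale to \(\sqrt{d/\lambda }\) and \(b=\sqrt{\lambda d}\) recovers the density \(\nu ^{-n/2}e^{-(d/\nu +\lambda \nu )/2}\) above. A sketch with made-up values of \(d\), \(\lambda \), and \(c\):

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_Sjj(n, d, lam, c, rng):
    """Draw S_jj (step 6): nu = S_jj - c follows GIG(1 - n/2, d, lam),
    i.e. density proportional to nu^{-n/2} exp(-(d/nu + lam*nu)/2)
    on nu > 0, sampled via scipy's standardized geninvgauss."""
    p = 1.0 - n / 2.0
    nu = geninvgauss.rvs(p, np.sqrt(lam * d),
                         scale=np.sqrt(d / lam), random_state=rng)
    return nu + c

rng = np.random.default_rng(4)
s_jj = sample_Sjj(n=40, d=2.0, lam=1.5, c=0.3, rng=rng)
```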

  7.

    Sampling the off-diagonal elements \(S_{jj^{\prime }}\): The full posterior conditional density of the off-diagonal element \(S_{jj^{\prime }}\), \(j<j^{\prime }\), is

    $$\begin{aligned} p(S_{jj^{\prime }} | \cdot ) \propto \,&|S|^{-\frac{n}{2}} e^{-tr(\varLambda S^{-1})/2 } \Big [(1-\rho _{jj^{\prime }}) \mathcal {I}\{S_{jj^{\prime }}=0\} \\&+ \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \mathcal {I} \{S \in \mathcal {S}^+\} . \end{aligned}$$

    Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). By the properties of the matrix inverse and the matrix determinant, we have

    $$\begin{aligned} p(S_{jj^{\prime }}|\cdot ) \propto \,&(1\!-\!\frac{\nu ^2}{ab})^{\!-\!\frac{n}{2}} \exp \left\{ \!-\!\frac{bD_{11}\!+\!aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \right\} \\&\times \Big [ (1-\rho _{jj^{\prime }})\mathcal {I} \{S_{jj^{\prime }}=0\} + \rho _{jj^{\prime }} \frac{\lambda }{2} e^{-\lambda |S_{jj^{\prime }}|} \Big ] \\&\times \mathcal {I} \{\nu ^2<ab\} \end{aligned}$$

    where

    $$\begin{aligned} \nu =&S_{jj^{\prime }}-B_{12}, \\ a=&S_{jj}-B_{11}, \\ b=&S_{j^{\prime }j^{\prime }}-B_{22}, \\ B=&S_{jj^{\prime },-(jj^{\prime })}S^{-1}_{-(jj^{\prime })}S_{-(jj^{\prime }),jj^{\prime }}, \\ D =&\left[ \begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] \varLambda \left[ \begin{array}{cc} S_{jj^{\prime },-(jj^{\prime })}S_{-(jj^{\prime })}^{-1}&-I_2 \end{array} \right] ^T. \end{aligned}$$

    Applying the change of variables \(\nu = S_{jj^{\prime }}-B_{12}\), the conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot ) \propto&\left( 1-\rho _{jj^{\prime }}\right) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab\} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g(\nu ), \end{aligned}$$

    where

    $$\begin{aligned} g(\nu ) =&\, (1-\frac{\nu ^2}{ab})^{-\frac{n}{2}} \exp \Big \{ -\frac{bD_{11}+aD_{22}-2D_{12}\nu }{2(ab-\nu ^2)} \\&-\lambda |\nu +B_{12}| \Big \} \mathcal {I} \{\nu ^2<ab\}. \end{aligned}$$

    Note that the positive definite constraint on \(S\) is ensured by the indicator \(\mathcal {I} \{\nu ^2<ab\}\); that is, the posterior distribution of \(\nu \) is supported only on \((-\sqrt{ab},\sqrt{ab})\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Furthermore, \(g(\nu )\) is nonconcave, so a random-walk MH step within the Gibbs sampler may become trapped locally. Since \(g(\nu )\) has positive density only over \((-\sqrt{ab},\sqrt{ab})\) and is zero elsewhere, we construct a piecewise uniform proposal distribution that approximates \(g(\nu )\). We choose \(\kappa -1\) equally spaced interior grid points, \(-\sqrt{ab}=\nu _0<\nu _1<\cdots <\nu _{\kappa }=\sqrt{ab}\), which divide the domain of \(\nu \) into \(\kappa \) intervals of width \(2\sqrt{ab}/\kappa \). The piecewise uniform distribution is as follows:

    $$\begin{aligned} g_a(\nu ) = \left\{ \begin{array}{ll} g\left( \frac{\nu _0 + \nu _1}{2}\right) &{} \quad \text{ if } \, \nu _0 < \nu \le \nu _1 \\ g\left( \frac{\nu _1 + \nu _2}{2}\right) &{} \quad \text{ if } \, \nu _1 < \nu \le \nu _2 \\ \cdots &{} \\ g\left( \frac{\nu _{\kappa -1} + \nu _\kappa }{2}\right) &{} \quad \text{ if } \nu _{\kappa -1} < \nu < \nu _\kappa . \end{array} \right. \end{aligned}$$

    The independent MH proposal distribution for generating \(\nu =S_{jj^{\prime }}-B_{12}\) is given by

    $$\begin{aligned} q(\nu |\cdot ) \propto&\, (1-\rho _{jj^{\prime }}) g(-B_{12}) \mathcal {I} \{\nu =-B_{12}, \nu ^2<ab \} \\&+ \frac{\rho _{jj^{\prime }} \lambda }{2} g_a(\nu ). \end{aligned}$$

    Samples of \(\nu \) can be generated by using an inverse-cumulative distribution function (CDF) method from the proposal distribution, and the proposal \(\nu ^*\) is accepted with the probability

    $$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$

    where \(\nu ^c = S^c_{jj^{\prime }}-B_{12}\) denotes the current state of \(\nu \). The piecewise uniform proposal distribution avoids the local-trap problem and can be sampled easily using an inverse-CDF method. Furthermore, \(q(\nu | \cdot )\) approximates \(p(\nu |\cdot )\) more accurately as the number of grid points increases. Based on our simulations, \(100\) grid points are enough for fast convergence of \(S_{jj^{\prime }}\).
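The independent MH step above can be sketched as follows. This is a simplified illustration handling only the continuous component \(g(\nu )\) (it omits the point mass at \(S_{jj^{\prime }}=0\)), and the scalars \(n, a, b, \lambda , B_{12}, D_{11}, D_{22}, D_{12}\) are made-up stand-ins for quantities computed from the current state of the chain.

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up stand-ins for quantities computed from the current chain state
n, a, b, lam, B12 = 30, 2.0, 1.5, 1.0, 0.2
D11, D22, D12 = 4.0, 3.0, 0.5
bound = np.sqrt(a * b)                 # support of nu is (-bound, bound)

def g(nu):
    """Continuous part of the conditional of nu = S_jj' - B12 (step 7)."""
    s = a * b - nu ** 2
    return (s / (a * b)) ** (-n / 2) * np.exp(
        -(b * D11 + a * D22 - 2.0 * D12 * nu) / (2.0 * s)
        - lam * np.abs(nu + B12))

K = 100                                # number of grid cells
edges = np.linspace(-bound, bound, K + 1)
heights = g(0.5 * (edges[:-1] + edges[1:]))   # piecewise-uniform g_a

def propose():
    """Inverse-CDF draw: pick a cell, then a uniform point within it."""
    cell = rng.choice(K, p=heights / heights.sum())
    return rng.uniform(edges[cell], edges[cell + 1])

def q_density(nu):                     # proposal height (unnormalized)
    return heights[min(np.searchsorted(edges, nu) - 1, K - 1)]

nu_c = 0.0                             # current state of nu
nu_star = propose()
alpha = min(1.0, (g(nu_star) / q_density(nu_star))
            / (g(nu_c) / q_density(nu_c)))
nu_new = nu_star if rng.uniform() < alpha else nu_c
```

Because the proposal is independent of the current state and closely tracks \(g\), the acceptance rate stays high even when \(g\) is multimodal.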

  8.

    Sampling the shrinkage parameter \(\lambda \): The full conditional of \(\lambda \) is

    $$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|S_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q S_{jj}\right) , \end{aligned}$$

    where \(m\) equals the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, S_{jj^{\prime }} \ne 0 \}\).
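This is a standard conjugate Gamma draw; a sketch (helper name illustrative; note that NumPy's `gamma` takes a scale, the reciprocal of the rate):

```python
import numpy as np

def sample_lambda(S, a_lam, b_lam, rng):
    """Gibbs draw of the shrinkage parameter lambda (step 8) given the
    current sparse matrix S (symmetric, q x q)."""
    q = S.shape[0]
    off = S[np.triu_indices(q, k=1)]
    m = np.count_nonzero(off) + np.count_nonzero(np.diag(S))
    shape = a_lam + m
    rate = b_lam + np.sum(np.abs(off)) + 0.5 * np.sum(np.diag(S))
    return rng.gamma(shape, 1.0 / rate)   # numpy parameterizes by scale

rng = np.random.default_rng(6)
S = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.2, 0.0],
              [0.0, 0.0, 0.8]])
lam = sample_lambda(S, a_lam=1.0, b_lam=1.0, rng=rng)
```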

  9.

    Sampling the selection parameters \(\rho _{jj^{\prime }}\): The full conditional of \(\rho _{jj^{\prime }}\) for \(j<j^{\prime }\) is

    $$\begin{aligned} \mathrm{Beta}(a_\rho +\mathcal {I} \{S_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ S_{jj^{\prime }}=0 \}). \end{aligned}$$

Appendix 3: Posterior inference for the latent-factor conditional graphical model

We derive the full conditionals for all the parameters and perform posterior inference using a Gibbs sampling algorithm. Note that the prior specification for the parameters \(\{M, z_k, \tau _k\}\) in the graphical factor analysis model parallels that of the hierarchical model in formula (9), so the full conditionals of these parameters are similar to those in Appendix 2. In this section, we present the sampling algorithm for the parameter set \(\{C,\lambda ^C,\varvec{\rho }^C\}\) of the graphical model of the residuals.

Sampling \(\{C,\lambda ^C,\rho ^C\}\) for the graphical model of the residuals

For convenience, let \(\varLambda =(y-Mf)(y-Mf)^T\).

  1.

    Sampling the diagonal elements of \(C\), \(C_{jj}\), for \(j=1,\ldots ,q\): The full conditional density of \(C_{jj}\) is

    $$\begin{aligned} p(C_{jj} | \cdot ) \propto&\,|C|^{\frac{n}{2}} e^{ -(\varLambda _{jj}C_{jj} + \lambda ^C C_{jj})/2 } \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$

    Without loss of generality, suppose that \(j=q\). Let \(C=R^{\prime }R\) be the Cholesky decomposition of \(C\) where the matrix \(R=(R_{jj^{\prime }})\) is upper triangular. Then

    $$\begin{aligned} p(C_{jj} | \cdot ) \propto&\, (C_{jj}-c)^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) C_{jj} } \mathcal {I} \{C_{jj}>c \}, \end{aligned}$$

    where \(c=\sum ^{q-1}_{l=1} R_{l,q}^2\) does not depend on \(C_{jj}\), and the indicator function \(\mathcal {I} \{C_{jj}>c \}\) ensures that \(C\) is positive definite. Letting \(\nu =C_{jj}-c\), the conditional distribution of \(\nu \) is

    $$\begin{aligned} p(\nu | \cdot ) \propto \,&\,\nu ^{\frac{n}{2}} e^{ -\left( \frac{\varLambda _{jj}+\lambda ^C}{2}\right) \nu } \mathcal {I} \{\nu >0 \}, \end{aligned}$$

    which follows \(\mathrm{Gamma}(\frac{n}{2}+1,\frac{\varLambda _{jj}+\lambda ^C}{2})\). Hence, we can draw samples of \(\nu \) from the Gamma distribution first, and obtain \(C_{jj}=\nu +c\).
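A sketch of this draw for the last diagonal element, assuming the same upper-triangular Cholesky convention \(C=R^{\prime }R\) as in the text (NumPy's `cholesky` returns the lower-triangular factor, so we transpose); the helper name and made-up inputs are illustrative.

```python
import numpy as np

def sample_Cjj(C, Lam_jj, lam_C, n, rng):
    """Draw the last diagonal element of the precision matrix C
    (Appendix 3, step 1): nu = C_qq - c is Gamma(n/2 + 1, rate),
    with c computed from the Cholesky factor of C."""
    q = C.shape[0]
    R = np.linalg.cholesky(C).T            # upper-triangular, C = R'R
    c = np.sum(R[:q - 1, q - 1] ** 2)      # c = sum_l R_{l,q}^2
    rate = (Lam_jj + lam_C) / 2.0
    nu = rng.gamma(n / 2.0 + 1.0, 1.0 / rate)
    return nu + c

rng = np.random.default_rng(7)
C = np.array([[2.0, 0.4],
              [0.4, 1.5]])
c_new = sample_Cjj(C, Lam_jj=3.0, lam_C=1.0, n=20, rng=rng)
```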

  2.

    Sampling the off-diagonal elements of \(C\), \(C_{jj^{\prime }}\), for \(j<j^{\prime }\): The complete conditional density of \(C_{jj^{\prime }}\) is

    $$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&|C|^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }} } p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \\&\times \mathcal {I} \{C \in \mathcal {C}^+ \}. \end{aligned}$$

    Without loss of generality, suppose that \(j=q-1\) and \(j^{\prime }=q\). Then using Lemma 2 of Wong et al. (2003),

    $$\begin{aligned} p(C_{jj^{\prime }} | \cdot ) \propto \,&\Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{ -\varLambda _{jj^{\prime }}C_{jj^{\prime }}} \\&\times p(C_{jj^{\prime }} |\rho ^C_{jj^{\prime }},\lambda ^C) \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \} \\ \propto&\Big [ (1-\rho ^C_{jj^{\prime }}) \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} \mathcal {I} \{C_{jj^{\prime }}=0 \} \\&+ \rho ^C_{jj^{\prime }} \Big \{ 1-\frac{(C_{jj^{\prime }}-a)^2}{cb^2} \Big \}^{\frac{n}{2}} e^{-\varLambda _{jj^{\prime }}C_{jj^{\prime }}-\lambda ^C |C_{jj^{\prime }}| }\\&\times \mathcal {I} \{C_{jj^{\prime }} \ne 0 \} \Big ] \cdot \mathcal {I} \{|C_{jj^{\prime }}-a|<b\sqrt{c} \}, \end{aligned}$$

    where \(a=\sum ^{q-2}_{l=1} R_{l,q-1}R_{l,q}\), \(b=R_{q-1,q-1}\), and \(c=R_{q-1,q}^2+R_{q,q}^2\) do not depend on \(C_{jj^{\prime }}\). Transform \(C_{jj^{\prime }}\) to \(\nu = (C_{jj^{\prime }}-a)/(b\sqrt{c})\), and let \(\kappa =-a/(b\sqrt{c})\). The full conditional density of \(\nu \) is

    $$\begin{aligned} p(\nu |\cdot )&\propto \left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&\quad \quad \,\, + \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g(\nu ), \end{aligned}$$

    where \(g(\nu )=(1-\nu ^2)^{n/2} \exp \{ -\varLambda _{jj^{\prime }}(\nu b\sqrt{c} +a) -\lambda ^C|\nu b\sqrt{c} +a| \} \mathcal {I} \{|\nu |<1\}\). The positive definite constraint on \(C\) is ensured by the indicator function \( \mathcal {I} \{|\nu |<1\}\). The continuous part of the conditional distribution of \(\nu \), \(g(\nu )\), cannot be sampled directly. Since \(g(\nu )\) has positive density only over \((-1,1)\) and is zero elsewhere, we can use an independent MH algorithm analogous to the one used for \(S_{jj^{\prime }}\) in Appendix 2. Briefly, we choose \(K-1\) equally spaced interior grid points in \((-1,1)\), \(-1=\nu _0<\nu _1<\cdots <\nu _{K}=1\) (here \(K\) denotes the number of intervals, each of width \(2/K\), and is distinct from \(\kappa \) above), and construct a piecewise uniform distribution \(g_a(\nu )\) approximating \(g(\nu )\). The independent MH proposal for generating \(\nu \) is then given by

    $$\begin{aligned} q(\nu |\cdot ) \propto&\left( 1-\rho ^C_{jj^{\prime }}\right) (1-\kappa ^2)^{\frac{n}{2}} \mathcal {I} \{\nu =\kappa ,|\nu | < 1\} \\&+ \frac{\rho ^C_{jj^{\prime }} \lambda ^C}{2} g_a(\nu ). \end{aligned}$$

    Samples of \(\nu \) can be generated using an inverse-CDF method from the proposal distribution, and the proposal \(\nu ^*\) is accepted with the probability

    $$\begin{aligned} \alpha = \min \left\{ 1, \frac{p(\nu ^*)}{q(\nu ^*)} \Big / \frac{p(\nu ^c)}{q(\nu ^c)} \right\} , \end{aligned}$$

    where \(\nu ^c = (C^c_{jj^{\prime }}-a)/(b\sqrt{c})\) denotes the current state of \(\nu \). Samples of \(C_{jj^{\prime }}\) are obtained as \(C_{jj^{\prime }}=\nu b\sqrt{c} +a\).

  3.

    Sampling the shrinkage parameter \(\lambda ^C\): The full conditional of \(\lambda ^C\) is

    $$\begin{aligned} \mathrm{Gamma}\left( a_\lambda +m, b_\lambda +\sum _{j<j^{\prime }}|C_{jj^{\prime }}|+\frac{1}{2}\sum _{j=1}^q C_{jj}\right) \!, \end{aligned}$$

    where \(m\) is equal to the size of the set \(\{(j,j^{\prime }): j \le j^{\prime }, C_{jj^{\prime }} \ne 0 \}\).

  4.

    Sampling the selection parameters \(\rho ^C_{jj^{\prime }}\): The full conditional of \(\rho ^C_{jj^{\prime }}\) for \(j<j^{\prime }\) is

    $$\begin{aligned} \;\,\mathrm{Beta}(a_\rho +\mathcal {I} \{C_{jj^{\prime }} \ne 0\}, b_\rho +\mathcal {I} \{ C_{jj^{\prime }}=0 \}). \end{aligned}$$


About this article


Cite this article

Zhang, L., Sarkar, A. & Mallick, B.K. Bayesian sparse covariance decomposition with a graphical structure. Stat Comput 26, 493–510 (2016). https://doi.org/10.1007/s11222-014-9540-7


Keywords

  • Bayesian graphical lasso
  • Covariance estimation
  • Factor graphical model
  • Factor analysis
  • Low-rank-plus-sparse decomposition