Dynamic and robust Bayesian graphical models

Original Paper, published in Statistics and Computing

Abstract

Gaussian graphical models are widely used to study the conditional dependence among random variables. By encoding conditional dependence as an undirected graph, Gaussian graphical models provide interpretable representations and insightful visualizations of the relationships among variables. However, time series data present additional challenges: the graphs can evolve over time, with changes occurring at unknown time points, and the data often exhibit heavy tails. To address these challenges, we propose dynamic and robust Bayesian graphical models that employ hidden Markov models (HMMs) to introduce dynamics in the graph and heavy-tailed multivariate t-distributions for model robustness. The HMM latent states are linked both temporally and hierarchically for greater information sharing across time and between states. The proposed methods are computationally efficient and achieve excellent graph estimation on simulated data, with substantial improvements over non-robust graphical models. We demonstrate the proposed approach on human hand gesture tracking data, where it discovers edges and dynamics with clear practical interpretations.


References

  • Cheok, M.J., Omar, Z., Jaward, M.H.: A review of hand gesture and sign language recognition techniques. Int. J. Mach. Learn. Cybern. 10(1), 131–153 (2019)

  • Cremaschi, A., Argiento, R., Shoemaker, K., Peterson, C., Vannucci, M.: Hierarchical normalized completely random measures for robust graphical modeling. Bayesian Anal. 14(4), 1271 (2019)

  • Danaher, P., Wang, P., Witten, D.M.: The joint graphical lasso for inverse covariance estimation across multiple classes. J. R. Stat. Soc. Ser. B Stat. Methodol. 76(2), 373 (2014)

  • Fan, J., Feng, Y., Wu, Y.: Network exploration via the adaptive Lasso and SCAD penalties. Ann. Appl. Stat. 3(2), 521 (2009)

  • Finegold, M., Drton, M.: Robust graphical modeling of gene networks using classical and alternative \(t\)-distributions. Ann. Appl. Stat. 5(2A), 1057–1080 (2011)

  • Finegold, M., Drton, M., et al.: Robust Bayesian graphical modeling using Dirichlet \( t \)-distributions. Bayesian Anal. 9(3), 521–550 (2014)

  • Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)

  • Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis. Chapman and Hall/CRC, London (1995)

  • Geweke, J.: Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, vol. 196. Federal Reserve Bank of Minneapolis, Research Department, Minneapolis (1991)

  • Gibberd, A.J., Nelson, J.D.: Regularized estimation of piecewise constant Gaussian graphical models: the group-fused graphical lasso. J. Comput. Graph. Stat. 26(3), 623–634 (2017)

  • Gottard, A., Pacillo, S.: Robust concentration graph model selection. Comput. Stat. Data Anal. 54(12), 3070–3079 (2010)

  • Han, T.X., Ning, H., Huang, T.S.: Efficient nonparametric belief propagation with application to articulated body tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 1, pp. 214–221. IEEE (2006)

  • Ishwaran, H., James, L.F.: Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 96(453), 161–173 (2001)

  • Jasra, A., Holmes, C.C., Stephens, D.A.: Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat. Sci. 20(1), 50–67 (2005)

  • Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge (2004)

  • Kim, D., Song, J., Kim, D.: Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMMs. Pattern Recognit. 40(11), 3012–3026 (2007)

  • Kolar, M., Song, L., Ahmed, A., Xing, E.P., et al.: Estimating time-varying networks. Ann. Appl. Stat. 4(1), 94–123 (2010)

  • Kotz, S., Nadarajah, S.: Multivariate \(t\)-Distributions and Their Applications. Cambridge University Press, Cambridge (2004)

  • Lenkoski, A., Dobra, A.: Computational aspects related to inference in Gaussian graphical models with the G-Wishart prior. J. Comput. Graph. Stat. 20(1), 140–157 (2011)

  • Liu, X., Daniels, M.J.: A new algorithm for simulating a correlation matrix based on parameter expansion and reparameterization. J. Comput. Graph. Stat. 15(4), 897–914 (2006)

  • Liu, Y., Wichura, M.J., Drton, M.: Rejection sampling for an extended Gamma distribution. Submitted (2012)

  • Lun, R., Zhao, W.: A survey of applications and human motion recognition with microsoft kinect. Int. J. Pattern Recognit. Artif. Intell. 29(05), 1555008 (2015)

  • Madeo, R.C., Lima, C.A., Peres, S.M.: Gesture unit segmentation using support vector machines: segmenting gestures from rest positions. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 46–52 (2013)

  • Meinshausen, N., Bühlmann, P., et al.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34(3), 1436–1462 (2006)

  • Mitra, S., Acharya, T.: Gesture recognition: a survey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 37(3), 311–324 (2007)

  • Miyamura, M., Kano, Y.: Robust Gaussian graphical modeling. J. Multivar. Anal. 97(7), 1525–1550 (2006)

  • Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9(2), 249–265 (2000)

  • Ni, Y., Baladandayuthapani, V., Vannucci, M., Stingo, F.C.: Bayesian graphical models for modern biological applications. In: Statistical Methods and Applications, pp. 1–29 (2021)

  • Osborne, N., Peterson, C.B., Vannucci, M.: Latent network estimation and variable selection for compositional data via variational EM. J. Comput. Graph. Stat. 31(1), 163–175 (2022)

  • Peng, J., Wang, P., Zhou, N., Zhu, J.: Partial correlation estimation by joint sparse regression models. J. Am. Stat. Assoc. 104(486), 735–746 (2009)

  • Peterson, C.B., Osborne, N., Stingo, F.C., Bourgeat, P., Doecke, J.D., Vannucci, M.: Bayesian modeling of multiple structural connectivity networks during the progression of Alzheimer’s disease. Biometrics 76(4), 1120–1132 (2020)

  • Pinheiro, J.C., Liu, C., Wu, Y.N.: Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate \(t\) distribution. J. Comput. Graph. Stat. 10(2), 249–276 (2001)

  • Rodríguez, C.E., Walker, S.G.: Label switching in Bayesian mixture models: deterministic relabeling strategies. J. Comput. Graph. Stat. 23(1), 25–45 (2014)

  • Roverato, A.: Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scand. J. Stat. 29(3), 391–411 (2002)

  • Scott, S.L.: Bayesian methods for hidden Markov models: recursive computing in the 21st century. J. Am. Stat. Assoc. 97(457), 337–351 (2002)

  • Shaddox, E., Stingo, F.C., Peterson, C.B., Jacobson, S., Cruickshank-Quinn, C., Kechris, K., Bowler, R., Vannucci, M.: A Bayesian approach for learning gene networks underlying disease severity in COPD. Stat. Biosci. 10(1), 59–85 (2018)

  • Sigal, L., Isard, M., Haussecker, H., Black, M.J.: Loose-limbed people: estimating 3d human pose and motion using non-parametric belief propagation. Int. J. Comput. Vis. 98(1), 15–48 (2012)

  • Song, Y., Goncalves, L., Perona, P.: Unsupervised learning of human motion. IEEE Trans. Pattern Anal. Mach. Intell. 25(7), 814–827 (2003)

  • Sperrin, M., Jaki, T., Wit, E.: Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models. Stat. Comput. 20(3), 357–366 (2010)

  • Sun, H., Li, H.: Robust Gaussian graphical modeling via \(\ell _1\) penalization. Biometrics 68(4), 1197–1206 (2012)

  • Tan, L.S., Jasra, A., De Iorio, M., Ebbels, T.M.: Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks. Ann. Appl. Stat. 11(4), 2222–2251 (2017)

  • Vinciotti, V., Hashem, H.: Robust methods for inferring sparse network structures. Comput. Stat. Data Anal. 67, 84–94 (2013)

  • Wang, H.: Bayesian graphical lasso models and efficient posterior computation. Bayesian Anal. 7(4), 867–886 (2012)

  • Wang, H.: Scaling it up: stochastic search structure learning in graphical models. Bayesian Anal. 10(2), 351–377 (2015)

  • Warnick, R., Guindani, M., Erhardt, E., Allen, E., Calhoun, V., Vannucci, M.: A Bayesian approach for estimating dynamic functional network connectivity in fMRI data. J. Am. Stat. Assoc. 113(521), 134–151 (2018)

  • Xu, R., Wu, J., Yue, X., Li, Y.: Online structural change-point detection of high-dimensional streaming data via dynamic sparse subspace learning. In: Technometrics, pp. 1–14 (2022)

  • Yang, E., Lozano, A.C.: Robust Gaussian graphical modeling with the trimmed graphical lasso. In: Advances in Neural Information Processing Systems, pp. 2602–2610 (2015)

  • Yang, J., Peng, J.: Estimating time-varying graphical models. J. Comput. Graph. Stat. 29(1), 191–202 (2020)

  • Zhang, C., Yan, H., Lee, S., Shi, J.: Dynamic multivariate functional data modeling via sparse subspace learning. Technometrics 63(3), 370–383 (2021)

Author information

Correspondence to Marina Vannucci.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial competing interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 MCMC for the dynamic classical-t graphical model

  • Update \({\varvec{\tau }}\): For \(t=1,2,\ldots ,T\), sample \(\tau _t\) from

    $$\begin{aligned} \text {Gamma}\left( \frac{\nu +p}{2}, \frac{\nu + {\varvec{y}}_t' {\varvec{\Omega }}_{s_t} {\varvec{y}}_t }{2}\right) , \end{aligned}$$

    where the second parameter is the rate.

  • Update \({{\varvec{\Omega }}}\) and \({{\varvec{G}}}\): For each \(s \in \{1,...,S\}\), let \(n_s\) denote the number of time points in state s. Compute \({\varvec{\Sigma }}_s = \sum _{t: s_t=s} \tau _t {\varvec{y}}_t {\varvec{y}}_t'\) and perform the following updates:

    1.

      Following Peterson et al. (2020), for \(s \in \{1,\ldots ,S\}\), sample \({{\varvec{\Omega }}}_{s}\) one column and row at a time. Here we demonstrate how to sample the last row and last column in the last state. First, partition \({\varvec{\Omega }}_S\) into

      $$\begin{aligned} {\varvec{\Omega }}_S = \begin{pmatrix} {\varvec{\Omega }}_{S,11} & {\varvec{\omega }}_{S,12} \\ {\varvec{\omega }}_{S,21} & \omega _{S,pp} \end{pmatrix} \end{aligned}$$

      where \(\omega _{S,pp}\) is the (pp)th element of the matrix, \({\varvec{\omega }}_{S,12}\) is the last column except \(\omega _{S,pp}\) and \({\varvec{\omega }}_{S,21}\) is the last row except \(\omega _{S,pp}\). Similarly, partition \({\varvec{\Theta }}_{ij}\), \(i,j=1,\ldots ,p\), and \({\varvec{\Sigma }}_S\) into

      $$\begin{aligned} {\varvec{\Theta }}_{ij} = \begin{pmatrix} {\varvec{\Theta }}_{ij,11} & {\varvec{\theta }}_{ij,12} \\ {\varvec{\theta }}_{ij,21} & \theta _{ij,SS} \end{pmatrix} , \quad {\varvec{\Sigma }}_S = \begin{pmatrix} {\varvec{\Sigma }}_{S,11} & {\varvec{\sigma }}_{S,12} \\ {\varvec{\sigma }}_{S,21} & \sigma _{S,pp} \end{pmatrix} . \end{aligned}$$

      Next, perform a one-to-one transformation of the last column of \({\varvec{\Omega }}_S\) as \(({\varvec{a}} , b) = ({\varvec{\omega }}_{S,12}, \omega _{S,pp} - {\varvec{\omega }}_{S,21} {\varvec{\Omega }}_{S,11}^{-1} {\varvec{\omega }}_{S,12} )\). It can be shown that the conditional posterior of \(({\varvec{a}}, b)\) is

      $$\begin{aligned} p({\varvec{a}}, b \mid {\varvec{\Omega }}_{S,11}, {\varvec{\Omega }}_1,\ldots ,{\varvec{\Omega }}_{S-1}, {\varvec{G}}_1,\ldots ,{\varvec{G}}_S, {\varvec{\Phi }}, {\varvec{s}}, {\varvec{Y}}) = \text {Normal}({\varvec{Q}}^{-1} {\varvec{l}}, {\varvec{Q}}^{-1}) \, \text {Gamma}\left( \dfrac{n_S}{2}+1, \dfrac{\sigma _{S,pp}+\uplambda }{2}\right) \end{aligned}$$
      (14)

      where \({\varvec{l}} = {\varvec{D}}^{-1}{\varvec{m}} - {\varvec{\sigma }}_{S,12}\), \({\varvec{Q}} = {\varvec{D}}^{-1} + (\sigma _{S,pp}+\uplambda ) {\varvec{\Omega }}_{S,11}^{-1}\), and the second parameter of the gamma distribution is the rate. Here \({\varvec{D}}\) is the covariance matrix of \({\varvec{\omega }}_{S,12}\), whose off-diagonal elements are 0 and whose jth diagonal element is \(\theta _{jp,SS} - {\varvec{\theta }}_{jp,21} {\varvec{\Theta }}_{jp,11}^{-1} {\varvec{\theta }}_{jp,12}\), \(j=1,\ldots ,p-1\); \({\varvec{m}}\) is the mean of \({\varvec{\omega }}_{S,12}\), whose jth element is \({\varvec{\theta }}_{jp,21} {\varvec{\Theta }}_{jp,11}^{-1} (\omega _{1,jp}, \omega _{2,jp},\ldots ,\omega _{S-1,jp})'\), \(j=1,\ldots ,p-1\). The positive definiteness of \({\varvec{\Omega }}_S\) is automatically guaranteed by Sylvester's criterion: assume the sampled \({\varvec{\Omega }}_S\) at the nth iteration, denoted \({\varvec{\Omega }}_S^{(n)}\), is positive definite. After updating its last column and row at the \((n+1)\)th iteration, the leading \(p-1\) principal minors of \({\varvec{\Omega }}_S^{(n+1)}\) remain positive, and the pth-order principal minor of \({\varvec{\Omega }}_S^{(n+1)}\) equals \(b \cdot |{\varvec{\Omega }}_{S, 11}^{(n)}|\). Since b is Gamma-distributed and hence positive, the pth leading principal minor is positive and \({\varvec{\Omega }}_S^{(n+1)}\) is positive definite.

    2.

      Sample \({\varvec{G}}_s\), \(s=1,\ldots ,S\) from the conditional Bernoulli distributions

      $$\begin{aligned} p(g_{s,ij} = 1 \mid {\varvec{\Omega }}_1,\ldots ,{\varvec{\Omega }}_S, {\varvec{\Phi }}, {\varvec{s}}, {\varvec{Y}}) = \dfrac{N(\omega _{s,ij}|0,v_1^2)\pi }{N(\omega _{s,ij}|0,v_1^2)\pi +N(\omega _{s,ij}|0,v_0^2)(1-\pi )}. \end{aligned}$$
  • Update \({\varvec{\Phi }}\): This is an MH-step based on the PX-RPMH algorithm of Liu and Daniels (2006). Compute the diagonal matrix V where \(V_{ss} =(\sum _{i=1}^{p-1} \sum _{j=i+1}^p \omega _{s,ij})^{-\frac{1}{2}}\), \(s = 1,...,S\), and transform \({\varvec{\Phi }}\) to \({\varvec{\Psi }}= V {\varvec{\Phi }}V\). Define \({\varvec{\epsilon }}_{ij} = V {\varvec{\omega }}_{ij}\), \(i,j=1,\ldots ,p\). Given the candidate prior \(p({\varvec{\Phi }}) \propto |{\varvec{\Phi }}|^{-\frac{S+1}{2}} {\varvec{1}}_{{\varvec{\Phi }}\in \mathcal {R^{S}}}\), the posterior of \({\varvec{\Psi }}\) is an Inverse-Wishart distribution with degrees of freedom \(p(p-1)/2\) and scale matrix \(\sum _{i=1}^{p-1} \sum _{j=i+1}^p \text {diag}({\varvec{v}}_{ij})^{-1} {\varvec{\epsilon }}_{ij} {\varvec{\epsilon }}_{ij}' \text {diag}({\varvec{v}}_{ij})^{-1}\). Sample \({\varvec{\Psi }}\) and set \({\varvec{\Phi }}^* = \text {diag}({\varvec{\Psi }})^{-\frac{1}{2}} {\varvec{\Psi }}\text {diag}({\varvec{\Psi }})^{-\frac{1}{2}} \). Accept the proposed \({\varvec{\Phi }}^*\) with probability

    $$\begin{aligned} \min \left( 1, \exp \left[ \dfrac{S+1}{2} \left( \log |{\varvec{\Phi }}^*| - \log |{\varvec{\Phi }}| \right) \right] \right) . \end{aligned}$$
  • Update \(\textbf{P}\) and \(\textbf{s}\): This is a Metropolis-Hastings step following Scott (2002). First sample the state transition probabilities for each state s from a \(\text {Dirichlet}({\varvec{\alpha }}_s)\) distribution where \(\alpha _{rs}= (\# r \rightarrow s) +1\), \(r = 1,...,S\), is the number of state transitions from r to s plus 1. Let \(\pi _{new}\) denote the stationary distribution given the transition probabilities. Accept the resulting transition matrix and stationary distribution with probability

    $$\begin{aligned} \min \left\{ 1, \frac{\pi _{new}(\text {state at the first time point})}{\pi _{old}(\text {state at the first time point})} \right\} . \end{aligned}$$

    Next, perform a Gibbs step using Forward Propagate Backward sampling. Obtain the forward transition probabilities for each state at each time point as \({\varvec{Q}}_{t}=[q_{trs}]\) by

    $$\begin{aligned} q_{trs}\propto \pi _{t-1}(r|...) p_{rs} N( \sqrt{\tau _t} {\varvec{y}}_t |{\varvec{0}},{\varvec{\Omega }}_s^{-1}), \end{aligned}$$

    where \(q_{trs}\) is the transition probability from r at time \(t-1\) to s at time t; compute \(\pi _{t}(r|...) = \sum _{s = 1}^S q_{tsr}\). Sample \(s_{T}\) from \(\pi _{T}\) and then recursively sample \(s_{t}\) proportional to the \(s_{t+1}\)-th column of \({\varvec{Q}}_{t+1}\) until \(s_{1}\) is sampled.
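Putting the steps above together, the \({\varvec{\tau }}\) update and the forward-backward update of \(\textbf{s}\) can be sketched in code. This is a minimal Python/NumPy sketch under simplifying assumptions: the Metropolis-Hastings correction involving the stationary distribution at the first time point is omitted, an implicit uniform initial state distribution is used instead, and the function names `update_tau` and `ffbs` are ours, not the authors'.

```python
import numpy as np

def update_tau(Y, s, Omega_list, nu, rng):
    """Gamma draw for the classical-t scale variables:
    tau_t ~ Gamma((nu + p)/2, rate = (nu + y_t' Omega_{s_t} y_t)/2)."""
    T, p = Y.shape
    tau = np.empty(T)
    for t in range(T):
        rate = 0.5 * (nu + Y[t] @ Omega_list[s[t]] @ Y[t])
        tau[t] = rng.gamma(0.5 * (nu + p), 1.0 / rate)  # NumPy uses scale = 1/rate
    return tau

def ffbs(Y, tau, P, Omega_list, rng):
    """Forward-filter backward-sample the hidden states s_1..s_T.
    Emission model: sqrt(tau_t) y_t ~ N(0, Omega_{s_t}^{-1})."""
    T, p = Y.shape
    S = P.shape[0]
    # log-likelihood of sqrt(tau_t) y_t under each state's precision matrix
    loglik = np.empty((T, S))
    for s in range(S):
        _, logdet = np.linalg.slogdet(Omega_list[s])
        for t in range(T):
            x = np.sqrt(tau[t]) * Y[t]
            loglik[t, s] = 0.5 * (logdet - x @ Omega_list[s] @ x)
    pi = np.empty((T, S))        # filtered state probabilities pi_t
    Q = np.zeros((T, S, S))      # Q[t, r, s]: prob. of state r at t-1 and s at t
    lik0 = np.exp(loglik[0] - loglik[0].max())
    pi[0] = lik0 / lik0.sum()    # implicit uniform initial distribution
    for t in range(1, T):
        Q[t] = pi[t - 1][:, None] * P * np.exp(loglik[t] - loglik[t].max())[None, :]
        Q[t] /= Q[t].sum()
        pi[t] = Q[t].sum(axis=0)
    # backward sampling: s_T from pi_T, then s_t from the s_{t+1}-th column of Q_{t+1}
    s_draw = np.empty(T, dtype=int)
    s_draw[-1] = rng.choice(S, p=pi[-1])
    for t in range(T - 2, -1, -1):
        w = Q[t + 1][:, s_draw[t + 1]]
        s_draw[t] = rng.choice(S, p=w / w.sum())
    return s_draw
```

In the full sampler these draws would be interleaved with the \({\varvec{\Omega }}\), \({\varvec{G}}\), and \({\varvec{\Phi }}\) updates; the max-shift inside the exponentials keeps the forward pass numerically stable for strongly separated states.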

1.2 MCMC for the dynamic Dirichlet-t graphical model

  • Update \({{\varvec{\Omega }}}\) and \({{\varvec{G}}}\): Same as the classical-t model with \({\varvec{\Sigma }}_s = \sum _{t: s_t=s} \text {diag}(\sqrt{{\varvec{\tau }}_t}) {\varvec{y}}_t {\varvec{y}}_t'\text {diag}(\sqrt{{\varvec{\tau }}_t})\) for \(s \in \{1,\ldots ,S\}\).

  • Update \({{\varvec{\Phi }}}\): Same as the classical-t model.

  • Update \(\textbf{P}\) and \(\textbf{s}\): Same as the classical-t model with \(q_{trs} \propto \pi _{t-1}(r|...) p_{rs} N( \text {diag}(\sqrt{{\varvec{\tau }}_t}){\varvec{y}}_t |{\varvec{0}},{\varvec{\Omega }}_s^{-1} )\).

  • Update \({\varvec{\tau }}\): We use the truncated stick-breaking process to approximate the Dirichlet process (Ishwaran and James 2001). In simulation studies, we found that fixing the number of possible clusters reduces the MCMC processing time by up to 90%, as we avoid frequent sampling of the probability that \(\tau _{tj}\) is in a new cluster, which requires evaluating the pdf of noncentral t-distributions. Given the pre-specified level of truncation K, let \({\varvec{Z}}\) be a \(T \times p\) matrix of cluster indicators and let \(n_{tk}\) be the size of cluster k at time t, \(k = 1, \ldots , K\), \(t = 1,...,T\). \({\varvec{\eta }}_t = (\eta _{t1},\ldots ,\eta _{t K})\) denotes the unique cluster values of \({\varvec{\tau }}_t\). Define the cluster weights as \(\{w_{tk}\}_{t \in \{1,\ldots ,T\}, k \in \{1,\ldots ,K\} }\) and perform the following steps:

    1.

      For each \(t=1,\ldots ,T\) and \(j=1,\ldots ,p\), draw \(z_{tj}\) and update \(x_{tj} = y_{tj} \sqrt{\eta _{t, z_{tj}}}\). The probability that \(y_{tj}\) belongs to cluster k, \(k=1,\ldots ,K\), is

      $$\begin{aligned} p(z_{tj}= k | {\varvec{z}}_{t,-j }, {\varvec{\eta }}_t, {\varvec{y}}_t, {\varvec{\Omega }}_{s_t}, {\varvec{w}}_t) \propto w_{tk} \text {N}\left( y_{tj} | \frac{\mu _c}{\sqrt{\eta _{tk}}}, \frac{\sigma _c^2}{\eta _{tk}}\right) , \end{aligned}$$

      where N\((x|\mu ,\sigma ^2)\) is the Normal(\(\mu ,\sigma ^2\)) pdf evaluated at x. Here \(\mu _c\) and \(\sigma _c^2\) are the conditional mean and variance of \(x_{tj}|{\varvec{x}}_{t,-j}\), where \({\varvec{x}}_{t,-j} = \{x_{ti}:i\ne j, i \in \{1,\ldots , p\} \}\) and \({\varvec{x}}_t \sim MVN({\varvec{0}}, {\varvec{\Omega }}_{s_t}^{-1})\)

    2.

      For each \(t=1,\ldots ,T\) and \(k=1,\ldots ,K-1\), sample the stick-breaking weight \(v_{tk}\) from Beta\((1+n_{tk}, \alpha + \sum _{k'= k+1}^{K} n_{t k'})\). Compute \(w_{tk} = v_{tk} \prod _{k'<k} (1-v_{tk'})\).

    3.

      For each \(t=1,\ldots ,T\) and each \(k=1,\ldots ,K\), draw \(\eta _{tk}\) from distribution

      $$\begin{aligned} f(\eta _{tk}| {\varvec{\eta }}_{t,-k}, {\varvec{z}}_t, {\varvec{y}}_t, {\varvec{\Omega }}_{s_t}) \propto \eta _{tk}^{a-1} \text {exp}\{ -b \eta _{tk} -c \sqrt{\eta _{tk}} \} \end{aligned}$$
      (15)

      with \(a = \frac{\nu +n_k}{2}\), \(b=\frac{1}{2}[\nu + \text {tr}({\varvec{\Omega }}_{(k)(k)} {\varvec{y}}_{(k)} {\varvec{y}}_{(k)}' )]\), \(c = \text {tr}( {\varvec{\Omega }}_{(k)(-k)} {\varvec{x}}_{(-k)} {\varvec{y}}_{(k)}' )\), \((k) = \{j: z_j=k, j \in \{1,\ldots ,p\}\}\) and \((-k) = \{j: z_j\ne k, j \in \{ 1,\ldots ,p\}\}\) (t and \(s_t\) are dropped here for simplicity). In order to sample \(\eta _{tk}\), we do a variable transformation \(\tau ^* = b \eta _{tk}\) and get

      $$\begin{aligned} f(\tau ^*| \ldots ) \propto \tau ^{*(a^*-1)} \text {exp}\{ -\tau ^* -2 c^*\sqrt{\tau ^*} \} \end{aligned}$$

      with \(a^*=a\) and \(c^*=\frac{c}{2\sqrt{b}} \). Next, use the rejection algorithm from Liu et al. (2012) to sample \(\tau ^*\):

      (a)

        When \(c^*/\sqrt{a^*}\le -0.7\), use Algorithm 2 from Liu et al. (2012). Compute \(m=\frac{2 a^*-1}{c^* + \sqrt{(c^*)^2 + 4 a^* -2}}\). Repeatedly generate \(X \sim N (m, \frac{1}{2} )\) and \(U\sim \) Uniform(0, 1) until \(X>0\) and \(m^{2a^*-1}U< X^{2a^*-1}\text {exp}\{ -2(m+c^*)(X-m)\}\). Accept \(X^2\).

      (b)

        When \(-0.7<c^*/\sqrt{a^*}< 0\), use the first part of Algorithm 1 from Liu et al. (2012). Compute \(m=\frac{4a^*}{( \sqrt{(c^*)^2+4a^*}-c^*)^2}\). Repeatedly generate \(X\sim \text {Gamma}(a^*, m)\) (m is the rate parameter) and \(U\sim \text {Uniform}(0, 1)\) until \(U< \exp \{a^*+X(m-1)-2\sqrt{X}c^* - a^*/m\}\). Accept X.

      (c)

        When \(0\le c^*/\sqrt{a^*}< 0.7\), use the second part of Algorithm 1 from Liu et al. (2012). Repeatedly generate \(X\sim \)Gamma\((a^*,1)\) and \(U\sim \)Uniform(0, 1) until \(U< \text {exp}(-2 c^* \sqrt{X})\). Accept X.

      (d)

        When \(0.7\le c^*/\sqrt{a^*}\), use Algorithm 3 from Liu et al. (2012). Compute \(m= c^*+\sqrt{(c^*)^2+4a^*}\). Repeatedly generate \(X\sim \)Gamma\((2a^*,m)\) and \(U\sim \)Uniform(0, 1) until \(U<\text {exp}\{ - (X-\frac{m}{2}+c^*)^2\}\). Accept \(X^2\).

      (e)

        \(\eta _{tk} = \tau ^* / b\).

      (f)

        Update \(x_{tj} = \sqrt{\eta _{tk}}\, y_{tj}\) for each j in cluster k.

  • Update \({\varvec{\alpha }}\): Sample \(\alpha \) from

    $$\begin{aligned} \alpha | \ldots \sim \text {Gamma}\left( a_{\alpha } + T (K - 1),\; b_{\alpha } - \sum _{t=1}^T \sum _{k=1}^{K - 1} \log ( 1 - v_{tk}) \right) . \end{aligned}$$
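The rejection samplers in cases (a)-(d) above can be collected into a single routine. The following is a Python/NumPy sketch of the Liu et al. (2012) samplers for the extended gamma density \(f(\tau ^*) \propto \tau ^{*(a^*-1)} \exp \{-\tau ^* -2 c^*\sqrt{\tau ^*}\}\); the function name `sample_ext_gamma` is ours. NumPy's gamma generator is parameterized by scale, so a rate m becomes scale 1/m.

```python
import numpy as np

def sample_ext_gamma(a, c, rng):
    """One draw from f(t) ∝ t^{a-1} exp(-t - 2 c sqrt(t)), t > 0,
    via the case-by-case rejection samplers of Liu et al. (2012)."""
    ratio = c / np.sqrt(a)
    if ratio <= -0.7:
        # (a) Algorithm 2: normal proposal on the sqrt scale (requires a > 1/2)
        m = (2 * a - 1) / (c + np.sqrt(c**2 + 4 * a - 2))
        while True:
            x = rng.normal(m, np.sqrt(0.5))
            u = rng.uniform()
            if x > 0 and m**(2 * a - 1) * u < x**(2 * a - 1) * np.exp(-2 * (m + c) * (x - m)):
                return x**2
    elif ratio < 0:
        # (b) Algorithm 1, first part: gamma proposal with rate m
        m = 4 * a / (np.sqrt(c**2 + 4 * a) - c)**2
        while True:
            x = rng.gamma(a, 1.0 / m)
            if rng.uniform() < np.exp(a + x * (m - 1) - 2 * np.sqrt(x) * c - a / m):
                return x
    elif ratio < 0.7:
        # (c) Algorithm 1, second part: Gamma(a, 1) proposal thinned by exp(-2 c sqrt(x))
        while True:
            x = rng.gamma(a, 1.0)
            if rng.uniform() < np.exp(-2 * c * np.sqrt(x)):
                return x
    else:
        # (d) Algorithm 3: gamma proposal on the sqrt scale
        m = c + np.sqrt(c**2 + 4 * a)
        while True:
            x = rng.gamma(2 * a, 1.0 / m)
            if rng.uniform() < np.exp(-(x - m / 2 + c)**2):
                return x**2
```

In the \(\eta _{tk}\) update one would call this with \(a = a^*\), \(c = c^*\) and then set \(\eta _{tk} = \tau ^*/b\) as in step (e).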

1.3 Parameter selection for the fused graphical lasso

In the simulation study, we evaluate the performance of the fused graphical lasso (fLasso; Danaher et al. 2014) by treating the true hidden states as known. The fLasso contains two tuning parameters: \(\uplambda _1\) controls the graphical lasso penalty (i.e., graph sparsity) and \(\uplambda _2\) controls the fused lasso penalty (i.e., graph similarity).

To estimate a single graph, we select both \(\uplambda _1\) and \(\uplambda _2\) from a grid of values based on AIC, as done in Danaher et al. (2014). Specifically, for each value of \(\uplambda _2\) on a grid, we define the grid of \(\uplambda _1\) such that the graph ranges from empty to full. After fitting the fLasso for each \((\uplambda _1, \uplambda _2)\) pair, we return the estimated graph for which the AIC is smallest.

To compute an ROC curve for fLasso, we again vary \(\uplambda _2\) on a grid, but now compute the AUC (across values of \(\uplambda _1\) ranging from the empty graph to the full graph) and select the value of \(\uplambda _2\) for which the AUC is largest.
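The AIC criterion of Danaher et al. (2014), \(\text {AIC} = \sum _k \left[ n_k \,\text {tr}({\varvec{S}}_k \hat{{\varvec{\Theta }}}_k) - n_k \log \det \hat{{\varvec{\Theta }}}_k + 2 E_k \right] \) with \(E_k\) the number of estimated edges in class k, makes the grid search easy to sketch. Below is a Python/NumPy sketch in which the solver `fit` is a placeholder for an actual fused graphical lasso implementation (e.g. a wrapper around the R package JGL); `aic_fused` and `select_lambdas` are hypothetical names, not part of any package.

```python
import numpy as np

def aic_fused(Theta_list, S_list, n_list, tol=1e-8):
    """AIC of Danaher et al. (2014): sum_k n_k[tr(S_k Theta_k) - logdet Theta_k] + 2 E_k,
    where E_k counts the nonzero upper-triangular entries of Theta_k."""
    aic = 0.0
    for Theta, S, n in zip(Theta_list, S_list, n_list):
        _, logdet = np.linalg.slogdet(Theta)
        E = np.count_nonzero(np.abs(Theta[np.triu_indices_from(Theta, k=1)]) > tol)
        aic += n * (np.trace(S @ Theta) - logdet) + 2 * E
    return aic

def select_lambdas(fit, S_list, n_list, lam1_grid, lam2_grid):
    """Grid search over (lambda1, lambda2). `fit(S_list, n_list, lam1, lam2)` is a
    user-supplied solver returning one precision matrix estimate per state."""
    best = (np.inf, None, None, None)
    for lam2 in lam2_grid:
        for lam1 in lam1_grid:
            Theta_list = fit(S_list, n_list, lam1, lam2)
            score = aic_fused(Theta_list, S_list, n_list)
            if score < best[0]:
                best = (score, lam1, lam2, Theta_list)
    return best
```

Returning the fitted graphs along with the selected pair avoids refitting at the chosen values; for the ROC variant one would instead sweep \(\uplambda _1\) at each \(\uplambda _2\) and score the resulting paths by AUC.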

1.4 Simulation study results when \(p = 10\)

 

       Classical-t           Slightly contaminated   Highly contaminated
       TPR   FPR   MCC       TPR   FPR   MCC         TPR   FPR   MCC
dGM    0.58  0.21  0.36      0.56  0.22  0.35        0.38  0.29  0.09
dCT    0.98  0.01  0.97      0.81  0.10  0.71        0.68  0.16  0.52
dDT    0.98  0.01  0.97      0.97  0.01  0.96        0.97  0.02  0.95

Performance comparisons for hidden state estimation over the three hidden states, for the three simulation scenarios, \(p = 10\)

 

         Classical-t               Slightly contaminated     Highly contaminated
         TPR   FPR   MCC   FL      TPR   FPR   MCC   FL      TPR   FPR   MCC   FL
fLasso   0.99  0.87  0.20  0.35    0.97  0.72  0.31  0.45    0.90  0.63  0.26  0.65
dGM      0.45  0.02  0.54  0.38    0.42  0.02  0.51  0.38    0.31  0.03  0.39  0.51
dCT      0.92  0.01  0.93  0.06    0.67  0.03  0.70  0.42    0.58  0.05  0.60  0.37
dDT      0.92  0.01  0.93  0.09    0.91  0.02  0.91  0.25    0.85  0.01  0.86  0.18

Performance comparisons for graph estimation, \(p = 10\)

1.5 List of variables in the hand gesture data set

Code   Variable name
1      Vectorial velocity of left hand (x coordinate)
2      Vectorial velocity of left hand (y coordinate)
3      Vectorial velocity of left hand (z coordinate)
4      Vectorial velocity of right hand (x coordinate)
5      Vectorial velocity of right hand (y coordinate)
6      Vectorial velocity of right hand (z coordinate)
7      Vectorial velocity of left wrist (x coordinate)
8      Vectorial velocity of left wrist (y coordinate)
9      Vectorial velocity of left wrist (z coordinate)
10     Vectorial velocity of right wrist (x coordinate)
11     Vectorial velocity of right wrist (y coordinate)
12     Vectorial velocity of right wrist (z coordinate)
13     Vectorial acceleration of left hand (x coordinate)
14     Vectorial acceleration of left hand (y coordinate)
15     Vectorial acceleration of left hand (z coordinate)
16     Vectorial acceleration of right hand (x coordinate)
17     Vectorial acceleration of right hand (y coordinate)
18     Vectorial acceleration of right hand (z coordinate)
19     Vectorial acceleration of left wrist (x coordinate)
20     Vectorial acceleration of left wrist (y coordinate)
21     Vectorial acceleration of left wrist (z coordinate)
22     Vectorial acceleration of right wrist (x coordinate)
23     Vectorial acceleration of right wrist (y coordinate)
24     Vectorial acceleration of right wrist (z coordinate)
25     Scalar velocity of left hand
26     Scalar velocity of right hand
27     Scalar velocity of left wrist
28     Scalar velocity of right wrist
29     Scalar acceleration of left hand
30     Scalar acceleration of right hand
31     Scalar acceleration of left wrist
32     Scalar acceleration of right wrist

Variables and their codes

1.6 Results from the fused graphical lasso model on the hand gesture data set

In the application study, we fit the fused graphical lasso model given the two states generated by the proposed dynamic Dirichlet-t graphical model. The optimal graphs chosen by AIC (Danaher et al. 2014) are full graphs. To reach a desired level of sparsity, we adjust the two fLasso tuning parameters until the estimates have a similar number of edges to the Dirichlet-t model. The estimated graphs are plotted below. The graphs in the two states include all edges describing the positive correlation of velocity between the left hand and the left wrist, the positive correlation of acceleration between the left hand and the left wrist, and the analogous positive correlations between the right hand and the right wrist. However, many expected edges are not detected. Thus, the fLasso results, even with the states supplied, are unsatisfactory.

Estimated partial correlations in each hidden state from the fused graphical lasso model. Blue lines indicate positive correlations and red lines indicate negative correlations; line thickness represents the correlation strength; a dotted line indicates that the edge is unique to the state

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, C., Kowal, D.R. & Vannucci, M. Dynamic and robust Bayesian graphical models. Stat Comput 32, 105 (2022). https://doi.org/10.1007/s11222-022-10177-0
