Abstract
Gaussian graphical models (GGMs) express conditional dependencies among the variables of Gaussian-distributed high-dimensional data. However, real-life datasets exhibit heterogeneity, which is better captured by mixtures of GGMs in which each component captures its own conditional dependencies (context-specific dependencies) along with dependencies common to all components (shared dependencies). Existing methods for discovering shared and context-specific graphical structures include the joint and grouped graphical Lasso and the EM algorithm with various penalized-likelihood scoring functions. However, these methods detect graphical structures with high false discovery rates and do not detect the two types of dependencies (context-specific and shared) together. In this paper, we develop a method to discover shared conditional dependencies along with context-specific graphical models via a two-level hierarchical Gaussian graphical model. We assume that the graphical models corresponding to shared and context-specific dependencies are decomposable, which leads to an efficient greedy algorithm that selects edges to minimize a score based on minimum message length (MML). The MML-based score yields a lower false discovery rate and hence more effective structure discovery. We present extensive empirical results on synthetic and real-life datasets and show that our method predicts context-specific dependencies among random variables more accurately than previous work. We therefore consider our method state of the art for discovering both shared and context-specific conditional dependencies from high-dimensional heterogeneous Gaussian data.
Notes
A clique is a subset of the vertices of an undirected graph such that every two distinct vertices in the clique are adjacent [42]. A maximal clique is a clique that cannot be extended by including one more adjacent vertex, that is, a clique whose vertex set is not a proper subset of the vertex set of a larger clique [42].
In graph theory, the term “null graph” refers to a graph without any edges, also known as the “empty graph” [42].
In the real world, heterogeneous GGM data exhibit a relatively small number of components compared with the number of data points. Therefore \(K \ll n\), and the term \(-\log {K!}\) does not materially affect the total number of bits required to encode the clustering coefficient and the contents.
The graphical structure at the top level is the one containing the shared edges. All the context-specific graphical structures are placed at the lower level. This is why the model is called a two-level hierarchical Gaussian graphical model.
\(\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}\), where TP is the number of predicted edges that are present in the gold standard and FP is the number of predicted edges that are not present in the gold standard.
\(\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{TN}+\mathrm{FN}}\), where TN is the number of predicted conditional independences that are present in the gold standard and FN is the number of predicted conditional independences that are not present in the gold standard.
\(\mathrm{error} = \mathrm{FNR}+\mathrm{FPR}\).
Except for PaGIAM–tGDM and PaGIAM–sContchordalysis-MML, the baselines in the synthetic-data experiments do not perform well. For this reason, we do not use the other baselines on the real-world data.
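The error measures defined in the notes above can be illustrated with a minimal Python sketch. It follows the FPR, FNR and error formulas exactly as stated in the notes. The function names, the representation of a graph as a list of vertex pairs, and the convention of returning 0 when a denominator is zero are assumptions made for this illustration and are not taken from the paper.

```python
def edge_set(edges):
    """Normalize undirected edges so that (a, b) and (b, a) compare equal.

    Assumes no self-loops.
    """
    return {frozenset(e) for e in edges}


def structure_errors(predicted, gold, n_vertices):
    """Compute FPR, FNR and error = FNR + FPR as defined in the notes.

    predicted, gold: iterables of vertex pairs (undirected edges).
    n_vertices: number of variables, used to count the absent edges.
    """
    pred, true = edge_set(predicted), edge_set(gold)
    n_pairs = n_vertices * (n_vertices - 1) // 2  # all possible edges

    tp = len(pred & true)        # predicted edges present in the gold standard
    fp = len(pred - true)        # predicted edges absent from the gold standard
    fn = len(true - pred)        # predicted independences that are gold edges
    tn = n_pairs - tp - fp - fn  # independences in both prediction and gold

    # Assumption: a rate with a zero denominator is reported as 0.0.
    fpr = fp / (tp + fp) if (tp + fp) else 0.0
    fnr = fn / (tn + fn) if (tn + fn) else 0.0
    return {"FPR": fpr, "FNR": fnr, "error": fpr + fnr}


# Toy example on 4 variables (hypothetical graphs, not from the paper).
gold = [(0, 1), (1, 2), (2, 3)]
predicted = [(1, 0), (1, 2), (0, 3)]
result = structure_errors(predicted, gold, n_vertices=4)
print(result)
```

In this toy example two of the three predicted edges are correct and one gold edge is missed, so TP = 2, FP = 1, FN = 1 and TN = 2, which gives FPR = 1/3 and FNR = 1/3.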
References
Akaike H. Information theory and an extension of the maximum likelihood principle. In: Second international symposium on information theory; 1973. p. 267–281.
Allison L. Encoding general graphs. 2017. http://www.allisons.org/ll/MML/Structured/Graph/. Accessed 1 Apr 2020.
Armstrong H, et al. Bayesian covariance matrix estimation using a mixture of decomposable graphical models. Stat Comput. 2009;19:303–16.
Barabási AL, Albert R. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74(1):47–97.
Breheny P, Huang J. Penalized methods for bi-level variable selection. Stat Interface. 2009;2(3):369–80.
Brennan C, et al. The somatic genomic landscape of glioblastoma. Cell. 2013;155(2):462–77.
Clauset A, et al. Power-law distributions in empirical data. SIAM Rev. 2007;51:661–703.
Danaher P, et al. The joint graphical lasso for inverse covariance estimation across multiple classes. J R Stat Soc. 2014;76(2):373–97.
Dempster A, et al. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc. 1977;39(1):1–39.
Deshpande A, et al. Efficient stepwise selection in decomposable models. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence; 2001. p. 128–135.
Dowe D, et al. MML estimation of the parameters of the spherical Fisher distribution. Algorithmic Learn Theory. 1996;1160:213–27.
Dwyer P. Some applications of matrix derivatives in multivariate analysis. J Am Stat Assoc. 1967;62:607–25.
Friedman J, et al. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–41.
Friedman N. The Bayesian structural EM algorithm. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence (UAI); 1998. p. 129–138.
Gao C, et al. Estimation of multiple networks in Gaussian mixture models. Electron J Stat. 2016;10:1133–54.
Giraud C. Introduction to high-dimensional statistics. Boca Raton: Chapman and Hall/CRC; 2014.
Gauvain JL, Lee CH. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans Speech Audio Process. 1998;2(2):291–8.
Guo J, et al. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15.
Hao B, et al. Simultaneous clustering and estimation of heterogeneous graphical model. J Mach Learn Res. 2018;18(217):1–58.
Kumar M, Koller D. Learning a small mixture of trees. In: Advances in neural information processing systems; 2009. p. 1051–1059.
Lauritzen S. Graphical models. Oxford: Clarendon Press; 1996.
Li Z, et al. Bayesian joint spike-and-slab graphical lasso. In: Proceedings of the 36th international conference on machine learning, vol. 97; 2019. p. 3877–3885.
Ma J, Michailidis G. Joint structural estimation of multiple graphical models. J Mach Learn Res. 2016;17:1–48.
Maretic H, Frossard P. Graph Laplacian mixture model. arXiv:1810.10053. 2018.
McLendon R, et al. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455(7216):1061–8.
Meilă M, Jordan MI. Learning with mixtures of trees. J Mach Learn Res. 2000;1:1–48.
Mirzaa G, et al. De novo CCND2 mutations leading to stabilization of cyclin D2 cause megalencephaly–polymicrogyria–polydactyly–hydrocephalus syndrome. Nat Genet. 2014;46(5):510–4.
Mukherjee C, Rodriguez A. GPU-powered shotgun stochastic search for Dirichlet process mixtures of Gaussian graphical models. J Comput Graph Stat. 2016;25(3):762–88.
Narita Y, et al. Mutant epidermal growth factor receptor signalling down-regulates p27 through activation of the phosphatidylinositol 3-kinase/AKT pathway in glioblastomas. Cancer Res. 2002;62(22):6764–9.
Oliver J, et al. Unsupervised learning using MML. In: Proceedings of the 13th international conference on machine learning; 1996. p. 364–372.
Peterson C, et al. Bayesian inference of multiple Gaussian graphical models. J Am Stat Assoc. 2015;110(509):159–74.
Petitjean F, Webb G. Scaling log-linear analysis to datasets with thousands of variables. In: SIAM international conference on data mining; 2015. p. 469–477.
Petitjean F, et al. A statistically efficient and scalable method for log-linear analysis of high-dimensional data. In: Proceedings of IEEE international conference on data mining (ICDM); 2014. p. 110–119.
Pittman J, et al. Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. Proc Natl Acad Sci USA. 2004;101:8431–6.
Pujana MA, et al. Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet. 2007;39:1338–49.
Rahman M, Haffari G. A statistically efficient and scalable method for exploratory analysis of high-dimensional data. SN Comput Sci. 2020;1(2):1–17.
Rodriguez A, et al. Sparse covariance estimation in heterogeneous samples. Electron J Stat. 2011;5:981–1014.
Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6:461–4.
Verhaak R, et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR and NF1. Cancer Cell. 2010;17(1):98–110.
Wallace C, Boulton D. An information measure for classification. Comput J. 1968;11:185–94.
Wallace C, Dowe D. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. J Stat Comput. 2000;10:173–83.
West DB. Introduction to graph theory. London: Pearson; 2001.
Acknowledgements
We are thankful to Monash University for its financial support of this research. We are also thankful to Dr. Francois Petitjean for his valuable advice on the development of the two-level HGGM.
Funding
This study was not funded by any external funding source.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Rahman, M.S., Nicholson, A.E. & Haffari, G. Inferring Two-Level Hierarchical Gaussian Graphical Models to Discover Shared and Context-Specific Conditional Dependencies from High-Dimensional Heterogeneous Data. SN COMPUT. SCI. 1, 218 (2020). https://doi.org/10.1007/s42979-020-00224-w