Inferring Two-Level Hierarchical Gaussian Graphical Models to Discover Shared and Context-Specific Conditional Dependencies from High-Dimensional Heterogeneous Data

Rahman, Mohammad S.; Nicholson, Ann E.; Haffari, Gholamreza

doi:10.1007/s42979-020-00224-w


Original Research
Published: 27 June 2020

Volume 1, article number 218, (2020)

SN Computer Science


Abstract

Gaussian graphical models (GGMs) express conditional dependencies among the variables of Gaussian-distributed high-dimensional data. However, real-life datasets exhibit heterogeneity, which is better captured by mixtures of GGMs, where each component captures its own conditional dependencies (context-specific dependencies) along with some common dependencies (shared dependencies). Methods to discover shared and context-specific graphical structures include joint and grouped graphical Lasso, and the EM algorithm with various penalized likelihood scoring functions. However, these methods detect graphical structures with high false discovery rates and do not detect the two types of dependencies (i.e., context-specific and shared) together. In this paper, we develop a method to discover shared conditional dependencies along with context-specific graphical models via a two-level hierarchical Gaussian graphical model. We assume that the graphical models corresponding to shared and context-specific dependencies are decomposable, which leads to an efficient greedy algorithm that selects edges by minimizing a score based on minimum message length (MML). The MML-based score results in a lower false discovery rate, leading to more effective structure discovery. We present extensive empirical results on synthetic and real-life datasets and show that our method predicts context-specific dependencies among random variables more accurately than previous work. Hence, we consider our method state of the art for discovering both shared and context-specific conditional dependencies from high-dimensional heterogeneous Gaussian data.
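As a minimal illustration of what "conditional dependencies" means in a GGM, the sketch below relies on the standard fact that two variables are conditionally independent given the rest exactly when the corresponding entry of the precision (inverse covariance) matrix is zero. The three-variable chain, the function name `ggm_edges`, and the tolerance are illustrative assumptions; this is not the paper's MML-based structure search.

```python
import numpy as np

# Hypothetical chain X0 - X1 - X2: the (0, 2) entry of the precision matrix
# is zero, so X0 and X2 are conditionally independent given X1.
precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])

# The covariance matrix is dense: X0 and X2 are still marginally correlated.
covariance = np.linalg.inv(precision)


def ggm_edges(prec, tol=1e-8):
    """Return the undirected edges implied by non-zero off-diagonal precision entries."""
    p = prec.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(prec[i, j]) > tol]


edges = ggm_edges(precision)
```

Here `edges` contains only the two chain edges, even though every entry of `covariance` is non-zero, which is why structure discovery works on the precision matrix rather than on correlations.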


Notes

A clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent [42]. A maximal clique is a clique that cannot be extended by including one more adjacent vertex, that is, a clique which does not exist exclusively within the vertex set of a larger clique [42].
In graph theory, the term “null graph” refers to a graph without any edges, also known as the “empty graph” [42].
In real-world heterogeneous GGM data, the number of components is small compared with the number of data points. Therefore \(K \ll n\), and \(-\log {K!}\) does not affect the total number of bits required to encode the clustering coefficient and the contents.
The graphical structure at the top level is the structure containing the shared edges. All the context-specific graphical structures are placed at the lower level. This is why the model is called a two-level hierarchical Gaussian graphical model.
\(\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}\), where TP is the number of predicted edges present in the gold standard and FP is the number of predicted edges not present in the gold standard.
\(\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{TN}+\mathrm{FN}}\), where TN is the number of predicted conditional independences present in the gold standard and FN is the number of predicted conditional independences not present in the gold standard.
error = FNR + FPR.
Except for PaGIAM–tGDM and PaGIAM–sContchordalysis-MML, the baselines from the synthetic data experiments do not perform well. For this reason, we do not use those baselines for the real-world data.
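The FPR, FNR and error definitions in the notes above can be sketched as follows. This is a minimal sketch, assuming graphs are given as lists of undirected vertex pairs over `n_vars` variables; the function names and the convention of returning 0.0 for an empty denominator are assumptions, not taken from the article.

```python
def edge_set(edges):
    """Normalise undirected edges so that (i, j) and (j, i) compare equal."""
    return {frozenset(e) for e in edges}


def error_rates(predicted, gold, n_vars):
    """Compute FPR, FNR and error = FNR + FPR as defined in the notes above."""
    pred, truth = edge_set(predicted), edge_set(gold)
    n_pairs = n_vars * (n_vars - 1) // 2  # all possible undirected edges

    tp = len(pred & truth)         # predicted edges present in the gold standard
    fp = len(pred - truth)         # predicted edges absent from the gold standard
    fn = len(truth - pred)         # predicted independences that are gold edges
    tn = n_pairs - tp - fp - fn    # predicted independences confirmed by the gold standard

    # Assumed convention: an empty denominator yields a rate of 0.0.
    fpr = fp / (tp + fp) if (tp + fp) else 0.0
    fnr = fn / (tn + fn) if (tn + fn) else 0.0
    return fpr, fnr, fpr + fnr


# Hypothetical four-variable example: one spurious edge and one missed edge.
gold = [(0, 1), (1, 2), (2, 3)]
predicted = [(1, 0), (1, 2), (0, 3)]
fpr, fnr, error = error_rates(predicted, gold, n_vars=4)
```

In this example there are six possible edges, with TP = 2, FP = 1, FN = 1 and TN = 2, so both rates equal 1/3 and the error is 2/3.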

References

Akaike H. Information theory and an extension of the maximum likelihood principle. In: Second international symposium on information theory; 1973. p. 267–281.
Allisons L. Encoding General Graphs. 2017. http://www.allisons.org/ll/MML/Structured/Graph/. Accessed 1 Apr 2020.
Armstrong H, et al. Bayesian covariance matrix estimation using a mixture of decomposable graphical models. Stat Comput. 2009;19:303–16.
Barabási AL, Albert R. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74(1):47–97.
Breheny P, Huang J. Penalized methods for bi-level variable selection. Stat Interface. 2009;2(3):369–80.
Brennan C, et al. The somatic genomic landscape of glioblastoma. Cell. 2013;155(2):462–77.
Clauset A, et al. Power-law distributions in empirical data. SIAM Rev. 2007;51:661–703.
Danaher P, et al. The Joint Graphical Lasso for inverse covariance estimation across multiple classes. J R Stat Soc. 2014;76(2):373–97.
Dempster A, et al. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc. 1977;39(1):1–39.
Deshpande A, et al. Efficient stepwise selection in decomposable models. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence; 2001. p. 128–135.
Dowe D, et al. MML estimation of the parameters of the spherical Fisher distribution. Algorithmic Learn Theory. 1996;1160:213–27.
Dwyer P. Some applications of matrix derivatives in multivariate analysis. J Am Stat Assoc. 1967;62:607–25.
Friedman J, et al. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–41.
Friedman N. The Bayesian structural EM algorithm. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence (UAI); 1998. p. 129–138.
Gao C, et al. Estimation of multiple networks in Gaussian mixture models. Electron J Stat. 2016;10:1133–54.
Giraud C. Introduction to high-dimensional statistics. Boca Raton: Chapman and Hall/CRC; 2014.
Gauvain JL, Lee CH. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans Speech Audio Process. 1998;2(2):291–8.
Guo J, et al. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15.
Hao B, et al. Simultaneous clustering and estimation of heterogeneous graphical model. J Mach Learn Res. 2018;18(217):1–58.
Kumar M, Koller D. Learning a small mixture of trees. In: Advances in neural information processing systems; 2009. p. 1051–1059.
Lauritzen S. Graphical models. Oxford: Clarendon Press; 1996.
Li Z, et al. Bayesian Joint Spike-and-Slab Graphical Lasso. In: Proceedings of the 36th international conference on machine learning, vol. 97; 2019. p. 3877–3885.
Ma J, Michailidis G. Joint structural estimation of multiple graphical models. J Mach Learn Res. 2016;17:1–48.
Maretic H, Frossard P. Graph Laplacian mixture model. arXiv:1810.10053. 2018.
McLendon R, et al. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455(7216):1061–8.
Meilă M, Jordan MI. Learning with mixtures of trees. J Mach Learn Res. 2000;1:1–48.
Mirzaa G, et al. De novo CCND2 mutations leading to stabilization of cyclin D2 cause megalencephaly–polymicrogyria–polydactyly–hydrocephalus syndrome. Nat Genet. 2014;46(5):510–4.
Mukherjee C, Rodriguez A. GPU-powered shotgun stochastic search for Dirichlet process mixtures of Gaussian graphical models. J Comput Graph Stat. 2016;25(3):762–88.
Narita Y, et al. Mutant epidermal growth factor receptor signalling down-regulates p27 through activation of the phosphatidylinositol 3-kinase/AKT pathway in glioblastomas. Cancer Res. 2002;62(22):6764–9.
Oliver J, et al. Unsupervised learning using MML. In: Proceedings of the 13th international conference on machine learning; 1996. p. 364–372.
Peterson C, et al. Bayesian inference of multiple Gaussian graphical models. J Am Stat Assoc. 2015;110(509):159–74.
Petitjean F, Webb G. Scaling log-linear analysis to datasets with thousands of variables. In: SIAM international conference on data mining; 2015. p. 469–477.
Petitjean F, et al. A statistically efficient and scalable method for log-linear analysis of high-dimensional data. In: Proceedings of IEEE international conference on data mining (ICDM); 2014. p. 110–119.
Pittman J, et al. Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. Proc Natl Acad Sci USA. 2004;101:8431–6.
Pujana MA, et al. Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet. 2007;39:1338–49.
Rahman M, Haffari G. A statistically efficient and scalable method for exploratory analysis of high-dimensional data. SN Comput Sci. 2020;1(2):1–17.
Rodriguez A, et al. Sparse covariance estimation in heterogeneous samples. Electron J Stat. 2011;5:981–1014.
Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6:461–4.
Verhaak R, et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR and NF1. Cancer Cell. 2010;17(1):98–110.
Wallace C, Boulton D. An information measure for classification. Comput J. 1968;11:185–94.
Wallace C, Dowe D. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. J Stat Comput. 2000;10:173–83.
West DB. Introduction to graph theory. London: Pearson; 2001.


Acknowledgements

We are thankful to Monash University for its financial support of this research. We are also thankful to Dr. Francois Petitjean for his valuable advice on the development of the two-level HGGM.

Funding

This study was not funded by any external funding source.

Author information

Authors and Affiliations

Clayton School of Information Technology, Monash University, Clayton, VIC, 3800, Australia
Mohammad S. Rahman, Ann E. Nicholson & Gholamreza Haffari


Corresponding author

Correspondence to Mohammad S. Rahman.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Rahman, M.S., Nicholson, A.E. & Haffari, G. Inferring Two-Level Hierarchical Gaussian Graphical Models to Discover Shared and Context-Specific Conditional Dependencies from High-Dimensional Heterogeneous Data. SN COMPUT. SCI. 1, 218 (2020). https://doi.org/10.1007/s42979-020-00224-w


Received: 04 April 2020
Accepted: 10 June 2020
Published: 27 June 2020
DOI: https://doi.org/10.1007/s42979-020-00224-w
