Mixture of Conditional Gaussian Graphical Models for Unlabelled Heterogeneous Populations in the Presence of Co-factors

Lartigue, Thomas; Durrleman, Stanley; Allassonnière, Stéphanie

doi:10.1007/s42979-021-00865-5

Original Research
Published: 20 September 2021

Volume 2, article number 466, (2021)
SN Computer Science

Abstract

Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector. In the case of an unlabelled heterogeneous population, Expectation Maximisation (EM) algorithms for Mixtures of GGM have been proposed to estimate both each sub-population’s graph and the class labels. However, we argue that, with most real data, class affiliation cannot be described with a Mixture of Gaussians, which mostly groups data points according to their geometrical proximity. In particular, there often exist external co-features whose values affect the features’ average value, scattering data points that belong to the same sub-population across the feature space. Additionally, if the co-features’ effect on the features is heterogeneous, then the estimation of this effect cannot be separated from the sub-population identification. In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features to regroup the data points into clusters corresponding to the sub-populations. We develop a penalised EM algorithm to estimate graph-sparse model parameters. We demonstrate on synthetic and real data how this method fulfils its goal and succeeds in identifying the sub-populations where the Mixtures of GGM are disrupted by the effect of the co-features.
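To make the idea of class-specific co-feature effects concrete, here is a minimal sketch of an E-step for a mixture of conditional Gaussians. It is not the authors' implementation, and it leaves out the sparsity penalty and the M-step entirely. The function names, the `(q, p)` shape chosen for each class's regression matrix in `betas`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def log_gaussian(y, mean, precision):
    """Row-wise log-density of N(mean, precision^{-1}) evaluated at y."""
    d = y.shape[1]
    diff = y - mean
    _, logdet = np.linalg.slogdet(precision)
    quad = np.einsum("ni,ij,nj->n", diff, precision, diff)
    return 0.5 * (logdet - d * np.log(2.0 * np.pi) - quad)

def responsibilities(x, y, weights, betas, precisions):
    """E-step: posterior class probabilities given features y and co-features x.

    x: (n, q), y: (n, p), weights: (K,),
    betas: K arrays of shape (q, p) (hypothetical layout),
    precisions: K arrays of shape (p, p).
    """
    log_post = np.stack(
        [np.log(w) + log_gaussian(y, x @ b, lam)
         for w, b, lam in zip(weights, betas, precisions)],
        axis=1,
    )
    # Subtract the row maximum before exponentiating, for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Toy data (assumed): two classes whose co-feature effects have opposite signs,
# so each class is spread across the feature space as x varies.
rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)
x = rng.uniform(-3.0, 3.0, size=(n, 1))
betas = [np.array([[2.0, 2.0]]), np.array([[-2.0, -2.0]])]
precisions = [np.eye(2), np.eye(2)]
means = np.where(labels[:, None] == 0, x @ betas[0], x @ betas[1])
y = means + rng.standard_normal((n, 2))

post = responsibilities(x, y, np.array([0.5, 0.5]), betas, precisions)
accuracy = np.mean(post.argmax(axis=1) == labels)
```

In this toy setting the true parameters are plugged in directly, so `accuracy` only reflects how separable the classes are once the class-specific co-feature effect is subtracted. In an actual EM run the parameters would be re-estimated at each iteration.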

Funding

The research leading to these results has received funding from the European Research Council (ERC) under Grant agreement No 678304, European Union’s Horizon 2020 research and innovation program under grant agreement No 666992 (EuroPOND) and No 826421 (TVB-Cloud), and the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute) and reference ANR-10-IAIHU-06 (IHU-A-ICM).

Author information

Authors and Affiliations

Aramis Project-Team, INRIA, ICM, Paris, France
Thomas Lartigue & Stanley Durrleman
CMAP, CNRS, École polytechnique, I.P. Paris, Palaiseau, France
Thomas Lartigue
Centre de Recherche des Cordeliers, Université de Paris, Inserm, HEKA Project team, INRIA Paris, Sorbonne Université, 75006, Paris, France
Stéphanie Allassonnière

Contributions

Conceptualisation, TL and SA; methodology, TL and SA; software, TL; data curation, SD and TL; validation, TL and SA; visualisation, TL; result analysis, SD, SA and TL; writing (original draft preparation), TL; writing (review and editing), TL, SD and SA; supervision, SD and SA. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Thomas Lartigue.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Code availability

Code for our algorithm, as well as a toy example that reproduces some of the results of this paper, is publicly available at: https://github.com/tlartigue/Mixture-of-Conditional-Gaussian-Graphical-Models

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Single Class CGGM on the Real Data

In this appendix, we examine the parameters (averaged over several bootstrap folds) estimated by fitting a single CGGM on the real data. In Fig. 11, we display both the estimated \({\hat{\beta}} = -{\widehat{\Sigma}}{\widehat{\Theta}}\) between X and Y and the estimated conditional correlation graph between the components of Y. The constant term in \({\hat{\beta}}\) is 0 since the data is centred overall. Otherwise, the coefficient intensities appear weaker than in the multi-class parameters. The conditional correlation graph, on the other hand, displays the negative correlation between disease earliness \(\tau\) and speed \(\xi\) that was characteristic of the Control patients in Figs. 9 and 10. This is despite the Controls (\(n=636\)) being slightly less numerous than the AD (\(n=708\)) patients in this dataset.
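The two quantities discussed in this appendix can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' code. It assumes that \(\widehat{\Sigma}\) is the inverse of the conditional precision matrix of Y (called `lam` here), that \(\widehat{\Theta}\) has shape `(p, q)`, and that conditional correlations are obtained by the usual normalisation of the precision matrix. The variable names and toy values are hypothetical.

```python
import numpy as np

def regression_coefficients(theta, lam):
    """beta = -Sigma @ Theta, with Sigma = inv(lam) (assumed convention)."""
    # Solving the linear system avoids forming the inverse explicitly.
    return -np.linalg.solve(lam, theta)

def conditional_correlations(lam):
    """Conditional (partial) correlations: rho_ij = -lam_ij / sqrt(lam_ii * lam_jj)."""
    d = np.sqrt(np.diag(lam))
    rho = -lam / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Toy values (assumed): two features, two co-features.
lam = np.array([[2.0, 1.0],
                [1.0, 2.0]])    # conditional precision matrix of Y
theta = np.array([[1.0, 0.0],
                  [0.0, 1.0]])  # hypothetical X-Y interaction term

beta = regression_coefficients(theta, lam)
rho = conditional_correlations(lam)  # off-diagonal entry is -1 / 2 = -0.5
```

A positive off-diagonal entry in the precision matrix yields a negative conditional correlation, which is the sign pattern described above for \(\tau\) and \(\xi\).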

About this article

Cite this article

Lartigue, T., Durrleman, S. & Allassonnière, S. Mixture of Conditional Gaussian Graphical Models for Unlabelled Heterogeneous Populations in the Presence of Co-factors. SN COMPUT. SCI. 2, 466 (2021). https://doi.org/10.1007/s42979-021-00865-5

Received: 05 July 2021
Accepted: 06 September 2021
Published: 20 September 2021
DOI: https://doi.org/10.1007/s42979-021-00865-5
