Constructing Graphical Models for Multi-Source Data: Sparse Network and Component Analysis

Tio, Pia; Waldorp, Lourens; VanDeun, Katrijn

doi:10.1007/978-981-15-3311-2_22

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Gaussian graphical models (GGMs) are a popular method for analysing complex data by modelling the unique relationships between variables. Recently, interest has shifted from investigating relationships within a (sub)discipline (e.g. genetics) to estimating relationships between variables from different subdisciplines (e.g. how gene expression relates to cognitive performance). There is therefore an increasing need to analyse large, so-called multi-source datasets, each containing detailed information from many data sources on the same individuals. GGMs are a straightforward statistical candidate for estimating unique cross-source relationships from such network-oriented data. However, the multi-source nature of the data poses two challenges. First, different sources may inherently differ from one another, biasing the estimation of the relations. Second, GGMs are not designed to separate cross-source relationships from all other, source-specific relationships. In this paper we propose adding a simultaneous component model as a pre-processing step to the Gaussian graphical model; the combination is suitable for estimating cross-source relationships from multi-source data. Compared to the graphical lasso (a commonly used GGM technique), this Sparse Network and Component (SNAC) model more accurately estimates the unique cross-source relationships from multi-source data. This holds in particular when the multi-source data contain more variables than observations (p > n). Neither differences in the sparseness of the underlying component structure of the data nor differences in the relative dominance of cross-source compared to source-specific relationships strongly affect the relationship estimates.
Sparse Network and Component analysis, a hybrid component-graphical model, is a promising tool for modelling unique relationships between different data sources, thus providing insight into how various disciplines are connected to one another.
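
To make the notion of a "unique cross-source relationship" concrete, the following minimal sketch shows how a GGM reads relationships off a precision (inverse covariance) matrix: two variables are connected exactly when their partial correlation is nonzero. It is not the SNAC procedure and includes neither the simultaneous-component step nor graphical lasso estimation. The toy matrix, the two hypothetical sources A and B, and the helper names `partial_correlations` and `cross_source_block` are assumptions made purely for illustration.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision matrix into a matrix of partial correlations.

    Uses the standard identity rho_ij = -theta_ij / sqrt(theta_ii * theta_jj).
    """
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def cross_source_block(pcor, n_first):
    """Return the block relating the first n_first variables to the rest."""
    return pcor[:n_first, n_first:]


# Toy precision matrix (hypothetical values, chosen to be positive definite):
# variables 0-1 belong to source A, variables 2-3 to source B.
theta = np.array([
    [2.0, 0.5, 0.0, -0.4],
    [0.5, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.6],
    [-0.4, 0.0, 0.6, 2.0],
])

pcor = partial_correlations(theta)
cross = cross_source_block(pcor, n_first=2)

# Nonzero entries of `cross` are the cross-source edges of the graph.
edges = [(int(i), int(j) + 2) for i, j in zip(*np.nonzero(cross))]
print(cross)
print(edges)
```

In this toy example the only cross-source edge links variable 0 (source A) to variable 3 (source B); the remaining nonzero partial correlations are source-specific, which is the distinction the abstract draws.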

Acknowledgements

Support for this study was provided by European Research Council Consolidator Grant 647209 (PT, LW) and Netherlands Organisation for Scientific Research Aspasia Grant 015.011.034 (PT, KVD).

Author information

Authors and Affiliations

Department of Psychological Methods, University of Amsterdam, Amsterdam, The Netherlands
Pia Tio & Lourens Waldorp
Department of Methodology, Tilburg University, Tilburg, The Netherlands
Pia Tio & Katrijn VanDeun

Corresponding author

Correspondence to Pia Tio.

Editor information

Editors and Affiliations

School of Management and Information Sciences, Tama University, Tokyo, Japan
Tadashi Imaizumi
Rikkyo University, Tokyo, Japan
Akinori Okada
University of Tsukuba, Tsukuba, Japan
Sadaaki Miyamoto
Department of Mathematics, Chuo University, Tokyo, Japan
Fumitake Sakaori
Department of Mathematics, Tokai University, Hiratsuka-shi, Japan
Yoshiro Yamamoto
Department of Statistical Sciences, Sapienza University of Rome, Roma, Italy
Maurizio Vichi

About this paper

Cite this paper

Tio, P., Waldorp, L., VanDeun, K. (2020). Constructing Graphical Models for Multi-Source Data: Sparse Network and Component Analysis. In: Imaizumi, T., Okada, A., Miyamoto, S., Sakaori, F., Yamamoto, Y., Vichi, M. (eds) Advanced Studies in Classification and Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Singapore. https://doi.org/10.1007/978-981-15-3311-2_22

DOI: https://doi.org/10.1007/978-981-15-3311-2_22
Published: 26 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3310-5
Online ISBN: 978-981-15-3311-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)