Abstract
Gaussian graphical models (GGMs) are a popular method for analysing complex data by modelling the unique relationships between variables. Recently, interest has shifted from investigating relationships within a (sub)discipline (e.g. genetics) to estimating relationships between variables from different subdisciplines (e.g. how gene expression relates to cognitive performance). It is thus not surprising that there is an increasing need for analysing large, so-called multi-source datasets, each containing detailed information from many data sources on the same individuals. GGMs are a straightforward statistical candidate for estimating unique cross-source relationships from such network-oriented data. However, the multi-source nature of the data poses two challenges. First, different sources may inherently differ from one another, biasing the estimation of the relationships. Second, GGMs are not designed to separate cross-source relationships from all other, source-specific relationships. In this paper we propose adding a simultaneous-component-model pre-processing step to the Gaussian graphical model; the combination is suitable for estimating cross-source relationships from multi-source data. Compared to the graphical lasso (a commonly used GGM technique), this Sparse Network and Component (SNAC) model more accurately estimates the unique cross-source relationships from multi-source data. This holds in particular when the multi-source data contain more variables than observations (p > n). Neither differences in the sparseness of the underlying component structure of the data nor differences in the relative dominance of the cross-source compared to the source-specific relationships strongly affect the relationship estimates.
Sparse Network and Component analysis, a hybrid component-graphical model, is a promising tool for modelling unique relationships between different data sources, thus providing insight into how various disciplines are connected to one another.
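As a reading aid, the "unique relationships" that a GGM models can be illustrated with partial correlations, which are read off the inverse covariance (precision) matrix. The sketch below is not the SNAC procedure and not the authors' code: the three-variable chain, its variable labels, and the helper `partial_correlations` are hypothetical, chosen only to show that two variables can be marginally associated while having no unique (conditional) relationship.

```python
import numpy as np


def partial_correlations(cov):
    """Return the partial correlation matrix implied by a covariance matrix.

    Uses the standard identity pcor_ij = -theta_ij / sqrt(theta_ii * theta_jj),
    where theta is the precision (inverse covariance) matrix.
    """
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical chain: variable 0 -- variable 1 -- variable 2
# (e.g. a gene-expression, a brain, and a cognition measure; labels are assumptions).
# The zero in the (0, 2) entry encodes "no unique relationship between 0 and 2".
precision_true = np.array([
    [1.0, -0.4, 0.0],
    [-0.4, 1.0, -0.4],
    [0.0, -0.4, 1.0],
])

cov = np.linalg.inv(precision_true)   # implied covariance matrix
pcor = partial_correlations(cov)      # recovers the unique relationships

print("marginal covariance (0, 2):", round(float(cov[0, 2]), 3))
print("partial correlation (0, 2):", round(float(pcor[0, 2]), 3))
print("partial correlation (0, 1):", round(float(pcor[0, 1]), 3))
```

In this toy setting variables 0 and 2 covary only through variable 1, so their marginal covariance is non-zero while their partial correlation is zero. In practice the precision matrix has to be estimated from data, and when p > n this requires regularisation such as the graphical lasso mentioned in the abstract.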
Acknowledgements
Support for this study was provided by European Research Council Consolidator Grant 647209 (PT, LW) and Netherlands Organisation for Scientific Research Aspasia Grant 015.011.034 (PT, KVD).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tio, P., Waldorp, L., Van Deun, K. (2020). Constructing Graphical Models for Multi-Source Data: Sparse Network and Component Analysis. In: Imaizumi, T., Okada, A., Miyamoto, S., Sakaori, F., Yamamoto, Y., Vichi, M. (eds) Advanced Studies in Classification and Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Singapore. https://doi.org/10.1007/978-981-15-3311-2_22
DOI: https://doi.org/10.1007/978-981-15-3311-2_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3310-5
Online ISBN: 978-981-15-3311-2
eBook Packages: Mathematics and Statistics (R0)