Constructing Graphical Models for Multi-Source Data: Sparse Network and Component Analysis

  • Conference paper
Advanced Studies in Classification and Data Science

Abstract

Gaussian graphical models (GGMs) are a popular method for analysing complex data by modelling the unique relationships between variables. Recently, a shift in interest has taken place from investigating relationships within a (sub)discipline (e.g. genetics) to estimating relationships between variables from various subdisciplines (e.g. how gene expression relates to cognitive performance). It is thus not surprising that there is an increasing need for analysing large, so-called multi-source datasets, each containing detailed information from many data sources on the same individuals. GGMs are a straightforward statistical candidate for estimating unique cross-source relationships from such network-oriented data. However, the multi-source nature of the data poses two challenges: First, different sources may inherently differ from one another, biasing the estimation of the relations. Second, GGMs are not cut out for separating cross-source relationships from all other, source-specific relationships. In this paper we propose the addition of a simultaneous-component-model pre-processing step to the Gaussian graphical model, the combination of which is suitable for estimating cross-source relationships from multi-source data. Compared to the graphical lasso (a commonly used GGM technique), this Sparse Network and Component (SNAC) model more accurately estimates the unique cross-source relationships from multi-source data. This holds in particular when the multi-source data contains more variables than observations (p > n). Neither differences in sparseness of the underlying component structure of the data nor in the relative dominance of the cross-source compared to source-specific relationships strongly affect the relationship estimates. 
Sparse Network and Component analysis, a hybrid component-graphical model, is a promising tool for modelling unique relationships between different data sources, thus providing insight into how various disciplines are connected to one another.
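
To make the notion of "unique relationships" concrete: in a Gaussian graphical model, the unique relationship between two variables is their partial correlation, which can be read off the precision (inverse covariance) matrix. The following is a minimal sketch of that idea only, not the SNAC procedure itself. The toy precision matrix, the split into two "sources", and the function name `partial_correlations` are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations.

    For a Gaussian graphical model, the partial correlation between
    variables i and j is -theta_ij / sqrt(theta_ii * theta_jj).
    """
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical toy setup: 4 variables connected in a chain 0 - 1 - 2 - 3.
# Variables 0 and 1 are treated as "source A", variables 2 and 3 as "source B".
true_precision = np.array([
    [1.0, 0.4, 0.0, 0.0],
    [0.4, 1.0, 0.4, 0.0],
    [0.0, 0.4, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
])

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(
    mean=np.zeros(4), cov=np.linalg.inv(true_precision), size=5000
)

# With many more observations than variables the sample covariance can be
# inverted directly. When p > n this inverse does not exist, which is where
# a regularised estimator such as the graphical lasso comes in.
est_precision = np.linalg.inv(np.cov(samples, rowvar=False))
pcor = partial_correlations(est_precision)

# Cross-source relationships are the off-diagonal block linking A to B.
source_a, source_b = [0, 1], [2, 3]
cross_source = pcor[np.ix_(source_a, source_b)]
print(np.round(cross_source, 2))
```

In this toy chain only the edge between variables 1 and 2 links the two sources, so the cross-source block should contain one clearly non-zero entry and three entries near zero, up to sampling noise.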

Acknowledgements

Support for this study was provided by European Research Council Consolidator Grant 647209 (PT, LW) and Netherlands Organisation for Scientific Research Aspasia Grant 015.011.034 (PT, KVD).

Author information

Corresponding author

Correspondence to Pia Tio.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tio, P., Waldorp, L., Van Deun, K. (2020). Constructing Graphical Models for Multi-Source Data: Sparse Network and Component Analysis. In: Imaizumi, T., Okada, A., Miyamoto, S., Sakaori, F., Yamamoto, Y., Vichi, M. (eds) Advanced Studies in Classification and Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Singapore. https://doi.org/10.1007/978-981-15-3311-2_22
