Covariate-Related Structure Extraction from Paired Data

  • Linfei Zhou
  • Elisabeth Georgii
  • Claudia Plant
  • Christian Böhm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9832)


In the biological domain, it is increasingly common to apply several high-throughput technologies to the same set of samples. We propose a Covariate-Related Structure Extraction (CRSE) approach that explores relationships between different types of high-dimensional molecular data (views) in the context of sample covariate information from the experimental design, such as class membership. Real-world data analysis with an initial pipeline implementation of CRSE shows that the approach captures cross-view structures underlying multiple biologically relevant classification schemes, making it possible to predict class labels for unseen examples from either view or across views.


Partial Least Squares · Canonical Correlation Analysis · Canonical Variable · Covariate Information · Data View
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We thank Ming Jin, Jin Zhao, Basem Kanawati, Philippe Schmitt-Kopplin, Andreas Albert, J. Barbro Winkler, and Anton R. Schäffner for kindly providing the datasets used in this study.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Linfei Zhou (1)
  • Elisabeth Georgii (2)
  • Claudia Plant (3)
  • Christian Böhm (1)

  1. Ludwig-Maximilians-Universität München, Munich, Germany
  2. Helmholtz Zentrum München, Neuherberg, Germany
  3. University of Vienna, Vienna, Austria
