Using regression makes extraction of shared variation in multiple datasets easy
In many data analysis tasks, it is important to understand the relationships between different datasets. Several methods exist for this task, but many of them are limited to two datasets and to linear relationships. In this paper, we propose a new efficient algorithm, termed cocoreg, for the extraction of variation common to all datasets in a given collection of arbitrary size. cocoreg extends redundancy analysis to more than two datasets, utilizing chains of regression functions to extract the shared variation in the original data space. The algorithm can be used with any linear or non-linear regression function, which makes it robust, straightforward, fast, and easy to implement and use. We empirically demonstrate the efficacy of shared variation extraction using the cocoreg algorithm on five artificial and three real datasets.
Keywords: Shared variation, Multiple regression, Regression chains
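The idea of "chains of regression functions" in the abstract can be illustrated with a small sketch. The code below is one loose reading of that phrase, not the authors' implementation: it assumes that a chain visits every other dataset once and ends at the target dataset, that each link is an ordinary least-squares fit between two of the original datasets, and that the outputs of all chains are averaged. The names `fit_linear` and `shared_variation_sketch`, and the synthetic data, are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def fit_linear(X, Y):
    """Fit a least-squares map (with intercept) from X to Y; return a predict function."""
    Xa = np.column_stack([X, np.ones(len(X))])
    W, *_ = np.linalg.lstsq(Xa, Y, rcond=None)

    def predict(Z):
        return np.column_stack([Z, np.ones(len(Z))]) @ W

    return predict


def shared_variation_sketch(datasets, target):
    """Average the outputs of all regression chains that end at `target`.

    Assumption (not taken from the paper): each chain passes through every
    other dataset exactly once, in every possible order.
    """
    others = [i for i in range(len(datasets)) if i != target]
    chain_outputs = []
    for order in itertools.permutations(others):
        path = list(order) + [target]
        current = datasets[path[0]]
        for src, dst in zip(path[:-1], path[1:]):
            # Each link is fitted on the original data and then applied
            # to the running estimate, which lives in the space of `src`.
            link = fit_linear(datasets[src], datasets[dst])
            current = link(current)
        chain_outputs.append(current)
    return np.mean(chain_outputs, axis=0)


# Synthetic example: three 2-D datasets sharing one signal `s`,
# each with its own private component `p`.
rng = np.random.default_rng(0)
n = 2000
s = rng.normal(size=n)
datasets = []
for _ in range(3):
    p = rng.normal(size=n)
    datasets.append(np.column_stack([s + p, s - p]))

shared_0 = shared_variation_sketch(datasets, target=0)
```

In this synthetic setup, `shared_0` has the same shape as the first dataset and should follow the shared signal `s` more closely than the raw columns do, because each regression link can only carry forward variation that is predictable from the previous dataset. Replacing `fit_linear` with a non-linear regressor would follow the same pattern.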
This work was supported in part by the Revolution of Knowledge Work Project, funded by Tekes (Grants 40228/13 and 5159/31/2014), and in part by the Academy of Finland (Finnish Centre of Excellence in Computational Research COIN, 251170; 26696).