Abstract
Modern technology enables very high dimensional multimodal data streams to be acquired routinely. The resulting feature spaces have a dimension that is very large compared to the number of training samples (n). To address this problem of small n and large feature dimension in multimodal data sets, the paper presents a new feature extraction algorithm. It integrates both regularization and shrinkage with canonical correlation analysis (CCA): regularization parameters increase the diagonal elements of the covariance matrices, and shrinkage parameters decrease the off-diagonal elements. The theory of rough sets is used to find the optimum regularization parameters of CCA. The effectiveness of the proposed method is demonstrated on three pairs of modalities from two real-life data sets, along with a comparison with other methods.
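The abstract does not give the exact estimator. The sketch below is a minimal illustration that assumes one simple form of "increase the diagonal, decrease the off-diagonal": each covariance matrix S becomes (1 - γ)·offdiag(S) + diag(S) + λI. The cross-covariance between the two modalities is left unshrunk, and the canonical directions are obtained by whitening followed by an SVD.

The function names `shrink_regularize` and `regularized_shrunk_cca` are hypothetical. The parameters λ and γ are fixed by hand here rather than chosen with rough sets as in the paper. The sketch does show why regularization matters when n is small. With n = 20 samples and p = 100 features, the centered sample covariance has rank at most n - 1 = 19 and cannot be inverted. Adding λ > 0 makes it positive definite.

```python
import numpy as np


def shrink_regularize(S, lam, gamma):
    """Regularized, shrunk copy of a symmetric covariance matrix S.

    Assumed form (a sketch, not necessarily the paper's estimator):
    off-diagonal entries are scaled by (1 - gamma), every diagonal
    entry is increased by lam.
    """
    diag = np.diag(np.diag(S))
    return (1.0 - gamma) * (S - diag) + diag + lam * np.eye(S.shape[0])


def inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def regularized_shrunk_cca(X, Y, lam_x, lam_y, gamma_x, gamma_y, k=1):
    """Top-k canonical weights and regularized canonical correlations.

    X is n x p and Y is n x q; rows are samples.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)  # cross-covariance, left unshrunk here
    Cxx_is = inv_sqrt(shrink_regularize(Sxx, lam_x, gamma_x))
    Cyy_is = inv_sqrt(shrink_regularize(Syy, lam_y, gamma_y))
    # SVD of the whitened cross-covariance gives the canonical directions
    U, s, Vt = np.linalg.svd(Cxx_is @ Sxy @ Cyy_is, full_matrices=False)
    Wx = Cxx_is @ U[:, :k]
    Wy = Cyy_is @ Vt.T[:, :k]
    return Wx, Wy, s[:k]


# Usage: far more features than samples in both modalities
rng = np.random.default_rng(0)
n, p, q = 20, 100, 80
z = rng.normal(size=(n, 1))  # shared latent signal
X = z @ rng.normal(size=(1, p)) + 0.5 * rng.normal(size=(n, p))
Y = z @ rng.normal(size=(1, q)) + 0.5 * rng.normal(size=(n, q))
Wx, Wy, rho = regularized_shrunk_cca(X, Y, lam_x=1.0, lam_y=1.0,
                                     gamma_x=0.5, gamma_y=0.5, k=2)
```

For γ in [0, 1], (1 - γ)S + γ·diag(S) is a convex combination of positive semidefinite matrices. Adding λI with λ > 0 therefore makes every eigenvalue at least λ, which is what allows the inverse square root to be taken safely.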
This work is partially supported by the Department of Electronics and Information Technology, Government of India (PhD-MLA/4(90)/2015-16).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mandal, A., Maji, P. (2017). Regularization and Shrinkage in Rough Set Based Canonical Correlation Analysis. In: Polkowski, L., et al. Rough Sets. IJCRS 2017. Lecture Notes in Computer Science, vol 10313. Springer, Cham. https://doi.org/10.1007/978-3-319-60837-2_36
DOI: https://doi.org/10.1007/978-3-319-60837-2_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60836-5
Online ISBN: 978-3-319-60837-2
eBook Packages: Computer Science; Computer Science (R0)