Machine Learning, Volume 83, Issue 3, pp 331–353

Sparse canonical correlation analysis

Article

Abstract

We present a novel method for solving Canonical Correlation Analysis (CCA) in a sparse convex framework using a least squares approach. The method targets the scenario in which one is interested in (or limited to) a primal representation for the first view while having a dual representation for the second view. Sparse CCA (SCCA) minimises the number of features used in both the primal and dual projections while maximising the correlation between the two views. The method is compared to alternative sparse solutions and demonstrated on paired corpora for mate-retrieval. In the mate-retrieval experiments we observe that, when the number of original features is large, SCCA outperforms Kernel CCA (KCCA), learning the common semantic space from a sparse set of features.
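
To make the primal-dual, least squares reading of CCA concrete, the following is a minimal sketch of the standard (non-sparse) objective. The notation is an assumption and does not come from the abstract: X_a is the centred n x d_a data matrix of the first view with primal weight vector w, and K_b = X_b X_b^T is the n x n kernel matrix of the second view with dual coefficient vector e.

```latex
% Assumed notation (not from the abstract):
%   X_a : n x d_a data matrix of the first view (primal weights w)
%   K_b = X_b X_b^T : n x n kernel matrix of the second view (dual coefficients e)
\begin{align}
\rho &= \max_{w,\,e}\;
  \frac{w^{\top} X_a^{\top} K_b\, e}
       {\sqrt{w^{\top} X_a^{\top} X_a\, w}\;\sqrt{e^{\top} K_b^{2}\, e}} \\
\|X_a w - K_b e\|_2^{2} &= 2 - 2\, w^{\top} X_a^{\top} K_b\, e
  \quad\text{when } \|X_a w\|_2 = \|K_b e\|_2 = 1 .
\end{align}
```

Under the unit-norm constraints, maximising the correlation is therefore equivalent to minimising the squared residual between the two projections, which is what makes a least squares formulation possible. Sparsity could then plausibly be encouraged by penalising the 1-norms of w and e; the exact penalties and constraints used by SCCA are not stated in the abstract and are not reproduced here.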

Keywords

Sparsity, Canonical correlation analysis


Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Data Mining Department, Institute for Infocomm Research (I2R), A*STAR, Singapore, Singapore
  2. Centre for Computational Statistics and Machine Learning, Department of Computer Science, University College London, London, UK
