Machine Learning, Volume 79, Issue 1–2, pp 47–71

Multi-view kernel construction

  • Virginia R. de Sa
  • Patrick W. Gallagher
  • Joshua M. Lewis
  • Vicente L. Malave
Open Access


In many problem domains data may come from multiple sources (or views), such as video and audio from a camera or text on and links to a web page. These multiple views of the data are often not directly comparable to one another, and thus a principled method for their integration is warranted. In this paper we develop a new algorithm to leverage information from multiple views for unsupervised clustering by constructing a custom kernel. We generate a multipartite graph (with the number of parts given by the number of views) that induces a kernel we then use for spectral clustering. Our algorithm can be seen as a generalization of co-clustering and spectral clustering and a relative of Kernel Canonical Correlation Analysis. We demonstrate the algorithm on four data sets: an illustrative artificial data set, synthetic fMRI data, voxels from an fMRI study, and a collection of web pages. Finally, we compare its performance to common alternatives.
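The pipeline outlined in the abstract (per-view similarities, a multipartite graph linking the views, an induced kernel, then spectral clustering) can be illustrated with a minimal sketch. This is not the construction from the paper, whose details are not given here. It assumes Gaussian affinities within each view, assumes the weights between two parts of the graph are the product of the two per-view affinity matrices, and splits the data into two clusters using the second eigenvector of the normalized kernel. All function names (`rbf_affinity`, `multiview_kernel`, `spectral_bipartition`, `demo`) and the parameter `sigma` are hypothetical.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Gaussian (RBF) affinity between all pairs of rows of X."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def multiview_kernel(views, sigma=1.0):
    """Assumed construction: one graph part per view, with weights between
    parts i and j given by the product of their affinity matrices. The
    induced kernel sums W @ W.T over all pairs of views, which is
    symmetric and positive semidefinite."""
    affinities = [rbf_affinity(V, sigma) for V in views]
    n = affinities[0].shape[0]
    K = np.zeros((n, n))
    for i in range(len(affinities)):
        for j in range(i + 1, len(affinities)):
            W = affinities[i] @ affinities[j]  # weights between part i and part j
            K += W @ W.T
    return K


def spectral_bipartition(K):
    """Two-way spectral clustering: threshold the second eigenvector of the
    symmetrically normalized kernel D^{-1/2} K D^{-1/2} at zero."""
    d = K.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    M = s[:, None] * K * s[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues returned in ascending order
    v = vecs[:, -2] * s          # rescale to the generalized eigenvector
    return (v > 0).astype(int)


def demo(seed=0, n_per_cluster=20):
    """Toy data: the same two clusters observed through a 2-D and a 3-D view."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n_per_cluster)
    centers1 = np.array([[0.0, 0.0], [3.0, 0.0]])
    centers2 = np.array([[0.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
    view1 = centers1[truth] + 0.3 * rng.standard_normal((truth.size, 2))
    view2 = centers2[truth] + 0.3 * rng.standard_normal((truth.size, 3))
    labels = spectral_bipartition(multiview_kernel([view1, view2]))
    return labels, truth
```

Calling `demo()` returns the predicted labels alongside the ground truth for the toy data. Cluster labels are only defined up to a swap of 0 and 1, so any comparison with the truth should allow for that.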

Spectral clustering, Minimizing-disagreement, Multi-view, fMRI analysis, Kernel, Canonical correlation analysis, CCA, Co-clustering



Copyright information

© The Author(s) 2009

Authors and Affiliations

  • Virginia R. de Sa (1)
  • Patrick W. Gallagher
  • Joshua M. Lewis
  • Vicente L. Malave
  1. Department of Cognitive Science, University of California, San Diego, USA
