
International Journal of Computer Vision, Volume 110, Issue 3, pp 308–327

Low-Rank Bilinear Classification: Efficient Convex Optimization and Extensions

  • Takumi Kobayashi

Abstract

In pattern classification, it is necessary to handle efficiently not only feature vectors but also feature matrices, defined as two-way data, while preserving two-way structure such as spatio-temporal relationships. A classifier for feature matrices is generally formulated in a bilinear form composed of row and column weights, which jointly yield a matrix weight. The rank of this matrix should be low, both for generalization performance and for computational cost. To this end, we propose a low-rank bilinear classifier based on efficient convex optimization. In the proposed method, the classifier is optimized by minimizing the trace norm of the classifier matrix, which reduces its rank without imposing any hard constraint on it. We formulate the optimization problem in a tractable convex form and provide a procedure that solves it efficiently to the global optimum. In addition, we propose two novel extensions of the bilinear classifier, to multiple kernel learning and to cross-modal learning. By kernelizing the bilinear method, we naturally induce a novel multiple kernel learning method: it integrates both the inter kernels between heterogeneous reproducing kernel Hilbert spaces (RKHSs) and the ordinary kernels within the respective RKHSs into a new discriminative kernel, in a unified manner via the bilinear model. For cross-modal learning, we consider mapping multi-modal features into a common space in which they are subsequently classified. We show that the projection and the classification are jointly represented by the bilinear model, and propose a method that optimizes both simultaneously within the bilinear framework. In experiments on various visual classification tasks, the proposed methods exhibit favorable performance compared to other methods.
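The core idea of the abstract, minimizing a classification loss plus the trace (nuclear) norm of a matrix weight so that the learned bilinear classifier becomes low-rank without a hard rank constraint, can be illustrated with a generic proximal-gradient (ISTA) sketch. This is not the paper's own optimization procedure; it is a minimal stand-in using a squared hinge loss and singular-value thresholding, with all names and hyperparameters chosen for illustration:

```python
import numpy as np

def svt(W, tau):
    # Singular-value thresholding: the proximal operator of tau * ||W||_*
    # (the trace norm), which shrinks singular values toward zero.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def train_bilinear(Xs, ys, lam=0.1, step=0.01, iters=500):
    """Minimize mean squared-hinge loss of f(X) = <W, X> = trace(W^T X)
    plus lam * ||W||_* by proximal gradient descent (illustrative only)."""
    W = np.zeros_like(Xs[0])
    for _ in range(iters):
        G = np.zeros_like(W)
        for X, y in zip(Xs, ys):
            m = y * np.sum(W * X)            # margin y * <W, X>
            if m < 1.0:                      # squared hinge: (1 - m)^2
                G += -2.0 * (1.0 - m) * y * X
        # Gradient step on the loss, then proximal step on the trace norm.
        W = svt(W - step * G / len(Xs), step * lam)
    return W

# Toy two-way data separable by a rank-1 matrix weight.
rng = np.random.default_rng(0)
u, v = rng.normal(size=5), rng.normal(size=4)
Xs = [rng.normal(size=(5, 4)) for _ in range(40)]
ys = [1 if np.sum(np.outer(u, v) * X) > 0 else -1 for X in Xs]

W = train_bilinear(Xs, ys)
acc = np.mean([np.sign(np.sum(W * X)) == y for X, y in zip(Xs, ys)])
```

Because the ground-truth weight here is the rank-1 matrix `u v^T`, the trace-norm penalty drives the learned `W` toward low rank while the hinge term fits the labels; the paper's contribution is a convex formulation and solver for this kind of objective that reaches the global optimum efficiently.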

Keywords

Bilinear classifier · Low-rank matrix · Convex optimization · Multiple kernel learning · Cross-modal learning


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
