Online Bayesian max-margin subspace learning for multi-view classification and regression

  • Jia He
  • Changying Du
  • Fuzhen Zhuang
  • Xin Yin
  • Qing He
  • Guoping Long


Multi-view data have become increasingly common in many real-world applications where data are generated from different information channels or views, such as image + text, audio + video, and webpage + link data. The last decades have witnessed a number of studies devoted to multi-view learning algorithms, especially predictive latent subspace approaches, which aim to obtain a subspace shared by multiple views and then learn models in that shared subspace. However, few efforts have been made to handle online multi-view learning scenarios. In this paper, we propose an online Bayesian multi-view learning algorithm that learns a predictive subspace under the max-margin principle. Specifically, we first define the latent margin loss for classification or regression in the subspace, and then cast the learning problem into a variational Bayesian framework by exploiting the pseudo-likelihood and the data augmentation idea. With the variational approximate posterior inferred from past samples, we can naturally combine historical knowledge with newly arriving data, in a Bayesian passive-aggressive style. Finally, we extensively evaluate our model on several real-world data sets, and the experimental results show that it achieves superior performance compared with a number of state-of-the-art competitors.
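To give a concrete flavour of the online max-margin idea described above, the sketch below shows the simpler point-estimate analogue of the method: a passive-aggressive (PA-I) hinge-loss update on two-view features projected into a shared subspace. This is an illustrative simplification, not the paper's algorithm; the actual method maintains a variational Bayesian posterior over the subspace and predictor, whereas here the projection matrices `W1`, `W2` are fixed random stand-ins for the learned subspace and `w` is a single point estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, k = 20, 30, 5            # per-view dimensions and subspace size
W1 = rng.standard_normal((d1, k))  # stand-in projection for view 1
W2 = rng.standard_normal((d2, k))  # stand-in projection for view 2
w = np.zeros(k)                    # linear classifier in the shared subspace
C = 1.0                            # PA-I aggressiveness cap

def pa_update(w, x1, x2, y):
    """One online step: project both views, then apply a PA-I max-margin update."""
    z = x1 @ W1 + x2 @ W2                 # shared-subspace representation
    loss = max(0.0, 1.0 - y * (w @ z))    # hinge (margin) loss
    if loss > 0.0:                        # aggressive: correct the margin violation
        tau = min(C, loss / (z @ z))      # capped PA-I step size
        w = w + tau * y * z
    return w                              # passive: no change when margin is met

# Process a small stream of labelled two-view samples, one at a time.
for _ in range(200):
    y = rng.choice([-1.0, 1.0])
    x1 = y * 0.1 + rng.standard_normal(d1)   # weak class signal in view 1
    x2 = y * 0.1 + rng.standard_normal(d2)   # weak class signal in view 2
    w = pa_update(w, x1, x2, y)
```

In the Bayesian version proposed in the paper, the single vector `w` is replaced by an approximate posterior updated from each mini-batch, so historical knowledge enters through the previous posterior rather than through the previous point estimate.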


Multi-view learning · Online learning · Bayesian subspace learning · Max-margin · Classification · Regression



This work was supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004300, the National Natural Science Foundation of China under Grant Nos. U1811461, 61602449, U1836206, and 61773361, and the Project of the Youth Innovation Promotion Association CAS under Grant No. 2017146.



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. The Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  2. Huawei EI Innovation Lab, Beijing, China
  3. Huawei Noah’s Ark Lab, Beijing, China
  4. The University of Chinese Academy of Sciences, Beijing, China
  5. The Lab of Parallel Software and Computational Science, Institute of Software, CAS, Beijing, China