
Abstract

Multi-class learning involves a tradeoff between generalization capability and computational overhead. We propose a generative probabilistic multi-class classifier that accounts for both generalization capability and the learning/prediction rate. We show that the classifier has a max-margin property, so prediction on unseen data can achieve nearly the same performance as in the training stage. In addition, local variables are eliminated, which greatly simplifies the optimization problem. Using convex and probabilistic analysis, we develop an efficient online learning algorithm. Unlike the classical setting, the algorithm aggregates rather than averages dualities. Empirical results indicate that our method has good generalization capability and coverage rate.



Author information

Corresponding author

Correspondence to Tao-cheng Hu.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61379069), the Major Program of the National Social Science Foundation of China (No. 12&ZD231), and the National Key Technology R&D Program of China (No. 2014BAK09B04).

ORCID: Tao-cheng HU, http://orcid.org/0000-0002-6722-2420

About this article

Cite this article

Hu, Tc., Yu, Jh. Max-margin based Bayesian classifier. Frontiers Inf Technol Electronic Eng 17, 973–981 (2016). https://doi.org/10.1631/FITEE.1601078

  • DOI: https://doi.org/10.1631/FITEE.1601078
