Multi-label optimal margin distribution machine

  • Zhi-Hao Tan
  • Peng Tan
  • Yuan Jiang
  • Zhi-Hua Zhou
Part of the following topical collections:
  1. Special Issue of the ACML 2019 Journal Track


The multi-label support vector machine (Rank-SVM) is a classic and effective algorithm for multi-label classification. Its pivotal idea, extended from SVM, is to maximize the minimum margin of label pairs. However, recent studies have shown that maximizing the minimum margin does not necessarily lead to better generalization performance; instead, it is more crucial to optimize the margin distribution. Inspired by this idea, in this paper we first introduce margin distribution to multi-label learning and propose the multi-label Optimal margin Distribution Machine (mlODM), which efficiently optimizes the margin mean and variance of all label pairs. Extensive experiments on multiple multi-label evaluation metrics show that mlODM outperforms SVM-style multi-label methods. Moreover, an empirical study shows that mlODM yields the best margin distribution and verifies the fast convergence of our method.
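To make the quantities named above concrete, here is a minimal NumPy sketch; it is not the authors' mlODM formulation or optimizer. It assumes a hypothetical linear scorer with one weight row per label and takes the margin of a label pair to be the unnormalized score difference between a relevant and an irrelevant label of the same instance. The function names `pairwise_label_margins` and `margin_statistics` and the toy data are illustrative assumptions.

```python
import numpy as np


def pairwise_label_margins(W, X, Y):
    """Collect margins of all (relevant, irrelevant) label pairs.

    W: (n_labels, n_features) weights of a hypothetical linear scorer.
    X: (n_samples, n_features) instances.
    Y: (n_samples, n_labels) binary matrix, 1 marks a relevant label.
    """
    scores = X @ W.T  # score of every label for every instance
    margins = []
    for s, y in zip(scores, Y):
        relevant = s[y == 1]
        irrelevant = s[y == 0]
        # Assumed pair margin: relevant score minus irrelevant score.
        margins.append((relevant[:, None] - irrelevant[None, :]).ravel())
    return np.concatenate(margins)


def margin_statistics(margins):
    """Return the minimum, mean, and (population) variance of the margins."""
    return margins.min(), margins.mean(), margins.var()


# Toy data (assumed for illustration): 3 labels, 2 features, 2 instances.
W = np.array([[1, 0], [0, 1], [-1, -1]])
X = np.array([[2, 1], [0, 3]])
Y = np.array([[1, 0, 0], [0, 1, 0]])

margins = pairwise_label_margins(W, X, Y)
min_margin, mean_margin, var_margin = margin_statistics(margins)
print(margins, min_margin, mean_margin, var_margin)
```

On this toy data the four pair margins are 1, 5, 3 and 6. A minimum-margin criterion looks only at the value 1, whereas a margin-distribution criterion also takes into account the mean (3.75) and the variance (3.6875) over all pairs.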


Optimal margin distribution machine, Multi-label learning, Support vector machine, Margin theory



This research was supported by the National Key R&D Program of China (2018YFB1004300), NSFC (61673201), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
