Mathematical Programming, Volume 169, Issue 1, pp 277–305

Robust multicategory support vector machines using difference convex algorithm

  • Chong Zhang
  • Minh Pham
  • Sheng Fu
  • Yufeng Liu
Full Length Paper, Series B

Abstract

The support vector machine (SVM) is one of the most popular classification methods in the machine learning literature. Binary SVM methods have been extensively studied, and have achieved many successes in various disciplines. However, generalization to multicategory SVM (MSVM) methods can be very challenging. Many existing methods estimate k functions for k classes with an explicit sum-to-zero constraint. It was shown recently that such a formulation can be suboptimal. Moreover, many existing MSVMs are not Fisher consistent, or do not take into account the effect of outliers. In this paper, we focus on classification in the angle-based framework, which is free of the explicit sum-to-zero constraint, hence more efficient, and propose two robust MSVM methods using truncated hinge loss functions. We show that our new classifiers can enjoy Fisher consistency, and simultaneously alleviate the impact of outliers to achieve more stable classification performance. To implement our proposed classifiers, we employ the difference convex algorithm for efficient computation. Theoretical and numerical results obtained indicate that for problems with potential outliers, our robust angle-based MSVMs can be very competitive among existing methods.
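The abstract does not spell out the loss, so the following is only a minimal sketch under assumptions. It uses a common form of the truncated hinge loss, T_s(u) = H_1(u) - H_s(u) with H_s(u) = max(0, s - u) and s <= 0, which is a difference of two convex functions. One step of a difference convex algorithm keeps the convex part H_1 and replaces the concave part -H_s with its linearization at the previous iterate, giving a convex upper bound. The function names and the default s = -1 are illustrative choices, not taken from the paper.

```python
def hinge(u, s=1.0):
    """Convex hinge function H_s(u) = max(0, s - u)."""
    return max(0.0, s - u)


def truncated_hinge(u, s=-1.0):
    """Truncated hinge T_s(u) = H_1(u) - H_s(u), assumed form with s <= 0.

    The loss is capped at 1 - s, so a badly misclassified point
    (very negative u) cannot contribute an unbounded amount.
    """
    return hinge(u, 1.0) - hinge(u, s)


def dc_surrogate(u, u_prev, s=-1.0):
    """Convex surrogate for one difference convex step (illustrative).

    H_1 is kept as is; the concave part -H_s is replaced by its
    linearization at u_prev. Because H_s is convex, this surrogate
    is an upper bound on T_s that is tight at u = u_prev.
    """
    # A subgradient of H_s at u_prev: -1 where the hinge is active, else 0.
    g = -1.0 if u_prev < s else 0.0
    linearized_hs = hinge(u_prev, s) + g * (u - u_prev)
    return hinge(u, 1.0) - linearized_hs


if __name__ == "__main__":
    for u in (-5.0, -3.0, 0.0, 0.5, 2.0):
        print(u, truncated_hinge(u), dc_surrogate(u, u_prev=-3.0))
```

In this sketch the surrogate equals the truncated loss at the linearization point and lies above it elsewhere, which is the property that lets each convex subproblem decrease the nonconvex objective.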

Keywords

Difference convex algorithm, Fisher consistency, Outlier, Truncated hinge loss

Mathematics Subject Classification

90C26, 62H30

Notes

Acknowledgements

The authors would like to thank the reviewers and editors for their helpful comments and suggestions, which led to a much improved presentation. Yufeng Liu’s research was supported in part by National Science Foundation Grant IIS1632951 and National Institutes of Health Grant R01GM126550. Chong Zhang’s research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC). Minh Pham’s research was supported in part by National Science Foundation Grant DMS1127914 and the Hobby Postdoctoral Fellowship.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2017

Authors and Affiliations

  1. Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
  2. Statistical and Applied Mathematical Sciences Institute (SAMSI), Durham, USA
  3. Department of Statistics, University of Virginia, Charlottesville, USA
  4. University of Chinese Academy of Sciences, Beijing, China
  5. Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA