
A robust formulation for twin multiclass support vector machine

Published in Applied Intelligence.

Abstract

Multiclass classification is an important task in pattern analysis, and numerous algorithms have been devised to predict nominal variables with multiple levels accurately. In this paper, a novel support vector machine method for twin multiclass classification is presented. The main contribution is the use of second-order cone programming as a robust setting for twin multiclass classification, in which the training patterns are represented by ellipsoids instead of reduced convex hulls. A linear formulation is derived first, and a kernel-based method is then constructed for nonlinear classification. Experiments on benchmark multiclass datasets demonstrate the predictive advantages of our approach.


Notes

  1. Recall that an SOC constraint on a variable \(\mathbf{x}\in \Re^{n}\) has the form \(\|D\mathbf{x}+\mathbf{b}\|\le \mathbf{c}^{\top}\mathbf{x}+d\), where \(d\in \Re\), \(\mathbf{c}\in \Re^{n}\), \(\mathbf{b}\in \Re^{m}\), and \(D\in \Re^{m\times n}\) are given.
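As a numerical illustration of this footnote, the sketch below checks an SOC constraint pointwise; the matrices \(D\), \(\mathbf{b}\), \(\mathbf{c}\), and \(d\) are made-up stand-ins, not data from the paper:

```python
import numpy as np

def soc_satisfied(x, D, b, c, d):
    """Check the second-order cone constraint ||D x + b|| <= c^T x + d."""
    return bool(np.linalg.norm(D @ x + b) <= c @ x + d)

# Hypothetical data with m = 2, n = 3
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.zeros(2)
c = np.array([0.0, 0.0, 1.0])
d = 0.0

# For x = (3, 4, 5): ||(3, 4)|| = 5 <= 5, so the constraint holds.
print(soc_satisfied(np.array([3.0, 4.0, 5.0]), D, b, c, d))  # True
# For x = (3, 4, 4): ||(3, 4)|| = 5 > 4, so it is violated.
print(soc_satisfied(np.array([3.0, 4.0, 4.0]), D, b, c, d))  # False
```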

References

  1. Agarwal S, Tomar D (2014) A feature selection based model for software defect prediction. Int J Adv Sci Technol 65:39–58


  2. Angulo C, Parra X, Catal A (2003) K-SVCR: a support vector machine for multi-class classification. Neurocomputing 55:57–77


  3. Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  4. Bosch P, López J, Ramírez H, Robotham H (2013) Support vector machine under uncertainty: An application for hydroacoustic classification of fish-schools in Chile. Expert Syst Appl 40(10):4029–4034


  5. Bottou L, Cortes C, Denker J, Drucker H, Guyon I, Jackel L, LeCun Y, Muller U, Sackinger E, Simard P, Vapnik V (1994) Comparison of classifier methods: a case study in handwritten digit recognition. Proc Int Conf Pattern Recog 2:77–82


  6. Bredensteiner E J, Bennett K P (1999) Multicategory classification by support vector machines. Comput Optim Appl 12:53–79


  7. Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C (2011) DBSMOTE: density-based synthetic minority over-sampling technique. Appl Intell 36:1–21


  8. Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm


  9. Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292


  10. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  11. Dinkelbach W (1967) On nonlinear fractional programming. Manag Sci 13:492–498


  12. Djuric N, Lan L, Vucetic S, Wang Z (2013) BudgetedSVM: a toolbox for scalable SVM approximations. J Mach Learn Res 14:3813–3817


  13. Friedman J (1996) Another approach to polychotomous classification. Tech. rep., Department of Statistics, Stanford University, http://www-stat.stanford.edu/~jhf/ftp/poly.ps.Z

  14. Geng X, Zhan D C, Zhou Z H (2005) Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Trans Syst Man Cybern Part B: Cybern 35(6):1098–1107


  15. Goldfarb D, Iyengar G (2003) Robust convex quadratically constrained programs. Math Program 97 (3):495–515


  16. Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70


  17. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910


  18. Kim Y J, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43


  19. Kressel UG (1999) Pairwise classification and support vector machines. In: Advances in kernel methods. MIT Press, Cambridge, USA, pp 255–268


  20. Lanckriet G, Ghaoui L, Bhattacharyya C, Jordan M (2003) A robust minimax approach to classification. J Mach Learn Res 3:555–582


  21. Le Thi H, Pham Dinh T, Thiao M (2016) Efficient approaches for ℓ2-ℓ0 regularization and applications to feature selection in SVM. Appl Intell 45(2):549–565


  22. López J, Maldonado S (2016) Multi-class second-order cone programming support vector machines. Inform Sci 330:328–341


  23. López J, Maldonado S, Carrasco M (2016) A novel multi-class svm model using second-order cone constraints. Appl Intell 44(2):457–469


  24. Maldonado S, López J (2014) Imbalanced data classification using second-order cone programming support vector machines. Pattern Recogn 47:2070–2079


  25. Maldonado S, López J, Carrasco M (2016) A second-order cone programming formulation for twin support vector machines. Appl Intell 45(2):265–276


  26. Mangasarian OL (1994) Nonlinear programming. Classics in Applied Mathematics, Society for Industrial and Applied Mathematics

  27. Mercer J (1909) Functions of positive and negative type, and their connection with the theory of integral equations. Philos Trans R Soc Lond 209:415–446


  28. Nath S, Bhattacharyya C (2007) Maximum margin classifiers with specified false positive and false negative error rates. In: Proceedings of the SIAM international conference on data mining

  29. Qi Z, Tian Y, Shi Y (2013) Robust twin support vector machine for pattern classification. Pattern Recogn 46(1):305–316


  30. Sánchez-Morillo D, López-Gordo M, León A (2014) Novel multiclass classification for home-based diagnosis of sleep apnea hypopnea syndrome. Expert Syst Appl 41(4):1654–1662


  31. Schölkopf B, Smola AJ (2002) Learning with kernels. MIT Press

  32. Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machines. IEEE Trans Neural Netw 22(6):962–968


  33. Sturm J (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11(12):625–653. Special issue on Interior Point Methods (CD supplement with software)


  34. Tomar D, Agarwal S (2014) Feature selection based least square twin support vector machine for diagnosis of heart disease. Int J Bio-Sci Bio-Technol 6(2):69–82


  35. Vapnik V (1998) Statistical learning theory. Wiley

  36. Wang Z, Crammer K, Vucetic S (2010) Multi-class pegasos on a budget. In: Proceedings of the 27th international conference on machine learning (ICML-10). Omnipress, pp 1143–1150

  37. Wang Z, Djuric N, Crammer K, Vucetic S (2011) Trading representability for scalability: adaptive multi-hyperplane machine for nonlinear classification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 24–32

  38. Wang Z, Crammer K, Vucetic S (2012) Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training. J Mach Learn Res 13:3103–3131


  39. Weston J, Watkins C (1999) Multi-class support vector machines. In: Proceedings of the Seventh European symposium on artificial neural networks

  40. Weston J, Elisseeff A, BakIr G, Sinz F (2005) The spider machine learning toolbox. http://www.kyb.tuebingen.mpg.de/bs/people/spider/

  41. Xie J, Hone K, Xie W, Gao X, Shi Y, Liu X (2013) Extending twin support vector machine classifier for multi-category classification problems. Intell Data Anal 17(4):649–664


  42. Xu Y, Guo R, Wang L (2013) A twin multi-class classification support vector machines. Cogn Comput 5:580–588


  43. Yang H Y, Wang X Y, Niu P P, Liu Y C (2014) Image denoising using nonsubsampled shearlet transform and twin support vector machines. Neural Netw 57:152–165


  44. Yang Z, Shao Y, Zhang X (2013) Multiple birth support vector machine for multi-class classification. Neural Comput Appl 22:S153–S161


  45. Zeng M, Yang Y, Zheng J, Cheng J (2015) Maximum margin classification based on flexible convex hulls. Neurocomputing 149(B):957–965


  46. Zhong P, Fukushima M (2007) Second-order cone programming formulations for robust multiclass classification. Neural Comput 19:258–282



Acknowledgments

The first author was supported by FONDECYT project 1160894, the second author by FONDECYT projects 1140831 and 1160738, and the third author by FONDECYT project 1130905. This research was partially funded by the Complex Engineering Systems Institute, ISCI (ICM-FIC: P05-004-F, CONICYT: FB0816).

Author information


Corresponding author

Correspondence to Sebastián Maldonado.

Appendix: Dual formulation of Twin-KSOCP and geometric interpretation

1.1 Proof of Proposition 1

The Lagrangian function associated with Problem (16) is given by

$$\begin{array}{@{}rcl@{}} L(\mathbf{w}_{1},b_{1},\lambda_{1},\lambda_{2})&=&\frac{1}{2}\left\| A\mathbf{w}_{1}+\mathbf{e}_{1}b_{1} \right\|^{2}+\frac{\theta_{1}}{2}(\|\mathbf{w}_{1}\|^{2}+{b_{1}^{2}})\\ &&+\lambda_{1}(\mathbf{w}_{1}^{\top} {\boldsymbol{\mu}}_{2}+b_{1}+1+\kappa_{1}\|S_{2}^{\top}\mathbf{w}_{1}\|)\\ &&+\lambda_{2}(\mathbf{w}_{1}^{\top} {\boldsymbol{\mu}}_{3}\,+\,b_{1}\,+\,1\!-\epsilon\,+\,\kappa_{2}\|S_{3}^{\top}\mathbf{w}_{1}\|), \end{array} $$

where \(\lambda_{1},\lambda_{2}\ge 0\). Since \(\|\mathbf{v}\|=\max_{\|\mathbf{u}\|\le 1}\mathbf{u}^{\top}\mathbf{v}\) holds for any \(\mathbf{v}\in \Re^{n}\), we can rewrite the Lagrangian as follows:

$$\begin{array}{@{}rcl@{}} L(\mathbf{w}_{1},b_{1},\lambda_{1},\lambda_{2})&=&\max_{\mathbf{u}}\{L_{1}(\mathbf{w}_{1},b_{1},\lambda_{1},\lambda_{2},\mathbf{u}_{1},\mathbf{u}_{2}):\|\mathbf{u}_{i}\|\\&\le& 1,\ i=1,2\}, \end{array} $$

with L 1 given by

$$ \begin{array}{llllll} L_{1}(\mathbf{w}_{1},b_{1},\lambda_{1},\lambda_{2},\mathbf{u}_{1},\mathbf{u}_{2})=\frac{1}{2}\left\| A\mathbf{w}_{1}\!+\mathbf{e}_{1}b_{1} \right\|^{2}+\frac{\theta_{1}}{2}(\|\mathbf{w}_{1}\|^{2}+{b_{1}^{2}})\\+\lambda_{1}(\mathbf{w}_{1}^{\top} {\boldsymbol{\mu}}_{2}+b_{1}+1+\kappa_{1}\mathbf{w}_{1}^{\top} S_{2}\mathbf{u}_{1})\\ +\lambda_{2}(\mathbf{w}_{1}^{\top} {\boldsymbol{\mu}}_{3}+b_{1}\!+1\!-\epsilon+\kappa_{2}\mathbf{w}_{1}^{\top} S_{3}\mathbf{u}_{2}) . \end{array} $$
(A.30)

Thus, Problem (16) can be equivalently written as

$$\begin{array}{@{}rcl@{}} \min_{\mathbf{w}_{1},b_{1} }\max_{\mathbf{u}_{1},\mathbf{u}_{2},\lambda_{1},\lambda_{2}}\{L_{1}(\mathbf{w}_{1},b_{1},\lambda_{1},\lambda_{2},\mathbf{u}_{1},\mathbf{u}_{2}):\|\mathbf{u}_{i}\|\\\le 1,\lambda_{i}\ge0,\, i=1,2\}. \end{array} $$

Hence, the dual problem of (16) is given by

$$\begin{array}{@{}rcl@{}} \max_{\mathbf{u}_{1},\mathbf{u}_{2},\lambda_{1},\lambda_{2}}\min_{\mathbf{w}_{1},b_{1} }\{L_{1}(\mathbf{w}_{1},b_{1},\lambda_{1},\lambda_{2},\mathbf{u}_{1},\mathbf{u}_{2}):\|\mathbf{u}_{i}\|\\\le 1,\lambda_{i}\ge0,\, i=1,2\}. \end{array} $$
(A.31)

The above expression allows the construction of the dual formulation; a detailed description of this procedure can be found in [26]. The first-order optimality conditions for the inner optimization task (the minimization problem) yield

$$\begin{array}{@{}rcl@{}} \nabla_{\mathbf{w}_{1}}L_{1}\!&=&\!A^{\top}(A\mathbf{w}_{1}\,+\,\mathbf{e}_{1}b_{1})\,+\,\theta_{1}\mathbf{w}_{1}\!\,+\,\lambda_{1}({\boldsymbol{\mu}}_{2}\,+\,\kappa_{1} S_{2}\mathbf{u}_{1})\\&&+\lambda_{2}({\boldsymbol{\mu}}_{3}\,+\,\kappa_{2} S_{3}\mathbf{u}_{2})\,=\,0,\qquad \end{array} $$
(A.32)
$$\begin{array}{@{}rcl@{}} \nabla_{b_{1}}L_{1}\!&=&\!\mathbf{e}_{1}^{\top}(A\mathbf{w}_{1}+\mathbf{e}_{1}b_{1})+\theta_{1}b_{1}+\lambda_{1}+\lambda_{2}=0. \end{array} $$
(A.33)

Let us denote \( \hat{\mathbf{z}}_{1}=[\mathbf{z}_{1};1],\ \hat{\mathbf{z}}_{2}=[\mathbf{z}_{2};1]\in \Re^{n+1}, \) with \(\mathbf{z}_{1}={\boldsymbol{\mu}}_{2}+\kappa_{1}S_{2}\mathbf{u}_{1}\in \Re^{n}\) and \(\mathbf{z}_{2}={\boldsymbol{\mu}}_{3}+\kappa_{2}S_{3}\mathbf{u}_{2}\in \Re^{n}\). Then relations (A.32)–(A.33) can be written compactly as

$$(H^{\top} H+\theta_{1}I) \mathbf{v}_{1}+\lambda_{1}\hat{\mathbf{z}}_{1}+\lambda_{2}\hat{\mathbf{z}}_{2}=0, $$

where \(\mathbf{v}_{1}=[\mathbf{w}_{1};b_{1}]\) and \(H=[A\ \ \mathbf{e}_{1}]\). Since the symmetric matrix \(\hat{H}=H^{\top} H+\theta_{1}I \in \Re^{(n+1)\times (n+1)} \) is positive definite for any \(\theta_{1}>0\), the following relation can be obtained:

$$ \mathbf{v}_{1}=-\hat{H}^{-1}(\lambda_{1}\hat{\mathbf{z}}_{1}+\lambda_{2}\hat{\mathbf{z}}_{2}). $$
(A.34)
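Relation (A.34) can be evaluated numerically. The sketch below (with arbitrary stand-in data for \(A\), the multipliers, and \(\hat{\mathbf{z}}_{1},\hat{\mathbf{z}}_{2}\); the sizes and the value of \(\theta_{1}\) are assumptions for illustration) forms \(\hat{H}=H^{\top}H+\theta_{1}I\) and recovers \(\mathbf{v}_{1}\) via a linear solve rather than an explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3                        # hypothetical sizes
A = rng.standard_normal((m, n))    # stand-in class-1 pattern matrix
e1 = np.ones(m)
theta1 = 0.5                       # assumed regularization value

H = np.column_stack([A, e1])                  # H = [A, e1]
H_hat = H.T @ H + theta1 * np.eye(n + 1)      # positive definite for theta1 > 0

# Stand-in values for z_hat_1, z_hat_2 (last component fixed to 1) and multipliers
z1_hat = np.append(rng.standard_normal(n), 1.0)
z2_hat = np.append(rng.standard_normal(n), 1.0)
lam1, lam2 = 0.3, 0.7

# v1 = -H_hat^{-1} (lam1 * z1_hat + lam2 * z2_hat), computed via a solve
v1 = -np.linalg.solve(H_hat, lam1 * z1_hat + lam2 * z2_hat)
w1, b1 = v1[:n], v1[-1]

# Sanity check: v1 satisfies the stationarity conditions (A.32)-(A.33)
residual = H_hat @ v1 + lam1 * z1_hat + lam2 * z2_hat
print(np.allclose(residual, 0.0))  # True
```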

Then, by replacing (A.32)–(A.33) in (A.30), and using the relations (18) and (A.34), the dual problem can be stated as follows:

$$ \begin{array}{llllll} \max_{\mathbf{z}_{i},\mathbf{u}_{i},\lambda_{i} } & \ \lambda_{1}+\lambda_{2}(1-\epsilon)-\frac{1}{2}(\lambda_{1}\hat{\mathbf{z}}_{1}+\lambda_{2}\hat{\mathbf{z}}_{2})^{\top} \hat{H}^{-1}(\lambda_{1}\hat{\mathbf{z}}_{1}+\lambda_{2}\hat{\mathbf{z}}_{2}) \\ \text{s.t.}\, &\, \mathbf{z}_{1}= {\boldsymbol{\mu}}_{2}+\kappa_{1} S_{2}\mathbf{u}_{1}, \ \|\mathbf{u}_{1}\|\le1, \\ &\, \mathbf{z}_{2}= {\boldsymbol{\mu}}_{3}+\kappa_{2} S_{3}\mathbf{u}_{2}, \ \|\mathbf{u}_{2}\|\le1, \\ &\, \lambda_{1},\lambda_{2}\ge0. \end{array} $$
(A.35)

Notice that the Hessian of the objective function of the above problem with respect to \(\lambda=[\lambda_{1};\lambda_{2}]\in \Re^{2}\) is \(-H_{z}\), where

$${H_{z}}=\left( \begin{array}{cc} \hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{1}&\hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2} \\ \hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2}& \hat{\mathbf{z}}_{2}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2} \end{array}\right). $$

This matrix is symmetric and positive semidefinite; it is positive definite whenever \(\hat{\mathbf{z}}_{1}\) and \(\hat{\mathbf{z}}_{2}\) are linearly independent. Then the objective function of the dual problem (A.35) is strictly concave with respect to \(\lambda\), and it attains its maximum value at the solution of the following linear system:

$$H_{z}\left( \begin{array}{c} \lambda_{1}^{*}\\ \lambda_{2}^{*} \end{array}\right)=\left( \begin{array}{c} 1\\ 1-\epsilon \end{array}\right). $$

This linear system has the following solution:

$$\begin{array}{@{}rcl@{}} \lambda_{1}^{*}&=&\frac{\hat{\mathbf{z}}_{2}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2}-(1-\epsilon)\hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2}}{\det(H_{z})}, \\\ \lambda_{2}^{*}&=&\frac{(1-\epsilon)\hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{1}-\hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2}}{\det(H_{z})}. \end{array} $$
(A.36)
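Provided \(H_{z}\) is nonsingular, the closed form (A.36) agrees with a direct solve of the 2×2 system. A small numerical sketch (the dimension, \(\epsilon\), and all vectors/matrices are arbitrary stand-ins chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 4, 0.1                    # hypothetical dimension and epsilon

# A random symmetric positive definite H_hat (stand-in for H^T H + theta_1 I)
M = rng.standard_normal((n + 1, n + 1))
H_hat = M @ M.T + np.eye(n + 1)

# Stand-in vectors z_hat_1, z_hat_2 (last component fixed to 1)
z1_hat = np.append(rng.standard_normal(n), 1.0)
z2_hat = np.append(rng.standard_normal(n), 1.0)

# Entries of H_z from the quadratic forms with H_hat^{-1}
H_inv = np.linalg.inv(H_hat)
b11 = z1_hat @ H_inv @ z1_hat
b12 = z1_hat @ H_inv @ z2_hat
b22 = z2_hat @ H_inv @ z2_hat
H_z = np.array([[b11, b12], [b12, b22]])

# Direct solve of H_z lambda = (1, 1 - eps)
lam_direct = np.linalg.solve(H_z, np.array([1.0, 1.0 - eps]))

# Closed form (A.36)
det = b11 * b22 - b12 ** 2
lam_closed = np.array([(b22 - (1.0 - eps) * b12) / det,
                       ((1.0 - eps) * b11 - b12) / det])

print(np.allclose(lam_direct, lam_closed))  # True
```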

Thus, the optimal value of Problem (A.35) (with respect to λ) is given by

$$ \frac{1}{2}(1\quad 1-\epsilon)(H_{z})^{-1} \left( \begin{array}{c} 1\\ 1-\epsilon \end{array}\right), $$
(A.37)

where

$$(H_{z})^{-1}=\frac{1}{\det(H_{z})}\left( \begin{array}{cc} \hat{\mathbf{z}}_{2}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2}&-\hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2} \\ -\hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{2}& \hat{\mathbf{z}}_{1}^{\top}\hat{H}^{-1}\hat{\mathbf{z}}_{1} \end{array}\right). $$

Then, the dual problem of (16) can be stated as follows:

$$ \begin{array}{llllll} \max_{\mathbf{z}_{i},\mathbf{u}_{i} } & \ \frac{1}{2}\frac{\|\hat{H}^{-1/2}(\hat{\mathbf{z}}_{2}-(1-\epsilon)\hat{\mathbf{z}}_{1})\|^{2}}{(\|\hat{H}^{-1/2}\hat{\mathbf{z}}_{1}\|\|\hat{H}^{-1/2}\hat{\mathbf{z}}_{2}\|)^{2}-(\hat{\mathbf{z}}_{1}^{\top} \hat{H}^{-1}\hat{\mathbf{z}}_{2})^{2}}\\ \text{s.t.}\, &\, \mathbf{z}_{1}\in \mathbf{B}({\boldsymbol{\mu}}_{2},S_{2},\kappa_{1}),\quad \mathbf{z}_{2}\in \mathbf{B}({\boldsymbol{\mu}}_{3},S_{3},\kappa_{2}), \end{array} $$
(A.38)

where

$$ \mathbf{B}({\boldsymbol{\mu}},S,\kappa)=\{\mathbf{z}:{\mathbf{z}}={\boldsymbol{\mu}}+\kappa S\mathbf{u},\|\mathbf{u}\|\le1\}. $$
(A.39)
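When the scaling matrix \(S\) in (A.39) is square and nonsingular (e.g. a Cholesky factor of a covariance matrix), membership in the ellipsoid \(\mathbf{B}({\boldsymbol{\mu}},S,\kappa)\) reduces to checking \(\|S^{-1}(\mathbf{z}-{\boldsymbol{\mu}})\|\le\kappa\). A minimal sketch under that assumption, with made-up numbers:

```python
import numpy as np

def in_ellipsoid(z, mu, S, kappa):
    """Membership test for B(mu, S, kappa) = {mu + kappa * S u : ||u|| <= 1},
    assuming S is square and nonsingular."""
    u = np.linalg.solve(S, z - mu) / kappa     # recover u from z = mu + kappa * S u
    return bool(np.linalg.norm(u) <= 1.0 + 1e-12)

# Hypothetical 2-D ellipsoid: center (1, 2), semi-axes 2 and 1, kappa = 1
mu = np.array([1.0, 2.0])
S = np.array([[2.0, 0.0], [0.0, 1.0]])
kappa = 1.0

print(in_ellipsoid(mu, mu, S, kappa))                         # True: the center
print(in_ellipsoid(mu + np.array([2.0, 0.0]), mu, S, kappa))  # True: boundary point mu + kappa*S*[1;0]
print(in_ellipsoid(mu + np.array([0.0, 2.0]), mu, S, kappa))  # False: outside along the short axis
```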

Similarly, since the symmetric matrix \(\hat{G}=G^{\top} G+\theta_{2}I\) is positive definite for any \(\theta_{2}>0\), we can show that the dual of Problem (17) is given by

$$ \begin{array}{llllll} \max_{\mathbf{p}_{i},\mathbf{u}_{i} } & \ \frac{1}{2}\frac{\|\hat{G}^{-1/2}(\hat{\mathbf{p}}_{2}-(1-\epsilon)\hat{\mathbf{p}}_{1})\|^{2}}{(\|\hat{G}^{-1/2}\hat{\mathbf{p}}_{1}\|\|\hat{G}^{-1/2}\hat{\mathbf{p}}_{2}\|)^{2}-(\hat{\mathbf{p}}_{1}^{\top} \hat{G}^{-1}\hat{\mathbf{p}}_{2})^{2}}\\ \text{s.t.}\, &\, \mathbf{p}_{1}\in \mathbf{B}({\boldsymbol{\mu}}_{1},S_{1},\kappa_{3}),\quad \mathbf{p}_{2}\in \mathbf{B}({\boldsymbol{\mu}}_{3},S_{3},\kappa_{4}), \end{array} $$
(A.40)

where \(\hat {\mathbf {p}}_{i}=[\mathbf {p}_{i};1]\in \Re ^{n+1}\), for i = 1, 2.


Cite this article

López, J., Maldonado, S. & Carrasco, M. A robust formulation for twin multiclass support vector machine. Appl Intell 47, 1031–1043 (2017). https://doi.org/10.1007/s10489-017-0943-y
