Abstract
Multiclass classification is an important task in pattern analysis, and numerous algorithms have been devised to predict nominal variables with multiple levels accurately. In this paper, a novel support vector machine method for twin multiclass classification is presented. The main contribution is the use of second-order cone programming as a robust setting for twin multiclass classification, in which the training patterns are represented by ellipsoids instead of reduced convex hulls. A linear formulation is derived first, and a kernel-based method is then constructed for nonlinear classification. Experiments on benchmark multiclass datasets demonstrate the predictive advantages of our approach.
Notes
Recall that an SOC constraint on a variable x ∈ ℝ^n has the form ∥Dx + b∥ ≤ c⊤x + d, where D ∈ ℝ^{m×n}, b ∈ ℝ^m, c ∈ ℝ^n, and d ∈ ℝ are given.
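As a small numerical illustration of this definition (not part of the original formulation), the sketch below checks whether a point satisfies an SOC constraint; the matrices D, b, c, d are arbitrary example values, not data from the paper:

```python
import numpy as np

def in_soc(D, b, c, d, x):
    """Check the second-order cone constraint ||D x + b|| <= c^T x + d."""
    return np.linalg.norm(D @ x + b) <= c @ x + d

# Illustrative data (m = 2, n = 3); these values are only an example.
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.zeros(2)
c = np.array([0.0, 0.0, 1.0])
d = 0.0

# (0.3, 0.4, 1.0) satisfies the constraint: ||(0.3, 0.4)|| = 0.5 <= 1.0.
print(in_soc(D, b, c, d, np.array([0.3, 0.4, 1.0])))   # True
# (1.0, 1.0, 0.5) violates it: ||(1, 1)|| ~ 1.414 > 0.5.
print(in_soc(D, b, c, d, np.array([1.0, 1.0, 0.5])))   # False
```

With D = I and c⊤x + d picking out a single coordinate, this particular instance is the standard "ice-cream" cone, which is the geometric object that the ellipsoidal representation of the training patterns is optimized over.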
References
Agarwal S, Tomar D (2014) A feature selection based model for software defect prediction. Int J Adv Sci Technol 65:39–58
Angulo C, Parra X, Catal A (2003) K-SVCR: a support vector machine for multi-class classification. Neurocomputing 55:57–77
Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Bosch P, López J, Ramírez H, Robotham H (2013) Support vector machine under uncertainty: An application for hydroacoustic classification of fish-schools in Chile. Expert Syst Appl 40(10):4029–4034
Bottou L, Cortes C, Denker J, Drucker H, Guyon I, Jackel L, LeCun Y, Muller U, Sackinger E, Simard P, Vapnik V (1994) Comparison of classifier methods: a case study in handwritten digit recognition. Proc Int Conf Pattern Recog 2:77–82
Bredensteiner E J, Bennett K P (1999) Multicategory classification by support vector machines. Comput Optim Appl 12:53–79
Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C (2011) DBSMOTE: density-based synthetic minority over-sampling technique. Appl Intell 36:1–21
Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dinkelbach W (1967) On nonlinear fractional programming. Manag Sci 13:492–498
Djuric N, Lan L, Vucetic S, Wang Z (2013) BudgetedSVM: a toolbox for scalable SVM approximations. J Mach Learn Res 14:3813–3817
Friedman J (1996) Another approach to polychotomous classification. Tech. rep., Department of Statistics, Stanford University, http://www-stat.stanford.edu/~jhf/ftp/poly.ps.Z
Geng X, Zhan D C, Zhou Z H (2005) Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Trans Syst Man Cybern Part B: Cybern 35(6):1098–1107
Goldfarb D, Iyengar G (2003) Robust convex quadratically constrained programs. Math Program 97 (3):495–515
Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65–70
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Kim Y J, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43
Kressel U G (1999) Pairwise classification and support vector machines. In: Advances in kernel methods. MIT Press, Cambridge, USA, pp 255–268
Lanckriet G, Ghaoui L, Bhattacharyya C, Jordan M (2003) A robust minimax approach to classification. J Mach Learn Res 3:555–582
Le Thi H, Pham Dinh T, Thiao M (2016) Efficient approaches for ℓ2-ℓ0 regularization and applications to feature selection in SVM. Appl Intell 45(2):549–565
López J, Maldonado S (2016) Multi-class second-order cone programming support vector machines. Inform Sci 330:328–341
López J, Maldonado S, Carrasco M (2016) A novel multi-class svm model using second-order cone constraints. Appl Intell 44(2):457–469
Maldonado S, López J (2014) Imbalanced data classification using second-order cone programming support vector machines. Pattern Recogn 47:2070–2079
Maldonado S, López J, Carrasco M (2016) A second-order cone programming formulation for twin support vector machines. Appl Intell 45(2):265–276
Mangasarian O L (1994) Nonlinear programming. Classics in Applied Mathematics, Society for Industrial and Applied Mathematics
Mercer J (1909) Functions of positive and negative type, and their connection with the theory of integral equations. Philos Trans R Soc Lond 209:415–446
Nath S, Bhattacharyya C (2007) Maximum margin classifiers with specified false positive and false negative error rates. In: Proceedings of the SIAM international conference on data mining
Qi Z, Tian Y, Shi Y (2013) Robust twin support vector machine for pattern classification. Pattern Recogn 46(1):305–316
Sánchez-Morillo D, López-Gordo M, León A (2014) Novel multiclass classification for home-based diagnosis of sleep apnea hypopnea syndrome. Expert Syst Appl 41(4):1654–1662
Schölkopf B, Smola AJ (2002) Learning with kernels. MIT Press
Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machines. IEEE Trans Neural Netw 22(6):962–968
Sturm J (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim Methods Softw 11(1–4):625–653. Special issue on Interior Point Methods (CD supplement with software)
Tomar D, Agarwal S (2014) Feature selection based least square twin support vector machine for diagnosis of heart disease. Int J Bio-Sci Bio-Technol 6(2):69–82
Vapnik V (1998) Statistical learning theory. Wiley
Wang Z, Crammer K, Vucetic S (2010) Multi-class pegasos on a budget. In: Proceedings of the 27th international conference on machine learning (ICML-10). Omnipress, pp 1143–1150
Wang Z, Djuric N, Crammer K, Vucetic S (2011) Trading representability for scalability: adaptive multi-hyperplane machine for nonlinear classification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 24–32
Wang Z, Crammer K, Vucetic S (2012) Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training. J Mach Learn Res 13:3103–3131
Weston J, Watkins C (1999) Multi-class support vector machines. In: Proceedings of the Seventh European symposium on artificial neural networks
Weston J, Elisseeff A, BakIr G, Sinz F (2005) The spider machine learning toolbox. http://www.kyb.tuebingen.mpg.de/bs/people/spider/
Xie J, Hone K, Xie W, Gao X, Shi Y, Liu X (2013) Extending twin support vector machine classifier for multi-category classification problems. Intell Data Anal 17(4):649–664
Xu Y, Guo R, Wang L (2013) A twin multi-class classification support vector machines. Cogn Comput 5:580–588
Yang H Y, Wang X Y, Niu P P, Liu Y C (2014) Image denoising using nonsubsampled shearlet transform and twin support vector machines. Neural Netw 57:152–165
Yang Z, Shao Y, Zhang X (2013) Multiple birth support vector machine for multi-class classification. Neural Comput Appl 22:S153–S161
Zeng M, Yang Y, Zheng J, Cheng J (2015) Maximum margin classification based on flexible convex hulls. Neurocomputing 149(B):957–965
Zhong P, Fukushima M (2007) Second-order cone programming formulations for robust multiclass classification. Neural Comput 19:258–282
Acknowledgments
The first author was supported by FONDECYT project 1160894, the second was supported by FONDECYT projects 1140831 and 1160738, and the third author was supported by FONDECYT project 1130905. This research was partially funded by the Complex Engineering Systems Institute, ISCI (ICM-FIC: P05-004-F, CONICYT: FB0816).
Appendix: Dual formulation of Twin-KSOCP and geometric interpretation
1.1 Proof of Proposition 1
The Lagrangian function associated with Problem (16) is given by
where λ1, λ2 ≥ 0. Since ∥v∥ = max_{∥u∥≤1} u⊤v holds for any v ∈ ℝ^n, we can rewrite the Lagrangian as follows:
with L1 given by
Thus, Problem (16) can be equivalently written as
Hence, the dual problem of (16) is given by
The above expression allows the construction of the dual formulation; a detailed description of this procedure can be found in [26]. Computing the first-order conditions for the inner optimization task (the minimization problem) yields
Let us denote ẑ1 = [z1; 1], ẑ2 = [z2; 1] ∈ ℝ^{n+1}, with z1 = μ2 + κ1S2u1 ∈ ℝ^n and z2 = μ3 + κ2S3u2 ∈ ℝ^n. Then the relations (A.32)–(A.33) can be written compactly as
where v1 = [w1; b1] and H = [A e1]. Since the symmetric matrix Ĥ = H⊤H + θ1I ∈ ℝ^{(n+1)×(n+1)} is positive definite for any θ1 > 0, the following relation can be obtained:
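The positive definiteness of Ĥ is what guarantees that the relation above can be solved for v1. A minimal numerical sketch of this fact, using made-up stand-ins for the data matrix A and the ones vector e1 (sizes and values are illustrative only, not from the paper's experiments):

```python
import numpy as np

# Hypothetical stand-ins for the data matrix A and the vector e1,
# mirroring H = [A e1]; sizes and values are illustrative only.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
H = np.hstack([A, np.ones((20, 1))])

theta1 = 1e-3  # any theta1 > 0 works
H_hat = H.T @ H + theta1 * np.eye(H.shape[1])

# H^T H is positive semidefinite, so adding theta1 * I makes H_hat
# positive definite: every eigenvalue is strictly positive.
eigvals = np.linalg.eigvalsh(H_hat)
print(eigvals.min() > 0)  # True

# Consequently H_hat is invertible and admits a Cholesky factorization,
# which is what allows expressing v1 through H_hat^{-1} in the derivation.
L = np.linalg.cholesky(H_hat)
print(np.allclose(L @ L.T, H_hat))  # True
```

The regularization term θ1I plays the same role here as in the primal twin formulation: it removes the possible singularity of H⊤H without changing the geometry of the problem for small θ1.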
Then, by replacing (A.32)–(A.33) in (A.30) and using the relations (18) and (A.34), the dual problem can be stated as follows:
Notice that the Hessian of the objective function of the above problem with respect to λ = [λ1; λ2] ∈ ℝ^2 is given by
Clearly, this matrix is symmetric and positive definite. Hence, the objective function of the dual problem (A.35) is strictly concave with respect to λ, and it attains its maximum at the solution of the following linear system:
This linear system has the following solution:
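The step above is a standard one: for a strictly concave quadratic, the maximizer is obtained by solving the linear system given by the first-order condition. A hedged numerical sketch with made-up values for the 2×2 Hessian and the linear term (standing in for the actual dual quantities, which depend on the data):

```python
import numpy as np

# Illustrative strictly concave quadratic q(lam) = -0.5 lam^T Q lam + r^T lam,
# with Q symmetric positive definite; Q and r are made-up values standing in
# for the Hessian and linear term of the dual problem in lam = [lam1; lam2].
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
r = np.array([1.0, 0.2])

# The maximizer solves the 2x2 linear system Q lam = r (first-order condition).
lam = np.linalg.solve(Q, r)

# At the solution, the gradient r - Q lam vanishes, confirming optimality.
print(np.allclose(Q @ lam - r, 0.0))  # True
```

Because the system is only 2×2, the solution admits a closed form (Cramer's rule), which is how the explicit expression for λ in the appendix is obtained.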
Thus, the optimal value of Problem (A.35) (with respect to λ) is given by
where
Then, the dual problem of (16) can be stated as follows:
where
Similarly, since the symmetric matrix Ĝ = G⊤G + θ2I is positive definite for any θ2 > 0, we can show that the dual of Problem (17) is given by
where p̂i = [pi; 1] ∈ ℝ^{n+1}, for i = 1, 2.
López, J., Maldonado, S. & Carrasco, M. A robust formulation for twin multiclass support vector machine. Appl Intell 47, 1031–1043 (2017). https://doi.org/10.1007/s10489-017-0943-y