Abstract
Keeping the number of support vectors small is important for SVMs to handle very large scale problems quickly. This paper fits each class of data with a plane via a new model that captures the separability information between classes and can be solved by fast core set methods. Training on the core sets of the fitting planes then yields a very sparse SVM classifier. The computational complexity of the proposed algorithm is upper bounded by \( \mathrm{O}(1/\varepsilon) \). Experimental results show that, on average, the new algorithm trains faster than both CVM and SVMperf, with comparable generalization performance.
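For context, the "fast core set methods" referenced above are of the Badoiu–Clarkson farthest-point family (see Badoiu and Clarkson 2008 in the references). The sketch below is a minimal Euclidean version of that generic loop, not the paper's center-constrained feature-space model; the name meb_coreset and the parameters eps and max_iter are illustrative.

import numpy as np

def meb_coreset(X, eps=0.1, max_iter=1000):
    """Farthest-point core-set loop for an approximate minimum enclosing ball.

    A minimal Euclidean sketch of the generic Badoiu-Clarkson iteration
    (Badoiu and Clarkson 2008) that core-set SVM solvers such as CVM build
    on; it is NOT the paper's center-constrained feature-space model.
    """
    c = X[0].astype(float)              # current center estimate
    core = [0]                          # indices of the core set
    for t in range(1, max_iter + 1):
        d = np.linalg.norm(X - c, axis=1)
        r = d[core].max()               # radius of the current core-set ball
        far = int(d.argmax())           # farthest point from the center
        if d[far] <= (1.0 + eps) * r:
            break                       # every point fits the (1+eps)-ball
        core.append(far)                # grow the core set ...
        c += (X[far] - c) / (t + 1.0)   # ... and nudge the center toward it
    return c, core

For example, meb_coreset(np.random.randn(10000, 5), eps=0.05) returns a center and a core set whose size is governed by eps rather than by the number of points; complexity bounds of the \( \mathrm{O}(1/\varepsilon) \) flavor quoted in the abstract stem from this independence.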
References
Bach FR, Jordan MI (2005) Predictive low-rank decomposition for kernel methods. In: Proceedings of the 22nd international conference on machine learning (ICML 2005), Bonn. ACM, pp 33–40
Badoiu M, Clarkson KL (2008) Optimal core-sets for balls. Comput Geom Theory Appl 40(1):14–22. doi:10.1016/j.comgeo.2007.04.002
Burges CJC (1996) Simplified support vector decision rules. In: Proceedings of the 13th international conference on machine learning (ICML 1996), p 7
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Downs T, Gates KE, Masters A (2002) Exact simplification of support vector solutions. J Mach Learn Res 2(2):293–297. doi:10.1162/15324430260185637
Fan R-E, Chen P-H, Lin C-J (2005) Working set selection using second order information for training support vector machines. J Mach Learn Res 6:1889–1918
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910. doi:10.1109/tpami.2007.1068
Joachims T (1998) Making large scale SVM learning practical. In: Advances in kernel methods—support vector learning. MIT Press, Cambridge
Joachims T, Yu CNJ (2009) Sparse kernel SVMs via cutting-plane training. Mach Learn 76(2–3):179–193. doi:10.1007/s10994-009-5126-6
Keerthi SS, Chapelle O, DeCoste D (2006) Building support vector machines with reduced classifier complexity. J Mach Learn Res 7:1493–1515
Lee YJ, Huang SY (2007) Reduced support vector machines: a statistical theory. IEEE Trans Neural Netw 18(1):1–13. doi:10.1109/tnn.2006.883722
Liang X, Chen RC, Guo XY (2008) Pruning support vector machines without altering performances. IEEE Trans Neural Netw 19(10):1792–1803. doi:10.1109/tnn.2008.2002696
Jiao L, Bo L, Wang L (2007) Fast sparse approximation for least squares support vector machine. IEEE Trans Neural Netw 18(3):685–697
Lin KM, Lin CJ (2003) A study on reduced support vector machines. IEEE Trans Neural Netw 14(6):1449–1459. doi:10.1109/tnn.2003.820828
Peng XJ (2011) Building sparse twin support vector machine classifiers in primal space. Inf Sci 181(18):3967–3980. doi:10.1016/j.ins.2011.05.004
Smola A, Schölkopf B (2000) Sparse greedy matrix approximation for machine learning. In: Proceedings of the 17th international conference on machine learning (ICML 2000)
Sun P, Yao X (2010) Sparse approximation through boosting for learning large scale kernel machines. IEEE Trans Neural Netw 21(6):883–894. doi:10.1109/tnn.2010.2044244
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Tsang IWH, Kwok JTY, Zurada JM (2006) Generalized core vector machines. IEEE Trans Neural Netw 17(5):1126–1140. doi:10.1109/tnn.2006.878123
Wu M, Schölkopf B, Bakir G (2005) Building sparse large margin classifiers. In: Proceedings of the 22nd international conference on machine learning (ICML 2005), Bonn. ACM, pp 1001–1008
Khemchandani R, Karpatne A, Chandra S (2013) Twin support vector regression for the simultaneous learning of a function and its derivatives. Int J Mach Learn Cybern 4(1):51–63
Wang X, Lu S-X, Zhai J-H (2008) Fast fuzzy multi-category SVM based on support vector domain description. Int J Pattern Recognit Artif Intell 22(1):109–120
Acknowledgments
This work was supported by the Scientific Research Fund of the Sichuan Provincial Education Department under Grant No. 12ZA112 and the National Natural Science Foundation of China (No. 61202256).
Appendix
Proof of Theorem 1
From the definition of \( \bar{K} \), given any vector \( \bar{\alpha} = [\alpha^{T} \;\; \alpha^{*T}]^{T} \ne 0 \) and \( C' = m_{1}C_{n}/C \), one has
Because the matrix \( E_{AA} \) is positive semidefinite and \( I_{AA} \) is positive definite, if \( K_{AA} \) is positive semidefinite then \( \bar{K} \) is positive semidefinite. Furthermore, for the new learning algorithm, \( \alpha \ne -\alpha^{*} \), \( \alpha \ge 0 \), \( \alpha^{*} \ge 0 \), so \( \bar{\alpha}^{T}\bar{K}\bar{\alpha} \) is always positive. Thus \( \bar{K} \) is in fact strictly positive definite on the feasible set. \(\square\)
Proof of Theorem 2
For the CCMEB of (8), analogously to (10), one has
Combining it with (16) and the definition of Δ yields
From (13), (15), (20), (19) and the definition of \( \bar{K} \), the squared distance between the center and any point \( \bar{\varphi }(x_{j} ) \) is
For \( j < m_{1} \),
Likewise, for \( m_{1} \le j < 2m_{1} \),
So, for each point \( x_{i} \in CS_{P}^{t} \) with nonzero Lagrange multiplier, one has
if \( j < m_{1} \), then
if \( m_{1} \le j < 2m_{1} \), then
Thus the point \( x_{i} \in CS_{P}^{t} \) with nonzero Lagrange multiplier must lie outside the slab.
One can also prove, in the same way, that each point \( x_{i} \in CS_{P}^{t} \) on the ball \( B(c^{t}, r^{t}) \) with zero Lagrange multiplier must lie on the bounding planes of the slab, and that each point \( x_{i} \in CS_{P}^{t} \) inside the ball \( B(c^{t}, r^{t}) \) must lie inside the slab. Details are omitted to save space. \(\square\)
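Since the slab's defining inequality does not appear above, the following generic form for a plane-fitting model is an assumption made only for orientation (the symbols \( w \), \( b \), and \( \varphi \) are illustrative, not taken from the paper):

\[ S = \{\, x : -\Delta \le w^{T}\varphi(x) + b \le \Delta \,\}. \]

Read this way, Theorem 2 is a KKT-type complementarity statement: core-set points with nonzero multipliers lie outside \( S \), points on the ball \( B(c^{t}, r^{t}) \) with zero multipliers lie on the bounding planes of \( S \), and points inside the ball lie inside \( S \).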
Proof of Theorem 3
The proof is analogous to that of Theorem 2. \(\square\)
Proof of Theorem 4
From (6) and (7), we easily have \( \frac{1}{m_{1}}\xi_{+}^{T} e_{m_{1}} = C_{n} \). Hence the parameter \( C_{n} \) represents the average extent to which the points lie outside the slab. \(\square\)
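For concreteness, a hypothetical numeric instance (the slack values are illustrative, not from the paper): with \( m_{1} = 4 \) and \( \xi_{+} = (0,\; 0.2,\; 0,\; 0.6)^{T} \), one gets

\[ C_{n} = \tfrac{1}{m_{1}}\,\xi_{+}^{T} e_{m_{1}} = \tfrac{1}{4}(0 + 0.2 + 0 + 0.6) = 0.2 , \]

that is, \( C_{n} \) is simply the mean slack over the class, which is exactly the "average violation" reading of Theorem 4.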
Cite this article
Li, Z., Zhou, M., Lin, H. et al. A two stages sparse SVM training. Int. J. Mach. Learn. & Cyber. 5, 425–434 (2014). https://doi.org/10.1007/s13042-013-0181-5