Abstract
Support vector data description (SVDD) is a well-known model for pattern analysis when only positive examples are reliable. SVDD is usually trained by solving a quadratic programming problem, which is time-consuming. This paper formulates the Lagrangian of a simple modification of the SVDD model as a differentiable convex function over the nonnegative orthant. The resulting minimization problem can be solved by a simple iterative algorithm that is easy to implement and requires no particular optimization toolbox. Theoretical and experimental analyses show that the algorithm converges r-linearly to the unique minimum point. Extensive pattern classification experiments show that, compared to the quadratic-programming-based SVDD (QP-SVDD), the proposed approach is much more computationally efficient (hundreds of times faster) while yielding similar performance in terms of the receiver operating characteristic curve. Furthermore, the proposed method and QP-SVDD extract almost the same set of support vectors.
Notes
In SVM and SVDD literature, the explicit form of the function \(\varPhi (\cdot )\) is not important, and in fact, it is often difficult to write out \(\varPhi (\cdot )\) explicitly. What is important is the kernel function, that is, the inner product of \(\varPhi (\mathbf{x}_i)\) and \(\varPhi (\mathbf{x}_j)\).
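As a concrete illustration (the Gaussian kernel is one common choice in the SVDD literature; the function below is our own sketch, not the paper's code), the kernel matrix can be computed directly from the data without ever forming \(\varPhi (\cdot )\) explicitly:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / sigma^2), computed without
    ever constructing the (possibly infinite-dimensional) feature map."""
    sq = np.sum(X**2, axis=1)
    # Squared pairwise distances via ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>;
    # clip tiny negative values caused by floating-point round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / sigma**2)

X = np.random.default_rng(0).normal(size=(5, 3))
K = gaussian_kernel_matrix(X)
```

By construction, K is symmetric with unit diagonal, and only the inner products of the mapped points enter the training problem.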
For notational convenience, in this proof, we assume all the vectors are row vectors. Clearly, the result also applies to column vectors.
To the best of our knowledge, there is no theoretical result regarding the dependence between the largest eigenvalue of matrix \(2\mathbf {K}+2\rho \mathbf {J}_n\) and the parameter \(\rho\).
For example, for digit “0”, our computer spent 47.0655 s constructing the kernel matrix, but only 2.1216 s running Algorithm 1.
Acknowledgements
The author would like to thank the editors and four anonymous reviewers for their constructive suggestions which greatly helped improve the paper. This work was supported by a Summer Faculty Fellowship from Missouri State University.
Orthogonality condition for two nonnegative vectors
We show that two nonnegative vectors \(\mathbf {a}\) and \(\mathbf {b}\) are perpendicular if and only if \(\mathbf {a} = (\mathbf {a} - \gamma \mathbf {b})_+\) for every real \(\gamma>0\), where \((\cdot )_+\) denotes the componentwise positive part.
If two nonnegative real numbers a and b satisfy \(ab=0\), then at least one of a and b is 0. If \(a=0\) and \(b\ge 0\), then for any \(\gamma>0\) we have \(a-\gamma b\le 0\), so that \((a-\gamma b)_+=0=a\); if \(a>0\), we must have \(b=0\), and then for any real \(\gamma>0\), \((a-\gamma b)_+ = (a)_+ = a\). In both cases, \(a=(a-\gamma b)_+\) for any real number \(\gamma>0\).
Conversely, assume that two nonnegative real numbers a and b satisfy \(a=(a-\gamma b)_+\) for any real number \(\gamma>0\). If a and b were both strictly positive, then \(a-\gamma b<a\) since \(\gamma>0\); because \(a>0\), this gives \((a-\gamma b)_+<a\), which contradicts the assumption that \(a = (a-\gamma b)_+\). Thus at least one of a and b must be 0, i.e., \(ab=0\).
Now assume that the nonnegative vectors \(\mathbf {a}\) and \(\mathbf {b}\) in \(\mathbb {R}^p\) are perpendicular, that is, \(\sum _{i=1}^p a_ib_i=0\). Since each \(a_i\) and \(b_i\) is nonnegative, we must have \(a_ib_i=0\) for \(i=1,2,\ldots ,p\). By the preceding argument, this is equivalent to \(a_i = (a_i - \gamma b_i)_+\) for any \(\gamma>0\) and every \(i=1,2,\ldots ,p\); in vector form, \(\mathbf {a} = (\mathbf {a} - \gamma \mathbf {b})_+\).
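The equivalence can be checked numerically. This small sketch (NumPy; the example vectors are chosen purely for illustration) verifies that perpendicular nonnegative vectors satisfy \(\mathbf {a} = (\mathbf {a} - \gamma \mathbf {b})_+\) for several values of \(\gamma\), and that a pair with overlapping support does not:

```python
import numpy as np

def proj_plus(v):
    """Componentwise positive part (v)_+ = max(v, 0)."""
    return np.maximum(v, 0.0)

# Perpendicular nonnegative vectors: their supports are disjoint.
a = np.array([0.0, 2.0, 0.0, 1.5])
b = np.array([3.0, 0.0, 0.7, 0.0])
assert a @ b == 0.0  # <a, b> = 0

# The fixed-point identity holds for every positive gamma tried.
for gamma in (0.1, 1.0, 10.0):
    assert np.allclose(a, proj_plus(a - gamma * b))

# Counterexample: overlapping support, so <c, d> > 0 and the
# identity fails for gamma = 0.1.
c = np.array([1.0, 0.0])
d = np.array([0.5, 0.0])
assert not np.allclose(c, proj_plus(c - 0.1 * d))
```

This identity is what allows the complementarity (KKT) conditions of the modified SVDD dual to be rewritten as a fixed-point equation and solved by simple iteration.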
Cite this article
Zheng, S. A fast iterative algorithm for support vector data description. Int. J. Mach. Learn. & Cyber. 10, 1173–1187 (2019). https://doi.org/10.1007/s13042-018-0796-7