
A fast iterative algorithm for support vector data description

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Support vector data description (SVDD) is a well-known model for pattern analysis when only positive examples are reliable. SVDD is usually trained by solving a quadratic programming problem, which is time-consuming. This paper formulates the Lagrangian of a slightly modified SVDD model as a differentiable convex function over the nonnegative orthant. The resulting minimization problem can be solved by a simple iterative algorithm that is easy to implement and requires no particular optimization toolbox. Theoretical and experimental analyses show that the algorithm converges r-linearly to the unique minimum point. Extensive pattern-classification experiments show that, compared to the quadratic-programming-based SVDD (QP-SVDD), the proposed approach is far more computationally efficient (hundreds of times faster) while yielding similar performance in terms of the receiver operating characteristic curve. Furthermore, the proposed method and QP-SVDD extract almost the same set of support vectors.
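The paper's actual update rule (its Algorithm 1) appears only in the full text and is not reproduced on this page. As a loose illustration of the general idea only, the sketch below minimizes a convex quadratic over the nonnegative orthant with a projected fixed-point iteration of the form \(\varvec{\alpha } \leftarrow (\varvec{\alpha } - \gamma (\mathbf {Q}\varvec{\alpha } - \mathbf {q}))_+\). The names `Q`, `q`, `gamma`, the stopping rule, and the step-size condition are illustrative assumptions, not the paper's algorithm or notation; footnote 4 below suggests that the matrix arising in the paper has the form \(2\mathbf {K}+2\rho \mathbf {J}_n\), for which the generic `Q` here is merely a stand-in.

```python
import numpy as np

def nonneg_quadratic_min(Q, q, gamma, tol=1e-5, max_iter=100000):
    """Minimize 0.5 * a'Qa - q'a subject to a >= 0 (illustrative sketch only).

    Q should be symmetric positive semidefinite and gamma small enough
    (roughly gamma < 2 / lambda_max(Q)) for the iteration to converge.
    """
    alpha = np.zeros(len(q))
    for _ in range(max_iter):
        # Gradient step followed by projection onto the nonnegative orthant, i.e. (.)_+
        alpha_new = np.maximum(alpha - gamma * (Q @ alpha - q), 0.0)
        if np.linalg.norm(alpha_new - alpha, np.inf) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha
```

A fixed point \(\varvec{\alpha } = (\varvec{\alpha } - \gamma (\mathbf {Q}\varvec{\alpha } - \mathbf {q}))_+\) satisfies the complementarity (KKT) conditions \(\varvec{\alpha } \ge \mathbf {0}\), \(\mathbf {Q}\varvec{\alpha } - \mathbf {q} \ge \mathbf {0}\), and \(\varvec{\alpha }^\top (\mathbf {Q}\varvec{\alpha } - \mathbf {q}) = 0\); the orthogonality condition proved in the appendix at the end of this page is exactly the identity behind this kind of reformulation.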


Notes

  1. In the SVM and SVDD literature, the explicit form of the mapping \(\varPhi (\cdot )\) is not important; in fact, it is often difficult to write out \(\varPhi (\cdot )\) explicitly. What matters is the kernel function, that is, the inner product of \(\varPhi (\mathbf{x}_i)\) and \(\varPhi (\mathbf{x}_j)\) (a small illustration is given after these notes).

  2. We slightly abuse the notation here because \(L(\varvec{\alpha })\) was used in Eq. (3). However, this will not cause any confusion because all of the subsequent discussion is based on Eq. (7).

  3. For notational convenience, in this proof, we assume all the vectors are row vectors. Clearly, the result also applies to column vectors.

  4. To the best of our knowledge, there is no theoretical result regarding the dependence between the largest eigenvalue of matrix \(2\mathbf {K}+2\rho \mathbf {J}_n\) and the parameter \(\rho\).

  5. For instance, on the face detection problem in Sect. 4.2, to achieve the error tolerance of \(10^{-5}\), the algorithm based on Eq. (25) needs more than 20,000 iterations to converge.

  6. For example, for digit “0”, our computer spent 47.0655 s constructing the kernel matrix, but only 2.1216 s running Algorithm 1.
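
Footnotes 1 and 6 above both concern the kernel matrix \(\mathbf {K}\), whose \((i,j)\) entry is the inner product of \(\varPhi (\mathbf{x}_i)\) and \(\varPhi (\mathbf{x}_j)\), i.e., the kernel value \(k(\mathbf{x}_i, \mathbf{x}_j)\). As a small illustration (assuming a Gaussian/RBF kernel, a common choice in the SVDD literature; the function name and the bandwidth `sigma` are ours, not the paper's), such a matrix might be built as follows. As footnote 6 indicates, this construction can easily dominate the total training time once the iterative solver itself is fast.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma ** 2))."""
    sq_norms = np.sum(X ** 2, axis=1)                                   # ||x_i||^2 for each row of X
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))      # clip tiny negatives from round-off
```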


Acknowledgements

The author would like to thank the editors and four anonymous reviewers for their constructive suggestions which greatly helped improve the paper. This work was supported by a Summer Faculty Fellowship from Missouri State University.

Author information

Corresponding author

Correspondence to Songfeng Zheng.

Orthogonality condition for two nonnegative vectors

We show that two nonnegative vectors \(\mathbf {a}\) and \(\mathbf {b}\) are perpendicular if and only if \(\mathbf {a} = (\mathbf {a} - \gamma \mathbf {b})_+\) for any real \(\gamma>0\).

If two nonnegative real numbers a and b satisfy \(ab=0\), then at least one of a and b is 0. If \(a=0\) and \(b\ge 0\), then for any \(\gamma>0\), \(a-\gamma b\le 0\), so that \((a-\gamma b)_+=0=a\); if \(a>0\), we must have \(b=0\), so for any real \(\gamma>0\), \((a-\gamma b)_+ = (a)_+ = a\). In both cases, \(a=(a-\gamma b)_+\) for any real number \(\gamma>0\).

Conversely, assume that two nonnegative real numbers a and b satisfy \(a=(a-\gamma b)_+\) for any real number \(\gamma>0\). If a and b were both strictly positive, then \(a-\gamma b<a\) since \(\gamma>0\), and consequently \((a-\gamma b)_+<a\), which contradicts the assumption that \(a = (a-\gamma b)_+\). Thus at least one of a and b must be 0, i.e., \(ab=0\).

Now assume that the nonnegative vectors \(\mathbf {a}\) and \(\mathbf {b}\) in \(\mathbb {R}^p\) are perpendicular, that is, \(\sum _{i=1}^p a_ib_i=0\). Since \(a_i\) and \(b_i\) are both nonnegative, we must have \(a_ib_i=0\) for \(i=1,2,\ldots ,p\). By the preceding argument, this is equivalent to \(a_i = (a_i - \gamma b_i)_+\) for any \(\gamma>0\) and every \(i=1,2,\ldots ,p\), that is, in vector form, \(\mathbf {a} = (\mathbf {a} - \gamma \mathbf {b})_+\).
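
As a quick numerical sanity check of this equivalence (the vectors below are a made-up example, not taken from the paper):

```python
import numpy as np

# Nonnegative vectors with disjoint supports, so that a . b = 0.
a = np.array([0.0, 2.0, 0.0, 5.0])
b = np.array([3.0, 0.0, 1.0, 0.0])

for gamma in (0.1, 1.0, 10.0):
    assert np.allclose(a, np.maximum(a - gamma * b, 0.0))  # a = (a - gamma * b)_+ holds for every gamma > 0

# If the vectors are not orthogonal, the identity fails:
c = np.array([1.0, 2.0, 0.0, 5.0])  # c . b = 3, not 0
assert not np.allclose(c, np.maximum(c - 1.0 * b, 0.0))
```

This identity is the standard device for turning complementarity (KKT) conditions over the nonnegative orthant into a fixed-point equation that can be iterated, of the kind sketched after the abstract above.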

About this article

Cite this article

Zheng, S. A fast iterative algorithm for support vector data description. Int. J. Mach. Learn. & Cyber. 10, 1173–1187 (2019). https://doi.org/10.1007/s13042-018-0796-7
