
A fast iterative algorithm for support vector data description

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Support vector data description (SVDD) is a well-known model for pattern analysis when only positive examples are reliable. SVDD is usually trained by solving a quadratic programming problem, which is time-consuming. This paper formulates the Lagrangian of a slightly modified SVDD model as a differentiable convex function over the nonnegative orthant. The resulting minimization problem can be solved by a simple iterative algorithm that is easy to implement and requires no particular optimization toolbox. Theoretical and experimental analyses show that the algorithm converges r-linearly to the unique minimum point. Extensive pattern-classification experiments show that, compared to the quadratic-programming-based SVDD (QP-SVDD), the proposed approach is far more computationally efficient (hundreds of times faster) while yielding similar performance in terms of the receiver operating characteristic curve. Furthermore, the proposed method and QP-SVDD extract almost the same set of support vectors.
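The paper's actual update rule (its Algorithm 1) appears only in the full text and is not reproduced on this page. As a loose illustration of the general idea only, the sketch below minimizes a convex quadratic over the nonnegative orthant with a projected fixed-point iteration of the form \(\varvec{\alpha } \leftarrow (\varvec{\alpha } - \gamma (\mathbf {Q}\varvec{\alpha } - \mathbf {q}))_+\). The names `Q`, `q`, `gamma`, the stopping rule, and the step-size condition are illustrative assumptions, not the paper's algorithm or notation; footnote 4 below suggests that the matrix arising in the paper has the form \(2\mathbf {K}+2\rho \mathbf {J}_n\), for which the generic `Q` here is merely a stand-in.

```python
import numpy as np

def nonneg_quadratic_min(Q, q, gamma, tol=1e-5, max_iter=100000):
    """Minimize 0.5 * a'Qa - q'a subject to a >= 0 (illustrative sketch only).

    Q should be symmetric positive semidefinite and gamma small enough
    (roughly gamma < 2 / lambda_max(Q)) for the iteration to converge.
    """
    alpha = np.zeros(len(q))
    for _ in range(max_iter):
        # Gradient step followed by projection onto the nonnegative orthant, i.e. (.)_+
        alpha_new = np.maximum(alpha - gamma * (Q @ alpha - q), 0.0)
        if np.linalg.norm(alpha_new - alpha, np.inf) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha
```

A fixed point \(\varvec{\alpha } = (\varvec{\alpha } - \gamma (\mathbf {Q}\varvec{\alpha } - \mathbf {q}))_+\) satisfies the complementarity (KKT) conditions \(\varvec{\alpha } \ge \mathbf {0}\), \(\mathbf {Q}\varvec{\alpha } - \mathbf {q} \ge \mathbf {0}\), and \(\varvec{\alpha }^\top (\mathbf {Q}\varvec{\alpha } - \mathbf {q}) = 0\); the orthogonality condition proved in the appendix at the end of this page is exactly the identity behind this kind of reformulation.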


Notes

  1. In the SVM and SVDD literature, the explicit form of the mapping \(\varPhi (\cdot )\) is not important; in fact, it is often difficult to write out \(\varPhi (\cdot )\) explicitly. What matters is the kernel function, that is, the inner product of \(\varPhi (\mathbf{x}_i)\) and \(\varPhi (\mathbf{x}_j)\) (a small illustration is given after these notes).

  2. We slightly abuse the notation here because \(L(\varvec{\alpha })\) was used in Eq. (3). However, this will not cause any confusion because all of the subsequent discussion is based on Eq. (7).

  3. For notational convenience, in this proof, we assume all the vectors are row vectors. Clearly, the result also applies to column vectors.

  4. To the best of our knowledge, there is no theoretical result regarding the dependence between the largest eigenvalue of matrix \(2\mathbf {K}+2\rho \mathbf {J}_n\) and the parameter \(\rho\).

  5. For instance, on the face detection problem in Sect. 4.2, to achieve the error tolerance of \(10^{-5}\), the algorithm based on Eq. (25) needs more than 20,000 iterations to converge.

  6. For example, for digit “0”, our computer spent 47.0655 s constructing the kernel matrix, but only 2.1216 s running Algorithm 1.
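
Footnotes 1 and 6 above both concern the kernel matrix \(\mathbf {K}\), whose \((i,j)\) entry is the inner product of \(\varPhi (\mathbf{x}_i)\) and \(\varPhi (\mathbf{x}_j)\), i.e., the kernel value \(k(\mathbf{x}_i, \mathbf{x}_j)\). As a small illustration (assuming a Gaussian/RBF kernel, a common choice in the SVDD literature; the function name and the bandwidth `sigma` are ours, not the paper's), such a matrix might be built as follows. As footnote 6 indicates, this construction can easily dominate the total training time once the iterative solver itself is fast.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma ** 2))."""
    sq_norms = np.sum(X ** 2, axis=1)                                   # ||x_i||^2 for each row of X
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))      # clip tiny negatives from round-off
```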


Acknowledgements

The author would like to thank the editors and four anonymous reviewers for their constructive suggestions which greatly helped improve the paper. This work was supported by a Summer Faculty Fellowship from Missouri State University.

Author information

Corresponding author

Correspondence to Songfeng Zheng.

Orthogonality condition for two nonnegative vectors

We show that two nonnegative vectors \(\mathbf {a}\) and \(\mathbf {b}\) are perpendicular if and only if \(\mathbf {a} = (\mathbf {a} - \gamma \mathbf {b})_+\) for any real \(\gamma>0\).

If two nonnegative real numbers a and b satisfy \(ab=0\), then at least one of a and b is 0. If \(a=0\) and \(b\ge 0\), then for any \(\gamma>0\), \(a-\gamma b\le 0\), so that \((a-\gamma b)_+=0=a\); if \(a>0\), we must have \(b=0\), so for any real \(\gamma>0\), \((a-\gamma b)_+ = (a)_+ = a\). In both cases, \(a=(a-\gamma b)_+\) for any real number \(\gamma>0\).

Conversely, assume that two nonnegative real numbers a and b satisfy \(a=(a-\gamma b)_+\) for any real number \(\gamma>0\). If a and b were both strictly positive, then \(a-\gamma b<a\) since \(\gamma>0\), and consequently \((a-\gamma b)_+<a\), which contradicts the assumption that \(a = (a-\gamma b)_+\). Thus at least one of a and b must be 0, i.e., \(ab=0\).

Now assume that the nonnegative vectors \(\mathbf {a}\) and \(\mathbf {b}\) in \(\mathbb {R}^p\) are perpendicular, that is, \(\sum _{i=1}^p a_ib_i=0\). Since \(a_i\) and \(b_i\) are both nonnegative, we must have \(a_ib_i=0\) for \(i=1,2,\ldots ,p\). By the preceding argument, this is equivalent to \(a_i = (a_i - \gamma b_i)_+\) for any \(\gamma>0\) and every \(i=1,2,\ldots ,p\), that is, in vector form, \(\mathbf {a} = (\mathbf {a} - \gamma \mathbf {b})_+\).
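
As a quick numerical sanity check of this equivalence (the vectors below are a made-up example, not taken from the paper):

```python
import numpy as np

# Nonnegative vectors with disjoint supports, so that a . b = 0.
a = np.array([0.0, 2.0, 0.0, 5.0])
b = np.array([3.0, 0.0, 1.0, 0.0])

for gamma in (0.1, 1.0, 10.0):
    assert np.allclose(a, np.maximum(a - gamma * b, 0.0))  # a = (a - gamma * b)_+ holds for every gamma > 0

# If the vectors are not orthogonal, the identity fails:
c = np.array([1.0, 2.0, 0.0, 5.0])  # c . b = 3, not 0
assert not np.allclose(c, np.maximum(c - 1.0 * b, 0.0))
```

This identity is the standard device for turning complementarity (KKT) conditions over the nonnegative orthant into a fixed-point equation that can be iterated, of the kind sketched after the abstract above.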

About this article

Cite this article

Zheng, S. A fast iterative algorithm for support vector data description. Int. J. Mach. Learn. & Cyber. 10, 1173–1187 (2019). https://doi.org/10.1007/s13042-018-0796-7
