
Sparse least squares support vector training in the reduced empirical feature space

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

In this paper we discuss sparse least squares support vector machines (sparse LS SVMs) trained in the empirical feature space, which is spanned by the mapped training data. First, we show that the kernel associated with the empirical feature space gives the same value as the kernel associated with the feature space whenever one of its arguments is mapped into the empirical feature space by the mapping function associated with the feature space. Using this fact, we show that training and testing of kernel-based methods can be carried out in the empirical feature space, and that training an LS SVM in the empirical feature space reduces to solving a set of linear equations. We then derive sparse LS SVMs by restricting the training data to those that are linearly independent in the empirical feature space, selected by Cholesky factorization. The support vectors correspond to the selected training data, and they do not change even if the value of the margin parameter is changed. Thus, for linear kernels, the number of support vectors is at most the number of input variables. Computer experiments show that the number of support vectors can be reduced without deteriorating the generalization ability.
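The pipeline described in the abstract can be illustrated with a short sketch. The Python code below is not from the paper; the function names, the RBF kernel choice, and the parameters gamma, C, and tol are illustrative assumptions. It selects linearly independent training samples with a pivoted Cholesky factorization of the kernel matrix, uses the kernel columns of the selected samples as a reduced empirical-feature representation, and trains the LS SVM by solving the resulting set of linear equations. Note that the paper defines the empirical feature map via the factorization itself; using raw kernel columns here is a simplification that spans the same subspace.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T)
    return np.exp(-gamma * d2)

def select_independent(K, tol=1e-6):
    """Pivoted (incomplete) Cholesky factorization of the kernel matrix K.
    Returns indices of training samples whose images are numerically
    linearly independent in the empirical feature space."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()      # residual diagonal of K - L L^T
    L = np.zeros((n, n))
    selected = []
    for j in range(n):
        i = int(np.argmax(d))
        if d[i] <= tol:                       # remaining samples are dependent
            break
        selected.append(i)
        L[:, j] = (K[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(d[i])
        d -= L[:, j] ** 2
    return np.array(selected)

def train_sparse_lssvm(X, y, gamma=1.0, C=10.0, tol=1e-6):
    """Train an LS SVM in the reduced empirical feature space.

    X: (n, p) inputs, y: (n,) labels in {-1, +1}.
    Returns the support vectors (selected samples), weights w, and bias b."""
    K = rbf_kernel(X, X, gamma)
    S = select_independent(K, tol)            # support vector indices
    H = K[:, S]                               # reduced empirical features h(x_i)
    n, m = H.shape
    # LS SVM optimality conditions give one set of linear equations:
    # (H^T H + I/C) w + (H^T 1) b = H^T y,   (1^T H) w + n b = 1^T y
    A = np.zeros((m + 1, m + 1))
    A[:m, :m] = H.T @ H + np.eye(m) / C
    A[:m, m] = H.sum(axis=0)
    A[m, :m] = H.sum(axis=0)
    A[m, m] = n
    rhs = np.concatenate([H.T @ y, [y.sum()]])
    sol = np.linalg.solve(A, rhs)
    return X[S], sol[:m], sol[m]

def predict(support_vectors, w, b, X_test, gamma=1.0):
    """Classify test points with the trained sparse LS SVM."""
    return np.sign(rbf_kernel(X_test, support_vectors, gamma) @ w + b)
```

Because the selection depends only on the kernel matrix and the tolerance, the support set is unchanged when the margin parameter C is varied, which is the property highlighted in the abstract.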

Notes

  1. The method proposed in this paper can be regarded as a generalization of the method in [8].

  2. http://www.ida.first.fraunhofer.de/projects/bench/benchmarks.htm.

  3. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/.

References

  1. Burges CJC (1996) Simplified support vector decision rules. In: Saitta L (ed) Machine Learning, Proceedings of the 13th international conference (ICML ’96). Morgan Kaufmann, San Francisco, pp 71–77

  2. Tipping ME (2001) Sparse Bayesian learning and the relevance vector machine. J Mach Learn Res 1:211–244

  3. Chen S, Hong X, Harris CJ, Sharkey PM (2004) Sparse modelling using orthogonal forward regression with PRESS statistic and regularization. IEEE Trans Syst Man Cybern Part B 34(2):898–911

  4. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300

  5. Suykens JAK, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific Publishing, Singapore

  6. Vapnik VN (1998) Statistical learning theory. Wiley, New York

  7. Cawley GC, Talbot NLC (2002) Improved sparse least-squares support vector machines. Neurocomputing 48:1025–1031

  8. Valyon J, Horvath G (2004) A sparse least squares support vector machine classifier. In: Proceedings of international joint conference on neural networks (IJCNN 2004), vol 1. Budapest, Hungary, pp 543–548

  9. Abe S (2005) Support vector machines for pattern classification. Springer, London

  10. Xiong H, Swamy MNS, Ahmad MO (2005) Optimizing the kernel in the empirical feature space. IEEE Trans Neural Netw 16(2):460–474

  11. Kaieda K, Abe S (2004) KPCA-based training of a kernel fuzzy classifier with ellipsoidal regions. Int J Approx Reason 37(3):145–253

  12. Rätsch G, Onoda T, Müller K-R (2001) Soft margins for AdaBoost. Mach Learn 42(3):287–320

  13. Müller K-R, Mika S, Rätsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 12(2):181–201

  14. Abe S (2001) Pattern classification: neuro-fuzzy methods and their comparison. Springer, London

  15. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188

  16. Bezdek JC, Keller JM, Krishnapuram R, Kuncheva LI, Pal NR (1999) Will the real iris data please stand up? IEEE Trans Fuzzy Syst 7(3):368–369

  17. Takenaga H, Abe S, Takatoo M, Kayama M, Kitamura T, Okuyama Y (1991) Input layer optimization of neural networks by sensitivity analysis and its application to recognition of numerals. Electr Eng Jpn 111(4):130–138

  18. Weiss SM, Kapouleas I (1989) An empirical comparison of pattern recognition, neural nets, and machine learning classification methods. In: Proceedings of the 11th international joint conference on artificial intelligence. Detroit, pp 781–787

  19. Hashizume A, Motoike J, Yabe R (1998) Fully automated blood cell differential system and its application. In: Proceedings of the IUPAC 3rd international congress on automation and new technology in the clinical laboratory. Kobe, Japan, pp 297–302

  20. Lan M-S, Takenaga H, Abe S (1994) Character recognition using fuzzy rules extracted from data. In: Proceedings of the 3rd IEEE international conference on fuzzy systems, vol 1. Orlando, pp 415–420

Author information

Corresponding author

Correspondence to Shigeo Abe.

About this article

Cite this article

Abe, S. Sparse least squares support vector training in the reduced empirical feature space. Pattern Anal Applic 10, 203–214 (2007). https://doi.org/10.1007/s10044-007-0062-1
