About the non-convex optimization problem induced by non-positive semidefinite kernel learning

Abstract

In recent years, kernel-based methods have proved very successful on many real-world learning problems. One of the main reasons for this success is their efficiency on large data sets, which results from the fact that kernel methods like support vector machines (SVMs) are based on a convex optimization problem. Solving a new learning problem can then often be reduced to choosing an appropriate kernel function and kernel parameters. It can be shown, however, that even the most powerful kernel methods can still fail on quite simple data sets when the feature space induced by the chosen kernel function is not sufficient. In these cases, an explicit feature space transformation or the detection of latent variables has proved more successful. Since such explicit feature construction is often not feasible for large data sets, the ultimate goal of efficient kernel learning would be the adaptive creation of new, appropriate kernel functions. It cannot be guaranteed, however, that such a kernel function still leads to a convex optimization problem for support vector machines. We therefore have to enhance the optimization core of the learning method itself before we can use it with arbitrary, i.e., non-positive semidefinite, kernel functions. This article motivates the use of appropriate feature spaces and discusses the consequences, which lead to non-convex optimization problems. We show that the resulting non-convex SVMs are at least as accurate as their quadratic programming counterparts on eight real-world benchmark data sets in terms of generalization performance, and that they always outperform traditional approaches in terms of the value of the original optimization problem. In addition, the proposed algorithm is more generic than existing traditional solutions, since it also works for non-positive semidefinite or indefinite kernel functions.
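The abstract's central point, that a non-positive semidefinite kernel turns the SVM dual into a non-convex problem, can be made concrete with a short numerical sketch. The snippet below is our own illustration, not the authors' implementation: the names tanh_kernel, dual_objective, and es_maximize are hypothetical, the sigmoid kernel serves only as a standard example of an indefinite kernel, and a minimal (μ+λ) evolution strategy stands in for whatever enhanced optimization core one might plug into the SVM. For brevity, the equality constraint Σᵢ αᵢ yᵢ = 0 of the dual is ignored.

```python
import numpy as np

def tanh_kernel(X, gamma=1.0, coef0=-1.0):
    # Sigmoid (tanh) kernel: k(x, z) = tanh(gamma * <x, z> + coef0).
    # For many parameter choices it violates Mercer's condition, i.e.
    # the resulting kernel matrix is not positive semidefinite.
    return np.tanh(gamma * X @ X.T + coef0)

def dual_objective(alpha, K, y):
    # Standard SVM dual: W(alpha) = sum(alpha) - 0.5 * alpha^T Q alpha
    # with Q_ij = y_i * y_j * K_ij. W is concave iff Q (and hence K) is
    # PSD; for an indefinite K the maximization becomes non-convex.
    Q = (y[:, None] * y[None, :]) * K
    return alpha.sum() - 0.5 * alpha @ Q @ alpha

def es_maximize(K, y, C=1.0, mu=5, lam=20, sigma=0.1, generations=300, seed=1):
    # Minimal (mu + lambda) evolution strategy over the box 0 <= alpha_i <= C.
    # Illustrative only: the constraint sum_i alpha_i * y_i = 0 is dropped
    # here for brevity (it could be enforced by a penalty or repair step).
    rng = np.random.default_rng(seed)
    n = len(y)
    pop = rng.uniform(0.0, C, size=(mu, n))
    for _ in range(generations):
        parents = pop[rng.integers(0, mu, size=lam)]
        offspring = np.clip(parents + rng.normal(0.0, sigma, (lam, n)), 0.0, C)
        pool = np.vstack([pop, offspring])
        fitness = np.array([dual_objective(a, K, y) for a in pool])
        pop = pool[np.argsort(fitness)[-mu:]]  # plus-selection: keep the mu best
    return pop[-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.where(rng.random(20) < 0.5, -1.0, 1.0)

K = tanh_kernel(X)
eigvals = np.linalg.eigvalsh(K)  # a symmetric matrix is PSD iff all eigenvalues >= 0
print("smallest eigenvalue of K:", eigvals.min())          # typically negative here
print("K is positive semidefinite:", bool(eigvals.min() >= -1e-10))

alpha = es_maximize(K, y)
print("dual value found by the ES:", dual_objective(alpha, K, y))
```

Because the evolution strategy only ever evaluates the dual objective, it is indifferent to whether that objective is concave; the price, compared with a QP solver on a PSD kernel, is the loss of a global-optimality certificate. This mirrors the trade-off the article describes between generality and convexity guarantees.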

Author information

Corresponding author

Correspondence to Katharina Morik.

Cite this article

Mierswa, I., Morik, K. About the non-convex optimization problem induced by non-positive semidefinite kernel learning. Adv Data Anal Classif 2, 241–258 (2008). https://doi.org/10.1007/s11634-008-0033-4
