Evolutionary Intelligence

Volume 5, Issue 3, pp 189–205

Efficient recurrent local search strategies for semi- and unsupervised regularized least-squares classification

  • Fabian Gieseke (corresponding author)
  • Oliver Kramer
  • Antti Airola
  • Tapio Pahikkala
Special Issue


Abstract

Binary classification tasks are among the most important ones in the field of machine learning. One prominent approach to such tasks is the support vector machine, which aims at finding a hyperplane that separates the two classes well, such that the induced distance between the hyperplane and the patterns is maximized. In general, sufficient labeled data is needed to obtain reasonable models in such classification settings. However, labeled data is often rare in real-world learning scenarios, while unlabeled data can be obtained easily. For this reason, the concept of support vector machines has also been extended to semi- and unsupervised settings: in the unsupervised case, one aims at finding a partition of the data into two classes such that a subsequent application of a support vector machine leads to the best overall result. Similarly, given both a labeled and an unlabeled part, semi-supervised support vector machines favor decision hyperplanes that lie in a low-density area induced by the unlabeled training patterns, while still taking the labeled part of the data into account. The associated optimization problems for both the semi- and the unsupervised case, however, are of combinatorial nature and, hence, difficult to solve. In this work, we present efficient implementations of simple local search strategies for (variants of) both cases that are based on matrix update schemes for the intermediate candidate solutions. We evaluate the performance of the resulting approaches on a variety of artificial and real-world data sets. The results indicate that our approaches can successfully incorporate unlabeled data. (The unsupervised case was originally proposed by Gieseke, Pahikkala et al. (2009). The derivations presented in this work are new and subsume the old ones (for the unsupervised setting) as a special case.)
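To illustrate the kind of combinatorial local search described above, the following is a minimal sketch for the unsupervised (maximum margin clustering) variant with a regularized least-squares objective. It assumes a linear kernel and naively refits the regularized least-squares solution for every candidate labeling; the paper's contribution is precisely to replace this full refit with efficient matrix update schemes. The function names and the crude class-balance constraint are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rls_objective(K, y, lam):
    # Fit regularized least-squares for the labeling y:
    # a = (K + lam * I)^{-1} y, predictions f = K a,
    # then evaluate the regularized squared loss.
    n = len(y)
    a = np.linalg.solve(K + lam * np.eye(n), y)
    f = K @ a
    return np.sum((y - f) ** 2) + lam * a @ K @ a

def local_search_labels(K, y_init, lam, max_iter=500, seed=0):
    # Simple local search over binary label vectors: flip one
    # point's label at a time and keep the flip only if it lowers
    # the objective. A balance constraint rejects labelings that
    # are too lopsided, which rules out the trivial one-class
    # partition. (Naive version: every candidate triggers a full
    # O(n^3) refit instead of an efficient matrix update.)
    rng = np.random.default_rng(seed)
    y = y_init.astype(float).copy()
    best = rls_objective(K, y, lam)
    for _ in range(max_iter):
        i = rng.integers(len(y))
        y[i] = -y[i]  # propose flipping one label
        balanced = abs(y.sum()) <= 0.2 * len(y)
        cand = rls_objective(K, y, lam) if balanced else np.inf
        if cand < best:
            best = cand  # accept the flip
        else:
            y[i] = -y[i]  # revert the flip
    return y, best

# Toy data: two Gaussian blobs, linear kernel K = X X^T.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)),
               rng.normal(2.0, 0.5, (20, 2))])
K = X @ X.T
y0 = rng.choice([-1.0, 1.0], size=40)  # random initial partition
y_final, obj = local_search_labels(K, y0, lam=1.0)
```

In the semi-supervised setting the same loop would simply never propose flips for the labeled points. Since flips are only accepted when they strictly decrease the objective, the search monotonically improves on the initial random partition.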


Keywords: Machine learning · Semi-supervised support vector machines · Maximum margin clustering · Regularized least-squares classification · Combinatorial optimization · Matrix calculus



This work has been supported in part by funds of the Deutsche Forschungsgemeinschaft (Fabian Gieseke, grant KR 3695) and by the Academy of Finland (Tapio Pahikkala, grant 134020).


References

  1. Bennett KP, Demiriz A (1998) Semi-supervised support vector machines. In: Kearns MJ, Solla SA, Cohn DA (eds) Advances in neural information processing systems 11. MIT Press, pp 368–374
  2. Beyer HG, Schwefel HP (2002) Evolution strategies—a comprehensive introduction. Nat Comput 1:3–52
  3. Bie TD, Cristianini N (2003) Convex methods for transduction. In: Advances in neural information processing systems 16. MIT Press, pp 73–80
  4. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
  5. Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at
  6. Chapelle O, Zien A (2005) Semi-supervised classification by low density separation. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 57–64
  7. Chapelle O, Chi M, Zien A (2006) A continuation method for semi-supervised SVMs. In: Proceedings of the international conference on machine learning, pp 185–192
  8. Chapelle O, Schölkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT Press, Cambridge, MA
  9. Chapelle O, Sindhwani V, Keerthi SS (2008) Optimization techniques for semi-supervised support vector machines. J Mach Learn Res 9:203–233
  10. Collobert R, Sinz F, Weston J, Bottou L (2006) Trading convexity for scalability. In: Proceedings of the international conference on machine learning, pp 201–208
  11. Droste S, Jansen T, Wegener I (2002) On the analysis of the (1+1) evolutionary algorithm. Theor Comput Sci 276(1–2):51–81
  12. Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50
  13. Fogel DB (1966) Artificial intelligence through simulated evolution. Wiley, New York
  14. Fung G, Mangasarian OL (2001) Semi-supervised support vector machines for unlabeled data classification. Optim Methods Softw 15:29–44
  15. Gieseke F, Pahikkala T, Kramer O (2009) Fast evolutionary maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 361–368
  16. Golub GH, Van Loan C (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore and London
  17. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New York
  18. Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
  19. Horn R, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
  20. Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the international conference on machine learning, pp 200–209
  21. Mierswa I (2009) Non-convex and multi-objective optimization in data mining. PhD thesis, Technische Universität Dortmund
  22. Nene S, Nayar S, Murase H (1996) Columbia object image library (COIL-100). Tech. rep.
  23. Rechenberg I (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart
  24. Rifkin R, Yeo G, Poggio T (2003) Regularized least-squares classification. In: Advances in learning theory: methods, models and applications. IOS Press, pp 131–154
  25. Rifkin RM (2002) Everything old is new again: a fresh look at historical approaches in machine learning. PhD thesis, MIT
  26. Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory. Springer, London, pp 416–426
  27. Schwefel HP (1977) Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Birkhäuser, Basel
  28. Silva C, Santos JS, Wanner EF, Carrano EG, Takahashi RHC (2009) Semi-supervised training of least squares support vector machine using a multiobjective evolutionary algorithm. In: Proceedings of the eleventh congress on evolutionary computation. IEEE Press, Piscataway, NJ, pp 2996–3002
  29. Sindhwani V, Keerthi S, Chapelle O (2006) Deterministic annealing for semi-supervised kernel machines. In: Proceedings of the international conference on machine learning, pp 841–848
  30. Steinwart I, Christmann A (2008) Support vector machines. Springer, New York
  31. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
  32. Valizadegan H, Jin R (2007) Generalized maximum margin clustering and unsupervised kernel learning. In: Advances in neural information processing systems 19. MIT Press, pp 1417–1424
  33. Vapnik V (1998) Statistical learning theory. Wiley, New York
  34. Xu L, Schuurmans D (2005) Unsupervised and semi-supervised multi-class support vector machines. In: Proceedings of the national conference on artificial intelligence, pp 904–910
  35. Xu L, Neufeld J, Larson B, Schuurmans D (2005) Maximum margin clustering. In: Advances in neural information processing systems 17, pp 1537–1544
  36. Zhang K, Tsang IW, Kwok JT (2007) Maximum margin clustering made practical. In: Proceedings of the international conference on machine learning, pp 1119–1126
  37. Zhao B, Wang F, Zhang C (2008a) Efficient maximum margin clustering via cutting plane algorithm. In: Proceedings of the SIAM international conference on data mining, pp 751–762
  38. Zhao B, Wang F, Zhang C (2008b) Efficient multiclass maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 1248–1255
  39. Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Morgan and Claypool, Seattle

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Fabian Gieseke (1, corresponding author)
  • Oliver Kramer (1)
  • Antti Airola (2)
  • Tapio Pahikkala (2)
  1. Computer Science Department, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
  2. Department of Information Technology, Turku Centre for Computer Science, University of Turku, Turku, Finland
