# Efficient recurrent local search strategies for semi- and unsupervised regularized least-squares classification

## Abstract

Binary classification tasks are among the most important in machine learning. One prominent approach to such tasks is the support vector machine, which aims to find a hyperplane that separates the two classes well, such that the induced distance between the hyperplane and the patterns is maximized. In general, sufficient labeled data is needed in such settings to obtain reasonable models. However, labeled data is often scarce in real-world learning scenarios, while unlabeled data can be obtained easily. For this reason, the concept of support vector machines has also been extended to semi- and unsupervised settings: in the unsupervised case, one aims to find a partition of the data into two classes such that a subsequent application of a support vector machine leads to the best overall result. Similarly, given both a labeled and an unlabeled part, semi-supervised support vector machines favor decision hyperplanes that lie in a low-density area induced by the unlabeled training patterns, while still taking the labeled part of the data into account. The optimization problems associated with both the semi- and unsupervised case, however, are combinatorial in nature and hence difficult to solve. In this work, we present efficient implementations of simple local search strategies for (variants of) both cases that are based on matrix update schemes for the intermediate candidate solutions. We evaluate the performance of the resulting approaches on a variety of artificial and real-world data sets. The results indicate that our approaches can successfully incorporate unlabeled data. (The approach for the unsupervised case was originally proposed by Gieseke, Pahikkala, and Kramer (2009). The derivations presented in this work are new and include the earlier ones for the unsupervised setting as a special case.)
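To make the idea of a local search with matrix updates concrete, the following is a minimal sketch for the unsupervised case only. It is not the authors' exact scheme, and it rests on several assumptions: a simplified regularized least-squares objective without bias term, for which the optimal value for a label vector y in {-1, +1}^n equals lam * y^T G y with G = (K + lam * I)^{-1}; a precomputed symmetric matrix `G`; and a simple balance constraint `max_imbalance` on the sum of the labels to rule out trivial partitions. All function and parameter names are hypothetical.

```python
import random


def objective(G, y, lam):
    """Reference value lam * y^T G y, computed from scratch in O(n^2)."""
    n = len(y)
    return lam * sum(y[i] * G[i][j] * y[j] for i in range(n) for j in range(n))


def local_search(G, lam, y0, max_imbalance, n_iter=1000, seed=0):
    """Single-flip local search over label vectors in {-1, +1}^n.

    Assumes G is symmetric (hypothetical input: G = (K + lam * I)^{-1}).
    The vector Gy is maintained incrementally, so each candidate flip is
    scored in O(1) and each accepted flip costs O(n).
    """
    rng = random.Random(seed)
    n = len(y0)
    y = list(y0)
    Gy = [sum(G[i][j] * y[j] for j in range(n)) for i in range(n)]
    total = sum(y)
    value = lam * sum(y[i] * Gy[i] for i in range(n))
    for _ in range(n_iter):
        i = rng.randrange(n)
        new_total = total - 2 * y[i]
        if abs(new_total) > max_imbalance:
            continue  # flip would violate the (assumed) balance constraint
        # Change of lam * y^T G y when y_i is replaced by -y_i.
        delta = 4.0 * lam * (G[i][i] - y[i] * Gy[i])
        if delta < 0.0:
            step = -2 * y[i]
            for k in range(n):
                Gy[k] += step * G[k][i]  # rank-one style update of G y
            y[i] = -y[i]
            total = new_total
            value += delta
    return y, value
```

The search only accepts strictly improving flips, so the returned value never exceeds the value of the initial labeling `y0`. The incremental bookkeeping is the point of the sketch: recomputing the objective from scratch for every candidate would cost O(n^2) per step.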

## Keywords

Machine learning, Semi-supervised support vector machines, Maximum margin clustering, Regularized least-squares classification, Combinatorial optimization, Matrix calculus

## Notes

### Acknowledgments

This work has been supported in part by funds of the *Deutsche Forschungsgemeinschaft* (Fabian Gieseke, grant KR 3695) and by the Academy of Finland (Tapio Pahikkala, grant 134020).

## References

- 1. Bennett KP, Demiriz A (1998) Semi-supervised support vector machines. In: Kearns MJ, Solla SA, Cohn DA (eds) Advances in neural information processing systems 11, MIT Press, pp 368–374
- 2. Beyer HG, Schwefel HP (2002) Evolution strategies – a comprehensive introduction. Nat Comput 1:3–52
- 3. Bie TD, Cristianini N (2003) Convex methods for transduction. In: Advances in neural information processing systems 16, MIT Press, pp 73–80
- 4. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York
- 5. Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
- 6. Chapelle O, Zien A (2005) Semi-supervised classification by low density separation. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 57–64
- 7. Chapelle O, Chi M, Zien A (2006) A continuation method for semi-supervised SVMs. In: Proceedings of the international conference on machine learning, pp 185–192
- 8. Chapelle O, Schölkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT Press, Cambridge, MA
- 9. Chapelle O, Sindhwani V, Keerthi SS (2008) Optimization techniques for semi-supervised support vector machines. J Mach Learn Res 9:203–233
- 10. Collobert R, Sinz F, Weston J, Bottou L (2006) Trading convexity for scalability. In: Proceedings of the international conference on machine learning, pp 201–208
- 11. Droste S, Jansen T, Wegener I (2002) On the analysis of the (1+1) evolutionary algorithm. Theor Comput Sci 276(1–2):51–81
- 12. Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50, http://dx.doi.org/10.1023/A:1018946025316
- 13. Fogel DB (1966) Artificial intelligence through simulated evolution. Wiley, New York
- 14. Fung G, Mangasarian OL (2001) Semi-supervised support vector machines for unlabeled data classification. Optim Methods Softw 15:29–44
- 15. Gieseke F, Pahikkala T, Kramer O (2009) Fast evolutionary maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 361–368
- 16. Golub GH, Van Loan C (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore and London
- 17. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New York
- 18. Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
- 19. Horn R, Johnson CR (1985) Matrix analysis. Cambridge University Press, Cambridge
- 20. Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the international conference on machine learning, pp 200–209
- 21. Mierswa I (2009) Non-convex and multi-objective optimization in data mining. PhD thesis, Technische Universität Dortmund
- 22. Nene S, Nayar S, Murase H (1996) Columbia object image library (COIL-100). Tech. rep.
- 23. Rechenberg I (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart
- 24. Rifkin R, Yeo G, Poggio T (2003) Regularized least-squares classification. In: Advances in learning theory: methods, models and applications, IOS Press, pp 131–154
- 25. Rifkin RM (2002) Everything old is new again: a fresh look at historical approaches in machine learning. PhD thesis, MIT
- 26. Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory. Springer, London, pp 416–426
- 27. Schwefel HP (1977) Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Birkhäuser, Basel
- 28. Silva C, Santos JS, Wanner EF, Carrano EG, Takahashi RHC (2009) Semi-supervised training of least squares support vector machine using a multiobjective evolutionary algorithm. In: Proceedings of the eleventh conference on congress on evolutionary computation, IEEE Press, Piscataway, NJ, USA, pp 2996–3002
- 29. Sindhwani V, Keerthi S, Chapelle O (2006) Deterministic annealing for semi-supervised kernel machines. In: Proceedings of the international conference on machine learning, pp 841–848
- 30. Steinwart I, Christmann A (2008) Support vector machines. Springer, New York
- 31. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
- 32. Valizadegan H, Jin R (2007) Generalized maximum margin clustering and unsupervised kernel learning. In: Advances in neural information processing systems, MIT Press, vol 19, pp 1417–1424
- 33. Vapnik V (1998) Statistical learning theory. Wiley, New York
- 34. Xu L, Schuurmans D (2005) Unsupervised and semi-supervised multi-class support vector machines. In: Proceedings of the national conference on artificial intelligence, pp 904–910
- 35. Xu L, Neufeld J, Larson B, Schuurmans D (2005) Maximum margin clustering. In: Advances in neural information processing systems vol 17, pp 1537–1544
- 36. Zhang K, Tsang IW, Kwok JT (2007) Maximum margin clustering made practical. In: Proceedings of the international conference on machine learning, pp 1119–1126
- 37. Zhao B, Wang F, Zhang C (2008a) Efficient maximum margin clustering via cutting plane algorithm. In: Proceedings of the SIAM international conference on data mining, pp 751–762
- 38. Zhao B, Wang F, Zhang C (2008b) Efficient multiclass maximum margin clustering. In: Proceedings of the international conference on machine learning, pp 1248–1255
- 39. Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Morgan and Claypool, Seattle