Self-adaptive support vector machines: modelling and experiments
In this paper, we introduce a bi-level optimization formulation for the model and feature selection problems of support vector machines (SVMs). In this formulation, the standard convex quadratic optimization problem of SVM training is cast as a subproblem.
At the master level of the bi-level model, the optimal objective value of the SVM quadratic problem is minimized over a feasible range of the kernel parameters. Since the optimal objective value of the subproblem is a continuous function of the kernel parameters, though only implicitly defined over a certain region, a solution of this bi-level problem always exists. The problem of feature selection can be handled in a similar manner.
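As an illustration, the bi-level structure described above can be written out for the standard soft-margin SVM. This is a sketch under stated assumptions, not necessarily the exact formulation used in the paper: the notation (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, penalty $C > 0$, a kernel $K_\theta$ that depends continuously on a parameter vector $\theta$, and a compact parameter range $\Theta$) is introduced here for illustration only.

```latex
% Subproblem (SVM training in dual form) for a fixed kernel parameter \theta.
% By strong duality its optimal value equals that of the primal convex QP.
v(\theta) \;=\; \max_{\alpha \in \mathcal{A}} \;
    \sum_{i=1}^{n} \alpha_i
    \;-\; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
          \alpha_i \alpha_j \, y_i y_j \, K_\theta(x_i, x_j),
\qquad
\mathcal{A} \;=\; \Bigl\{ \alpha \in \mathbb{R}^n :
    0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0 \Bigr\}.

% Master problem: minimize the optimal value over the kernel parameters.
\min_{\theta \in \Theta} \; v(\theta).
```

Under these assumptions, the feasible set $\mathcal{A}$ is compact and does not depend on $\theta$, and the dual objective is jointly continuous in $(\alpha, \theta)$. The value function $v$ is therefore continuous even though it has no closed form, and by the Weierstrass extreme value theorem it attains a minimum on the compact set $\Theta$.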
Experiments and results
We also consider two approaches for solving the bi-level problem of model and feature selection. Experimental results show that the bi-level formulation provides a plausible tool for model selection.
Keywords: Support vector machines (SVMs), Machine learning, Model selection, Feature selection, Bi-level programming