Machine Learning, Volume 46, Issue 1–3, pp. 131–159

Choosing Multiple Parameters for Support Vector Machines

  • Olivier Chapelle
  • Vladimir Vapnik
  • Olivier Bousquet
  • Sayan Mukherjee

Abstract

The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVMs) is considered. This is done by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. Usual methods for choosing parameters, based on exhaustive search, become intractable as soon as the number of parameters exceeds two. Some experimental results assess the feasibility of our approach for a large number of parameters (more than 100) and demonstrate an improvement of generalization performance.
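To make the procedure the abstract describes concrete, below is a minimal, hypothetical sketch: gradient descent over two SVM hyper-parameters (the regularization constant C and an RBF kernel width gamma, on a log scale) to minimize an estimate of the generalization error. The paper derives analytic gradients of leave-one-out estimates; this sketch instead approximates the gradient of a 5-fold cross-validation error by central finite differences, using scikit-learn's SVC and a synthetic dataset, and is illustrative only, not the authors' algorithm.

    # Minimal sketch (not the paper's algorithm): tune SVM hyper-parameters
    # by gradient descent on an estimate of the generalization error.
    # The paper uses analytic gradients of leave-one-out estimates; here we
    # approximate the gradient of a 5-fold cross-validation error by central
    # finite differences instead, purely for illustration.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    def cv_error(log_params):
        """Estimated generalization error for log-scaled (C, gamma)."""
        C, gamma = np.exp(log_params)
        return 1.0 - cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

    theta = np.log(np.array([1.0, 0.1]))  # initial (C, gamma) on a log scale
    lr, eps = 0.5, 0.1  # step size; wide difference width, since the raw CV
                        # error is piecewise constant (the paper's estimates
                        # are smooth, so a true gradient exists there)
    for step in range(20):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):  # central finite-difference gradient
            e = np.zeros_like(theta)
            e[i] = eps
            grad[i] = (cv_error(theta + e) - cv_error(theta - e)) / (2 * eps)
        theta -= lr * grad  # gradient-descent update on the log-parameters
        print(f"step {step:2d}  error={cv_error(theta):.3f}  "
              f"C={np.exp(theta[0]):.3g}  gamma={np.exp(theta[1]):.3g}")

Working on a log scale keeps C and gamma positive, and the same loop extends unchanged to kernels with many parameters (e.g., one scaling factor per input feature), which is the regime where exhaustive grid search breaks down and a gradient-based search remains tractable.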

Keywords: support vector machines · kernel selection · leave-one-out procedure · gradient descent · feature selection

Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Olivier Chapelle (1)
  • Vladimir Vapnik (2)
  • Olivier Bousquet (3)
  • Sayan Mukherjee (4)

  1. LIP6, Paris, France
  2. AT&T Research Labs, Middletown, USA
  3. École Polytechnique, France
  4. MIT, Cambridge, USA
