Training Invariant Support Vector Machines

Abstract

Practical experience has shown that, in order to obtain the best possible performance, prior knowledge about the invariances of the classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. A significant new result reported in this work is the lowest test error yet achieved on the well-known MNIST digit recognition benchmark, obtained with SVM training times significantly faster than those of previous SVM methods.
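One family of methods reviewed in the paper is the virtual support vector (VSV) approach of Schölkopf, Burges, and Vapnik (1996): train an SVM, apply the invariance transformation to the resulting support vectors to generate "virtual" examples, and retrain on the enlarged set. The sketch below illustrates the idea under illustrative assumptions; scikit-learn's `SVC`, the 8x8 digits dataset, 1-pixel translations via `np.roll`, and the hyperparameter values are stand-ins, not the exact setup used in the paper.

```python
import numpy as np
from sklearn import datasets, svm

# Small 8x8 digit images, flattened to 64-dimensional vectors.
digits = datasets.load_digits()
X, y = digits.data, digits.target

# Step 1: train an ordinary SVM to identify the support vectors.
clf = svm.SVC(kernel="rbf", gamma=0.001, C=10.0)
clf.fit(X, y)
sv_X = clf.support_vectors_
sv_y = y[clf.support_]

def shift(images_flat, dx, dy, side=8):
    """Translate flattened side-by-side images by (dx, dy) pixels."""
    imgs = images_flat.reshape(-1, side, side)
    imgs = np.roll(np.roll(imgs, dy, axis=1), dx, axis=2)
    return imgs.reshape(len(images_flat), -1)

# Step 2: generate virtual examples by applying the invariance
# transformation (here, 1-pixel translations) to the support vectors only.
virtual_X = np.vstack([shift(sv_X, dx, dy)
                       for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]])
virtual_y = np.tile(sv_y, 4)

# Step 3: retrain on the support vectors plus the virtual support vectors.
vsv_clf = svm.SVC(kernel="rbf", gamma=0.001, C=10.0)
vsv_clf.fit(np.vstack([sv_X, virtual_X]),
            np.concatenate([sv_y, virtual_y]))
```

Restricting the augmentation to the support vectors keeps the second training set far smaller than transforming every training example, which is what makes the scheme practical.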



Decoste, D., Schölkopf, B. Training Invariant Support Vector Machines. Machine Learning 46, 161–190 (2002). https://doi.org/10.1023/A:1012454411458


Keywords

  • support vector machines
  • invariance
  • prior knowledge
  • image classification
  • pattern recognition