Scaling Kernel-Based Systems to Large Data Sets

Abstract

Kernel-based systems, in the form of the support vector machine and Gaussian processes, are currently very popular approaches to supervised learning. Unfortunately, the computational load for training kernel-based systems grows drastically with the size of the training data set, so these systems are not ideal candidates for applications with large data sets. Nevertheless, research in this direction is very active. In this paper, I review some of the current approaches to scaling kernel-based systems to large data sets.
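The scaling problem the abstract refers to stems from the n x n kernel (Gram) matrix over the training set: storing it costs O(n^2) memory, and exact training (e.g., inverting it for Gaussian process regression) costs up to O(n^3) time. One family of approaches the paper surveys replaces this matrix with a low-rank approximation built from a small subset of "landmark" points, as in the Nyström method of Williams and Seeger (2001). The following NumPy sketch is illustrative only — the RBF kernel choice, the landmark count m, and the helper names are assumptions, not taken from the paper:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between rows of X and rows of Y."""
    # Squared Euclidean distances via the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystrom_approximation(X, m, gamma=1.0, seed=0):
    """Low-rank Nystrom approximation K ~ K_nm K_mm^+ K_mn from m landmark rows of X."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # random landmark subset
    K_nm = rbf_kernel(X, X[idx], gamma)               # n x m cross-kernel block
    K_mm = rbf_kernel(X[idx], X[idx], gamma)          # m x m landmark kernel block
    # Pseudo-inverse rather than a plain inverse, for numerical stability
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

# Compare the approximation against the exact kernel matrix on synthetic data.
n, m = 500, 50
X = np.random.default_rng(1).standard_normal((n, 2))
K_full = rbf_kernel(X, X)
K_approx = nystrom_approximation(X, m)
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
```

Only the n x m and m x m blocks are ever formed, so the dominant costs drop from O(n^2) storage and O(n^3) inversion to O(nm) and O(nm^2) respectively; when the kernel matrix's eigenvalues decay quickly, a small m already yields a small relative error.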






Tresp, V. Scaling Kernel-Based Systems to Large Data Sets. Data Mining and Knowledge Discovery 5, 197–211 (2001). https://doi.org/10.1023/A:1011425201219


Keywords

  • Kernel-based systems
  • support vector machine
  • Gaussian processes
  • committee machines
  • massive data sets