Mining Very Large Datasets with Support Vector Machine Algorithms

  • François Poulet
  • Thanh-Nghi Do
Conference paper

Abstract

In this paper, we present new support vector machine (SVM) algorithms that can classify very large datasets on standard personal computers. The algorithms extend three recent SVM algorithms: least squares SVM classification, the finite Newton method for classification, and incremental proximal SVM classification. The extension consists in building incremental, parallel and distributed SVM classifiers. Our three new algorithms are very fast and can handle very large datasets; as an example of their effectiveness, they classify one billion points in 10-dimensional input space into two classes in a few minutes on ten personal computers (800 MHz Pentium III, 256 MB RAM, Linux).
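The incremental scheme the abstract refers to rests on a simple observation: the proximal SVM solution only depends on the small matrix E'E and vector E'd, which can be accumulated chunk by chunk so that the full dataset never needs to fit in memory. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the function names and the choice of chunking interface are ours.

```python
import numpy as np

def incremental_proximal_svm(chunks, dim, nu=1.0):
    """Linear proximal SVM trained incrementally.

    Each chunk is a pair (A, d): A is an (m, dim) block of points and
    d an (m,) vector of labels in {-1, +1}. We accumulate E'E and E'd,
    where E = [A  -e], then solve one small (dim+1) x (dim+1) system.
    """
    k = dim + 1
    EtE = np.zeros((k, k))
    Etd = np.zeros(k)
    for A, d in chunks:
        # Append the column -e so the bias gamma is folded into the system.
        E = np.hstack([A, -np.ones((A.shape[0], 1))])
        EtE += E.T @ E
        Etd += E.T @ d
    # Normal equations of the proximal SVM: (I/nu + E'E) [w; gamma] = E'd.
    sol = np.linalg.solve(np.eye(k) / nu + EtE, Etd)
    return sol[:-1], sol[-1]  # w, gamma

def classify(A, w, gamma):
    """Predict labels with the separating plane x.w = gamma."""
    return np.sign(A @ w - gamma)
```

Because each chunk contributes only a (dim+1) x (dim+1) update, the memory cost is independent of the number of points, which is what makes the billion-point experiment feasible; the per-chunk sums can also be computed on different machines and added together, giving the parallel and distributed variants.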

Keywords

Data mining; Parallel and distributed algorithms; Classification; Machine learning; Support vector machines; Least squares classifiers; Newton method; Proximal classifiers; Incremental learning



Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • François Poulet 1
  • Thanh-Nghi Do 1

  1. ESIEA Recherche, Laval, France
