Mining Very Large Datasets with Support Vector Machine Algorithms
In this paper, we present new support vector machine (SVM) algorithms that can classify very large datasets on standard personal computers. The algorithms extend three recent SVM algorithms: least squares SVM classification, the finite Newton method for classification, and incremental proximal SVM classification. The extension consists of building incremental, parallel and distributed SVMs for classification. Our three new algorithms are very fast and can handle very large datasets. As an example of their effectiveness, we classify one billion points in a 10-dimensional input space into two classes in a few minutes on ten personal computers (800 MHz Pentium III, 256 MB RAM, Linux).
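The incremental and parallel extension can be illustrated with the linear proximal SVM of Fung and Mangasarian, whose solution is [w; gamma] = (I/nu + E'E)^(-1) E'De with E = [A, -e], D the diagonal matrix of class labels and e a vector of ones. Because E'E and E'De are sums over the rows of A, each chunk of data (on disk, or on a different machine) can be reduced to a small (n+1)x(n+1) matrix and an (n+1) vector, and the partial results simply added before one small linear solve. The following is a minimal pure-Python sketch under these assumptions, not the authors' implementation; the function names, the toy data and the choice nu = 1 are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Sketch under assumptions: linear proximal SVM, (I/nu + E'E) z = E'De, z = [w; gamma].
Stats = Tuple[List[List[float]], List[float]]  # (E'E, E'De)


def chunk_stats(points: Sequence[Sequence[float]], labels: Sequence[int]) -> Stats:
    """Accumulate E'E and E'De over one chunk of rows, with E = [A, -e]."""
    m = len(points[0]) + 1  # n input dimensions plus the -1 column for gamma
    ete = [[0.0] * m for _ in range(m)]
    ede = [0.0] * m
    for x, d in zip(points, labels):
        row = list(x) + [-1.0]  # one row of E
        for i in range(m):
            ede[i] += row[i] * d  # D only multiplies by the label d in {-1, +1}
            for j in range(m):
                ete[i][j] += row[i] * row[j]
    return ete, ede


def merge_stats(a: Stats, b: Stats) -> Stats:
    """Add two partial results; chunk order and grouping do not matter."""
    ete = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    ede = [x + y for x, y in zip(a[1], b[1])]
    return ete, ede


def solve(mat: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting; the system is only (n+1)x(n+1)."""
    m = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, m))
        z[r] = (aug[r][m] - s) / aug[r][r]
    return z


def train_psvm(stats: Stats, nu: float = 1.0) -> Tuple[List[float], float]:
    """Solve (I/nu + E'E) z = E'De and split z into (w, gamma)."""
    ete, ede = stats
    m = len(ede)
    lhs = [[ete[i][j] + (1.0 / nu if i == j else 0.0) for j in range(m)] for i in range(m)]
    z = solve(lhs, ede)
    return z[:-1], z[-1]


def predict(w: Sequence[float], gamma: float, x: Sequence[float]) -> int:
    """The class of x is the sign of x'w - gamma."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - gamma >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: two Gaussian clouds in 3-D centred at +2 and -2.
    pts, lab = [], []
    for _ in range(1000):
        d = rng.choice([-1, 1])
        pts.append([rng.gauss(2.0 * d, 1.0) for _ in range(3)])
        lab.append(d)
    # Ten chunks, as if each were held by a different machine.
    partial = [chunk_stats(pts[k::10], lab[k::10]) for k in range(10)]
    total = partial[0]
    for s in partial[1:]:
        total = merge_stats(total, s)
    w, gamma = train_psvm(total, nu=1.0)
    acc = sum(predict(w, gamma, x) == d for x, d in zip(pts, lab)) / len(pts)
    print(f"training accuracy: {acc:.3f}")
```

Since only the small summary statistics travel between machines, the communication cost is independent of the number of points, and the same merged result is obtained whether the chunks are processed one after another (incrementally) or simultaneously (in parallel).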
Keywords: Data mining; Parallel and distributed algorithms; Classification; Machine learning; Support vector machines; Least squares classifiers; Newton method; Proximal classifiers; Incremental learning