
Data Mining and Knowledge Discovery, Volume 11, Issue 3, pp. 295–321

Making SVMs Scalable to Large Data Sets using Hierarchical Cluster Indexing

  • Hwanjo Yu
  • Jiong Yang
  • Jiawei Han
  • Xiaolei Li

Abstract

Support vector machines (SVMs) are a promising method for classification and regression analysis because of their solid mathematical foundations, which confer two desirable properties: margin maximization and nonlinear classification using kernels. Despite these prominent properties, however, SVMs are usually not chosen for large-scale data mining problems because their training complexity is highly dependent on the size of the data set. Unlike traditional pattern recognition and machine learning settings, real-world data mining applications often involve huge numbers of data records, so performing multiple scans over the entire data set is too expensive, and keeping the data set in memory is infeasible. This paper presents a method, Clustering-Based SVM (CB-SVM), that maximizes SVM performance for very large data sets given a limited amount of resources, e.g., memory. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once and provides the SVM with high-quality samples. These samples carry statistical summaries of the data and maximize the benefit of learning. Our analysis shows that the training complexity of CB-SVM is quadratically dependent on the number of support vectors, which is usually much smaller than the size of the entire data set. Our experiments on synthetic and real-world data sets show that CB-SVM is highly scalable for very large data sets and highly accurate in terms of classification.
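
To make the approach concrete, here is a minimal sketch of the two-stage CB-SVM idea in Python. This is an illustration rather than the authors' implementation: it assumes scikit-learn is available, uses sklearn's Birch (a CF-tree method in the spirit of the hierarchical micro-clustering the paper builds on) as the one-pass summarization step, and the function name cb_svm_sketch and its parameter defaults are invented for the example.

import numpy as np
from sklearn.cluster import Birch
from sklearn.svm import SVC

def cb_svm_sketch(X, y, threshold=0.5, C=1.0):
    """Illustrative CB-SVM-style training; names and defaults are ours."""
    centroids, labels, members = [], [], []
    for cls in np.unique(y):
        Xc = X[y == cls]
        # One-pass CF-tree micro-clustering of this class (Birch stands in
        # for the paper's hierarchical micro-clustering algorithm).
        tree = Birch(threshold=threshold, n_clusters=None).fit(Xc)
        assign = tree.predict(Xc)  # subcluster index of each raw point
        for k, center in enumerate(tree.subcluster_centers_):
            centroids.append(center)
            labels.append(cls)
            members.append(Xc[assign == k])
    centroids, labels = np.asarray(centroids), np.asarray(labels)

    # Stage 1: coarse SVM trained only on the cluster summaries, so the
    # quadratic training cost depends on the number of clusters, not points.
    coarse = SVC(kernel="linear", C=C).fit(centroids, labels)

    # Stage 2: "decluster" near the boundary -- retrain on the raw member
    # points of the clusters whose centroids became support vectors.
    sv = coarse.support_
    X_fine = np.vstack([members[i] for i in sv])
    y_fine = np.concatenate([np.full(len(members[i]), labels[i]) for i in sv])
    return SVC(kernel="linear", C=C).fit(X_fine, y_fine)

In the paper's actual procedure the declustering descends the cluster hierarchy level by level rather than jumping straight to raw points, but the sketch captures the key idea: the SVM trains on data summaries whose number tracks the support vectors, and only boundary regions are ever refined toward full resolution.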

Keywords

Training Time, Cluster Feature, Training Data Point, Nonlinear Kernel, Nonlinear Classification


Acknowledgments

This work was supported in part by the National Science Foundation under grants IIS-02-09199 and IIS-03-08215, and by an IBM Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.


Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  1. Department of Computer Science, University of Iowa, Iowa, USA
  2. Department of Computer Science, Case Western Reserve University, Ohio, USA
  3. Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, USA
