Support Vector Machine Classification Based on Fuzzy Clustering for Large Data Sets

  • Jair Cervantes
  • Xiaoou Li
  • Wen Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)


Support vector machines (SVMs) have been successfully applied to a large number of classification problems. Despite their sound theoretical foundations and good generalization capability, training an SVM on a large data set remains challenging because of the training complexity, high memory requirements, and slow convergence. In this paper, we present a new method, SVM classification based on fuzzy clustering. Before applying the SVM, we partition the training data with fuzzy clustering. To keep the computational cost low, this stage does not need the optimal number of clusters; a rough partition of the training set is sufficient. SVM classification is first carried out on the cluster centers. De-clustering is then applied, and SVM classification is carried out on the reduced data. The proposed approach scales to large data sets while retaining high classification accuracy and fast convergence. Empirical studies show that it achieves good performance on large data sets.
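
The first stage described above can be illustrated with a minimal sketch. The abstract does not name the fuzzy clustering algorithm, so standard fuzzy c-means is assumed here. The function names `fuzzy_c_means` and `label_centers`, the fuzzifier `m = 2`, the fixed iteration count, the deterministic initialisation, and the membership-weighted labelling rule are all hypothetical choices made for illustration, not details taken from the paper. The SVM training and de-clustering stages are left out; any SVM solver could be trained on the labelled centers this sketch produces.

```python
import math


def compute_memberships(points, centers, m):
    """Fuzzy c-means membership grades: u_ik proportional to d_ik^(-2/(m-1))."""
    memberships = []
    for p in points:
        # Clip distances so a point lying exactly on a center does not divide by zero.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=20):
    """Rough partition: a fixed number of updates, no search for the best cluster count."""
    n, dim = len(points), len(points[0])
    # Assumed initialisation: evenly spaced samples serve as starting centers.
    centers = [list(points[(k * n) // n_clusters]) for k in range(n_clusters)]
    for _ in range(n_iter):
        memberships = compute_memberships(points, centers, m)
        for k in range(n_clusters):
            # Center update: mean of all points weighted by u_ik^m.
            weights = [u[k] ** m for u in memberships]
            wsum = sum(weights)
            centers[k] = [
                sum(w * p[j] for w, p in zip(weights, points)) / wsum
                for j in range(dim)
            ]
    return centers, compute_memberships(points, centers, m)


def label_centers(memberships, labels, n_clusters):
    """Assumed rule: each center takes the class (+1 or -1) with the larger total membership."""
    center_labels = []
    for k in range(n_clusters):
        score = sum(u[k] * y for u, y in zip(memberships, labels))
        center_labels.append(1 if score >= 0 else -1)
    return center_labels


# Toy data: two well-separated groups of five points, one per class.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.0, 0.0), (0.5, -0.5), (0.5, 0.5)]
points = [(2 + dx, 2 + dy) for dx, dy in offsets]
points += [(-2 + dx, -2 + dy) for dx, dy in offsets]
labels = [1] * 5 + [-1] * 5

centers, memberships = fuzzy_c_means(points, n_clusters=2)
center_labels = label_centers(memberships, labels, n_clusters=2)
print(centers, center_labels)
```

On this toy set the ten training points collapse to two labelled centers, which is the kind of size reduction that makes the initial SVM training cheap. The membership grades are kept because they indicate which original points belong to each cluster if that cluster is later de-clustered.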


Support Vector Machine; Fuzzy Cluster; Quadratic Programming Problem; Data Subset; Membership Grade
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jair Cervantes (1)
  • Xiaoou Li (1)
  • Wen Yu (2)
  1. Sección de Computación, Departamento de Ingeniería Eléctrica, CINVESTAV-IPN, México D.F., México
  2. Departamento de Control Automático, CINVESTAV-IPN, México D.F., México
