
A fast classification strategy for SVM on the large-scale high-dimensional datasets

Theoretical Advances · Published in Pattern Analysis and Applications

Abstract

Classifying large-scale, high-dimensional datasets poses three challenges: (1) both the training phase and the classification phase carry a heavy computational burden; (2) storing the many training samples requires substantial memory; and (3) decision rules are difficult to determine in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets, but it is prone to overfitting, especially when the data are unevenly distributed. The profile support vector machine (PSVM) was recently proposed to address this problem: because local learning outperforms global learning, multiple linear SVM models are trained to match the performance of a single nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that speeds up both training and classification. We first select border samples near the decision boundary from the training set. The reduced training set is then clustered into several local subsets with the MagKmeans algorithm, and each cluster is used to train a linear SVM model. We also propose a fast search method to find the optimal solution of the MagKmeans algorithm. Both artificial and real datasets are used to evaluate the proposed method. Experimental results show that it avoids both overfitting and underfitting, and that the proposed strategy is effective and efficient.
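To make the pipeline concrete, the sketch below walks through the three stages the abstract describes: border-sample selection, clustering into local subsets, and one linear SVM per cluster. It is a minimal illustration assuming scikit-learn, not the authors' code: approximating border samples by the distance to the nearest opposite-class neighbour, and using plain k-means in place of MagKmeans (which additionally keeps each cluster class-balanced), are simplifying assumptions, and names such as `select_border_samples`, `LocalLinearSVM`, and `keep_ratio` are hypothetical.

```python
# Minimal sketch of the three-stage pipeline from the abstract; a stand-in,
# NOT the authors' implementation. Simplifications: (1) border samples are
# approximated by distance to the nearest opposite-class neighbour, and
# (2) plain k-means replaces MagKmeans, which also balances class labels
# within each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import LinearSVC


def select_border_samples(X, y, keep_ratio=0.3):
    """Keep the keep_ratio fraction of samples closest to the other class."""
    dist = np.empty(len(X))
    for label in np.unique(y):
        same = y == label
        nn = NearestNeighbors(n_neighbors=1).fit(X[~same])
        dist[same] = nn.kneighbors(X[same])[0].ravel()
    keep = np.argsort(dist)[: max(1, int(keep_ratio * len(X)))]
    return X[keep], y[keep]


class LocalLinearSVM:
    """Cluster the reduced training set, fit one linear SVM per cluster,
    and classify each query with the model of its nearest cluster."""

    def __init__(self, n_clusters=5, C=1.0):
        self.n_clusters, self.C = n_clusters, C

    def fit(self, X, y):
        self.km = KMeans(n_clusters=self.n_clusters, n_init=10).fit(X)
        self.models = []
        for k in range(self.n_clusters):
            Xk, yk = X[self.km.labels_ == k], y[self.km.labels_ == k]
            # A single-class cluster cannot train an SVM; store its label.
            self.models.append(
                LinearSVC(C=self.C).fit(Xk, yk) if len(np.unique(yk)) > 1 else yk[0]
            )
        return self

    def predict(self, X):
        # Assumes integer class labels.
        clusters = self.km.predict(X)
        preds = np.empty(len(X), dtype=int)
        for i, k in enumerate(clusters):
            m = self.models[k]
            preds[i] = m.predict(X[i : i + 1])[0] if hasattr(m, "predict") else m
        return preds
```

A typical call would be `LocalLinearSVM(n_clusters=5).fit(*select_border_samples(X_train, y_train)).predict(X_test)`; swapping the k-means step for a class-balanced clustering such as MagKmeans is what keeps each local SVM from being trained on a single class.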




Author information

Correspondence to Jiunn-Lin Wu.


About this article


Cite this article

Li, IJ., Wu, JL. & Yeh, CH. A fast classification strategy for SVM on the large-scale high-dimensional datasets. Pattern Anal Applic 21, 1023–1038 (2018). https://doi.org/10.1007/s10044-017-0620-0
