Abstract
The challenges of classification on large-scale, high-dimensional datasets are: (1) a heavy computational burden in both the training and classification phases; (2) a large storage requirement for the many training samples; and (3) the difficulty of determining decision rules in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily overfits, especially when the data are not evenly distributed. Recently, the profile support vector machine (PSVM) was proposed to solve this problem. Because local learning is superior to global learning, multiple linear SVM models are trained to achieve performance similar to that of a single nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that speeds up both training and classification. We first choose border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets by the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. Each cluster is used to learn a linear SVM model. Both artificial and real datasets are used to evaluate the proposed method. The experimental results show that the proposed method prevents overfitting and underfitting, and that the proposed strategy is both effective and efficient.
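The three-step pipeline the abstract describes (border-sample selection, clustering into local subsets, then one linear model per cluster) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: plain k-means stands in for MagKmeans (which additionally balances class labels within each cluster), a least-squares hyperplane stands in for a linear SVM solver, and `keep_ratio` plus all function names are hypothetical.

```python
import numpy as np

def select_border_samples(X, y, keep_ratio=0.5):
    """Keep samples closest to the opposite class, a proxy for
    'near the decision boundary' (the paper's criterion may differ)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d_opp = np.where(y[:, None] != y[None, :], d, np.inf).min(axis=1)
    keep = np.argsort(d_opp)[: max(2, int(keep_ratio * len(X)))]
    return X[keep], y[keep]

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; MagKmeans would also balance class labels per cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def fit_linear(X, y):
    """Least-squares hyperplane as a stand-in for a linear SVM."""
    A = np.c_[X, np.ones(len(X))]          # append bias column
    w, *_ = np.linalg.lstsq(A, y.astype(float), rcond=None)
    return w

def predict(Xq, centers, models):
    """Route each query to its nearest cluster, apply that local model."""
    cid = np.argmin(np.linalg.norm(Xq[:, None] - centers[None], axis=2), axis=1)
    scores = np.array([np.r_[x, 1.0] @ models[c] for x, c in zip(Xq, cid)])
    return np.where(scores >= 0, 1, -1)

# Demo on two separable Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.r_[-np.ones(40), np.ones(40)].astype(int)

Xb, yb = select_border_samples(X, y)           # step 1: border samples
labels, centers = kmeans(Xb, 2)                # step 2: local clusters
models = [fit_linear(Xb[labels == j], yb[labels == j])   # step 3: local models
          for j in range(2)]
acc = np.mean(predict(X, centers, models) == y)
```

The point of the routing step is that each query is classified by only one small local model, which is what makes both training and classification cheaper than a single global nonlinear SVM.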
Cite this article
Li, IJ., Wu, JL. & Yeh, CH. A fast classification strategy for SVM on the large-scale high-dimensional datasets. Pattern Anal Applic 21, 1023–1038 (2018). https://doi.org/10.1007/s10044-017-0620-0