Abstract
The challenges of classification on large-scale, high-dimensional datasets are: (1) a heavy computational burden in both the training and classification phases; (2) a large storage requirement for the many training samples; and (3) the difficulty of determining decision rules in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily overfits, especially when the data are not evenly distributed. Recently, the profile support vector machine (PSVM) was proposed to solve this problem. Because local learning is superior to global learning, multiple linear SVM models are trained to achieve performance similar to that of a single nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that speeds up both training and classification. We first choose border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets by the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. Each cluster is used to learn a linear SVM model. Both artificial and real datasets are used to evaluate the proposed method. The experimental results show that the proposed method prevents overfitting and underfitting, and that the proposed strategy is both effective and efficient.
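The three-step pipeline the abstract describes (border-sample selection, clustering into local subsets, then one linear model per cluster) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: plain k-means stands in for MagKmeans (which additionally balances class labels within each cluster), a least-squares hyperplane stands in for a linear SVM solver, and `keep_ratio` plus all function names are hypothetical.

```python
import numpy as np

def select_border_samples(X, y, keep_ratio=0.5):
    """Keep samples closest to the opposite class, a proxy for
    'near the decision boundary' (the paper's criterion may differ)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d_opp = np.where(y[:, None] != y[None, :], d, np.inf).min(axis=1)
    keep = np.argsort(d_opp)[: max(2, int(keep_ratio * len(X)))]
    return X[keep], y[keep]

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; MagKmeans would also balance class labels per cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def fit_linear(X, y):
    """Least-squares hyperplane as a stand-in for a linear SVM."""
    A = np.c_[X, np.ones(len(X))]          # append bias column
    w, *_ = np.linalg.lstsq(A, y.astype(float), rcond=None)
    return w

def predict(Xq, centers, models):
    """Route each query to its nearest cluster, apply that local model."""
    cid = np.argmin(np.linalg.norm(Xq[:, None] - centers[None], axis=2), axis=1)
    scores = np.array([np.r_[x, 1.0] @ models[c] for x, c in zip(Xq, cid)])
    return np.where(scores >= 0, 1, -1)

# Demo on two separable Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.r_[-np.ones(40), np.ones(40)].astype(int)

Xb, yb = select_border_samples(X, y)           # step 1: border samples
labels, centers = kmeans(Xb, 2)                # step 2: local clusters
models = [fit_linear(Xb[labels == j], yb[labels == j])   # step 3: local models
          for j in range(2)]
acc = np.mean(predict(X, centers, models) == y)
```

The point of the routing step is that each query is classified by only one small local model, which is what makes both training and classification cheaper than a single global nonlinear SVM.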
Cite this article
Li, IJ., Wu, JL. & Yeh, CH. A fast classification strategy for SVM on the large-scale high-dimensional datasets. Pattern Anal Applic 21, 1023–1038 (2018). https://doi.org/10.1007/s10044-017-0620-0