Skip to main content
Log in

Reduction of training data for support vector machine: a survey

  • Data analytics and machine learning
  • Published:
Soft Computing Aims and scope Submit manuscript

Abstract

Support vector machine (SVM) is a popular supervised machine learning technique extensively applied to various real-life applications. A weakness of SVM, though, is that its efficiency rapidly decreases as the size of training dataset increases. This issue is more challenging for the devices and applications of IoT where the computing resource is quite constrained. As a result, numerous schemes have been proposed to decrease the training time of SVM, especially for the large-scale dataset. In this paper, a survey on the state-of-the-art methods for reducing the number of training data points and increasing the speed of SVM is presented. The existing methods are categorized into five types based on the approach employed for data reduction process, namely clustering-based, geometrical, distance-based, random sampling, and evolutionary methods. The key characteristics of the schemes are reviewed and compared against six practical benchmarks. This survey will be useful to select an appropriate method for the given target problem and develop a new approach for improving the performance of SVM dealing with the large-scale dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Availability of data and material

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Code availability

Not applicable.

References

  • Abbasion S, Rafsanjani A, Farshidianfar A, Irani N (2007) Rolling element bearings multi-fault classification based on the wavelet denoising and support vector machine. Mech Syst Signal Process 21:2933–2945

    Article  Google Scholar 

  • Abe S, Inoue T (2001) Fast training of support vector machines by extracting boundary data. In: Dorffner G, Bischof H, Hornik K (eds) Artificial neural networks—ICANN 2001. Springer, Berlin, Heidelberg, pp 308–313

    Chapter  Google Scholar 

  • Almasi ON, Rouhani M (2016) Fast and de-noise support vector machine training method based on fuzzy clustering method for large real world datasets. Turk J Electr Eng Comput Sci 24:219–233

    Article  Google Scholar 

  • Angiulli F (2007) Fast nearest neighbor condensation for large data sets classification. IEEE Trans Knowl Data Eng 19:1450–1464. https://doi.org/10.1109/TKDE.2007.190645

    Article  Google Scholar 

  • Angiulli F, Astorino A (2010) Scaling up support vector machines using nearest neighbor condensation. IEEE Trans Neural Netw 21:351–357. https://doi.org/10.1109/TNN.2009.2039227

    Article  Google Scholar 

  • Awad M, Khan L, Bastani F, Yen I-L (2004) An effective support vector machines (SVMs) performance using hierarchical clustering. In: IEEE, pp 663–667

  • Balcázar J, Dai Y, Watanabe O (2001) A Random sampling technique for training support vector machines. In: Abe N, Khardon R, Zeugmann T (eds) Algorithmic learning theory. Springer, Berlin, Heidelberg, pp 119–134

    Chapter  Google Scholar 

  • Bang S, Jhun M (2014) Weighted support vector machine using k-means clustering. Commun Stat Simul Comput 43:2307–2324

    Article  MathSciNet  Google Scholar 

  • Barber CB, Dobkin DP, Huhdanpaa H (1996) The quickhull algorithm for convex hulls. ACM Trans Math Softw 22:469–483. https://doi.org/10.1145/235815.235821

    Article  MathSciNet  MATH  Google Scholar 

  • Bennett KP, Bredensteiner EJ (2000) Duality and geometry in SVM classifiers, pp 57–64

  • Birzhandi P, Youn HY (2019) CBCH (clustering-based convex hull) for reducing training time of support vector machine. J Supercomput. https://doi.org/10.1007/s11227-019-02795-9

    Article  Google Scholar 

  • Birzhandi P, Kim KT, Lee B, Youn HY (2019) Reduction of training data using parallel hyperplane for support vector machine. Appl Artif Intell 33:497–516. https://doi.org/10.1080/08839514.2019.1583449

    Article  Google Scholar 

  • Cervantes J, Li X, Yu W (2006) Support vector machine classification based on fuzzy clustering for large data sets. Springer, Berlin, pp 572–582

    Google Scholar 

  • Cervantes J, Li X, Yu W, Li K (2008) Support vector machine classification for large data sets via minimum enclosing ball clustering. Neurocomputing 71:611–619. https://doi.org/10.1016/j.neucom.2007.07.028

    Article  Google Scholar 

  • Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1-27:27. https://doi.org/10.1145/1961189.1961199

    Article  Google Scholar 

  • Chau AL, Li X, Yu W (2013) Large data sets classification using convex–concave hull and support vector machine. Soft Comput 17:793–804

    Article  Google Scholar 

  • Cheng H, Tan P, Jin R (2010) Efficient algorithm for localized support vector machine. IEEE Trans Knowl Data Eng 22:537–549. https://doi.org/10.1109/TKDE.2009.116

    Article  Google Scholar 

  • Crisp DJ, Burges CJ (2000) A geometric interpretation of v-SVM classifiers, pp 244–250

  • Dakka J, Farkas-Pall K, Balasubramanian V, Turilli M, Wan S, Wright DW, Zasada S, Coveney PV, Jha S (2018) Enabling trade-offs between accuracy and computational cost: adaptive algorithms to reduce time to clinical insight. In: 2018 18th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID), pp 572–577

  • de Almeida MB, de Braga AP, Braga JP (2000) SVM-KM: speeding SVMs learning with a priori cluster selection and k-means. In: Proceedings. vol 1. Sixth Brazilian symposium on neural networks, pp 162–167

  • Demidova L, Sokolova Y, Nikulchev E et al (2015) Use of fuzzy clustering algorithms ensemble for SVM classifier development. Int Rev Model Simul IREMOS 8:446–457. https://doi.org/10.15866/iremos.v8i4.6825

    Article  Google Scholar 

  • Dong J, Krzyżak A, Suen CY (2005) An improved handwritten Chinese character recognition system using support vector machine. Pattern Recogn Lett 26:1849–1856

    Article  Google Scholar 

  • Elouedi Z, Mellouli K, Smets P (2001) Belief decision trees: theoretical foundations. Int J Approx Reason 28:91–124. https://doi.org/10.1016/S0888-613X(01)00045-7

    Article  MathSciNet  MATH  Google Scholar 

  • Garg A, Upadhyaya S, Kwiat K (2013) A user behavior monitoring and profiling scheme for masquerade detection. Handb Stat Mach Learn Theory Appl 31:353–379

    MathSciNet  Google Scholar 

  • Goodrich B, Albrecht D, Tischer P (2009) Algorithms for the computation of reduced convex hulls. In: Nicholson A, Li X (eds) AI 2009: advances in artificial intelligence. Springer, Berlin, Heidelberg, pp 230–239

    Chapter  Google Scholar 

  • Grother PJ, Candela GT, Blue JL (1997) Fast implementations of nearest neighbor classifiers. Pattern Recogn 30:459–465. https://doi.org/10.1016/S0031-3203(96)00098-2

    Article  Google Scholar 

  • Guo L, Boukir S (2015) Fast data selection for SVM training using ensemble margin. Pattern Recogn Lett 51:112–119. https://doi.org/10.1016/j.patrec.2014.08.003

    Article  Google Scholar 

  • He Q, Xie Z, Hu Q, Wu C (2011) Neighborhood based sample and feature selection for SVM classification learning. Neurocomputing 74:1585–1594. https://doi.org/10.1016/j.neucom.2011.01.019

    Article  Google Scholar 

  • Kaufman L (1999) Solving the quadratic programming problem arising in support vector classification, pp 147–167

  • Kawulok M, Nalepa J (2012) Support vector machines training data selection using a genetic algorithm. In: Gimel’farb G, Hancock E, Imiya A et al (eds) Structural, syntactic, and statistical pattern recognition. Springer, Berlin, Heidelberg, pp 557–565

    Chapter  Google Scholar 

  • Khosravani HR, Ruano AE, Ferreira PM (2013) A simple algorithm for convex hull determination in high dimensions. In: 2013 IEEE 8th international symposium on intelligent signal processing, pp 109–114

  • Koggalage R, Halgamuge S (2004) Reducing the number of training samples for fast support vector machine classification. Neural Inf Process Lett Rev 2:57–65

    Google Scholar 

  • Kumar MA, Gopal M (2010) A comparison study on multiple binary-class SVM methods for unilabel text categorization. Pattern Recogn Lett 31:1437–1444

    Article  Google Scholar 

  • Lee Y, Huang S (2007) Reduced support vector machines: a statistical theory. IEEE Trans Neural Netw 18:1–13. https://doi.org/10.1109/TNN.2006.883722

    Article  Google Scholar 

  • Lee SW, Verri A (eds) (2003) Pattern recognition with support vector machines: first international workshop, SVM 2002, Niagara Falls, Canada. Proceedings, vol 2388. Springer

  • Li R, Bhanu B, Krawiec K (2007) Hybrid coevolutionary algorithms vs. SVM algorithms. In: Proceedings of the 9th annual conference on genetic and evolutionary computation. ACM, New York, pp 456–463

  • Li C, Liu K, Wang H (2011) The incremental learning algorithm with support vector machine based on hyperplane-distance. Appl Intell 34:19–27

    Article  Google Scholar 

  • Li I-J, Wu J-L, Yeh C-H (2018) A fast classification strategy for SVM on the large-scale high-dimensional datasets. Pattern Anal Appl 21:1023–1038. https://doi.org/10.1007/s10044-017-0620-0

    Article  MathSciNet  Google Scholar 

  • Liu P, Choo K-KR, Wang L, Huang F (2017) SVM or deep learning? A comparative study on remote sensing image classification. Soft Comput 21:7053–7065. https://doi.org/10.1007/s00500-016-2247-2

    Article  Google Scholar 

  • López-Chau A, Li X, Yu W (2012) Convex-concave hull for classification with support vector machine. In: 2012 IEEE 12th international conference on data mining workshops, pp 431–438

  • Makris A, Kosmopoulos D, Perantonis S, Theodoridis S (2011) A hierarchical feature fusion framework for adaptive visual tracking. Image vis Comput 29:594–606. https://doi.org/10.1016/j.imavis.2011.07.001

    Article  Google Scholar 

  • Manimala K, David IG, Selvi K (2015) A novel data selection technique using fuzzy C-means clustering to enhance SVM-based power quality classification. Soft Comput 19:3123–3144. https://doi.org/10.1007/s00500-014-1472-9

    Article  Google Scholar 

  • Mavroforakis ME, Theodoridis S (2006) A geometric approach to support vector machine (SVM) classification. IEEE Trans Neural Netw 17:671–682

    Article  Google Scholar 

  • Mavroforakis ME, Sdralis M, Theodoridis S (2006) A novel SVM geometric algorithm based on reduced convex hulls. In: IEEE, pp 564–568

  • Mitra V, Wang C-J, Banerjee S (2007) Text classification: a least square support vector machine approach. Appl Soft Comput 7:908–914

    Article  Google Scholar 

  • Moslemnejad S, Hamidzadeh J (2019) A hybrid method for increasing the speed of SVM training using belief function theory and boundary region. Int J Mach Learn Cyber. https://doi.org/10.1007/s13042-019-00944-3

    Article  Google Scholar 

  • Muruganantham A, Nguyen PT, Lydia EL, Shankar K, Hashim W, Maseleno A (2019) Big data analytics and intelligence: a perspective for health care. Int J Eng Adv Technol 8:861–864

    Google Scholar 

  • Nalepa J, Kawulok M (2014a) Adaptive genetic algorithm to select training data for support vector machines. In: Esparcia-Alcázar AI, Mora AM (eds) Applications of evolutionary computation. Springer, Berlin, Heidelberg, pp 514–525

    Chapter  Google Scholar 

  • Nalepa J, Kawulok M (2014b) A memetic algorithm to select training data for support vector machines. In: Proceedings of the 2014 annual conference on genetic and evolutionary computation. ACM, New York, pp 573–580

  • Nalepa J, Blocho M (2016) Adaptive memetic algorithm for minimizing distance in the vehicle routing problem with time windows. Soft Comput 20:2309–2327. https://doi.org/10.1007/s00500-015-1642-4

    Article  Google Scholar 

  • Nalepa J, Kawulok M (2016) Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs. Neurocomputing 185:113–132. https://doi.org/10.1016/j.neucom.2015.12.046

    Article  Google Scholar 

  • Nalepa J, Kawulok M (2019) Selecting training sets for support vector machines: a review. Artif Intell Rev 52(2):857–900

    Article  Google Scholar 

  • Nalepa J, Siminski K, Kawulok M (2015) Towards parameter-less support vector machines. In: 2015 3rd IAPR Asian conference on pattern recognition (ACPR), pp 211–215

  • Osuna E, De Castro O (2002) Convex hull in feature space for support vector machines. Springer, Berlin, pp 411–419

    MATH  Google Scholar 

  • Osuna E, Freund R, Girosi F (1997) An improved training algorithm for support vector machines. In: Neural networks for signal processing VII. Proceedings of the 1997 IEEE signal processing society workshop, pp 276–285

  • Ougiaroglou S, Diamantaras KI, Evangelidis G (2018) Exploring the effect of data reduction on neural network and support vector machine classification. Neurocomputing 280:101–110. https://doi.org/10.1016/j.neucom.2017.08.076

    Article  Google Scholar 

  • Peng P, Ma QL, Hong LM (2009) The research of the parallel SMO algorithm for solving SVM. In: 2009 International conference on machine learning and cybernetics, pp 1271–1274

  • Pietruszkiewicz W, Imada A (2013) Artificial intelligence evolved from random behaviour: departure from the state of the Art. In: Yang X-S (ed) Artificial intelligence, evolutionary computing and metaheuristics: in the footsteps of alan turing. Springer, Berlin, Heidelberg, pp 19–41

    Chapter  Google Scholar 

  • Platt J (1998) Sequential minimal optimization: a fast algorithm for training support vector machines

  • Qiu J, Wu Q, Ding G et al (2016) A survey of machine learning for big data processing. EURASIP J Adv Signal Process 2016:67. https://doi.org/10.1186/s13634-016-0355-x

    Article  Google Scholar 

  • Sánchez AVD (2003) Advanced support vector machines and kernel methods. Neurocomputing 55:5–20

    Article  Google Scholar 

  • Shen X, Li Z, Jiang Z, Zhan Y (2013) Distributed SVM classification with redundant data removing. In: IEEE, pp 866–870

  • Shen X-J, Mu L, Li Z et al (2016) Large-scale support vector machine classification with redundant data reduction. Neurocomputing 172:189–197

    Article  Google Scholar 

  • Shin H, Cho S (2002) Pattern selection for support vector classifiers. In: Yin H, Allinson N, Freeman R et al (eds) Intelligent data engineering and automated learning—IDEAL 2002. Springer, Berlin, Heidelberg, pp 469–474

    Chapter  Google Scholar 

  • Sun Z, Guo Z, Liu C et al (2017) Fast extended one-versus-rest multi-label support vector machine using approximate extreme points. IEEE Access 5:8526–8535

    Article  Google Scholar 

  • Theodoridis S, Mavroforakis M (2007) Reduced convex hulls: a geometric approach to support vector machines [lecture notes]. IEEE Signal Process Mag 24(3):119–122

    Article  Google Scholar 

  • Varadwaj P, Purohit N, Arora B (2009) Detection of splice sites using support vector machine. Springer, Berlin, pp 493–502

    Google Scholar 

  • Wang J, Neskovic P, Cooper LN (2007) Selecting data for fast support vector machines training. In: Chen K, Wang L (eds) Trends in neural computation. Springer, Berlin, Heidelberg, pp 61–84

    Chapter  Google Scholar 

  • Wang D, Qiao H, Zhang B, Wang M (2013) Online support vector machine based on convex hull vertices selection. IEEE Trans Neural Netw Learn Syst 24:593–609. https://doi.org/10.1109/TNNLS.2013.2238556

    Article  Google Scholar 

  • Wani MA (2013) Hybrid method for fast SVM training in applications involving large volumes of data. In: 2013 12th international conference on machine learning and applications, pp 491–494

  • Wrona S, Pawełczyk M (2013) Controllability-oriented placement of actuators for active noise-vibration control of rectangular plates using a memetic algorithm. Archiv Acoust 38:529–536

    Article  Google Scholar 

  • Xia S, Xiong Z, Luo Y, Dong L (2015) A method to improve support vector machine based on distance to hyperplane. Optik Int J Light Electr Opt 126:2405–2410

    Article  Google Scholar 

  • Yang Q, Webb G (eds) (2008) PRICAI 2006: trends in artificial intelligence: 9th Pacific rim international conference on artificial intelligence, Guilin, China, August 7–11 Proceedings. Springer

  • Yang Y, Yu D, Cheng J (2007) A fault diagnosis approach for roller bearing based on IMF envelope spectrum and SVM. Measurement 40:943–950

    Article  Google Scholar 

  • Yao Y, Liu Y, Yu Y et al (2013) K-SVM: an effective SVM algorithm based on K-means clustering. JCP 8:2632–2639

    Google Scholar 

  • Yu H, Yang J, Han J, Li X (2005) Making SVMs scalable to large data sets using hierarchical cluster indexing. Data Min Knowl Disc 11:295–321

    Article  Google Scholar 

  • Zeng Z-Q, Yu H-B, Xu H-R et al (2008) Fast training support vector machines using parallel sequential minimal optimization. In: 2008 3rd International conference on intelligent system and knowledge engineering, pp 997–1001

  • Zeng M, Yang Y, Zheng J, Cheng J (2015) Maximum margin classification based on flexible convex hulls. Neurocomputing 149:957–965

    Article  Google Scholar 

  • Zeng Z-Q, Xu H-R, Xie Y-Q, Gao J (2008) A geometric approach to train SVM on very large data sets. In: 2008 3rd International conference on intelligent system and knowledge engineering, pp 991–996

  • Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the 1996 ACM SIGMOD international conference on management of data. ACM, New York, pp 103–114

  • Zhiyong D, Zuolin D, Peixin Q, Xianfang W (2010) Fuzzy support vector machine based on improved sequential minimal optimization algorithm. In: 2010 international conference on computer and communication technologies in agriculture engineering, pp 152–155

  • Zhong W, Chow R, Stolz R et al (2008) Hierarchical clustering support vector machines for classifying type-2 diabetes patients. Bioinformatics Research and Applications. Springer, Berlin, Heidelberg, pp 379–389

    Chapter  Google Scholar 

  • Zhou C, Yin K, Cao Y, Ahmed B (2016) Application of time series analysis and PSO–SVM model in predicting the Bazimen landslide in the Three Gorges Reservoir, China. Eng Geol 204:108–120. https://doi.org/10.1016/j.enggeo.2016.02.009

    Article  Google Scholar 

Download references

Funding

This work was partly supported by the Institute for Information and communications Technology Promotion (IITP) Grant funded by the Korea government (MSIT) (No. 2016–0-00133, Research on Edge computing via collective intelligence of hyperconnection IoT nodes), Korea, under the National Program for Excellence in SW supervised by the IITP (Institute for Information and communications Technology Promotion) (2015–0-00914), Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2017R1A2B2009095, Research on SDN-based WSN Supporting Real-time Stream Data Processing and Multiconnectivity, 2019R1I1A1A01058780, Efficient Management of SDN-based Wireless Sensor Network Using Machine Learning Technique), the second Brain Korea 21 PLUS project.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to this study. PB and HYY had the idea for the article and performed the literature search and data analysis. PB drafted the paper and KTK and HYY critically revised the work.

Corresponding author

Correspondence to Hee Yong Youn.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Birzhandi, P., Kim, K.T. & Youn, H.Y. Reduction of training data for support vector machine: a survey. Soft Comput 26, 3729–3742 (2022). https://doi.org/10.1007/s00500-022-06787-5

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00500-022-06787-5

Keywords

Navigation