
Fast instance selection method for SVM training based on fuzzy distance metric

Published in Applied Intelligence

Abstract

Support Vector Machine (SVM) is a well-known classification technique that has achieved excellent performance in many nonlinear and high-dimensional pattern recognition fields. However, due to the high time complexity of training an SVM model, it is difficult to apply to large-scale data sets. One of the most promising solutions is to reduce the training data used to establish the optimal classification hyperplane by selecting the relevant support vectors, which are the only instances that affect the classification rule. Instance selection is therefore an efficient pre-processing technique for reducing the computational complexity and storage requirements of the learning process. In this manuscript, taking the geometric distribution of the data into account, we propose a Half Shell Extraction (HSE) algorithm, which falls into the condensation category of instance selection methods. Moreover, a fuzzy distance metric based on locality-sensitive hashing is employed to accelerate the instance selection process. An experimental study involving a variety of data sets compares the proposed algorithm with five competitive algorithms; the results show that it consistently outperforms the other algorithms in terms of accuracy, reduction capability, and runtime.
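The HSE algorithm and the paper's fuzzy distance metric are defined in the full article, not in this abstract. As a rough, hedged illustration of the general idea (using locality-sensitive hashing to avoid exact pairwise distance computations during instance selection), here is a minimal sketch. The function names, parameters, and the "mixed-label bucket" boundary heuristic are our own illustrative choices, not the authors' method:

```python
import numpy as np

def lsh_buckets(X, n_hashes=4, bucket_width=1.0, seed=0):
    """Hash rows of X into buckets with p-stable (Gaussian) LSH:
    h(x) = floor((a . x + b) / w). Nearby points tend to share buckets."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_hashes, X.shape[1]))
    b = rng.uniform(0.0, bucket_width, size=n_hashes)
    keys = np.floor((X @ a.T + b) / bucket_width).astype(int)
    buckets = {}
    for i, key in enumerate(map(tuple, keys)):
        buckets.setdefault(key, []).append(i)
    return buckets

def select_boundary_instances(X, y, **lsh_kwargs):
    """Condensation-style selection sketch: keep only instances whose
    LSH bucket contains more than one class label, a cheap proxy for
    points near the decision boundary (likely support vectors)."""
    selected = []
    for idx in lsh_buckets(X, **lsh_kwargs).values():
        if len(set(y[idx])) > 1:      # mixed-label bucket -> boundary region
            selected.extend(idx)
    return np.array(sorted(selected), dtype=int)
```

The reduced set returned by such a selector can then be passed to any SVM trainer in place of the full training set; the hashing step replaces the O(n²) distance matrix that exact neighbor-based condensation methods would require.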


Data Availability

The data sets analysed during the current study are available in the UCI repository (http://archive.ics.uci.edu/ml/index.php) and the KEEL repository (http://keel.es/).


Acknowledgments

The authors would like to thank the anonymous referees for the valuable comments and suggestions which helped us improve this paper.

Funding

This work has been supported by Fundamental Research Funds for the Central Universities under grant SWU117051.

Author information


Corresponding author

Correspondence to Chuan Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, J., Liu, C. Fast instance selection method for SVM training based on fuzzy distance metric. Appl Intell 53, 18109–18124 (2023). https://doi.org/10.1007/s10489-022-04447-7
