Abstract
Support Vector Machine (SVM) is a well-known classification technique that has achieved excellent performance in many nonlinear and high-dimensional pattern recognition tasks. However, the high time complexity of training an SVM model makes it difficult to apply to large-scale data sets. One of the most promising solutions is to reduce the training data used to establish the optimal classification hyperplane by selecting the relevant support vectors, which are the only instances that determine the classification rule. Instance selection is therefore an efficient pre-processing technique for reducing the computational complexity and storage requirements of the learning process. In this manuscript, taking the geometric distribution of the data sets into account, we propose a Half Shell Extraction (HSE) algorithm, which falls into the condensation category of instance selection methods. Moreover, a fuzzy distance metric based on locality-sensitive hashing is employed to accelerate the instance selection process. An experimental study involving a variety of data sets compares the proposed algorithm with five competitive algorithms; the results show that the proposed algorithm consistently outperforms the others in terms of accuracy, reduction capability and runtime.
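The abstract mentions accelerating instance selection with locality-sensitive hashing. A minimal sketch of p-stable (Gaussian-projection) LSH, the general scheme that underlies such acceleration, is shown below; the function name and parameters are illustrative only, not the authors' implementation:

```python
import numpy as np

def lsh_hash(X, n_hashes=8, w=4.0, seed=0):
    """p-stable LSH: h(x) = floor((a . x + b) / w) for random Gaussian a.

    Points that are close in Euclidean distance tend to receive the
    same hash code, so candidate neighbours can be found by comparing
    hash tuples instead of computing all pairwise distances.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    a = rng.standard_normal((d, n_hashes))      # random projection directions
    b = rng.uniform(0.0, w, size=n_hashes)      # random offsets in [0, w)
    return np.floor((X @ a + b) / w).astype(int)

# Toy use: points sharing a full hash tuple are likely close neighbours.
X = np.array([[0.0, 0.0], [0.1, 0.05], [5.0, 5.0]])
codes = lsh_hash(X)
```

In an instance-selection setting, such codes would let a condensation method restrict distance computations to points falling in the same hash bucket, which is where the runtime savings come from.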
Data Availability
The data sets analysed during the current study are available in the UCI repository (http://archive.ics.uci.edu/ml/index.php) and the KEEL repository (http://keel.es/).
Acknowledgments
The authors would like to thank the anonymous referees for their valuable comments and suggestions, which helped us improve this paper.
Funding
This work was supported by the Fundamental Research Funds for the Central Universities under grant SWU117051.
About this article
Cite this article
Zhang, J., Liu, C. Fast instance selection method for SVM training based on fuzzy distance metric. Appl Intell 53, 18109–18124 (2023). https://doi.org/10.1007/s10489-022-04447-7