SVGPM: evolving SVM decision function by using genetic programming to solve imbalanced classification problem

Abstract

In supervised learning, a dataset is imbalanced when its classes are not equally represented. Most standard classifiers fail to correctly identify patterns that belong to the minority class, because they are built to minimize the overall error rate. The resulting model is therefore likely to be biased, since high accuracy can be achieved by predicting the majority class alone. Several methods have been proposed to reduce this bias, such as resampling the dataset, modifying the classifier's optimization problem, or introducing a new optimization task on top of the classifier. Our proposal follows the last approach, with the additional optimization task integrated into the learning process. Specifically, we propose a hybrid classifier based on genetic programming and support vector machines. In our experiments, the proposed classifier was competitive with several support vector machine variants on imbalanced classification problems.
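The accuracy pitfall described above, and the general idea of adding an optimization step on top of an already-trained classifier, can be illustrated with a short Python sketch. This is not the SVGPM method: instead of evolving the SVM decision function with genetic programming, it uses plain threshold moving, a different and much simpler technique, applied to hypothetical decision scores. The helper names (`accuracy`, `g_mean`, `best_threshold`) and all toy data are assumptions for illustration. G-mean (the geometric mean of sensitivity and specificity) is used here as one common imbalance-aware metric, not necessarily the one used in this work.

```python
import math
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Geometric mean of sensitivity (minority-class recall) and specificity."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def best_threshold(scores: Sequence[float], y_true: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical post-training step: choose the threshold on the decision
    scores f(x) that maximizes G-mean (threshold moving, not genetic programming)."""
    best_t, best_g = 0.0, -1.0
    for t in sorted(set(scores)):
        # Predict the minority class (1) whenever the score reaches the threshold.
        y_pred = [1 if s >= t else 0 for s in scores]
        g = g_mean(y_true, y_pred)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g


if __name__ == "__main__":
    # Toy data: 95 majority samples (label 0) and 5 minority samples (label 1).
    y = [0] * 95 + [1] * 5
    always_majority = [0] * 100
    print(accuracy(y, always_majority))  # 0.95: looks good...
    print(g_mean(y, always_majority))    # 0.0: ...but no minority sample is found

    # Hypothetical SVM-style decision scores; the default rule is f(x) >= 0.
    y_small = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    scores = [-2.0, -1.5, -1.2, -1.0, -0.8, -0.5, -0.3, 0.2, -0.1, 0.4]
    default_pred = [1 if s >= 0 else 0 for s in scores]
    print(round(g_mean(y_small, default_pred), 3))  # 0.661
    t, g = best_threshold(scores, y_small)
    print(t, round(g, 3))  # -0.1 0.935
```

Tuning the threshold on the same data used for evaluation, as done here for brevity, overfits; in practice it would be selected on held-out validation data.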

Fig. 1
Fig. 2
Fig. 3

Acknowledgements

This research was supported by the Malaysia Ministry of Higher Education under Grant FRGS-RACER/1/2019/SS09/UUM//2.

Author information

Corresponding author

Correspondence to Muhammad Syafiq Mohd Pozi.

About this article

Cite this article

Pozi, M.S.M., Azhar, N.A., Raziff, A.R.A. et al. SVGPM: evolving SVM decision function by using genetic programming to solve imbalanced classification problem. Prog Artif Intell (2021). https://doi.org/10.1007/s13748-021-00260-4
