An Alternating Genetic Algorithm for Selecting SVM Model and Training Set

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10267)

Abstract

Support vector machines (SVMs) have proved highly effective in numerous pattern recognition tasks. Although training SVMs on large data sets is challenging, this obstacle can be mitigated by selecting a small yet representative subset of the entire training set. Another crucial and extensively investigated problem is selecting the SVM model. Many methods have been proposed to deal with each of these two problems independently; however, to the best of our knowledge, how to combine the two processes effectively has not been explored. Notably, a different SVM model may be optimal depending on the subset selected for training, so performing the two operations simultaneously is potentially beneficial. In this paper, we propose a new method that selects both the training set and the SVM model, using a genetic algorithm which alternately optimizes two different populations. We demonstrate that our approach is competitive with sequential optimization of the hyperparameters followed by training set selection. We report results obtained for several benchmark data sets and visualize the results obtained for artificial sets of 2D points.
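
The alternating scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the genetic operators, the elitist survivor selection, the phase schedule, and every function name here are assumptions made for illustration. In particular, `toy_fitness` is a hypothetical stand-in for the validation accuracy of an SVM trained on a given subset with given hyperparameters; it is built so that the best `gamma` depends on the chosen subset, mimicking the coupling the abstract describes.

```python
import random


def evolve(pop, score, crossover, mutate, rng):
    """One elitist generation: breed len(pop) children, keep the fittest len(pop)."""
    children = []
    for _ in range(len(pop)):
        a, b = rng.sample(pop, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return sorted(pop + children, key=score, reverse=True)[:len(pop)]


def cross_model(a, b, rng):
    # Uniform crossover: each hyperparameter comes from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate_model(m, rng):
    # Multiplicative perturbation, i.e. a random step on a log2 scale.
    return tuple(v * 2.0 ** rng.uniform(-1.0, 1.0) for v in m)


def cross_subset(a, b, rng):
    # The child draws its samples from the union of both parents, keeping the size.
    return frozenset(rng.sample(sorted(a | b), len(a)))


def mutate_subset(s, rng, n_samples):
    # Swap one selected sample index for one that is currently unselected.
    outside = [i for i in range(n_samples) if i not in s]
    if not outside:
        return s
    kept = set(s)
    kept.remove(rng.choice(sorted(kept)))
    kept.add(rng.choice(outside))
    return frozenset(kept)


def alternating_ga(fitness, models, subsets, n_samples, phases=6, gens=5, seed=0):
    """Alternate between evolving models (subset frozen) and subsets (model frozen)."""
    rng = random.Random(seed)
    models, subsets = list(models), list(subsets)
    best_m, best_s = models[0], subsets[0]
    history = []
    for phase in range(phases):
        for _ in range(gens):
            if phase % 2 == 0:
                # Model phase: score each model against the current best subset.
                models = evolve(models, lambda m: fitness(m, best_s),
                                cross_model, mutate_model, rng)
            else:
                # Subset phase: score each subset against the current best model.
                subsets = evolve(subsets, lambda s: fitness(best_m, s), cross_subset,
                                 lambda s, r: mutate_subset(s, r, n_samples), rng)
        best_m, best_s = models[0], subsets[0]
        history.append(fitness(best_m, best_s))
    return best_m, best_s, history


def toy_fitness(model, subset):
    # Hypothetical stand-in for SVM validation accuracy (higher is better).
    c, gamma = model
    target_gamma = 1.0 + sum(subset) / (10.0 * len(subset))
    return -abs(gamma - target_gamma) - abs(c - 10.0) / 10.0 - sum(subset) / 100.0


# Usage: 8 random (C, gamma) pairs and 8 random 5-sample subsets of 50 samples.
rng0 = random.Random(1)
init_models = [(2.0 ** rng0.uniform(-3, 3), 2.0 ** rng0.uniform(-3, 3)) for _ in range(8)]
init_subsets = [frozenset(rng0.sample(range(50), 5)) for _ in range(8)]
best_model, best_subset, history = alternating_ga(
    toy_fitness, init_models, init_subsets, n_samples=50)
```

Because survivor selection is elitist and the best individual of the idle population is carried into the next phase, the joint fitness recorded in `history` never decreases from one phase to the next for a deterministic fitness function. With a real SVM, `fitness` would train on the selected subset and score on held-out data, which is far more expensive than the toy function used here.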

Keywords

Support vector machines; Model selection; Training set selection; Genetic algorithms

Notes

Acknowledgments

This work was supported by the National Centre for Research and Development under the grant: POIR.01.02.00-00-0030/15.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Silesian University of Technology, Gliwice, Poland
  2. Future Processing, Gliwice, Poland
