Stochastic Semantic-Based Multi-objective Genetic Programming Optimisation for Classification of Imbalanced Data

  • Edgar Galván-López
  • Lucia Vázquez-Mendoza
  • Leonardo Trujillo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10062)


Data sets with an imbalanced class distribution pose serious challenges to well-established classifiers. In this work, we propose a stochastic multi-objective genetic programming approach based on semantics. We tested this approach on imbalanced binary classification data sets, where it achieves, in some cases, higher recall, precision and F-measure values on the minority class than C4.5, Naive Bayes and Support Vector Machine, without significantly decreasing these values on the majority class.
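Because the comparison above reports recall, precision and F-measure separately for the minority and majority classes, it may help to see how these per-class scores are computed and why overall accuracy alone can be misleading on imbalanced data. The following is a minimal sketch using the standard definitions; it is not the authors' evaluation code, and the helper names (`per_class_scores`, `accuracy`) and the toy labels are hypothetical, not taken from the paper's data sets.

```python
from typing import Dict, Sequence


def per_class_scores(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> Dict[str, float]:
    """Precision, recall and F-measure, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly predicted as `positive`
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # wrongly predicted as `positive`
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # `positive` instances missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall (defined as 0 when both are 0).
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall fraction of correct predictions, ignoring class membership."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = minority class (3 instances), 0 = majority class (7 instances).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print("minority:", per_class_scores(y_true, y_pred, positive=1))
    print("majority:", per_class_scores(y_true, y_pred, positive=0))

    # A classifier that always predicts the majority class still reaches 70% accuracy,
    # yet its minority-class precision, recall and F-measure are all zero.
    always_majority = [0] * len(y_true)
    print("accuracy:", accuracy(y_true, always_majority))
    print("minority:", per_class_scores(y_true, always_majority, positive=1))
```

On the toy labels, the first prediction vector gives precision = recall = F-measure = 2/3 on the minority class and 6/7 on the majority class, while the trivial majority-class predictor scores 0 on every minority-class measure despite its 0.7 accuracy. This is the kind of gap that reporting the measures per class is meant to expose.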



EGL’s research is funded by an ELEVATE Fellowship, the Irish Research Council’s Career Development Fellowship co-funded by Marie Curie Actions. EGL would like to thank the TAO group at INRIA Saclay, France, for hosting him during the outgoing phase of the fellowship. LVM thanks the SSSP for hosting her during her research visit at TCD. The authors would like to thank the reviewers for their comments, which helped us improve this work. EGL would also like to thank E. Mezura-Montes, O. Ait Elhara and M. Schoenauer for their earlier involvement in this work.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Edgar Galván-López (1)
  • Lucia Vázquez-Mendoza (2)
  • Leonardo Trujillo (3)
  1. Department of Computer Science, National University of Ireland Maynooth, Maynooth, Ireland
  2. School of Social Sciences and Philosophy, Trinity College Dublin, Dublin, Ireland
  3. Posgrado en Ciencias de la Ingeniería, Instituto Tecnológico de Tijuana, Tijuana, Mexico
