Pruning Techniques for Mixed Ensembles of Genetic Programming Models

  • Mauro Castelli
  • Ivo Gonçalves
  • Luca Manzoni (corresponding author)
  • Leonardo Vanneschi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10781)

Abstract

The objective of this paper is to define an effective strategy for building an ensemble of Genetic Programming (GP) models. Ensemble methods are widely used in machine learning because they average out biases, reduce variance, and usually generalize better than single models. Despite these advantages, building ensembles of GP models is not a well-developed topic in the evolutionary computation community. To fill this gap, we propose a strategy that blends individuals produced by standard syntax-based GP with individuals produced by geometric semantic genetic programming, one of the newest semantics-based methods developed in GP. Recent literature has shown that combining syntax and semantics can improve the generalization ability of a GP model. Additionally, to improve the diversity of the GP models used to build the ensemble, we propose different pruning criteria based on correlation and on entropy, a commonly used measure in information theory. Experimental results, obtained over different complex problems, suggest that the pruning criteria based on correlation and entropy could be effective in improving the generalization ability of the ensemble model and in reducing the computational burden required to build it.
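
To make the idea of correlation-based pruning concrete, the following is a minimal sketch of one generic way such a criterion could work. It is not the procedure defined in the paper, whose details the abstract does not give. The function names `prune_by_correlation` and `ensemble_predict`, the greedy ordering, the threshold value, and the use of a simple mean to combine models are all assumptions made for illustration. Each row of `predictions` is assumed to hold one model's (non-constant) outputs on a common set of samples.

```python
import numpy as np


def prune_by_correlation(predictions, threshold=0.95):
    """Greedily keep models that are not highly correlated with any kept model.

    predictions: array-like of shape (n_models, n_samples), one row per model.
    threshold:   hypothetical cut-off on the absolute Pearson correlation.
    Returns the indices of the models retained in the ensemble.
    """
    predictions = np.asarray(predictions, dtype=float)
    kept = []
    for i, candidate in enumerate(predictions):
        redundant = False
        for j in kept:
            # Pearson correlation between the candidate and a kept model.
            corr = np.corrcoef(candidate, predictions[j])[0, 1]
            if abs(corr) >= threshold:
                redundant = True
                break
        if not redundant:
            kept.append(i)
    return kept


def ensemble_predict(predictions, kept):
    """Combine the retained models by averaging their outputs (an assumption)."""
    return np.asarray(predictions, dtype=float)[kept].mean(axis=0)


# Toy outputs of three hypothetical models on four samples.
preds = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # a scaled copy of the first model
    [4.0, 1.0, 3.0, 2.0],
]
kept = prune_by_correlation(preds, threshold=0.95)
combined = ensemble_predict(preds, kept)
```

In this toy case the second model is a scaled copy of the first, so a criterion of this kind would discard it as adding no diversity, while the third model is retained. Pruning before combining also reduces the number of models that need to be evaluated, which is consistent with the reduction in computational burden mentioned in the abstract.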

Notes

Acknowledgements

This work was also financed through the Regional Operational Programme CENTRO2020 within the scope of the project CENTRO-01-0145-FEDER-000006.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Mauro Castelli (1)
  • Ivo Gonçalves (2, 3)
  • Luca Manzoni (4, corresponding author)
  • Leonardo Vanneschi (1)

  1. NOVA IMS, Universidade Nova de Lisboa, Lisboa, Portugal
  2. INESC Coimbra, DEEC, University of Coimbra, Coimbra, Portugal
  3. CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
  4. Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milano, Italy
