Genetic Programming and Evolvable Machines

, Volume 16, Issue 4, pp 455–498 | Cite as

Controlling code growth by dynamically shaping the genotype size distribution

  • Marc-André Gardner
  • Christian Gagné
  • Marc Parizeau
Article

Abstract

Genetic programming is a hyperheuristic optimization approach that seeks to evolve various forms of symbolic computer programs, in order to solve a wide range of problems. However, the approach can be severely hindered by a significant computational burden and stagnation of the evolution caused by uncontrolled code growth. This paper introduces HARM-GP, a novel operator equalization method that conducts an adaptive shaping of the genotype size distribution of individuals in order to effectively control code growth. Its probabilistic nature minimizes the computational overheads on the evolutionary process while its generic formulation allows it to remain independent of both the problem and the genetic variation operators. Comparative results over twelve problems with different dynamics, and over nine other algorithms taken from the literature, show that HARM-GP is excellent at controlling code growth while maintaining good overall performance. Results also demonstrate the effectiveness of HARM-GP at limiting overfitting in real-world supervised learning problems.

Keywords

Genetic programming Bloat control Monte Carlo methods 

Supplementary material

10710_2015_9242_MOESM1_ESM.pdf (1.7 mb)
Appendix A details the sensitivity analysis results reported in Sec. 5.1. Appendix B exposes the general experimental methodology followed, makes a detailed description of each problem and previous results obtained on it, and presents plots and tables detailing the experimental results obtained for each of these problems.

References

  1. 1.
    E. Alfaro-Cid, A. Esparcia-Alcázar, K. Sharman, F. Fernandez de Vega, J. J. Merelo, Prune and plant: a new bloat control method for genetic programming. in Proceedings of the International Conference on Hybrid Intelligent Systems (HIS), pp. 31–35 (2008)Google Scholar
  2. 2.
    E. Alfaro-Cid, J.J. Merelo, F. Fernandez de Vega, A. Esparcia-Alcazar, K. Sharman, Bloat control operators and diversity in genetic programming: a comparative study. Evol. Comput. 18(2), 305–332 (2010)CrossRefGoogle Scholar
  3. 3.
    E. Alpaydin, Introduction to Machine Learning (MIT Press, Cambridge, 2004)Google Scholar
  4. 4.
    N. Amil, N. Bredeche, C. Gagné, S. Gelly, M. Schoenauer, O. Teytaud, A statistical learning perspective of genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 327–338 (2009)Google Scholar
  5. 5.
    J. Bacardit, M. Stout, N. Krasnogor, J. D. Hirst, J. Blazewicz, Coordination number prediction using learning classifier systems: performance and interpretability. in Proceedings of Genetic and Evolutionary Computation Conference (GECCO), pp. 247–254 (2006)Google Scholar
  6. 6.
    W. Banzhaf, W.B. Langdon, Some considerations on the reason for bloat. Genet. Program. Evolvable Mach. 3(1), 81–91 (2002)MATHCrossRefGoogle Scholar
  7. 7.
    W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone, Genetic Programming: An Introduction (Morgan Kaufmann, Los Altos, 1997)Google Scholar
  8. 8.
    S. Bleuler, M. Brack, L. Thiele, E. Zitzler, Multiobjective genetic programming: reducing bloat using SPEA2. in Proceedings of the Congress on evolutionary computation (CEC), 1, pp. 536–543 (2001)Google Scholar
  9. 9.
    T. Blickle, L. Thiele, Genetic programming and redundancy. in Genetic Algorithms within the Framework of Computation (Workshop at KI-94) (1994)Google Scholar
  10. 10.
    R. Bock, A. Chilingarian, M. Gaug, F. Hakl, T. Hengstebeck, M. Jiřina, J. Klaschka, E. Kotrč, P. Savickỳ, S. Towers et al., Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip. 516(2), 511–528 (2004)CrossRefGoogle Scholar
  11. 11.
    M. Brameier, W. Banzhaf, A comparison of linear genetic programming and neural networks in medical data mining. IEEE Trans. Evol. Comput. 5(1), 17–26 (2001)CrossRefGoogle Scholar
  12. 12.
    E.K. Burke, M. Hyde, G. Kendall, J. Woodward, A genetic programming hyper-heuristic approach for evolving 2-D strip packing heuristics. IEEE Trans. Evol. Comput. 14(6), 942–958 (2010)CrossRefGoogle Scholar
  13. 13.
    S. Dignum, R. Poli, Generalisation of the limiting distribution of program sizes in tree-based genetic programming and analysis of its effects on bloat. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1588–1595 (2007)Google Scholar
  14. 14.
    S. Dignum, R. Poli, Crossover, sampling, bloat and the harmful effects of size limits. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 158–169 (2008)Google Scholar
  15. 15.
    S. Dignum, R. Poli, Operator equalisation and bloat free GP. in Proceedings of the European conference on Genetic Programming (EuroGP), pp. 110–121 (2008)Google Scholar
  16. 16.
    K. Dimitrios, K. Aigli, T. Konstantinos, L. Spiros, T. Athanasios, M. Seferina, Where we stand, where we are moving: surveying computational techniques for identifying miRNA genes and uncovering their regulatory role. J. Biomed. Inform. 46(3), 563–573 (2013)CrossRefGoogle Scholar
  17. 17.
    P.G. Espejo, S. Ventura, F. Herrera, A survey on the application of genetic programming to classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 40(2), 121–144 (2010)CrossRefGoogle Scholar
  18. 18.
    J. Fitzgerald, R. Azad, C. Ryan, Bootstrapping to reduce bloat and improve generalisation in genetic programming. in Companion Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 141–142 (2013)Google Scholar
  19. 19.
    F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, C. Gagné, DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)MATHMathSciNetGoogle Scholar
  20. 20.
    C. Gagné, M. Parizeau, Genericity in evolutionary computation software tools: principles and case study. Int. J. Artif. Intel. Tools 15(2), 173–194 (2006)CrossRefGoogle Scholar
  21. 21.
    S. Gelly, O. Teytaud, N. Bredeche, M. Schoenauer, A statistical learning theory approach of bloat. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1783–1784 (2005)Google Scholar
  22. 22.
    L. Guo, D. Rivero, J. Dorado, C.R. Munteanu, A. Pazos, Automatic feature extraction using genetic programming: an application to epileptic EEG classification. Expert Systems Appl. 38(8), 10425–10436 (2011)CrossRefGoogle Scholar
  23. 23.
    K. Harries, P. Smith, Code growth, explicitly defined introns and alternative selection schemes. Evol. Comput. 6(4), 346–364 (1998)Google Scholar
  24. 24.
    M. Keijzer, Improving symbolic regression with interval arithmetic and linear scaling. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 70–82 (2003)Google Scholar
  25. 25.
    K. Kinnear Jr., Evolving a sort: lessons in genetic programming. in Proceedings of the IEEE International Conference on Neural Networks (ICNN), pp. 881–888 (1993)Google Scholar
  26. 26.
    D. Kinzett, M. Johnston, M. Zhang, Numerical simplification for bloat control and analysis of building blocks in genetic programming. Evol. Intell. 2(4), 151–168 (2009)CrossRefGoogle Scholar
  27. 27.
    A. Kordon, G. Smits, E. Jordaan, E. Rightor, Robust soft sensors based on integration of genetic programming, analytical neural networks, and support vector machines. in Proceedings of the IEEE International Conference on E-Commerce Technology, 1 (2002)Google Scholar
  28. 28.
    J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, Cambridge, 1992)MATHGoogle Scholar
  29. 29.
    J.R. Koza, Human-competitive results produced by genetic programming. Genet. Program. Evolv. Mach. 11(3–4), 251 (2010)CrossRefGoogle Scholar
  30. 30.
    W. Langdon, R. Poli, Fitness causes bloat. in Soft Computing in Engineering Design and Manufacturing, (Springer, London, 1998), pp. 13–22Google Scholar
  31. 31.
    W. Langdon, T. Soule, R. Poli, J. Foster. The evolution of size and shape. in Advances in Genetic Programming III, chapter 8 (MIT Press, 1999) pp. 163–190Google Scholar
  32. 32.
    W.B. Langdon, R. Poli, Foundations of Genetic Programming (Springer, Berlin, 2002)MATHCrossRefGoogle Scholar
  33. 33.
    S. M. Lee, D. S. Kim, J. H. Kim, J. S. Park, Spam detection using feature selection and parameters optimization. in Proceedings of the International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), (Washington, DC, USA, 2010) pp. 883–888Google Scholar
  34. 34.
    S. Luke, L. Panait, Fighting bloat with nonparametric parsimony pressure. in Proceedings of Parallel Problem Solving from Nature (PPSN), pp. 411–421 (2002)Google Scholar
  35. 35.
    S. Luke, L. Panait, Lexicographic parsimony pressure. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 829–836 (2002)Google Scholar
  36. 36.
    S. Luke, L. Panait, A comparison of bloat control methods for genetic programming. Evol. Comput. 14(3), 309–344 (2006)CrossRefGoogle Scholar
  37. 37.
    J. F. Miller, P. Thomson, Cartesian genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 121–132 (2000)Google Scholar
  38. 38.
    P. Nordin, W. Banzhaf et al., Complexity compression and evolution. in Proceedings of the International Conference on Genetic Algorithms (ICGA), pp. 310–317 (1995)Google Scholar
  39. 39.
    M. O’Neill, L. Vanneschi, S. Gustafson, W. Banzhaf, Open issues in genetic programming. Genet. Program. Evolv. Mach. 11(3–4), 339–363 (2010)CrossRefGoogle Scholar
  40. 40.
    L. Pagie, P. Hogeweg, Evolutionary consequences of coevolving targets. Evol. Comput. 5(4), 401–418 (1997)CrossRefGoogle Scholar
  41. 41.
    L. Panait, S. Luke, Alternative bloat control methods. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 630–641 (2004)Google Scholar
  42. 42.
    R. Poli, A simple but theoretically-motivated method to control bloat in genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 204–217 (2003)Google Scholar
  43. 43.
    R. Poli, Covariant tarpeian method for bloat control in genetic programming. in Genetic Programming Theory and Practice VIII, (Springer, 2011), pp. 71–89Google Scholar
  44. 44.
    R. Poli, W. B. Langdon, S. Dignum, On the limiting distribution of program sizes in tree-based genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 193–204 (2007)Google Scholar
  45. 45.
    R. Poli, W. B. Langdon, N. F. McPhee, A field guide to genetic programming. freely. http://www.gp-field-guide.org.uk (2008)
  46. 46.
    R. Poli, N. F. McPhee, Exact schema theorems for gp with one-point and standard crossover operating on linear structures and their application to the study of the evolution of size. in Proceedings of the European Conference on Genetic Programming (EuroGP) (2001)Google Scholar
  47. 47.
    R. Poli, N.F. McPhee, General schema theory for genetic programming with subtree-swapping crossover: Part II. Evol. Comput. 11(2), 169–206 (2003)CrossRefGoogle Scholar
  48. 48.
    R. Poli, N. F. McPhee, Parsimony pressure made easy. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) pp. 1267–1274 (2008)Google Scholar
  49. 49.
    R.Y. Rubinstein, D.P. Kroese, Simulation and the Monte Carlo method (Wiley, New York, 2008)MATHGoogle Scholar
  50. 50.
    S. Silva, Reassembling operator equalisation: a secret revealed. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1395–1402 (2011)Google Scholar
  51. 51.
    S. Silva, E. Costa, Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories. Genet. Program. Evolv. Mach. 10(2), 141–179 (2009)MathSciNetCrossRefGoogle Scholar
  52. 52.
    S. Silva, S. Dignum, Extending operator equalisation: fitness based self adaptive length distribution for bloat free GP. in Proc. of the European Conference on Genetic Programming (EuroGP), pp. 159–170 (2009)Google Scholar
  53. 53.
    S. Silva, S. Dignum, L. Vanneschi, Operator equalisation for bloat free genetic programming and a survey of bloat control methods. Genet. Program. Evolv. Mach. 13(2), 197–238 (2011)CrossRefGoogle Scholar
  54. 54.
    S. Silva, L. Vanneschi, Operator equalisation, bloat and overfitting: a study on human oral bioavailability prediction. in Proceedings of Genetic and Evolutionary Computation Conference (GECCO), pp. 1115–1122 (2009)Google Scholar
  55. 55.
    S. Silva, L. Vanneschi, The importance of being flat—studying the program length distributions of operator equalisation. in Genetic Programming Theory and Practice IX, (Springer, 2011), pp. 211–233Google Scholar
  56. 56.
    T. Soule, J.A. Foster, Effects of code growth and parsimony pressure on populations in genetic programming. Evolut. Comput. 6(4), 293–309 (1998)CrossRefGoogle Scholar
  57. 57.
    T. Soule, R.B. Heckendorn, An analysis of the causes of code growth in genetic programming. Genet. Program. Evolv. Machin. 3(3), 283–309 (2002)MATHCrossRefGoogle Scholar
  58. 58.
    W. Tackett, Recombination, selection, and the genetic construction of computer programs. PhD thesis, (University of Southern California, 1994)Google Scholar
  59. 59.
    A. Teller, M. Veloso, Program evolution for data mining. Int. J. Expert Syst. Res. Appl. 8, 213–236 (1995)CrossRefGoogle Scholar
  60. 60.
    M. Tomassini, L. Vanneschi, J. Cuendet, F. Fernández, A new technique for dynamic size populations in genetic programming. in Proceedings of the Congress on Evolutionary Computation (CEC), 1, pp. 486–493 (2004)Google Scholar
  61. 61.
    L. Trujillo, E. Naredo, Y. Martínez, Preliminary study of bloat in genetic programming with behavior-based search. in EVOLVE-A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation IV, (Springer, 2013) pp. 293–305Google Scholar
  62. 62.
    L. Trujillo, S. Silva, P. Legrand, L. Vanneschi, An empirical study of functional complexity as an indicator of overfitting in genetic programming. in Proceedings of the European Conference on Genetic Programming (EuroGP), pp. 262–273 (2011)Google Scholar
  63. 63.
    N.Q. Uy, N.X. Hoai, M. O’Neill, R.I. McKay, E. Galván-López, Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolv. Machin. 12(2), 91–119 (2011)CrossRefGoogle Scholar
  64. 64.
    L. Vanneschi, M. Castelli, S. Silva, Measuring bloat, overfitting and functional complexity in genetic programming. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 877–884 (2010)Google Scholar
  65. 65.
    E.J. Vladislavleva, G.F. Smits, D. Den Hertog, Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Trans. Evolut. Comput. 13(2), 333–349 (2009)CrossRefGoogle Scholar
  66. 66.
    P.A. Whigham, G. Dick, Implicitly controlling bloat in genetic programming. IEEE Trans. Evolut. Comput. 14(2), 173–190 (2010)CrossRefGoogle Scholar
  67. 67.
    D.R. White, J. McDermott, M. Castelli, L. Manzoni, B.W. Goldman, G. Kronberger, W. Jaśkowski, U.-M. O’Reilly, S. Luke, Better GP benchmarks: community survey results and proposals. Genetic Program. Evolv. Mach. 14(1), 3–29 (2013)CrossRefGoogle Scholar
  68. 68.
    L. Wilkinson, A. Anand, D. N. Tuan, CHIRP: a new classifier based on composite hypercubes on iterated random projections. in Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp. 6–14 (2011)Google Scholar
  69. 69.
    J. Yu, J. Yu, A.A. Almal, S.M. Dhanasekaran, D. Ghosh, W.P. Worzel, A.M. Chinnaiyan, Feature selection and molecular classification of cancer using genetic programming. Neoplasia 9(4), 292 (2007)CrossRefGoogle Scholar
  70. 70.
    M. Zhang, P. Wong, Genetic programming for medical classification: a program simplification approach. Genetic Program. Evolv. Mach. 9(3), 229–255 (2008)CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Marc-André Gardner
    • 1
  • Christian Gagné
    • 1
  • Marc Parizeau
    • 1
  1. 1.Laboratoire de vision et systèmes numériques, Département de génie électrique et de génie informatiqueUniversité LavalQuebecCanada

Personalised recommendations