Numerical simplification for bloat control and analysis of building blocks in genetic programming

  • Special Issue
  • Evolutionary Intelligence

Abstract

In tree-based genetic programming, program size tends to increase from generation to generation, a phenomenon known as bloat. It is standard practice to control program size either by limiting the number of nodes or the depth of the program trees, or by adding a component to the fitness function that rewards smaller programs (parsimony pressure). Others have proposed directly simplifying individual programs using algebraic methods. In this paper, we add node-based numerical simplification as a tree pruning criterion to control program size. We investigate the effect of on-line program simplification, both algebraic and numerical, on program size and resource usage. We also investigate the distribution of building blocks within a genetic programming population and how simplification changes it. We show that simplification reduces expected program size, memory use and computation time. We also show that numerical simplification performs at least as well as algebraic simplification and in some cases outperforms it. We further show that although the two on-line simplification methods destroy some existing building blocks, they effectively generate new, more diverse building blocks during evolution, which compensates for the disruption.
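The abstract does not state the exact pruning criterion, so the following Python sketch shows only one plausible reading of node-based numerical simplification. The tuple encoding of trees, the function names, the tolerance `tol`, and the two pruning rules (collapse a subtree whose outputs are numerically constant over the training cases; replace a node by a child that produces the same outputs) are all assumptions made for illustration, not the authors' implementation.

```python
import operator

# Hypothetical tree encoding (an assumption, not from the paper):
# a float constant, a variable name (str), or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree for one training case; env maps variable name -> value."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


def size(tree):
    """Number of nodes in the tree."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1


def simplify(tree, cases, tol=1e-6):
    """Prune bottom-up using node outputs observed on the training cases."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    tree = (op, simplify(left, cases, tol), simplify(right, cases, tol))
    outputs = [evaluate(tree, env) for env in cases]
    # Assumed rule 1: outputs are numerically constant -> replace by their mean.
    if max(outputs) - min(outputs) < tol:
        return sum(outputs) / len(outputs)
    # Assumed rule 2: a child already yields the node's outputs -> keep the child.
    for child in tree[1:]:
        if all(abs(evaluate(child, env) - out) < tol
               for env, out in zip(cases, outputs)):
            return child
    return tree


cases = [{"x": 0.0}, {"x": 1.0}, {"x": 2.0}]
program = ("+", "x", ("*", ("-", "x", "x"), "x"))  # x + (x - x) * x
pruned = simplify(program, cases)
print(size(program), "->", size(pruned), pruned)  # 7 -> 1 x
```

Unlike algebraic rewriting, a rule of this kind depends on the training cases: two subtrees that agree on every case are treated as interchangeable even if they differ elsewhere, so the choice of cases and tolerance matters.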

Notes

  1. When we use the term building block we mean a subtree of the specified depth. It may occur at any point in the program of which it is a part. We do not imply anything about the fitness of the building block.

  2. By effectiveness, we mean fitness for a symbolic regression problem and classification accuracy on the test set for a classification problem.
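As a rough illustration of the building-block definition in Note 1, the sketch below counts every subtree of a given depth across a population. The tuple encoding of programs, the function names, the convention that a leaf has depth 1, and the choice to count only complete subtrees (rooted at any node and extending to the leaves) are assumptions; the notes do not fix these details.

```python
from collections import Counter

# Hypothetical encoding: a leaf is a float or a variable name (str);
# an internal node is a tuple (op, child, child, ...).


def depth(tree):
    """Depth of a tree, counting a single leaf as depth 1 (an assumed convention)."""
    if isinstance(tree, tuple):
        return 1 + max(depth(child) for child in tree[1:])
    return 1


def subtrees(tree):
    """Yield the subtree rooted at every node of the tree."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)


def building_blocks(population, d):
    """Count occurrences of each distinct depth-d subtree in the population."""
    counts = Counter()
    for program in population:
        for sub in subtrees(program):
            if depth(sub) == d:
                counts[sub] += 1
    return counts


population = [
    ("+", "x", ("*", "x", "y")),
    ("-", ("*", "x", "y"), 1.0),
]
blocks = building_blocks(population, 2)
print(blocks)       # Counter({('*', 'x', 'y'): 2})
print(len(blocks))  # number of distinct depth-2 building blocks: 1
```

Comparing such counts between runs with and without simplification gives one simple way to see which blocks are destroyed and which new ones appear; consistent with Note 1, the count says nothing about the fitness of a block.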

Acknowledgments

The authors would like to thank the anonymous referees for their time, comments and suggestions, which greatly helped to improve this paper. This work was supported in part by the Marsden Fund Council from government funding (08-VUW-014), administered by the Royal Society of New Zealand, and by the University Research Fund (URF09-2399/85608) at Victoria University of Wellington.

Author information

Corresponding author

Correspondence to David Kinzett.

About this article

Cite this article

Kinzett, D., Johnston, M. & Zhang, M. Numerical simplification for bloat control and analysis of building blocks in genetic programming. Evol. Intel. 2, 151–168 (2009). https://doi.org/10.1007/s12065-009-0029-9

  • DOI: https://doi.org/10.1007/s12065-009-0029-9
