Abstract
In tree-based genetic programming, there is a tendency for the size of the programs to increase from generation to generation, a phenomenon known as bloat. It is standard practice to place some form of control on program size either by limiting the number of nodes or the depth of the program trees, or by adding a component to the fitness function that rewards smaller programs (parsimony pressure). Others have proposed directly simplifying individual programs using algebraic methods. In this paper, we add node-based numerical simplification as a tree pruning criterion to control program size. We investigate the effect of on-line program simplification, both algebraic and numerical, on program size and resource usage. We also investigate the distribution of building blocks within a genetic programming population and how this is changed by using simplification. We show that simplification results in reductions in expected program size, memory use and computation time. We also show that numerical simplification performs at least as well as algebraic simplification, and in some cases will outperform algebraic simplification. We further show that although the two on-line simplification methods destroy some existing building blocks, they effectively generate new more diverse building blocks during evolution, which compensates for the negative effect of disruption of building blocks.
Similar content being viewed by others
Notes
When we use the term building block we mean a subtree of the specified depth. It may occur at any point in the program of which it is a part. We do not imply anything about the fitness of the building block.
By effectiveness, we mean fitness for a symbolic regression problem and classification accuracy on the test set for a classification problem.
References
Soule T, Foster JA, Dickinson J (1996) Code growth in genetic programming. In: Koza JR et al (eds) Genetic programming 1996: proceedings of the first annual conference. Stanford University, MIT Press, USA, pp 215–223
Soule T, Heckendorn RB (2002) An analysis of the causes of code growth in genetic programming. Genet Program Evolvable Mach 3(3):283–309
Blickle T, Thiele, L (1994) Genetic programming and redundancy. In: Hopf J (ed) Genetic algorithms within the framework of evolutionary computation. Max-Planck-Institut für Informatik (MPI-I-94-241), pp 33–38
Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge
Soule T, Foster JA (1997) Support for multiple causes of code growth in GP. Position paper at the workshop on evolutionary computation with variable size representation at ICGA 1997, July 20
Soule T (1998) Code growth in genetic programming. PhD thesis, University of Idaho, Moscow, Idaho, USA, May 15
Banzhaf W, Nordin P, Keller RE, Francone FD (1998) Genetic programming: an introduction on the automatic evolution of computer programs and its applications. Morgan Kaufmann Publishers Inc., San Francisco
Zhang M, Smart W (2006) Using gaussian distribution to construct fitness functions in genetic programming for multiclass object classification. Pattern Recognit Lett 27(11):1266–1274
Wong P, Zhang M (2006) Algebraic simplification of GP programs during evolution. In: Keijzer M et al (eds) GECCO 2006: proceedings of the 8th annual conference on genetic and evolutionary computation, vol 1. ACM Press, USA, pp 927–934
Nordin P, Banzhaf W (1995) Complexity compression and evolution. In: Eshelman L (ed) Genetic algorithms: proceedings of the sixth international conference (ICGA95). Morgan Kaufmann, Pittsburgh, pp 310–317, 15–19 July
Parrott D, Li X, Ciesielski V (2005) Multi-objective techniques in genetic programming for evolving classifiers. In: Corne D, Michalewicz Z et al (eds) Proceedings of the 2005 IEEE congress on evolutionary computation vol 2. IEEE Press, Edinburgh, pp 1141–1148, 2–5 Sep
Zhang BT, Mühlenbein H (1995) Balancing accuracy and parsimony in genetic programming. Evol Comput 3(1):17–38
Zhang M, Bhowan U (2004) Program size and pixel statistics in genetic programming for object detection. In: Raidl GR, Cagnoni S, et al. (eds) Applications of evolutionary computing, evoworkshops 2004. vol 3005 LNCS, Springer, Coimbra, pp 379–388, 5–7 April
Luke S, Panait L (2002) Lexicographic parsimony pressure In: Langdon WB et al (eds) GECCO 2002: proceedings of the genetic and evolutionary computation conference. Morgan Kaufmann, New York, pp 829–836, 9–13 July
de Jong ED, Pollack JB (2003) Multi-objective methods for tree size control. Genet Program Evolvable Mach 4(3):211–233
Luke S, Panait L (2002) Fighting bloat with nonparametric parsimony pressure. In: Merelo Guervos JJ, Adamidis P, Beyer HG, Fernandez-Villacanas JL, Schwefel HP (eds) Proceedings of the international conference on parallel problem solving from nature (PPSN VII). Springer, The Netherland pp 411–421
Poli R (2003) A simple but theoretically-motivated method to control bloat in genetic programming. In: Genetic programming, proceedings of EuroGP 2003. Springer, The Netherland pp 204–217
Luke S, Panait L (2004) Alternative bloat control methods In: Genetic and evolutionary computation—GECCO-2004, Part II (Lecture Notes in Computer Science). Springer, The Netherland pp 630–641
Blickle T, Thiele L (1994) Genetic programming and redundancy, In: Hopf J (ed) Genetic algorithms within the fram work of evolutionary computation. Workshop at KI-94, Saarbrüken, Im Stadtwald, Building 44, D-66123 Saarbrüken, Germany, Max-Planck-Institut für Informatik, MPI-I-94-241, pp 33–38
Ashlock W, Ashlock D (2005) Single parent genetic programming, In: Corne D, Mickalewicz Z, Dorigo M, Eiben G, Fogel D, Fonseca C, Greenwood G, Chen TK, Raidl G, Zalzala A, Lucas S, Paechter B, Willies J, Guervos JJM, Eberbach E, McKay B, Channon A, Tiwari A, Volkert LG, Asklock D, Schoenauer M (eds) Proceedings 2005 IEEE congress evolutionary computation, vol 2. IEEE Press, Edinburgh, pp 1172–1179, 2–5 September
Langdon WB, Poli R (1997) Fitness causes bloat. In: Chawdhry PK, Roy R, Pant RK (eds) Soft computing in engineering design and manufacturing. Springer, London, 23–27 June, pp 13–22
Nordin P, Francone F, Banzhaf W (1995) Explicitly defined introns and destructive crossover in genetic programming, In: Rosca JP (eds) Proceedings workshop on genetic programming: from theory to real-world applications. Tahoe City, pp 6--22, 9 July
Hooper D, Flann NS (1996) Improving the accuracy and robustness of genetic programming through expression simplification, In: Koza JR et al. (eds) Genetic programming 1996: proceedings of the first annual conference. Stanford University, MIT Press, CA, p 428
Wong P, Zhang M (2007) Effects of program simplification on simple building blocks in genetic programming. In: IEEE congress on evolutionary computation pp 1570–1577
Mori N, Matsumoto K (2005) A novel measure of diversity in genetic programming by means of subtree entropy. In: Proceedings 32nd SICE symposium on intelligent systems, pp 205–210 (in Japanese)
Kang M, Shin J, Hoang TH, McKay B, Essam D, Mori N, Nguyen XH (2006) Code duplication and developmental evaluation in genetic programming. In: Proceedings 2006 asia-pacific workshop on intelligent and evolutionary systems. Seoul, Korea pp 181–191
Marshall D (2001) The discrete cosine transform. http://www.cs.cf.ac.uk/Dave/Multimedia/node231.html
Forina M, Leardi R, Armanino C, Lanteri S (1988) Parvus: an extendable package of programs for data exploration, classification and correlation. Elsevier, Amsterdam
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Samaria F, Harter AC (1994) Parameterisation of a stochastic model for human face identification. proceedings of the second IEEE workshop on applications of computer vision
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics 1:80–83
LaVange LM, Koch Gary G (2006) Rank score tests. Circulation 114(23):2528–2533
Acknowledgments
The authors would like to thank the anonymous referees for their time, comments and suggestions that provided great help in improving this paper. This work was supported in part by the Marsden Fund council from the government funding (08-VUW-014), administrated by the Royal Society of New Zealand, and the University Research Fund (URF09-2399/85608) at Victoria University of Wellington.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Kinzett, D., Johnston, M. & Zhang, M. Numerical simplification for bloat control and analysis of building blocks in genetic programming. Evol. Intel. 2, 151–168 (2009). https://doi.org/10.1007/s12065-009-0029-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12065-009-0029-9