Natural Computing, Volume 7, Issue 4, pp 589–613

Repeated patterns in genetic programming

  • W. B. Langdon
  • W. Banzhaf


Evolved genetic programming trees contain many repeated code fragments. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, sensitivity analysis, syntactic and semantic fitness correlations. Programs evolve in a self-similar fashion, akin to fractal random trees, with diffuse introns. Data mining frequent patterns reveals that as software is progressively improved a large proportion of it is exactly repeated subtrees as well as exactly repeated subgraphs. We relate this emergent phenomenon to building blocks in GP and suggest GP works by jumbling subtrees which already have high fitness on the whole problem to give incremental improvements and create complete solutions with multiple identical components of different importance.
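To make the idea of "exactly repeated subtrees" concrete, the following is a minimal sketch, not the authors' mining procedure. It assumes a program tree is written as nested Python tuples of the form `(function, child, child, ...)` with strings as leaves; the names `count_subtrees`, `subtree_size`, `repeated_subtrees` and the example program are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_subtrees(tree, counts=None):
    """Count how often each subtree (leaves included) occurs in a tree.

    Assumption: a tree is either a leaf (a string) or a tuple whose first
    element is the function name and whose remaining elements are children.
    """
    if counts is None:
        counts = Counter()
    counts[tree] += 1  # tuples and strings are hashable, so they can be keys
    if isinstance(tree, tuple):
        for child in tree[1:]:
            count_subtrees(child, counts)
    return counts


def subtree_size(tree):
    """Number of nodes (functions plus leaves) in a tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(subtree_size(child) for child in tree[1:])


def repeated_subtrees(tree, min_size=3):
    """Return subtrees of at least min_size nodes that occur more than once."""
    counts = count_subtrees(tree)
    return {
        sub: n
        for sub, n in counts.items()
        if n > 1 and subtree_size(sub) >= min_size
    }


# Hypothetical evolved program: (x*y) + ((x*y) - (x*y))
xy = ("mul", "x", "y")
program = ("add", xy, ("sub", xy, xy))

for sub, n in repeated_subtrees(program).items():
    print(n, "copies of", sub)
```

In this toy program the fragment `("mul", "x", "y")` accounts for 9 of the 11 nodes, which is the kind of proportion the abstract refers to. Matching repeated subgraphs (fragments that need not extend down to the leaves) is a harder problem and is not covered by this sketch.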


Genetic algorithms, ALU, SINE, Frequent subgraphs, Frequent subtrees, Mackey-Glass, Poly-10, Nuclear protein localisation, Tiny GP, GPquick, Evolution of program shape, Sensitivity analysis



This work was carried out while WBL was at University College, London and Essex University. WB thanks NSERC for grant RGPIN 283304-04.



Copyright information

© Springer Science+Business Media, Inc. 2007

Authors and Affiliations

  1. Department of Computer Science, Essex Institute of Technology, University of Essex, Colchester, UK
  2. Department of Computer Science, Memorial University of Newfoundland, St. John’s, Canada
