
Natural Computing, Volume 14, Issue 2, pp 303–330

Empirical modeling using genetic programming: a survey of issues and approaches

  • Vipul K. Dabhi
  • Sanjay Chaudhary
Article

Abstract

Empirical modeling, the process of developing a mathematical model of a system from experimental data, has attracted many researchers because of its wide applicability. Finding both the structure of the model and appropriate numeric coefficients for it is a real challenge. Genetic programming (GP) has been applied by many practitioners to solve this problem. However, a number of issues require careful attention when applying GP to empirical modeling problems. We begin by highlighting the importance of these issues, which include the computational effort of evolving a model, premature convergence, the generalization ability of an evolved model, building hierarchical models, and constant creation techniques. We survey and classify the approaches GP researchers have used to deal with these issues. We also present performance measures that are useful for reporting the results of an analysis of GP runs. We hope this work helps the reader understand the key concepts and practical issues of GP and guides the selection of an appropriate approach to a particular issue.

Keywords

Genetic programming, Research issues, Efficiency of genetic programming, Convergence rate of genetic programming, Generalization of genetic programming solutions

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Department of Information Technology, Dharmsinh Desai University, Nadiad, India
  2. Institute of Information and Communication Technology, Ahmedabad University, Ahmedabad, India
