
Better GP benchmarks: community survey results and proposals


We present the results of a community survey regarding genetic programming benchmark practices. Analysis shows broad consensus that improvement is needed in problem selection and experimental rigor. While views expressed in the survey dissuade us from proposing a large-scale benchmark suite, we find community support for creating a “blacklist” of problems which are in common use but have important flaws, and whose use should therefore be discouraged. We propose a set of possible replacement problems.




Thanks to Ricardo Segurado and the University College Dublin CSTAR statistics consultancy; thanks to Marilyn McGee-Lennon in the School of Computing Science at the University of Glasgow for her advice on survey design, and to the School itself for providing the supporting web service. Thanks to all those who participated in the GP survey and have engaged in discussion through the GP mailing list, the benchmark mailing list, and the GECCO 2012 debate. Thanks to the anonymous reviewers of this paper. David R. White is funded by the Scottish Informatics and Computer Science Alliance. James McDermott is funded by the Irish Research Council. Gabriel Kronberger is supported by the Austrian Research Promotion Agency, Josef Ressel-centre “Heureka!”. Wojciech Jaśkowski is supported by the Polish Ministry of Science and Education, grant no. 91-531/DS.

Author information

Corresponding author

Correspondence to James McDermott.

About this article

Cite this article

White, D.R., McDermott, J., Castelli, M. et al. Better GP benchmarks: community survey results and proposals. Genet Program Evolvable Mach 14, 3–29 (2013).

  • Genetic programming
  • Benchmarks
  • Community survey