Better GP benchmarks: community survey results and proposals

  • David R. White
  • James McDermott
  • Mauro Castelli
  • Luca Manzoni
  • Brian W. Goldman
  • Gabriel Kronberger
  • Wojciech Jaśkowski
  • Una-May O’Reilly
  • Sean Luke


We present the results of a community survey regarding genetic programming benchmark practices. Analysis shows broad consensus that improvement is needed in problem selection and experimental rigor. While views expressed in the survey dissuade us from proposing a large-scale benchmark suite, we find community support for creating a “blacklist” of problems which are in common use but have important flaws, and whose use should therefore be discouraged. We propose a set of possible replacement problems.


Keywords: Genetic programming, Benchmarks, Community survey



Thanks to Ricardo Segurado and the University College Dublin CSTAR statistics consultancy; thanks to Marilyn McGee-Lennon in the School of Computing Science at the University of Glasgow for her advice on survey design, and to the School itself for providing the supporting web service. Thanks to all those who participated in the GP survey and have engaged in discussion through the GP mailing list, the benchmark mailing list, and the GECCO 2012 debate. Thanks to the anonymous reviewers of this paper. David R. White is funded by the Scottish Informatics and Computer Science Alliance. James McDermott is funded by the Irish Research Council. Gabriel Kronberger is supported by the Austrian Research Promotion Agency, Josef Ressel-centre “Heureka!”. Wojciech Jaśkowski is supported by the Polish Ministry of Science and Education, grant no. 91-531/DS.



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • David R. White (1)
  • James McDermott (2, corresponding author)
  • Mauro Castelli (3)
  • Luca Manzoni (4)
  • Brian W. Goldman (5)
  • Gabriel Kronberger (6)
  • Wojciech Jaśkowski (7)
  • Una-May O’Reilly (8)
  • Sean Luke (9)

  1. School of Computing Science, University of Glasgow, Glasgow, UK
  2. School of Business, University College Dublin, Dublin, Ireland
  3. Instituto Superior de Estatística e Gestão de Informação (ISEGI), Universidade Nova de Lisboa, Lisbon, Portugal
  4. Dipartimento di Informatica, Sistemistica e Comunicazione, University of Milano-Bicocca, Milan, Italy
  5. BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, USA
  6. University of Applied Sciences Upper Austria, Linz, Austria
  7. Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  8. CSAIL, Massachusetts Institute of Technology, Cambridge, USA
  9. Department of Computer Science, George Mason University, Fairfax, USA