
A study on Koza’s performance measures


John R. Koza defined several metrics for measuring the performance of Evolutionary Algorithms, and these metrics have been widely used by the Genetic Programming community. Despite their importance, and the doubts they have raised among many authors, their reliability has attracted little research attention and is still not well understood. This lack of knowledge has likely contributed to the decline in their use in recent years. This paper attempts to improve our understanding of these measures: it explores the circumstances in which they are more reliable, provides some clues on how to use them better, and ultimately aims to make their use more justifiable. Specifically, we investigate the amount of uncertainty associated with the measures, taking both an analytical and an empirical approach, and derive theoretical bounds on the error. Additionally, a new method to calculate Koza's performance measures is presented. It is shown that, under common experimental configurations, these metrics have an unacceptable error, which can be arbitrarily large under certain conditions.
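For readers unfamiliar with the measures, the following is a minimal Python sketch of Koza's performance measures as they are commonly defined in Koza (1992): the cumulative probability of success P(M,i), that is, the fraction of runs with population size M that have found a solution by generation i; the number of independent runs R(z) = ceil(ln(1 - z) / ln(1 - P(M,i))) needed to find a solution with probability z; and the computational effort, the minimum over i of I(M,i,z) = M (i + 1) R(z). All function names and the run data are hypothetical. The `effort_interval` helper is only an illustrative assumption: it plugs the Wilson score interval for P(M,i) into R(z) to show how sampling uncertainty propagates to the effort. It is not the new calculation method presented in the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple


def success_curve(success_gens: Sequence[Optional[int]], max_gen: int) -> List[float]:
    """Cumulative probability of success P(M, i) for i = 0..max_gen.

    success_gens holds, for each run, the generation at which a solution was
    first found, or None if the run never succeeded.
    """
    n = len(success_gens)
    return [
        sum(1 for g in success_gens if g is not None and g <= i) / n
        for i in range(max_gen + 1)
    ]


def runs_required(p: float, z: float = 0.99) -> float:
    """R(z): independent runs needed to find a solution with probability z."""
    if p <= 0.0:
        return math.inf  # no observed successes: effort is undefined (infinite)
    if p >= 1.0:
        return 1.0  # every run succeeds, so one run is enough
    return float(math.ceil(math.log(1.0 - z) / math.log(1.0 - p)))


def computational_effort(
    pop_size: int, curve: Sequence[float], z: float = 0.99
) -> Tuple[float, int]:
    """Koza's effort: min over i of I(M, i, z) = M * (i + 1) * R(z).

    Returns the minimum effort and the generation at which it is reached.
    """
    efforts = [pop_size * (i + 1) * runs_required(p, z) for i, p in enumerate(curve)]
    best = min(range(len(efforts)), key=efforts.__getitem__)
    return efforts[best], best


def wilson_interval(k: int, n: int, zscore: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n."""
    p_hat = k / n
    denom = 1.0 + zscore**2 / n
    centre = (p_hat + zscore**2 / (2 * n)) / denom
    half = zscore / denom * math.sqrt(p_hat * (1 - p_hat) / n + zscore**2 / (4 * n**2))
    return max(0.0, centre - half), min(1.0, centre + half)


def effort_interval(
    pop_size: int, gen: int, k: int, n: int, z: float = 0.99, zscore: float = 1.96
) -> Tuple[float, float]:
    """Hypothetical plug-in interval for I(M, gen, z) at a fixed generation."""
    p_lo, p_hi = wilson_interval(k, n, zscore)
    # R(z) decreases as P grows, so the upper bound on P gives the lower effort.
    lo = pop_size * (gen + 1) * runs_required(p_hi, z)
    hi = pop_size * (gen + 1) * runs_required(p_lo, z)
    return lo, hi


if __name__ == "__main__":
    # Hypothetical data: generation of first success for 10 runs (None = failed).
    runs = [3, 5, None, 7, 5, None, None, 9, 4, None]
    curve = success_curve(runs, max_gen=10)
    effort, gen = computational_effort(500, curve)
    k = sum(1 for g in runs if g is not None and g <= gen)
    print(effort, gen)                                # 28000.0 7
    print(effort_interval(500, gen, k, len(runs)))    # (16000.0, 72000.0)
```

With these ten made-up runs and M = 500, the point estimate of the effort is 28,000 individuals at generation 7, but the plug-in interval at that generation spans 16,000 to 72,000 individuals, which gives a feel for how much uncertainty a small number of runs can leave in the measure.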


Figs. 1–6


  1. All the code, configuration files and datasets required to reproduce these experiments are available on


  1. A. Agresti, B.A. Coull, Approximate is better than ’exact’ for interval estimation of binomial proportions. Am. Stat. 52, 119–126 (1998)

    MathSciNet  Google Scholar 

  2. P.J. Angeline, An investigation into the sensitivity of genetic programming to the frequency of leaf selection during subtree crossover. in Proceedings of the First Annual Conference on Genetic Programming (GECCO 1996). (MIT Press, Cambridge, MA, 1996), pp. 21–29

  3. D.F. Barrero, D. Camacho, M.D. R-Moreno, Confidence intervals of success rates in evolutionary computation. in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO 2010). (ACM, Portland, Oregon, 2010), pp. 975–976. doi:10.1145/1830483.1830657

  4. D.F. Barrero, B. Castaño, M.D. R-Moreno, D. Camacho, Statistical Distribution of Generation-to-Success in GP: Application to Model Accumulated Success Probability, in Proceedings of the 14th European Conference on Genetic Programming, EuroGP 2011, LNCS, vol. 6621, ed. by S. Silva, J.A. Foster, M. Nicolau, M. Giacobini, P. Machado (Springer, Turin, 2011), pp. 155–166

    Google Scholar 

  5. D.F. Barrero, M.D. R-Moreno, B. Castano, D. Camacho, An empirical study on the accuracy of computational effort in genetic programming, in Proceedings of the 2011 IEEE Congress on Evolutionary Computation. IEEE Computational Intelligence Society, ed. by A.E. Smith (IEEE Press, New Orleans, 2011), pp. 1169–1176

    Google Scholar 

  6. L.D. Brown, T.T. Cai, A. Dasgupta, Interval estimation for a binomial proportion. Stat. Sci. 16, 101–133 (2001)

    MATH  MathSciNet  Google Scholar 

  7. L.D. Brown, T.T. Cai, A. Dasgupta, Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Stat. 30(1), 160–201 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  8. M. Chiarandini, T. Stützle, Experimental Evaluation of Course Timetabling Algorithms. Tech. Rep. AIDA-02-05, Intellectics Group, Computer Science Department, Darmstadt University of Technology, Darmstadt, Germany (2002)

  9. S. Christensen, F. Oppacher, An analysis of Koza’s computational effort statistic for genetic programming. in Proceedings of the 5th European Conference on Genetic Programming (EuroGP 2002). (Springer, London, 2002), pp. 182–191

  10. C. Clopper, S. Pearson, The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26, 404–413 (1934)

    Article  MATH  Google Scholar 

  11. D. Frost, I. Rish, L. Vila, Summarizing CSP hardness with continuous probability distributions. in Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence, AAAI’97/IAAI’97. (AAAI Press, Menlo Park, 1997), pp. 327–333

  12. A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis, Second Edition (Chapman & Hall/CRC Texts in Statistical Science), 2nd edn. (Chapman and Hall, London, 2003)

    Google Scholar 

  13. H.H. Hoos, T. Sttzle, Evaluating Las Vegas algorithms—pitfalls and remedies. in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98). (Morgan Kaufmann Publishers, Los Altos, CA, 1998), pp. 238–245

  14. A. Kaufmann, D. Grounchko, R. Cruon, Mathematical Models for the Study of the Reliability of Systems, Mathematics in Science and Engineering, vol. 124 (Academic Press, New York, 1977)

    Google Scholar 

  15. M. Keijzer, V. Babovic, C. Ryan, M. O’Neill, M. Cattolico, Adaptive logic programming. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001). (Morgan Kaufmann, San Francisco, CA, 2001), pp. 42–49

  16. J. Koza, Genetic Programming: On the programming of Computers by Means of Natural Selection (MIT Press, Cambrige, MA, 1992)

    MATH  Google Scholar 

  17. P.S. Laplace, Théorie Analytique des probabilités (Mme Ve Courcier, Paris, 1812)

    MATH  Google Scholar 

  18. E. Limpert, W.A. Stahel, M. Abbt, Log-normal distributions across the sciences: keys and clues. Bioscience 51(5), 341–352 (2001)

    Article  Google Scholar 

  19. D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, 4th edn. (Wiley, New York, 2006)

    Google Scholar 

  20. J.B. Mouret, S. Doncieux, Encouraging behavioral diversity in evolutionary robotics: an empirical study. Evol. Comput. 20(1), 91–133 (2012)

    Article  Google Scholar 

  21. R. Myers, E.R. Hancock, Empirical modelling of genetic algorithms. Evol. Comput. 9(4), 461–493 (2001)

    Article  Google Scholar 

  22. R.G. Newcombe, Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat. Med. 17(8), 857–872 (1998)

    Article  Google Scholar 

  23. J. Niehaus, W. Banzhaf, More on computational effort statistics for genetic programming. in Genetic Programming, Proceedings of EuroGP’2003, LNCS, vol. 2610. (Springer, Essex, 2003), pp. 164–172

  24. R. Poli, L. Vanneschi, W. Langdon, N. McPhee, Theoretical results in Genetic Programming: the next ten years? Genet. Program Evolvable Mach. 11(3), 285–320 (2010)

    Article  Google Scholar 

  25. R. Sharma, Bayes approach to interval estimation of a binomial parameter. Ann. Inst. Stat. Math. 27(1), 259–267 (1975)

    Article  MATH  Google Scholar 

  26. M. Walker, H. Edwards, C. Messom, The reliability of confidence intervals for computational effort comparisons. in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation (GECCO 2007). (ACM, New York, NY, 2007), pp. 1716–1723

  27. M. Walker, H. Edwards, C.H. Messom, Confidence intervals for computational effort comparisons. in EuroGP, pp. 23–32 (2007)

  28. E.B. Wilson, Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. 22, 309–316 (1927)

    Google Scholar 

Download references


The authors would like to thank Héctor Menéndez, Alejandro Sierra and Ricardo Aler for their reviews and valuable suggestions. This work is supported by the Castilla-La Mancha project PEII11-0079-8929.

Author information


Correspondence to David F. Barrero.


About this article


Cite this article

Barrero, D.F., Castaño, B., R-Moreno, M.D. et al. A study on Koza’s performance measures. Genet Program Evolvable Mach 16, 327–349 (2015).



  • Genetic Programming
  • Computational effort
  • Performance measures
  • Experimental methods
  • Measurement error