Genetic Programming and Evolvable Machines, Volume 16, Issue 3, pp 327–349

A study on Koza’s performance measures

  • David F. Barrero
  • Bonifacio Castaño
  • María D. R-Moreno
  • David Camacho


John R. Koza defined several metrics to measure the performance of an Evolutionary Algorithm, and they have been widely used by the Genetic Programming community. Despite the importance of these metrics, and the doubts they have raised among many authors, their reliability has attracted little research attention and is still not well understood. This lack of knowledge has likely contributed to the decline in their use in recent years. This paper attempts to improve the understanding of these measures by exploring the circumstances in which they are more reliable, providing some clues on how to use them better, and ultimately making their use more justifiable. Specifically, we investigate the amount of uncertainty associated with the measures, taking an analytical and empirical approach and deriving theoretical bounds on the error. Additionally, a new method to calculate Koza’s performance measures is presented. It is shown that, under common experimental configurations, these metrics have an unacceptable error, which can be arbitrarily large under certain conditions.
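As a rough illustration of the kind of measure under discussion, the sketch below computes computational effort in the form commonly attributed to Koza's 1992 book: the minimum over generations i of M · (i + 1) · ⌈ln(1 − z) / ln(1 − P(M, i))⌉, where M is the population size, z the required success probability, and P(M, i) the estimated probability of success by generation i. The abstract does not state this formula, so it is an assumption here, as are all function names and the toy data. The Wilson score interval is used only as one plausible way to show how uncertainty in the estimated success rate carries over to the effort; it is not claimed to be the method the paper proposes.

```python
import math


def wilson_interval(successes, n_runs, z_score=1.96):
    """Wilson score interval for a binomial proportion (roughly 95% by default)."""
    p_hat = successes / n_runs
    z2 = z_score ** 2
    denom = 1 + z2 / n_runs
    center = (p_hat + z2 / (2 * n_runs)) / denom
    half = z_score * math.sqrt(
        p_hat * (1 - p_hat) / n_runs + z2 / (4 * n_runs ** 2)
    ) / denom
    return center - half, center + half


def runs_required(p, z=0.99):
    """Independent runs needed to succeed at least once with probability z."""
    if p <= 0.0:
        return math.inf  # no success observed: effort is unbounded
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1 - z) / math.log(1 - p))


def computational_effort(success_gens, n_runs, pop_size, max_gen, z=0.99):
    """Return (effort, generation) minimising pop_size * (i + 1) * runs_required.

    success_gens holds, for each successful run, the 0-based generation
    at which it first found a solution (hypothetical input format).
    """
    candidates = []
    for i in range(max_gen + 1):
        # Estimated accumulated success probability by generation i.
        p = sum(1 for g in success_gens if g <= i) / n_runs
        candidates.append((pop_size * (i + 1) * runs_required(p, z), i))
    return min(candidates)


# Toy data (made up): 10 runs, 5 of which succeed at generation 3.
success_gens = [3, 3, 3, 3, 3]
effort, gen = computational_effort(success_gens, n_runs=10, pop_size=100, max_gen=10)

# Naive propagation of the success-rate interval to the effort at that generation.
lo, hi = wilson_interval(successes=5, n_runs=10)
effort_range = (
    100 * (gen + 1) * runs_required(hi),
    100 * (gen + 1) * runs_required(lo),
)
print(effort, gen, effort_range)
```

With only ten runs, the plausible range for the effort in this toy case spans more than a factor of four around the point estimate, which gives a feel for why the reliability of the measure depends so heavily on the number of runs.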


Genetic Programming; Computational effort; Performance measures; Experimental methods; Measurement error



The authors would like to thank Héctor Menéndez, Alejandro Sierra and Ricardo Aler for their reviews and valuable suggestions. This work is supported by the Project of Castilla-La Mancha PEII11-0079-8929.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • David F. Barrero (1)
  • Bonifacio Castaño (2)
  • María D. R-Moreno (1)
  • David Camacho (3)

  1. Departamento de Automática, Universidad de Alcalá, Madrid, Spain
  2. Departamento de Matemáticas, Universidad de Alcalá, Madrid, Spain
  3. Departamento de Informática, Universidad Autónoma de Madrid, Madrid, Spain
