
A Statistical Learning Perspective of Genetic Programming

  • Conference paper
Genetic Programming (EuroGP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5481)


Abstract

This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency: empirical error minimization converges to the best possible error as the number of test cases goes to infinity. However, it is also proved that the standard method of putting a hard limit on the program size still results in programs whose size grows without bound as a function of their accuracy. It is also shown that using cross-validation or hold-out to choose the complexity level that optimizes the generalization error also leads to bloat. A more complicated modification of the fitness is therefore proposed in order to avoid unnecessary bloat while preserving universal consistency.
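The idea of a parsimonious fitness can be illustrated with a minimal sketch: the empirical error on the test cases plus a penalty that grows with program size and shrinks as the number of test cases increases. The penalty form `c * sqrt(size * log(n) / n)`, the constant `c`, and the function names below are assumptions chosen for illustration in the spirit of VC-style bounds. They are not the fitness defined in the paper, which is not reproduced on this page.

```python
import math


def empirical_error(program, cases):
    """Fraction of test cases (x, y) on which program(x) differs from y."""
    return sum(1 for x, y in cases if program(x) != y) / len(cases)


def parsimonious_fitness(program, size, cases, c=1.0):
    """Empirical error plus a size penalty (lower is better).

    Assumption: the penalty c * sqrt(size * log(n) / n) is an illustrative
    VC-style term, not the formula used by the paper's authors.
    """
    n = len(cases)
    penalty = c * math.sqrt(size * math.log(n) / n)
    return empirical_error(program, cases) + penalty


# Two hypothetical programs with identical behaviour but different sizes.
cases = [(x, x > 0) for x in range(-5, 6)]
small = lambda x: x > 0                      # assumed size: 3 nodes
large = lambda x: (x > 0 and True) or False  # assumed size: 7 nodes

fit_small = parsimonious_fitness(small, 3, cases)
fit_large = parsimonious_fitness(large, 7, cases)
```

Both programs make no errors on these cases, so the penalty alone separates them and the smaller program receives the lower (better) fitness. Because the penalty vanishes as the number of cases grows, a larger program can still win once the data justify its extra size.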




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amil, N.M., Bredeche, N., Gagné, C., Gelly, S., Schoenauer, M., Teytaud, O. (2009). A Statistical Learning Perspective of Genetic Programming. In: Vanneschi, L., Gustafson, S., Moraglio, A., De Falco, I., Ebner, M. (eds) Genetic Programming. EuroGP 2009. Lecture Notes in Computer Science, vol 5481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01181-8_28


  • DOI: https://doi.org/10.1007/978-3-642-01181-8_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01180-1

  • Online ISBN: 978-3-642-01181-8

  • eBook Packages: Computer Science, Computer Science (R0)
