Controlling Overfitting in Symbolic Regression Based on a Bias/Variance Error Decomposition

  • Alexandros Agapitos
  • Anthony Brabazon
  • Michael O’Neill
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7491)


We consider the fundamental property of generalisation of data-driven models evolved by means of Genetic Programming (GP). The statistical treatment of decomposing the regression error into bias and variance terms provides insight into the generalisation capability of this modelling method. The error decomposition is used as a source of inspiration to design a fitness function that relaxes the sensitivity of an evolved model to a particular training dataset. Results on eight symbolic regression problems show that new method is capable on inducing better-generalising models than standard GP for most of the problems.


Genetic Programming Training Dataset Regression Error Symbolic Regression Genetic Programming System 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Agapitos, A., O’Neill, M., Brabazon, A.: Evolutionary Learning of Technical Trading Rules without Data-Mining Bias. In: Schaefer, R., Cotta, C., Kołodziej, J., Rudolph, G. (eds.) PPSN XI, Part I. LNCS, vol. 6238, pp. 294–303. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  2. 2.
    Agapitos, A., O’Neill, M., Brabazon, A., Theodoridis, T.: Maximum Margin Decision Surfaces for Increased Generalisation in Evolutionary Decision Tree Learning. In: Silva, S., Foster, J.A., Nicolau, M., Machado, P., Giacobini, M. (eds.) EuroGP 2011. LNCS, vol. 6621, pp. 61–72. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  3. 3.
    Banzhaf, W., Francone, F.D., Nordin, P.: The Effect of Extensive Use of the Mutation Operator on Generalization in Genetic Programming Using Sparse Data Sets. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P. (eds.) PPSN IV. LNCS, vol. 1141, pp. 300–309. Springer, Heidelberg (1996)CrossRefGoogle Scholar
  4. 4.
    Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)Google Scholar
  5. 5.
    Castelli, M., Manzoni, L., Silva, S., Vanneschi, L.: A comparison of the generalization ability of different genetic programming frameworks. In: IEEE Congress on Evolutionary Computation (CEC 2010), July 18-23. IEEE Press, Barcelona (2010)Google Scholar
  6. 6.
    Efron, B., Tibshirani, R.: An introduction to the bootstrap. Chapman and Hall (1993)Google Scholar
  7. 7.
    Keijzer, M.: Improving Symbolic Regression with Interval Arithmetic and Linear Scaling. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E.P.K., Poli, R., Costa, E. (eds.) EuroGP 2003. LNCS, vol. 2610, pp. 70–82. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  8. 8.
    Keijzer, M., Babovic, V.: Genetic Programming, Ensemble Methods and the Bias/Variance Tradeoff - Introductory Investigations. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 76–90. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  9. 9.
    Poli, R., Langdon, W.B., McPhee, N.F.: A field guide to genetic programming. Published via and freely available at (2008),, (With contributions by J. R. Koza)
  10. 10.
    Theodoridis, T., Agapitos, A., Hu, H.: A gaussian groundplan projection area model for evolving probabilistic classifiers. In: Genetic and Evolutionary Computation Conference, GECCO 2011, July 12-16. ACM, Dublin (2011) (forthcoming)Google Scholar
  11. 11.
    Tuite, C., Agapitos, A., O’Neill, M., Brabazon, A.: A Preliminary Investigation of Overfitting in Evolutionary Driven Model Induction: Implications for Financial Modelling. In: Di Chio, C., Brabazon, A., Di Caro, G.A., Drechsler, R., Farooq, M., Grahl, J., Greenfield, G., Prins, C., Romero, J., Squillero, G., Tarantino, E., Tettamanzi, A.G.B., Urquhart, N., Uyar, A.Ş. (eds.) EvoApplications 2011, Part II. LNCS, vol. 6625, pp. 120–130. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  12. 12.
    Tuite, C., Agapitos, A., O’Neill, M., Brabazon, A.: Tackling Overfitting in Evolutionary-Driven Financial Model Induction. In: Brabazon, A., O’Neill, M., Maringer, D. (eds.) Natural Computing in Computational Finance. SCI, vol. 380, pp. 141–161. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  13. 13.
    Vladislavleva, E.J., Smits, G.F., den Hertog, D.: Order of nonlinearity as a complexity measure for models generated by symbolic regression via pareto genetic programming. IEEE Transactions on Evolutionary Computation 13(2), 333–349 (2009)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alexandros Agapitos
    • 1
  • Anthony Brabazon
    • 1
  • Michael O’Neill
    • 1
  1. 1.Financial Mathematics and Computation Research Cluster, Natural Computing Research and Applications GroupUniversity College DublinIreland

Personalised recommendations