Extreme Accuracy in Symbolic Regression

Chapter
Part of the Genetic and Evolutionary Computation book series (GEVO)

Abstract

Although recent advances in symbolic regression (SR) have brought the field into the early stages of commercial exploitation, poor accuracy still plagues even the most advanced commercial SR packages. Users expect the correct formula to be returned, especially when the data contain zero noise and the target consists of a single basis function of minimal grammar depth. Poor accuracy hinders wider academic and industrial acceptance of SR tools.
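The sketch below illustrates the easy case the abstract refers to: noiseless data generated by a single basis function. The target y = 1.57 + 2.13*sin(x), the candidate set, and the use of plain ordinary least squares to fit the two real coefficients are all hypothetical choices for illustration, not the chapter's algorithm. It only shows why users expect exactness here: once the correct basis function is proposed, the fitted coefficients match the generating ones and the normalized error is essentially zero, while wrong candidates stay near an NMSE of 1.

```python
import math
import random


def fit_single_basis(xs, ys, basis):
    """Fit y ~ c0 + c1 * basis(x) by ordinary least squares.

    Returns (c0, c1, nmse), where nmse is the sum of squared residuals
    divided by the total sum of squares of y (1.0 = no better than the mean).
    """
    fs = [basis(x) for x in xs]
    n = len(xs)
    mean_f = sum(fs) / n
    mean_y = sum(ys) / n
    sff = sum((f - mean_f) ** 2 for f in fs)
    sfy = sum((f - mean_f) * (y - mean_y) for f, y in zip(fs, ys))
    c1 = sfy / sff if sff > 0 else 0.0
    c0 = mean_y - c1 * mean_f
    sse = sum((y - (c0 + c1 * f)) ** 2 for f, y in zip(fs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return c0, c1, (sse / sst if sst > 0 else sse)


def score_candidates(n_points=200, seed=0):
    """Hypothetical noiseless target with one basis function: y = 1.57 + 2.13*sin(x)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n_points)]
    ys = [1.57 + 2.13 * math.sin(x) for x in xs]  # zero noise
    candidates = {
        "sin": math.sin,            # the true basis function
        "cos": math.cos,            # plausible but wrong
        "square": lambda x: x * x,  # structurally wrong
    }
    return {name: fit_single_basis(xs, ys, f) for name, f in candidates.items()}


if __name__ == "__main__":
    for name, (c0, c1, nmse) in score_candidates().items():
        print(f"{name:>6}: y = {c0:.6f} + {c1:.6f}*{name}(x)   NMSE = {nmse:.3e}")
```

Fitting coefficients is the trivial part; the hard part of SR is searching the space of candidate expressions so that the correct basis function is proposed at all.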

In a previous paper (Korns 2011), the poor accuracy of symbolic regression was explored, several classes of test formulas that prove intractable for SR were examined, and an explanation of why they prove intractable was developed. In another paper (Korns 2012), a baseline SR algorithm was developed with specific techniques for optimizing embedded real-number constants. These previous steps put us in a position to attempt to vanquish the SR accuracy problem.

In this chapter we develop a complex algorithm for modern symbolic regression that is extremely accurate for a large class of SR problems. This class of problems is described in detail, a definition of extreme accuracy is provided, and an informal argument for extreme SR accuracy is outlined. Given the critical importance of accuracy in SR, we suspect that all commercial SR packages will eventually use this algorithm or a substitute for it.

Keywords

Abstract expression grammars; Grammar template genetic programming; Genetic algorithms; Particle swarm; Symbolic regression

References

  1. Hornby GS (2006) ALPS: the age-layered population structure for reducing the problem of premature convergence. In: Keijzer M, Cattolico M, Arnold D, Babovic V, Blum C, Bosman P, Butz MV, Coello Coello C, Dasgupta D, Ficici SG, Foster J, Hernandez-Aguirre A, Hornby G, Lipson H, McMinn P, Moore J, Raidl G, Rothlauf F, Ryan C, Thierens D (eds) GECCO 2006: proceedings of the 8th annual conference on genetic and evolutionary computation, Seattle, vol 1. ACM, pp 815–822. doi:10.1145/1143997.1144142, http://www.cs.bham.ac.uk/~wbl/biblio/gecco2006/docs/p815.pdf
  2. Korns MF (2010) Abstract expression grammar symbolic regression. In: Riolo R, McConaghy T, Vladislavleva E (eds) Genetic programming theory and practice VIII, Ann Arbor. Genetic and evolutionary computation, vol 8. Springer, chap 7, pp 109–128. http://www.springer.com/computer/ai/book/978-1-4419-7746-5
  3. Korns MF (2011) Accuracy in symbolic regression. In: Riolo R, Vladislavleva E, Moore JH (eds) Genetic programming theory and practice IX, Ann Arbor. Genetic and evolutionary computation. Springer, chap 8, pp 129–151. doi:10.1007/978-1-4614-1770-5_8
  4. Korns MF (2012) A baseline symbolic regression algorithm. In: Genetic programming theory and practice X. Springer
  5. Kotanchek M, Smits G, Vladislavleva E (2007) Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models. In: Riolo RL, Soule T, Worzel B (eds) Genetic programming theory and practice V, Ann Arbor. Genetic and evolutionary computation. Springer, chap 12, pp 201–220. doi:10.1007/978-0-387-76308-8_12
  6. Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge
  7. McConaghy T (2011) FFX: fast, scalable, deterministic symbolic regression technology. In: Riolo R, Vladislavleva E, Moore JH (eds) Genetic programming theory and practice IX, Ann Arbor. Genetic and evolutionary computation. Springer, chap 13, pp 235–260. doi:10.1007/978-1-4614-1770-5_13
  8. Nelder J, Wedderburn R (1972) Generalized linear models. J R Stat Soc Ser A 135:370–384
  9. Schmidt M, Lipson H (2010) Age-fitness pareto optimization. In: Riolo R, McConaghy T, Vladislavleva E (eds) Genetic programming theory and practice VIII, Ann Arbor. Genetic and evolutionary computation, vol 8. Springer, chap 8, pp 129–146. http://www.springer.com/computer/ai/book/978-1-4419-7746-5
  10. Smits G, Kotanchek M (2004) Pareto-front exploitation in symbolic regression. In: O’Reilly UM, Yu T, Riolo RL, Worzel B (eds) Genetic programming theory and practice II, Ann Arbor. Springer, chap 17, pp 283–299. doi:10.1007/0-387-23254-0_17

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Analytic Research Foundation, Makati, Philippines
