Symbolic Regression by Exhaustive Search: Reducing the Search Space Using Syntactical Constraints and Efficient Semantic Structure Deduplication

Part of the Genetic and Evolutionary Computation book series (GEVO)

Abstract

Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge about the model structure is available. Such scenarios often require specific model properties, such as interpretability, robustness, trustworthiness and plausibility, that are not easily achievable using standard approaches like genetic programming for symbolic regression. In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues. The algorithm uses a context-free grammar to produce models that are parameterized by a non-linear least squares local optimization procedure. A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions. Enumeration order is established via heuristics designed to improve search efficiency. Empirical tests on a comprehensive benchmark suite show that our approach is competitive with genetic programming on many noiseless problems while maintaining desirable properties such as simple, reliable models and reproducibility.
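
The interplay of bounded enumeration and a cache of semantically equivalent structures can be illustrated with a small sketch. It is not the chapter's implementation: the two-operator grammar, the variable names, the size limit in leaves, and the sort-based canonical form are all assumptions chosen for brevity. Parameter fitting by non-linear least squares and the search heuristics are left out entirely.

```python
from itertools import product

VARIABLES = ("x", "y")   # hypothetical input features
OPERATORS = ("+", "*")   # both commutative and associative


def canonical(expr):
    """Flatten nested uses of the same operator and sort the operands,
    so that e.g. x + y and y + x map to the same key."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    operands = []
    for child in (canonical(left), canonical(right)):
        if isinstance(child, tuple) and child[0] == op:
            operands.extend(child[1:])   # merge (a + b) + c into +(a, b, c)
        else:
            operands.append(child)
    return (op, *sorted(operands, key=repr))


def enumerate_models(max_leaves):
    """Enumerate expression trees with up to max_leaves leaves in a fixed
    order, keeping one representative per canonical key."""
    seen = set(VARIABLES)                # cache of canonical keys
    by_size = {1: list(VARIABLES)}
    generated = len(VARIABLES)           # candidates built before deduplication
    for size in range(2, max_leaves + 1):
        by_size[size] = []
        for op in OPERATORS:
            for left_size in range(1, size):
                pairs = product(by_size[left_size], by_size[size - left_size])
                for left, right in pairs:
                    generated += 1
                    expr = (op, left, right)
                    key = canonical(expr)
                    if key not in seen:  # skip structures already covered
                        seen.add(key)
                        by_size[size].append(expr)
    unique = [e for size in sorted(by_size) for e in by_size[size]]
    return unique, generated


if __name__ == "__main__":
    models, generated = enumerate_models(3)
    print(f"{len(models)} unique structures out of {generated} candidates")
```

Because the loops run in a fixed order and nothing is sampled, repeated runs return the same list, which mirrors the reproducibility claim. The canonical form here only covers commutativity and associativity; equivalences such as x + x versus 2x would need a stronger notion of semantic equality.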

Keywords

  • Symbolic regression
  • Grammar enumeration
  • Graph search

Notes

  1. http://www.evolved-analytics.com/

  2. https://www.nutonian.com/products/eureqa/

  3. https://dev.heuristiclab.com

  4. https://dev.heuristiclab.com

Acknowledgements

The authors gratefully acknowledge support by the Christian Doppler Research Association and the Federal Ministry for Digital and Economic Affairs within the Josef Ressel Center for Symbolic Regression.

Corresponding author

Correspondence to Lukas Kammerer.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kammerer, L., Kronberger, G., Burlacu, B., Winkler, S.M., Kommenda, M., Affenzeller, M. (2020). Symbolic Regression by Exhaustive Search: Reducing the Search Space Using Syntactical Constraints and Efficient Semantic Structure Deduplication. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_5

  • DOI: https://doi.org/10.1007/978-3-030-39958-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39957-3

  • Online ISBN: 978-3-030-39958-0

  • eBook Packages: Computer Science, Computer Science (R0)