
Cluster Analysis of a Symbolic Regression Search Space

  • Gabriel Kronberger
  • Lukas Kammerer
  • Bogdan Burlacu
  • Stephan M. Winkler
  • Michael Kommenda
  • Michael Affenzeller
Chapter
Part of the Genetic and Evolutionary Computation book series (GEVO)

Abstract

In this chapter we take a closer look at how symbolic regression models generated by genetic programming (GP) are distributed in the search space. The motivation for this work is to improve the search for well-fitting symbolic regression models by using information about the similarity of models that can be precomputed independently of the target function. For our analysis, we use a restricted grammar for univariate symbolic regression models and generate all possible models up to a fixed length limit. We identify unique models and cluster them based on phenotypic as well as genotypic similarity. We find that phenotypic similarity leads to well-defined clusters, while genotypic similarity does not produce a clear clustering. By mapping solution candidates visited by GP to the enumerated search space, we find that, in a run on a simple benchmark problem, GP initially explores the whole search space and later converges to the subspace of highest-quality expressions.
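
The enumeration and phenotypic-similarity steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the grammar (one variable x, the unary function sin, the binary operators + and *), the evaluation grid on [-3, 3], the z-score normalisation, the 1 - R² distance, and all function names (enumerate_exprs, phenotype, unique_models, phenotypic_distance) are hypothetical choices made for this example.

```python
import math

# Hypothetical restricted grammar for univariate models (an assumption for
# illustration, not the grammar used in the chapter): one input variable x,
# one unary function sin, and the binary operators + and *.
TERMINALS = ["x"]
UNARY = ["sin"]
BINARY = ["+", "*"]


def enumerate_exprs(max_len):
    """Map each length n (number of tree nodes) to all trees with exactly n nodes.

    Trees are nested tuples: ("x",), ("sin", child) or (op, left, right).
    """
    by_len = {1: [(t,) for t in TERMINALS]}
    for n in range(2, max_len + 1):
        exprs = []
        for op in UNARY:  # unary node on top of a subtree with n - 1 nodes
            exprs += [(op, child) for child in by_len[n - 1]]
        for op in BINARY:  # binary node on top of two subtrees with n - 1 nodes in total
            for left_len in range(1, n - 1):
                for left in by_len[left_len]:
                    for right in by_len[n - 1 - left_len]:
                        exprs.append((op, left, right))
        by_len[n] = exprs
    return by_len


def evaluate(expr, x):
    """Evaluate an expression tree at a single input value x."""
    op = expr[0]
    if op == "x":
        return x
    if op == "sin":
        return math.sin(evaluate(expr[1], x))
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if op == "+" else a * b


# Fixed evaluation grid, chosen independently of any target function (assumed).
GRID = [-3.0 + 6.0 * i / 49 for i in range(50)]


def phenotype(expr):
    """Outputs on GRID, z-score normalised so that linearly scaled models coincide.

    Returns None for non-finite or constant outputs.
    """
    ys = [evaluate(expr, x) for x in GRID]
    if not all(math.isfinite(y) for y in ys):
        return None
    mean = sum(ys) / len(ys)
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
    if std == 0.0:
        return None
    return [(y - mean) / std for y in ys]


def unique_models(max_len):
    """Keep the shortest tree for each distinct (approximately equal) phenotype."""
    by_len = enumerate_exprs(max_len)
    seen = {}
    for n in sorted(by_len):
        for expr in by_len[n]:
            z = phenotype(expr)
            if z is None:
                continue
            # Rounding treats outputs that agree to 6 decimals as the same phenotype.
            key = tuple(round(v, 6) for v in z)
            seen.setdefault(key, expr)  # first (shortest) tree wins
    return list(seen.values())


def phenotypic_distance(a, b):
    """1 - R^2 between the outputs of two non-constant models (0 = linearly equivalent)."""
    za, zb = phenotype(a), phenotype(b)
    r = sum(p * q for p, q in zip(za, zb)) / len(GRID)  # Pearson correlation
    return max(0.0, 1.0 - r * r)  # clamp tiny negative values from rounding


if __name__ == "__main__":
    counts = {n: len(e) for n, e in enumerate_exprs(5).items()}
    print(counts)  # {1: 1, 2: 1, 3: 3, 4: 7, 5: 21}
    models = unique_models(5)
    print(len(models), "unique phenotypes out of", sum(counts.values()), "trees")
```

Because every model is evaluated on the same fixed grid, phenotypes and pairwise distances can be computed once, without reference to any target function. Using 1 - R² makes models that differ only by a linear scaling (such as x and x + x) identical. The resulting distance matrix could then be passed to a clustering algorithm, for example a density-based method such as HDBSCAN. Genotypic similarity would instead compare the trees themselves (e.g. with a tree edit distance) and is not covered by this sketch.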

Notes

Acknowledgements

The authors thank the participants of the Genetic Programming in Theory and Practice (GPTP XVI) workshop for their valuable feedback and ideas, which helped improve the work described in this chapter. The authors gratefully acknowledge support by the Christian Doppler Research Association and the Federal Ministry for Digital and Economic Affairs within the Josef Ressel Center for Symbolic Regression.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Gabriel Kronberger (1, 2), corresponding author
  • Lukas Kammerer (1, 2, 3)
  • Bogdan Burlacu (1, 2, 3)
  • Stephan M. Winkler (1, 3)
  • Michael Kommenda (1, 2, 3)
  • Michael Affenzeller (1, 3)

  1. Heuristic and Evolutionary Algorithms Laboratory (HEAL), University of Applied Sciences Upper Austria, Hagenberg, Austria
  2. Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria, Hagenberg, Austria
  3. Institute for Formal Models and Verification, Johannes Kepler University, Linz, Austria
