Abstract
A distinguishing feature of symbolic regression using genetic programming is its ability to identify complex nonlinear white-box models. This is especially relevant in practice, where models are extensively scrutinized in order to gain knowledge about the underlying processes. This potential is often diluted by the ambiguity and complexity of the models produced by genetic programming. In this contribution we discuss several analysis methods with the common goal of enabling better insights into the symbolic regression process and of producing models that are more understandable and generalize better. In order to gain more information about the process, we monitor and analyze the progress of population diversity, building block information, and, more generally, genealogy information. Regarding the analysis of results, several aspects such as model simplification, relevance of variables, node impacts, and variable network analysis are presented and discussed.
Acknowledgements
The work described in this chapter was done within the Josef Ressel Center for Heuristic Optimization Heureka! sponsored by the Austrian Research Promotion Agency (FFG).
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Affenzeller, M., Winkler, S.M., Kronberger, G., Kommenda, M., Burlacu, B., Wagner, S. (2014). Gaining Deeper Insights in Symbolic Regression. In: Riolo, R., Moore, J., Kotanchek, M. (eds) Genetic Programming Theory and Practice XI. Genetic and Evolutionary Computation. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0375-7_10
DOI: https://doi.org/10.1007/978-1-4939-0375-7_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0374-0
Online ISBN: 978-1-4939-0375-7
eBook Packages: Computer Science, Computer Science (R0)