Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models

Kotanchek, Mark; Smits, Guido; Vladislavleva, Ekaterina

doi:10.1007/978-0-387-76308-8_12

Part of the book series: Genetic and Evolutionary Computation Series (GEVO)

Trust is a major issue when deploying empirical models in the real world: changes in the underlying system, or use of the model in new regions of parameter space, can produce incorrect and potentially dangerous predictions. This risk can be mitigated by assembling ensembles of diverse models and using their consensus as a trust metric. The models are constrained to agree in the data region used for model development, while outside that region they are unconstrained and, being diverse, tend to disagree, so divergence among their predictions signals that the ensemble is extrapolating. The challenge is to choose an appropriate model complexity (the ensemble should consist of models of similar complexity) and to identify diverse models within the candidate model set.
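
A minimal sketch of the consensus-as-trust idea, in Python. The three models are hypothetical stand-ins for symbolic regression results, constructed to nearly coincide on an assumed development region of [0, 1] and to diverge beyond it. Using the median as the consensus, the population standard deviation as the disagreement measure, and a 0.1 spread threshold are illustrative assumptions, not choices prescribed by the chapter.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def ensemble_predict(models: Sequence[Model], x: float) -> Tuple[float, float]:
    """Return (consensus, spread) of the ensemble at input x.

    consensus: median of the member predictions (assumed choice)
    spread: population standard deviation of the member predictions,
            used here as an inverse trust measure (assumed choice)
    """
    preds: List[float] = [m(x) for m in models]
    return statistics.median(preds), statistics.pstdev(preds)


def is_trusted(models: Sequence[Model], x: float, max_spread: float) -> bool:
    """Trust the consensus only where the members still agree closely."""
    _, spread = ensemble_predict(models, x)
    return spread <= max_spread


# Hypothetical ensemble: all three members are close on [0, 1] (the assumed
# development region) but extrapolate very differently outside it.
models: List[Model] = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 0.3 * x * (x - 1.0),
    lambda x: 2.0 * x - 0.3 * x * (x - 1.0) * (x - 0.5),
]

for x in (0.25, 0.75, 3.0):
    consensus, spread = ensemble_predict(models, x)
    print(f"x={x}: consensus={consensus:.3f} spread={spread:.3f} "
          f"trusted={is_trusted(models, x, max_spread=0.1)}")
```

Inside [0, 1] the spread stays small, while at x = 3.0 the members disagree by several units, so the consensus there is flagged as untrusted even though it is still a well-defined number.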

In this chapter we discuss strategies for developing and selecting robust models and model ensembles, and demonstrate those strategies on industrial data sets. An important benefit of this approach is that all available data can be used for model development rather than being partitioned into training, test and validation subsets. As a result, the constituent models are more accurate without risk of over-fitting, the ensemble predictions are more accurate, and those predictions carry a meaningful trust metric.
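
The abstract does not specify the selection procedure, so the following Python sketch is only one plausible way to carry out the two steps it names: exposing the complexity/accuracy trade-off as a Pareto front, and picking diverse models of similar complexity. The `Candidate` record, residual correlation as the diversity measure, the complexity band of 8 to 15 and the 0.9 correlation cutoff are all hypothetical choices, and the residuals are made-up toy values. Error is computed over all available data, in line with the no-partition approach described above.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    """Hypothetical record for one evolved model."""
    name: str
    complexity: float             # e.g. expression size (assumed measure)
    residuals: Tuple[float, ...]  # prediction errors on all available data

    @property
    def error(self) -> float:
        """Root-mean-square error over the residuals."""
        return math.sqrt(sum(r * r for r in self.residuals) / len(self.residuals))


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Non-dominated candidates in (complexity, error); lower is better for both."""
    def dominates(o: Candidate, c: Candidate) -> bool:
        return (o.complexity <= c.complexity and o.error <= c.error
                and (o.complexity < c.complexity or o.error < c.error))

    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.complexity)


def _corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_ensemble(cands: Sequence[Candidate], min_cx: float, max_cx: float,
                    max_abs_corr: float, size: int) -> List[Candidate]:
    """Greedily pick accurate models of similar complexity whose residuals
    are not strongly correlated with any model already chosen."""
    pool = sorted((c for c in cands if min_cx <= c.complexity <= max_cx),
                  key=lambda c: c.error)
    chosen: List[Candidate] = []
    for c in pool:
        if all(abs(_corr(c.residuals, s.residuals)) < max_abs_corr for s in chosen):
            chosen.append(c)
        if len(chosen) == size:
            break
    return chosen


# Toy candidates with made-up residuals (illustration only).
cands = [
    Candidate("A", 5, (0.5, -0.5, 0.5, -0.5)),
    Candidate("B", 10, (0.2, 0.1, -0.2, -0.1)),
    Candidate("C", 12, (0.22, 0.1, -0.2, -0.12)),  # errs almost exactly like B
    Candidate("D", 11, (-0.12, 0.2, 0.12, -0.2)),  # errs differently from B
    Candidate("E", 30, (0.1, 0.1, -0.1, -0.1)),
]

print("front:", [c.name for c in pareto_front(cands)])
print("ensemble:", [c.name for c in select_ensemble(cands, 8, 15, 0.9, 3)])
```

In this toy case C is dropped even though it is nearly as accurate as B, because its errors track B's so closely that it would add little independent evidence to the consensus; D, which errs in a different pattern, is kept.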

Author information

Authors and Affiliations

Evolved Analytics (LLC), Midland, MI, USA
Mark Kotanchek
Dow Benelux B.V., Terneuzen, The Netherlands
Guido Smits
Tilburg University, Tilburg, Netherlands
Ekaterina Vladislavleva

Editor information

Editors and Affiliations

Center for the Study of Complex Systems, University of Michigan, 323 West Hall, Ann Arbor, MI 48109
Rick Riolo
Department of Computer Science, University of Idaho, Janssen Engineering Building, Moscow, ID 83844-1010
Terence Soule
Genetics Squared, 401 W. Morgan Rd., Ann Arbor, MI 48108
Bill Worzel

About this chapter

Cite this chapter

Kotanchek, M., Smits, G., Vladislavleva, E. (2008). Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models. In: Riolo, R., Soule, T., Worzel, B. (eds) Genetic Programming Theory and Practice V. Genetic and Evolutionary Computation Series. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76308-8_12

DOI: https://doi.org/10.1007/978-0-387-76308-8_12
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-76307-1
Online ISBN: 978-0-387-76308-8
eBook Packages: Computer Science, Computer Science (R0)