Symbolic-regression boosting

Abstract

Modifying standard gradient boosting by replacing the embedded weak learner with a strong(er) one, we present SyRBo: symbolic-regression boosting. Experiments over 98 regression datasets show that by adding a small number of boosting stages (between 2 and 5) to a symbolic regressor, statistically significant improvements can often be attained. We note that coding SyRBo on top of any symbolic regressor is straightforward, and the added cost is simply a few more evolutionary rounds. SyRBo is essentially a simple add-on that can be readily applied to an existing symbolic regressor, often with beneficial results.
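The core idea, running gradient boosting with a strong learner at every stage, can be sketched as follows. This is a minimal illustration rather than the authors' SyRBo code. It assumes squared-error loss, so each stage simply fits the current residuals; it also assumes a zero initial prediction and an optional shrinkage factor. `BoostedRegressor`, `BestFeatureLine`, `learning_rate`, and `mse` are hypothetical names, and `BestFeatureLine` is only a toy stand-in for a symbolic regressor, used so that the example runs without extra dependencies.

```python
from typing import Callable, List, Protocol, Sequence


class Regressor(Protocol):
    """Any learner with scikit-learn-style fit/predict methods."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "Regressor": ...

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[float]: ...


class BoostedRegressor:
    """Gradient boosting (squared-error loss, assumed) around a strong base learner.

    Each stage fits a fresh base learner to the current residuals, which are
    the negative gradient of the squared error, and adds its scaled
    predictions to the ensemble. The ensemble's initial prediction is zero.
    """

    def __init__(self, make_learner: Callable[[], Regressor],
                 n_stages: int = 3, learning_rate: float = 1.0):
        self.make_learner = make_learner
        self.n_stages = n_stages
        self.learning_rate = learning_rate  # hypothetical shrinkage; 1.0 = none
        self.stages: List[Regressor] = []

    def fit(self, X, y):
        self.stages = []
        residual = [float(t) for t in y]  # zero initial prediction: residual = y
        for _ in range(self.n_stages):
            learner = self.make_learner().fit(X, residual)
            step = learner.predict(X)
            residual = [r - self.learning_rate * s for r, s in zip(residual, step)]
            self.stages.append(learner)
        return self

    def predict(self, X):
        total = [0.0] * len(X)
        for learner in self.stages:
            step = learner.predict(X)
            total = [t + self.learning_rate * s for t, s in zip(total, step)]
        return total


class BestFeatureLine:
    """Hypothetical toy stand-in for a symbolic regressor: a least-squares
    line on whichever single feature gives the lowest squared error."""

    def fit(self, X, y):
        n = len(y)
        y_mean = sum(y) / n
        best = (float("inf"), 0, 0.0, y_mean)  # (sse, feature, slope, intercept)
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            x_mean = sum(col) / n
            sxx = sum((c - x_mean) ** 2 for c in col)
            if sxx == 0:  # a constant feature carries no signal
                continue
            slope = sum((c - x_mean) * (t - y_mean) for c, t in zip(col, y)) / sxx
            intercept = y_mean - slope * x_mean
            sse = sum((t - intercept - slope * c) ** 2 for c, t in zip(col, y))
            if sse < best[0]:
                best = (sse, j, slope, intercept)
        _, self.j, self.slope, self.intercept = best
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[self.j] for row in X]


def mse(model, X, y):
    """Mean squared error of a fitted model on (X, y)."""
    return sum((p - t) ** 2 for p, t in zip(model.predict(X), y)) / len(y)


# Illustrative toy data: y depends on both features, which one line cannot capture.
X = [[x, (7 * x) % 5] for x in range(10)]
y = [2 * a + 3 * b for a, b in X]

one = BoostedRegressor(BestFeatureLine, n_stages=1).fit(X, y)
three = BoostedRegressor(BestFeatureLine, n_stages=3).fit(X, y)
print(mse(one, X, y), mse(three, X, y))  # the 3-stage ensemble has lower error
```

The wrapper only relies on `fit` and `predict`, which is what makes boosting an add-on to an existing regressor. Since gplearn's `SymbolicRegressor` follows the scikit-learn fit/predict convention, a factory such as `lambda: SymbolicRegressor()` could in principle be passed as `make_learner`; the paper's own hyperparameters are not reproduced here.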

Fig. 1

Acknowledgements

This work was supported by National Institutes of Health (USA) Grants LM010098, LM012601, AI116794. We thank Hagai Ravid for spotting an error in an earlier version of the code.

Author information

Correspondence to Moshe Sipper.

Appendix: detailed results

The results of all experiments over all datasets are given in Tables 3, 4, 5, and 6 for 2, 3, 4, and 5 boosting stages, respectively. As noted in Sect. 3, for each of the 98 datasets we recorded the mean absolute error attained by each algorithm on each of the 5 test folds in each of the 30 replicate runs. The medians of these scores are presented under ‘mean absolute error’ in the tables. Under ‘pval’ we show the p-values of the 10,000-round permutation tests comparing the scores of SyRBo and SymbolicRegressor, with ‘!’ denoting a significant win for SyRBo and ‘=’ denoting an insignificant loss for SyRBo. Under ‘run times’ we show the median run times of SyRBo and SymbolicRegressor (abbreviated ‘SR’).
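A 10,000-round permutation test of the kind reported under ‘pval’ can be sketched as below. The excerpt does not state the test statistic, so this sketch assumes a two-sided test on the absolute difference of medians (matching the medians reported in the tables). The function name `permutation_test`, the fixed `seed`, and the +1 smoothing of the p-value are assumptions of this sketch, and the scores in the usage lines are made-up numbers, not results from the paper.

```python
import random
from statistics import median
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     rounds: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference of medians.

    Repeatedly shuffles the pooled scores, splits them into groups of the
    original sizes, and counts how often the shuffled difference is at least
    as large as the observed one. The +1 terms keep the estimate above zero.
    """
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        shuffled = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if shuffled >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)


# Made-up MAE scores for illustration only (30 runs x 5 folds = 150 per method).
syrbo_scores = [1.0 + 0.01 * i for i in range(150)]
sr_scores = [2.0 + 0.01 * i for i in range(150)]
p = permutation_test(syrbo_scores, sr_scores)
print(f"median SyRBo={median(syrbo_scores):.3f}, "
      f"median SR={median(sr_scores):.3f}, p={p:.4f}")
```

Under this sketch, a small p-value together with a lower median error for SyRBo would correspond to a ‘!’ entry; the significance threshold used in the paper is not stated in this excerpt.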

Table 3 2-stage SyRBo: results of all datasets
Table 4 3-stage SyRBo: results of all datasets
Table 5 4-stage SyRBo: results of all datasets
Table 6 5-stage SyRBo: results of all datasets

About this article

Cite this article

Sipper, M., Moore, J.H. Symbolic-regression boosting. Genet Program Evolvable Mach 22, 357–381 (2021). https://doi.org/10.1007/s10710-021-09400-0

  • DOI: https://doi.org/10.1007/s10710-021-09400-0
