Modifying standard gradient boosting by replacing the embedded weak learner with a strong(er) one, we present SyRBo: symbolic-regression boosting. Experiments over 98 regression datasets show that by adding a small number of boosting stages (between 2 and 5) to a symbolic regressor, statistically significant improvements can often be attained. We note that coding SyRBo on top of any symbolic regressor is straightforward, and the added cost is simply a few more evolutionary rounds. SyRBo is essentially a simple add-on that can be readily applied to an existing symbolic regressor, often with beneficial results.
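The boosting idea can be illustrated with a minimal Python sketch. It rests on several assumptions: squared loss, so each stage's negative gradient is simply the residual; a first stage fit directly to the raw target; and stage outputs summed with a hypothetical `learning_rate` of 1.0. The names `BoostedRegressor` and `StumpStandIn` are hypothetical. In the paper's setting the factory would instead build a symbolic regressor such as gplearn's `SymbolicRegressor`, which exposes scikit-learn-style `fit`/`predict`. The stump appears here only so the sketch runs without extra dependencies. It is a weak learner, not the strong learner SyRBo calls for.

```python
from typing import Any, Callable, List, Sequence


class BoostedRegressor:
    """Hypothetical sketch: gradient boosting (squared loss) with a pluggable learner.

    Under squared loss the negative gradient is the residual y - F(x), so every
    stage is fit to whatever the previous stages left unexplained.
    """

    def __init__(self, make_learner: Callable[[], Any], n_stages: int = 3,
                 learning_rate: float = 1.0):
        self.make_learner = make_learner    # e.g. lambda: SymbolicRegressor(...)
        self.n_stages = n_stages            # the paper tries 2 to 5 stages
        self.learning_rate = learning_rate  # assumption: plain sum when 1.0
        self.stages: List[Any] = []

    def fit(self, X, y):
        self.stages = []
        residual = [float(v) for v in y]    # stage 1 sees the raw target
        for _ in range(self.n_stages):
            learner = self.make_learner()
            learner.fit(X, residual)
            preds = learner.predict(X)
            # Subtract this stage's contribution; the next stage fits what remains.
            residual = [r - self.learning_rate * float(p)
                        for r, p in zip(residual, preds)]
            self.stages.append(learner)
        return self

    def predict(self, X):
        totals = [0.0] * len(X)
        for learner in self.stages:         # ensemble output = sum over stages
            for i, p in enumerate(learner.predict(X)):
                totals[i] += self.learning_rate * float(p)
        return totals


class StumpStandIn:
    """Hypothetical stand-in learner: a one-split regression stump on feature 0."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]):
        xs = [row[0] for row in X]
        best = None
        for t in sorted(set(xs)):
            left = [v for x, v in zip(xs, y) if x <= t]   # never empty: t is in xs
            right = [v for x, v in zip(xs, y) if x > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right) if right else lm
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, t, lm, rm)
        _, self.t, self.lm, self.rm = best
        return self

    def predict(self, X):
        return [self.lm if row[0] <= self.t else self.rm for row in X]


def mae(y, p):
    """Mean absolute error, the score used in the paper's experiments."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


if __name__ == "__main__":
    X = [[i / 10] for i in range(21)]
    y = [row[0] ** 2 for row in X]
    for k in (1, 2, 5):
        model = BoostedRegressor(StumpStandIn, n_stages=k).fit(X, y)
        print(k, "stage(s), training MAE:", round(mae(y, model.predict(X)), 4))
```

Swapping in a symbolic regressor changes only the factory, e.g. `BoostedRegressor(lambda: SymbolicRegressor(generations=20), n_stages=3)`. The boosting loop itself is untouched, which is why the added cost amounts to the extra evolutionary runs.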
This work was supported by National Institutes of Health (USA) Grants LM010098, LM012601, AI116794. We thank Hagai Ravid for spotting an error in an earlier version of the code.
Appendix: detailed results
The results of all experiments over all datasets are given in Tables 3, 4, 5, and 6 for 2, 3, 4, and 5 stages, respectively. As noted in Sect. 3, for each of the 98 datasets we recorded the mean absolute error attained by each algorithm in each of the 30 replicate runs, for each of the 5 test folds. We then computed the median of these scores, which is reported under ‘mean absolute error’ in the tables. Under ‘pval’ we show the p-values of the 10,000-round permutation tests comparing the scores of SyRBo and SymbolicRegressor, with ‘!’ denoting a significant win for SyRBo and ‘=’ denoting an insignificant loss for SyRBo. Under ‘run times’ we show the median run times of SyRBo and SymbolicRegressor. ‘SR’ denotes SymbolicRegressor.
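The per-dataset comparison can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code. It assumes each algorithm contributes one MAE per (run, fold) pair, that is, 30 x 5 = 150 scores per dataset. It also assumes a two-sided test on the absolute difference of medians; the text does not say which statistic the permutation test used. The function name `permutation_test` and the demo scores are hypothetical.

```python
import random
from statistics import median
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     rounds: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on |median(a) - median(b)| (assumed statistic)."""
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # randomly reassign scores to the two algorithms
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # +1 smoothing keeps the estimated p-value strictly positive
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical MAE scores (NOT results from the paper): 30 runs x 5 folds each.
    syrbo = [rng.gauss(1.0, 0.1) for _ in range(150)]
    sr = [rng.gauss(1.1, 0.1) for _ in range(150)]
    print("median MAE  SyRBo:", round(median(syrbo), 3), " SR:", round(median(sr), 3))
    print("pval:", permutation_test(syrbo, sr))
```

Reporting the median of the 150 scores alongside the p-value mirrors the ‘mean absolute error’ and ‘pval’ columns described above.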
Cite this article
Sipper, M., Moore, J.H. Symbolic-regression boosting. Genet Program Evolvable Mach 22, 357–381 (2021). https://doi.org/10.1007/s10710-021-09400-0