MetaBags: Bagged Meta-Decision Trees for Regression
Methods for learning heterogeneous regression ensembles have not yet been studied extensively. Hitherto, in the classical ML literature, stacking, cascading, and voting have mostly been restricted to classification problems. Regression poses distinct learning challenges that may result in poor performance, even with well-established homogeneous ensemble schemas such as bagging or boosting. In this paper, we introduce MetaBags, a novel stacking framework for regression. MetaBags learns a set of meta-decision trees designed to select one base model (i.e., expert) for each query, focusing on inductive bias reduction. The experts' predictions are then aggregated into a single prediction through a bagging procedure at the meta-level. MetaBags is designed to learn a model with a fair bias-variance trade-off, and its improvement over base-model performance correlates with the diversity of the experts' predictions on specific subregions of the input space. We performed exhaustive empirical testing of the method, evaluating both the generalization error and the scalability of the approach on open, synthetic, and real-world application datasets. The results show that our method outperforms existing state-of-the-art approaches.
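The scheme sketched in the abstract can be illustrated with a minimal scikit-learn example: fit heterogeneous base regressors (experts), label each training point with the index of its best-performing expert, and bag several meta-decision trees that each select one expert per query. This is only a sketch under simplifying assumptions (the expert set, tree depths, and the use of in-sample errors as meta-targets are illustrative choices, not the paper's exact procedure, which also employs landmarking meta-features):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.metrics import mean_squared_error

X, y = make_friedman1(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Fit heterogeneous base models (the experts).
experts = [Ridge().fit(X_tr, y_tr),
           SVR().fit(X_tr, y_tr),
           DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_tr, y_tr)]

# 2) Meta-targets: index of the expert with the smallest absolute error on
#    each training point (a stand-in for the paper's held-out estimates).
errors = np.stack([np.abs(m.predict(X_tr) - y_tr) for m in experts], axis=1)
best = errors.argmin(axis=1)

# 3) Bag meta-decision trees: each tree is trained on a bootstrap sample
#    and, at prediction time, selects one expert per query.
rng = np.random.default_rng(0)
meta_trees = []
for _ in range(25):
    idx = rng.integers(0, len(X_tr), len(X_tr))
    meta_trees.append(DecisionTreeClassifier(max_depth=4, random_state=0)
                      .fit(X_tr[idx], best[idx]))

# 4) Aggregate at meta-level: average the predictions of the expert that
#    each bagged meta-tree chose for the query.
expert_preds = np.stack([m.predict(X_te) for m in experts], axis=1)
chosen = np.stack([t.predict(X_te) for t in meta_trees], axis=1).astype(int)
y_hat = np.take_along_axis(expert_preds, chosen, axis=1).mean(axis=1)

print(round(mean_squared_error(y_te, y_hat), 3))
```

Averaging over many bootstrapped meta-trees is what gives the "bagged" part of the name: a single meta-tree's expert selection is high-variance, and the meta-level bagging smooths it out.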
Keywords: Stacking · Regression · Meta-learning · Landmarking
S.D. and B.Ž. are supported by The Slovenian Research Agency (grant P2-0103). B.Ž. is additionally supported by the European Commission (grant 769661 SAAM). S.D. further acknowledges support by the Slovenian Research Agency (via grants J4-7362, L2-7509, and N2-0056), the European Commission (projects HBP SGA2 and LANDMARK), ARVALIS (project BIODIV) and the INTERREG (ERDF) Italy-Slovenia project TRAIN.