MetaBags: Bagged Meta-Decision Trees for Regression

  • Jihed Khiari
  • Luis Moreira-Matias
  • Ammar Shaker
  • Bernard Ženko
  • Sašo Džeroski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


Methods for learning heterogeneous regression ensembles have not yet been proposed on a large scale. Hitherto, in the classical ML literature, stacking, cascading and voting have mostly been restricted to classification problems. Regression poses distinct learning challenges that may result in poor performance, even when using well-established homogeneous ensemble schemas such as bagging or boosting. In this paper, we introduce MetaBags, a novel stacking framework for regression. MetaBags learns a set of meta-decision trees designed to select one base model (i.e., expert) for each query, and focuses on reducing inductive bias. The resulting predictions are then aggregated into a single prediction through a bagging procedure at the meta-level. MetaBags is designed to learn a model with a fair bias-variance trade-off, and its improvement over base model performance is correlated with the prediction diversity of different experts on specific input space subregions. We tested the method exhaustively, evaluating both the generalization error and the scalability of the approach on open, synthetic and real-world application datasets. The results show that our method outperforms existing state-of-the-art approaches.
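
The select-then-aggregate idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes one-dimensional inputs, depth-1 meta-trees (stumps), squared error as the criterion for choosing an expert, a plain average as the aggregation step, and two hand-written experts. All function names are hypothetical, and the landmarking meta-features named in the keywords are left out.

```python
import random
from statistics import mean

def sq_errors(expert, xs, ys):
    """Squared error of one expert on every training point."""
    return [(expert(x) - y) ** 2 for x, y in zip(xs, ys)]

def fit_meta_stump(experts, xs, ys):
    """Fit a depth-1 meta-tree: a threshold plus one expert index per side."""
    errs = [sq_errors(f, xs, ys) for f in experts]  # errs[k][i]: expert k, point i

    def pick(idx):
        # Index and total error of the best expert on the points listed in idx.
        totals = [sum(e[i] for i in idx) for e in errs]
        k = totals.index(min(totals))
        return k, totals[k]

    everyone = range(len(xs))
    k_all, cost_all = pick(everyone)
    # Fallback: no split, the single best expert answers every query.
    best = (cost_all, float("-inf"), k_all, k_all)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring values
        left = [i for i in everyone if xs[i] < t]
        right = [i for i in everyone if xs[i] >= t]
        (kl, cl), (kr, cr) = pick(left), pick(right)
        if cl + cr < best[0]:
            best = (cl + cr, t, kl, kr)
    return best[1:]  # (threshold, left_expert, right_expert)

def stump_predict(stump, experts, x):
    """Route the query to one expert and return that expert's prediction."""
    t, kl, kr = stump
    return experts[kl](x) if x < t else experts[kr](x)

def fit_bagged_stumps(experts, xs, ys, n_bags=25, seed=0):
    """Fit one meta-stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(
            fit_meta_stump(experts, [xs[i] for i in idx], [ys[i] for i in idx])
        )
    return stumps

def bagged_predict(stumps, experts, x):
    """Average the per-stump selections into a single prediction."""
    return mean([stump_predict(s, experts, x) for s in stumps])

# Toy data: y = |x|. Each expert is accurate on only one side of zero.
experts = [lambda x: x, lambda x: -x]
xs = [float(v) for v in range(-20, 21) if v != 0]
ys = [abs(x) for x in xs]
stumps = fit_bagged_stumps(experts, xs, ys)
print(bagged_predict(stumps, experts, 15.0), bagged_predict(stumps, experts, -15.0))
```

Neither toy expert fits y = |x| on its own, but a meta-level split near zero sends each query to the expert that is accurate in that subregion. In a realistic setting the experts would be trained models, and the meta-level would be fit on predictions for data the experts did not see during training.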


Keywords: Stacking, Regression, Meta-learning, Landmarking



S.D. and B.Ž. are supported by The Slovenian Research Agency (grant P2-0103). B.Ž. is additionally supported by the European Commission (grant 769661 SAAM). S.D. further acknowledges support by the Slovenian Research Agency (via grants J4-7362, L2-7509, and N2-0056), the European Commission (projects HBP SGA2 and LANDMARK), ARVALIS (project BIODIV) and the INTERREG (ERDF) Italy-Slovenia project TRAIN.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jihed Khiari (1), corresponding author
  • Luis Moreira-Matias (1)
  • Ammar Shaker (1)
  • Bernard Ženko (2)
  • Sašo Džeroski (2)

  1. NEC Laboratories Europe GmbH, Heidelberg, Germany
  2. Jožef Stefan Institute, Ljubljana, Slovenia
