
MetaBags: Bagged Meta-Decision Trees for Regression

  • Jihed Khiari
  • Luis Moreira-Matias
  • Ammar Shaker
  • Bernard Ženko
  • Sašo Džeroski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

Methods for learning heterogeneous regression ensembles have not yet been explored on a large scale: in the classical machine learning literature, stacking, cascading and voting are mostly restricted to classification problems. Regression poses distinct learning challenges that may result in poor performance, even with well-established homogeneous ensemble schemes such as bagging or boosting. In this paper, we introduce MetaBags, a novel stacking framework for regression. MetaBags learns a set of meta-decision trees, each designed to select one base model (i.e., expert) for each query, with a focus on reducing inductive bias. The selected experts' predictions are then aggregated into a single prediction through a bagging procedure at the meta-level. MetaBags is designed to learn a model with a fair bias-variance trade-off, and its improvement over base-model performance correlates with the diversity of the experts' predictions on specific subregions of the input space. We performed an exhaustive empirical evaluation of the method, assessing both the generalization error and the scalability of the approach on open, synthetic and real-world application datasets. The results show that our method outperforms existing state-of-the-art approaches.
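
The following is a minimal, illustrative sketch of the idea described above, assembled from off-the-shelf scikit-learn components. The choice of base experts, the use of raw inputs as meta-features, and the per-query training-error labels are assumptions made purely for illustration; they are not the authors' implementation (which, as the keywords suggest, also involves landmarking-based meta-features, omitted here).

    # Sketch of a MetaBags-style selector: meta-decision trees choose one
    # expert per query, and their choices are bagged at the meta-level.
    # Assumed helper class and expert set; not the authors' implementation.
    import numpy as np
    from sklearn.base import clone
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVR
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor


    class MetaBagsSketch:
        def __init__(self, experts, n_bags=10, random_state=0):
            self.experts = experts          # heterogeneous base regressors
            self.n_bags = n_bags            # number of meta-level bootstrap bags
            self.random_state = random_state

        def fit(self, X, y):
            rng = np.random.RandomState(self.random_state)
            # 1) Fit each expert on the training set.
            self.fitted_experts_ = [clone(e).fit(X, y) for e in self.experts]
            # 2) Per-query "best expert" labels: the expert with the smallest
            #    absolute error on each training example (a simplification;
            #    the paper's meta-level features and targets differ).
            errors = np.column_stack(
                [np.abs(e.predict(X) - y) for e in self.fitted_experts_]
            )
            best_expert = errors.argmin(axis=1)
            # 3) Bagging at the meta-level: each bag trains one meta-decision
            #    tree mapping a query to the index of the expert to trust.
            self.meta_trees_ = []
            n = X.shape[0]
            for _ in range(self.n_bags):
                idx = rng.randint(0, n, size=n)          # bootstrap sample
                tree = DecisionTreeClassifier(max_depth=4, random_state=rng)
                tree.fit(X[idx], best_expert[idx])
                self.meta_trees_.append(tree)
            return self

        def predict(self, X):
            # Each meta-tree selects one expert per query; the selected
            # experts' predictions are averaged over the bags.
            expert_preds = np.column_stack(
                [e.predict(X) for e in self.fitted_experts_]
            )
            per_bag = [expert_preds[np.arange(X.shape[0]), t.predict(X)]
                       for t in self.meta_trees_]
            return np.mean(per_bag, axis=0)


    if __name__ == "__main__":
        from sklearn.datasets import make_regression
        X, y = make_regression(n_samples=500, n_features=8, noise=10.0,
                               random_state=0)
        experts = [LinearRegression(), SVR(C=10.0),
                   GradientBoostingRegressor(random_state=0)]
        model = MetaBagsSketch(experts, n_bags=25).fit(X, y)
        print(model.predict(X[:5]))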

Keywords

Stacking · Regression · Meta-learning · Landmarking

Acknowledgments

S.D. and B.Ž. are supported by the Slovenian Research Agency (grant P2-0103). B.Ž. is additionally supported by the European Commission (grant 769661 SAAM). S.D. further acknowledges support from the Slovenian Research Agency (grants J4-7362, L2-7509, and N2-0056), the European Commission (projects HBP SGA2 and LANDMARK), ARVALIS (project BIODIV), and the INTERREG (ERDF) Italy-Slovenia project TRAIN.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jihed Khiari (1)
  • Luis Moreira-Matias (1)
  • Ammar Shaker (1)
  • Bernard Ženko (2)
  • Sašo Džeroski (2)

  1. NEC Laboratories Europe GmbH, Heidelberg, Germany
  2. Jožef Stefan Institute, Ljubljana, Slovenia