
Constructive Aggregation and Its Application to Forecasting with Dynamic Ensembles

  • Vitor Cerqueira (corresponding author)
  • Fabio Pinto
  • Luis Torgo
  • Carlos Soares
  • Nuno Moniz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)

Abstract

While the predictive advantage of ensemble methods is now widely accepted, the most appropriate way of estimating the weight of each individual model remains an open research question. Meanwhile, several studies report that combining different ensemble approaches improves performance, due to a better trade-off between the diversity and the error of the individual models in the ensemble. We contribute to this line of research by proposing an aggregation framework for a set of independently created forecasting models, i.e., a heterogeneous ensemble. The general idea is that, instead of aggregating these models directly, we first rearrange them into different subsets, creating a new set of combined models, which is then aggregated into a final decision. We present this idea as constructive aggregation and apply it to time series forecasting problems. Results from empirical experiments show that applying constructive aggregation to state-of-the-art dynamic aggregation methods provides a consistent advantage. Constructive aggregation is publicly available in a software package. Data related to this paper are available at: https://github.com/vcerqueira/timeseriesdata. Code related to this paper is available at: https://github.com/vcerqueira/tsensembler.
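The abstract describes a two-stage procedure: base models are first rearranged into subsets, each subset is combined into a new "constructed" model, and these constructed models are then dynamically aggregated into a single forecast. The sketch below is a minimal illustration of that idea in Python, not the implementation in the tsensembler package: it assumes a plain average within each subset and inverse recent-squared-error weights across subsets, and all function, parameter, and model names are hypothetical.

    import itertools
    import numpy as np

    def constructive_aggregation(forecasts, y_recent, subset_size=2, eps=1e-8):
        # forecasts: dict mapping model name -> 1-D array whose last element is
        # the forecast to combine and whose earlier elements align with y_recent.
        # y_recent: recently observed values used to score each combined model.
        names = list(forecasts)
        # Stage 1: rearrange base models into subsets; each subset becomes a new
        # constructed model (combined here by a plain average).
        combined = {s: np.mean([forecasts[m] for m in s], axis=0)
                    for s in itertools.combinations(names, subset_size)}
        # Stage 2: dynamically aggregate the constructed models into one decision,
        # weighting each by the inverse of its recent squared error.
        weights, preds = [], []
        for f in combined.values():
            err = np.mean((f[:-1] - y_recent) ** 2)
            weights.append(1.0 / (err + eps))
            preds.append(f[-1])
        weights = np.asarray(weights) / np.sum(weights)
        return float(np.dot(weights, preds))

    # Toy usage: three base models, two past forecasts each plus the current one.
    forecasts = {
        "arima": np.array([10.2, 10.8, 11.1]),
        "rf":    np.array([10.5, 10.6, 11.4]),
        "svr":   np.array([ 9.9, 10.9, 11.0]),
    }
    y_recent = np.array([10.3, 10.7])
    print(constructive_aggregation(forecasts, y_recent))

In this sketch, combined models that tracked the recent observations more closely receive larger weights, which mirrors the dynamic aggregation setting the paper targets.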

Keywords

Ensemble learning · Forecasting · Constructive induction · Regression · Dynamic expert aggregation

Acknowledgements

This work is financed by Project “Coral - Sustainable Ocean Exploitation: Tools and Sensors/NORTE-01-0145-FEDER-000036”, which is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Vitor Cerqueira 1, 2 (corresponding author)
  • Fabio Pinto 1
  • Luis Torgo 1, 2, 3
  • Carlos Soares 1, 2
  • Nuno Moniz 1, 2

  1. University of Porto, Porto, Portugal
  2. INESC TEC, Porto, Portugal
  3. Dalhousie University, Halifax, Canada
