2CS: Correlation-Guided Split Candidate Selection in Hoeffding Tree Regressors

  • Conference paper
Intelligent Systems (BRACIS 2020)

Abstract

Incremental machine learning algorithms are effective alternatives for dealing with streaming data. The Hoeffding Tree framework is one of the most successful solutions for supervised online prediction tasks. Although online regression tasks arise in several forms and in many real-life problems, most research efforts have been devoted to classification. Existing regression tree solutions have strong limitations, mainly regarding their memory usage and running time. Hence, a new algorithm that addresses these aspects in Hoeffding Tree Regressors is a relevant research issue. In this paper, we propose 2CS, a correlation-guided strategy to speed up Hoeffding Tree Regressor training. 2CS is conceptually simple and works by avoiding the exhaustive evaluation of all possible features as split candidates, which is what existing solutions do. Moreover, 2CS can be easily merged into existing incremental tree solutions and online tree ensemble algorithms, such as bagging and boosting. Through an extensive experimental evaluation, we show that the induction of 2CS-based models can be significantly faster than that of traditional Hoeffding Tree Regressor algorithms, while retaining similar predictive power and memory use.
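The abstract does not spell out the selection rule, so the following is only an illustrative sketch of what a correlation-guided candidate filter could look like, not the procedure defined in the paper. It assumes that each feature's Pearson correlation with the target is tracked incrementally and that only the k most correlated features are passed on for split evaluation. The names `RunningCorrelation` and `select_candidates`, the parameter `k`, and the choice of Pearson correlation with a top-k cutoff are all hypothetical.

```python
import math


class RunningCorrelation:
    """Tracks the Pearson correlation between a feature x and a target y,
    one observation at a time (Welford-style updates). Hypothetical helper."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running co-moment of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def value(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        # A constant feature has no defined correlation; treat it as 0.
        return self.c_xy / denom if denom > 0 else 0.0


def select_candidates(trackers, k):
    """Return the indices of the k features with the largest absolute
    correlation to the target (assumed selection rule, not the paper's)."""
    ranked = sorted(
        range(len(trackers)),
        key=lambda i: abs(trackers[i].value()),
        reverse=True,
    )
    return ranked[:k]
```

In a tree learner, a leaf would hold one tracker per feature, call `update` for every arriving instance, and evaluate split merit only on the indices returned by `select_candidates` rather than on every feature.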

Notes

  1. https://archive.ics.uci.edu/ml/datasets.php

  2. https://www.openml.org

Acknowledgements

The authors would like to thank FAPESP (São Paulo Research Foundation) for its financial support (grants #2018/07319-6, #2016/18615-0 and #2013/07375-0) and Intel Inc. for providing equipment for some of the experiments.

Corresponding author

Correspondence to Saulo Martiello Mastelini.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mastelini, S.M., Ponce de Leon Ferreira de Carvalho, A.C. (2020). 2CS: Correlation-Guided Split Candidate Selection in Hoeffding Tree Regressors. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_23

  • DOI: https://doi.org/10.1007/978-3-030-61380-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61379-2

  • Online ISBN: 978-3-030-61380-8

  • eBook Packages: Computer Science, Computer Science (R0)