Abstract
Incremental machine learning algorithms are effective alternatives for dealing with stream data. The Hoeffding Tree framework is one of the most successful solutions for supervised online prediction tasks. Although online regression tasks appear in several forms and in many real-life problems, most research efforts have been devoted to classification. Existing regression tree solutions have strong limitations, mainly regarding their memory usage and running time. Hence, developing an algorithm that addresses these limitations in Hoeffding Tree Regressors is a relevant research issue. In this paper, we propose 2CS, a correlation-guided strategy to speed up Hoeffding Tree Regressor training. 2CS is conceptually simple and works by avoiding the exhaustive evaluation of all possible features as split candidates, as occurs in existing solutions. Moreover, 2CS can be easily merged into existing incremental tree solutions and online tree ensemble algorithms, such as bagging and boosting. Through an extensive experimental evaluation, we show that the induction of 2CS-based models can be significantly faster than that of traditional Hoeffding Tree Regressor algorithms, while retaining similar predictive power and memory use.
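To make the idea of correlation-guided candidate selection concrete, the sketch below tracks an incremental Pearson correlation between each feature and the target, then keeps only the most correlated features as split candidates. It is an illustration under assumptions, not the 2CS procedure itself: the abstract does not state how correlations are estimated or how many candidates are kept, so the class `RunningCorrelation`, the function `select_split_candidates`, and the top-`k` rule are all hypothetical.

```python
import math


class RunningCorrelation:
    """Hypothetical helper: one-pass Pearson correlation between a feature and the target."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running co-moment of x and y

    def update(self, x, y):
        # Welford-style update: constant memory per feature, no stored instances.
        self.n += 1
        dx = x - self.mean_x
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def value(self):
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else 0.0


def select_split_candidates(trackers, k):
    """Assumed selection rule: keep the k features with the largest |correlation|."""
    ranked = sorted(trackers, key=lambda name: abs(trackers[name].value()), reverse=True)
    return ranked[:k]


# Toy stream: the target depends only on x0; x1 is an unrelated alternating signal.
trackers = {"x0": RunningCorrelation(), "x1": RunningCorrelation()}
for i in range(100):
    features = {"x0": float(i), "x1": float(i % 2)}
    target = 2.0 * features["x0"] + 1.0
    for name, x in features.items():
        trackers[name].update(x, target)

candidates = select_split_candidates(trackers, k=1)
```

In a tree learner, only the features in `candidates` would then be passed to the (more expensive) split evaluation at a leaf, which is where the time saving described above would come from.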
Acknowledgements
The authors would like to thank FAPESP (São Paulo Research Foundation) for its financial support (grants #2018/07319-6, #2016/18615-0 and #2013/07375-0) and Intel Inc. for providing equipment for some of the experiments.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mastelini, S.M., Ponce de Leon Ferreira de Carvalho, A.C. (2020). 2CS: Correlation-Guided Split Candidate Selection in Hoeffding Tree Regressors. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_23
DOI: https://doi.org/10.1007/978-3-030-61380-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8
eBook Packages: Computer Science (R0)