
An eager splitting strategy for online decision trees in ensembles

Data Mining and Knowledge Discovery

Abstract

Decision tree ensembles are widely used in practice. In this work, we study, in ensemble settings, the effectiveness of replacing the split strategy of the state-of-the-art online tree learner, Hoeffding Tree (HT), with a rigorous but more eager splitting strategy that we previously published as Hoeffding AnyTime Tree. Hoeffding AnyTime Tree (HATT) uses the Hoeffding Test to determine whether the current best candidate split is superior to the split currently in place, leaving that decision open to later revision, while Hoeffding Tree aims to determine whether the top candidate is better than the second best and, once a test is selected, fixes it permanently. HATT converges to the ideal batch tree while Hoeffding Tree does not. We find that HATT is an efficacious base learner for online bagging and online boosting ensembles. On UCI and synthetic streams, HATT as a base learner outperforms HT at the 0.05 significance level for the majority of tested ensembles on what we believe is the largest and most comprehensive set of testbenches in the online learning literature. Our results indicate that HATT is a superior alternative to Hoeffding Tree in a large number of ensemble settings.
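
To make the contrast between the two split strategies concrete, the sketch below implements both decision rules under the Hoeffding bound. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, information gain is assumed as the split criterion, and `value_range` stands for the range R of that criterion.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # Hoeffding's inequality: after n observations of a quantity bounded
    # in an interval of width value_range, the sample mean lies within
    # epsilon of the true mean with probability at least 1 - delta.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def ht_should_split(candidate_gains: list, value_range: float,
                    delta: float, n: int) -> bool:
    # Hoeffding Tree: split only when the best candidate beats the
    # SECOND-BEST candidate by more than epsilon; once taken, the
    # split is never revisited. Assumes at least two candidates.
    best, second = sorted(candidate_gains, reverse=True)[:2]
    return best - second > hoeffding_bound(value_range, delta, n)

def hatt_should_split(best_gain: float, current_split_gain: float,
                      value_range: float, delta: float, n: int) -> bool:
    # HATT: split (or re-split) when the best candidate beats the split
    # CURRENTLY INSTALLED at the node by more than epsilon, so an
    # earlier decision remains open to revision as more data arrives.
    return best_gain - current_split_gain > hoeffding_bound(value_range, delta, n)
```

At a leaf, the gain of the current (null) split is effectively zero, so HATT need only beat the status quo rather than a competing candidate, which is what makes its splitting more eager than HT's.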


Notes

  1. In the prequential setting, training instances arrive in a sequence, and the true target value pertaining to each training instance is made available only after the predictor has offered predictions for a sequence of n instances. The loss function applied is necessarily incremental in nature. Choosing \(n=1\), that is, evaluating and then updating the predictor after every instance, turns a periodic evaluation process into an instantaneous one; a minimal sketch of this test-then-train loop is given after these notes. While not typical of real-world application scenarios, prequential accuracy serves as a useful approximation to them.

  2. There is a common misconception that an individual random variable "changes, taking on a number of values during a process"; in fact, a process is a sequence of events, each of which corresponds to an individual random variable that has taken a particular value, and that value is fixed and never changes. This is formalized in symbols directly below.
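
In symbols, the distinction drawn in Note 2 is the standard one between a stochastic process and a single realization of it; this is textbook notation, not notation taken from the paper:

```latex
\[
\underbrace{(X_1, X_2, \ldots, X_t, \ldots)}_{\text{the process: a sequence of distinct random variables}}
\qquad
\underbrace{(x_1, x_2, \ldots, x_t, \ldots)}_{\text{one realization: each } x_t \text{ is a fixed observed value}}
\]
```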
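
The following is a minimal sketch of the \(n=1\) prequential (test-then-train) loop described in Note 1. The model interface (`predict`, `learn_one`) is a hypothetical stand-in for any incremental learner, not an API from the paper.

```python
def prequential_accuracy(model, stream):
    # Interleaved test-then-train evaluation with n = 1: each instance
    # first scores the current model, then immediately updates it.
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:  # test on the incoming instance first...
            correct += 1
        model.learn_one(x, y)      # ...then train on that same instance
        total += 1
    return correct / total if total else 0.0
```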


Author information


Corresponding author

Correspondence to Chaitanya Manapragada.

Additional information

Responsible editor: Henrik Boström.



About this article


Cite this article

Manapragada, C., Gomes, H.M., Salehi, M. et al. An eager splitting strategy for online decision trees in ensembles. Data Min Knowl Disc 36, 566–619 (2022). https://doi.org/10.1007/s10618-021-00816-x

