
Probabilistic exact adaptive random forest for recurrent concepts in data streams

  • Regular Paper

International Journal of Data Science and Analytics

Abstract

To adapt random forests to the dynamic nature of data streams, the state-of-the-art technique discards trained trees and grows new trees when concept drifts are detected. This is particularly wasteful when recurrent patterns exist, because the discarded trees may become relevant again when an earlier concept reappears. In this work, we introduce a novel framework called PEARL, which uses both an exact technique and a probabilistic graphical model with Lossy Counting to replace drifted trees with relevant trees built in the past. The exact technique uses pattern matching to find the set of drifted trees that co-occurred in predictions in the past. Meanwhile, PEARL incrementally builds a probabilistic graphical model that captures the tree replacements among recurrent concept drifts. Once the graphical model becomes stable, it replaces the exact technique and finds relevant trees in a probabilistic fashion. Further, Lossy Counting is applied to the graphical model, which adds theoretical guarantees on both the error rate and the space complexity. We empirically show that our technique outperforms the baselines in terms of accuracy and kappa on both synthetic and real-world datasets.
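The Lossy Counting step can be illustrated with a minimal sketch of the generic algorithm by Manku and Motwani, which keeps an approximate count per item and prunes, at fixed bucket boundaries, entries that cannot be frequent. Purely as an assumption for illustration, the counted items here are hypothetical (drifted tree id, replacement tree id) pairs, i.e. edges of a tree-replacement graph. The class `LossyCounter` and its methods are our own hypothetical names, not PEARL's actual code (the authors' implementation is in the repository listed in Note 1). With error parameter ε and n items seen, every stored count underestimates the true count by at most εn.

```python
import math
from typing import Dict, Hashable, List, Tuple


class LossyCounter:
    """Generic Lossy Counting (Manku and Motwani) over a stream of hashable items.

    Hypothetical helper for illustration only; not PEARL's implementation.
    Each stored count underestimates the true frequency by at most epsilon * n.
    """

    def __init__(self, epsilon: float) -> None:
        if not 0.0 < epsilon < 1.0:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1.0 / epsilon)  # bucket width w = ceil(1 / epsilon)
        self.n = 0  # number of items seen so far
        # item -> (observed count f, maximum possible undercount delta)
        self.entries: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            f, delta = self.entries[item]
            self.entries[item] = (f + 1, delta)
        else:
            # A new item may have been seen and pruned in earlier buckets.
            self.entries[item] = (1, bucket - 1)
        # At each bucket boundary, drop entries whose count cannot exceed the bucket id.
        if self.n % self.width == 0:
            self.entries = {
                k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket
            }

    def count(self, item: Hashable) -> int:
        """Approximate (never overestimated) count; 0 if the item was pruned."""
        return self.entries.get(item, (0, 0))[0]

    def frequent(self, support: float) -> List[Hashable]:
        """Items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return [k for k, (f, _) in self.entries.items() if f >= threshold]


# Usage: a hypothetical stream of (drifted_tree_id, replacement_tree_id) events.
counter = LossyCounter(epsilon=0.1)  # bucket width 10
events = [(0, 5)] * 12 + [(1, 6)] * 7 + [(2, 7)]
for event in events:
    counter.add(event)
print(counter.count((0, 5)), counter.count((1, 6)), counter.count((2, 7)))  # 12 7 0
print(counter.frequent(0.5))  # [(0, 5)]
```

In this run the one-off (2, 7) replacement is pruned at the end of the second bucket, while the two recurring replacements keep exact counts. A smaller ε retains more rare pairs at the cost of memory; the algorithm stores at most (1/ε)·log(εn) entries.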

Notes

  1. https://github.com/ingako/PEARL

  2. https://moa.cms.waikato.ac.nz/datasets

  3. https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

  4. http://www.cse.fau.edu/~xqzhu/stream/sensor.arff

Acknowledgements

This work was funded in part by the Office of Naval Research Global grant (N62909-19-1-2042).

Author information

Corresponding author

Correspondence to Ocean Wu.

About this article

Cite this article

Wu, O., Koh, Y.S., Dobbie, G. et al. Probabilistic exact adaptive random forest for recurrent concepts in data streams. Int J Data Sci Anal 13, 17–32 (2022). https://doi.org/10.1007/s41060-021-00273-1

  • DOI: https://doi.org/10.1007/s41060-021-00273-1
