Abstract
In order to adapt random forests to the dynamic nature of data streams, the state-of-the-art technique discards trained trees and grows new trees when concept drifts are detected. This is particularly wasteful when recurrent patterns exist. In this work, we introduce a novel framework called PEARL, which uses both an exact technique and a probabilistic graphical model with Lossy Counting, to replace drifted trees with relevant trees built in the past. The exact technique utilizes pattern matching to find the set of drifted trees that co-occurred in predictions in the past. Meanwhile, a probabilistic graphical model is being built to capture the tree replacements among recurrent concept drifts. Once the graphical model becomes stable, it replaces the exact technique and finds relevant trees in a probabilistic fashion. Further, Lossy Counting is applied to the graphical model which brings an added theoretical guarantee for both error rate and space complexity. We empirically show our technique outperforms baselines in terms of accuracy and kappa on both synthetic and real-world datasets.
Similar content being viewed by others
References
Ahmadi, Z., Kramer, S.: Modeling recurring concepts in data streams: a graph-based framework. Knowl. Inf. Syst. 55(1), 15–44 (2018)
Anderson, R., Koh, Y.S., Dobbie, G., Bifet, A.: Recurring concept meta-learning for evolving data streams. Expert Syst. Appl. 138, 112832 (2019)
Ángel, A.M., Bartolo, G.J., Ernestina, M.: Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Syst. Appl. 46, 87–105 (2016)
Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
Bifet, A., Holmes, G., Pfahringer, B.: Leveraging bagging for evolving data streams. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 135–150. Springer, Berlin Heidelberg (2010)
Bifet, A., Holmes, G., Pfahringer, B., Gavaldà, R.: Improving adaptive bagging methods for evolving data streams. In: Zhou, Z.H., Washio, T. (eds.) Advances in Machine Learning, pp. 23–37. Springer, Berlin Heidelberg (2009)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Brzezinski, D., Stefanowski, J.: Combining block-based and online methods in learning ensembles from concept drifting data streams. Inf. Sci. 265, 50–67 (2014). https://doi.org/10.1016/j.ins.2013.12.011
Cano, A., Krawczyk, B.: Kappa updated ensemble for drifting data stream mining. Mach. Learn. 109, 175–218 (2020)
Chen, K., Koh, Y.S., Riddle, P.: Proactive drift detection: Predicting concept drifts in data streams using probabilistic networks. In: IJCNN, pp. 780–787. IEEE (2016)
Chiu, C.W., Minku, L.L.: Diversity-based pool of models for dealing with recurring concepts. In: 2018 IJCNN, pp. 1–8. IEEE (2018)
Gama, J., Kosina, P.: Recurrent concepts in data streams classification. Knowl. Inf. Syst. 40(3), 489–507 (2014)
Gao, Y., Chandra, S., Li, Y., Khan, L., Thuraisingham, B. M.: SACCOS: A semi–supervised framework for emerging class detection and concept drift adaption over data streams. In: IEEE Trans. Knowl. Data Eng. (2020). https://doi.org/10.1109/TKDE.2020.2993193
Ghomeshi, H., Gaber, M.M., Kovalchuk, Y.: Eacd: evolutionary adaptation to concept drifts in data streams. Data Min. Knowl. Disc. 33, 663–694 (2019)
Gomes, H.M., Bifet, A., Read, J., Barddal, J.P., Enembreck, F., Pfharinger, B., Holmes, G., Abdessalem, T.: Adaptive random forests for evolving data stream classification. Mach. Learn. 106(9–10), 1469–1495 (2017)
Gonçalves Jr., P.M., Barros, R.S.M.D.: RCD: A recurring concept drift framework. Pattern Recogn. Lett. 34(9), 1018–1025 (2013)
Goyal, A., Daumé, H.: Lossy conservative update (lcu) sketch: Succinct approximate count storage. In: 25th AAAI (2011)
Hu, J., Chen, J., Qin, X.: Algorithm of recurring concept drift base on main feature extraction. In: Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence, pp. 59–65 (2019)
Koh, Y.S., Huang, D.T.J., Pearce, C., Dobbie, G.: Volatility drift prediction for transactional data streams. In: 2018 IEEE ICDM, pp. 1091–1096. IEEE (2018)
Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Third IEEE International Conference on Data Mining, pp. 123–130 (2003)
Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Woźniak, M.: Ensemble learning for data stream analysis: A survey. Information Fusion 37, 132–156 (2017)
Krawczyk, B., Woźniak, M.: One-class classifiers with incremental learning and forgetting for data streams with concept drift. Soft. Comput. 19, 3387–3400 (2015)
Li, P., Wu, X., Xuegang, H.: Mining recurring concept drifts with limited labeled streaming data. J. Mach. Learn. Res. - Proc. Track 13, 241–252 (2010). https://doi.org/10.1145/2089094.2089105
Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th VLDB, VLDB ’02, pp. 346–357. VLDB Endowment (2002)
Masud, M.M., Al-Khateeb, T.M., Khan, L., Aggarwal, C., Gao, J., Han, J., Thuraisingham, B.: Detecting recurring and novel classes in concept-drifting data streams. In: 2011 IEEE 11th ICDM, pp. 1176–1181. IEEE (2011)
Moreira dos Reis, D., Maletzke, A., Silva, D.F., Batista, G.E.: Classifying and counting with recurrent contexts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1983–1992 (2018)
Wu, O., Koh, Y.S., Dobbie, G., Lacombe, T.: PEARL: Probabilistic exact adaptive random forest with lossy counting for data streams. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 17–30. Springer (2020)
Yang, Y., Wu, X., Zhu, X.: Mining in anticipation for concept change: Proactive-reactive prediction in data streams. Data Min. Knowl. Disc. 13(3), 261–289 (2006)
Acknowledgements
This work was funded in part by the Office of Naval Research Global grant (N62909-19-1-2042).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wu, O., Koh, Y.S., Dobbie, G. et al. Probabilistic exact adaptive random forest for recurrent concepts in data streams. Int J Data Sci Anal 13, 17–32 (2022). https://doi.org/10.1007/s41060-021-00273-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s41060-021-00273-1