Unsupervised concept drift detection method based on robust random cut forest

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Streams are increasingly prevalent in practical applications, which makes stream data mining increasingly important. Unlike the static datasets commonly used in machine learning, streams are dynamic and tend to be non-stationary: their underlying distributions change over time. As a result, previously trained models become unsuitable and prediction performance drops sharply. This phenomenon, known as concept drift, poses a crucial challenge for stream mining, so measures to deal with it are urgent and essential. Most approaches handle concept drift using true labels. In the real world, however, labels are often unavailable or delayed, and labeling is costly. In this paper, we propose an unsupervised drift detection approach: a concept drift detection method based on robust random cut forest and t-test (RFTT). It uses a sliding window and Robust Random Cut Forest (RRCF) to compute an anomaly ratio and an anomaly score, which are then used to detect drift. Comprehensive experiments with fourteen popular algorithms on 11 classic datasets validate the proposed method. The results show that RFTT performs well on most datasets and achieves the highest average ranking.
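The general idea of window-based, label-free drift detection with a t-test can be illustrated with a minimal sketch. This is not the RFTT implementation: the abstract does not give the algorithm's details, so everything below is an assumption made for illustration. In particular, a simple distance-from-reference score stands in for the RRCF anomaly score, and the function names, window size, score cutoff, and t threshold are hypothetical.

```python
import math
import random
import statistics


def anomaly_scores(reference, window):
    """Score each point in `window` by its absolute z-distance from `reference`.

    ASSUMPTION: the paper scores points with Robust Random Cut Forest; this
    simple stand-in is used only to keep the sketch dependency-free.
    """
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference) or 1.0  # avoid division by zero
    return [abs(x - mu) / sigma for x in window]


def welch_t(a, b):
    """Welch's t statistic (mean of b minus mean of a, unequal variances)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(b) - statistics.fmean(a)) / se if se > 0 else 0.0


def detect_drift(stream, window_size=200, t_threshold=4.0, score_cutoff=2.0):
    """Yield (start_index, anomaly_ratio, t) for each window flagged as drift.

    All parameter values are hypothetical defaults, not values from the paper.
    """
    reference = stream[:window_size]
    ref_scores = anomaly_scores(reference, reference)
    last_start = len(stream) - window_size
    for start in range(window_size, last_start + 1, window_size):
        current = stream[start:start + window_size]
        cur_scores = anomaly_scores(reference, current)
        # Fraction of points in the current window that look anomalous.
        ratio = sum(s > score_cutoff for s in cur_scores) / window_size
        # Compare the score distributions of the two windows.
        t = welch_t(ref_scores, cur_scores)
        if t > t_threshold:
            yield start, ratio, t
            # Re-anchor the reference window on the new concept.
            reference = current
            ref_scores = anomaly_scores(reference, reference)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic one-dimensional stream with an abrupt mean shift at index 1000.
    data = [rng.gauss(0, 1) for _ in range(1000)]
    data += [rng.gauss(5, 1) for _ in range(1000)]
    for start, ratio, t in detect_drift(data):
        print(f"drift near index {start}: anomaly ratio={ratio:.2f}, t={t:.1f}")
```

The sketch compares tumbling windows against a fixed reference window and re-anchors the reference after each detection; an actual RRCF-based detector would maintain a forest over the stream and score points incrementally.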

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Funding

This research was funded by the Innovative Research Group Project of the National Natural Science Foundation of China, grant no. 11675060.

Author information

Corresponding author

Correspondence to Ming Yi.

About this article

Cite this article

Pang, Z., Cen, J. & Yi, M. Unsupervised concept drift detection method based on robust random cut forest. Int. J. Mach. Learn. & Cyber. 14, 4207–4222 (2023). https://doi.org/10.1007/s13042-023-01890-x

  • DOI: https://doi.org/10.1007/s13042-023-01890-x