Cluster based active learning for classification of evolving streams

  • Research Paper
  • Evolutionary Intelligence

Abstract

Classifying imbalanced, largely unlabelled data streams that exhibit concept drift has posed many challenges recently. At high degrees of imbalance, a learner performs poorly on the minority class, which in turn causes drift detection to fail; the existing model is then not updated, and classifier performance degrades. Drift detection is typically supervised. Supervised detectors are effective but impractical, because in real-world applications only a portion of the data stream can be labelled: oracle assistance is expensive and laborious. To alleviate these problems, this paper presents a novel technique, cluster based active learning for class imbalance and concept drift (CBAL). Adaptive sampling strategies handle high degrees of imbalance. A two-layer strategy detects drifts: the first layer is unsupervised and the second is supervised. To reduce the labelling cost, the framework uses a clustering technique to query labels. Extensive experiments on synthetic and real-world data streams show better classification performance, and CBAL detects drifts with fewer false alarms and less oracle intervention. In the highly imbalanced case (i.e., 10%), the performance of CBAL is 53% or higher, whereas that of the other algorithms is zero. CBAL also detects the number of drifts much more accurately and reduces the labelling cost by 90%.
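The idea of querying labels through clustering can be illustrated with a minimal sketch. This is not the authors' CBAL implementation: the abstract does not specify the clustering algorithm, how instances are selected for querying, or how labels are reused. The sketch assumes plain k-means with a deterministic farthest-point initialisation, queries the oracle once per cluster for the member nearest the centroid, and propagates that label to the rest of the cluster. All function names (`farthest_point_init`, `kmeans`, `query_by_cluster`) and the toy `oracle` are hypothetical.

```python
from math import dist


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the seeds so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iters=20):
    """Plain k-means on tuples of floats; assumes k <= len(points)."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        assignment = assign(points, centroids)
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # leave the centroid unchanged if its cluster is empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assign(points, centroids)


def query_by_cluster(points, k, oracle):
    """Ask the oracle for one label per non-empty cluster (the member nearest
    the centroid) and propagate it to every member of that cluster."""
    centroids, assignment = kmeans(points, k)
    labels = [None] * len(points)
    queries = 0
    for j in range(k):
        members = [i for i, a in enumerate(assignment) if a == j]
        if not members:
            continue
        rep = min(members, key=lambda i: dist(points[i], centroids[j]))
        label = oracle(points[rep])  # the only place a true label is requested
        queries += 1
        for i in members:
            labels[i] = label
    return labels, queries


# Toy chunk: two well-separated groups; the oracle is a stand-in for a human labeller.
chunk = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
oracle = lambda p: int(p[0] > 5)
labels, queries = query_by_cluster(chunk, k=2, oracle=oracle)
print(labels, queries)
```

On this toy chunk the oracle is consulted twice for eight instances. How much labelling such a scheme saves in practice depends on how well the clusters align with the classes, which this sketch does not address.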

Acknowledgements

This study was funded by India's Defence Research and Development Organisation (DRDO) under the sanction code ERIPR/GIA/17-18/038. The work was reviewed by the Centre for Artificial Intelligence and Robotics (CAIR). We would like to thank the late Dr. T. Maruthi Padmaja, the grant recipient, for her assistance and support in this work.

Author information

Contributions

Dirsumilli Himaja wrote the main manuscript text and prepared the figures and tables. All authors reviewed the manuscript.

Corresponding author

Correspondence to D. Himaja.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Himaja, D., Dondeti, V., Uppalapati, S. et al. Cluster based active learning for classification of evolving streams. Evol. Intel. (2023). https://doi.org/10.1007/s12065-023-00879-3

  • DOI: https://doi.org/10.1007/s12065-023-00879-3
