Dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream

  • Regular Paper
Knowledge and Information Systems

Abstract

Data stream classification is an important research direction in data mining. In many practical applications, however, the complete training set cannot be collected at one time, and the data may be imbalanced and interspersed with concept drift, both of which greatly degrade classification performance. To address this, an online dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream (DESW-ID) is proposed. The algorithm applies several balancing measures: it first resamples the data stream using a Poisson distribution and, if the stream is highly imbalanced, performs secondary sampling from a window that stores minority class instances so that the current data reach a balanced state. To improve processing efficiency, a classifier selection ensemble dynamically adjusts the number of classifiers, and the algorithm runs with an ADWIN detector to detect concept drift. The experimental results show that the proposed algorithm ranks first on average on all five classification performance metrics compared with state-of-the-art methods. The proposed algorithm therefore achieves better classification performance on imbalanced data streams with concept drift while also improving running efficiency.
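
The two balancing steps named in the abstract (Poisson resampling, then secondary sampling from a window of minority class instances) can be illustrated with a minimal sketch. This is not the authors' DESW-ID implementation: the class name WindowedPoissonResampler, the rate rule (largest class count divided by the count of the arriving class), the imbalance_threshold of 0.2, and the choice to measure imbalance on the instances already emitted for training are all assumptions made for illustration. Classifier selection and ADWIN drift detection are left out.

```python
import math
import random
from collections import deque


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class WindowedPoissonResampler:
    """Hypothetical two-stage balancer for a labeled stream (illustrative only)."""

    def __init__(self, window_size=50, imbalance_threshold=0.2, seed=0):
        self.seen = {}      # label -> instances that arrived from the stream
        self.trained = {}   # label -> instances emitted for training
        self.minority_window = deque(maxlen=window_size)
        self.imbalance_threshold = imbalance_threshold  # assumed cut-off
        self.rng = random.Random(seed)

    def _emit(self, batch, instance):
        """Add an instance to the training batch and update emitted counts."""
        batch.append(instance)
        label = instance[1]
        self.trained[label] = self.trained.get(label, 0) + 1

    def _trained_minority_share(self):
        """Share of emitted instances that belong to the rarest seen class."""
        total = sum(self.trained.values())
        if total == 0:
            return 1.0  # nothing emitted yet, so nothing to rebalance
        rarest = min(self.trained.get(label, 0) for label in self.seen)
        return rarest / total

    def process(self, x, y):
        """Return the list of (x, y) pairs to train on for one arrival."""
        self.seen[y] = self.seen.get(y, 0) + 1
        largest = max(self.seen.values())
        batch = []
        # Stage 1: rarer classes get a larger Poisson rate and are replayed more.
        for _ in range(poisson(largest / self.seen[y], self.rng)):
            self._emit(batch, (x, y))
        if self.seen[y] < largest:
            self.minority_window.append((x, y))
        # Stage 2 (assumed rule): if the emitted data are still highly
        # imbalanced, reuse one stored minority instance from the window.
        if self.minority_window and (
            self._trained_minority_share() < self.imbalance_threshold
        ):
            self._emit(batch, self.rng.choice(self.minority_window))
        return batch
```

In use, each arriving instance would be passed to process(x, y), and every pair in the returned list would be fed to the online learners; an empty list means the instance is skipped for training. The bounded deque keeps only the most recent minority instances, which is one simple way to let the window follow the stream as it drifts.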

Funding

This work was supported by the National Natural Science Foundation of China (62062004), the Ningxia Natural Science Foundation Project (2020AAC03216, 2022AAC03279) and the Graduate Innovation Project of North Minzu University (YCX21085).

Author information

Contributions

MH completed the main work; XZ completed the coding of the model, some of the experiments and the writing of the main paper; ZC completed some of the experiments and produced the experimental diagrams; HW and ML participated in the coordination of the study and reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Meng Han.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Human or animal rights

With the unanimous consent of all authors, we state that this paper concerns only research on a machine learning algorithm and does not involve human participants or animals. All data are open source and do not affect the interests of others.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, M., Zhang, X., Chen, Z. et al. Dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream. Knowl Inf Syst 65, 1105–1128 (2023). https://doi.org/10.1007/s10115-022-01791-5

  • DOI: https://doi.org/10.1007/s10115-022-01791-5
