CD2A: Concept Drift Detection Approach Toward Imbalanced Data Stream

  • Conference paper
Emerging Research in Electronics, Computer Science and Technology

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 545))

Abstract

In recent years, data streams have come to be regarded as one of the primary sources of big data, and their volume has grown rapidly over the last decades. A data stream environment differs from batch learning in several ways, most notably in that the data arrive on the fly at high speed. Data stream mining has attracted research attention because of its role in many real-time applications such as telecommunication, networking, and banking. One of the most important challenges in data streams is that the distribution of the data changes continuously, which leads to the phenomenon called “concept drift.” Another issue for streaming data is class imbalance in the dataset. Many classification algorithms have been proposed to cope with concept drift; however, many of them address drift only in balanced data. In this paper, we propose a model called “CD2A: Concept Drift Detection Approach Toward Imbalanced Data Stream,” which aims to handle imbalanced data, detect concept drift, and behave consistently across different types of drift. The algorithm was evaluated on real and synthetic datasets and compared with the leading methods AWE, SMOTE, SERA, and OOB. Our method achieves significantly better average prediction accuracy than the compared methods.
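
To illustrate why concept drift is harder to see in an imbalanced stream, the following is a minimal sketch and not the CD2A algorithm, whose details are not given in this abstract. It assumes a hypothetical monitor, `ClassRecallDriftMonitor`, that keeps a time-decayed recall per class and flags drift when a class's recall falls well below its best observed value. The stream, the fixed threshold classifier, and the parameters `decay`, `drop_threshold`, and `warmup` are all made up for the example.

```python
class ClassRecallDriftMonitor:
    """Tracks a time-decayed recall per class and flags a large drop.

    Illustrative only: this is a generic technique, not the CD2A method.
    """

    def __init__(self, decay=0.9, drop_threshold=0.3, warmup=20):
        self.decay = decay                    # weight given to past outcomes (assumed value)
        self.drop_threshold = drop_threshold  # recall drop that counts as drift (assumed value)
        self.warmup = warmup                  # examples per class before drift can be flagged
        self.recall = {}                      # class label -> decayed recall
        self.best = {}                        # class label -> best decayed recall so far
        self.seen = {}                        # class label -> number of examples seen

    def update(self, y_true, y_pred):
        """Update the recall of class y_true; return True if drift is suspected."""
        hit = 1.0 if y_true == y_pred else 0.0
        n = self.seen.get(y_true, 0) + 1
        self.seen[y_true] = n
        if n == 1:
            self.recall[y_true] = hit
        else:
            self.recall[y_true] = (
                self.decay * self.recall[y_true] + (1.0 - self.decay) * hit
            )
        if n < self.warmup:
            return False
        r = self.recall[y_true]
        self.best[y_true] = max(self.best.get(y_true, r), r)
        return self.best[y_true] - r > self.drop_threshold


def make_example(t, drift_time=1000):
    """Synthetic stream: every 10th example is the minority class (label 1)."""
    y = 1 if t % 10 == 0 else 0
    if y == 1:
        # The minority concept moves at drift_time; the majority never changes.
        x = 0.8 if t < drift_time else 0.3
    else:
        x = 0.2
    return x, y


def predict(x):
    """Fixed classifier learned on the old concept: a threshold at 0.5."""
    return 1 if x > 0.5 else 0


monitor = ClassRecallDriftMonitor()
drift_at = None      # first time step at which the monitor flags drift
correct_after = 0    # correct predictions after the true drift point
total_after = 0

for t in range(2000):
    x, y = make_example(t)
    y_hat = predict(x)
    if t >= 1000:
        total_after += 1
        correct_after += int(y_hat == y)
    if monitor.update(y, y_hat) and drift_at is None:
        drift_at = t

accuracy_after = correct_after / total_after
print("drift flagged at t =", drift_at)
print("overall accuracy after drift =", accuracy_after)
```

In this toy stream the overall accuracy after the change stays at 0.9 even though every minority example is misclassified, so an accuracy-based detector with a loose threshold could miss the change, while the per-class monitor flags it a few minority examples later. Tracking each class separately is one common way to make drift visible under imbalance; other designs are possible.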

Author information

Corresponding author

Correspondence to Mohammed Ahmed Ali Abdualrhman.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Abdualrhman, M.A.A., Padma, M.C. (2019). CD2A: Concept Drift Detection Approach Toward Imbalanced Data Stream. In: Sridhar, V., Padma, M., Rao, K. (eds) Emerging Research in Electronics, Computer Science and Technology. Lecture Notes in Electrical Engineering, vol 545. Springer, Singapore. https://doi.org/10.1007/978-981-13-5802-9_54

  • DOI: https://doi.org/10.1007/978-981-13-5802-9_54

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5801-2

  • Online ISBN: 978-981-13-5802-9

  • eBook Packages: Engineering, Engineering (R0)
