Abstract
In recent years, data streams have come to be regarded as one of the primary sources of big data, and their volume has grown rapidly over the last decades. A data stream environment differs from batch learning in several ways, chiefly that data arrive on the fly and at high speed. Data stream mining has attracted research attention because of its role in many real-time applications such as telecommunication, networking, and banking. One of the most important challenges is that the distribution of the data changes continuously, a phenomenon called “concept drift.” Another issue is class imbalance in the stream. Many classification algorithms have been developed to cope with concept drift; however, most of them assume balanced data. In this paper, we propose a model called “CD2A: Concept Drift Detection Approach Toward Imbalanced Data Stream,” which aims to handle imbalanced data, detect concept drift, and perform equally well across different types of drift. The algorithm was evaluated on real and synthetic datasets and compared with the leading-edge methods AWE, SMOTE, SERA, and OOB. Our method achieves significantly better average prediction accuracy than the compared methods.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Abdualrhman, M.A.A., Padma, M.C. (2019). CD2A: Concept Drift Detection Approach Toward Imbalanced Data Stream. In: Sridhar, V., Padma, M., Rao, K. (eds) Emerging Research in Electronics, Computer Science and Technology. Lecture Notes in Electrical Engineering, vol 545. Springer, Singapore. https://doi.org/10.1007/978-981-13-5802-9_54
DOI: https://doi.org/10.1007/978-981-13-5802-9_54
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5801-2
Online ISBN: 978-981-13-5802-9
eBook Packages: Engineering, Engineering (R0)