Abstract
The large volume of data now arriving from diverse sources, especially as streams, opens up many opportunities for streaming analytics. Concept drift, a change in the distribution of the data over time, is one of the challenging problems in streaming analytics, and detecting and adapting to it has attracted many researchers. In this work, we model concept drift detection as a machine learning problem. We follow a semi-supervised learning approach that uses a statistical test, the Kolmogorov–Smirnov test, to measure how much the distributions of two time series differ. The core of the work is to build a classifier that predicts whether a given window of the data stream contains drift. Because the stream windows carry no labels indicating drift or no drift, we explicitly label some initial part of the stream using the Kolmogorov–Smirnov test and use these labeled windows to build the classifier. The classifier can then detect drift in the rest of the stream. We also use overlapping windows to avoid information loss. To build the classifier, we applied several classification models: Logistic Regression, Support Vector Machine, K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, and Random Forest. Among them, the KNN model had a low false-positive rate and outperformed the others, with an accuracy of 96%.
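The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which two samples the Kolmogorov–Smirnov test compares, so the sketch assumes that a window "contains drift" when its first and second halves differ. The window size, step, significance level, synthetic stream, and all function names are likewise hypothetical choices made for illustration.

```python
import math
import random
from bisect import bisect_right


def overlapping_windows(stream, size, step):
    """Yield (start, window) pairs; step < size makes consecutive windows overlap."""
    for start in range(0, len(stream) - size + 1, step):
        yield start, stream[start:start + size]


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_rejects(a, b, c_alpha=1.358):
    """Large-sample KS decision rule; c_alpha = 1.358 corresponds to alpha = 0.05."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))


def label_windows(stream, size=200, step=100):
    """Label each window 1 (drift) or 0 (no drift).

    Assumption: a window is labeled as drift when the KS test rejects
    equality of the distributions of its first and second halves.
    """
    half = size // 2
    labels = []
    for start, window in overlapping_windows(stream, size, step):
        labels.append((start, int(ks_rejects(window[:half], window[half:]))))
    return labels


# Synthetic stream (assumption): the mean shifts from 0 to 3 at index 500.
rng = random.Random(0)
stream = [rng.gauss(0, 1) for _ in range(500)] + [rng.gauss(3, 1) for _ in range(500)]
labels = label_windows(stream)
```

In this sketch the window starting at index 400 straddles the shift, so it receives the drift label. The resulting (window, label) pairs are the kind of training data on which a classifier such as KNN could then be fitted. The abstract does not specify which features of a window the classifier sees, so that step is left out here.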
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jafseer, K.T., Shailesh, S., Sreekumar, A. (2023). Modeling Concept Drift Detection as Machine Learning Model Using Overlapping Window and Kolmogorov–Smirnov Test. In: Doriya, R., Soni, B., Shukla, A., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. Lecture Notes in Electrical Engineering, vol 946. Springer, Singapore. https://doi.org/10.1007/978-981-19-5868-7_10
DOI: https://doi.org/10.1007/978-981-19-5868-7_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5867-0
Online ISBN: 978-981-19-5868-7
eBook Packages: Computer Science, Computer Science (R0)