Modeling Concept Drift Detection as Machine Learning Model Using Overlapping Window and Kolmogorov–Smirnov Test

  • Conference paper
Machine Learning, Image Processing, Network Security and Data Sciences

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 946)

Abstract

Large volumes of data from many sources, especially streaming data, open up opportunities for streaming analytics. Concept drift, a change in the distribution of the data over time, is one of the challenging problems in streaming analytics, and detecting and adapting to it has attracted many researchers. In this work, we model concept drift detection as a machine learning problem. We follow a semi-supervised learning approach that uses a statistical test, the Kolmogorov–Smirnov test, which measures whether the distributions of two time series differ. The core of the work is to build a classifier that predicts whether a given window of the data stream contains drift. Because the stream windows carry no drift/no-drift labels, we label an initial portion of the stream with the Kolmogorov–Smirnov test and use these labeled windows to train the classifier. The trained classifier is then used to detect drift in subsequent windows. We use overlapping windows to avoid information loss. To build the classifier we applied several classification models: Logistic Regression, Support Vector Machine, K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, and Random Forest. Among them, the KNN model had a low false-positive rate and outperformed the others with an accuracy of 96%.
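The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which windows are compared, which features the classifier sees, or which parameters were used. The sketch assumes that each overlapping window is compared with the first window of the stream, that the features are the window mean and standard deviation, and that the test uses the asymptotic critical value at a significance level `alpha`. All function names (`overlapping_windows`, `ks_statistic`, `ks_drift_label`, `build_training_set`, `knn_predict`) are hypothetical, and a hand-written KNN stands in for a library classifier.

```python
import bisect
import math
import statistics


def overlapping_windows(stream, size, step):
    """Yield windows of `size` items, advancing by `step` (step < size gives overlap)."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect.bisect_right(sa, x) / len(sa) - bisect.bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def ks_drift_label(reference, window, alpha=0.05):
    """Return 1 if the KS test rejects 'same distribution' at level alpha, else 0.

    Uses the large-sample critical value c(alpha) * sqrt((n + m) / (n * m)).
    """
    n, m = len(reference), len(window)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return int(ks_statistic(reference, window) > critical)


def window_features(window):
    """Assumed feature vector: (mean, population standard deviation)."""
    return (statistics.fmean(window), statistics.pstdev(window))


def build_training_set(stream, size, step, alpha=0.05):
    """Label every overlapping window against the first window (an assumption)."""
    windows = list(overlapping_windows(stream, size, step))
    reference = windows[0]
    features = [window_features(w) for w in windows]
    labels = [ks_drift_label(reference, w, alpha) for w in windows]
    return features, labels


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return int(2 * sum(votes) > len(votes))


# Toy stream: 200 samples of one concept, then 200 samples shifted by +20.
stream = [i % 10 for i in range(200)] + [i % 10 + 20 for i in range(200)]
train_x, train_y = build_training_set(stream, size=50, step=25)

# Classify an unseen window without running the KS test again.
new_window = [i % 10 + 20 for i in range(50)]
prediction = knn_predict(train_x, train_y, window_features(new_window))
```

With a step smaller than the window size, a change point that falls near a window boundary still appears inside at least one window, which is one way to read the abstract's remark about avoiding information loss.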

Author information

Corresponding author

Correspondence to K. T. Jafseer.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jafseer, K.T., Shailesh, S., Sreekumar, A. (2023). Modeling Concept Drift Detection as Machine Learning Model Using Overlapping Window and Kolmogorov–Smirnov Test. In: Doriya, R., Soni, B., Shukla, A., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. Lecture Notes in Electrical Engineering, vol 946. Springer, Singapore. https://doi.org/10.1007/978-981-19-5868-7_10

  • DOI: https://doi.org/10.1007/978-981-19-5868-7_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-5867-0

  • Online ISBN: 978-981-19-5868-7

  • eBook Packages: Computer Science, Computer Science (R0)
