
Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM)

Knowledge and Information Systems

Abstract

Data mining in non-stationary data streams is particularly relevant in the context of the Internet of Things and Big Data. Its challenges arise from fundamentally different drift types that violate assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require a priori task knowledge that is rarely available. We propose the Self-Adjusting Memory (SAM) model for the k-nearest-neighbor (kNN) algorithm. SAM-kNN can deal with heterogeneous concept drift, i.e., different drift types and rates, using biologically inspired memory models and their coordination. Its basic idea is to maintain dedicated models for the current and former concepts and to apply them according to the demands of the given situation. It can be easily applied in practice without meta-parameter optimization. We conduct an extensive evaluation on various benchmarks, consisting of artificial streams with known drift characteristics as well as real-world datasets. We explicitly add new benchmarks that enable a precise performance analysis on multiple types of drift. Highly competitive results throughout all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift. Knowledge about drift characteristics in streaming data is not only crucial for a precise algorithm evaluation, but also facilitates the choice of an appropriate algorithm for real-world applications. Therefore, we additionally propose two tests that determine the type and strength of drift. We extract the drift characteristics of all utilized datasets and use them in our analysis of SAM in relation to other methods.
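
As a rough illustration of the core mechanism, the following minimal Python sketch keeps a short-term memory (STM) for the current concept and a long-term memory (LTM) for former ones, and answers each query with whichever sub-model (STM, LTM, or their union) has been most accurate on the recent stream. This is not the authors' implementation (their code is available at https://github.com/vlosing); the memory sizes, the evaluation window, and the simplified transfer of discarded STM samples into the LTM are assumptions made purely for exposition.

    from collections import deque
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    class SimplifiedSAM:
        """Toy sketch of the SAM idea: dedicated memories for the current
        and former concepts, used according to their recent accuracy."""

        def __init__(self, k=5, stm_size=500, eval_window=200):
            self.k = k
            self.stm = deque(maxlen=stm_size)   # recent (x, y) pairs
            self.ltm = []                       # preserved older pairs
            # rolling 0/1 correctness records of the three sub-models
            self.hits = {m: deque(maxlen=eval_window)
                         for m in ("stm", "ltm", "combined")}

        def _knn_predict(self, data, x):
            if len(data) < self.k:
                return None
            X, y = zip(*data)
            return KNeighborsClassifier(n_neighbors=self.k).fit(X, y).predict([x])[0]

        def _sub_predictions(self, x):
            return {"stm": self._knn_predict(list(self.stm), x),
                    "ltm": self._knn_predict(self.ltm, x),
                    "combined": self._knn_predict(list(self.stm) + self.ltm, x)}

        def predict(self, x):
            preds = self._sub_predictions(x)
            candidates = [m for m in preds if preds[m] is not None]
            if not candidates:
                return None
            # pick the sub-model with the best recent accuracy; ties are
            # broken in favor of the STM, then the LTM (cf. note 2 below)
            priority = {"stm": 2, "ltm": 1, "combined": 0}
            best = max(candidates, key=lambda m: (
                np.mean(self.hits[m]) if self.hits[m] else 0.0, priority[m]))
            return preds[best]

        def partial_fit(self, x, y):
            # test-then-train: record each sub-model's correctness first
            for m, p in self._sub_predictions(x).items():
                if p is not None:
                    self.hits[m].append(int(p == y))
            if len(self.stm) == self.stm.maxlen:   # oldest STM sample would be
                self.ltm.append(self.stm[0])       # dropped: keep it in the LTM
            self.stm.append((x, y))

Processing a stream then amounts to calling predict(x) followed by partial_fit(x, y) for every arriving example. In the full SAM model the STM additionally adapts its size when its recent error indicates drift, transferred samples are kept only if consistent with the current concept, and the LTM is compressed once the shared budget is exhausted (cf. notes 3 and 8 below).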


Figures 1–12 are omitted here; the webcam images were sourced from https://www.skylinewebcams.com (color figure online).

Notes

  1. The VFDT is often called the Hoeffding Tree (HT).

  2. In case of ties, we prioritize the models in the following order: \(\hbox {kNN}_{M_{\text {ST}}}\), \(\hbox {kNN}_{M_{\text {LT}}}, \hbox {kNN}_{M_\text {C}}\).

  3. We used kMeans++ [31] because of its efficiency and scalability to larger datasets (a condensation sketch follows these notes).

  4. https://github.com/vlosing.

  5. We used the complete benchmarks in our experiments, although this is often unrealistic for real-world tasks.

  6. We used three test models with \(W=\{500, 1000, 5000\}\) in all tests.

  7. We consistently used \(\hat{w} = 20{,}000\).

  8. Regarding our approach, the available space is shared between the STM and LTM.
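
Notes 3 and 8 together hint at how the long-term memory is kept within its share of the overall budget. The following hedged sketch condenses an over-full LTM class-wise with kMeans++; the uniform per-class shrink factor and the NumPy-array interface are assumptions, and the paper's actual cleaning and compression procedure is more involved.

    import numpy as np
    from sklearn.cluster import KMeans

    def compress_ltm(X, y, max_points):
        """Condense an over-full LTM (X: 2-D array of points, y: labels) by
        replacing each class's points with k-means++ cluster centers,
        shrinking every class by the same factor to fit the budget."""
        if len(X) <= max_points:
            return X, y
        Xc, yc = [], []
        for label in np.unique(y):
            pts = X[y == label]
            n_centers = max(1, int(len(pts) * max_points / len(X)))
            if n_centers >= len(pts):
                centers = pts                      # class already small enough
            else:
                km = KMeans(n_clusters=n_centers, init="k-means++",
                            n_init=1).fit(pts)
                centers = km.cluster_centers_
            Xc.append(centers)
            yc.append(np.full(len(centers), label))
        return np.vstack(Xc), np.concatenate(yc)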

References

  1. Chen M, Mao S, Liu Y (2014) Big data: a survey. Mobile Netw Appl 19(2):171–209

  2. Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 54(15):2787–2805

  3. Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Advances in artificial intelligence—SBIA. Springer, pp 286–295

  4. Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790

  5. Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531

  6. Schiaffino S, Garcia P, Amandi A (2008) eTeacher: providing personalized assistance to e-learning students. Comput Educ 51(4):1744–1754

  7. Dudani SA (1976) The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst Man Cybern 4:325–327

  8. Gama J, Žliobaitė I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):44

  9. Ditzler G, Roveri M, Alippi C, Polikar R (2015) Learning in nonstationary environments: a survey. IEEE Comput Intell Mag 10(4):12–25

  10. Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining, pp 443–448. https://doi.org/10.1137/1.9781611972771.42

  11. Dasu T, Krishnan S, Venkatasubramanian S, Yi K (2006) An information-theoretic approach to detecting changes in multi-dimensional data streams. In: Proceedings of symposium on the interface of statistics, computing science, and applications. Citeseer

  12. Kifer D, Ben-David S, Gehrke J (2004) Detecting change in data streams. In: Proceedings of the thirtieth international conference on very large data bases-volume 30. VLDB Endowment, pp 180–191

  13. Bifet A, Pfahringer B, Read J, Holmes G (2013) Efficient data stream classification via probabilistic adaptive windows. In: Proceedings of the 28th annual ACM symposium on applied computing, ser. SAC ’13. ACM, New York, NY, USA, pp 801–806

  14. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101

  15. Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proceedings of the seventeenth international conference on machine learning, ser. ICML ’00. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 487–494

  16. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80

  17. Jaber G, Cornuéjols A, Tarroux P (2013) Online learning: searching for the best forgetting strategy under concept drift. In: Neural information processing. Springer, pp 400–408

  18. Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: Machine learning and knowledge discovery in databases. Springer, pp 135–150

  19. Oza NC (2005) Online bagging and boosting. In: 2005 IEEE international conference on systems, man and cybernetics, vol 3, pp 2340–2345

  20. Freund Y, Schapire R, Abe N (1999) A short introduction to boosting. J Jpn Soc Artif Intell 14(5):771–780

  21. Hammer B, Hasenfuss A (2010) Topographic mapping of large dissimilarity data sets. Neural Comput 22(9):2229–2284

  22. Alex N, Hasenfuss A, Hammer B (2009) Patch clustering for massive data sets. Neurocomputing 72(7–9):1455–1469

  23. Loeffel PX, Marsala C, Detyniecki M (2015) Classification with a reject option under concept drift: the droplets algorithm. In: 2015 IEEE international conference on data science and advanced analytics (DSAA), Oct 2015, pp 1–9

  24. Zhang P, Gao BJ, Zhu X, Guo L (2011) Enabling fast lazy learning for data streams. In: Proceedings of the 2011 IEEE 11th international conference on data mining, ser. ICDM ’11. IEEE Computer Society, Washington, DC, USA, pp 932–941

  25. Law Y-N, Zaniolo C (2005) An adaptive nearest neighbor classification algorithm for data streams. In: Knowledge discovery in databases: PKDD 2005. Springer, pp 108–120

  26. Xioufis ES, Spiliopoulou M, Tsoumakas G, Vlahavas I (2011) Dealing with concept drift and class imbalance in multi-label stream classification. In: Proceedings of the twenty-second international joint conference on artificial intelligence. vol 2, ser. IJCAI’11. AAAI Press, pp 1583–1588

  27. Atkinson R, Shiffrin R (1968) Human memory: a proposed system and its control processes. Psychol Learn Motiv 2:89–195

  28. Dudai Y (2004) The neurobiology of consolidations, or, how stable is the engram? Annu Rev Psychol 55:51–86

  29. Miller GA (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63(2):81

  30. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  31. Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035

  32. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604

  33. Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD '01. ACM, New York, NY, USA, pp 377–382

  34. Harries M (1999) Splice-2 comparative evaluation: Electricity pricing. Technical report, University of New South Wales

  35. Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà R, Morales-Bueno R (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, vol 6, pp 77–86

  36. Kuncheva LI, Plumpton CO (2008) Adaptive learning rate for online linear discriminant classifiers. In: Structural, syntactic, and statistical pattern recognition. Springer, pp 510–519

  37. Žliobaitė I (2013) How good is the electricity benchmark for evaluating concept drift adaptation. CoRR arXiv:1301.3524

  38. Gama J, Rocha R, Medas P (2003) Accurate decision trees for mining high-speed data streams. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 523–528

  39. Oza NC, Russell S (2001) Experimental comparisons of online and batch versions of bagging and boosting. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 359–364

  40. Losing V, Hammer B, Wersing H (2015) Interactive online learning for obstacle classification on a mobile robot. In: 2015 international joint conference on neural networks (IJCNN). IEEE, pp 1–8

  41. Wilcox RR (2012) Introduction to robust estimation and hypothesis testing. Academic Press, London

    MATH  Google Scholar 

  42. Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Inc., New York

  43. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  44. Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilità. Libreria internazionale Seeber

  45. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey

  46. Yianilos PN (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the fourth annual ACM-SIAM symposium on discrete algorithms, ser. SODA ’93. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 311–321

Acknowledgements

Barbara Hammer acknowledges support by the Cluster of Excellence Cognitive Interaction Technology 'CITEC' (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).

Author information

Correspondence to Viktor Losing.

Cite this article

Losing, V., Hammer, B. & Wersing, H. Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM). Knowl Inf Syst 54, 171–201 (2018). https://doi.org/10.1007/s10115-017-1137-y
