Knowledge and Information Systems, Volume 54, Issue 1, pp 171–201

Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM)

  • Viktor Losing
  • Barbara Hammer
  • Heiko Wersing
Regular Paper


Data mining in non-stationary data streams is particularly relevant in the context of the Internet of Things and Big Data. Its challenges arise from fundamentally different drift types that violate assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require a priori task knowledge that is unavailable. We propose the Self-Adjusting Memory (SAM) model for the k-nearest-neighbor (kNN) algorithm. SAM-kNN can deal with heterogeneous concept drift, i.e., different drift types and rates, using biologically inspired memory models and their coordination. Its basic idea is to maintain dedicated models for current and former concepts and to use them according to the demands of the given situation. It can be easily applied in practice without meta-parameter optimization. We conduct an extensive evaluation on various benchmarks, consisting of artificial streams with known drift characteristics and real-world datasets. We explicitly add new benchmarks enabling a precise performance analysis on multiple types of drift. Highly competitive results throughout all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift. Knowledge about drift characteristics in streaming data is not only crucial for a precise algorithm evaluation, but also facilitates the choice of an appropriate algorithm for real-world applications. Therefore, we additionally propose two tests that determine the type and strength of drift. We extract the drift characteristics of all utilized datasets and use them in our analysis of SAM in relation to other methods.
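
The idea of keeping dedicated kNN models for current and former concepts, and using whichever suits the present situation, can be illustrated with a toy sketch. This is not the SAM-kNN algorithm itself: the abstract does not specify how the memories are sized, cleaned, or coordinated, so everything below is an assumption made for illustration. The class name TwoMemoryKNN, the parameters recent_size, history_size and eval_window, the fixed-size window, and the rule "pick the memory with the higher accuracy on the most recent samples" are all hypothetical.

```python
from collections import Counter, deque
import math


def knn_predict(memory, x, k=3):
    """Majority vote among the k stored (point, label) pairs closest to x."""
    if not memory:
        return None  # nothing stored yet
    nearest = sorted(memory, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


class TwoMemoryKNN:
    """Toy model (hypothetical): one kNN memory for recent data, one for older data."""

    def __init__(self, k=3, recent_size=20, history_size=200, eval_window=20):
        self.k = k
        self.recent = deque(maxlen=recent_size)    # stands in for the current concept
        self.history = deque(maxlen=history_size)  # stands in for former concepts
        # Per memory: was each of the last eval_window samples predicted correctly?
        self.hits = {
            "recent": deque(maxlen=eval_window),
            "history": deque(maxlen=eval_window),
        }

    def _memory(self, name):
        return self.recent if name == "recent" else self.history

    def _accuracy(self, name):
        hits = self.hits[name]
        return sum(hits) / len(hits) if hits else 0.0

    def predict(self, x):
        # Assumed coordination rule: trust the memory that did best lately.
        best = max(self.hits, key=self._accuracy)
        return knn_predict(self._memory(best), x, self.k)

    def learn(self, x, y):
        # Test-then-train: score both memories before storing the sample.
        for name in self.hits:
            self.hits[name].append(knn_predict(self._memory(name), x, self.k) == y)
        # The sample about to leave the recent window is kept as older knowledge.
        if len(self.recent) == self.recent.maxlen:
            self.history.append(self.recent[0])
        self.recent.append((x, y))
```

Feeding the model a stream whose labelling rule flips halfway through shows the intended effect: after the flip, the recent memory scores better on new samples and is selected for prediction, while the older samples remain available in the history memory in case the former concept returns.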


Concept drift, Data mining, Data streams, Pattern recognition, Online learning, k-nearest neighbor



Barbara Hammer acknowledges support by the Cluster of Excellence Cognitive Interaction Technology ‘CITEC’ (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2017

Authors and Affiliations

  1. Bielefeld University, Bielefeld, Germany
  2. HONDA Research Institute Europe, Offenbach, Germany
