Abstract
Data mining in non-stationary data streams is particularly relevant in the context of the Internet of Things and Big Data. Its challenges arise from fundamentally different drift types violating the assumptions of data independence or stationarity. Available methods often struggle with certain forms of drift or require a priori task knowledge that is unavailable in practice. We propose the Self-Adjusting Memory (SAM) model for the k-nearest-neighbor (kNN) algorithm. SAM-kNN can deal with heterogeneous concept drift, i.e., different drift types and rates, using biologically inspired memory models and their coordination. Its basic idea is to maintain dedicated models for current and former concepts, which are used according to the demands of the given situation. It can be easily applied in practice without meta-parameter optimization. We conduct an extensive evaluation on various benchmarks, consisting of artificial streams with known drift characteristics as well as real-world datasets. We explicitly add new benchmarks that enable a precise performance analysis for multiple types of drift. Highly competitive results throughout all experiments underline the robustness of SAM-kNN as well as its capability to handle heterogeneous concept drift. Knowledge about drift characteristics in streaming data is not only crucial for a precise algorithm evaluation, but also facilitates the choice of an appropriate algorithm in real-world applications. Therefore, we additionally propose two tests that determine the type and strength of drift. We extract the drift characteristics of all utilized datasets and use them in our analysis of SAM in relation to other methods.
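The basic idea stated above — dedicated models for current and former concepts, used according to the demands of the given situation — can be sketched roughly as follows. This is a hedged, minimal illustration, not the authors' implementation: the class name `SAMSketch`, the fixed evaluation window, and the `predict`/`partial_fit` model interface are assumptions.

```python
from collections import deque

class SAMSketch:
    """Minimal sketch of SAM-style model arbitration: track each
    memory model's accuracy on recent samples and let the currently
    best one predict (a hypothetical simplification)."""

    def __init__(self, models, history=50):
        self.models = models  # e.g. {"STM": ..., "LTM": ..., "CM": ...}
        self.hits = {name: deque(maxlen=history) for name in models}

    def _accuracy(self, name):
        h = self.hits[name]
        return sum(h) / len(h) if h else 0.0

    def predict(self, x):
        # the model with the best recent accuracy makes the prediction
        best = max(self.models, key=self._accuracy)
        return self.models[best].predict(x)

    def update(self, x, y):
        # evaluate every model on the new sample before training on it
        for name, model in self.models.items():
            self.hits[name].append(int(model.predict(x) == y))
            model.partial_fit(x, y)
```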
Notes
The VFDT is often called Hoeffding Tree (HT).
In case of ties, we prioritize the models in the following order: \(\hbox {kNN}_{M_{\text {ST}}}\), \(\hbox {kNN}_{M_{\text {LT}}}\), \(\hbox {kNN}_{M_{\text {C}}}\).
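The tie-breaking rule can be sketched as a small helper. This is a hypothetical illustration; the weight dictionary and the short model names STM/LTM/CM are assumptions:

```python
# Fixed tie-breaking priority: STM before LTM before the combined model CM.
PRIORITY = ("STM", "LTM", "CM")

def select_model(weights):
    """Return the name of the highest-weighted model; on ties,
    follow the fixed priority order above."""
    best = max(weights.values())
    for name in PRIORITY:
        if weights.get(name) == best:
            return name
```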
We used kMeans++ [31] because of its efficiency and scalability to larger datasets.
We used the complete benchmarks in our experiments, but for real-world tasks this is often unrealistic.
We used three test models with \(W=\{500, 1000, 5000\}\) in all tests.
We consistently used \(\hat{w} = 20{,}000\).
Regarding our approach, the available space is shared between the STM and LTM.
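A hedged sketch of how such a shared budget could be enforced: once the combined size of STM and LTM exceeds the budget, the long-term memory is compressed class-wise into k-means++ prototypes. The function name, the compression `rate`, and the use of scikit-learn's `KMeans` here are assumptions for illustration, not the authors' exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def enforce_budget(stm_X, ltm_X, ltm_y, budget, rate=0.5):
    """If STM and LTM together exceed the shared budget, compress the
    LTM class-wise to `rate` of its size via k-means++ prototypes."""
    if len(stm_X) + len(ltm_X) <= budget:
        return ltm_X, ltm_y  # within budget: keep the LTM unchanged
    new_X, new_y = [], []
    for c in np.unique(ltm_y):
        Xc = ltm_X[ltm_y == c]
        k = max(1, int(len(Xc) * rate))
        km = KMeans(n_clusters=k, init="k-means++", n_init=1).fit(Xc)
        new_X.append(km.cluster_centers_)   # prototypes replace raw points
        new_y.append(np.full(k, c))
    return np.vstack(new_X), np.concatenate(new_y)
```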
References
Chen M, Mao S, Liu Y (2014) Big data: a survey. Mobile Netw Appl 19(2):171–209
Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 54(15):2787–2805
Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Advances in artificial intelligence—SBIA. Springer, pp 286–295
Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790
Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531
Schiaffino S, Garcia P, Amandi A (2008) eTeacher: providing personalized assistance to e-learning students. Comput Educ 51(4):1744–1754
Dudani SA (1976) The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst Man Cybern 4:325–327
Gama J, Žliobaitė I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):44
Ditzler G, Roveri M, Alippi C, Polikar R (2015) Learning in nonstationary environments: a survey. IEEE Comput Intell Mag 10(4):12–25
Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining, pp 443–448. https://doi.org/10.1137/1.9781611972771.42
Dasu T, Krishnan S, Venkatasubramanian S, Yi K (2006) An information-theoretic approach to detecting changes in multi-dimensional data streams. In: Proceedings of symposium on the interface of statistics, computing science, and applications. Citeseer
Kifer D, Ben-David S, Gehrke J (2004) Detecting change in data streams. In: Proceedings of the thirtieth international conference on very large data bases-volume 30. VLDB Endowment, pp 180–191
Bifet A, Pfahringer B, Read J, Holmes G (2013) Efficient data stream classification via probabilistic adaptive windows. In: Proceedings of the 28th annual ACM symposium on applied computing, ser. SAC ’13. ACM, New York, NY, USA, pp 801–806
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proceedings of the seventeenth international conference on machine learning, ser. ICML ’00. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 487–494
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80
Jaber G, Cornuéjols A, Tarroux P (2013) Online learning: Searching for the best forgetting strategy under concept drift. In: Neural information processing. Springer, pp 400–408
Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: Machine learning and knowledge discovery in databases. Springer, pp 135–150
Oza NC (2005) Online bagging and boosting. In: 2005 IEEE international conference on systems, man and cybernetics, vol 3, pp 2340–2345
Freund Y, Schapire R, Abe N (1999) A short introduction to boosting. J Jpn Soc Artif Intell 14(5):771–780
Hammer B, Hasenfuss A (2010) Topographic mapping of large dissimilarity data sets. Neural Comput 22(9):2229–2284
Alex N, Hasenfuss A, Hammer B (2009) Patch clustering for massive data sets. Neurocomputing 72(7–9):1455–1469
Loeffel PX, Marsala C, Detyniecki M (2015) Classification with a reject option under concept drift: the droplets algorithm. In: 2015 IEEE international conference on data science and advanced analytics (DSAA), Oct 2015, pp 1–9
Zhang P, Gao BJ, Zhu X, Guo L (2011) Enabling fast lazy learning for data streams. In: Proceedings of the 2011 IEEE 11th international conference on data mining, ser. ICDM ’11. IEEE Computer Society, Washington, DC, USA, pp 932–941
Law Y-N, Zaniolo C (2005) An adaptive nearest neighbor classification algorithm for data streams. In: Knowledge discovery in databases: PKDD 2005. Springer, pp 108–120
Xioufis ES, Spiliopoulou M, Tsoumakas G, Vlahavas I (2011) Dealing with concept drift and class imbalance in multi-label stream classification. In: Proceedings of the twenty-second international joint conference on artificial intelligence. vol 2, ser. IJCAI’11. AAAI Press, pp 1583–1588
Atkinson R, Shiffrin R (1968) Human memory: a proposed system and its control processes. Psychol Learn Motiv 2:89–195
Dudai Y (2004) The neurobiology of consolidations, or, how stable is the engram? Annu Rev Psychol 55:51–86
Miller GA (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63(2):81
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M (2011) Scikit-learn: machine learning in Python. JMLR 12:2825–2830
Arthur D, Vassilvitskii S (2007) k-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Street WN, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’01. ACM, New York, NY, USA, pp 377–382
Harries M (1999) Splice-2 comparative evaluation: Electricity pricing. Technical report, University of New South Wales
Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà R, Morales-Bueno R (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, vol 6, pp 77–86
Kuncheva LI, Plumpton CO (2008) Adaptive learning rate for online linear discriminant classifiers. In: Structural, syntactic, and statistical pattern recognition. Springer, pp 510–519
Žliobaitė I (2013) How good is the electricity benchmark for evaluating concept drift adaptation. CoRR arXiv:1301.3524
Gama J, Rocha R, Medas P (2003) Accurate decision trees for mining high-speed data streams. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 523–528
Oza NC, Russell S (2001) Experimental comparisons of online and batch versions of bagging and boosting. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 359–364
Losing V, Hammer B, Wersing H (2015) Interactive online learning for obstacle classification on a mobile robot. In: 2015 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
Wilcox RR (2012) Introduction to robust estimation and hypothesis testing. Academic Press, London
Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Inc., New York
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilità. Libreria internazionale Seeber
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
Yianilos PN (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the fourth annual ACM-SIAM symposium on discrete algorithms, ser. SODA ’93. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp 311–321
Acknowledgements
Barbara Hammer acknowledges support by the Cluster of Excellence Cognitive Interaction Technology ’CITEC’ (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).
Cite this article
Losing, V., Hammer, B. & Wersing, H. Tackling heterogeneous concept drift with the Self-Adjusting Memory (SAM). Knowl Inf Syst 54, 171–201 (2018). https://doi.org/10.1007/s10115-017-1137-y