Abstract
Data streams are ubiquitous. Examples range from sensor networks to financial transactions and website logs. In fact, even market basket data can be seen as a stream of sales. Detecting changes in the distribution a stream is sampled from is one of the most challenging problems in stream mining, as only limited storage can be used. In this paper we analyse this problem for streams of transaction data from an MDL perspective. Based on this analysis we introduce the StreamKrimp algorithm, which uses the Krimp algorithm to characterise probability distributions with code tables. With these code tables, StreamKrimp partitions the stream into a sequence of substreams. Each switch of code table indicates a change in the underlying distribution. Experiments on both real and artificial streams show that StreamKrimp detects the changes while using only a very limited amount of data storage.
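The idea of partitioning a stream by switching compression models can be illustrated with a minimal sketch. This is not the StreamKrimp algorithm itself: as a stand-in for Krimp code tables it assumes a much simpler model, namely per-item Shannon code lengths with Laplace smoothing. The function names, the fixed `block_size`, and the `max_improvement` threshold are all hypothetical choices made for illustration. A new model is accepted only when it compresses the latest block noticeably better than the current one, and each acceptance is recorded as a change point.

```python
import math
from collections import Counter


def build_code_lengths(block, alphabet):
    """Per-item Shannon code lengths in bits, with Laplace smoothing.

    Assumption: a simple stand-in for a Krimp code table, not Krimp itself.
    Every item occurring in a transaction must be in `alphabet`.
    """
    counts = Counter(item for transaction in block for item in transaction)
    total = sum(counts.values()) + len(alphabet)
    return {item: -math.log2((counts[item] + 1) / total) for item in alphabet}


def encoded_size(block, code_lengths):
    """Total number of bits needed to encode the block with the given model."""
    return sum(code_lengths[item] for transaction in block for item in transaction)


def partition_stream(stream, alphabet, block_size, max_improvement=0.1):
    """Return the start indices of the blocks at which the model was switched.

    Only one block of transactions is inspected at a time; the previous
    blocks are summarised by the current model alone.
    """
    switches = [0]
    current = None
    for start in range(0, len(stream), block_size):
        block = stream[start:start + block_size]
        candidate = build_code_lengths(block, alphabet)
        if current is None:
            current = candidate  # the first block defines the initial model
            continue
        size_current = encoded_size(block, current)
        size_candidate = encoded_size(block, candidate)
        # Switch only if the candidate model saves a large enough fraction of bits.
        if size_current > 0 and (size_current - size_candidate) / size_current > max_improvement:
            current = candidate
            switches.append(start)
    return switches
```

For example, a stream of 100 transactions `[0, 1]` followed by 100 transactions `[2, 3]`, processed with `block_size=50`, yields `[0, 100]`: the model built on the first block encodes the second block equally well, but encodes the third block far worse than a model built on that block.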
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
van Leeuwen, M., Siebes, A. (2008). StreamKrimp: Detecting Change in Data Streams. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_62
DOI: https://doi.org/10.1007/978-3-540-87479-9_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9
eBook Packages: Computer Science, Computer Science (R0)