Maintaining Gaussian Mixture Models of Data Streams Under Block Evolution

  • J. P. Patist
  • W. Kowalczyk
  • E. Marchiori
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3991)

Abstract

A new method is presented for maintaining a Gaussian mixture model of a data stream that arrives in blocks. The method constructs a local Gaussian mixture for each block of data and iteratively merges pairs of closest components. An analysis of time and space complexity shows that the approach is one to two orders of magnitude more efficient than the standard EM algorithm, in both memory and runtime.
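
The block-wise merging idea can be illustrated with a minimal sketch. The abstract does not state how closeness between components is measured or how a pair is merged, so everything below is an assumption made for illustration rather than the authors' procedure: components are one-dimensional, `distance` is a hypothetical measure (squared mean gap scaled by the summed variances), and `merge_pair` uses a standard moment-preserving merge. Fitting the local mixture for each block (for example with EM) is not shown; the names `Component`, `reduce_mixture`, and `update_model` are likewise hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Component:
    weight: float  # mixing weight; weights sum to 1 across a mixture
    mean: float
    var: float


def merge_pair(a, b):
    """Moment-preserving merge of two weighted 1-D Gaussian components."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    # Merged variance = weighted within-component variance
    # plus the spread of the two means around the merged mean.
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Component(w, mean, var)


def distance(a, b):
    """Hypothetical closeness measure (an assumption, not from the paper)."""
    return (a.mean - b.mean) ** 2 / (a.var + b.var)


def reduce_mixture(components, k):
    """Greedily merge the closest pair until at most k components remain."""
    comps = list(components)
    while len(comps) > k:
        i, j = min(combinations(range(len(comps)), 2),
                   key=lambda ij: distance(comps[ij[0]], comps[ij[1]]))
        merged = merge_pair(comps[i], comps[j])
        comps = [c for idx, c in enumerate(comps) if idx not in (i, j)]
        comps.append(merged)
    return comps


def update_model(global_model, n_seen, block_model, n_block, k):
    """Pool the global and block mixtures, weighted by sample counts, then reduce."""
    total = n_seen + n_block
    pooled = [Component(c.weight * n_seen / total, c.mean, c.var)
              for c in global_model]
    pooled += [Component(c.weight * n_block / total, c.mean, c.var)
               for c in block_model]
    return reduce_mixture(pooled, k)


# Usage: a global model built from 100 points absorbs a block of 100 points.
global_model = [Component(0.5, 0.0, 1.0), Component(0.5, 10.0, 1.0)]
block_model = [Component(0.5, 0.2, 1.0), Component(0.5, 10.0, 1.0)]
model = update_model(global_model, 100, block_model, 100, k=2)
```

In this sketch only the component parameters and the sample counts are kept between blocks, so earlier raw data never needs to be revisited.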

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • J. P. Patist (1)
  • W. Kowalczyk (1)
  • E. Marchiori (1)
  1. Department of Computer Science, Free University of Amsterdam, Amsterdam, The Netherlands
