Incremental and Adaptive Clustering Stream Data over Sliding Window

  • Xuan Hong Dang
  • Vincent C. S. Lee
  • Wee Keong Ng
  • Kok Leong Ong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5690)

Abstract

Cluster analysis plays a key role in understanding data streams. The problem is difficult when clustering is performed in a sliding window model, in which outdated data must be eliminated properly. We propose the SWEM algorithm, which is based on the Expectation Maximization technique, to address these challenges. SWEM can compute clusters incrementally from a small number of statistics summarized over the stream, and it can adapt to changes in the stream's distribution. The feasibility of SWEM has been verified through a number of experiments, and we show that it is superior to the CluStream algorithm on both synthetic and real datasets.
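The idea of summarizing a stream with a few statistics and discarding outdated data can be illustrated with a minimal sketch. This is not the SWEM algorithm itself: it assumes one-dimensional data and a single Gaussian component instead of an EM mixture, and the class name `SlidingWindowStats` and its parameter `num_slots` are hypothetical names chosen for illustration. It only shows that additive per-slot summaries (count, sum, sum of squares) allow the window's mean and variance to be updated incrementally, with an expired time slot removed by subtraction.

```python
from collections import deque


class SlidingWindowStats:
    """Illustrative only: additive summaries over a window of time slots."""

    def __init__(self, num_slots):
        self.num_slots = num_slots  # window length, measured in time slots
        self.slots = deque()        # one (count, sum, sum_sq) tuple per slot
        self.n = 0                  # totals over the slots currently in the window
        self.s = 0.0
        self.ss = 0.0

    def add_slot(self, values):
        """Summarize a new time slot and evict the oldest one if needed."""
        n = len(values)
        s = float(sum(values))
        ss = float(sum(v * v for v in values))
        self.slots.append((n, s, ss))
        self.n += n
        self.s += s
        self.ss += ss
        if len(self.slots) > self.num_slots:
            # Outdated data is eliminated by subtracting its summary,
            # so the raw points of the expired slot are never revisited.
            old_n, old_s, old_ss = self.slots.popleft()
            self.n -= old_n
            self.s -= old_s
            self.ss -= old_ss

    def mean(self):
        return self.s / self.n

    def variance(self):
        # Population variance recovered from the summaries alone.
        return self.ss / self.n - self.mean() ** 2


window = SlidingWindowStats(num_slots=2)
for slot in ([1, 2, 3], [4, 5], [6, 7]):
    window.add_slot(slot)
# The first slot has expired, so the window now covers [4, 5, 6, 7].
print(window.n, window.mean(), window.variance())
```

Keeping one summary per time slot, instead of one per data point, bounds memory by the number of slots in the window. Subtracting floating-point summaries can accumulate rounding error on long streams, so a practical version might periodically recompute the totals from the stored slot summaries.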

Keywords

Time Slot, Data Stream, Global Cluster, Sliding Window, Cluster Stream
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Xuan Hong Dang (1)
  • Vincent C. S. Lee (1)
  • Wee Keong Ng (2)
  • Kok Leong Ong (3)
  1. Monash University, Australia
  2. Nanyang Technological University, Singapore
  3. Deakin University, Australia
