Granularity Adaptive Density Estimation and on Demand Clustering of Concept-Drifting Data Streams

  • Weiheng Zhu
  • Jian Pei
  • Jian Yin
  • Yihuang Xie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4081)

Abstract

Clustering data streams has several important applications. While many previous studies focus on clustering the objects arriving in a data stream, in this paper we consider the novel problem of on-demand clustering of concept-drifting data streams. To characterize concept-drifting data streams, we propose an effective method for estimating the densities of data streams. One unique feature of our method is that its granularity of estimation adapts to the available computation resources, which is critical for processing data streams with unpredictable input rates. Moreover, any clustering method can be applied to cluster data streams on demand using their density estimates. A performance study on synthetic data sets is reported to verify our design. It shows that our method obtains results comparable to CluStream [3] when clustering a single stream, and much better results than COD [8] when clustering multiple streams.
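The abstract does not spell out the estimation algorithm, so the following is only a generic sketch of the idea it names: a kernel density estimate maintained at a fixed set of probe points, where the number of probe points stands in for the granularity. The Gaussian kernel, the evenly spaced probe points, the class `ProbeDensity`, and the policy `choose_probe_count` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def probe_points(lo, hi, count):
    """Evenly spaced probe points on [lo, hi]; requires count >= 2."""
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def choose_probe_count(budget_ops_per_item, min_count=2, max_count=1024):
    """Hypothetical policy: spend one kernel evaluation per probe point
    per arriving item, clamped to [min_count, max_count]."""
    return max(min_count, min(max_count, int(budget_ops_per_item)))


class ProbeDensity:
    """1-D kernel density estimate maintained incrementally at probe points."""

    def __init__(self, lo, hi, count, bandwidth):
        self.points = probe_points(lo, hi, count)
        self.bandwidth = bandwidth
        self.sums = [0.0] * count  # running kernel sums, one per probe point
        self.n = 0                 # number of items seen so far

    def update(self, x):
        """Fold one arriving item into the running sums."""
        for i, p in enumerate(self.points):
            self.sums[i] += gaussian_kernel((p - x) / self.bandwidth)
        self.n += 1

    def density(self):
        """Current density estimate at each probe point."""
        if self.n == 0:
            return [0.0] * len(self.points)
        scale = 1.0 / (self.n * self.bandwidth)
        return [s * scale for s in self.sums]
```

Under these assumptions, a smaller per-item budget yields fewer probe points and therefore a coarser estimate, while the vector returned by `density()` could be handed to any off-the-shelf clustering routine as a summary of the stream.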

Keywords

Density Estimation, Data Stream, Kernel Density Estimation, Input Rate, Probe Point
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Aggarwal, C., et al.: A framework for projected clustering of high dimensional data streams. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), August 2004, Toronto, ON, Canada (2004)
  2. Aggarwal, C.C.: A framework for diagnosing changes in evolving data streams. In: Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pp. 575–586. ACM Press, New York (2003)
  3. Aggarwal, C.C., et al.: A framework for clustering evolving data streams. In: Proc. of the 29th Int. Conf. on Very Large Data Bases (VLDB 2003), September 2003, Berlin, Germany (2003)
  4. Babcock, B., et al.: Load shedding techniques for data stream systems. In: Proceedings of the 2003 Workshop on Management and Processing of Data Streams (MPDS 2003), June 2003, San Diego, California (2003)
  5. Babcock, B., et al.: Maintaining variance and k-medians over data stream windows. In: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS 2003), pp. 234–243. ACM Press, New York (2003)
  6. Chang, J.H., Lee, W.S.: Finding recent frequent itemsets adaptively over online data streams. In: KDD 2003: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 487–492. ACM Press, New York (2003)
  7. Chi, Y., et al.: Loadstar: A load shedding scheme for classifying data streams. In: Proc. 2005 SIAM Int. Conf. Data Mining, April 2005, Newport Beach, CA (2005)
  8. Dai, B.-R., et al.: Clustering on demand for multiple data streams. In: Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM 2004), November 2004, pp. 367–370. IEEE (2004)
  9. Guha, S., et al.: Clustering data streams. In: Proc. IEEE Symposium on Foundations of Computer Science (FOCS 2000), Redondo Beach, CA, pp. 359–366 (2000)
  10. Jain, A.K., et al.: Data clustering: A survey. ACM Comput. Surv. 31, 264–323 (1999)
  11. Kleinberg, J.: Bursty and hierarchical structure in streams. In: KDD 2002: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 91–101. ACM Press, New York (2002)
  12. Li, Y., et al.: Clustering moving objects. In: KDD 2004: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 617–622. ACM Press, New York (2004)
  13. Ma, J., Perkins, S.: Online novelty detection on temporal sequences. In: KDD 2003: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 613–618. ACM Press, New York (2003)
  14. O’Callaghan, L., et al.: High-performance clustering of streams and large data sets. In: Proc. 2002 Int. Conf. Data Engineering (ICDE 2002), April 2002, San Francisco, CA (2002)
  15. Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, Boca Raton (1986)
  16. Tatbul, N., et al.: Load shedding in a data stream manager. In: VLDB, pp. 309–320 (2003)
  17. Zhang, Q., Lin, X.: Clustering moving objects for spatio-temporal selectivity estimation. In: Proceedings of the fifteenth conference on Australasian database (CRPIT 2004), pp. 123–130. Australian Computer Society, Inc., Australia (2004)
  18. Zhu, Y., Shasha, D.: Efficient elastic burst detection in data streams. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 336–345. ACM Press, New York (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Weiheng Zhu (1)
  • Jian Pei (2)
  • Jian Yin (1)
  • Yihuang Xie (1)
  1. Zhongshan University, China
  2. Simon Fraser University, Canada