COMET: Event-Driven Clustering over Multiple Evolving Streams

  • Mi-Yen Yeh
  • Bi-Ru Dai
  • Ming-Syan Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)

Abstract

In this paper, we present COMET, a framework for event-driven Clustering Over Multiple Evolving sTreams. COMET monitors the distribution of clusters over multiple data streams and reports the results online. This information is valuable for supporting online decisions. As time advances, the data streams evolve and the clusters they belong to change. Instead of periodically re-clustering the multiple data streams directly, COMET applies an efficient cluster adjustment procedure only when it is required. The signal that a cluster adjustment is required is defined as an "event." We design an event detection mechanism that employs piecewise linear approximation as its key technique. Piecewise linear approximation is advantageous in that it can be performed in real time as the data arrive and can also capture the trend of the data. When an event occurs, split and merge operations allow us to report the latest clustering results effectively and with high clustering quality.
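To make the role of piecewise linear approximation more concrete, the following is a minimal Python sketch of a generic sliding-window segmentation of a single stream. It is not the COMET algorithm: the abstract does not give the segmentation procedure or the exact event definition, so the names `segment_stream` and `trend_events`, the `tolerance` parameter, and the rule "a change in slope sign between consecutive segments is an event" are all assumptions chosen for illustration. The values are passed as a list for simplicity, but each step only looks at data seen so far.

```python
from typing import List, Sequence, Tuple

# (start_index, start_value, end_index, end_value); a hypothetical layout.
Segment = Tuple[int, float, int, float]


def max_deviation(values: Sequence[float], start: int, end: int) -> float:
    """Largest vertical distance between values[start..end] and the
    straight line joining the two end points."""
    if end == start:
        return 0.0
    slope = (values[end] - values[start]) / (end - start)
    return max(
        abs(values[i] - (values[start] + slope * (i - start)))
        for i in range(start, end + 1)
    )


def segment_stream(values: Sequence[float], tolerance: float) -> List[Segment]:
    """Grow the current segment point by point; close it as soon as the
    line through its end points deviates from the data by more than
    `tolerance` (an assumed, user-chosen threshold)."""
    segments: List[Segment] = []
    start = 0
    for end in range(1, len(values)):
        if max_deviation(values, start, end) > tolerance:
            # Close the segment at the previous point and restart there.
            segments.append((start, values[start], end - 1, values[end - 1]))
            start = end - 1
    if len(values) > 1:
        segments.append((start, values[start], len(values) - 1, values[-1]))
    return segments


def slope(segment: Segment) -> float:
    start, start_value, end, end_value = segment
    return (end_value - start_value) / (end - start)


def trend_events(segments: Sequence[Segment]) -> List[int]:
    """Assumed event rule: report the index at which the trend changes
    direction, i.e. consecutive segment slopes have opposite signs."""
    events = []
    for previous, current in zip(segments, segments[1:]):
        if slope(previous) * slope(current) < 0:
            events.append(current[0])
    return events


stream = [0, 1, 2, 3, 2, 1, 0]
segments = segment_stream(stream, tolerance=0.5)
print(segments)
print(trend_events(segments))
```

In a monitoring setting along the lines the abstract describes, an index returned by a function like `trend_events` would be the moment at which a cluster adjustment is considered, rather than re-clustering every stream at a fixed period.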

Keywords

Data Stream, Streaming Data, Piecewise Linear Approximation, Stream Cluster, Multiple Data Stream
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mi-Yen Yeh
    • 1
  • Bi-Ru Dai
    • 1
  • Ming-Syan Chen
    • 1
  1. Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
