Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation

  • Wei Fan
  • Yusuke Koyanagi
  • Koichi Asakura
  • Toyohide Watanabe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5465)

Abstract

A growing number of real-world applications deal with multiple evolving data streams. In this paper, we propose a framework for clustering over evolving data streams that takes advantage of recent-biased approximation. In recent-biased approximation, more detail is preserved for recent data and fewer coefficients are kept for the data stream as a whole, which greatly improves both clustering efficiency and space usage. Our framework consists of two phases. The first is an online phase, which approximates the data streams and maintains summary statistics incrementally. The second is an offline clustering phase, which can perform dynamic clustering over the data streams on all possible time horizons. As shown by complexity analyses and validated by our empirical studies, the framework performs efficiently in the data stream environment while producing clustering results of very high quality.
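The idea of recent-biased approximation can be illustrated with a minimal sketch. The abstract does not specify which approximation scheme the paper uses, so the code below is an assumption made purely for illustration: it summarizes a stream by segment means whose widths double going back in time (1, 2, 4, ...), so that recent values are kept in full detail and N points reduce to roughly log2(N) coefficients. The names `recent_biased_summary` and `summary_distance` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def recent_biased_summary(values: Sequence[float]) -> List[float]:
    """Summarize `values` (oldest first) by segment means.

    Illustrative assumption, not the paper's scheme: segment widths
    double going back in time (1, 2, 4, ...), and the oldest segment
    may be shorter. Means are returned from most recent to oldest.
    """
    means: List[float] = []
    end = len(values)
    width = 1
    while end > 0:
        start = max(0, end - width)
        segment = values[start:end]
        means.append(sum(segment) / len(segment))
        end = start
        width *= 2
    return means


def summary_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two summaries of equal length."""
    if len(a) != len(b):
        raise ValueError("summaries must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


stream = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 4.0]
print(recent_biased_summary(stream))  # [4.0, 2.0, 1.0]
```

A clustering step could then compare streams through `summary_distance` on their summaries instead of on the raw data, which is one way a compact summary might reduce both memory and comparison cost. An online version would need to update the segments incrementally as new values arrive, which this sketch does not attempt.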

Keywords

Clustering over evolving data streams; time series data; recent-biased approximation; data mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Wei Fan (1)
  • Yusuke Koyanagi (1)
  • Koichi Asakura (2)
  • Toyohide Watanabe (1)
  1. Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan
  2. School of Informatics, Daido Institute of Technology, Nagoya, Japan
