Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation
A growing number of real world applications deal with multiple evolving data streams. In this paper, a framework for clustering over evolving data streams is proposed taking advantage of recent-biased approximation. In recent-biased approximation, more details are preserved for recent data and fewer coefficients are kept for the whole data stream, which improves the efficiency of clustering and space usability greatly. Our framework consists of two phases. One is an online phase which approximates data streams and maintains the summary statistics incrementally. The other is an offline clustering phase which is able to perform dynamic clustering over data streams on all possible time horizons. As shown in complexity analyses and also validated by our empirical studies, our framework performed efficiently in the data stream environment while producing clustering results of very high quality.
KeywordsClustering over evolving data streams time series data recent-biased approximation data mining
Unable to display preview. Download preview PDF.