Clustering Data Streams by On-Line Proximity Updating
In this paper, we introduce a new clustering strategy for temporally ordered data streams that discovers groups of homogeneous streams in a single pass over the data. It is a two-step approach: an on-line algorithm computes statistics about the dissimilarities among the data, and an off-line algorithm then computes the final partition of the streams. The effectiveness of the proposal is evaluated through tests on real data.
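The two-step idea can be illustrated with a minimal sketch. The abstract does not specify the dissimilarity measure, the statistics kept on-line, or the off-line algorithm, so everything below is an assumption chosen for illustration: the class `OnlineProximity`, the function `offline_partition`, the use of non-overlapping windows, the squared Euclidean distance, the running mean, and the threshold-based grouping (a simple stand-in for the off-line step, not the method of the paper).

```python
from itertools import combinations


class OnlineProximity:
    """On-line step (assumed form): running mean of pairwise squared
    Euclidean distances between streams, updated one window at a time."""

    def __init__(self, n_streams):
        self.n = n_streams
        self.windows = 0
        self.d = [[0.0] * n_streams for _ in range(n_streams)]

    def update(self, window):
        # window[i] holds the latest chunk of stream i; all chunks have
        # the same length. Each chunk is read once and then discarded.
        self.windows += 1
        for i, j in combinations(range(self.n), 2):
            dist = sum((a - b) ** 2 for a, b in zip(window[i], window[j]))
            # Incremental mean, so no past window needs to be stored.
            self.d[i][j] += (dist - self.d[i][j]) / self.windows
            self.d[j][i] = self.d[i][j]


def offline_partition(d, threshold):
    """Off-line step (stand-in): link streams whose mean dissimilarity is
    at most `threshold` and label the resulting connected components."""
    n = len(d)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and d[u][v] <= threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Hypothetical data: four streams, two windows of three observations each.
prox = OnlineProximity(4)
prox.update([[0, 0, 0], [0.1, 0.1, 0.1], [10, 10, 10], [10.1, 10.1, 10.1]])
prox.update([[1, 1, 1], [1.2, 1.2, 1.2], [11, 11, 11], [10.9, 10.9, 10.9]])
labels = offline_partition(prox.d, threshold=1.0)
print(labels)
```

In this toy run the first two streams end up in one group and the last two in another. The point of the design is that only the proximity matrix survives between windows, so the off-line step can run at any time without revisiting the raw data.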
Keywords: Data Stream, Adjust Rand Index, Spectral Cluster Algorithm, Local Partition, Data Stream Processing