Partition-Based Clustering with Sliding Windows for Data Streams

  • Jonghem Youn (Email author)
  • Jihun Choi
  • Junho Shim
  • Sang-goo Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10178)

Abstract

Data stream clustering with sliding windows generates clusters each time the window moves. Because re-clustering every changed window from scratch is highly inefficient in both memory and computation time, a clustering algorithm should consider only the tuples inserted into and deleted from the window. In this paper, we address this problem with a sliding window aggregation technique and a cluster modification strategy. We propose a novel data structure for constructing and maintaining 2-level synopses. This data structure allows synopses to be updated efficiently and supports precise sliding window operations. We also propose a modification strategy that decides whether to append new synopses to pre-existing clusters or to re-cluster all synopses, based on the difference between the probability distributions of the original and updated clusters. Experimental results show that the proposed method outperforms state-of-the-art methods.
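
The idea of processing only inserted and deleted tuples can be illustrated with a minimal sketch. It is not the paper's 2-level synopsis structure, which the abstract does not specify. It assumes a count-based window split into fixed-size panes, with each pane summarized by an additive synopsis (count, linear sum, squared sum) in the style of cluster-feature vectors. The names `Synopsis`, `PaneWindow`, `pane_size`, and `panes_per_window` are hypothetical and chosen only for this example.

```python
from collections import deque


class Synopsis:
    """Additive summary of d-dimensional tuples: count, linear sum, squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # per-dimension sum of values
        self.ss = [0.0] * dim   # per-dimension sum of squared values

    def add_point(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other, sign=1):
        # sign=+1 absorbs another synopsis; sign=-1 retracts it.
        self.n += sign * other.n
        for i in range(len(self.ls)):
            self.ls[i] += sign * other.ls[i]
            self.ss[i] += sign * other.ss[i]

    def mean(self):
        return [s / self.n for s in self.ls]


class PaneWindow:
    """Count-based sliding window made of fixed-size panes (an assumption)."""

    def __init__(self, dim, pane_size, panes_per_window):
        self.dim = dim
        self.pane_size = pane_size
        self.panes_per_window = panes_per_window
        self.panes = deque()           # closed panes, oldest first
        self.current = Synopsis(dim)   # pane still being filled
        self.total = Synopsis(dim)     # summary of all closed panes in the window

    def insert(self, x):
        self.current.add_point(x)
        if self.current.n == self.pane_size:
            # Close the pane and fold it into the window summary.
            self.panes.append(self.current)
            self.total.merge(self.current, +1)
            self.current = Synopsis(self.dim)
            if len(self.panes) > self.panes_per_window:
                # Expire the oldest pane by subtracting its synopsis,
                # without touching the tuples it summarized.
                expired = self.panes.popleft()
                self.total.merge(expired, -1)


window = PaneWindow(dim=1, pane_size=2, panes_per_window=2)
for value in [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]:
    window.insert([value])
print(window.total.n, window.total.mean())
```

Because the synopsis is additive, sliding the window costs one merge and one retraction per pane rather than a pass over every tuple in the window. A clustering step would then operate on the pane synopses instead of the raw tuples.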

Keywords

Data streams, Clustering, Sliding windows, Approximation algorithms

Notes

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-R0992-16-1023) supervised by the IITP (Institute for Information & communications Technology Promotion).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jonghem Youn (1), Email author
  • Jihun Choi (1)
  • Junho Shim (2)
  • Sang-goo Lee (1)

  1. Seoul National University, Seoul, Republic of Korea
  2. Sookmyung Women's University, Seoul, Republic of Korea
