DMM-Stream: A Density Mini-Micro Clustering Algorithm for Evolving Data Streams

  • Amineh Amini
  • Hadi Saboohi
  • Teh Ying Wah
  • Tutut Herawan
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 285)

Abstract

Clustering real-time stream data is an important and challenging problem. Existing algorithms do not consider the distribution of data inside a micro cluster, in particular when data points are non-uniformly distributed inside it. In this situation, a large micro-cluster radius has to be used, which leads to lower clustering quality. In this paper, we present DMM-Stream, a density-based clustering algorithm for evolving data streams. It is an online-offline algorithm that considers the distribution of data inside each micro cluster. DMM-Stream introduces the mini-micro cluster, which keeps summary information about the data points inside a micro cluster. Based on the distribution of the dense areas inside the micro cluster, at least one representative point, either the micro cluster itself or the centers of its mini-micro clusters, is sent to the offline phase. By choosing proper mini-micro and micro centers, we increase cluster quality while maintaining the time complexity. A pruning strategy is also used to separate real data from noise by distinguishing dense and sparse mini-micro and micro clusters. Our performance study over real and synthetic data sets demonstrates the effectiveness of our method.
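
The idea of sending representative points to the offline phase can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the fixed `mini_radius`, the `min_weight` density threshold, and the rule "use the mini-micro centers only when more than one of them is dense" are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import dist


class Summary:
    """Count and linear sum of absorbed points (hypothetical summary structure)."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim

    def add(self, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def center(self):
        return tuple(s / self.n for s in self.linear_sum)


class MicroCluster(Summary):
    """Micro cluster that also tracks mini-micro summaries of its dense areas."""

    def __init__(self, dim, mini_radius):
        super().__init__(dim)
        self.mini_radius = mini_radius  # assumed fixed radius of a mini-micro cluster
        self.minis = []

    def add(self, point):
        super().add(point)
        # Absorb the point into the nearest mini-micro cluster if it is close enough.
        if self.minis:
            nearest = min(self.minis, key=lambda m: dist(m.center(), point))
            if dist(nearest.center(), point) <= self.mini_radius:
                nearest.add(point)
                return
        # Otherwise start a new mini-micro cluster at this point.
        mini = Summary(len(point))
        mini.add(point)
        self.minis.append(mini)

    def representatives(self, min_weight):
        """Return the points that would be passed to the offline phase.

        Assumed rule: if several mini-micro clusters are dense, send their
        centers; otherwise the micro cluster center alone represents it.
        """
        dense = [m.center() for m in self.minis if m.n >= min_weight]
        return dense if len(dense) > 1 else [self.center()]


# Usage: two separated dense areas inside a single micro cluster.
mc = MicroCluster(dim=2, mini_radius=1.0)
for p in [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]:
    mc.add(p)
reps = mc.representatives(min_weight=3)
```

With non-uniform data such as the two areas above, the sketch returns one representative per dense area instead of a single center that lies in the empty space between them, which is the situation the abstract describes as requiring a large micro-cluster radius.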

Keywords

Density-based clustering; Micro cluster; Mini-micro cluster

Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  • Amineh Amini (1)
  • Hadi Saboohi (1)
  • Teh Ying Wah (1)
  • Tutut Herawan (1)

  1. Faculty of Computer Science and Information Technology, University of Malaya (UM), Kuala Lumpur, Malaysia