A Comparative Study of Density-based Clustering Algorithms on Data Streams: Micro-clustering Approaches

  • Amineh Amini
  • Teh Ying Wah
Chapter
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 110)

Abstract

Clustering is a challenging problem in data stream mining. A clustering algorithm must read a data stream in a single pass, with limited time and memory, while the stream itself may change over time. Various clustering algorithms have been developed for data streams. Density-based algorithms form a notable group: they can find clusters of arbitrary shape and handle outliers. In recent years, density-based clustering algorithms have been adapted for data streams. However, when clustering data streams, it is impossible to store the entire stream. Micro-clustering is a summarization method used to record synopsis information about data streams. Various algorithms apply micro-clustering methods for clustering data streams. In this paper, we concentrate on density-based clustering algorithms that use micro-clustering, which we refer to as density-micro clustering algorithms. We review the algorithms in detail and compare them based on different characteristics.
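
To make the idea of a micro-cluster as a stream synopsis concrete, the following is a minimal sketch, not the method of any specific algorithm reviewed in the chapter. It assumes the common cluster-feature summary (point count, per-dimension linear sum, per-dimension squared sum), which can be updated in one pass without storing the points. The class name `MicroCluster` and its methods are hypothetical, and the fading weights used by some density-based stream algorithms are deliberately left out.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Additive summary of absorbed points (hypothetical illustration)."""
    dim: int
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)
    squared_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with zeroed sums when none are supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.squared_sum:
            self.squared_sum = [0.0] * self.dim

    def insert(self, point: Sequence[float]) -> None:
        # Single-pass update: the point itself is not kept afterwards.
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x

    def center(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root-mean-square deviation of the absorbed points from the center.
        variance = sum(
            ss / self.n - (ls / self.n) ** 2
            for ls, ss in zip(self.linear_sum, self.squared_sum)
        )
        return math.sqrt(max(variance, 0.0))


mc = MicroCluster(dim=2)
for p in [(0.0, 0.0), (2.0, 0.0)]:
    mc.insert(p)
print(mc.n, mc.center(), mc.radius())  # prints: 2 [1.0, 0.0] 1.0
```

Because all three components are sums, memory use depends only on the dimensionality, not on how many points have arrived.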

Keywords

Data streams, Density-based clustering, Micro-cluster

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Information System, Faculty of Computer Science and Information Technology, University of Malaya (UM), Kuala Lumpur, Malaysia
