Skip to main content
Log in

A survey of density based clustering algorithms

  • Review Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Density based clustering algorithms (DBCLAs) rely on the notion of density to identify clusters of arbitrary shapes, sizes with varying densities. Existing surveys on DBCLAs cover only a selected set of algorithms. These surveys fail to provide an extensive information about a variety of DBCLAs proposed till date including a taxonomy of the algorithms. In this paper we present a comprehensive survey of various DBCLAs over last two decades along with their classification. We group the DBCLAs in each of the four categories: density definition, parameter sensitivity, execution mode and nature of data and further divide them into various classes under each of these categories. In addition, we compare the DBCLAs through their common features and variations in citation and conceptual dependencies. We identify various application areas of DBCLAs in domains such as astronomy, earth sciences, molecular biology, geography, multimedia. Our survey also identifies probable future directions of DBCLAs where involvement of density based methods may lead to favorable results.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Jain A K, Murty M N, Flynn P J. Data clustering: a review. ACM Computing Surveys, 1999, 31(3): 264–323

    Google Scholar 

  2. Yan Z, Luo W, Bu C, Ni L. Clustering spatial data by the neighbors intersection and the density difference. In: Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies (BDCAT). 2016, 217–226

  3. Xu R, Wunsch D. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 2005, 16(3): 645–678

    Google Scholar 

  4. Ertöz L, Steinbach M, Kumar V. Finding clusters of different sizes, shapes and densities in noisy, high dimensional data. In: Proceedings of the 2003 SIAM International Conference on Data Mining. 2003, 47–58

  5. Karypis G, Han E H, Kumar V. Chameleon: hierarchical clustering using dynamic modeling. Computer, 1999, 32(8): 68–75

    Google Scholar 

  6. Ester M, Kriegel H P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceeedings of International Conference on KDD. 1996, 226–231

  7. Keogh E, Mueen A. Encyclopedia of Machine Learning and Data Mining. Curse of Dimensionality. 2nd ed. Springer, Boston, MA, 2017, 314–315

    Google Scholar 

  8. Raymond T N, Han J. Clarans: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 2002, 14(5): 1003–1016

    Google Scholar 

  9. Hartigan J A, Wong M A. Algorithm as 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 1979, 28(1): 100–108

    MATH  Google Scholar 

  10. Silverman B W. Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC press, 1998

  11. Aggarwal C, Reddy C K. Data Clustering: Algorithms and Applications. CRC press, 2013

  12. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of data. 1998, 94–105

  13. Parimala M, Lopez D, Senthilkumar N C. A survey on density based clustering algorithms for mining large spatial databases. International Journal of Advanced Science and Technology, 2011, 31(1): 59–66

    Google Scholar 

  14. Ankerst M, Breunig M M, Kriegel H P, Sander J. Optics: ordering points to identify the clustering structure. ACM Sigmod Record, 1999, 28: 49–60

    Google Scholar 

  15. Ram A, Jalal S, Jalal A S, Kumar M. A density based algorithm for discovering density varied clusters in large spatial databases. International Journal of Computer Applications, 2010, 3(6): 1–4

    Google Scholar 

  16. Birant D, Kut A. ST-DBSCAN: an algorithm for clustering spatial-temporal data. Data & Knowledge Engineering, 2007, 60(1): 208–221

    Google Scholar 

  17. Bhuyan R, Borah S. A survey of some density based clustering techniques. In: Proceedings of the 2013 Conference on Advancements in Information, Computer and Communication. 2013

  18. Chaudhary S, Chauhan R, Batra P. A survey of density based clustering algorithms. International Journal of Computer Science and Technology, 2014, 5(2): 169–171

    Google Scholar 

  19. Hinneburg A, Keim D A. An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining. 1998, 58–65

  20. Nagpal P B, Mann P A. Comparative study of density based clustering algorithms. International Journal of Computer Applications, 2011, 27(11): 421–435

    Google Scholar 

  21. Xu X, Ester M, Kriegel H P, Sander J. A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of the 14th IEEE International Conference on Data Engineering. 1998, 324–331

  22. Thiagarasu V, Prabahari R. A comparative analysis of density based clustering techniques for outlier mining. International Journal of Engineering Sciences and Research Technology, 2014, 3(11): 132–136

    Google Scholar 

  23. Rajavat A, Dubey P. Comparative study between density based clustering — DBSCAN and OPTICS. International Journal of Advance Computational Engineering and Networking, 2016, 4(12): 34–37

    Google Scholar 

  24. Smiti A, Eloudi Z. Soft DBSCAN: improving DBSCAN clustering method using fuzzy set theory. In: Proceedings of the 6th International Conference on Human System Interactions. 2013, 380–385

  25. Loh W K, Park Y H. A survey on density-based clustering algorithms. In: Jeong Y S, Park Y H, Hsu C H, Park J, eds. Ubiquitous Information Technologies and Applications. Springer, Berlin, Heidelberg, 2014, 775–780

    Google Scholar 

  26. Xu X, Jäger J, Kriegel H P. A fast parallel clustering algorithm for large spatial databases. In: Guo Y, Grossman R, eds. High Performance Data Mining. Springer, Boston, MA, 1999, 263–290

    Google Scholar 

  27. Böhm C, Noll R, Plant C, Wackersreuther B. Density-based clustering using graphics processors. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 661–670

  28. Loh W K, Moon Y S, Park Y H. Erratum: fast density-based clustering using graphics processing units. IEICE TRANSACTIONS on Information and Systems, 2014, 97(7): 1947–1951

    Google Scholar 

  29. Ting K M, Zhu Y, Carman M J, Zhu Y, Zhou Z H. Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 1205–1214.

  30. Sander J, Ester M, Kriegel H P, Xu X. Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery, 1998, 2(2): 169–194

    Google Scholar 

  31. Ester M, Kriegel H P, Sander J, Wimmer M, Xu X. Incremental clustering for mining in a data warehousing environment. In: Proceedings of the 24th International Conference on Very Large Data Bases. 1998, 323–333

  32. Zhou S, Zhou A, Cao J, Wen J, Fan Y, Hu Y. Combining sampling technique with DBSCAN algorithm for clustering large spatial databases. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2000, 169–172

  33. Jarvis R A, Patrick E A. Clustering using a similarity measure based on shared near neighbors. IEEE Transactions on Computers, 1973, 100(11): 1025–1034

    Google Scholar 

  34. Januzaj E, Kriegel H P, Pfeifle M. DBDC: density based distributed clustering. Advances in Database Technology-EDBT 2004, 2004, 529–530

    Google Scholar 

  35. Borah B, Bhattacharyya D K. An improved sampling-based DBSCAN for large spatial databases. In: Proceedings of IEEE International Conference on Intelligent Sensing and Information Processing. 2004, 92–96

  36. Tsai CF, Liu CW. KIDBSCAN: anew efficient data clustering algorithm. In: Proceedings of International Conference on Artifical Intelligence and Soft Computing. 2006, 702–711

  37. Kisilevich S, Mansmann F, Keim D. P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos. In: Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research and Application. 2010

  38. An N, Khac L, Kechadi M T. On a distributed approach for density-based clustering. In: Proceedings of the 10th International Conference on Machine Learning and Applications and Workshops. 2011, 283–286

  39. He Y, Tan H, Luo W, Mao H, Ma D, Feng S, Fan J. MR-DBSCAN: an efficient parallel density-based clustering algorithm using MapReduce. In: Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems. 2011, 473–480

  40. Campello R J, Moulavi D, Sander J. Density-based clustering based on hierarchical density estimates. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2013, 160–172

  41. Yu Y, Zhao J, Wang X, Wang Q, Zhang Y. Cludoop: an efficient distributed density-based clustering for big data using hadoop. International Journal of Distributed Sensor Networks, 2015, 11(6): 379–391

    Google Scholar 

  42. Hou J, Gao H, Li X. DSets-DBSCAN: a parameter-free clustering algorithm. IEEE Transactions on Image Processing, 2016, 25(7): 3182–3193

    MathSciNet  MATH  Google Scholar 

  43. Pavan M, Pelillo M. Dominant sets and pairwise clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 29(1): 167–172

    Google Scholar 

  44. Gan J, Tao Y. Dynamic density based clustering. In: Proceedings of the 2017 ACM International Conference on Management of Data. 2017, 1493–1507

  45. Wang W, Yang J, Muntz R. Sting: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd International Conference on Very Large Data Bases. 1997, 186–195

  46. Davis L. Bit-climbing, representational bias, and test suite design. In: Proceedings of the 4th International Conference on Genetic Algorithms. 1991, 18–23

  47. Hinneburg A, Keim D A. Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering. In: Proceedings of the 25th International Conference on Very Large Data Bases. 1999

  48. Zhang T, Ramakrishnan R, Livny M. BIRCH: a new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1997, 1(2): 141–182

    Google Scholar 

  49. Sheikholeslami G, Chatterjee S, Zhang A. WaveCluster: a multi-resolution clustering approach for very large spatial databases. In: Proceedings of the 24th International Conference on Very Large Data Bases. 1998, 428–439

  50. Aggarwal C C. A human-computer interactive method for projected clustering. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(4): 448–460

    Google Scholar 

  51. Chen Y, Tu L. Density-based clustering for real-time stream data. In: Proceedings of the 13thACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2007, 133–142

  52. Aggarwal C C. Outlier Analysis. In: Data Mining. Springer, 2015, 237–263

  53. Kriegel H P, Pfeifle M. Hierarchical density-based clustering of uncertain data. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005

  54. Kriegel H P, Pfeifle M. Density-based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 672–677

  55. Stonebraker M, Frew J, Dozier J. The SEQUOIA 2000 Project. In: Proceedings of International Symposium on Spatial Databases. 1993, 397–412

  56. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. ACM Sigmod Record, 1996, 25(2): 103–114

    Google Scholar 

  57. Zheng Y, Xie X, Ma W Y. Geolife: a collaborative social networking service among user, location and trajectory. IEEE Data Engineering Bulletin, 2010, 33(2): 32–39

    Google Scholar 

  58. Yuan J, Zheng Y, Xie X, Sun G. T-drive: enhancing driving directions with taxi drivers’ intelligence. IEEE Transactions on Knowledge and Data Engineering, 2011, 25(1): 220–232, 2011

    Google Scholar 

  59. Yeung K Y, Fraley C, Murua A, Raftery A E, Ruzzo W L. Model-based clustering and data transformations for gene expression data. Bioinformatics, 2001, 17(10): 977–987

    Google Scholar 

  60. Bishop C M. Pattern Recognition and Machine Learning. Springer, 2006

  61. Lu C D, Zhang T Y, Du X Z, Li C P. A robust kernel PCA algorithm. In: Proceedings of 2004 International Conference on Machine Learning and Cybernetics. 2004, 3084–3087

  62. Balakrishnama S, Ganapathiraju A. Linear discriminant analysis-a brief tutorial. Institute for Signal and Information Processing, 1998, 18: 1–8

    Google Scholar 

  63. Baudat G, Anouar F. Generalized discriminant analysis using a kernel approach. Neural Computation, 2000, 12(10): 2385–2404

    Google Scholar 

  64. Epanechnikov V A. Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications, 1969, 14(1): 153–158

    MathSciNet  Google Scholar 

  65. Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. ACM Sigmod Record, 1998, 27(2): 73–84

    MATH  Google Scholar 

  66. Tavallaee M, Bagheri E, Lu W, Ghorbani A A. A detailed analysis of the KDD cup 99 data set. In: Proceedings of IEEE Symposium on Computational Intelligence for Security and Defense Applications. 2009, 1–6

  67. Bader G D, Hogue C W V. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 2003, 4(1): 2

    Google Scholar 

  68. Connolly M L. Measurement of protein surface shape by solid angles. Journal of Molecular Graphics, 1986, 4(1): 3–6

    Google Scholar 

  69. Becker R H, White R L, Helfand D J. The first survey: faint images of the radio sky at twenty centimeters. The Astrophysical Journal, 1995, 450: 559

    Google Scholar 

  70. Zepka A F, Cordes J M, Wasserman I. Signal detection amid noise with known statistics. The Astrophysical Journal, 1994, 427: 438–445

    Google Scholar 

  71. Weir N, Fayyad U M, Djorgovski S. Automated star/galaxy classification for digitized POSS-II. The Astronomical Journal, 1995, 109: 2401

    Google Scholar 

  72. Chrisman N R, Dougenik J A, White D. Lessons for the design of polygon overlay processing from the odyssey whirlpool algorithm. In: Proceedings of the 5th International Symposium on Spatial Data Handling. 1992, 401–410

  73. Du Q, Dong Z, Huang C, Ren F. Density-based clustering withgeographical background constraints using a semantic expression model. ISPRS International Journal of Geo-Information, 2016, 5(5): 72

    Google Scholar 

  74. Hendler J. Data integration for heterogenous datasets. Big Data, 2014, 2(4): 205–215

    Google Scholar 

  75. Allen T E, Herrgård M J, Liu M, Qiu Y, Glasner J, Blattner F R, Palsson B. Genome-scale analysis of the uses of the escherichia coli genome: model-driven analysis of heterogeneous data sets. Journal of Bacteriology, 2003, 185(21): 6392–6399

    Google Scholar 

  76. Pavlidis P, Weston J, Cai J, Grundy W N. Gene functional classification from heterogeneous data. In: Proceedings of the 5thAnnual International Conference on Computational Biology, 2001, 249–255

  77. Angelier J. Tectonic analysis of fault slip data sets. Journal of Geophysical Research: Solid Earth, 1984, 89(B7): 5835–5848

    Google Scholar 

  78. Kowalski M, Rubin D, Aldering G, Agostinho R J, Amadon A, Amanullah R, Balland C, Barbary K, Blanc G, Challis P J, et al. Improved cosmological constraints from new, old, and combined supernova data sets. The Astrophysical Journal, 2008, 686(2): 749

    Google Scholar 

  79. Nguyen M D, Shin W Y. DBSTexC: density-based spatio-textual clustering on twitter. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2017, 23–26

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Panthadeep Bhattacharjee.

Additional information

Panthadeep Bhattacharjee is currently a PhD research scholar in the Department of Computer Science & Engineering, Indian Institute of Technology (IIT) Guwahati, India. He obtained his BTech in Information Technology from Assam University Silchar, India in 2010 and MTech in Computer Science & Engineering from National Institute of Technology Durgapur (NIT), India in 2012. He worked as assistant professor in the School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, India from June 2012 to December 2013. He joined IIT Guwahati in January 2014. His research interest includes pattern recognition, machine learning, data science, online algorithms. He has been a winner in the Artificial Intelligence-Design Challenge Championship held at International Institute of Information Technology (HIT) — Bangalore in 2018.

Pinaki Mitra is currently an associate professor at the Department of Computer Science & Engineering, Indian Institute of Technology (IIT) Guwahati, India. He obtained his BTech in Computer Science & Engineering from Jadavpur University, Kolkata, India in 1987 and MTech in Computer Science & Engineering from Indian Institute of Science (IISc) Bangalore, India in 1989. He received his PhD from Simon Fraser University, Canada in 1994. He worked as a Research Scientist at Center for Microprocessor Training Education and Research, Jadavpur University. Subsequently he joined National Institute of Management, Kolkata and served as an assistant professor. He joined IIT Guwahati in December 2004. His research interest includes cryptography, network security, computer graphics, multimedia, pattern recognition and machine learning.

Electronic Supplementary Material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bhattacharjee, P., Mitra, P. A survey of density based clustering algorithms. Front. Comput. Sci. 15, 151308 (2021). https://doi.org/10.1007/s11704-019-9059-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-019-9059-3

Keywords

Navigation