Skip to main content

An Efficient Density-Based Clustering Algorithm Using Reverse Nearest Neighbour

  • Conference paper
  • First Online:
Intelligent Computing (CompCom 2019)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 998))

Included in the following conference series:

Abstract

Density-based clustering is the task of discovering high-density regions of entities (clusters) that are separated from each other by contiguous regions of low-density. DBSCAN is, arguably, the most popular density-based clustering algorithm. However, its cluster recovery capabilities depend on the combination of the two parameters. In this paper we present a new density-based clustering algorithm which uses reverse nearest neighbour (RNN) and has a single parameter. We also show that it is possible to estimate a good value for this parameter using a clustering validity index. The RNN queries enable our algorithm to estimate densities taking more than a single entity into account, and to recover clusters that are not well-separated or have different densities. Our experiments on synthetic and real-world data sets show our proposed algorithm outperforms DBSCAN and its recent variant ISDBSCAN.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)

    Article  Google Scholar 

  2. Hou, J., Gao, H., Li, X.: DSets-DBSCAN: a parameter-free clustering algorithm. IEEE Trans. Image Process. 25(7), 3182–3193 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  3. Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010). Award winning papers from the 19th International Conference on Pattern Recognition (ICPR)

    Article  Google Scholar 

  4. Mirkin, B.: Clustering: A Data Recovery Approach. CRC Press, Boca Raton (2012)

    MATH  Google Scholar 

  5. Hennig, C.: What are the true clusters? Pattern Recogn. Lett. 64, 53–62 (2015)

    Article  MATH  Google Scholar 

  6. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD 1996, pp. 226–231. AAAI Press (1996)

    Google Scholar 

  7. Ankerst, M., Breunig, M.M., Kriegel, H.-P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: ACM Sigmod Record, vol. 28, pp. 49–60. ACM (1999)

    Article  Google Scholar 

  8. Hinneburg, A., Keim, D.A., et al.: An efficient approach to clustering in large multimedia databases with noise. In: KDD, vol. 98, pp. 58–65 (1998)

    Google Scholar 

  9. Jin, W., Tung, A.K.H., Han, J., Wang, W.: Ranking outliers using symmetric neighborhood relationship. In: PAKDD, vol. 6, pp. 577–593. Springer (2006)

    Google Scholar 

  10. Cassisi, C., Ferro, A., Giugno, R., Pigola, G., Pulvirenti, A.: Enhancing density-based clustering: parameter reduction and outlier detection. Inf. Syst. 38(3), 317–330 (2013)

    Article  Google Scholar 

  11. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)

    Google Scholar 

  12. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992)

    MathSciNet  Google Scholar 

  13. Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. SIGMOD Rec. 29(2), 201–212 (2000)

    Article  Google Scholar 

  14. Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study of cluster validity indices. Pattern Recogn. 46(1), 243–256 (2013)

    Article  Google Scholar 

  15. Moulavi, D., Jaskowiak, P.A., Campello, R.J.G.B., Zimek, A., Sander, J.: Density-based clustering validation. In: Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, Pennsylvania, USA, 24–26 April 2014, pp. 839–847 (2014)

    Google Scholar 

  16. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)

    Article  MATH  Google Scholar 

  17. Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3(1), 1–27 (1974)

    Article  MathSciNet  MATH  Google Scholar 

  18. Dunn, J.C.: Well-separated clusters and optimal fuzzy partitions. J. Cybern. 4(1), 95–104 (1974)

    Article  MathSciNet  MATH  Google Scholar 

  19. Maulik, U., Bandyopadhyay, S.: Performance evaluation of some clustering algorithms and validity indices. IEEE Trans. Pattern Anal. Mach. Intell. 24(12), 1650–1654 (2002)

    Article  Google Scholar 

  20. Halkidi, M., Vazirgiannis, M.: A density-based cluster validity approach using multi-representatives. Pattern Recogn. Lett. 29(6), 773–786 (2008)

    Article  Google Scholar 

  21. Bache, K., Lichman, M.: UCI machine learning repository (2013)

    Google Scholar 

  22. Limin, F., Medico, E.: Flame, a novel fuzzy clustering method for the analysis of dna microarray data. BMC Bioinformatics 8(1), 3 (2007)

    Article  Google Scholar 

  23. Jain, A.K., Law, M.H.C.: Data clustering: a user’s dilemma. PReMI 3776, 1–10 (2005)

    Google Scholar 

  24. Veenman, C.J., Reinders, M.J.T., Backer, E.: A maximum variance cluster algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 24(9), 1273–1280 (2002)

    Article  Google Scholar 

  25. Chang, H., Yeung, D.-Y.: Robust path-based spectral clustering. Pattern Recogn. 41(1), 191–203 (2008)

    Article  MATH  Google Scholar 

  26. Zahn, C.T.: Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. 100(1), 68–86 (1971)

    Article  MATH  Google Scholar 

  27. Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. ACM Trans. Knowl. Discov. Data (TKDD) 1(1), 4 (2007)

    Article  Google Scholar 

  28. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7(2), 179–188 (1936)

    Article  Google Scholar 

  29. Tan, M., Eshelman, L.: Using weighted networks to represent classification knowledge in noisy domains. In: Proceedings of the Fifth International Conference on Machine Learning, pp. 121–134 (1988)

    Chapter  Google Scholar 

  30. Fisher, D.H., Schlimmer, J.C.: Concept simplification and prediction accuracy. In: Proceedings of the Fifth International Conference on Machine Learning, pp. 22–28 (2014)

    Chapter  Google Scholar 

  31. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Stiphen Chowdhury .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Chowdhury, S., de Amorim, R.C. (2019). An Efficient Density-Based Clustering Algorithm Using Reverse Nearest Neighbour. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 998. Springer, Cham. https://doi.org/10.1007/978-3-030-22868-2_3

Download citation

Publish with us

Policies and ethics