
An Efficient Density-Based Clustering Algorithm Using Reverse Nearest Neighbour

  • Stiphen Chowdhury
  • Renato Cordeiro de Amorim
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 998)

Abstract

Density-based clustering is the task of discovering high-density regions of entities (clusters) that are separated from each other by contiguous regions of low density. DBSCAN is, arguably, the most popular density-based clustering algorithm. However, its cluster recovery capabilities depend on the combination of its two parameters (the neighbourhood radius Eps and the minimum number of points MinPts). In this paper we present a new density-based clustering algorithm which uses reverse nearest neighbour (RNN) queries and has a single parameter. We also show that it is possible to estimate a good value for this parameter using a clustering validity index. The RNN queries enable our algorithm to estimate densities taking more than a single entity into account, and to recover clusters that are not well separated or have different densities. Our experiments on synthetic and real-world data sets show that our proposed algorithm outperforms DBSCAN and its recent variant ISDBSCAN.
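
To make the idea of a reverse nearest neighbour query concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes Euclidean distance and a brute-force distance matrix, and the function name `reverse_knn_counts` and the parameter `k` are illustrative choices. For each entity it counts how many other entities include it among their k nearest neighbours; a high count suggests the entity lies in a dense region, since several entities contribute to the estimate.

```python
import numpy as np


def reverse_knn_counts(points, k):
    """Count, for each point, how many other points have it among
    their k nearest neighbours (a brute-force reverse k-NN count).

    Illustrative sketch only: Euclidean distance, O(n^2) memory.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # A point is never counted as its own neighbour.
    np.fill_diagonal(dists, np.inf)

    # Row i holds the indices of the k nearest neighbours of point i.
    knn = np.argsort(dists, axis=1)[:, :k]

    # Point j's reverse-neighbour count is the number of rows containing j.
    return np.bincount(knn.ravel(), minlength=n)


# Three nearby points and one isolated point.
data = [[0, 0], [0, 1], [1, 0], [10, 10]]
counts = reverse_knn_counts(data, k=2)
```

In this toy data set the isolated point appears in nobody's neighbour list, so its count is zero, while the three nearby points share all the counts between them. The counts always sum to n times k, because every point contributes exactly k neighbour entries.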

Keywords

Density-based clustering, Reverse nearest neighbour, Nearest neighbour, Influence space

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Stiphen Chowdhury (1)
  • Renato Cordeiro de Amorim (2)
  1. School of Computer Science, University of Hertfordshire, Hatfield, UK
  2. School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
