An Efficient Density-Based Clustering Algorithm Using Reverse Nearest Neighbour

Chowdhury, Stiphen; de Amorim, Renato Cordeiro

doi:10.1007/978-3-030-22868-2_3

Stiphen Chowdhury &
Renato Cordeiro de Amorim

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 998)

Included in the following conference series:

Intelligent Computing - Proceedings of the Computing Conference

Abstract

Density-based clustering is the task of discovering high-density regions of entities (clusters) that are separated from each other by contiguous regions of low density. DBSCAN is arguably the most popular density-based clustering algorithm. However, its cluster recovery capabilities depend on the combination of its two parameters. In this paper we present a new density-based clustering algorithm that uses reverse nearest neighbour (RNN) queries and has a single parameter. We also show that it is possible to estimate a good value for this parameter using a clustering validity index. The RNN queries enable our algorithm to estimate densities by taking more than a single entity into account, and to recover clusters that are not well separated or that have different densities. Our experiments on synthetic and real-world data sets show that our proposed algorithm outperforms DBSCAN and its recent variant ISDBSCAN.
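
To make the idea of a reverse nearest neighbour query concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It counts, for each entity, how many other entities include it among their k nearest neighbours, which is one simple way a density estimate can take more than a single entity into account. The function name `reverse_knn_counts`, the parameter name `k`, the brute-force Euclidean distance computation, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def reverse_knn_counts(X, k):
    """Count, for each point, how many other points have it among their k nearest neighbours.

    Hypothetical helper for illustration only; brute-force O(n^2) distances.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # A point is never counted as its own neighbour.
    np.fill_diagonal(d2, np.inf)
    # Row i holds the indices of the k nearest neighbours of point i.
    knn = np.argsort(d2, axis=1)[:, :k]
    # Point j's reverse-neighbour count is how often j appears in any row.
    return np.bincount(knn.ravel(), minlength=n)


# Toy data (an assumption): four points on a unit square plus one distant point.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]])
counts_demo = reverse_knn_counts(X_demo, k=2)
print(counts_demo)
```

In this toy example no point on the square lists the distant point among its two nearest neighbours, so its reverse-neighbour count is zero, while the points on the square each receive a count of at least two. A low count of this kind is the sort of signal a density-based method could use to flag sparse regions.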

Author information

Authors and Affiliations

School of Computer Science, University of Hertfordshire, College Lane Campus, Hatfield, AL10 9AB, UK
Stiphen Chowdhury
School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK
Renato Cordeiro de Amorim

Corresponding author

Correspondence to Stiphen Chowdhury.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor

About this paper

Cite this paper

Chowdhury, S., de Amorim, R.C. (2019). An Efficient Density-Based Clustering Algorithm Using Reverse Nearest Neighbour. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 998. Springer, Cham. https://doi.org/10.1007/978-3-030-22868-2_3

DOI: https://doi.org/10.1007/978-3-030-22868-2_3
Published: 09 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22867-5
Online ISBN: 978-3-030-22868-2
eBook Packages: Intelligent Technologies and Robotics (R0)
