Abstract
The shared nearest neighbor (SNN) clustering algorithm is a robust, efficient, graph-based clustering method that can handle high-dimensional data. SNN clustering works well when the data consist of clusters of diverse shapes, densities, and sizes, but its assignment of data points lying in the boundary regions of overlapping clusters is not accurate. To overcome this problem, we present an extension of the shared nearest neighbor algorithm that uses fuzzy concepts to better handle data points lying in boundary regions, specifically for overlapping clusters. Extensive experiments were carried out to compare the proposed approach, fuzzy shared nearest neighbor clustering (FSNN), with the existing clustering methods K-means, Fuzzy C-means, Density_clust, and Shared Nearest Neighbor. The effectiveness of FSNN is evaluated on benchmark datasets. Experimental results show that FSNN can accurately cluster the data points lying in overlapping partitions and generates compact and well-separated clusters compared with state-of-the-art clustering algorithms. The results obtained using the different clustering methods are validated by standard cluster validation measures.
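The idea of combining shared-neighbor similarity with soft assignment can be illustrated with a minimal sketch. It is not the authors' FSNN formulation, which the abstract does not specify. It assumes the common definition of SNN similarity (the number of k-nearest neighbors two points have in common) and a hypothetical membership rule in which a point's degree of membership in a cluster is the share of its SNN similarity that falls on points of that cluster. The function names, the initial crisp `labels`, and the normalization are all assumptions made for illustration.

```python
import numpy as np

def knn_sets(X, k):
    """For each point, the set of indices of its k nearest neighbors (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]

def snn_similarity(X, k):
    """SNN similarity matrix: count of k-nearest neighbors shared by each pair."""
    nbrs = knn_sets(X, k)
    n = len(nbrs)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(nbrs[i] & nbrs[j])
    return S

def fuzzy_memberships(S, labels, n_clusters):
    """Hypothetical soft assignment (an assumption, not the paper's rule):
    membership of point i in cluster c is the fraction of i's total SNN
    similarity that goes to points currently labeled c."""
    n = S.shape[0]
    U = np.zeros((n, n_clusters))
    for c in range(n_clusters):
        U[:, c] = S[:, labels == c].sum(axis=1)
    totals = U.sum(axis=1, keepdims=True)
    # Points sharing no neighbors with anyone get uniform memberships.
    return np.where(totals > 0, U / np.maximum(totals, 1), 1.0 / n_clusters)

# Usage: two well-separated groups of five points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (5, 2)), rng.normal(10.0, 0.5, (5, 2))])
labels = np.array([0] * 5 + [1] * 5)     # assumed initial crisp labels
S = snn_similarity(X, k=3)
U = fuzzy_memberships(S, labels, n_clusters=2)
```

Each row of `U` sums to one. For well-separated groups the memberships are effectively crisp, while a point in an overlap region would split its membership across the clusters whose points share its neighbors.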
About this article
Cite this article
Sharma, R., Verma, K. Fuzzy Shared Nearest Neighbor Clustering. Int. J. Fuzzy Syst. 21, 2667–2678 (2019). https://doi.org/10.1007/s40815-019-00699-7
DOI: https://doi.org/10.1007/s40815-019-00699-7