International Journal of Fuzzy Systems, Volume 21, Issue 8, pp 2667–2678

Fuzzy Shared Nearest Neighbor Clustering

  • Rika Sharma
  • Kesari Verma
Abstract

The shared nearest neighbor (SNN) clustering algorithm is a robust, efficient, graph-based clustering method that can handle high-dimensional data. SNN clustering works well when the data consist of clusters of diverse shapes, densities, and sizes, but its assignment of data points lying in the boundary regions of overlapping clusters is not accurate. To overcome this problem, we present an extension of the shared nearest neighbor algorithm that uses fuzzy concepts to better handle data points lying in boundary regions, specifically for overlapping clusters. Extensive experiments were carried out to compare the proposed approach, fuzzy shared nearest neighbor clustering (FSNN), with the existing clustering methods K-means, Fuzzy C-means, Density_clust, and Shared Nearest Neighbor. The effectiveness of FSNN is evaluated on benchmark datasets. Experimental results show that FSNN can accurately cluster the data points lying in overlapping partitions and generates compact, well-separated clusters compared with state-of-the-art clustering algorithms. The results obtained using the different clustering methods are validated by standard cluster validation measures.
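As a rough illustration of the idea the abstract builds on, the sketch below computes a shared nearest neighbor similarity (the number of k-nearest neighbors two points have in common) and then derives soft cluster memberships from it. This is not the FSNN algorithm from the article, whose details are not given in the abstract. The function names `knn_indices`, `snn_similarity`, and `soft_memberships`, the use of Euclidean distance, and the membership rule (SNN similarity to each cluster, normalized per point) are all assumptions made for illustration only.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def snn_similarity(X, k):
    """SNN similarity: count of k-nearest neighbors shared by each pair."""
    n = X.shape[0]
    nn = knn_indices(X, k)
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], nn] = 1      # member[i, j] = 1 if j is in kNN(i)
    sim = member @ member.T                    # shared-neighbor counts
    np.fill_diagonal(sim, 0)
    return sim

def soft_memberships(sim, labels, n_clusters):
    """Hypothetical fuzzy step (an assumption, not the article's rule):
    membership of point i in cluster c is proportional to the total SNN
    similarity between i and the points currently labeled c."""
    onehot = np.eye(n_clusters)[labels]        # (n, n_clusters)
    weights = sim @ onehot
    totals = weights.sum(axis=1, keepdims=True)
    # Points with no shared neighbors at all fall back to uniform membership.
    out = np.full_like(weights, 1.0 / n_clusters)
    return np.divide(weights, totals, out=out, where=totals > 0)

# Two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

S = snn_similarity(X, k=3)
U = soft_memberships(S, labels, n_clusters=2)
print(S)
print(U)
```

On data with overlapping clusters, a boundary point would share neighbors with both groups, so its row in `U` would split between clusters instead of being forced into one; that is the kind of behavior a fuzzy extension of SNN is meant to capture.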

Keywords

Clustering, Shared nearest neighbor, Fuzzy shared nearest neighbor, Cluster validation

Copyright information

© Taiwan Fuzzy Systems Association 2019

Authors and Affiliations

  1. Department of Computer Applications, NIT Raipur, Raipur, India