
Weighted natural neighborhood graph: an adaptive structure for clustering and outlier detection with no neighborhood parameter

Cluster Computing

Abstract

This paper addresses the practical shortcomings of nearest-neighbor-based data mining techniques, particularly clustering and outlier detection. When a data set contains arbitrarily shaped clusters of varying density, it is difficult to choose proper parameters without a priori knowledge. To address this issue, we define a novel concept called the natural neighbor, which reflects the relationships between the elements of a data set better than the k-nearest neighbor does, and we present a graph called the weighted natural neighborhood graph for clustering and outlier detection. The whole process requires no neighborhood parameter, whatever the data set. Simulations on both synthetic and real-world data show the effectiveness of the proposed method.
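The abstract does not spell out how natural neighbors are found, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that the neighborhood size r grows one step at a time, that the search stops once every point has a reverse neighbor or the number of points without one stops changing, and that two points are natural neighbors when each lies in the other's r-nearest neighborhood. The function name `natural_neighbors` and the stopping rule are hypothetical choices made for this example, and the edge weighting of the graph is not covered.

```python
from math import dist


def natural_neighbors(points):
    """Return (r, mutual), where mutual[i] is the set of indices that are
    mutual r-nearest neighbors of point i.

    Assumption: r is grown until no point lacks a reverse neighbor, or the
    number of such points is unchanged between two consecutive rounds.
    """
    n = len(points)
    # For each point, every other point ordered by increasing distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]   # r-nearest neighbors found so far
    reverse_count = [0] * n           # how often each point was chosen
    prev_orphans = None
    r = 0
    while r < n - 1:
        # Add the next-nearest neighbor of every point.
        for i in range(n):
            j = order[i][r]
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        # Orphans are points that nobody has chosen as a neighbor yet.
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Keep only the mutual relationships.
    mutual = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, mutual


# Two tight groups of four points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
r, mutual = natural_neighbors(points)
print(r, mutual[0], mutual[8])
```

In this toy data set the distant point ends up with an empty set of natural neighbors, which is the kind of signal an outlier detector could use, while no neighborhood size was supplied by the caller.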

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61272194 and No. 61073058) and the Natural Science Foundation Project of CQ CSTC (cstc2013jcyjA40049).

Author information

Corresponding author

Correspondence to Ji Feng.

About this article

Cite this article

Zhu, Q., Feng, J. & Huang, J. Weighted natural neighborhood graph: an adaptive structure for clustering and outlier detection with no neighborhood parameter. Cluster Comput 19, 1385–1397 (2016). https://doi.org/10.1007/s10586-016-0598-1

  • DOI: https://doi.org/10.1007/s10586-016-0598-1
