Density peaks clustering using geodesic distances

Original Article

Abstract

The density peaks clustering (DPC) algorithm is a novel density-based clustering algorithm. It requires neither an iterative process nor many parameters. However, it cannot effectively group data with arbitrary shapes or multi-manifold structures. To address this drawback, we propose a new density peaks clustering method, density peaks clustering using geodesic distances (DPC-GD), which introduces geodesic distances into the original DPC method. Experiments on synthetic data sets demonstrate the capability of the proposed algorithm. On image data sets, we compare our algorithm with classical methods (the kernel k-means and spectral clustering algorithms) and with the original DPC algorithm in terms of accuracy and normalized mutual information (NMI). The experimental results show that our algorithm is feasible and effective.
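The abstract does not give the details of DPC-GD, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that geodesic distances are approximated by shortest paths over a symmetric k-nearest-neighbour graph (as in Isomap), that local density uses a Gaussian kernel with cutoff d_c, and that the number of clusters is supplied by hand. The function names geodesic_distances and density_peaks and the parameters k, d_c and n_clusters are hypothetical and chosen for illustration.

```python
import heapq
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths on a k-NN graph.

    Assumption: this Isomap-style approximation stands in for whatever
    geodesic computation the paper actually uses.
    """
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    # Symmetric k-nearest-neighbour graph with Euclidean edge weights.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        others = (j for j in range(n) if j != i)
        for j in sorted(others, key=lambda j: euclid[i][j])[:k]:
            graph[i][j] = euclid[i][j]
            graph[j][i] = euclid[i][j]
    # Dijkstra from every node; unreachable nodes keep distance inf.
    geo = []
    for src in range(n):
        dist = [math.inf] * n
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        geo.append(dist)
    return geo


def density_peaks(dist, d_c, n_clusters):
    """Density peaks clustering on a precomputed distance matrix."""
    n = len(dist)
    # Local density rho (Gaussian kernel variant; an assumption here).
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2)
               for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # that nearest higher-density point
    top = order[0]
    delta[top] = max(d for d in dist[top] if d < math.inf)
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j
    # The global density maximum is always a center; the remaining
    # centers are the points with the largest rho * delta.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [top] + rest[:n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point takes the label of its parent, densest first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Passing the matrix returned by geodesic_distances to density_peaks in place of plain Euclidean distances is what lets the density peaks procedure follow curved or ring-shaped clusters. In this sketch, points in different connected components of the neighbour graph are infinitely far apart, so the choice of k affects the result.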

Keywords

Data clustering, Density peaks clustering, Geodesic distances

Notes

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61379101 and 61672522), the National Key Basic Research Program of China (No. 2013CB329502), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  3. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
