Rotation-DPeak: Improving Density Peaks Selection for Imbalanced Data

  • Conference paper
Big Data (BigData 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1320)

Abstract

Density Peak (DPeak) is an effective clustering algorithm. It maps data of arbitrary dimension onto a 2-dimensional space, in which cluster centers and outliers are automatically distributed in the upper right and upper left corners, respectively. However, DPeak is not suitable for imbalanced data sets with large differences in density, where sparse clusters are usually not identified. Hence, an improved DPeak, namely Rotation-DPeak, is proposed to overcome this drawback according to a simple idea: the higher the density of a point p, the larger \(\delta \) it should have such that p can be picked as a density peak, where \(\delta \) is the distance from p to its nearest neighbor with higher density. Then, we use a quadratic curve to select points with the largest decision gap as density peaks, instead of choosing points with the largest \(\gamma \), where \(\gamma =\rho \times \delta \) and \(\rho \) is the density of the point. Experiments show that the proposed algorithm obtains better performance on imbalanced data sets, which suggests that it is promising.

Code is available at https://github.com/XFastDataLab/Rotation-DPeak.
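
The quantities named in the abstract can be illustrated with a minimal sketch of the baseline DPeak decision values. It is not the Rotation-DPeak selection rule: the abstract does not specify the quadratic curve or the decision gap, so those are left out. The function name `dpeak_quantities`, the cutoff distance `d_c`, and the choice of a cutoff-kernel count for the density are assumptions made for illustration only.

```python
import numpy as np

def dpeak_quantities(points, d_c):
    """Return (rho, delta, gamma) per point for baseline DPeak.

    Assumption: rho is a cutoff-kernel density, i.e. the number of
    other points within distance d_c. Other density estimates exist.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Count neighbours closer than d_c; subtract 1 to exclude the point itself.
    rho = (dist < d_c).sum(axis=1) - 1
    n = len(pts)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            # Distance to the nearest neighbour with strictly higher density.
            delta[i] = dist[i, higher].min()
        else:
            # Convention for the densest point(s): largest distance to any point.
            delta[i] = dist[i].max()
    # Baseline DPeak ranks candidate peaks by gamma = rho * delta.
    gamma = rho * delta
    return rho, delta, gamma

# Usage: one dense and one sparse cluster (synthetic, illustrative data).
rng = np.random.default_rng(0)
dense = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(200, 2))
sparse = rng.normal(loc=(5.0, 5.0), scale=1.0, size=(20, 2))
rho, delta, gamma = dpeak_quantities(np.vstack([dense, sparse]), d_c=0.5)
# Indices of the points sorted by decreasing gamma.
ranking = np.argsort(gamma)[::-1]
```

Plotting `delta` against `rho` gives the 2-dimensional decision graph mentioned in the abstract. Because `gamma` multiplies by `rho`, a peak in a sparse cluster can rank low even when its `delta` is large, which is the drawback the abstract attributes to DPeak on imbalanced data.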

Notes

  1. https://archive.ics.uci.edu/ml/index.php

Acknowledgment

We acknowledge financial support from the National Natural Science Foundation of China (No. 61673186, 61972010, 61975124).

Corresponding author

Correspondence to Yewang Chen.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hu, X., Yan, M., Chen, Y., Yang, L., Du, J. (2021). Rotation-DPeak: Improving Density Peaks Selection for Imbalanced Data. In: Mei, H., et al. Big Data. BigData 2020. Communications in Computer and Information Science, vol 1320. Springer, Singapore. https://doi.org/10.1007/978-981-16-0705-9_4

  • DOI: https://doi.org/10.1007/978-981-16-0705-9_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0704-2

  • Online ISBN: 978-981-16-0705-9

  • eBook Packages: Computer Science, Computer Science (R0)
