Density Ratio Peak Clustering

  • Conference paper
Web and Big Data (APWeb-WAIM 2023)

Abstract

Clustering is an important means of discovering hidden information and is widely used in economics, biomedicine, and other disciplines. Data imbalance is common in real-world datasets. In fraud detection on transaction data, for example, only a very small fraction of transactions are fraudulent. Clustering on density-imbalanced datasets therefore has practical significance. Many clustering algorithms have been proposed in recent years, but most of them cannot correctly identify low-density clusters in density-imbalanced datasets, which causes clustering to fail. To address this, we propose the density ratio peak clustering (DRPC) algorithm, which resolves two problems of the original density peak clustering (DPC) algorithm on density-imbalanced datasets: it cannot correctly identify low-density clusters, and an error in allocating one non-center point propagates along the chain of points linked to it. We conduct experiments on shape datasets, density-imbalanced datasets, and real-world UCI datasets, using normalized mutual information (NMI) as the evaluation metric and comparing DRPC with SNN-DPC, DPC-KNN, DPC, DBSCAN, and K-Means. The results show that DRPC inherits the advantages of DPC while clustering density-imbalanced datasets more accurately, and that the NMI of its clustering results improves by 1.5% on average.
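DRPC builds on the original DPC algorithm of Rodriguez and Laio, and the abstract does not give DRPC's density-ratio definition, so the sketch below illustrates only the DPC baseline and the NMI metric, not the authors' method. It computes each point's local density ρ, the distance δ to its nearest higher-density point, picks the k points with the largest γ = ρ·δ as centers, and lets every other point inherit the label of its nearest higher-density neighbor. That inheritance step is where one wrong allocation can spread along a chain of points. The Gaussian kernel, the cutoff `d_c`, the fixed `k`, the arithmetic-mean NMI normalization, the toy data, and the function names `dpc` and `nmi` are all assumptions made for illustration.

```python
import math
from collections import Counter


def dpc(points, d_c, k):
    """Baseline density peak clustering (Gaussian-kernel variant, illustrative).

    points: list of (x, y) tuples; d_c: cutoff distance; k: number of centers.
    Returns one integer label per point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: Gaussian-kernel sum over all other points.
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # Rank points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta_i: distance to the nearest point ranked above i.
    # parent[i]: that point, whose label i will inherit (None for the densest).
    delta = [0.0] * n
    parent = [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        parent[i] = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]

    # Centers: the k points with the largest gamma = rho * delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Non-center points inherit their parent's label. Visiting points in
    # decreasing density guarantees each parent is labelled first.
    for i in order:
        if labels[i] == -1:
            if parent[i] is None:  # densest point was not picked as a center
                labels[i] = labels[min(centers, key=lambda c: dist[i][c])]
            else:
                labels[i] = labels[parent[i]]
    return labels


def nmi(a, b):
    """NMI with arithmetic-mean normalization: I(A;B) / ((H(A) + H(B)) / 2)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    mi = sum(c / n * math.log(n * c / (ca[u] * cb[v]))
             for (u, v), c in cab.items())
    denom = (h_a + h_b) / 2
    return 1.0 if denom == 0 else mi / denom


# Toy density-imbalanced data: a tight 5x5 grid and a sparse 3x3 grid far away.
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(5 + 0.5 * i, 5 + 0.5 * j) for i in range(3) for j in range(3)]
points = dense + sparse
truth = [0] * len(dense) + [1] * len(sparse)

pred = dpc(points, d_c=0.5, k=2)
print(round(nmi(truth, pred), 6))  # 1.0
```

On this well-separated toy data the baseline recovers both clusters. The example is deliberately easy: it shows the mechanics of ρ, δ, γ, and label inheritance rather than the harder density-imbalanced cases the paper targets.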

Acknowledgements

This research is supported by the National Natural Science Foundation of China (62306033, 62076027).

Author information

Corresponding author

Correspondence to Shuliang Wang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, S. et al. (2024). Density Ratio Peak Clustering. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_31

  • DOI: https://doi.org/10.1007/978-981-97-2421-5_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2420-8

  • Online ISBN: 978-981-97-2421-5

  • eBook Packages: Computer Science, Computer Science (R0)