Abstract
Clustering is an important means of uncovering hidden information and is widely used in economics, biomedicine, and other disciplines. Data imbalance is common in real-world datasets. For example, when fraud detection is performed on transaction data, only a very small fraction of transactions are fraudulent. Clustering on density-imbalanced datasets therefore has practical significance. Various clustering algorithms have been proposed in recent years, but most of them cannot correctly identify low-density clusters in density-imbalanced datasets, which causes clustering to fail. To this end, we propose a density ratio peak clustering (DRPC) algorithm. It addresses two problems of the original density peak clustering (DPC) algorithm on density-imbalanced datasets: its failure to correctly identify low-density clusters, and the linked errors in allocating non-center points. We conduct experiments on shape datasets, density-imbalanced datasets, and UCI real-world datasets, using normalized mutual information (NMI) as the evaluation metric and comparing against the SNN-DPC, DPC-KNN, DPC, DBSCAN, and K-Means algorithms. Experimental results show that DRPC not only inherits the advantages of DPC but also clusters density-imbalanced datasets more accurately, and the NMI of the clustering results increases by \(1.5\%\) on average.
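As an illustration of the evaluation metric named above, the following is a minimal sketch of how NMI between a ground-truth labeling and a clustering result can be computed. The abstract does not state which normalization is used, so the arithmetic mean of the two entropies is an assumption here, and the function names `entropy`, `mutual_information`, and `nmi` are illustrative rather than taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same points."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def nmi(true_labels, pred_labels):
    """NMI normalized by the arithmetic mean of the entropies.

    Assumption: the paper may use a different normalization
    (for example the geometric mean or the maximum).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    if h_true == 0.0 and h_pred == 0.0:
        # Both labelings put every point in a single cluster.
        return 1.0
    return mutual_information(true_labels, pred_labels) / ((h_true + h_pred) / 2)


if __name__ == "__main__":
    # Same partition with cluster ids swapped: NMI ignores label names.
    print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
    # A partition that is independent of the ground truth.
    print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

NMI is invariant to how clusters are numbered, which is why it suits comparisons across algorithms whose cluster ids are arbitrary. A score of 1 means the clustering reproduces the ground-truth partition, and a score near 0 means the two partitions share almost no information.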
Acknowledgements
This research is supported by the National Natural Science Foundation of China (62306033, 62076027).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, S. et al. (2024). Density Ratio Peak Clustering. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_31
DOI: https://doi.org/10.1007/978-981-97-2421-5_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2420-8
Online ISBN: 978-981-97-2421-5
eBook Packages: Computer Science (R0)