Density Ratio Peak Clustering

Wang, Shuliang; Liu, Xiaojia; Li, Qi; Yuan, Hanning; Yuan, Ye; Feng, Ziwen; Zhang, Fan

doi:10.1007/978-981-97-2421-5_31

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14334)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

Clustering is an important means of uncovering hidden information in data, and it is widely used in economics, biomedicine, and other disciplines. Data imbalance is common in real-world datasets. For example, when fraud detection is performed on transaction data, only a very small fraction of the transactions are fraudulent. Clustering density-imbalanced datasets therefore has practical significance. Various clustering algorithms have been proposed in recent years, but most of them cannot correctly identify low-density clusters in density-imbalanced datasets, which causes clustering to fail. To this end, we propose a density ratio peak clustering (DRPC) algorithm, which addresses two problems of the original density peak clustering (DPC) algorithm on density-imbalanced datasets: it cannot correctly identify low-density clusters, and an allocation error at one non-center point propagates along the chain of subsequent assignments. We conduct experiments on shape datasets, density-imbalanced datasets, and real-world UCI datasets, using normalized mutual information (NMI) as the evaluation metric and comparing DRPC with the SNN-DPC, DPC-KNN, DPC, DBSCAN, and K-Means algorithms. Experimental results show that DRPC not only inherits the advantages of DPC but also clusters density-imbalanced datasets more accurately, improving the NMI of the clustering results by 1.5% on average.
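To make the two DPC weaknesses named above concrete, the following is a minimal pure-Python sketch of the original DPC pipeline (Rodriguez and Laio, 2014). It computes a Gaussian-kernel local density, then the distance delta from each point to its nearest higher-ranked point, then the decision value gamma = score * delta, and finally the chained assignment in which every non-center point inherits the label of its nearest higher-ranked neighbor. The `use_ratio` switch replaces the raw density with a hypothetical "density ratio": a point's density divided by the mean density of its k nearest neighbors. This is an illustrative assumption only and not the DRPC definition from the paper. The function names, `dc`, `k`, the toy data, and the arithmetic-mean NMI normalization are all assumptions.

```python
import math
from collections import Counter


def local_density(dist, dc):
    """Gaussian-kernel local density rho_i (self excluded), as in the original DPC."""
    n = len(dist)
    return [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
            for i in range(n)]


def density_ratio(dist, rho, k):
    """HYPOTHETICAL ratio: rho_i / mean rho of its k nearest neighbours.
    An illustrative assumption, not necessarily the DRPC authors' definition."""
    n = len(dist)
    ratios = []
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        mean_nb = sum(rho[j] for j in nbrs) / len(nbrs)
        ratios.append(rho[i] / max(mean_nb, 1e-12))  # guard against division by zero
    return ratios


def peak_clustering(points, n_clusters, dc, use_ratio=False, k=4):
    """DPC-style clustering; use_ratio=True swaps in the hypothetical density ratio."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    score = local_density(dist, dc)
    if use_ratio:
        score = density_ratio(dist, score, k)
    # Rank by descending score; ties broken by index so "higher-ranked" is well defined.
    order = sorted(range(n), key=lambda i: (-score[i], i))
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # DPC convention for the top-ranked point
            continue
        parent[i] = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][parent[i]]
    # Decision graph: pick the points with the largest gamma = score * delta as centres.
    gamma = [score[i] * delta[i] for i in range(n)]
    by_gamma = sorted(range(n), key=lambda i: -gamma[i])
    # The top-ranked point has no parent, so it is always kept as a centre.
    centres = [order[0]] + [i for i in by_gamma if i != order[0]][: n_clusters - 1]
    labels = [-1] * n
    for c_id, c in enumerate(centres):
        labels[c] = c_id
    # Chained assignment in rank order: each non-centre copies its parent's label,
    # so one wrong label propagates to every point that links through it.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def nmi(a, b):
    """NMI with arithmetic-mean normalisation (an assumed choice)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(n * c / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    return 1.0 if ha == hb == 0 else mi / ((ha + hb) / 2)


if __name__ == "__main__":
    # Toy data: a dense 3x3 grid near the origin and a sparse 3x3 grid near (10, 10).
    dense = [(0.1 * i, 0.1 * j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    sparse = [(10 + 0.5 * i, 10 + 0.5 * j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    truth = [0] * 9 + [1] * 9
    for flag in (False, True):
        pred = peak_clustering(dense + sparse, n_clusters=2, dc=0.5, use_ratio=flag)
        print("use_ratio =", flag, "NMI =", round(nmi(truth, pred), 3))
```

On this well-separated toy data both variants recover the two grids. The point of the ratio switch is that a local maximum in a sparse region gets a score comparable to one in a dense region, so its gamma is no longer suppressed by a small absolute density.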

Acknowledgements

This research is supported by National Natural Science Foundation of China (62306033, 62076027).

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
Shuliang Wang, Xiaojia Liu, Qi Li, Hanning Yuan, Ye Yuan & Ziwen Feng
China Mobile Information Technology Center, Beijing, 102211, China
Fan Zhang

Corresponding author

Correspondence to Shuliang Wang.

Editor information

Editors and Affiliations

Peng Cheng Laboratory, Shenzhen, China
Xiangyu Song
China University of Geosciences, Wuhan, China
Ruyi Feng
China University of Geosciences, Wuhan, China
Yunliang Chen
Deakin University, Burwood, VIC, Australia
Jianxin Li
University of Exeter, Exeter, UK
Geyong Min

About this paper

Cite this paper

Wang, S. et al. (2024). Density Ratio Peak Clustering. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_31

DOI: https://doi.org/10.1007/978-981-97-2421-5_31
Published: 12 May 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2420-8
Online ISBN: 978-981-97-2421-5
eBook Packages: Computer Science, Computer Science (R0)