Abstract
Density Peak (DPeak) is an effective clustering algorithm. It maps data of arbitrary dimension onto a 2-dimensional space, in which cluster centers and outliers automatically fall in the upper-right and upper-left corners, respectively. However, DPeak is not suitable for imbalanced data sets with large differences in density, where sparse clusters are usually not identified. Hence, an improved DPeak, namely Rotation-DPeak, is proposed to overcome this drawback, based on a simple idea: the higher the density of a point p, the larger the \(\delta \) it should have in order to be picked as a density peak, where \(\delta \) is the distance from p to its nearest neighbor with higher density. We then use a quadratic curve to select the points with the largest decision gap as density peaks, instead of choosing the points with the largest \(\gamma \), where \(\gamma =\rho \times \delta \) and \(\rho \) is the density of the point. Experiments show that the proposed algorithm obtains better performance on imbalanced data sets, which suggests that it is promising.
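The quantities named in the abstract can be illustrated with a minimal sketch of the original DPeak decision values. It does not implement the rotation or the quadratic decision curve of Rotation-DPeak, whose details are not given here. The function name `decision_values`, the cutoff distance `d_c`, the cutoff-kernel definition of \(\rho \) (number of neighbors closer than `d_c`), and the toy data are assumptions made for illustration only.

```python
import numpy as np

def decision_values(X, d_c):
    """Return (rho, delta, gamma) for each row of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed cutoff-kernel density: neighbors closer than d_c,
    # minus one so that a point does not count itself.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]  # points with strictly higher density
        if higher.any():
            # Distance to the nearest neighbor with higher density.
            delta[i] = dist[i, higher].min()
        else:
            # Convention for a point of maximal density: largest distance.
            delta[i] = dist[i].max()
    gamma = rho * delta
    return rho, delta, gamma

# Toy imbalanced data: a dense group near 0 and a sparse group near 5-7.
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [5.0], [6.0], [7.0]])
rho, delta, gamma = decision_values(X, d_c=0.25)
print(rho, delta, gamma)
```

In this toy example the sparse points have large \(\delta \) but very low \(\rho \), so ranking by \(\gamma \) alone picks only the center of the dense group. This is the kind of behavior on imbalanced data that the abstract describes as the drawback of DPeak.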
Acknowledgment
We acknowledge financial support from the National Natural Science Foundation of China (No. 61673186, 61972010, 61975124).
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hu, X., Yan, M., Chen, Y., Yang, L., Du, J. (2021). Rotation-DPeak: Improving Density Peaks Selection for Imbalanced Data. In: Mei, H., et al. Big Data. BigData 2020. Communications in Computer and Information Science, vol 1320. Springer, Singapore. https://doi.org/10.1007/978-981-16-0705-9_4
DOI: https://doi.org/10.1007/978-981-16-0705-9_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0704-2
Online ISBN: 978-981-16-0705-9
eBook Packages: Computer Science, Computer Science (R0)