CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering

Zhang, Ruilin; Zheng, Haiyang; Wang, Hongpeng

doi:10.1007/978-3-031-46677-9_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14180)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

One of the main challenges in data mining is choosing the optimal number of clusters without prior information. Existing methods usually follow the philosophy of cluster validation and therefore carry underlying assumptions about the data distribution, which prevents their application to complex real-world data such as large-scale images and high-dimensional data. To address this, we propose an approach named CNMBI. Leveraging the distribution information inherent in the data space, we cast the task as a dynamic comparison of the positional behavior of cluster centers, which requires neither complete clustering results nor a complex validity index as earlier methods do. Bipartite graph theory is then employed to model this process efficiently. Additionally, we find that different samples have different confidence levels and therefore actively remove low-confidence ones; to our knowledge, this is the first time such filtering has been considered in cluster number determination. CNMBI is robust and allows for more flexibility in the dimension and shape of the target data (e.g., CIFAR-10 and STL-10). Extensive comparisons with state-of-the-art competitors on various challenging datasets demonstrate the superiority of our method.
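The two ingredients named in the abstract, one-to-one matching of cluster centers in a bipartite graph and removal of low-confidence samples, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. The function names `match_centers` and `filter_boundary`, the Euclidean cost, the brute-force matching (a stand-in for a Hungarian solver), and the mean k-nearest-neighbor distance used as a confidence proxy are all assumptions made for illustration.

```python
import itertools
import math


def match_centers(centers_a, centers_b):
    """Min-cost one-to-one matching between two equally sized center sets.

    Hypothetical helper: brute force over permutations, which is only
    practical for a handful of centers. The cost of a matching is the sum
    of Euclidean distances between paired centers.
    """
    k = len(centers_a)
    if len(centers_b) != k:
        raise ValueError("both center sets must have the same size")
    best_cost, best_perm = math.inf, None
    for perm in itertools.permutations(range(k)):
        cost = sum(math.dist(centers_a[i], centers_b[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


def filter_boundary(points, n_neighbors=2, keep_ratio=0.8):
    """Keep the fraction of points with the smallest mean k-NN distance.

    Assumption: a large mean distance to the nearest neighbors marks a
    sparse, low-confidence (boundary) sample. Original order is preserved.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:n_neighbors]) / n_neighbors)
    n_keep = max(1, int(round(keep_ratio * len(points))))
    order = sorted(range(len(points)), key=lambda i: scores[i])
    kept = sorted(order[:n_keep])
    return [points[i] for i in kept]


if __name__ == "__main__":
    pairs, cost = match_centers([(0, 0), (10, 10)], [(10, 9), (0, 1)])
    print(pairs, cost)
    print(filter_boundary([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]))
```

In this toy setting, a small total matching cost means the two center sets occupy nearly the same positions, and the filter discards the isolated point before any centers are compared. How CNMBI actually defines confidence and turns the comparison into a cluster count is specified in the full paper, not in this sketch.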

Notes

1. Data points, objects, and samples are used interchangeably in this paper.
2. http://cs.joensuu.fi/sipu/datasets/
3. http://www.cs.toronto.edu/~kriz/cifar.html
4. https://cs.stanford.edu/~acoates/stl10/
5. http://yann.lecun.com/exdb/mnist/
6. http://www-prima.inrialpes.fr/Pointing04

Acknowledgements

This work was supported in part by the Shenzhen Fundamental Research Fund (JCYJ20210324132212030) and the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005).

Author information

Authors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Ruilin Zhang, Haiyang Zheng & Hongpeng Wang
Peng Cheng Laboratory, Shenzhen, China
Hongpeng Wang

Corresponding author

Correspondence to Hongpeng Wang.

Editor information

Editors and Affiliations

Northeastern University, Shenyang, China
Xiaochun Yang
The University of Indonesia, Depok, Indonesia
Heru Suhartanto
Beijing Institute of Technology, Beijing, China
Guoren Wang
Northeastern University, Shenyang, China
Bin Wang
University of Technology Sydney, Sydney, NSW, Australia
Jing Jiang
Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Bing Li
Sun Yat-sen University, Guangzhou, China
Huaijie Zhu
Anhui University, Hefei, China
Ningning Cui

About this paper

Cite this paper

Zhang, R., Zheng, H., Wang, H. (2023). CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol 14180. Springer, Cham. https://doi.org/10.1007/978-3-031-46677-9_19

DOI: https://doi.org/10.1007/978-3-031-46677-9_19
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46676-2
Online ISBN: 978-3-031-46677-9
eBook Packages: Computer Science, Computer Science (R0)