
CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering

  • Conference paper

Advanced Data Mining and Applications (ADMA 2023)

Abstract

One of the main challenges in data mining is choosing the optimal number of clusters without prior information. Notably, existing methods usually follow the philosophy of cluster validation and hence carry underlying assumptions about the data distribution, which hinders their application to complex real-world data such as large-scale images and high-dimensional data. To address this, we propose an approach named CNMBI. Leveraging the distribution information inherent in the data space, we map the target task to a dynamic comparison of the positional behavior of cluster centers, without relying on complete clustering results or designing a complex validity index as previous methods do. Bipartite graph theory is then employed to model this process efficiently. Additionally, we observe that different samples have different confidence levels and therefore actively remove low-confidence ones, which, to our knowledge, is the first time this has been considered in cluster number determination. CNMBI is robust and allows more flexibility in the dimension and shape of the target data (e.g., CIFAR-10 and STL-10). Extensive comparisons with state-of-the-art competitors on various challenging datasets demonstrate the superiority of our method.
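The abstract names two ingredients: removing low-confidence samples and pairing cluster centers through a bipartite graph. The paper's actual procedure is not described here, so the following is only a minimal Python sketch under stated assumptions, not CNMBI itself. It assumes, hypothetically, that a sample's confidence is the inverse of its mean distance to its k nearest neighbours, and it pairs two equal-size sets of centers with a minimum-cost perfect bipartite matching found by brute force over permutations. The names `knn_confidence`, `filter_boundary`, `match_centers`, and `max_pair_shift` and the `drop_ratio` parameter are illustrative inventions.

```python
import itertools
import math

# Hypothetical illustration only: these helpers are NOT taken from the CNMBI paper.


def knn_confidence(points, k=3):
    """Assumed confidence score: inverse of the mean distance to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(1.0 / (sum(dists[:k]) / k + 1e-12))
    return scores


def filter_boundary(points, k=3, drop_ratio=0.1):
    """Remove the drop_ratio fraction of points with the lowest assumed confidence."""
    scores = knn_confidence(points, k)
    n_drop = int(len(points) * drop_ratio)
    lowest = set(sorted(range(len(points)), key=lambda i: scores[i])[:n_drop])
    return [p for i, p in enumerate(points) if i not in lowest]


def match_centers(a, b):
    """Minimum-cost perfect bipartite matching between two equal-size center sets.

    Brute force over permutations: O(k!), so only suitable for small k.
    Returns the matched index pairs (i in a, j in b) and the total distance.
    """
    if len(a) != len(b):
        raise ValueError("center sets must have the same size")
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(b))):
        cost = sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(enumerate(best_perm)), best_cost


def max_pair_shift(a, b):
    """Largest distance between matched centers; small values mean the two sets agree."""
    pairs, _ = match_centers(a, b)
    return max(math.dist(a[i], b[j]) for i, j in pairs)


# Toy data: two tight groups plus one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
        (2.5, 9.0)]
clean = filter_boundary(data, k=2, drop_ratio=0.12)
print(len(clean))  # 8: the isolated point (2.5, 9.0) is removed

pairs, cost = match_centers([(0.0, 0.0), (5.0, 5.0)], [(5.1, 5.0), (0.0, 0.1)])
print(pairs)  # [(0, 1), (1, 0)]
print(max_pair_shift([(0.0, 0.0), (5.0, 5.0)], [(5.1, 5.0), (0.0, 0.1)]))
```

The permutation search keeps the sketch dependency-free. For more than a handful of centers, a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` (Hungarian-style) would be the natural replacement.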


Notes

  1. Data points, objects, and samples are used interchangeably in this paper.

  2. http://cs.joensuu.fi/sipu/datasets/.

  3. http://www.cs.toronto.edu/~kriz/cifar.html.

  4. https://cs.stanford.edu/~acoates/stl10/.

  5. http://yann.lecun.com/exdb/mnist/.

  6. http://www-prima.inrialpes.fr/Pointing04.


Acknowledgements

This work was supported in part by the Shenzhen Fundamental Research Fund (JCYJ20210324132212030) and the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005).

Author information

Corresponding author

Correspondence to Hongpeng Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, R., Zheng, H., Wang, H. (2023). CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol 14180. Springer, Cham. https://doi.org/10.1007/978-3-031-46677-9_19


  • DOI: https://doi.org/10.1007/978-3-031-46677-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46676-2

  • Online ISBN: 978-3-031-46677-9

  • eBook Packages: Computer Science (R0)
