Skip to main content
Log in

Multi-start local search algorithm based on a novel objective function for clustering analysis

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Clustering is the process of partitioning data into different clusters with the goal of minimizing the difference of objects within each cluster, where the commonly used evaluation function is defined as the sum of the squared distance from each point to the cluster center to which it belongs. Nevertheless, this general evaluation function is extremely vulnerable to outliers and noisy data, and it is sensitive to initial cluster centers. More seriously, this evaluation function cannot effectively represent the core of clustering results; even if the partition achieves the global optimum value according to the evaluation function, the clustering results may not be good. In this study, we propose a multi-start local search algorithm (MLS) with several techniques to tackle this problem. First, the center of each cluster is no longer its centroid, which reduces the dependence of the cluster algorithm on difference in size and shape of ideal clusters. Second, the number of adjacent points shared between clusters is defined as the new objective function. Third, two basic meta-operations, merge and split, are used to optimize the objective function and make the iterative process insensitive to the initial solution. The novelty of our approach is the selection criterion of the initial centers and the new objective function, which enables MLS to explore more promising search area. Experimental results demonstrate that MLS outperforms traditional centroid-based clustering algorithms in terms of both solution quality and computational efficiency, and it is quite competitive to other reference algorithms such as spectral, density, and geometric based clustering algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Algorithm 1
Algorithm 2
Algorithm 3
Algorithm 4
Algorithm 5
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Data Availability

The datasets adopted in the current study are available at https://archive.ics.uci.edu/ml/datasets.php.

Notes

  1. https://archive.ics.uci.edu/ml/datasets.php.

References

  1. Abualigah LM, Khader AT, Hanandeh ES (2018) Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48(11):4047–4071

    Google Scholar 

  2. Agrawal R, Gehrke J, Gunopulos D et al (2005) Automatic subspace clustering of high dimensional data. Data Min Knowl Disc 11(1):5–33

    MathSciNet  Google Scholar 

  3. Aljalbout E, Golkov V, Siddiqui Y et al (2018) Clustering with deep learning: taxonomy and new methods. arXiv:180107648

  4. Aljarah I, Mafarja M, Heidari AA et al (2020) Clustering analysis using a novel locality-informed grey wolf-inspired clustering approach. Knowl Inf Syst 62(2):507–539

    Google Scholar 

  5. Aloise D, Deshpande A, Hansen P et al (2009) NP-Hardness of euclidean sum-of-squares clustering. Mach Learn 75(2):245–248

    MATH  Google Scholar 

  6. Alshammari M, Stavrakakis J, Takatsuka M (2021) Refining a k-nearest neighbor graph for a computationally efficient spectral clustering. Pattern Recog 114:107,869

    Google Scholar 

  7. Amini A, Wah TY, Saybani MR et al (2011) A study of density-grid based clustering algorithms on data streams. In: 2011 Eighth international conference on fuzzy systems and knowledge discovery (FSKD), IEEE, pp 1652–1656

  8. Ankerst M, Breunig MM, Kriegel HP et al (1999) Optics: ordering points to identify the clustering structure. ACM Sigmod Record 28(2):49–60

    Google Scholar 

  9. Ashraf FB, Matin A, Shafi MSR et al (2021) An improved k-means clustering algorithm for multi-dimensional multi-cluster data using meta-heuristics. In: 2021 24th International conference on computer and information technology (ICCIT), IEEE, pp 1-6

  10. Bateni MH, Behnezhad S, Derakhshan M et al (2017) Affinity clustering: hierarchical clustering at scale. In: Proceedings of the 31st International conference on neural information processing systems, pp 6867–6877

  11. Bendoly E (2003) Theory and support for process frameworks of knowledge discovery and data mining from ERP systems. Inf Manag 40(7):639–647

    Google Scholar 

  12. Brown D, Japa A, Shi Y (2019) A fast density-grid based clustering method. In: 2019 IEEE 9Th annual computing and communication workshop and conference (CCWC), IEEE, pp 0048–0054

  13. Cao B, Glover F, Rego C (2015) A tabu search algorithm for cohesive clustering problems. J Heuristics 21(4):457–477

    Google Scholar 

  14. Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the K-means clustering algorithm. Expert Syst Appl 40(1):200–210

    Google Scholar 

  15. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 3(2):224–227

    Google Scholar 

  16. Duwairi R, Abu-Rahmeh M (2015) A novel approach for initializing the spherical K-means clustering algorithm. Simul Model Pract Theory 54:49–63

    Google Scholar 

  17. Ester M, Kriegel HP, Sander J et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the AAAI conference on artificial intelligence, pp 226–231

  18. Fahy C, Yang S, Gongora M (2018) Ant colony stream clustering: a fast density clustering algorithm for dynamic data streams. IEEE Trans Cybern 49(6):2215–2228

    Google Scholar 

  19. Fan J (2019) Ope-hca: an optimal probabilistic estimation approach for hierarchical clustering algorithm. Neural Comput Applic 31(7):2095–2105

    Google Scholar 

  20. Fränti P, Sieranoja S (2018) K-means properties on six clustering benchmark datasets. Appl Intell 48(12):4743–4759

    MATH  Google Scholar 

  21. Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315 (5814):972–976

    MathSciNet  MATH  Google Scholar 

  22. Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):4–14

    Google Scholar 

  23. Glover F (2017) Pseudo-centroid clustering. Soft Comput 21(22):6571–6592

    Google Scholar 

  24. Guha S, Rastogi R, Shim K (1998) Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record 27(2):73–84

    MATH  Google Scholar 

  25. Guo X, Gao L, Liu X et al (2017) Improved deep embedded clustering with local structure preservation. In: The 26th International joint conference on artificial intelligence (IJCAI), pp 1753–1759

  26. Hartigan JA, Wong MA (1979) Algorithm as 136: a K-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108

    MATH  Google Scholar 

  27. Huang D, Wang CD, Lai JH (2017) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473

    Google Scholar 

  28. Huang D, Wang CD, Wu JS et al (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 32(6):1212–1226

    Google Scholar 

  29. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666

    Google Scholar 

  30. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Res Biomol 22(12):2577–2637

    Google Scholar 

  31. Kanungo T, Mount DM, Netanyahu NS et al (2002) An efficient K-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892

    Google Scholar 

  32. Langan DA, Modestino JW, Zhang J (1998) Cluster validation for unsupervised stochastic model-based image segmentation. IEEE Trans Image Process 7(2):180–195

    Google Scholar 

  33. Li H, Liu X, Li T et al (2020) A novel density-based clustering algorithm using nearest neighbor graph. Pattern Recog 102:107,206

    Google Scholar 

  34. Likas A, Vlassis N, Verbeek JJ (2003) The global K-means clustering algorithm. Pattern Recogn 36(2):451–461

    Google Scholar 

  35. Liu R, Wang H, Yu X (2018) Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf Sci 450:200–226

    MathSciNet  Google Scholar 

  36. Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137

    MathSciNet  MATH  Google Scholar 

  37. Meyerhenke H, Sanders P, Schulz C (2016) Partitioning (hierarchically clustered) complex networks via size-constrained graph clustering. J Heuristics 22(5):759–782

    MATH  Google Scholar 

  38. Peng X, Zhu H, Feng J et al (2019) Deep clustering with sample-assignment invariance prior. IEEE Trans Neural Netw Learn Syst 31(11):4857–4868

    MathSciNet  Google Scholar 

  39. Pourbahrami S, Hashemzadeh M (2022) A geometric-based clustering method using natural neighbors. Inf Sci 610:694–706

    Google Scholar 

  40. Ran X, Zhou X, Lei M et al (2021) A novel K-means clustering algorithm with a noise algorithm for capturing urban hotspots. Appl Sci 11(23):11,202

    Google Scholar 

  41. Rappoport N, Shamir R (2018) Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res 46(20):10,546–10,562

    Google Scholar 

  42. Rezaee MJ, Eshkevari M, Saberi M et al (2021) GBK-Means clustering algorithm: an improvement to the K-means algorithm based on the bargaining game. Knowl-Based Syst 213:106,672

    Google Scholar 

  43. Rezaei M, Fränti P (2016) Set matching measures for external cluster validity. IEEE Trans Knowl Data Eng 28(8):2173–2186

    Google Scholar 

  44. Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344 (6191):1492–1496

    Google Scholar 

  45. Ruspini EH, Bezdek JC, Keller JM (2019) Fuzzy clustering: a historical perspective. IEEE Comput Intell Mag 14(1):45–55

    Google Scholar 

  46. Sabin M, Gray R (1986) Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans Inf Theory 32(2):148–155

    MathSciNet  MATH  Google Scholar 

  47. Sedluk MJ, Miller JW (2000) Cluster-based data compression system and method. US Patent 6,100:825

    Google Scholar 

  48. Sharma KK, Seal A (2020) Spectral embedded generalized mean based K-nearest neighbors clustering with S-distance. Expert Syst Appl 169(4):114,326

    Google Scholar 

  49. Sheikholeslami G, Chatterjee S, Zhang A (2000) Wavecluster: a wavelet-based clustering approach for spatial data in very large databases. VLDB J 8(3):289–304

    Google Scholar 

  50. Sheng Y, Wang M, Wu T et al (2019) Adaptive local learning regularized nonnegative matrix factorization for data clustering. Appl Intell 49(6):2151–2168

    Google Scholar 

  51. Shin KS, Jeong YS, Jeong MK (2012) A two-leveled symbiotic evolutionary algorithm for clustering problems. Appl Intell 36(4):788–799

    Google Scholar 

  52. Sieranoja S, Fränti P (2021) Adapting K-means for graph clustering. Knowl Inf Syst 8 (11):33–47

    Google Scholar 

  53. Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3(Dec):583–617

    MathSciNet  MATH  Google Scholar 

  54. Sun G, Cong Y, Wang Q et al (2020) Lifelong spectral clustering. In: Proceedings of the AAAI conference on artificial intelligence, pp 5867–5874

  55. Tao X, Wang R, Chang R et al (2019) Spectral clustering algorithm using density-sensitive distance measure with global and local consistencies. Knowl-Based Syst 170:26–42

    Google Scholar 

  56. Vassilvitskii S, Arthur D (2006) K-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035

  57. Veenman CJ, Reinders MJT, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280

    Google Scholar 

  58. Virmajoki O, Franti P, Kaukoranta T (2002) Iterative shrinking method for generating clustering. In: Proceedings of international conference on image processing, IEEE, pp 2–10

  59. Wang H, Yang Y, Liu B (2019a) Gmc: graph-based multi-view clustering. IEEE Trans Knowl Data Eng 32(6):1116–1129

    Google Scholar 

  60. Wang H, Yang Y, Liu B et al (2019b) A study of graph-based system for multi-view clustering. Knowl-Based Syst 163:1009–1019

    Google Scholar 

  61. Wang TS, Lin HT, Wang P (2017) Weighted-spectral clustering algorithm for detecting community structures in complex networks. Artif Intell Rev 47(4):463–483

    Google Scholar 

  62. Wang W, Yang J, Muntz R et al (1997) Sting: a statistical information grid approach to spatial data mining. In: The VLDB journal, pp 186–195

  63. Wang Y, Duan X, Liu X et al (2018) A spectral clustering method with semantic interpretation based on axiomatic fuzzy set theory. Appl Soft Comput 64:59–74

    Google Scholar 

  64. Wu B, Wilamowski BM (2016) A fast density and grid based clustering method for data with arbitrary shapes and noise. IEEE Trans Ind Inf 13(4):1620–1628

    Google Scholar 

  65. Xie H, Zhang L, Lim CP et al (2019) Improving K-means clustering with enhanced firefly algorithms. Appl Soft Comput 84:105,763

    Google Scholar 

  66. Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487

  67. Xu J, Wang G, Deng W (2016) Denpehc: density peak based efficient hierarchical clustering. Inf Sci 373:200–218

    Google Scholar 

  68. Xu Q, Zhang Q, Liu J et al (2020) Efficient synthetical clustering validity indexes for hierarchical clustering. Expert Syst Appl 151:113,367

    Google Scholar 

  69. Xu X, Ding S, Shi Z (2018) An improved density peaks clustering algorithm with fast finding cluster centers. Knowl-Based Syst 158:65–74

    Google Scholar 

  70. Xu X, Ding S, Wang Y et al (2021) A fast density peaks clustering algorithm with sparse search. Inf Sci 554:61–83

    MathSciNet  Google Scholar 

  71. Yin L, Li M, Chen H et al (2022) An improved hierarchical clustering algorithm based on the idea of population reproduction and fusion. Electronics 11(17):2735

    Google Scholar 

  72. Yoo HW, Jung SH, Jang DS et al (2002) Extraction of major object features using VQ clustering for content-based image retrieval. Pattern Recogn 35(5):1115–1126

    MATH  Google Scholar 

  73. Zhang C, Fu H, Hu Q et al (2018a) Generalized latent multi-view subspace clustering. IEEE Trans Pattern Anal Mach Intell 42(1):86–99

    Google Scholar 

  74. Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. ACM Sigmod Record 25(2):103–114

    Google Scholar 

  75. Zhang T, Ramakrishnan R, Livny M (1997) Birch: a new data clustering algorithm and its applications. Data Min Knowl Disc 1(2):141–182

    Google Scholar 

  76. Zhang W, Zang W (2018) A fuzzy density peaks clustering algorithm based on improved DNA genetic algorithm and K-nearest neighbors. In: International conference on intelligent science and big data engineering, Springer, pp 476–487

  77. Zhang X, Xu Z (2015) Hesitant fuzzy agglomerative hierarchical clustering algorithms. Int J Syst Sci 46(3):562–576

    MATH  Google Scholar 

  78. Zhang Z, Liu L, Shen F et al (2018b) Binary multi-view clustering. IEEE Trans Pattern Anal Mach Intell 41(7):1774–1782

    Google Scholar 

  79. Zhu X, Zhang S, Li Y et al (2018) Low-rank sparse subspace for spectral clustering. IEEE Trans Knowl Data Eng 31(8):1532–1543

    Google Scholar 

  80. Zhu X, Zhu Y, Zheng W (2020) Spectral rotation for deep one-step clustering. Pattern Recogn 105:107,175

    Google Scholar 

Download references

Acknowledgements

We are grateful to one of our team members Tuanyue Xiao, who has contributed substantially to the revisions, especially the additional experiments for validation, comparisons with other clustering algorithms, and analysis of computational results. This work was supported by the Special Project for Knowledge Innovation of Hubei Province under Grant No. 2022013301015175 and the National Natural Science Foundation of China under Grant No. 62202192.

Author information

Authors and Affiliations

Authors

Contributions

Xiaolu Liu contributed to conceptualization, validation, writing and editing, supervision, project administration. Wenhan Shao contributed to data analysis, investigation, and writing-original draft preparation; Jiaming Chen contributed with conceptualization, validation, investigation, visualization, writing original draft and editing. Jiaming Chen contributed to revisions, additional experiments for validation, drawing flowchart of the main algorithm, and data analysis. Zhipeng Lü contributed to conceptualization, validation, writing-draft review and editing, supervision, project administration, funding acquisition. Fred Glover contributed to conceptualization, methodology, writing-review and editing, project administration. Junwen Ding contributed to conceptualization, methodology, validation, writing-review and editing, formal analysis, supervision, and funding acquisition.

Corresponding author

Correspondence to Junwen Ding.

Ethics declarations

Ethics approval and consent to participate

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, X., Shao, W., Chen, J. et al. Multi-start local search algorithm based on a novel objective function for clustering analysis. Appl Intell 53, 20346–20364 (2023). https://doi.org/10.1007/s10489-023-04580-x

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-023-04580-x

Keywords

Navigation