Swarm based automatic clustering using nature inspired Emperor Penguins Colony algorithm

  • Original Paper
Evolving Systems

Abstract

Nature is a source of concepts, mechanisms, and principles for designing artificial computing systems that deal with complex computational problems. Most heuristic and metaheuristic algorithms are derived from the behavior of biological or physical systems in nature. Clustering is the process of partitioning a set of data into groups of similar examples. Because clustering is an NP-hard problem, metaheuristics are an appropriate tool for it; indeed, clustering is a special case of an optimization problem. Classic clustering requires the number of clusters to be known in advance. This paper presents an algorithm that requires no such prior knowledge to cluster the data. We propose a swarm-based Emperor Penguins Colony (EPC) algorithm to solve both classic and automatic clustering problems. The proposed approach is compared with six state-of-the-art, popular, and improved nature-inspired algorithms, a partitioning-based heuristic algorithm, and a hierarchical clustering method on ten real-world datasets. The results show that classic and automatic clustering using the EPC algorithm performs better than the competing algorithms.
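To make the "clustering as optimization" framing concrete, the following is a minimal sketch, not the EPC algorithm from the paper. It assumes a candidate solution is a set of k centroids and that fitness is the within-cluster sum of squared Euclidean distances; a plain random search stands in for the metaheuristic. The names `sse` and `random_search` and the toy data are hypothetical and chosen only for illustration.

```python
import random


def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def random_search(points, k, iterations=200, seed=0):
    """Placeholder optimizer (an assumption, not EPC): repeatedly sample k data
    points as candidate centroids and keep the set with the lowest cost."""
    rng = random.Random(seed)
    best = rng.sample(points, k)
    best_cost = sse(points, best)
    for _ in range(iterations):
        candidate = rng.sample(points, k)
        cost = sse(points, candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Toy data: two visually separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, cost = random_search(points, k=2)
```

Here k is fixed in advance, which corresponds to the classic setting described in the abstract. In the automatic setting the search would also have to vary the number of centroids, and the fitness would need to be a measure that does not simply improve as k grows; that part is not shown.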

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Madjid Khalilian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Harifi, S., Khalilian, M. & Mohammadzadeh, J. Swarm based automatic clustering using nature inspired Emperor Penguins Colony algorithm. Evolving Systems 14, 1083–1099 (2023). https://doi.org/10.1007/s12530-023-09507-y

  • DOI: https://doi.org/10.1007/s12530-023-09507-y
