Efficiently finding the optimum number of clusters in a dataset with a new hybrid differential evolution algorithm: DELA

  • Methodologies and Application
Soft Computing

Abstract

Clustering algorithms, a foundation for many data mining procedures and learning techniques, lack efficient methods for determining the optimal number of clusters in an arbitrary dataset. The few methods existing in the literature all use some form of evolutionary algorithm with a cluster validation index as its objective function. This article proposes a new evolutionary algorithm for the same task, based on a hybrid model of global and local heuristic search, and reports experiments with different datasets and indices. Because its design is independent of any particular clustering procedure, it can be applied to virtually any clustering method, such as the widely used \(k\)-means algorithm. Moreover, non-parametric statistical tests on the experimental results clearly show the proposed algorithm to be more efficient than other evolutionary algorithms currently used for the same task.
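
The general idea summarised in the abstract, an evolutionary search over candidate cluster counts scored by a cluster validation index, can be illustrated with a minimal sketch. This is not DELA itself, whose details are not given in this preview: it assumes a toy integer differential-evolution-style loop, a hand-rolled \(k\)-means, and the Davies-Bouldin index as the objective. All function names and parameter values (`kmeans`, `davies_bouldin`, `search_k`, `pop_size`, `f`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of coordinate tuples."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index for k >= 2 clusters; lower values are better."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            return math.inf  # penalise partitions with an empty cluster
        scatter.append(sum(math.dist(p, centroids[j]) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        ratios = []
        for j in range(k):
            if j == i:
                continue
            gap = math.dist(centroids[i], centroids[j])
            if gap == 0:
                return math.inf  # coincident centroids: degenerate partition
            ratios.append((scatter[i] + scatter[j]) / gap)
        total += max(ratios)
    return total / k


def search_k(points, k_min=2, k_max=6, pop_size=6, generations=10, f=0.8, seed=0):
    """Toy DE-style search over integer k (an assumption, not the paper's method)."""
    rng = random.Random(seed)
    cache = {}

    def score(k):
        # Best index value over a few k-means restarts, cached per k.
        if k not in cache:
            cache[k] = min(davies_bouldin(points, *kmeans(points, k, rng))
                           for _ in range(5))
        return cache[k]

    population = [rng.randint(k_min, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample(population, 3)
            # Differential mutation, rounded and clipped to the valid range.
            trial = max(k_min, min(k_max, a + round(f * (b - c))))
            # Greedy selection: keep the trial if it is at least as good.
            if score(trial) <= score(population[i]):
                population[i] = trial
    return min(population, key=score)


# Usage: three synthetic Gaussian blobs (made-up data for illustration only).
data_rng = random.Random(42)
blobs = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
         for cx, cy in [(0, 0), (8, 0), (4, 7)] for _ in range(30)]
best_k = search_k(blobs)
print("estimated number of clusters:", best_k)
```

Because the search only calls `score(k)`, the clustering routine and the validation index can be swapped without touching the search loop, which mirrors the abstract's point that the approach is independent of the clustering procedure. Caching by `k` keeps the number of expensive clustering runs small, since the search space here is a handful of integers.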

Acknowledgments

The Mexican authors wish to express their gratitude to SIP-IPN, CONACyT and ICyT-DF for their financial support of this research, particularly through grants SIP-20130932 and ICyT-PICCO-10-113. This work was also supported by the Spanish Ministry of Economy and Competitiveness and FEDER under contract roadMe (http://roadme.lcc.uma.es): Fundamentals for Real World Applications of Metaheuristics: The Vehicular Network Case, TIN2011-28194 (2012–2014).

Author information

Corresponding author

Correspondence to Javier Arellano-Verdejo.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Arellano-Verdejo, J., Alba, E. & Godoy-Calderon, S. Efficiently finding the optimum number of clusters in a dataset with a new hybrid differential evolution algorithm: DELA. Soft Comput 20, 895–905 (2016). https://doi.org/10.1007/s00500-014-1548-6

  • DOI: https://doi.org/10.1007/s00500-014-1548-6
