A hybrid approach using genetic algorithm and the differential evolution heuristic for enhanced initialization of the k-means algorithm with applications in text clustering

Mustafi, D.; Sahoo, G.

doi:10.1007/s00500-018-3289-4

Methodologies and Application
Published: 07 June 2018

Soft Computing, Volume 23, pages 6361–6378 (2019)


Abstract

In this paper, we propose a heuristic-based algorithm to improve the initial seeding of the k-means clustering algorithm. The proposed algorithm primarily aims to improve the initial choice of the centroids used by the k-means algorithm and to ensure that the requisite number of clusters is returned in every run. The use of the proposed algorithm thus significantly reduces the possibility of k-means converging to a locally optimal solution. The paper uses the genetic algorithm framework to obtain the initial seed points and couples this with the differential evolution heuristic to obtain the requisite number of clusters. We have examined the performance of the proposed algorithm on the clustering of text documents, because such corpora often contain a very large number of data points and also require the formation of a large number of clusters. The results obtained have been compared with those of basic implementations of the k-means algorithm using standard parameters.
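
The abstract does not give the details of the algorithm, so the following is only an illustrative sketch of the general idea and not the authors' method. It evolves candidate centroid sets with a simplified differential evolution step, scores each candidate by its sum of squared errors, and rejects any candidate that leaves a cluster empty, so that the requisite number of clusters is always returned. The function names (`assign`, `fitness`, `de_seed`) and all parameter values are assumptions made for this example, and the genetic algorithm stage is omitted.

```python
import random


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(
            range(len(centroids)),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])),
        )
        for pt in points
    ]


def fitness(points, centroids):
    """Sum of squared errors; infinite if any cluster ends up empty."""
    labels = assign(points, centroids)
    if len(set(labels)) < len(centroids):
        return float("inf")
    return sum(
        sum((p - c) ** 2 for p, c in zip(pt, centroids[label]))
        for pt, label in zip(points, labels)
    )


def de_seed(points, k, pop_size=12, generations=40, f=0.5, cr=0.9, rng=None):
    """Search for k seed centroids with a simplified DE/rand/1 scheme.

    Hypothetical helper: f is the differential weight, cr the crossover rate.
    """
    rng = rng or random.Random(0)
    dim = len(points[0])
    # Each individual is a set of k centroids drawn from distinct data points.
    pop = [[list(p) for p in rng.sample(points, k)] for _ in range(pop_size)]
    scores = [fitness(points, ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            # Mutation and crossover, coordinate by coordinate.
            trial = [
                [
                    pop[a][m][d] + f * (pop[b][m][d] - pop[c][m][d])
                    if rng.random() < cr
                    else pop[i][m][d]
                    for d in range(dim)
                ]
                for m in range(k)
            ]
            score = fitness(points, trial)
            # Greedy selection: keep the trial only if it is no worse.
            if score <= scores[i]:
                pop[i], scores[i] = trial, score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

The centroids returned by `de_seed` would then be passed to an ordinary k-means run as its starting points. Because selection is greedy and a candidate with an empty cluster scores infinity, the returned seed set always induces k non-empty clusters, provided the data points are distinct.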



Author information

Authors and Affiliations

Department of CSE, Birla Institute of Technology, Mesra, Ranchi, India
D. Mustafi & G. Sahoo

Corresponding author

Correspondence to D. Mustafi.

Ethics declarations

Conflict of interest

The authors hereby declare that they have no conflict of interest.

Human participants or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mustafi, D., Sahoo, G. A hybrid approach using genetic algorithm and the differential evolution heuristic for enhanced initialization of the k-means algorithm with applications in text clustering. Soft Comput 23, 6361–6378 (2019). https://doi.org/10.1007/s00500-018-3289-4

Published: 07 June 2018
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s00500-018-3289-4
