Soft Computing, Volume 23, Issue 1, pp 145–162

A new swarm-based efficient data clustering approach using KHM and fuzzy logic

  • Yogesh Gupta
  • Ashish Saini
Methodologies and Application


Clustering is a useful technique for grouping objects according to their nature: objects in the same group are similar to one another and differ from objects in other groups. Clustering has proved important in many fields, such as information retrieval, bioinformatics and image processing. In this paper, the particle swarm optimization (PSO) technique is combined with K-harmonic means (KHM) for clustering; PSO helps overcome limitations of KHM such as convergence to a local optimum. Fuzzy logic is also employed to make PSO adaptive by controlling its parameters. The performance of the proposed approach is validated on five benchmark datasets in terms of inter-cluster distance, intra-cluster distance, F-measure and fitness value. The results are compared with well-known conventional clustering techniques, namely K-means, KHM and fuzzy C-means, as well as with several state-of-the-art clustering approaches. Two text-based benchmark datasets, CACM and CISI, are also used to test the performance of all the clustering approaches. Both the experimental and the statistical analyses show that the proposed approach gives better results than the other clustering approaches.
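The core idea in the abstract, searching over cluster centers with PSO while scoring each candidate with the KHM objective (for each point x_i, k divided by the sum over centers of ‖x_i − c_j‖^(−p), summed over all points), can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the authors' algorithm: each particle encodes all k centers, each particle is refined by one standard KHM center update per iteration, and the fuzzy controller that adapts the PSO parameters in the paper is replaced here by a plain linearly decreasing inertia weight. The function names (`khm_objective`, `khm_step`, `assign`, `pso_khm`), the exponent `p = 3`, and the swarm settings are hypothetical choices made for illustration.

```python
import math
import random


def khm_objective(points, centers, p=3.0, eps=1e-12):
    """KHM objective: for each point, k / sum_j d(x, c_j)^-p, summed over points (lower is better)."""
    k = len(centers)
    return sum(
        k / sum(1.0 / max(math.dist(x, c), eps) ** p for c in centers)
        for x in points
    )


def khm_step(points, centers, p=3.0, eps=1e-12):
    """One K-harmonic means center update using soft memberships m and point weights w."""
    k, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(k)]
    den = [0.0] * k
    for x in points:
        d = [max(math.dist(x, c), eps) for c in centers]
        a = [dj ** (-p - 2) for dj in d]      # d^(-p-2) terms
        s_a = sum(a)
        s_b = sum(dj ** (-p) for dj in d)     # d^(-p) terms
        w = s_a / s_b ** 2                    # weight w(x); for p > 2 it grows for points far from all centers
        for j in range(k):
            mw = (a[j] / s_a) * w             # membership m(c_j | x) times weight w(x)
            den[j] += mw
            for t in range(dim):
                num[j][t] += mw * x[t]
    return [[num[j][t] / den[j] for t in range(dim)] for j in range(k)]


def assign(points, centers):
    """Hard partition for reporting: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(x, centers[j])) for x in points]


def pso_khm(points, k, n_particles=10, iters=50, p=3.0, seed=0):
    """PSO over candidate center sets; each particle is refined by one KHM step per iteration."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(x[t] for x in points) for t in range(dim)]
    hi = [max(x[t] for x in points) for t in range(dim)]
    # Each particle's position is a full set of k centers drawn inside the data's bounding box.
    pos = [[[rng.uniform(lo[t], hi[t]) for t in range(dim)] for _ in range(k)]
           for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in P] for P in pos]
    pbest_f = [khm_objective(points, P, p) for P in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = [c[:] for c in pbest[g]], pbest_f[g]
    c1 = c2 = 1.49                            # acceleration coefficients (assumed values)
    for it in range(iters):
        # Linearly decreasing inertia: a simple stand-in for the paper's fuzzy parameter control.
        w = 0.9 - 0.5 * it / max(iters - 1, 1)
        for i in range(n_particles):
            for j in range(k):
                for t in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][t] = (w * vel[i][j][t]
                                    + c1 * r1 * (pbest[i][j][t] - pos[i][j][t])
                                    + c2 * r2 * (gbest[j][t] - pos[i][j][t]))
                    pos[i][j][t] += vel[i][j][t]
            pos[i] = khm_step(points, pos[i], p)  # local KHM refinement of this particle
            f = khm_objective(points, pos[i], p)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = [c[:] for c in pos[i]], f
                if f < gbest_f:
                    gbest, gbest_f = [c[:] for c in pos[i]], f
    return gbest, gbest_f
```

For example, `pso_khm([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` returns a pair `(centers, objective)`, and `assign` turns those centers into hard cluster labels. Encoding all k centers in one particle means centers are matched by index between a particle and the global best, a known weakness of this simple encoding that a real implementation may need to address.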


Keywords: Clustering · Fuzzy logic · Particle swarm optimization · K-harmonic means · F-measure


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Manipal University, Jaipur, India
  2. Department of Electrical Engineering, Dayalbagh Educational Institute, Agra, India
