# A new swarm-based efficient data clustering approach using KHM and fuzzy logic

## Abstract

Clustering is a useful technique for partitioning objects into groups on the basis of their nature. Objects in the same group are similar to one another and differ from the objects of other groups. Clustering has proved its importance in various fields such as information retrieval, bioinformatics, image processing and many others. In this paper, the particle swarm optimization (PSO) technique is combined with *K*-harmonic means (KHM) for clustering. PSO overcomes limitations of KHM such as the local optimum problem. Fuzzy logic is also employed to make PSO adaptive by controlling its parameters. The performance of the proposed approach is validated on five benchmark datasets in terms of inter-clustering distance, intra-clustering distance, *F*-measure and fitness value. The results of the proposed approach are compared with well-known conventional clustering techniques such as *K*-means, KHM and fuzzy *C*-means, along with different state-of-the-art clustering approaches. Two text-based benchmark datasets, CACM and CISI, are also used to test the performance of all clustering approaches. The proposed clustering approach gives better results than the other clustering approaches, as is clear from both the experimental and statistical analyses.
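To make the two building blocks named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the standard KHM objective (the sum over data points of the harmonic mean of their distances to all centers, raised to a power `p`), which a swarm could use as its fitness value, together with one textbook PSO velocity and position update. The function names `khm_objective` and `pso_step`, the default `p=3.5`, and the fixed coefficients `w`, `c1`, `c2` are assumptions chosen for illustration. In the proposed approach such parameters are adapted by fuzzy logic; the abstract does not give those rules, so they are not shown.

```python
import math
import random


def khm_objective(points, centers, p=3.5, eps=1e-12):
    """Standard KHM objective: sum_i k / sum_j (1 / ||x_i - c_j||^p).

    Lower is better. `eps` guards against division by zero when a
    center coincides with a data point (an illustrative choice).
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for c in centers:
            d = math.dist(x, c)  # Euclidean distance (Python 3.8+)
            inv_sum += 1.0 / max(d, eps) ** p
        total += k / inv_sum
    return total


def pso_step(position, velocity, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """One canonical PSO update on a flat vector (all centers concatenated).

    w, c1 and c2 are fixed here purely for illustration; an adaptive
    scheme would recompute them every iteration.
    """
    new_velocity = [
        w * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (gb - x)
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```

In a hybrid of this kind, each particle would typically encode one candidate set of cluster centers, and `khm_objective` would score it, so that the swarm's global search can move centers out of the local optima where plain KHM iterations may stall.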

## Keywords

Clustering, Fuzzy logic, Particle swarm optimization, *K*-harmonic means, *F*-measure

## Notes

### Compliance with ethical standards

### Conflict of interest

The authors declare that they have no conflict of interest.

### Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.
