Abstract
A genetic algorithm that exploits K-means principles to divide objects into groups of high similarity is proposed. The method evolves a population of chromosomes, each representing a division of the objects into a different number of clusters. A group-based crossover, enriched with the one-step K-means operator, and a mutation strategy that reassigns objects to clusters on the basis of their distance to the clusters computed so far allow the approach to determine the best number of groups present in the dataset. The method was tested with four different fitness functions on both synthetic and real-world datasets for which the ground-truth division is known, and compared with the K-means method. Results show that the approach obtains higher values of the evaluation indexes than those obtained by the K-means method.
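The two operators named in the abstract can be illustrated with a minimal sketch. It assumes a chromosome is a plain list of cluster labels, one per object, and that distance is Euclidean. The function names, the `rate` parameter, and the inverse-distance weighting in `mutate` are illustrative assumptions, since the abstract does not specify the exact mutation rule.

```python
import math
import random


def centroids(points, labels):
    """Mean of the points assigned to each cluster label."""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    return {c: tuple(sum(x) / len(g) for x in zip(*g)) for c, g in groups.items()}


def one_step_kmeans(points, labels):
    """One K-means iteration: compute centroids, then move every point to its
    nearest centroid. A cluster that attracts no point disappears, so the
    number of clusters encoded by the chromosome can shrink."""
    cents = centroids(points, labels)
    return [min(cents, key=lambda c: math.dist(p, cents[c])) for p in points]


def mutate(points, labels, rate, rng, eps=1e-9):
    """Assumed mutation rule (not taken from the paper): with probability
    `rate`, reassign a point to a cluster drawn with probability inversely
    proportional to its distance from that cluster's centroid."""
    cents = centroids(points, labels)
    ids = list(cents)
    out = list(labels)
    for i, p in enumerate(points):
        if rng.random() < rate:
            weights = [1.0 / (math.dist(p, cents[c]) + eps) for c in ids]
            out[i] = rng.choices(ids, weights=weights)[0]
    return out


# Four 2-D points with a poor initial labelling: only the first is in cluster 0.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = [0, 1, 1, 1]

refined = one_step_kmeans(points, labels)  # [0, 0, 1, 1]
mutated = mutate(points, refined, rate=0.5, rng=random.Random(0))
```

In a full genetic algorithm, each offspring produced by crossover would be passed through `one_step_kmeans`, and `mutate` would be applied with a small rate before the fitness function is evaluated.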
Acknowledgment
This work has been partially supported by MIUR D.D. n 0001542, under the project BA2KNOW - PON03PE_00001_1.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Pizzuti, C., Procopio, N. (2017). A K-means Based Genetic Algorithm for Data Clustering. In: Graña, M., López-Guede, J.M., Etxaniz, O., Herrero, Á., Quintián, H., Corchado, E. (eds) International Joint Conference SOCO’16-CISIS’16-ICEUTE’16. SOCO 2016, CISIS 2016, ICEUTE 2016. Advances in Intelligent Systems and Computing, vol 527. Springer, Cham. https://doi.org/10.1007/978-3-319-47364-2_21
DOI: https://doi.org/10.1007/978-3-319-47364-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47363-5
Online ISBN: 978-3-319-47364-2
eBook Packages: Engineering, Engineering (R0)