Abstract
Digitalization has led to the accumulation of large quantities of data, but the accumulated data are not converted into useful patterns. Exploratory data analysis techniques help close this gap. Clustering, one of the key techniques in exploratory data analysis, arranges data objects into groups according to their characteristics. In this work, traditional clustering approaches (leader, K-means and ISODATA) and evolutionary approaches (genetic algorithm, particle swarm optimization and social group optimization) are implemented on a benchmark data set. The evolutionary clustering methods are derived from the existing hard clustering methods in order to find optimal results. The performance of these techniques is validated with several cluster validation methods. The analysis shows that the evolutionary clustering methods converge at a better rate than the partition clustering methods, and that ISODATA performs better in various aspects on large data. The hard and evolutionary clustering methods are compared on execution time and internal cluster validity criteria.
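The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy points, the leader threshold, the first-k seeding of K-means and the choice of the Dunn index as the internal validity criterion are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import time


def leader(points, threshold):
    """Single-pass leader clustering: a point joins the first leader within
    `threshold`; otherwise it becomes a new leader."""
    leaders, labels = [], []
    for p in points:
        for i, q in enumerate(leaders):
            if math.dist(p, q) <= threshold:
                labels.append(i)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels


def kmeans(points, k, iters=100):
    """Lloyd's K-means, seeded with the first k points (an arbitrary choice)."""
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter.
    Assumes at least two clusters and one cluster with two or more points."""
    pairs = [(i, j) for i in range(len(points)) for j in range(i + 1, len(points))]
    inter = min(math.dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j])
    intra = max(math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j])
    return inter / intra


def evaluate(name, cluster_fn, points):
    """Run one method, reporting execution time and the Dunn index."""
    start = time.perf_counter()
    labels = cluster_fn(points)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.6f} s, Dunn index = {dunn_index(points, labels):.3f}")
    return labels


# Hypothetical toy data: two well-separated groups of four points each.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

if __name__ == "__main__":
    evaluate("leader", lambda pts: leader(pts, threshold=3.0), POINTS)
    evaluate("k-means", lambda pts: kmeans(pts, k=2), POINTS)
```

A higher Dunn index indicates more compact, better-separated clusters. On data this small the timings are not meaningful; the sketch only shows how execution time and an internal validity score can be collected side by side for each method.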
Cite this article
Patibandla, R.S.M.L., Veeranjaneyulu, N. Performance Analysis of Partition and Evolutionary Clustering Methods on Various Cluster Validation Criteria. Arab J Sci Eng 43, 4379–4390 (2018). https://doi.org/10.1007/s13369-017-3036-7
DOI: https://doi.org/10.1007/s13369-017-3036-7