Abstract
Clustering, as a formal and systematic subject of study, can be considered the most influential unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. One of the central difficulties of the subject is determining the number of clusters. In this chapter, an efficient grouping genetic algorithm is proposed for the case in which the number of clusters is unknown. Clusterings of the same data with different numbers of clusters are carried out concurrently in the chromosomes of the grouping genetic algorithm in order to identify the correct number of clusters. In subsequent iterations of the algorithm, efficient crossover and mutation operators produce new solutions that differ in the number of clusters or in clustering accuracy, which leads to a significant improvement of the clustering. Furthermore, a local search is applied with a specified probability to each chromosome of each new population in order to increase the accuracy of the clustering. These special operators are intended to make the proposed method applicable to big data analysis. To assess the accuracy and efficiency of the algorithm, it is tested on various artificial and real data sets and compared with other methods. Most of the data sets consist of overlapping clusters, yet the algorithm detected the proper number of clusters in all data sets with high clustering accuracy. The results provide evidence of the algorithm's success in finding an appropriate number of clusters and in achieving the best clustering quality in comparison with the other methods.
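The loop described in the abstract can be illustrated with a minimal sketch; it is not the chapter's implementation. The abstract does not specify the fitness function or the operators, so everything concrete here is an assumption: the Davies-Bouldin index as fitness, a simplified group-injection crossover, a merge-or-split mutation, a nearest-centroid local search, and all function names and parameter values (`pop_size`, `generations`, `k_max`, `ls_prob`).

```python
import math
import random

def centroids(points, labels):
    """Return {label: centroid} for the groups present in a chromosome."""
    sums = {}
    for (x, y), g in zip(points, labels):
        sx, sy, n = sums.get(g, (0.0, 0.0, 0))
        sums[g] = (sx + x, sy + y, n + 1)
    return {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}

def davies_bouldin(points, labels):
    """Assumed fitness: Davies-Bouldin index (lower is better)."""
    cents = centroids(points, labels)
    if len(cents) < 2:
        return math.inf  # the index is undefined for a single group
    scatter = {g: 0.0 for g in cents}
    counts = {g: 0 for g in cents}
    for p, g in zip(points, labels):
        scatter[g] += math.dist(p, cents[g])
        counts[g] += 1
    scatter = {g: scatter[g] / counts[g] for g in cents}
    total = 0.0
    for g in cents:
        worst = 0.0
        for h in cents:
            if h != g:
                d = math.dist(cents[g], cents[h])
                ratio = (scatter[g] + scatter[h]) / d if d > 0 else math.inf
                worst = max(worst, ratio)
        total += worst
    return total / len(cents)

def crossover(parent_a, parent_b, rng):
    """Hypothetical operator: inject one whole group of parent_b into parent_a."""
    g = rng.choice(sorted(set(parent_b)))
    fresh = max(parent_a) + 1
    return [fresh if lb == g else la for la, lb in zip(parent_a, parent_b)]

def mutate(labels, rng):
    """Hypothetical operator: merge two groups or split one, so k can change."""
    groups = sorted(set(labels))
    if len(groups) > 2 and rng.random() < 0.5:
        a, b = rng.sample(groups, 2)
        return [a if g == b else g for g in labels]  # merge b into a
    g = rng.choice(groups)
    fresh = max(groups) + 1
    return [fresh if (lab == g and rng.random() < 0.5) else lab for lab in labels]

def local_search(points, labels, rng, prob):
    """With probability `prob` per point, move the point to its nearest centroid."""
    cents = centroids(points, labels)
    new = list(labels)
    for i, p in enumerate(points):
        if rng.random() < prob:
            new[i] = min(cents, key=lambda g: math.dist(p, cents[g]))
    return new

def evolve(points, rng, pop_size=20, generations=40, k_max=6, ls_prob=0.3):
    """Evolve label vectors whose number of groups differs between chromosomes."""
    pop = []
    for _ in range(pop_size):
        k = rng.randint(2, k_max)  # each chromosome starts with its own k
        pop.append([rng.randrange(k) for _ in points])
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            child = mutate(crossover(a, b, rng), rng)
            children.append(local_search(points, child, rng, ls_prob))
        # Elitist survivor selection: keep the pop_size best of parents and children.
        pop = sorted(pop + children, key=lambda c: davies_bouldin(points, c))[:pop_size]
    return pop[0]

if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic blobs; the true number of clusters is not given to evolve().
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in [(0, 0), (6, 0), (3, 6)] for _ in range(15)]
    best = evolve(data, rng)
    print(len(set(best)), "clusters, DB index", davies_bouldin(data, best))
```

In this sketch a chromosome is simply a list of group labels, one per data point, so the number of clusters is whatever number of distinct labels survives crossover, mutation and local search. A different validity index or operator set can be swapped in without changing the loop.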
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Razavi, S.H., Ebadati, E.O.M., Asadi, S., Kaur, H. (2015). An Efficient Grouping Genetic Algorithm for Data Clustering and Big Data Analysis. In: Acharjya, D., Dehuri, S., Sanyal, S. (eds) Computational Intelligence for Big Data Analysis. Adaptation, Learning, and Optimization, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-319-16598-1_5
DOI: https://doi.org/10.1007/978-3-319-16598-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16597-4
Online ISBN: 978-3-319-16598-1
eBook Packages: Engineering, Engineering (R0)