An Efficient Grouping Genetic Algorithm for Data Clustering and Big Data Analysis

Razavi, Sayede Houri; Ebadati, E. Omid Mahdi; Asadi, Shahrokh; Kaur, Harleen

doi:10.1007/978-3-319-16598-1_5


Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 19)


Abstract

Clustering, as a formal and systematic subject of study, can be considered the most influential unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. One of the central issues associated with it is determining the number of clusters. In this chapter, an efficient grouping genetic algorithm is proposed for the case in which the number of clusters is unknown. Clusterings with different numbers of clusters are carried out concurrently on the same data, across the chromosomes of the grouping genetic algorithm, in order to identify the correct number of clusters. In subsequent iterations of the algorithm, new solutions with a different number of clusters or a different clustering accuracy are produced by applying efficient crossover and mutation operators, which leads to a significant improvement in the clustering. Furthermore, a local search is applied with a specific probability to each chromosome of each new population in order to increase the accuracy of the clustering. These special operators lead to the successful application of the proposed method to big data analysis. To demonstrate the accuracy and efficiency of the algorithm, it is tested on various artificial and real data sets in a comparative manner. Most of the data sets consisted of overlapping clusters, but the algorithm detected the proper number of clusters for all data sets with high clustering accuracy. The results provide evidence of the algorithm's successful performance in finding an appropriate number of clusters and in achieving the best clustering quality in comparison with other methods.
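The abstract does not specify the encoding, the fitness function, or the crossover and mutation operators, so the following is only an illustrative sketch of two ideas it mentions: a population whose chromosomes hold clusterings with different numbers of clusters, and a local search applied to each chromosome with some probability. The label-list encoding, the nearest-centroid reassignment used as the local search, and all function names are assumptions made for illustration, not the authors' method.

```python
import random


def relabel(labels):
    """Renumber group ids to 0..k-1 in order of first appearance (drops empty groups)."""
    mapping = {}
    return [mapping.setdefault(g, len(mapping)) for g in labels]


def random_chromosome(n_points, k_min, k_max, rng):
    """Assumed encoding: one group label per point, with k drawn at random.

    Empty groups are dropped by relabel, so the effective number of
    clusters can be lower than the k that was drawn.
    """
    k = rng.randint(k_min, k_max)
    return relabel([rng.randrange(k) for _ in range(n_points)])


def centroids(points, labels):
    """Mean of the points in each group, keyed by group id."""
    sums, counts = {}, {}
    for p, g in zip(points, labels):
        acc = sums.setdefault(g, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[g] = counts.get(g, 0) + 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_search(points, labels, prob, rng):
    """With probability prob, move every point to its nearest current centroid.

    This one-pass reassignment is a stand-in (assumption) for the
    chapter's local search, whose details the abstract does not give.
    """
    if rng.random() >= prob:
        return labels
    cents = centroids(points, labels)
    moved = [min(cents, key=lambda g: sq_dist(p, cents[g])) for p in points]
    return relabel(moved)


# Usage: a small population with varying cluster counts, refined once.
rng = random.Random(0)
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1), (10.0, 0.0)]
population = [random_chromosome(len(points), 2, 4, rng) for _ in range(5)]
population = [local_search(points, c, 0.5, rng) for c in population]
```

A full grouping genetic algorithm would also need a fitness measure and group-oriented crossover and mutation operators; they are left out here because the abstract does not describe them.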



Author information

Authors and Affiliations

Department of Knowledge Engineering and Decision Science, University of Economic Science, Tehran, Iran
Sayede Houri Razavi
Department of Mathematics and Computer Science, University of Economic Science, Tehran, Iran
E. Omid Mahdi Ebadati
Department of Industrial Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Shahrokh Asadi
Department of Computer Science, Hamdard University, New Delhi, India
Harleen Kaur

Corresponding author

Correspondence to Sayede Houri Razavi.

Editor information

Editors and Affiliations

School of Computing Sciences and Engineering, VIT University, Vellore, India
D.P. Acharjya
Department of Information and Communication Technology, Fakir Mohan University, Balasore, India
Satchidananda Dehuri
Corporate Technology Office, Tata Consultancy Services, Mumbai, Maharashtra, India
Sugata Sanyal

About this chapter

Cite this chapter

Razavi, S.H., Ebadati, E.O.M., Asadi, S., Kaur, H. (2015). An Efficient Grouping Genetic Algorithm for Data Clustering and Big Data Analysis. In: Acharjya, D., Dehuri, S., Sanyal, S. (eds) Computational Intelligence for Big Data Analysis. Adaptation, Learning, and Optimization, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-319-16598-1_5

DOI: https://doi.org/10.1007/978-3-319-16598-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16597-4
Online ISBN: 978-3-319-16598-1
eBook Packages: Engineering, Engineering (R0)
