
An Efficient Grouping Genetic Algorithm for Data Clustering and Big Data Analysis

  • Chapter
Computational Intelligence for Big Data Analysis

Part of the book series: Adaptation, Learning, and Optimization (ALO, volume 19)

Abstract

Clustering, as a formal and systematic subject of research, can be considered the most influential unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. One of the central issues associated with clustering is determining the number of clusters. In this chapter, an efficient grouping genetic algorithm is proposed for the case in which the number of clusters is unknown. In each chromosome of the grouping genetic algorithm, the same data are clustered concurrently with different numbers of clusters in order to discern the correct number of clusters. In subsequent iterations of the algorithm, efficient crossover and mutation operators produce new solutions with different numbers of clusters or different clustering accuracy, which leads to a significant improvement in clustering. Furthermore, a local search is applied, with a specific probability, to each chromosome of each new population in order to increase clustering accuracy. These special operators enable the successful application of the proposed method to big data analysis. To demonstrate the accuracy and efficiency of the algorithm, it is tested on various artificial and real data sets and compared with other methods. Most of the data sets contain overlapping clusters, yet the algorithm detected the proper number of clusters in all data sets with high clustering accuracy. The results provide evidence of the algorithm's success in finding an appropriate number of clusters and in achieving the best clustering quality in comparison with the other methods.
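The abstract describes the method only at a high level, so the following Python sketch illustrates the general idea of evolving partitions that differ in their number of clusters using grouping-style operators. It is a simplified, assumption-based illustration, not the authors' algorithm: each chromosome here holds a single partition (whereas the chapter places several concurrent clusterings in each chromosome), the Davies-Bouldin index is assumed as the fitness criterion, and the group-injection crossover, split/merge mutation, one-step k-means local search, all function names (`grouping_ga`, `crossover`, `mutate`, `local_search`) and all parameter values are hypothetical choices.

```python
import math
import random

# Hypothetical sketch: a grouping-style GA evolving partitions with different numbers of
# clusters. Fitness, operators, names and parameters are assumptions, not the chapter's method.
def normalize(labels):
    """Relabel groups to 0..k-1 in order of first appearance; return (labels, k)."""
    mapping = {}
    out = [mapping.setdefault(g, len(mapping)) for g in labels]
    return out, len(mapping)


def centroids(points, labels, k):
    """Mean of each group; assumes every label in 0..k-1 is used."""
    sums = [[0.0] * len(points[0]) for _ in range(k)]
    counts = [0] * k
    for p, g in zip(points, labels):
        counts[g] += 1
        for d, x in enumerate(p):
            sums[g][d] += x
    return [[s / counts[g] for s in sums[g]] for g in range(k)]


def davies_bouldin(points, labels, k):
    """Davies-Bouldin index (lower is better); requires k >= 2."""
    cents = centroids(points, labels, k)
    scatter, counts = [0.0] * k, [0] * k
    for p, g in zip(points, labels):
        scatter[g] += math.dist(p, cents[g])
        counts[g] += 1
    scatter = [s / c for s, c in zip(scatter, counts)]
    total = 0.0
    for i in range(k):
        # worst-case similarity of cluster i to any other cluster (guard against d == 0)
        total += max((scatter[i] + scatter[j]) / max(math.dist(cents[i], cents[j]), 1e-12)
                     for j in range(k) if j != i)
    return total / k


def crossover(a, ka, b, rng):
    """Group injection: copy one random group of parent a into parent b."""
    g = rng.randrange(ka)
    new_label = max(b) + 1
    return normalize([new_label if x == g else y for x, y in zip(a, b)])


def mutate(labels, k, rng, k_max):
    """Merge two groups (fewer clusters) or split one group (more clusters)."""
    labels = list(labels)
    if k > 2 and (k >= k_max or rng.random() < 0.5):
        a, b = rng.sample(range(k), 2)
        labels = [a if g == b else g for g in labels]
    else:
        g = rng.randrange(k)
        members = [i for i, x in enumerate(labels) if x == g]
        for i in rng.sample(members, len(members) // 2):
            labels[i] = k  # move half of group g into a new group
    return normalize(labels)


def local_search(points, labels, k):
    """One k-means reassignment step; keeps the input if fewer than 2 groups remain."""
    cents = centroids(points, labels, k)
    new = [min(range(k), key=lambda g: math.dist(p, cents[g])) for p in points]
    new_labels, new_k = normalize(new)
    return (new_labels, new_k) if new_k >= 2 else (labels, k)


def grouping_ga(points, k_max=8, pop_size=20, generations=60,
                p_mut=0.5, p_local=0.3, seed=0):
    """Return the best (labels, k) found; k is the estimated number of clusters."""
    rng = random.Random(seed)

    def fitness(ind):
        return davies_bouldin(points, *ind)

    pop = []
    while len(pop) < pop_size:  # random partitions with random k in [2, k_max]
        k = rng.randint(2, k_max)
        ind = normalize([rng.randrange(k) for _ in points])
        if ind[1] >= 2:
            pop.append(ind)
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            (a, ka), (b, _) = rng.sample(elite, 2)
            child = crossover(a, ka, b, rng)
            if rng.random() < p_mut:
                child = mutate(*child, rng, k_max)
            if rng.random() < p_local:
                child = local_search(points, *child)
            if child[1] <= k_max:
                children.append(child)
        pop = elite + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    # Three well-separated synthetic 3x3 grids (illustrative data only)
    pts = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)]
           for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    labels, k = grouping_ga(pts)
    print("estimated number of clusters:", k)
```

In this sketch the number of clusters is never fixed in advance: it is simply the number of groups in each partition, so split and merge mutations let the population move between different values of k, while the optional k-means step refines cluster membership. This loosely mirrors the roles the abstract assigns to the mutation and local search operators.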



Author information

Correspondence to Sayede Houri Razavi.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Razavi, S.H., Ebadati, E.O.M., Asadi, S., Kaur, H. (2015). An Efficient Grouping Genetic Algorithm for Data Clustering and Big Data Analysis. In: Acharjya, D., Dehuri, S., Sanyal, S. (eds) Computational Intelligence for Big Data Analysis. Adaptation, Learning, and Optimization, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-319-16598-1_5


  • DOI: https://doi.org/10.1007/978-3-319-16598-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16597-4

  • Online ISBN: 978-3-319-16598-1

  • eBook Packages: Engineering, Engineering (R0)
