Big Data Analysis Using Hybrid Meta-Heuristic Optimization Algorithm and MapReduce Framework

Chapter in: Integrating Meta-Heuristics and Machine Learning for Real-World Optimization Problems

Abstract

Clustering large datasets is a recent and popular challenge that arises in many applications, including social networking and bioinformatics. To cope with rapidly growing data sizes, traditional clustering algorithms must be improved. In this research, a hybrid Harris Hawks Optimizer (HHHO) that combines K-means clustering with the MapReduce framework is proposed to solve data clustering problems. The proposed scheme exploits the ability of K-means to solve clustering problems. More specifically, K-means solutions are used as the initial solutions of the traditional Harris Hawks Optimizer (HHO), which then refines the candidate solutions to find the best one. MapReduce is a distributed computing paradigm that processes and generates large datasets with a parallel program on a cluster. It is adopted in HHHO for parallelization because it offers fault tolerance, load balancing, and data locality. The performance of the proposed method was evaluated through numerical comparisons, which showed that HHHO is efficient and produces better results than other existing methods. Moreover, it shows a strong ability to improve candidate solutions, find optimal ones, and converge. In addition, the accuracy and error rate of the obtained results are assessed. The proposed method is implemented and evaluated in Python simulations.
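The abstract describes the hybrid only at a high level, so the following is a minimal Python sketch of one way the pieces could fit together, not the authors' implementation. Several assumptions are not stated in the chapter: the fitness is the within-cluster sum of squared errors (SSE); each hawk encodes k centroids flattened into one vector; the K-means runs use k-means++ seeding; the map and reduce phases are emulated in-process over `np.array_split` chunks with `functools.reduce` rather than run on Hadoop; and the HHO update keeps the exploration, soft-besiege and hard-besiege rules of the original Harris Hawks Optimizer but omits its progressive rapid-dive (Lévy flight) branches. The names `map_chunk`, `sse_mapreduce`, `kmeans` and `hybrid_hho` are hypothetical.

```python
import numpy as np
from functools import reduce

# Sketch only: SSE fitness, k-means++ seeding and in-process map/reduce are
# illustrative assumptions, not details taken from the chapter.


def map_chunk(chunk, centroids):
    """Map step: partial SSE of one data split against the candidate centroids."""
    d2 = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def sse_mapreduce(chunks, centroids):
    """Reduce step: sum the partial SSE values emitted for every split."""
    partials = (map_chunk(c, centroids) for c in chunks)
    return reduce(lambda a, b: a + b, partials, 0.0)


def kmeans(X, k, rng, iters=20):
    """Lloyd's K-means with k-means++ seeding; yields one initial solution."""
    C = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(C)[None]) ** 2).sum(axis=2).min(axis=1)
        C.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    C = np.array(C)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C


def hybrid_hho(X, k, n_hawks=10, T=50, n_chunks=4, seed=0):
    """K-means-seeded HHO; each hawk is k centroids flattened into one vector."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    chunks = np.array_split(X, n_chunks)  # stands in for distributed input splits
    lb, ub = np.tile(X.min(axis=0), k), np.tile(X.max(axis=0), k)

    def fitness(hawk):
        return sse_mapreduce(chunks, hawk.reshape(k, -1))

    # Initial population: one K-means solution per hawk (different random starts).
    hawks = np.array([kmeans(X, k, rng).ravel() for _ in range(n_hawks)])
    scores = np.array([fitness(h) for h in hawks])
    rabbit, rabbit_score = hawks[scores.argmin()].copy(), scores.min()

    for t in range(T):
        for i in range(n_hawks):
            E = 2 * (2 * rng.random() - 1) * (1 - t / T)  # escaping energy
            J = 2 * (1 - rng.random())  # random jump strength of the rabbit
            x = hawks[i]
            if abs(E) >= 1:  # exploration phase
                if rng.random() >= 0.5:
                    xr = hawks[rng.integers(n_hawks)]
                    new = xr - rng.random() * np.abs(xr - 2 * rng.random() * x)
                else:
                    new = (rabbit - hawks.mean(axis=0)) - rng.random() * (
                        lb + rng.random() * (ub - lb))
            elif abs(E) >= 0.5:  # soft besiege
                new = (rabbit - x) - E * np.abs(J * rabbit - x)
            else:  # hard besiege (rapid-dive variants omitted in this sketch)
                new = rabbit - E * np.abs(rabbit - x)
            hawks[i] = np.clip(new, lb, ub)
            scores[i] = fitness(hawks[i])
            if scores[i] < rabbit_score:  # keep the best-so-far solution
                rabbit, rabbit_score = hawks[i].copy(), scores[i]

    return rabbit.reshape(k, -1), rabbit_score
```

For example, `centers, sse = hybrid_hho(X, k=3)` on a NumPy array `X` of shape `(n, d)` returns a `(3, d)` array of centroids together with its SSE. Because the rabbit is replaced only when a hawk reaches a lower SSE, the returned solution is never worse than the best K-means seed, which is the intuition behind using K-means to build the initial population.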

Author information

Correspondence to Laith Abualigah.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bashabsheh, M.Q., Abualigah, L., Alshinwan, M. (2022). Big Data Analysis Using Hybrid Meta-Heuristic Optimization Algorithm and MapReduce Framework. In: Houssein, E.H., Abd Elaziz, M., Oliva, D., Abualigah, L. (eds) Integrating Meta-Heuristics and Machine Learning for Real-World Optimization Problems. Studies in Computational Intelligence, vol 1038. Springer, Cham. https://doi.org/10.1007/978-3-030-99079-4_8