Big optimization with genetic algorithms: Hadoop, Spark, and MPI

Salto, Carolina; Minetti, Gabriela; Alba, Enrique; Luque, Gabriel

doi:10.1007/s00500-023-08301-x

Optimization
Published: 20 May 2023

Soft Computing, volume 27, pages 11469–11484 (2023)

Abstract

Solving problems of high dimensionality (and complexity) usually requires intensive use of technologies such as parallelism, advanced computers, and new types of algorithms. MapReduce (MR) is a long-standing computing paradigm in computer science that has been proposed in recent years for big data applications, though it can also be used for many other tasks. In this article, we address big optimization: solving large instances of combinatorial optimization problems by using MR as the paradigm to design solvers that run transparently on a varying number of computers that collaborate to find the problem solution. We study and analyze the MR technology, focusing on Hadoop, Spark, and MPI as the middleware platforms to develop genetic algorithms (GAs). From this, MRGA solvers arise using a programming paradigm different from the usual imperative transformational programming. Our objective is to confirm the expected benefits of these systems, namely file, memory, and communication management, over the resulting algorithms. We analyze our MRGA solvers from relevant points of view such as scalability, speedup, and communication vs. computation time in big optimization. The results for high-dimensional datasets show that the MRGA over Hadoop outperforms the implementations in the Spark and MPI frameworks. For the smallest datasets, the execution of MRGA on MPI is always faster than the executions of the remaining MRGAs. Finally, the MRGA over Spark presents the lowest communication times. Numerical and time insights are given in our work to ease future comparisons of new algorithms over these three popular technologies.
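To make the MR view of a GA concrete, the following is a minimal single-process sketch in Python, not the authors' MRGA implementation. It assumes a toy OneMax fitness (count of ones in a bit string), binary tournament selection, one-point crossover, bit-flip mutation, and elitism; all function names and parameter values (`map_phase`, `reduce_phase`, `run_mrga`, `p_mut`) are hypothetical. The map step evaluates individuals independently, which is the part a framework such as Hadoop, Spark, or MPI could distribute, and the reduce step builds the next population from the scored individuals.

```python
import random


def fitness(bits):
    # Toy OneMax objective (an assumption): number of ones in the bit string.
    return sum(bits)


def map_phase(population):
    # Map: evaluate each individual independently and emit (individual, fitness).
    return [(ind, fitness(ind)) for ind in population]


def tournament(scored, rng):
    # Binary tournament: pick two distinct scored individuals, keep the fitter.
    a, b = rng.sample(scored, 2)
    return a[0] if a[1] >= b[1] else b[0]


def reduce_phase(scored, rng, p_mut=0.01):
    # Reduce: build the next population from the scored individuals.
    best = max(scored, key=lambda pair: pair[1])[0]
    offspring = [best[:]]  # elitism: carry the best individual over unchanged
    while len(offspring) < len(scored):
        p1, p2 = tournament(scored, rng), tournament(scored, rng)
        cut = rng.randrange(1, len(p1))  # one-point crossover position
        child = p1[:cut] + p2[cut:]
        # Bit-flip mutation with probability p_mut per gene.
        child = [1 - g if rng.random() < p_mut else g for g in child]
        offspring.append(child)
    return offspring


def run_mrga(n_bits=32, pop_size=20, generations=30, seed=0):
    # One map step plus one reduce step per generation.
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness observed at each generation
    for _ in range(generations):
        scored = map_phase(population)
        history.append(max(f for _, f in scored))
        population = reduce_phase(scored, rng)
    return population, history


if __name__ == "__main__":
    final_population, best_per_generation = run_mrga()
    print(best_per_generation[0], best_per_generation[-1])
```

Because of elitism, the best fitness recorded per generation never decreases in this sketch. In a real deployment, the list comprehension in `map_phase` would be replaced by the framework's distributed map operation, and the cost of moving individuals between the two phases is what the communication vs. computation analysis measures.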

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This research received financial support from the Universidad Nacional de La Pampa and the Incentive Program from MINCyT (Argentina). Moreover, this research is partially funded by the Universidad de Málaga; by grant PID 2020-116727RB-I00 (HUmove), funded by MCIN/AEI/10.13039/501100011033; and by the TAILOR ICT-48 Network (No 952215), funded by the EU Horizon 2020 research and innovation programme.

Author information

Authors and Affiliations

Facultad de Ingeniería, Universidad Nacional de La Pampa, General Pico, Argentina
Carolina Salto & Gabriela Minetti
Conicet, Neuquén, Argentina
Carolina Salto
ITIS Software, Universidad de Málaga, Málaga, Spain
Enrique Alba & Gabriel Luque

Corresponding author

Correspondence to Gabriela Minetti.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Salto, C., Minetti, G., Alba, E. et al. Big optimization with genetic algorithms: Hadoop, Spark, and MPI. Soft Comput 27, 11469–11484 (2023). https://doi.org/10.1007/s00500-023-08301-x

Accepted: 23 April 2023
Published: 20 May 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00500-023-08301-x
