When Huge Is Routine: Scaling Genetic Algorithms and Estimation of Distribution Algorithms via Data-Intensive Computing

Llorà, Xavier; Verma, Abhishek; Campbell, Roy H.; Goldberg, David E.

doi:10.1007/978-3-642-10675-0_2

Part of the book series: Studies in Computational Intelligence ((SCI,volume 269))

Abstract

Data-intensive computing has emerged as a key approach for processing large volumes of data by exploiting massive parallelism. Data-intensive computing frameworks have shown that terabytes and petabytes of data can be processed routinely. However, there has been little effort to explore how data-intensive computing can help scale evolutionary computation. In this book chapter we explore how evolutionary computation algorithms can be modeled using two different data-intensive frameworks: Yahoo!’s Hadoop and NCSA’s Meandre. We present a detailed step-by-step description of how three different evolutionary computation algorithms, each with a different execution profile, can be translated into the data-intensive computing paradigms. Results show that (1) Hadoop is an excellent choice for pushing the boundaries of evolutionary computation on very large problems, and (2) Meandre can deliver transparent linear speedups without changing the underlying data-intensive flow, thanks to its inherent parallel processing.
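
As a rough illustration of the general idea, and not the chapter's actual implementation, one generation of a simple genetic algorithm can be phrased as a map step (score each individual independently) followed by a reduce step (select and recombine within each group). The sketch below is plain Python with no Hadoop or Meandre dependency; the OneMax fitness function, the function names, the tournament size of two, and the number of reducers are all assumptions made for the example.

```python
import random
from collections import defaultdict


def onemax(bits):
    """Toy fitness (assumed for illustration): number of ones in the bit string."""
    return sum(bits)


def map_phase(population, num_reducers):
    """Map: score each individual independently and emit (key, (fitness, individual))."""
    for idx, individual in enumerate(population):
        yield idx % num_reducers, (onemax(individual), individual)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce runtime would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(scored, rng):
    """Reduce: binary tournament selection plus uniform crossover within one group."""
    def pick():
        a, b = rng.choice(scored), rng.choice(scored)
        return max(a, b, key=lambda s: s[0])[1]

    offspring = []
    for _ in range(len(scored)):
        p1, p2 = pick(), pick()
        # Uniform crossover: each bit comes from either parent with equal probability.
        child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
        offspring.append(child)
    return offspring


def one_generation(population, rng, num_reducers=4):
    """Run map, shuffle, and reduce once; the population size is preserved."""
    groups = shuffle(map_phase(population, num_reducers))
    new_population = []
    for key in sorted(groups):
        new_population.extend(reduce_phase(groups[key], rng))
    return new_population


if __name__ == "__main__":
    rng = random.Random(42)
    pop = [[rng.randint(0, 1) for _ in range(32)] for _ in range(40)]
    for _ in range(10):
        pop = one_generation(pop, rng)
    print("best fitness after 10 generations:", max(onemax(ind) for ind in pop))
```

Because each individual is scored independently in the map step, fitness evaluation parallelizes trivially. Selection here only sees the individuals routed to the same reducer, which is one possible design choice rather than a requirement of the paradigm.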

Author information

Authors and Affiliations

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, 1205 W. Clark Street, Urbana, IL, 61801
Xavier Llorà
Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, IL, 61801
Abhishek Verma & Roy H. Campbell
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, 104 S. Mathews Ave, Urbana, IL, 61801
David E. Goldberg

Editor information

Editors and Affiliations

University of Extremadura, C/ Sta Teresa de Jornet, 38, Merida, Spain
Francisco Fernández de Vega
Yahoo! Inc., 701 First Avenue, 94087, Sunnyvale, CA, USA
Erick Cantú-Paz

About this chapter

Cite this chapter

Llorà, X., Verma, A., Campbell, R.H., Goldberg, D.E. (2010). When Huge Is Routine: Scaling Genetic Algorithms and Estimation of Distribution Algorithms via Data-Intensive Computing. In: de Vega, F.F., Cantú-Paz, E. (eds) Parallel and Distributed Computational Intelligence. Studies in Computational Intelligence, vol 269. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10675-0_2

DOI: https://doi.org/10.1007/978-3-642-10675-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10674-3
Online ISBN: 978-3-642-10675-0
eBook Packages: Engineering, Engineering (R0)
