
When Huge Is Routine: Scaling Genetic Algorithms and Estimation of Distribution Algorithms via Data-Intensive Computing

Chapter in: Parallel and Distributed Computational Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 269)

Abstract

Data-intensive computing has emerged as a key player in processing large volumes of data by exploiting massive parallelism. Data-intensive computing frameworks have shown that terabytes and petabytes of data can be processed routinely. However, little effort has gone into exploring how data-intensive computing can help scale evolutionary computation. In this book chapter we explore how evolutionary computation algorithms can be modeled using two different data-intensive frameworks: Yahoo!’s Hadoop and NCSA’s Meandre. We present a detailed, step-by-step description of how three evolutionary computation algorithms with different execution profiles can be translated into these data-intensive computing paradigms. Results show that (1) Hadoop is an excellent choice for pushing the boundaries of evolutionary computation on very large problems, and (2) Meandre can deliver transparent linear speedups without changing the underlying data-intensive flow, thanks to its inherent parallel processing.
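To make the idea of translating a genetic algorithm into the MapReduce paradigm more concrete, the following is a minimal sketch of one generation of a simple selectorecombinative GA expressed as map, shuffle, and reduce phases in plain Python. It is not the chapter's Hadoop implementation: the OneMax fitness function, the random partition keys, binary tournament selection, uniform crossover, and the helper names (`map_phase`, `shuffle`, `reduce_phase`, `generation`) are all illustrative assumptions.

```python
import random
from collections import defaultdict


def onemax(bits):
    # Toy fitness (an assumption for illustration): number of ones in the string.
    return sum(bits)


def map_phase(population, n_partitions, rng):
    # Mapper: evaluate each individual and emit it under a random key, so the
    # shuffle scatters individuals uniformly across the reducers.
    for ind in population:
        yield rng.randrange(n_partitions), (onemax(ind), ind)


def shuffle(pairs):
    # Group emitted (key, value) pairs by key, as the framework's shuffle would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(evaluated, rng, tournament_size=2):
    # Reducer: tournament selection followed by uniform crossover, restricted
    # to the individuals that landed in this partition.
    def tournament():
        contestants = rng.sample(evaluated, min(tournament_size, len(evaluated)))
        return max(contestants)[1]  # highest fitness wins

    offspring = []
    for _ in range(len(evaluated)):  # keep the partition size unchanged
        a, b = tournament(), tournament()
        child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
        offspring.append(child)
    return offspring


def generation(population, n_partitions, rng):
    # One GA generation = one map/shuffle/reduce pass over the population.
    groups = shuffle(map_phase(population, n_partitions, rng))
    next_pop = []
    for key in sorted(groups):
        next_pop.extend(reduce_phase(groups[key], rng))
    return next_pop


if __name__ == "__main__":
    rng = random.Random(42)
    pop = [[rng.randint(0, 1) for _ in range(32)] for _ in range(200)]
    for _ in range(30):
        pop = generation(pop, n_partitions=4, rng=rng)
    print("best fitness:", max(onemax(ind) for ind in pop))
```

Because selection in this sketch happens only within each reducer's partition, it behaves like a set of small demes that are reshuffled every generation. A real Hadoop job would typically run each phase on separate nodes and persist the population between generations rather than keeping it in memory.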




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Llorà, X., Verma, A., Campbell, R.H., Goldberg, D.E. (2010). When Huge Is Routine: Scaling Genetic Algorithms and Estimation of Distribution Algorithms via Data-Intensive Computing. In: de Vega, F.F., Cantú-Paz, E. (eds) Parallel and Distributed Computational Intelligence. Studies in Computational Intelligence, vol 269. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10675-0_2


  • DOI: https://doi.org/10.1007/978-3-642-10675-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10674-3

  • Online ISBN: 978-3-642-10675-0

  • eBook Packages: Engineering, Engineering (R0)
