Maintenance of a Long Running Distributed Genetic Programming System for Solving Problems Requiring Big Data

  • Babak Hodjat
  • Erik Hemberg
  • Hormoz Shahrzad
  • Una-May O’Reilly
Part of the Genetic and Evolutionary Computation book series (GEVO)


We describe a system, ECStar, that exceeds extant genetic programming systems in many aspects of scale. One instance, in the domain of financial strategies, has executed for extended durations (months to years) on nodes distributed around the globe. ECStar instances are almost never stopped and restarted, though they are resource elastic. Instead, they are interactively redirected to different parts of the problem space and updated with up-to-date learning. Their complexity makes them non-reproducible (i.e., each run is a single “play of the tape”), which makes them similar to real biological systems. In this contribution we focus on how ECStar, through its sheer size and complexity, introduces a provocative and important new paradigm for GP. ECStar’s scale, volunteer compute nodes, and distributed hub-and-spoke design have implications for how a multi-node instance is managed. We describe the setup, deployment, operation, and update of an instance of such a large, distributed, long-running system. Moreover, we outline how ECStar is designed to allow manual guidance and realignment of its evolutionary search trajectory.


Learning classifier system, Cloud scale, Distributed, Big data



We acknowledge the generous support of the Li Ka-Shing Foundation.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Babak Hodjat (1)
  • Erik Hemberg (2)
  • Hormoz Shahrzad (1)
  • Una-May O’Reilly (2)
  1. Genetic Finance, San Francisco, USA
  2. ALFA Group, CSAIL, MIT, Cambridge, USA
