Journal of Grid Computing, Volume 13, Issue 3, pp 391–407

FlexGP

Cloud-Based Ensemble Learning with Genetic Programming for Large Regression Problems
  • Kalyan Veeramachaneni
  • Ignacio Arnaldo
  • Owen Derby
  • Una-May O’Reilly
Article

DOI: 10.1007/s10723-014-9320-9

Cite this article as:
Veeramachaneni, K., Arnaldo, I., Derby, O. et al. J Grid Computing (2015) 13: 391. doi:10.1007/s10723-014-9320-9

Abstract

We describe FlexGP, the first Genetic Programming system to perform symbolic regression on large-scale datasets on the cloud via massive data-parallel ensemble learning. FlexGP provides a decentralized, fault-tolerant parallelization framework that runs many copies of Multiple Regression Genetic Programming, a sophisticated symbolic regression algorithm, on the cloud. Each copy executes with a different sample of the data and different parameters. The framework can create a fused model or ensemble on demand while the individual GP learners are evolving. We demonstrate our framework by deploying 100 independent GP instances in a massive data-parallel manner to learn from a dataset composed of 515K exemplars and 90 features, and by generating a competitive fused model in less than 10 minutes.
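
The data-parallel ensemble idea in the abstract can be illustrated with a minimal sketch. This is not FlexGP's implementation: a closed-form least-squares line fit stands in for each GP learner, the per-learner sampling fraction stands in for the "different parameters", and fusion by simple averaging is an assumption made here for illustration. All function names (`fit_line`, `train_ensemble`, `fused_predict`) and the synthetic dataset are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = a*x + b (a stand-in for one GP learner)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def train_ensemble(data, n_learners, rng):
    """Train independent learners, each on its own random sample of the data."""
    models = []
    for _ in range(n_learners):
        # Hypothetical per-learner parameter: the fraction of data it sees.
        fraction = rng.uniform(0.05, 0.2)
        k = max(2, int(fraction * len(data)))
        subset = rng.sample(data, k)
        models.append(fit_line(subset))
    return models


def fused_predict(models, x):
    """Fuse the learners by averaging their predictions (assumed fusion rule)."""
    return statistics.fmean([a * x + b for a, b in models])


# Synthetic regression problem: y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(2000):
    x = rng.uniform(-5, 5)
    data.append((x, 3.0 * x + 2.0 + rng.gauss(0, 0.5)))

models = train_ensemble(data, n_learners=20, rng=rng)
print(round(fused_predict(models, 1.0), 2))
```

Because the learners are independent, a fused prediction can be computed from whichever models exist at a given moment, which mirrors the "on demand" fusion the abstract describes. In a real deployment each learner would run on its own cloud node rather than in a single loop.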

Keywords

  • Cloud computing
  • Ensemble learning
  • Genetic programming
  • Symbolic regression

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Kalyan Veeramachaneni (1)
  • Ignacio Arnaldo (1)
  • Owen Derby (1)
  • Una-May O’Reilly (1)

  1. Massachusetts Institute of Technology, Cambridge, USA
