FlexGP

Veeramachaneni, Kalyan; Arnaldo, Ignacio; Derby, Owen; O’Reilly, Una-May

doi:10.1007/s10723-014-9320-9

Cloud-Based Ensemble Learning with Genetic Programming for Large Regression Problems

Published: 18 November 2014

Volume 13, pages 391–407 (2015)

Journal of Grid Computing

Kalyan Veeramachaneni¹,
Ignacio Arnaldo¹,
Owen Derby¹ &
…
Una-May O’Reilly¹

Abstract

We describe FlexGP, the first Genetic Programming system to perform symbolic regression on large-scale datasets on the cloud via massive data-parallel ensemble learning. FlexGP provides a decentralized, fault-tolerant parallelization framework that runs many copies of Multiple Regression Genetic Programming, a sophisticated symbolic regression algorithm, on the cloud. Each copy executes with a different sample of the data and different parameters. The framework can create a fused model or ensemble on demand while the individual GP learners are still evolving. We demonstrate our framework by deploying 100 independent GP instances in a massive data-parallel manner to learn from a dataset composed of 515K exemplars and 90 features, and by generating a competitive fused model in less than 10 minutes.
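
The data-parallel ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the FlexGP implementation: the learner below is a one-variable least-squares line standing in for Multiple Regression Genetic Programming, fusion is assumed to be a plain average of the learners' predictions, and the learners run sequentially rather than on cloud nodes. All function names (`fit_line`, `train_learners`, `fused_predict`, `make_data`) and the synthetic data are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Stand-in learner (NOT GP): least-squares fit of y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def train_learners(data, n_learners, sample_fraction, seed=0):
    """Train each learner on its own random sample of the data."""
    rng = random.Random(seed)
    k = max(2, int(len(data) * sample_fraction))
    return [fit_line(rng.sample(data, k)) for _ in range(n_learners)]


def fused_predict(models, x):
    """Assumed fusion rule: average the predictions of all learners."""
    return statistics.fmean(a * x + b for a, b in models)


def make_data(n, seed=1):
    """Synthetic data (hypothetical): y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.5)) for x in xs]


if __name__ == "__main__":
    data = make_data(1000)
    models = train_learners(data, n_learners=10, sample_fraction=0.1)
    print(fused_predict(models, 5.0))
```

Because every learner only sees its own sample, the learners are independent of one another, and a fused prediction can be computed from whichever models exist at the time it is requested.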

Author information

Authors and Affiliations

Massachusetts Institute of Technology, 32, Vassar Street, Cambridge, MA, 02139, USA
Kalyan Veeramachaneni, Ignacio Arnaldo, Owen Derby & Una-May O’Reilly

Corresponding author

Correspondence to Ignacio Arnaldo.

About this article

Cite this article

Veeramachaneni, K., Arnaldo, I., Derby, O. et al. FlexGP. J Grid Computing 13, 391–407 (2015). https://doi.org/10.1007/s10723-014-9320-9

Received: 23 June 2014
Accepted: 17 October 2014
Published: 18 November 2014
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10723-014-9320-9
