Sampling Methods in Genetic Programming Learners from Large Datasets: A Comparative Study

  • Hmida Hmida
  • Sana Ben Hamida
  • Amel Borgi
  • Marta Rukoz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 529)


The amount of data available for data mining and knowledge discovery continues to grow rapidly in the era of Big Data. Genetic Programming (GP) algorithms, which are efficient machine learning techniques, face a new challenge: dealing with the sheer mass of data provided. Active Sampling, already used for Active Learning, might be a good way to improve the training of Evolutionary Algorithms (EA) on very large datasets. This paper presents a review of sampling techniques already used with active GP learners and discusses their ability to improve GP training on very large datasets. One method from each sampling strategy is implemented and applied to the KDD intrusion detection problem using closely matched parameters. Experimental results show that sampling methods outperform training on the full dataset, but some of them cannot be scaled to large datasets.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Hmida Hmida (1, 2)
  • Sana Ben Hamida (2)
  • Amel Borgi (1)
  • Marta Rukoz (2)

  1. Université Tunis El Manar, LIPAH, Tunis, Tunisia
  2. Université Paris Dauphine, PSL Research University, CNRS, UMR 7243, LAMSADE, Paris, France
