Characterizing the Effects of Random Subsampling on Lexicase Selection

Part of the Genetic and Evolutionary Computation book series (GEVO)


Lexicase selection is a proven parent-selection algorithm designed for genetic programming problems, especially for uncompromising test-based problems where many distinct test cases must all be passed. Previous work has shown that random subsampling techniques can improve lexicase selection’s problem-solving success; here, we investigate why. We test two types of random subsampling lexicase variants: down-sampled lexicase, which uses a random subset of all training cases each generation; and cohort lexicase, which collects candidate solutions and training cases into small groups for testing, reshuffling those groups each generation. We show that both of these subsampling lexicase variants improve problem-solving success by facilitating deeper evolutionary searches; that is, they allow populations to evolve for more generations (relative to standard lexicase) given a fixed number of test-case evaluations. We also demonstrate that the subsampled variants require less computational effort to find solutions, even though subsampling hinders lexicase’s ability to preserve specialists. Contrary to our expectations, we did not find any evidence of systematic loss of phenotypic diversity maintenance due to subsampling, though we did find evidence that cohort lexicase is significantly better at preserving phylogenetic diversity than down-sampled lexicase.



This research was supported by the National Science Foundation through the BEACON Center (Coop. Agreement No. DBI-0939454), a Graduate Research Fellowship to AL (Grant No. DGE-1424871), and Grant No. DEB-1655715 to CO. Michigan State University provided computational resources through the Institute for Cyber-Enabled Research.


  1. 1.
    Aenugu, S., Spector, L.: Lexicase selection in learning classifier systems. In: Proceedings of the Genetic and Evolutionary Computation Conference - GECCO 2019, pp. 356–364. ACM Press, Prague, Czech Republic (2019)Google Scholar
  2. 2.
    Curry, R., Heywood, M.: Towards efficient training on large datasets for genetic programming. In: A. Tawfik, S. Goodwin (eds.) Conference of the Canadian Society for Computational Studies of Intelligence, pp. 161–174. Springer (2004)Google Scholar
  3. 3.
    Dolson, E., Lalejini, A., Jorgensen, S., Ofria, C.: Quantifying the tape of life: Ancestry-based metrics provide insights and intuition about evolutionary dynamics. In: Artificial Life Conference Proceedings, pp. 75–82. MIT Press (2018)Google Scholar
  4. 4.
    Dolson, E.L., Banzhaf, W., Ofria, C.: Ecological theory provides insights about evolutionary computation. preprint, PeerJ Preprints (2018). URL
  5. 5.
    Ferguson, A.: FergusonAJ/gptp-2019-subsampled-lexicase: GPTP Chapter Companion (2020).,
  6. 6.
    Forstenlechner, S., Fagan, D., Nicolau, M., O’Neill, M.: Towards Understanding and Refining the General Program Synthesis Benchmark Suite with Genetic Programming. In: 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–6. IEEE, Rio de Janeiro (2018)Google Scholar
  7. 7.
    Gathercole, C., Ross, P.: Dynamic training subset selection for supervised learning in Genetic Programming. In: Y. Davidor, H.P. Schwefel, R. Maenner (eds.) Parallel Problem Solving from Nature - PPSN III, vol. 866, pp. 312–321. Springer Berlin Heidelberg, Berlin, Heidelberg (1994)CrossRefGoogle Scholar
  8. 8.
    Gonçalves, I., Silva, S., Melo, J.B., Carreiras, J.M.: Random sampling technique for overfitting control in genetic programming. In: A. Moraglio, S. Silva, K. Krawiec, P. Machado, C. Cotta (eds.) European Conference on Genetic ProgrammingGoogle Scholar
  9. 9.
    Helmuth, T., McPhee, N.F., Spector, L.: Effects of lexicase and tournament selection on diversity recovery and maintenance. In: Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, pp. 983–990. ACM (2016)Google Scholar
  10. 10.
    Helmuth, T., Pantridge, E., Spector, L.: Lexicase selection of specialists. In: Proceedings of the Genetic and Evolutionary Computation Conference on - GECCO 2019, pp. 1030–1038. ACM Press, Prague, Czech Republic (2019)Google Scholar
  11. 11.
    Helmuth, T., Spector, L.: General program synthesis benchmark suite. In: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 1039–1046. ACM (2015)Google Scholar
  12. 12.
    Helmuth, T., Spector, L., Matheson, J.: Solving uncompromising problems with lexicase selection. IEEE Transactions on Evolutionary Computation 19(5), 630–643 (2015)CrossRefGoogle Scholar
  13. 13.
    Hernandez, J.G., Lalejini, A., Dolson, E., Ofria, C.: Random Subsampling Improves Performance in Lexicase Selection. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO 2019, pp. 2028–2031. ACM, New York, NY, USA (2019). Event-place: Prague, Czech RepublicGoogle Scholar
  14. 14.
    Hmida, H., Hamida, S.B., Borgi, A., Rukoz, M.: Sampling Methods in Genetic Programming Learners from Large Datasets: A Comparative Study. In: P. Angelov, Y. Manolopoulos, L. Iliadis, A. Roy, M. Vellasco (eds.) Advances in Big Data, vol. 529, pp. 50–60. Springer International Publishing, Cham (2017)CrossRefGoogle Scholar
  15. 15.
    La Cava, W., Helmuth, T., Spector, L., Moore, J.H.: A Probabilistic and Multi-Objective Analysis of Lexicase Selection and 𝜖-Lexicase Selection. Evolutionary Computation 27, 377–402 (2018)CrossRefGoogle Scholar
  16. 16.
    La Cava, W., Spector, L., Danai, K.: Epsilon-Lexicase Selection for Regression. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO 2016, pp. 741–748. ACM, New York, NY, USA (2016). Event-place: Denver, Colorado, USAGoogle Scholar
  17. 17.
    Lalejini, A., Ofria, C.: Evolving event-driven programs with SignalGP. In: Proceedings of the Genetic and Evolutionary Computation Conference on - GECCO 2018, pp. 1135–1142. ACM Press, Kyoto, Japan (2018)Google Scholar
  18. 18.
    Lalejini, A., Ofria, C.: Tag-accessed memory for genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion - GECCO 2019, pp. 346–347. ACM Press, Prague, Czech Republic (2019)Google Scholar
  19. 19.
    Lalejini, A., Wiser, M.J., Ofria, C.: Gene duplications drive the evolution of complex traits and regulation. In: Artificial Life Conference Proceedings 14, pp. 257–264. MIT Press (2017)Google Scholar
  20. 20.
    Martinez, Y., Naredo, E., Trujillo, L., Legrand, P., Lopez, U.: A comparison of fitness-case sampling methods for genetic programming. Journal of Experimental & Theoretical Artificial Intelligence 29, 1203–1224 (2017)CrossRefGoogle Scholar
  21. 21.
    Melo, V.V., Vargas, D.V., Banzhaf, W.: Batch Tournament Selection for Genetic Programming. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion - GECCO 2019, pp. 994–1002. ACM Press, Prague, Czech Republic (2019)Google Scholar
  22. 22.
    Metevier, B., Saini, A.K., Spector, L.: Lexicase selection beyond genetic programming. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, pp. 123–136. Springer International Publishing, Cham (2019)CrossRefGoogle Scholar
  23. 23.
    Moore, J.M., Stanton, A.: Lexicase selection outperforms previous strategies for incremental evolution of virtual creature controllers. In: Proceedings of the 14th European Conference on Artificial Life ECAL 2017, pp. 290–297. MIT Press, Lyon, France (2017)Google Scholar
  24. 24.
    Moore, J.M., Stanton, A.: Tiebreaks and Diversity: Isolating Effects in Lexicase Selection. In: The 2018 Conference on Artificial Life, pp. 590–597. MIT Press, Tokyo, Japan (2018)Google Scholar
  25. 25.
    Moore, J.M., Stanton, A.: The Limits of Lexicase Selection in an Evolutionary Robotics Task. In: The 2019 Conference on Artificial Life, pp. 551–558. MIT Press, Newcastle, United Kingdom (2019)Google Scholar
  26. 26.
    R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019). URL
  27. 27.
    Spector, L.: Assessment of problem modality by differential performance of lexicase selection in genetic programming: a preliminary report. In: Proceedings of the 14th annual conference companion on Genetic and evolutionary computation, pp. 401–408. ACM (2012)Google Scholar
  28. 28.
    Spector, L., Cava, W.L., Shanabrook, S., Helmuth, T., Pantridge, E.: Relaxations of Lexicase Parent Selection. In: W. Banzhaf, R.S. Olson, W. Tozier, R. Riolo (eds.) Genetic Programming Theory and Practice XV, pp. 105–120. Springer International Publishing, Cham (2018)CrossRefGoogle Scholar
  29. 29.
    Spector, L., Martin, B., Harrington, K., Helmuth, T.: Tag-based modules in genetic programming. In: Proceedings of the 13th annual conference on Genetic and evolutionary computation - GECCO 2011, p. 1419. ACM Press, Dublin, Ireland (2011)Google Scholar
  30. 30.
    Webb, C.O.: Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. The American Naturalist 156(2), 145–155 (2000)CrossRefGoogle Scholar
  31. 31.
    Wickham, H.: ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York (2016). URL

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.The BEACON Center for the Study of Evolution in ActionMichigan State UniversityEast LansingUSA
  2. 2.Department of Translational Hematology and Oncology ResearchCleveland ClinicClevelandUSA

Personalised recommendations