Abstract
Lexicase selection is a proven parent-selection algorithm designed for genetic programming problems, especially for uncompromising test-based problems where many distinct test cases must all be passed. Previous work has shown that random subsampling techniques can improve lexicase selection’s problem-solving success; here, we investigate why. We test two types of random subsampling lexicase variants: down-sampled lexicase, which uses a random subset of all training cases each generation; and cohort lexicase, which collects candidate solutions and training cases into small groups for testing, reshuffling those groups each generation. We show that both of these subsampling lexicase variants improve problem-solving success by facilitating deeper evolutionary searches; that is, they allow populations to evolve for more generations (relative to standard lexicase) given a fixed number of test-case evaluations. We also demonstrate that the subsampled variants require less computational effort to find solutions, even though subsampling hinders lexicase’s ability to preserve specialists. Contrary to our expectations, we did not find any evidence of systematic loss of phenotypic diversity maintenance due to subsampling, though we did find evidence that cohort lexicase is significantly better at preserving phylogenetic diversity than down-sampled lexicase.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Evaluating a single program on a single test case is one test case evaluation.
- 2.
Choosing when to measure diversity in evolutionary computation is an interesting problem. In evolutionary computation, diversity maintenance is often viewed as a mechanism to avoid premature convergence on suboptimal solutions. If our goal is to compare how well different selection schemes maintain diversity, when should we measure diversity? Measuring diversity after a global solution is found is not particularly meaningful, as finding the solution often causes the population to converge, decreasing diversity. We measured diversity at the time the solution is found to mitigate this problem. However, this solution only partially addresses the underlying problem: the process of evolution often involves many selective sweeps and subsequent divergences and we cannot know where in this cycle our measurements occurred.
References
Aenugu, S., Spector, L.: Lexicase selection in learning classifier systems. In: Proceedings of the Genetic and Evolutionary Computation Conference - GECCO 2019, pp. 356–364. ACM Press, Prague, Czech Republic (2019)
Curry, R., Heywood, M.: Towards efficient training on large datasets for genetic programming. In: A. Tawfik, S. Goodwin (eds.) Conference of the Canadian Society for Computational Studies of Intelligence, pp. 161–174. Springer (2004)
Dolson, E., Lalejini, A., Jorgensen, S., Ofria, C.: Quantifying the tape of life: Ancestry-based metrics provide insights and intuition about evolutionary dynamics. In: Artificial Life Conference Proceedings, pp. 75–82. MIT Press (2018)
Dolson, E.L., Banzhaf, W., Ofria, C.: Ecological theory provides insights about evolutionary computation. preprint, PeerJ Preprints (2018). URL https://peerj.com/preprints/27315
Ferguson, A.: FergusonAJ/gptp-2019-subsampled-lexicase: GPTP Chapter Companion (2020). https://doi.org/10.5281/zenodo.3679380, https://github.com/FergusonAJ/gptp-2019-subsampled-lexicase
Forstenlechner, S., Fagan, D., Nicolau, M., O’Neill, M.: Towards Understanding and Refining the General Program Synthesis Benchmark Suite with Genetic Programming. In: 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–6. IEEE, Rio de Janeiro (2018)
Gathercole, C., Ross, P.: Dynamic training subset selection for supervised learning in Genetic Programming. In: Y. Davidor, H.P. Schwefel, R. Maenner (eds.) Parallel Problem Solving from Nature - PPSN III, vol. 866, pp. 312–321. Springer Berlin Heidelberg, Berlin, Heidelberg (1994)
Gonçalves, I., Silva, S., Melo, J.B., Carreiras, J.M.: Random sampling technique for overfitting control in genetic programming. In: A. Moraglio, S. Silva, K. Krawiec, P. Machado, C. Cotta (eds.) European Conference on Genetic Programming
Helmuth, T., McPhee, N.F., Spector, L.: Effects of lexicase and tournament selection on diversity recovery and maintenance. In: Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, pp. 983–990. ACM (2016)
Helmuth, T., Pantridge, E., Spector, L.: Lexicase selection of specialists. In: Proceedings of the Genetic and Evolutionary Computation Conference on - GECCO 2019, pp. 1030–1038. ACM Press, Prague, Czech Republic (2019)
Helmuth, T., Spector, L.: General program synthesis benchmark suite. In: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 1039–1046. ACM (2015)
Helmuth, T., Spector, L., Matheson, J.: Solving uncompromising problems with lexicase selection. IEEE Transactions on Evolutionary Computation 19(5), 630–643 (2015)
Hernandez, J.G., Lalejini, A., Dolson, E., Ofria, C.: Random Subsampling Improves Performance in Lexicase Selection. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO 2019, pp. 2028–2031. ACM, New York, NY, USA (2019). Event-place: Prague, Czech Republic
Hmida, H., Hamida, S.B., Borgi, A., Rukoz, M.: Sampling Methods in Genetic Programming Learners from Large Datasets: A Comparative Study. In: P. Angelov, Y. Manolopoulos, L. Iliadis, A. Roy, M. Vellasco (eds.) Advances in Big Data, vol. 529, pp. 50–60. Springer International Publishing, Cham (2017)
La Cava, W., Helmuth, T., Spector, L., Moore, J.H.: A Probabilistic and Multi-Objective Analysis of Lexicase Selection and 𝜖-Lexicase Selection. Evolutionary Computation 27, 377–402 (2018)
La Cava, W., Spector, L., Danai, K.: Epsilon-Lexicase Selection for Regression. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO 2016, pp. 741–748. ACM, New York, NY, USA (2016). Event-place: Denver, Colorado, USA
Lalejini, A., Ofria, C.: Evolving event-driven programs with SignalGP. In: Proceedings of the Genetic and Evolutionary Computation Conference on - GECCO 2018, pp. 1135–1142. ACM Press, Kyoto, Japan (2018)
Lalejini, A., Ofria, C.: Tag-accessed memory for genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion - GECCO 2019, pp. 346–347. ACM Press, Prague, Czech Republic (2019)
Lalejini, A., Wiser, M.J., Ofria, C.: Gene duplications drive the evolution of complex traits and regulation. In: Artificial Life Conference Proceedings 14, pp. 257–264. MIT Press (2017)
Martinez, Y., Naredo, E., Trujillo, L., Legrand, P., Lopez, U.: A comparison of fitness-case sampling methods for genetic programming. Journal of Experimental & Theoretical Artificial Intelligence 29, 1203–1224 (2017)
Melo, V.V., Vargas, D.V., Banzhaf, W.: Batch Tournament Selection for Genetic Programming. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion - GECCO 2019, pp. 994–1002. ACM Press, Prague, Czech Republic (2019)
Metevier, B., Saini, A.K., Spector, L.: Lexicase selection beyond genetic programming. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, pp. 123–136. Springer International Publishing, Cham (2019)
Moore, J.M., Stanton, A.: Lexicase selection outperforms previous strategies for incremental evolution of virtual creature controllers. In: Proceedings of the 14th European Conference on Artificial Life ECAL 2017, pp. 290–297. MIT Press, Lyon, France (2017)
Moore, J.M., Stanton, A.: Tiebreaks and Diversity: Isolating Effects in Lexicase Selection. In: The 2018 Conference on Artificial Life, pp. 590–597. MIT Press, Tokyo, Japan (2018)
Moore, J.M., Stanton, A.: The Limits of Lexicase Selection in an Evolutionary Robotics Task. In: The 2019 Conference on Artificial Life, pp. 551–558. MIT Press, Newcastle, United Kingdom (2019)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019). URL https://www.R-project.org/
Spector, L.: Assessment of problem modality by differential performance of lexicase selection in genetic programming: a preliminary report. In: Proceedings of the 14th annual conference companion on Genetic and evolutionary computation, pp. 401–408. ACM (2012)
Spector, L., Cava, W.L., Shanabrook, S., Helmuth, T., Pantridge, E.: Relaxations of Lexicase Parent Selection. In: W. Banzhaf, R.S. Olson, W. Tozier, R. Riolo (eds.) Genetic Programming Theory and Practice XV, pp. 105–120. Springer International Publishing, Cham (2018)
Spector, L., Martin, B., Harrington, K., Helmuth, T.: Tag-based modules in genetic programming. In: Proceedings of the 13th annual conference on Genetic and evolutionary computation - GECCO 2011, p. 1419. ACM Press, Dublin, Ireland (2011)
Webb, C.O.: Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. The American Naturalist 156(2), 145–155 (2000)
Wickham, H.: ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York (2016). URL https://ggplot2.tidyverse.org
Acknowledgements
This research was supported by the National Science Foundation through the BEACON Center (Coop. Agreement No. DBI-0939454), a Graduate Research Fellowship to AL (Grant No. DGE-1424871), and Grant No. DEB-1655715 to CO. Michigan State University provided computational resources through the Institute for Cyber-Enabled Research.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ferguson, A.J., Hernandez, J.G., Junghans, D., Lalejini, A., Dolson, E., Ofria, C. (2020). Characterizing the Effects of Random Subsampling on Lexicase Selection. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_1
Download citation
DOI: https://doi.org/10.1007/978-3-030-39958-0_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39957-3
Online ISBN: 978-3-030-39958-0
eBook Packages: Computer ScienceComputer Science (R0)