Genetic programming (GP) has been pushing the boundaries of what a computer can achieve autonomously since its introduction [1]. Over the years, John Koza himself tracked some one hundred results that are competitive with human-produced ones in a wide variety of fields, and success stories have been steadily published by both scholars and practitioners in the specialized literature. However, we cannot help noticing that today GP is largely underutilized in the real-world domains where it was originally expected to excel. Artificial intelligence is considered a core technology of the fourth industrial revolution (4IR, or Industry 4.0), but, while machine learning (ML) is explicitly mentioned, there is little doubt that the term refers to statistical models and neural networks, not to GP or other evolutionary algorithms.

Regression is a paradigmatic example of this trend. In the early 1990s, researchers excitedly demonstrated GP's ability to evolve mathematical functions that fit a set of data, but, twenty years on, deep neural networks were already showing competitive performance [2]: the two winners of the GECCO 2022 SRBench competition on interpretable symbolic regression for data science do not exploit GP, nor do they mention "evolutionary computation" in their descriptions. Nowadays, the most popular algorithms for classification are either ensembles of boosted trees, like XGBoost [3], or, again, deep neural networks [4]. A cursory analysis of GitHub stars provides a rather clear overview of the situation: as of July 22, 2023, one may observe 176k stars for TensorFlow, 68k for PyTorch, and 55k for scikit-learn. TPOT, a GP-based optimizer for ML pipelines, scores nearly 10k stars, presumably thanks to the "ML" connection; DEAP, a library for evolutionary optimization that also includes GP, has 5.2k stars; but jenetics, gplearn, tiny-gp, and the other 7 projects listed when searching for "genetic programming" cumulatively got fewer than 2k stars.
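To make the symbolic-regression setting concrete, the following is a deliberately minimal, self-contained sketch of tree-based GP in the spirit of those early-1990s experiments. The function set, the mutation-only truncation scheme, the hidden target, and all parameters are our own illustrative choices, not taken from any system cited above; libraries such as gplearn or DEAP implement the same idea with proper crossover and far more care.

```python
import random, operator

# Each tree is a nested tuple: a terminal ("x",) or (constant,), or an
# internal node ((function, symbol), left_subtree, right_subtree).
OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]

def random_tree(depth=3):
    # Grow a random expression tree over {+, -, *}, the variable x,
    # and ephemeral constants in [-1, 1].
    if depth == 0 or random.random() < 0.3:
        return ("x",) if random.random() < 0.7 else (random.uniform(-1, 1),)
    return (random.choice(OPS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if len(tree) == 1:
        return x if tree[0] == "x" else tree[0]
    (f, _), left, right = tree
    return f(evaluate(left, x), evaluate(right, x))

def fitness(tree, xs, ys):
    # Mean squared error; runaway trees may overflow, so treat those as unfit.
    try:
        err = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return float("inf")
    return err / len(xs)

def mutate(tree, depth=2):
    # Replace a randomly chosen subtree with a freshly grown one.
    if len(tree) == 1 or random.random() < 0.3:
        return random_tree(depth)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

def run_gp(xs, ys, pop_size=50, generations=40):
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, xs, ys))
        survivors = pop[: pop_size // 2]          # truncation selection
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return min(pop, key=lambda t: fitness(t, xs, ys))

random.seed(42)
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + x for x in xs]                      # hidden target: x^2 + x
best = run_gp(xs, ys)
print("best MSE:", fitness(best, xs, ys))
```

Even this crude mutation-only variant usually closes in on simple polynomial targets; the historical interest, of course, lay in the fact that the output is a readable formula rather than an opaque model.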

Here, we would like to draw the readers' attention to an archetypal, although "less explored" (as pointed out by Prof. Langdon), topic: the creation of computer programs. When the goal is to generate fragments frequently coded by humans, such as API calls or common algorithms, the task can be delegated almost safely to neural networks. Non-evolutionary, "AI-powered" tools like ChatGPT, GitHub Copilot, or Tabnine are practical because of their speed and the small number of meta-parameters that need to be tweaked. Extremely complex models are often available out of the box, already trained, and may be adapted with reduced effort by exploiting transfer-learning techniques; moreover, practitioners are finding clever workarounds, such as downcasting the weights for inference, to allow even large models to run on end-user PCs.

However, while deep learning (DL) methodologies have proven able to efficiently learn from huge amounts of data and interpolate among existing results, GP has displayed a unique ability to slowly unfold brand-new solutions; in the generation of never-before-written programs it could thrive with little to no competition. For example, in the creation of assembly-language programs to test modern microprocessors [5], there are no libraries of already-written solutions, and the goal is to create a unique program from scratch, targeting a new hardware design. More broadly, whenever the goal is to devise a test, one cannot, by definition, exploit already-existing material, and therefore neural-network models trained on available data are of little use. GP- and other evolutionary-based techniques have been, and still are, perfectly suited as fuzzers and feedback-based test generators [6, 7]. In another emblematic case study, GP was able to design a novel antenna [8], proving its effectiveness in creating a structure considerably different from human blueprints. A hypothetical generative DL system applied to the same task would be unlikely to uncover such a solution, as the final result fell well outside the distribution of samples it could have observed.

Apart from this niche, in some cases GP seems to have followed the old saying "if you cannot beat them, join them": while GP cannot compete with ML and DL directly, its inherent characteristics can be used to support them. GP-based neuro-evolution is again at the forefront, with interesting results close to, or better than, state-of-the-art human-designed networks [9,10,11]; even for boosted trees, recent attempts at using ensembles of GP trees evolved with a MAP-Elites [12] scheme were able to outperform classical boosting strategies [13], not to mention TPOT [14], seen above, by far the GP-based tool with the most stars on GitHub.
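To illustrate the feedback-based test generation just mentioned, here is a toy evolutionary fuzzer: inputs are byte strings, mutation flips one byte, and the feedback signal is the set of branches reached in a small hand-written target. The target program, the branch-coverage bookkeeping, and all parameters are our own illustrative choices, not taken from [6] or [7].

```python
import random

def target(data):
    # Toy program under test: reaching deeper branches requires specific
    # bytes, mimicking magic-value checks in real parsers.
    branches = {"entry"}
    if len(data) > 0 and data[0] == ord("G"):
        branches.add("g")
        if len(data) > 1 and data[1] == ord("P"):
            branches.add("gp")
            if len(data) > 2 and data[2] == ord("!"):
                branches.add("gp!")
    return branches

def mutate(data, rng):
    # Flip one randomly chosen byte to a random value.
    data = bytearray(data)
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)

def fuzz(budget=20000, seed=1):
    rng = random.Random(seed)
    corpus = [bytes(4)]           # seed input: four zero bytes
    covered = set()
    for _ in range(budget):
        candidate = mutate(rng.choice(corpus), rng)
        new = target(candidate) - covered
        if new:                   # keep inputs that reach new branches
            covered |= new
            corpus.append(candidate)
    return covered, corpus

covered, corpus = fuzz()
print("branches covered:", sorted(covered))
```

The key evolutionary ingredient is that coverage, not correctness, drives selection: an input survives only if it exercises behavior no earlier input did, which is exactly why no corpus of pre-existing tests is needed.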
In general, evolutionary ML is a rapidly growing sub-field, with dedicated workshops and tracks.
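For readers unfamiliar with the MAP-Elites scheme cited above, the algorithm keeps an archive holding the best solution found for each cell of a discretized behavior space, so selection preserves diversity as well as quality. The sketch below runs it on a deliberately trivial problem; the descriptor, binning, and budget are illustrative choices of ours, unrelated to the GP-tree ensembles of [13].

```python
import random

BINS = 10  # number of cells in the one-dimensional behavior space

def descriptor(sol):
    # Behavior descriptor: which tenth of [0, 1) the first coordinate falls in.
    a, _ = sol
    return min(int(a * BINS), BINS - 1)

def fitness(sol):
    a, b = sol
    return a + b

def map_elites(budget=2000, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)
    for _ in range(budget):
        if archive and rng.random() < 0.9:
            # Mutate an elite picked uniformly from the archive.
            _, (a, b) = rng.choice(list(archive.values()))
            sol = (min(max(a + rng.gauss(0, 0.1), 0.0), 0.999),
                   min(max(b + rng.gauss(0, 0.1), 0.0), 1.0))
        else:
            sol = (rng.random(), rng.random())  # occasional random restart
        cell, fit = descriptor(sol), fitness(sol)
        # Keep the candidate only if its cell is empty or it beats the incumbent.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, sol)
    return archive

archive = map_elites()
print(f"{len(archive)} cells filled out of {BINS}")
```

The result is not one champion but a whole repertoire of diverse elites, one per cell, which is precisely the property that makes the scheme attractive for building ensembles of dissimilar GP trees.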

To conclude, we believe that GP has so far failed to be adopted by the mainstream community, especially in industrial contexts. However, this is not just due to the relative effectiveness of the different techniques: DL/ML scholars managed to advertise their successes and coalesce a large community of practitioners around their algorithms. Just like some researchers kept the fire going while interest in neural networks waned during the 1990s and 2000s, the GP community should keep working to make the world aware of the potential of this approach. After 30 years, the time seems ripe for GP applications; moreover, the rising ML/DL wave is creating vast search spaces that need to be effectively explored (neuro-evolution, diversity of boosted trees); and with DL models being complete black boxes, there is a growing need for explainability, a call for symbolic or neuro-symbolic AI, where GP could provide good solutions, especially in areas where DL fails, like the Abstraction and Reasoning Corpus benchmark [15]. A new GP spring may very well be looming on the horizon.