Table 5 shows the results in terms of MAE for all methods using the generational, steady-state, and (\(1+\lambda\)) EAs on the regression benchmarks. Table 6 shows the percentage of correct bits and Computational Effort for all techniques on the digital circuits. As GP is only defined for single-output problems, we did not run it for the adder and multiplier circuits.
Comparison between evolutionary algorithms
Based on Table 5, the following observations can be made for solving regression problems:
The tendencies in the results for pagie1, nguyen, and real-world benchmarks are different and can be better analysed separately.
The (\(1+\lambda\)) EA is consistently the best scheme among all optimization algorithms for the pagie1 benchmark. While EGGP excels, the differences between the remaining algorithms and evolutionary schemes are rather small.
For the nguyen3 and nguyen5 benchmarks, generational GP and steady-state LGP are better than the remaining optimization algorithms.
For the nguyen7 benchmark, results among the optimization algorithms and evolutionary schemes are similar. GP and LGP-micro perform best regardless of the evolutionary scheme, and the remaining optimization algorithms follow closely.
For the nguyen benchmarks, the generational EA works best for GP and LGP-micro, while the steady-state EA works best for LGP and the (\(1+\lambda\)) EA for CGP as well as EGGP.
For the real-world datasets, the generational algorithm worked best for GP, CGP, and EGGP, the only exception being the dataset yacht for CGP and EGGP. For LGP and LGP-micro, however, the (\(1+\lambda\)) EA worked better, but the difference in comparison to the other EAs was small for LGP-micro.
For the Boolean benchmarks in Table 6, the following observations can be made:
The (\(1+\lambda\)) EA is consistently and by far the best evolutionary scheme for all optimization algorithms and benchmarks.
The generational and steady-state EAs perform similarly and do not scale well on the even-parity benchmarks.
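Computational Effort is not defined in this excerpt; assuming it follows Koza's standard measure for GP benchmarks, it can be written as follows, where \(P(M,i)\) is the cumulative probability of finding a solution by generation \(i\) with population size \(M\), and \(z\) is the target confidence (commonly \(z=0.99\)):

\[
R(M,i,z) = \left\lceil \frac{\ln(1-z)}{\ln\bigl(1-P(M,i)\bigr)} \right\rceil, \qquad
\mathrm{CE} = \min_i \; M \cdot (i+1) \cdot R(M,i,z).
\]

Intuitively, \(R\) is the number of independent runs needed to succeed with confidence \(z\) when stopping at generation \(i\), and CE is the cheapest such evaluation budget over all stopping generations.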
We show in Table 7 the results of statistical comparisons between the generational, steady-state, and (\(1+\lambda\)) EAs for all methods. For each pair of EAs and benchmark category, we show the mean ranking for the EAs and the p-value resulting from a Friedman test, following the approach in  for comparison of multiple algorithms on multiple datasets.
The rankings confirm our observations: for the symbolic regression problems, the generational EA worked best for GP and LGP-micro, the steady-state EA for LGP, and the (\(1+\lambda\)) EA for CGP and EGGP. For the real-world regression datasets, the generational EA worked best for GP, CGP, and EGGP, but the (\(1+\lambda\)) EA was the best for LGP and LGP-micro. For evolving digital circuits, the generational and steady-state EAs are similarly ranked, and the (\(1+\lambda\)) EA has the best rank for all combinations of algorithms and problem instances.
Most p-values are greater than 0.05 and thus not statistically significant. The exceptions are CGP and EGGP on the symbolic regression functions ((\(1+\lambda\)) with the best rank), and GP, LGP, and LGP-micro on the real-world regression datasets (generational with the best rank for GP and (\(1+\lambda\)) for LGP and LGP-micro). For regression, this outcome was expected, as results are sometimes mixed and vary between problem instances. For digital circuits, the three EAs (generational, steady-state, and (\(1+\lambda\))) perform similarly in terms of percentage of correct bits for simpler circuits, but differ when we look at the CE. For example, LGP-micro achieves a performance of 1.0 for all EAs on functions par3, 4, and 5, but the CEs for the (\(1+\lambda\)) EA are much lower (Table 6). Even when the results differ, the difference is not always extremely large (for example, CGP and EGGP on mult3 in Table 6), although there is a clear difference if we look at the CE.
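The ranking procedure can be sketched as follows: rank the competing EAs within each problem (lower MAE gets rank 1), average the ranks per EA across problems, and run a Friedman test on the paired measurements. The MAE values below are made up for illustration; the actual data is in Table 5.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Hypothetical MAE results: rows = benchmark problems,
# columns = the three EAs (generational, steady-state, (1+lambda)).
results = np.array([
    [0.12, 0.15, 0.10],   # problem 1
    [0.30, 0.28, 0.25],   # problem 2
    [0.05, 0.07, 0.04],   # problem 3
    [0.22, 0.21, 0.20],   # problem 4
])

# Rank within each problem (lower MAE = rank 1), then average per EA.
ranks = rankdata(results, axis=1)
mean_ranks = ranks.mean(axis=0)

# Friedman test over the paired per-problem measurements.
stat, p = friedmanchisquare(*results.T)
print("mean ranks:", mean_ranks, "p =", round(p, 4))
```

With these toy numbers the (\(1+\lambda\)) column has mean rank 1.0, i.e. it wins on every problem; the p-value then says whether such a ranking pattern is plausible under the null hypothesis of no difference between the EAs.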
We show in Tables 8 and 9 a statistical comparison of selected methods on each individual problem based on a Mann-Whitney U test and the Vargha and Delaney A measure, in order to assess possible statistical differences that were not captured by the Friedman test. We focus here on a comparison between the generational and the (\(1+\lambda\)) EAs, as the generational EA worked best for regression in some cases, while the (\(1+\lambda\)) EA worked best in other cases, and clearly produced the best results for all digital circuits problems.
From Table 8, we confirm that, on individual problems, the generational EA statistically outperforms the (\(1+\lambda\)) EA for GP and LGP-micro, with some large effect sizes. Whereas for CGP the differences are not significant on the symbolic regression functions, for EGGP the (\(1+\lambda\)) EA is statistically better than the generational EA on all problems, with mostly moderate effect sizes. For the real-world regression datasets, on the other hand, CGP and EGGP under the generational EA outperform the (\(1+\lambda\)) EA with large effect sizes. From Table 9, it is clear that the improvement of the (\(1+\lambda\)) EA over the generational EA is statistically significant for all methods on almost all problems, with many large effect sizes.
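For a single problem, the pairwise comparison reduces to a Mann-Whitney U test plus the Vargha-Delaney A measure, which equals \(U_1/(nm)\): the probability that a random run of the first method beats a random run of the second (0.5 means no effect; values near 0.29/0.71 are conventionally called large). The run results below are fabricated for illustration only.

```python
from scipy.stats import mannwhitneyu

# Hypothetical final MAEs from repeated runs of two EAs on one problem.
generational    = [0.10, 0.12, 0.11, 0.13, 0.09, 0.14, 0.10, 0.12]
one_plus_lambda = [0.15, 0.16, 0.14, 0.17, 0.13, 0.18, 0.16, 0.15]

u, p = mannwhitneyu(generational, one_plus_lambda, alternative="two-sided")

# Vargha-Delaney A: U1 normalized by the number of sample pairs.
# A < 0.5 means the first sample tends to be smaller (lower MAE = better).
a_measure = u / (len(generational) * len(one_plus_lambda))
print(f"U = {u}, p = {p:.4f}, A = {a_measure:.3f}")
```

Here the generational runs have clearly lower MAEs, so A is far below 0.5, mirroring the kind of large effect sizes reported in Table 8.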
Based on these results, we can say that the results for the regression problem class are more mixed and depend on the combination of optimization algorithm and problem instance. For the digital circuits, however, the results fully support that the use of the (\(1+\lambda\)) EA causes a significant improvement in performance for this benchmark class, regardless of the representation being used, which suggests that solutions to these benchmark problems benefit from intensive exploitation. Similar conclusions were reached by Kaufmann and Kalkreuth in their parameter studies [15, 16]: increasing exploitation by reducing \(\lambda\) toward 1 achieved the best convergence rates over a wide range of Boolean benchmarks.
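To make the scheme concrete, a minimal (\(1+\lambda\)) EA on bitstrings might look as follows. This is a generic sketch on a toy OneMax objective, not the authors' implementation; note the `>=` acceptance, which permits the neutral moves discussed later.

```python
import random

def one_plus_lambda(fitness, genome_len, lam=4, budget=10_000, seed=0):
    """Minimal (1+lambda) EA on bitstrings with per-bit mutation rate 1/n."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(genome_len)]
    best, evals = fitness(parent), 1
    # Stop at the evaluation budget or at the known optimum of this toy problem.
    while evals < budget and best < genome_len:
        # Create lambda offspring by flipping each bit with probability 1/n.
        children = [[b ^ (rng.random() < 1.0 / genome_len) for b in parent]
                    for _ in range(lam)]
        evals += lam
        f, child = max((fitness(c), c) for c in children)
        if f >= best:          # ">=": equal fitness is accepted (neutral drift)
            parent, best = child, f
    return parent, best, evals

# Toy objective (OneMax) standing in for a circuit fitness function.
_, fit, evals = one_plus_lambda(sum, 20)
print(fit, evals)
```

The single-parent structure and the acceptance of equally fit offspring are what make this scheme so exploitative compared to the generational and steady-state EAs.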
Comparison between graph-based GP methods
In this section, we focus on the comparison between LGP, LGP-micro, CGP, and EGGP when the same evolutionary algorithm is used. From Table 5, we make the following observations for the comparison of the graph-based methods on the regression problems:
When the generational EA is used, LGP-micro has the best performance for the symbolic regression functions, whereas CGP and EGGP present the best performance for the real-world datasets. For the symbolic regression functions, LGP, CGP, and EGGP present mixed results dependent on each problem. For the real-world datasets, LGP shows a dramatic decrease in performance when compared to LGP-micro and the other graph-based methods.
With the steady-state EA, LGP produces the best results for the symbolic regression functions and CGP and EGGP for the real-world datasets. LGP-micro, CGP, and EGGP show again mixed results on the symbolic regression functions, while LGP again performs much worse in comparison to the other graph-based methods.
For the (\(1+\lambda\)) EA, results are also mostly mixed, but EGGP has the lowest MAEs on the symbolic regression functions. On the real-world datasets, LGP still shows some markedly higher MAEs.
LGP was the only graph-based method that was able to achieve a near-optimal fitness on function nguyen5. As GP also has a good performance for this function, finding the optimal solution, this could suggest that this function benefited from a macro operator at the program level (crossover in GP and macro-mutation in LGP).
For the evolution of digital circuits, we focus on the performance of algorithms when the (\(1+\lambda\)) EA is used, as it by far outperformed the generational and steady-state EAs (Sect. 4.1). According to Table 6, the results are the following:
For multi-output benchmarks (adder and multiplier circuits), CGP and EGGP scale similarly well, with LGP-micro lagging slightly behind. LGP has the worst performance.
For the parity benchmarks, LGP-micro performs best. EGGP follows closely, while CGP does not scale well with an increasing number of inputs. As an exception, EGGP presents a lower CE value for par7.
In Table 10, we show the rankings and Friedman p-values for a comparison between the graph-based GP methods with the same evolutionary algorithm. As the difference between the generational and steady-state EAs was not clear, we show here results only for the generational and (\(1+\lambda\)) EAs. For the symbolic regression functions, the rankings confirm that LGP-micro achieves the best result with the generational EA and EGGP with the (\(1+\lambda\)) EA. On the other hand, on the real-world regression datasets, the best result using the generational EA was obtained by EGGP, and by CGP when the (\(1+\lambda\)) EA is used. CGP and EGGP are the better-ranking methods when the (\(1+\lambda\)) EA is used for the adder and multiplier circuits, but all ranks are similar for the even-parity functions. This time, no Friedman p-value is significant. Again, this is because all these methods perform well in terms of percentage of correct bits (Table 6), and the difference between them lies more in the computational effort.
In Tables 11 and 12, we again show a Mann-Whitney U and A measure analysis on all individual problems for selected methods. For regression, we show a comparison between LGP-micro, CGP, and EGGP using the generational and (\(1+\lambda\)) EAs, as both EAs performed well depending on the graph-based method used. For the digital circuits, as the (\(1+\lambda\)) EA was the clear winner, we show the comparison only for it.
From Table 11, we see that the better performance of LGP-micro using the generational EA on the symbolic regression functions is statistically significant, with some large effect sizes. When the (\(1+\lambda\)) EA is used, CGP and in particular EGGP statistically outperform LGP-micro. The difference between CGP and EGGP is sometimes significant, but only with low effect sizes. On the real-world regression datasets, CGP and EGGP again outperform LGP-micro with some large effect sizes using the generational EA. When the (\(1+\lambda\)) EA is used, the differences are significant and with large effect sizes, although, as the results from Table 5 are mixed, this still provides no conclusive insight. For the digital circuits (Table 12), most differences are not detected as statistically significant, and even fewer show large effect sizes. As discussed previously, this is because all methods perform similarly well in terms of the quality of the final solution found, although they differ in how many evaluations they need to find it (CE values in Table 6).
In summary, results are quite mixed and context dependent for symbolic regression, although LGP-micro with a generational EA performed the best for the symbolic regression functions and EGGP with a generational EA for the real-world datasets. For digital circuits, results are clearer, with EGGP being the best method but LGP-micro outperforming it on all but one even-parity function.
Based on these results, we recommend using LGP with a fixed-size genotype and mutations that change only functions inside instructions or connections (LGP-micro), as is done in CGP and EGGP; this becomes evident when looking at the results from LGP on the real-world regression datasets (Table 5). As the difference between LGP-micro and CGP lies in the representation, we claim that the representation in LGP, where the number of registers (10 + #Inputs) is much lower than the genotype size and registers can be overwritten, can be a disadvantage. However, LGP-micro performed better on the even-parity benchmarks, even though CGP and EGGP outperformed it on the adder and multiplier circuits. As all configurations were the same between the two experiments and the three algorithms, one hypothesis is that the even-parity benchmarks benefit from more sharing of results: less sharing occurs in CGP and EGGP, as any node can use any of the previous nodes as arguments, whereas in LGP this is limited by the number of available registers, which is significantly lower than the total number of instructions. We examine this hypothesis in Sect. 4.4.
Comparison with tree-based GP
In order to assess the impact of the graph representation when the same evolutionary algorithm and similar configurations are used, we compare GP with the graph-based methods, using the EA that worked best on each benchmark class: LGP-micro with the generational EA for the symbolic regression functions, EGGP with the generational EA for the real-world regression datasets, and LGP-micro with the (\(1+\lambda\)) EA for the even-parity circuits. From Table 5, apart from pagie1, GP performs better than LGP-micro with the generational and steady-state EAs. When the (\(1+\lambda\)) EA is used, GP has better results on nguyen3 and nguyen5. With the exception of the concrete dataset when the (\(1+\lambda\)) EA is used, EGGP outperforms GP on all real-world regression datasets, with some large improvements in MAE. Looking at Table 6, LGP-micro outperforms GP on all parity functions, both in terms of percentage of correct bits and in terms of Computational Effort, which shows that the graph representation presents a great advantage in this benchmark class.
Tables 13 and 14 show the effect sizes for a statistical comparison between GP and LGP-micro/EGGP. On the regression benchmarks, GP is in general statistically better than LGP-micro, with some large effect sizes. LGP-micro was better on pagie1 using the steady-state EA and on nguyen7 using the (\(1+\lambda\)) EA, although the effect sizes are not large. EGGP is statistically better than GP on the real-world regression datasets, with large effect sizes under the (\(1+\lambda\)) EA. On the even-parity circuits, almost all differences between GP and LGP-micro were significant and with very large effect sizes. When the (\(1+\lambda\)) EA is used, GP performs better than before, but is still outperformed by LGP-micro from par5 onward.
In conclusion, the graph representation was a disadvantage for the symbolic regression problems considered here. On the other hand, it outperformed trees on the real-world regression datasets, which are much more difficult problems based on the error values obtained. This suggests that, although the results for the regression problem class are quite mixed, the graph representation has the potential of improving results, especially for more complex problems.
Graphs were also able to outperform trees on the digital circuit benchmarks regardless of the EA being used. Further, the magnitude of the increase in performance grows with the complexity of the function, and also when graphs are combined with the (\(1+\lambda\)) EA (par6 and par7 in Table 6). Thus, the graph representation has features that are advantageous for evolving digital circuits, and the (\(1+\lambda\)) EA is capable of better exploiting these features. As the (\(1+\lambda\)) EA performs more local search, one of these features may be neutral genetic drift, which occurs more frequently in graph representations due to mutations in inactive portions of the genotype. This is in accordance with publications examining the search space of the task of evolving circuits and showing that allowing neutral genetic drift helps on these benchmarks in CGP [22, 30, 36]. Thus, as shown by our results, even if we change GP to work with the (\(1+\lambda\)) EA, graph-based methods are still able to outperform it on digital circuit benchmarks. Sotto and Rothlauf also show in  that increasing mutations on inactive instructions slightly improved search performance for some symbolic regression benchmarks. As the authors of that publication used the standard EA for LGP, which is the steady-state EA, the effect of neutral search would likely be amplified in combination with the (\(1+\lambda\)) EA, especially for evolving digital circuits.
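The mechanism behind neutral drift can be illustrated with a toy graph genotype: only nodes reachable from the output are active, so mutating an unreachable node changes the genotype without changing the program's behaviour, which is exactly the equal-fitness move a (\(1+\lambda\)) EA can accept. The encoding below (a flat list of function/connection triples) is illustrative only, not any specific CGP or EGGP implementation.

```python
def active_nodes(genome, n_inputs, output):
    """Indices of genome nodes reachable from the output node.

    Indices 0..n_inputs-1 are program inputs; node i is stored at
    genome[i - n_inputs] as a (function, arg1, arg2) triple.
    """
    active, stack = set(), [output]
    while stack:
        idx = stack.pop()
        if idx < n_inputs or idx in active:
            continue  # program inputs have no genes; skip already-visited nodes
        active.add(idx)
        _, a, b = genome[idx - n_inputs]
        stack.extend([a, b])
    return active

# Two inputs (indices 0, 1), three nodes (indices 2, 3, 4), output = node 4.
genome = [("and", 0, 1),   # node 2
          ("or",  0, 0),   # node 3: never referenced, so inactive
          ("xor", 2, 1)]   # node 4: reads node 2 and input 1
print(sorted(active_nodes(genome, 2, 4)))   # node 3 is missing -> [2, 4]
```

Any mutation applied to node 3 here is phenotypically silent; a scheme that accepts equal-fitness offspring will carry such mutations along, letting the search drift across neutral networks in the genotype space.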
Number of registers and levels-back parameter
In Sect. 4.2 we hypothesized that the better performance of LGP-micro on the parity functions lies in its small number of registers. A small number of registers forces evolution to reuse intermediate results more frequently, which in turn helps optimization to develop more complex solutions quicker. To elaborate on this idea, we fix the evolutionary algorithm to the (\(1+\lambda\)) EA, as it performed best on the parity functions, and carry out two experiments. In the first experiment, we measure the performance of LGP-micro on the parity benchmarks using a number of registers rising from 1 to 100 in steps of 2. In the second experiment, we test the “intermediate results reuse” factor for CGP. CGP implements the levels-back parameter \(l\) which, similarly to the number of registers in LGP, can control the use of intermediate results. Measuring the performance of CGP for \(l=1\dots 100\) with a step of 2 helps us see whether restricting the levels-back parameter shows a specific behaviour, how this behaviour compares to restricting |R| for LGP-micro, and how the results compare to the previous experiments with \(l=\infty\).
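The role of \(l\) can be sketched as follows for a single-row CGP (one node per column). Whether program inputs remain reachable regardless of \(l\) varies between implementations; keeping them always reachable is an assumption of this sketch.

```python
def cgp_sources(node_idx, n_inputs, levels_back):
    """Indices a single-row CGP node may read from under levels-back l.

    Program inputs occupy indices 0..n_inputs-1 and are assumed always
    reachable; internal node j (column j) has index n_inputs + j.
    """
    inputs = list(range(n_inputs))
    lo = max(0, node_idx - levels_back)
    nodes = [n_inputs + j for j in range(lo, node_idx)]  # preceding columns in range
    return inputs + nodes

# A node in column 50 with l=10 can reuse only the 10 preceding nodes plus
# the 4 program inputs; with l=100 it can reuse all 50 preceding nodes.
print(len(cgp_sources(50, 4, 10)))    # 4 inputs + 10 nodes = 14
print(len(cgp_sources(50, 4, 100)))   # 4 inputs + 50 nodes = 54
```

Restricting \(l\) thus forces nodes to feed on recently computed results, much as a small register file in LGP forces registers to be overwritten and their contents reused.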
Figure 5 shows the development of the CE for LGP-micro and CGP when sweeping R and l from 1 to 100. All remaining algorithm parameters are set to the same values as in previous experiments. The following observations can be made:
There is an optimal interval for R and l. LGP-micro shows the best performance for \(R\in [10,15]\) and CGP for \(l\in [15,25]\). Because in previous experiments we configured R for LGP-micro almost optimally based on the literature, but selected for CGP the common yet vastly suboptimal \(l=n_c=100\), CGP underperformed. Given a better configuration of l, CGP should perform similarly to LGP-micro and EGGP in Table 6.
The more complex a parity function gets, the more sensitive the settings of R for LGP-micro and l for CGP become. For LGP-micro, the optimal interval for R gradually widens from [10, 13] to [10, 20] for par7, par6, par5, par4, and par3, in this order. CGP is more robust to misconfigured values of l: for par3 and par4 there are no large differences in performance for \(l>20\), but for the larger parity functions the CE rises significantly for \(l>20\).
These results confirm that greater reuse of intermediate results is beneficial for complex parity problems, and that LGP and CGP provide mechanisms to control this reuse. The fact that LGP is less robust to higher values of R may be a consequence of registers being overwritten, as then two factors decrease the reuse of intermediate results: more instructions available from the beginning of programs, and overwritten results.
The similar impact of the configuration parameters R of LGP-micro and l of CGP indicates that these DAG-based approaches probably deploy very similar mechanisms and are in fact two different forms of the same principle. Similar insights have been observed in a more detailed work in .