This section starts by looking at some technical implementation choices and then discusses the results and visualisations.
The implementation is not a stand-alone program but a set of scripts, mainly written in the R programming language but also in Python, in order to be easily integrated within current workflows for local optima network analysis. The implementation choices allow for programmatically generating and manipulating large networks and their static visualisations at the expense of more interactive approaches, for instance based on GUI network visualisation solutions such as Cytoscape  or Gephi .
The networks are built with the igraph library  in R. All the layouts are generated in two dimensions using either the Dr.L force-directed layout algorithm, as implemented in igraph, or the t-SNE algorithm, as implemented in the scikit-learn package in Python (and with the default parameters). This choice was made because scikit-learn provides the ability to easily change the dissimilarity metric used in the algorithm: the Hamming distance in our case instead of the Euclidean distance. The 3D networks are created by adding fitness as a third dimension, or height, to the 2D layouts. Rendering the images is carried out in R using the ggplot2 package to create the scatterplots and the rgl package to create OpenGL 3D output.
The following encompasses a number of different aspects of our results. We first look at the characteristics of the visualisations (Sect. 6.2.1) and then discuss how they relate to the network objects when considering the ILS samples (Sect. 6.2.2). This is followed by examining the influence of different subsamples on the visualisations (Sect. 6.3) and a comparison between the ILS and Hybrid GA samples (Sect. 6.4).
Figures 4 (triangle program with comparison operators only), 5 (triangle program with comparison and Boolean operators), 6 (tcas program with comparison operators only), 7 (tcas program with comparison and Boolean operators) show visualisations of the LONs. The figures show a subsample of the first 100 runs of the original sample and the first 1000 iterations of each of these. The scatter plots are the layouts generated by the t-SNE algorithm. Nodes in the 3D images are not displayed to minimise occlusion. However, the edges by themselves provide insight into the nature of the landscapes. Edges between global optima are painted red and edges between local optima of equal fitness are painted grey. Edges between local optima with different fitness are painted black.
Characteristics of the visualisations
The force-directed layouts (subfigures (a) and (b) in Figs. 4, 5, 6, 7) display fairly symmetric layouts which are characteristic of the force-directed algorithms. Since no information about the solutions is taken into account when computing the layout, the different plateaus usually appear more-or-less on top of each other. Furthermore, edges linking different plateaus are generally vertical. This may influence the viewer into thinking, wrongly, that solutions at each end of those edges are always genotypically close.
On the other hand, the t-SNE algorithm uses the similarity between the solutions to infer meaningful coordinates (subfigure (c) in Figs. 4, 5, 6, 7). In each of the scatter plots, clear local and global structures emerge. Locally, there are a number of worm-like artefacts which are composed of series of consecutive and very similar points generated by the sampling process. This happens because the sampling process generates monotonic sequences of solutions and accepts non-worsening moves (solutions of same fitness). In addition, the first degree—or 1-move—neighbourhood is considered for both mutation and hillclimbing, producing generally small local steps in which two consecutive solutions are very similar to each other. Since the t-SNE algorithm relies on the similarity between points to determine their position relative to each other—and the measured Hamming distance similarity corresponds to as many 1-moves—this translates into these worm-like structures.
More globally, solutions with the same fitness seem to cluster together, indicating plateaus of solutions that occupy different areas of the search space.
When fitness is added as a third dimension to the t-SNE layouts (subfigures (d) and (e) in Figs. 4, 5, 6, 7), we can observe a more accurate picture of the sampled search space than when the force-directed layout was used. In particular, it is visible that moving from one plateau to another often involves moving to an altogether different part of the landscape. This naturally creates a large number of crossing edges which may be considered less aesthetic but which convey more information. There would, however, be a point, as the number of displayed edges increases, where the visualisation would become an uninformative hairball. One way around this, but which is beyond the scope of this paper, may be edge bundling , where related edges are routed along similar paths, thus minimising edge clutter.
Relating the visualisations to the network objects
Table 2 reports the main characteristics of the LON graphs extracted from the benchmark problems described in Sect. 3.1. The sampling procedure yielded, in all cases, graph sizes in the order of one million edges, which are non-deteriorating transitions between local minima. The actual number of distinct local minima visited during the search, that is, the number of nodes in the graph, is also in the order of one million for triangle.c, and one order of magnitude less in the case of tcas.c. In particular, allowing mutations to both comparisons operators and Boolean operators, increases the size of the search space and the number of local minima.
In all benchmarks, LONs are rather sparse but present patterns of local connectivity. In fact, the clustering coefficient, that is, the average proportion of transitive closures among the neighbours of a vertex, is always around four orders of magnitude higher than the overall network density. That is, connections between nodes that already share a neighbour, are orders of magnitude more frequent than connections in general, which could be explained by the fact that the LON graph displays the traces of iterated local search trajectories.
However, the great majority of these local connections happen on the plateaus that are clearly visible in Figs. 4, 5, 6, 7. Indeed, considering the sampled non-deteriorating moves, more than \(99\%\) of the times a transition out of a local minimum leads to another local minimum with the same fitness value. That applies to both problems and both mutation operators subsets.
These plateaus at the LON level are also known as meta-plateaus. Let us note that, in general, meta-plateaus do not necessarily indicate plateaus at the solution level. This is because the standard fitness landscape model considers a single neighbourhood relation N while the LON model considers at least two neighbourhoods: one for the definition of local optima (N) and another for the edge transitions. It follows that two connected solutions of same fitness in a LON may not be part of the same plateau in the underlying landscape defined by N. However, in the present study, the same neighbourhood is used for both the hillclimber and mutation, therefore blurring the difference between a plateau at the landscape level and a meta-plateau at the LON level.
The triangle program LONs are made up of 6 (fitness 0–5) large plateaus relatively well-connected between each other—and there is a tiny fitness 6, easily escapable, plateau when both comparison and Boolean operators are considered (Fig. 5). When mutations on Boolean operators are introduced, more ILS runs get stuck at fitness 2 and are not able to progress to the global optima level. The ILS success rate, i.e., the proportion of runs that reach a global optimum, is measurably lower (87 vs. 31%).
For the tcas program, the maximum observed fitness in the networks is 264. For both programs, we thus have plateaus that are well below the maximum fitness of 14 and 1578. Since random, not locally-optimal solutions, are often close to those maximum values, this indicates that it is fairly easy to improve the solution fitness with a simple hill-climber, at least initially.
The tcas program LONs display more difference between them. Perhaps surprisingly, the variant that only considers comparison operators shows fairly well-defined plateaus—the three main ones have fitness 0, 144 and 264. This may be an artefact of some interaction between mutations. The study of these interactions is beyond the scope of this paper but seems to be an interesting area for future research. The variant with both comparison and Boolean operators shows more “steps” along the different runs and, therefore, less well-defined plateau structures. This potentially means that finding improving solutions, and ultimately a global optimum, is easier. Whilst the ILS success rate for both variants is quite high, there is a marked improvement for the second variant (from 94 to 98%).
Let us observe that there is a high number of global optima, i.e., solutions that are test-equivalent to the original programs. This may mean that the programs are quite robust—we have not tested this hypothesis—or that the test suite does not provide enough coverage.
In terms of global connectivity, we can observe that a path between any pair of nodes is not always present, even if we disregard the direction of the edges. That is, the networks break down into a number of weakly-disconnected components, especially when the larger search space of comparison and Boolean operators is considered. Nonetheless, more than \(92\%\) of the local minima we observed belong to a single, largest connected component (Table 2). Moreover, a similar high fraction of all local minima lie on paths that could eventually descend to a global optimum.
By following the steepest descent directions on the LON, we can also detect the presence of multiple attractors with no non-deteriorating transitions around them, i.e., dead-ends for the search. Their number is indicative of the multi-funnel global structure of the landscapes [26, 28], and may directly relate to the empirical problem hardness from the point of view of an Iterated Local Search . Among the four benchmark instances, the one with the lowest success rate also has multiple sub-optimal attractors. Indeed, almost all its local minima have access or belong to the funnel containing the global optima, but we hypothesise that, given the ILS stopping criterion, the actual success rate might depend on how easy it is for the search to find exits across plateaus and gain access to better (lower) fitness levels within the budget of function evaluations. As it can be visually appreciated on Fig. 5, the “hardest” instance is also, notably, the one with fewer such connections across the lowest fitness levels.
More generally, the notion of difficulty we consider is based on how easy it is for the algorithm to reach a global optimum. This is influenced—for both the mathematical object that is the LON and its visualisation—by the number of paths that lead to global optimum. Another factor associated to difficulty is whether an algorithm will remain stuck in a local optimum and whether there are regions of the landscape that tend to concentrate those deceptive solutions. A quick visual assessment for difficulty can therefore be to observe the connectivity patterns, or absence thereof, leading to the global optima or to local optima that cannot be escaped. The link between different network metrics and search difficulty has been quantified in other contexts, for instance in [27, 33]. The visualisations presented here provide an immediate translation of the network objects, revealing different connectivity patterns. In the future, it would be interesting to assess how researchers and practitioners in the field of local search search and metaheuristics interpret these visualisations and whether that relates to their intuitions on search difficulty and the objective metrics that can be computed on a landscape.
Influence of subsamples on visualisations
In general, search landscapes, or even local optima networks, for non-trivial problem instances cannot be enumerated. Sampling is therefore required. Furthermore, the amount of information that can be visually represented in a coherent manner is also limited. In addition, the larger the number of points, the more it becomes computationally expensive to generate a layout. In our experience, we can usually sample networks at a much finer granularity than what can be represented visually on print medium or via non-interactive representations on screen that are discussed here. This means that one or more subsamples from the initial sample are examined. We look into some of the issues of subsampling here.
The networks that were considered in the previous subsection are based on specific subsamples. In the current subsection, we assess how subsampling at the same level and at a higher level influences the visualisations.
Figures 8, 9 and 10 present subsamples of the triangle program where only mutations of comparison operators are considered. Each subsample consists of a different set of 100 runs of 1000 iterations each. The figures show, in order, the t-SNE 2D layouts, force-directed 3D layouts and t-SNE 3D layouts of the subsamples. These are visually very similar. The t-SNE 2D layouts (four of them are shown in Fig. 8) exhibit similar local and global structures across the subsamples (modulo rotation and reflection symmetry). The force-directed 3D layouts (two of them are shown in Fig. 9) are almost indistinguishable at first glance, however the t-SNE 3D layouts (two of them are shown in Fig. 10) appear more different. This is due to different viewing directions—they are based on the t-SNE 2D layouts which are similar. These observations point to the relative robustness of the visualisations for this subsampling level. Similar observations can be made for the other networks but are not visualised here due to space constraints.
Figure 11 considers larger subsamples of 1000 runs of 1000 iterations for each of the four benchmarks and presents them as 2D t-SNE scatterplots. The number of solutions is indicated in the figure’s captions. Our current R implementation is unable to scale to generate 3D plots of such large networks. The scatterplot visualisations for these larger subsamples are naturally denser than for smaller subsamples because of the larger number of points displayed. There are some notable differences between the larger subsamples and their smaller counterparts. This is especially true for the triangle program where only comparison operators are mutated (Fig. 11a). In this case, the two visualisations are completely different and this highlights that any interpretation of the landscape structure visualisations needs to be carefully considered. One of the reasons for the marked contrast is the difference in the distribution of fitness across both subsamples: notably, the proportion of globally optimal solutions (red) in the smaller subsample is much higher than in the larger subsample (4275/64876 vs. 9211/599340 or 6.6 vs. 1.5 %). This difference stems from the fact that the search algorithm is meant to find good solutions, and global optima in particular, and that there is a fixed number of relatively easily discoverable global optima. It is therefore likely for multiple runs to end up discovering the same very good solutions. However, the solutions discovered during the initial part of the search process will almost surely be different for each run as each starts from a totally random solution. When the subsampling size increases, it is much more likely that more poorer quality solutions will be discovered than new global optima, thus explaining the different ratios.
The difference is less notable for the other benchmarks. For the triangle program with mutations of both comparison and Boolean operators, the more local structures are lost because of the higher point density but there is still a similar segregation of points based on fitness. It is interesting to observe that for the two tcas variants, the structures are visually largely similar, especially for lower fitness levels. This is partly due to the lower number of points generated in the tcas samples which reduces the blurring effect that large numbers of points produce when plotted in a restricted area.
Comparing search techniques
The sampling algorithm naturally biases the networks that are generated. It is therefore interesting and useful to compare and contrast samples generated by different algorithms, and their visualisations. We consider here the networks generated by the Hybrid GA detailed in Sect. 4.2 and Table 3 presents some of their key metrics.
The samples for the Hybrid GA aggregate 1000 runs of 50 generations with a fixed population size of 400, a crossover probability of 0.5 and no mutation. This crossover probability means that roughly 200 new solutions may be created per generation, and thus 10000 per run. The samples for the ILS considered previously aggregated 1000 runs of 10000 iterations each. Naturally, these two sampling techniques produce samples with different characteristics. Notably, the number of nodes and edges generated is higher for the Hybrid GA, especially for tcas with mutations across both comparison and Boolean operators. This difference also translates in terms of number of global optima found—or programs that pass all test cases—except for the triangle program with mutations only on comparison operators where the numbers are the same. In fact, in this scenario both algorithms generated the same set of global optima. Because of the computational complexity we are not able to assess if this is the complete set of globally optimal solutions but it is a possibility. Network density and clustering coefficients are mostly similar across the two sampling techniques. The Hybrid GA samples exhibit lower neutrality, as measured by the neutral degree, which is expected because the ILS has the tendency to explore several solutions in a plateau before being able to escape, which is generally not the case for crossover. A marked difference is in the number of connected components which is drastically higher for the Hybrid GA because it is not a trajectory based method and it employs selection in the parent population. Nevertheless, the relative size of the largest connected component is always well above \(90\%\) of all the local optima encountered. Another major difference with the ILS samples is the percentage of nodes with a path to the global optimum which is much lower for the Hybrid GA. The Hybrid GA samples also exhibit a significant difference in terms of success rate, although both variants of the triangle program remain harder to solve across both algorithms. This suggests that the hybrid approach may be better than ILS for genetic improvement.
Figures 12 and 13 highlight the differences in terms of t-SNE scatterplot visualisations for the triangle and tcas programs respectively. The ILS visualisations reuse the same 100 run subsample as seen previously while the number of runs per subsample of the Hybrid GA, indicated in the caption of each figure, is chosen to roughly match the number of solutions of the ILS subsample. This is in order to stay within a reasonable and comparable number of points.
Since t-SNE takes the dissimilarity between points into account when assigning their coordinates, it is reasonable to compare each sample type side-by-side. If both sampling methods returned roughly the same set of solutions or sets of solutions with very similar characteristics, one would expect the dissimilarities to be roughly equivalent and thus translate to similar visualisations. Here we can see that the two methods yield relatively similar visualisations for the benchmarks that only consider comparison operators. Globally optimal solutions, in particular, exhibit the same clustering patterns and solutions of equivalent fitness are grouped together indicating that they occupy specific areas of the search space. One major difference are the worm-like structures in the ILS sample—artefacts of this single point trajectory sampling methodology—which do not appear in the Hybrid GA samples.
In comparison, the variants that explore the mutations of both comparison and Boolean operators are markedly different, although there is still a differentiation in the areas occupied by various fitness levels. One key difference is the number of globally optimal solutions which is much greater for the Hybrid GA and which therefore influences the layouts produced. One possible explanation for this difference is that there may be disjoint plateaus of globally optimal solutions—or plateaus that are extremely large—of which the ILS is able to explore a smaller subset that the Hybrid GA. In addition, we observed that the Hybrid GA converged very quickly to the globally optima solutions, usually within 10 generations or fewer, strongly suggesting that the landscapes should be quite different.
We do not show any 3D visualisations for the Hybrid GA because the number of crossing edges tends to create massive hairballs that do not provide significant useful information. One potential way to deal with this issue in the future would be to use edge bundling  which routes related edges along similar paths.
In addition to side-by-side comparisons, the t-SNE layouts can be used to overlay two sets of points to highlight similarities and differences. In this scenario, both sets of points are merged and a single layout is computed. Figure 14 shows such an overlay, with 100 runs of the ILS in the background and 2 runs of the Hybrid GA on top. We only show a smaller subset of the Hybrid GA points because otherwise they would cover the ILS points. This serves to highlight the fact that similar areas of the search space are explored by both algorithm. In an interactive context, the overlay could be manipulated dynamically to reveal different levels of information. The 3D view is less informative because of the large number of crossing edges.