Cover-Encodings of Fitness Landscapes

The traditional way of tackling discrete optimization problems is by using local search on suitably defined cost or fitness landscapes. Such approaches are however limited by the slowing down that occurs when the local minima that are a feature of the typically rugged landscapes encountered arrest the progress of the search process. Another way of tackling optimization problems is by the use of heuristic approximations to estimate a global cost minimum. Here, we present a combination of these two approaches by using cover-encoding maps which map processes from a larger search space to subsets of the original search space. The key idea is to construct cover-encoding maps with the help of suitable heuristics that single out near-optimal solutions and result in landscapes on the larger search space that no longer exhibit trapping local minima. We present cover-encoding maps for the problems of the traveling salesman, number partitioning, maximum matching and maximum clique; the practical feasibility of our method is demonstrated by simulations of adaptive walks on the corresponding encoded landscapes which find the global minima for these problems.


Introduction
Fitness landscapes have proved to be a valuable concept in the understanding of adaptation in evolutionary biology and beyond, by visualizing the relationships between genotypes and effective reproductive success (Wright 1932, 1967). This concept has been taken forward in the field of evolutionary computation, where the performance of optimization algorithms utilizing local search has often been described as dynamics on a fitness landscape, see, e.g., the book by Engelbrecht and Richter (2014).
However, fitness functions alone do not determine the performances of local search algorithms, which depend also on the structure of the search spaces involved. These in turn are determined by two largely independent ingredients: (1) the concrete representations of the configurations that are to be optimized, referred to as encodings, and (2) locality in the search space, referred to as a move set.
For many well-studied combinatorial optimization problems and related models from statistical physics (such as spin glasses), there is a natural encoding. For instance, tours of a traveling salesperson problem (TSP) are naturally encoded as permutations of the cities concerned, while spin configurations are encoded as strings over the alphabet {+, −} with each letter referring to a fixed spin variable. This natural encoding is usually free of redundancy; any residual redundancies that occur usually arise from simple symmetries of the problem which can easily be factored out. For instance, TSP tours can start at any city so that they are invariant under rotations, while many spin glass models are invariant under simultaneous flipping of all spins. This natural or "direct" encoding is often referred to as the phenotype space, see, e.g., (Rothlauf 2006;Neumann and Witt 2010;Rothlauf 2011;Borenstein and Moraglio 2014).
In biology, fitness is conceptually understood as a property (function) of the genotype. It depends, however, on properties of higher-level structures such as molecular structure, gene-regulatory networks, tissues, or organs, i.e., on a phenotype. The relationship of genotype and fitness, therefore, is a composition of a genotype-phenotype map and a phenotype-dependent fitness function. This decomposition has been studied extensively in several distinct model systems, including RNA secondary structures (Schuster et al. 1994), gene-regulatory networks (Ciliberti et al. 2007), and metabolic networks (Dykhuizen et al. 1987; Flamm et al. 2010). Here, we focus on the abstract structure rather than the specifics of such models.
For a given encoding, irrespective of whether it is genotypic or phenotypic, the performance of search crucially depends on the move set. Here, we will consider only reversible, mutation-like moves. The search space therefore is modeled as an undirected graph. More general settings are discussed, e.g., by Flamm et al. (2007). The cost function assigned to a specific search space defines a fitness landscape. Evolutionary algorithms can thus be viewed as dynamical systems operating on landscapes, whose structure has, as a consequence, been studied extensively in the field (Reidys and Stadler 2002;Østman et al. 2010;Engelbrecht and Richter 2014).
Continuing the analogy with biology in evolutionary computation, an additional encoding Y, the so-called genotype space, is often used (Rothlauf and Goldberg 2003; Rothlauf 2006). The genotype-phenotype relation is determined by a map α : Y → X ∪ {∅}, where ∅ represents phenotypic configurations that do not occur in the original problem, i.e., y ∈ Y does not encode a feasible solution of the original problem whenever α(y) = ∅. For example, a frequently used genotypic encoding for TSP tours uses binary strings that record, for each possible adjacency between two cities, its presence (1) or absence (0) (Applegate et al. 2006). Most binary strings, however, do not correspond to TSP tours.
In practice, genotypic representations are usually chosen with a high degree of redundancy to tackle optimization problems which often also introduces neutrality, i.e., the appearance of adjacent configurations with the same value of the cost function. Detailed investigations of fitness landscapes from molecular biology have shown that degrees of neutrality can facilitate optimization (Schuster et al. 1994;Reidys and Stadler 2002) due to the inclusion of extensive neutral paths which prevent trapping in metastable states (Schuster et al. 1994;Fernández and Solé 2007;Yu and Miller 2002;Banzhaf and Leier 2006). On the other hand, "synonymous encodings" where genotypes mapping to the same phenotype form tight clusters in the genotype space have been advocated for the design of evolutionary algorithms (Rothlauf 2006;Choi and Moon 2008;Rothlauf 2011). Rather than having neutral paths connecting remote areas of the landscape, cost-equivalent configurations are locally clustered in synonymous encodings.
What is clear is that, empirically, the introduction of arbitrary redundancy (by means of random Boolean network mapping) does not increase the performance of mutation-based search (Knowles and Watson 2002), suggesting that the inclusion of redundancy should be suitably designed in order to facilitate optimization. One such approach was that of Klemm et al. (2012), which emphasized the utility of such inhomogeneous genotype-phenotype maps via the idea that low-cost solutions could be enriched and optimization made more efficient in genotype space if the size of the preimage |α^{-1}(x)| of the phenotypes were anti-correlated with the cost function f(x). Of course, for such anti-correlations to be imposed, α needs to become explicitly dependent on the cost function.

Simplifying Landscape Structure by Encoding
Before delving into the technicalities, we present a conceptual outline of the key ideas of this contribution. Our starting point is the twenty-year-old observation by Ruml et al. (1996) that certain redundant encodings of the Number-Partitioning Problem (NPP) allow simple, generic optimization heuristics to find dramatically improved solutions. In previous work (Klemm et al. 2012) we found that this approach was not limited to the NPP, but that suitably chosen redundant encodings also improved the performance of heuristics on several other combinatorial optimization problems. In the present work, our objectives are to understand (a) why the particular method used by Ruml et al. (1996) works so well and (b) how it can be generalized to essentially arbitrary combinatorial optimization problems in a principled way.
We focus in this contribution on black-box-type optimization scenarios in which the information on the cost function f (x) is exclusively obtained by evaluating it for specific configurations x ∈ X in the search space X . The sequence of these function evaluations is determined by the optimization heuristic. Practical algorithms of this type propose candidates x ∈ X for evaluation based on past evaluation results. These candidates are chosen locally in the vicinity of past successful candidates with the help of rules that depend on the representation of X . This explicitly or implicitly defines a topological structure on X . For the purpose of the present contribution, we assume that the topology of the search space X is expressed by a notion of adjacency that is respected by the search process.
Intuitively, the most important obstruction for local optimization heuristics is the presence of a large number of local optima that trap the search process. The aim of a redundant encoding, therefore, is to provide an alternative representation Y of the optimization problem that reduces the number of local optima and makes it easier to find the globally optimal solution. Formulated over Y, we would wish that (i) neighborhoods in Y are small enough to be searched in practice, and (ii) for every starting point there is a path to the global optimum along which the cost function is decreasing, or at least non-increasing.
Condition (i) ensures that we still deal with local search heuristics, while condition (ii) intuitively makes the landscape easy to search. Note that condition (ii) does not make the optimization problem trivial, since the heuristics still have to find an efficient path among possibly many very long ones. Its real significance is that it rules out traps and guarantees that simple downhill search will be successful eventually.
Is it possible at least in principle to construct such an encoding? The prepartition encoding, which performed best for the NPP (Ruml et al. 1996), provides an important hint. Each particular encoding y ∈ Y corresponds to a restricted version of the original optimization problem, i.e., it can be seen as constraining the original search space X to a subset ϕ(y) ⊆ X. A deterministic approximation is then used to solve the restricted problem on ϕ(y). For every y ∈ Y, this provides an upper bound on the cost function f̃(y). Since the encoding is chosen such that there is also a code ŷ for the global optimum x̂ ∈ X, i.e., ϕ(ŷ) = {x̂}, the task now becomes to find ŷ, which minimizes f̃ by construction. The numerical results by Ruml et al. (1996) suggest that this auxiliary problem of minimizing the cost function of the encoding is much easier than the original, despite the fact that the search space is much larger. Below we show that this is the case because (1) f̃ does a good job at approximating the true solution F̃(y) of the restricted optimization problem on ϕ(y) and (2) the perfect solutions F̃(y) give rise to landscapes with the desired properties mentioned above.
This observation suggests a general construction for "good" landscape encodings. The first step is the construction of a genotype space Y and an encoding scheme ϕ that maps genotypes to restrictions of the original problem rather than a particular phenotype y. This map has to satisfy certain conditions discussed in detail in Sect. 3.2 to be a good choice. The cost function then enters by guiding, for every genotype y ∈ Y , a heuristic that solves the restricted problem ϕ(y).
Following the formal introduction of the general concepts, we construct landscape encodings explicitly for several well-known examples. In Sect. 4, we focus on a particularly useful construction that makes use of the fact that the restricted subproblems on ϕ(y) can be seen as smaller instances of the same type of optimization problem, or alternatively, as coarse-grained problems. We show in particular that the NPP heuristic that motivated our approach is also of this type. In Sect. 5, finally, we use numerical experiments to show that the encoding scheme proposed here also works well in practice.

Landscapes
Formally, an instance (X, f) of a combinatorial optimization problem consists of a finite set X and a cost function f : X → R on X. The task of the combinatorial optimization problem is to find a global minimum x̂ ∈ X of f. A landscape (X, ∼, f) consists of a finite set X endowed with a symmetric and irreflexive (adjacency) relation ∼ and a cost function f : X → R. Note that a global minimum x̂ is not a strict local minimum as defined above.
For any X′ ⊆ X, the restricted problem can be defined analogously.

Oracle Function and Cover-Encoding Map
A key ingredient in our reasoning is to consider the global solutions of restricted optimization problems. This is formalized as follows: the oracle function F of (X, f) is given by
$$F(X') := \min_{x \in X'} f(x)$$
for all X′ ⊆ X. We use the convention F(∅) = ∞.
We say that a subset X′ ⊆ X is good if F(X′) = F(X), i.e., if X′ contains a global optimum, and bad if F(X′) > F(X). The oracle function is by definition monotonic in the following sense:
$$X' \subseteq X'' \implies F(X') \ge F(X'').$$
We call F an oracle function because in general there is no efficient algorithm for computing it. In fact, if we had an efficient way to compute F, we would already have solved the original optimization problem as well. Nevertheless, it is a useful theoretical construct, as we shall see below. First, it guides our construction of encodings of the original optimization problem that have the potential of being easily solved, or at least easier to solve. Second, it provides an inroad for constructing practical heuristics provided we can come up with a good approximation for F.
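As an illustration, the oracle function can be evaluated by brute force on a toy instance. The sketch below assumes that F returns the minimum of f over the given subset, which is what the convention F(∅) = ∞ and the notion of good and bad subsets suggest; the names `oracle`, `is_good`, and the three-element cost table are hypothetical and serve only as an example.

```python
import math

def oracle(f, subset):
    """Brute-force oracle: minimum of f over the subset, infinity if it is empty."""
    return min((f(x) for x in subset), default=math.inf)

# Hypothetical toy instance: three configurations with explicit costs.
cost_table = {"a": 3.0, "b": 1.0, "c": 2.0}
f = cost_table.__getitem__
X = set(cost_table)

def is_good(subset):
    """A subset is good if it contains a global optimum of (X, f)."""
    return oracle(f, subset) == oracle(f, X)

print(oracle(f, {"a", "c"}))            # restricted problem on {a, c}
print(is_good({"b"}), is_good({"a"}))   # good vs. bad subset
```

Enumerating the subset is of course exponential for realistic problems, which is exactly why F is treated as an oracle rather than as an algorithm.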
We start by formalizing the idea of an encoding of a landscape. Let X and Y be finite sets. A cover-encoding map is a function ϕ : Y → 2^X such that
(Y1) ⋃_{y ∈ Y} ϕ(y) = X.
Property (Y1) states that the collection of sets {ϕ(y) | y ∈ Y} is a set cover of X. The points y ∈ Y can be thought of as coding for a particular element of this set cover. In the following, we will be interested in cover-encoding maps that satisfy some or all of the following additional properties:
(Y0) ϕ(y) ≠ ∅ for all y ∈ Y.
(Y2) For every x ∈ X there is a y ∈ Y such that ϕ(y) = {x}.
(Y3) There is a y ∈ Y such that ϕ(y) = X.
Note that both (Y2) and (Y3) imply (Y1). Axiom (Y0) excludes infeasible points in Y. It is not hard to see that cover-encoding maps always exist. In particular, consider any subset Y ⊆ P_0(X) = 2^X \ {∅}, the set of non-empty subsets of X, such that (i) the singletons {x} ∈ Y for all x ∈ X and (ii) X ∈ Y. Then the identity ι is obviously a cover-encoding map that satisfies (Y0), (Y1), (Y2), and (Y3). Now consider an optimization problem (X, f) and let ϕ : Y → 2^X be a cover-encoding map for X. We define F̃ : Y → R as the composition of ϕ with the oracle function of (X, f), i.e., F̃(y) = F(ϕ(y)). In the following, we will be interested in the relationship between the "encoded" optimization problem (Y, F̃) and the original problem (X, f).
If condition (Y2) is satisfied, there is ŷ ∈ Y so that ϕ(ŷ) = {x̂} for every global optimum x̂ of the original problem. For most applications, it is sufficient to find one global optimum, hence we will consider the weaker condition:
(F0) There is a ŷ ∈ Y such that ϕ(ŷ) = {x̂} for some global optimum x̂ of (X, f).
Condition (F0) simply states that there exists a code y ∈ Y that identifies a global optimum of the original problem (X, f). This is sufficient to consider (X, f) and (Y, F̃) as "equivalent optimization problems." The identity cover-encodings from Y_max := P_0(X) and Y_min := {{x} | x ∈ X} ∪ {X} are the extreme cases. Y_max encodes all possible subproblems, while Y_min only encodes the singletons, i.e., the evaluation of the cost function f for every x ∈ X, as well as the full optimization problem.
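In this contribution, we are interested in search-based algorithms. Hence we fix an adjacency relation ∼ on Y. For the landscape (Y, ∼, F̃), we consider the following three properties:
(R1) Any two minimum-cost codes in Y are connected by a path along which F̃ stays at its minimum value.
(R2) Every y ∈ Y is the starting point of a path to a minimum-cost code along which F̃ does not increase.
(R3) Every y with ϕ(y) ≠ X has a neighbor y′ ∼ y with ϕ(y) ⊂ ϕ(y′).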
In plain words, (R1) ensures that all minimum-cost encodings are connected by paths staying at minimum cost. Under (R2), each configuration is the beginning of a path to a minimum-cost configuration, with the value of the cost function not increasing along the path. Property (R3) uses the fact that all configurations in Y are subsets of X. It says that each configuration y ∈ Y other than the full set has a neighboring configuration properly containing y. It is worth noting that (R3) is independent of the oracle function F. For the identity cover-encodings introduced above, a natural definition of adjacency is to set y ∼ y′ and y′ ∼ y whenever (i) y ⊆ y′, (ii) y ≠ y′, and (iii) if y ⊆ y″ ⊆ y′ then y″ = y or y″ = y′. That is, two sets are adjacent if they are adjacent in the Hasse diagram for set inclusion. By construction, every y ∈ Y is connected by a sequence of adjacent sets to all singletons {x} with x ∈ y and to the full set y′ = X. Since ϕ is the identity, (R3) holds. Using that y ⊆ y′ implies F̃(y) ≥ F̃(y′), properties (R1) and (R2) also follow immediately.
The importance of conditions (R1) and (R2) stems from the following observation:
Theorem 1 A landscape (Y, ∼, F̃) that satisfies (R1) and (R2) has no strict local minima.
Proof Let y ∈ Y be an arbitrary starting point. If F̃(y) = F̃(ŷ) then y, by (R1), is not a local optimum but part of a connected neutral network that contains the global optimum ŷ. If F̃(y) ≠ F̃(ŷ), then F̃(y) > F̃(ŷ). By (R2), there is a path with non-increasing values of F̃ that connects y to a point y′ with F̃(y′) = F̃(ŷ). We already know that there is a path with constant values of F̃ leading from y′ to the global optimum ŷ. Thus y is connected by an F̃-non-increasing path to ŷ. Hence y is, by definition, not a strict local optimum.
In particular, the identity cover-encodings satisfy the conditions of Theorem 1 and thus their landscapes have no strict local optima. There are, however, also very different general constructions with this property. In the remainder of this section, we consider one example.
Definition 3 Let (X, ∼ X , f ) be an arbitrary landscape. Its square encoding is the map (Hammack et al. 2016). The idea behind this construction is to allow a local search algorithm to keep track of the best solution so far in one variable and use the other variable for exploration. Figure 1 shows an example.
In particular it has no strict local optima.
For y, y′ ∈ Y, we write d_Y(y, y′) for the standard graph distance, i.e., the length of a shortest path between y and y′; we use analogous notation for the distance d_X on (X, ∼_X).
For each element y ∈ Y we thus find a y′ ∈ Y that (i) is strictly closer to ŷ than y is; and (ii) does not evaluate to a higher value than y under F̃. Using the argument inductively at most d_Y(y, ŷ) times, the desired sequences in (R1) and (R2) are constructed. Therefore properties (R1) and (R2) are fulfilled by (Y, ∼_Y, F̃). Theorem 1 now implies that there are no strict local minima.

Adaptive Walks
An adaptive walk on a fitness landscape (Y, ∼_Y, F̃) is a Markov chain on the state space Y with transition probabilities π_{y→z} = 1/d_y for y ∼_Y z and F̃(z) ≤ F̃(y). Otherwise π_{y→z} = 0, except for y = z, where π_{y→y} is obtained by normalization of probability. The degree d_y of state y is the number of neighbors |{z ∈ Y : z ∼_Y y}|. Formulated as a stochastic search algorithm, a neighbor z of the current (time t) configuration y is drawn uniformly at random. If F̃(z) ≤ F̃(y), the walk proceeds to configuration z at time t + 1; otherwise it remains at configuration y.
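The stochastic search formulation above translates almost directly into code. The following is a minimal sketch, not the implementation used for the simulations: the function name `adaptive_walk`, the fixed step budget, the seed parameter, and the one-dimensional toy landscape are assumptions made for illustration.

```python
import random

def adaptive_walk(start, neighbors, cost, steps, seed=0):
    """Draw a uniform random neighbor each step; accept it iff the cost does not increase."""
    rng = random.Random(seed)
    y = start
    for _ in range(steps):
        z = rng.choice(neighbors(y))
        if cost(z) <= cost(y):
            y = z  # downhill or neutral move accepted
        # otherwise the walk stays at y for this time step
    return y

# Hypothetical toy landscape: states 0..6 on a line, cost |y - 3|.
def line_neighbors(y, size=7):
    return [z for z in (y - 1, y + 1) if 0 <= z < size]

end = adaptive_walk(0, line_neighbors, lambda y: abs(y - 3), steps=1000)
print(end)
```

Because this toy landscape has no strict local minima, the walk can only come to rest at the unique minimum; on a rugged landscape the same routine would stall at the first trap it encounters.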
Call Ŷ the set of global minima of the landscape (Y, ∼_Y, F̃). Assume that this landscape does not have a strict local minimum. Then each realization of an adaptive walk eventually hits a global minimum. Due to the absence of strict local minima, the adaptive walk is trapped only at global minima. Each invariant measure of the adaptive walk therefore evaluates to zero on all configurations with non-minimum cost. Property (R2) clearly is a necessary condition for an optimization problem to be solvable by adaptive walks alone. The conditions of Theorem 1 are already sufficient, as they exclude strict local optima.
[Figure caption fragment: On this landscape, a minimal cost configuration is reachable from all configurations by a non-increasing path.]

Examples of Cover-Encoding Maps
Let us now turn to constructing some problem-specific examples of cover-encoding maps. We will then use some of these examples to show that some cover-encoding maps are useful to construct good heuristic search algorithms for several well-studied combinatorial optimization problems.

Prepartition Encoding for the NPP
An NPP instance is described by a list (a_1, …, a_n) of numbers. We write [n] := {1, …, n} for the index set. We have to divide these n numbers into two subsets with as equal a sum as possible. In other words, we assign to each index i a variable x_i ∈ {−1, +1} and minimize the cost function
$$f(x) = \Bigl|\sum_{i=1}^{n} a_i x_i\Bigr|,$$
see, e.g., (Mertens 2006) for a review. The set X consists of all strings of −1 and +1 of length n, the set Y consists of all functions [n] → [n]. The so-called prepartitioning encoding (Ruml et al. 1996) of the NPP can be written in the following way: Each function y : [n] → [n] defines the partition Π_y := {y^{-1}(k) | 1 ≤ k ≤ n} whose classes are the indices of the input numbers that are assigned the same value of y. As usual we write [i]_{Π_y} for the class y^{-1}(k) that contains index i. For given Π_y we now insist that the signs x_i = x_j whenever y(i) = y(j). This amounts to the restricted set of configurations
$$\phi(y) = \{x \in X \mid x_i = x_j \text{ whenever } y(i) = y(j)\}.$$
One easily checks that ϕ(y) = X whenever y is a bijection, i.e., (Y3) is satisfied. Furthermore, the subset Y* = {y ∈ Y : |y([n])| = 2} corresponds exactly to the assignments of positive and negative signs: Writing y([n]) = {p, q}, simply set x_i = +1 if y(i) = p and x_i = −1 if y(i) = q. (More precisely, the choice of which of the two classes receives the sign +1 is arbitrary; the symmetry can, however, easily be removed, e.g., by fixing x_1 = +1 once and for all.) Conversely, every assignment of signs has a representation as a bipartition in Y*. Thus (Y2) is satisfied.
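To make the prepartition encoding concrete, the sketch below enumerates the restricted set of sign vectors for a code y and evaluates the encoded oracle cost by brute force. It assumes the usual NPP cost, the absolute value of the signed sum, and uses 0-based labels instead of [n]; the function names and the five-number instance are hypothetical. Note that Ruml et al. solve the restricted problem with a deterministic heuristic, whereas this example simply enumerates it.

```python
from itertools import product

def npp_cost(a, x):
    """Assumed NPP cost: absolute difference of the two subset sums."""
    return abs(sum(ai * xi for ai, xi in zip(a, x)))

def phi_prepartition(y):
    """All sign vectors x with x[i] == x[j] whenever y[i] == y[j]."""
    labels = sorted(set(y))
    configs = []
    for signs in product((+1, -1), repeat=len(labels)):
        sign_of = dict(zip(labels, signs))
        configs.append(tuple(sign_of[k] for k in y))
    return configs

def encoded_oracle(a, y):
    """Brute-force optimum of the restricted problem on phi(y)."""
    return min(npp_cost(a, x) for x in phi_prepartition(y))

a = [4, 5, 6, 7, 8]                           # hypothetical instance, total sum 30
print(len(phi_prepartition((0, 1, 2, 3, 4))))  # bijection: all 2^5 sign vectors
print(encoded_oracle(a, (0, 0, 0, 1, 1)))      # code for the split {4,5,6} vs {7,8}
print(encoded_oracle(a, (0, 0, 1, 1, 1)))      # a bad code
```

A code with k distinct labels restricts the search to 2^k sign vectors, so codes with few classes are cheap to evaluate while codes close to a bijection recover the full problem.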
The most natural choice of an adjacency ∼ on Y is to define y ∼ y′ if and only if y(i) ≠ y′(i) for exactly one i ∈ [n]. Unless y is a bijection, there is at least one unused value k ∈ [n] \ y([n]) and at least one pair j′, j″ ∈ [n] with y(j′) = y(j″). The neighbor y′ of y with y′(i) = y(i) for i ≠ j′ and y′(j′) = k corresponds to a refinement of the partition Π_y because the class [j′]_{Π_y} is split into the two classes {j′} and [j′]_{Π_y} \ {j′} of Π_{y′}, and all other classes of Π_y and Π_{y′} are the same. Thus (Y, ∼) satisfies (R3).
An optimal solution x̂ of the NPP (X, f) is a partition Π̂ of [n] into exactly two classes Q_+ and Q_− so that x_i = +1 for i ∈ Q_+ and x_i = −1 for i ∈ Q_−. A code y ∈ Y is good if there is a configuration in ϕ(y) in which the signs can be assigned in exactly this manner, i.e., if Π_y is a refinement of Π̂. Conversely, ϕ(y) is good only if Π_y is a refinement of a bipartition that represents a global minimum. Generically Π̂ is unique. Now consider two classes Q_1 and Q_2 in Π_y that are contained in the same class Q of Π̂, i.e., Q_1, Q_2 ⊂ Q. Reassigning one element at a time from Q_2 to Q_1 thus corresponds to a sequence of codes y = y_1, y_2, …, y_{|Q_2|}, all of which encode refinements of Π̂. Furthermore, y_{|Q_2|} has one class fewer than y. Repeating this step at most n − 2 times eventually results in Π̂. Intermediate codes y_i and y_{i−1} are adjacent by construction and satisfy F̃(y_i) = F̃(ŷ), i.e., condition (R1) is satisfied. Thus, we conclude that the "oracle landscape" (Y, ∼, F̃) has no strict local minima.

The cost function of TSP (Gutin and Punnen 2007) is
$$f(\pi) = \sum_{i=1}^{n-1} d\bigl(\pi(i), \pi(i+1)\bigr) + d\bigl(\pi(n), \pi(1)\bigr),$$
where π ∈ X is a bijection π : [n] → C from the index set [n] to a set of cities C. The index i specifies the position along the tour. For a city c, therefore, π^{-1}(c) is its position along the tour. The problem is parametrized by distances d : C × C → R that satisfy d(c, c) = 0 for all c ∈ C but in general are neither symmetric nor satisfy the triangle inequality. Klemm et al. (2012) introduced the following version of a prepartition encoding. Here, an arbitrary function y : C → [n] is used to restrict the possible orderings of the cities along the tour as follows: For all cities c, d ∈ C, the condition y(c) < y(d) implies π^{-1}(c) < π^{-1}(d). Again this defines a subset X_y of the search space X for each y. We use the same definition of adjacency in Y. Here, constant functions y impose no restrictions on π, i.e., ϕ(y) = X whenever y(c) = y(d) for all c, d ∈ C.
On the other hand, if y is bijective then X_y consists only of a single tour since in this case y(c) = π^{-1}(c) for all c ∈ C, i.e., π = y^{-1}. Thus, (Y2) and (Y3) are satisfied.
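The ordering constraint of this encoding can be checked by brute force on a tiny instance. The sketch below assumes the standard closed-tour cost (sum of distances between consecutive cities plus the return leg) and represents a tour as a tuple of cities in visiting order; the function names and the asymmetric three-city distance table are hypothetical. The condition that y(c) < y(d) forces c to precede d is equivalent to the labels being non-decreasing along the tour, which is what the code tests.

```python
from itertools import permutations

def tour_cost(tour, d):
    """Length of the closed tour, including the leg back to the first city."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))

def phi_tsp_prepartition(y):
    """All orderings in which a city with a smaller label never follows one with a larger label."""
    feasible = []
    for tour in permutations(y):
        labels = [y[c] for c in tour]
        if all(labels[i] <= labels[i + 1] for i in range(len(labels) - 1)):
            feasible.append(tour)
    return feasible

# Hypothetical asymmetric distances on three cities.
d = {"A": {"A": 0, "B": 1, "C": 5},
     "B": {"A": 2, "B": 0, "C": 1},
     "C": {"A": 1, "B": 4, "C": 0}}

constant = {"A": 1, "B": 1, "C": 1}    # no restriction: phi(y) = X
bijective = {"A": 2, "B": 1, "C": 3}   # a single tour B, A, C
print(len(phi_tsp_prepartition(constant)))
print(phi_tsp_prepartition(bijective))
print(min(tour_cost(t, d) for t in phi_tsp_prepartition(constant)))
```

Rotations of a tour are counted separately here, which matches the description of tours as bijections from positions to cities.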
To address properties (R2) and (R1), we first observe that given an encoding y, we can always move one city c with y(c) = k to one of the classes defined by y with an adjacent value k′. More precisely, suppose k′ is such that (a) there is a city d so that y(d) = k′ and (b) there are no cities e with y(e) = k″ for any k″ between k and k′. If k′ > k, the city which we can move is the one with y(c) = k that appears last in the optimal tour ω ∈ ϕ(y); similarly, if k′ < k, we can move the city c with y(c) = k that appears first in the optimal tour ω ∈ ϕ(y). In the first case, we can set k < y′(c) ≤ k′, while in the second case, we can choose k′ ≤ y′(c) < k. By construction ω ∈ ϕ(y′), and therefore F̃(y′) ≤ F̃(y). It is also clear from the construction that the step from y to y′ can always be chosen so that the number of classes |y^{-1}([n])| remains constant, increases by one, or decreases by one, unless we already have |y^{-1}([n])| = n, in which case only a decrease is possible, or we have |y^{-1}([n])| = 1, in which case only an increase is possible. Thus, we can always find a path along which F̃ does not increase and along which |y^{-1}([n])| is non-increasing or non-decreasing, respectively. Note that the moves keeping |y^{-1}([n])| constant might be necessary to move the values y(c) stepwise around in [n] to have enough "space" to break up individual classes of y^{-1}, so that its members in the end have consecutive values of y. It is not hard to convince oneself that this is always possible. As a consequence, we can always connect any y to a code with a single class (for which ϕ(y) = X). For two adjacent classes, we simply join, one-by-one, the cities of the smaller class to the larger one. Furthermore, the single-class code can be broken by pulling a city at a time so that (R1) also holds. Note that (R3) is not necessarily satisfied, however.
In contrast to the previous example of the NPP, here the paths are much more involved and often longer. We therefore conjecture that the prepartition encoding is less efficient for the TSP than for the NPP.

Spanning Forest Encoding for the NPP
A very different encoding for the NPP can be constructed as follows. Denote by Y the set of all spanning forests of the complete graph K_n. For a detailed discussion of the combinatorics of spanning forests, we refer to (Teranishi 2005). For each forest y ∈ Y denote by y_a one of its connected components. Since y_a is a tree and thus bipartite, there is a uniquely defined bipartition (V^+_{y_a}, V^−_{y_a}) of its vertex set. We assign q_i = +1 for i ∈ V^+_{y_a} and q_i = −1 for i ∈ V^−_{y_a}.
Suppose the spanning forest y has k components. Then, the sign pattern on each component y_a is uniquely defined by fixing independently the sign of the lexicographically smallest i ∈ V_{y_a}. Thus, ϕ(y) consists of exactly 2^k distinct configurations. It follows that ϕ(y) = X if y contains no edges. Denoting the complement of x by x̄, we have ϕ(y) = {x, x̄} whenever y is a spanning tree. Since x and x̄ represent the same solution of the number partitioning problem, ϕ satisfies (Y2) and (Y3). (R3) holds since removing an edge from the spanning forest y yields another spanning forest y′ that imposes fewer restrictions and thus corresponds to a larger subset of X. In general, write y′ ≺ y if y′ is a subforest of y. Then ϕ(y) ⊂ ϕ(y′). The unconstrained search space corresponds to the spanning forest y_0 without edges. Conversely, every spanning tree t̂ that defines the bipartition of the globally minimal solution of the original NPP encodes exactly this solution. Every sequence t̂ = y_{n−1} ≻ y_{n−2} ≻ ⋯ ≻ y_1 ≻ y_0 of spanning forests obtained by successive edge deletions from t̂ connects y_0 and t̂, and each ϕ(y_i) also contains the global minimum encoded by t̂. Thus (R1) holds.
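The spanning forest encoding can be sketched as follows: two-color every tree of the forest, then choose the sign of the smallest vertex of each component independently. This is an illustrative brute-force enumeration with 0-based vertices; the function name is hypothetical, and the input is assumed (not checked) to be an acyclic edge list.

```python
from itertools import product

def phi_spanning_forest(n, edges):
    """All sign vectors in which the endpoints of every forest edge get opposite signs."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    comp = [-1] * n   # component index of each vertex
    rel = [0] * n     # sign relative to the smallest vertex of the component
    k = 0
    for root in range(n):          # increasing order: root is the smallest vertex
        if comp[root] != -1:
            continue
        comp[root], rel[root] = k, +1
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v], rel[v] = k, -rel[u]
                    stack.append(v)
        k += 1
    return [tuple(s[comp[i]] * rel[i] for i in range(n))
            for s in product((+1, -1), repeat=k)]

print(len(phi_spanning_forest(4, [])))                   # no edges: 2^4 configurations
print(phi_spanning_forest(4, [(0, 1), (1, 2), (2, 3)]))  # spanning tree: x and its complement
```

Deleting an edge from the forest merges no constraints and adds one free sign, so the encoded set doubles in size with every deletion.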

Subdivision Encoding for the TSP
An alternative encoding for the TSP uses a permutation ψ : [n] → C of the set C of cities and a subdivision Π of [n] into consecutive intervals. We specify Π by the upper bounds of the intervals, i.e., I_u := {k | i_{u−1} < k ≤ i_u}. Since the tours are circular, we set i_0 = i_m and as usual consider the order < circular on [n]. Therefore, I_1 := {i_m + 1, …, n, 1, …, i_1}. An encoded configuration y := (ψ, Π) fixes the order ψ of cities ψ(k) within each of the index intervals I_u. The first city in interval I_u is ψ(i_{u−1} + 1), the last city is ψ(i_u). Thus, π ∈ ϕ(y) if π is obtained by permuting the intervals I_u and following the order given by ψ within each interval, as shown in Fig. 2.
If Π is the discrete partition, then we obviously have ϕ(y) = X, while the indiscrete partition uniquely specifies the tour ψ. The encoding therefore satisfies (F0), (Y0), (Y1), (Y2), and (Y3). Consider any adjacency relation ∼ on Y so that y ∼ y′ if Π′ is obtained from Π by splitting a class (interval) into two or merging two intervals. Then (R3) is clearly satisfied.
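A small enumeration illustrates how the subdivision restricts the set of tours. The sketch below is a simplification under stated assumptions: tours are treated as linear sequences, the last interval is assumed to end at position n so that no interval wraps around, and rotations of a tour are not identified. The function name `phi_subdivision` and the four-city example are hypothetical.

```python
from itertools import permutations

def phi_subdivision(psi, bounds):
    """Tours obtained by permuting the intervals while keeping psi's order inside each.

    psi    : sequence of cities
    bounds : increasing upper bounds of the intervals, the last one equal to len(psi)
    """
    blocks, start = [], 0
    for b in bounds:
        blocks.append(tuple(psi[start:b]))
        start = b
    return {tuple(city for block in order for city in block)
            for order in permutations(blocks)}

psi = "ABCD"
print(len(phi_subdivision(psi, [1, 2, 3, 4])))   # discrete partition: every ordering
print(phi_subdivision(psi, [4]))                 # indiscrete partition: psi itself
print(sorted(phi_subdivision(psi, [2, 4])))      # two intervals AB and CD
```

Merging two intervals removes orderings from the encoded set, while splitting one adds them, in line with the adjacency relation described above.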
In order to consider (R1), we specify the adjacency relation ∼ more stringently. If y ∼ y′, then either (i) y′ is obtained from y by splitting exactly one class of y into two non-empty parts or vice versa, or (ii) y and y′ exhibit the same partition of the cities, i.e., Π = Π′. In case (i), the ordering within each class is maintained. For the split interval I_u = [ψ(i_{u−1} + 1), …, ψ(i_u)], this means that an index j ∈ [i_{u−1} + 1, i_u − 1] is chosen and the resulting intervals become I_{u1} = [ψ(i_{u−1} + 1), …, ψ(j)] and I_{u2} = [ψ(j + 1), …, ψ(i_u)]. The ordering between intervals (classes of Π) remains fixed. In case (ii), the partition and the ordering within the intervals both remain unchanged, but the ordering of the intervals (classes of Π) changes. For our purposes, it is not important which types of permutations between intervals are allowed, as long as they form an ergodic set. Plausible choices are transpositions, canonical transpositions, reversals, or even all permutations. Now consider an encoded configuration ŷ with x̂ ∈ ϕ(ŷ). The intervals specified by ŷ are partial tours of the globally optimal solution. Moves on Y can now be performed so that a new encoding y′ is obtained in a stepwise fashion that uses the same intervals and brings two partial tours that are consecutive in x̂ into the desired order. During this stepwise change of ψ, the encoded sets ϕ(y) stay the same, and thus ϕ(y′) = ϕ(ŷ). Now the two appropriate consecutive intervals can be merged. This reduces m by 1 and makes ϕ(y) smaller, but the globally optimal solution is still retained, i.e., x̂ ∈ ϕ(y). The procedure can be repeated at most m − 1 times to reach the indiscrete partition, which fully specifies the globally optimal tour. Thus, (R1) holds for all choices of neighborhoods that allow merging/splitting of adjacent intervals and an ergodic permutation of the intervals.

Sparse Subgraph Encoding for the Maximum Matching Problem
For a graph G = (V, E), a matching is a subset M ⊆ E of pairwise disjoint edges, i.e., (V, M) is a graph with a maximum degree of at most 1. Denoting by X the set of matchings on G, the maximum matching problem (MMP) (X, f) has the cost function f giving the number of unmatched nodes in a matching M. Thus, the MMP asks for a subset of edges that cover as many nodes as possible without having any node contained in more than one edge (Lovász and Plummer 1986). Now consider an edge subset S ⊆ E. In the present context, we call S sparse if the graph (V, S) has maximum degree at most 2, so each connected component of (V, S) is a cycle or a path (including isolated nodes as trivial paths). Denote by Y the set of all sparse subsets of E. Since a matching M is also a sparse subset of E, we have X ⊆ Y.
The cover-encoding map ϕ : Y → 2^X assigns each S ∈ Y the set of maximum matchings of the graph (V, S). Now with S sparse, the maximum matching problem on (V, S) is trivially solved separately on each connected component, which is a path or a cycle. For a path of odd length k, the maximum matching is unique with (k + 1)/2 edges; a path or cycle of even length k has exactly two disjoint maximum matchings of cardinality k/2. A cycle of odd length k has exactly k pairwise different maximum matchings of cardinality (k − 1)/2. In order to demonstrate properties (R1) and (R2), let y ∈ Y \ {ŷ}. We show that there is y′ ∼_Y y with F̃(y′) ≤ F̃(y) and |(y′ ∪ ŷ) \ (y′ ∩ ŷ)| ≤ |(y ∪ ŷ) \ (y ∩ ŷ)|. Thus, neighbor y′ is obtained from y either by adding an edge contained in ŷ or removing an edge not contained in ŷ. If y ⊃ x̂, find an edge e ∈ y \ x̂ and set y′ = y \ {e}, and we are done. Otherwise, since y ≠ ŷ, there is an edge {v, w} = e ∈ x̂ \ y. If y ∪ {e} =: z is sparse, we are done using y′ = z. Otherwise at least one of the nodes v and w has degree 3 in the graph (V, z); suppose node v has degree 3. Find a maximum matching x ∈ ϕ(y). Since v has degree 2 in the graph (V, y), there is an edge e′ ∈ y \ x incident to v. Set y′ = y \ {e′}. We easily confirm F̃(y′) ≤ F̃(y) in each of the cases above. Sequences for properties (R1) and (R2) are obtained by induction.
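Because every component of a sparse subgraph is a path or a cycle, a component with m nodes admits a matching of ⌊m/2⌋ edges, so the encoded cost (the number of unmatched nodes in a maximum matching of (V, S)) equals the number of components with an odd number of nodes. The sketch below computes it with a small union-find structure; the function name is hypothetical, nodes are numbered from 0, and S is assumed to be a set of distinct edges without self-loops.

```python
def encoded_cost_sparse(num_nodes, S):
    """Unmatched nodes in a maximum matching of (V, S) for a sparse edge set S."""
    degree = [0] * num_nodes
    parent = list(range(num_nodes))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in S:
        degree[u] += 1
        degree[v] += 1
        parent[find(u)] = find(v)
    if max(degree, default=0) > 2:
        raise ValueError("edge set is not sparse (some node has degree > 2)")

    size = {}
    for v in range(num_nodes):
        r = find(v)
        size[r] = size.get(r, 0) + 1
    # each path or cycle on m nodes leaves m % 2 nodes unmatched
    return sum(m % 2 for m in size.values())

print(encoded_cost_sparse(4, []))                          # all four nodes unmatched
print(encoded_cost_sparse(4, [(0, 1), (1, 2), (2, 3)]))    # path on four nodes
print(encoded_cost_sparse(3, [(0, 1), (1, 2), (2, 0)]))    # triangle
```

This makes each evaluation of the encoded cost linear in the size of S, in contrast to solving the matching problem on the full graph G.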

String Encoding for the Maximum Clique Problem
For a graph G = (V, E), a clique is a node subset C ⊆ V inducing a fully connected subgraph, i.e., {v, w} ∈ E for all v, w ∈ C with v ≠ w. Denoting by X the set of cliques of G, the maximum clique problem (MCP) (X, f) has the cost function f giving the number of nodes outside a clique C (Bomze et al. 1999). For arbitrary l ∈ N and any string of not necessarily distinct nodes (v_1, v_2, ..., v_l) ∈ V^l, we define the greedy clique γ_G(v_1, v_2, ..., v_l) ⊆ V recursively by
$$\gamma_G(v_1,\dots,v_l) = \begin{cases} \gamma_G(v_1,\dots,v_{l-1}) \cup \{v_l\} & \text{if } \gamma_G(v_1,\dots,v_{l-1}) \cup \{v_l\} \text{ is a clique of } G, \\ \gamma_G(v_1,\dots,v_{l-1}) & \text{otherwise} \end{cases} \tag{9}$$
and γ_G(∅) = ∅ for the empty string ∅.
We construct a cover-encoding map ϕ based on strings of length n := |V|, so Y = V^n. For a string y ∈ Y, we denote the substring (suffix) from index k to the end (index n) by (y)_k. Now ϕ maps a string y ∈ Y to the maximal greedy cliques over suffixes of y. So a clique C is contained in ϕ(y) if and only if C is a greedy clique from a suffix of y and none of the other greedy cliques from y properly contains C. This ensures that ϕ produces all the singletons, thus fulfilling property (Y2). We call y pure if |ϕ(y)| = 1. A string y ∈ Y is pure if and only if {y_i : i ∈ [n]} is a clique of G. We define strings y, y′ ∈ Y to be adjacent, in symbols y ∼_Y y′, if and only if there is a unique index i ∈ [n] with y_i ≠ y′_i (Hamming distance 1).
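As an illustration of the string encoding, the following Python sketch computes greedy cliques and the set of inclusion-maximal greedy cliques over all suffixes of a string. The names `greedy_clique` and `cover_encoding` and the adjacency representation (a dict mapping each node to its set of neighbors) are assumptions made for this example only.

```python
def greedy_clique(adj, string):
    """Greedy clique of a node string.

    Scans the string from left to right and keeps a node if and only if
    the set collected so far, together with that node, is still a clique.
    `adj` maps each node to the set of its neighbors.
    """
    clique = set()
    for v in string:
        if all(u == v or u in adj[v] for u in clique):
            clique.add(v)
    return frozenset(clique)


def cover_encoding(adj, y):
    """Inclusion-maximal greedy cliques over all suffixes of the string y."""
    candidates = {greedy_clique(adj, y[k:]) for k in range(len(y))}
    return {c for c in candidates if not any(c < d for d in candidates)}


def is_pure(adj, y):
    """A string is pure if it encodes exactly one clique."""
    return len(cover_encoding(adj, y)) == 1
```

For a triangle {0, 1, 2} with a pendant edge {2, 3}, the string (3, 0, 1, 2) encodes the two cliques {2, 3} and {0, 1, 2}, whereas (0, 1, 2, 0) is pure.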
In order to prove properties (R1) and (R2), we first observe that there is a non-increasing sequence of strings from any y ∈ Y to a pure y^(p) ∈ Y with ϕ(y^(p)) ⊆ ϕ(y) and F̃(y^(p)) = F̃(y). The sequence is obtained by finding a maximal C ∈ ϕ(y). If y is not pure, there is i ∈ [n] with y_i ∉ C. The next string in the sequence can be obtained by replacing the entry y_i with an arbitrary element from C.
If y, z ∈ Y are pure with ϕ(y) = ϕ(z) = {C} and |C| < n, there is a non-increasing sequence from y to z. It may be constructed by stepwise swapping operations. Since |C| < n, there is at least one element of C found at two distinct positions in y, so one of these positions can be used as a temporary variable in the swap. Now let y, y′ ∈ Y with F̃(y′) ≤ F̃(y). Find a maximal clique C ∈ ϕ(y) and a maximal clique C′ ∈ ϕ(y′). We construct a non-increasing sequence from y to y′ by concatenating the following sequences. First, a non-increasing sequence from y to a pure y^(p) ∈ Y with F̃(y^(p)) = F̃(y). Second, a non-increasing sequence from y^(p) to a pure z ∈ Y with {z_1, z_2, ..., z_{|C|}} = C and {z_1, z_2, ..., z_{|C\C′|}} = C \ C′, and arbitrary z_{|C|+1}, z_{|C|+2}, ..., z_n ∈ C. Third, a sequence from z to a string z′ is obtained by assigning, step by step, nodes in C′ \ C to entries from z_{|C|+1} to z_n. The sequence is non-increasing because each of its strings generates C under ϕ. On the other hand, γ_G((z′)_{|C\C′|+1}) = C′, so F̃(z′) = F̃(y′). Now again by swap steps, we transform z′ into y′.

Coarse-Graining
Some of the restricted search spaces ϕ(y) introduced above can also be thought of as coarse-grainings of the original problem. In the following subsections, we show this for the prepartition and spanning forest encodings of the NPP, as well as for the TSP.

Prepartition Encoding of the NPP
Consider the NPP instance with numbers {a_1, a_2, ..., a_n} and let Π = {Q_1, ..., Q_m} be an arbitrary partition of [n] with classes (subsets) Q_j, so that m ≤ n. Of course, we can think of Π as the classes defined by the prepartition encoding, i.e., Π = {y^{−1}(k) | k ∈ [n]}. Set b_j = Σ_{i∈Q_j} a_i. Then the set of numbers {b_1, ..., b_m} defines an NPP on m numbers. In terms of a prepartition y this amounts to b_k = Σ_{i∈y^{−1}(k)} a_i. Note that if m = n, then Π is the discrete partition in which every class Q_j contains only a single element, and hence {a_1, ..., a_n} = {b_1, ..., b_m}. In the general case, the solutions of the two NPPs are related to each other in the following way. Denote the variables for the smaller NPP by x′_j ∈ {+1, −1} and write f_a and f_b for the cost functions. Then, obviously
$$f_b(x') = f_a(x) \quad \text{with } x_i = x'_j \text{ for all } i \in Q_j,\ j \in [m].$$
An optimal solution x̂ of the larger problem (X, f_a) corresponds to a partition Π̂ of [n] into exactly two classes Q_+ and Q_− so that x̂_i = +1 for i ∈ Q_+ and x̂_i = −1 for i ∈ Q_−. The coarse-grained NPP (X′, f_b) has an optimal solution with the same cost if (and in the generic case also only if) Q_j ⊆ Q_+ or Q_j ⊆ Q_− holds for all j ∈ [m], i.e., if (and generically only if) the coarse-graining partition Π is a refinement of the partition Π̂ that encodes the globally optimal solution of the original problem.
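The correspondence between the two NPPs can be checked on a small instance. The sketch below assumes the usual NPP cost function f(x) = |Σ_i a_i x_i|, which is not restated in this section, and represents a prepartition y as a list of class labels; the helper names `npp_cost`, `coarse_grain`, and `lift` are hypothetical.

```python
def npp_cost(numbers, signs):
    """Assumed NPP cost: absolute value of the signed sum."""
    return abs(sum(a * s for a, s in zip(numbers, signs)))


def coarse_grain(a, y):
    """Sum the numbers in each class of the prepartition y.

    y[i] is the class label of index i. Returns the sorted list of used
    labels and the coarse-grained numbers b, one per non-empty class.
    """
    labels = sorted(set(y))
    b = [sum(a_i for a_i, k in zip(a, y) if k == label) for label in labels]
    return labels, b


def lift(labels, y, coarse_signs):
    """Map a sign configuration of the coarse NPP back to the original one."""
    sign_of = dict(zip(labels, coarse_signs))
    return [sign_of[k] for k in y]
```

With a = (8, 7, 6, 5, 4) and classes {8, 5}, {7, 6}, {4}, the coarse numbers are (13, 13, 4), and every sign assignment of the coarse problem has the same cost as its lifted counterpart.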

Travelling Salesman Problems
Recall the subdivision encoding for the TSP and fix an encoding y = (ψ, Π). The length of the partial tour inside the interval I_u is
$$\ell_u = \sum_{k=i_{u-1}+1}^{i_u-1} d_{\psi(k)\psi(k+1)}.$$
Furthermore, the road from interval I_p to interval I_q is the road from ψ(i_p) to ψ(i_{q−1} + 1), i.e.,
$$\tilde d_{pq} = d_{\psi(i_p)\psi(i_{q-1}+1)}.$$
Since a tour π ∈ ϕ(y) is uniquely defined by a permutation ξ : [m] → [m] of the intervals, we have
$$\ell(\pi) = \sum_{u=1}^{m} \ell_u + \tilde\ell(\xi),$$
where ℓ̃(ξ) = Σ_i d̃_{ξ(i),ξ(i+1)} is the tour length of the TSP restricted to the connections between the fixed intervals. With a slight change, one can also produce a TSP that retains the original values of the cost function. To this end, we define distances d′_pq that additionally absorb the partial tour lengths ℓ_u, and set ℓ′(ξ) := Σ_i d′_{ξ(i),ξ(i+1)}. A short computation verifies ℓ(π) = ℓ′(ξ). Note that we naturally obtain an asymmetric TSP even if the original problem was symmetric, since now d̃_pq ≠ d̃_qp because in general we will have d_{ψ(i_p)ψ(i_{q−1}+1)} ≠ d_{ψ(i_q)ψ(i_{p−1}+1)}.
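The decomposition of the tour length into fixed partial tours plus a reduced TSP over the intervals can be verified numerically. The following sketch rests on several assumptions of ours: cities and positions are 0-indexed, the intervals are consecutive blocks of the permutation ψ given by exclusive end positions `ends`, and the tour encoded by an interval order ξ is the closed tour obtained by concatenating the blocks in that order. All function names are hypothetical.

```python
def split_into_intervals(psi, ends):
    """Split the city sequence psi into consecutive blocks.

    ends[u] is the exclusive end position of block u; the last entry
    must equal len(psi).
    """
    blocks, start = [], 0
    for end in ends:
        blocks.append(psi[start:end])
        start = end
    return blocks


def tour_length(d, tour):
    """Length of the closed tour visiting `tour` in order under distances d."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))


def coarse_grain_tsp(d, psi, ends):
    """Return the blocks, their inner path lengths, and the reduced distances.

    d_tilde[p][q] is the road from the last city of block p to the
    first city of block q.
    """
    blocks = split_into_intervals(psi, ends)
    inner = [sum(d[b[i]][b[i + 1]] for i in range(len(b) - 1)) for b in blocks]
    m = len(blocks)
    d_tilde = [[d[blocks[p][-1]][blocks[q][0]] for q in range(m)] for p in range(m)]
    return blocks, inner, d_tilde


def expand_tour(blocks, xi):
    """Concatenate the blocks in the order given by the interval permutation xi."""
    return [city for u in xi for city in blocks[u]]
```

On five cities with the symmetric distances d_ij = (i − j)², the sequence ψ = (0, 3, 1, 4, 2), and blocks (0, 3), (1, 4), (2), the reduced distances are already asymmetric, and the full tour length equals the sum of the inner lengths plus the reduced tour length for any interval order.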

Spanning Forest Representation of the NPP
Let us now return to the NPP. Let y be a spanning forest of K_n. For each connected component (tree) t ⊆ y let V_t^+ and V_t^− be the corresponding bipartition of the vertex set of t, and set b_t = |Σ_{i∈V_t^+} a_i − Σ_{i∈V_t^−} a_i|.
This defines an instance of the NPP with as many numbers b_t as connected components in y. A choice of sign z_t ∈ {+1, −1} for t implies a particular choice of sign for each a_i, i.e., each configuration z for the NPP with numbers {b} corresponds to a configuration x of the original problem with numbers {a}. Clearly, these coincide with the configurations ϕ(y) described in Sect. 3.4.3.
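A minimal sketch of this reduction is given below. It assumes that each tree contributes the absolute difference between the sums over the two sides of its bipartition (the defining formula is not legible in the source, so this is our reading) and that the NPP cost is |Σ_i a_i x_i|. The names `forest_to_npp` and `lift_signs` are hypothetical.

```python
def forest_to_npp(a, forest_edges):
    """Reduce an NPP on numbers a to one number per tree of a spanning forest.

    Nodes are 0..len(a)-1. Returns (b, tree_of, side): b[t] is the assumed
    coarse number of tree t, tree_of[i] the tree containing node i, and
    side[i] in {+1, -1} the sign of node i relative to its tree.
    """
    n = len(a)
    adj = [[] for _ in range(n)]
    for u, v in forest_edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [0] * n          # 0 marks an unvisited node
    tree_of = [-1] * n
    b = []
    for root in range(n):
        if side[root] != 0:
            continue
        t = len(b)
        side[root] = 1
        stack, nodes, total = [root], [], 0
        while stack:
            u = stack.pop()
            nodes.append(u)
            tree_of[u] = t
            total += side[u] * a[u]
            for w in adj[u]:
                if side[w] == 0:
                    side[w] = -side[u]   # adjacent nodes get opposite signs
                    stack.append(w)
        if total < 0:                    # orient the tree so that b[t] >= 0
            for u in nodes:
                side[u] = -side[u]
            total = -total
        b.append(total)
    return b, tree_of, side


def lift_signs(z, tree_of, side):
    """Turn a sign choice z per tree into a sign configuration of the original NPP."""
    return [z[t] * s for t, s in zip(tree_of, side)]
```

For a = (8, 7, 6, 5, 4) and the forest with edges {0, 1}, {1, 2}, {3, 4}, the reduced instance has the two numbers 7 and 1, and the signed sums of the reduced and the lifted configurations agree.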

Some Remarks on Coarse-Grainings: Analogies with the Renormalization Group?
It is tempting to speculate that the coarse-grainings we have observed above are analogous to those observed in renormalization group theory, well known for its use in analyzing spin glasses and related disordered systems (Rosten 2012). In our context, it can be described as follows. For a given type of problem, such as the NPP or the TSP, consider the space X of all possible instances of all sizes. A particular instance (e.g., the NPP with n numbers a = {a_1, a_2, ..., a_n}) is a point x ∈ X. Now we define a set R of maps r : X → X that map larger instances to strictly smaller ones. Of interest in this context are in particular those maps r that (approximately) preserve salient properties. Since r(x) is a smaller instance than x, the map r is not invertible. The maps in R can of course be composed, and thus form a semi-group which is known as the renormalization group (Wilson and Kogut 1974; Wilson 1971). Of course, while renormalization groups in statistical physics are used to analyze the typical behavior of large systems near criticality, our focus in the present optimization context is on particular instances of systems that are typically large. This does not yet rule out an analogy, assuming that something like an ergodic hypothesis applies, where the behavior of typical instances is indeed that of the average. Thus, starting from x = (X, f), or more precisely, an encoding y so that ϕ(y) = x, we can think of adjacent encodings y′ ∼ y with |ϕ(y′)| < |ϕ(y)| as "renormalized" versions of ϕ(y). A path in (Y, ∼) leading from x to the trivial instance thus can be seen as the iteration of progressively renormalized samples.
A positive example of this analogy could be that of the spanning forest encoding of the NPP with real-space renormalization schemes for Ising spins: an example of an R could be a so-called block spin transformation (Kadanoff 1966), where suitable averages are taken over small local subsets of spins, which are then progressively scaled up to larger system sizes to explore their critical behavior. Only certain block variables will work for such schemes, depending on the underlying symmetries of the problem, just as, in the earlier subsection, only the sums of numbers a_i preserve the optimal solutions. Such simple real-space scalings do not, however, always exist for our optimization schemes: the prepartition encoding of the TSP, for example, cannot be rephrased as a coarse-grained (i.e., reduced-size) TSP. To see this, simply observe that the evaluation of a tour in the restricted model still requires an optimization over multiple incoming and outgoing connections (roads) for every city, i.e., the information of inter-city distances cannot be collapsed in any way upon the transition from a larger (less restricted) to a smaller (more restricted) problem. This does not, however, rule out the possibility of, say, a renormalization-type scaling in some sort of generalized Fourier space. In the case of landscapes on permutation spaces, the characters of the symmetric group provide a suitable Fourier-like basis (Rockmore et al. 2002), which seems to be applicable to the TSP and certain assignment problems. These and other possibilities are currently being explored, since it seems that deep similarities may underlie relatively superficial differences in the nature of the transformations involved in renormalization groups and the optimization-facilitating encodings that are the subject of this paper.

General Considerations
So far, we have been concerned only with the abstract structure of cover-encoding maps ϕ : Y → 2^X and the adjacencies ∼ in their encodings Y. On this theoretical basis, we can now construct a search-based optimization heuristic that generalizes the approaches of Ruml et al. (1996) and our earlier work (Klemm et al. 2012). The idea is very simple: if we have an accurate and efficiently computable heuristic, we can quickly obtain good upper bounds α_f(y) ≥ F̃(y) for each of the restricted problems (ϕ(y), f). The properties (R1) and (R2) guarantee the existence of non-increasing paths from an arbitrary initial encoding y_0 down to a final encoding ŷ. Steps to adjacent encodings that decrease α_f therefore will have a bias toward the optimal solution of the original problem.
The fact that we have to rely on the quality of the estimate α_f(y) ≈ F̃(y) also suggests that it should be more efficient to restart the search often rather than try to overcome barriers of local minima in the landscape (Y, α_f). In the examples above, local minima in (Y, α_f) can, as we have proved, appear only due to insufficient accuracy of the heuristic solutions α_f(y) for some encodings.
The discussion above also implies guidelines for the construction of encodings:
1. The cover-encoding map ϕ : Y → 2^X should be of a form that guarantees that (Y, ∼, F̃) has no local optima, i.e., the properties (R1), (R2), (Y1), and (Y2) should hold.
2. The paths in (Y, ∼) connecting large sets ϕ(y) to smaller ones should not contain many steps along which the sets do not shrink. For instance, while the prepartition encoding for the NPP always has a strictly coarse-grained neighbor, this is not the case for the prepartition encoding for the TSP. We therefore suspect that other encodings for the TSP will work better in general.
3. The heuristic producing α_f(y) needs to be efficient, ideally not much slower than the function evaluations for the initial cost function f.
In order to demonstrate that the theory developed above may also have practical implications, we probe instances of encoded landscapes by adaptive walks. To simulate a realization of an adaptive walk, we first generate an initial state y(0) by a procedure specific to the given landscape. At each time step t, we uniformly draw a neighbor z of state y(t) and set y(t + 1) = z if F̃(z) ≤ F̃(y(t)), and y(t + 1) = y(t) otherwise.
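The adaptive walk just described can be written generically as follows. This is a sketch under our own interface assumptions: `neighbors(y)` returns a non-empty list of the states adjacent to y, `cost(y)` evaluates the oracle function (or a heuristic estimate of it), and the number of steps is fixed in advance. None of these names come from the original simulations.

```python
import random


def adaptive_walk(y0, neighbors, cost, steps, rng=None):
    """Run an adaptive walk and return the final state and the cost trajectory.

    At each step a neighbor z of the current state y is drawn uniformly at
    random and accepted if and only if cost(z) <= cost(y).
    """
    rng = rng or random.Random()
    y = y0
    current = cost(y)
    trajectory = [current]
    for _ in range(steps):
        z = rng.choice(neighbors(y))
        candidate = cost(z)
        if candidate <= current:   # neutral moves are accepted as well
            y, current = z, candidate
        trajectory.append(current)
    return y, trajectory
```

Because moves of equal cost are accepted, the walk can drift along neutral plateaus, which matters on encoded landscapes where many adjacent encodings share the same oracle value.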
We select the MMP and the MCP as examples because (1) oracle functions and encodings can be constructed that guarantee the absence of strict local minima; and (2) there is a simple and efficient algorithm for exact computation of F̃(y) for each y ∈ Y. So we do not require heuristics. We leave the combination of cover-encoding maps with non-trivial heuristics for a future manuscript. Figure 3 shows the time evolution of cost in adaptive walks on the encoded landscapes of matchings encoded by sparse graphs, where the figure caption contains details on the instances and the definitions are to be found in Sect. 3.4.5. Note the logarithmic time axis in the plot.

Maximum Matching Problems
Both on purely random graphs and on those with a planted perfect matching, a solution of globally minimal cost is found. In addition to reaching a minimum-cost solution, we observe another interesting feature of the dynamics. The sizes of symbols (and annotated values in the uppermost curve) indicate the number of degrees of freedom δ = log_2 |ϕ(y(t))| of the solution y(t) at time t. This is the number of connected components in the sparse graph that have two distinct maximum matchings. Departing from a singleton state (δ = 0), the number of degrees of freedom first increases and then decreases during the descent of cost. So the optimization happens as a walk through states y ∈ Y with large cardinality |ϕ(y)| of the encoded set. Furthermore, as a particular feature of this encoded landscape, the optimization dynamics eventually returns to low δ, having |ϕ(y(t))| = 1 with a single optimal solution selected at large time t.
Figure 4 shows the time evolution of the cost of adaptive walks on the encoded landscapes of graph cliques encoded by node sequences. The figure caption contains details on the instances and relevant definitions can be found in Sect. 3.4.6. We plot the difference with the minimum cost F̃(y), so that a plotted value of 0 means the global optimum has been found. Our tentative conclusion is that the time to reach the optimal solution scales moderately with problem size. The standard deviation over realizations (error bars in the plot) also indicates a moderate variation of optimization time across these randomly generated instances.
Fig. 4 (caption): For each graph size |V|, 100 random graph instances with parameter p = 1/2 are generated independently. For each instance, an adaptive walk on the encoded landscape is performed with starting state (1, 1, ..., 1). Plotted values are differences between F̃(y(t)) of the state y(t) held by the adaptive walk at time t and the optimal cost F(x), averaged over the 100 instances. Length of error bars is the standard deviation over these instances. The exact F(x) is computed with a branch-and-bound algorithm (Östergård 2002).

Discussion and Conclusions
In this contribution we have shown that, in principle, it is possible to construct a genotypic encoding for any given phenotypically encoded combinatorial optimization problem with the property that the encoded landscape has no strict local minima. The construction hinges on three ingredients: a cover-encoding map ϕ : Y → 2^X that satisfies a few additional conditions, a suitable adjacency relation on Y, and an oracle function that (miraculously) returns the optimal cost value on the restrictions of the original problem to the covering sets ϕ(y). Of course, if we had such an oracle function in practice, we would not need a search heuristic in the first place.
Nevertheless, the concepts of oracle functions and cover-encoding maps are not just an empty exercise. We have seen that cover-encoding maps ϕ give rise to practically useful encodings provided there is a good deterministic heuristic for the restriction of the optimization problem to ϕ(y). For the NPP, it turns out that the Karmarkar-Karp differencing algorithm (Karmarkar and Karp 1982; Boettcher and Mertens 2008) provides a very good approximation to the oracle function. The prepartition encoding proposed by Ruml et al. (1996), on the other hand, ensures that the landscape of the oracle function is of the desirable type that has no local minima. Together these two facts make the work of Ruml et al. (1996) a showcase application of the theory developed here.
The numerical simulations of Sect. 5 strongly suggest that encodings with local-minima-free landscapes indeed admit efficient optimization by local search-based methods also for other optimization problems. Hence the theoretical results obtained here are of practical relevance provided a sufficiently accurate approximation to the oracle function can be computed. The precise meaning of the phrase "sufficiently accurate approximation" remains an open question for future research. We suspect, however, that the main problem arises when the approximation claims α_f(y′) < α_f(y), suggesting that a step from y to y′ be accepted, while F̃(y′) > F̃(y) holds, suggesting the step to y′ should not be taken.
The construction of encodings for several well-known optimization problems also highlights the connections between encodings and a natural notion of coarse-graining for optimization problems. This also suggests a link to renormalization group methods commonly used in statistical physics. While it is clear that there is not a trivial correspondence, and that real-space coarse-grainings are just a particular subclass of encodings, this connection certainly deserves further study. The formalism laid out here at least provides a promising starting point.
An important issue in biology is the fact that encodings as symbolized by the genotype-phenotype map are themselves subject to evolutionary changes because the mechanisms of development evolve. It is well known that features of the genotype-phenotype map, such as robustness (Wagner 2005) and accessibility (Fontana and Schuster 1998; Ndifon et al. 2009), have a key influence on evolution in the long term. Mathematical approaches that focus on the properties of encodings thus may become a very useful component in formal theories of evolvability and developmental evolution.