1 Introduction

The theory of self-adjusting evolutionary algorithms (EAs) is a research area that has made significant progress in recent years [2]. For example, a self-adjusting choice of mutation and crossover probability in the so-called (1+(\(\lambda ,\lambda \))) GA allows an expected optimization time of O(n) on OneMax, which is not possible with any static setting [3, 4]. Many studies focus on unimodal problems, while self-adjusting EAs for multimodal problems in discrete search spaces, more precisely for pseudo-Boolean optimization, were investigated only recently from a theoretical runtime perspective. Stagnation detection, proposed in Rajabi and Witt [1], addresses a shortcoming of classical self-adjusting EAs, which try to learn promising parameter settings from fitness improvements. Since fitness improvements are absent while the best-so-far solution is at a local optimum, stagnation detection instead learns from the number of unsuccessful steps and adjusts the mutation rate if this number exceeds a certain threshold. Thanks to this mechanism, the so-called SD-(1+1) EA proposed in Rajabi and Witt [5] optimizes the classical Jump function with n bits and gap size m in expected time \(O((en/m)^m)\), which corresponds asymptotically to the best possible time achievable through standard bit mutation, more precisely when each bit is flipped independently with probability m/n. For comparison, the classical (1+1) EA with mutation rate 1/n optimizes Jump in time \((1\pm o(1)) en^m\) [6]. It is worth pointing out that stagnation detection does not have any prior information about the gap size m.

Although leaving a local optimum requires a certain number of bits to be flipped simultaneously, which we call the gap size, the SD-(1+1) EA mentioned above still performs independent bit flips. Therefore, even for the best setting of the mutation rate, only the expected number of flipped bits equals the gap size, while the actual number of flipped bits may differ. This has motivated Rajabi and Witt [7] to consider the k-bit flip operator known from randomized local search (RLS) [4], which flips a uniformly random subset of k bits, and to adjust k via stagnation detection.

Rajabi and Witt [7] emphasize that their RLS with self-adjusting k-bit flip operator resembles variable neighborhood search [8] but features less determinism by drawing the k bits to be flipped uniformly at random instead of searching the neighborhood in a fixed order. The random behavior still maintains many characteristics of the original RLS, including independent stochastic decisions, which ease the runtime analysis. If the bit positions to be flipped follow a deterministic scheme as in quasirandom EAs [9], dependencies complicate the analysis and make it difficult to apply tools like drift analysis. However, a drawback of the randomness is that the independent, uniform choice of the set of bits to be flipped leaves a positive probability of missing an improvement within the time allotted to a specific parameter value k. Therefore, the first RLS variant with stagnation detection proposed in Rajabi and Witt [7], called SD-RLS\(^{\text {p}}\) there, has infinite expected runtime in general, but is efficient with high probability, where the success probability of the algorithm is controlled via the threshold value for the number of steps without fitness improvement that trigger a change of k. We remark here that the problem with infinite runtime does not exist with independent bit flips as long as each bit is flipped with probability in the open interval (0, 1).

In this paper, we denote by SD-RLS\(^{\text {p}}\) the simple SD-RLS just described. To guarantee finite expected optimization time, Rajabi and Witt [7] introduce a second variant that repeatedly returns to lower so-called (mutation) strengths, i. e., numbers of bits flipped, while the algorithm is still waiting for an improvement. The largest neighborhood size is denoted as the radius r and, in essence, the strength s is increased in a loop from 1 to r before the radius is increased. Interestingly, the additional time spent exploring smaller strengths in this loop contributes, with the right choice of phase lengths, only a lower-order term to the typical time that SD-RLS\(^{\text {p}}\) needs in the absence of errors. Hence, we do not discuss the original SD-RLS\(^{\text {p}}\) in this paper in detail. The resulting algorithm (given in Algorithm 2 in Sect. 2) that implements the additional return to smaller mutation strengths to ensure global convergence is called SD-RLS\(^{\text {r}}\) in Rajabi and Witt [7], where the label r stands for robust.

For optimized settings of its parameter R and all \(2\le m\le (1-\epsilon )n\), where \(\epsilon >0\) is a constant, SD-RLS\(^{\text {r}}\) achieves an expected runtime of \((1\pm o(1)) \left( {\begin{array}{c}n\\ m\end{array}}\right) \) on Jump functions and, in general, it overcomes gaps of size \(m\ge 1\) in expected time at most \((1 + o(1)) \left( {\begin{array}{c}n\\ m\end{array}}\right) \). Hence, compared to the SD-(1+1) EA with independent bit flips from Rajabi and Witt [5], SD-RLS\(^{\text {r}}\) allows a speed-up of \((\frac{ne}{m})^m/\left( {\begin{array}{c}n\\ m\end{array}}\right) \) (up to lower-order terms) on functions with gap size m, amounting to a speed-up of up to roughly \(e=2.718\dots \) on unimodal functions, while still being able to search globally. See Table 1 for a comparison of the runtime results of the SD-(1+1) EA and SD-RLS\(^{\text {r}}\) on Jump and further optimization scenarios (detailed below).

As already explained in Rajabi and Witt [7], SD-RLS\(^{\text {r}}\) (and also the plain SD-RLS\(^{\text {p}}\)) returns to strength 1 after every fitness improvement and tries this strength for a sufficiently long time to find an improving one-bit flip with high probability. This behavior can be undesired on highly multimodal landscapes where progress is typically only made via larger strengths. As an example, the minimum spanning tree (MST) problem, as originally considered for the (1+1) EA and an RLS variant in Neumann and Wegener [10], requires two-bit flips to make progress in its crucial optimization phase. Both theoretically and experimentally, Rajabi and Witt [7] observed that SD-RLS\(^{\text {r}}\) is less efficient than the RLS variant from Neumann and Wegener [10] since low, useless strengths (here 1) are tried for too long a period of time. On the other hand, it can also be risky to proceed exclusively with the strength that was last found to be working if the fitness landscape becomes easier at some point and progress can again be made using smaller strengths.

In this paper, we address this trade-off between exploiting high strengths that were necessary in the past and again trying smaller strengths for a certain amount of time. We propose a mechanism called radius memory that uses the last successful strength value along with the number of evaluations conducted using this strength to assign a reduced budget of iterations to smaller strengths. This budget is often much smaller than the fixed number of iterations tried in SD-RLS\(^{\text {r}}\) after every fitness improvement. However, the budget must be balanced carefully to allow the algorithm to adjust itself to gap sizes becoming smaller over the run of the algorithm. Our choice of budget is based on the number of iterations (which is the same as the number of fitness evaluations) needed to find the latest improvement and assigns the same combined amount of time, divided by \(\ln n\), to the smaller strengths tried afterwards. This choice, incorporated in our new algorithm SD-RLS\(^{\text {m}}\), limits the time spent at unsuccessful strengths to less than the waiting time for an improvement with the last successful strength, but is still large enough to adjust to smaller strengths sufficiently quickly. On the one hand, it preserves (up to lower-order terms) the runtime bounds on general unimodal function classes and jump functions shown for SD-RLS\(^{\text {r}}\) in Rajabi and Witt [7]. On the other hand, it significantly reduces the time for the strength to return to larger values on two highly multimodal problems, namely the optimization of linear functions under uniform constraints and the minimum spanning tree (MST) problem.

Concretely, we obtain the following runtime bounds and trade-offs for our new algorithm SD-RLS\(^{\text {m}}\) compared to the previous variant SD-RLS\(^{\text {r}}\) from Rajabi and Witt [7] and the variant with independent bit flips from Rajabi and Witt [5]: on the Jump function class with \(2\le m\le (1-\epsilon )n/2\), we prove the same expected runtime of \((1 + o(1)) \left( {\begin{array}{c}n\\ m\end{array}}\right) \) as for SD-RLS\(^{\text {r}}\). On the classical fitness function for instances of the MST from Neumann and Wegener [10], the previous algorithm SD-RLS\(^{\text {r}}\) was proved to run in time \((1+o(1)) (m^2 \ln m + 4mn\ln n\cdot \mathord {\textrm{E}}\mathord {\left( S\right) })\), where S is the number of strict improvements during the run. For small values of \(\mathord {\textrm{E}}\mathord {\left( S\right) }\), this already improves upon the classical bounds for the (1+1) EA of \(O(m^2(\ln n + \ln w_{\max }))\) and \(O(m^3\ln n)\) from Neumann and Wegener [10] and Reichel and Skutella [11], respectively. However, for our new SD-RLS\(^{\text {m}}\) with radius memory, we prove a bound of \((1+o(1)) (m^2 \ln m)\). This is faster by a factor of roughly 2 than the bound proved for the classical RLS using 1- and 2-bit flips only [10]. It is worth pointing out that our result for SD-RLS\(^{\text {m}}\) not only improves upon all existing bounds, but also concerns a globally searching algorithm. For such algorithms, the analysis of the MST problem is usually more complicated than for local search, as pointed out in Reichel and Skutella [11]. Moreover, for an example instance of linear functions under a uniform constraint, we show that SD-RLS\(^{\text {m}}\) can obtain a certain approximation of the optimum in time \(O(n \log n)\), while the variant without radius memory, SD-RLS\(^{\text {r}}\), requires time \(\Omega (n^2 \log n)\). Finally, we present an artificially constructed function where radius memory makes SD-RLS\(^{\text {m}}\) highly inefficient while SD-RLS\(^{\text {r}}\) is highly efficient. A summary and overview of all these runtime results can be found in Table 1.

Although our concept of stagnation detection using radius memory is implemented in a simple RLS maintaining one individual only, we implicitly consider stagnation detection as a module that can be added to other algorithms, as shown in Rajabi and Witt [1] and very recently in Doerr and Zheng [12] for multi-objective optimization. Concretely, stagnation detection with radius memory could also be used in population-based algorithms.

This paper is structured as follows. In Sect. 2, we define the algorithms considered and collect some important technical lemmas. Section 3 presents time bounds for the new algorithm SD-RLS\(^{\text {m}}\) to leave local optima and applies these to obtain bounds on the expected optimization time on unimodal and jump functions. Moreover, it includes in Theorem 2 the crucial analysis of the time for the strength to settle at smaller values when an improvement is missed. Thereafter, these results are used in Sect. 4 to analyze SD-RLS\(^{\text {m}}\) on linear functions under uniform constraints and to show a linear-time speedup compared to the SD-RLS\(^{\text {r}}\) algorithm in Rajabi and Witt [7]. Section 5 shows the above-mentioned result that SD-RLS\(^{\text {m}}\) optimizes MST instances on graphs with m edges in expected time at most \((1+o(1))(m^2\ln m)\), to the best of our knowledge, the first asymptotically tight analysis of a globally searching (1+1)-type algorithm on the problem. In Sect. 6, we present the example where the radius memory is detrimental and leads to exponential optimization time with probability \(1-o(1)\) while the original SD-RLS\(^{\text {r}}\) from Rajabi and Witt [7] is highly efficient. Section 7 presents experimental supplements to the analysis of SD-RLS\(^{\text {r}}\) and SD-RLS\(^{\text {m}}\) and comparisons with other algorithms from the literature before we finish with some conclusions.

Table 1 Comparison of expected runtime bounds for the classical (1+1) EA with mutation probability 1/n and the three variants of stagnation detection algorithms SD-(1+1) EA, SD-RLS\(^{\text {r}}\) and SD-RLS\(^{\text {m}}\)

2 Preliminaries

2.1 Algorithms

We start by describing a class of classical RLS algorithms and the considered extensions with stagnation detection. Algorithm 1 is a simple hill climber that uses a static strength s and always flips s bits uniformly at random. The special case where \(s=1\), i. e., using one-bit flips, has been investigated thoroughly in the literature [13] and is mostly just called RLS.

Algorithm 1

RLS with static strength s for the maximization of \(f:\{0,1\}^n \rightarrow \mathbb {R}\)
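For concreteness, the following minimal Python sketch illustrates this hill climber; the function name, the termination budget, and the loop structure are our own choices and not part of the paper's pseudocode.

```python
import random

def rls_static(f, n, s, max_iters=10**6):
    """RLS with static strength s: each step flips exactly s uniformly chosen bits."""
    x = [random.randint(0, 1) for _ in range(n)]   # uniform random initial search point
    fx = f(x)
    for _ in range(max_iters):
        y = x[:]
        for i in random.sample(range(n), s):       # s distinct positions, chosen uniformly
            y[i] = 1 - y[i]                        # flip them
        fy = f(y)
        if fy >= fx:                               # accept if at least as good
            x, fx = y, fy
    return x, fx
```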

The algorithm RLS\(^{1,2}\) (which is also just called RLS in Neumann and Wegener [10]) is an extension of this classical RLS (i. e., Algorithm 1 with strength 1) that chooses the strength \(s\in \{1,2\}\) uniformly at random before flipping s bits. This extension is crucial for making progress on the MST problem, as further explained in Sect. 5.

In Rajabi and Witt [7], RLS is enhanced by stagnation detection, leading to Algorithm 2. In a nutshell, the algorithm increases its strength after a certain number of unsuccessful steps according to the threshold value \(\left( {\begin{array}{c}n\\ s\end{array}}\right) \ln R\), which has been chosen to bound the so-called failure probability at strength s, i. e., the probability of not finding an improvement at Hamming distance s, by at most \(\bigl (1-1/\left( {\begin{array}{c}n\\ s\end{array}}\right) \bigr )^{\left( {\begin{array}{c}n\\ s\end{array}}\right) \ln R}\le 1/R\). It also incorporates logic to return to smaller strengths repeatedly by maintaining the so-called radius value r. All variables and parameters will be discussed in detail below when we come to our extension with radius memory. Algorithm 2 is called SD-RLS\(^{\text {r}}\), where the label r stands for robust, in Rajabi and Witt [7]. As mentioned above, there is also a simpler variant called SD-RLS\(^{\text {p}}\) without the logic related to the radius variable. However, that variant is not robust and has infinite expected optimization time in general, even on unimodal problems.
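The bound on the failure probability is the elementary estimate \((1-1/N)^{N}\le e^{-1}\) applied with \(N=\left( {\begin{array}{c}n\\ s\end{array}}\right) \):

$$\begin{aligned} \left( 1-\frac{1}{\left( {\begin{array}{c}n\\ s\end{array}}\right) }\right) ^{\left( {\begin{array}{c}n\\ s\end{array}}\right) \ln R} = \left( \left( 1-\frac{1}{\left( {\begin{array}{c}n\\ s\end{array}}\right) }\right) ^{\left( {\begin{array}{c}n\\ s\end{array}}\right) }\right) ^{\ln R} \le e^{-\ln R} = \frac{1}{R}. \end{aligned}$$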

Algorithm 2

RLS with robust stagnation detection (SD-RLS\(^{\text {r}}\)) for the maximization of \(f:\{0,1\}^n \rightarrow \mathbb {R}\)

Algorithm 3

RLS with robust stagnation detection and radius memory mechanism (SD-RLS\(^{\text {m}}\)) for the maximization of \(f:\{0,1\}^n \rightarrow \mathbb {R}\)

In the following, we present in Algorithm 3 the new algorithm SD-RLS\(^{\text {m}}\) using stagnation detection and radius memory. It extends SD-RLS\(^{\text {r}}\) by adding logic for setting the helper variable B (called budget). This variable is computed based on the number of unsuccessful steps and subsequently used to impose a limit on the original threshold \(\left( {\begin{array}{c}n\\ s\end{array}}\right) \ln R\). The limitation is achieved by applying the ratio \(\frac{u}{(\ln n)(r-1)}\), where u is explained later. We describe the algorithm in more detail now.

After a strict improvement with strength s (which becomes the initial radius r for the next search point), the algorithm uses every strength \(s^\prime <r\) for \(\min \{\frac{u}{(\ln n)(r-1)}, \left( {\begin{array}{c}n\\ s^\prime \end{array}}\right) \ln R\}\) attempts, where u is the value of the counter at the time the previous improvement happened. Hence, the smaller strengths are used for altogether at most \(u/\ln n\) steps, i. e., asymptotically fewer steps than were needed for the last improvement. Once the current strength becomes equal to the current radius, the threshold becomes \(\min \{\infty , \left( {\begin{array}{c}n\\ s\end{array}}\right) \ln R\}=\left( {\begin{array}{c}n\\ s\end{array}}\right) \ln R\) for the remaining iterations with the current search point. Therefore, the cap at \(\frac{u}{(\ln n)(r-1)}\) is only effective as long as the current radius equals r and the current strength is smaller than r.

For technical reasons, the radius increases directly to n when it has passed n/2. Moreover, as another technical detail, we accept search points of equal fitness only if the current radius is one (leading to the same acceptance behavior as in classical RLS, see Algorithm 1), whereas only strict improvements are accepted at larger radii.

The factor \(1/\ln n\) appearing in the first argument of the minimum

$$\begin{aligned}\min \left\{ \frac{u}{(\ln n)(r-1)}, \left( {\begin{array}{c}n\\ s\end{array}}\right) \ln R\right\} \end{aligned}$$

is a parameter choice that has turned out to be robust and useful in our analyses. The choice \(\frac{u}{r-1}\), i. e., an implicit constant of 1, might seem more natural here since then the algorithm would look at smaller strengths as often as the last successful strength was tried; however, this would worsen our forthcoming bounds by a constant factor.

As mentioned above, stagnation detection also has a parameter R to bound the probability of failing to find an improvement at the “right” strength. We will prove (see Lemma 1) that the probability of not finding a strict improvement where there is a potential of making progress is at most 1/R. The value of R recommended in Rajabi and Witt [7] for SD-RLS\(^{\text {r}}\) is still valid for SD-RLS\(^{\text {m}}\). We demand \(R\ge n^{4+\epsilon }\cdot |{{\,\textrm{Im}\,}}f|\) for an arbitrarily small constant \(\epsilon \) (where \({{\,\textrm{Im}\,}}f\) is the image set of f). In any case, R has been chosen so that the probability of ever missing an improvement at the right strength is sufficiently small throughout the run. In fact, in some places, we recommend a smaller admissible value for R, namely \(R\ge \max \{S, n^{4+\epsilon }\}\), where S is an upper bound on the number of strict improvements during the run. Obviously, we can always choose \(S = |{{\,\textrm{Im}\,}}f|\).
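To make the interplay of the radius r, the strength s, the counter u, and the budget B concrete, the following Python sketch reconstructs the main loop of SD-RLS\(^{\text {m}}\) from the textual description above. It is an illustration only: the exact line structure, counter handling, and tie-breaking of Algorithm 3 may differ, so the control flow below should be read as our assumption rather than as the paper's pseudocode.

```python
import math
import random
from math import comb, log

def sd_rls_m(f, n, R, max_iters=10**6):
    """Hedged sketch of SD-RLS^m: stagnation detection with radius memory."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x)
    r, s = 1, 1            # radius and strength
    u = 0                  # counter of steps at the current strength
    B = math.inf           # budget for strengths below the radius
    for _ in range(max_iters):
        y = x[:]
        for i in random.sample(range(n), s):
            y[i] = 1 - y[i]
        fy = f(y)
        u += 1
        if fy > fx or (fy == fx and r == 1):  # equal fitness accepted only at radius 1
            if fy > fx:                       # strict improvement:
                r = s                         # the successful strength becomes the radius
                B = u / (log(n) * (r - 1)) if r > 1 else math.inf
                s, u = 1, 0                   # restart at strength 1
            x, fx = y, fy
        else:
            threshold = comb(n, s) * log(R)
            if s < r:                         # strengths below the radius are capped
                threshold = min(B, threshold)
            if u >= threshold:
                u = 0
                if s < r:
                    s += 1                    # move on to the next strength
                else:                         # no success at s == r: grow the radius,
                    r = r + 1 if r + 1 <= n // 2 else n   # jumping to n once past n/2
                    s, B = 1, math.inf        # reset strength and budget
    return x, fx
```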

The runtime or the optimization time of a search heuristic on a function f is the first point in time t where a search point of optimal fitness has been created; often the expected runtime, i. e., the expected value of this time, is analyzed.

3 Analysis of the Algorithm SD-RLS\(^{\text {m}}\)

In this section, we shall derive general bounds on the time for SD-RLS\(^{\text {m}}\) to leave local optima, see Sect. 3.1. Afterwards, we apply them to the class of unimodal functions as well as a variant of the Jump benchmark (Sect. 3.2). As mentioned in the introduction, we will obtain results comparable to SD-RLS\(^{\text {r}}\) here.

3.1 Expected Times to Leave a Search Point

In this subsection, we will analyze the time for SD-RLS\(^{\text {m}}\) to leave a local optimum, i. e., a search point that has a Hamming distance larger than 1 to all improvements. This analysis will be similar to the one conducted for SD-RLS\(^{\text {r}}\) in Rajabi and Witt [7]. However, the radius memory comes with a potential risk if a local optimum is not left fast enough. We shall deal with this in Theorem 2 below, which is a crucial result about the time for the radius and strength variables of SD-RLS\(^{\text {m}}\) to recover from such a situation.

We recall the so-called individual gap and fitness level gap of a non-optimal point \(x\in \{0,1\}^n \) defined in Rajabi and Witt [5, 7] as follows.

$$\begin{aligned} {{\,\textrm{IndividualGap}\,}}(x):&=\min \{H(x,y):f(y)>f(x) , y\in \{0,1\}^n\}, \\ {{\,\textrm{FitnessLevelGap}\,}}(x):&=\max \left\{ {{\,\textrm{IndividualGap}\,}}(y): f(y)=f(x), y\in \{0,1\}^n \right\} . \end{aligned}$$

The individual gap of \(x\in \{0,1\}^n\) equals the minimum Hamming distance of x to points with strictly larger fitness function value. Also, by the fitness level of x, we mean all the search points with fitness value f(x). We call the fitness level gap of a point \(x\in \{0,1\}^n\) the maximum of all individual gap sizes in the fitness level of x.

We define a function f as having a uniform level gap at x if, for every search point y in the fitness level of x, we have \({{\,\textrm{IndividualGap}\,}}(x)={{\,\textrm{IndividualGap}\,}}(y)\).
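For small n, these definitions can be checked by exhaustive enumeration. The following brute-force helpers are our own illustration (exponential in n and only meant for non-optimal x, since for a global optimum the minimum would range over an empty set):

```python
from itertools import product

def individual_gap(f, x):
    """Minimum Hamming distance from a non-optimal point x to a strictly better point."""
    return min(sum(a != b for a, b in zip(x, y))
               for y in product((0, 1), repeat=len(x)) if f(y) > f(x))

def fitness_level_gap(f, x):
    """Maximum individual gap over the fitness level of x."""
    return max(individual_gap(f, y)
               for y in product((0, 1), repeat=len(x)) if f(y) == f(x))
```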

Assume x is the current search point immediately following a strict improvement or the initial search point, let \(m={{\,\textrm{FitnessLevelGap}\,}}(x)\) and r be the current radius. After the improvement, SD-RLS\(^{\text {m}}\) has not necessarily reset the radius to 1 as SD-RLS\(^{\text {r}}\) does. Hence for the analysis, we consider two cases where

  (a)

    \(m \ge r\) and

  (b)

    \(m < r\).

In Case (a), the upper bound for SD-RLS\(^{\text {r}}\) is also an upper bound on the improvement time of SD-RLS\(^{\text {m}}\). Intuitively, this holds since in the proof of Theorem 3 in Rajabi and Witt [7], we pessimistically assume that no strict improvement is found before the radius has increased to m. Also, the number of iterations SD-RLS\(^{\text {m}}\) needs to increase the radius to m is at most the number of iterations SD-RLS\(^{\text {r}}\) needs. Formally, we shall prove the following theorem.

Theorem 1

Let \(0<\epsilon <1\) be a constant. Consider SD-RLS\(^{\text {m}}\) with \(R\ge n^{4+\epsilon }\) on a pseudo-Boolean function \(f:\{0,1\}^n\rightarrow \mathbb {R}\). Let \(x_t\) and \(r_t\) be the search point and radius value at time \(t\ge 0\), respectively. Let \(t^*\) be a point of time immediately following a strict improvement or the time 0 of initialization, assuming that \(x_{t^*}\) is not yet globally optimal. Let \(m={{\,\textrm{FitnessLevelGap}\,}}(x_{t^*})\). Let T denote the number of iterations after time \(t^*\) until SD-RLS\(^{\text {m}}\) makes a strict improvement. Then \(\mathord {\textrm{E}}\mathord {\left( T\right) } = O\left( 2^n n \ln R\right) \). Moreover, if \(r_{t^*}\le m\le (1-\epsilon )n/2\), we have

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) } \le \left( {\begin{array}{c}n\\ m\end{array}}\right) \left( 1+O\left( \frac{m\ln R}{n}\right) \right) . \end{aligned}$$

Proof

We start with the case \(r_{t^*}\le m\le (1-\epsilon )n/2\). Note that SD-RLS\(^{\text {m}}\) starts with radius \(r_{t^*}\), whereas SD-RLS\(^{\text {r}}\) begins with radius 1. Thus, we can use all arguments from the proof of Theorem 3 and its supporting results (Lemma 2 and Lemma 3) in Rajabi and Witt [7], except that SD-RLS\(^{\text {m}}\) might need fewer iterations to increase the radius to m; this proves the same statement for SD-RLS\(^{\text {m}}\) in this case. Here we note that in the case \(r_{t^*}=m\), the budget B is used only for strengths less than m (and not for m itself), so we still have \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) iterations with strength m.

The general bound \(\mathord {\textrm{E}}\mathord {\left( T\right) } = O\left( 2^n n \ln R\right) \) proved for SD-RLS\(^{\text {r}}\) in Rajabi and Witt [7] immediately transfers to SD-RLS\(^{\text {m}}\) since it pessimistically assumes that the maximum radius of n has been reached. The time to reach that radius is again no larger for SD-RLS\(^{\text {m}}\) than for SD-RLS\(^{\text {r}}\). \(\square \)

However, in Case (b), where \(r_{t^*} > {{\,\textrm{FitnessLevelGap}\,}}(x_{t^*})\), SD-RLS\(^{\text {m}}\) can be outperformed by SD-RLS\(^{\text {r}}\): if it fails to improve at the current radius, the algorithm moves on to larger strengths, which may be costly. This means that although the gap size of the current search point is smaller than that of its parent, the time for a strict improvement may be larger than with SD-RLS\(^{\text {r}}\). However, in the rest of the paper, we will use the fact that the radius memory mechanism does not essentially increase the optimization time. Intuitively, the additional waiting time is captured by the escape time for the parent of x, so that rather quickly the radius “recovers” and is set to the gap of the current search point.

More concretely, in Theorem 2 below, we will show that if the algorithm uses strengths larger than the gap size to make progress, then after a relatively small number of iterations it makes progress with high probability and chooses the gap size of the current search point as the subsequent strength. In particular, the theorem includes bounds on the expected time for a strict improvement in the case \(m<r\) not covered by Theorem 1 above.

Theorem 2

Let \(\epsilon >0\) be an arbitrary constant. Consider SD-RLS\(^{\text {m}}\) with \(R\ge n^{4+\epsilon }\) on a pseudo-Boolean function \(f:\{0,1\}^n\rightarrow \mathbb {R}\). Let \(x_t\), \(r_t\), and \(B_t\) be the search point, the radius value and the value of the budget at time \(t\ge 0\), respectively. Let \(t^*\) be a point of time immediately following a reset of strength “\(s\leftarrow 1\)” in lines 8 or 17 of SD-RLS\(^{\text {m}}\), assuming that \(x_{t^*}\) is not yet globally optimal. Assume that \(m={{\,\textrm{IndividualGap}\,}}(x_{t^*})\) and \(m< r_{t^*}\le n/2\). Moreover, assume that for all \(t\ge t^*\) and before a global optimum is found, f has a uniform level gap at \(x_{t}\) and \({{\,\textrm{IndividualGap}\,}}(x_{t})=m\).

Define S as the smallest point of time \(a\ge t^*\) such that \(r_{a}\le m\), i. e., where the current radius is at most the current gap size, or a global optimum is reached.

Define the waiting time \(T=S-t^*\). If \(B_{t^*} \ge \left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\), then

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) } = \mathord {o}\mathord {\left( \left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \right) },\end{aligned}$$

and in general

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) } = \mathord {o}\mathord {\left( R\cdot \left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \right) }.\end{aligned}$$

The idea of the proof is that SD-RLS\(^{\text {m}}\) still tries all smaller strengths often enough, although the current radius is large. More precisely, either (a) while spending not too many iterations at larger strengths (possibly including some strict improvements with strength larger than m), the algorithm also tries the “right” strength sufficiently often, i. e., for \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) iterations with strength m, or (b) the current large radius increases by 1, leading to a reset of the budget (\(B=\infty \)). Thus, the algorithm can find a strict improvement when the strength equals the current gap with good probability, and afterwards the radius equals that strength. Here it is crucial that the current gap size does not change over time. Note also that the condition \(r_a\le m\) from the theorem is equivalent to \(r_a=m\) since the individual gap of all non-optimal search points from time \(t^*\) is assumed to be m, i. e., there is no improvement at Hamming distance smaller than m.

Before we proceed with the proof, we state a simple lemma that will be repeatedly used in our analyses. Informally speaking, it states that \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) steps at strength m are sufficient with high probability to overcome a gap of size m. To handle the acceptance of equally fit search points in a fitness level while the radius is 1, the lemma assumes that the fitness level gap equals the individual gap of all points in the fitness level.

Lemma 1

Consider SD-RLS\(^{\text {m}}\) on a pseudo-Boolean fitness function \(f:\{0,1\}^n\rightarrow \mathbb {R}\). Let \(x_t\) denote the search point at time \(t\ge 0\). Consider an arbitrary point in time \(t_1>0\), let \(L_{t_1}=\{x\in \{0,1\}^n \mid f(x) = f(x_{t_1})\}\) be the fitness level set of \(x_{t_1}\), and let \(m={{\,\textrm{IndividualGap}\,}}(x_{t_1})\). Assume that f has a uniform level gap at \(x_{t_1}\).

Let \(t_2>t_1\) be the first point in time such that there are at least \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) iterations at strength m in the time interval \([t_1,t_2]\) or a strict improvement has been found. Then with probability at least \(1-1/R\), a strict improvement is found in the interval.

Proof

We distinguish two cases according to \(m={{\,\textrm{IndividualGap}\,}}(x_{t_1})\).

Assume \(m>1\). Then the algorithm SD-RLS\(^{\text {m}}\) cannot find a strict improvement with radius 1 and will not change the current search point before the next improvement. By the assumption on the individual gaps of the search points in \(L_{t_1}\), there is at least one search point of higher f-value at Hamming distance m. Therefore, at each point of time in the interval where strength m is used, the probability of finding a strict improvement is at least \(1/\left( {\begin{array}{c}n\\ m\end{array}}\right) \), and the probability of failing to find a strict improvement in \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) such steps is at most

$$\begin{aligned} \left( 1-\frac{1}{\left( {\begin{array}{c}n\\ m\end{array}}\right) }\right) ^{\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R} \le \frac{1}{R}. \end{aligned}$$

In the case \(m=1\), the algorithm might accept search points of equal fitness value. However, by definition of the fitness level gap, all these points have an individual gap of 1 and we can still apply the above argumentation with \(m=1\). \(\square \)

In all examples considered in this paper, all search points in the same fitness level have the same individual gap sizes, so the assumptions of Lemma 1 will be satisfied for each current search point. In the following, we do not mention this in detail any longer.

We can now give the proof of Theorem 2.

Proof of Theorem 2

Clearly, by definition of the stopping time T, we will only analyze points in time t where \({{\,\textrm{IndividualGap}\,}}(x_t)<r_t\). Let \(g_{x_t}:={{\,\textrm{IndividualGap}\,}}(x_t)\). Note that \(g_{x_{t}} = m\), as assumed in the theorem, for all points in time \(t\ge t^*\) before reaching the global optimum.

Recall that the radius equals \(r_{t^*}\) and the strength equals 1 at the start of the period that we analyze in the stopping time T. We will prove the following claim: After at most \(I^*:=2r_{t^*}(\ln n)\left( {\begin{array}{c}n\\ r_{t^*}\end{array}}\right) \ln R\) iterations, either

  a)

    at least \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) iterations at strength m have happened and the current radius has never increased during these \(I^*\) iterations (or a strict improvement at strength m happens), or

  b)

    the current radius increases by 1 since SD-RLS\(^{\text {m}}\) has not found a strict improvement in the iterations where its strength equals its radius.

Before we prove the claim, we discuss its implications. If Case a) happens, then SD-RLS\(^{\text {m}}\) finds a strict improvement with the strength equal to the gap size of the current search point with probability at least \(1-1/R\) according to Lemma 1. By definition of the algorithm, if this improvement happens, the radius at the next point of time will be set to this strength, i. e., to strength m. Since the gap size does not change before reaching the optimum, this new radius equals the gap of the new search point after the improvement.

If Case b) happens, after increasing the radius, SD-RLS\(^{\text {m}}\) starts a new phase with radius at most \(r_{t^*}+1\), strength 1 and budget \(B=\infty \). After at most \(\sum _{i=1}^m \left( {\begin{array}{c}n\\ i\end{array}}\right) \ln R \le n\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) iterations (using \(m\le n/2\)), it has increased the strength to m and used that strength at least \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) times (or found a strict improvement at that strength). Again this means that with probability \(1-1/R\), SD-RLS\(^{\text {m}}\) finds a strict improvement with strength equal to the current gap size. Hence, if the claim holds, then after \(I^{**}= I^*+n\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R \le 2n (\ln n) \left( {\begin{array}{c}n\\ r_{t^*}\end{array}}\right) \ln R + n\left( {\begin{array}{c}n\\ r_{t^*}\end{array}}\right) \ln R \le 3n (\ln n) \left( {\begin{array}{c}n\\ r_{t^*}\end{array}}\right) \ln R\) iterations a strict improvement at gap size m has been found with probability at least \(1-1/R\) and the strength at the next point of time equals the current gap size m, too. In the case of a failure, i. e., not finding a strict improvement, the radius equals at most \(r_{t^*}+1\) afterwards since we only consider \(\sum _{i=1}^m \left( {\begin{array}{c}n\\ i\end{array}}\right) \ln R\) additional iterations after resetting the strength to 1. We are left with an analysis of the failure event and the proof of the claim. We do the latter first.

We now prove the claim. As the radius increases after at most \(\left( {\begin{array}{c}n\\ r_{t^*}\end{array}}\right) \ln R\) unsuccessful steps at strength and radius \(r_{t^*}\) (corresponding to Case b)), it is sufficient to estimate the number of steps V until \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) iterations at strength m have happened (Case a)) and to add another \(\left( {\begin{array}{c}n\\ r_{t^*}\end{array}}\right) \ln R\) steps to obtain the estimate \(I^*\). Hence, it suffices to show that \(V\le (2r_{t^*}-1)(\ln n) \left( {\begin{array}{c}n\\ r_{t^*}\end{array}}\right) \ln R\). To see this, we note that every strict improvement sets the radius to the currently used strength. If that strength equals m, there is nothing left to show. If the radius after the improvement, call it \(r'\), is larger than m (but at most \(r_{t^*}\), since otherwise we have entered Case b)), then all strengths less than \(r'\), including strength m, are used \(\frac{u}{(r'-1)\ln n} \ge \frac{u}{(r_{t^*}-1)\ln n}\) times, where u is the number of iterations before this strict improvement in Case a). The total number of iterations after the strict improvement until reaching strength m is \((r'-1) \frac{u}{(r'-1)\ln n} + u \le 2u\). Only after reaching strength at least m, which is the current gap size, can another strict improvement happen. These arguments hold for any further strict improvement in Case a). Hence, after at most \(2((r_{t^*}-1)\ln n) a\) iterations in Case a), regardless of the actual number of strict improvements in between, at least a steps at strength m have happened. Substituting \(a=\left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\) proves the claim.

Finally, we have to handle the error case. As argued above, a failure occurs only with probability at most 1/R, and the radius equals at most \(r_{t^*}+1\) afterwards. However, we do not know whether the strength equals 1 after a failure, which is required in the assumptions of this theorem. Therefore, we pessimistically allow \(\sum _{i=1}^{r_{t^*}+1} \left( {\begin{array}{c}n\\ i\end{array}}\right) \ln R \le n\left( {\begin{array}{c}n\\ r_{t^*}+1\end{array}}\right) \ln R\) further iterations without strict improvement until a strength of 1 is reached, pessimistically assuming a radius of \(r_{t^*}+2\) afterwards. Together with our bound \(I^{**}\), this amounts to at most \(4n(\ln n)\left( {\begin{array}{c}n\\ r_{t^*}+1\end{array}}\right) \ln R\) iterations. Then we conduct the above analysis again with a starting radius of \(r_{t^*}+2\), where the probability of failure is again bounded by 1/R. Hence, iterating the argument and using the law of total probability, the expected number of steps until the algorithm finds a strict improvement at strength m is at most

$$\begin{aligned}&\sum _{k=0}^{\left( \lfloor n/2\rfloor -r_{t^*}\right) /2} 4n\left( \ln n\right) \left( {\begin{array}{c}n\\ r_{t^*}+2k\end{array}}\right) \left( \ln R\right) \cdot R^{-k} \\&\quad < \sum _{k=0}^{\left( \lfloor n/2\rfloor -r_{t^*}\right) /2} 4n\left( \ln n\right) \cdot \left( \frac{n-r_{t^*}}{r_{t^*}}\right) ^{2k+1}\left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \left( \ln R\right) \cdot R^{-k}\\&\quad \le \sum _{k=0}^{\left( \lfloor n/2\rfloor -r_{t^*}\right) /2} 4n^2\left( \ln (n+R)\right) \left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \left( \frac{n^2}{R}\right) ^{k} = o\left( R \left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \right) \end{aligned}$$

for the radii ranging from \(r_{t^*}\) to \(\lfloor n/2\rfloor \), using \(R\ge n^{4+\epsilon }\). After radius \(\lfloor n/2\rfloor \), the next radius is \(r=n\), which is kept until the next strict improvement. Here we estimate the number of iterations that are sufficient for a strict improvement with probability \(1-1/R\) by \(\tilde{I}:=4n(\ln n)\left( {\begin{array}{c}n\\ \lfloor n/2\rfloor \end{array}}\right) \ln R\) since the binomial coefficient is largest with second argument \(\lfloor n/2\rfloor \). Noting that at least \(k^*:=(\lfloor n/2\rfloor +1-r_{t^*})/2\) failures are necessary to reach this radius, we obtain the upper bound

$$\begin{aligned}&\frac{R}{R-1}\cdot 4n(\ln n)\left( {\begin{array}{c}n\\ \lfloor n/2\rfloor \end{array}}\right) (\ln R) \cdot R^{-k^*} \\&\quad < 4n (\ln (n+R)) \frac{R}{R-1} \left( \frac{n-r_{t^*}}{r_{t^*}}\right) ^{\lfloor n/2\rfloor +1-r_{t^*}} \left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \cdot R^{-k^*}\\&\quad \le 4n(\ln (n+R)) \frac{R}{R-1} \left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \left( \frac{n^2}{R}\right) ^{k^*}. \end{aligned}$$

Here the fraction \(R/(R-1)\) stems from the expected value of a geometric distribution with success probability \(1-1/R\), since each phase of length \(\tilde{I}\) fails to find a strict improvement at strength m with probability at most 1/R by Lemma 1. Since \(R\ge n^{4+\epsilon }\), the last estimate is also bounded from above by \(\mathord {o}\mathord {\left( R\left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \right) }\). This proves the general (second) statement

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) }=\mathord {o}\mathord {\left( R\left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \right) } \end{aligned}$$

of the theorem.

Finally, we use the previous result to show the first statement of the theorem. Note that starting from the initial radius \(r_{t^*}\) and strength 1, before reaching strength \(r_{t^*}\), the algorithm uses strength m for \(B_{t^*}\) iterations. If \(B_{t^*}\ge \left( {\begin{array}{c}n\\ m\end{array}}\right) \ln R\), then by Lemma 1, with probability at least \(1-1/R\) the algorithm finds an improvement at the Hamming distance corresponding to the gap size, resulting in the stronger bound

$$\begin{aligned}\mathord {\textrm{E}}\mathord {\left( T\right) }\le \frac{1}{R} \cdot \mathord {o}\mathord {\left( R\left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \right) }=\mathord {o}\mathord {\left( \left( {\begin{array}{c}n\\ r_{t^*}-1\end{array}}\right) \right) }. \end{aligned}$$

\(\square \)

3.2 Expected Optimization Times

In this subsection, we use the results from the previous subsection to obtain bounds on the expected optimization time of SD-RLS\(^{\text {m}}\) on unimodal functions. These bounds are similar to corresponding results for SD-RLS\(^{\text {r}}\). Afterwards, we will consider generalized jump functions to analyze a situation where the radius memory of SD-RLS\(^{\text {m}}\) may potentially be harmful. However, it will turn out that the potential harm contributes at most a lower-order term to the total expected runtime.

Analysis on unimodal functions We consider the definition of unimodal functions given in Droste et al. [14]. A function is unimodal if and only if there is only one local maximum, where a local maximum is defined as a search point such that no Hamming neighbor has a larger fitness value. An immediate result of this definition is that on unimodal functions, any search point except the global optimum has a better point in its Hamming neighborhood. We note that other authors define the class of unimodal functions with weaker conditions, which would allow for plateaus of constant fitness.

Using our definition of unimodal functions, the individual gap of all points in the search space (except for global optima) is one, so the algorithm can make progress with strength 1 from all non-optimal search points. In the following theorem, we show how SD-RLS\(^{\text {m}}\) behaves on unimodal functions compared to RLS using an upper bound based on the fitness-level method [15]. The result and its proof are similar to Theorem 4 in Rajabi and Witt [7].

Theorem 3

Let \(\epsilon >0\) be a constant. Let \(f:\{0,1\}^n\rightarrow \mathbb {R}\) be a unimodal function and consider SD-RLS\(^{\text {m}}\) on f with \(R\ge \max \{S, n^{4+\epsilon }\}\), where S is an upper bound on the number of strict improvements during the run, e. g., \(S = |{{\,\textrm{Im}\,}}f|\). Then there is an event G happening with probability at least \(1-S/R^2\) such that, conditioned on G, SD-RLS\(^{\text {m}}\) never uses strengths larger than 1 and behaves stochastically like RLS (also conditioned on G) before finding an optimum of f.

Denote by T the runtime of SD-RLS\(^{\text {m}}\) on f. Let \(f_i\) be the i-th fitness value in increasing order of all fitness values of f, and let \(s_i\) be a lower bound on the probability that RLS finds an improvement from a search point with fitness value \(f_i\). Then

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) } \le \left( 1+o(1)\right) \sum _{i=1}^{ |{{\,\textrm{Im}\,}}f| } \frac{1}{s_i}.\end{aligned}$$

Proof

The algorithm SD-RLS\(^{\text {m}}\) uses strength 1 for \(\left( {\begin{array}{c}n\\ 1\end{array}}\right) \ln R\) iterations when the radius is 1 and for another \(\left( {\begin{array}{c}n\\ 1\end{array}}\right) \ln R\) iterations when the radius is 2 but the strength is still 1. (Only considering the first case would not be sufficient for the result of this theorem.) Overall, the algorithm tries \(2\left( {\begin{array}{c}n\\ 1\end{array}}\right) \ln R\) steps with strength 1 before setting the strength to 2. Since on unimodal functions the individual gap of all non-optimal points is 1, the probability of not finding a strict improvement in these steps is

$$\begin{aligned}\left( 1-\frac{1}{\left( {\begin{array}{c}n\\ 1\end{array}}\right) }\right) ^{2\left( {\begin{array}{c}n\\ 1\end{array}}\right) \ln R} \le \frac{1}{R^2}.\end{aligned}$$

This argumentation holds for each strict improvement that has to be found, starting from the initial search point; moreover, after each such strict improvement, the budget B is reset to \(\infty \) while both radius and strength are (re)set to 1. Since at most S improving steps happen before finding the optimum, by a union bound the probability of SD-RLS\(^{\text {m}}\) ever increasing the strength beyond 1 is at most \(S\frac{1}{R^2}\). If no such failure happens, the algorithm only uses 1-bit flips like RLS. This proves the first claim of the theorem.

To prove the second claim, we again argue similarly to the proof of Theorem 4 in Rajabi and Witt [7]. We consider all fitness levels \(A_1, \dots , A_{|{{\,\textrm{Im}\,}}f|}\) such that \(A_i\) contains all search points with fitness value \(f_i\), and we sum up upper bounds on the expected times to leave each of these fitness levels. Let \(T_i\) denote the random time spent in level \(A_i\). We note that \(T_i\) follows a truncated geometric distribution with success probability at least \(s_i\), where the number of trials equals the number of steps in which the strength fitting the fitness-level gap is used. Hence, within at most \(1/s_i\) iterations in expectation, the algorithm leaves the fitness level by finding a strict improvement, or it increases the radius to 2. Using Lemma 1, the probability that the radius is increased before creating a strict improvement is at most 1/R. Moreover, according to line 8 of SD-RLS\(^{\text {m}}\), the budget variable B had the value \(\infty \) before the improvement since the radius started at 1. Using Theorem 2 in the case of large \(B_{t^*}\), if the algorithm increases the radius to 2, the expected time to find a strict improvement, weighted by the probability of reaching radius 2, is \((1/R)\,\mathord {o}\mathord {\left( \left( {\begin{array}{c}n\\ 1\end{array}}\right) \right) }=o(n/R)=o(1)\). Altogether, since \(s_i\le 1\), we have by the law of total probability that

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T_i\right) } \le \frac{1}{s_i} + o(1) = \left( 1+o(1)\right) \frac{1}{s_i}. \end{aligned}$$

Then, since \(T\le \sum _{i=1}^{|{{\,\textrm{Im}\,}}f|-1}T_i\),

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) } \le \left( 1+o(1)\right) \sum _{i=1}^{|{{\,\textrm{Im}\,}}f|-1} \frac{1}{s_i}. \end{aligned}$$

\(\square \)
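As a sanity check of the fitness-level bound, consider OneMax: a search point with i one-bits is improved by flipping one of the \(n-i\) zero-bits, so \(s_i=(n-i)/n\), and the theorem yields

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) } \le \left( 1+o(1)\right) \sum _{i=0}^{n-1} \frac{n}{n-i} = \left( 1+o(1)\right) n H_n \le \left( 1+o(1)\right) n\left( \ln n + 1\right) , \end{aligned}$$

where \(H_n\) denotes the n-th harmonic number. This matches the \(O(n\ln n)\) bound used for the OneMax-like parts in the proof of Theorem 4 below.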

Analysis on JumpOffset

We now use the results developed so far to analyze a situation where the radius memory in SD-RLS\(^{\text {m}}\) may potentially be detrimental. Concretely, we will prove a bound for a newly designed function with two parameters m and c called Jump with Offset, or JumpOffset, illustrated in Fig. 1. The function JumpOffset can be considered a variant of the well-known Jump benchmark [14] in which the location of the jump of size m is moved to an earlier point. After the jump, there is a unimodal sub-problem of length c behaving like OneMax. The Jump function is the special case of JumpOffset with \(c=0\), i. e., \(\textsc {JumpOffset} _{m,0}=\textsc {Jump} _m\). Formally,

$$\begin{aligned} \textsc {JumpOffset} _{m,c} :={\left\{ \begin{array}{ll} m + |x|_1 & \text { if }|x|_1\le n-m-c\text { or }|x|_1\ge n-c,\\ n-|x|_1-c & \text { otherwise.} \end{array}\right. } \end{aligned}$$
Fig. 1

The function \(\textsc {JumpOffset} _{m,c}\)
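The definition translates directly into code; the following small helper is our own illustration for bit strings given as 0/1 sequences:

```python
def jump_offset(x, m, c):
    """JumpOffset_{m,c}: a jump of size m located c positions before the optimum."""
    n, ones = len(x), sum(x)
    if ones <= n - m - c or ones >= n - c:
        return m + ones      # OneMax-like part, shifted up by m
    return n - ones - c      # gap region: the gradient points back to the local optima
```

With \(c=0\) this is the classical \(\textsc {Jump} _m\).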

This generalized version of Jump was also proposed independently in Bambury et al. [16] to investigate how quickly various mutation-based algorithms overcome a gap that is not adjacent to the global optimum. In another recent study, Witt [17] analyzes the performance of further algorithms on the function JumpOffset.

The following theorem shows that SD-RLS\(^{\text {m}}\) optimizes JumpOffset in a time that is essentially determined by the time to overcome the gap only. The proof idea is that the algorithm can quickly re-adapt its radius value to the gap size of the current search point after escaping the local optimum. Hence, the remaining time to find the global optimum after overcoming the fitness gap is essentially not negatively influenced by the radius memory mechanism.

Theorem 4

Let \(\epsilon >0\) be a constant. For all \(2\le m<O(\ln n)\) and \(0\le c<O(\ln n)\), the expected runtime \(\mathord {\textrm{E}}\mathord {\left( T\right) }\) of SD-RLS\(^{\text {m}}\) with \(R\ge n^{4+\epsilon }\) and \(R\le 2^n\) on \(\textsc {JumpOffset} _{m,c}\) satisfies \(\mathord {\textrm{E}}\mathord {\left( T\right) } \le (1+o(1))\left( {\begin{array}{c}n\\ m\end{array}}\right) \), conditioned on an event that happens with probability \(1-o(1)\).

Proof

Before reaching the local optimum consisting of all points with \(n-m-c\) one-bits, \(\textsc {JumpOffset} _{m,c}\) is equivalent to OneMax; hence, according to Lemma 3, the expected time SD-RLS\(^{\text {m}}\) takes to reach the local optimum is at most \(O(n\ln n)\).

Here we used the fitness level method with \(s_i=(n-i)/n\) as the minimum probability for leaving the set of search points with i one-bits.

Every local optimum x with \(n-m-c\) one-bits satisfies \({{\,\textrm{IndividualGap}\,}}(x)=m\) according to the definition of \(\textsc {JumpOffset} _{m,c}\). Thus, using Theorem 1, the algorithm finds one of the \(\left( {\begin{array}{c}m+c\\ m\end{array}}\right) \) improvements within an expected number of at most \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \) iterations. According to Lemma 1, this happens with strength m (and not larger) with probability at least \(1-1/R\).

Let \(t^*\) be the first point of time after overcoming the local optimum, i. e., after making progress over the gap of size m. Then the radius \(r_{t^*}\) is at least m, although a strict improvement can be found at Hamming distance 1, i. e., \({{\,\textrm{IndividualGap}\,}}(x_{t^*})=1\). Also, the individual gap of all subsequent search points (except the global optimum) is 1. If we show that \(B_{t^*}\ge \left( {\begin{array}{c}n\\ 1\end{array}}\right) \ln R\), the algorithm sets the radius to 1 within \(o(\left( {\begin{array}{c}n\\ m-1\end{array}}\right) )\) steps in expectation via Theorem 2 with \(r_{t^*}=m\), pessimistically assuming it does not find the optimum in the meantime. Now, we compute the probability that \(B_{t^*}<n \ln R\). By the working principles of SD-RLS\(^{\text {m}}\), this can only result from overcoming the local optimum within less than \( (m-1)(\ln n)n(\ln R)\) steps. Hence, let u be the number of iterations at strength m before the algorithm leaves the local optimum. We have

$$\begin{aligned}&\mathord {{{\,\textrm{Pr}\,}}}\mathord {\left( u<\left( m-1\right) \left( \ln n\right) n\left( \ln R\right) \right) } \le 1-\left( 1-\frac{\left( {\begin{array}{c}m+c\\ m\end{array}}\right) }{\left( {\begin{array}{c}n\\ m\end{array}}\right) }\right) ^{(m-1)(\ln n)n (\ln R)} \\&\quad \le (m-1)(\ln n)n(\ln R)\frac{\left( {\begin{array}{c}m+c\\ m\end{array}}\right) }{\left( {\begin{array}{c}n\\ m\end{array}}\right) } \le (m-1)(\ln n)n(\ln R)\frac{(e(m+c))^m}{n^m}. \end{aligned}$$

According to the assumptions on m, c and R, the last term is bounded from above by

$$\begin{aligned} (m-1)(\ln n)n(\ln R)\frac{O(\ln ^m n)}{n^m} = O\left( \frac{m(\ln ^m n)}{n^{m-1}}\right) = o(1). \end{aligned}$$

This means that with probability at least \(1-o(1)\), the budget variable B is not effective at the beginning of the next phase with strength 1, so we can apply the first case of Theorem 2.

After the radius has recovered to 1, the algorithm needs to optimize a sub-problem of length at most c behaving like OneMax. Hence, similarly to the first paragraph of this proof, the expected remaining time until reaching the global optimum can again be bounded by \(O(n\ln n)\) according to Lemma 3.

Altogether, \(\mathord {\textrm{E}}\mathord {\left( T\right) } \le O(n\ln n)+ \left( {\begin{array}{c}n\\ m\end{array}}\right) +o(\left( {\begin{array}{c}n\\ m-1\end{array}}\right) )+O(n\ln n) = (1+o(1))\left( {\begin{array}{c}n\\ m\end{array}}\right) \), conditioned on the mentioned events of having enough iterations at the local optimum and of escaping from the local optimum with strength m, each happening with probability \(1-o(1)\). \(\square \)

We note that the bound \((1+o(1))\left( {\begin{array}{c}n\\ m\end{array}}\right) \) is not necessarily tight. Using the techniques from Bambury et al. [16], it might be improved to \(O(\left( {\begin{array}{c}n\\ m\end{array}}\right) /\left( {\begin{array}{c}m+c\\ m\end{array}}\right) +n\log n)\). However, as mentioned above, the main point of Theorem 4 was to show that the radius memory mechanism of SD-RLS\(^{\text {m}}\) is not detrimental after crossing the fitness gap.

4 Speed-Ups by Using Radius Memory

The previous section showed that the radius memory of SD-RLS\(^{\text {m}}\) does not essentially slow down the algorithm on typical optimization scenarios compared to the earlier algorithm SD-RLS\(^{\text {r}}\). In this section, we will see a first example where the new radius memory mechanism leads to a significant speed-up. We consider the problem of minimizing a linear function under a uniform constraint as analyzed in Neumann et al. [18]: given a linear pseudo-Boolean function \(f(x_1,\dots ,x_n)=\sum _{i=1}^n w_i x_i\), the aim is to find a search point x minimizing f under the constraint \(|x|_1\ge B\) for some \(B\in \{1,\dots ,n\}\). W. l. o. g., \(w_1\le \dots \le w_n\).

Neumann, Pourhassan and Witt [18] obtain a tight worst-case runtime bound of \(\Theta (n^2)\) for RLS\(^{1,2}\) and a bound of \(O(n^2\log B)\) for the (1+1) EA, which is therefore tight up to logarithmic factors. We will see in Theorem 5 that with high probability, SD-RLS\(^{\text {m}}\) achieves the same bound \(O(n^2)\) despite being able to search globally like the (1+1) EA. Afterwards, we will identify a scenario where SD-RLS\(^{\text {r}}\) is slower by a factor of \(\Omega (\log n)\), and even by a factor of \(\Omega (n)\) if the target of optimization is restricted to finding an approximation of the optimum.

We start with the general result on the worst-case expected optimization time, assuming the set-up of Neumann et al. [18]. Here and in the following, we apply the algorithms for the minimization of the objective function, which is equivalent to maximizing the negated function.

Theorem 5

Let \(\epsilon >0\) be an arbitrary constant and consider SD-RLS\(^{\text {m}}\) with \(R\ge n^{4+\epsilon }\) on a linear function with a uniform constraint. Starting with an arbitrary initial solution, the expected optimization time is \(O(n^2)\), conditioned on an event that happens with probability \(1-O(1/n)\).

Proof

We follow closely the proof of Theorem 5 in Neumann et al. [18], which analyzes RLS\(^{1,2}\). The first phase of optimization (covered in Lemma 4 of that paper) deals with the time to reach a feasible search point and proves this to be \(O(n\log n)\) in expectation. Since the proof uses multiplicative drift, it is easily seen that the time is \(O(n\log n)\) with probability at least \(1-O(1/n)\) thanks to the tail bounds for multiplicative drift [19]. The second phase deals with the time to reach a tight search point (i. e., one containing B one-bits), which is \(O(n\log n)\) with probability at least \(1-O(1/n)\) by the very same type of arguments. The analyses so far rely exclusively on one-bit flips, so the bounds also hold for SD-RLS\(^{\text {m}}\) thanks to Lemma 3, up to a failure event of probability \(O(S/R^2)=o(1/n)\) since the number of improvements S satisfies \(S\le n\). By definition of the fitness function, only tight search points will be accepted in the following, and 2-bit flips are only accepted if they flip both a one-bit and a zero-bit.

The third phase in the analysis from Neumann et al. [18] considers a potential function \(\phi (x)\), defined below, based on the configuration of the B one-bits in the current search point. Let r be the number of bits of weight \(w_B\) among the B smallest weights \(w_1,\dots ,w_B\), i. e., \(r:=|\{i\mid w_i=w_B, 1\le i\le B\}|\). An optimal solution selects all bits of weight less than \(w_B\) and exactly r bits of weight \(w_B\). Let x be the current search point and

$$\begin{aligned} s(x)=\max \{0,r-|\{i\mid w_i=w_B \wedge x_i=1\}|\} \end{aligned}$$

the number of one-bits of weight \(w_B\) missing in x. Furthermore, let

$$\begin{aligned} t(x):=|\{i\mid w_i<w_B \wedge x_i = 0\} |\end{aligned}$$

be the number of one-bits of weight less than \(w_B\) missing in x. We denote by the potential

$$\begin{aligned} \phi (x) = s(x) + t(x) \end{aligned}$$

the number of weights that are missing in the weight profile of the solution x compared to an arbitrary optimal solution.

As there are exactly B one-bits in the current solution x, this implies that there are exactly

$$\begin{aligned} \phi (x)=|\{i\mid w_i>w_B \wedge x_i=1\}| + \max \{0,|\{i\mid w_i=w_B \wedge x_i=1\}|-r \} \end{aligned}$$

weights chosen in x that do not belong to an optimal weight profile.
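The potential is mechanical to evaluate from the weight profile; the following small helper (our own illustration, assuming the weight list w is sorted increasingly) computes \(\phi \):

```python
def potential(w, x, B):
    """phi(x) = s(x) + t(x) for increasingly sorted weights w and a solution x."""
    wB = w[B - 1]                                      # the weight w_B (0-indexed list)
    r = sum(1 for i in range(B) if w[i] == wB)         # multiplicity of w_B among w_1..w_B
    ones_wB = sum(1 for i in range(len(w)) if w[i] == wB and x[i] == 1)
    s_x = max(0, r - ones_wB)                          # missing one-bits of weight w_B
    t_x = sum(1 for i in range(len(w))
              if w[i] < wB and x[i] == 0)              # missing one-bits of smaller weight
    return s_x + t_x
```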

Given \(\phi (x)=i>0\) for the current search point, our definitions imply that there are at least i one-bits that can be swapped with an arbitrary 0-bit of the i missing weights in order to reduce the potential. The probability of decreasing the potential is \(\Theta (i^2/n^2)\) since there are \(i^2\) improving two-bit flips (i choices for a one-bit to be flipped to 0 and i choices for a zero-bit to be flipped to 1). Since the potential cannot increase, this results in an expected optimization time of at most \(\sum _{i=1}^\infty O(n^2/i^2) = O(n^2)\) for RLS\(^{1,2}\). Note that this algorithm uniformly at random decides whether one or two bits flip.

SD-RLS\(^{\text {m}}\) achieves the same asymptotic time bound as RLS\(^{1,2}\) since, after every two-bit flip, the radius memory allocates to iterations trying strength 1 only a \(1/\ln n\)-fraction of the time spent on the last improvement (via two-bit flips). Hence, as long as no strengths larger than 2 are chosen, the expected optimization time of SD-RLS\(^{\text {m}}\) is \(O(n^2)\).

To estimate the failure probability, we need a bound on the number of strict improvements of f, which may be larger than the number of improvements of the \(\phi \)-value since steps that flip a zero-bit and a one-bit both located in positions \(B+1,\dots ,n\), or both in positions \(1,\dots ,B\), may be strictly improving without changing \(\phi \). Let us assign to each zero-bit a value representing the number of one-bits at higher indices. In other words, each of these values gives the number of one-bits that can be flipped together with the respective zero-bit to make a strict improvement. Define S as the sum of these values. Clearly, \(S\le n(n-1)/2 = O(n^2)\). Now, we claim that each strict improvement decreases S by at least one, which bounds the number of strict improvements from above by \(O(n^2)\). Assume that in a strict improvement, the algorithm flips a one-bit at position i and a zero-bit at position j. Obviously, \(j<i\). The value of the new zero-bit at position i is less than the value of the former zero-bit at position j before the flip because the new zero-bit sees at least one one-bit less at higher indices, namely the former one-bit at position i itself. Altogether, the number of strict improvements at strength 2 is at most \(O(n^2)\).

The proof is completed by noting that the strength never exceeds 2 with probability at least \(1-O(n^2)/R=1-O(1/n)\), using Lemma 1 and a union bound. \(\square \)

We believe that with more effort, including an application of Theorem 2, the bound from Theorem 5 could be turned into a bound on the expected optimization time. We will see examples of such arguments in the proofs of Theorems 6 and 7.

We now illustrate why the original SD-RLS\(^{\text {r}}\) is less efficient on linear functions under uniform constraints than SD-RLS\(^{\text {m}}\). To this end, we study the following instance: the weights of the objective function are n pairwise different natural numbers (sorted increasingly), and the constraint bound is \(B=n/2\), i. e., only search points having at least n/2 one-bits are valid. Writing search points in big-endian as \(x=(x_n,\dots ,x_1)\), we take the point \(1^{n/2}0^{n/2}\) as the starting point of our search heuristic. The optimum is then \(0^{n/2}1^{n/2}\) since the n/2 one-bits are at the least significant positions. We call the latter positions the suffix and the remaining positions the prefix.
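A minimal sketch of this instance (our own encoding, assuming weights \(1,\dots ,n\) and n divisible by 8; list index i carries the (i+1)-st smallest weight, so the paper's "prefix" is the heavy half of the list):

```python
n = 16                                    # toy size; assume n divisible by 8
B = n // 2                                # uniform constraint bound
w = list(range(1, n + 1))                 # pairwise different weights, sorted increasingly

def fitness(x):
    # Linear function to be minimized; points with fewer than B ones are
    # infeasible and rejected outright.
    return sum(wi * xi for wi, xi in zip(w, x)) if sum(x) >= B else float("inf")

x_start = [0] * (n // 2) + [1] * (n // 2)   # all ones on the heavy half (the "prefix")
x_opt   = [1] * (n // 2) + [0] * (n // 2)   # all ones on the light half (the "suffix")

def phi(x):
    # Potential used in the discussion below: one-bits still in the heavy half.
    return sum(x[n // 2:])
```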

We define the potential \(\phi (x)\) as the number of one-bits in the prefix. We will show that the expected time for SD-RLS\(^{\text {m}}\) (up to a failure event) to reduce the potential from its initial value n/4 to n/8 is O(n) since during this period there is always an improvement probability of \(\Omega ((n/8)^2/n^2)=\Omega (1)\). This is essentially the same argument and running time as for the classical RLS\(^{1,2}\). Then we claim that SD-RLS\(^{\text {r}}\) needs time \(\Omega (n^2\log n)\) for this. Intuitively, this holds since after each improvement the strength and radius are reset to 1, resulting in \(\Omega (n)\) phases where the algorithm is forced to iterate unsuccessfully with strength 1 until the threshold \(\left( {\begin{array}{c}n\\ 1\end{array}}\right) \ln R\) is reached. By contrast, SD-RLS\(^{\text {m}}\) will spend \(O(n\ln n)\) steps to set the strength and radius to 2. Afterwards, during the n/8 improvements it will only spend an expected number of O(1) steps at strength 1 before it returns to strength 2 and finds an improvement in expected time O(1). The total time is \(O(n\log n+n)=O(n\log n)\). Based on these ideas, we formulate the following theorem.

Theorem 6

Let \(\epsilon >0\) be an arbitrary constant. Consider a linear function with n pairwise different weights under a uniform constraint with \(B=n/2\) and let \(S_c:=\{y\in \{0,1\}^n\mid \phi (y)\le c\}\). Starting with a feasible solution x such that x contains B one-bits and \(\phi (x)=a\), the expected time for SD-RLS\(^{\text {m}}\) with \(R\ge n^{4+\epsilon }\) to find a search point in \(S_c\) is at most

$$\begin{aligned} \mathord {O}\mathord {\left( n\ln R+n^2\sum _{i=c}^{a}\frac{1}{i^2}\right) }.\end{aligned}$$

For SD-RLS\(^{\text {r}}\) with \(R\ge n^{4+\epsilon }\), it is at least

$$\begin{aligned}\Omega \left( P\cdot n\ln R+ n^2\sum _{i=c}^{a}\frac{1}{i^2}\right) ,\end{aligned}$$

where P is the number of improvements with strengths larger than 1. Moreover, it holds that \(P>a-c\) with probability \(1-o(1/n)\).

Proof

We first prove the upper bound for SD-RLS\(^{\text {m}}\). Initially, SD-RLS\(^{\text {m}}\) spends \(O(n\ln R)\) unsuccessful iterations until it sets the strength and radius to 2. Afterwards, the number of iterations at strength 1 is a \((1/\ln n)\)-factor of the number of iterations at strength 2, conditioned on not exceeding the threshold when \(r=2\). The failure case and its probability will be studied in detail below.

When the strength equals 2, the probability of improving the potential from value i is \(\Theta (i^2/n^2)\), resulting in an expected time of \(\Theta (n^2/i^2)\) for each improvement. Thus, in expectation, \(O(n^2\sum _{i=c}^a 1/i^2)\) iterations suffice to find a search point in \(S_c\). For each improvement, the counter u of SD-RLS\(^{\text {m}}\) exceeds the current threshold with probability at most 1/R by Lemma 1. If the radius becomes larger than 2, we use Theorem 2 to bound the expected number of steps until it becomes 2 again. Here we note that the theorem can be applied since the gap of all non-optimal search points is 2. Moreover, since we consider the situation where the radius is increased from 2 to 3, the last strength used was 2 and equaled the radius. Hence, the budget variable is still \(\infty \) after the increase of the radius, so the first case of Theorem 2 applies. The expected number of iterations until reaching radius 2 again is therefore bounded by \(o(\left( {\begin{array}{c}n\\ 2\end{array}}\right) )\). Since the number of fitness improvements at strength 2 is at most \(n^2\) (see the proof of Theorem 5), we obtain a failure probability of at most \(n^2\cdot 1/R = o(1/n)\) for the event of exceeding the threshold. Hence, the expected number of iterations at larger strengths is \(o((1/n)\left( {\begin{array}{c}n\\ 2\end{array}}\right) )=o(n)\) and therefore a lower-order term of the claimed bound on the expected time to reach \(S_c\).

In case of a failure, we repeat the argument. Hence, after an expected number of \(1/(1-o(1/n))=1+o(1)\) repetitions no failure occurs and \(S_c\) is reached.

Overall, the expected time for SD-RLS\(^{\text {m}}\) to find a search point in \(S_c\) is at most

$$\begin{aligned}(1+o(1))\left( n\ln R+(1+1/\ln n)\cdot \sum _{i=c}^a\frac{n^2}{i^2}\right) = \mathord {O}\mathord {\left( n\ln R+n^2\sum _{i=c}^{a}\frac{1}{i^2}\right) }.\end{aligned}$$

We now turn to the lower bound on the optimization time of SD-RLS\(^{\text {r}}\). We claim that the number of potential improvements exceeds \(a-c\) with probability \(1-o(1/n)\) since, as long as the radius stays at most 2, each improvement decreases the potential by at most one.

As long as the strength does not become greater than 2, the number of one-bits remains B. Hence, when the strength is at most 2, each accepted step swaps a one-bit and a zero-bit, so the algorithm can only improve the potential by 1: decreasing it by more would require flipping at least two zero-bits among the B least significant positions (and two one-bits in the prefix) in a single step. The radius becomes 3 with probability at most 1/R for each improvement at strength 2. Since there are at most \(n^2\) improvements, the probability of not increasing the radius to 3 during the run is at least \(1-n^2/R=1-o(1/n)\) by a union bound.

Now, since SD-RLS\(^{\text {r}}\) resets the strength to 1 after each improvement and then spends \(n\ln R\) unsuccessful steps at strength 1, the number of steps at strength 1 is \(\Omega ((a-c)n\ln R)\) with probability \(1-o(1/n)\). Also, the number of steps at strength 2 is \(\Theta (n^2\sum _{i=c}^a1/i^2)\) in expectation. Note that we ignore the number of iterations at strengths larger than 2 for the lower bound.

Overall, with probability at least \(1-o(1/n)\) the required time of SD-RLS\(^{\text {r}}\) is at least

$$\begin{aligned}\mathord {\Omega }\mathord {\left( (a-c)n\ln R+ n^2\sum _{i=c}^{a}\frac{1}{i^2}\right) }. \end{aligned}$$

\(\square \)

The following corollary shows that, in the middle of a run, SD-RLS\(^{\text {m}}\) is faster than SD-RLS\(^{\text {r}}\) by a factor of roughly n.

Corollary 1

Let \(\epsilon >0\) be an arbitrary constant. The relative speed-up of SD-RLS\(^{\text {m}}\) with \(R\ge n^{4+\epsilon }\) compared to SD-RLS\(^{\text {r}}\) with \(R\ge n^{4+\epsilon }\) to find a search point in \(S_c\) with \(c=n/8\) for a starting search point x with \(\phi (x)=n/4\) is \(\Omega (n)\) with probability at least \(1-o(1/n)\).

Proof

Let \(T_r\) and \(T_m\) be the considered hitting times of SD-RLS\(^{\text {r}}\) and SD-RLS\(^{\text {m}}\), respectively. Using Theorem 6 with \(a=n/4\) and \(c=n/8\), we have

$$\begin{aligned}\frac{\mathord {\textrm{E}}\mathord {\left( T_r\right) }}{\mathord {\textrm{E}}\mathord {\left( T_m\right) }}\ge \frac{\Omega (n^2\ln R)}{O(n\ln R)} = \Omega (n),\end{aligned}$$

with probability at least \(1-o(1/n)\). \(\square \)

5 Minimum Spanning Trees

This section will deal with another example where the radius memory leads to a speed-up of SD-RLS\(^{\text {m}}\) compared to the variant SD-RLS\(^{\text {r}}\) without radius memory. In Theorem 7 in Rajabi and Witt [7], the authors study SD-RLS\(^{\text {r}}\) on the MST problem as formulated in Neumann and Wegener [10]: we are given an undirected, weighted graph \(G=(V,E)\), where \(n=|V|\), \(m=|E|\) and the weight of edge \(e_i\), where \(i\in \{1,\dots ,m\}\), is a positive integer \(w_i\). Let c(x) denote the number of connected components in the subgraph described by the search point \(x\in \{0,1\}^m\). The fitness function \(f:\{0,1\}^m\rightarrow \mathbb {R}\) considered in Neumann and Wegener [10], to be minimized, is defined by

$$\begin{aligned} f(x):=M^2 (c(x)-1) + M \left( \sum _{i=1}^m x_i - (n-1)\right) + \sum _{i=1}^m w_ix_i \end{aligned}$$

for an integer \(M\ge n^2 w_{\max }\), where \(w_{\max }\) denotes the largest edge weight. Hence, f returns the total weight of a given spanning tree and penalizes unconnected graphs as well as graphs containing cycles so that such graphs are always inferior to spanning trees. The authors of Rajabi and Witt [7] showed that SD-RLS\(^{\text {r}}\) with \(R=m^4\) can find an MST, starting with an arbitrary spanning tree, in \((1+o(1)) \bigl (m^2\ln m + (4\,m\ln m)\mathord {\textrm{E}}\mathord {\left( S\right) }\bigr )\) fitness calls, where \(\mathord {\textrm{E}}\mathord {\left( S\right) }\) is the expected number of strict improvements. The term involving \(\mathord {\textrm{E}}\mathord {\left( S\right) }\) stems from the fact that for each improvement at strength 2, the algorithm resets the radius to one for the next epoch and explores this radius more or less completely in \(m\ln R\) iterations. This can be costly for graphs requiring many improvements.
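For illustration, the fitness function can be evaluated as follows. This is a minimal sketch with our own helper names (the paper prescribes only the formula, not an implementation); connected components are counted via union-find:

```python
def mst_fitness(x, n, edges, weights):
    # Fitness function from Neumann and Wegener [10], to be minimized.
    # x: 0/1 list over the m edges; edges: list of (u, v) pairs with
    # vertices 0..n-1; weights: positive integer edge weights.
    M = n * n * max(weights)              # penalty factor M >= n^2 * w_max
    parent = list(range(n))

    def find(v):                          # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    for xi, (u, v) in zip(x, edges):
        if xi:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1

    ones = sum(x)
    weight = sum(wi for xi, wi in zip(x, weights) if xi)
    return M * M * (components - 1) + M * (ones - (n - 1)) + weight
```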

However, with SD-RLS\(^{\text {m}}\), we do not need to include the number of improvements when estimating the number of iterations at strength 1 in the runtime bound: with the radius memory mechanism, the number of iterations at strength 1 is asymptotically a \((1/\ln n)\)-fraction of the number of iterations at strength 2. This leads to the following simple bound.

Theorem 7

Consider an instance of the MST problem, modeled with the classical fitness function from Neumann and Wegener [10]. The expected optimization time of SD-RLS\(^{\text {m}}\) with \(R=m^4\), starting with an arbitrary spanning tree, is at most

$$\begin{aligned}&(1+o(1))\bigl ((m^2/2)(1+\ln (r_1+\dots +r_m))\bigr ) \le (1+o(1)) \bigl (m^2\ln m \bigr ), \end{aligned}$$

where \(r_i\) is the rank of the i-th edge in the sequence sorted by increasing edge weights.

The proof is similar to the proof of Theorem 6 in Rajabi and Witt [7] and uses multiplicative drift analysis [19]. However, we show that the radius memory mechanism controls the number of iterations at strength 1, and we apply Theorem 2 to show that if the algorithm uses strengths larger than 2, it makes an improvement at strength 2 again shortly afterwards.

Proof of Theorem 7

We aim at applying multiplicative drift analysis with \(g(x)=\sum _{i=1}^m x_ir_i\) as the potential function. As shown in Raidl et al. [20], RLS behaves stochastically identically on the original fitness function f and on the function g if at most two bits may flip simultaneously. Since SD-RLS\(^{\text {m}}\) has different states, we do not have the same lower bound on the drift towards the optimum as for the classical RLS\(^{1,2}\) from Neumann and Wegener [10]. However, at strength 1 no mutation is accepted since the fitness function from Neumann and Wegener [10] gives a huge penalty to non-trees. Hence, our plan is to conduct the drift analysis conditional on the strength being at most 2 and to account for the steps spent at strength 1 separately. Cases where the strength exceeds 2 will be handled by an error analysis and a restart argument.

Let \(X^{(t)}:=g(x_t)-g(x_{{{\,\textrm{opt}\,}}})\) for the current search point \(x_t\) and an optimal search point \(x_{{{\,\textrm{opt}\,}}}\). Since the algorithm behaves stochastically the same on the original fitness function f and the potential function g, we obtain \(\mathord {\textrm{E}}\mathord {\left( X^{(t)} - X^{(t+1)}\mid X^{(t)}\right) } \ge X^{(t)}/\left( {\begin{array}{c}m\\ 2\end{array}}\right) \ge 2X^{(t)}/m^2\) since the g-value can be decreased by altogether \(g(x_t)-g(x_{{{\,\textrm{opt}\,}}})\) via a sequence of at most \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) disjoint two-bit flips; see also the proof of Theorem 15 in Doerr et al. [21] for the underlying combinatorial argument. Let T denote the number of steps at strength 2 until g is minimized, assuming no larger strength to occur. Using the multiplicative drift theorem, we have \(\mathord {\textrm{E}}\mathord {\left( T\right) }\le (m^2/2)(1+\ln (r_1+\dots +r_m)) \le (m^2/2)(1+\ln (m^2))\), and by the tail bounds for multiplicative drift (e. g., [19]) it holds that \(\mathord {{{\,\textrm{Pr}\,}}}\mathord {\left( T> (m^2/2)(\ln (r_1+\dots +r_m) + \ln (m^2))\right) } \le e^{-\ln (m^2)} = 1/m^2\). Note that this bound on T is below the threshold for strength 2 since \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \ln R = 2(m^2-m)\ln m \ge (m^2/2)(\ln (r_1+\dots +r_m)+\ln (m^2))\) for m large enough, using \(r_1+\dots +r_m = m(m+1)/2\). Hence, with probability at most \(1/m^2\) the algorithm fails to find the optimum before the strength can change from 2 to a different value due to the threshold being exceeded.
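For reference, the multiplicative drift theorem with tail bounds (see [19]), instantiated here with drift factor \(\delta =2/m^2\), initial value \(X^{(0)}\le r_1+\dots +r_m\) and minimum positive state 1, reads

$$\begin{aligned} \mathord {\textrm{E}}\mathord {\left( T\right) }\le \frac{1+\ln X^{(0)}}{\delta },\qquad \mathord {{{\,\textrm{Pr}\,}}}\mathord {\left( T>\frac{\ln X^{(0)}+\lambda }{\delta }\right) }\le e^{-\lambda }\quad \text {for all } \lambda >0, \end{aligned}$$

so that \(\lambda =\ln (m^2)\) yields the failure probability \(1/m^2\) used above.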

We next bound the expected number of steps spent at larger strengths. By Theorem 2, if the algorithm fails to find an improvement with the right radius, i. e., when the radius becomes \(r=3>{{\,\textrm{IndividualGap}\,}}(x)=2\), then, in at most \(o(R\left( {\begin{array}{c}m\\ r-1\end{array}}\right) )\) iterations in expectation, the radius is set to the gap size of the current search point at that time. Here we note that the gap of all non-optimal search points is 2 since there must be an improving edge swap. Thus, along with the bound 1/R on the probability of missing an improvement at radius 2, an increase of the radius above 2 costs an expected number of at most \((1/R) \cdot o\!\left( R\left( {\begin{array}{c}m\\ 2\end{array}}\right) \right) = o\!\left( \left( {\begin{array}{c}m\\ 2\end{array}}\right) \right) \) iterations to make an improvement at strength 2 again and set the radius to two. This time is only a lower-order term of the runtime bound claimed in the theorem. If the strength exceeds 2, we wait for it to become 2 again and restart the previous drift analysis, which is conditional on strength at most 2. Since the probability of a failure is at most \(1/m^2\), this gives an expected number of at most \(1/(1-m^{-2})\) restarts, which is \(1+o(1)\) as well.

It remains to bound the number of steps at strength 1. For each strict improvement, B is set to \((1/\ln n)\cdot u\), where u is the counter value; then the counter is reset and the radius r is set to 2. Thereafter, B steps (where B is the budget variable) of SD-RLS\(^{\text {m}}\) pass before the strength becomes 2 again. Hence, if the strength does not exceed 2 before the optimum is reached, this contributes a factor of \((1+1/\ln n)\) to the number of iterations at strength 2 in the previous epoch. Also, at the beginning of the run there is a complete phase of unsuccessful steps at strength 1, costing \(m\ln R\) iterations, which contributes only a lower-order term. \(\square \)

Theorem 7 is interesting since its bound is of the same asymptotic order as the lower bound known from the analysis of the classical (1+1) EA [10]. In particular, it does not suffer from the additional \(\log (w_{\max })\) factor that appears in the upper bound for the (1+1) EA. In this sense, this seems to be the first asymptotically tight analysis of a globally searching (1+1)-type algorithm on the MST problem. So far, a tight analysis of evolutionary algorithms on the MST was only known for RLS\(^{1,2}\) with one- and two-bit flip mutations [20]. Our bound in Theorem 7 is better by a factor of roughly 2 since it avoids an expected waiting time of 2 for a two-bit flip. On the technical side, it is interesting that we could apply drift analysis in its proof despite the algorithm being able to switch between different mutation strengths, which influences the current drift.

6 Radius Memory can be Detrimental

After a high mutation strength has been selected, e. g., to overcome a local optimum, the radius memory of SD-RLS\(^{\text {m}}\) decreases the threshold values for phase lengths related to lower strengths via the counter u in the algorithm. As we have seen in Theorem 2 and its application in Theorem 4, SD-RLS\(^{\text {m}}\) can often return to a smaller strength quickly, so that the radius memory and the smaller threshold values do not essentially increase the remaining optimization time. However, we can also point out situations where using the smaller strengths with their original threshold values, as in the original SD-RLS\(^{\text {r}}\) from Rajabi and Witt [7], is crucial.

Our example is based on a general construction principle that can be traced back to Witt [22] and was picked up in Rajabi and Witt [1] to show situations where stagnation detection in the context of the (1+1) EA is detrimental; see that paper for a detailed account of the construction principle. In Rajabi and Witt [7], the idea was used to demonstrate situations where bit-flip mutations outperform SD-RLS\(^{\text {r}}\). Roughly speaking, the functions combine two gradients, one of which is easier to exploit for an algorithm A while the other is easier to exploit for another algorithm B. By appropriately defining local and global optima close to the end of the search space in the direction of the gradients, either Algorithm A significantly outperforms Algorithm B or the other way round.

We will now define a function on which SD-RLS\(^{\text {m}}\) is exponentially slower than SD-RLS\(^{\text {r}}\). In the following, we imagine any bit string x of length n as being split into a prefix \(a:=a(x)\) of length \(n-m\) and a suffix \(b:=b(x)\) of length m, where m is defined below. Hence, \(x=a(x)\circ b(x)\), where \(\circ \) denotes concatenation. The prefix a(x) is called valid if it is of the form \(1^i 0^{n-m-i}\), i. e., i leading ones and \(n-m-i\) trailing zeros. The prefix fitness \(\textsc {pre}(x)\) of a string \(x\in \{0,1\}^n\) with valid prefix \(a(x)=1^i0^{n-m-i}\) equals i, the number of leading ones. The suffix consists of \(\lceil n^{1/8}\rceil \) consecutive blocks of \(\lceil n^{3/4}\rceil \) bits each, altogether \(m=\Theta (n^{7/8})\) bits. Such a block is called valid if it contains either 0 or 2 one-bits; moreover, it is called active if it contains 2 and inactive if it contains 0 one-bits. A suffix where all blocks are valid and where all blocks following the first inactive block are also inactive is called valid itself, and the suffix fitness \(\textsc {suff}(x)\) of a string x with valid suffix b(x) is the number of leading active blocks before the first inactive one. Finally, we call \(x\in \{0,1\}^n\) valid if both its prefix and suffix are valid.

The final fitness function, which the algorithms have to maximize, is a weighted combination of \(\textsc {pre}(x)\) and \(\textsc {suff}(x)\). We define for \(x\in \{0,1\}^n\), where \(x=a\circ b\) with the above-introduced a and b,

$$\begin{aligned}&\textsc {PreferOneBitFlip} (x):=\\&{\left\{ \begin{array}{ll} n -m - \textsc {pre}(x) + \textsc {suff}(x) & \text {if } \textsc {suff}(x)\le n^{1/9} \wedge x\text { valid,}\\ n^2\,\textsc {pre}(x) + \textsc {suff}(x) & \text {if } n^{1/9}<\textsc {suff}(x)\le n^{1/8}/2 \wedge x\text { valid,}\\ n^2 (n-m) + \textsc {suff}(x)-n-1 & \text {if } \textsc {suff}(x)>n^{1/8}/2 \wedge x\text { valid,}\\ -\textsc {OneMax} (x) & \text {otherwise.} \end{array}\right. } \end{aligned}$$

We note that all search points in the third case have a fitness of at least \(n^2 (n-m) - n - 1\), which is bigger than \(n^2(n-m-1) + n\), an upper bound on the fitness of search points that fall into the second case without having \(n-m\) ones in the prefix. Hence, search points x where \(\textsc {pre}(x)=n-m\) and \(\textsc {suff}(x)=\lceil n^{1/8}\rceil \) represent local optima of second-best overall fitness. The set of global optima equals the points where \(\textsc {suff}(x)=\lfloor n^{1/8}/2\rfloor \) and \(\textsc {pre}(x)=n-m\), which implies that at least \(n^{1/8}\) bits (two from each block) have to be flipped simultaneously to escape from the local toward the global optimum. The first case is special in that the function is decreasing in the pre-value as long as \(\textsc {suff}(x)\le n^{1/9}\). Typically, the first valid search point falls into the first case. Then two-bit flips are essential to make progress, and the radius memory of SD-RLS\(^{\text {m}}\) will be used while waiting for the next improvement. After leaving the first case, since two-bit flips happen quickly enough, the radius memory makes progress via one-bit flips unlikely, leading to the local optimum.
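A minimal sketch of this construction follows (our own encoding of the definition above, meant for large n; helper names are ours):

```python
import math

def prefer_one_bit_flip(x, n):
    # Sketch of PreferOneBitFlip (to be maximized); our own encoding of
    # the definition above, not the authors' code.
    blocks = math.ceil(n ** (1 / 8))          # number of suffix blocks
    block_size = math.ceil(n ** (3 / 4))      # bits per block
    m = blocks * block_size                   # suffix length
    a, b = x[:n - m], x[n - m:]               # prefix and suffix

    # Prefix validity: 1^i 0^{n-m-i}; pre = number of leading ones.
    pre = next((i for i, bit in enumerate(a) if bit == 0), len(a))
    prefix_valid = all(bit == 0 for bit in a[pre:])

    # Suffix validity: every block has 0 or 2 ones and no active block
    # follows an inactive one; suff = number of leading active blocks.
    counts = [sum(b[j * block_size:(j + 1) * block_size]) for j in range(blocks)]
    suff = next((j for j, c in enumerate(counts) if c == 0), blocks)
    suffix_valid = (all(c in (0, 2) for c in counts)
                    and all(c == 0 for c in counts[suff:]))

    if not (prefix_valid and suffix_valid):
        return -sum(x)                        # -OneMax(x)
    if suff <= n ** (1 / 9):
        return n - m - pre + suff
    if suff <= n ** (1 / 8) / 2:
        return n * n * pre + suff
    return n * n * (n - m) + suff - n - 1
```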

We note that the function PreferOneBitFlip shares some features with the function \(\textsc {NeedHighMut} \) from Rajabi and Witt [1] and the function NeedGlobalMut from Rajabi and Witt [7]. However, it contains an extra case for small suffix values, uses different block sizes and block counts for the suffix, and inverts the roles of prefix and suffix by leading to a local optimum when the suffix is optimized first.

In the following, we make the above ideas precise and show that SD-RLS\(^{\text {r}}\) outperforms SD-RLS\(^{\text {m}}\) on PreferOneBitFlip.

Theorem 8

Let \(\epsilon >0\) be an arbitrary constant. With probability at least \(1-1/n^{1/8}\), SD-RLS\(^{\text {m}}\) with \(R\ge n^{6+\epsilon }\) needs time \(2^{\Omega (n^{1/8})}\) to optimize PreferOneBitFlip. With probability at least \(1-1/n\), SD-RLS\(^{\text {r}}\) with \(R\ge n^{4+\epsilon }\) optimizes the function in time \(O(n^2)\).

Proof

As in the proof of Theorem 5 in Rajabi and Witt [5], we have that the first valid search point (i. e., the first search point of non-negative fitness) of both SD-RLS\(^{\text {r}}\) and SD-RLS\(^{\text {m}}\) has both \(\textsc {pre}\)- and \(\textsc {suff}\)-value at most \(n^{1/9}/2\) with probability \(1-2^{-\Omega (n^{1/9})}\). In the following, we tacitly assume that we have reached a valid search point of the described maximum \(\textsc {pre}\)- and \(\textsc {suff}\)-value and note that this changes the required number of improvements to reach the local or global optimum only by a \(1-o(1)\) factor. For readability, this factor will not be spelled out any more.

Given a valid search point x where \(\textsc {suff}(x)<n^{1/9}/2\), fitness improvements are only possible by increasing the suff-value. Since the probability of a suff-improving step is at least \(\left( {\begin{array}{c}n^{3/4}\\ 2\end{array}}\right) /n^2=\Omega (n^{-1/2})\) at strength 2, the expected time for both algorithms to reach the second case of the definition of PreferOneBitFlip is \(O(n^{1/9}\cdot n^{1/2})= O(n^{11/18})\) according to Theorem 1, and by repeating independent phases and applying Markov's inequality, the time is O(n) with probability exponentially close to 1. Afterwards, one-bit flips increasing the pre-value are strictly improving and happen while the strength is 1 with probability at least \(1-1/R\) in SD-RLS\(^{\text {r}}\), which does not have radius memory. Therefore, by a union bound over all O(n) improvements, with probability at least \(1-1/n\), SD-RLS\(^{\text {r}}\) increases the pre-value to its maximum before the suff-value becomes greater than \(\lceil n^{1/9}\rceil \). The time for this is \(O(n^2)\), even with probability exponentially close to 1 by Chernoff bounds. Afterwards, by reusing the above analysis of leaving the first case, with probability at least \(1-1/n\) a number of \(o(n^2)\) steps is sufficient for SD-RLS\(^{\text {r}}\) to find the global optimum. This proves the second statement of the theorem.

To prove the first statement, i. e., the claim for SD-RLS\(^{\text {m}}\), we first note that the probability of improving the pre-value at strength 2 is \(O(n^{-2})\) since two specific bits would have to flip. Hence, with probability \(1-O(1/n)\), no such step happens within \(\Theta (n)\) steps. By contrast, a suff-improving step at strength 2, which has probability \(\Omega (n^{-1/2})\), happens within \(O(n^{3/4})\) steps with probability at least \(1-1/n^{1/4}\) according to Markov's inequality. In this case, the radius memory of SD-RLS\(^{\text {m}}\) will set a threshold of \(B=O(n^{3/4})\) for the subsequent iterations at strength 1. The probability of improving the pre-value within this time is \(O(n^{-1/4})\) by a union bound, noting that each such step succeeds with probability at most 1/n. Hence, the probability of having at least \(n^{1/8}\) improvements of the suff-value, each within \(n^{3/4}\) steps, before an improvement of the pre-value (at strength 1) happens is at least \(1-1/n^{1/8}\) by a union bound. If all this happens, the algorithm has to flip at least \(n^{1/8}\) bits simultaneously, which requires \(2^{\Omega (n^{1/8})}\) steps already to reach the required strength. The total failure probability is \(O(n^{-1/8})\). \(\square \)

7 Experiments

To supplement our theoretical findings, we ran implementations of five algorithms, namely SD-RLS\(^{\text {m}}\) with \(R=m^5\), SD-RLS\(^{\text {r}}\) with \(R=m^5\), the (1+1) FEA\(_\beta \) with \(\beta =1.5\) from Doerr et al. [6], the standard (1+1) EA and RLS\(^{1,2}\), on the MST problem with n vertices and m edges and the fitness function from Neumann and Wegener [10] for three types of graphs: TG, Erdős–Rényi with \(p=(2\ln n)/n\), and the complete graph \(K_n\). We carried out experiments similar to those in Rajabi and Witt [7], with the additional class of complete graphs \(K_n\), to illustrate the performance of the new algorithm and compare it with the other algorithms.

Fig. 2 Average number of fitness calls (over 200 runs) the mentioned algorithms took to optimize the MST fitness function of the graphs

Fig. 3 Example TG graph with \(p=n/4\) connected triangles and a complete graph on q vertices with edges of weight 1 [10]

The TG graph depicted in Fig. 3, with n vertices and \(m=3n/4+\left( {\begin{array}{c}n/2\\ 2\end{array}}\right) \) edges, contains a sequence of \(p=n/4\) triangles which are connected to each other, and the last triangle is connected to a complete graph on \(q=n/2\) vertices. Regarding the weights, the edges of the complete graph have weight 1, and we set the weights of the edges in each triangle to 2a for the two side edges and 3a for the main edge. In this paper, we consider \(a=n^2\). The TG graph is used in the literature for proving lower bounds on the expected runtime of the (1+1) EA and RLS [10]. As can be seen in Fig. 2a, the (1+1) EA with heavy-tailed mutation (i. e., the (1+1) FEA\(_\beta \)) with \(\beta =1.5\) outperformed the rest of the algorithms. However, SD-RLS\(^{\text {r}}\) and SD-RLS\(^{\text {m}}\) also outperformed the standard (1+1) EA and RLS\(^{1,2}\).
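The construction can be sketched as follows (our own code and vertex numbering, derived from the description above; not the authors' implementation):

```python
from itertools import combinations

def tg_graph(n, a):
    # Sketch of the TG construction: a chain of p = n/4 triangles (side
    # edges of weight 2a, main edge of weight 3a) whose last triangle
    # shares a vertex with a complete graph on q = n/2 vertices with unit
    # weights. Assumes n divisible by 4.
    p, q = n // 4, n // 2
    edges, weights = [], []
    for i in range(p):                        # triangle chain on vertices 0..2p
        u, mid, v = 2 * i, 2 * i + 1, 2 * i + 2
        edges += [(u, mid), (mid, v), (u, v)]
        weights += [2 * a, 2 * a, 3 * a]
    clique = [2 * p] + list(range(2 * p + 1, 2 * p + q))   # shares vertex 2p
    for u, v in combinations(clique, 2):
        edges.append((u, v))
        weights.append(1)
    return edges, weights                     # m = 3n/4 + binom(n/2, 2) edges
```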

Regarding the Erdős–Rényi graphs, we generated the graphs randomly with \(p=(2\ln n)/n\) and assigned each edge an integer weight drawn uniformly at random from the range \([1,n^2]\). We also checked that the graphs were connected. Then, we ran the implementations to find the MSTs of these graphs. The obtained results can be seen in Fig. 2b. As discussed in Section 6 in Rajabi and Witt [7], SD-RLS\(^{\text {r}}\) does not outperform the (1+1) EA and RLS\(^{1,2}\) on the MST problem for graphs where the number of strict improvements of SD-RLS\(^{\text {r}}\) is large. However, SD-RLS\(^{\text {m}}\), the algorithm proposed in this paper, outperformed the rest of the algorithms, although there can be a relatively large number of improvements on such graphs. We can also see this superiority in Fig. 2c for the complete graphs \(K_n\) with random edge weights in the range \([1,n^2]\).
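The instance generation can be sketched as follows (our own code; the connectivity check simply resamples until the graph is connected):

```python
import math
import random
from itertools import combinations

def erdos_renyi_instance(n, rng=random.Random(0)):
    # Sketch of the experimental instances: G(n, p) with p = (2 ln n)/n and
    # integer weights drawn uniformly from [1, n^2]; resampled until the
    # graph is connected. Not the authors' implementation.
    p = 2 * math.log(n) / n
    while True:
        edges = [(u, v) for u, v in combinations(range(n), 2)
                 if rng.random() < p]
        parent = list(range(n))
        def find(v):                          # union-find for connectivity
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        for u, v in edges:
            parent[find(u)] = find(v)
        if len({find(v) for v in range(n)}) == 1:
            weights = [rng.randint(1, n * n) for _ in edges]
            return edges, weights
```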

For statistical tests, we ran the algorithms on the TG and Erdős–Rényi graphs 200 times, and all p-values obtained from a Mann–Whitney U test between the algorithms, with respect to the null hypothesis of identical behavior, are less than \(10^{-2}\), except for the results regarding the smallest size in each set of graphs.

8 Conclusions

We have investigated stagnation detection with the s-bit flip operator as known from randomized local search and introduced a mechanism called radius memory that allows continued exploitation of large s-values that were useful in the past. Improving earlier work from Rajabi and Witt [7], this leads to tight bounds on complex multimodal problems like linear functions with uniform constraints and the minimum spanning tree problem, while still optimizing unimodal and jump functions essentially as efficiently as in earlier work. The bound for the MST is the first runtime bound of order \(O(m^2\ln m)\) for a globally searching heuristic and improves upon the runtime of classical RLS algorithms by a factor of roughly 2. We have also pointed out situations where the radius memory is detrimental to the optimization process. In the future, we would like to investigate the concept of stagnation detection with radius memory in population-based algorithms and plan analyses of further combinatorial optimization problems.