Abstract
Metaheuristics are powerful tools for solving optimization problems whose structural properties are unknown or cannot be exploited algorithmically. We propose such a metaheuristic for a large class of optimization problems over discrete domains based on the particle swarm optimization (PSO) paradigm. We provide a comprehensive formal analysis of the performance of this algorithm on certain “easy” reference problems in a black-box setting, namely the sorting problem and the problem OneMax. In our analysis we use a Markov model of the proposed algorithm to obtain upper and lower bounds on its expected optimization time. Our bounds are essentially tight with respect to the Markov model. We show that for a suitable choice of algorithm parameters the expected optimization time is comparable to that of known algorithms and, furthermore, that for other parameter regimes the algorithm behaves less greedily and more exploratively, which can be desirable in practice in order to escape local optima. Our analysis provides precise insight into the tradeoff between optimization time and exploration. To obtain our results we introduce the notion of indistinguishability of states of a Markov chain and provide bounds on the solution of a recurrence equation with non-constant coefficients by integration.
1 Introduction
Metaheuristics are very successful at finding good solutions for hard optimization problems in practice. However, due to the nature of such algorithms and the problems they are applied to, it is generally very difficult to derive performance guarantees, or to determine the number of steps it takes until an optimal solution is found. In the present work we propose a simple adaptation of the particle swarm optimization (PSO) algorithm introduced by Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995) to optimization problems over discrete domains. Our proposed algorithm assumes very little about the problem structure and consequently, it works naturally for a large class of discrete domains. It is reasonable to expect from a metaheuristic that it solves black-box versions of many tractable problems in expected polynomial time. We provide a formal analysis based on Markov chains and establish under which conditions our algorithm satisfies this basic requirement. More concretely, we consider two classical problems that are easy to solve in a non-black-box setting, namely the problem of sorting items by transpositions and the problem OneMax, which asks to maximize the number of ones in a bitstring. Our analysis gives precise information about the expected number of steps our algorithm takes in order to solve these two reference problems. Our runtime bounds are essentially tight with respect to the Markov process we use to model the behavior of the algorithm.
For practical purposes, a metaheuristic should, in one way or another, incorporate the following two general strategies: (1) find an improving solution locally (often referred to as exploitation) and (2) move to unexplored parts of the search space (often referred to as exploration). The first strategy essentially leads the algorithm to a local optimum while the second one helps the algorithm to avoid getting stuck when it is close to a local optimum. For our proposed algorithm, as for many other metaheuristics, the tradeoff between the two strategies can be conveniently set by an algorithm parameter. Our analysis shows that there is a sharp threshold with respect to this parameter, where the expected runtime of the algorithm on the reference problems turns from polynomial to exponential. Hence, we can maximize the algorithm’s ability to escape local optima while still maintaining polynomial runtime on the reference problems.
A key tool for the runtime analysis of metaheuristics for optimization problems over discrete domains is the fitness level method pioneered by Wegener (2002). The basic idea is to consider the level sets of the objective function of a problem instance and to determine the expected number of steps an algorithm takes to move to a better level set. This approach has been used extensively in the study of so-called elitist \((1+1)\)-EAs (Wegener 2002; Droste et al. 2002; Giel and Wegener 2003; Sudholt 2013). These algorithms keep a single “current” solution and update this solution only if a better one is found. Our analysis of the proposed PSO algorithm also relies on the fitness level method. However, since our algorithm also considers non-improving solutions in order to escape local optima, a much more involved analysis is required to determine the time it takes to move to a better fitness level.
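As a toy illustration of the fitness level method (our own sketch, not taken from the paper), the standard upper bound \(\sum_i 1/s_i\) can be evaluated directly once lower bounds \(s_i\) on the level-leaving probabilities are known:

```python
def fitness_level_upper_bound(s):
    """Fitness level method: if an algorithm leaves fitness level i
    towards a strictly better level with probability at least s[i],
    then its expected optimization time is at most sum(1 / s[i])."""
    return sum(1.0 / s_i for s_i in s)

# Toy instance: randomized local search on OneMax with n bits improves
# from a solution with i correct bits with probability (n - i) / n, so
# the bound evaluates to n * H_n = Theta(n log n).
n = 100
bound = fitness_level_upper_bound([(n - i) / n for i in range(n)])
```
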
In the following, the expected optimization time refers to the expected number of evaluations of the objective function an algorithm performs until an optimal solution is found. Before giving a precise statement of our results we provide some background information on the PSO algorithm as well as on the two reference problems we consider.
1.1 Particle swarm optimization
The PSO algorithm has been introduced by Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995) and is inspired by the social interaction of bird flocks. Fields of successful application of PSO are, among many others, Biomedical Image Processing (Schwab et al. 2015; Wachowiak et al. 2004), Geosciences (Onwunalu and Durlofsky 2010), Agriculture (Yang et al. 2017), and Materials Science (Ramanathan et al. 2009). In the continuous setting, it is known that the algorithm converges to a local optimum under mild assumptions (Schmitt and Wanka 2015). The algorithm has been adapted to various discrete problems and several results are available, for instance for binary problems (Sudholt and Witt 2010) and the traveling salesperson problem (TSP) (Hoffmann et al. 2011).
A PSO algorithm manages a collection (called swarm) of particles. Each particle consists of an (admissible) solution together with a velocity vector. Additionally, each individual particle knows its local attractor, which is the best solution found by that particle. Information between particles is shared via a common reference solution called the global attractor, which is the best solution found so far by all particles. In each iteration of the algorithm, the solution of each particle is updated based on its relative position with respect to the attractors and some random perturbation. Algorithm parameters balance the influence of the attractors and the perturbation and hence give a tradeoff between the two general search strategies “exploration” and “exploitation”. Although PSO was originally proposed to solve optimization problems over a—typically rectangular—domain \(X \subseteq {\mathbb {R}} ^n\), several authors have adapted PSO to discrete domains. This requires a fundamental reinterpretation of the PSO movement equation because the corresponding mathematical operations of a vector space are typically lacking in the discrete setting. An early discrete PSO variant is the binary PSO presented in Kennedy and Eberhart (1997) for optimizing over \(X=\{0,1\}^n\), where velocities determine the probabilities that a bit is zero or one in the next iteration. A PSO variant for optimizing over general integral domains \(X=\{0,1,\ldots ,M-1\}^n\), \(M \in {\mathbb {N}} \), has been proposed in Veeramachaneni et al. (2007).
1.2 Problems and search spaces
In this section we briefly define the optimization problems for which we will study the performance of the proposed PSO algorithm. The problem OneMax asks for a binary string of length n that maximizes the function

\(\mathrm {OneMax} (x_1,\ldots ,x_n)=\sum _{i=1}^{n} x_i,\)

which counts the number of ones in a binary string. A more general version of this problem asks to minimize the Hamming distance to an unknown binary string of length n. The proposed algorithm works exactly the same on the more general problem, since it is indifferent to the actual bit values and each bit is handled independently. Therefore, the performance of our algorithm on this more general version is equal to its performance on OneMax. The corresponding search space is the n-dimensional hypercube: any binary string of length n is a (feasible) solution, and two solutions are adjacent iff they differ by exactly one bitflip. For \(n=4\), the search space is shown in Fig. 1. More generally, a pseudo-Boolean function is any function \(f : \{0, 1\}^n \rightarrow {\mathbb {R}} \).
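In code, the objective and its hidden-target generalization are a few lines each; the following Python sketch (function names are ours, for illustration) also records why the two versions behave identically for the algorithm:

```python
def one_max(x):
    """OneMax objective: the number of ones in the bitstring x."""
    return sum(x)

def hidden_target_objective(x, z):
    """Generalized version: the Hamming distance to an unknown target
    bitstring z, to be minimized. OneMax is the special case where z is
    the all-ones string, since maximizing the number of ones minimizes
    the distance to all-ones."""
    return sum(xi != zi for xi, zi in zip(x, z))
```
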
By the sorting problem we refer to the task of arranging n items in non-decreasing order using transpositions. An (algebraic) transposition \(t=(i\,\,\, j)\) is the exchange of the entries at positions i and j. Therefore, the search space is the following (undirected) graph: the vertices are the permutations on \(\{1,2,\ldots ,n\}\) and two vertices x, y are adjacent iff there is a transposition t such that \(x \circ t = y\). The objective function is the transposition distance to the identity permutation. Figure 2 shows the search space for the problem of sorting the items \(\{1, 2, 3, 4\}\) using transpositions. Any two permutations drawn in the same vertical layer have the same objective value.
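The objective can be computed directly from the cycle structure: a permutation decomposing into c disjoint cycles on n elements has transposition distance exactly n - c to the identity. A small Python sketch (0-indexed permutations, for illustration):

```python
def transposition_distance(perm):
    """Transposition distance from perm (0-indexed) to the identity:
    a permutation with `cycles` disjoint cycles on n elements needs
    exactly n - cycles transpositions to be sorted."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:   # traverse the cycle containing i
                seen[j] = True
                j = perm[j]
    return n - cycles
```
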
The sorting problem and OneMax each have a unique optimum. Furthermore, the value of the objective function is the distance to the unique optimal solution in the corresponding search space graph.
1.3 Our contribution
We propose a simple adaptation of the PSO algorithm to optimization problems over discrete domains. We refer to this algorithm as DPSO. The algorithm works naturally on a large class of discrete problems, for instance optimization problems over bitstrings, integral domains, and permutations. The general task is to optimize a function \(f:X\rightarrow {\mathbb {R}} \), where X is a finite set of feasible solutions. Our assumptions on the problem structure are the following. We assume that the set X is the vertex set of a finite, strongly connected graph and that for any solution \(x \in X\), we can sample a neighbor of x efficiently and uniformly. The DPSO algorithm essentially explores this graph, looking for an optimal vertex. In our analysis, we assume at first a swarm size of one as in Mühlenthaler et al. (2017), similar to the analysis of EAs and ACO in Sudholt and Witt (2010). We refer to the corresponding specialization of DPSO as OnePSO. Indeed, for a single particle we have only a single attractor and, as a consequence, a single parameter is sufficient to control the tradeoff between moving towards the attractor and performing a random perturbation.
Our main results are upper and lower bounds on the expected optimization time of the proposed OnePSO algorithm for solving the sorting problem and OneMax in a black-box setting, summarized in Table 1. Certainly there are faster algorithms for the sorting problem or OneMax in a non-black-box setting, e.g., quicksort for the sorting problem. The upper bounds we prove for OnePSO naturally hold for DPSO, and the bounds are tight with respect to our Markov model. The algorithm parameter c determines the probability of making a move towards the attractor. Depending on the parameter c, we obtain a complete classification of the expected optimization time of OnePSO. For \(c=0\), OnePSO performs a random walk on the search space, and for \(c=1\), the algorithm behaves like randomized local search (see Papadimitriou et al. 1990 or Doerr and Neumann 2020 for results on local search variants). For \(c\in (1/2,1)\), OnePSO behaves essentially like the \((1+1)\)-EA variants from Droste et al. (2002) and Scharnow et al. (2004): \((1+1)\)-EA variants perform in expectation a constant number of elementary mutations to obtain an improved solution, and OnePSO with \(c\in (1/2,1)\) likewise performs in expectation a constant number of elementary mutations before it returns to the current best solution. Therefore, OnePSO in a sense generalizes the \((1+1)\)-EA algorithm, since a parameter choice \(c\in [0,1]\) supplies a broader range of behaviors than exploring solutions which are in expectation a constant number of elementary mutations away from the current best solution. If \(c<1\), then OnePSO uses similar subroutines as the Metropolis algorithm (see Metropolis et al. 1953), but in OnePSO new positions are always accepted and guidance towards good solutions is instead implemented by a “drift” to the best position found so far.
Indeed, the bounds on the expected optimization time for OneMax and the upper bounds on the expected optimization time for the sorting problem of OnePSO with parameter \(c\in (1/2,1]\) match the respective bounds on the expected optimization time for the \((1+1)\)-EA variants from Droste et al. (2002) and Scharnow et al. (2004). We show that for \(c \in [1/2, 1)\), the expected optimization time of OnePSO for sorting and OneMax is polynomial, and for \(c \in (0, 1/2)\), the expected optimization time is exponential. For c in the latter range we provide lower bounds on the base of the exponential expression by \(\alpha (c)\) for sorting and \(\beta (c)\) for OneMax such that \(1< \beta (c)< \alpha (c) < 6\) (see Fig. 5). Note that \(\alpha \) and \(\beta \) have been significantly improved compared to the conference version (Mühlenthaler et al. 2017), and the upper bound for OneMax has also been reduced substantially, to an exponential term with base \(\beta (c)\). This means that the lower and upper bounds on the base of the exponential expression for the expected optimization time coincide. Note that for \(c=1/2\) the expected time it takes to visit the attractor again after moving away from it is maximal while keeping the expected optimization time polynomial, i.e., we have a phase transition at \(c=1/2\) such that for any \(c\ge 1/2\) the expected optimization time is polynomial and for any \(c<1/2\) the expected optimization time is exponential in n. Hence, the parameter choice \(c=1/2\) maximizes the time until the attractor is visited again, i.e., the particles can explore the search space to the largest possible extent, while OneMax and the sorting problem are still solved efficiently in a black-box setting.
In order to obtain the bounds shown in Table 1, we use a Markov model which captures the behavior of the OnePSO algorithm between two consecutive updates of the attractor. Depending on whether we derive upper or lower bounds on the expected optimization time, the Markov model is instantiated in a slightly different way. The relevant quantity we extract from the Markov model is the expected number of steps it takes until the OnePSO algorithm returns to the attractor. We determine \(\varTheta \)-bounds on the expected return time by an analysis of appropriate recurrence equations. Similar recurrences occur, for example, in the runtime analysis of randomized algorithms for the satisfiability problem (Papadimitriou 1991; Schöning 1999). Thus, our analysis of the Markov model presented in Sect. 3 may be of independent interest. For \(c > 1/2\), the recurrence equations can be solved using standard methods. For the parameter choice \(c \le 1/2\) however, we need to solve recurrence equations with non-constant coefficients in order to get sufficiently accurate bounds from the model. The gaps between upper and lower bounds on the expected optimization times shown in Table 1 result from choosing best-case or worst-case bounds on the transition probabilities in the Markov model, which are specific to the optimization problem. Since our bounds on the transition probabilities are essentially tight, we can hope to close the gap between the upper and lower bounds on the sorting problem (especially in the exponential case) only by using a more elaborate model.
Furthermore, based on Wald’s equation and the Blackwell-Girshick equation, we also obtain the variance of the number of function evaluations needed to find an optimal solution with respect to the Markov model.
Upper bounds To obtain the upper bounds shown in Table 1 we use the established fitness level method (e.g., see Wegener 2002). We instantiate our Markov model such that improvements of the attractor are only accounted for if the current position is at the attractor. The main difficulty is to determine the expected number of steps needed to return to the attractor. We obtain this quantity from the analysis of the corresponding recurrences with constant and non-constant coefficients. Furthermore, we obtain by integration closed-form expressions for the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution.
Lower bounds The runtime of the OnePSO algorithm is dominated by the time required for the last improvement of the attractor, after which the global optimum has been found. We again use the Markov model and observe that in this situation, the global optimum can be reached only when the Markov model is in a specific state. We argue that the optimal solution is included in a certain set \({\hat{Y}}\) of indistinguishable states. Therefore, in expectation, this set needs to be hit \(\varOmega (\vert {\hat{Y}}\vert )\) times until the optimum has been found. By evaluation of the return time to the attractor we also obtain bounds on the return time to the set \({\hat{Y}}\). Furthermore, for DPSO with a constant number of particles, we give a lower bound of \(\varOmega (\log n)\) for optimizing a pseudo-Boolean function and, for \(P = \mathrm {poly}(n)\) particles, a stronger lower bound of \(\varOmega (P\cdot n)\) for the same task.
Open problems Finally, we conjecture that the expected optimization time of OnePSO for sorting n items is asymptotically equivalent to n! if the attractor is not used at all (\(c = 0\)). An equivalent statement is that a random walk on the set of permutations of n items using single transpositions as neighborhood relation asymptotically takes expected time n! to discover a fixed given permutation starting at a random position. We provide theoretical evidence for this conjecture in “Appendix A”. Furthermore, we conjecture stronger lower bounds for OnePSO for sorting n items for \(c > 0\) and provide evidence in “Appendix B”.
Extensions This article extends the conference paper (Mühlenthaler et al. 2017) as follows. We present DPSO, a discrete PSO algorithm with multiple particles, in Sect. 2 and show in Sect. 5.3 which results for OnePSO generalize to DPSO. For pseudo-Boolean functions, a new general lower bound is presented, which holds for DPSO (see Sect. 5.4). Furthermore, we give a refined analysis in the case of exponential runtime which allows us to determine the exact base of the exponential expression (see Sect. 4.4). In addition to the analysis of the expected optimization time, we also consider the variance of the optimization time (see Sect. 4.5). We conjecture that the expected number of steps needed until the random walk on the space of permutations hits a fixed permutation is n!. We provide evidence for this conjecture in “Appendix A”. Finally, we present an approximate model of the algorithm with average transition probabilities to obtain the actual bounds where lower and upper bounds on the expected optimization time differ (see “Appendix B”). Also, some theorems have been extended to larger classes of functions, which may make them more useful in other settings.
1.4 Related work
Runtime results are available for several other metaheuristics for optimization problems over discrete domains, for example evolutionary algorithms (EAs) (Droste et al. 2002; Giel and Wegener 2003; Wegener 2002; Antipov et al. 2018, 2019) and ant colony optimization (ACO) (Doerr et al. 2007; Neumann and Witt 2007; Sudholt and Thyssen 2012). Most of the results relevant to this work concern the binary PSO algorithm and the \((1+1)\)-EA algorithm. For the binary PSO, Sudholt and Witt (2008, 2010) provide various runtime results. For instance, they give a general lower bound of \(\varOmega (n/\log n)\) for every function with a unique global optimum and a bound of \(\varTheta (n \log n)\) on the function OneMax. Note that the binary PSO studied in Sudholt and Witt (2010) has been designed for optimizing over \(\{0,1\}^n\) and is different from our proposed DPSO, which can be applied to a much wider range of discrete problems. Sudholt and Witt show the following bound for the binary PSO.
Theorem 1
(Sudholt and Witt 2010, Theorem) Under certain assumptions on the algorithm parameters, the expected optimization time of the binary PSO for optimizing \(f : \{0,1\}^n \rightarrow {\mathbb {R}} \) is \(O(mn\log n) + \sum _{i=1}^{m-1} 1/s_i\), where m is the number of level sets of f and \(s_i\) is a lower bound on the probability to move from level i to level \(i-1\).
Essentially, this result reflects the fact that the binary PSO converges to the attractor in expected time \(O(n\log n)\) unless the attractor has been updated meanwhile. This happens once for each fitness level. For OneMax, this result yields an expected optimization time of \(O(n^2 \log n)\). By a more careful analysis of the binary PSO on OneMax, the following improved bound is established:
Theorem 2
(Sudholt and Witt 2010, Theorem) The expected optimization time of the binary PSO with a single particle optimizing OneMax is \(O(n \log n)\).
The \((1+1)\)-EA considered in Scharnow et al. (2004) is reminiscent of stochastic hill climbing: in each iteration, a random solution is sampled and the current solution is replaced if and only if the new solution is better. In order to escape local optima, the distance between the current solution and the new one is determined according to Poisson distributed random variables. Scharnow et al. (2004) provide bounds on the expected optimization time of a \((1+1)\)-EA sorting n items. They consider various choices of objective functions (e.g., Hamming distance, transposition distance, ...) as well as mutation operators (e.g., transpositions, reversing keys in a certain range, ...). A general lower bound of \(\varOmega (n^2)\) is proved, which holds for all permutation problems having objective functions with a unique optimum (Scharnow et al. 2004, Theorem 1). The most relevant runtime result for a comparison with our DPSO algorithm is the following.
Theorem 3
(Scharnow et al. 2004, Theorem 2/Theorem 4) The expected optimization time of the \((1+1)\)-EA for sorting n items is \(\varTheta (n^2 \log n)\) if the objective function is the transposition distance to the sorted sequence and mutations are transpositions.
The upper bound can be obtained by the fitness level method and a lower bound of \(\varOmega (k/n^2)\) on the probability of improvement when the current solution is at transposition distance k to the attractor. The lower bound follows from a similar argument. In addition, for determining the lower bound, Scharnow et al. (2004) consider the Hamming distance to evaluate the distance between the current position and the optimum, although the algorithm still uses the transposition distance to decide which position is better. In the conference version (Mühlenthaler et al. 2017) we incorrectly claimed that the proof does not apply to this setting. Thanks to an anonymous reviewer we can correct this statement here.
In contrast to the \((1+1)\)-EA algorithm, the binary PSO studied in Sudholt and Witt (2010) allows for non-improving solutions, but it converges to the attractor exactly once per fitness level. Once convergence has occurred, the binary PSO behaves essentially like the \((1+1)\)-EA.
Additionally, Raß et al. (2019) is based on a preliminary version of the present paper. There, the authors applied OnePSO to the single-source shortest path problem. For this purpose they extended the Markov model presented here by allowing self-loops. They also used the bounds by integration proved in this work, without repeating the proofs. The following upper and lower bounds on the expected optimization time are given, which depend on the algorithm parameter c specifying the probability of movement towards the attractor.
Theorem 4
(Raß et al. 2019, Theorem 5/Theorem 7) The expected optimization time T(n) to solve the single-source shortest path problem with n nodes is bounded by
and
where \(\varphi (c)=e^{(1-2c)/(1-c)}\cdot \left( \frac{1-c}{c}\right) \) and \(\varepsilon >0\) is arbitrarily small.
1.5 Organization of the paper
In Sect. 2 we introduce the algorithm DPSO for solving optimization problems over discrete domains. In Sect. 3 we provide a Markov model for the behavior of the algorithm OnePSO—a restriction of DPSO to one particle—between two updates of the local attractor. Section 4 contains a comprehensive analysis of this Markov model. The results from this section are used in Sect. 5 in order to obtain the bounds on the expected optimization time for OnePSO shown in Table 1 as well as lower bounds for DPSO on pseudoBoolean functions. Section 6 contains some concluding remarks.
2 Discrete PSO algorithm
In this section we introduce the DPSO algorithm, a PSO algorithm that optimizes functions over discrete domains. A simplified version of the algorithm that uses just a single particle will be referred to as OnePSO. Note that OnePSO is different from the 1-PSO studied in Sudholt and Witt (2010), which is tailored to optimization over bitstrings. The DPSO and OnePSO algorithms sample items from a finite set X in order to determine some \(x^{*} \in X\) that minimizes a given objective function \(f:X \longrightarrow {\mathbb {R}} \). In order to have a discrete PSO that remains true to the principles of the original PSO for optimization in the domain \({\mathbb {R}} ^n\) from Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995), we need some additional structure on X: for each \(x \in X\) we have a set of neighbors \({\mathscr {N}}_X(x)\). If the set X is clear from the context we may drop the subscript. The neighborhood structure induces a solution graph with nodes X and arcs \(\{xy \mid x, y \in X, y \in {\mathscr {N}}(x)\}\). The distance \(d (x, y)\) of solutions \(x, y \in X\) is the length of a shortest (directed) x-y-path in this graph. We assume that the solution graph is strongly connected, so the PSO cannot get “trapped” in a particular strongly connected component. The search spaces of our reference problems in combination with the neighborhood relationships we use satisfy this assumption.
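For the two search spaces considered here, sampling a uniform neighbor is straightforward; the following Python sketch (helper names are ours) illustrates the assumption:

```python
import random

def random_bitstring_neighbor(x):
    """Hypercube neighborhood: flip exactly one bit chosen u.a.r."""
    i = random.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

def random_transposition_neighbor(perm):
    """Permutation neighborhood: apply one transposition (i j) with
    i != j, chosen u.a.r. among all n * (n - 1) / 2 transpositions."""
    i, j = random.sample(range(len(perm)), 2)
    y = list(perm)
    y[i], y[j] = y[j], y[i]
    return tuple(y)
```
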
The DPSO algorithm performs the steps shown in Algorithm 1. The initial positions of the particles are chosen uniformly at random (u.a.r.) from the search space X. The parameter \(c_{loc}\) determines the importance of the local attractor \(l_i\) of each particle, which is the best solution that particle has found so far, and the parameter \(c_{glob}\) determines the importance of the global attractor g, which is the best solution all particles have found so far. In each iteration each particle moves towards its local attractor with probability \(c_{loc}\), moves towards the global attractor with probability \(c_{glob}\), and otherwise moves to a random neighbor. If \(l_i\) equals g, then the particle still moves towards g only with probability \(c_{glob}\). Note that the attractors \(l_i\) and g are updated in lines 21 and 24 whenever a strictly better solution has been found.
Alternatively, one could choose to update the local and global attractors whenever the new position is at least as good as the position of the attractor (use \(\le \) instead of < in lines 21 and 24). The theorems presented here can also be carried over to this modified setting. Admittedly, for functions with plateaus this version potentially performs better, since a plateau is traversed easily by the modified algorithm. However, the problems considered in this work have no plateaus. Additionally, when the position and the attractor differ but have equal objective value, the probability of improving the objective function is higher than when the position is placed at the attractor.
At first glance, Algorithm 1 may not seem like an implementation of the PSO ideas, since each move uses only a single attractor, or even none at all, whereas the classical PSO uses both the local and the global attractor in every move. However, over several consecutive iterations we retain the tendency of movement towards all attractors. We consider the PSO to be an infinite process, so we do not give a termination criterion. We assume that the sampling of \(x'\) in lines 11, 14 and 17 can be performed efficiently, which is the case for the neighborhood structures we consider.
The algorithm OnePSO is simply given by Algorithm 1 with a single particle, i.e., we have \(P = 1\). Note that there is only a single attractor in this case. Hence, OnePSO has just a single parameter \(c=c_{glob}\) that determines the probability of moving towards the (global) attractor g. In all other aspects it behaves like the DPSO algorithm.
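The single-particle case can be sketched compactly in Python (minimization; the helpers `sample_neighbor` and `step_towards` are assumptions supplied by the search space, not part of the paper's pseudocode):

```python
import random

def one_pso(f, sample_neighbor, step_towards, x0, c, f_opt, max_iter=10**6):
    """Sketch of OnePSO minimizing f. In each iteration the particle
    moves one step towards the attractor g with probability c and to a
    uniformly random neighbor otherwise; g is updated on every strict
    improvement. Returns the attractor and the number of sampled
    positions once the known optimal value f_opt is reached."""
    x = g = x0
    evals = 1
    for _ in range(max_iter):
        if f(g) == f_opt:
            break
        if x != g and random.random() < c:
            x = step_towards(x, g)   # some neighbor of x closer to g
        else:
            x = sample_neighbor(x)   # uniform random neighbor
        evals += 1
        if f(x) < f(g):              # strict improvement updates g
            g = x
    return g, evals
```

With `c = 1` this degenerates to randomized local search around the attractor, and with `c = 0` to a pure random walk, matching the parameter discussion above.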
3 Markov model of OnePSO
We present a simple Markov model that captures the behavior of the OnePSO algorithm between two consecutive updates of the attractor. This model has already been presented in Mühlenthaler et al. (2017), but we repeat it here to keep the overview of the presented approach self-contained. As an extension to Mühlenthaler et al. (2017) we also show how the variance can be computed, which is essential for experiments and for practical applications of OnePSO and DPSO. Using this model we can infer upper and lower bounds on the expected optimization time of the OnePSO algorithm on suitable discrete functions. For our analysis, we assume that the objective function \(f: X \rightarrow {\mathbb {R}} \) has the property that every local optimum is a global one. That is, the function is either constant or any non-optimum solution x has a neighbor \(y \in {\mathscr {N}}(x)\) such that \(f(x) > f(y)\). Functions with this property are called unimodal functions. Although this restriction certainly narrows down the class of objective functions to which our analysis applies, the class is still appreciably large; e.g., it properly contains the class of functions with a unique global optimum and no further local optima.
Assume that the attractor \(g \in X\) is fixed and g is not a minimizer of f. Under which conditions can a new “best” solution be found? Certainly, if the current position x is equal to g, then, by the described structure of f we get an improvement with positive probability. If \(x \ne g\) then the attractor may still be improved. However, for the purpose of upper bounding the expected optimization time of the OnePSO we dismiss the possibility that the attractor is improved if \(x \ne g\). As a result, we obtain a reasonably simple Markov model of the OnePSO behavior. Quite surprisingly, using the same Markov model, we are also able to get good lower bounds on the expected optimization time of the OnePSO (see Sect. 5.2 for the details).
Recall that we think of the search space in terms of a strongly connected graph. Let n be the diameter of the search space X, i.e., the maximum distance of any two points in X. We partition the search space according to the distance to the attractor \(g\in X\). That is, for \(0 \le i \le n\), let \(X_i = \{ x \in X \mid d (x, g) = i\}\). Note that this partition does not depend on the objective function. If the search space is not symmetric, some \(X_i\) may be empty, since the maximum distance to a specific solution can be smaller than the diameter. The model consists of \(n+1\) states \(S_0,S_1,\ldots ,S_n\). Being in state \(S_i\) indicates that the current solution x of the OnePSO is in \(X_i\).
For each \(x\in X_i\) we denote by \(p_x\) the transition probability from x to an element of \(X_{i-1}\). The probabilities \(p_x\) in turn depend on the parameter c, which is the probability that the OnePSO explicitly moves towards the attractor. If the current position x is in \(X_{i}\) and the algorithm moves towards the attractor, then the new position is in \(X_{i-1}\). On the other hand, if the PSO updates x to a neighbor chosen u.a.r. from \({\mathscr {N}}(x)\), then the new position is in \(X_{i-1}\) with probability \(\vert {\mathscr {N}}(x) \cap X_{i-1}\vert /\vert {\mathscr {N}}(x)\vert \). So we obtain the transition probability

\(p_x = c + (1-c)\cdot \vert {\mathscr {N}}(x) \cap X_{i-1}\vert /\vert {\mathscr {N}}(x)\vert .\)
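For OneMax on the hypercube, for example, a uniformly random neighbor flips one of the n bits, and exactly i of these flips decrease the distance of a position \(x \in X_i\) to the attractor, so \(p_x = c + (1-c)\cdot i/n\) for every \(x \in X_i\). A one-line Python sketch (the function name is ours):

```python
def p_onemax(i, n, c):
    """Transition probability towards the attractor for OneMax: with
    probability c the particle moves towards g directly; otherwise it
    flips one of the n bits u.a.r., and i of those flips decrease the
    distance. The value is the same for every x at distance i."""
    return c + (1 - c) * i / n
```
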
Remark 1
In this work we assume that the probability that we move from a position \(x\in X_i\) to another element in \(X_i\) is zero, i.e., if we move from a position x to a neighboring position \(x'\in {\mathscr {N}}(x)\), then the distance to the attractor always changes (\(d (x,g)\ne d (x',g)\)).
This assumption holds for both problems we investigate in Sect. 5. Nevertheless, an extension allowing transitions inside a level \(X_i\) is possible and has already been considered in Raß et al. (2019), which is based on an arXiv preprint of this article.
Using the assumption in Remark 1 and the fact that \(X_i\) is defined by the distance to a fixed position g, the probability of moving from x to an element in \(X_j\), \(j \notin \{i-1, i+1\}\), is zero. Consequently, the probability of moving from x to an element in \(X_{i+1}\) is \(1-p_x\). Furthermore, if the OnePSO is at position \(x \in X_n\), then any move brings us closer to the reference solution; so \(p_x = 1\) in this case.
Please note that \(p_{x}\) and \(p_{x'}\) can differ even if \(x,x'\) are contained in the same set \(X_i\). Therefore we do not necessarily obtain a Markov model if we use the states \(S_i\) and set the transition probabilities to \(p_i=p_x\) for some \(x\in X_i\), as this value need not be equal for all \(x'\in X_i\).
Nevertheless, we can analyze Markov chains using bounds on the transition probabilities. More precisely, we can use \(p_i:=\min _{x\in X_i}p_x\) as a lower bound and \(p_i':=\max _{x\in X_i}p_x\) as an upper bound on the transition probabilities towards the attractor to obtain an upper and a lower bound, respectively, on the expected number of iterations until the distance to the attractor is decreased. Figure 3 shows the state diagram of this model.
Definition 1
By \({{\mathscr {M}}} \left( (p_i)_{1 \le i \le n}\right) \) we denote an instance of the Markov model with states \(S_0,S_1,\ldots ,S_n\) and data \(p_i\in {\mathbb {R}} _{\ge 0}\), \(1 \le i \le n\). Suppose we are in state \(S_i\). Then we move to state \(S_{i-1}\) with probability
and otherwise we move to state \(S_{i+1}\).
Please note that the data need not be in [0, 1], but for ease of presentation we refer to these values as probabilities. Furthermore, if we are in state \(S_{n}\), then the probability of moving to \(S_{n-1}\) is 1 even if \(p_n\ne 1\). This notation allows us to succinctly specify the currently used Markov model, e.g., the model which is used to obtain upper bounds is described by \({{\mathscr {M}}} \left( (\min _{x\in X_i}p_x)_{1\le i\le n} \right) \).
Our goal is to determine the expected number of steps needed to hit a solution which is better than the attractor after starting in \(S_0\). Let \(p_g\) be the probability to improve the attractor if we are currently in state \(S_0\), hence at the attractor. Then the probability \(p_g\) depends on f and the choice of g. We have that \(p_g\) is positive since f is unimodal. In order to reach a better solution from \(S_0\) we need in expectation \(1/p_g\) tries. If we are unsuccessful in some try, then the OnePSO moves to \(S_1\). For upper bounds we can ignore the chance to improve the attractor through other states. Thus we need to determine the expected number of steps it takes until we can perform the next try, that is, the expected first hitting time for the state \(S_0\), starting in \(S_1\). The expected number \(h_i\) of steps needed to move from \(S_i\) to \(S_0\) is given by the following recurrence:
4 Analysis of the Markov model
In this section we prove upper and lower bounds on the expected return time to the state \(S_0\) of the Markov model from Sect. 3. These bounds are of key importance for our runtime analysis in Sect. 5. For the lower bounds we also introduce a notion of indistinguishability of certain states of a Markov model.
In our analysis the probabilities \(p_i\) are generally not identical. If we assume that \(p_1 = p_2 = \cdots = p_n = p\), then we obtain a nonhomogeneous recurrence of order two with constant coefficients. In this case, standard methods can be used to determine the expected time needed to move to the attractor state \(S_0\) as a function of n (Graham et al. 1994, Chap. 7). Note also that for \(p=1/k\) this is exactly the recurrence that occurs in the analysis of a randomized algorithm for k-SAT (Papadimitriou 1991; Schöning 1999; Mitzenmacher and Upfal 2005, pp. 160f.). If the values \(p_i\) are not identical but depend on i, then the recurrence can in some cases still be solved, see e.g., Graham et al. (1994, Chap. 7) and Petkovšek (1992). Here, due to the structure of the recurrence, we can use a more pedestrian approach, which is outlined in the next section.
4.1 Reformulation of the recurrence
We first present a useful reformulation of Recurrence (1). From this reformulation we will derive closed-form expressions and asymptotic properties of the return time to the attractor in terms of the transition probabilities.
Let \(W_{i}\) be the number of steps needed to move from state \(S_i\) to state \(S_{i-1}\) and let \(H_i:=\mathrm{E}[W_i]\) be its expectation. Then \(H_i\) can be determined from \(H_{i+1}\) as follows: In expectation, we need \(1/p_i\) trials to get from \(S_i\) to \(S_{i-1}\), and each trial, except for the successful one, requires \(1+H_{i+1}\) steps. The successful trial requires only a single step, so \(H_{i}\) is captured by the following recurrence:
Another interpretation is the following: \(W_i\) is equal to one with probability \(p_i\), accounting for the direct step to \(S_{i-1}\); with probability \(1-p_i\) it takes the current step, which leads to state \(S_{i+1}\), then \(W_{i+1}\) steps to go from \(S_{i+1}\) back to \(S_i\), and then again \(W_i\) steps. For the expected value of \(W_i\) this interpretation leads to the formula
which is equivalent to Eq. (2) after solving for \(H_i\). Please note that the probabilities \(p_i\) are typically given by some function of n and i. Unfolding the recurrence specified in Eq. (2) k times, \(1 \le k \le n\), followed by some rearrangement of the terms yields
Thus, for \(k=n\) we obtain the following expression for \(H_1\):
where the second term is a correction term which is required whenever \(p_n < 1\) (see Definition 1) in order to satisfy the initial condition given in Eq. (3). Equation (5) has also been mentioned in Droste et al. (2001, Lemma 3) in the context of the analysis of randomized local search, and in Kötzing and Krejca (2018, Theorem 3). \(H_k\) can be obtained analogously, which leads to
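To make the reformulation concrete, the following sketch (our illustration, not part of the original analysis) computes the expected return times both by the backward recurrence of Eq. (2) and by the unfolded sum-product form, treating the forced move from \(S_n\) as \(p_n=1\) so that no correction term is needed; both computations must agree.

```python
from fractions import Fraction

def return_times(p):
    """Expected times H_i to move from S_i to S_(i-1) for the chain
    M((p_i)), via the backward recurrence
        H_n = 1,   H_i = 1/p_i + (1/p_i - 1) * H_(i+1).
    p is a list [p_1, ..., p_n]; returns [H_1, ..., H_n]."""
    n = len(p)
    H = [None] * n
    H[n - 1] = Fraction(1)          # from S_n we always move to S_(n-1)
    for i in range(n - 2, -1, -1):  # compute H_(n-1), ..., H_1
        H[i] = 1 / Fraction(p[i]) + (1 / Fraction(p[i]) - 1) * H[i + 1]
    return H

def return_time_unfolded(p):
    """H_1 via the unfolded sum-product form (Eq. (5) with p_n treated
    as 1, i.e. no correction term):
        H_1 = sum_i (1/p_i) * prod_(j<i) (1-p_j)/p_j."""
    n = len(p)
    q = [Fraction(x) for x in p]
    q[n - 1] = Fraction(1)          # the forced move from S_n
    total, prod = Fraction(0), Fraction(1)
    for i in range(n):
        total += prod / q[i]
        prod *= (1 - q[i]) / q[i]
    return total
```

With exact rational arithmetic the two computations coincide term by term; for constant \(p_i=1/2\) both yield \(H_1=2n-1\).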
4.2 Identical transition probabilities
If \(p_i = p\) for some constant \(p \in [0,1]\) and all \(1 \le i < n\), then Recurrence (2) becomes a linear recurrence equation with constant coefficients. Standard methods can be used to determine closed-form expressions for \(h_i\) and \(H_i\). However, we are mainly interested in \(H_1\) and are able to determine a closed-form expression directly from Eq. (5).
Theorem 5
Let \(0< p < 1\). Then the expected return time \(H_1\) to \(S_0\) is
Proof
By setting \(p_i = p\) in Eq. (5) and performing some rearrangements the theorem is proved. \(\square \)
It is easily verified that this expression for \(h_1\) satisfies Eq. (1). So, with \(p_i = p\), the time it takes to return to the attractor is bounded from above by a constant, a linear function, or an exponential function in n if \(p > 1/2\), \(p = 1/2\), or \(p < 1/2\), respectively.
For the case \(p=1/2\) one can obtain this result also from the Gambler's Ruin model by mirroring the state 0 at n, which results in termination if value 0 or 2n is reached. Starting at value 1 yields a first hitting time of \(2n-1\) in this Gambler's Ruin model, as stated in Theorem 5.
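The three regimes of Theorem 5 can be checked numerically with a short sketch (our illustration; the parameter values are chosen only for demonstration): for constant \(p>1/2\) the return time converges to the constant \(1/(2p-1)\), for \(p=1/2\) it equals \(2n-1\), and for \(p<1/2\) it grows exponentially in n.

```python
def H1_constant(p, n):
    """Expected return time H_1 for the model M((p)_i) with constant
    p_i = p, computed by unfolding H_n = 1 and
    H_i = 1/p + (1/p - 1) * H_(i+1) down to i = 1."""
    H = 1.0                     # H_n
    for _ in range(n - 1):
        H = 1.0 / p + (1.0 / p - 1.0) * H
    return H

# p > 1/2: bounded by a constant, approaching 1/(2p-1)
# p = 1/2: linear, H_1 = 2n - 1 (the mirrored Gambler's Ruin)
# p < 1/2: exponential, H_1 grows like ((1-p)/p)^n
```

For example, \(p=0.6\) gives return times approaching \(1/(2\cdot 0.6-1)=5\), while \(p=0.4\) already exceeds \((3/2)^{n-1}\).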
4.3 Nonidentical transition probabilities
Motivated by the runtime analysis of OnePSO applied to optimization problems such as sorting and OneMax, we are particularly interested in the expected time it takes to improve the attractor if the probabilities \(p_i\) are slightly greater than 1/2. By slightly we mean \(p_i = 1/2 + i/(2A(n))\), which appears in the analysis of OnePSO optimizing OneMax and the sorting problem, or \(p_i=1/2+A(i)/(2A(n))\), which appears in the analysis of OnePSO optimizing the sorting problem, where \(A:{\mathbb {N}} \rightarrow {\mathbb {N}} \) is some nondecreasing function of n such that \(\lim _{n \rightarrow \infty } A(n) = \infty \). Recall from Definition 1 that if \(p_i>1\) then we move from state \(S_i\) to state \(S_{i-1}\) with probability 1. Clearly, in this setting we cannot hope for a recurrence with constant coefficients. Our goal in this section is to obtain the asymptotics of \(H_1\) as \(n \rightarrow \infty \) for \(A(n) = n\) and \(A(n) = \left( {\begin{array}{c}n\\ 2\end{array}}\right) \). We show that for \(p_i=1/2+i/(2\cdot A(n))\) and \(A(n) = n\) the return time to the attractor is \(\varTheta (\sqrt{n})\), while for \(A(n) = \left( {\begin{array}{c}n\\ 2\end{array}}\right) \) the return time is \(\varTheta (n)\).
Lemma 1
Let \(M={{\mathscr {M}}} ((1/2+i/(2n))_{1\le i\le n})\). Then
Proof
We have \(p_n = 1\) so the correction term in Eq. (5) is zero. We rearrange the remaining terms of Eq. (5) and find that
Applying the well-known relation
(for an elegant derivation, see Hirschhorn 2015) finishes the proof. \(\square \)
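The \(\varTheta (\sqrt{n})\) behavior of Lemma 1 is easy to observe empirically. The sketch below (our illustration) evaluates \(H_1\) for \(p_i = 1/2 + i/(2n)\) by the backward recurrence; the ratio \(H_1/\sqrt{n}\) stays close to a constant as n grows.

```python
import math

def H1(p):
    """H_1 for M((p_i)): backward recurrence with H_n = 1 and
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1)."""
    H = 1.0
    for pi in reversed(p[:-1]):
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H

for n in (100, 400, 1600):
    p = [0.5 + i / (2.0 * n) for i in range(1, n + 1)]
    print(n, H1(p) / math.sqrt(n))  # ratio remains roughly constant
```

The observed constant is below 2; the exact limiting constant follows from the relation cited above.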
This lemma can be generalized for linearly growing probabilities.
Theorem 6
Let \(M = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n}\right) \), where \(p_i = 1/2 + i /(2A(n))\).
Then \(H_1 = \varTheta (\min (\sqrt{A(n)},n))\) with respect to M.
Proof
First, we consider the case \(A(n) \le n^2\). Let \(n' = A(n)\), which is the smallest number such that \(p_{n'} = 1/2 + n'/(2A(n)) \ge 1\). Assume first that \(n' \le n\) and consider the “truncated” model \(M' = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n'} \right) \). Please note that there is actually no difference between M and \(M'\): the removed states are never visited, as \(p_{n'}\), the probability to move from \(S_{n'}\) to \(S_{n'-1}\), is already one and hence \(S_{n'+1}\) is never reached. Let \(H_1'\) be the expected time to reach state \(S_0\) starting at state \(S_1\) with respect to \(M'\). By Lemma 1 we have \(H_1'=\Theta (\sqrt{A(n)})\), which is by the construction of \(M'\) equal to \(H_1\). On the other hand, assume that \(n' > n\) and consider the “extended” model \({{\tilde{M}}} = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n'} \right) \). M and \({{\tilde{M}}}\) are visualized in Fig. 4a, with the probabilities of moving away from the attractor omitted. Let \({{\tilde{H}}}_1\) be the expected time to reach state \(S_0\) starting at state \(S_1\) with respect to \({{\tilde{M}}}\). By Lemma 1 we have \({{\tilde{H}}}_1 = \Theta (\sqrt{A(n)})\) and since \({{\tilde{H}}}_1 \ge H_1\) we obtain \(H_1 = O(\sqrt{A(n)})\).
To obtain a lower bound on \(H_1\) for the case \(n' > n\) we consider the model \({{\hat{M}}} = {{\mathscr {M}}} \left( ({{\hat{p}}}_i)_{1\le i\le {{\hat{n}}}} \right) \), where \(\hat{n} = \min (n,\lfloor \sqrt{A(n)}\rfloor )\) and \({{\hat{p}}}_i = 1/2 + 1/(2{{\hat{n}}})\) for \(1 \le i < {{\hat{n}}}\), and \({{\hat{p}}}_{{{\hat{n}}}} = 1\). For \(1\le i < {{\hat{n}}}\) we have that \({{\hat{p}}}_i \ge p_i\) because \(1/{{\hat{n}}}={{\hat{n}}}/{{\hat{n}}}^2\ge \hat{n}/(A(n))\). A schematic representation of M and \({{\hat{M}}}\) can be found in Fig. 4b. Let \({{\hat{H}}}_1\) denote the expected time to reach state \(S_0\) starting at state \(S_1\) in \({{\hat{M}}}\). Since \({{\hat{p}}}_i \ge p_i\) for \(1 \le i \le {{\hat{n}}}\), \({{\hat{H}}}_1\) is a lower bound on \(H_1\): in this Markov model one can only move to neighboring states, so increasing the probability of moving towards the final state decreases the probability of moving in the opposite direction, and therefore the expected hitting time of the final state can only decrease. Since \(p := {{\hat{p}}}_i\) is constant for \(1 \le i < {{\hat{n}}}\) we get from Theorem 5 that
Substituting \(p = 1/2 + 1/{(2 {{\hat{n}}})}\) gives
Therefore \(H_1=\varOmega ({{{\hat{n}}}})=\varOmega (\min (n,\sqrt{A(n)}))\).
It remains to show that the statement holds if \(A(n) > n^2\). In this case, \(H_1 = O(n)\) is obtained by setting \(p_i = 1/2\), which is a lower bound on the probabilities of moving towards \(S_0\), for \(1 \le i < n\) and invoking Theorem 5. On the other hand, setting \(\breve{A}(n) = n^2\) and using \(\breve{M}={{\mathscr {M}}} ((1/2+i/(2\breve{A}(n)))_{1 \le i \le n})\) gives a lower bound on \(H_1\), because \(1/2+i/(2\breve{A}(n))\) is an upper bound on \(1/2 + i/(2A(n))\). As discussed above, for the case \({A}(n) \le n^2\), the expected time to reach state \(S_0\) starting at state \(S_1\) in \(\breve{M}\) is \(\Omega (\min (n,\sqrt{\breve{A}(n)}))\). Therefore, \(H_1 = \Omega (\min (n,\sqrt{\breve{A}(n)}))=\Omega (n)\), which completes the proof. \(\square \)
For our application, the sorting problem, the following special case of Theorem 6 will be of interest:
Corollary 1
Let \(M = {{\mathscr {M}}} ((1/2(1+i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) ))_{1 \le i \le n})\), then \(H_1=\varTheta (n)\).
We will now consider a slightly different class of instances of the Markov model in order to obtain a lower bound on the OnePSO runtime for sorting in Sect. 5. For this purpose we consider transition probabilities \(p_i\) that increase in the same order as the divisor A(n), which suits our fitness level analysis of the sorting problem. This class of models is relevant for the analysis of the best case behavior of the OnePSO algorithm for sorting n items (see Theorem 3). Although we will only make use of a lower bound on \(H_1\) in this setting later on, we give the following \(\Theta \)-bounds:
Theorem 7
Let \(M = {{\mathscr {M}}} \left( (\frac{1}{2}\left( 1+A(i)/A(n)\right) )_{1\le i\le n}\right) \) where \(A:{\mathbb {N}} \rightarrow {\mathbb {R}} ^+\) is a nondecreasing function and \(A(n)=\Theta (n^d)\) for some \(d> 0\). Then \(H_1 = \Theta (n^{d/(d+1)})\) with respect to M.
This theorem is a significant extension of Mühlenthaler et al. (2017, Theorem 5), which covers only the special case \(p_i=1/2\cdot (1+\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) /\left( {\begin{array}{c}n\\ 2\end{array}}\right) ).\)
Proof
Consider the expression for \(H_1\) given in Eq. (4). Since \(p_i > 1/2\) for \(1 \le i \le n\) the products are at most 1 and \(1/p_i\) is at most 2. Therefore for any \(k\in \lbrace 1,\ldots ,n\rbrace : H_1\le 2k+H_{k+1}\). As \(H_{k+1}\) is the expected number of steps to move from state \(S_{k+1}\) to \(S_k\), the states \(S_0\) to \(S_{k1}\) are irrelevant for the calculation of \(H_{k+1}\) since they are never visited in between. Therefore also probabilities \(p_1\) to \(p_{k}\) do not matter. We truncate the model to states \(S_{k},\ldots ,S_n\). For these states the minimal probability of moving towards the attractor is \(p_{k+1}\ge p_k\). Therefore we can set \(p_i=p_{k}\) for \(i\in \lbrace k+1,\ldots ,n\rbrace \) to get an upper bound on the return time. By reindexing the states we obtain the model \(\tilde{M}={{\mathscr {M}}} ((p_{k})_{1\le i \le nk})\) and, because of the truncation and the decrease of probabilities, \({{\tilde{H}}}_1\) is an upper bound on \(H_{k+1}\), where \({{\tilde{H}}}_1\) is the expected number of steps to move from state \(S_1\) to \(S_0\) in model \({{\tilde{M}}}\). In \({{\tilde{M}}}\) we have the fixed probabilities \(p_{k}\) and can therefore apply Theorem 5 to determine \({{\tilde{H}}}_1\). Therefore
Altogether we have \(H_1\le 2k+\frac{A(n)}{A(k)}\). With \(k=n^{d/(d+1)}\), where d is the degree of A, we get
which certifies that \(H_1=O(n^{d/(d+1)})\). By using Eq. (4) we have the following lower bound on \(H_1\):
Note: \(\prod _{j=1}^{i-1} \frac{1-p_j}{p_j}\) is monotonically decreasing as \(p_j\ge 1/2\).
Note: A(j) is nondecreasing.
As this equation holds for any \(k\in \lbrace 1,\ldots ,n\rbrace \) we can choose \(k=g\cdot n^z\) with a not yet fixed constant \(g\in (0,1)\) and \(z=d/(d+1)\), \(z\in (0,1)\). Please note that \(g\cdot n^z\) tends to infinity if n tends to infinity and therefore asymptotic expressions can also be applied if \(g\cdot n^z\) is the argument. Please also note that k has to be an integer but errors can be captured by some \(\varTheta (1)\) expressions. Substituting k by \(g\cdot n^z\) in the previous inequality results in
Note: \(z+d\cdot z-d=z\cdot (d+1)-d=d-d=0\).
Note: The last \(\Theta (1)\) can be bounded from above by some constant \(c_{\Theta }\) for large n (by definition of \(\Theta \)). Choose \(g=\root d+1 \of {1/(2\cdot c_{\Theta })}\). Then the expression in parentheses of the last term \(\left( 1 - g^{d+1}\cdot \Theta (1)\right) \) may be negative for small n but is at least 1/2 for large n, which implies that it is in \(\varOmega (1)\).
\(\square \)
It may be verified that
is a suitable choice for any \(0< \varepsilon < 1\), to replace k by \(g\cdot n^z\) to obtain the previous inequalities. To see this, note that the first fraction of \(\liminf \) and \(\limsup \) counterbalances the fluctuation of A(k)/A(n) relative to its \(\varTheta \)-bound. Furthermore, the factor 1/4 compensates the factor 2 which is hidden by \(\Theta \) and supplies the desired factor of 1/2 to ensure that the expression in parentheses is positive and at least 1/2 for large n. Finally, the factor of \(1 - \varepsilon \) is needed for some tolerance, because without it g is only sufficiently small in the limit, but not necessarily for large n.
This theorem is a generalization of Lemma 1 and implies the \(\Theta \)bound stated in that lemma by using \(A(n)=n\).
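Theorem 7 can likewise be checked numerically. The sketch below (our illustration) uses \(A(n)=n^2\), i.e. \(d=2\) and \(p_i=\frac{1}{2}(1+i^2/n^2)\), for which the predicted order is \(n^{2/3}\); the ratio \(H_1/n^{2/3}\) remains bounded as n grows.

```python
import math

def H1(p):
    """H_1 by the backward recurrence H_n = 1,
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1)."""
    H = 1.0
    for pi in reversed(p[:-1]):
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H

for n in (100, 400, 1600):
    # A(i) = i^2, so p_i = (1 + A(i)/A(n)) / 2 and p_n = 1
    p = [0.5 * (1.0 + (i * i) / (n * n)) for i in range(1, n + 1)]
    print(n, H1(p) / n ** (2.0 / 3.0))  # bounded: Theta(n^(2/3))
```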
4.4 Bounds by integration
Reformulations (4) and (5) of Recurrence (1) given in Sect. 4.1 do not yield closed-form expressions for \(H_1\), the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution. In this section we derive closed-form expressions for \(H_1\). In order to get rid of the sums and products in Eqs. (4) and (5), we use the following standard approaches. First, sums may be approximated by their integral. This approach works quite well if the summand/integrand is monotonic, which is true in our case. Second, products can be transformed to integrals by rewriting the product as the exponential of a sum of logarithms. This approach extends Mühlenthaler et al. (2017), where it is not present at all.
Theorem 8
Let \(M={{\mathscr {M}}} ((p(i))_{1\le i\le n})\) and \(p:[0,n]\rightarrow (0,1]\) be a nondecreasing function assigning the probabilities in the model, then
The integral in \({\mathrm {base}}(p,n)\) is maximized by \(k=\inf \lbrace x\mid x\in [0,n]\wedge p(x)\ge 1/2\rbrace \) or \(k=n\) if the infimum is taken on the empty set.
Please note that we use \(\varTheta ^*\)-bounds in the sense that polynomial factors can be omitted in \(\varTheta \)-bounds. This notation is similar to the more common \(O^*\), where polynomial factors can likewise be omitted in upper bounds.
Proof
p(i) is nondecreasing in i and has values in (0, 1], and therefore \(\frac{1-p(i)}{p(i)}\) and also \(\tau (i):=\ln \left( \frac{1-p(i)}{p(i)}\right) \) are nonincreasing, as the numerator is nonincreasing and the denominator is nondecreasing. In the following series of equations, let \(k \in [0, n)\). Using Eq. (4), we obtain
As k can be chosen arbitrarily, we get the claimed lower bound for \(H_1\), because \(\frac{p(0)}{1-p(0)}\) is a constant. As any integral is a continuous function, the whole expression in the supremum is also a continuous function, and therefore \(k=n\) can be allowed in the supremum without changing the value. Quite similar steps lead to the upper bound for \(H_1\). We start with Eq. (5).
This proves the claimed upper bound and as the base of the exponential part is equal for upper and lower bound we obtain the claimed \(\varTheta ^*\) bound.
The logarithm in \({\mathrm {base}}(p,n)\) is positive as long as \(1-p(n\cdot x)\ge p(n\cdot x)\). Therefore the integral is maximized if we use the smallest possible k (the infimum) which satisfies the condition \(1-p(k)\le p(k)\Leftrightarrow p(k)\ge \frac{1}{2}\). \(\square \)
\(p(n\cdot x)\) can in most cases be tightly bounded by a value independent of n. This is the case if for example \(p(i)=c+(1c)i/n\), which we have for the model solving OneMax by OnePSO. The k which maximizes the integral in the expression of \({\mathrm {base}}(p,n)\) is usually obtained by solving the simple equation \(p(k)=1/2\).
Therefore the integral can be evaluated and the base of the exponential part of the runtime can be determined.
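As an illustration of the bounds-by-integration (a sketch under the assumption that \({\mathrm {base}}(p,n)\) has the form \(\exp \big (\int _0^{k/n}\ln \frac{1-p(n\cdot x)}{p(n\cdot x)}\,dx\big )\) with k chosen as described above), the following code evaluates the integral numerically for the OneMax-type probabilities \(p(i)=c+(1-c)i/n\) and recovers the closed-form base \(\beta (c)=2^{1/(1-c)}(1-c)c^{c/(1-c)}\) appearing later in Theorem 11.

```python
import math

def base_onemax(c, samples=100000):
    """Midpoint-rule evaluation of exp(int_0^(k/n) ln((1-p(nx))/p(nx)) dx)
    for p(i) = c + (1-c)*i/n; note p(n*x) = c + (1-c)*x is independent of n.
    The upper limit k/n = (1-2c)/(2(1-c)) solves p(k) = 1/2."""
    upper = (1.0 - 2.0 * c) / (2.0 * (1.0 - c))
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * upper / samples
        u = c + (1.0 - c) * x
        total += math.log((1.0 - u) / u)
    return math.exp(total * upper / samples)

def beta(c):
    """Closed-form base from Theorem 11."""
    return 2.0 ** (1.0 / (1.0 - c)) * (1.0 - c) * c ** (c / (1.0 - c))
```

For any \(c<1/2\) the numerically integrated base matches \(\beta (c)\) to high accuracy.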
4.5 Variance of the improvement time
We show that the standard deviation of the return time is of the same order as the return time itself. Consequently, in experiments the average of such return times can be measured with a small relative error. The variance of \(W_i\), the number of steps needed to move from state \(S_i\) to state \(S_{i-1}\), can also be computed recursively. Let \(V_i:=\mathrm{Var}[W_i]\) be the variance of \(W_i\).
To evaluate this variance we need the expectation and variance of a random variable which is a sum of random variables, where the number of summands is itself a random variable. Such random variables appear in the Galton-Watson process (see Durrett 2010) when passing from one generation to the next.
Lemma 2
Let T be a random variable with nonnegative integer values and let \((Y_i)_{i\in {{\mathbb {N}}}}\) be independent identically distributed random variables \(Y_i\sim Y\) which are also independent of T. Additionally let \(Z=\sum _{i=1}^{T}Y_i\). Then \(\mathrm{E}[Z]=\mathrm{E}[T]\cdot \mathrm{E}[Y]\) and \(\mathrm{Var}[Z]=\mathrm{E}[T]\cdot \mathrm{Var}[Y]+E[Y]^2\cdot \mathrm{Var}[T]\).
The statement on expected values is also known as Wald's equation and the statement on the variance is known as the Blackwell-Girshick equation. The Blackwell-Girshick equation can be obtained by application of the law of total variance:
Also \(W_i\) can be specified as a sum of random variables where the number of summed up random variables is also a random variable. If we are currently in state \(S_i\) we have some success probability to move to \(S_{i1}\) in the next iteration. Therefore the number of trials in \(S_i\) until we move to \(S_{i1}\) follows a geometric distribution. In case of failure we move to \(S_{i+1}\) and need additional \(W_{i+1}\) steps until we can make our next attempt to move to \(S_{i1}\). Therefore
where T is a random variable distributed according to a geometric distribution with success probability equal to the probability of moving to \(S_{i1}\) from \(S_i\) and each \({{\tilde{W}}}_{i+1,j}\) is an independent copy of \(W_{i+1}\).
Theorem 9
where \(p_i\) is the probability of moving to \(S_{i1}\) from \(S_i\).
Proof
\(W_n=\mathrm{E}[W_n]=H_n=1\Rightarrow \mathrm{Var}[W_n]=V_n=\mathrm{Var}[1]=0\).
Let T be a random variable distributed according to a geometric distribution with success probability \(p_i\) and let all \(({{\tilde{W}}}_{i+1,j})_{j\in {{\mathbb {N}}}}\) be independent copies of \(W_{i+1}\).
Finally the rightmost expression of Eq. (8) is obtained by replacing \(H_{i+1}\) according to Eq. (2). \(\square \)
Therefore one can evaluate \(H_i\) by Eqs. (2) and (3) and then one can evaluate \(V_i\) by Eqs. (8) and (9).
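The two recursions can be implemented together and cross-checked by simulation. The sketch below is our illustration: the variance recurrence is the Blackwell-Girshick step spelled out for \(W_i = 1 + \sum _{j=1}^{T-1}(1+{{\tilde{W}}}_{i+1,j})\), so its exact constants should be read as an assumption matching Eqs. (8) and (9).

```python
import random

def moments(p):
    """H_1 = E[W_1] and V_1 = Var[W_1] via the backward recurrences
        H_n = 1,  V_n = 0,
        H_i = 1/p_i + (1/p_i - 1) * H_(i+1),
        V_i = ((1-p_i)/p_i) * V_(i+1)
              + ((1-p_i)/p_i**2) * (1 + H_(i+1))**2,
    the last line being Blackwell-Girshick applied to
    W_i = 1 + sum_(j=1)^(T-1) (1 + W~_(i+1,j)), T ~ Geom(p_i)."""
    H, V = 1.0, 0.0
    for pi in reversed(p[:-1]):
        q = 1.0 - pi
        V = (q / pi) * V + (q / pi ** 2) * (1.0 + H) ** 2  # uses H_(i+1)
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H, V

def sample_W1(p, rng):
    """One realization of W_1: walk from S_1 until S_0 is hit."""
    i, steps, n = 1, 0, len(p)
    while i > 0:
        steps += 1
        if i == n or rng.random() < p[i - 1]:
            i -= 1
        else:
            i += 1
    return steps
```

For \(p_i = 1/2 + i/(2n)\) with small n, the simulated mean and variance of the return time agree with \(H_1\) and \(V_1\) up to sampling error.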
Please note that \(V_i\) will always be of the same order as \(H_i^2\). If \((1-p_i)/p_i\) is less than one, then the recursively needed values \(V_{j}\) for \(j>i\) become less important and the dominant contribution is \(H_i^2\); if \((1-p_i)/p_i\) is greater than one, then \(H_i^2\) grows by a factor of at least \(((1-p_i)/p_i)^2\) [see Eq. (2)], which is the square of the growth factor of \(V_i\).
As \(V_i\) is of the same order as \(H_i^2\), the arithmetic average of T evaluations of \(W_i\) has a relative error of approximately \(1/\sqrt{T}\). This is indeed a relevant statistic if evaluations are performed, and it is consolidated in the following corollary.
Corollary 2
Let \({{\tilde{W}}}_{i,j}\sim W_i\) be independent random variables. Then
5 Runtime analysis of OnePSO and DPSO
As mentioned above, we present a runtime analysis of OnePSO for two combinatorial problems, the sorting problem and OneMax. Our analysis is based on the fitness level method (Wegener 2002), in particular its application to the runtime analysis of a \((1+1)\)-EA for the sorting problem in Scharnow et al. (2004). Consider a (discrete) search space X and an objective function \(f:X \rightarrow {\mathbb {R}} \), where f assigns m distinct values \(f_1< f_2< \cdots < f_m\) on X. Let \(S_i \subseteq X\) be the set of solutions with value \(f_i\). If some algorithm \({\mathscr {A}}\) optimizing f on X leaves each fitness level i at most once, then the expected runtime of \({\mathscr {A}}\) is bounded from above by \(\sum _{i=1}^m {1}/{s_i}\), where \(s_i\) is a lower bound on the probability of \({\mathscr {A}}\) leaving \(S_i\). The method has also been applied successfully, e.g., in Sudholt and Witt (2010) to obtain bounds on the expected runtime of a binary PSO proposed in Kennedy and Eberhart (1997).
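As a small illustration of the fitness level method (our sketch, with OneMax-style level probabilities used only as an example), the bound \(\sum _i 1/s_i\) for levels that are left with probability at least \(s_i = i/n\) is the harmonic sum \(n H_n \approx n \ln n\):

```python
import math

def fitness_level_bound(s):
    """Fitness level upper bound: sum of 1/s_i over all levels, where
    s[i] lower-bounds the probability of leaving level i."""
    return sum(1.0 / si for si in s)

n = 1000
# leaving level i succeeds with probability at least i/n (OneMax-style),
# so the bound is n * H_n, which is roughly n * ln(n)
bound = fitness_level_bound([i / n for i in range(1, n + 1)])
```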
5.1 Upper bounds on the expected optimization time
Similar to Scharnow et al. (2004) and Sudholt and Witt (2010), we use the fitness-level method to prove upper bounds on the expected optimization time of the OnePSO for sorting and OneMax. In contrast to the former, we allow nonimproving solutions and return to the attractor as often as needed in order to sample a neighbor of the attractor that belongs to a better fitness level. Therefore, the time needed to return to the attractor contributes a multiplicative term to the expected optimization time, which depends on the choice of the algorithm parameter c.
We first consider the sorting problem. The structure of the search space of the sorting problem has been discussed already in Scharnow et al. (2004) and a detailed analysis of its fitness levels is provided in Mühlenthaler et al. (2017). In the following lemma we bound the transition probabilities for the Markov model for the sorting problem. This allows us to bound the runtime of OnePSO for the sorting problem later on.
Lemma 3
For the sorting problem on n items, \(c = 1/2\) and \(x\in X_i\), the probability \(p_x\) that OnePSO moves from x to an element in \(X_{i1}\) is bounded from below by \(p_i = \frac{1}{2} (1+ i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) )\). Furthermore, this bound is tight.
Proof
The lower bound \(p_i\) on \(p_x\) can be obtained by
To show the above inequality, consider the attractor a and a permutation \(\tau \) such that \(x \circ \tau = a\). For each cycle of length k of \(\tau \), exactly \(k-1\) transpositions are needed to adjust the elements in this cycle and there are \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \ge k-1\) transpositions which decrease the transposition distance to the attractor a. Therefore the number of ways to decrease the transposition distance to a is bounded from below by the transposition distance to a. Hence, we have \(\vert {\mathscr {N}}(x)\cap X_{i-1}\vert \ge i\).
The lower bound is tight, as it is attained if \(\tau \) contains only cycles of length two (or one). In Mühlenthaler et al. (2017, Sect. 4) a more detailed discussion of improvement probabilities can be found. \(\square \)
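The counting argument in the proof can be verified directly for small instances. The sketch below (our illustration, with the identity permutation as attractor) counts the transpositions that decrease the transposition distance and checks that there are exactly \(\sum \left( {\begin{array}{c}k\\ 2\end{array}}\right) \) of them, summed over the cycle lengths k, and hence at least as many as the distance itself.

```python
import random
from itertools import combinations
from math import comb

def cycle_lengths(perm):
    """Cycle lengths of a permutation of 0..n-1 given as a tuple of images."""
    seen, out = [False] * len(perm), []
    for s in range(len(perm)):
        if not seen[s]:
            k, j = 0, s
            while not seen[j]:
                seen[j], j, k = True, perm[j], k + 1
            out.append(k)
    return out

def dist(perm):
    """Transposition distance to the identity: n minus the number of cycles."""
    return len(perm) - len(cycle_lengths(perm))

def improving(perm):
    """Number of position swaps that decrease the distance by one."""
    count = 0
    for a, b in combinations(range(len(perm)), 2):
        q = list(perm)
        q[a], q[b] = q[b], q[a]
        count += dist(tuple(q)) == dist(perm) - 1
    return count

rng = random.Random(0)
for _ in range(20):
    x = list(range(7))
    rng.shuffle(x)
    x = tuple(x)
    # improving swaps = pairs inside a common cycle >= transposition distance
    assert improving(x) == sum(comb(k, 2) for k in cycle_lengths(x)) >= dist(x)
```

The bound is tight exactly when all cycles have length at most two, e.g., for the permutation (1, 0, 3, 2, 5, 4), which has distance 3 and exactly 3 improving swaps.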
Using Lemma 3 we prove the following bounds on the expected optimization time \(T_{{\mathrm{sort}}}(n)\) required by OnePSO for sorting n items by transpositions.
Theorem 10
(Mühlenthaler et al. 2017, Theorem 13) The expected optimization time \(T_{{\mathrm{sort}}}(n)\) of the OnePSO sorting n items is bounded from above by
See Fig. 5 for a visualization of \(\frac{1-c}{c}\).
Proof
Consider the situation that the attractor has just been updated. Whenever the OnePSO fails to update the attractor in the next iteration it will take in expectation \(H_1\) iterations until the attractor is reached again and then it is improved with probability at least \(i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \). Again, if the OnePSO fails to improve the attractor we have to wait \(H_1\) steps, and so on. Since we do not consider the case that the attractor has been improved meanwhile, the general fitness level method yields an expected runtime of at most \(\sum _{i=1}^n((H_1+1)(1/s_i-1)+1) = H_1 \cdot O(n^2 \log n)\).
We now bound the expected return time \(H_1\). Let \(c \in (\frac{1}{2}, 1]\) and recall that \(p_i\) is the probability of moving from state \(S_i\) to state \(S_{i-1}\). Then \(1 \ge p_i> c > \frac{1}{2}\). Then the expression for \(H_1\) given in Theorem 5 is bounded from above by the constant \(1/(2c-1)\), so \(T_{{\mathrm{sort}}}(n)=O(n^2\log n)\). Now let \(c = \frac{1}{2}\), so \(p_i \ge \frac{1}{2} (1+ i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) )\) by Lemma 3. Then, by Corollary 1, we have \(H_1 = O(n)\), so \(T_{{\mathrm{sort}}}(n)=O(n^3\log n)\). Finally, let \(c \in (0, \frac{1}{2})\). Then \(p_i> c > 0\), and by Theorem 5, \(H_1\) is bounded from above by
so \(T_{{\mathrm{sort}}}(n)=O\left( \left( \frac{1c}{c} \right) ^n\cdot n^2\log n\right) \). \(\square \)
For \(c = 0\), OnePSO always moves to a uniformly drawn adjacent solution, so the algorithm just performs a random walk on the search space. In this case, \(T_{{\mathrm{sort}}}(n)\) is the expected number of transpositions that need to be applied to a permutation in order to obtain a given permutation. We conjecture that \(T_{{\mathrm{sort}}}(n)\) has the following asymptotic behavior and provide theoretical evidence for this conjecture in Appendix A.
Conjecture 1
\(T_{{\mathrm{sort}}}(n)\sim n!\) if \(c=0\).
Please note that the conjecture is actually only a conjecture on the upper bound as Theorem supplies a proof that \(T_{{\mathrm{sort}}}(n)=\varOmega (n!)\) if \(c=0\).
Using a similar approach as in Theorem 10, we now bound the expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) of OnePSO for OneMax.
Theorem 11
The expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) of the OnePSO solving OneMax is bounded from above by
where \(\beta (c)=2^{{1}/({1-c})}\cdot (1-c)\cdot c^{{c}/({1-c})}\).
See Fig. 5 for a visualization of \(\beta (c)\).
Proof
The argument is along the lines of the proof of Theorem 10. We observe that on fitness level \(0 \le i \le n\) there are i bit flips that increase the number of ones in the current solution. Therefore, \(s_i = i/n\) and the fitness level method yields an expected runtime of at most \(\sum _{i=1}^n (H_1 + 1)(1/s_i-1)+1 = H_1 \cdot O(n \log n)\). The bounds on \(H_1\) for \(c > \frac{1}{2}\) are as in the proof of Theorem 10. For \(c = \frac{1}{2}\) we invoke Lemma 1 and have \(H_1 = O(\sqrt{n})\). For \(c <\frac{1}{2}\) we use Theorem 8. The probabilities in the Markov model for \(H_1\) are \(p_i=c+(1-c)i/n\), which can be continuously extended to the nondecreasing function \(p(i)=c+(1-c)i/n\). Here \(k=n\cdot \frac{1-2c}{2(1-c)}\) solves the equation \(p(k)=\frac{1}{2}\). Hence, we need the value of
Now Theorem 8 gives the upper bound \(H_1=O(n\cdot \beta (c)^n)\).
It remains to consider the case that \(c=0\). The claimed bound on \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) can be obtained by using the model \({{\mathscr {M}}} ((\frac{i}{n})_{1\le i\le n})\). Each state represents the distance to the optimal point. By Eq. (6) we have
The maximal expected time to reach the optimal point is the sum of all \(H_k\):
\(\square \)
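For the \(c=0\) model \({{\mathscr {M}}} ((\frac{i}{n})_{1\le i\le n})\) the return time from distance 1 can even be computed exactly: unfolding the recurrence gives \(H_1=\sum _{i=1}^n \frac{n}{i}\left( {\begin{array}{c}n-1\\ i-1\end{array}}\right) =\sum _{i=1}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) =2^n-1\). The sketch below (our illustration) checks this identity in exact rational arithmetic.

```python
from fractions import Fraction

def H1_exact(p):
    """H_1 by the backward recurrence H_n = 1,
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1), in exact arithmetic."""
    H = Fraction(1)
    for pi in reversed(p[:-1]):
        H = 1 / pi + (1 / pi - 1) * H
    return H

for n in (1, 2, 3, 8, 12):
    p = [Fraction(i, n) for i in range(1, n + 1)]
    assert H1_exact(p) == 2 ** n - 1  # exact identity for p_i = i/n
```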
We remark that the upper bounds given in Theorem 11 for \(c\in [\frac{1}{2},1]\) were presented in Mühlenthaler et al. (2017, Theorem 14), that the upper bound for \(c\in (0,\frac{1}{2})\) is newly obtained using the bounds-by-integration from Sect. 4.4, and that the proof of the upper bound for \(c=0\) is also new compared to Mühlenthaler et al. (2017). Admittedly, the bound for \(c=0\) is already available in the context of randomized local search and can be found in Garnier et al. (1999). Furthermore, note that for \(c=\frac{1}{2}\) it is not sufficient to use the lower bound \(p_i \ge p_1=\frac{1}{2} + \frac{1}{2n}\) in order to obtain the runtime bound given in Theorem 11.
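The base \(\beta (c)\) can also be observed directly in the exact return times: the sketch below (our illustration) computes \(H_1\) for \(p_i=c+(1-c)i/n\) by the backward recurrence and compares the per-step growth factor \(H_1^{1/n}\) with \(\beta (c)\); polynomial factors vanish in the n-th root.

```python
def H1(p):
    """H_1 by the backward recurrence H_n = 1,
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1)."""
    H = 1.0
    for pi in reversed(p[:-1]):
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H

def beta(c):
    """beta(c) = 2^(1/(1-c)) * (1-c) * c^(c/(1-c)) from Theorem 11."""
    return 2.0 ** (1.0 / (1.0 - c)) * (1.0 - c) * c ** (c / (1.0 - c))

n = 400
for c in (0.25, 0.4):
    p = [c + (1.0 - c) * i / n for i in range(1, n + 1)]
    growth = H1(p) ** (1.0 / n)   # approaches beta(c) as n grows
```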
By repeatedly running OnePSO and applying Markov's inequality in the analysis, an optimal solution is found with high probability, so we have the following corollary.
Corollary 3
If the OnePSO is repeated \(\lambda \cdot \log _2(n)\) times, but each repetition is terminated after \(2\cdot T(n)\) iterations, where T(n) is the upper bound (with a suitable constant factor) on the expected number of iterations to find the optimum specified in Theorems 10 and 11, then OnePSO finds the optimal solution with high probability.
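The high-probability guarantee follows from Markov's inequality: a single run truncated at \(2\cdot T(n)\) iterations fails with probability at most \(\frac{1}{2}\), so \(\lambda \cdot \log _2(n)\) independent runs all fail with probability at most \(n^{-\lambda }\). A minimal sketch of this calculation (illustrative only, not from the paper):

```python
import math

def failure_bound(n, lam):
    """Probability that all lam*log2(n) independent restarts fail,
    where each run, truncated at twice its expected optimization
    time, fails with probability at most 1/2 (Markov's inequality)."""
    return 0.5 ** (lam * math.log2(n))

# the failure probability equals n**(-lam), i.e. it is polynomially small
print(failure_bound(1024, 3))
```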
5.2 Lower bounds via indistinguishable states
In this section we will provide lower bounds on the expected optimization time of OnePSO that almost match our upper bounds given in Sect. 5.1. We will use the Markov model from Sect. 3 to obtain these lower bounds. The main difference to the previous section is that we restrict our attention to the last improvement of the attractor, which dominates the runtime, both for sorting and OneMax. We will introduce the useful notion of indistinguishability of certain states of a Markov chain. Note that our lower bounds are significantly improved compared to the conference version (Mühlenthaler et al. 2017) by using the newly introduced bounds-by-integration from Sect. 4.4.
5.2.1 Indistinguishable states
We now introduce a notion of indistinguishability of certain states of a Markov chain already presented in Mühlenthaler et al. (2017). We will later use this notion to prove lower bounds on the expected optimization time of OnePSO for sorting and OneMax as follows: We show that the optimum is contained in a set \({\hat{Y}}\) of indistinguishable states. Therefore, in expectation, the states in \({\hat{Y}}\) have to be visited \(\varOmega (\vert {\hat{Y}}\vert )\) times to hit the optimum with positive constant probability.
Definition 2
(Indistinguishable states) Let M be a Markov process with a finite set Y of states and let \({\hat{Y}}\subseteq Y\). Furthermore, let \((Z_i)_{i\ge 0}\) be the sequence of visited states of M and let \(T=\min \lbrace t>0\mid Z_t\in {\hat{Y}}\rbrace \). Then \({\hat{Y}}\) is called indistinguishable with respect to M if

1.
the initial state \(Z_0\) is uniformly distributed over \({\hat{Y}}\), i. e., for all \(y\in Y\):
$$\begin{aligned} \Pr [Z_0=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert = {\left\{ \begin{array}{ll} 1/\vert {\hat{Y}}\vert &{} \text {if }y\in {\hat{Y}}\\ 0 &{} \text {if }y\not \in {\hat{Y}} . \end{array}\right. } \end{aligned}$$ 
2.
the probabilities to reach states in \({\hat{Y}}\) from states in \({\hat{Y}}\) are symmetric, i. e., for all \(y_1,y_2\in {\hat{Y}}\):
$$\begin{aligned} \Pr [Z_T=y_2\mid Z_0=y_1]=\Pr [Z_T=y_1\mid Z_0=y_2] . \end{aligned}$$
Now we can prove a lower bound on the expected time for finding a specific state.
Theorem 12
Let M be a Markov process as in Definition 2 and let \({\hat{Y}}\) be indistinguishable with respect to M. Let h(M) be a positive real value such that \(\mathrm{E}[T]\ge h(M)\). Then the expected time to reach a fixed \(y\in {\hat{Y}}\) is bounded from below by \(h(M)\cdot \Omega (\vert {\hat{Y}}\vert )\).
Proof
Let \(T_i\) be the stopping time when \({\hat{Y}}\) is visited the ith time.
By Statement 1 of Definition 2, \(Z_0\) is uniformly distributed over \({\hat{Y}}\). Therefore \(T_1=0\) and \(T_2=T\). Statement 2 of Definition 2 implies that \(\Pr [Z_{T_i}=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert \) for all \(i\ge 1\) by the following induction. The base case \(i=1\), with \(T_1=0\), is ensured by Statement 1 of Definition 2. The induction hypothesis is \(\Pr [Z_{T_{i-1}}=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert \). The inductive step is verified by the following series of equations.
It follows that for all \(i>0\) the difference \(T_{i+1}-T_i\) of two consecutive stopping times has the same distribution as T and also
Now let \(y\in {\hat{Y}}\) be fixed. The probability that y is not reached within the first \(T_{\lfloor \vert {\hat{Y}}\vert /2\rfloor -1}\) steps is bounded from below, by a union bound, by
and therefore the expected time to reach the fixed \(y\in {\hat{Y}}\) is bounded from below by
\(\square \)
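The role of uniformity in this argument can be made concrete: since every visit to \({\hat{Y}}\) lands on a uniformly distributed state, the number of visits before a fixed y is hit is geometric with mean \(\vert {\hat{Y}}\vert \), which is where the factor \(\Omega (\vert {\hat{Y}}\vert )\) comes from. A small numeric sketch (our own illustration, not code from the paper):

```python
def expected_visits(N, horizon=4000):
    """Expected number of uniform draws from an N-element set until a
    fixed element appears: sum_{m>=1} m*(1/N)*(1-1/N)**(m-1) = N.
    The geometric series is truncated; the tail is negligible here."""
    p = 1.0 / N
    return sum(m * p * (1 - p) ** (m - 1) for m in range(1, horizon))

print(round(expected_visits(10), 6))  # close to 10 = |Y-hat|
```

Combined with the waiting time of at least h(M) between consecutive visits, this yields the stated lower bound \(h(M)\cdot \Omega (\vert {\hat{Y}}\vert )\).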
5.2.2 Lower bounds on the expected optimization time for sorting
In this section we consider the sorting problem. Our first goal is to provide lower bounds on the expected return time to the attractor for the parameter choice \(c\in (0,\frac{1}{2})\).
Lemma 4
Let \(c\in (0,\frac{1}{2})\). For the sorting problem on n items, assume that the attractor has transposition distance one to the identity permutation. Then the expected return time \(H_1\) to the attractor is bounded from below by \(\Omega (\alpha (c)^n)\), where
See Fig. 5 for a visualization of \(\alpha (c)\).
Proof
The probability of decreasing the distance to the attractor in state \(S_i\) can be bounded from above by
We increase all indices by one, i.e., \({{\tilde{p}}}_i=p_{i-1}\), so that we again have n states. Note that \(H_2=\varOmega (H_1)\). This can be obtained by the following equations, using Eq. 2 and the fact that \(H_1\ge \frac{1}{1-c}\) holds in this case
We use Theorem 8 to get a lower bound on \(H_1\) by using \(p(i)=c+(1-c)\cdot \frac{i^2}{n^2}\). Here \(k=n\cdot \sqrt{\frac{1-2c}{2(1-c)}}\) maximizes the integral, because it solves the equation \(p(k)=\frac{1}{2}\). An application of Theorem 8 supplies
In the following we calculate the exact value of this integral. The integrand can be converted to the expression
The indefinite integral of \(\ln (1-x^2)\) is
It can be evaluated for values \(x\in [0,1[\), which suffices as \(0\le k/n<1\). Furthermore, the indefinite integral of \(\ln \left( \frac{c}{1-c}+x^2\right) \) is
which can be evaluated for all values, because \(\frac{c}{1-c}\) is positive. The indefinite integral of the whole expression is obtained by subtracting the two
and evaluating at the bounds \(k/n=\sqrt{\frac{1-2c}{2(1-c)}}\) and 0 results in
Applying the \(\exp \) function to this result gives the claimed lower bound. \(\square \)
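For reference, the two antiderivatives used in the proof above can be written out explicitly; the following are standard computations (our own, verified by differentiation, and not copied from the paper, so the exact arrangement of terms may differ from the original displays):

```latex
\int \ln(1-x^2)\,\mathrm{d}x
  = x\ln(1-x^2) - 2x + \ln\frac{1+x}{1-x} + C ,
  \qquad 0 \le x < 1,
\]
\[
\int \ln\Bigl(\tfrac{c}{1-c}+x^2\Bigr)\,\mathrm{d}x
  = x\ln\Bigl(\tfrac{c}{1-c}+x^2\Bigr) - 2x
    + 2\sqrt{\tfrac{c}{1-c}}\,\arctan\Bigl(x\sqrt{\tfrac{1-c}{c}}\Bigr) + C .
```

Differentiating either right-hand side recovers the corresponding integrand, since the \(-2\) from the \(-2x\) term cancels against the derivative of the logarithmic or arctangent correction term.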
This lower bound is the best possible bound achievable within this model, as the probability \(p_i = c+(1-c)\cdot {\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) }/{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) actually occurs at distance i if the permutation transforming the current position to the attractor consists of one cycle of length \(i+1\) and the remaining permutation consists of singleton cycles. For this improvement probability the bound is \(\varTheta ^*(\alpha (c)^n)\).
The following theorem supplies lower bounds on the expected optimization time of OnePSO on the sorting problem.
Theorem 13
The expected optimization time \(T_{{\mathrm{sort}}}(n)\) of the OnePSO sorting n items is bounded from below by
Proof
The event that the initial position is already the optimum has probability 1/n!. As \(1-1/n!>1/2\) for \(n\ge 2\), we obtain the same \(\varOmega \) bound if we ignore this case. In all other cases we can consider the situation that the attractor has just been updated to a solution that has distance one to the optimum. Without loss of generality, we assume that the attractor is the identity permutation and the optimum is the transposition \((0\,1)\). The number of steps required for the next (hence final) improvement of the attractor is a lower bound on the expected optimization time of the OnePSO. We determine a lower bound on this number for various choices of c.
For all \(c \in (0,1]\) we apply Theorem 12. We use the set of all permutations as the set of states Y in the Markov process M. Let \({\hat{Y}}=X_1\) be the subset of states which are a single swap away from the attractor. Hence the optimal solution is contained in \({\hat{Y}}\), but up to the point when the OnePSO reaches the optimal solution it is indistinguishable from all other permutations in \({\hat{Y}}\). We now prove that \({\hat{Y}}\) is indeed indistinguishable with respect to M. Initially the particle is situated at the attractor, and after a single step it is situated at a permutation in \({\hat{Y}}\), where each permutation has equal probability. We use the permutation after the first step as the initial state \(Z_0\) of the Markov process, and all other \(Z_i\) are the successive permutations. Therefore Statement 1 of Definition 2 is fulfilled. Let \(T=\min \lbrace t>0\mid Z_t\in {\hat{Y}}\rbrace \) be the stopping time of Theorem 12. For each sequence of states \(Z_0,\ldots ,Z_T\) there is a one-to-one mapping to a sequence \({{\tilde{Z}}}_0=Z_T,{{\tilde{Z}}}_1,\ldots ,{{\tilde{Z}}}_{T-1},{{\tilde{Z}}}_{T}=Z_0\) which has equal probability of appearing. The sequence \({{\tilde{Z}}}_0,\ldots ,{{\tilde{Z}}}_T\) is not the reversed sequence, because the forced steps would then lead in the wrong direction, but it can be obtained by renaming the permutation indices. The renaming is possible because the permutations \(Z_0\) and \(Z_T\) are both single swaps. As this one-to-one mapping exists, Statement 2 of Definition 2 is also fulfilled. Finally, we need a bound on the expectation of T. If we are in \(X_1={\hat{Y}}\), we can either go to the attractor by a forced or random move and return to \(X_1\) in the next step, or we can go to \(X_2\) by a random move and return to \(X_1\) in expectation after \(H_2\) steps.
We have \(\mathrm{E}[T]=\left( c+(1-c)/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \right) \cdot 2 + (1-c)\cdot \left( 1-1/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \right) (1+H_2)=\varOmega (H_2)=:h(M)\). Theorem 12 provides the lower bound \(\Omega (\vert {\hat{Y}}\vert \cdot H_2)\) for the runtime to find the fixed permutation \((0\,1)\in {\hat{Y}}\), which is the optimal solution. From Eq. 2 we get \(H_2=(p_1\cdot H_1-1)/(1-p_1)\ge (c\cdot H_1-1)/(1-c)\). As \(H_1=\Omega (n^{2/3})\) for \(c=\frac{1}{2}\) (see Theorem 7) and \(H_1=\Omega (\alpha (c)^n)\) for \(c\in (0,\frac{1}{2})\) (see Lemma 4), also \(H_2=\Omega (H_1)\) for \(c\in (0,\frac{1}{2}]\), which results in the lower bounds \(T_{{\mathrm{sort}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (\left( {\begin{array}{c}n\\ 2\end{array}}\right) \cdot n^{2/3})=\Omega (n^{8/3})\) for \(c=\frac{1}{2}\) and \(T_{{\mathrm{sort}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (\left( {\begin{array}{c}n\\ 2\end{array}}\right) \cdot \alpha (c)^n)=\Omega (n^2\cdot \alpha (c)^n)\) for \(c\in (0,\frac{1}{2})\). Trivially, the return time to \(X_1\) in M can be bounded by 2, which results in the lower bound \(T_{\mathrm{sort}}(n)=\Omega (n^2)\) for the case \(c\in (\frac{1}{2},1]\).
The lower bound for \(c=0\) can be derived directly from the indistinguishability property: Let \({\hat{Y}} = Y\). It is readily verified that the initial state is uniformly distributed over \({\hat{Y}}\). Furthermore, any path from \({\hat{Y}}\) to \({\hat{Y}}\) can be reversed and has the same probability of occurring. Therefore, Condition 2 of Definition 2 is satisfied and the lower runtime bound follows from Theorem 12 by choosing \(h(M) = 1\). \(\square \)
Besides these formally proved lower bounds, we conjecture the following lower bounds on the expected optimization time of OnePSO for sorting n items.
Conjecture 2
Note that these lower bounds differ from our upper bounds given in Theorem 10 only by a \(\log \) factor. Evidence supporting this conjecture is given in “Appendix B”. We obtain our theoretical evidence by considering the average probability of moving towards the attractor, instead of the upper and lower bounds used before.
5.2.3 Lower bounds on the expected optimization time for OneMax
First we provide a lower bound on the expected return time to the attractor.
Lemma 5
Let \(c\in (0,\frac{1}{2})\). For OneMax, assume that the attractor has Hamming distance one to the optimum \(1^n\). Then the expected return time \(H_1\) to the attractor is bounded from below by \(H_1=\Omega (\beta (c)^n)\), where \(\beta (c)=2^{{1}/({1-c})}\cdot (1-c)\cdot c^{{c}/({1-c})}\) is as in Theorem 11.
See Fig. 5 for a visualization of \(\beta (c)\).
Proof
We use Theorem 8. The value \(\beta (c)\) was already calculated in Theorem 11, Eq. (10). \(\square \)
This result enables us to prove lower bounds on \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)\).
Theorem 14
The expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)\) of the OnePSO for solving OneMax is bounded from below by
Proof
First, let \(c \in (\frac{1}{2}, 1]\). Then, with probability at least \(\frac{1}{2}\), the initial solution contains at least \(k = \lfloor n/2 \rfloor = \varOmega (n)\) zeros. Each zero is flipped to one with probability 1/n in a random move, and none of the k entries is set to one in a move towards the attractor. The expected time required to sample the k distinct bit flips is bounded from below by the expected time it takes to obtain all coupons in the following instance of the coupon collector’s problem: there are k coupons and each coupon is drawn independently with probability 1/k. The expected time to obtain all coupons is \(\varOmega (k \log k)\) (Mitzenmacher and Upfal 2005, Sect. 5.4.1). It follows that the expected optimization time is \(\varOmega (n \log n)\) as claimed.
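The coupon collector expectation invoked here is \(k\cdot H_k\), where \(H_k\) denotes the k-th harmonic number, and \(k\cdot H_k = \varTheta (k \log k)\). A short sketch of this quantity (our own illustration, not code from the paper):

```python
import math

def coupon_collector_expectation(k):
    """Expected number of draws until all k coupons have been seen,
    when each draw yields a uniformly random coupon: k * H_k,
    with H_k the k-th harmonic number."""
    return k * sum(1.0 / i for i in range(1, k + 1))

k = 1000
ratio = coupon_collector_expectation(k) / (k * math.log(k))
print(ratio)  # tends to 1 as k grows, so the expectation is Theta(k log k)
```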
For \(c \in (0,\frac{1}{2}]\) we use the same approach as in the proof of Theorem 13. Here, too, the event that the initial solution is optimal can be ignored. Consider the situation that the attractor has just been updated to a solution that has distance one to the optimum. We use the set of all bit strings as the set of states Y in the Markov process M. Let \({\hat{Y}}=X_1\) be the subset of bit strings which are a single bit flip away from the attractor; hence \({\hat{Y}}\) contains the optimum. \(Z_i\) and T are instantiated as in the proof of Theorem 13. Therefore Statement 1 of Definition 2 is fulfilled. Again, for each sequence of states \(Z_0,\ldots ,Z_T\) we have a one-to-one mapping to a sequence \({{\tilde{Z}}}_0=Z_T,{{\tilde{Z}}}_1,\ldots ,{{\tilde{Z}}}_{T-1},{{\tilde{Z}}}_{T}=Z_0\) which has equal probability of appearing. This sequence is again obtained by renaming the indices plus some bit changes according to the shape of the attractor. Hence Statement 2 of Definition 2 is also fulfilled, and \({\hat{Y}}\) is indistinguishable with respect to M. We obtain \(\mathrm{E}[T]=\Omega (H_2)=:h(M)\). Theorem 12 provides the lower bound \(\Omega (\vert {\hat{Y}}\vert \cdot H_2)\) for the runtime to find the optimal solution. From Eq. (2) we get \(H_2\ge (c\cdot H_1-1)/(1-c)\). As \(H_1=\Omega (n^{1/2})\) for \(c=\frac{1}{2}\) (see Theorem 6) and \(H_1=\Omega (\beta (c)^n)\) for \(c\in (0,\frac{1}{2})\) (see Lemma 5), also \(H_2=\Omega (H_1)\) for \(c\in (0,\frac{1}{2}]\), which results in the lower bounds \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (n\cdot n^{1/2})=\Omega (n^{3/2})\) for \(c=\frac{1}{2}\) and \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (n\cdot \beta (c)^n)\) for \(c\in (0,\frac{1}{2})\).
Again, the lower bound for \(c=0\) can be obtained by the indistinguishability property. The proof for this case is identical to the corresponding part of the proof of Theorem 13. \(\square \)
This justifies all runtime bounds claimed in Table 1, and all of the tools presented in Sect. 4 are used for this purpose.
5.3 Bounds on the expected optimization time for DPSO
The upper bounds on the runtime of OnePSO in Theorem 10 and Theorem 11 directly imply upper bounds for DPSO. Recall that we denote by c the parameter of OnePSO and by \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}} (n)\) and \(T_{{\mathrm{sort}}}(n)\) the expected optimization time of OnePSO for OneMax and sorting, respectively.
Corollary 4
Let \(T'_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) and \(T'_{{\mathrm{sort}}}(n)\) be the expected optimization time of DPSO for OneMax and sorting, respectively. If \(c = c_{glob}\), then \(T'_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n) = O(P\cdot T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n))\) and \(T'_{{\mathrm{sort}}}(n) =O(P\cdot T_{{\mathrm{sort}}}(n))\), where P is the number of particles.
Proof
In each trial to improve the value of the global attractor, at least the particle which last updated the global attractor has its local attractor at the same position as the global attractor. This particle behaves exactly like the single particle of OnePSO until the global attractor is improved. Therefore we have at most P times more objective function evaluations than OnePSO, where P is the number of particles. \(\square \)
One can refine this result by again looking at return times to an attractor.
If the global attractor equals the local attractor, then this particle performs the same steps as all other particles having equal attractors. As all those particles perform their optimization in parallel, in expectation no additional objective function evaluations are made compared to the OnePSO.
For particles where the global attractor and the local attractor differ, we can apply the previous arguments to the local attractor. With two different attractors, alternating movements towards the local and the global attractor can cancel each other out. Therefore, if (only) \(c_{loc}\) is fixed, then in the worst case we can assume only \(c_{loc}\) as the probability of moving towards the local attractor and \(1-c_{loc}\) as the probability of moving away from it. This enables us to use Theorem 5 to calculate the expected time to reduce the distance to an attractor from one to zero. We denote the return time from Theorem 5 by \(\varPsi (n,p)\).
If the position equals the local attractor and consequently differs from the global attractor, the probability of improving the local attractor can be bounded from below by a positive constant. For example, for the problem OneMax this constant is \(c_{glob}/2\), because in a move towards the global attractor, for at least half of the differing bits the value of the global attractor equals the value of the optimal solution, as the global attractor is at least as close to the optimum as the local attractor. Therefore the expected number of trials until the local attractor is improved is constant. As such an update occurs at most once for each particle and fitness level, we obtain an additional summand of \(O(\varPsi (n,c_{loc})\cdot P\cdot n)\) instead of the factor P for the problems OneMax and sorting.
In contrast to the upper bounds, the lower bounds for OnePSO do not carry over to DPSO for the following reason. The bottleneck used in the analysis of OnePSO is the very last improvement step. However, DPSO may be faster at finding the last improvement, because it may happen that the local and the global attractor of a particle both have distance one to the optimum but are not equal. In this case, as described above, there is a constant probability of moving to the optimum if a particle is at one of the two attractors, whereas for OnePSO the probability of moving towards the optimum when the particle is located at the attractor tends to zero for large n.
Experiments with DPSO using a small number of particles and with OnePSO, applied to the sorting problem and OneMax, revealed only a small increase in the optimization time of DPSO compared to OnePSO. This increase is much smaller than the factor P. For some parameter constellations, even a significant decrease in the optimization time of DPSO compared to OnePSO is achieved.
5.4 Lower bounds for pseudo-Boolean functions
For general pseudo-Boolean functions \(f:\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \) we can also prove lower bounds on the expected optimization time.
Theorem 15
If \(P=\varTheta (1)\), where P is the number of particles, then the expected optimization time of DPSO optimizing pseudo-Boolean functions \(\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \) with a unique optimal position is in \(\varOmega (n\log (n))\).
Proof
If there are \(P=\varTheta (1)\) particles, then in expectation there are \(n/2^P=\varOmega (n)\) bits such that no particle has the corresponding bit of its initial position equal to that bit of the optimal position. The expected optimization time is therefore bounded from below by the time until each such bit is flipped in a random move at least once. This subproblem corresponds to a coupon collector's problem, and therefore we have the claimed lower bound of \(\varOmega (n\log (n))\). \(\square \)
For larger values of P we obtain an even stronger lower bound by the following theorem.
Theorem 16
If \(P=O(n^k)\), where P is the number of particles and k is an arbitrary nonnegative real value, then the expected optimization time of DPSO optimizing pseudo-Boolean functions \(\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \) with a unique optimal position is in \(\varOmega (n\cdot P)\).
Proof
To bound the probability of being at least some distance away from the attractor after initialization, we use Chernoff bounds. For a fixed particle we define
Therefore \(Y=\sum _{i=1}^n Y_i\) is exactly the initial distance of the fixed particle to the unique optimal position. For each i we have that \(\Pr [Y_i=1]=\frac{1}{2}\) and \(\mathrm{E}[Y]=\frac{n}{2}\). By Chernoff bounds we obtain the lower bound
The probability that the initial positions of all P particles have distance at least \(\frac{n}{4}\) to the unique optimal position is the \(P\hbox {th}\) power of this probability and can be bounded from below for large n by
Note that one can choose such an n because \(P=O(n^k)\) and \(16\cdot \ln (2\cdot {\mathrm{poly}}(n))=o(n)\) for any polynomial. If the distance of the positions of all particles to the optimum is at least \(\frac{n}{4}\), then it takes at least \(\frac{n}{4}\) iterations until the optimal position can be reached, as the distance can change only by one in each iteration. In each iteration P evaluations of the objective function are performed. Therefore we have at least \(\frac{n\cdot P}{4}\) objective function evaluations with probability at least \(\frac{1}{2}\) for large n, which results in the claimed optimization time of \(\varOmega (n\cdot P)\). \(\square \)
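The Chernoff bound used here is the standard multiplicative form \(\Pr [Y\le (1-\delta )\mathrm{E}[Y]]\le e^{-\delta ^2\mathrm{E}[Y]/2}\) with \(\delta =\frac{1}{2}\) and \(\mathrm{E}[Y]=\frac{n}{2}\), giving \(\Pr [Y\le \frac{n}{4}]\le e^{-n/16}\). A quick numeric check of this bound against the exact binomial tail (our own sketch, not from the paper):

```python
import math

def binom_tail_leq(n, k):
    """Exact Pr[Y <= k] for Y ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n

def chernoff_bound(n):
    """Pr[Y <= n/4] <= exp(-(1/2)**2 * (n/2) / 2) = exp(-n/16)."""
    return math.exp(-n / 16)

n = 200
# the exact tail probability never exceeds the Chernoff bound
print(binom_tail_leq(n, n // 4), "<=", chernoff_bound(n))
```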
This means that if we choose, e.g., \(P=n^{10}\), we have at least \(\varOmega (n^{11})\) function evaluations in expectation.
6 Conclusion
We propose a simple and general adaptation of the PSO algorithm to a broad class of discrete optimization problems. For one particle, we provide upper and lower bounds on its expected optimization time for the sorting problem and OneMax, generalize the upper bounds to DPSO with an arbitrary number of particles, and also prove lower bounds for DPSO optimizing pseudo-Boolean functions. Depending on the parameter c, which is the probability of moving towards the attractor, the expected optimization time is polynomial (\(c \ge 1/2\)) or exponential (\(c < 1/2\)), respectively. The cornerstone of our analysis is a set of \(\varTheta \)-bounds on the expected time it takes until the PSO returns to the attractor. Our analysis also provides the variance of this value. We analyze Markov chains and provide tools to evaluate expected return times for certain classes of transition probabilities. Additionally, we establish a useful general property, indistinguishability of states of a Markov process, for obtaining lower bounds on the expected first hitting time of a special state. Applying the presented tools to other Markov chains, which frequently appear in the analysis of randomized algorithms, is certainly possible.
For future work, it would be interesting to see whether the upper and lower bounds on the expected optimization time for OneMax given in Theorems 11 and 14 are valid for any linear function \(f : \{0,1\}^n \rightarrow {\mathbb {R}} \), \(f(x_1,x_2,\ldots ,x_n) = \sum _i w_i x_i\). Furthermore, we conjecture that the upper bound for the sorting problem for \(c=0\) is n! and that the other proved upper bounds for the sorting problem are tight. Another direction for future work is to apply our technical tools to other metaheuristics. In particular, our tools may be useful in the analysis of “nonelitist” metaheuristics, for instance the Strong Selection Weak Mutation (SSWM) evolutionary regime introduced in Gillespie (1983) as an example of a nonelitist algorithm.
Finally, it would be interesting to determine the return time to the state \(S_0\) in a more general Markov model \({{\mathscr {M}}} ((p_i)_{1 \le i \le n})\), where \(p_i = 1/2 + z(i, n)\) such that \(z(i, n) = {\text {poly}}(i) / {\text {poly}}(n)\), where the degrees of the polynomials differ and z(i, n) is nondecreasing for \(1 \le i \le n\). This would generalize Theorems 6 and 7 and shed some light on the relation between z(i, n) and the return time to state \(S_0\). Here, we conjecture that for z(i, n) as defined above the return time is in \({\text {poly}}(n)\). Moreover, a proof of the claimed upper bound of O(n!) on the expected time to reach a specified permutation in the graph of permutations by an actual random walk searching for the optimum would be beneficial. To the best of our knowledge, no such proof exists so far.
Notes
Note that a different definition of “transposition” is used in computational biology, so “transposition distance” has a different meaning, e.g., in Bafna and Pevzner (1998).
Similarly to how \((n+1)/n\) has limit one: there is no n such that \((n+1)/n\le 1\), but for any \(\varepsilon \in (0,1)\) and sufficiently large n we have \((1-\varepsilon )(n+1)/n\le 1\).
References
Antipov D, Doerr B, Fang J, Hetet T (2018) A tight runtime analysis for the (\(\mu \) + \(\lambda \)) EA. In: Proc of the genetic and evolutionary computation conference (GECCO), pp 1459–1466. https://doi.org/10.1145/3205455.3205627
Antipov D, Doerr B, Yang Q (2019) The efficiency threshold for the offspring population size of the (\(\mu \), \(\lambda \)) EA. In: Proc. of the genetic and evolutionary computation conference (GECCO), pp 1461–1469. https://doi.org/10.1145/3321707.3321838
Bafna V, Pevzner PA (1998) Sorting by transpositions. SIAM J Discrete Math 11(2):224–240. https://doi.org/10.1137/S089548019528280X
Doerr B, Neumann F (2020) Theory of evolutionary computation—recent developments in discrete optimization. Springer. https://doi.org/10.1007/9783030294144
Doerr B, Neumann F, Sudholt D, Witt C (2007) On the runtime analysis of the 1ANT ACO algorithm. In: Proc. 9th ACM genetic and evolutionary computation conference (GECCO), pp 33–40. https://doi.org/10.1145/1276958.1276964
Droste S, Jansen T, Wegener I (2001) Dynamic parameter control in simple evolutionary algorithms. In: Proc. 6th workshop on foundations of genetic algorithms (FOGA), pp 275–294. https://doi.org/10.1016/B9781558607347/500986
Droste S, Jansen T, Wegener I (2002) On the analysis of the (1+1) evolutionary algorithm. Theor Comput Sci 276(1):51–81. https://doi.org/10.1016/S03043975(01)001827
Durrett R (2010) Probability: theory and examples. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press. https://doi.org/10.1017/9781108591034
Eberhart RC, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proc. 6th international symposium on micro machine and human science, pp 39–43. https://doi.org/10.1109/MHS.1995.494215
Garnier J, Kallel L, Schoenauer M (1999) Rigorous hitting times for binary mutations. Evol Comput 7(2):173–203. https://doi.org/10.1162/evco.1999.7.2.173
Giel O, Wegener I (2003) Evolutionary algorithms and the maximum matching problem. In: Proc. 20th symp. on theoretical aspects of computer science (STACS), pp 415–426. https://doi.org/10.1007/3540364943_37
Gillespie JH (1983) Some properties of finite populations experiencing strong selection and weak mutation. Am Nat 121(5):691–708. https://doi.org/10.1086/284095
Graham RL, Knuth DE, Patashnik O (1994) Concrete mathematics: a foundation for computer science, 2nd edn. Addison-Wesley Longman
Hirschhorn MD (2015) Wallis’s product and the central binomial coefficient. Am Math Mon 122(7):689. https://doi.org/10.4169/amer.math.monthly.122.7.689
Hoffmann M, Mühlenthaler M, Helwig S, Wanka R (2011) Discrete particle swarm optimization for TSP: theoretical results and experimental evaluations. In: Proc. 2nd int. conf. on adaptive and intelligent systems (ICAIS), pp 416–427. https://doi.org/10.1007/9783642238574_40
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proc. IEEE international conference on neural networks, vol 4, pp 1942–1948. https://doi.org/10.1109/ICNN.1995.488968
Kennedy J, Eberhart RC (1997) A discrete binary version of the particle swarm algorithm. In: Proc. IEEE int. conf. on systems, man, and cybernetics, vol 5, pp 4104–4108. https://doi.org/10.1109/ICSMC.1997.637339
Kötzing T, Krejca MS (2018) First-hitting times for finite state spaces. In: Auger A, Fonseca CM, Lourenço N, Machado P, Paquete L, Whitley D (eds) Parallel problem solving from nature—PPSN XV. Springer International Publishing, Cham, pp 79–91. https://doi.org/10.1007/9783319992594_7
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.1063/1.1699114
Mitzenmacher M, Upfal E (2005) Probability and computing. Cambridge University Press. https://doi.org/10.1017/CBO9780511813603
Mühlenthaler M, Raß A, Schmitt M, Siegling A, Wanka R (2017) Runtime analysis of a discrete particle swarm optimization algorithm on sorting and OneMax. In: Proc. 14th ACM/sigevo workshop on foundations of genetic algorithms (FOGA), pp 13–24. https://doi.org/10.1145/3040718.3040721
Neumann F, Witt C (2007) Runtime analysis of a simple ant colony optimization algorithm. Algorithmica 54(2):243–255. https://doi.org/10.1007/s0045300791342
Onwunalu JE, Durlofsky LJ (2010) Application of a particle swarm optimization algorithm for determining optimum well location and type. Comput Geosci 14:183–198. https://doi.org/10.1007/s1059600991421
Papadimitriou CH (1991) On selecting a satisfying truth assignment. In: Proc. 32nd IEEE symp. on foundations of computer science (FOCS), pp 163–169. https://doi.org/10.1109/SFCS.1991.185365
Papadimitriou CH, Schäffer AA, Yannakakis M (1990) On the complexity of local search. In: Proc. 22nd ACM symposium on theory of computing (STOC), pp 438–445. https://doi.org/10.1145/100216.100274
Petkovšek M (1992) Hypergeometric solutions of linear recurrences with polynomial coefficients. J Symb Comput 14(2):243–264. https://doi.org/10.1016/07477171(92)900386
Ramanathan K, Periasamy VM, Pushpavanam M, Natarajan U (2009) Particle swarm optimisation of hardness in nickel diamond electro composites. Arch Comput Mater Sci Surf Eng 1:232–236
Raß A, Schreiner J, Wanka R (2019) Runtime analysis of discrete particle swarm optimization applied to shortest paths computation. Evol Comput Comb Optim (EvoCOP), pp 115–130. https://doi.org/10.1007/9783030167110_8
Scharnow J, Tinnefeld K, Wegener I (2004) The analysis of evolutionary algorithms on sorting and shortest paths problems. J Math Model Algorithms 3(4):349–366. https://doi.org/10.1023/B:JMMA.0000049379.14872.f5
Schmitt M, Wanka R (2015) Particle swarm optimization almost surely finds local optima. Theor Comput Sci 561, Part A:57 – 72. https://doi.org/10.1016/j.tcs.2014.05.017
Schöning U (1999) A probabilistic algorithm for \(k\)SAT and constraint satisfaction problems. In: Proc. 40th IEEE symp. on foundations of computer science (FOCS), pp 410–414. https://doi.org/10.1109/SFFCS.1999.814612
Schwab L, Schmitt M, Wanka R (2015) Multimodal medical image registration using particle swarm optimization with influence of the data’s initial orientation. In: Proc. 12th IEEE conf. on computational intelligence in bioinformatics and computational biology (CIBCB), pp 403–410. https://doi.org/10.1109/CIBCB.2015.7300314
Sudholt D (2013) A new method for lower bounds on the running time of evolutionary algorithms. IEEE Trans Evol Comput 17(3):418–435. https://doi.org/10.1109/TEVC.2012.2202241
Sudholt D, Thyssen C (2012) Running time analysis of ant colony optimization for shortest path problems. J Discrete Algorithms 10:165–180. https://doi.org/10.1016/j.jda.2011.06.002
Sudholt D, Witt C (2008) Runtime analysis of binary PSO. In: Proc. 10th ACM genetic and evolutionary computation conf. (GECCO), pp 135–142. https://doi.org/10.1145/1389095.1389114
Sudholt D, Witt C (2010) Runtime analysis of a binary particle swarm optimizer. Theor Comput Sci 411(21):2084–2100. https://doi.org/10.1016/j.tcs.2010.03.002
Veeramachaneni K, Osadciw L, Kamath G (2007) Probabilistically driven particle swarms for optimization of multi-valued discrete problems: design and analysis. In: Proc. IEEE swarm intelligence symposium (SIS), pp 141–149. https://doi.org/10.1109/SIS.2007.368038
Wachowiak MP, Smolíková R, Zheng Y, Zurada JM, Elmaghraby AS (2004) An approach to multimodal biomedical image registration utilizing particle swarm optimization. IEEE Trans Evol Comput 8:289–301. https://doi.org/10.1109/TEVC.2004.826068
Wegener I (2002) Methods for the analysis of evolutionary algorithms on pseudo-boolean functions. In: Sarker R, Mohammadian M, Yao X (eds) Evolutionary optimization. Springer, Chap 14, pp 349–369. https://doi.org/10.1007/0-306-48041-7_14
Yang Q, Wu J, Li Y, Li W, Wang L, Yang Y (2017) Using the particle swarm optimization algorithm to calibrate the parameters relating to the turbulent flux in the surface layer in the source region of the Yellow River. Agric For Meteorol 232:606–622. https://doi.org/10.1016/j.agrformet.2016.10.019
Acknowledgements
We would like to thank the anonymous referees for valuable remarks.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This is the significantly extended version of Mühlenthaler et al. (2017) published in the proceedings of the 14th ACM/SIGEVO Workshop on Foundations of Genetic Algorithms (FOGA), 2017.
Appendices
Appendix A: Evidence for Conjecture 1
In this section we provide computational evidence for Conjecture 1. To this end we compute exact values of \(T_{{\mathrm{sort}}}(n)\) for \(n \le 40\), using a system of linear equations similar to Eq. 1. Let \(\tau _0\) be the optimal permutation (say, the identity); the system then determines, for every permutation \(\tau \), the expected optimization time \(h_\tau \) when starting from \(\tau \).
This simple approach works only for very small n, since it uses one variable per permutation. Our results are based on the following insight: for each permutation \(\tau \) consider the permutation \(\nu \) such that \(\tau \circ \nu =\tau _0\). Since the value \(h_\tau \) is equal for all permutations \(\tau \) whose \(\nu \) has the same cycle lengths in its cycle decomposition, the number of variables in the system of linear equations can be reduced to the number of integer partitions of n, where n is the number of items to sort. Hence, for \(n = 40\), the number of variables drops from 40! to 37,338, which is manageable. \(T_{{\mathrm{sort}}}(n)\) is then just a linear combination of the computed values.
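The size of the reduced system is easy to check computationally. The following sketch counts the integer partitions of n with a standard dynamic program (the function name is ours):

```python
def partition_count(n):
    # p[k] = number of integer partitions of k using the part sizes
    # considered so far
    p = [1] + [0] * n
    for part in range(1, n + 1):   # allow one more part size at a time
        for k in range(part, n + 1):
            p[k] += p[k - part]
    return p[n]

# 40! variables collapse to the number of integer partitions of 40
print(partition_count(40))  # 37338
```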
Side note: the transition probabilities, and hence the system of linear equations, can be represented by a matrix. For elitist algorithms this matrix can be transformed into an upper-triangular matrix, which makes the analysis much easier. Such a transformation to an upper-triangular matrix is not possible in our situation.
Figure 6 shows the search space of the sorting problem for four items, partitioned by cycle lengths, so the states are represented by the integer partitions of \(n=4\). The complete search space with all permutations of \(n=4\) items is visualized in Fig. 2. In Fig. 6 each state is labeled by the cycle lengths of the current permutation and may represent several permutations. The state (1, 1, 1, 1) represents only a single permutation, the identity permutation 1234, in which each cycle is a singleton cycle, i.e., a cycle of length one. Every neighboring permutation of (1, 1, 1, 1) contains two swapped items and two items which stay at their position; hence its cycle lengths are (2, 1, 1), and there are six permutations with these cycle lengths. The neighbors of permutations with cycle lengths (2, 1, 1) have cycle lengths (1, 1, 1, 1), (3, 1) or (2, 2). Out of the six possible exchange operations, one splits the cycle of length two into two cycles of length one, one merges the two singleton cycles into a second cycle of length two, yielding (2, 2), and the remaining four merge the cycle of length two with a singleton cycle, yielding (3, 1). The respective transition probabilities for a random walk are also visualized in Fig. 6. Furthermore, there are three permutations with cycle lengths (2, 2): the permutations 2143, 3412 and 4321. The remaining eight permutations of the third column in Fig. 2 have cycle lengths (3, 1), and the six permutations in the last column of Fig. 2 consist of a single cycle of length four. The values of \(h_\tau \) satisfying Eq. 11 are \(h_{(1,1,1,1)}=0\), \(h_{(2,1,1)}=23\), \(h_{(2,2)}=27\), \(h_{(3,1)}=\frac{105}{4}\) and \(h_{(4)}=\frac{55}{2}\), and hence, weighting each cycle type by its number of permutations, \(T_{{\mathrm{sort}}}(4)=\frac{1}{4!}\cdot (h_{(1,1,1,1)}+6\cdot h_{(2,1,1)}+3\cdot h_{(2,2)}+8\cdot h_{(3,1)}+6\cdot h_{(4)})=\frac{99}{4}=4!+\frac{3}{4}\).
Please note that this value does not rely on experiments; the computation yields the exact expected optimization time.
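The stated values of \(h_\tau \) and \(T_{{\mathrm{sort}}}(4)\) can be reproduced by solving the hitting-time system of the lumped chain directly. Below is a minimal sketch for the pure random-walk transitions discussed above (the matrix encoding and helper code are ours):

```python
from fractions import Fraction as F

# States ordered (2,1,1), (2,2), (3,1), (4); h_(1,1,1,1) = 0.
# Each row counts, out of the 6 possible transpositions, how many lead
# to each state (counts taken from the discussion of Fig. 6).
P = [
    [0, 1, 4, 0],  # from (2,1,1): 1 -> identity, 1 -> (2,2), 4 -> (3,1)
    [2, 0, 0, 4],  # from (2,2):   2 -> (2,1,1), 4 -> (4)
    [3, 0, 0, 3],  # from (3,1):   3 -> (2,1,1), 3 -> (4)
    [0, 2, 4, 0],  # from (4):     2 -> (2,2), 4 -> (3,1)
]

# Solve h = 1 + (P/6) h, i.e. (I - P/6) h = 1, by Gauss-Jordan
# elimination over the rationals.
A = [[F(int(i == j)) - F(P[i][j], 6) for j in range(4)] + [F(1)]
     for i in range(4)]
for col in range(4):
    piv = next(r for r in range(col, 4) if A[r][col] != 0)
    A[col], A[piv] = A[piv], A[col]
    A[col] = [v / A[col][col] for v in A[col]]
    for r in range(4):
        if r != col:
            A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
h = [row[4] for row in A]
assert h == [F(23), F(27), F(105, 4), F(55, 2)]

counts = [6, 3, 8, 6]  # number of permutations of each cycle type
T4 = sum(c * x for c, x in zip(counts, h)) / 24
assert T4 == F(99, 4)  # = 4! + 3/4
```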
Figure 7 shows that the exact value of \(T_{{\mathrm{sort}}}(n)\) divided by n! tends to one. Proving that this behavior persists for large n would imply \(T_{{\mathrm{sort}}}(n) =\Theta (n!)\) and hence Conjecture 1.
Appendix B: Evidence for Conjecture 2
For the sorting problem, there are the following gaps between the lower and upper bound on the return time to the attractor. For \(0<c<1/2\) the expected number of iterations to return to the attractor ranges from \(\varOmega ^{*}(\alpha (c)^n)\) to \(O^*\left( \left( (1-c)/c \right) ^n \right) \), and for \(c=1/2\) it ranges from \(\varOmega (n^{2/3})\) to O(n). We provide a simplified Markov model based on an averaging argument and conjecture that the simplified model and the actual model are asymptotically equivalent.
To improve the understanding of the search space of permutations we approximate the improvement probabilities in order to approximate the expected return time to the attractor. For this purpose, instead of using upper and lower bounds on the probability of moving towards the attractor, we use its average value. If, for every distance, each permutation at that distance to the attractor were equally likely, this approximation would even yield the exact values.
Conjecture 3
Let \(H_1\) be the expected number of iterations until the OnePSO returns to the attractor g if the current distance to the attractor is one while optimizing the sorting problem. Let \(p_x\) be the probability to move from permutation x to a permutation y such that \(d (x,g)=1+d (y,g)\). Let
$$\begin{aligned} {{\hat{p}}}_i = \frac{1}{|\{x \mid d (x,g)=i\}|} \sum _{x :\, d (x,g)=i} p_x \end{aligned}$$
be the average probability to reduce the distance to the attractor. Let \({{\hat{H}}}_1\) be the expected number of iterations in \({{\hat{M}}}={{\mathscr {M}}}\left( ({{\hat{p}}}_i)_{1\le i\le n-1}\right) \) to move from state \(S_1\) to \(S_0\) (see Definition 1). We conjecture that \(H_1\sim {{\hat{H}}}_1\).
To provide evidence we compute these average improvement probabilities and compare the resulting expected numbers of iterations.
Theorem 17
The average improvement probability of moving towards the attractor while optimizing the sorting problem is
$$\begin{aligned} {{\hat{p}}}_i = c + (1-c)\sum _{k=1}^{i+1} \frac{\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) (k-1)!\left[ \begin{array}{c} n-k \\ n-i-1 \end{array}\right] }{\left[ \begin{array}{c} n \\ n-i \end{array}\right] }\cdot \frac{k-1}{n-1}, \end{aligned}$$
where i is the distance to the attractor and \(\left[ \begin{array}{c} n \\ m \end{array}\right] \) denotes the unsigned Stirling numbers of the first kind.
Proof
The unsigned Stirling numbers of the first kind \(\left[ \begin{array}{c} n \\ m \end{array}\right] \) count the permutations of n elements with exactly m cycles and can easily be calculated by the recursive formula
$$\begin{aligned} \left[ \begin{array}{c} n \\ m \end{array}\right] = \left[ \begin{array}{c} n-1 \\ m-1 \end{array}\right] + (n-1)\cdot \left[ \begin{array}{c} n-1 \\ m \end{array}\right] \end{aligned}$$
with \(\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] =1\) and \(\left[ \begin{array}{c} n \\ 0 \end{array}\right] =\left[ \begin{array}{c} 0 \\ m \end{array}\right] =0\) for \(n,m>0\).
W.l.o.g. let the attractor be the identity permutation. Then the attractor has n singleton cycles. Increasing the distance to the attractor by one is equivalent to decreasing the number of cycles by one. Therefore a permutation at distance i to the attractor has exactly \(n-i\) cycles, and the number of permutations at distance i is \(\left[ \begin{array}{c} n \\ n-i \end{array}\right] \). The probability that a fixed item is in a cycle of length k among all permutations at distance i from the attractor is
$$\begin{aligned} \frac{\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) (k-1)!\left[ \begin{array}{c} n-k \\ n-i-1 \end{array}\right] }{\left[ \begin{array}{c} n \\ n-i \end{array}\right] }. \end{aligned}$$
Choosing the remaining \(k-1\) items in the cycle of length k from the remaining \(n-1\) items allows \(\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) \) options. There are \((k-1)!\) orderings of these items within the cycle. The remaining \(n-k\) items have to be partitioned into \(n-i-1\) cycles, which contributes another factor of \(\left[ \begin{array}{c} n-k \\ n-i-1 \end{array}\right] \). Together with the first cycle of length k this yields a permutation with \(n-i\) cycles.
This probability does not change if we pick a random item instead of a fixed one. Furthermore, the probability of moving towards the attractor equals the probability that a cycle is split into two cycles, which happens exactly if two items of the same cycle are picked for an exchange. If the first picked item lies in a cycle of length k, then the probability that the second item lies in the same cycle is \(\frac{k-1}{n-1}\). Summing these probabilities over all possible cycle lengths of the first picked item yields the claimed formula for \({{\hat{p}}}_i\); in addition, the constant c of the OnePSO comes into play, as with probability c a move towards the attractor is forced. Please note that the maximal cycle length at distance i from the attractor is \(i+1\), which explains the upper limit of the sum. \(\square \)
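The combinatorial part of the formula (the \(c=0\) case) can be checked against a brute-force enumeration of the symmetric group. A sketch under that assumption, with helper names of our own choosing:

```python
from fractions import Fraction as F
from itertools import combinations, permutations
from math import comb, factorial

def num_cycles(p):
    # number of cycles in the permutation p (0-indexed one-line notation)
    seen, cnt = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cnt += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return cnt

def stirling1u(n, m):
    # unsigned Stirling numbers of the first kind via the recurrence
    if n == 0:
        return 1 if m == 0 else 0
    if m == 0:
        return 0
    return stirling1u(n - 1, m - 1) + (n - 1) * stirling1u(n - 1, m)

n = 5
for i in range(1, n):
    # permutations at transposition distance i from the identity
    perms = [p for p in permutations(range(n)) if n - num_cycles(p) == i]
    assert len(perms) == stirling1u(n, n - i)
    total = F(0)
    for p in perms:
        # count transpositions that split a cycle, i.e. reduce the distance
        good = 0
        for a, b in combinations(range(n), 2):
            q = list(p)
            q[a], q[b] = q[b], q[a]
            if num_cycles(q) == num_cycles(p) + 1:
                good += 1
        total += F(good, comb(n, 2))
    brute = total / len(perms)
    closed = sum(
        F(comb(n - 1, k - 1) * factorial(k - 1)
          * stirling1u(n - k, n - i - 1), stirling1u(n, n - i))
        * F(k - 1, n - 1)
        for k in range(1, i + 2))
    assert brute == closed
```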
By using these average probabilities of moving towards the attractor we obtain a Markov chain in which, from state \(S_i\), only the states \(S_{i-1}\) and \(S_{i+1}\) can be reached in a single step. For Markov chains with this property the return times can be computed as in Sect. 4.2, which helps us estimate the expected return time to the attractor. If c is zero, then the expected return time from distance one to the attractor is exactly \(n!-1\), and this value is also obtained exactly by the model with average probabilities.
Remark 2

1.
Assume \(T(n)=\gamma (c)^n\cdot f(n)\), where f is a polynomial. Then
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{T(n)}{T(n-1)}=\gamma (c). \end{aligned}$$
2.
Assume \(T(n)=f(n)\), where f is again a polynomial. Then
$$\begin{aligned} \lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( T(n)/T(n-1) \right) \end{aligned}$$is the maximal degree of f(n).
Proof
Let \(f(n)=a\cdot n^b+o(n^b)\), \(b>0\).

1.
Assuming \(T(n)=\gamma (c)^n\cdot f(n)\) leads to
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{T(n)}{T(n-1)}&=\lim _{n\rightarrow \infty }\gamma (c)\frac{n^b+o(n^b)}{(n-1)^b+o((n-1)^b)}\\&=\lim _{n\rightarrow \infty }\gamma (c)\frac{(\frac{n}{n-1})^b+o(1)}{1+o(1)} =\gamma (c). \end{aligned}$$
2.
Assuming \(T(n)=f(n)\) leads to
$$\begin{aligned} \lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( \frac{T(n)}{T(n-1)}\right)&=\lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( \frac{(\frac{n}{n-1})^b+o(1)}{1+o(1)}\right) \\&=\lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( \left( \frac{n}{n-1}\right) ^b\right) =b. \end{aligned}$$
\(\square \)
Let
$$\begin{aligned} q_{ex}(n,c) = \frac{H_1}{H_1'}, \end{aligned}$$
where \(H_1\) and \(H_1'\) are the expected return times to the attractor for the sorting problem on n and \(n-1\) items, respectively, if the attractor has transposition distance one to the current position (actual exact model).
Let additionally
$$\begin{aligned} q_{av}(n,c) = \frac{{{\hat{H}}}_1}{{{\hat{H}}}_1'}, \end{aligned}$$
where \({{\hat{H}}}_1\) and \({{\hat{H}}}_1'\) are the corresponding return times in the Markov model with the average success probabilities specified in Theorem 17.
Figure 8 shows \(q_{ex}\) and \(q_{av}\) for different values of n, together with the upper bound \((1-c)/c\) and the lower bound \(\alpha (c)\) on the base of the exponential part of the return time for the sorting problem.
By the first part of Remark 2, \((1-c)/c\) is also an upper bound and \(\alpha (c)\) a lower bound on the limit of \(q_{ex}\) as n tends to infinity, since \(q_{ex}\) tends to the actual base of the exponential part of the expected return time.
Since computing \(q_{ex}\) requires solving a system of linear equations whose number of variables equals the number of integer partitions of n (see the description after Conjecture 1), we cannot evaluate \(q_{ex}\) for large n. The values of \(q_{av}\) are quite similar to the values of \(q_{ex}\) for corresponding n. For \(c=0\) the values coincide exactly, and for \(n=30\) the relative error is less than 0.04. Therefore we conjecture that the limit of \(q_{av}\) as n tends to infinity is close or even equal to the limit of \(q_{ex}\). As can be seen in Fig. 8, the values of \(q_{av}\) tend to the upper bound \((1-c)/c\). We omitted the graph of \(q_{av}(10{,}000,c)\) as it almost completely overlaps the graph of the upper bound. It is therefore reasonable to conjecture that the limit of \(q_{ex}\) is close to the upper bound \((1-c)/c\). If this is indeed true, then in all runtime results the value of \(\alpha (c)\) can be replaced by \(((1-c)/c)-\varepsilon \) for some small nonnegative value \(\varepsilon \), which could even be zero.
By the second part of Remark 2, the limit of \(\log _{\frac{n}{n-1}}q_{ex}(n,1/2)\) as n tends to infinity supplies the exponent of the largest monomial (possibly omitting logarithmic factors) of the return time for \(c=1/2\). Figure 9 shows the quotients \(q_{ex}(n,1/2)\) for n up to 40 and \(q_{av}(n,1/2)\) for even larger values of n. Here, too, \(q_{av}\) can be used as an approximation of \(q_{ex}\), and it is reasonable to assume that the limit is one. Note that Theorem 10 implies \(\lim _{n\rightarrow \infty }q_{ex}(n,1/2)\le 1\). Similarly to the exponential case with \(c<1/2\), we conjecture that the actual expected value \(H_1\) is close to the upper bound on \(H_1\) derived in the proof of Theorem 10.
Using these results as lower bounds on the expected optimization time would yield the bounds specified in Conjecture 2. These bounds are only a factor of \(\log (n)\) away from the upper bounds given in Theorem 10.
Please note that the results in this section do not rely on experiments; instead, exact values of the expectation are computed. Nevertheless, we have performed some experiments with small values of n, and the measured optimization times comply with the computed exact optimization times. In particular, for values of c close to zero, optimization times of up to n! are predicted, which cannot be confirmed in reasonable time for larger values of n.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Mühlenthaler, M., Raß, A., Schmitt, M. et al. Exact Markov chain-based runtime analysis of a discrete particle swarm optimization algorithm on sorting and OneMax. Nat Comput 21, 651–677 (2022). https://doi.org/10.1007/s11047-021-09856-0