Abstract
Metaheuristics are powerful tools for solving optimization problems whose structural properties are unknown or cannot be exploited algorithmically. We propose such a metaheuristic for a large class of optimization problems over discrete domains based on the particle swarm optimization (PSO) paradigm. We provide a comprehensive formal analysis of the performance of this algorithm on certain “easy” reference problems in a black-box setting, namely the sorting problem and the problem OneMax. In our analysis we use a Markov model of the proposed algorithm to obtain upper and lower bounds on its expected optimization time. Our bounds are essentially tight with respect to the Markov model. We show that for a suitable choice of algorithm parameters the expected optimization time is comparable to that of known algorithms and, furthermore, that for other parameter regimes the algorithm behaves less greedily and more exploratively, which can be desirable in practice in order to escape local optima. Our analysis provides precise insight into the tradeoff between optimization time and exploration. To obtain our results we introduce the notion of indistinguishability of states of a Markov chain and provide bounds on the solution of a recurrence equation with non-constant coefficients by integration.
1 Introduction
Metaheuristics are very successful at finding good solutions for hard optimization problems in practice. However, due to the nature of such algorithms and the problems they are applied to, it is generally very difficult to derive performance guarantees, or to determine the number of steps it takes until an optimal solution is found. In the present work we propose a simple adaptation of the particle swarm optimization (PSO) algorithm introduced by Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995) to optimization problems over discrete domains. Our proposed algorithm assumes very little about the problem structure and consequently, it works naturally for a large class of discrete domains. It is reasonable to expect from a metaheuristic that it solves black-box versions of many tractable problems in expected polynomial time. We provide a formal analysis based on Markov chains and establish under which conditions our algorithm satisfies this basic requirement. More concretely, we consider two classical problems that are easy to solve in a non-black-box setting, namely the problem of sorting items by transpositions and the problem OneMax, which asks to maximize the number of ones in a bitstring. Our analysis gives precise information about the expected number of steps our algorithm takes in order to solve these two reference problems. Our runtime bounds are essentially tight with respect to the Markov process we use to model the behavior of the algorithm.
For practical purposes, a metaheuristic should, in one way or another, incorporate the following two general strategies: (1) find an improving solution locally (often referred to as exploitation) and (2) move to unexplored parts of the search space (often referred to as exploration). The first strategy essentially leads the algorithm to a local optimum while the second one helps the algorithm to avoid getting stuck when it is close to a local optimum. For our proposed algorithm, as for many other metaheuristics, the tradeoff between the two strategies can be conveniently set by an algorithm parameter. Our analysis shows that there is a sharp threshold with respect to this parameter, where the expected runtime of the algorithm on the reference problems turns from polynomial to exponential. Hence, we can maximize the algorithm’s ability to escape local optima while still maintaining polynomial runtime on the reference problems.
A key tool for the runtime analysis of metaheuristics for optimization problems over discrete domains is the fitness level method pioneered by Wegener (2002). The basic idea is to consider the level sets of the objective function of a problem instance and to determine the expected number of steps an algorithm takes to move to a better level set. This approach has been used extensively in the study of so-called elitist \((1+1)\)-EAs (Wegener 2002; Droste et al. 2002; Giel and Wegener 2003; Sudholt 2013). These algorithms keep a single “current” solution and update this solution only if a better one is found. Our analysis of the proposed PSO algorithm also relies on the fitness level method. However, since our algorithm also considers non-improving solutions in order to escape local optima, a much more involved analysis is required to determine the time it takes to move to a better fitness level.
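As a toy illustration of the fitness level method (our own sketch, not taken from the paper), the standard upper bound \(\sum_i 1/s_i\) can be evaluated directly once lower bounds \(s_i\) on the level-leaving probabilities are known:

```python
def fitness_level_upper_bound(s):
    """Fitness level method: if an algorithm leaves fitness level i
    towards a strictly better level with probability at least s[i],
    then its expected optimization time is at most sum(1 / s[i])."""
    return sum(1.0 / s_i for s_i in s)

# Toy instance: randomized local search on OneMax with n bits improves
# from a solution with i correct bits with probability (n - i) / n, so
# the bound evaluates to n * H_n = Theta(n log n).
n = 100
bound = fitness_level_upper_bound([(n - i) / n for i in range(n)])
```
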
In the following, the expected optimization time refers to the expected number of evaluations of the objective function an algorithm performs until an optimal solution is found. Before giving a precise statement of our results we provide some background information on the PSO algorithm as well as on the two reference problems we consider.
1.1 Particle swarm optimization
The PSO algorithm has been introduced by Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995) and is inspired by the social interaction of bird flocks. Fields of successful application of PSO are, among many others, Biomedical Image Processing (Schwab et al. 2015; Wachowiak et al. 2004), Geosciences (Onwunalu and Durlofsky 2010), Agriculture (Yang et al. 2017), and Materials Science (Ramanathan et al. 2009). In the continuous setting, it is known that the algorithm converges to a local optimum under mild assumptions (Schmitt and Wanka 2015). The algorithm has been adapted to various discrete problems and several results are available, for instance for binary problems (Sudholt and Witt 2010) and the traveling salesperson problem (TSP) (Hoffmann et al. 2011).
A PSO algorithm manages a collection (called swarm) of particles. Each particle consists of an (admissible) solution together with a velocity vector. Additionally, each individual particle knows its local attractor, which is the best solution found by that particle. Information between particles is shared via a common reference solution called the global attractor, which is the best solution found so far by all particles. In each iteration of the algorithm, the solution of each particle is updated based on its relative position with respect to the attractors and some random perturbation. Algorithm parameters balance the influence of the attractors and the perturbation and hence give a tradeoff between the two general search strategies “exploration” and “exploitation”. Although PSO was originally proposed to solve optimization problems over a—typically rectangular—domain \(X \subseteq {\mathbb {R}} ^n\), several authors have adapted PSO to discrete domains. This requires a fundamental reinterpretation of the PSO movement equation because the corresponding mathematical operations of a vector space are typically lacking in the discrete setting. An early discrete PSO variant is the binary PSO presented in Kennedy and Eberhart (1997) for optimizing over \(X=\{0,1\}^n\), where velocities determine the probabilities that a bit is zero or one in the next iteration. A PSO variant for optimizing over general integral domains \(X=\{0,1,\ldots ,M-1\}^n\), \(M \in {\mathbb {N}} \), has been proposed in Veeramachaneni et al. (2007).
1.2 Problems and search spaces
In this section we briefly define the optimization problems for which we will study the performance of the proposed PSO algorithm. The problem OneMax asks for a binary string of length n that maximizes the function

\(\mathrm {OneMax} (x_1,\ldots ,x_n)=\sum _{i=1}^{n} x_i,\)

which counts the number of ones in a binary string. A more general version of this problem asks to minimize the Hamming distance to an unknown binary string of length n. The proposed algorithm works exactly the same on the more general problem, since it is indifferent to the actual bit values and each bit is handled independently. Therefore, the performance of our algorithm on this more general version is equal to its performance on OneMax. The corresponding search space is the n-dimensional hypercube: any binary string of length n is a (feasible) solution, and two solutions are adjacent iff they differ by exactly one bitflip. For \(n=4\), the search space is shown in Fig. 1. More generally, a pseudo-Boolean function is any function \(f : \{0, 1\}^n \rightarrow {\mathbb {R}} \).
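In code, the objective and its hidden-target generalization are a few lines each; the following Python sketch (function names are ours, for illustration) also records why the two versions behave identically for the algorithm:

```python
def one_max(x):
    """OneMax objective: the number of ones in the bitstring x."""
    return sum(x)

def hidden_target_objective(x, z):
    """Generalized version: the Hamming distance to an unknown target
    bitstring z, to be minimized. OneMax is the special case where z is
    the all-ones string, since maximizing the number of ones minimizes
    the distance to all-ones."""
    return sum(xi != zi for xi, zi in zip(x, z))
```
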
By the sorting problem we refer to the task of arranging n items in non-decreasing order using transpositions. An (algebraic) transposition \(t=(i\,\,\, j)\) is the exchange of the entries at positions i and j. Therefore, the search space is the following (undirected) graph: the vertices are the permutations on \(\{1,2,\ldots ,n\}\) and two vertices x, y are adjacent iff there is a transposition t such that \(x \circ t = y\). The objective function is the transposition distance to the identity permutation. Figure 2 shows the search space for the problem of sorting the items \(\{1, 2, 3, 4\}\) using transpositions. Any two permutations drawn in the same vertical layer have the same objective value.
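The objective can be computed directly from the cycle structure: a permutation decomposing into c disjoint cycles on n elements has transposition distance exactly n - c to the identity. A small Python sketch (0-indexed permutations, for illustration):

```python
def transposition_distance(perm):
    """Transposition distance from perm (0-indexed) to the identity:
    a permutation with `cycles` disjoint cycles on n elements needs
    exactly n - cycles transpositions to be sorted."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:   # traverse the cycle containing i
                seen[j] = True
                j = perm[j]
    return n - cycles
```
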
The sorting problem and OneMax each have a unique optimum. Furthermore, the value of the objective function is the distance to the unique optimal solution in the corresponding search space graph.
1.3 Our contribution
We propose a simple adaptation of the PSO algorithm to optimization problems over discrete domains. We refer to this algorithm as DPSO. The algorithm works naturally on a large class of discrete problems, for instance optimization problems over bitstrings, integral domains, and permutations. The general task is to optimize a function \(f:X\rightarrow {\mathbb {R}} \), where X is a finite set of feasible solutions. Our assumptions on the problem structure are the following. We assume that the set X is the vertex set of a finite, strongly connected graph and that for any solution \(x \in X\), we can sample a neighbor of x efficiently and uniformly. The DPSO algorithm essentially explores this graph, looking for an optimal vertex. In our analysis, we assume at first a swarm size of one as in Mühlenthaler et al. (2017), similar to the analysis of EAs and ACO in Sudholt and Witt (2010). We refer to the corresponding specialization of DPSO as OnePSO. Indeed, for a single particle we have only a single attractor and, as a consequence, a single parameter is sufficient to control the tradeoff between moving towards the attractor and performing a random perturbation.
Our main results are upper and lower bounds on the expected optimization time of the proposed OnePSO algorithm for solving the sorting problem and OneMax in a black-box setting, summarized in Table 1. Certainly there are faster algorithms for the sorting problem or OneMax in a non-black-box setting, e.g., quicksort for the sorting problem. The upper bounds we prove for OnePSO naturally hold for DPSO, and the bounds are tight with respect to our Markov model. The algorithm parameter c determines the probability of making a move towards the attractor. Depending on the parameter c, we obtain a complete classification of the expected optimization time of OnePSO. For \(c=0\), OnePSO performs a random walk on the search space, and for \(c=1\), the algorithm behaves like randomized local search (see Papadimitriou et al. 1990 or Doerr and Neumann 2020 for results on local search variants). For \(c\in (1/2,1)\), OnePSO behaves essentially like the \((1+1)\)-EA variants from Droste et al. (2002) and Scharnow et al. (2004): \((1+1)\)-EA variants perform in expectation a constant number of elementary mutations to obtain an improved solution, and OnePSO with \(c\in (1/2,1)\) likewise performs in expectation a constant number of elementary mutations before it returns to the current best solution. Therefore, OnePSO in a sense generalizes the \((1+1)\)-EA algorithm, since a parameter choice \(c\in [0,1]\) supplies a broader range of behaviors than exploring solutions which are in expectation a constant number of elementary mutations away from the current best solution. If \(c<1\), then OnePSO uses similar subroutines as the Metropolis algorithm (see Metropolis et al. 1953), but in OnePSO new positions are always accepted and guidance towards good solutions is instead implemented by a “drift” to the best position found so far.
Indeed, the bounds on the expected optimization time for OneMax and the upper bounds on the expected optimization time for the sorting problem of OnePSO with parameter \(c\in (1/2,1]\) match the respective bounds on the expected optimization time for the \((1+1)\)-EA variants from Droste et al. (2002) and Scharnow et al. (2004). We show that for \(c \in [1/2, 1)\), the expected optimization time of OnePSO for sorting and OneMax is polynomial, and for \(c \in (0, 1/2)\), the expected optimization time is exponential. For c in the latter range we provide lower bounds on the base of the exponential expression by \(\alpha (c)\) for sorting and \(\beta (c)\) for OneMax such that \(1< \beta (c)< \alpha (c) < 6\) (see Fig. 5). Note that \(\alpha \) and \(\beta \) have been significantly improved compared to the conference version (Mühlenthaler et al. 2017), and the upper bound for OneMax has also been reduced substantially, to an exponential term with base \(\beta (c)\). This means that the lower and upper bounds on the base of the exponential expression for the expected optimization time coincide. Note that for \(c=1/2\) the expected time it takes to visit the attractor again after moving away from it is maximal while keeping the expected optimization time polynomial, i.e., we have a phase transition at \(c=1/2\) such that for any \(c\ge 1/2\) the expected optimization time is polynomial and for any \(c<1/2\) the expected optimization time is exponential in n. Hence, the parameter choice \(c=1/2\) maximizes the time until the attractor is visited again, i.e., the particles can explore the search space to the largest possible extent, while OneMax and the sorting problem are still solved efficiently in a black-box setting.
In order to obtain the bounds shown in Table 1, we use a Markov model which captures the behavior of the OnePSO algorithm between two consecutive updates of the attractor. Depending on whether we derive upper or lower bounds on the expected optimization time, the Markov model is instantiated in a slightly different way. The relevant quantity we extract from the Markov model is the expected number of steps it takes until the OnePSO algorithm returns to the attractor. We determine \(\varTheta \)-bounds on the expected return time by an analysis of appropriate recurrence equations. Similar recurrences occur, for example, in the runtime analysis of randomized algorithms for the satisfiability problem (Papadimitriou 1991; Schöning 1999). Thus, our analysis of the Markov model presented in Sect. 3 may be of independent interest. For \(c > 1/2\), the recurrence equations can be solved using standard methods. For the parameter choice \(c \le 1/2\) however, we need to solve recurrence equations with non-constant coefficients in order to get sufficiently accurate bounds from the model. The gaps between upper and lower bounds on the expected optimization times shown in Table 1 result from choosing best-case or worst-case bounds on the transition probabilities in the Markov model, which are specific to the optimization problem. Since our bounds on the transition probabilities are essentially tight, we can hope to close the gap between the upper and lower bounds on the sorting problem (especially in the exponential case) only by using a more elaborate model.
Furthermore, based on Wald’s equation and the Blackwell-Girshick equation, we also obtain the variance of the number of function evaluations needed to find an optimal solution with respect to the Markov model.
Upper bounds To obtain the upper bounds shown in Table 1 we use the established fitness level method (e.g., see Wegener 2002). We instantiate our Markov model such that improvements of the attractor are only accounted for if the current position is at the attractor. The main difficulty is to determine the expected number of steps needed to return to the attractor. We obtain this quantity from the analysis of the corresponding recurrences with constant and non-constant coefficients. Furthermore, we obtain by integration closed-form expressions for the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution.
Lower bounds The runtime of the OnePSO algorithm is dominated by the time required for the last improvement of the attractor, after which the global optimum has been found. We again use the Markov model and observe that in this situation, the global optimum can be reached only when the Markov model is in a specific state. We argue that the optimal solution is included in a certain set \({\hat{Y}}\) of indistinguishable states. Therefore, in expectation, this set needs to be hit \(\varOmega (\vert {\hat{Y}}\vert )\) times until the optimum has been found. By evaluation of the return time to the attractor we also obtain bounds on the return time to the set \({\hat{Y}}\). Furthermore, for DPSO with a constant number of particles, we give a lower bound of \(\varOmega (\log n)\) for optimizing a pseudo-Boolean function and, for \(P = \mathrm {poly}(n)\) particles, a stronger lower bound of \(\varOmega (P\cdot n)\) for the same task.
Open problems Finally, we conjecture that the expected optimization time of OnePSO for sorting n items is asymptotically equivalent to n! if the attractor is not used at all (\(c = 0\)). An equivalent statement is that a random walk on the set of permutations of n items using single transpositions as neighborhood relation asymptotically takes expected time n! to discover a fixed given permutation starting at a random position. We provide theoretical evidence for this conjecture in “Appendix A”. Furthermore, we conjecture stronger lower bounds for OnePSO for sorting n items for \(c > 0\) and provide evidence in “Appendix B”.
Extensions This article extends the conference paper (Mühlenthaler et al. 2017) as follows. We present DPSO, a discrete PSO algorithm with multiple particles, in Sect. 2 and show in Sect. 5.3 which results for OnePSO generalize to DPSO. For pseudo-Boolean functions, a new general lower bound is presented, which holds for DPSO (see Sect. 5.4). Furthermore, we give a refined analysis in the case of exponential runtime which allows us to determine the exact base of the exponential expression (see Sect. 4.4). In addition to the analysis of the expected optimization time, we also consider the variance of the optimization time (see Sect. 4.5). We conjecture that the expected number of steps needed until the random walk on the space of permutations hits a fixed permutation is n!. We provide evidence for this conjecture in “Appendix A”. Finally, we present an approximate model of the algorithm with average transition probabilities to obtain the actual bounds where lower and upper bounds on the expected optimization time differ (see “Appendix B”). Also, some theorems have been extended to larger classes of functions, which may make them more useful in other settings.
1.4 Related work
Runtime results are available for several other metaheuristics for optimization problems over discrete domains, for example evolutionary algorithms (EAs) (Droste et al. 2002; Giel and Wegener 2003; Wegener 2002; Antipov et al. 2018, 2019) and ant colony optimization (ACO) (Doerr et al. 2007; Neumann and Witt 2007; Sudholt and Thyssen 2012). Most of the results relevant to this work concern the binary PSO algorithm and the \((1+1)\)-EA algorithm. For the binary PSO, Sudholt and Witt (2008, 2010) provide various runtime results. For instance, they give a general lower bound of \(\varOmega (n/\log n)\) for every function with a unique global optimum and a bound of \(\varTheta (n \log n)\) on the function OneMax. Note that the binary PSO studied in Sudholt and Witt (2010) has been designed for optimizing over \(\{0,1\}^n\) and is different from our proposed DPSO, which can be applied to a much wider range of discrete problems. Sudholt and Witt show the following bound for the binary PSO.
Theorem 1
(Sudholt and Witt 2010, Theorem) Under certain assumptions on the algorithm parameters, the expected optimization time of the binary PSO for optimizing \(f : \{0,1\}^n \rightarrow {\mathbb {R}} \) is \(O(mn\log n) + \sum _{i=1}^{m-1} 1/s_i\), where m is the number of level sets of f and \(s_i\) is a lower bound on the probability to move from level i to level \(i-1\).
Essentially, this result reflects the fact that the binary PSO converges to the attractor in expected time \(O(n\log n)\) unless the attractor has been updated meanwhile. This happens once for each fitness level. For OneMax, this result yields an expected optimization time of \(O(n^2 \log n)\). By a more careful analysis of the binary PSO on OneMax, the following improved bound is established:
Theorem 2
(Sudholt and Witt 2010, Theorem) The expected optimization time of the binary PSO with a single particle optimizing OneMax is \(O(n \log n)\).
The \((1+1)\)-EA considered in Scharnow et al. (2004) is reminiscent of stochastic hill climbing: in each iteration, a random solution is sampled and the current solution is replaced if and only if the new solution is better. In order to escape local optima, the distance between the current solution and the new one is determined according to Poisson distributed random variables. Scharnow et al. (2004) provide bounds on the expected optimization time of a \((1+1)\)-EA sorting n items. They consider various choices of objective functions (e.g., Hamming distance, transposition distance, ...) as well as mutation operators (e.g., transpositions, reversing keys in a certain range, ...). A general lower bound of \(\varOmega (n^2)\) is proved, which holds for all permutation problems having objective functions with a unique optimum (Scharnow et al. 2004, Theorem 1). The most relevant runtime result for a comparison with our DPSO algorithm is the following.
Theorem 3
(Scharnow et al. 2004, Theorem 2/Theorem 4) The expected optimization time of the \((1+1)\)-EA for sorting n items is \(\varTheta (n^2 \log n)\) if the objective function is the transposition distance to the sorted sequence and mutations are transpositions.
The upper bound can be obtained by the fitness level method and a lower bound of \(\varOmega (k/n^2)\) on the probability of improvement when the current solution is at transposition distance k to the attractor. The lower bound follows from a similar argument. In addition, for determining the lower bound, Scharnow et al. (2004) consider the Hamming distance to evaluate the distance between the current position and the optimum, although the algorithm still uses the transposition distance to decide which position is better. In the conference version (Mühlenthaler et al. 2017) we incorrectly claimed that the proof does not apply to this setting. Thanks to an anonymous reviewer we can correct this statement here.
In contrast to the \((1+1)\)-EA algorithm, the binary PSO studied in Sudholt and Witt (2010) allows for non-improving solutions, but it converges to the attractor exactly once per fitness level. Once convergence has occurred, the binary PSO behaves essentially like the \((1+1)\)-EA.
Additionally, Raß et al. (2019) is based on a preliminary version of the present paper. There, the authors applied OnePSO to the single-source shortest path problem. For this purpose they extended the Markov model presented here by allowing self-loops. They also used the bounds by integration proved in this work, without repeating the proofs. The following upper and lower bounds on the expected optimization time are given, which depend on the algorithm parameter c specifying the probability of movement towards the attractor.
Theorem 4
(Raß et al. 2019, Theorem 5/Theorem 7) The expected optimization time T(n) to solve the single-source shortest path problem with n nodes is bounded by
and
where \(\varphi (c)=e^{(1-2c)/(1-c)}\cdot \left( \frac{1-c}{c}\right) \) and \(\varepsilon >0\) is arbitrarily small.
1.5 Organization of the paper
In Sect. 2 we introduce the algorithm DPSO for solving optimization problems over discrete domains. In Sect. 3 we provide a Markov model for the behavior of the algorithm OnePSO—a restriction of DPSO to one particle—between two updates of the local attractor. Section 4 contains a comprehensive analysis of this Markov model. The results from this section are used in Sect. 5 in order to obtain the bounds on the expected optimization time for OnePSO shown in Table 1 as well as lower bounds for DPSO on pseudoBoolean functions. Section 6 contains some concluding remarks.
2 Discrete PSO algorithm
In this section we introduce the DPSO algorithm, a PSO algorithm that optimizes functions over discrete domains. A simplified version of the algorithm that uses just a single particle will be referred to as OnePSO. Note that OnePSO is different from the 1-PSO studied in Sudholt and Witt (2010), which is tailored to optimization over bitstrings. The DPSO and OnePSO algorithms sample items from a finite set X in order to determine some \(x^{*} \in X\) that minimizes a given objective function \(f:X \longrightarrow {\mathbb {R}} \). In order to have a discrete PSO that remains true to the principles of the original PSO for optimization in the domain \({\mathbb {R}} ^n\) from Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995), we need some additional structure on X: for each \(x \in X\) we have a set of neighbors \({\mathscr {N}}_X(x)\). If the set X is clear from the context we may drop the subscript. The neighborhood structure induces a solution graph with nodes X and arcs \(\{xy \mid x, y \in X, y \in {\mathscr {N}}(x)\}\). The distance \(d (x, y)\) of solutions \(x, y \in X\) is the length of a shortest (directed) x-y-path in this graph. We assume that the solution graph is strongly connected, so the PSO cannot get “trapped” in a particular strongly connected component. The search spaces of our reference problems in combination with the neighborhood relationships we use satisfy this assumption.
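For the two search spaces considered here, sampling a uniform neighbor is straightforward; the following Python sketch (helper names are ours) illustrates the assumption:

```python
import random

def random_bitstring_neighbor(x):
    """Hypercube neighborhood: flip exactly one bit chosen u.a.r."""
    i = random.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

def random_transposition_neighbor(perm):
    """Permutation neighborhood: apply one transposition (i j) with
    i != j, chosen u.a.r. among all n * (n - 1) / 2 transpositions."""
    i, j = random.sample(range(len(perm)), 2)
    y = list(perm)
    y[i], y[j] = y[j], y[i]
    return tuple(y)
```
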
The DPSO algorithm performs the steps shown in Algorithm 1. The initial positions of the particles are chosen uniformly at random (u.a.r.) from the search space X. The parameter \(c_{loc}\) determines the importance of the local attractor \(l_i\) of each particle, which is the best solution that particle has found so far, and the parameter \(c_{glob}\) determines the importance of the global attractor g, which is the best solution all particles have found so far. In each iteration each particle moves towards its local attractor with probability \(c_{loc}\), moves towards the global attractor with probability \(c_{glob}\), and otherwise moves to a random neighbor. If \(l_i\) equals g, then the particle still moves towards g only with probability \(c_{glob}\). Note that the attractors \(l_i\) and g are updated in lines 21 and 24 whenever a strictly better solution has been found.
Alternatively, one could choose to update the local and global attractors whenever the new position is at least as good as the position of the attractor (use \(\le \) instead of < in lines 21 and 24). The theorems presented here can also be carried over to this modified setting. Admittedly, for functions with plateaus this version potentially performs better, since a plateau is traversed easily by the modified algorithm. However, the problems considered in this work have no plateaus. Additionally, when the position and the attractor differ but have equal objective value, the probability of improving the objective function is higher than when the position is placed at the attractor.
At first glance, Algorithm 1 may not seem like an implementation of the PSO ideas, since each move uses only a single attractor, or even none at all, whereas the classical PSO uses both the local and the global attractor in every move. However, over several consecutive iterations we retain the tendency of movement towards all attractors. We consider the PSO to be an infinite process, so we do not give a termination criterion. We assume that the sampling of \(x'\) in lines 11, 14 and 17 can be performed efficiently, which is the case for the neighborhood structures we consider.
The algorithm OnePSO is simply given by Algorithm 1 with a single particle, i.e., we have \(P = 1\). Note that there is only a single attractor in this case. Hence, OnePSO has just a single parameter \(c=c_{glob}\) that determines the probability of moving towards the (global) attractor g. In all other aspects it behaves like the DPSO algorithm.
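The single-particle case can be sketched compactly in Python (minimization; the helpers `sample_neighbor` and `step_towards` are assumptions supplied by the search space, not part of the paper's pseudocode):

```python
import random

def one_pso(f, sample_neighbor, step_towards, x0, c, f_opt, max_iter=10**6):
    """Sketch of OnePSO minimizing f. In each iteration the particle
    moves one step towards the attractor g with probability c and to a
    uniformly random neighbor otherwise; g is updated on every strict
    improvement. Returns the attractor and the number of sampled
    positions once the known optimal value f_opt is reached."""
    x = g = x0
    evals = 1
    for _ in range(max_iter):
        if f(g) == f_opt:
            break
        if x != g and random.random() < c:
            x = step_towards(x, g)   # some neighbor of x closer to g
        else:
            x = sample_neighbor(x)   # uniform random neighbor
        evals += 1
        if f(x) < f(g):              # strict improvement updates g
            g = x
    return g, evals
```

With `c = 1` this degenerates to randomized local search around the attractor, and with `c = 0` to a pure random walk, matching the parameter discussion above.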
3 Markov model of OnePSO
We present a simple Markov model that captures the behavior of the OnePSO algorithm between two consecutive updates of the attractor. This model has already been presented in Mühlenthaler et al. (2017), but we repeat it here to keep the overview of the presented approach self-contained. As an extension to Mühlenthaler et al. (2017) we also show how the variance can be computed, which is essential for experiments and for practical applications of OnePSO and DPSO. Using this model we can infer upper and lower bounds on the expected optimization time of the OnePSO algorithm on suitable discrete functions. For our analysis, we assume that the objective function \(f: X \rightarrow {\mathbb {R}} \) has the property that every local optimum is a global one. That is, the function is either constant or any non-optimum solution x has a neighbor \(y \in {\mathscr {N}}(x)\) such that \(f(x) > f(y)\). Functions with this property are called unimodal functions. Although this restriction certainly narrows down the class of objective functions to which our analysis applies, the class is still appreciably large; e.g., it properly contains the class of functions with a unique global optimum and no further local optima.
Assume that the attractor \(g \in X\) is fixed and g is not a minimizer of f. Under which conditions can a new “best” solution be found? Certainly, if the current position x is equal to g, then, by the described structure of f we get an improvement with positive probability. If \(x \ne g\) then the attractor may still be improved. However, for the purpose of upper bounding the expected optimization time of the OnePSO we dismiss the possibility that the attractor is improved if \(x \ne g\). As a result, we obtain a reasonably simple Markov model of the OnePSO behavior. Quite surprisingly, using the same Markov model, we are also able to get good lower bounds on the expected optimization time of the OnePSO (see Sect. 5.2 for the details).
Recall that we think of the search space in terms of a strongly connected graph. Let n be the diameter of the search space X, i.e., the maximum distance of any two points in X. We partition the search space according to the distance to the attractor \(g\in X\). That is, for \(0 \le i \le n\), let \(X_i = \{ x \in X \mid d (x, g) = i\}\). Note that this partition does not depend on the objective function. If the search space is not symmetric, some \(X_i\) may be empty, since the maximum distance to a specific solution can be smaller than the diameter. The model consists of \(n+1\) states \(S_0,S_1,\ldots ,S_n\). Being in state \(S_i\) indicates that the current solution x of the OnePSO is in \(X_i\).
For each \(x\in X_i\) we denote by \(p_x\) the transition probability from x to an element of \(X_{i-1}\). The probabilities \(p_x\) in turn depend on the parameter c, which is the probability that the OnePSO explicitly moves towards the attractor. If the current position x is in \(X_{i}\) and the algorithm moves towards the attractor, then the new position is in \(X_{i-1}\). On the other hand, if the PSO updates x to a neighbor chosen u.a.r. from \({\mathscr {N}}(x)\), then the new position is in \(X_{i-1}\) with probability \(\vert {\mathscr {N}}(x) \cap X_{i-1}\vert /\vert {\mathscr {N}}(x)\vert \). So we obtain the transition probability

\(p_x = c + (1-c)\cdot \vert {\mathscr {N}}(x) \cap X_{i-1}\vert /\vert {\mathscr {N}}(x)\vert .\)
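For OneMax on the hypercube, for example, a uniformly random neighbor flips one of the n bits, and exactly i of these flips decrease the distance of a position \(x \in X_i\) to the attractor, so \(p_x = c + (1-c)\cdot i/n\) for every \(x \in X_i\). A one-line Python sketch (the function name is ours):

```python
def p_onemax(i, n, c):
    """Transition probability towards the attractor for OneMax: with
    probability c the particle moves towards g directly; otherwise it
    flips one of the n bits u.a.r., and i of those flips decrease the
    distance. The value is the same for every x at distance i."""
    return c + (1 - c) * i / n
```
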
Remark 1
In this work we assume that the probability that we move from a position \(x\in X_i\) to another element in \(X_i\) is zero, i.e., if we move from a position x to a neighboring position \(x'\in {\mathscr {N}}(x)\), then the distance to the attractor always changes (\(d (x,g)\ne d (x',g)\)).
This assumption holds for both problems we investigate in Sect. 5. Nevertheless, an extension allowing transitions inside a level \(X_i\) is possible and has already been considered in Raß et al. (2019), which is based on an arXiv preprint of this article.
Using the assumption in Remark 1 and the fact that \(X_i\) is defined by the distance to a fixed position g, the probability of moving from x to an element in \(X_j\), \(j \notin \{i-1, i+1\}\), is zero. Consequently, the probability of moving from x to an element in \(X_{i+1}\) is \(1-p_x\). Furthermore, if the OnePSO is at position \(x \in X_n\), then any move brings us closer to the reference solution; so \(p_x = 1\) in this case.
Please note that \(p_{x}\) and \(p_{x'}\) can differ even if \(x,x'\) are contained in the same set \(X_i\). Therefore we do not necessarily obtain a Markov model if we use the states \(S_i\) and set the transition probabilities to \(p_i=p_x\) for some \(x\in X_i\), as this value need not be equal for all \(x'\in X_i\).
Nevertheless, we can analyze Markov chains using bounds on the transition probabilities. More precisely, we can use \(p_i:=\min _{x\in X_i}p_x\) as a lower bound and \(p_i':=\max _{x\in X_i}p_x\) as an upper bound on the transition probabilities towards the attractor to obtain an upper and a lower bound, respectively, on the expected number of iterations until the distance to the attractor is decreased. Figure 3 shows the state diagram of this model.
Definition 1
By \({{\mathscr {M}}} \left( (p_i)_{1 \le i \le n}\right) \) we denote an instance of the Markov model with states \(S_0,S_1,\ldots ,S_n\) and data \(p_i\in {\mathbb {R}} _{\ge 0}\), \(1 \le i \le n\). Suppose we are in state \(S_i\). Then we move to state \(S_{i-1}\) with probability
and otherwise we move to state \(S_{i+1}\).
Please note that the data need not be in [0, 1], but for ease of presentation we refer to these values as probabilities. Furthermore, if we are in state \(S_{n}\), then the probability of moving to \(S_{n-1}\) is 1 even if \(p_n\ne 1\). This notation allows us to succinctly specify the currently used Markov model, e.g., the model which is used to obtain upper bounds is described by \({{\mathscr {M}}} \left( (\min _{x\in X_i}p_x)_{1\le i\le n} \right) \).
Our goal is to determine the expected number of steps needed to hit a solution which is better than the attractor after starting in \(S_0\). Let \(p_g\) be the probability to improve the attractor if we are currently in state \(S_0\), hence at the attractor. Then the probability \(p_g\) depends on f and the choice of g. We have that \(p_g\) is positive since f is unimodal. In order to reach a better solution from \(S_0\) we need in expectation \(1/p_g\) tries. If we are unsuccessful in some try, then the OnePSO moves to \(S_1\). For upper bounds we can ignore the chance to improve the attractor through other states. Thus we need to determine the expected number of steps it takes until we can perform the next try, that is, the expected first hitting time for the state \(S_0\), starting in \(S_1\). The expected number \(h_i\) of steps needed to move from \(S_i\) to \(S_0\) is given by the following recurrence:
4 Analysis of the Markov model
In this section we prove upper and lower bounds on the expected return time to the state \(S_0\) of the Markov model from Sect. 3. These bounds are of key importance for our runtime analysis in Sect. 5. For the lower bounds we also introduce a notion of indistinguishability of certain states of a Markov model.
In our analysis the probabilities \(p_i\) are generally not identical. If we assume that \(p_1 = p_2 = \cdots = p_n = p\), then we obtain a nonhomogeneous recurrence of order two with constant coefficients. In this case, standard methods can be used to determine the expected time needed to move to the attractor state \(S_0\) as a function of n (Graham et al. 1994, Chap. 7). Note also that for \(p=1/k\) this is exactly the recurrence that occurs in the analysis of a randomized algorithm for k-SAT (Papadimitriou 1991; Schöning 1999; Mitzenmacher and Upfal 2005, pp. 160f.). If the values \(p_i\) are not identical but depend on i, then the recurrence can in some cases still be solved, see e.g., Graham et al. (1994, Chap. 7) and Petkovšek (1992). Here, due to the structure of the recurrence, we can use a more pedestrian approach, which is outlined in the next section.
4.1 Reformulation of the recurrence
We first present a useful reformulation of Recurrence (1). From this reformulation we will derive closed-form expressions and asymptotic properties of the return time to the attractor in terms of the transition probabilities.
Let \(W_{i}\) be the number of steps needed to move from state \(S_i\) to state \(S_{i-1}\) and let \(H_i:=\mathrm{E}[W_i]\) be its expectation. Then \(H_i\) can be determined from \(H_{i+1}\) as follows: In expectation, we need \(1/p_i\) trials to get from \(S_i\) to \(S_{i-1}\), and each trial, except for the successful one, requires \(1+H_{i+1}\) steps. The successful trial requires only a single step, so \(H_{i}\) is captured by the following recurrence:
Another interpretation is the following: \(W_i\) is equal to one with probability \(p_i\), accounting for the direct step to \(S_{i-1}\); with probability \(1-p_i\) it takes the current step, which leads to state \(S_{i+1}\), then \(W_{i+1}\) steps to go from \(S_{i+1}\) back to \(S_i\), and then again \(W_i\) steps. For the expected value of \(W_i\) this interpretation leads to the formula
which is equivalent to Eq. (2) after solving for \(H_i\). Please note that the probabilities \(p_i\) are typically given by some function of n and i. Unfolding the recurrence specified in Eq. (2) k times, \(1 \le k \le n\), followed by some rearrangement of the terms yields
Thus, for \(k=n\) we obtain the following expression for \(H_1\):
where the second term is a correction term which is required whenever \(p_n < 1\) (see Definition 1) in order to satisfy the initial condition given in Eq. (3). Equation (5) has also been mentioned in Droste et al. (2001, Lemma 3) in the context of the analysis of randomized local search, and in Kötzing and Krejca (2018, Theorem 3). \(H_k\) can be obtained analogously, which leads to
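To make the reformulation concrete, the following sketch (our illustration, not part of the original analysis) computes the expected return times both by the backward recurrence of Eq. (2) and by the unfolded sum-product form, treating the forced move from \(S_n\) as \(p_n=1\) so that no correction term is needed; both computations must agree.

```python
from fractions import Fraction

def return_times(p):
    """Expected times H_i to move from S_i to S_(i-1) for the chain
    M((p_i)), via the backward recurrence
        H_n = 1,   H_i = 1/p_i + (1/p_i - 1) * H_(i+1).
    p is a list [p_1, ..., p_n]; returns [H_1, ..., H_n]."""
    n = len(p)
    H = [None] * n
    H[n - 1] = Fraction(1)          # from S_n we always move to S_(n-1)
    for i in range(n - 2, -1, -1):  # compute H_(n-1), ..., H_1
        H[i] = 1 / Fraction(p[i]) + (1 / Fraction(p[i]) - 1) * H[i + 1]
    return H

def return_time_unfolded(p):
    """H_1 via the unfolded sum-product form (Eq. (5) with p_n treated
    as 1, i.e. no correction term):
        H_1 = sum_i (1/p_i) * prod_(j<i) (1-p_j)/p_j."""
    n = len(p)
    q = [Fraction(x) for x in p]
    q[n - 1] = Fraction(1)          # the forced move from S_n
    total, prod = Fraction(0), Fraction(1)
    for i in range(n):
        total += prod / q[i]
        prod *= (1 - q[i]) / q[i]
    return total
```

With exact rational arithmetic the two computations coincide term by term; for constant \(p_i=1/2\) both yield \(H_1=2n-1\).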
4.2 Identical transition probabilities
If \(p_i = p\) for some constant \(p \in [0,1]\) and all \(1 \le i < n\), then Recurrence (2) becomes a linear recurrence equation with constant coefficients. Standard methods can be used to determine closed-form expressions for \(h_i\) and \(H_i\). However, we are mainly interested in \(H_1\) and are able to determine a closed-form expression directly from Eq. (5).
Theorem 5
Let \(0< p < 1\). Then the expected return time \(H_1\) to \(S_0\) is
Proof
By setting \(p_i = p\) in Eq. (5) and performing some rearrangements the theorem is proved. \(\square \)
It is easily verified that this expression for \(h_1\) satisfies Eq. (1). So, with \(p_i = p\), the time it takes to return to the attractor is bounded from above by a constant, a linear function, or an exponential function in n if \(p > 1/2\), \(p = 1/2\), or \(p < 1/2\), respectively.
For the case \(p=1/2\) one can obtain this result also from the Gambler's Ruin model by mirroring the state 0 at n, which results in termination if value 0 or 2n is reached. Starting at value 1 yields a first hitting time of \(2n-1\) in this Gambler's Ruin model, as stated in Theorem 5.
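The three regimes of Theorem 5 can be checked numerically with a short sketch (our illustration; the parameter values are chosen only for demonstration): for constant \(p>1/2\) the return time converges to the constant \(1/(2p-1)\), for \(p=1/2\) it equals \(2n-1\), and for \(p<1/2\) it grows exponentially in n.

```python
def H1_constant(p, n):
    """Expected return time H_1 for the model M((p)_i) with constant
    p_i = p, computed by unfolding H_n = 1 and
    H_i = 1/p + (1/p - 1) * H_(i+1) down to i = 1."""
    H = 1.0                     # H_n
    for _ in range(n - 1):
        H = 1.0 / p + (1.0 / p - 1.0) * H
    return H

# p > 1/2: bounded by a constant, approaching 1/(2p-1)
# p = 1/2: linear, H_1 = 2n - 1 (the mirrored Gambler's Ruin)
# p < 1/2: exponential, H_1 grows like ((1-p)/p)^n
```

For example, \(p=0.6\) gives return times approaching \(1/(2\cdot 0.6-1)=5\), while \(p=0.4\) already exceeds \((3/2)^{n-1}\).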
4.3 Nonidentical transition probabilities
Motivated by the runtime analysis of OnePSO applied to optimization problems such as sorting and OneMax, we are particularly interested in the expected time it takes to improve the attractor if the probabilities \(p_i\) are slightly greater than 1/2. By slightly we mean \(p_i = 1/2 + i/(2A(n))\), which appears in the analysis of OnePSO optimizing OneMax and the sorting problem, or \(p_i=1/2+A(i)/(2A(n))\), which appears in the analysis of OnePSO optimizing the sorting problem, where \(A:{\mathbb {N}} \rightarrow {\mathbb {N}} \) is some nondecreasing function of n such that \(\lim _{n \rightarrow \infty } A(n) = \infty \). Recall from Definition 1 that if \(p_i>1\) then we move from state \(S_i\) to state \(S_{i-1}\) with probability 1. Clearly, in this setting we cannot hope for a recurrence with constant coefficients. Our goal in this section is to obtain the asymptotics of \(H_1\) as \(n \rightarrow \infty \) for \(A(n) = n\) and \(A(n) = \left( {\begin{array}{c}n\\ 2\end{array}}\right) \). We show that for \(p_i=1/2+i/(2\cdot A(n))\) and \(A(n) = n\) the return time to the attractor is \(\varTheta (\sqrt{n})\), while for \(A(n) = \left( {\begin{array}{c}n\\ 2\end{array}}\right) \) the return time is \(\varTheta (n)\).
Lemma 1
Let \(M={{\mathscr {M}}} ((1/2+i/(2n))_{1\le i\le n})\). Then
Proof
We have \(p_n = 1\) so the correction term in Eq. (5) is zero. We rearrange the remaining terms of Eq. (5) and find that
Applying the well-known relation
(for an elegant derivation, see Hirschhorn 2015) finishes the proof. \(\square \)
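The \(\varTheta (\sqrt{n})\) behavior of Lemma 1 is easy to observe empirically. The sketch below (our illustration) evaluates \(H_1\) for \(p_i = 1/2 + i/(2n)\) by the backward recurrence; the ratio \(H_1/\sqrt{n}\) stays close to a constant as n grows.

```python
import math

def H1(p):
    """H_1 for M((p_i)): backward recurrence with H_n = 1 and
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1)."""
    H = 1.0
    for pi in reversed(p[:-1]):
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H

for n in (100, 400, 1600):
    p = [0.5 + i / (2.0 * n) for i in range(1, n + 1)]
    print(n, H1(p) / math.sqrt(n))  # ratio remains roughly constant
```

The observed constant is below 2; the exact limiting constant follows from the relation cited above.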
This lemma can be generalized for linearly growing probabilities.
Theorem 6
Let \(M = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n}\right) \), where \(p_i = 1/2 + i /(2A(n))\).
Then \(H_1 = \varTheta (\min (\sqrt{A(n)},n))\) with respect to M.
Proof
First, we consider the case \(A(n) \le n^2\). Let \(n' = A(n)\), which is the smallest number such that \(p_{n'} = 1/2 + n'/(2A(n)) \ge 1\). Assume first that \(n' \le n\) and consider the “truncated” model \(M' = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n'} \right) \). Please note that there is actually no difference between M and \(M'\): the removed states are never visited, as \(p_{n'}\), the probability to move from \(S_{n'}\) to \(S_{n'-1}\), is already one and hence \(S_{n'+1}\) is never reached. Let \(H_1'\) be the expected time to reach state \(S_0\) starting at state \(S_1\) with respect to \(M'\). By Lemma 1 we have \(H_1'=\Theta (\sqrt{A(n)})\), which is by the construction of \(M'\) equal to \(H_1\). On the other hand, assume that \(n' > n\) and consider the “extended” model \({{\tilde{M}}} = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n'} \right) \). M and \({{\tilde{M}}}\) are visualized in Fig. 4a, with the probabilities of moving away from the attractor omitted. Let \({{\tilde{H}}}_1\) be the expected time to reach state \(S_0\) starting at state \(S_1\) with respect to \({{\tilde{M}}}\). By Lemma 1 we have \({{\tilde{H}}}_1 = \Theta (\sqrt{A(n)})\) and since \({{\tilde{H}}}_1 \ge H_1\) we obtain \(H_1 = O(\sqrt{A(n)})\).
To obtain a lower bound on \(H_1\) for the case \(n' > n\) we consider the model \({{\hat{M}}} = {{\mathscr {M}}} \left( ({{\hat{p}}}_i)_{1\le i\le {{\hat{n}}}} \right) \), where \(\hat{n} = \min (n,\lfloor \sqrt{A(n)}\rfloor )\) and \({{\hat{p}}}_i = 1/2 + 1/(2{{\hat{n}}})\) for \(1 \le i < {{\hat{n}}}\), and \({{\hat{p}}}_{{{\hat{n}}}} = 1\). For \(1\le i < {{\hat{n}}}\) we have that \({{\hat{p}}}_i \ge p_i\) because \(1/{{\hat{n}}}={{\hat{n}}}/{{\hat{n}}}^2\ge \hat{n}/(A(n))\). A schematic representation of M and \({{\hat{M}}}\) can be found in Fig. 4b. Let \({{\hat{H}}}_1\) denote the expected time to reach state \(S_0\) starting at state \(S_1\) in \({{\hat{M}}}\). Since \({{\hat{p}}}_i \ge p_i\) for \(1 \le i \le {{\hat{n}}}\), \({{\hat{H}}}_1\) is a lower bound on \(H_1\): in this Markov model one can only move to neighboring states, so increasing the probability of moving towards the final state decreases the probability of moving in the opposite direction, and therefore the expected hitting time of the final state can only decrease. Since \(p := {{\hat{p}}}_i\) is constant for \(1 \le i < {{\hat{n}}}\) we get from Theorem 5 that
Substituting \(p = 1/2 + 1/{(2 {{\hat{n}}})}\) gives
Therefore \(H_1=\varOmega ({{{\hat{n}}}})=\varOmega (\min (n,\sqrt{A(n)}))\).
It remains to show that the statement holds if \(A(n) > n^2\). In this case, \(H_1 = O(n)\) is obtained by setting \(p_i = 1/2\), which is a lower bound on the probabilities of moving towards \(S_0\), for \(1 \le i < n\) and invoking Theorem 5. On the other hand, setting \(\breve{A}(n) = n^2\) and using \(\breve{M}={{\mathscr {M}}} ((1/2+i/(2\breve{A}(n)))_{1 \le i \le n})\) gives a lower bound on \(H_1\), because \(1/2+i/(2\breve{A}(n))\) is an upper bound on \(1/2 + i/(2A(n))\). As discussed above, for the case \({A}(n) \le n^2\), the expected time to reach state \(S_0\) starting at state \(S_1\) in \(\breve{M}\) is \(\Omega (\min (n,\sqrt{\breve{A}(n)}))\). Therefore, \(H_1 = \Omega (\min (n,\sqrt{\breve{A}(n)}))=\Omega (n)\), which completes the proof. \(\square \)
For our application, the sorting problem, the following special case of Theorem 6 will be of interest:
Corollary 1
Let \(M = {{\mathscr {M}}} ((1/2(1+i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) ))_{1 \le i \le n})\), then \(H_1=\varTheta (n)\).
We will now consider a slightly different class of instances of the Markov model in order to obtain a lower bound on the OnePSO runtime for sorting in Sect. 5. For this purpose we consider transition probabilities \(p_i\) that increase in the same order as the divisor A(n), which suits our fitness level analysis of the sorting problem. This class of models is relevant for the analysis of the best case behavior of the OnePSO algorithm for sorting n items (see Theorem 3). Although we will only make use of a lower bound on \(H_1\) in this setting later on, we give the following \(\Theta \)-bounds:
Theorem 7
Let \(M = {{\mathscr {M}}} \left( (\frac{1}{2}\left( 1+A(i)/A(n)\right) )_{1\le i\le n}\right) \) where \(A:{\mathbb {N}} \rightarrow {\mathbb {R}} ^+\) is a nondecreasing function and \(A(n)=\Theta (n^d)\) for some \(d> 0\). Then \(H_1 = \Theta (n^{d/(d+1)})\) with respect to M.
This theorem is a significant extension of Mühlenthaler et al. (2017, Theorem 5), which covers only the special case \(p_i=1/2\cdot (1+\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) /\left( {\begin{array}{c}n\\ 2\end{array}}\right) ).\)
Proof
Consider the expression for \(H_1\) given in Eq. (4). Since \(p_i > 1/2\) for \(1 \le i \le n\) the products are at most 1 and \(1/p_i\) is at most 2. Therefore for any \(k\in \lbrace 1,\ldots ,n\rbrace : H_1\le 2k+H_{k+1}\). As \(H_{k+1}\) is the expected number of steps to move from state \(S_{k+1}\) to \(S_k\), the states \(S_0\) to \(S_{k1}\) are irrelevant for the calculation of \(H_{k+1}\) since they are never visited in between. Therefore also probabilities \(p_1\) to \(p_{k}\) do not matter. We truncate the model to states \(S_{k},\ldots ,S_n\). For these states the minimal probability of moving towards the attractor is \(p_{k+1}\ge p_k\). Therefore we can set \(p_i=p_{k}\) for \(i\in \lbrace k+1,\ldots ,n\rbrace \) to get an upper bound on the return time. By reindexing the states we obtain the model \(\tilde{M}={{\mathscr {M}}} ((p_{k})_{1\le i \le nk})\) and, because of the truncation and the decrease of probabilities, \({{\tilde{H}}}_1\) is an upper bound on \(H_{k+1}\), where \({{\tilde{H}}}_1\) is the expected number of steps to move from state \(S_1\) to \(S_0\) in model \({{\tilde{M}}}\). In \({{\tilde{M}}}\) we have the fixed probabilities \(p_{k}\) and can therefore apply Theorem 5 to determine \({{\tilde{H}}}_1\). Therefore
Altogether we have \(H_1\le 2k+\frac{A(n)}{A(k)}\). With \(k=n^{d/(d+1)}\), where d is the degree of A, we get
which certifies that \(H_1=O(n^{d/(d+1)})\). By using Eq. (4) we have the following lower bound on \(H_1\):
Note: \(\prod _{j=1}^{i-1} \frac{1-p_j}{p_j}\) is monotonically decreasing as \(p_j\ge 1/2\).
Note: A(j) is nondecreasing.
As this equation holds for any \(k\in \lbrace 1,\ldots ,n\rbrace \) we can choose \(k=g\cdot n^z\) with a not yet fixed constant \(g\in (0,1)\) and \(z=d/(d+1)\), \(z\in (0,1)\). Please note that \(g\cdot n^z\) tends to infinity if n tends to infinity and therefore asymptotic expressions can also be applied if \(g\cdot n^z\) is the argument. Please also note that k has to be an integer but errors can be captured by some \(\varTheta (1)\) expressions. Substituting k by \(g\cdot n^z\) in the previous inequality results in
Note: \(z+d\cdot z-d=z\cdot (d+1)-d=d-d=0\).
Note: The last \(\Theta (1)\) can be bounded from above by some constant \(c_{\Theta }\) for large n (by definition of \(\Theta \)). Choose \(g=\root d+1 \of {1/(2\cdot c_{\Theta })}\). Then the expression in parentheses of the last term \(\left( 1 - g^{d+1}\cdot \Theta (1)\right) \) may be negative for small n but is at least 1/2 for large n, which implies that it is in \(\varOmega (1)\).
\(\square \)
It may be verified that
is a suitable choice for any \(0< \varepsilon < 1\), to replace k by \(g\cdot n^z\) to obtain the previous inequalities. To see this, note that the first fraction of \(\liminf \) and \(\limsup \) counterbalances the fluctuation of A(k)/A(n) relative to its \(\varTheta \)-bound. Furthermore, the factor 1/4 compensates the factor 2 which is hidden by \(\Theta \) and supplies the desired factor of 1/2 to ensure that the expression in parentheses is positive and at least 1/2 for large n. Finally, the factor of \(1 - \varepsilon \) is needed for some tolerance, because without it g is only sufficiently small in the limit, but not necessarily for large n.
This theorem is a generalization of Lemma 1 and implies the \(\Theta \)bound stated in that lemma by using \(A(n)=n\).
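Theorem 7 can likewise be checked numerically. The sketch below (our illustration) uses \(A(n)=n^2\), i.e. \(d=2\) and \(p_i=\frac{1}{2}(1+i^2/n^2)\), for which the predicted order is \(n^{2/3}\); the ratio \(H_1/n^{2/3}\) remains bounded as n grows.

```python
import math

def H1(p):
    """H_1 by the backward recurrence H_n = 1,
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1)."""
    H = 1.0
    for pi in reversed(p[:-1]):
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H

for n in (100, 400, 1600):
    # A(i) = i^2, so p_i = (1 + A(i)/A(n)) / 2 and p_n = 1
    p = [0.5 * (1.0 + (i * i) / (n * n)) for i in range(1, n + 1)]
    print(n, H1(p) / n ** (2.0 / 3.0))  # bounded: Theta(n^(2/3))
```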
4.4 Bounds by integration
Reformulations (4) and (5) of Recurrence (1) given in Sect. 4.1 do not yield closed-form expressions for \(H_1\), the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution. In this section we derive closed-form expressions for \(H_1\). In order to get rid of the sums and products in Eqs. (4) and (5), we use the following standard approaches. First, sums may be approximated by their integral. This approach works quite well if the summand/integrand is monotonic, which is true in our case. Second, products can be transformed to integrals by rewriting the product as the exponential of a sum of logarithms. This approach extends Mühlenthaler et al. (2017), where it is not present at all.
Theorem 8
Let \(M={{\mathscr {M}}} ((p(i))_{1\le i\le n})\) and \(p:[0,n]\rightarrow (0,1]\) be a nondecreasing function assigning the probabilities in the model, then
The integral in \({\mathrm {base}}(p,n)\) is maximized by \(k=\inf \lbrace x\mid x\in [0,n]\wedge p(x)\ge 1/2\rbrace \) or \(k=n\) if the infimum is taken on the empty set.
Please note that we use \(\varTheta ^*\)-bounds in the sense that polynomial factors can be omitted in \(\varTheta \)-bounds. This notation is similar to the more common \(O^*\), where polynomial factors can likewise be omitted in upper bounds.
Proof
p(i) is nondecreasing in i and has values in (0, 1], and therefore \(\frac{1-p(i)}{p(i)}\) and also \(\tau (i):=\ln \left( \frac{1-p(i)}{p(i)}\right) \) are nonincreasing, as the numerator is nonincreasing and the denominator is nondecreasing. In the following series of equations, let \(k \in [0, n)\). Using Eq. (4), we obtain
As k can be chosen arbitrarily, we get the claimed lower bound for \(H_1\), because \(\frac{p(0)}{1-p(0)}\) is a constant. As any integral is a continuous function, the whole expression in the supremum is also a continuous function, and therefore \(k=n\) can be allowed in the supremum without changing the value. Quite similar steps lead to the upper bound for \(H_1\). We start with Eq. (5).
This proves the claimed upper bound and as the base of the exponential part is equal for upper and lower bound we obtain the claimed \(\varTheta ^*\) bound.
The logarithm in \({\mathrm {base}}(p,n)\) is positive as long as \(1-p(n\cdot x)\ge p(n\cdot x)\). Therefore the integral is maximized if we use the smallest possible k (the infimum) which satisfies the condition \(1-p(k)\le p(k)\Leftrightarrow p(k)\ge \frac{1}{2}\). \(\square \)
\(p(n\cdot x)\) can in most cases be tightly bounded by a value independent of n. This is the case if for example \(p(i)=c+(1c)i/n\), which we have for the model solving OneMax by OnePSO. The k which maximizes the integral in the expression of \({\mathrm {base}}(p,n)\) is usually obtained by solving the simple equation \(p(k)=1/2\).
Therefore the integral can be evaluated and the base of the exponential part of the runtime can be determined.
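As an illustration of the bounds-by-integration (a sketch under the assumption that \({\mathrm {base}}(p,n)\) has the form \(\exp \big (\int _0^{k/n}\ln \frac{1-p(n\cdot x)}{p(n\cdot x)}\,dx\big )\) with k chosen as described above), the following code evaluates the integral numerically for the OneMax-type probabilities \(p(i)=c+(1-c)i/n\) and recovers the closed-form base \(\beta (c)=2^{1/(1-c)}(1-c)c^{c/(1-c)}\) appearing later in Theorem 11.

```python
import math

def base_onemax(c, samples=100000):
    """Midpoint-rule evaluation of exp(int_0^(k/n) ln((1-p(nx))/p(nx)) dx)
    for p(i) = c + (1-c)*i/n; note p(n*x) = c + (1-c)*x is independent of n.
    The upper limit k/n = (1-2c)/(2(1-c)) solves p(k) = 1/2."""
    upper = (1.0 - 2.0 * c) / (2.0 * (1.0 - c))
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * upper / samples
        u = c + (1.0 - c) * x
        total += math.log((1.0 - u) / u)
    return math.exp(total * upper / samples)

def beta(c):
    """Closed-form base from Theorem 11."""
    return 2.0 ** (1.0 / (1.0 - c)) * (1.0 - c) * c ** (c / (1.0 - c))
```

For any \(c<1/2\) the numerically integrated base matches \(\beta (c)\) to high accuracy.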
4.5 Variance of the improvement time
We show that the standard deviation of the return time is of the same order as the return time itself. Consequently, in experiments the average of such return times can be measured with a small relative error. The variance of \(W_i\), the number of steps needed to move from state \(S_i\) to state \(S_{i-1}\), can also be computed recursively. Let \(V_i:=\mathrm{Var}[W_i]\) be the variance of \(W_i\).
To evaluate this variance we need the expectation and variance of a random variable which is a sum of random variables, where the number of summands is itself a random variable. Such random variables appear in the Galton-Watson process (see Durrett 2010) when passing from one generation to the next.
Lemma 2
Let T be a random variable with nonnegative integer values and let \((Y_i)_{i\in {{\mathbb {N}}}}\) be independent identically distributed random variables \(Y_i\sim Y\) which are also independent of T. Additionally let \(Z=\sum _{i=1}^{T}Y_i\). Then \(\mathrm{E}[Z]=\mathrm{E}[T]\cdot \mathrm{E}[Y]\) and \(\mathrm{Var}[Z]=\mathrm{E}[T]\cdot \mathrm{Var}[Y]+E[Y]^2\cdot \mathrm{Var}[T]\).
The statement on expected values is also known as Wald's equation and the statement on the variance is known as the Blackwell-Girshick equation. The Blackwell-Girshick equation can be obtained by application of the law of total variance:
Also \(W_i\) can be specified as a sum of random variables where the number of summed up random variables is also a random variable. If we are currently in state \(S_i\) we have some success probability to move to \(S_{i1}\) in the next iteration. Therefore the number of trials in \(S_i\) until we move to \(S_{i1}\) follows a geometric distribution. In case of failure we move to \(S_{i+1}\) and need additional \(W_{i+1}\) steps until we can make our next attempt to move to \(S_{i1}\). Therefore
where T is a random variable distributed according to a geometric distribution with success probability equal to the probability of moving to \(S_{i1}\) from \(S_i\) and each \({{\tilde{W}}}_{i+1,j}\) is an independent copy of \(W_{i+1}\).
Theorem 9
where \(p_i\) is the probability of moving to \(S_{i1}\) from \(S_i\).
Proof
\(W_n=\mathrm{E}[W_n]=H_n=1\Rightarrow \mathrm{Var}[W_n]=V_n=\mathrm{Var}[1]=0\).
Let T be a random variable distributed according to a geometric distribution with success probability \(p_i\) and let all \(({{\tilde{W}}}_{i+1,j})_{j\in {{\mathbb {N}}}}\) be independent copies of \(W_{i+1}\).
Finally the rightmost expression of Eq. (8) is obtained by replacing \(H_{i+1}\) according to Eq. (2). \(\square \)
Therefore one can evaluate \(H_i\) by Eqs. (2) and (3) and then one can evaluate \(V_i\) by Eqs. (8) and (9).
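The two recursions can be implemented together and cross-checked by simulation. The sketch below is our illustration: the variance recurrence is the Blackwell-Girshick step spelled out for \(W_i = 1 + \sum _{j=1}^{T-1}(1+{{\tilde{W}}}_{i+1,j})\), so its exact constants should be read as an assumption matching Eqs. (8) and (9).

```python
import random

def moments(p):
    """H_1 = E[W_1] and V_1 = Var[W_1] via the backward recurrences
        H_n = 1,  V_n = 0,
        H_i = 1/p_i + (1/p_i - 1) * H_(i+1),
        V_i = ((1-p_i)/p_i) * V_(i+1)
              + ((1-p_i)/p_i**2) * (1 + H_(i+1))**2,
    the last line being Blackwell-Girshick applied to
    W_i = 1 + sum_(j=1)^(T-1) (1 + W~_(i+1,j)), T ~ Geom(p_i)."""
    H, V = 1.0, 0.0
    for pi in reversed(p[:-1]):
        q = 1.0 - pi
        V = (q / pi) * V + (q / pi ** 2) * (1.0 + H) ** 2  # uses H_(i+1)
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H, V

def sample_W1(p, rng):
    """One realization of W_1: walk from S_1 until S_0 is hit."""
    i, steps, n = 1, 0, len(p)
    while i > 0:
        steps += 1
        if i == n or rng.random() < p[i - 1]:
            i -= 1
        else:
            i += 1
    return steps
```

For \(p_i = 1/2 + i/(2n)\) with small n, the simulated mean and variance of the return time agree with \(H_1\) and \(V_1\) up to sampling error.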
Please note that \(V_i\) will always be of the same order as \(H_i^2\). If \((1-p_i)/p_i\) is less than one, then the recursively needed values \(V_{j}\) for \(j>i\) become less important and the dominant contribution is \(H_i^2\); if \((1-p_i)/p_i\) is greater than one, then \(H_i^2\) grows by a factor of at least \(((1-p_i)/p_i)^2\) [see Eq. (2)], which is the square of the growth factor of \(V_i\).
As \(V_i\) is of the same order as \(H_i^2\), the arithmetic average of T evaluations of \(W_i\) has a relative error of approximately \(1/\sqrt{T}\). This is indeed a relevant statistic if evaluations are performed, and it is consolidated in the following corollary.
Corollary 2
Let \({{\tilde{W}}}_{i,j}\sim W_i\) be independent random variables. Then
5 Runtime analysis of OnePSO and DPSO
As mentioned above, we present a runtime analysis of OnePSO for two combinatorial problems, the sorting problem and OneMax. Our analysis is based on the fitness level method (Wegener 2002), in particular its application to the runtime analysis of a \((1+1)\)-EA for the sorting problem in Scharnow et al. (2004). Consider a (discrete) search space X and an objective function \(f:X \rightarrow {\mathbb {R}} \), where f assigns m distinct values \(f_1< f_2< \cdots < f_m\) on X. Let \(S_i \subseteq X\) be the set of solutions with value \(f_i\). If some algorithm \({\mathscr {A}}\) optimizing f on X leaves each fitness level i at most once, then the expected runtime of \({\mathscr {A}}\) is bounded from above by \(\sum _{i=1}^m {1}/{s_i}\), where \(s_i\) is a lower bound on the probability of \({\mathscr {A}}\) leaving \(S_i\). The method has also been applied successfully, e.g., in Sudholt and Witt (2010) to obtain bounds on the expected runtime of a binary PSO proposed in Kennedy and Eberhart (1997).
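As a small illustration of the fitness level method (our sketch, with OneMax-style level probabilities used only as an example), the bound \(\sum _i 1/s_i\) for levels that are left with probability at least \(s_i = i/n\) is the harmonic sum \(n H_n \approx n \ln n\):

```python
import math

def fitness_level_bound(s):
    """Fitness level upper bound: sum of 1/s_i over all levels, where
    s[i] lower-bounds the probability of leaving level i."""
    return sum(1.0 / si for si in s)

n = 1000
# leaving level i succeeds with probability at least i/n (OneMax-style),
# so the bound is n * H_n, which is roughly n * ln(n)
bound = fitness_level_bound([i / n for i in range(1, n + 1)])
```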
5.1 Upper bounds on the expected optimization time
Similar to Scharnow et al. (2004) and Sudholt and Witt (2010), we use the fitness-level method to prove upper bounds on the expected optimization time of the OnePSO for sorting and OneMax. In contrast to the former, we allow nonimproving solutions and return to the attractor as often as needed in order to sample a neighbor of the attractor that belongs to a better fitness level. Therefore, the time needed to return to the attractor contributes a multiplicative term to the expected optimization time, which depends on the choice of the algorithm parameter c.
We first consider the sorting problem. The structure of the search space of the sorting problem has been discussed already in Scharnow et al. (2004) and a detailed analysis of its fitness levels is provided in Mühlenthaler et al. (2017). In the following lemma we bound the transition probabilities for the Markov model for the sorting problem. This allows us to bound the runtime of OnePSO for the sorting problem later on.
Lemma 3
For the sorting problem on n items, \(c = 1/2\) and \(x\in X_i\), the probability \(p_x\) that OnePSO moves from x to an element in \(X_{i1}\) is bounded from below by \(p_i = \frac{1}{2} (1+ i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) )\). Furthermore, this bound is tight.
Proof
The lower bound \(p_i\) on \(p_x\) can be obtained by
To show the above inequality, consider the attractor a and a permutation \(\tau \) such that \(x \circ \tau = a\). For each cycle of length k of \(\tau \), exactly \(k-1\) transpositions are needed to adjust the elements in this cycle and there are \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \ge k-1\) transpositions which decrease the transposition distance to the attractor a. Therefore the number of ways to decrease the transposition distance to a is bounded from below by the transposition distance to a. Hence, we have \(\vert {\mathscr {N}}(x)\cap X_{i-1}\vert \ge i\).
The lower bound is tight, as it is attained if \(\tau \) contains only cycles of length two (or one). In Mühlenthaler et al. (2017, Sect. 4) a more detailed discussion of improvement probabilities can be found. \(\square \)
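The counting argument in the proof can be verified directly for small instances. The sketch below (our illustration, with the identity permutation as attractor) counts the transpositions that decrease the transposition distance and checks that there are exactly \(\sum \left( {\begin{array}{c}k\\ 2\end{array}}\right) \) of them, summed over the cycle lengths k, and hence at least as many as the distance itself.

```python
import random
from itertools import combinations
from math import comb

def cycle_lengths(perm):
    """Cycle lengths of a permutation of 0..n-1 given as a tuple of images."""
    seen, out = [False] * len(perm), []
    for s in range(len(perm)):
        if not seen[s]:
            k, j = 0, s
            while not seen[j]:
                seen[j], j, k = True, perm[j], k + 1
            out.append(k)
    return out

def dist(perm):
    """Transposition distance to the identity: n minus the number of cycles."""
    return len(perm) - len(cycle_lengths(perm))

def improving(perm):
    """Number of position swaps that decrease the distance by one."""
    count = 0
    for a, b in combinations(range(len(perm)), 2):
        q = list(perm)
        q[a], q[b] = q[b], q[a]
        count += dist(tuple(q)) == dist(perm) - 1
    return count

rng = random.Random(0)
for _ in range(20):
    x = list(range(7))
    rng.shuffle(x)
    x = tuple(x)
    # improving swaps = pairs inside a common cycle >= transposition distance
    assert improving(x) == sum(comb(k, 2) for k in cycle_lengths(x)) >= dist(x)
```

The bound is tight exactly when all cycles have length at most two, e.g., for the permutation (1, 0, 3, 2, 5, 4), which has distance 3 and exactly 3 improving swaps.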
Using Lemma 3 we prove the following bounds on the expected optimization time \(T_{{\mathrm{sort}}}(n)\) required by OnePSO for sorting n items by transpositions.
Theorem 10
(Mühlenthaler et al. 2017, Theorem 13) The expected optimization time \(T_{{\mathrm{sort}}}(n)\) of the OnePSO sorting n items is bounded from above by
See Fig. 5 for a visualization of \(\frac{1-c}{c}\).
Proof
Consider the situation that the attractor has just been updated. Whenever the OnePSO fails to update the attractor in the next iteration it will take in expectation \(H_1\) iterations until the attractor is reached again and then it is improved with probability at least \(i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \). Again, if the OnePSO fails to improve the attractor we have to wait \(H_1\) steps, and so on. Since we do not consider the case that the attractor has been improved meanwhile, the general fitness level method yields an expected runtime of at most \(\sum _{i=1}^n((H_1+1)(1/s_i-1)+1) = H_1 \cdot O(n^2 \log n)\).
We now bound the expected return time \(H_1\). Let \(c \in (\frac{1}{2}, 1]\) and recall that \(p_i\) is the probability of moving from state \(S_i\) to state \(S_{i-1}\). Then \(1 \ge p_i> c > \frac{1}{2}\). Then the expression for \(H_1\) given in Theorem 5 is bounded from above by the constant \(1/(2c-1)\), so \(T_{{\mathrm{sort}}}(n)=O(n^2\log n)\). Now let \(c = \frac{1}{2}\), so \(p_i \ge \frac{1}{2} (1+ i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) )\) by Lemma 3. Then, by Corollary 1, we have \(H_1 = O(n)\), so \(T_{{\mathrm{sort}}}(n)=O(n^3\log n)\). Finally, let \(c \in (0, \frac{1}{2})\). Then \(p_i> c > 0\), and by Theorem 5, \(H_1\) is bounded from above by
so \(T_{{\mathrm{sort}}}(n)=O\left( \left( \frac{1c}{c} \right) ^n\cdot n^2\log n\right) \). \(\square \)
For \(c = 0\), OnePSO always moves to a uniformly drawn adjacent solution, so the algorithm just performs a random walk on the search space. In this case, \(T_{{\mathrm{sort}}}(n)\) is the expected number of transpositions that need to be applied to a permutation in order to obtain a given permutation. We conjecture that \(T_{{\mathrm{sort}}}(n)\) has the following asymptotic behavior and provide theoretical evidence for this conjecture in Appendix A.
Conjecture 1
\(T_{{\mathrm{sort}}}(n)\sim n!\) if \(c=0\).
Please note that the conjecture is actually only a conjecture on the upper bound as Theorem supplies a proof that \(T_{{\mathrm{sort}}}(n)=\varOmega (n!)\) if \(c=0\).
Using a similar approach as in Theorem 10, we now bound the expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) of OnePSO for OneMax.
Theorem 11
The expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) of the OnePSO solving OneMax is bounded from above by
where \(\beta (c)=2^{{1}/({1-c})}\cdot (1-c)\cdot c^{{c}/({1-c})}\).
See Fig. 5 for a visualization of \(\beta (c)\).
Proof
The argument is along the lines of the proof of Theorem 10. We observe that on fitness level \(0 \le i \le n\) there are i bit flips that increase the number of ones in the current solution. Therefore, \(s_i = i/n\) and the fitness level method yields an expected runtime of at most \(\sum _{i=1}^n (H_1 + 1)(1/s_i-1)+1 = H_1 \cdot O(n \log n)\). The bounds on \(H_1\) for \(c > \frac{1}{2}\) are as in the proof of Theorem 10. For \(c = \frac{1}{2}\) we invoke Lemma 1 and have \(H_1 = O(\sqrt{n})\). For \(c <\frac{1}{2}\) we use Theorem 8. The probabilities in the Markov model for \(H_1\) are \(p_i=c+(1-c)i/n\), which can be continuously extended to the nondecreasing function \(p(i)=c+(1-c)i/n\). Here \(k=n\cdot \frac{1-2c}{2(1-c)}\) solves the equation \(p(k)=\frac{1}{2}\). Hence, we need the value of
Now Theorem 8 gives the upper bound \(H_1=O(n\cdot \beta (c)^n)\).
It remains to consider the case that \(c=0\). The claimed bound on \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) can be obtained by using the model \({{\mathscr {M}}} ((\frac{i}{n})_{1\le i\le n})\). Each state represents the distance to the optimal point. By Eq. (6) we have
The maximal expected time to reach the optimal point is the sum of all \(H_k\):
\(\square \)
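For the \(c=0\) model \({{\mathscr {M}}} ((\frac{i}{n})_{1\le i\le n})\) the return time from distance 1 can even be computed exactly: unfolding the recurrence gives \(H_1=\sum _{i=1}^n \frac{n}{i}\left( {\begin{array}{c}n-1\\ i-1\end{array}}\right) =\sum _{i=1}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) =2^n-1\). The sketch below (our illustration) checks this identity in exact rational arithmetic.

```python
from fractions import Fraction

def H1_exact(p):
    """H_1 by the backward recurrence H_n = 1,
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1), in exact arithmetic."""
    H = Fraction(1)
    for pi in reversed(p[:-1]):
        H = 1 / pi + (1 / pi - 1) * H
    return H

for n in (1, 2, 3, 8, 12):
    p = [Fraction(i, n) for i in range(1, n + 1)]
    assert H1_exact(p) == 2 ** n - 1  # exact identity for p_i = i/n
```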
We remark that the upper bounds given in Theorem 11 for \(c\in [\frac{1}{2},1]\) were presented in Mühlenthaler et al. (2017, Theorem 14), that the upper bound for \(c\in (0,\frac{1}{2})\) is newly obtained using the bounds-by-integration from Sect. 4.4, and that the proof of the upper bound for \(c=0\) is also new compared to Mühlenthaler et al. (2017). Admittedly, the bound for \(c=0\) is already available in the context of randomized local search and can be found in Garnier et al. (1999). Furthermore, note that for \(c=\frac{1}{2}\) it is not sufficient to use the lower bound \(p_i \ge p_1=\frac{1}{2} + \frac{1}{2n}\) in order to obtain the runtime bound given in Theorem 11.
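The base \(\beta (c)\) can also be observed directly in the exact return times: the sketch below (our illustration) computes \(H_1\) for \(p_i=c+(1-c)i/n\) by the backward recurrence and compares the per-step growth factor \(H_1^{1/n}\) with \(\beta (c)\); polynomial factors vanish in the n-th root.

```python
def H1(p):
    """H_1 by the backward recurrence H_n = 1,
    H_i = 1/p_i + (1/p_i - 1) * H_(i+1)."""
    H = 1.0
    for pi in reversed(p[:-1]):
        H = 1.0 / pi + (1.0 / pi - 1.0) * H
    return H

def beta(c):
    """beta(c) = 2^(1/(1-c)) * (1-c) * c^(c/(1-c)) from Theorem 11."""
    return 2.0 ** (1.0 / (1.0 - c)) * (1.0 - c) * c ** (c / (1.0 - c))

n = 400
for c in (0.25, 0.4):
    p = [c + (1.0 - c) * i / n for i in range(1, n + 1)]
    growth = H1(p) ** (1.0 / n)   # approaches beta(c) as n grows
```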
By repeatedly running OnePSO and applying Markov's inequality in the analysis, an optimal solution is found with high probability, so we have the following corollary.
Corollary 3
If the OnePSO is repeated \(\lambda \cdot \log _2(n)\) times, but each repetition is terminated after \(2\cdot T(n)\) iterations, where T(n) is the upper bound (with a suitable constant factor) on the expected number of iterations to find the optimum specified in Theorems 10 and 11, then OnePSO finds the optimal solution with high probability.
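The high-probability guarantee follows from Markov's inequality: a single run truncated at \(2\cdot T(n)\) iterations fails with probability at most \(\frac{1}{2}\), so \(\lambda \cdot \log _2(n)\) independent runs all fail with probability at most \(n^{-\lambda }\). A minimal sketch of this calculation (illustrative only, not from the paper):

```python
import math

def failure_bound(n, lam):
    """Probability that all lam*log2(n) independent restarts fail,
    where each run, truncated at twice its expected optimization
    time, fails with probability at most 1/2 (Markov's inequality)."""
    return 0.5 ** (lam * math.log2(n))

# the failure probability equals n**(-lam), i.e. it is polynomially small
print(failure_bound(1024, 3))
```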
5.2 Lower bounds via indistinguishable states
In this section we will provide lower bounds on the expected optimization time of OnePSO that almost match our upper bounds given in Sect. 5.1. We will use the Markov model from Sect. 3 to obtain these lower bounds. The main difference to the previous section is that we restrict our attention to the last improvement of the attractor, which dominates the runtime, both for sorting and OneMax. We will introduce the useful notion of indistinguishability of certain states of a Markov chain. Note that our lower bounds are significantly improved compared to the conference version (Mühlenthaler et al. 2017) by using the newly introduced bounds-by-integration from Sect. 4.4.
5.2.1 Indistinguishable states
We now introduce a notion of indistinguishability of certain states of a Markov chain already presented in Mühlenthaler et al. (2017). We will later use this notion to prove lower bounds on the expected optimization time of OnePSO for sorting and OneMax as follows: We show that the optimum is contained in a set \({\hat{Y}}\) of indistinguishable states. Therefore, in expectation, the states in \({\hat{Y}}\) have to be visited \(\varOmega (\vert {\hat{Y}}\vert )\) times to hit the optimum with positive constant probability.
Definition 2
(Indistinguishable states) Let M be a Markov process with a finite set Y of states and let \({\hat{Y}}\subseteq Y\). Furthermore, let \((Z_i)_{i\ge 0}\) be the sequence of visited states of M and let \(T=\min \lbrace t>0\mid Z_t\in {\hat{Y}}\rbrace \). Then \({\hat{Y}}\) is called indistinguishable with respect to M if

1.
the initial state \(Z_0\) is uniformly distributed over \({\hat{Y}}\), i. e., for all \(y\in Y\):
$$\begin{aligned} \Pr [Z_0=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert = {\left\{ \begin{array}{ll} 1/\vert {\hat{Y}}\vert &{} \text {if }y\in {\hat{Y}}\\ 0 &{} \text {if }y\not \in {\hat{Y}} . \end{array}\right. } \end{aligned}$$ 
2.
the probabilities to reach states in \({\hat{Y}}\) from states in \({\hat{Y}}\) are symmetric, i. e., for all \(y_1,y_2\in {\hat{Y}}\):
$$\begin{aligned} \Pr [Z_T=y_2\mid Z_0=y_1]=\Pr [Z_T=y_1\mid Z_0=y_2] . \end{aligned}$$
Now we can prove a lower bound on the expected time for finding a specific state.
Theorem 12
Let M be a Markov process as in Definition 2 and let \({\hat{Y}}\) be indistinguishable with respect to M. Let h(M) be a positive real value such that \(\mathrm{E}[T]\ge h(M)\). Then the expected time to reach a fixed \(y\in {\hat{Y}}\) is bounded from below by \(h(M)\cdot \Omega (\vert {\hat{Y}}\vert )\).
Proof
Let \(T_i\) be the stopping time when \({\hat{Y}}\) is visited the ith time.
By Statement 1 of Definition 2, \(Z_0\) is uniformly distributed over \({\hat{Y}}\). Therefore \(T_1=0\) and \(T_2=T\). Statement 2 of Definition 2 implies that \(\Pr [Z_{T_i}=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert \) for all \(i\ge 1\) by the following induction. The base case \(i=1\), with \(T_1=0\), is ensured by Statement 1 of Definition 2. The induction hypothesis is \(\Pr [Z_{T_{i-1}}=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert \). The inductive step is verified by the following series of equations.
It follows that for all \(i>0\) the difference \(T_{i+1}-T_i\) of two consecutive stopping times has the same distribution as T and also
Now let \(y\in {\hat{Y}}\) be fixed. The probability that y is not reached within the first \(T_{\lfloor \vert {\hat{Y}}\vert /2\rfloor -1}\) steps is bounded from below, by a union bound, by
and therefore the expected time to reach the fixed \(y\in {\hat{Y}}\) is bounded from below by
\(\square \)
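The role of uniformity in this argument can be made concrete: since every visit to \({\hat{Y}}\) lands on a uniformly distributed state, the number of visits before a fixed y is hit is geometric with mean \(\vert {\hat{Y}}\vert \), which is where the factor \(\Omega (\vert {\hat{Y}}\vert )\) comes from. A small numeric sketch (our own illustration, not code from the paper):

```python
def expected_visits(N, horizon=4000):
    """Expected number of uniform draws from an N-element set until a
    fixed element appears: sum_{m>=1} m*(1/N)*(1-1/N)**(m-1) = N.
    The geometric series is truncated; the tail is negligible here."""
    p = 1.0 / N
    return sum(m * p * (1 - p) ** (m - 1) for m in range(1, horizon))

print(round(expected_visits(10), 6))  # close to 10 = |Y-hat|
```

Combined with the waiting time of at least h(M) between consecutive visits, this yields the stated lower bound \(h(M)\cdot \Omega (\vert {\hat{Y}}\vert )\).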
5.2.2 Lower bounds on the expected optimization time for sorting
In this section we consider the sorting problem. Our first goal is to provide lower bounds on the expected return time to the attractor for the parameter choice \(c\in (0,\frac{1}{2})\).
Lemma 4
Let \(c\in (0,\frac{1}{2})\). For the sorting problem on n items, assume that the attractor has transposition distance one to the identity permutation. Then the expected return time \(H_1\) to the attractor is bounded from below by \(\Omega (\alpha (c)^n)\), where
See Fig. 5 for a visualization of \(\alpha (c)\).
Proof
The probability of decreasing the distance to the attractor in state \(S_i\) can be bounded from above by
We increase all indices by one, i.e., \({{\tilde{p}}}_i=p_{i-1}\), so that we again have n states. Note that \(H_2=\varOmega (H_1)\). This can be obtained by the following equations, using Eq. 2 and the fact that \(H_1\ge \frac{1}{1-c}\) holds in this case
We use Theorem 8 to get a lower bound on \(H_1\) by using \(p(i)=c+(1-c)\cdot \frac{i^2}{n^2}\). Here \(k=n\cdot \sqrt{\frac{1-2c}{2(1-c)}}\) maximizes the integral, because it solves the equation \(p(k)=\frac{1}{2}\). An application of Theorem 8 supplies
In the following we calculate the exact value of this integral. The integrand can be converted to the expression
The indefinite integral of \(\ln (1-x^2)\) is
It can be evaluated for values \(x\in [0,1[\), which suffices as \(0\le k/n<1\). Furthermore, the indefinite integral of \(\ln \left( \frac{c}{1-c}+x^2\right) \) is
which can be evaluated for all values, because \(\frac{c}{1-c}\) is positive. The indefinite integral of the whole expression is obtained by subtracting the two
and evaluating at the bounds \(k/n=\sqrt{\frac{1-2c}{2(1-c)}}\) and 0 results in
Applying the \(\exp \) function to this result gives the claimed lower bound. \(\square \)
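For reference, the two antiderivatives used in the proof above can be written out explicitly; the following are standard computations (our own, verified by differentiation, and not copied from the paper, so the exact arrangement of terms may differ from the original displays):

```latex
\int \ln(1-x^2)\,\mathrm{d}x
  = x\ln(1-x^2) - 2x + \ln\frac{1+x}{1-x} + C ,
  \qquad 0 \le x < 1,
\]
\[
\int \ln\Bigl(\tfrac{c}{1-c}+x^2\Bigr)\,\mathrm{d}x
  = x\ln\Bigl(\tfrac{c}{1-c}+x^2\Bigr) - 2x
    + 2\sqrt{\tfrac{c}{1-c}}\,\arctan\Bigl(x\sqrt{\tfrac{1-c}{c}}\Bigr) + C .
```

Differentiating either right-hand side recovers the corresponding integrand, since the \(-2\) from the \(-2x\) term cancels against the derivative of the logarithmic or arctangent correction term.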
This lower bound is the best possible bound achievable within this model, as the probability \(p_i = c+(1-c)\cdot {\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) }/{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) actually occurs at distance i if the permutation transforming the current position to the attractor consists of one cycle of length \(i+1\) and the remaining permutation consists of singleton cycles. For this improvement probability the bound is \(\varTheta ^*(\alpha (c)^n)\).
The following theorem supplies lower bounds on the expected optimization time of OnePSO on the sorting problem.
Theorem 13
The expected optimization time \(T_{{\mathrm{sort}}}(n)\) of the OnePSO sorting n items is bounded from below by
Proof
The event that the initial position is already the optimum has probability 1/n!. As \(1-1/n!>1/2\) for \(n\ge 2\), we obtain the same \(\varOmega \) bound if we ignore this case. In all other cases we can consider the situation that the attractor has just been updated to a solution that has distance one to the optimum. Without loss of generality, we assume that the attractor is the identity permutation and the optimum is the transposition \((0\,1)\). The number of steps required for the next (hence final) improvement of the attractor is a lower bound on the expected optimization time of the OnePSO. We determine a lower bound on this number for various choices of c.
For all \(c \in (0,1]\) we apply Theorem 12. We use the set of all permutations as the set of states Y in the Markov process M. Let \({\hat{Y}}=X_1\) be the subset of states which are a single swap away from the attractor. Hence the optimal solution is contained in \({\hat{Y}}\), but up to the point when the OnePSO reaches the optimal solution it is indistinguishable from all other permutations in \({\hat{Y}}\). We now prove that \({\hat{Y}}\) is indeed indistinguishable with respect to M. Initially the particle is situated at the attractor, and after a single step it is situated at a permutation in \({\hat{Y}}\), where each permutation has equal probability. We use the permutation after the first step as the initial state \(Z_0\) of the Markov process, and all other \(Z_i\) are the successive permutations. Therefore Statement 1 of Definition 2 is fulfilled. Let \(T=\min \lbrace t>0\mid Z_t\in {\hat{Y}}\rbrace \) be the stopping time of Theorem 12. For each sequence of states \(Z_0,\ldots ,Z_T\) there is a one-to-one mapping to a sequence \({{\tilde{Z}}}_0=Z_T,{{\tilde{Z}}}_1,\ldots ,{{\tilde{Z}}}_{T-1},{{\tilde{Z}}}_{T}=Z_0\) which has equal probability of appearing. The sequence \({{\tilde{Z}}}_0,\ldots ,{{\tilde{Z}}}_T\) is not the reversed sequence, because the forced steps would then lead in the wrong direction, but it can be obtained by renaming the permutation indices. The renaming is possible because the permutations \(Z_0\) and \(Z_T\) are both single swaps. As this one-to-one mapping exists, Statement 2 of Definition 2 is also fulfilled. Finally, we need a bound on the expectation of T. If we are in \(X_1={\hat{Y}}\), we can either go to the attractor by a forced or random move and return to \(X_1\) in the next step, or we can go to \(X_2\) by a random move and return to \(X_1\) in expectation after \(H_2\) steps.
We have \(\mathrm{E}[T]=\left( c+(1-c)/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \right) \cdot 2 + (1-c)\cdot \left( 1-1/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \right) (1+H_2)=\varOmega (H_2)=:h(M)\). Theorem 12 provides the lower bound \(\Omega (\vert {\hat{Y}}\vert \cdot H_2)\) for the runtime to find the fixed permutation \((0\,1)\in {\hat{Y}}\), which is the optimal solution. From Eq. 2 we get \(H_2=(p_1\cdot H_1-1)/(1-p_1)\ge (c\cdot H_1-1)/(1-c)\). As \(H_1=\Omega (n^{2/3})\) for \(c=\frac{1}{2}\) (see Theorem 7) and \(H_1=\Omega (\alpha (c)^n)\) for \(c\in (0,\frac{1}{2})\) (see Lemma 4), also \(H_2=\Omega (H_1)\) for \(c\in (0,\frac{1}{2}]\), which results in the lower bounds \(T_{{\mathrm{sort}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (\left( {\begin{array}{c}n\\ 2\end{array}}\right) \cdot n^{2/3})=\Omega (n^{8/3})\) for \(c=\frac{1}{2}\) and \(T_{{\mathrm{sort}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (\left( {\begin{array}{c}n\\ 2\end{array}}\right) \cdot \alpha (c)^n)=\Omega (n^2\cdot \alpha (c)^n)\) for \(c\in (0,\frac{1}{2})\). Trivially, the return time to \(X_1\) in M can be bounded by 2, which results in the lower bound \(T_{\mathrm{sort}}(n)=\Omega (n^2)\) for the case \(c\in (\frac{1}{2},1]\).
The lower bound for \(c=0\) can be derived directly from the indistinguishability property: Let \({\hat{Y}} = Y\). It is readily verified that the initial state is uniformly distributed over \({\hat{Y}}\). Furthermore, any path from \({\hat{Y}}\) to \({\hat{Y}}\) can be reversed and has the same probability of occurring. Therefore, Condition 2 of Definition 2 is satisfied and the lower runtime bound follows from Theorem 12 by choosing \(h(M) = 1\). \(\square \)
Besides these formally proved lower bounds, we conjecture the following lower bounds on the expected optimization time of OnePSO for sorting n items.
Conjecture 2
Note that these lower bounds differ from our upper bounds given in Theorem 10 only by a \(\log \) factor. Evidence supporting this conjecture is given in “Appendix B”. We obtain our theoretical evidence by considering the average probability of moving towards the attractor, instead of the upper and lower bounds used before.
5.2.3 Lower bounds on the expected optimization time for OneMax
First we provide a lower bound on the expected return time to the attractor.
Lemma 5
Let \(c\in (0,\frac{1}{2})\). For OneMax, assume that the attractor has Hamming distance one to the optimum \(1^n\). Then the expected return time \(H_1\) to the attractor is bounded from below by \(H_1=\Omega (\beta (c)^n)\), where \(\beta (c)=2^{{1}/({1-c})}\cdot (1-c)\cdot c^{{c}/({1-c})}\) is as in Theorem 11.
See Fig. 5 for a visualization of \(\beta (c)\).
Proof
We use Theorem 8. The value \(\beta (c)\) was already calculated in Theorem 11, Eq. (10). \(\square \)
This result enables us to prove lower bounds on \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)\).
Theorem 14
The expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)\) of the OnePSO for solving OneMax is bounded from below by
Proof
First, let \(c \in (\frac{1}{2}, 1]\). Then, with probability at least \(\frac{1}{2}\), the initial solution contains at least \(k = \lfloor n/2 \rfloor = \varOmega (n)\) zeros. Each zero is flipped to one with probability 1/n in a random move, and none of the k entries is set to one in a move towards the attractor. The expected time required to sample the k distinct bit flips is bounded from below by the expected time it takes to obtain all coupons in the following instance of the coupon collector’s problem: there are k coupons and each coupon is drawn independently with probability 1/k. The expected time to obtain all coupons is \(\varOmega (k \log k)\) (Mitzenmacher and Upfal 2005, Sect. 5.4.1). It follows that the expected optimization time is \(\varOmega (n \log n)\) as claimed.
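The coupon collector expectation invoked here is \(k\cdot H_k\), where \(H_k\) denotes the k-th harmonic number, and \(k\cdot H_k = \varTheta (k \log k)\). A short sketch of this quantity (our own illustration, not code from the paper):

```python
import math

def coupon_collector_expectation(k):
    """Expected number of draws until all k coupons have been seen,
    when each draw yields a uniformly random coupon: k * H_k,
    with H_k the k-th harmonic number."""
    return k * sum(1.0 / i for i in range(1, k + 1))

k = 1000
ratio = coupon_collector_expectation(k) / (k * math.log(k))
print(ratio)  # tends to 1 as k grows, so the expectation is Theta(k log k)
```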
For \(c \in (0,\frac{1}{2}]\) we use the same approach as in the proof of Theorem 13. Here, too, the event that the initial solution is optimal can be ignored. Consider the situation that the attractor has just been updated to a solution that has distance one to the optimum. We use the set of all bit strings as the set of states Y in the Markov process M. Let \({\hat{Y}}=X_1\) be the subset of bit strings which are a single bit flip away from the attractor; hence \({\hat{Y}}\) contains the optimum. \(Z_i\) and T are instantiated as in the proof of Theorem 13. Therefore Statement 1 of Definition 2 is fulfilled. Again, for each sequence of states \(Z_0,\ldots ,Z_T\) we have a one-to-one mapping to a sequence \({{\tilde{Z}}}_0=Z_T,{{\tilde{Z}}}_1,\ldots ,{{\tilde{Z}}}_{T-1},{{\tilde{Z}}}_{T}=Z_0\) which has equal probability of appearing. This sequence is again obtained by renaming the indices plus some bit changes according to the shape of the attractor. Hence Statement 2 of Definition 2 is also fulfilled, and \({\hat{Y}}\) is indistinguishable with respect to M. We obtain \(\mathrm{E}[T]=\Omega (H_2)=:h(M)\). Theorem 12 provides the lower bound \(\Omega (\vert {\hat{Y}}\vert \cdot H_2)\) for the runtime to find the optimal solution. From Eq. (2) we get \(H_2\ge (c\cdot H_1-1)/(1-c)\). As \(H_1=\Omega (n^{1/2})\) for \(c=\frac{1}{2}\) (see Theorem 6) and \(H_1=\Omega (\beta (c)^n)\) for \(c\in (0,\frac{1}{2})\) (see Lemma 5), also \(H_2=\Omega (H_1)\) for \(c\in (0,\frac{1}{2}]\), which results in the lower bounds \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (n\cdot n^{1/2})=\Omega (n^{3/2})\) for \(c=\frac{1}{2}\) and \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (n\cdot \beta (c)^n)\) for \(c\in (0,\frac{1}{2})\).
Again, the lower bound for \(c=0\) can be obtained by the indistinguishability property. The proof for this case is identical to the corresponding part of the proof of Theorem 13. \(\square \)
This justifies all runtime bounds claimed in Table 1, and all of the tools presented in Sect. 4 are used for this purpose.
5.3 Bounds on the expected optimization time for DPSO
The upper bounds on the runtime of OnePSO in Theorem 10 and Theorem 11 directly imply upper bounds for DPSO. Recall that we denote by c the parameter of OnePSO and by \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}} (n)\) and \(T_{{\mathrm{sort}}}(n)\) the expected optimization time of OnePSO for OneMax and sorting, respectively.
Corollary 4
Let \(T'_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) and \(T'_{{\mathrm{sort}}}(n)\) be the expected optimization time of DPSO for OneMax and sorting, respectively. If \(c = c_{glob}\), then \(T'_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n) = O(P\cdot T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n))\) and \(T'_{{\mathrm{sort}}}(n) =O(P\cdot T_{{\mathrm{sort}}}(n))\), where P is the number of particles.
Proof
In each trial to improve the value of the global attractor, at least the particle which last updated the global attractor has its local attractor at the same position as the global attractor. This particle behaves exactly like the single particle of OnePSO until the global attractor is improved. Therefore we have at most P times more objective function evaluations than OnePSO, where P is the number of particles. \(\square \)
One can refine this result by again looking at return times to an attractor.
If the global attractor equals the local attractor, then this particle performs the same steps as all other particles having equal attractors. As all those particles perform their optimization in parallel, in expectation no additional objective function evaluations are made compared to the OnePSO.
For particles where the global attractor and the local attractor differ, we can apply the previous arguments to the local attractor. With two different attractors, alternating movements towards the local and the global attractor can cancel each other out. Therefore, if (only) \(c_{loc}\) is fixed, then in the worst case we can assume only \(c_{loc}\) as the probability of moving towards the local attractor and \(1-c_{loc}\) as the probability of moving away from it. This enables us to use Theorem 5 to calculate the expected time to reduce the distance to an attractor from one to zero. We denote the return time from Theorem 5 by \(\varPsi (n,p)\).
If the position equals the local attractor and consequently differs from the global attractor, the probability of improving the local attractor can be bounded from below by a positive constant. For example, for the problem OneMax this constant is \(c_{glob}/2\), because in a move towards the global attractor, for at least half of the differing bits the value of the global attractor equals the value of the optimal solution, as the global attractor is at least as close to the optimum as the local attractor. Therefore the expected number of trials until the local attractor is improved is constant. As such an update occurs at most once for each particle and fitness level, we obtain an additional summand of \(O(\varPsi (n,c_{loc})\cdot P\cdot n)\) instead of the factor P for the problems OneMax and sorting.
In contrast to the upper bounds, the lower bounds for OnePSO do not carry over to DPSO for the following reason. The bottleneck used in the analysis of OnePSO is the very last improvement step. However, DPSO may be faster at finding the last improvement, because it may happen that the local and the global attractor of a particle both have distance one to the optimum but are not equal. In this case, as described above, there is a constant probability of moving to the optimum if a particle is at one of the two attractors, whereas for OnePSO the probability of moving towards the optimum when the particle is located at the attractor tends to zero for large n.
Experiments with DPSO using a small number of particles and with OnePSO, applied to the sorting problem and OneMax, revealed only a small increase in the optimization time of DPSO compared to OnePSO. This increase is much smaller than the factor P. For some parameter constellations, even a significant decrease in the optimization time of DPSO compared to OnePSO is achieved.
5.4 Lower bounds for pseudo-Boolean functions
For general pseudo-Boolean functions \(f:\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \) we can also prove lower bounds on the expected optimization time.
Theorem 15
If \(P=\varTheta (1)\), where P is the number of particles, then the expected optimization time of DPSO optimizing pseudo-Boolean functions \(\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \) with a unique optimal position is in \(\varOmega (n\log (n))\).
Proof
If there are \(P=\varTheta (1)\) particles, then in expectation there are \(n/2^P=\varOmega (n)\) bits such that no particle has the corresponding bit of its initial position equal to that bit of the optimal position. The expected optimization time is therefore bounded from below by the time until each such bit is flipped in a random move at least once. This subproblem corresponds to a coupon collector's problem, and therefore we have the claimed lower bound of \(\varOmega (n\log (n))\). \(\square \)
For larger values of P we obtain an even stronger lower bound by the following theorem.
Theorem 16
If \(P=O(n^k)\), where P is the number of particles and k is an arbitrary nonnegative real value, then the expected optimization time of DPSO optimizing pseudo-Boolean functions \(\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \) with a unique optimal position is in \(\varOmega (n\cdot P)\).
Proof
To bound the probability of being at least some distance away from the attractor after initialization, we use Chernoff bounds. For a fixed particle we define
Therefore \(Y=\sum _{i=1}^n Y_i\) is exactly the initial distance of the fixed particle to the unique optimal position. For each i we have that \(\Pr [Y_i=1]=\frac{1}{2}\) and \(\mathrm{E}[Y]=\frac{n}{2}\). By Chernoff bounds we obtain the lower bound
The probability that the initial positions of all P particles have distance at least \(\frac{n}{4}\) to the unique optimal position is the \(P\hbox {th}\) power of this probability and can be bounded from below for large n by
Note that one can choose such an n because \(P=O(n^k)\) and \(16\cdot \ln (2\cdot {\mathrm{poly}}(n))=o(n)\) for any polynomial. If the distance of the positions of all particles to the optimum is at least \(\frac{n}{4}\), then it takes at least \(\frac{n}{4}\) iterations until the optimal position can be reached, as the distance can change only by one in each iteration. In each iteration P evaluations of the objective function are performed. Therefore we have at least \(\frac{n\cdot P}{4}\) objective function evaluations with probability at least \(\frac{1}{2}\) for large n, which results in the claimed optimization time of \(\varOmega (n\cdot P)\). \(\square \)
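The Chernoff bound used here is the standard multiplicative form \(\Pr [Y\le (1-\delta )\mathrm{E}[Y]]\le e^{-\delta ^2\mathrm{E}[Y]/2}\) with \(\delta =\frac{1}{2}\) and \(\mathrm{E}[Y]=\frac{n}{2}\), giving \(\Pr [Y\le \frac{n}{4}]\le e^{-n/16}\). A quick numeric check of this bound against the exact binomial tail (our own sketch, not from the paper):

```python
import math

def binom_tail_leq(n, k):
    """Exact Pr[Y <= k] for Y ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n

def chernoff_bound(n):
    """Pr[Y <= n/4] <= exp(-(1/2)**2 * (n/2) / 2) = exp(-n/16)."""
    return math.exp(-n / 16)

n = 200
# the exact tail probability never exceeds the Chernoff bound
print(binom_tail_leq(n, n // 4), "<=", chernoff_bound(n))
```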
This means that if we choose, e.g., \(P=n^{10}\), we have at least \(\varOmega (n^{11})\) function evaluations in expectation.
6 Conclusion
We propose a simple and general adaptation of the PSO algorithm to a broad class of discrete optimization problems. For one particle, we provide upper and lower bounds on its expected optimization time for the sorting problem and OneMax, generalize the upper bounds to DPSO with an arbitrary number of particles, and also prove lower bounds for DPSO optimizing pseudo-Boolean functions. Depending on the parameter c, which is the probability of moving towards the attractor, the expected optimization time is polynomial (\(c \ge 1/2\)) or exponential (\(c < 1/2\)), respectively. The cornerstone of our analysis is a set of \(\varTheta \)-bounds on the expected time it takes until the PSO returns to the attractor. Our analysis also provides the variance of this value. We analyze Markov chains and provide tools to evaluate expected return times for certain classes of transition probabilities. Additionally, we establish a useful general property, indistinguishability of states of a Markov process, for obtaining lower bounds on the expected first hitting time of a special state. Applying the presented tools to other Markov chains, which frequently appear in the analysis of randomized algorithms, is certainly possible.
For future work, it would be interesting to see whether the upper and lower bounds on the expected optimization time for OneMax given in Theorems 11 and 14 are valid for any linear function \(f : \{0,1\}^n \rightarrow {\mathbb {R}} \), \(f(x_1,x_2,\ldots ,x_n) = \sum _i w_i x_i\). Furthermore, we conjecture that the upper bound for the sorting problem for \(c=0\) is n! and that the other proved upper bounds for the sorting problem are tight. Another direction for future work is to apply our technical tools to other metaheuristics. In particular, our tools may be useful in the analysis of “nonelitist” metaheuristics, for instance the Strong Selection Weak Mutation (SSWM) evolutionary regime introduced in Gillespie (1983) as an example of a nonelitist algorithm.
Finally, it would be interesting to determine the return time to the state \(S_0\) in a more general Markov model \({{\mathscr {M}}} ((p_i)_{1 \le i \le n})\), where \(p_i = 1/2 + z(i, n)\) such that \(z(i, n) = {\text {poly}}(i) / {\text {poly}}(n)\), where the degrees of the polynomials differ and z(i, n) is nondecreasing for \(1 \le i \le n\). This would generalize Theorems 6 and 7 and shed some light on the relation between z(i, n) and the return time to state \(S_0\). Here, we conjecture that for z(i, n) as defined above the return time is in \({\text {poly}}(n)\). Moreover, a proof of the claimed upper bound of O(n!) on the expected time to reach a specified permutation in the graph of permutations by an actual random walk searching for the optimum would be beneficial. To the best of our knowledge, no such proof exists so far.
Notes
Note that a different definition of “transposition” is used in computational biology, so “transposition distance” has a different meaning, e.g., in Bafna and Pevzner (1998).
Similarly to how \((n+1)/n\) has limit one: there is no n such that \((n+1)/n\le 1\), but for any \(\varepsilon \in (0,1)\) and sufficiently large n we have \((1-\varepsilon )(n+1)/n\le 1\).
References
Antipov D, Doerr B, Fang J, Hetet T (2018) A tight runtime analysis for the (\(\mu \) + \(\lambda \)) EA. In: Proc of the genetic and evolutionary computation conference (GECCO), pp 1459–1466. https://doi.org/10.1145/3205455.3205627
Antipov D, Doerr B, Yang Q (2019) The efficiency threshold for the offspring population size of the (\(\mu \), \(\lambda \)) EA. In: Proc. of the genetic and evolutionary computation conference (GECCO), pp 1461–1469. https://doi.org/10.1145/3321707.3321838
Bafna V, Pevzner PA (1998) Sorting by transpositions. SIAM J Discrete Math 11(2):224–240. https://doi.org/10.1137/S089548019528280X
Doerr B, Neumann F (2020) Theory of evolutionary computation—recent developments in discrete optimization. Springer. https://doi.org/10.1007/9783030294144
Doerr B, Neumann F, Sudholt D, Witt C (2007) On the runtime analysis of the 1ANT ACO algorithm. In: Proc. 9th ACM genetic and evolutionary computation conference (GECCO), pp 33–40. https://doi.org/10.1145/1276958.1276964
Droste S, Jansen T, Wegener I (2001) Dynamic parameter control in simple evolutionary algorithms. In: Proc. 6th workshop on foundations of genetic algorithms (FOGA), pp 275–294. https://doi.org/10.1016/B9781558607347/500986
Droste S, Jansen T, Wegener I (2002) On the analysis of the (1+1) evolutionary algorithm. Theor Comput Sci 276(1):51–81. https://doi.org/10.1016/S03043975(01)001827
Durrett R (2010) Probability: theory and examples. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press. https://doi.org/10.1017/9781108591034
Eberhart RC, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proc. 6th international symposium on micro machine and human science, pp 39–43. https://doi.org/10.1109/MHS.1995.494215
Garnier J, Kallel L, Schoenauer M (1999) Rigorous hitting times for binary mutations. Evol Comput 7(2):173–203. https://doi.org/10.1162/evco.1999.7.2.173
Giel O, Wegener I (2003) Evolutionary algorithms and the maximum matching problem. In: Proc. 20th symp. on theoretical aspects of computer science (STACS), pp 415–426. https://doi.org/10.1007/3540364943_37
Gillespie JH (1983) Some properties of finite populations experiencing strong selection and weak mutation. Am Nat 121(5):691–708. https://doi.org/10.1086/284095
Graham RL, Knuth DE, Patashnik O (1994) Concrete mathematics: a foundation for computer science, 2nd edn. Addison-Wesley Longman
Hirschhorn MD (2015) Wallis’s product and the central binomial coefficient. Am Math Mon 122(7):689. https://doi.org/10.4169/amer.math.monthly.122.7.689
Hoffmann M, Mühlenthaler M, Helwig S, Wanka R (2011) Discrete particle swarm optimization for TSP: theoretical results and experimental evaluations. In: Proc. 2nd int. conf. on adaptive and intelligent systems (ICAIS), pp 416–427. https://doi.org/10.1007/9783642238574_40
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proc. IEEE international conference on neural networks, vol 4, pp 1942–1948. https://doi.org/10.1109/ICNN.1995.488968
Kennedy J, Eberhart RC (1997) A discrete binary version of the particle swarm algorithm. In: Proc. IEEE int. conf. on systems, man, and cybernetics, vol 5, pp 4104–4108. https://doi.org/10.1109/ICSMC.1997.637339
Kötzing T, Krejca MS (2018) First-hitting times for finite state spaces. In: Auger A, Fonseca CM, Lourenço N, Machado P, Paquete L, Whitley D (eds) Parallel problem solving from nature—PPSN XV. Springer International Publishing, Cham, pp 79–91. https://doi.org/10.1007/9783319992594_7
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.1063/1.1699114
Mitzenmacher M, Upfal E (2005) Probability and computing. Cambridge University Press. https://doi.org/10.1017/CBO9780511813603
Mühlenthaler M, Raß A, Schmitt M, Siegling A, Wanka R (2017) Runtime analysis of a discrete particle swarm optimization algorithm on sorting and OneMax. In: Proc. 14th ACM/sigevo workshop on foundations of genetic algorithms (FOGA), pp 13–24. https://doi.org/10.1145/3040718.3040721
Neumann F, Witt C (2007) Runtime analysis of a simple ant colony optimization algorithm. Algorithmica 54(2):243–255. https://doi.org/10.1007/s0045300791342
Onwunalu JE, Durlofsky LJ (2010) Application of a particle swarm optimization algorithm for determining optimum well location and type. Comput Geosci 14:183–198. https://doi.org/10.1007/s1059600991421
Papadimitriou CH (1991) On selecting a satisfying truth assignment. In: Proc. 32nd IEEE symp. on foundations of computer science (FOCS), pp 163–169. https://doi.org/10.1109/SFCS.1991.185365
Papadimitriou CH, Schäffer AA, Yannakakis M (1990) On the complexity of local search. In: Proc. 22nd ACM symposium on theory of computing (STOC), pp 438–445. https://doi.org/10.1145/100216.100274
Petkovšek M (1992) Hypergeometric solutions of linear recurrences with polynomial coefficients. J Symb Comput 14(2):243–264. https://doi.org/10.1016/07477171(92)900386
Ramanathan K, Periasamy VM, Pushpavanam M, Natarajan U (2009) Particle swarm optimisation of hardness in nickel diamond electro composites. Arch Comput Mater Sci Surf Eng 1:232–236
Raß A, Schreiner J, Wanka R (2019) Runtime analysis of discrete particle swarm optimization applied to shortest paths computation. Evol Comput Comb Optim (EvoCOP), pp 115–130. https://doi.org/10.1007/9783030167110_8
Scharnow J, Tinnefeld K, Wegener I (2004) The analysis of evolutionary algorithms on sorting and shortest paths problems. J Math Model Algorithms 3(4):349–366. https://doi.org/10.1023/B:JMMA.0000049379.14872.f5
Schmitt M, Wanka R (2015) Particle swarm optimization almost surely finds local optima. Theor Comput Sci 561, Part A:57 – 72. https://doi.org/10.1016/j.tcs.2014.05.017
Schöning U (1999) A probabilistic algorithm for \(k\)SAT and constraint satisfaction problems. In: Proc. 40th IEEE symp. on foundations of computer science (FOCS), pp 410–414. https://doi.org/10.1109/SFFCS.1999.814612
Schwab L, Schmitt M, Wanka R (2015) Multimodal medical image registration using particle swarm optimization with influence of the data’s initial orientation. In: Proc. 12th IEEE conf. on computational intelligence in bioinformatics and computational biology (CIBCB), pp 403–410. https://doi.org/10.1109/CIBCB.2015.7300314
Sudholt D (2013) A new method for lower bounds on the running time of evolutionary algorithms. IEEE Trans Evol Comput 17(3):418–435. https://doi.org/10.1109/TEVC.2012.2202241
Sudholt D, Thyssen C (2012) Running time analysis of ant colony optimization for shortest path problems. J Discrete Algorithms 10:165–180. https://doi.org/10.1016/j.jda.2011.06.002
Sudholt D, Witt C (2008) Runtime analysis of binary PSO. In: Proc. 10th ACM genetic and evolutionary computation conf. (GECCO), pp 135–142. https://doi.org/10.1145/1389095.1389114
Sudholt D, Witt C (2010) Runtime analysis of a binary particle swarm optimizer. Theor Comput Sci 411(21):2084–2100. https://doi.org/10.1016/j.tcs.2010.03.002
Veeramachaneni K, Osadciw L, Kamath G (2007) Probabilistically driven particle swarms for optimization of multi-valued discrete problems: design and analysis. In: Proc. IEEE swarm intelligence symposium (SIS), pp 141–149. https://doi.org/10.1109/SIS.2007.368038
Wachowiak MP, Smolíková R, Zheng Y, Zurada JM, Elmaghraby AS (2004) An approach to multimodal biomedical image registration utilizing particle swarm optimization. IEEE Trans Evol Comput 8:289–301. https://doi.org/10.1109/TEVC.2004.826068
Wegener I (2002) Methods for the analysis of evolutionary algorithms on pseudo-boolean functions. In: Sarker R, Mohammadian M, Yao X (eds) Evolutionary optimization. Springer, Chap 14, pp 349–369. https://doi.org/10.1007/0-306-48041-7_14
Yang Q, Wu J, Li Y, Li W, Wang L, Yang Y (2017) Using the particle swarm optimization algorithm to calibrate the parameters relating to the turbulent flux in the surface layer in the source region of the Yellow River. Agric For Meteorol 232:606–622. https://doi.org/10.1016/j.agrformet.2016.10.019
Acknowledgements
We would like to thank the anonymous referees for valuable remarks.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This is the significantly extended version of Mühlenthaler et al. (2017) published in the proceedings of the 14th ACM/SIGEVO Workshop on Foundations of Genetic Algorithms (FOGA), 2017.
Appendices
Appendix A: Evidence for Conjecture 1
In this section we provide computational evidence for Conjecture 1. To this end we compute exact values of \(T_{{\mathrm{sort}}}(n)\) for \(n \le 40\), using a system of linear equations similar to Eq. 1. Let \(\tau _0\) be the optimal permutation (say, the identity); the system then determines, for every permutation \(\tau \), the expected optimization time \(h_\tau \) when starting from \(\tau \).
This simple approach works only for very small n, since it uses one variable per permutation. Our results are based on the following insight: for each permutation \(\tau \) consider the permutation \(\nu \) such that \(\tau \circ \nu =\tau _0\). Since the value \(h_\tau \) is equal for all permutations \(\tau \) whose \(\nu \) has the same cycle lengths in its cycle decomposition, the number of variables in the system of linear equations can be reduced to the number of integer partitions of n, where n is the number of items to sort. Hence, for \(n = 40\), the number of variables drops from 40! to 37,338, which is manageable. \(T_{{\mathrm{sort}}}(n)\) is then just a linear combination of the computed values.
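The size of the reduced system is easy to check computationally. The following sketch counts the integer partitions of n with a standard dynamic program (the function name is ours):

```python
def partition_count(n):
    # p[k] = number of integer partitions of k using the part sizes
    # considered so far
    p = [1] + [0] * n
    for part in range(1, n + 1):   # allow one more part size at a time
        for k in range(part, n + 1):
            p[k] += p[k - part]
    return p[n]

# 40! variables collapse to the number of integer partitions of 40
print(partition_count(40))  # 37338
```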
Side note: the transition probabilities, and hence the system of linear equations, can be represented by a matrix. For elitist algorithms this matrix can be transformed into an upper-triangular matrix, which makes the analysis much easier. Such a transformation to an upper-triangular matrix is not possible in our situation.
Figure 6 shows the search space of the sorting problem for four items, partitioned by cycle lengths, so the states are represented by the integer partitions of \(n=4\). The complete search space with all permutations of \(n=4\) items is visualized in Fig. 2. In Fig. 6 each state is labeled by the cycle lengths of the current permutation and may represent several permutations. The state (1, 1, 1, 1) represents only a single permutation, the identity permutation 1234, in which each cycle is a singleton cycle, i.e., a cycle of length one. Every neighboring permutation of (1, 1, 1, 1) contains two swapped items and two items which stay at their position; hence its cycle lengths are (2, 1, 1), and there are six permutations with these cycle lengths. The neighbors of permutations with cycle lengths (2, 1, 1) have cycle lengths (1, 1, 1, 1), (3, 1) or (2, 2). Out of the six possible exchange operations, one splits the cycle of length two into two cycles of length one, one merges the two singleton cycles into a second cycle of length two, yielding (2, 2), and the remaining four merge the cycle of length two with a singleton cycle, yielding (3, 1). The respective transition probabilities for a random walk are also visualized in Fig. 6. Furthermore, there are three permutations with cycle lengths (2, 2): the permutations 2143, 3412 and 4321. The remaining eight permutations of the third column in Fig. 2 have cycle lengths (3, 1), and the six permutations in the last column of Fig. 2 consist of a single cycle of length four. The values of \(h_\tau \) satisfying Eq. 11 are \(h_{(1,1,1,1)}=0\), \(h_{(2,1,1)}=23\), \(h_{(2,2)}=27\), \(h_{(3,1)}=\frac{105}{4}\) and \(h_{(4)}=\frac{55}{2}\), and hence, weighting each cycle type by its number of permutations, \(T_{{\mathrm{sort}}}(4)=\frac{1}{4!}\cdot (h_{(1,1,1,1)}+6\cdot h_{(2,1,1)}+3\cdot h_{(2,2)}+8\cdot h_{(3,1)}+6\cdot h_{(4)})=\frac{99}{4}=4!+\frac{3}{4}\).
Please note that this value does not rely on experiments; the computation yields the exact expected optimization time.
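The stated values of \(h_\tau \) and \(T_{{\mathrm{sort}}}(4)\) can be reproduced by solving the hitting-time system of the lumped chain directly. Below is a minimal sketch for the pure random-walk transitions discussed above (the matrix encoding and helper code are ours):

```python
from fractions import Fraction as F

# States ordered (2,1,1), (2,2), (3,1), (4); h_(1,1,1,1) = 0.
# Each row counts, out of the 6 possible transpositions, how many lead
# to each state (counts taken from the discussion of Fig. 6).
P = [
    [0, 1, 4, 0],  # from (2,1,1): 1 -> identity, 1 -> (2,2), 4 -> (3,1)
    [2, 0, 0, 4],  # from (2,2):   2 -> (2,1,1), 4 -> (4)
    [3, 0, 0, 3],  # from (3,1):   3 -> (2,1,1), 3 -> (4)
    [0, 2, 4, 0],  # from (4):     2 -> (2,2), 4 -> (3,1)
]

# Solve h = 1 + (P/6) h, i.e. (I - P/6) h = 1, by Gauss-Jordan
# elimination over the rationals.
A = [[F(int(i == j)) - F(P[i][j], 6) for j in range(4)] + [F(1)]
     for i in range(4)]
for col in range(4):
    piv = next(r for r in range(col, 4) if A[r][col] != 0)
    A[col], A[piv] = A[piv], A[col]
    A[col] = [v / A[col][col] for v in A[col]]
    for r in range(4):
        if r != col:
            A[r] = [a - A[r][col] * b for a, b in zip(A[r], A[col])]
h = [row[4] for row in A]
assert h == [F(23), F(27), F(105, 4), F(55, 2)]

counts = [6, 3, 8, 6]  # number of permutations of each cycle type
T4 = sum(c * x for c, x in zip(counts, h)) / 24
assert T4 == F(99, 4)  # = 4! + 3/4
```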
Figure 7 shows that the exact value of \(T_{{\mathrm{sort}}}(n)\) divided by n! tends to one. Proving that this behavior persists for large n would imply \(T_{{\mathrm{sort}}}(n) =\Theta (n!)\) and hence Conjecture 1.
Appendix B: Evidence for Conjecture 2
For the sorting problem, there are the following gaps between the lower and upper bound on the return time to the attractor. For \(0<c<1/2\) the expected number of iterations to return to the attractor ranges from \(\varOmega ^{*}(\alpha (c)^n)\) to \(O^*\left( \left( (1-c)/c \right) ^n \right) \), and for \(c=1/2\) it ranges from \(\varOmega (n^{2/3})\) to O(n). We provide a simplified Markov model based on an averaging argument and conjecture that the simplified model and the actual model are asymptotically equivalent.
To improve the understanding of the search space of permutations we approximate the improvement probabilities in order to approximate the expected return time to the attractor. For this purpose, instead of using upper and lower bounds on the probability of moving towards the attractor, we use its average value. If, for every distance, each permutation at that distance to the attractor were equally likely, this approximation would even yield the exact values.
Conjecture 3
Let \(H_1\) be the expected number of iterations until the OnePSO returns to the attractor g if the current distance to the attractor is one while optimizing the sorting problem. Let \(p_x\) be the probability to move from permutation x to a permutation y such that \(d (x,g)=1+d (y,g)\). Let
$$\begin{aligned} {{\hat{p}}}_i = \frac{1}{|\{x \mid d (x,g)=i\}|} \sum _{x :\, d (x,g)=i} p_x \end{aligned}$$
be the average probability to reduce the distance to the attractor. Let \({{\hat{H}}}_1\) be the expected number of iterations in \({{\hat{M}}}={{\mathscr {M}}}\left( ({{\hat{p}}}_i)_{1\le i\le n-1}\right) \) to move from state \(S_1\) to \(S_0\) (see Definition 1). We conjecture that \(H_1\sim {{\hat{H}}}_1\).
To provide evidence we compute these average improvement probabilities and compare the resulting expected numbers of iterations.
Theorem 17
The average improvement probability of moving towards the attractor while optimizing the sorting problem is
$$\begin{aligned} {{\hat{p}}}_i = c + (1-c)\sum _{k=1}^{i+1} \frac{\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) (k-1)!\left[ \begin{array}{c} n-k \\ n-i-1 \end{array}\right] }{\left[ \begin{array}{c} n \\ n-i \end{array}\right] }\cdot \frac{k-1}{n-1}, \end{aligned}$$
where i is the distance to the attractor and \(\left[ \begin{array}{c} n \\ m \end{array}\right] \) denotes the unsigned Stirling numbers of the first kind.
Proof
The unsigned Stirling numbers of the first kind \(\left[ \begin{array}{c} n \\ m \end{array}\right] \) count the permutations of n elements with exactly m cycles and can easily be calculated by the recursive formula
$$\begin{aligned} \left[ \begin{array}{c} n \\ m \end{array}\right] = \left[ \begin{array}{c} n-1 \\ m-1 \end{array}\right] + (n-1)\cdot \left[ \begin{array}{c} n-1 \\ m \end{array}\right] \end{aligned}$$
with \(\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] =1\) and \(\left[ \begin{array}{c} n \\ 0 \end{array}\right] =\left[ \begin{array}{c} 0 \\ m \end{array}\right] =0\) for \(n,m>0\).
W.l.o.g. let the attractor be the identity permutation. Then the attractor has n singleton cycles. Increasing the distance to the attractor by one is equivalent to decreasing the number of cycles by one. Therefore a permutation at distance i to the attractor has exactly \(n-i\) cycles, and the number of permutations at distance i is \(\left[ \begin{array}{c} n \\ n-i \end{array}\right] \). The probability that a fixed item is in a cycle of length k among all permutations at distance i from the attractor is
$$\begin{aligned} \frac{\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) (k-1)!\left[ \begin{array}{c} n-k \\ n-i-1 \end{array}\right] }{\left[ \begin{array}{c} n \\ n-i \end{array}\right] }. \end{aligned}$$
Choosing the remaining \(k-1\) items in the cycle of length k from the remaining \(n-1\) items allows \(\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) \) options. There are \((k-1)!\) orderings of these items within the cycle. The remaining \(n-k\) items have to be partitioned into \(n-i-1\) cycles, which contributes another factor of \(\left[ \begin{array}{c} n-k \\ n-i-1 \end{array}\right] \). Together with the first cycle of length k this yields a permutation with \(n-i\) cycles.
This probability does not change if we pick a random item instead of a fixed one. Furthermore, the probability of moving towards the attractor equals the probability that a cycle is split into two cycles, which happens exactly if two items of the same cycle are picked for an exchange. If the first picked item lies in a cycle of length k, then the probability that the second item lies in the same cycle is \(\frac{k-1}{n-1}\). Summing these probabilities over all possible cycle lengths of the first picked item yields the claimed formula for \({{\hat{p}}}_i\); in addition, the constant c of the OnePSO comes into play, as with probability c a move towards the attractor is forced. Please note that the maximal cycle length at distance i from the attractor is \(i+1\), which explains the upper limit of the sum. \(\square \)
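The combinatorial part of the formula (the \(c=0\) case) can be checked against a brute-force enumeration of the symmetric group. A sketch under that assumption, with helper names of our own choosing:

```python
from fractions import Fraction as F
from itertools import combinations, permutations
from math import comb, factorial

def num_cycles(p):
    # number of cycles in the permutation p (0-indexed one-line notation)
    seen, cnt = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cnt += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return cnt

def stirling1u(n, m):
    # unsigned Stirling numbers of the first kind via the recurrence
    if n == 0:
        return 1 if m == 0 else 0
    if m == 0:
        return 0
    return stirling1u(n - 1, m - 1) + (n - 1) * stirling1u(n - 1, m)

n = 5
for i in range(1, n):
    # permutations at transposition distance i from the identity
    perms = [p for p in permutations(range(n)) if n - num_cycles(p) == i]
    assert len(perms) == stirling1u(n, n - i)
    total = F(0)
    for p in perms:
        # count transpositions that split a cycle, i.e. reduce the distance
        good = 0
        for a, b in combinations(range(n), 2):
            q = list(p)
            q[a], q[b] = q[b], q[a]
            if num_cycles(q) == num_cycles(p) + 1:
                good += 1
        total += F(good, comb(n, 2))
    brute = total / len(perms)
    closed = sum(
        F(comb(n - 1, k - 1) * factorial(k - 1)
          * stirling1u(n - k, n - i - 1), stirling1u(n, n - i))
        * F(k - 1, n - 1)
        for k in range(1, i + 2))
    assert brute == closed
```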
By using these average probabilities of moving towards the attractor we obtain a Markov chain in which, from state \(S_i\), only the states \(S_{i-1}\) and \(S_{i+1}\) can be reached in a single step. For Markov chains with this property the return times can be computed as in Sect. 4.2, which helps us estimate the expected return time to the attractor. If c is zero, then the expected return time from distance one to the attractor is exactly \(n!-1\), and this value is also obtained exactly by the model with average probabilities.
Remark 2

1.
Assume \(T(n)=\gamma (c)^n\cdot f(n)\), where f is a polynomial. Then
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{T(n)}{T(n-1)}=\gamma (c). \end{aligned}$$
2.
Assume \(T(n)=f(n)\), where f is again a polynomial. Then
$$\begin{aligned} \lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( T(n)/T(n-1) \right) \end{aligned}$$is the maximal degree of f(n).
Proof
Let \(f(n)=a\cdot n^b+o(n^b)\), \(b>0\).

1.
Assuming \(T(n)=\gamma (c)^n\cdot f(n)\) leads to
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{T(n)}{T(n-1)}&=\lim _{n\rightarrow \infty }\gamma (c)\frac{n^b+o(n^b)}{(n-1)^b+o((n-1)^b)}\\&=\lim _{n\rightarrow \infty }\gamma (c)\frac{(\frac{n}{n-1})^b+o(1)}{1+o(1)} =\gamma (c). \end{aligned}$$
2.
Assuming \(T(n)=f(n)\) leads to
$$\begin{aligned} \lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( \frac{T(n)}{T(n-1)}\right)&=\lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( \frac{(\frac{n}{n-1})^b+o(1)}{1+o(1)}\right) \\&=\lim _{n\rightarrow \infty }\log _{\frac{n}{n-1}}\left( \left( \frac{n}{n-1}\right) ^b\right) =b. \end{aligned}$$
\(\square \)
Let
$$\begin{aligned} q_{ex}(n,c) = \frac{H_1}{H_1'}, \end{aligned}$$
where \(H_1\) and \(H_1'\) are the expected return times to the attractor for the sorting problem on n and \(n-1\) items, respectively, if the attractor has transposition distance one to the current position (actual exact model).
Let additionally
$$\begin{aligned} q_{av}(n,c) = \frac{{{\hat{H}}}_1}{{{\hat{H}}}_1'}, \end{aligned}$$
where \({{\hat{H}}}_1\) and \({{\hat{H}}}_1'\) are the corresponding return times in the Markov model with the average success probabilities specified in Theorem 17.
Figure 8 shows \(q_{ex}\) and \(q_{av}\) for different values of n, together with the upper bound \((1-c)/c\) and the lower bound \(\alpha (c)\) on the base of the exponential part of the return time for the sorting problem.
By the first part of Remark 2, \((1-c)/c\) is also an upper bound and \(\alpha (c)\) a lower bound on the limit of \(q_{ex}\) as n tends to infinity, since \(q_{ex}\) tends to the actual base of the exponential part of the expected return time.
Since computing \(q_{ex}\) requires solving a system of linear equations whose number of variables equals the number of integer partitions of n (see the description after Conjecture 1), we cannot evaluate \(q_{ex}\) for large n. The values of \(q_{av}\) are quite similar to the values of \(q_{ex}\) for corresponding n. For \(c=0\) the values coincide exactly, and for \(n=30\) the relative error is less than 0.04. Therefore we conjecture that the limit of \(q_{av}\) as n tends to infinity is close or even equal to the limit of \(q_{ex}\). As can be seen in Fig. 8, the values of \(q_{av}\) tend to the upper bound \((1-c)/c\). We omitted the graph of \(q_{av}(10{,}000,c)\) as it almost completely overlaps the graph of the upper bound. It is therefore reasonable to conjecture that the limit of \(q_{ex}\) is close to the upper bound \((1-c)/c\). If this is indeed true, then in all runtime results the value of \(\alpha (c)\) can be replaced by \(((1-c)/c)-\varepsilon \) for some small nonnegative value \(\varepsilon \), which could even be zero.
By the second part of Remark 2, the limit of \(\log _{\frac{n}{n-1}}q_{ex}(n,1/2)\) as n tends to infinity supplies the exponent of the largest monomial (possibly omitting logarithmic factors) of the return time for \(c=1/2\). Figure 9 shows the quotients \(q_{ex}(n,1/2)\) for n up to 40 and \(q_{av}(n,1/2)\) for even larger values of n. Here, too, \(q_{av}\) can be used as an approximation of \(q_{ex}\), and it is reasonable to assume that the limit is one. Note that Theorem 10 implies \(\lim _{n\rightarrow \infty }q_{ex}(n,1/2)\le 1\). Similarly to the exponential case with \(c<1/2\), we conjecture that the actual expected value \(H_1\) is close to the upper bound on \(H_1\) derived in the proof of Theorem 10.
Using these results as lower bounds on the expected optimization time would yield the bounds specified in Conjecture 2. These bounds are only a factor of \(\log (n)\) away from the upper bounds given in Theorem 10.
Please note that the results in this section do not rely on experiments; instead, exact values of the expectation are computed. Nevertheless, we have performed some experiments with small values of n, and the measured optimization times comply with the computed exact optimization times. In particular, for values of c close to zero, optimization times of up to n! are predicted, which cannot be confirmed in reasonable time for larger values of n.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Mühlenthaler, M., Raß, A., Schmitt, M. et al. Exact Markov chain-based runtime analysis of a discrete particle swarm optimization algorithm on sorting and OneMax. Nat Comput 21, 651–677 (2022). https://doi.org/10.1007/s11047-021-09856-0