Exact Markov Chain-based Runtime Analysis of a Discrete Particle Swarm Optimization Algorithm on Sorting and OneMax

Meta-heuristics are powerful tools for solving optimization problems whose structural properties are unknown or cannot be exploited algorithmically. We propose such a meta-heuristic for a large class of optimization problems over discrete domains based on the particle swarm optimization (PSO) paradigm. We provide a comprehensive formal analysis of the performance of this algorithm on certain "easy" reference problems in a black-box setting, namely the sorting problem and the problem OneMax. In our analysis we use a Markov model of the proposed algorithm to obtain upper and lower bounds on its expected optimization time. Our bounds are essentially tight with respect to the Markov model. We show that for a suitable choice of algorithm parameters the expected optimization time is comparable to that of known algorithms and, furthermore, that for other parameter regimes the algorithm behaves less greedily and more exploratively, which can be desirable in practice in order to escape local optima. Our analysis provides precise insight into the tradeoff between optimization time and exploration. To obtain our results we introduce the notion of indistinguishability of states of a Markov chain and provide bounds on the solution of a recurrence equation with non-constant coefficients by integration.


Introduction
Meta-heuristics are very successful at finding good solutions for hard optimization problems in practice. However, due to the nature of such algorithms and the problems they are applied to, it is generally very difficult to derive performance guarantees, or to determine the number of steps it takes until an optimal solution is found. In the present work we propose a simple adaptation of the particle swarm optimization (PSO) algorithm introduced by [EK95,KE95] to optimization problems over discrete domains. Our proposed algorithm assumes very little about the problem structure and consequently, it works naturally for a large class of discrete domains. It is reasonable to expect from a meta-heuristic that it solves black-box versions of many tractable problems in expected polynomial time. We provide a formal analysis based on Markov chains and establish under which conditions our algorithm satisfies this basic requirement. More concretely, we consider two classical problems that are easy to solve in a non-black-box setting, namely the problem of sorting items by transpositions and the problem ONEMAX, which asks to maximize the number of ones in a bitstring. Our analysis gives precise information about the expected number of steps our algorithm takes in order to solve these two reference problems. Our runtime bounds are essentially tight with respect to the Markov process we use to model the behavior of the algorithm.
For practical purposes, a meta-heuristic should, in one way or another, incorporate the following two general strategies: i) find an improving solution locally (often referred to as exploitation) and ii) move to unexplored parts of the search space (often referred to as exploration). The first strategy essentially leads the algorithm to a local optimum while the second one helps the algorithm to avoid getting stuck when it is close to a local optimum. For our proposed algorithm, as for many other meta-heuristics, the tradeoff between the two strategies can be conveniently set by an algorithm parameter. Our analysis shows that there is a sharp threshold with respect to this parameter, where the expected runtime of the algorithm on the reference problems turns from polynomial to exponential. Hence, we can maximize the algorithm's ability to escape local optima while still maintaining polynomial runtime on the reference problems.
A key tool for the runtime analysis of meta-heuristics for optimization problems over discrete domains is the fitness level method pioneered by [Weg02]. The basic idea is to consider the level sets of the objective function of a problem instance and to determine the expected number of steps an algorithm takes to move to a better level set. This approach has been used extensively in the study of so-called elitist (1 + 1)-EAs [Weg02,DJW02,GW03,Sud13]. These algorithms keep a single "current" solution and update this solution only if a better one is found. Our analysis of the proposed PSO algorithm also relies on the fitness level method. However, since our algorithm also considers non-improving solutions in order to escape local optima, a much more involved analysis is required in order to determine the time it takes to move to a better fitness level.
In the following, the term expected optimization time refers to the expected number of evaluations of the objective function an algorithm performs until an optimal solution is found. Before giving a precise statement of our results we provide some background information on the PSO algorithm as well as the two reference problems we consider.

Particle Swarm Optimization
The PSO algorithm was introduced in [EK95,KE95] and is inspired by the social interaction of bird flocks. Fields of successful application of PSO are, among many others, Biomedical Image Processing [SSW15, WSZ + 04], Geosciences [OD10], Agriculture [YWL + 17], and Materials Science [RPPN09]. In the continuous setting, it is known that the algorithm converges to a local optimum under mild assumptions [SW15]. The algorithm has been adapted to various discrete problems and several results are available, for instance for binary problems [SW10] and the traveling salesperson problem (TSP) [HMHW11]. A PSO algorithm manages a collection (called swarm) of particles. Each particle consists of an (admissible) solution together with a velocity vector. Additionally, each individual particle knows its local attractor, which is the best solution found by that particle. Information between particles is shared via a common reference solution called the global attractor, which is the best solution found so far by all particles. In each iteration of the algorithm, the solution of each particle is updated based on its relative position with respect to the attractors and some random perturbation. Algorithm parameters balance the influence of the attractors and the perturbation and hence give a tradeoff between the two general search strategies "exploration" and "exploitation". Although PSO was originally proposed to solve optimization problems over a (typically rectangular) domain X ⊆ R^n, several authors have adapted PSO to discrete domains. This requires a fundamental reinterpretation of the PSO movement equation, because the corresponding mathematical operations of a vector space are typically lacking in the discrete setting. An early discrete PSO variant is the binary PSO presented in [KE97] for optimizing over X = {0, 1}^n, where velocities determine the probabilities with which a bit is set to zero or one in the next iteration.
A PSO variant for optimizing over general integral domains X = {0, 1, . . ., M − 1} n , M ∈ N, has been proposed in [VOK07].

Problems and Search Spaces
In this section we briefly define the optimization problems for which we will study the performance of the proposed PSO algorithm. The problem ONEMAX asks for a binary string of length n that maximizes the function ONEMAX(x_1, . . . , x_n) = x_1 + · · · + x_n, which counts the number of ones in a binary string. A more general version of this problem asks to minimize the Hamming distance to an unknown binary string of length n. The proposed algorithm works exactly the same on the more general problem since it is indifferent to the actual bit values and each bit is handled independently. Therefore, the performance of our algorithm on this more general version is equal to its performance on ONEMAX. The corresponding search space is the n-dimensional hypercube: Any binary string of length n is a (feasible) solution, and two solutions are adjacent iff they differ by exactly one bitflip. For n = 4, the search space is shown in Figure 1. More generally, a pseudo-Boolean function is any function f : {0, 1}^n → R.
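As a minimal illustration (the function names are ours), the ONEMAX objective and the bitflip neighborhood of the hypercube can be written as:

```python
import random

def onemax(x):
    """ONEMAX objective: the number of ones in the bitstring."""
    return sum(x)

def random_hypercube_neighbor(x, rng=random):
    """Sample u.a.r. a neighbor in the n-dimensional hypercube (one bitflip)."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] ^= 1
    return y

x = [1, 0, 1, 1]
assert onemax(x) == 3
# a neighbor differs from x in exactly one position
assert sum(a != b for a, b in zip(x, random_hypercube_neighbor(x))) == 1
```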
By the sorting problem we refer to the task of arranging n items in non-decreasing order using transpositions. An (algebraic) transposition t = (i j) is the exchange of the entries at positions i and j. Therefore, the search space is the following (undirected) graph: The vertices are the permutations on {1, 2, . . ., n} and two vertices x, y are adjacent iff there is a transposition t such that x • t = y. The objective function is the transposition distance to the identity permutation. Figure 2 shows the search space for the problem of sorting the items {1, 2, 3, 4} using transpositions. Any two permutations drawn in the same vertical layer have the same objective value.
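The transposition distance to the identity can be computed from the cycle structure: a permutation of n items with k cycles has distance exactly n − k, since each transposition changes the number of cycles by one. A sketch using 0-based permutations (the function name is ours):

```python
def transposition_distance(perm):
    """Distance to the identity: n minus the number of cycles of the
    permutation (each transposition merges or splits cycles, changing
    their count by exactly one)."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return n - cycles

assert transposition_distance([0, 1, 2, 3]) == 0  # identity
assert transposition_distance([1, 0, 2, 3]) == 1  # one transposition
assert transposition_distance([1, 2, 3, 0]) == 3  # a single 4-cycle
```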
The sorting problem and ONEMAX have a unique optimum. Furthermore, the value of the objective function is the distance to the unique optimal solution in the corresponding search space graph.

Our Contribution
We propose a simple adaptation of the PSO algorithm to optimization problems over discrete domains. We refer to this algorithm as D-PSO. The algorithm works naturally on a large class of discrete problems, for instance optimization problems over bitstrings, integral domains, and permutations. The general task is to optimize a function f : X → R, where X is a finite set of feasible solutions. Our assumptions on the problem structure are the following. We assume that the set X is the vertex set of a finite, strongly connected graph and that for any solution x ∈ X we can sample a neighbor of x efficiently and uniformly. The D-PSO algorithm essentially explores this graph, looking for an optimal vertex. In our analysis, we assume at first a swarm size of one as in [MRS + 17], similar to the analysis of EAs and ACO in [SW10]. We refer to the corresponding specialization of D-PSO as ONEPSO. Indeed, for a single particle we have only a single attractor and, as a consequence, a single parameter is sufficient to control the tradeoff between moving towards the attractor and performing a random perturbation. Our main results are upper and lower bounds on the expected optimization time of the proposed ONEPSO algorithm for solving the sorting problem and ONEMAX in a black-box setting, summarized in Table 1. Certainly there are faster algorithms for the sorting problem or ONEMAX in a non-black-box setting, e. g., quicksort for the sorting problem. The upper bounds we prove for ONEPSO naturally hold for D-PSO, and the bounds are tight with respect to our Markov model. The algorithm parameter c determines the probability of making a move towards the attractor. Depending on the parameter c, we obtain a complete classification of the expected optimization time of ONEPSO. For c = 0, ONEPSO performs a random walk on the search space, and for c = 1, the algorithm behaves like randomized local search (see [PSY90] or [DN20] for results on local search variants).
For c ∈ (1/2, 1), ONEPSO behaves essentially like the (1 + 1)-EA variants from [DJW02,STW04]: these variants perform in expectation a constant number of elementary mutations to obtain an improved solution, and ONEPSO with c ∈ (1/2, 1) likewise performs in expectation a constant number of elementary mutations before it returns to the current best solution. In this sense ONEPSO generalizes the (1 + 1)-EA, since the parameter choice c ∈ [0, 1] supplies a broader range of behaviors than exploring only solutions which are in expectation a constant number of elementary mutations away from the current best solution. If c < 1, then ONEPSO uses subroutines similar to those of the Metropolis algorithm (see [MRR + 53]), but in ONEPSO new positions are always accepted and guidance towards good solutions is instead implemented by a "drift" to the best position found so far.
The bound for the sorting problem at c = 0 is conjectured. All other bounds, including the lower bound Ω(n!), are proved formally in this work.
Indeed, the bounds on the expected optimization time for ONEMAX and the upper bounds on the expected optimization time for the sorting problem of ONEPSO with parameter c ∈ (1/2, 1] match the respective bounds on the expected optimization time for the (1 + 1)-EA variants from [DJW02,STW04]. We show that for c ∈ [1/2, 1) the expected optimization time of ONEPSO for sorting and ONEMAX is polynomial, and for c ∈ (0, 1/2) the expected optimization time is exponential. For c in the latter range we bound the base of the exponential expression from below by α(c) for sorting and β(c) for ONEMAX, where 1 < β(c) < α(c) < 6 (see Figure 5). Please note that α and β have been significantly improved compared to the conference version [MRS + 17], and the upper bound for ONEMAX has also been reduced substantially, to an exponential term with base β(c). This means that the lower and upper bounds on the base of the exponential expression for the expected optimization time are equal. Note that for c = 1/2 the expected time it takes to visit the attractor again after moving away from it is maximal while the expected optimization time remains polynomial, i. e., we have a phase transition at c = 1/2 such that for any c ≥ 1/2 the expected optimization time is polynomial and for any c < 1/2 it is exponential in n. Hence, the parameter choice c = 1/2 maximizes the time until the attractor is visited again, i. e., the particles can explore the search space to the largest possible extent, while ONEMAX and the sorting problem are still solved efficiently in a black-box setting.
In order to obtain the bounds shown in Table 1, we use a Markov model which captures the behavior of the ONEPSO algorithm between two consecutive updates of the attractor. Depending on whether we derive upper or lower bounds on the expected optimization time, the Markov model is instantiated in a slightly different way. The relevant quantity we extract from the Markov model is the expected number of steps it takes until the ONEPSO algorithm returns to the attractor. We determine Θ-bounds on the expected return time by an analysis of appropriate recurrence equations. Similar recurrences occur, for example, in the runtime analysis of randomized algorithms for the satisfiability problem [Pap91,Sch99]. Thus, our analysis of the Markov model presented in Section 3 may be of independent interest. For c > 1/2, the recurrence equations can be solved using standard methods. For the parameter choice c ≤ 1/2, however, we need to solve recurrence equations with non-constant coefficients in order to get sufficiently accurate bounds from the model. The gaps between the upper and lower bounds on the expected optimization times shown in Table 1 result from choosing best-case or worst-case bounds on the transition probabilities in the Markov model, which are specific to the optimization problem. Since our bounds on the transition probabilities are essentially tight, we can hope to close the gap between the upper and lower bounds for the sorting problem (especially in the exponential case) only by using a more elaborate model. Furthermore, based on Wald's equation and the Blackwell-Girshick equation, we also obtain the variance of the number of function evaluations needed to find an optimal solution with respect to the Markov model.
Upper Bounds. To obtain the upper bounds shown in Table 1 we use the established fitness level method (e. g., see [Weg02]). We instantiate our Markov model such that improvements of the attractor are only accounted for if the current position is at the attractor. The main difficulty is to determine the expected number of steps needed to return to the attractor. We obtain this quantity from the analysis of the corresponding recurrences with constant and non-constant coefficients. Furthermore, by integration we obtain closed-form expressions for the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution.
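As a concrete instance of the fitness level method: for ONEPSO with c = 1 (randomized local search) on ONEMAX, a solution at distance i from the optimum is improved by a uniformly random bitflip with probability i/n, so summing the expected waiting times over all fitness levels gives the familiar O(n log n) bound. A numeric sketch (the function name is ours):

```python
from math import log

def fitness_level_bound(n):
    """Fitness level bound for randomized local search on ONEMAX:
    at distance i the improvement probability is i/n, so the expected
    optimization time is at most sum_{i=1}^{n} n/i = n * H_n."""
    return sum(n / i for i in range(1, n + 1))

n = 1000
bound = fitness_level_bound(n)
# n * H_n lies between n * ln(n) and n * (ln(n) + 1), i.e. it is O(n log n)
assert n * log(n) < bound < n * (log(n) + 1)
```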
Lower Bounds. The runtime of the ONEPSO algorithm is dominated by the time required for the last improvement of the attractor, after which the global optimum has been found. We again use the Markov model and observe that in this situation, the global optimum can be reached only when the Markov model is in a specific state. We argue that the optimal solution is included in a certain set Ŷ of indistinguishable states. Therefore, in expectation, this set needs to be hit Ω(|Ŷ|) times until the optimum has been found. By evaluating the return time to the attractor we also obtain bounds on the return time to the set Ŷ. Furthermore, for D-PSO with a constant number of particles, we give a lower bound of Ω(log n) for optimizing a pseudo-Boolean function, and for P = poly(n) particles a stronger lower bound of Ω(P · n) for the same task.
Open Problems. Finally, we conjecture that the expected optimization time of ONEPSO for sorting n items is asymptotically equivalent to n! if the attractor is not used at all (c = 0). An equivalent statement is that a random walk on the set of permutations of n items using single transpositions as neighborhood relation asymptotically takes expected time n! to discover a fixed given permutation starting at a random position. We provide theoretical evidence for this conjecture in Appendix A. Furthermore, we conjecture stronger lower bounds for ONEPSO for sorting n items for c > 0 and provide evidence in Appendix B.
Extensions. This article extends the conference paper [MRS + 17] as follows. We present D-PSO, a discrete PSO algorithm with multiple particles, in Section 2 and show in Section 5.3 which results for ONEPSO generalize to D-PSO. For pseudo-Boolean functions, a new general lower bound is presented, which holds for D-PSO (see Subsection 5.4). Furthermore, we give a refined analysis in the case of exponential runtime which allows us to determine the exact base of the exponential expression (see Section 4.4). In addition to the analysis of the expected optimization time, we also consider the variance of the optimization time (see Subsection 4.5). We conjecture that the expected number of steps needed until the random walk on the space of permutations hits a fixed permutation is n!. We provide evidence for this conjecture in Appendix A. Finally, we present an approximate model of the algorithm with average transition probabilities to obtain the actual bounds where lower and upper bounds on the expected optimization time differ (see Appendix B). Also, some theorems have been extended to larger classes of functions, which may make them more useful in other settings.

Related Work
Runtime results are available for several other meta-heuristics for optimization problems over discrete domains, for example evolutionary algorithms (EAs) [DJW02, GW03, Weg02, ADFH18, ADY19] and ant colony optimization (ACO) [DNSW07,NW07,ST12]. Most of the results relevant to this work concern the binary PSO algorithm and the (1 + 1)-EA algorithm. For the binary PSO, [SW08,SW10] provide various runtime results. For instance, they give a general lower bound of Ω(n/log n) for every function with a unique global optimum and a bound of Θ(n log n) for the function ONEMAX. Note that the binary PSO studied in [SW10] has been designed for optimizing over {0, 1}^n and is different from our proposed D-PSO, which can be applied to a much wider range of discrete problems. Sudholt and Witt show the following bound for the binary PSO. Essentially, this result reflects the fact that the binary PSO converges to the attractor in expected time O(n log n) unless the attractor has been updated meanwhile. This happens once for each fitness level. For ONEMAX, this result yields an expected optimization time of O(n^2 log n). By a more careful analysis of the binary PSO on ONEMAX, the following improved bound is established: Theorem 2. [SW10, Thm. 5] The expected optimization time of the binary PSO with a single particle optimizing ONEMAX is O(n log n).
The (1 + 1)-EA considered in [STW04] is reminiscent of stochastic hill climbing: In each iteration, a random solution is sampled, and the current solution is replaced if and only if the new solution is better. In order to escape local optima, the distance between the current solution and the new one is determined according to Poisson-distributed random variables. [STW04] provide bounds on the expected optimization time of a (1 + 1)-EA sorting n items. They consider various choices of objective functions (e. g., Hamming distance, transposition distance, . . . ) as well as mutation operators (e. g., transpositions, reversing keys in a certain range, . . . ). A general lower bound of Ω(n^2) is proved, which holds for all permutation problems having objective functions with a unique optimum [STW04, Thm. 1]. The most relevant runtime result for a comparison with our D-PSO algorithm is the following: The upper bound can be obtained by the fitness level method and a lower bound of Ω(k/n^2) on the probability of improvement when the current solution is at transposition distance k from the attractor. The lower bound follows from a similar argument. In addition, for the lower bound [STW04] consider the Hamming distance to evaluate the distance between the current position and the optimum, although the algorithm still uses the transposition distance to decide which position is better. In the conference version [MRS + 17] we incorrectly claimed that the proof does not apply to this setting; thanks to an anonymous reviewer, we correct this statement here.
In contrast to the (1 + 1)-EA algorithm, the binary PSO studied in [SW10] allows for non-improving solutions, but it converges to the attractor exactly once per fitness level. After this convergence has occurred, the binary PSO behaves essentially like the (1 + 1)-EA.
Additionally, [RSW19] is based on a preliminary version of the present paper. There the authors applied ONEPSO to the single-source shortest path problem. For this purpose they extended the Markov model presented here by allowing self-loops. They also used the bounds by integration proved in this work without repeating the proofs. The upper and lower bounds on the expected optimization time given there depend on the algorithm parameter c, which specifies the probability of moving towards the attractor.

Organization of the Paper
In Section 2 we introduce the algorithm D-PSO for solving optimization problems over discrete domains. In Section 3 we provide a Markov model for the behavior of the algorithm ONEPSO -a restriction of D-PSO to one particle -between two updates of the local attractor. Section 4 contains a comprehensive analysis of this Markov model. The results from this section are used in Section 5 in order to obtain the bounds on the expected optimization time for ONEPSO shown in Table 1 as well as lower bounds for D-PSO on pseudo-Boolean functions. Section 6 contains some concluding remarks.

Discrete PSO Algorithm
In this section we introduce the D-PSO algorithm, a PSO algorithm that optimizes functions over discrete domains. A simplified version of the algorithm that uses just a single particle will be referred to as ONEPSO. Note that ONEPSO is different from the 1-PSO studied in [SW10], which is tailored to optimization over bitstrings. The D-PSO and ONEPSO algorithms sample items from a finite set X in order to determine some x* ∈ X that minimizes a given objective function f : X → R. In order to have a discrete PSO that remains true to the principles of the original PSO for optimization in the domain R^n from [EK95,KE95], we need some additional structure on X : For each x ∈ X we have a set of neighbors N_X(x). If the set X is clear from the context we may drop the subscript. The neighborhood structure induces a solution graph with nodes X and arcs {xy | x, y ∈ X, y ∈ N(x)}. The distance d(x, y) of solutions x, y ∈ X is the length of a shortest (directed) xy-path in this graph. We assume that the solution graph is strongly connected, so the PSO cannot get "trapped" in a particular strongly connected component. The search spaces of our reference problems, in combination with the neighborhood relations used, satisfy this assumption.
The D-PSO algorithm performs the steps shown in Algorithm 1. The initial positions of the particles are chosen uniformly at random (u.a.r.) from the search space X. The parameter c_loc determines the importance of the local attractor l_i of each particle, which is the best solution that particle has found so far, and the parameter c_glob determines the importance of the global attractor g, which is the best solution all particles have found so far. In each iteration, each particle moves towards its local attractor with probability c_loc, moves towards the global attractor with probability c_glob, and otherwise moves to a random neighbor. If l_i equals g, then the particle still moves towards g only with probability c_glob. Note that the attractors l_i and g are updated in lines 21 and 24 whenever a strictly better solution has been found.
Alternatively, one could update the local and global attractors whenever the new position is at least as good as the position of the attractor (use ≤ instead of < in lines 21 and 24). The theorems presented here carry over to this modified setting. Admittedly, for functions with plateaus this version potentially performs better, since a plateau is traversed easily by the modified algorithm. However, the problems considered in this work have no plateaus. Additionally, the probability of improving the objective function when position and attractor differ but have equal objective value is higher than when the position coincides with the attractor.
At first glance, Algorithm 1 may not seem like an implementation of the PSO ideas, since each move uses only a single attractor or is even purely random, whereas the classical PSO uses the local and the global attractor in every move. However, over several consecutive iterations the tendency of movement towards all attractors is retained. We consider the PSO to be an infinite process, so we do not give a termination criterion. We assume that the sampling of x′ in lines 11, 14 and 17 can be performed efficiently. This is the case for the neighborhood structures we consider.
The algorithm ONEPSO is simply given by Algorithm 1 with a single particle, i. e., we have P = 1. Note that there is only a single attractor in this case. Hence, ONEPSO has just a single parameter c = c_glob that determines the probability of moving towards the (global) attractor g. In all other aspects it behaves like the D-PSO algorithm.
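A minimal Python sketch of ONEPSO specialized to ONEMAX (maximization) may clarify the control flow. The function name and the termination criterion are ours, since the paper treats the PSO as a non-terminating process; moving "towards" the attractor here means flipping a bit in which the position and the attractor differ:

```python
import random

def onepso_onemax(n, c, seed=0):
    """ONEPSO sketch on ONEMAX: with probability c move one step towards the
    attractor, otherwise step to a uniformly random neighbor; the attractor
    is updated whenever a strictly better solution is found."""
    rng = random.Random(seed)
    f = lambda x: sum(x)                        # ONEMAX objective
    x = [rng.randint(0, 1) for _ in range(n)]   # initial position u.a.r.
    g = list(x)                                 # attractor: best solution so far
    evals = 1
    while f(g) < n:                             # stop once the optimum is found
        if x != g and rng.random() < c:
            # towards the attractor: flip a position where x and g differ
            i = rng.choice([j for j in range(n) if x[j] != g[j]])
        else:
            i = rng.randrange(n)                # random neighbor: flip one bit
        x[i] ^= 1
        evals += 1
        if f(x) > f(g):                         # strict improvement
            g = list(x)
    return evals

# for c >= 1/2 the optimum is typically found after polynomially many evaluations
assert onepso_onemax(10, 0.5) >= 1
```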

Markov Model of ONEPSO
We present a simple Markov model that captures the behavior of the ONEPSO algorithm between two consecutive updates of the attractor. This model has already been presented in [MRS + 17], but we repeat it here to give a self-contained overview of the approach. As an extension to [MRS + 17] we also show how the variance can be computed, which is relevant for experiments and for practical use of ONEPSO and D-PSO. Using this model we can infer upper and lower bounds on the expected optimization time of the ONEPSO algorithm on suitable discrete functions. For our analysis, we assume that the objective function f : X → R has the property that every local optimum is a global one. That is, the function is either constant or any non-optimal solution x has a neighbor y ∈ N(x) such that f(x) > f(y). Functions with this property are called unimodal functions. Although this restriction certainly narrows down the class of objective functions to which our analysis applies, the class is still appreciably large; e. g., it properly contains the class of functions with a unique global optimum and no further local optima.

Figure 3: State diagram of the Markov model
Assume that the attractor g ∈ X is fixed and g is not a minimizer of f. Under which conditions can a new "best" solution be found? Certainly, if the current position x is equal to g, then, by the described structure of f, we get an improvement with positive probability. If x ≠ g, then the attractor may still be improved. However, for the purpose of upper-bounding the expected optimization time of the ONEPSO, we dismiss the possibility that the attractor is improved while x ≠ g. As a result, we obtain a reasonably simple Markov model of the ONEPSO behavior. Quite surprisingly, using the same Markov model, we are also able to get good lower bounds on the expected optimization time of the ONEPSO (see Section 5.2 for the details).
Recall that we think of the search space in terms of a strongly connected graph. Let n be the diameter of the search space X, i. e., the maximum distance between any two points in X. We partition the search space according to the distance to the attractor g ∈ X. That is, for 0 ≤ i ≤ n we let X_i = {x ∈ X | d(x, g) = i}. Note that this partition does not depend on the objective function. If the search space is not symmetric (unlike the search spaces considered here), some X_i could be empty, because the maximum distance to a specific solution can be smaller than the diameter. The model consists of n + 1 states S_0, S_1, . . . , S_n. Being in state S_i indicates that the current solution x of the ONEPSO is in X_i.
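For ONEMAX (attractor g, distance = Hamming distance) the partition can be computed directly: X_i consists of the C(n, i) strings at distance i from g, so no level is empty and the diameter is n. A quick check, assuming for illustration the all-zeros attractor:

```python
from itertools import product
from math import comb

n = 4
g = (0,) * n  # attractor: the all-zeros string, chosen for illustration
levels = {i: 0 for i in range(n + 1)}
for x in product((0, 1), repeat=n):
    d = sum(a != b for a, b in zip(x, g))  # Hamming distance to g
    levels[d] += 1

# |X_i| equals the binomial coefficient C(n, i); no level is empty
assert levels == {i: comb(n, i) for i in range(n + 1)}
```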
For each x ∈ X_i we denote by p_x the transition probability from x to an element of X_{i−1}. The probabilities p_x in turn depend on the parameter c, which is the probability that the ONEPSO explicitly moves towards the attractor. If the current position x is in X_i and the algorithm moves towards the attractor, then the new position is in X_{i−1}. On the other hand, if the PSO updates x to a neighbor chosen u.a.r. from N(x), then the new position is in X_{i−1} ∪ X_{i+1} (see Remark 5). So we obtain the transition probability p_x = c + (1 − c) · |N(x) ∩ X_{i−1}| / |N(x)|. Remark 5. In this work we assume that the probability of moving from a position x ∈ X_i to an element of X_i is zero, i. e., if we move from a position x to a neighboring position x′ ∈ N(x), then the distance to the attractor always changes (d(x, g) ≠ d(x′, g)).
This assumption holds for both problems we investigate in Section 5. Nevertheless, an extension allowing transitions inside a level X_i is possible and has already been considered in [RSW19], which is based on an arXiv preprint of this article.
Using the assumption in Remark 5 and the fact that X_i is defined by the distance to the fixed position g, the probability of moving from x ∈ X_i to an element of X_j with j ∉ {i − 1, i + 1} is zero. Consequently, the probability of moving from x to an element of X_{i+1} is 1 − p_x. Furthermore, if the ONEPSO is at a position x ∈ X_n, then any move brings it closer to the attractor; so p_x = 1 in this case.
Please note that p_x and p_x′ can differ even if x and x′ are contained in the same set X_i. Therefore we do not necessarily obtain a Markov chain by using the states S_i with transition probabilities p_i = p_x for some x ∈ X_i, as this value is not necessarily equal for all x′ ∈ X_i. Nevertheless, we can analyze Markov chains using bounds on the transition probabilities.
To be more precise, we can use p_i := min_{x∈X_i} p_x as a lower bound and p′_i := max_{x∈X_i} p_x as an upper bound on the transition probabilities towards the attractor, and thereby obtain an upper bound and a lower bound, respectively, on the expected number of iterations until the distance to the attractor decreases. Figure 3 shows the state diagram of this model.
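For ONEMAX the situation is particularly simple: every x ∈ X_i has exactly i of its n bitflip-neighbors in X_{i−1}, so under the movement rule described above (towards the attractor with probability c, otherwise to a u.a.r. neighbor) the transition probability p_x = c + (1 − c) · i/n is the same for all x ∈ X_i, and the bounds p_i and p′_i coincide. A brute-force check (helper names are ours):

```python
from itertools import product

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def neighbors(x):
    """All bitflip-neighbors of x in the hypercube."""
    return [x[:j] + (1 - x[j],) + x[j + 1:] for j in range(len(x))]

n, c = 5, 0.5
g = (0,) * n                                    # the attractor
for x in product((0, 1), repeat=n):
    i = hamming(x, g)
    if i == 0:
        continue
    # fraction of neighbors that are one step closer to the attractor
    frac = sum(hamming(y, g) == i - 1 for y in neighbors(x)) / n
    # p_x = c + (1 - c) * |N(x) cap X_{i-1}| / |N(x)| equals c + (1 - c) * i/n
    assert abs((c + (1 - c) * frac) - (c + (1 - c) * i / n)) < 1e-12
```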
By M(p_1, . . . , p_n) we denote an instance of the Markov model with states S_0, S_1, . . . , S_n and data p_i ∈ R_{≥0}, 1 ≤ i ≤ n. Suppose we are in state S_i with 1 ≤ i < n. Then we move to state S_{i−1} with probability p_i and to state S_{i+1} with probability 1 − p_i. Please note that the data does not need to lie in [0, 1], but for ease of presentation we will refer to the p_i as probabilities. Furthermore, if we are in state S_n, then the probability of moving to S_{n−1} is 1 even if p_n ≠ 1. This notation allows us to succinctly specify the currently used Markov model; e. g., the model used to obtain upper bounds is described by M((min_{x∈X_i} p_x)_{1≤i≤n}).
Our goal is to determine the expected number of steps needed to hit a solution which is better than the attractor after starting in S_0. Let p_g be the probability of improving the attractor if we are currently in state S_0, hence at the attractor. The probability p_g depends on f and on the choice of g, and p_g is positive since f is unimodal. In order to reach a better solution from S_0 we need in expectation 1/p_g tries. If a try is unsuccessful, then the ONEPSO moves to S_1. For upper bounds we can ignore the chance of improving the attractor through other states. Thus we need to determine the expected number of steps it takes until we can perform the next try, that is, the expected first hitting time of the state S_0 when starting in S_1. The expected number h_i of steps needed to move from S_i to S_0 is given by the following recurrence:

h_0 = 0,   h_i = 1 + p_i · h_{i−1} + (1 − p_i) · h_{i+1} for 1 ≤ i < n,   h_n = 1 + h_{n−1}.   (1)

Analysis of the Markov Model
In this section we prove upper and lower bounds on the expected return time to the state S_0 of the Markov model from Section 3. These bounds are of key importance for our runtime analysis in Section 5. For the lower bounds we also introduce a notion of indistinguishability of certain states of a Markov model. In our analysis the probabilities p_i are generally not identical. If we assume that p_1 = p_2 = ... = p_n = p, then we obtain a non-homogeneous recurrence of order two with constant coefficients. In this case, standard methods can be used to determine the expected time needed to move to the attractor state S_0 as a function of n [GKP94, Ch. 7]. Note also that for p = 1/k this is exactly the recurrence that occurs in the analysis of a randomized algorithm for k-SAT [Pap91, Sch99] and [MU05, pp. 160f.]. If the p_i are not identical but depend on i, then the recurrence can in some cases still be solved, see, e.g., [GKP94, Ch. 7] and [Pet92]. Here, due to the structure of the recurrence, we can use a more pedestrian approach, which is outlined in the next section.

Reformulation of the Recurrence
We first present a useful reformulation of Recurrence (1). From this reformulation we will derive closed-form expressions and asymptotic properties of the return time to the attractor in terms of the transition probabilities.
Let W_i be the number of steps needed to move from state S_i to state S_{i−1} and let H_i := E[W_i] be its expectation. Then H_i can be determined from H_{i+1} as follows: in expectation we need 1/p_i trials to get from S_i to S_{i−1}, and each trial, except for the successful one, requires 1 + H_{i+1} steps. The successful trial requires only a single step, so H_i is captured by the following recurrence:

H_i = 1/p_i + ((1 − p_i)/p_i) · H_{i+1} for 1 ≤ i < n,   (2)

with initial condition

H_n = 1.   (3)

Another interpretation is the following: W_i is equal to one with probability p_i, the direct step to S_{i−1}, and with probability 1 − p_i it takes the current step, which leads to state S_{i+1}, then W_{i+1} steps to go from S_{i+1} back to S_i, and then again W_i steps. For the expected value of W_i this interpretation leads to the formula H_i = p_i + (1 − p_i)(1 + H_{i+1} + H_i), which is equivalent to Equation (2) after solving for H_i. Please note that the probabilities p_i are mostly given by some function depending on n and i. Unfolding the recurrence in Equation (2) k times, 1 ≤ k < n, followed by some rearrangement of the terms, yields

H_1 = Σ_{j=1}^{k} (1/p_j) Π_{i=1}^{j−1} (1 − p_i)/p_i + H_{k+1} · Π_{i=1}^{k} (1 − p_i)/p_i.   (4)

Thus, for k = n − 1 we obtain the following expression for H_1:

H_1 = Σ_{j=1}^{n} (1/p_j) Π_{i=1}^{j−1} (1 − p_i)/p_i + (1 − 1/p_n) Π_{i=1}^{n−1} (1 − p_i)/p_i,   (5)

where the second term is a correction term which is required whenever p_n < 1 (see Definition 6) in order to satisfy the initial condition given in Equation (3). Equation (5) has also been mentioned in [DJW01, Lemma 3] in the context of the analysis of randomized local search, and in [KK18, Theorem 3]. H_k can be obtained analogously, which leads to

H_k = Σ_{j=k}^{n} (1/p_j) Π_{i=k}^{j−1} (1 − p_i)/p_i + (1 − 1/p_n) Π_{i=k}^{n−1} (1 − p_i)/p_i.   (6)
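The backward recurrence and its unfolded sum-of-products form can be cross-checked numerically. The sketch below assumes the reading described verbally above (H_i = 1/p_i + ((1−p_i)/p_i)·H_{i+1} with H_n = 1, and a correction term that is active exactly when p_n < 1); the two evaluations agree to floating-point accuracy.

```python
def H_recurrence(p):
    """Backward evaluation of H_i = 1/p_i + ((1-p_i)/p_i) * H_{i+1}
    with H_n = 1 (the step from S_n is forced); p[i-1] corresponds to p_i."""
    H = 1.0
    for i in range(len(p) - 1, 0, -1):
        H = 1.0 / p[i - 1] + (1.0 - p[i - 1]) / p[i - 1] * H
    return H  # H_1

def H_unfolded(p):
    """Fully unfolded sum-of-products expression for H_1, plus a
    correction term that vanishes when p_n = 1."""
    n = len(p)
    total, prod = 0.0, 1.0
    for j in range(1, n + 1):
        total += prod / p[j - 1]
        prod *= (1.0 - p[j - 1]) / p[j - 1]
    if p[n - 1] < 1.0:
        prod_n_minus_1 = prod * p[n - 1] / (1.0 - p[n - 1])
        total += prod_n_minus_1 * (1.0 - 1.0 / p[n - 1])
    return total

p1 = [0.55, 0.60, 0.65, 0.70, 0.75]   # p_n < 1: correction term active
p2 = [0.55, 0.60, 0.65, 0.70, 1.00]   # p_n = 1: correction term vanishes
assert abs(H_recurrence(p1) - H_unfolded(p1)) < 1e-9
assert abs(H_recurrence(p2) - H_unfolded(p2)) < 1e-9
```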

Identical Transition Probabilities
If p_i = p for some constant p ∈ [0, 1] and all 1 ≤ i < n, then Recurrence (2) becomes a linear recurrence equation with constant coefficients. Standard methods can be used to determine closed-form expressions for h_i and H_i. However, we are mainly interested in H_1 and can determine a closed-form expression directly from Equation (5).
Theorem 7. Let 0 < p < 1. Then the expected return time H_1 to S_0 is

H_1 = (1 − 2p · ((1 − p)/p)^n) / (2p − 1) if p ≠ 1/2, and H_1 = 2n − 1 if p = 1/2.

Proof. The theorem follows by setting p_i = p in Equation (5) and rearranging.
It is easily verified that this expression for h_1 satisfies Equation (1). So, with p_i = p, the time it takes to return to the attractor is bounded from above by a constant, a linear function, or an exponential function in n if p > 1/2, p = 1/2, or p < 1/2, respectively.
For the case p = 1/2 one can also obtain this result from the Gambler's Ruin model by mirroring the state 0 at n, which results in termination if the values 0 or 2n appear. Starting at value 1 results in a first hitting time of 2n − 1 in this Gambler's Ruin model, as stated in Theorem 7.
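The three growth regimes can be observed directly. The following sketch assumes the closed form H_1 = (1 − 2p((1−p)/p)^n)/(2p − 1) for p ≠ 1/2 and H_1 = 2n − 1 for p = 1/2, which is one way of writing the case distinction of Theorem 7, and checks it against the recurrence with constant p and a forced last step.

```python
def H1_recurrence(p, n):
    """H_1 from the backward recurrence with constant p_i = p and H_n = 1."""
    H = 1.0
    for _ in range(n - 1):
        H = 1.0 / p + (1.0 - p) / p * H
    return H

def H1_closed(p, n):
    """Closed form: (1 - 2p*r^n)/(2p - 1) with r = (1-p)/p, and 2n-1 at p = 1/2."""
    if p == 0.5:
        return 2.0 * n - 1.0
    r = (1.0 - p) / p
    return (1.0 - 2.0 * p * r ** n) / (2.0 * p - 1.0)

for p in (0.3, 0.5, 0.8):
    for n in (1, 2, 10, 50):
        assert abs(H1_recurrence(p, n) - H1_closed(p, n)) < 1e-6 * H1_closed(p, n)

# Growth regimes: constant for p > 1/2, linear for p = 1/2, exponential for p < 1/2.
print(H1_closed(0.8, 1000))   # approaches the constant 1/(2p - 1)
print(H1_closed(0.5, 1000))   # 2n - 1 = 1999
print(H1_closed(0.3, 20))     # grows like ((1-p)/p)^n
```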

Non-identical Transition Probabilities
Motivated by the runtime analysis of ONEPSO applied to optimization problems such as sorting and ONEMAX, we are particularly interested in the expected time it takes to improve the attractor if the probabilities p_i are slightly greater than 1/2. By slightly we mean p_i = 1/2 + i/(2A(n)), which appears in the analysis of ONEPSO optimizing ONEMAX and the sorting problem, or p_i = 1/2 + A(i)/(2A(n)), which appears in the analysis of ONEPSO optimizing the sorting problem, where A : N → N is some non-decreasing function of n such that lim_{n→∞} A(n) = ∞. Recall from Definition 6 that if p_i > 1 then we move from state S_i to state S_{i−1} with probability 1. Clearly, in this setting we cannot hope for a recurrence with constant coefficients. Our goal in this section is to obtain the asymptotics of H_1 as n → ∞ for A(n) = n and A(n) = n². We show that for p_i = 1/2 + i/(2A(n)) and A(n) = n the return time to the attractor is Θ(√n), while for A(n) = n² the return time is Θ(n).
Figure 4: (a) M (below arrows) and its extended version M̃ (above arrows) with n′ > n states. (b) M (below arrows) and its reduced version M̂ (above arrows) with n̂ < n states and larger probabilities in the direction of S_0.
Proof. We have p_n = 1, so the correction term in Equation (5) is zero. Rearranging the remaining terms of Equation (5) and applying a well-known relation (for an elegant derivation, see [Hir15]) finishes the proof.
This lemma can be generalized to linearly growing probabilities.

Theorem 9. Let M = M((1/2 + i/(2A(n)))_{1≤i≤n}), where A : N → N is a non-decreasing function with lim_{n→∞} A(n) = ∞. Then H_1 = Θ(min(n, √A(n))).

Proof. First, we consider the case A(n) ≤ n². Let n′ = A(n), which is the smallest number such that p_{n′} = 1/2 + n′/(2A(n)) ≥ 1. First, assume that n′ ≤ n and consider the "truncated" model M′ = M((p_i)_{1≤i≤n′}). Please note that there is actually no difference between M and M′, because the removed states are never visited: p_{n′}, the probability of moving from S_{n′} to S_{n′−1}, is already one, so S_{n′+1} is never visited. Let H′_1 be the expected time to reach state S_0 starting at state S_1 with respect to M′. By Lemma 8 we have H′_1 = Θ(√A(n)), which by the construction of M′ is equal to H_1. On the other hand, assume that n′ > n and consider the "extended" model M̃ = M((p_i)_{1≤i≤n′}). M and M̃ are visualized in Figure 4a with the probability of worsening omitted. Let H̃_1 be the expected time to reach state S_0 starting at state S_1 with respect to M̃. By Lemma 8 we have H̃_1 = Θ(√A(n)), and since H̃_1 ≥ H_1 we obtain H_1 = O(√A(n)).
To obtain a lower bound on H_1 for the case n′ > n, we consider the model M̂ = M((p̂_i)_{1≤i≤n̂}), where n̂ = min(n, ⌊√A(n)⌋), p̂_i = 1/2 + 1/(2n̂) for 1 ≤ i < n̂, and p̂_n̂ = 1. For 1 ≤ i < n̂ we have p̂_i ≥ p_i because 1/n̂ = n̂/n̂² ≥ n̂/A(n). A schematic representation of M and M̂ can be found in Figure 4b. Let Ĥ_1 denote the expected time to reach state S_0 starting at state S_1 in M̂. Since p̂_i ≥ p_i for 1 ≤ i ≤ n̂, Ĥ_1 is a lower bound on H_1. This fact is obvious, as in this Markov model one can move only to neighboring states: increasing the probability of moving towards the final state decreases the probability of moving in the opposite direction, and therefore the expected hitting time of the final state can only decrease if the probabilities of moving towards it are increased. Since p := p̂_i is constant for 1 ≤ i < n̂, we can apply Theorem 7; substituting p = 1/2 + 1/(2n̂) gives Ĥ_1 = Θ(n̂). Therefore H_1 = Ω(n̂) = Ω(min(n, √A(n))).
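The two regimes discussed above can be observed numerically. The sketch below evaluates the return time recurrence for p_i = min(1, 1/2 + i/(2A(n))) with a forced step from S_n (an assumption matching the model conventions used here): the ratio H_1(2n)/H_1(n) approaches √2 for A(n) = n and 2 for A(n) = n², in line with the Θ(√n) and Θ(n) bounds.

```python
def H1(n, A):
    """H_1 for the model M((1/2 + i/(2*A(n)))_{1<=i<=n}), with probabilities
    capped at 1 and a forced step from S_n (so H_n = 1)."""
    An = A(n)
    H = 1.0
    for i in range(n - 1, 0, -1):
        p = min(1.0, 0.5 + i / (2.0 * An))
        H = 1.0 / p + (1.0 - p) / p * H
    return H

r_lin  = H1(3200, lambda n: n)     / H1(1600, lambda n: n)      # ~ sqrt(2)
r_quad = H1(3200, lambda n: n * n) / H1(1600, lambda n: n * n)  # ~ 2
print(r_lin, r_quad)
assert 1.3 < r_lin < 1.55
assert 1.8 < r_quad < 2.2
```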
For our application, the sorting problem, the following special case of Theorem 9 will be of interest.

Corollary 10. Let M = M((½(1 + i/binom(n,2)))_{1≤i≤n}); then H_1 = Θ(n).

We will now consider a slightly different class of instances of the Markov model in order to obtain a lower bound on the ONEPSO runtime for sorting in Section 5. For this purpose we consider transition probabilities p_i that increase in the same order as the divisor A(n), which suits our fitness level analysis of the sorting problem. This class of models is relevant for the analysis of the best-case behavior of the ONEPSO algorithm for sorting n items (see Theorem 16). Although we will only make use of a lower bound on H_1 in this setting later on, we give the following Θ-bounds.

Theorem 11. Let M = M((½(1 + A(i)/A(n)))_{1≤i≤n}), where A : N → R_+ is a non-decreasing function and A(n) = Θ(n^d) for some d > 0. Then H_1 = Θ(n^{d/(d+1)}) with respect to M.
This theorem is a significant extension of [MRS+17, Thm. 5], which covers only the special case p_i = ½(1 + binom(i+1,2)/binom(n,2)).

Proof. Consider the expression for H_1 given in Equation (4). Since p_i > 1/2 for 1 ≤ i ≤ n, the products are at most 1 and 1/p_i is at most 2. Therefore, for any k ∈ {1, ..., n}, H_1 ≤ 2k + H_{k+1}. As H_{k+1} is the expected number of steps to move from state S_{k+1} to S_k, the states S_0 to S_{k−1} are irrelevant for the calculation of H_{k+1}, since they are never visited in between; therefore the probabilities p_1 to p_k do not matter either. We truncate the model to the states S_k, ..., S_n. For these states the minimal probability of moving towards the attractor is p_{k+1} ≥ p_k. Therefore we can set p_i = p_k for i ∈ {k + 1, ..., n} to get an upper bound on the return time. By reindexing the states we obtain the model M̃ = M((p_k)_{1≤i≤n−k}) and, because of the truncation and the decreased probabilities, H̃_1 is an upper bound on H_{k+1}, where H̃_1 is the expected number of steps to move from state S_1 to S_0 in the model M̃. In M̃ we have the fixed probability p_k and can therefore apply Theorem 7 to determine H̃_1. Therefore H_1 ≤ 2k + H̃_1 ≤ 2k + 1/(2p_k − 1) = 2k + A(n)/A(k), and choosing k = Θ(n^{d/(d+1)}) certifies that H_1 = O(n^{d/(d+1)}). By using Equation (4) we have a lower bound on H_1 involving the product Π_{i=1}^{k} (1 − p_i)/p_i. As this bound holds for any k ∈ {1, ..., n}, we can choose k = g·n^z with a constant g ∈ (0, 1) not fixed yet and z = d/(d+1) ∈ (0, 1). Please note that g·n^z tends to infinity as n tends to infinity, and therefore asymptotic expressions can also be applied with g·n^z as the argument. Please also note that k has to be an integer, but rounding errors can be absorbed into Θ(1) expressions. Substituting g·n^z for k in the previous inequality, the last Θ(1) can be bounded from above by some constant c_Θ for large n (by the definition of Θ). Choose g = (1/(2·c_Θ))^{1/(d+1)}.
Then the expression in parentheses of the last term, 1 − g^{d+1}·Θ(1), may be negative for small n, but it is at least 1/2 for large n, which implies that it is in Ω(1).
It may be verified that this gives a suitable choice of g for any 0 < ε < 1, so that replacing k by g·n^z yields the previous inequalities. To see this, note that the first fraction of lim inf and lim sup counterbalances the fluctuation of A(k)/A(n) relative to its Θ-bound. Furthermore, the factor 1/4 compensates the factor 2 hidden by the Θ and supplies the desired factor of 1/2, which ensures that the expression in parentheses is positive and at least 1/2 for large n. Finally, the factor 1 − ε is needed for some tolerance, because without it g is only sufficiently small in the limit, but not necessarily for large n. This theorem is a generalization of Lemma 8 and implies the Θ-bound stated in that lemma by setting A(n) = n.

Bounds by Integration
Reformulations (4) and (5) of Recurrence (1) given in Section 4.1 do not directly yield closed-form expressions for H_1, the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution. In this section we derive closed-form expressions for H_1. In order to get rid of the sums and products in Equations (4) and (5), we use the following standard approaches. First, sums may be approximated by their integral; this works quite well if the summand/integrand is monotonic, which is true in our case. Second, products can be transformed into integrals by rewriting the product as the exponential of a sum of logarithms. This approach is an extension to [MRS+17], where it is not present at all. Please note that we use Θ*-bounds in the sense that polynomial factors may be omitted in Θ-bounds. This is analogous to the more common O* notation, where polynomial factors may be omitted in upper bounds.
Proof. p(i) is non-decreasing in i and has values in ]0, 1], and therefore (1 − p(i))/p(i) and also τ(i) := ln((1 − p(i))/p(i)) are non-increasing, as the numerator is non-increasing and the denominator is non-decreasing. In the following series of equations, let k ∈ [0, n). Using Equation (4), we obtain the claimed lower bound for H_1, as k can be chosen arbitrarily and p(0)/(1 − p(0)) is a constant. As any integral is a continuous function, the whole expression in the supremum is a continuous function as well, and therefore k = n can be allowed in the supremum without changing the value. Quite similar steps lead to the upper bound for H_1, starting from Equation (5).
This proves the claimed upper bound, and since the base of the exponential part is the same for the upper and lower bound, we obtain the claimed Θ*-bound.
The logarithm in base(p, n) is positive as long as 1 − p(n·x) ≥ p(n·x). Therefore the integral is maximized by the smallest possible k (the infimum) which satisfies 1 − p(k) ≤ p(k) ⇔ p(k) ≥ 1/2. In most cases p(n·x) can be tightly bounded by a value independent of n; this is the case, for example, if p(i) = c + (1 − c)i/n, which we have for the model solving ONEMAX by ONEPSO. The k which maximizes the integral in the expression of base(p, n) is usually obtained by solving the simple equation p(k) = 1/2. Then the integral can be evaluated and the base of the exponential part of the runtime can be determined.
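For the probabilities p(i) = c + (1−c)·i/n this recipe can be carried out numerically. The sketch below is an illustration under the stated recipe (not the paper's exact formula, which is not reproduced in this excerpt): it evaluates exp(∫_0^{k/n} ln((1−p(nx))/p(nx)) dx) with the maximizing k solving p(k) = 1/2, i.e., k/n = (1/2 − c)/(1 − c), and compares the result with the empirical growth base extracted from the recurrence.

```python
import math

def H1(n, c):
    """H_1 from the recurrence for p_i = c + (1 - c) * i / n (so p_n = 1)."""
    H = 1.0
    for i in range(n - 1, 0, -1):
        p = c + (1.0 - c) * i / n
        H = 1.0 / p + (1.0 - p) / p * H
    return H

def base(c, steps=100000):
    """exp of the integral of ln((1-p(nx))/p(nx)) over [0, k/n] with p(k) = 1/2,
    by the midpoint rule; note p(nx) = c + (1-c)x is independent of n."""
    upper = (0.5 - c) / (1.0 - c)          # the x with p(nx) = 1/2
    h = upper / steps
    s = 0.0
    for j in range(steps):
        p = c + (1.0 - c) * (j + 0.5) * h  # p at the midpoint of the j-th cell
        s += math.log((1.0 - p) / p)
    return math.exp(s * h)

c, n = 0.25, 400
# Polynomial factors cancel in the ratio, so this isolates the exponential base.
empirical = (H1(2 * n, c) / H1(n, c)) ** (1.0 / n)
print(base(c), empirical)
assert abs(base(c) - empirical) < 0.02
```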

Variance of the Improvement Time
We show that the standard deviation of the return time is of the same order as the return time itself. Hence, in experiments, the average of such return times can be measured with small relative error. Also the variance of W_i, the number of steps needed to move from state S_i to state S_{i−1}, can be computed recursively. Let V_i := Var[W_i] be the variance of W_i.
To evaluate this variance we need the expectation and variance of a sum of random variables where the number of summands is itself a random variable. Such random variables appear in the Galton-Watson process (see [Dur10]) from one generation to the next. Also W_i can be expressed as such a random sum: if we are currently in state S_i, we have some success probability of moving to S_{i−1} in the next iteration, so the number of trials in S_i until we move to S_{i−1} follows a geometric distribution. In case of failure we move to S_{i+1} and need an additional W_{i+1} steps until we can make our next attempt to move to S_{i−1}.

Lemma 13. Let T be a random variable with non-negative integer values and let (Y_i)_{i∈N} be independent identically distributed random variables Y_i ∼ Y which are also independent of T. Additionally let Z := Σ_{i=1}^{T} Y_i. Then E[Z] = E[T]·E[Y] and Var[Z] = E[T]·Var[Y] + Var[T]·E[Y]².
where T is a random variable distributed according to a geometric distribution with success probability equal to the probability of moving to S_{i−1} from S_i, and each W̃_{i+1,j} is an independent copy of W_{i+1}.
where p_i is the probability of moving to S_{i−1} from S_i.
Let T be a random variable distributed according to a geometric distribution with success probability p_i, and let all (W_{i+1,j})_{j∈N} be independent copies of W_{i+1}.
Finally, the rightmost expression of Equation (8) is obtained by replacing H_{i+1} according to Equation (2).
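Lemma 13 amounts to the standard mean/variance formula for a random sum: with Z = Σ_{j=1}^{T} Y_j one has E[Z] = E[T]·E[Y] and Var[Z] = E[T]·Var[Y] + Var[T]·E[Y]². This can be verified by exhaustive enumeration on a small example; the concrete distributions below are arbitrary illustrative choices.

```python
from itertools import product

# Small concrete distributions (arbitrary illustrative choices).
T_dist = {0: 0.2, 1: 0.5, 2: 0.3}   # number of summands
Y_dist = {1: 0.4, 2: 0.6}           # value of each summand

def moments(dist):
    mean = sum(v * p for v, p in dist.items())
    var = sum(v * v * p for v, p in dist.items()) - mean ** 2
    return mean, var

ET, VT = moments(T_dist)
EY, VY = moments(Y_dist)

# Enumerate Z = Y_1 + ... + Y_T exactly over all outcomes.
Z_dist = {}
for t, pt in T_dist.items():
    for ys in product(Y_dist, repeat=t):
        pz = pt
        for y in ys:
            pz *= Y_dist[y]
        z = sum(ys)
        Z_dist[z] = Z_dist.get(z, 0.0) + pz

EZ, VZ = moments(Z_dist)
assert abs(EZ - ET * EY) < 1e-9                      # E[Z] = E[T] E[Y]
assert abs(VZ - (ET * VY + VT * EY ** 2)) < 1e-9     # random-sum variance
```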

Therefore one can evaluate H_i by Equations (2) and (3), and then evaluate V_i by Equations (8) and (9).
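Since the numbered equations are not reproduced in this excerpt, the following is one consistent reconstruction: applying the random-sum formula of Lemma 13 to W_i = 1 + Σ_{j=1}^{N}(1 + W̃_{i+1,j}), where N is the geometrically distributed number of failures, gives V_i = ((1−p_i)/p_i)·V_{i+1} + ((1−p_i)/p_i²)·(1 + H_{i+1})² with V_n = 0. A seeded Monte Carlo run agrees with these recursively computed values.

```python
import random

def exact_H_V(p):
    """H_i and V_i via backward recursion; p[i-1] is p_i, the step from S_n
    is forced (H_n = 1, V_n = 0).  The V-recursion is a reconstruction."""
    n = len(p)
    H, V = [0.0] * (n + 1), [0.0] * (n + 1)
    H[n] = 1.0
    for i in range(n - 1, 0, -1):
        q = (1.0 - p[i - 1]) / p[i - 1]
        H[i] = 1.0 / p[i - 1] + q * H[i + 1]
        V[i] = q * V[i + 1] + q / p[i - 1] * (1.0 + H[i + 1]) ** 2
    return H[1], V[1]

def simulate_W1(p, rng):
    n, state, steps = len(p), 1, 0
    while state > 0:
        steps += 1
        if state == n or rng.random() < p[state - 1]:
            state -= 1
        else:
            state += 1
    return steps

p = [0.6, 0.6, 0.6]
H1, V1 = exact_H_V(p)                    # H1 = 29/9, V1 = 1240/81
rng = random.Random(42)
samples = [simulate_W1(p, rng) for _ in range(200000)]
m = sum(samples) / len(samples)
v = sum((s - m) ** 2 for s in samples) / (len(samples) - 1)
assert abs(m - H1) / H1 < 0.03
assert abs(v - V1) / V1 < 0.10
```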
Please note that V_i will always be of the same order as H_i². If (1 − p_i)/p_i is less than one, then the recursively needed values V_j for j > i become less important and V_i is dominated by H_i²; if (1 − p_i)/p_i is greater than one, then H_i² grows by at least ((1 − p_i)/p_i)² (see Equation (2)), which is the square of the growth factor of V_i.
As V_i is of the same order as H_i², the arithmetic mean of T evaluations of W_i has a relative error of approximately 1/√T. This is a relevant statistic when experiments are performed, and it is consolidated in the following corollary.

Runtime Analysis of ONEPSO and D-PSO
As mentioned above, we present a runtime analysis of ONEPSO for two combinatorial problems, the sorting problem and ONEMAX. Our analysis is based on the fitness level method [Weg02], in particular on its application to the runtime analysis of a (1+1)-EA for the sorting problem in [STW04]. Consider a (discrete) search space X and an objective function f : X → R, where f attains m distinct values f_1 < f_2 < ... < f_m on X. Let S_i ⊆ X be the set of solutions with value f_i. Assuming that some algorithm A optimizing f on X leaves each fitness level i at most once, the expected runtime of A is bounded from above by Σ_{i=1}^{m} 1/s_i, where s_i is a lower bound on the probability of A leaving S_i. The method has also been applied successfully, e.g., in [SW10] to obtain bounds on the expected runtime of a binary PSO proposed in [KE97].
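As a simple worked instance of the fitness level method: for ONEMAX with single-bit flips, a solution with i wrong bits has i improving flips among n, so s_i = i/n and the bound Σ_{i=1}^{n} 1/s_i is the harmonic-type sum n·H_n ≈ n ln n.

```python
import math

def fitness_level_bound(n):
    """Fitness level upper bound sum_{i=1}^{n} 1/s_i for ONEMAX with
    s_i = i/n: with i wrong bits, i of the n single-bit flips improve."""
    return sum(n / i for i in range(1, n + 1))

n = 10000
bound = fitness_level_bound(n)
print(bound / (n * math.log(n)))   # ratio tends to 1 as n grows
assert n * math.log(n) < bound < n * (math.log(n) + 1)
```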

Upper Bounds on the Expected Optimization Time
Similar to [STW04, SW10], we use the fitness level method to prove upper bounds on the expected optimization time of ONEPSO for sorting and ONEMAX. In contrast to the former, we allow non-improving solutions and return to the attractor as often as needed in order to sample a neighbor of the attractor that belongs to a better fitness level. Therefore, the time needed to return to the attractor contributes a multiplicative term to the expected optimization time, which depends on the choice of the algorithm parameter c.
We first consider the sorting problem. The structure of the search space of the sorting problem has been discussed already in [STW04] and a detailed analysis of its fitness levels is provided in [MRS + 17]. In the following lemma we bound the transition probabilities for the Markov model for the sorting problem. This allows us to bound the runtime of ONEPSO for the sorting problem later on.
Lemma 16. For the sorting problem on n items, c = 1/2 and x ∈ X_i, the probability p_x that ONEPSO moves from x to an element in X_{i−1} is bounded from below by p_i = ½(1 + i/binom(n,2)). Furthermore, this bound is tight.
Proof. To obtain the lower bound p_i on p_x, consider the attractor a and the permutation τ such that x ∘ τ = a. For each cycle of length k of τ, exactly k − 1 transpositions are needed to sort the elements in this cycle, and there are binom(k,2) ≥ k − 1 transpositions which decrease the transposition distance to the attractor a. Therefore the number of ways to decrease the transposition distance to a is bounded from below by the transposition distance to a itself. The lower bound is tight, as it is attained if only cycles of length two (or one) occur. In [MRS+17, Sec. 4] a more detailed discussion of improvement probabilities can be found.
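The cycle-structure argument can be checked by brute force: for a permutation at transposition distance d = n − (number of cycles) from the identity, the number of transpositions that decrease the distance equals the sum of binom(k,2) over the cycle lengths k, which is at least d. The helper names below are illustrative, not from the paper.

```python
import random
from math import comb

def num_cycles(perm):
    seen, cycles = [False] * len(perm), 0
    for i in range(len(perm)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

def transposition_distance(perm):
    return len(perm) - num_cycles(perm)

rng = random.Random(7)
n = 7
for _ in range(50):
    perm = list(range(n))
    rng.shuffle(perm)
    d = transposition_distance(perm)
    # Brute force: count transpositions that decrease the distance by one.
    improving = 0
    for a in range(n):
        for b in range(a + 1, n):
            q = perm[:]
            q[a], q[b] = q[b], q[a]
            improving += transposition_distance(q) == d - 1
    # Cycle formula: sum of binom(k, 2) over the cycle lengths k.
    seen, cyc_sum = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            k, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                k += 1
            cyc_sum += comb(k, 2)
    assert improving == cyc_sum and improving >= d
print("verified for 50 random permutations")
```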
Using Lemma 16 we prove the following bounds on the expected optimization time T sort (n) required by ONEPSO for sorting n items by transpositions.
Theorem 17 ([MRS+17, Thm. 13]). The expected optimization time T_sort(n) of ONEPSO sorting n items is bounded from above by O(n² log n) for c ∈ (1/2, 1], by O(n³ log n) for c = 1/2, and by O(n² log n · ((1−c)/c)^n) for c ∈ (0, 1/2). See Figure 5 for a visualization of (1−c)/c.

Proof. Consider the situation that the attractor has just been updated. Whenever ONEPSO fails to update the attractor in the next iteration, it takes in expectation H_1 iterations until the attractor is reached again, and then it is improved with probability at least i/binom(n,2). Again, if ONEPSO fails to improve the attractor, we have to wait H_1 steps, and so on. Since we do not consider the possibility that the attractor is improved meanwhile, the general fitness level method yields an expected runtime of at most Σ_{i=1}^{n}((H_1 + 1)(1/s_i − 1) + 1) = H_1 · O(n² log n). We now bound the expected return time H_1. Let c ∈ (1/2, 1] and recall that p_i is the probability of moving from state S_i to state S_{i−1}; then 1 ≥ p_i > c > 1/2. The expression for H_1 given in Theorem 7 is then bounded from above by the constant 1/(2c − 1), so T_sort(n) = O(n² log n). Now let c = 1/2, so p_i ≥ ½(1 + i/binom(n,2)) by Lemma 16. Then, by Corollary 10, we have H_1 = O(n), so T_sort(n) = O(n³ log n). Finally, let c ∈ (0, 1/2). Then p_i > c > 0, and by Theorem 7, H_1 is bounded from above by O(((1−c)/c)^n). For c = 0, ONEPSO always moves to a uniformly drawn adjacent solution; hence the algorithm behaves like a random walk on the search space. In this case, T_sort(n) is the expected number of random transpositions that need to be applied to a permutation in order to obtain a given permutation. We conjecture that T_sort(n) then has the following asymptotic behavior and provide theoretical evidence for this conjecture in Appendix A.
Please note that the conjecture actually only concerns the upper bound, as Theorem 24 provides a proof that T_sort(n) = Ω(n!) for c = 0.
Using a similar approach as in Theorem 17, we now bound the expected optimization time T ONEMAX (n) of ONEPSO for ONEMAX.
Theorem 19. The expected optimization time T_ONEMAX(n) of ONEPSO solving ONEMAX is bounded from above by O(n log n) for c ∈ (1/2, 1], by O(n^{3/2} log n) for c = 1/2, and by O(n² log n · β(c)^n) for c ∈ (0, 1/2); a bound for the case c = 0 is derived in the proof. See Figure 5 for a visualization of β(c).
Proof. The argument is along the lines of the proof of Theorem 17. We observe that on fitness level 0 ≤ i ≤ n there are i bit flips that increase the number of ones in the current solution. Therefore s_i = i/n, and the fitness level method yields an expected runtime of at most H_1 · O(n log n). Now Theorem 12 gives the upper bound H_1 = O(n · β(c)^n). It remains to consider the case c = 0. The claimed bound on T_ONEMAX can be obtained by using the model M((i/n)_{1≤i≤n}), where each state represents the distance to the optimal point. By Equation (6) we obtain H_k, and the maximal expected time to reach the optimal point is the sum of all H_k. We remark that the upper bounds given in Theorem 19 for c ∈ [1/2, 1] were presented in [MRS+17, Thm. 14], that the upper bound for c ∈ (0, 1/2) is newly obtained using the bounds-by-integration from Section 4.4, and that the proof of the upper bound for c = 0 is also new compared to [MRS+17]. Admittedly, the bound for c = 0 is already available in the context of randomized local search and can be found in [GKS99]. Furthermore, note that for c = 1/2 it is not sufficient to use the lower bound p_i ≥ p_1 = 1/2 + 1/(2n) in order to obtain the runtime bound given in Theorem 19. By repeatedly running ONEPSO and applying Markov's inequality, an optimal solution is found with high probability, so we have the following corollary.
Corollary 20. If ONEPSO is run λ·log₂(n) times and each run is terminated after 2·T(n) iterations, where T(n) is the upper bound on the expected number of iterations to find the optimum specified in Theorems 17 and 19 with a suitable constant factor, then ONEPSO finds the optimal solution with high probability.

Lower Bounds via Indistinguishable States
In this section we provide lower bounds on the expected optimization time of ONEPSO that almost match the upper bounds given in Section 5.1. We use the Markov model from Section 3 to obtain these lower bounds. The main difference to the previous section is that we restrict our attention to the last improvement of the attractor, which dominates the runtime both for sorting and for ONEMAX. We will introduce the useful notion of indistinguishability of certain states of a Markov chain. Note that our lower bounds are significantly improved compared to the conference version [MRS+17] by using the newly introduced bounds-by-integration from Section 4.4.

Indistinguishable States
We now introduce a notion of indistinguishability of certain states of a Markov chain, already presented in [MRS+17]. We will later use this notion to prove lower bounds on the expected optimization time of ONEPSO for sorting and ONEMAX as follows: we show that the optimum is contained in a set Ŷ of indistinguishable states; therefore, in expectation, the states in Ŷ have to be visited Ω(|Ŷ|) times to hit the optimum with positive constant probability.
Definition 21 (Indistinguishable states). Let M be a Markov process with a finite set Y of states and let Ŷ ⊆ Y. Furthermore, let (Z_i)_{i≥0} be the sequence of visited states of M and let T = min{t > 0 | Z_t ∈ Ŷ}. Then Ŷ is called indistinguishable with respect to M if

1. the initial state Z_0 is uniformly distributed over Ŷ, i.e., for all y ∈ Y: Pr[Z_0 = y] = 1_{y∈Ŷ}/|Ŷ|,

2. and the probabilities to reach states in Ŷ from states in Ŷ are symmetric, i.e., for all y_1, y_2 ∈ Ŷ: Pr[Z_T = y_2 | Z_0 = y_1] = Pr[Z_T = y_1 | Z_0 = y_2].

Now we can prove a lower bound on the expected time for finding a specific state.

Theorem 22. Let M be a Markov process as in Definition 21 and let Ŷ be indistinguishable with respect to M. Let h(M) be a positive real value such that E[T] ≥ h(M). Then the expected time to reach a fixed y ∈ Ŷ is bounded from below by h(M) · Ω(|Ŷ|).
Proof. Let T_i be the stopping time at which Ŷ is visited for the i-th time.
By Statement 1 of Definition 21, Z_0 is uniformly distributed over Ŷ; therefore T_1 = 0 and T_2 = T. Statement 2 of Definition 21 implies that Pr[Z_{T_i} = y] = 1_{y∈Ŷ}/|Ŷ| for all i ≥ 1, by the following induction. The base case i = 1, T_1 = 0, is ensured by Statement 1 of Definition 21. The induction hypothesis is Pr[Z_{T_{i−1}} = y] = 1_{y∈Ŷ}/|Ŷ|, and the inductive step is verified by a direct calculation using the symmetry of Statement 2.
It follows that for all i > 0 the difference T_{i+1} − T_i of two consecutive stopping times has the same distribution as T. Now let y ∈ Ŷ be fixed. By a union bound, the probability that y is not reached within the first T_{⌊|Ŷ|/2⌋−1} steps is bounded from below by 1/2, and therefore the expected time to reach the fixed y ∈ Ŷ is bounded from below by E[T_{⌊|Ŷ|/2⌋−1}]/2 = h(M) · Ω(|Ŷ|).
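The flavor of Theorem 22 can be sanity-checked on a tiny example: for the symmetric random walk on a cycle of m states, the whole state set is indistinguishable in the above sense (uniform start, symmetric transitions, h(M) = 1), and the classical hitting-time formula h_k = k(m − k) gives an average hitting time (m² − 1)/6 from a uniform start, consistent with the h(M)·Ω(|Ŷ|) lower bound.

```python
from fractions import Fraction

def hit_times(m):
    """h_k = k*(m-k): expected time for the symmetric +/-1 walk on the
    cycle Z_m to reach state 0 from state k (classical formula)."""
    return [Fraction(k * (m - k)) for k in range(m)]

m = 101
h = hit_times(m)
# The formula satisfies the first-hitting recurrence
# h_k = 1 + (h_{k-1} + h_{k+1}) / 2 (indices mod m) with h_0 = 0,
# so it is indeed the exact hitting time.
for k in range(1, m):
    assert h[k] == 1 + (h[k - 1] + h[(k + 1) % m]) / 2
# Uniform start over all m mutually indistinguishable states:
avg = sum(h) / m
assert avg == Fraction(m * m - 1, 6)   # Theta(m^2), certainly Omega(m)
print(avg)
```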

Lower Bounds on the Expected Optimization Time for Sorting
In this section we consider the sorting problem. Our first goal is to provide lower bounds on the expected return time to the attractor for the parameter choice c ∈ (0, 1/2).

Lemma 23. Let c ∈ (0, 1/2). For the sorting problem on n items, assume that the attractor has transposition distance one to the identity permutation. Then the expected return time H_1 to the attractor is bounded from below by Ω(α(c)^n). See Figure 5 for a visualization of α(c).

Proof. We use Theorem 12; evaluation of the integral at the bounds k/n = (1 − 2c)/(2(1 − c)) and 0 yields the logarithm of the claimed base.
An application of the exp function on this result gives the claimed lower bound.
This lower bound is the best possible bound achievable with this model, as the probability p_i = c + (1 − c)·binom(i+1,2)/binom(n,2) actually occurs at distance i if the permutation transforming the current position into the attractor consists of one cycle of length i + 1 and otherwise only of singleton cycles. For this improvement probability the bound is Θ*(α(c)^n).
The following theorem supplies lower bounds on the expected optimization time of ONEPSO on the sorting problem.
Theorem 24. The expected optimization time T_sort(n) of ONEPSO sorting n items is bounded from below by Ω(n²) for c ∈ (1/2, 1], by Ω(n^{8/3}) for c = 1/2, by Ω(n²·α(c)^n) for c ∈ (0, 1/2), and by Ω(n!) for c = 0.

Proof. The event that already the initial position is the optimum has probability 1/n!. As 1 − 1/n! > 1/2 for n ≥ 2, we obtain the same Ω-bound if we ignore this case. In all other cases we can consider the situation that the attractor has just been updated to a solution that has distance one to the optimum. Without loss of generality, we assume that the attractor is the identity permutation and the optimum is the transposition (0 1). The number of steps required for the next (hence final) improvement of the attractor is a lower bound on the expected optimization time of ONEPSO.
We determine a lower bound on this number for various choices of c. For all c ∈ (0, 1] we apply Theorem 22. We use the set of all permutations as the state set Y of the Markov process M. Let Ŷ = X_1 be the subset of states which are a single transposition away from the attractor. The optimal solution is contained in Ŷ, but up to the point when ONEPSO reaches the optimal solution it is indistinguishable from all other permutations in Ŷ. We now prove that Ŷ is indeed indistinguishable with respect to M. Initially the particle is situated at the attractor, and after a single step it is situated at a permutation in Ŷ, where each permutation has equal probability. We use the permutation after the first step as the initial state Z_0 of the Markov process, and all other Z_i are the successive permutations; therefore Statement 1 of Definition 21 is fulfilled. Let T = min{t > 0 | Z_t ∈ Ŷ} be the stopping time of Theorem 22. For each sequence of states Z_0, ..., Z_T there is a one-to-one mapping to a sequence Z̃_0 = Z_T, Z̃_1, ..., Z̃_{T−1}, Z̃_T = Z_0 which has equal probability of appearing. The sequence Z̃_0, ..., Z̃_T is not the reversed sequence, because the forced steps would then lead in the wrong direction, but it can be obtained by renaming the permutation indices; the renaming is possible because the permutations Z_0 and Z_T are both single transpositions. As this one-to-one mapping exists, Statement 2 of Definition 21 is fulfilled as well. Finally, we need a bound on the expectation of T. If we are in X_1 = Ŷ, we can either go to the attractor, by a forced move or a random move, and return to X_1 in the next step, or we can go to X_2 by a random move and return to X_1 in expectation after H_2 steps. We have E[T] = (c + (1 − c)/binom(n,2)) · 2 + (1 − c)(1 − 1/binom(n,2))(1 + H_2) = Ω(H_2) =: h(M). Theorem 22 provides the lower bound Ω(|Ŷ| · H_2) for the runtime to find the fixed permutation (0 1) ∈ Ŷ, which is the optimal solution.
From Equation (2) we get H_2 = (p_1·H_1 − 1)/(1 − p_1) ≥ (c·H_1 − 1)/(1 − c). As H_1 = Ω(n^{2/3}) for c = 1/2 (see Theorem 11) and H_1 = Ω(α(c)^n) for c ∈ (0, 1/2) (see Lemma 23), we also have H_2 = Ω(H_1) for c ∈ (0, 1/2], which results in the lower bounds T_sort(n) = Ω(|Ŷ|·H_1) = Ω(binom(n,2)·n^{2/3}) = Ω(n^{8/3}) for c = 1/2 and T_sort(n) = Ω(|Ŷ|·H_1) = Ω(binom(n,2)·α(c)^n) = Ω(n²·α(c)^n) for c ∈ (0, 1/2). Trivially, the return time to X_1 in M can be bounded by 2, which results in the lower bound T_sort(n) = Ω(n²) for the case c ∈ (1/2, 1]. The lower bound for c = 0 can be derived directly from the indistinguishability property: let Ŷ = Y. It is readily verified that the initial state is uniformly distributed over Ŷ. Furthermore, any Ŷ-Ŷ-path can be reversed and has the same probability of occurring. Therefore, Condition 2 of Definition 21 is satisfied, and the lower runtime bound follows from Theorem 22 by choosing h(M) = 1.
Besides these formally proved lower bounds, we conjecture the following lower bounds on the expected optimization time of ONEPSO for sorting n items.
Note that these lower bounds differ from the upper bounds given in Theorem 17 only by a log-factor. Evidence supporting this conjecture is given in Appendix B. We obtain this theoretical evidence by considering the average probability of moving towards the attractor, instead of the upper and lower bounds used before.

Lower Bounds on the Expected Optimization Time for ONEMAX
First we provide a lower bound on the expected return time to the attractor.
Lemma 26. Let c ∈ (0, 1/2). For ONEMAX, assume that the attractor has Hamming distance one to the optimum 1^n. Then the expected return time H_1 to the attractor is bounded from below by H_1 = Ω(β(c)^n). See Figure 5 for a visualization of β(c).
Proof. We use Theorem 12. The value β(c) has already been calculated in Theorem 19, Equation 10.
This result enables us to prove lower bounds on T_ONEMAX(n).
Theorem 27. The expected optimization time T_ONEMAX(n) of the ONEPSO for solving ONEMAX is bounded from below by Ω(n log n) for c ∈ (1/2, 1], by Ω(n^{3/2}) for c = 1/2, by Ω(n · β(c)^n) for c ∈ (0, 1/2), and by Ω(2^n) for c = 0.
Proof. First, let c ∈ (1/2, 1]. Then, with probability at least 1/2, the initial solution contains at least k = ⌊n/2⌋ = Ω(n) zeros. Each zero is flipped to one with probability 1/n in a random move, and none of these k entries is set to one in a move towards the attractor. The expected time required to sample the k distinct bit flips is bounded from below by the expected time it takes to obtain all coupons in the following instance of the coupon collector's problem: there are k coupons, and each coupon is drawn independently with probability 1/k. The expected time to obtain all coupons is Ω(k log k) [MU05, Section 5.4.1]. It follows that the expected optimization time is Ω(n log n), as claimed.
For c ∈ (0, 1/2] we use the same approach as in the proof of Theorem 24; again, the event that the initial solution is optimal can be ignored. Consider the situation where the attractor has just been updated to a solution at distance one to the optimum. We use the set of all bit strings as the set of states Y in the Markov process M. Let Ŷ = X_1 be the subset of bit strings which are a single bit flip away from the attractor; hence Ŷ contains the optimum. Z_i and T are instantiated as in the proof of Theorem 24. Therefore Statement 1 of Definition 21 is fulfilled. Again, for each sequence of states Z_0, …, Z_T there is a one-to-one mapping to a sequence Z̃_0 = Z_T, Z̃_1, …, Z̃_{T−1}, Z̃_T = Z_0 which has equal probability to appear. This sequence is again obtained by renaming the indices, plus some bit changes according to the shape of the attractor. Hence Statement 2 of Definition 21 is also fulfilled, and Ŷ is indistinguishable with respect to M. We obtain E[T] = Ω(H_2) =: h(M), and Theorem 22 provides the lower bound Ω(|Ŷ| · H_2) on the runtime to find the optimal solution. From Equation (2) we get H_2 ≥ (c · H_1 − 1)/(1 − c). As H_1 = Ω(n^{1/2}) for c = 1/2 (see Theorem 9) and H_1 = Ω(β(c)^n) for c ∈ (0, 1/2) (see Lemma 26), we also have H_2 = Ω(H_1) for c ∈ (0, 1/2], which results in the lower bounds T_ONEMAX(n) = Ω(|Ŷ| · H_1) = Ω(n · n^{1/2}) = Ω(n^{3/2}) for c = 1/2 and T_ONEMAX(n) = Ω(|Ŷ| · H_1) = Ω(n · β(c)^n) for c ∈ (0, 1/2). Again the lower bound for c = 0 can be obtained via the indistinguishability property; the proof for this case is identical to the corresponding part of Theorem 24.
This justifies all runtime bounds claimed in Table 1; their proofs together use all of the tools presented in Section 4.

Bounds on the Expected Optimization Time for D-PSO
The upper bounds on the runtime of ONEPSO in Theorems 17 and 19 directly imply upper bounds for D-PSO. Recall that c denotes the parameter of ONEPSO, and that T_ONEMAX(n) and T_sort(n) denote the expected optimization time of ONEPSO for ONEMAX and sorting, respectively.
Corollary 28. Let T′_ONEMAX(n) and T′_sort(n) be the expected optimization times of D-PSO for ONEMAX and sorting, respectively. If c = c_glob, then T′_ONEMAX(n) = O(P · T_ONEMAX(n)) and T′_sort(n) = O(P · T_sort(n)), where P is the number of particles.
Proof. In each trial to improve the value of the global attractor, at least the particle which last updated the global attractor has its local attractor at the same position as the global attractor. This particle behaves exactly like the single particle of ONEPSO until the global attractor is improved. Therefore, at most P times more objective function evaluations are performed than by ONEPSO, where P is the number of particles.
One can refine this result by again looking at return times to an attractor. If the global attractor of a particle equals its local attractor, then this particle performs the same steps as all other particles with equal attractors. As all those particles perform the optimization in parallel, in expectation no additional objective function evaluations are made compared to ONEPSO.
For particles whose global and local attractor differ, we can apply the previous arguments to the local attractor. With two different attractors, alternating moves towards the local and the global attractor can cancel each other out. Therefore, if (only) c_loc is fixed, we can assume for the worst case that a move towards the local attractor occurs with probability c_loc and a move away from it with probability 1 − c_loc. This enables us to use Theorem 7 to compute the expected time to reduce the distance to an attractor from one to zero; we denote this return time from Theorem 7 by Ψ(n, p). If the position equals the local attractor, and consequently differs from the global attractor, the probability of improving the local attractor can be bounded from below by a positive constant. For example, for ONEMAX this constant is c_glob/2: in a move towards the global attractor, for at least half of the differing bits the value of the global attractor equals the value of the optimal solution, because the global attractor is at least as close to the optimum as the local attractor. Therefore, the expected number of trials until the local attractor is improved is constant. As such an update occurs at most once per particle and fitness level, we obtain an additional summand of O(Ψ(n, c_loc) · P · n) instead of the factor P for ONEMAX and the sorting problem. In contrast to the upper bounds, the lower bounds for ONEPSO do not carry over to D-PSO, for the following reason: the bottleneck used in the analysis of ONEPSO is the very last improvement step, but D-PSO may be faster at finding the last improvement, because the local and the global attractor of a particle may both have distance one to the optimum without being equal.
In this case, as described above, there is a constant probability of moving to the optimum whenever a particle is located at one of the two attractors, whereas for ONEPSO the probability of moving towards the optimum from the attractor tends to zero for large n.
Experiments with D-PSO with a small number of particles and with ONEPSO, applied to the sorting problem and ONEMAX, revealed only a small increase in the optimization time of D-PSO compared to ONEPSO; this increase is considerably smaller than the factor P. For some parameter constellations, D-PSO even achieves a significant decrease in optimization time compared to ONEPSO.

Lower Bounds for Pseudo-Boolean Functions
We can also prove lower bounds on the expected optimization time for general pseudo-Boolean functions f : {0, 1}^n → R.
Theorem 29. If P = Θ(1), where P is the number of particles, then the expected optimization time of D-PSO optimizing pseudo-Boolean functions ({0, 1} n → R) with a unique optimal position is in Ω(n log(n)).
Proof. If there are P = Θ(1) particles, then in expectation there are n/2^P = Ω(n) bits for which, in every particle, the bit of the initial position differs from the corresponding bit of the optimal position. The expected optimization time is therefore bounded from below by the expected time until each such bit has been flipped in a random move at least once. This subproblem corresponds to a coupon collector's problem, which yields the claimed lower bound of Ω(n log n).
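The coupon collector bound invoked here (and in the proof of Theorem 27) can be checked numerically: if each of k coupons is drawn with probability 1/k per step, the expected number of draws until all coupons have appeared is k · H_k, where H_k is the kth harmonic number, and hence Θ(k log k). A minimal sketch (the function name is ours):

```python
from math import log

def coupon_collector_expected_time(k):
    """Expected number of draws to collect all k coupons when each
    draw yields any fixed coupon with probability 1/k: k * H_k."""
    return k * sum(1.0 / i for i in range(1, k + 1))

# H_k = ln(k) + O(1), so the expectation grows like k*log(k).
for k in [10, 100, 1000]:
    print(k, round(coupon_collector_expected_time(k), 1), round(k * log(k), 1))
```

The printed pairs illustrate that k · H_k and k · ln k differ only by the lower-order term k · O(1).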
For larger values of P we obtain an even higher lower bound by the following theorem.
Theorem 30. If P = O(n k ), where P is the number of particles and k is an arbitrary non-negative real value, then the expected optimization time of D-PSO optimizing pseudo-Boolean functions ({0, 1} n → R) with a unique optimal position is in Ω(n · P).
Proof. We use Chernoff bounds to bound the probability that a particle is at least some distance away from the optimum after initialization. For a fixed particle, define Y_i = 1 if the ith bit of the initial position differs from the corresponding bit of the unique optimal position, and Y_i = 0 otherwise. Then Y = ∑_{i=1}^{n} Y_i is exactly the initial distance of the fixed particle to the unique optimal position. For each i we have Pr[Y_i = 1] = 1/2, and hence E[Y] = n/2. By Chernoff bounds we obtain the lower bound Pr[Y ≥ n/4] ≥ 1 − e^{−n/16}. The probability that the initial positions of all P particles have distance at least n/4 to the unique optimal position is the Pth power of this probability, which for large n is bounded from below by (1 − e^{−n/16})^P ≥ 1 − P · e^{−n/16} ≥ 1/2. Note that such an n can be chosen, since P = O(n^k) and 16 · ln(2 · poly(n)) = o(n) for any polynomial. If the positions of all particles have distance at least n/4 to the optimum, then at least n/4 iterations are needed before the optimal position can be reached, as the distance changes by at most one in each iteration. In each iteration, P evaluations of the objective function are performed. Therefore, with probability at least 1/2 for large n, at least n · P/4 objective function evaluations are performed, which yields the claimed optimization time of Ω(n · P).
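The two quantitative steps of this proof can be sketched numerically: the Chernoff bound Pr[Y ≥ n/4] ≥ 1 − e^{−n/16} for a single particle, and the threshold n ≥ 16 · ln(2P) beyond which the P-fold product stays above 1/2. The sketch below uses the elementary inequality (1 − x)^P ≥ 1 − P·x, i.e. a union bound over particles; function names are ours:

```python
from math import exp, log

def single_particle_far_bound(n):
    """Chernoff bound: a particle's initial Hamming distance Y to the
    optimum satisfies Pr[Y >= n/4] >= 1 - exp(-n/16)."""
    return 1.0 - exp(-n / 16.0)

def all_particles_far_bound(n, P):
    """Lower bound on the probability that all P particles start at
    distance >= n/4, via (1 - x)**P >= 1 - P*x (union bound)."""
    return 1.0 - P * exp(-n / 16.0)

# The union bound reaches 1/2 as soon as n >= 16*ln(2P); for P = poly(n)
# this threshold is only O(log n) = o(n).
P = 1000
n = int(16 * log(2 * P)) + 1
print(n, all_particles_far_bound(n, P) >= 0.5)
```

For P = n^k the threshold grows only logarithmically in n, which is why the bound holds for all sufficiently large n.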
This means that choosing, e.g., P = n^{10} yields at least Ω(n^{11}) function evaluations in expectation.

Conclusion
We propose a simple and general adaptation of the PSO algorithm for a broad class of discrete optimization problems. For one particle, we provide upper and lower bounds on its expected optimization time for the sorting problem and ONEMAX, generalize the upper bounds to D-PSO with an arbitrary number of particles, and prove lower bounds for D-PSO optimizing pseudo-Boolean functions. Depending on the parameter c, the probability of moving towards the attractor, the expected optimization time is polynomial (c ≥ 1/2) or exponential (c < 1/2), respectively. The cornerstone of our analysis is a set of Θ-bounds on the expected time until the PSO returns to the attractor; our analysis also provides the variance of this quantity. We analyze Markov chains and provide tools to evaluate expected return times for certain classes of transition probabilities. Additionally, we establish indistinguishability, a useful general property of states of a Markov process, for obtaining lower bounds on the expected first hitting time of a distinguished state. The presented tools can likely also be applied to other Markov chains that frequently appear in the analysis of randomized algorithms.
For future work, it would be interesting to see whether the upper and lower bounds on the expected optimization time for ONEMAX given in Theorems 19 and 27 remain valid for arbitrary linear functions f : {0, 1}^n → R, f(x_1, x_2, …, x_n) = ∑_i w_i x_i. Furthermore, we conjecture that the expected optimization time for the sorting problem with c = 0 is bounded from above by n!, and that the other proved upper bounds for the sorting problem are tight. Another direction for future work is to apply our technical tools to other meta-heuristics. In particular, our tools may be useful in the analysis of "non-elitist" meta-heuristics, for instance the Strong Selection Weak Mutation (SSWM) evolutionary regime introduced in [Gil83] as an example of a non-elitist algorithm.
Finally, it would be interesting to determine the return time to the state S_0 in a more general Markov model M((p_i)_{1≤i≤n}), where p_i = 1/2 + z(i, n) such that z(i, n) = poly(i)/poly(n), where the degrees of the polynomials differ and z(i, n) is non-decreasing for 1 ≤ i ≤ n. This would generalize Theorems 9 and 11 and shed some light on the relation between z(i, n) and the return time to state S_0. Here, we conjecture that for z(i, n) as defined above the return time is in poly(n). A proof of the claimed upper bound of O(n!) on the expected time for an actual random walk on the graph of permutations to reach a specified permutation would also be beneficial; to the best of our knowledge, no such proof exists so far.

A Evidence for Conjecture 18

To compute the exact expected optimization time for small n, a system of linear equations similar to Equation 1 is used. Let τ_0 be the optimal permutation (say, the identity); then the expected hitting times h_τ satisfy Equation 11. This simple approach works only for very small n, since it uses one variable for each permutation.
Our results are based on the following insight: for each permutation τ, consider the permutation ν such that τ ∘ ν = τ_0. Since h_τ is equal for all permutations for which the cycle decomposition of ν has the same multiset of cycle lengths, the number of variables in the system of linear equations can be reduced to the number of integer partitions of n, where n is the number of items to sort. Hence, for n = 40, the number of variables shrinks from 40! to 37 338, which is manageable. T_sort(n) is then simply a linear combination of the computed values. Side note: the transition probabilities, and hence the system of linear equations, can be represented by a matrix. For elitist algorithms this matrix can be transformed into an upper-triangular matrix, which makes such algorithms much easier to analyze; this transformation is not possible in our situation.
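The reduction from 40! variables to 37 338 can be verified by computing the number of integer partitions of n = 40. A short sketch using the standard dynamic program over the largest allowed part (the helper is ours):

```python
def partition_count(n):
    """Number of integer partitions of n.  table[k][m] counts the
    partitions of m whose parts are all at most k."""
    table = [[0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        table[k][0] = 1  # the empty partition
    for k in range(1, n + 1):
        for m in range(1, n + 1):
            # either no part equals k, or remove one part of size k
            table[k][m] = table[k - 1][m]
            if m >= k:
                table[k][m] += table[k][m - k]
    return table[n][n]

print(partition_count(4))   # 5 cycle-type states for n = 4
print(partition_count(40))  # 37338 variables instead of 40!
```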
Figure 6 shows the search space of the sorting problem for four items, partitioned by cycle lengths; this results in states represented by the integer partitions of n = 4, namely (1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1) and (4). The complete search space with all permutations of n = 4 items is already visualized in Figure 2. In Figure 6 each state is labeled by the cycle lengths of the current permutation and may represent several permutations. (1, 1, 1, 1) is only a single permutation, the identity permutation 1234, in which every cycle is a singleton cycle, i.e., a cycle of length one. Every neighboring permutation of (1, 1, 1, 1) has two swapped items and two items that stay at their position; the cycle lengths are therefore (2, 1, 1), and there are six permutations with these cycle lengths. The neighbors of permutations with cycle lengths (2, 1, 1) can have cycle lengths (1, 1, 1, 1), (3, 1) or (2, 2): of the six possible exchange operations, one splits the cycle of length two into two singleton cycles, one merges the two singleton cycles into a cycle of length two, yielding cycle lengths (2, 2), and the remaining four merge the cycle of length two with a singleton cycle, yielding cycle lengths (3, 1). The respective transition probabilities for a random walk are also visualized in Figure 6. Furthermore, there are three permutations with cycle lengths (2, 2), namely 2143, 3412 and 4321. The remaining eight permutations in the third column of Figure 2 have cycle lengths (3, 1), and the six permutations in the last column of Figure 2 consist of a single cycle of length four. The values h_τ satisfying Equation 11 are h_{(1,1,1,1)} = 0, h_{(2,1,1)} = 23, h_{(2,2)} = 27, h_{(3,1)} = 105/4, h_{(4)} = 55/2, and therefore T_sort(4) = (1/4!) · (h_{(1,1,1,1)} + 6 · h_{(2,1,1)} + 3 · h_{(2,2)} + 8 · h_{(3,1)} + 6 · h_{(4)}) = 99/4 = 4! + 3/4. Please note that this value does not rely on experiments.
Instead, the evaluations yield the exact expected optimization time. Figure 7 indicates that the exact value of T_sort(n) divided by n! tends to one. Proving that this holds for large n would imply T_sort(n) = Θ(n!) and thereby also Conjecture 18.
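The worked example T_sort(4) = 99/4 can be reproduced by brute force, which is feasible for n = 4 since 4! = 24. The sketch below assumes the pure random-transposition walk (each of the \binom{4}{2} = 6 transpositions is applied with equal probability) and solves the hitting-time system exactly with rational arithmetic; all names are ours:

```python
from fractions import Fraction
from itertools import permutations, combinations

n = 4
perms = list(permutations(range(n)))
index = {p: i for i, p in enumerate(perms)}
identity = tuple(range(n))

def neighbors(p):
    # all permutations reachable by one of the C(4,2) = 6 transpositions
    for i, j in combinations(range(n), 2):
        q = list(p)
        q[i], q[j] = q[j], q[i]
        yield tuple(q)

# Augmented matrix for: h[identity] = 0 and h[p] - (1/6)*sum_q h[q] = 1.
m = len(perms)
A = [[Fraction(0)] * (m + 1) for _ in range(m)]
for p in perms:
    r = index[p]
    A[r][r] = Fraction(1)
    if p == identity:
        continue  # h[identity] = 0, right-hand side stays 0
    A[r][m] = Fraction(1)
    for q in neighbors(p):
        A[r][index[q]] -= Fraction(1, 6)

# Gauss-Jordan elimination with exact rationals.
for col in range(m):
    piv = next(r for r in range(col, m) if A[r][col] != 0)
    A[col], A[piv] = A[piv], A[col]
    inv = A[col][col]
    A[col] = [x / inv for x in A[col]]
    for r in range(m):
        if r != col and A[r][col] != 0:
            f = A[r][col]
            A[r] = [x - f * y for x, y in zip(A[r], A[col])]

h = [A[r][m] for r in range(m)]
# Average the hitting time over a uniformly random start permutation.
T = sum(h, Fraction(0)) / m
print(T)  # 99/4
```

Grouping the 24 values of h by cycle type reproduces h_{(2,1,1)} = 23, h_{(2,2)} = 27, h_{(3,1)} = 105/4 and h_{(4)} = 55/2.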

B Evidence for Conjecture 25
For the sorting problem, the lower and upper bounds on the return time to the attractor leave the following gaps. For 0 < c < 1/2 the expected number of iterations to return to the attractor lies between Ω*(α(c)^n) and O*(((1 − c)/c)^n), and for c = 1/2 it lies between Ω(n^{2/3}) and O(n). We provide a simplified Markov model based on an averaging argument and conjecture that the simplified model and the actual model are asymptotically equivalent.
To improve our understanding of the search space of permutations, we approximate the probabilities of improvement in order to approximate the expected return time to the attractor. For this purpose, instead of using upper and lower bounds on the probability of moving towards the attractor, we use its average value. If, at each distance, every permutation at that distance to the attractor were equally likely, this approximation would yield the exact values.
Conjecture 31. Let H_1 be the expected number of iterations until the ONEPSO returns to the attractor g if the current distance to the attractor is one while optimizing the sorting problem. Let p_x be the probability of moving from permutation x to a permutation y such that d(x, g) = 1 + d(y, g). Let p̂_i = ∑_{x∈X_i} p_x / |X_i| be the average probability of reducing the distance to the attractor. Let Ĥ_1 be the expected number of iterations in M̂ = M((p̂_i)_{1≤i≤n−1}) to move from state S_1 to S_0 (see Definition 6). We conjecture that H_1 ∼ Ĥ_1.
To provide evidence we compute these average improvement probabilities and compare the number of expected iterations.
Theorem 32. The average improvement probability of moving towards the attractor while optimizing the sorting problem is

p̂_i = c + (1 − c) · \sum_{k=1}^{i+1} \binom{n−1}{k−1} (k−1)! {n−k \brack n−i−1} / {n \brack n−i} · (k−1)/(n−1),

where {n \brack m} denotes the unsigned Stirling number of the first kind, i.e., the number of permutations of n items with exactly m cycles.

Proof. W.l.o.g. let the attractor be the identity permutation, which has n singleton cycles. Increasing the distance to the attractor by one is equivalent to decreasing the number of cycles by one; hence a permutation at distance i from the attractor has exactly n − i cycles, and the number of permutations at distance i is {n \brack n−i}. Among all permutations at distance i from the attractor, the probability that a fixed item lies in a cycle of length k is

\binom{n−1}{k−1} (k−1)! {n−k \brack n−i−1} / {n \brack n−i}:

choosing the remaining k − 1 items of this cycle from the remaining n − 1 items gives \binom{n−1}{k−1} options, there are (k − 1)! orderings of these items within the cycle, and the remaining n − k items have to be partitioned into n − i − 1 cycles, which gives another factor of {n−k \brack n−i−1} options. In combination with the first cycle of length k, a permutation with n − i cycles is obtained.
This probability does not change if we pick a random item instead of a fixed one. The probability of moving towards the attractor in a random move is the probability that a cycle is split into two cycles, which happens exactly if two items of the same cycle are picked for an exchange. If the first picked item lies in a cycle of length k, then the probability that the second item lies in the same cycle is (k − 1)/(n − 1). Summing these probabilities over all possible cycle lengths of the first picked item yields the claimed expression for p̂_i, where additionally the parameter c comes into play, as it forces a move towards the attractor. Please note that the maximal cycle length at distance i from the attractor is i + 1, which explains the upper limit of the sum.

Using these average probabilities of moving towards the attractor, we obtain a Markov chain in which a single step from state S_i can only lead to state S_{i−1} or S_{i+1} (and to no other state). For Markov chains with this property the return times can be computed as in Section 4.2. The result helps us to estimate the expected return time to the attractor. If c is zero, the expected return time from distance one to the attractor is exactly n! − 1, which the model with average probabilities reproduces exactly. With the first part of Remark 33 we also see that (1 − c)/c is an upper bound and α(c) a lower bound on the limit of q_ex as n tends to infinity, since q_ex tends to the actual base of the exponential part of the expected return time.
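The averaged model can be evaluated exactly. The sketch below (the names and the top-state boundary condition are our assumptions) computes the probabilities of Theorem 32 via unsigned Stirling numbers of the first kind, and the return time Ĥ_1 by the backward recurrence h_i = (1 + (1 − p̂_i) · h_{i+1}) / p̂_i, assuming that from the maximal distance n − 1 the chain moves down with probability p̂_{n−1} and otherwise stays put. For c = 0 it reproduces the value n! − 1 stated above:

```python
from fractions import Fraction
from math import comb, factorial

def stirling1(n, m, _cache={}):
    """Unsigned Stirling numbers of the first kind: the number of
    permutations of n elements with exactly m cycles."""
    if (n, m) not in _cache:
        if n == 0 and m == 0:
            _cache[(n, m)] = 1
        elif n == 0 or m == 0:
            _cache[(n, m)] = 0
        else:
            _cache[(n, m)] = (stirling1(n - 1, m - 1)
                              + (n - 1) * stirling1(n - 1, m))
    return _cache[(n, m)]

def p_hat(n, i, c):
    """Average probability of moving towards the attractor at distance i
    (Theorem 32): forced move with probability c, plus a random move
    that picks two items of the same cycle."""
    total = Fraction(0)
    for k in range(1, i + 2):
        in_cycle_k = Fraction(
            comb(n - 1, k - 1) * factorial(k - 1)
            * stirling1(n - k, n - i - 1),
            stirling1(n, n - i))
        total += in_cycle_k * Fraction(k - 1, n - 1)
    return c + (1 - c) * total

def H1_averaged(n, c):
    """Expected time S_1 -> S_0 in the birth-death chain with averaged
    probabilities; boundary: at distance n-1 the chain moves down with
    probability p_hat(n, n-1, c), else stays (our assumption)."""
    h = Fraction(1) / p_hat(n, n - 1, c)
    for i in range(n - 2, 0, -1):
        p = p_hat(n, i, c)
        h = (1 + (1 - p) * h) / p
    return h

# Sanity check from the text: for c = 0 the averaged model gives n! - 1.
print(H1_averaged(4, Fraction(0)))  # 23 = 4! - 1
```

The same routines can be used to tabulate q_av(n, c) for values of n far beyond the reach of the exact system of linear equations.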
As q_ex has to be computed from a system of linear equations whose number of variables equals the number of integer partitions of n (see the description after Conjecture 18), we cannot evaluate q_ex for large n. The values of q_av are quite similar to the values of q_ex for corresponding n: for c = 0 they coincide exactly, and for n = 30 the relative error is less than 0.04. Therefore we conjecture that the limit of q_av as n tends to infinity is close or even equal to the limit of q_ex. As can be seen in Figure 8, the values of q_av tend to the upper bound (1 − c)/c; we omitted the graph of q_av(10 000, c), as it almost completely overlaps the graph of the upper bound. It is therefore reasonable to conjecture that the limit of q_ex is also close to the upper bound (1 − c)/c. If this is actually true, then in all runtime results the value of α(c) can be replaced by ((1 − c)/c) − ε for some small non-negative value ε, which could possibly even be zero.

Figure 9: Quotients of return times q_ex and q_av for c = 1/2.

By the second part of Remark 33, the limit of log_{n/(n−1)} q_ex(n, 1/2) as n tends to infinity yields the exponent of the largest monomial (possibly omitting logarithmic factors) of the return time for c = 1/2. In Figure 9 we see the quotients q_ex(n, 1/2) for n up to 40 and q_av(n, 1/2) for even larger values of n. Here, too, q_av can be used as an approximation of q_ex, and it is reasonable to assume that the limit is one. Please note that by Theorem 17 this limit is at most 1. Similarly to the exponential case c < 1/2, we conjecture that the actual expected value H_1 is close to the proposed upper bound on H_1 described in the proof of Theorem 17.
Using these results for lower bounds on the expected optimization time we would have