
Upper Bounds on the Running Time of the Univariate Marginal Distribution Algorithm on OneMax

Published in: Algorithmica

Abstract

The Univariate Marginal Distribution Algorithm (UMDA) is a randomized search heuristic that builds a stochastic model of the underlying optimization problem by repeatedly sampling \(\lambda \) solutions and adjusting the model according to the best \(\mu \) samples. We present a running time analysis of the UMDA on the classical OneMax benchmark function for wide ranges of the parameters \(\mu \) and \(\lambda \). If \(\mu \ge c\log n\) for some constant \(c>0\) and \(\lambda =(1+\varTheta (1))\mu \), we obtain a general bound \(O(\mu n)\) on the expected running time. This bound crucially assumes that all marginal probabilities of the algorithm are confined to the interval \([1/n,1-1/n]\). If \(\mu \ge c' \sqrt{n}\log n\) for a constant \(c'>0\) and \(\lambda =(1+\varTheta (1))\mu \), the behavior of the algorithm changes and the bound on the expected running time becomes \(O(\mu \sqrt{n})\), which typically holds even if the borders on the marginal probabilities are omitted. The results supplement the recently derived lower bound \(\varOmega (\mu \sqrt{n}+n\log n)\) by Krejca and Witt (Proceedings of FOGA 2017, ACM Press, New York, pp 65–79, 2017) and turn out to be tight for the two very different choices \(\mu =c\log n\) and \(\mu =c'\sqrt{n}\log n\). They also improve the previously best known upper bound \(O(n\log n\log \log n)\) by Dang and Lehre (Proceedings of GECCO ’15, ACM Press, New York, pp 513–518, 2015) that was established for \(\mu =c\log n\) and \(\lambda =(1+\varTheta (1))\mu \).
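For illustration, the sampling-and-selection cycle described in the abstract can be sketched in code. This is a minimal sketch under our own naming and parameter choices (not code from the paper); the marginal probabilities are clamped to the borders \([1/n,1-1/n]\) mentioned above.

```python
import random

def umda_onemax(n, mu, lam, max_gens=100_000, rng=None):
    """Minimal UMDA on OneMax: sample lam bit strings from the product
    distribution given by the marginals p, select the best mu of them,
    and set each marginal to the frequency of ones in the selection."""
    rng = rng or random.Random(0)
    p = [0.5] * n                          # initial marginal probabilities
    lo, hi = 1 / n, 1 - 1 / n              # borders on the marginals
    for gen in range(max_gens):
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(lam)]
        pop.sort(key=sum, reverse=True)    # OneMax value = number of ones
        if sum(pop[0]) == n:
            return gen                     # optimum sampled in this generation
        selected = pop[:mu]
        for i in range(n):
            freq = sum(x[i] for x in selected) / mu
            p[i] = min(max(freq, lo), hi)  # clamp to [1/n, 1 - 1/n]
    return None
```

For instance, \(\mu =30\) and \(\lambda =2\mu \) on \(n=20\) bits match the regime \(\lambda =(1+\varTheta (1))\mu \) considered in the upper bounds.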


References

  1. Baillon, J.-B., Cominetti, R., Vaisman, J.: A sharp uniform bound for the distribution of sums of Bernoulli trials. Comb. Probab. Comput. 25(3), 352–361 (2016)

  2. Chen, T., Tang, K., Chen, G., Yao, X.: On the analysis of average time complexity of estimation of distribution algorithms. In: Proceedings of CEC ’07, pp. 453–460 (2007)

  3. Chen, T., Lehre, P.K., Tang, K., Yao, X.: When is an estimation of distribution algorithm better than an evolutionary algorithm? In: Proceedings of CEC ’09, pp. 1470–1477 (2009)

  4. Chen, T., Tang, K., Chen, G., Yao, X.: Rigorous time complexity analysis of univariate marginal distribution algorithm with margins. In: Proceedings of CEC ’09, pp. 2157–2164 (2009)

  5. Chen, T., Tang, K., Chen, G., Yao, X.: Analysis of computational time of simple estimation of distribution algorithms. IEEE Trans. Evol. Comput. 14(1), 1–22 (2010)


  6. Dang, D.-C., Lehre, P.K.: Simplified runtime analysis of estimation of distribution algorithms. In: Proceedings of GECCO ’15, pp. 513–518. ACM Press, New York (2015)

  7. Droste, S.: A rigorous analysis of the compact genetic algorithm for linear functions. Nat. Comput. 5(3), 257–283 (2006)


  8. Friedrich, T., Kötzing, T., Krejca, M.S., Sutton, A.M.: The benefit of recombination in noisy evolutionary search. In: Proceedings of ISAAC ’15, pp. 140–150. Springer, Berlin (2015)

  9. Friedrich, T., Kötzing, T., Krejca, M.S.: EDAs cannot be balanced and stable. In: Proceedings of GECCO ’16, pp. 1139–1146. ACM Press, New York (2016)

  10. Hajek, B.: Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14, 502–525 (1982)


  11. Hauschild, M., Pelikan, M.: An introduction and survey of estimation of distribution algorithms. Swarm Evol. Comput. 1(3), 111–128 (2011)


  12. Johannsen, D.: Random combinatorial structures and randomized search heuristics. Ph.D. Thesis, Universität des Saarlandes, Germany (2010). http://scidok.sulb.uni-saarland.de/volltexte/2011/3529

  13. Kaas, R., Buhrman, J.M.: Mean, median and mode in binomial distributions. Stat. Neerl. 34, 13–18 (1980)


  14. Krejca, M.S., Witt, C.: Lower bounds on the run time of the univariate marginal distribution algorithm on OneMax. In: Proceedings of FOGA 2017, pp. 65–79. ACM Press, New York (2017)

  15. Larrañaga, P., Lozano, J.A. (eds.): Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Volume 2 of Genetic Algorithms and Evolutionary Computation. Springer, Berlin (2002)


  16. Lehre, P.K., Nguyen, P.T.H.: Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In: Proceedings of GECCO ’17, pp. 414–434. ACM Press, New York (2017)

  17. Lehre, P.K., Witt, C.: Concentrated hitting times of randomized search heuristics with variable drift. In: Proceedings of ISAAC ’14, pp. 686–697. Springer, Berlin (2014). Full technical report at arXiv:1307.2559

  18. McDiarmid, C.: Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. (eds.) Probabilistic Methods for Algorithmic Discrete Mathematics, pp. 195–247. Springer, Berlin (1998)


  19. Mitavskiy, B., Rowe, J.E., Cannings, C.: Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links. Int. J. Intell. Comput. Cybern. 2(2), 243–284 (2009)


  20. Mühlenbein, H., Paass, G.: From recombination of genes to the estimation of distributions I. Binary parameters. In: Proceedings of PPSN IV, pp. 178–187. Springer, Berlin (1996)

  21. Neumann, F., Sudholt, D., Witt, C.: A few ants are enough: ACO with iteration-best update. In: Proceedings of GECCO ’10, pp. 63–70. ACM Press, New York (2010)

  22. Oliveto, P.S., Witt, C.: Improved time complexity analysis of the simple genetic algorithm. Theor. Comput. Sci. 605, 21–41 (2015)


  23. Rowe, J.E., Sudholt, D.: The choice of the offspring population size in the (1, \(\lambda \)) evolutionary algorithm. Theor. Comput. Sci. 545, 20–38 (2014)


  24. Samuels, S.M.: On the number of successes in independent trials. Ann. Math. Stat. 36(4), 1272–1278 (1965)


  25. Sudholt, D.: A new method for lower bounds on the running time of evolutionary algorithms. IEEE Trans. Evol. Comput. 17(3), 418–435 (2013)


  26. Sudholt, D., Witt, C.: Update strength in EDAs and ACO: how to avoid genetic drift. In: Proceedings of GECCO ’16, pp. 61–68. ACM Press, New York (2016)

  27. Witt, C.: Tight bounds on the optimization time of a randomized search heuristic on linear functions. Comb. Probab. Comput. 22(2), 294–318 (2013)


  28. Witt, C.: Upper bounds on the runtime of the univariate marginal distribution algorithm on OneMax. In: Proceedings of GECCO ’17, pp. 1415–1422 (2017)

  29. Wu, Z., Kolonko, M., Möhring, R.H.: Stochastic runtime analysis of the cross-entropy algorithm. IEEE Trans. Evol. Comput. 21(4), 616–628 (2017)



Acknowledgements

Financial support by the Danish Council for Independent Research (DFF-FNU 4002–00542) is gratefully acknowledged.

Author information


Correspondence to Carsten Witt.

Additional information

An extended abstract of this article appeared in the proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017) [28].

Appendix


1.1 Proof of Theorem 3

We will use Hajek’s drift theorem to prove Theorem 3. As we are dealing with a stochastic process, we implicitly assume that the random variables \(X_t\), \(t\ge 0\), are adapted to some filtration \(\mathcal {F}_t\), such as the natural filtration generated by \(X_0, \ldots ,X_t\), \(t\ge 0\).

Moreover, we do not formulate the theorem in terms of a potential (Lyapunov) function \(g\) mapping from some state space to the reals; instead, we assume w. l. o. g. that the random variables \(X_t\) are already the images under such a mapping.

The following theorem follows immediately from taking Conditions D1 and D2 in [10] and applying Inequality (2.8) in a union bound over \(L(\ell )\) time steps.

Theorem 15

[10] Let \(X_t\), \(t\ge 0\), be real-valued random variables describing a stochastic process over some state space, adapted to a filtration \(\mathcal {F}_t\). Pick two real numbers \(a(\ell )\) and \(b(\ell )\) depending on a parameter \(\ell \) such that \(a(\ell )<b(\ell )\) holds. Let \(T(\ell )\) be the random variable denoting the earliest point in time \(t\ge 0\) such that \(X_t\le a(\ell )\) holds. If there are \(\lambda (\ell )>0\) and \(p(\ell )>0\) such that the condition

$$\begin{aligned} \mathrm {E}\bigl (e^{-\lambda (\ell )\cdot (X_{t+1}-X_t)}\cdot \mathbb {1}\{a(\ell )<X_t<b(\ell )\}\mid \mathcal {F}_t\bigr ) \le 1-\frac{1}{p(\ell )} \qquad (*) \end{aligned}$$

holds for all \(t\ge 0\), then for all time bounds \(L(\ell )\ge 0\)

$$\begin{aligned} {{\mathrm{Pr}}}\bigl (T(\ell )\le L(\ell ) \mid X_0\ge b(\ell )\bigr ) \le e^{-\lambda (\ell )\cdot (b(\ell )-a(\ell ))}\cdot L(\ell )\cdot D(\ell )\cdot p(\ell ), \end{aligned}$$

where \(D(\ell )=\max \bigl \{1,\mathrm {E}\bigl (e^{-\lambda (\ell )\cdot (X_{t+1}-b(\ell ))}\mid \mathcal {F}_t\,;\, X_t\ge b(\ell )\bigr )\bigr \}\).
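To give a concrete feel for this bound, the following Monte-Carlo sketch (our own illustrative construction, not taken from the paper; all names and parameters are ours) simulates a random walk with constant drift away from the boundary: crossing an interval of length \(\ell \) against the drift within a long horizon is extremely unlikely, mirroring the factor \(e^{-\lambda (\ell )\cdot (b(\ell )-a(\ell ))}\).

```python
import random

def crosses_against_drift(ell, eps=0.2, horizon=10_000, rng=None):
    """Walk X_{t+1} = X_t + 1 w.p. (1 + eps)/2, else X_t - 1, started at
    X_0 = ell; return True iff the walk reaches 0 (i.e., moves ell steps
    against the drift of +eps) within `horizon` steps."""
    rng = rng or random.Random(42)
    x = ell
    for _ in range(horizon):
        x += 1 if rng.random() < (1 + eps) / 2 else -1
        if x <= 0:
            return True
    return False

# By gambler's-ruin-type arguments, the crossing probability is about
# ((1 - eps) / (1 + eps))^ell, i.e., exponentially small in ell.
hits = sum(crosses_against_drift(30, rng=random.Random(s)) for s in range(200))
```

With \(\ell =30\) and 200 independent trials, typically no crossing at all is observed.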

Proof

(Proof of Theorem 3) We will apply Theorem 15 for suitable choices of its variables, some of which might depend on the parameter \(\ell =b-a\) denoting the length of the interval [ab]. The following argumentation is also inspired by Hajek’s work [10].

By assumption, \(\varDelta _t(X_{t+1}-X_t)\preceq X_{t+1}-X_t\). Clearly, for the process defined by \(X'_t=X_0+\sum _{j=0}^{t-1} \varDelta _j(X_{j+1}-X_j)\) we have \(X'_t\preceq X_t\). Hence, the hitting time \(T^*\) for a state less than \(a\) of the original process \(X_t\) is stochastically at least as large as the corresponding hitting time of the process \(X'_t\). In the following, we will therefore, without further mention, analyze \(X'_t\) instead of \(X_t\) and bound the tail of its hitting time. We work with \(\varDelta :=\varDelta _t(X_{t+1}-X_t)\), which equals \(X'_{t+1}-X'_t\), and for readability we keep writing \(X_t\) instead of \(X'_t\).

The aim is to bound the moment-generating function (mgf.) appearing in Condition (\(*\)). In this analysis, we often omit the filtration \(\mathcal {F}_t\) for notational convenience. First we observe that it suffices to bound the mgf. of \(\varDelta \cdot \mathbb {1}\{\varDelta \le \kappa \epsilon \}\) since

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta })&= \mathrm {E}(e^{-\lambda \varDelta \mathbb {1}\{\varDelta \le \kappa \epsilon \} - \lambda \varDelta \mathbb {1}\{\varDelta> \kappa \epsilon \}})\\&= \mathrm {E}(e^{-\lambda \varDelta \mathbb {1}\{\varDelta \le \kappa \epsilon \}} e^{ - \lambda \varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}}) \le \mathrm {E}(e^{-\lambda \varDelta \mathbb {1}\{\varDelta \le \kappa \epsilon \}}), \end{aligned}$$

using \(\varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}\ge 0\) and hence \(e^{ - \lambda \varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}}\le 1\). In the following, we omit the factor \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\) but implicitly multiply \(\varDelta \) by it throughout. The same applies to \(\mathbb {1}\{a<X_t<b\}\).

To establish Condition (\(*\)), it is sufficient to identify values \(\lambda :=\lambda (\ell )>0\) and \(p(\ell )>0\) such that

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta }\mathbb {1}\{a<X_t<b\}) \le 1-\frac{1}{p(\ell )}. \end{aligned}$$

Using the series expansion of the exponential function, we get

$$\begin{aligned}&\mathrm {E}(e^{-\lambda \varDelta }\mathbb {1}\{a<X_t<b\}) = 1 - \lambda \mathrm {E}(\varDelta ) + \sum _{k=2}^\infty \frac{(-\lambda )^{k}}{k!} \mathrm {E}(\varDelta ^k)\\&\quad = 1 - \lambda \mathrm {E}(\varDelta ) + \sum _{k=2}^\infty \frac{(-\lambda )^{k}}{k!} \left( \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}) + \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta < 0\})\right) . \end{aligned}$$

We first concentrate on the positive steps in the direction of the expected value; more precisely, for any odd \(k\ge 3\) we consider

$$\begin{aligned} M_k :=\frac{\lambda ^{k}}{k!} \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}) - \frac{\lambda ^{k+1}}{(k+1)!} \mathrm {E}(\varDelta ^{k+1}\mathbb {1}\{\varDelta \ge 0\}). \end{aligned}$$

Since we implicitly multiply with \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\), we have \(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}\le (\kappa \epsilon )^k\) and hence \(|\mathrm {E}(\varDelta ^{k+1}\mathbb {1}\{\varDelta \ge 0\})/\mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\})|\le \kappa \epsilon \). By choosing \(\lambda \le 1/(\kappa \epsilon )\), we have

$$\begin{aligned} M_k\ge \frac{\lambda ^k}{k!} \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}) - \frac{\lambda ^{k}}{\kappa \epsilon (k+1)!} \kappa \epsilon \mathrm {E}(\varDelta ^{k}\mathbb {1}\{\varDelta \ge 0\}) \ge 0, \end{aligned}$$

for \(k\ge 3\) since \((1/k!)/(1/(k+1)!)=k+1\ge 1\). Hence,

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta })&\le 1- \lambda \mathrm {E}(\varDelta ) + \frac{\lambda ^2 }{2} \mathrm {E}(\varDelta ^2\mathbb {1}\{\varDelta \ge 0\}) \\&\le 1- \lambda \mathrm {E}(\varDelta ) + \frac{\lambda ^2 }{2} \mathrm {E}(\varDelta \cdot \kappa \epsilon \cdot \mathbb {1}\{\varDelta \ge 0\})\\&\le 1-\lambda \mathrm {E}(\varDelta ) + \lambda \frac{1}{2\kappa \epsilon }\cdot \kappa \epsilon \cdot \mathrm {E}(\varDelta ) \le 1-\lambda \epsilon /2 \end{aligned}$$

where the first inequality used that \(\varDelta ^{2}\le \varDelta \kappa \epsilon \) due to our implicit multiplication with \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\) everywhere and the second used again \(\lambda \le 1/(\kappa \epsilon )\). So, we have estimated the contribution of all the positive steps by \(1-\lambda \mathrm {E}(\varDelta )/2\).

We proceed with the remaining terms. We overestimate the sum by considering \(\varDelta ':=|\varDelta \cdot \mathbb {1}\{\varDelta <0\}|\) and bounding \((-\lambda )^{k}\le \lambda ^{k}\) in all terms starting from \(k=2\). Incorporating the contribution of the positive steps, we obtain for all \(\gamma \ge \lambda \)

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta })&\le 1 - \frac{\lambda }{2} \mathrm {E}(\varDelta ) + \frac{\lambda ^2}{\gamma ^2} \sum _{k=2}^\infty \frac{\gamma ^{k}}{k!} \mathrm {E}(\varDelta '^k)\\&\le 1 - \frac{\lambda }{2} \mathrm {E}(\varDelta )+ \frac{\lambda ^2}{\gamma ^2} \sum _{k=0}^\infty \frac{\gamma ^{k}}{k!} \mathrm {E}(\varDelta '^k) \le 1 -\frac{\lambda }{2} \epsilon + \lambda ^2 \underbrace{\frac{\mathrm {E}(e^{\gamma \varDelta '})}{\gamma ^2}}_{=:C(\gamma )}, \end{aligned}$$

where the last inequality uses the first condition of the theorem, i. e., the bound on the drift.

Given any \(\gamma >0\), choosing \(\lambda :=\min \{1/(\kappa \epsilon ), \gamma , \epsilon /(4C(\gamma ))\}\) results in

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta }\mathbb {1}\{a<X_t<b\}) \le 1- \frac{\lambda }{2}\epsilon + \lambda \cdot \frac{\epsilon }{4C(\gamma )}\cdot C(\gamma ) = 1-\frac{\lambda \epsilon }{4} = 1-\frac{1}{p(\ell )} \end{aligned}$$

with \(p(\ell ):=4/(\lambda \epsilon )\).

The aim is now to choose \(\gamma \) in such a way that \(\mathrm {E}(e^{\gamma \varDelta '})\) is bounded from above by a constant. We get

$$\begin{aligned} \mathrm {E}(e^{\gamma \varDelta '}) \le \sum _{j=0}^\infty e^{\gamma (j+1)r} {{\mathrm{Pr}}}(\varDelta \le -jr) \le \sum _{j=0}^\infty e^{\gamma (j+1)r} e^{- j } \end{aligned}$$

where the last inequality uses the second condition of the theorem.

Choosing \(\gamma :=1/(2r)\) yields

$$\begin{aligned} \mathrm {E}(e^{\gamma \varDelta '})&\le \sum _{j=0}^\infty e^{(j+1)/2 - j } = e^{1/2} \sum _{j=0}^\infty e^{-j/2} = e^{1/2} \frac{1}{1-e^{-1/2}} \le 4.2. \end{aligned}$$
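The numerical constant can be double-checked directly; the following quick check (ours, purely a sanity check, not part of the proof) evaluates both the closed form and the partial sums of the series above.

```python
import math

# Closed form e^{1/2} / (1 - e^{-1/2}) of the geometric series above
value = math.exp(0.5) / (1 - math.exp(-0.5))

# Partial sums of sum_{j >= 0} e^{(j+1)/2 - j} converge to the same limit
partial = sum(math.exp((j + 1) / 2 - j) for j in range(200))
```

Both quantities agree and lie just below 4.2, confirming the estimate.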

Hence, \(C(\gamma )\le 4.2/\gamma ^2=16.8r^2\), and therefore \(\epsilon /(4C(\gamma ))\ge \epsilon /(67.2r^2)>\epsilon /(68r^2)\). From the definition of \(\lambda \), we altogether have \(\lambda \ge \min \{1/(2r),\epsilon /(68r^2),1/(\kappa \epsilon )\}\). Since \(p(\ell )=4/(\lambda \epsilon )\), we know \(p(\ell )=O(r / \epsilon + r^2/\epsilon ^2 + \kappa )\). Condition (\(*\)) of Theorem 15 has been established along with these bounds on \(p(\ell )\) and \(\lambda =\lambda (\ell )\).

To bound the probability of a success within \(L(\ell )\) steps, we still need a bound on \(D(\ell )=\max \{1,\mathrm {E}(e^{-\lambda (X_{t+1}-b)}\mid X_t\ge b)\}\). If 1 does not maximize the expression then

$$\begin{aligned} D(\ell )&= \mathrm {E}(e^{-\lambda (X_{t+1}-b)}\mid X_t\ge b) \le \mathrm {E}(e^{-\lambda \varDelta }\mid X_t\ge b) \\&\, \le 1+ \mathrm {E}(e^{\gamma \varDelta '}\mid X_t\ge b), \end{aligned}$$

where the first inequality follows from \(X_t\ge b\) and the second one from \(\gamma \ge \lambda \) together with the bound 1 for the positive steps, as argued above. The last term can be bounded as in the calculation above, leading to \(\mathrm {E}(e^{\gamma \varDelta '})=O(1)\), since that estimation only uses the second condition, which holds conditional on \(X_t > a\). Hence, in any case \(D(\ell ) = O(1)\). Altogether, we have

$$\begin{aligned} e^{-\lambda (\ell )\cdot \ell }\cdot D(\ell ) \cdot p(\ell )&\le e^{-\lambda \ell } \cdot O(1)\cdot \frac{4}{\lambda \epsilon } = O\bigl (e^{-\lambda \ell + \ln (4/(\lambda \epsilon ))}\bigr ). \end{aligned}$$

By the third condition, we have \(\lambda \ell \ge 2\ln (4/(\lambda \epsilon ))\), which finally means that

$$\begin{aligned} e^{-\lambda (\ell )\cdot \ell }\cdot D(\ell ) \cdot p(\ell ) = O\bigl (e^{-\lambda \ell /2}\bigr ). \end{aligned}$$

Choosing \(L(\ell )=e^{\lambda \ell /4}\), Theorem 15 yields

$$\begin{aligned} {{\mathrm{Pr}}}\bigl (T(\ell )\le L(\ell )\bigr ) \le L(\ell )\cdot O(e^{-\lambda \ell /2}) = O(e^{-\lambda \ell /4}), \end{aligned}$$

which proves the theorem. \(\square \)


Cite this article

Witt, C. Upper Bounds on the Running Time of the Univariate Marginal Distribution Algorithm on OneMax. Algorithmica 81, 632–667 (2019). https://doi.org/10.1007/s00453-018-0463-0
