
Upper Bounds on the Running Time of the Univariate Marginal Distribution Algorithm on OneMax

Published in: Algorithmica

Abstract

The Univariate Marginal Distribution Algorithm (UMDA) is a randomized search heuristic that builds a stochastic model of the underlying optimization problem by repeatedly sampling \(\lambda \) solutions and adjusting the model according to the best \(\mu \) samples. We present a running time analysis of the UMDA on the classical OneMax benchmark function for wide ranges of the parameters \(\mu \) and \(\lambda \). If \(\mu \ge c\log n\) for some constant \(c>0\) and \(\lambda =(1+\varTheta (1))\mu \), we obtain a general bound \(O(\mu n)\) on the expected running time. This bound crucially assumes that all marginal probabilities of the algorithm are confined to the interval \([1/n,1-1/n]\). If \(\mu \ge c' \sqrt{n}\log n\) for a constant \(c'>0\) and \(\lambda =(1+\varTheta (1))\mu \), the behavior of the algorithm changes and the bound on the expected running time becomes \(O(\mu \sqrt{n})\), which typically holds even if the borders on the marginal probabilities are omitted. The results supplement the recently derived lower bound \(\varOmega (\mu \sqrt{n}+n\log n)\) by Krejca and Witt (Proceedings of FOGA 2017, ACM Press, New York, pp 65–79, 2017) and turn out to be tight for the two very different choices \(\mu =c\log n\) and \(\mu =c'\sqrt{n}\log n\). They also improve the previously best known upper bound \(O(n\log n\log \log n)\) by Dang and Lehre (Proceedings of GECCO ’15, ACM Press, New York, pp 513–518, 2015) that was established for \(\mu =c\log n\) and \(\lambda =(1+\varTheta (1))\mu \).
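For illustration, the sampling-and-selection cycle described in the abstract can be sketched in code. This is a minimal sketch under our own naming and parameter choices (not code from the paper); the marginal probabilities are clamped to the borders \([1/n,1-1/n]\) mentioned above.

```python
import random

def umda_onemax(n, mu, lam, max_gens=100_000, rng=None):
    """Minimal UMDA on OneMax: sample lam bit strings from the product
    distribution given by the marginals p, select the best mu of them,
    and set each marginal to the frequency of ones in the selection."""
    rng = rng or random.Random(0)
    p = [0.5] * n                          # initial marginal probabilities
    lo, hi = 1 / n, 1 - 1 / n              # borders on the marginals
    for gen in range(max_gens):
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(lam)]
        pop.sort(key=sum, reverse=True)    # OneMax value = number of ones
        if sum(pop[0]) == n:
            return gen                     # optimum sampled in this generation
        selected = pop[:mu]
        for i in range(n):
            freq = sum(x[i] for x in selected) / mu
            p[i] = min(max(freq, lo), hi)  # clamp to [1/n, 1 - 1/n]
    return None
```

For instance, \(\mu =30\) and \(\lambda =2\mu \) on \(n=20\) bits match the regime \(\lambda =(1+\varTheta (1))\mu \) considered in the upper bounds.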


References

  1. Baillon, J.-B., Cominetti, R., Vaisman, J.: A sharp uniform bound for the distribution of sums of Bernoulli trials. Comb. Probab. Comput. 25(3), 352–361 (2016)

  2. Chen, T., Tang, K., Chen, G., Yao, X.: On the analysis of average time complexity of estimation of distribution algorithms. In: Proceedings of CEC ’07, pp. 453–460 (2007)

  3. Chen, T., Lehre, P.K., Tang, K., Yao, X.: When is an estimation of distribution algorithm better than an evolutionary algorithm? In: Proceedings of CEC ’09, pp. 1470–1477 (2009)

  4. Chen, T., Tang, K., Chen, G., Yao, X.: Rigorous time complexity analysis of univariate marginal distribution algorithm with margins. In: Proceedings of CEC ’09, pp. 2157–2164 (2009)

  5. Chen, T., Tang, K., Chen, G., Yao, X.: Analysis of computational time of simple estimation of distribution algorithms. IEEE Trans. Evol. Comput. 14(1), 1–22 (2010)


  6. Dang, D.-C., Lehre, P.K.: Simplified runtime analysis of estimation of distribution algorithms. In: Proceedings of GECCO ’15, pp. 513–518. ACM Press, New York (2015)

  7. Droste, S.: A rigorous analysis of the compact genetic algorithm for linear functions. Nat. Comput. 5(3), 257–283 (2006)


  8. Friedrich, T., Kötzing, T., Krejca, M.S., Sutton, A.M.: The benefit of recombination in noisy evolutionary search. In: Proceedings of ISAAC ’15, pp. 140–150. Springer, Berlin (2015)

  9. Friedrich, T., Kötzing, T., Krejca, M.S.: EDAs cannot be balanced and stable. In: Proceedings of GECCO ’16, pp. 1139–1146. ACM Press, New York (2016)

  10. Hajek, B.: Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14, 502–525 (1982)


  11. Hauschild, M., Pelikan, M.: An introduction and survey of estimation of distribution algorithms. Swarm Evol. Comput. 1(3), 111–128 (2011)


  12. Johannsen, D.: Random combinatorial structures and randomized search heuristics. Ph.D. Thesis, Universität des Saarlandes, Germany (2010). http://scidok.sulb.uni-saarland.de/volltexte/2011/3529

  13. Kaas, R., Buhrman, J.M.: Mean, median and mode in binomial distributions. Stat. Neerl. 34, 13–18 (1980)


  14. Krejca, M.S., Witt, C.: Lower bounds on the run time of the univariate marginal distribution algorithm on OneMax. In: Proceedings of FOGA 2017, pp. 65–79. ACM Press, New York (2017)

  15. Larrañaga, P., Lozano, J.A. (eds.): Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Volume 2 of Genetic Algorithms and Evolutionary Computation. Springer, Berlin (2002)


  16. Lehre, P.K., Nguyen, P.T.H.: Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In: Proceedings of GECCO ’17, pp. 414–434. ACM Press, New York (2017)

  17. Lehre, P.K., Witt, C.: Concentrated hitting times of randomized search heuristics with variable drift. In: Proceedings of ISAAC ’14, pp. 686–697. Springer, Berlin (2014). Full technical report at arXiv:1307.2559

  18. McDiarmid, C.: Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. (eds.) Probabilistic Methods for Algorithmic Discrete Mathematics, pp. 195–247. Springer, Berlin (1998)


  19. Mitavskiy, B., Rowe, J.E., Cannings, C.: Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links. Int. J. Intell. Comput. Cybern. 2(2), 243–284 (2009)


  20. Mühlenbein, H., Paass, G.: From recombination of genes to the estimation of distributions I. Binary parameters. In: Proceedings of PPSN IV, pp. 178–187. Springer, Berlin (1996)

  21. Neumann, F., Sudholt, D., Witt, C.: A few ants are enough: ACO with iteration-best update. In: Proceedings of GECCO ’10, pp. 63–70. ACM Press, New York (2010)

  22. Oliveto, P.S., Witt, C.: Improved time complexity analysis of the simple genetic algorithm. Theor. Comput. Sci. 605, 21–41 (2015)


  23. Rowe, J.E., Sudholt, D.: The choice of the offspring population size in the (1, \(\lambda \)) evolutionary algorithm. Theor. Comput. Sci. 545, 20–38 (2014)


  24. Samuels, S.M.: On the number of successes in independent trials. Ann. Math. Stat. 36(4), 1272–1278 (1965)


  25. Sudholt, D.: A new method for lower bounds on the running time of evolutionary algorithms. IEEE Trans. Evol. Comput. 17(3), 418–435 (2013)


  26. Sudholt, D., Witt, C.: Update strength in EDAs and ACO: how to avoid genetic drift. In: Proceedings of GECCO ’16, pp. 61–68. ACM Press, New York (2016)

  27. Witt, C.: Tight bounds on the optimization time of a randomized search heuristic on linear functions. Comb. Probab. Comput. 22(2), 294–318 (2013)


  28. Witt, C.: Upper bounds on the runtime of the univariate marginal distribution algorithm on OneMax. In: Proceedings of GECCO ’17, pp. 1415–1422 (2017)

  29. Wu, Z., Kolonko, M., Möhring, R.H.: Stochastic runtime analysis of the cross-entropy algorithm. IEEE Trans. Evol. Comput. 21(4), 616–628 (2017)



Acknowledgements

Financial support by the Danish Council for Independent Research (DFF-FNU 4002–00542) is gratefully acknowledged.

Author information


Correspondence to Carsten Witt.

Additional information

An extended abstract of this article appeared in the proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017) [28].

Appendix


1.1 Proof of Theorem 3

We will use Hajek’s drift theorem to prove Theorem 3. As we are dealing with a stochastic process, we implicitly assume that the random variables \(X_t\), \(t\ge 0\), are adapted to some filtration \(\mathcal {F}_t\), such as the natural filtration generated by \(X_0, \ldots ,X_t\), \(t\ge 0\).

Moreover, we do not formulate the theorem in terms of a potential (Lyapunov) function \(g\) mapping from some state space to the reals; instead, we assume w. l. o. g. that the random variables \(X_t\) are already the images under such a mapping.

The following theorem follows immediately from taking Conditions D1 and D2 in [10] and applying Inequality (2.8) in a union bound over \(L(\ell )\) time steps.

Theorem 15

[10] Let \(X_t\), \(t\ge 0\), be real-valued random variables describing a stochastic process over some state space, adapted to a filtration \(\mathcal {F}_t\). Pick two real numbers \(a(\ell )\) and \(b(\ell )\) depending on a parameter \(\ell \) such that \(a(\ell )<b(\ell )\) holds. Let \(T(\ell )\) be the random variable denoting the earliest point in time \(t\ge 0\) such that \(X_t\le a(\ell )\) holds. If there are \(\lambda (\ell )>0\) and \(p(\ell )>0\) such that the condition

$$\begin{aligned} \mathrm {E}\bigl (e^{-\lambda (\ell )\cdot (X_{t+1}-X_t)}\cdot \mathbb {1}\{a(\ell )<X_t<b(\ell )\}\mid \mathcal {F}_t\bigr ) \le 1-\frac{1}{p(\ell )} \qquad (*) \end{aligned}$$

holds for all \(t\ge 0\), then for all time bounds \(L(\ell )\ge 0\)

$$\begin{aligned} {{\mathrm{Pr}}}\bigl (T(\ell )\le L(\ell ) \mid X_0\ge b(\ell )\bigr ) \le e^{-\lambda (\ell )\cdot (b(\ell )-a(\ell ))}\cdot L(\ell )\cdot D(\ell )\cdot p(\ell ), \end{aligned}$$

where \(D(\ell )=\max \bigl \{1,\mathrm {E}\bigl (e^{-\lambda (\ell )\cdot (X_{t+1}-b(\ell ))}\mid \mathcal {F}_t\,;\, X_t\ge b(\ell )\bigr )\bigr \}\).
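To give a concrete feel for this bound, the following Monte-Carlo sketch (our own illustrative construction, not taken from the paper; all names and parameters are ours) simulates a random walk with constant drift away from the boundary: crossing an interval of length \(\ell \) against the drift within a long horizon is extremely unlikely, mirroring the factor \(e^{-\lambda (\ell )\cdot (b(\ell )-a(\ell ))}\).

```python
import random

def crosses_against_drift(ell, eps=0.2, horizon=10_000, rng=None):
    """Walk X_{t+1} = X_t + 1 w.p. (1 + eps)/2, else X_t - 1, started at
    X_0 = ell; return True iff the walk reaches 0 (i.e., moves ell steps
    against the drift of +eps) within `horizon` steps."""
    rng = rng or random.Random(42)
    x = ell
    for _ in range(horizon):
        x += 1 if rng.random() < (1 + eps) / 2 else -1
        if x <= 0:
            return True
    return False

# By gambler's-ruin-type arguments, the crossing probability is about
# ((1 - eps) / (1 + eps))^ell, i.e., exponentially small in ell.
hits = sum(crosses_against_drift(30, rng=random.Random(s)) for s in range(200))
```

With \(\ell =30\) and 200 independent trials, typically no crossing at all is observed.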

Proof

(Proof of Theorem 3) We will apply Theorem 15 for suitable choices of its variables, some of which might depend on the parameter \(\ell =b-a\) denoting the length of the interval [ab]. The following argumentation is also inspired by Hajek’s work [10].

By assumption, \(\varDelta _t(X_{t+1}-X_t)\preceq X_{t+1}-X_t\). Clearly, for the process defined by \(X'_t=X_0+\sum _{j=0}^{t-1} \varDelta _j(X_{j+1}-X_j)\) we have \(X'_t\preceq X_t\). Hence, the hitting time \(T^*\) for a state less than \(a\) of the original process \(X_t\) is stochastically at least as large as the corresponding hitting time of the process \(X'_t\). In the following, we will therefore, without further mention, analyze \(X'_t\) instead of \(X_t\) and bound the tail of its hitting time. We work with \(\varDelta :=\varDelta _t(X_{t+1}-X_t)\), which equals \(X'_{t+1}-X'_t\), and for readability we keep writing \(X_t\) instead of \(X'_t\).

The aim is to bound the moment-generating function (mgf.) appearing in Condition (\(*\)). In this analysis, we often omit the filtration \(\mathcal {F}_t\) for notational convenience. First we observe that it suffices to bound the mgf. of \(\varDelta \cdot \mathbb {1}\{\varDelta \le \kappa \epsilon \}\) since

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta })&= \mathrm {E}(e^{-\lambda \varDelta \mathbb {1}\{\varDelta \le \kappa \epsilon \} - \lambda \varDelta \mathbb {1}\{\varDelta> \kappa \epsilon \}})\\&= \mathrm {E}(e^{-\lambda \varDelta \mathbb {1}\{\varDelta \le \kappa \epsilon \}} e^{ - \lambda \varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}}) \le \mathrm {E}(e^{-\lambda \varDelta \mathbb {1}\{\varDelta \le \kappa \epsilon \}}), \end{aligned}$$

using \(\varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}\ge 0\) and hence \(e^{ - \lambda \varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}}\le 1\). In the following, we omit the factor \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\) but implicitly multiply \(\varDelta \) by it throughout. The same applies to \(\mathbb {1}\{a<X_t<b\}\).

To establish Condition (\(*\)), it is sufficient to identify values \(\lambda :=\lambda (\ell )>0\) and \(p(\ell )>0\) such that

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta }\mathbb {1}\{a<X_t<b\}) \le 1-\frac{1}{p(\ell )}. \end{aligned}$$

Using the series expansion of the exponential function, we get

$$\begin{aligned}&\mathrm {E}(e^{-\lambda \varDelta }\mathbb {1}\{a<X_t<b\}) = 1 - \lambda \mathrm {E}(\varDelta ) + \sum _{k=2}^\infty \frac{(-\lambda )^{k}}{k!} \mathrm {E}(\varDelta ^k)\\&\quad = 1 - \lambda \mathrm {E}(\varDelta ) + \sum _{k=2}^\infty \frac{(-\lambda )^{k}}{k!} \left( \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}) + \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta < 0\})\right) . \end{aligned}$$

We first concentrate on the positive steps in the direction of the expected value; more precisely, for any odd \(k\ge 3\) we consider

$$\begin{aligned} M_k :=\frac{\lambda ^{k}}{k!} \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}) - \frac{\lambda ^{k+1}}{(k+1)!} \mathrm {E}(\varDelta ^{k+1}\mathbb {1}\{\varDelta \ge 0\}). \end{aligned}$$

Since we implicitly multiply with \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\), we have \(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}\le (\kappa \epsilon )^k\) and hence \(|\mathrm {E}(\varDelta ^{k+1}\mathbb {1}\{\varDelta \ge 0\})/\mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\})|\le \kappa \epsilon \). By choosing \(\lambda \le 1/(\kappa \epsilon )\), we have

$$\begin{aligned} M_k\ge \frac{\lambda ^k}{k!} \mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}) - \frac{\lambda ^{k}}{\kappa \epsilon (k+1)!} \kappa \epsilon \mathrm {E}(\varDelta ^{k}\mathbb {1}\{\varDelta \ge 0\}) \ge 0, \end{aligned}$$

for \(k\ge 3\) since \((1/k!)/(1/(k+1)!)=k+1\ge 1\). Hence,

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta })&\le 1- \lambda \mathrm {E}(\varDelta ) + \frac{\lambda ^2 }{2} \mathrm {E}(\varDelta ^2\mathbb {1}\{\varDelta \ge 0\}) \\&\le 1- \lambda \mathrm {E}(\varDelta ) + \frac{\lambda ^2 }{2} \mathrm {E}(\varDelta \cdot \kappa \epsilon \cdot \mathbb {1}\{\varDelta \ge 0\})\\&\le 1-\lambda \mathrm {E}(\varDelta ) + \lambda \frac{1}{2\kappa \epsilon }\cdot \kappa \epsilon \cdot \mathrm {E}(\varDelta ) \le 1-\lambda \epsilon /2 \end{aligned}$$

where the first inequality used that \(\varDelta ^{2}\le \varDelta \kappa \epsilon \) due to our implicit multiplication with \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\) everywhere and the second used again \(\lambda \le 1/(\kappa \epsilon )\). So, we have estimated the contribution of all the positive steps by \(1-\lambda \mathrm {E}(\varDelta )/2\).

We proceed with the remaining terms. We overestimate the sum by considering \(\varDelta ':=|\varDelta \cdot \mathbb {1}\{\varDelta <0\}|\) and bounding \((-\lambda )^{k}\le \lambda ^{k}\) in all terms starting from \(k=2\). Incorporating the contribution of the positive steps, we obtain for all \(\gamma \ge \lambda \)

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta })&\le 1 - \frac{\lambda }{2} \mathrm {E}(\varDelta ) + \frac{\lambda ^2}{\gamma ^2} \sum _{k=2}^\infty \frac{\gamma ^{k}}{k!} \mathrm {E}(\varDelta '^k)\\&\le 1 - \frac{\lambda }{2} \mathrm {E}(\varDelta )+ \frac{\lambda ^2}{\gamma ^2} \sum _{k=0}^\infty \frac{\gamma ^{k}}{k!} \mathrm {E}(\varDelta '^k) \le 1 -\frac{\lambda }{2} \epsilon + \lambda ^2 \underbrace{\frac{\mathrm {E}(e^{\gamma \varDelta '})}{\gamma ^2}}_{=:C(\gamma )}, \end{aligned}$$

where the last inequality uses the first condition of the theorem, i. e., the bound on the drift.

Given any \(\gamma >0\), choosing \(\lambda :=\min \{1/(\kappa \epsilon ), \gamma , \epsilon /(4C(\gamma ))\}\) results in

$$\begin{aligned} \mathrm {E}(e^{-\lambda \varDelta }\mathbb {1}\{a<X_t<b\}) \le 1- \frac{\lambda }{2}\epsilon + \lambda \cdot \frac{\epsilon }{4C(\gamma )}\cdot C(\gamma ) = 1-\frac{\lambda \epsilon }{4} = 1-\frac{1}{p(\ell )} \end{aligned}$$

with \(p(\ell ):=4/(\lambda \epsilon )\).

The aim is now to choose \(\gamma \) in such a way that \(\mathrm {E}(e^{\gamma \varDelta '})\) is bounded from above by a constant. We get

$$\begin{aligned} \mathrm {E}(e^{\gamma \varDelta '}) \le \sum _{j=0}^\infty e^{\gamma (j+1)r} {{\mathrm{Pr}}}(\varDelta \le -jr) \le \sum _{j=0}^\infty e^{\gamma (j+1)r} e^{- j } \end{aligned}$$

where the last inequality uses the second condition of the theorem.

Choosing \(\gamma :=1/(2r)\) yields

$$\begin{aligned} \mathrm {E}(e^{\gamma \varDelta '})&\le \sum _{j=0}^\infty e^{(j+1)/2 - j } = e^{1/2} \sum _{j=0}^\infty e^{-j/2} = e^{1/2} \frac{1}{1-e^{-1/2}} \le 4.2. \end{aligned}$$
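The numerical constant can be double-checked directly; the following quick check (ours, purely a sanity check, not part of the proof) evaluates both the closed form and the partial sums of the series above.

```python
import math

# Closed form e^{1/2} / (1 - e^{-1/2}) of the geometric series above
value = math.exp(0.5) / (1 - math.exp(-0.5))

# Partial sums of sum_{j >= 0} e^{(j+1)/2 - j} converge to the same limit
partial = sum(math.exp((j + 1) / 2 - j) for j in range(200))
```

Both quantities agree and lie just below 4.2, confirming the estimate.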

Hence, \(C(\gamma )\le 4.2/\gamma ^2=16.8r^2\), and therefore \(\epsilon /(4C(\gamma ))\ge \epsilon /(67.2r^2)>\epsilon /(68r^2)\). From the definition of \(\lambda \), we altogether have \(\lambda \ge \min \{1/(2r),\epsilon /(68r^2),1/(\kappa \epsilon )\}\). Since \(p(\ell )=4/(\lambda \epsilon )\), we know \(p(\ell )=O(r / \epsilon + r^2/\epsilon ^2 + \kappa )\). Condition (\(*\)) of Theorem 15 has been established along with these bounds on \(p(\ell )\) and \(\lambda =\lambda (\ell )\).

To bound the probability of a success within \(L(\ell )\) steps, we still need a bound on \(D(\ell )=\max \{1,\mathrm {E}(e^{-\lambda (X_{t+1}-b)}\mid X_t\ge b)\}\). If 1 does not maximize the expression then

$$\begin{aligned} D(\ell )&= \mathrm {E}(e^{-\lambda (X_{t+1}-b)}\mid X_t\ge b) \le \mathrm {E}(e^{-\lambda \varDelta }\mid X_t\ge b) \\&\, \le 1+ \mathrm {E}(e^{\gamma \varDelta '}\mid X_t\ge b), \end{aligned}$$

where the first inequality follows from \(X_t\ge b\) and the second one from \(\gamma \ge \lambda \) together with the bound 1 for the positive steps, as argued above. The last term can be bounded as in the calculation above, leading to \(\mathrm {E}(e^{\gamma \varDelta '})=O(1)\), since that estimation only uses the second condition, which holds conditional on \(X_t > a\). Hence, in any case \(D(\ell ) = O(1)\). Altogether, we have

$$\begin{aligned} e^{-\lambda (\ell )\cdot \ell }\cdot D(\ell ) \cdot p(\ell )&\le e^{-\lambda \ell } \cdot O(1)\cdot \frac{4}{\lambda \epsilon } = O\bigl (e^{-\lambda \ell + \ln (4/(\lambda \epsilon ))}\bigr ). \end{aligned}$$

By the third condition, we have \(\lambda \ell \ge 2\ln (4/(\lambda \epsilon ))\), which finally means that

$$\begin{aligned} e^{-\lambda (\ell )\cdot \ell }\cdot D(\ell ) \cdot p(\ell ) = O\bigl (e^{-\lambda \ell /2}\bigr ). \end{aligned}$$

Choosing \(L(\ell )=e^{\lambda \ell /4}\), Theorem 15 yields

$$\begin{aligned} {{\mathrm{Pr}}}\bigl (T(\ell )\le L(\ell )\bigr ) \le L(\ell )\cdot O(e^{-\lambda \ell /2}) = O(e^{-\lambda \ell /4}), \end{aligned}$$

which proves the theorem. \(\square \)


Cite this article

Witt, C. Upper Bounds on the Running Time of the Univariate Marginal Distribution Algorithm on OneMax. Algorithmica 81, 632–667 (2019). https://doi.org/10.1007/s00453-018-0463-0
