Abstract
The Univariate Marginal Distribution Algorithm (UMDA) is a randomized search heuristic that builds a stochastic model of the underlying optimization problem by repeatedly sampling \(\lambda \) solutions and adjusting the model according to the best \(\mu \) samples. We present a running time analysis of the UMDA on the classical OneMax benchmark function for wide ranges of the parameters \(\mu \) and \(\lambda \). If \(\mu \ge c\log n\) for some constant \(c>0\) and \(\lambda =(1+\varTheta (1))\mu \), we obtain a general bound \(O(\mu n)\) on the expected running time. This bound crucially assumes that all marginal probabilities of the algorithm are confined to the interval \([1/n,1-1/n]\). If \(\mu \ge c' \sqrt{n}\log n\) for a constant \(c'>0\) and \(\lambda =(1+\varTheta (1))\mu \), the behavior of the algorithm changes and the bound on the expected running time becomes \(O(\mu \sqrt{n})\), which typically holds even if the borders on the marginal probabilities are omitted. The results supplement the recently derived lower bound \(\varOmega (\mu \sqrt{n}+n\log n)\) by Krejca and Witt (Proceedings of FOGA 2017, ACM Press, New York, pp 65–79, 2017) and turn out to be tight for the two very different choices \(\mu =c\log n\) and \(\mu =c'\sqrt{n}\log n\). They also improve the previously best known upper bound \(O(n\log n\log \log n)\) by Dang and Lehre (Proceedings of GECCO ’15, ACM Press, New York, pp 513–518, 2015) that was established for \(\mu =c\log n\) and \(\lambda =(1+\varTheta (1))\mu \).
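For readers unfamiliar with the algorithm, the following minimal Python sketch (not taken from the paper; the names `umda` and `onemax` and the concrete parameter values are our own illustrative choices) shows the UMDA on OneMax with the marginal probabilities confined to \([1/n,1-1/n]\), as assumed in the first bound above.

```python
import random

def onemax(x):
    # OneMax: the number of one-bits in the solution
    return sum(x)

def umda(n, mu, lam, max_gens=2000, seed=1):
    """Minimal UMDA sketch with borders [1/n, 1 - 1/n] on the marginals."""
    rng = random.Random(seed)
    p = [0.5] * n                # one marginal probability per bit
    best = 0
    for gen in range(max_gens):
        # sample lambda solutions from the product distribution given by p
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(lam)]
        pop.sort(key=onemax, reverse=True)
        best = max(best, onemax(pop[0]))
        if best == n:
            return gen, best
        # set each marginal to the frequency of ones among the mu best samples
        top = pop[:mu]
        for i in range(n):
            freq = sum(x[i] for x in top) / mu
            # borders: keep every marginal inside [1/n, 1 - 1/n]
            p[i] = min(1 - 1 / n, max(1 / n, freq))
    return max_gens, best
```

For example, `umda(16, 16, 32)` typically reaches the optimum within a few dozen generations; the borders prevent any marginal from freezing at 0 or 1 due to genetic drift.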
References
Baillon, J.-B., Cominetti, R., Vaisman, J.: A sharp uniform bound for the distribution of sums of Bernoulli trials. Comb. Probab. Comput. 25(3), 352–361 (2016)
Chen, T., Tang, K., Chen, G., Yao, X.: On the analysis of average time complexity of estimation of distribution algorithms. In: Proceedings of CEC ’07, pp. 453–460 (2007)
Chen, T., Lehre, P.K., Tang, K., Yao, X.: When is an estimation of distribution algorithm better than an evolutionary algorithm? In: Proceedings of CEC ’09, pp. 1470–1477 (2009)
Chen, T., Tang, K., Chen, G., Yao, X.: Rigorous time complexity analysis of univariate marginal distribution algorithm with margins. In: Proceedings of CEC ’09, pp. 2157–2164 (2009)
Chen, T., Tang, K., Chen, G., Yao, X.: Analysis of computational time of simple estimation of distribution algorithms. IEEE Trans. Evol. Comput. 14(1), 1–22 (2010)
Dang, D.-C., Lehre, P.K.: Simplified runtime analysis of estimation of distribution algorithms. In: Proceedings of GECCO ’15, pp. 513–518. ACM Press, New York (2015)
Droste, S.: A rigorous analysis of the compact genetic algorithm for linear functions. Nat. Comput. 5(3), 257–283 (2006)
Friedrich, T., Kötzing, T., Krejca, M.S., Sutton, A.M.: The benefit of recombination in noisy evolutionary search. In: Proceedings of ISAAC ’15, pp. 140–150. Springer, Berlin (2015)
Friedrich, T., Kötzing, T., Krejca, M.S.: EDAs cannot be balanced and stable. In: Proceedings of GECCO ’16, pp. 1139–1146. ACM Press, New York (2016)
Hajek, B.: Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14, 502–525 (1982)
Hauschild, M., Pelikan, M.: An introduction and survey of estimation of distribution algorithms. Swarm Evol. Comput. 1(3), 111–128 (2011)
Johannsen, D.: Random combinatorial structures and randomized search heuristics. Ph.D. Thesis, Universität des Saarlandes, Germany (2010). http://scidok.sulb.uni-saarland.de/volltexte/2011/3529
Kaas, R., Buhrman, J.M.: Mean, median and mode in binomial distributions. Stat. Neerl. 34, 13–18 (1980)
Krejca, M.S., Witt, C.: Lower bounds on the run time of the univariate marginal distribution algorithm on OneMax. In: Proceedings of FOGA 2017, pp. 65–79. ACM Press, New York (2017)
Larrañaga, P., Lozano, J.A. (eds.): Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Volume 2 of Genetic Algorithms and Evolutionary Computation. Springer, Berlin (2002)
Lehre, P.K., Nguyen, P.T.H.: Improved runtime bounds for the univariate marginal distribution algorithm via anti-concentration. In: Proceedings of GECCO ’17, pp. 414–434. ACM Press, New York (2017)
Lehre, P.K., Witt, C.: Concentrated hitting times of randomized search heuristics with variable drift. In: Proceedings of ISAAC ’14, pp. 686–697. Springer, Berlin (2014). Full technical report at arXiv:1307.2559
McDiarmid, C.: Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. (eds.) Probabilistic Methods for Algorithmic Discrete Mathematics, pp. 195–247. Springer, Berlin (1998)
Mitavskiy, B., Rowe, J.E., Cannings, C.: Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links. Int. J. Intell. Comput. Cybern. 2(2), 243–284 (2009)
Mühlenbein, H., Paass, G.: From recombination of genes to the estimation of distributions I. Binary parameters. In: Proceedings of PPSN IV, pp. 178–187. Springer, Berlin (1996)
Neumann, F., Sudholt, D., Witt, C.: A few ants are enough: ACO with iteration-best update. In: Proceedings of GECCO ’10, pp. 63–70. ACM Press, New York (2010)
Oliveto, P.S., Witt, C.: Improved time complexity analysis of the simple genetic algorithm. Theor. Comput. Sci. 605, 21–41 (2015)
Rowe, J.E., Sudholt, D.: The choice of the offspring population size in the (1, \(\lambda \)) evolutionary algorithm. Theor. Comput. Sci. 545, 20–38 (2014)
Samuels, S.M.: On the number of successes in independent trials. Ann. Math. Stat. 36(4), 1272–1278 (1965)
Sudholt, D.: A new method for lower bounds on the running time of evolutionary algorithms. IEEE Trans. Evol. Comput. 17(3), 418–435 (2013)
Sudholt, D., Witt, C.: Update strength in EDAs and ACO: how to avoid genetic drift. In: Proceedings of GECCO ’16, pp. 61–68. ACM Press, New York (2016)
Witt, C.: Tight bounds on the optimization time of a randomized search heuristic on linear functions. Comb. Probab. Comput. 22(2), 294–318 (2013)
Witt, C.: Upper bounds on the runtime of the univariate marginal distribution algorithm on OneMax. In: Proceedings of GECCO ’17, pp. 1415–1422 (2017)
Wu, Z., Kolonko, M., Möhring, R.H.: Stochastic runtime analysis of the cross-entropy algorithm. IEEE Trans. Evol. Comput. 21(4), 616–628 (2017)
Acknowledgements
Financial support by the Danish Council for Independent Research (DFF-FNU 4002–00542) is gratefully acknowledged.
An extended abstract of this article appeared in the proceedings of the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017) [28].
Appendix
1.1 Proof of Theorem 3
We will use Hajek’s drift theorem to prove Theorem 3. As we are dealing with a stochastic process, we implicitly assume that the random variables \(X_t\), \(t\ge 0\), are adapted to some filtration \(\mathcal {F}_t\), such as the natural filtration \(\mathcal {F}_t=\sigma (X_0, \ldots ,X_t)\), \(t\ge 0\).
We do not formulate the theorem using a potential/Lyapunov function g mapping from some state space to the reals either. Instead, we assume w. l. o. g. that the random variables \(X_t\) have already been obtained via such a mapping.
The following theorem follows immediately from taking Conditions D1 and D2 in [10] and applying Inequality (2.8) in a union bound over \(L(\ell )\) time steps.
Theorem 15
[10] Let \(X_t\), \(t\ge 0\), be real-valued random variables describing a stochastic process over some state space, adapted to a filtration \(\mathcal {F}_t\). Pick two real numbers \(a(\ell )\) and \(b(\ell )\) depending on a parameter \(\ell \) such that \(a(\ell )<b(\ell )\) holds. Let \(T(\ell )\) be the random variable denoting the earliest point in time \(t\ge 0\) such that \(X_t\le a(\ell )\) holds. If there are \(\lambda (\ell )>0\) and \(p(\ell )>0\) such that the condition
holds for all \(t\ge 0\) then for all time bounds \(L(\ell )\ge 0\)
where \(D(\ell )=\max \bigl \{1,\mathrm {E}\bigl (e^{-\lambda (\ell )\cdot (X_{t+1}-b(\ell ))}\mid \mathcal {F}_t\,;\, X_t\ge b(\ell )\bigr )\bigr \}\).
Proof
(Proof of Theorem 3) We apply Theorem 15 with suitable choices of its variables, some of which may depend on the parameter \(\ell =b-a\), the length of the interval [a, b]. The following argument is also inspired by Hajek’s work [10].
By assumption, \(\varDelta _t(X_{t+1}-X_t)\preceq X_{t+1}-X_t\). Clearly, for the process \(X'_t=X_0+\sum _{j=0}^{t-1} \varDelta _j(X_{j+1}-X_j)\) we have \(X_t'\preceq X_t\). Hence, the hitting time \(T^*\) for a state less than a of the original process \(X_t\) is stochastically at least as large as the corresponding hitting time of the process \(X_t'\). In the following, we will therefore without further mention analyze \(X_t'\) instead of \(X_t\) and bound the tail of its hitting time. We work with \(\varDelta :=\varDelta _t(X_{t+1}-X_t)\), which equals \(X'_{t+1}-X'_t\). We still use the old notation \(X_t\) instead of \(X'_t\).
The aim is to bound the moment-generating function (mgf.) from Condition (\(*\)). In this analysis, for notational convenience, we often omit the filtration \(\mathcal {F}_t\). First we observe that it suffices to bound the mgf. of \(\varDelta \cdot \mathbb {1}\{\varDelta \le \kappa \epsilon \}\) since
using \(\varDelta \mathbb {1}\{\varDelta> \kappa \epsilon \}\ge 0\) and hence \(e^{ - \lambda \varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}}\le 1\). In the following, we omit the factor \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\) but implicitly multiply \(\varDelta \) by it throughout. The same applies to \(\mathbb {1}\{a<X_t<b\}\).
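Written out, the splitting argument behind this estimate reads as follows (our reconstruction of the suppressed display, using only the decomposition \(\varDelta = \varDelta \mathbb {1}\{\varDelta \le \kappa \epsilon \} + \varDelta \mathbb {1}\{\varDelta > \kappa \epsilon \}\)):

```latex
\[
\mathrm{E}\bigl(e^{-\lambda \varDelta }\bigr)
  = \mathrm{E}\Bigl(e^{-\lambda \varDelta \mathbb{1}\{\varDelta \le \kappa \epsilon \}}
    \cdot e^{-\lambda \varDelta \mathbb{1}\{\varDelta > \kappa \epsilon \}}\Bigr)
  \le \mathrm{E}\Bigl(e^{-\lambda \varDelta \mathbb{1}\{\varDelta \le \kappa \epsilon \}}\Bigr),
\]
```

where the inequality holds because the second factor is at most 1 on the event \(\{\varDelta > \kappa \epsilon \}\) and equals 1 otherwise.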
To establish Condition (\(*\)), it is sufficient to identify values \(\lambda :=\lambda (\ell )>0\) and \(p(\ell )>0\) such that
Using the series expansion of the exponential function, we get
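In symbols, the expansion referred to here is the following (a sketch in the document's notation, not the paper's own display):

```latex
\[
\mathrm{E}\bigl(e^{-\lambda \varDelta }\bigr)
  = \sum_{k=0}^{\infty } \frac{(-\lambda )^k\,\mathrm{E}(\varDelta ^k)}{k!}
  = 1-\lambda \mathrm{E}(\varDelta )
    + \sum_{k=2}^{\infty } \frac{(-\lambda )^k\,\mathrm{E}(\varDelta ^k)}{k!}.
\]
```

The linear term \(-\lambda \mathrm{E}(\varDelta )\) carries the drift; the remainder of the proof controls the higher-order terms.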
We first concentrate on the positive steps in the direction of the expected value; more precisely, we consider for any odd \(k\ge 3\)
Since we implicitly multiply with \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\), we have \(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\}\le (\kappa \epsilon )^k\) and hence \(|\mathrm {E}(\varDelta ^{k+1}\mathbb {1}\{\varDelta \ge 0\})/\mathrm {E}(\varDelta ^k\mathbb {1}\{\varDelta \ge 0\})|\le \kappa \epsilon \). By choosing \(\lambda \le 1/(\kappa \epsilon )\), we have
for \(k\ge 3\) since \((1/k!)/(1/(k+1)!)=k+1\). Hence,
where the first inequality used that \(\varDelta ^{2}\le \varDelta \kappa \epsilon \) due to our implicit multiplication with \(\mathbb {1}\{\varDelta \le \kappa \epsilon \}\) everywhere and the second used again \(\lambda \le 1/(\kappa \epsilon )\). So, we have estimated the contribution of all the positive steps by \(1-\lambda \mathrm {E}(\varDelta )/2\).
We proceed with the remaining terms. We overestimate the sum by using \(\varDelta ':=|\varDelta \cdot \mathbb {1}\{\varDelta <0\}|\) and bounding \((-\lambda )^k\le \lambda ^k\) in all terms starting from \(k=2\). Incorporating the contribution of the positive steps, we obtain for all \(\gamma \ge \lambda \)
where the last inequality uses the first condition of the theorem, i. e., the bound on the drift.
Given any \(\gamma >0\), choosing \(\lambda :=\min \{1/(\kappa \epsilon ), \gamma , \epsilon /(4C(\gamma ))\}\) results in
with \(p(\ell ):=4/(\lambda \epsilon )\).
The aim is now to choose \(\gamma \) in such a way that \(\mathrm {E}(e^{\gamma \varDelta '})\) is bounded from above by a constant. We get
where the inequality uses the second condition of the theorem.
Choosing \(\gamma :=1/(2r)\) yields
Hence, \(C(\gamma )\le 4.2/\gamma ^2\) and therefore \(\lambda \le \epsilon /(4\cdot 4.2r^2)< \epsilon /(17r^2)\). From the definition of \(\lambda \), we altogether have \(\lambda = \min \{1/(2r),\epsilon /(17r^2),1/(\kappa \epsilon )\}\). Since \(p(\ell )=4/(\lambda \epsilon )\), we know \(p(\ell )=O(r / \epsilon + r^2/\epsilon ^2 + \kappa )\). Condition (\(*\)) of Theorem 15 has been established along with these bounds on \(p(\ell )\) and \(\lambda =\lambda (\ell )\).
To bound the probability of a success within \(L(\ell )\) steps, we still need a bound on \(D(\ell )=\max \{1,\mathrm {E}(e^{-\lambda (X_{t+1}-b)}\mid X_t\ge b)\}\). If 1 does not maximize the expression then
where the first inequality follows from \(X_t\ge b\) and the second one from \(\gamma \ge \lambda \) along with the bound \(+\,1\) for the positive terms as argued above. The last term can be bounded as in the above calculation leading to \(\mathrm {E}(e^{\gamma \varDelta '})=O(1)\) since that estimation uses only the second condition, which holds conditional on \(X_t > a\). Hence, in any case \(D(\ell ) = O(1)\). Altogether, we have
By the third condition, we have \(\lambda \ell \ge 2\ln (4/(\lambda \epsilon ))\), which finally means that
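This rearrangement can be made explicit (our own hedged algebra; the suppressed display itself is not reproduced here). With \(p(\ell )=4/(\lambda \epsilon )\),

```latex
\[
\lambda \ell \ge 2\ln \frac{4}{\lambda \epsilon }
 \iff p(\ell ) = \frac{4}{\lambda \epsilon } \le e^{\lambda \ell /2},
 \qquad\text{hence}\qquad
 p(\ell )\, e^{\lambda \ell /4}\, e^{-\lambda \ell }
  \le e^{\lambda \ell /2}\, e^{\lambda \ell /4}\, e^{-\lambda \ell }
  = e^{-\lambda \ell /4}.
\]
```

This is exactly the form needed once a time bound of order \(e^{\lambda \ell /4}\) is plugged into Theorem 15.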
Choosing \(L(\ell )=e^{\lambda \ell /4}\), Theorem 15 yields
which proves the theorem. \(\square \)
Witt, C. Upper Bounds on the Running Time of the Univariate Marginal Distribution Algorithm on OneMax. Algorithmica 81, 632–667 (2019). https://doi.org/10.1007/s00453-018-0463-0