Abstract
We propose and study a version of simulated annealing (SA) on continuous state spaces based on \((t,s)_R\)-sequences. The parameter \(R\in \bar{\mathbb {N}}\) regulates the degree of randomness of the input sequence, with the case \(R=0\) corresponding to IID uniform random numbers and the limiting case \(R=\infty \) to (t, s)-sequences. Our main result, obtained for rectangular domains, shows that the resulting optimization method, which we refer to as QMC-SA, converges almost surely to the global optimum of the objective function \(\varphi \) for any \(R\in \mathbb {N}\). When \(\varphi \) is univariate, we are in addition able to show that the completely deterministic version of QMC-SA is convergent. A key property of these results is that they do not require objective-dependent conditions on the cooling schedule. As a corollary of our theoretical analysis, we provide a new almost sure convergence result for SA which shares this property under minimal assumptions on \(\varphi \). We further explain how our results in fact apply to a broader class of optimization methods including for example threshold accepting, for which to our knowledge no convergence results currently exist. We finally illustrate the superiority of QMC-SA over SA algorithms in a numerical study.
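To fix ideas, the scheme studied in the paper can be sketched in a deliberately simplified form: one-dimensional state space, uniform proposal kernel (so that \(F_K^{-1}(\mathbf {x},u)=u\)), and the base-2 radical-inverse (van der Corput) sequence standing in for the \((t,s)_R\)-sequence. All function names below are ours and the pairing of a base-2 proposal stream with a base-3 acceptance stream is an illustrative device only, not the exact algorithm analyzed in the paper.

```python
import math

def van_der_corput(n, base=2):
    """Radical inverse of n: the one-dimensional prototype of a (t, s)-sequence."""
    q, denom = 0.0, 1.0
    while n:
        n, rem = divmod(n, base)
        denom *= base
        q += rem / denom
    return q

def qmc_sa(phi, n_iter=2**12):
    """Maximize phi on [0, 1] by annealing driven by a deterministic QMC stream."""
    x = 0.5
    best_x, best_val = x, phi(x)
    for n in range(1, n_iter):
        y = van_der_corput(n, base=2)      # proposal: uniform kernel, F_K^{-1}(x, u) = u
        u = van_der_corput(n, base=3)      # second stream, used for the accept step
        temp = 1.0 / math.log(2.0 + n)     # cooling schedule (no phi-dependent tuning)
        if phi(y) >= phi(x) or u < math.exp((phi(y) - phi(x)) / temp):
            x = y
        if phi(x) > best_val:
            best_x, best_val = x, phi(x)
    return best_x, best_val
```

In the paper's setting the two streams would instead be coordinates of a single \((d+1)\)-dimensional \((t,s)_R\)-point, and the proposal would be obtained by Rosenblatt inversion of a general kernel \(K\).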
References
Alabduljabbar, A., Milanovic, J., Al-Eid, E.: Low discrepancy sequences based optimization algorithm for tuning PSSs. In: Proceedings of the 10th International Conference on Probabilistic Methods Applied to Power Systems, PMAPS’08, pp. 1–9. IEEE (2008)
Althöfer, I., Koschnick, K.-U.: On the convergence of “Threshold accepting”. Appl. Math. Optim. 24(1), 183–195 (1991)
Andrieu, C., Breyer, L.A., Doucet, A.: Convergence of simulated annealing using Foster–Lyapunov criteria. J. Appl. Prob. 38(4), 975–994 (2001)
Andrieu, C., Doucet, A.: Simulated annealing for maximum a posteriori parameter estimation of hidden Markov models. IEEE Trans. Inf. Theory 46(3), 994–1004 (2000)
Bélisle, C.J.P.: Convergence theorems for a class of simulated annealing algorithms on \(\mathbb{R}^d\). J. Appl. Prob. 29(4), 885–895 (1992)
Bornn, L., Shaddick, G., Zidek, J.V.: Modeling nonstationary processes through dimension expansion. J. Am. Stat. Assoc. 107(497), 281–289 (2012)
Chen, J., Suarez, J., Molnar, P., Behal, A.: Maximum likelihood parameter estimation in a stochastic resonate-and-fire neuronal model. In: 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 57–62. IEEE (2011)
Chen, S., Luk, B.L.: Adaptive simulated annealing for optimization in signal processing applications. Signal Process. 79(1), 117–128 (1999)
Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
Dueck, G., Scheuer, T.: Threshold accepting: a general purpose optimization algorithm appearing superior to simulated annealing. J. Comput. Phys. 90(1), 161–175 (1990)
Fang, K., Winker, P., Hickernell, F.J.: Some global optimization algorithms in statistics. In: Du, D.Z., Zhang, Z.S., Cheng, K. (eds.) Operations Research and Its Applications. Lecture Notes in Operations Research, vol. 2. World Publishing Corp, New York (1996)
Fang, K.T.: Some applications of quasi-Monte Carlo methods in statistics. In: Monte Carlo and Quasi-Monte Carlo Methods 2000, pp. 10–26. Springer (2002)
Gelfand, S.B., Mitter, S.K.: Recursive stochastic algorithms for global optimization in \(\mathbb{R}^d\). SIAM J. Control Optim. 29(5), 999–1018 (1991)
Gelfand, S.B., Mitter, S.K.: Metropolis-type annealing algorithms for global optimization in \(\mathbb{R}^d\). SIAM J. Control Optim. 31(1), 111–131 (1993)
Geman, S., Hwang, C.-R.: Diffusions for global optimization. SIAM J. Control Optim. 24(5), 1031–1043 (1986)
Gerber, M., Chopin, N.: Sequential Quasi-Monte Carlo. J. R. Stat. Soc. B 77(3), 509–579 (2015)
Girard, T., Staraj, R., Cambiaggio, E., Muller, F.: A simulated annealing algorithm for planar or conformal antenna array synthesis with optimized polarization. Microw. Opt. Technol. Lett. 28(2), 86–89 (2001)
Goffe, W.L., Ferrier, G.D., Rogers, J.: Global optimization of statistical functions with simulated annealing. J. Econom. 60(1), 65–99 (1994)
Haario, H., Saksman, E.: Simulated annealing process in general state space. Adv. Appl. Probab. 23, 866–893 (1991)
He, Z., Owen, A.B.: Extensible grids: uniform sampling on a space filling curve. J. R. Stat. Soc.: Ser. B (2015)
Hickernell, F.J., Yuan, Y.-X.: A simple multistart algorithm for global optimization. OR Trans. 1(2), 1–12 (1997)
Hong, H.S., Hickernell, F.J.: Algorithm 823: implementing scrambled digital sequences. ACM Trans. Math. Softw. 29(2), 95–109 (2003)
Ingber, L.: Very fast simulated re-annealing. Math. Comput. Model. 12(8), 967–973 (1989)
Ireland, J.: Simulated annealing and Bayesian posterior distribution analysis applied to spectral emission line fitting. Solar Phys. 243(2), 237–252 (2007)
Jiao, Y.-C., Dang, C., Leung, Y., Hao, Y.: A modification to the new version of Price’s algorithm for continuous global optimization problems. J. Global Optim. 36(4), 609–626 (2006)
Lecchini-Visintini, A., Lygeros, J., Maciejowski, J.M.: Stochastic optimization on continuous domains with finite-time guarantees by Markov Chain Monte Carlo methods. IEEE Trans. Autom. Control 55(12), 2858–2863 (2010)
Lei, G.: Adaptive random search in quasi-Monte Carlo methods for global optimization. Comput. Math. Appl. 43(6), 747–754 (2002)
Locatelli, M.: Convergence properties of simulated annealing for continuous global optimization. J. Appl. Prob. 33(4), 1127–1140 (1996)
Locatelli, M.: Convergence of a simulated annealing algorithm for continuous global optimization. J. Global Optim. 18(3), 219–233 (2000)
Locatelli, M.: Simulated annealing algorithms for continuous global optimization. In: Handbook of global optimization, pp. 179–229. Springer (2002)
Moscato, P., Fontanari, J.F.: Stochastic versus deterministic update in simulated annealing. Phys. Lett. A 146(4), 204–208 (1990)
Niederreiter, H.: A quasi-Monte Carlo method for the approximate computation of the extreme values of a function. In: Studies in Pure Mathematics, pp. 523–529. Springer (1983)
Niederreiter, H.: Point sets and sequences with small discrepancy. Monatshefte für Mathematik 104(4), 273–337 (1987)
Niederreiter, H.: Random number generation and quasi-Monte Carlo methods. In: CBMS-NSF Regional Conference Series in Applied Mathematics (1992)
Niederreiter, H., Peart, P.: Localization of search in quasi-Monte Carlo methods for global optimization. SIAM J. Sci. Stat. Comput. 7(2), 660–664 (1986)
Nikolaev, A.G., Jacobson, S.H.: Simulated annealing. In: Handbook of Metaheuristics, pp. 1–39. Springer (2010)
Owen, A.B.: Randomly permuted \((t, m, s)\)-nets and \((t, s)\)-sequences. In: Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing. Lecture Notes in Statististics, vol. 106, pp. 299–317. Springer, New York (1995)
Pistovčák, F., Breuer, T.: Using quasi-Monte Carlo scenarios in risk management. In: Monte Carlo and Quasi-Monte Carlo Methods 2002, pp. 379–392. Springer (2004)
Robert, C.P., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer-Verlag, New York (2004)
Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. 23(3), 470–472 (1952)
Rubenthaler, S., Rydén, T., Wiktorsson, M.: Fast simulated annealing in \(\mathbb{R}^d\) with an application to maximum likelihood estimation in state-space models. Stoch. Process. Appl. 119(6), 1912–1931 (2009)
Winker, P., Maringer, D.: The threshold accepting optimisation algorithm in economics and statistics. In: Optimisation, Econometric and Financial Analysis, pp. 107–125. Springer (2007)
Zhang, H., Bonilla-Petriciolet, A., Rangaiah, G.P.: A review on global optimization methods for phase equilibrium modeling and calculations. Open Thermodyn. J. 5(S1), 71–92 (2011)
Acknowledgments
The authors acknowledge support from DARPA under Grant No. FA8750-14-2-0117. The authors also thank Christophe Andrieu, Pierre Jacob and Art Owen for insightful discussions and useful feedback.
Appendices
Appendix 1: Proof of Lemma 1
Let \(n\in \mathbb {N}\), \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\), \(\delta _{\mathcal {X}}=0.5\) and \(\delta \in (0,\delta _{\mathcal {X}}]\). Then, by Assumption (A1), \(F_K^{-1}(\tilde{\mathbf {x}},\mathbf {u}_1^n)\in B_{\delta }(\mathbf {x}^{\prime })\) if and only if \( \mathbf {u}_1^n\in F_K(\tilde{\mathbf {x}},B_{\delta }(\mathbf {x}^{\prime }))\). We now show that, for \(\delta \) small enough, there exists a closed hypercube \(W(\tilde{\mathbf {x}},\mathbf {x}^{\prime }, \delta )\subset [0,1)^d\) such that \(W(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\subseteq F_K(\mathbf {x},B_{\delta }(\mathbf {x}^{\prime }))\) for all \(\mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\), with \(v_K(\cdot )\) as in the statement of the lemma.
To see this, note that because \(K(\mathbf {x},\mathrm {d}\mathbf {y})\) admits a density \(K(\mathbf {y}|\mathbf {x})\) which is continuous on the compact set \(\mathcal {X}^2\), Assumption (A2) implies that there exists a constant \(\tilde{K}>0\) such that, for \(i\in 1:d\), \(K_i(y_i|y_{1:i-1},\mathbf {x})\ge \tilde{K}\) for all \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\). Consequently, for any \(\delta \in [0,0.5]\) and \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\),
where \(F_{K_i}(\cdot {}|\mathbf {x}, y_{1:i-1})\) denotes the CDF of the probability measure \(K_i(\mathbf {x},y_{1:i-1},\mathrm {d}y_i)\), with the convention that \(F_{K_i}(\cdot {}|\mathbf {x}, y_{1:i-1})=F_{K_1}(\cdot {}|\mathbf {x})\) when \(i=1\). Note that the right-hand side of (6) is \(\tilde{K}\delta \) and not \(2\tilde{K}\delta \) to encompass the case where either \(x_i^{\prime }-\delta \not \in [0,1]\) or \(x_i^{\prime }+\delta \not \in [0,1]\). (Note also that because \(\delta \le 0.5\) we cannot have both \(x_i^{\prime }-\delta \not \in [0,1]\) and \(x_i^{\prime }+\delta \not \in [0,1]\).)
For \(i\in 1:d\) and \(\delta ^{\prime }>0\), let
be the (optimal) modulus of continuity of \(F_{K_i}(\cdot |\cdot )\). Since \(F_{K_i}\) is uniformly continuous on the compact set \([0,1]^{d+i}\), the mapping \(\omega _i(\cdot )\) is continuous and \(\omega _i(\delta ^{\prime })\rightarrow 0\) as \(\delta ^{\prime }\rightarrow 0\). In addition, because \(F_{K_i}\left( \cdot |\mathbf {x}, y_{1:i-1}\right) \) is strictly increasing on [0, 1] for all \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\), \(\omega _i(\cdot )\) is strictly increasing on (0, 1]. Let \(\tilde{K}\) be small enough so that, for \(i\in 1:d\), \(0.25\tilde{K}\delta _{\mathcal {X}}\le \omega _i(1)\) and let \(\tilde{\delta }_i(\cdot )\) be the mapping \( z\in (0,\delta _{\mathcal {X}}]\longmapsto \tilde{\delta }_i(z)=\omega _i^{-1} (0.25\tilde{K}z)\). Note that the function \(\tilde{\delta }_i(\cdot )\) is independent of \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\), continuous and strictly increasing on \((0,\delta _{\mathcal {X}}]\), and satisfies \(\tilde{\delta }_i(\delta ^{\prime })\rightarrow 0\) as \(\delta ^{\prime }\rightarrow 0\).
For \(\mathbf {x}\in \mathcal {X}\), \(\delta ^{\prime }>0\) and \(\delta ^{\prime }_i>0\), \(i\in 1:d\), let
and
Then, for any \(\delta ^{\prime }>0\) and for all \((\mathbf {x},y_{1:i-1})\in B_{\tilde{\delta }_i(\delta ^{\prime })}(\tilde{\mathbf {x}})\times B^{i-1}_{\tilde{\delta }_i(\delta ^{\prime })}(\mathbf {x}^{\prime })\), we have
For \(i\in 1:d\) and \(\delta ^{\prime }\in (0,\delta _{\mathcal {X}}]\), let \(\delta _i(\delta ^{\prime })=\tilde{\delta }_i(\delta ^{\prime })\wedge \delta ^{\prime }\) and note that the function \(\delta _i(\cdot )\) is continuous and strictly increasing on \((0,\delta _{\mathcal {X}}]\). Let \(\delta _d=\delta _d(\delta )\) and define recursively \(\delta _{i}=\delta _{i}(\delta _{i+1})\), \(i\in 1:(d-1)\), so that \(\delta \ge \delta _d\ge \dots \ge \delta _1>0\). For \(i\in 1:d\), let
and
Then, since \(F_{K_i}(\cdot |\cdot )\) is continuous and the set \(B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\) is compact, there exist points \((\underline{\mathbf {x}}^i,\underline{y}^{i}_{1:i-1})\) and \((\bar{\mathbf {x}}^i,\bar{y}^{i}_{1:i-1})\) in \(B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\) such that
In addition, by the construction of the \(\delta _i\)’s, \(B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\subseteq B_{\tilde{\delta }_i(\delta _{i})}(\tilde{\mathbf {x}})\times B^i_{\tilde{\delta }_i(\delta _{i})}(\mathbf {x}^{\prime })\) for all \(i\in 1:d\). Therefore, using (6)–(8), we have, for all \(i\in 1:d\),
Consequently, for all \(i\in 1:d\) and for all \((\mathbf {x},y_{1:i-1})\in B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\),
Let \(\underline{S}_{\delta }=0.5\tilde{K}\delta _1\). Then, this shows that there exists a closed hypercube \(\underline{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\) of side \(\underline{S}_{\delta }\) such that
where we set \(v_K(\delta )=\delta _1\). Note that \(v_K(\delta )\in (0,\delta ]\) and thus \(v_K(\delta )\rightarrow 0\) as \(\delta \rightarrow 0\), as required. In addition, \(v_K(\cdot )=\delta _1\circ \dots \circ \delta _d(\cdot )\) is continuous and strictly increasing on \((0,\delta _{\mathcal {X}}]\) because the functions \(\delta _i(\cdot )\), \(i\in 1:d\), are continuous and strictly increasing on this set. Note also that \(v_K(\cdot )\) does not depend on \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\).
To conclude the proof, let
and note that, if \(\delta \) is small enough, \(k_{\delta }\ge t+d\) because \(\underline{S}_{\delta }\rightarrow 0\) as \(\delta \rightarrow 0\). Let \(\bar{\delta }_K\) be the largest value of \(\delta ^{\prime }\le \delta _{\mathcal {X}}\) such that \(k_{\delta ^{\prime }}\ge t+d\). Let \(\delta \in (0,\bar{\delta }_K]\) and \(t_{\delta ,d}\in t:(t+d)\) be such that \((k_{\delta }-t_{\delta ,d})/d\) is an integer. Let \(\{E(j,\delta )\}_{j=1}^{b^{k_{\delta }-t_{\delta ,d}}}\) be the partition of \([0,1)^d\) into elementary intervals of volume \(b^{t_{\delta ,d}-k_{\delta }}\) so that any closed hypercube of side \(\underline{S}_{\delta }\) contains at least one elementary interval \(E(j,\delta )\) for a \(j\in 1:b^{k_{\delta }-t_{\delta ,d}}\). Hence, there exists a \(j_{\tilde{\mathbf {x}},\mathbf {x}^{\prime }}\in 1:b^{k_{\delta }-t_{\delta ,d}}\) such that
Let \(a\in \mathbb {N}\) and note that, by the properties of (t, s)-sequences in base b, the point set \(\{\mathbf {u}^n\}_{n=ab^{k_{\delta }}}^{(a+1)b^{k_{\delta }}-1}\) is a \((t,k_{\delta },d)\)-net in base b because \(k_{\delta }> t\). In addition, since \(k_{\delta }\ge t_{\delta ,d}\ge t\), this point set is also a \((t_{\delta ,d},k_{\delta },d)\)-net in base b ([34], Remark 4.3, p. 48). Thus, since for \(j\in 1:b^{k_{\delta }-t_{\delta ,d}}\) the elementary interval \(E(j,\delta )\) has volume \(b^{t_{\delta ,d}-k_{\delta }}\), the point set \(\{\mathbf {u}^n\}_{n=ab^{k_{\delta }}}^{(a+1)b^{k_{\delta }}-1}\) contains exactly \(b^{t_{\delta ,d}}\ge b^t\) points in \(E(j_{\tilde{\mathbf {x}},\mathbf {x}^{\prime }},\delta )\), and the proof is complete.
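The counting argument at the end of the proof rests on the defining property of \((t,m,s)\)-nets: each elementary interval of volume \(b^{t-m}\) receives exactly \(b^t\) points. In the simplest case (base \(b=2\), \(t=0\), \(d=1\), the radical-inverse sequence), this can be verified directly; the sketch below, with function names of our choosing, counts the points of a block \(\{u^n\}_{n=ab^{k}}^{(a+1)b^{k}-1}\) falling in each elementary interval.

```python
def van_der_corput(n, base=2):
    """Radical inverse of n in the given base."""
    q, denom = 0.0, 1.0
    while n:
        n, rem = divmod(n, base)
        denom *= base
        q += rem / denom
    return q

def elementary_interval_counts(a, k, base=2):
    """Count points of the block {u^n : a*b^k <= n < (a+1)*b^k} in each elementary
    interval [j/b^k, (j+1)/b^k); for a (0, k, 1)-net every count equals b^0 = 1."""
    counts = [0] * base**k
    for n in range(a * base**k, (a + 1) * base**k):
        counts[int(van_der_corput(n, base) * base**k)] += 1
    return counts
```

Every block of \(2^k\) consecutive indices starting at a multiple of \(2^k\) yields exactly one point per dyadic interval, which is the fact exploited for the hypercube \(\underline{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\) above.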
Appendix 2: Proof of Lemma 2
Using the Lipschitz property of \(F_{K_i}(\cdot |\cdot )\) for all \(i\in 1:d\), conditions (7) and (8) in the proof of Lemma 1 hold with \( \tilde{\delta }_i(\delta )=\delta (0.25\tilde{K}/C_K)\), \(i\in 1:d\). Hence, we can take \(v_K(\delta )=\delta (0.25\tilde{K}/C_K)^{d}\wedge \delta \) and thus \(\underline{S}_{\delta }= \delta 0.5\tilde{K}\big (1 \wedge (0.25\tilde{K}/C_K)^{d}\big )\). Then, the expression for \(k_{\delta }\) follows using (9) while the expression for \(\bar{\delta }_{K}\le 0.5\) results from the condition \(k_{\delta }\ge t+d\) for all \(\delta \in (0,\bar{\delta }_{K}]\).
Appendix 3: Proof of Lemma 3
We first state and prove three technical lemmas.
Lemma 6
Let \(\mathcal {X}=[0,1]^d\) and \(K:\mathcal {X}\rightarrow \mathcal {P}(\mathcal {X})\) be a Markov kernel which verifies Assumptions (A1)-(A2). Then, for any \(\delta \in (0,\bar{\delta }_K]\), with \(\bar{\delta }_K\) as in Lemma 1, and any \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\), there exists a closed hypercube \(\bar{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\subset [0,1)^d\) of side \(\bar{S}_{\delta }=2.5\bar{K}\delta \), with \(\bar{K}=\max _{i\in 1:d}\{\sup _{\mathbf {x},\mathbf {y}\in \mathcal {X}}K_i(y_i|y_{1:i-1},\mathbf {x})\}\), such that
where \(v_K(\cdot )\) is as in Lemma 1.
Proof
The proof of Lemma 6 is similar to that of Lemma 1, and below we use the same notation as there.
Let \(\delta \in (0,\bar{\delta }_K]\), \((\tilde{\mathbf {x}},\,\mathbf {x}^{\prime })\in \mathcal {X}^2\) and note that, for any \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\),
Let \(0<\delta _1\le \dots \le \delta _d\le \delta \) be as in the proof of Lemma 1 and, for \(i\in 1:d\), define
and
Let \(i\in 1:d\) and \((\underline{\mathbf {x}}^i,\underline{\mathbf {y}}^i),(\bar{\mathbf {x}}^i,\bar{\mathbf {y}}^i)\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\) be such that
Therefore, using (7), (8) and (10), we have, \(\forall i\in 1:d\),
where \(\tilde{K}\le \bar{K}\) is as in the proof of Lemma 1. (Note that \(\bar{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})-\underline{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})\) is indeed strictly positive because \(F_{K_i}(\cdot |\mathbf {x},y_{1:i-1})\) is strictly increasing on [0, 1] for any \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\) and because \(\delta _i>0\).)
This shows that for all \(\mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\) and for all \(\mathbf {y}\in B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\), we have
and thus there exists a closed hypercube \(\bar{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\) of side \(\bar{S}_{\delta }=2.5\delta \bar{K}\) such that
To conclude the proof of Lemma 6, note that, because \(v_K(\delta )\le \delta _i\) for all \(i\in 1:d\),
Lemma 7
Consider the set-up of Lemma 3 and, for \((p,a,k)\in \mathbb {N}_+^3\), let
Then, for all \(k\in \mathbb {N}\), there exists a \(p^*_k\in \mathbb {N}\) such that \(\mathrm {Pr}\big (\bigcap _{a\in \mathbb {N}}E^{p}_{a,k}\big )=0\) for all \(p\ge p^*_k\).
Proof
Let \(\epsilon >0\), \(a\in \mathbb {N}\) and \(l\in \mathbb {R}\) be such that \(l<\varphi ^*\), and for \(k\in \mathbb {N}\), let \(E(k)=\{E(j,k)\}_{j=1}^{k^d}\) be the splitting of \(\mathcal {X}\) into closed hypercubes of volume \(k^{-d}\).
Let \(p^{\prime }\in \mathbb {N}_+\), \(\delta =2^{-p^{\prime }}\) and \(P^l_{\epsilon ,\delta }\subseteq E(\delta )\) be the smallest covering of \((\mathcal {X}_{l})_{\epsilon }\) by hypercubes in \(E(\delta )\); that is, \(|P^l_{\epsilon ,\delta }|\) is the smallest value in \(1:\delta ^{-d}\) such that \((\mathcal {X}_{l})_{\epsilon }\subseteq \cup _{W\in P^l_{\epsilon ,\delta }}W\). Let \(J^l_{\epsilon ,\delta }\subseteq 1:\delta ^{-d}\) be such that \(j\in J^l_{\epsilon ,\delta }\) if and only if \(E(j,\delta )\in P^l_{\epsilon ,\delta }\). We now bound \(|J^l_{\epsilon ,\delta }|\) following the same idea as in [20].
By assumption, there exists a constant \(\bar{M}<\infty \) independent of l such that \(M(\mathcal {X}_{l})\le \bar{M}\). Hence, for any fixed \(w>1\) there exists an \(\epsilon ^*\in (0,1)\) (independent of l) such that \(\lambda _d\big ((\mathcal {X}_{l})_{\epsilon }\big )\le w M(\mathcal {X}_{l})\epsilon \le w \bar{M}\epsilon \) for all \(\epsilon \in (0,\epsilon ^*]\). Let \(\epsilon =2^{-p}\), with \(p\in \mathbb {N}\) such that \(2^{-p}\le 0.5\epsilon ^*\), and take \(\delta _{\epsilon }=2^{-p-1}\). Then, we have the inclusions \((\mathcal {X}_{l})_{\epsilon }\subseteq \cup _{W\in P^l_{\epsilon ,\delta _{\epsilon }}}W \subseteq (\mathcal {X}_{l})_{2\epsilon }\) and therefore, since \(2\epsilon \le \epsilon ^*\),
where the right-hand side is independent of l.
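The displayed bound referred to here was not reproduced above; a reconstruction consistent with the surrounding definitions (and with the use of (12) for \(d=1\) in the proof of Theorem 2) is the following: the hypercubes in \(P^l_{\epsilon ,\delta _{\epsilon }}\) have pairwise disjoint interiors, each has volume \(\delta _{\epsilon }^d=(\epsilon /2)^d\), and their union lies in \((\mathcal {X}_{l})_{2\epsilon }\), whence

```latex
|J^l_{\epsilon,\delta_{\epsilon}}|\,\delta_{\epsilon}^{d}
  = \lambda_d\Bigl(\,\bigcup_{W\in P^l_{\epsilon,\delta_{\epsilon}}} W\Bigr)
  \le \lambda_d\bigl((\mathcal{X}_{l})_{2\epsilon}\bigr)
  \le 2 w\bar{M}\epsilon,
\qquad\text{so that}\qquad
|J^l_{\epsilon,\delta_{\epsilon}}| \le 2^{d+1} w\bar{M}\,\epsilon^{1-d}.
```

For \(d=1\) the right-hand side is a constant independent of \(\epsilon \), which matches the claim \(|J^{\bar{\varphi }}_{\epsilon ,\epsilon /2}|\le C^*\) made in the proof of Theorem 2.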
Next, for \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\), let \(\bar{\mathbf {x}}^j\) be the center of \(E(j,\delta _{\epsilon })\) and \(W^l(j,\delta _{\epsilon })=\cup _{j^{\prime }\in J^l_{\epsilon ,\delta _{\epsilon }}} \bar{W}(\bar{\mathbf {x}}^j,\bar{\mathbf {x}}^{j^{\prime }},\delta _{\epsilon })\), with \(\bar{W}(\cdot ,\cdot ,\cdot )\) as in Lemma 6. Then, by Lemma 6, a necessary condition to move at iteration \(n+1\) of Algorithm 1 from a point \(\mathbf {x}^{n}\in E(j_{n},\delta _{\epsilon })\), with \(j_{n}\in J^{l}_{\epsilon ,\delta _{\epsilon }}\), to a point \(\mathbf {x}^{n+1}\ne \mathbf {x}^{n}\) such that \(\mathbf {x}^{n+1}\in E(j_{n+1},\delta _{\epsilon })\) for a \(j_{n+1}\in J^{l}_{\epsilon ,\delta _{\epsilon }}\) is that \(\mathbf {u}_R^{n+1}\in W^{l}(j_{n},\delta _{\epsilon })\).
Let \(k^{\delta _{\epsilon }}\) be the largest integer such that (i) \(b^{k}\le \bar{S}_{\delta _{\epsilon }}^{-d}b^t\), with \(\bar{S}_{\delta _{\epsilon }}=2.5\bar{K}\delta _{\epsilon }\), \(\bar{K}<\infty \), as in Lemma 6, and (ii) \((k-t)/d\) is a positive integer (if necessary reduce \(\epsilon \) to fulfil this last condition). Let \(E^{\prime }(\delta _{\epsilon })=\{E^{\prime }(k,\delta _{\epsilon })\}_{k=1}^{b^{k^{\delta _{\epsilon }}-t}}\) be the partition of \([0,1)^d\) into hypercubes of volume \(b^{t-k^{\delta _{\epsilon }}}\). Then, for all \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\), \(W^l(j,\delta _{\epsilon })\) is covered by at most \(2^d|J^l_{\epsilon ,\delta _{\epsilon }}|\) hypercubes of \(E^{\prime }(\delta _{\epsilon })\).
Let \(\epsilon \) be small enough so that \(k^{\delta _{\epsilon }}>t+dR\). Then, using the properties of \((t,s)_R\)-sequences (see Section 3.1), it is easily checked that, for all \(n\ge 0\),
Thus, using (12)–(13) and the definition of \(k^{\delta _{\epsilon }}\), we obtain, for all \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\) and \(n\ge 0\),
Consequently, using the definition of \(\epsilon \) and \(\delta _{\epsilon }\), and the fact that there exist at most \(2^d\) values of \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\) such that, for \(n\in \mathbb {N}\), we have \(\mathbf {x}^{n}\in E(j,\delta _{\epsilon })\), we deduce that, for a \(p^*\in \mathbb {N}\) large enough (i.e. for \(\epsilon =2^{-p^*}\) small enough)
implying that, for \(p\ge p^*\),
Finally, because the uniform random numbers \(\mathbf {z}^n\) in \([0,1)^s\) that enter into the definition of \((t,s)_R\)-sequences are IID, this shows that
To conclude the proof, for \(k\in \mathbb {N}\) let \(\rho _k\in (0,1)\) and \(p^*_k\ge p^*\) be such that
Then, \(\mathrm {Pr}\big (\cap _{a\in \mathbb {N}} E^p_{a,k})=0\) for all \(p\ge p^*_k\), as required.
Lemma 8
Consider the set-up of Lemma 3. For \(k\in \mathbb {N}\), let \(\tilde{E}(dk)=\{\tilde{E}(j,k)\}_{j=1}^{b^{dk}}\) be the partition of \([0,1)^d\) into hypercubes of volume \(b^{-dk}\). Let \(k^{R} \in (dR+t):(dR+t+d)\) be the smallest integer k such that \((k-t)/d\) is an integer and \((k-t)/d\ge R\) and, for \(m\in \mathbb {N}\), let \(I_m=\{mb^{k^R},\dots ,(m+1)b^{k^R}-1\}\). Then, for any \(\delta \in (0,\bar{\delta }_K]\) verifying \(k_\delta > t+d+dR\) (with \(\bar{\delta }_K\) and \(k_\delta \) as in Lemma 1), there exists a \(p(\delta )>0\) such that
where \(t_{\delta ,d}\in t:(t+d)\) is such that \((k_{\delta }-t_{\delta ,d})/d\in \mathbb {N}\).
Proof
Let \(m\in \mathbb {N}\) and note that, by the properties of \((t,s)_R\)-sequences, the point set \(\{\mathbf {u}_{\infty }^{n}\}_{n\in I_m}\) is a \((t,{k^R},d)\)-net in base b. Thus, for all \(j\in 1:b^{k^R-t}\), this point set contains \(b^t\) points in \(\tilde{E}(j, k^{R}-t)\) and, consequently, for all \(j\in 1:b^{dR}\), it contains \(b^tb^{k^R-t-dR}=b^{k^R-dR}\ge 1\) points in \(\tilde{E}(j,dR)\). This implies that, for all \(j\in 1:b^{dR}\), the point set \(\{\mathbf {u}_R^{n}\}_{n\in I_m}\) contains \(b^{k^R-dR}\ge 1\) points in \(\tilde{E}(j,dR)\) where, for all \( n\in I_{m}\), \(\mathbf {u}_R^{n}\) is uniformly distributed in \(\tilde{E}(j_{n},dR)\) for a \(j_n\in 1:b^{dR}\).
In addition, it is easily checked that each hypercube of the set \(\tilde{E}(dR)\) contains
hypercubes of the set \(\tilde{E}(k_{\delta }-t_{\delta ,d})\), where \(k_{\delta }\) and \(t_{\delta ,d}\) are as in the statement of the lemma. Note that the last inequality holds because \(\delta \) is chosen so that \(k_{\delta }> t+d+dR\). Consequently,
and the proof is complete.
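The count elided in the sentence "each hypercube of the set \(\tilde{E}(dR)\) contains … hypercubes of the set \(\tilde{E}(k_{\delta }-t_{\delta ,d})\)" can be recovered, as a reconstruction on our part, by comparing the volumes of the two partitions: cubes of \(\tilde{E}(dR)\) have volume \(b^{-dR}\), cubes of \(\tilde{E}(k_{\delta }-t_{\delta ,d})\) have volume \(b^{t_{\delta ,d}-k_{\delta }}\), and the two partitions nest, so each cube of the former contains

```latex
\frac{b^{-dR}}{b^{\,t_{\delta,d}-k_{\delta}}}
  = b^{\,k_{\delta}-t_{\delta,d}-dR}
  \ge b^{\,k_{\delta}-(t+d)-dR}
  \ge b > 1
```

cubes of the latter, the last inequality using \(t_{\delta ,d}\le t+d\) and the standing assumption \(k_{\delta }> t+d+dR\).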
Proof of Lemma 3:
To prove the lemma we need to introduce some additional notation. Let \(\varOmega =[0,1)^{\mathbb {N}}\), \(\mathcal {B}([0,1))\) be the Borel \(\sigma \)-algebra on \([0,1)\), \(\mathcal {F}= \mathcal {B}([0,1))^{\otimes \mathbb {N}}\) and \(\mathbb {P}\) be the probability measure on \((\varOmega ,\mathcal {F})\) defined by
Next, for \(\omega \in \varOmega \), we denote by \(\big (\mathbf {U}_R^n(\omega )\big )_{n\ge 0}\) the sequence of points in \([0,1)^{d}\) defined, for all \(n\ge 0\), by (using the convention that empty sums are null),
Note that, under \(\mathbb {P}\), \(\big (\mathbf {U}_R^n\big )_{n\ge 0}\) is a \((t,d)_R\)-sequence in base b. Finally, for \(\omega \in \varOmega \), we denote by \(\big (\mathbf {x}^n_\omega \big )_{n\ge 0}\) the sequence of points in \(\mathcal {X}\) generated by Algorithm 1 when the sequence \(\big (\mathbf {U}_R^n(\omega )\big )_{n\ge 0}\) is used as input.
Under the assumptions of the lemma there exists a set \(\varOmega _1\in \mathcal {F}\) such that \(\mathbb {P}(\varOmega _1)=1\) and
Let \(\omega \in \varOmega _1\). Since \(\varphi \) is continuous, for any \(\epsilon >0\) there exists a \(N_{\omega , \epsilon }\in \mathbb {N}\) such that \(\mathbf {x}^n_\omega \in (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }\) for all \(n\ge N_{\omega ,\epsilon }\), where we recall that \((\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }=\{\mathbf {x}\in \mathcal {X}:\exists \mathbf {x}^{\prime }\in \mathcal {X}_{\bar{\varphi }_\omega }\text { such that } \Vert \mathbf {x}-\mathbf {x}^{\prime }\Vert _{\infty }\le \epsilon \}\). In addition, because \(\varphi \) is continuous and \(\mathcal {X}\) is compact, there exists an integer \(p_{\omega , \epsilon }\in \mathbb {N}\) such that we have both \(\lim _{\epsilon \rightarrow 0}p_{\omega ,\epsilon }=\infty \) and
Next, let \(\mathbf {x}^*\in \mathcal {X}\) be such that \(\varphi (\mathbf {x}^*)=\varphi ^*\), \(k^R \in (dR+t):(dR+t+d)\) be as in Lemma 8 and, for \((p,a,k)\in \mathbb {N}_+^3\), let
Then, by Lemma 7, there exists a \(p^*\in \mathbb {N}\) such that \(\mathbb {P}\big (\cap _{a\in \mathbb {N}} \tilde{E}^p_{a,k^R}\big )=0\) for all \(p\ge p^*\), and thus the set \(\hat{\varOmega }_2=\cap _{p\ge p^*}\big (\varOmega \setminus \cap _{a\in \mathbb {N}} \tilde{E}^p_{a,k^R}\big )\) verifies \(\mathbb {P}(\hat{\varOmega }_2)=1\). Let \(\varOmega _2=\varOmega _1\cap \hat{\varOmega }_2\) so that \(\mathbb {P}(\varOmega _2)=1\).
For \(\omega \in \varOmega _2\) let \(\epsilon _\omega >0\) be small enough so that, for any \(\epsilon \in (0,\epsilon _\omega ]\), we can take \(p_{\omega ,\epsilon }\ge p^*\) in (14). Then, for any \(\omega \in \varOmega _2\) such that \(\bar{\varphi }_\omega <\varphi ^*\), there exists a subsequence \((m_i)_{i\ge 1}\) of \((m)_{m\ge 1}\) such that, for all \(i\ge 1\), either
or
Assume first that there exist infinitely many \(i\in \mathbb {N}\) such that (15) holds. Then, by (14), this leads to a contradiction with the fact that \(\omega \in \varOmega _2\subseteq \varOmega _1\). Therefore, for any \(\omega \in \varOmega _2\) such that \(\bar{\varphi }_{\omega }<\varphi ^*\) there exists a subsequence \((m_i)_{i\ge 1}\) of \((m)_{m\ge 1}\) such that, for an \(i^*\) large enough,
Let \( \tilde{\varOmega }_2=\{\omega \in \varOmega _2:\,\bar{\varphi }_{\omega }<\varphi ^*\}\subseteq \varOmega _2\). Then, to conclude the proof, it remains to show that \(\mathbb {P}(\tilde{\varOmega }_2)=0\). We prove this result by contradiction and thus, henceforth, we assume \(\mathbb {P}(\tilde{\varOmega }_2)>0\).
To this end, let \(\mathbf {x}^*\in \mathcal {X}\) be such that \(\varphi (\mathbf {x}^*)=\varphi ^*\), \(\mathbf {x}\in \mathcal {X}\) and \(\delta \in (0,\bar{\delta }_K]\), with \(\bar{\delta }_K\) as in Lemma 1. Then, by Lemma 1, a sufficient condition to have \(F_K^{-1}(\mathbf {x},\mathbf {U}_R^{n}(\omega ))\in B_{\delta }(\mathbf {x}^*)\), \(n\ge 1\), is that \(\mathbf {U}_R^{n}(\omega )\in \underline{W}(\mathbf {x},\mathbf {x}^*,\delta )\), with \(\underline{W}(\cdot ,\cdot ,\cdot )\) as in Lemma 1. From the proof of Lemma 1 we know that the hypercube \(\underline{W}(\mathbf {x},\mathbf {x}^*,\delta )\) contains at least one hypercube of the set \(\tilde{E}(k_{\delta }-t_{\delta ,d})\), where \(t_{\delta ,d}\in t:(t+d)\) is such that \((k_{\delta }-t_{\delta ,d})/d\in \mathbb {N}\) and, for \(k\in \mathbb {N}\), \(\tilde{E}(dk)\) is as in Lemma 8. Hence, by Lemma 8, for any \(\delta \in (0,\delta ^*]\), with \(\delta ^*\) such that \(k_{\delta ^*}>t+d+dR\) (where, for \(\delta >0\), \(k_\delta \) is defined in Lemma 1), there exists a \(p(\delta )>0\) such that
and thus, using (16) and under Assumption (A2), it is easily checked that, for any \(\delta \in (0,\delta ^*]\),
where \(\mathbb {P}_2\) denotes the restriction of \(\mathbb {P}\) on \(\tilde{\varOmega }_2\) (recall that we assume \(\mathbb {P}(\tilde{\varOmega }_2)>0\)).
For \(\delta >0\), let
and let \(\tilde{p}^*\in \mathbb {N}\) be such that \(2^{-\tilde{p}^*}\le \delta ^*\). Then, the set \(\varOmega ^{\prime }=\cap _{\tilde{p}\ge \tilde{p}^*}\varOmega ^{\prime }_{2^{-\tilde{p}}}\) verifies \(\mathbb {P}_2(\varOmega ^{\prime })=1\).
To conclude the proof let \(\omega \in \varOmega ^{\prime }\). Then, because \(\varphi \) is continuous and \(\bar{\varphi }_\omega <\varphi ^*\), there exists a \(\tilde{\delta }_{\bar{\varphi }_\omega }>0\) such that \(\varphi (\mathbf {x})>\bar{\varphi }_\omega \) for all \(\mathbf {x}\in B_{\tilde{\delta }_{\bar{\varphi }_\omega }}(\mathbf {x}^*)\). Let \(\delta _{\bar{\varphi }_\omega }:=2^{-\tilde{p}_{\omega ,\epsilon }}\le \tilde{\delta }_{\bar{\varphi }_\omega }\wedge \bar{\delta }_K\) for an integer \(\tilde{p}_{\omega ,\epsilon }\ge \tilde{p}^*\). Next, take \(\epsilon \) small enough so that we have both \(B_{\delta _{\bar{\varphi }_\omega }}(\mathbf {x}^*)\cap (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }=\varnothing \) and \(\varphi (\mathbf {x})\ge \varphi (\mathbf {x}^{\prime })\) for all \((\mathbf {x},\mathbf {x}^{\prime })\in B_{\delta _{\bar{\varphi }_\omega }}(\mathbf {x}^*)\times (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }\).
By the above computations, the set \( B_{\tilde{\delta }_{\bar{\varphi }_\omega }}(\mathbf {x}^*)\) is visited infinitely many times and thus \(\varphi (\mathbf {x}_\omega ^n)>\bar{\varphi }_\omega \) for infinitely many \(n\in \mathbb {N}\), contradicting the fact that \(\varphi (\mathbf {x}_\omega ^n)\rightarrow \bar{\varphi }_\omega \) as \(n\rightarrow \infty \). Hence, the set \(\varOmega ^{\prime }\) is empty. On the other hand, as shown above, under the assumption \(\mathbb {P}(\tilde{\varOmega }_2)>0\) we have \(\mathbb {P}_2(\varOmega ^{\prime })=1\) and, consequently, \(\varOmega ^{\prime }\ne \varnothing \). Therefore, we must have \(\mathbb {P}(\tilde{\varOmega }_2)=0\) and the proof is complete.
Appendix 4: Proof of Theorem 2
Using Lemmas 4 and 5, we know that \(\varphi (x^n)\rightarrow \bar{\varphi }\in \mathbb {R}\) and thus it remains to show that \(\bar{\varphi }=\varphi ^*\).
Assume that \(\bar{\varphi }\ne \varphi ^*\) and, for \(\epsilon =2^{-p}\), \(p\in \mathbb {N}_+\), let \(N_\epsilon \in \mathbb {N}\), \(p_\epsilon \) and \(\delta _{\bar{\varphi }}>0\) be as in the proof of Lemma 3 (with the dependence of \(N_\epsilon \), \(p_\epsilon \) and of \(\delta _{\bar{\varphi }}\) on \(\omega \in \varOmega \) suppressed in the notation for obvious reasons).
Let \(x^*\in \mathcal {X}\) be a global maximizer of \(\varphi \) and \(n=a_nb^{k_{\delta _{\bar{\varphi }}}}-1\) with \(a_n\in \mathbb {N}\) such that \(n>N_{\epsilon }\). For \(k\in \{2^{-p}:\,p\in \mathbb {N}_+\}\), let \(E(k)=\{E(j,k)\}_{j=1}^{k^{-d}}\) be the splitting of \([0,1]^d\) into closed hypercubes of volume \(k^{d}\). Then, by Lemma 6, a necessary condition to have a move at iteration \(n^{\prime }+1\ge 1\) of Algorithm 1 from \(x^{n^{\prime }}\in (\mathcal {X}_{\bar{\varphi }})_{\epsilon }\) to \(x^{n^{\prime }+1}\ne x^{n^{\prime }}\), \(x^{n^{\prime }+1}\in (\mathcal {X}_{\bar{\varphi }})_{\epsilon }\), is that
where, for \(j\in 1:(\epsilon /2)^{-d}\), \(\bar{x}^j\) denotes the center of \(E(j,\epsilon /2)\), \(J^{\bar{\varphi }}_{\epsilon , \epsilon /2}\) is as in the proof of Lemma 7 and \(\bar{W}(\cdot ,\cdot ,\cdot )\) is as in Lemma 6. Note that, using (12) with \(d=1\), \(|J^{\bar{\varphi }}_{\epsilon , \epsilon /2}|\le C^*\) for a constant \(C^*<\infty \) (independent of \(\epsilon \)).
Let \(k^{\delta _{\epsilon }}\) be the largest integer \(k\ge t\) such that \(b^{t-k}\ge \bar{S}_{\epsilon /2}^{d}\), with \(\bar{S}_{\epsilon /2}\) as in Lemma 6, and let \(\epsilon \) be small enough so that \(b^{k^{\delta _{\epsilon }}}>2^dC^*b^t\). The point set \(\{u_{\infty }^{n^{\prime }}\}_{n^{\prime }=a_nb^{k^{\delta _{\epsilon }}}}^{ (a_n+1)b^{k^{\delta _{\epsilon }}}-1}\) is a \((t, k^{\delta _{\epsilon }},d)\)-net in base b and thus the set \(\bar{W}(\epsilon )\) contains at most \(2^dC^* b^t\) points of this point set. Hence, if for \(n>N_{\epsilon }\) only moves inside the set \((\mathcal {X}_{\bar{\varphi }})_{\epsilon }\) occur then, for some \(\tilde{n}\in a_nb^{k^{\delta _{\epsilon }}}:\big ((a_n+1)b^{k^{\delta _{\epsilon }}}-\eta _{\epsilon }-1\big )\), the point set \(\{x^{n^{\prime }}\}_{n^{\prime }=\tilde{n}}^{\tilde{n}+\eta _{\epsilon }}\) is such that \(x^{n^{\prime }}=x^{\tilde{n}}\) for all \(n^{\prime }\in \tilde{n}:(\tilde{n}+\eta _{\epsilon })\), where \(\eta _{\epsilon }\ge \lfloor b^{k^{\delta _{\epsilon }}}/(2^{d+1}C^*b^t)\rfloor \); note that \(\eta _{\epsilon }\rightarrow \infty \) as \(\epsilon \rightarrow 0\).
Let \(k^{\epsilon }_0\) be the largest integer that satisfies \(\eta _{\epsilon }\ge 2b^{k^{\epsilon }_0}\), so that \(\{u_{\infty }^n\}_{n=\tilde{n}}^{\tilde{n}+\eta _{\epsilon }}\) contains at least one \((t,k^{\epsilon }_0,d)\)-net in base b. Note that \(k^{\epsilon }_0\rightarrow \infty \) as \(\epsilon \rightarrow 0\), and let \(\epsilon \) be small enough so that \(k^{\epsilon }_0\ge k_{\delta _{\bar{\varphi }}}\), with \(k_{\delta }\) as in Lemma 1. Then, by Lemma 1, there exists at least one \(n^*\in (\tilde{n}+1):(\tilde{n}+\eta _{\epsilon })\) such that \(\tilde{y}^{n^*}:=F_K^{-1}(x^{\tilde{n}},u_{\infty }^{n^*})\in B_{\delta _{\bar{\varphi }}}(x^*)\). Since, by the definition of \(\delta _{\bar{\varphi }}\) and for \(\epsilon \) small enough, we have \(\varphi (x)>\varphi (x^{\prime })\) for all \((x,x^{\prime })\in B_{\delta _{\bar{\varphi }}}(x^*)\times (\mathcal {X}_{\bar{\varphi }})_{\epsilon }\), it follows that \(\varphi (\tilde{y}^{n^*})>\varphi (x^{\tilde{n}})\). Hence, there exists at least one \(n\in \tilde{n}:(\tilde{n}+\eta _{\epsilon })\) such that \(x^n\ne x^{\tilde{n}}\), which contradicts the fact that \(x^n=x^{\tilde{n}}\) for all \(n\in \tilde{n}:(\tilde{n}+\eta _{\epsilon })\). This contradiction shows that \(\bar{\varphi }=\varphi ^*\), i.e. that \(\bar{\varphi }\) is indeed the global maximum of \(\varphi \).
Appendix 5: Additional figures for the example of Sect. 6.1
Gerber, M., Bornn, L. Improving simulated annealing through derandomization. J Glob Optim 68, 189–217 (2017). https://doi.org/10.1007/s10898-016-0461-1