Improving simulated annealing through derandomization


Abstract

We propose and study a version of simulated annealing (SA) on continuous state spaces based on \((t,s)_R\)-sequences. The parameter \(R\in \bar{\mathbb {N}}\) regulates the degree of randomness of the input sequence, with the case \(R=0\) corresponding to IID uniform random numbers and the limiting case \(R=\infty \) to \((t,s)\)-sequences. Our main result, obtained for rectangular domains, shows that the resulting optimization method, which we refer to as QMC-SA, converges almost surely to the global optimum of the objective function \(\varphi \) for any \(R\in \mathbb {N}\). When \(\varphi \) is univariate, we are in addition able to show that the completely deterministic version of QMC-SA is convergent. A key property of these results is that they do not require objective-dependent conditions on the cooling schedule. As a corollary of our theoretical analysis, we provide a new almost sure convergence result for SA which shares this property under minimal assumptions on \(\varphi \). We further explain how our results in fact apply to a broader class of optimization methods including, for example, threshold accepting, for which to our knowledge no convergence results currently exist. We finally illustrate the superiority of QMC-SA over SA algorithms in a numerical study.
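For readers who want to experiment, the following minimal sketch illustrates the idea behind QMC-SA: a standard SA accept/reject loop in which the IID uniform inputs are replaced by points of a low-discrepancy (here Sobol') sequence, as in the numerical study of Sect. 6.1. It is not the paper's Algorithm 1: the kernel (a clipped Gaussian random walk with step size 0.1), the toy objective and the cooling schedule are illustrative choices only.

```python
# Illustrative only: SA driven by a Sobol' stream instead of IID uniforms.
import numpy as np
from scipy.stats import norm, qmc

def qmc_sa(phi, d, n_iter, temp, seed=0):
    """Maximize phi over [0,1]^d with annealing moves driven by a Sobol' point stream."""
    u = qmc.Sobol(d=d + 1, scramble=True, seed=seed).random(n_iter)  # d proposal coords + 1 accept coord
    x = np.full(d, 0.5)                                              # arbitrary starting point
    best_x, best_val = x.copy(), phi(x)
    for n in range(n_iter):
        # Gaussian random-walk proposal obtained by pushing u through an inverse CDF,
        # clipped back to the unit cube (a simplification of the kernels used in the paper).
        y = np.clip(x + 0.1 * norm.ppf(u[n, :d]), 0.0, 1.0)
        accept = np.exp(min(0.0, (phi(y) - phi(x)) / temp(n)))       # SA acceptance probability
        if u[n, d] <= accept:
            x = y
        if phi(x) > best_val:
            best_x, best_val = x.copy(), phi(x)
    return best_x, best_val

# Toy run: maximize a smooth objective with a logarithmic cooling schedule.
phi = lambda x: -np.sum((x - 0.3) ** 2)
print(qmc_sa(phi, d=2, n_iter=2**10, temp=lambda n: 1.0 / np.log(n + 2)))
```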

References

  1. Alabduljabbar, A., Milanovic, J., Al-Eid, E.: Low discrepancy sequences based optimization algorithm for tuning PSSs. In: Proceedings of the 10th International Conference on Probabilistic Methods Applied to Power Systems, PMAPS’08, pp. 1–9. IEEE (2008)

  2. Althöfer, I., Koschnick, K.-U.: On the convergence of “Threshold accepting”. Appl. Math. Optim. 24(1), 183–195 (1991)

  3. Andrieu, C., Breyer, L.A., Doucet, A.: Convergence of simulated annealing using Foster–Lyapunov criteria. J. Appl. Prob. 38(4), 975–994 (2001)

  4. Andrieu, C., Doucet, A.: Simulated annealing for maximum a posteriori parameter estimation of hidden Markov models. IEEE Trans. Inf. Theory 46(3), 994–1004 (2000)

  5. Bélisle, C.J.P.: Convergence theorems for a class of simulated annealing algorithms on \(\mathbb{R}^d\). J. Appl. Prob. 29(4), 885–895 (1992)

  6. Bornn, L., Shaddick, G., Zidek, J.V.: Modeling nonstationary processes through dimension expansion. J. Am. Stat. Assoc. 107(497), 281–289 (2012)

  7. Chen, J., Suarez, J., Molnar, P., Behal, A.: Maximum likelihood parameter estimation in a stochastic resonate-and-fire neuronal model. In: 2011 IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 57–62. IEEE (2011)

  8. Chen, S., Luk, B.L.: Adaptive simulated annealing for optimization in signal processing applications. Signal Process. 79(1), 117–128 (1999)

  9. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)

  10. Dueck, G., Scheuer, T.: Threshold accepting: a general purpose optimization algorithm appearing superior to simulated annealing. J. Comput. Phys. 90(1), 161–175 (1990)

  11. Fang, K., Winker, P., Hickernell, F.J.: Some global optimization algorithms in statistics. In: Du, D.Z., Zhang, Z.S., Cheng, K. (eds.) Operations Research and Its Applications. Lecture Notes in Operations Research, vol. 2. World Publishing Corp, New York (1996)

  12. Fang, K.T.: Some applications of quasi-Monte Carlo methods in statistics. In: Monte Carlo and Quasi-Monte Carlo Methods 2000, pp. 10–26. Springer (2002)

  13. Gelfand, S.B., Mitter, S.K.: Recursive stochastic algorithms for global optimization in \(\mathbb{R}^d\). SIAM J. Control Optim. 29(5), 999–1018 (1991)

  14. Gelfand, S.B., Mitter, S.K.: Metropolis-type annealing algorithms for global optimization in \(\mathbb{R}^d\). SIAM J. Control Optim. 31(1), 111–131 (1993)

  15. Geman, S., Hwang, C.-R.: Diffusions for global optimization. SIAM J. Control Optim. 24(5), 1031–1043 (1986)

  16. Gerber, M., Chopin, N.: Sequential Quasi-Monte Carlo. J. R. Stat. Soc. B 77(3), 509–579 (2015)

  17. Girard, T., Staraj, R., Cambiaggio, E., Muller, F.: A simulated annealing algorithm for planar or conformal antenna array synthesis with optimized polarization. Microw. Opt. Technol. Lett. 28(2), 86–89 (2001)

  18. Goffe, W.L., Ferrier, G.D., Rogers, J.: Global optimization of statistical functions with simulated annealing. J. Econom. 60(1), 65–99 (1994)

  19. Haario, H., Saksman, E.: Simulated annealing process in general state space. Adv. Appl. Probab. 23, 866–893 (1991)

  20. He, Z., Owen, A.B.: Extensible grids: uniform sampling on a space filling curve. J. R. Stat. Soc.: Ser. B (2015)

  21. Hickernell, F.J., Yuan, Y.-X.: A simple multistart algorithm for global optimization. OR Trans. 1(2), 1–12 (1997)

  22. Hong, H.S., Hickernell, F.J.: Algorithm 823: implementing scrambled digital sequences. ACM Trans. Math. Softw. 29(2), 95–109 (2003)

  23. Ingber, L.: Very fast simulated re-annealing. Math. Comput. Model. 12(8), 967–973 (1989)

  24. Ireland, J.: Simulated annealing and Bayesian posterior distribution analysis applied to spectral emission line fitting. Solar Phys. 243(2), 237–252 (2007)

  25. Jiao, Y.-C., Dang, C., Leung, Y., Hao, Y.: A modification to the new version of the Price’s algorithm for continuous global optimization problems. J. Global Optim. 36(4), 609–626 (2006)

  26. Lecchini-Visintini, A., Lygeros, J., Maciejowski, J.M.: Stochastic optimization on continuous domains with finite-time guarantees by Markov Chain Monte Carlo methods. IEEE Trans. Autom. Control 55(12), 2858–2863 (2010)

  27. Lei, G.: Adaptive random search in quasi-Monte Carlo methods for global optimization. Comput. Math. Appl. 43(6), 747–754 (2002)

  28. Locatelli, M.: Convergence properties of simulated annealing for continuous global optimization. J. Appl. Prob. 33(4), 1127–1140 (1996)

  29. Locatelli, M.: Convergence of a simulated annealing algorithm for continuous global optimization. J. Global Optim. 18(3), 219–233 (2000)

  30. Locatelli, M.: Simulated annealing algorithms for continuous global optimization. In: Handbook of global optimization, pp. 179–229. Springer (2002)

  31. Moscato, P., Fontanari, J.F.: Stochastic versus deterministic update in simulated annealing. Phys. Lett. A 146(4), 204–208 (1990)

  32. Niederreiter, H.: A quasi-Monte Carlo method for the approximate computation of the extreme values of a function. In: Studies in Pure Mathematics, pp. 523–529. Springer (1983)

  33. Niederreiter, H.: Point sets and sequences with small discrepancy. Monatshefte für Mathematik 104(4), 273–337 (1987)

  34. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia (1992)

  35. Niederreiter, H., Peart, P.: Localization of search in quasi-Monte Carlo methods for global optimization. SIAM J. Sci. Stat. Comput. 7(2), 660–664 (1986)

  36. Nikolaev, A.G., Jacobson, S.H.: Simulated annealing. In: Handbook of Metaheuristics, pp. 1–39. Springer (2010)

  37. Owen, A.B.: Randomly permuted \((t, m, s)\)-nets and \((t, s)\)-sequences. In: Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing. Lecture Notes in Statistics, vol. 106, pp. 299–317. Springer, New York (1995)

  38. Pistovčák, F., Breuer, T.: Using quasi-Monte Carlo scenarios in risk management. In: Monte Carlo and Quasi-Monte Carlo Methods 2002, pp. 379–392. Springer (2004)

  39. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer-Verlag, New York (2004)

  40. Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. 23(3), 470–472 (1952)

  41. Rubenthaler, S., Rydén, T., Wiktorsson, M.: Fast simulated annealing in \(\mathbb{R}^d\) with an application to maximum likelihood estimation in state-space models. Stoch. Process. Appl. 119(6), 1912–1931 (2009)

  42. Winker, P., Maringer, D.: The threshold accepting optimisation algorithm in economics and statistics. In: Optimisation, Econometric and Financial Analysis, pp. 107–125. Springer (2007)

  43. Zhang, H., Bonilla-Petriciolet, A., Rangaiah, G.P.: A review on global optimization methods for phase equilibrium modeling and calculations. Open Thermodyn. J. 5(S1), 71–92 (2011)

Acknowledgments

The authors acknowledge support from DARPA under Grant No. FA8750-14-2-0117. The authors also thank Christophe Andrieu, Pierre Jacob and Art Owen for insightful discussions and useful feedback.

Author information

Corresponding author

Correspondence to Mathieu Gerber.

Appendices

Appendix 1: Proof of Lemma 1

Let \(n\in \mathbb {N}\), \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\), \(\delta _{\mathcal {X}}=0.5\) and \(\delta \in (0,\delta _{\mathcal {X}}]\). Then, by Assumption (A1), \(F_K^{-1}(\tilde{\mathbf {x}},\mathbf {u}_1^n)\in B_{\delta }(\mathbf {x}^{\prime })\) if and only if \( \mathbf {u}_1^n\in F_K(\tilde{\mathbf {x}},B_{\delta }(\mathbf {x}^{\prime }))\). We now show that, for \(\delta \) small enough, there exists a closed hypercube \(W(\tilde{\mathbf {x}},\mathbf {x}^{\prime }, \delta )\subset [0,1)^d\) such that \(W(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\subseteq F_K(\mathbf {x},B_{\delta }(\mathbf {x}^{\prime }))\) for all \(\mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\), with \(v_K(\cdot )\) as in the statement of the lemma.

To see this, note that because \(K(\mathbf {x},\mathrm {d}\mathbf {y})\) admits a density \(K(\mathbf {y}|\mathbf {x})\) which is continuous on the compact set \(\mathcal {X}^2\), Assumption (A2) implies that, for \(i\in 1:d\), \(K_i(y_i|y_{1:i-1},\mathbf {x})\ge \tilde{K}\) for all \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\) and for a constant \(\tilde{K}>0\). Consequently, for any \(\delta \in [0,0.5]\) and \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\),

$$\begin{aligned} F_{K_i}\left( x^{\prime }_i+\delta |\mathbf {x}, y_{1:i-1}\right) -F_{K_i}\left( x^{\prime }_i-\delta |\mathbf {x}, y_{1:i-1}\right) \ge \tilde{K}\delta ,\quad \forall i\in 1:d \end{aligned}$$
(6)

where \(F_{K_i}(\cdot {}|\mathbf {x}, y_{1:i-1})\) denotes the CDF of the probability measure \(K_i(\mathbf {x},y_{1:i-1},\mathrm {d}y_i)\), with the convention that \(F_{K_i}(\cdot {}|\mathbf {x}, y_{1:i-1})=F_{K_1}(\cdot {}|\mathbf {x})\) when \(i=1\). Note that the right-hand side of (6) is \(\tilde{K}\delta \) and not \(2\tilde{K}\delta \) to encompass the case where either \(x_i^{\prime }-\delta \not \in [0,1]\) or \(x_i^{\prime }+\delta \not \in [0,1]\). (Note also that because \(\delta \le 0.5\) we cannot have both \(x_i^{\prime }-\delta \not \in [0,1]\) and \(x_i^{\prime }+\delta \not \in [0,1]\).)
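To fix ideas, the following small sketch shows how a point \(\mathbf {u}\in [0,1)^d\) is mapped to a proposal through the coordinate-wise inverses of such conditional CDFs. For simplicity the kernel is taken here to be a product of truncated normals on \([0,1]\) (so the conditioning on \(y_{1:i-1}\) plays no role); this is an illustrative assumption, not the kernels used in the paper.

```python
# Illustrative inverse-Rosenblatt map u -> F_K^{-1}(x, u): each coordinate u_i is pushed
# through the inverse of the conditional CDF F_{K_i}(.|x, y_{1:i-1}).  The kernel assumed
# here has independent truncated-normal components, so the conditioning drops out.
import numpy as np
from scipy.stats import truncnorm

def inverse_rosenblatt(x, u, sigma=0.1):
    """Map u in [0,1)^d to a proposal y ~ K(x, .) supported on [0,1]^d."""
    y = np.empty_like(x)
    for i in range(x.size):
        a, b = (0.0 - x[i]) / sigma, (1.0 - x[i]) / sigma  # standardized truncation bounds
        y[i] = truncnorm.ppf(u[i], a, b, loc=x[i], scale=sigma)
    return y

print(inverse_rosenblatt(np.array([0.2, 0.8]), np.array([0.5, 0.975])))
```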

For \(i\in 1:d\) and \(\delta ^{\prime }>0\), let

$$\begin{aligned} \omega _i(\delta ^{\prime })=\sup _{ \begin{array}{c} (\mathbf {x},\mathbf {y})\in \mathcal {X}^2,\,(\mathbf {x}^{\prime },\mathbf {y}^{\prime })\in \mathcal {X}^2\\ \Vert \mathbf {x}-\mathbf {x}^{\prime }\Vert _{\infty }\vee \Vert \mathbf {y}-\mathbf {y}^{\prime }\Vert _{\infty }\le \delta ^{\prime } \end{array}}|F_{K_i}\left( y_i|\mathbf {x}, y_{1:i-1}\right) -F_{K_i}\left( y_i^{\prime }|\mathbf {x}^{\prime }, y^{\prime }_{1:i-1}\right) | \end{aligned}$$

be the (optimal) modulus of continuity of \(F_{K_i}(\cdot |\cdot )\). Since \(F_{K_i}\) is uniformly continuous on the compact set \([0,1]^{d+i}\), the mapping \(\omega _i(\cdot )\) is continuous and \(\omega _i(\delta ^{\prime })\rightarrow 0\) as \(\delta ^{\prime }\rightarrow 0\). In addition, because \(F_{K_i}\left( \cdot |\mathbf {x}, y_{1:i-1}\right) \) is strictly increasing on [0, 1] for all \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\), \(\omega _i(\cdot )\) is strictly increasing on (0, 1]. Let \(\tilde{K}\) be small enough so that, for \(i\in 1:d\), \(0.25\tilde{K}\delta _{\mathcal {X}}\le \omega _i(1)\) and let \(\tilde{\delta }_i(\cdot )\) be the mapping \( z\in (0,\delta _{\mathcal {X}}]\longmapsto \tilde{\delta }_i(z)=\omega _i^{-1} (0.25\tilde{K}z)\). Note that the function \(\tilde{\delta }_i(\cdot )\) is independent of \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\), continuous and strictly increasing on \((0,\delta _{\mathcal {X}}]\), and such that \(\tilde{\delta }_i(\delta ^{\prime })\rightarrow 0\) as \(\delta ^{\prime }\rightarrow 0\).

For \(\mathbf {x}\in \mathcal {X}\), \(\delta ^{\prime }>0\) and \(\delta ^{\prime }_i>0\), \(i\in 1:d\), let

$$\begin{aligned} B^i_{\delta ^{\prime }}(\tilde{\mathbf {x}})=\{\mathbf {x}\in [0,1]^i:\, \Vert \mathbf {x}-\tilde{x}_{1:i}\Vert _{\infty }\le \delta ^{\prime }\}\cap [0,1]^i \end{aligned}$$

and

$$\begin{aligned} B_{\delta ^{\prime }_{1:i}}(\tilde{\mathbf {x}})=\{\mathbf {x}\in [0,1]^i:\, |x_j-\tilde{x}_{j}|\le \delta ^{\prime }_j,\,j\in 1:i\}\cap [0,1]^i. \end{aligned}$$

Then, for any \(\delta ^{\prime }>0\) and for all \((\mathbf {x},y_{1:i-1})\in B_{\tilde{\delta }_i(\delta )}(\tilde{\mathbf {x}})\times B^{i-1}_{\tilde{\delta }_i(\delta )}(\mathbf {x}^{\prime })\), we have

$$\begin{aligned}&|F_{K_i}\left( x^{\prime }_i+\delta ^{\prime }|\mathbf {x}, y_{1:i-1}\right) -F_{K_i}\left( x^{\prime }_i+\delta ^{\prime }|\tilde{\mathbf {x}}, x^{\prime }_{1:i-1}\right) |\le 0.25\tilde{K}\delta \end{aligned}$$
(7)
$$\begin{aligned}&|F_{K_i}\left( x^{\prime }_i-\delta ^{\prime }|\mathbf {x}, y_{1:i-1}\right) -F_{K_i}\left( x^{\prime }_i-\delta ^{\prime }|\tilde{\mathbf {x}}, x^{\prime }_{1:i-1}\right) |\le 0.25\tilde{K}\delta . \end{aligned}$$
(8)

For \(i\in 1:d\) and \(\delta ^{\prime }\in (0,\delta _{\mathcal {X}}]\), let \(\delta _i(\delta ^{\prime })=\tilde{\delta }_i(\delta ^{\prime })\wedge \delta ^{\prime }\) and note that the function \(\delta _i(\cdot )\) is continuous and strictly increasing on \((0,\delta _{\mathcal {X}}]\). Let \(\delta _d=\delta _d(\delta )\) and define recursively \(\delta _{i}=\delta _{i}(\delta _{i+1})\), \(i\in 1:(d-1)\), so that \(\delta \ge \delta _d\ge \dots \ge \delta _1>0\). For \(i\in 1:d\), let

$$\begin{aligned} \underline{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})= \sup _{(\mathbf {x},y_{1:i-1})\in B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })} F_{K_i}\left( x^{\prime }_i-\delta _i|\mathbf {x}, y_{1:i-1}\right) \end{aligned}$$

and

$$\begin{aligned} \bar{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})= \inf _{(\mathbf {x},y_{1:i-1})\in B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })} F_{K_i}\left( x^{\prime }_i+\delta _i|\mathbf {x}, y_{1:i-1}\right) . \end{aligned}$$

Then, since \(F_{K_i}(\cdot |\cdot )\) is continuous and the set \(B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\) is compact, there exist points \((\underline{\mathbf {x}}^i,\underline{y}^{i}_{1:i-1})\) and \((\bar{\mathbf {x}}^i,\bar{y}^{i}_{1:i-1})\) in \(B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\) such that

$$\begin{aligned} \underline{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})=F_{K_i}(x^{\prime }_i-\delta _{i}|\underline{\mathbf {x}}^i, \underline{y}^{i}_{1:i-1}),\quad \bar{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime }, \delta _{1:i})=F_{K_i}\left( x^{\prime }_i+\delta _i|\bar{\mathbf {x}}^i, \bar{y}^{i}_{1:i-1}\right) . \end{aligned}$$

In addition, by the construction of the \(\delta _i\)’s, \(B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\subseteq B_{\tilde{\delta }_i(\delta _{i})}(\tilde{\mathbf {x}})\times B^i_{\tilde{\delta }_i(\delta _{i})}(\mathbf {x}^{\prime })\) for all \(i\in 1:d\). Therefore, using (6)–(8), we have, for all \(i\in 1:d\),

$$\begin{aligned} \bar{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})-\underline{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})&=F_{K_i}(x^{\prime }_i+\delta _i|\bar{\mathbf {x}}^i, \bar{y}^i_{1:i-1})-F_{K_i}(x^{\prime }_i-\delta _i|\underline{\mathbf {x}}^i, \underline{y}^i_{1:i-1})\\&\ge F_{K_i}\left( x^{\prime }_i+\delta _i|\tilde{\mathbf {x}}, x^{\prime }_{1:i-1}\right) -F_{K_i}\left( x^{\prime }_i-\delta _i|\tilde{\mathbf {x}}, x^{\prime }_{1:i-1}\right) -0.5\tilde{K}\delta _i\\&\ge 0.5\tilde{K}\delta _i\\&>0. \end{aligned}$$

Consequently, for all \(i\in 1:d\) and for all \((\mathbf {x},y_{1:i-1})\in B_{\delta _{1}}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\),

$$\begin{aligned} \Big [\underline{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i}), \bar{v}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})\Big ]\subseteq \Big [F_{K_i}(x^{\prime }_i-\delta _i|\mathbf {x}, y_{1:i-1}),F_{K_i}(x^{\prime }_i+\delta _i|\mathbf {x}, y_{1:i-1})\Big ]. \end{aligned}$$

Let \(\underline{S}_{\delta }=0.5\tilde{K}\delta _1\). Then, this shows that there exists a closed hypercube \(\underline{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\) of side \(\underline{S}_{\delta }\) such that

$$\begin{aligned} \underline{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\subseteq F_K\big (\mathbf {x},B_{\delta _{1:d}}(\mathbf {x}^{\prime })\big )\subseteq F_K\big (\mathbf {x},B_{\delta }(\mathbf {x}^{\prime })\big ),\quad \forall \mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}}) \end{aligned}$$

where we set \(v_K(\delta )=\delta _1\). Note that \(v_K(\delta )\in (0,\delta ]\) and thus \(v_K(\delta )\rightarrow 0\) as \(\delta \rightarrow 0\), as required. In addition, \(v_K(\cdot )=\delta _1\circ \dots \circ \delta _d(\cdot )\) is continuous and strictly increasing on \((0,\delta _{\mathcal {X}}]\) because the functions \(\delta _i(\cdot )\), \(i\in 1:d\), are continuous and strictly increasing on this set. Note also that \(v_K(\cdot )\) does not depend on \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\).

To conclude the proof, let

$$\begin{aligned} k_{\delta }=\big \lceil t+d-d\log (\underline{S}_{\delta }/3)/\log b\big \rceil \end{aligned}$$
(9)

and note that, if \(\delta \) is small enough, \(k_{\delta }\ge t+d\) because \(\underline{S}_{\delta }\rightarrow 0\) as \(\delta \rightarrow 0\). Let \(\bar{\delta }_K\) be the largest value of \(\delta ^{\prime }\le \delta _{\mathcal {X}}\) such that \(k_{\delta ^{\prime }}\ge t+d\). Let \(\delta \in (0,\bar{\delta }_K]\) and \(t_{\delta ,d}\in t:(t+d)\) be such that \((k_{\delta }-t_{\delta ,d})/d\) is an integer. Let \(\{E(j,\delta )\}_{j=1}^{b^{k_{\delta }-t_{\delta ,d}}}\) be the partition of \([0,1)^d\) into elementary intervals of volume \(b^{t_{\delta ,d}-k_{\delta }}\) so that any closed hypercube of side \(\underline{S}_{\delta }\) contains at least one elementary interval \(E(j,\delta )\) for a \(j\in 1:b^{k_{\delta }-t_{\delta ,d}}\). Hence, there exists a \(j_{\tilde{\mathbf {x}},\mathbf {x}^{\prime }}\in 1:b^{k_{\delta }-t_{\delta ,d}}\) such that

$$\begin{aligned} E(j_{\tilde{\mathbf {x}},\mathbf {x}^{\prime }},\delta )\subseteq \underline{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\subseteq F_K(\mathbf {x},B_{\delta }(\mathbf {x}^{\prime })),\quad \forall \mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}}). \end{aligned}$$

Let \(a\in \mathbb {N}\) and note that, by the properties of \((t,d)\)-sequences in base b, the point set \(\{\mathbf {u}^n\}_{n=ab^{k_{\delta }}}^{(a+1)b^{k_{\delta }}-1}\) is a \((t,k_{\delta },d)\)-net in base b because \(k_{\delta }> t\). In addition, since \(k_{\delta }\ge t_{\delta ,d}\ge t\), the point set \(\{\mathbf {u}^n\}_{n=ab^{k_{\delta }}}^{(a+1)b^{k_{\delta }}-1}\) is also a \((t_{\delta ,d},k_{\delta },d)\)-net in base b ([34], Remark 4.3, p. 48). Thus, since for \(j\in 1:b^{k_{\delta }-t_{\delta ,d}}\) the elementary interval \(E(j,\delta )\) has volume \(b^{t_{\delta ,d}-k_{\delta }}\), the point set \(\{\mathbf {u}^n\}_{n=ab^{k_{\delta }}}^{(a+1)b^{k_{\delta }}-1}\) contains exactly \(b^{t_{\delta ,d}}\ge b^t\) points in \(E(j_{\tilde{\mathbf {x}},\mathbf {x}^{\prime }},\delta )\) and the proof is complete.
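The counting argument above can be checked numerically. The sketch below takes \(b=2\), \(d=2\) and the first \(2^{6}\) points of the (unscrambled) Sobol' sequence, for which the quality parameter is assumed here to be \(t=0\), and verifies that every elementary interval of volume \(2^{-6}\) on a \(2^{3}\times 2^{3}\) dyadic grid receives the same number of points.

```python
# Numerical check of the net property: among b^k consecutive points of a (t,d)-sequence,
# every elementary interval of volume b^{t-k} contains exactly b^t points.
import numpy as np
from scipy.stats import qmc

b, k = 2, 6
pts = qmc.Sobol(d=2, scramble=False).random(b**k)      # first 2^6 = 64 Sobol' points
cells = np.floor(pts * 2**3).astype(int)               # dyadic 8 x 8 grid of volume-2^{-6} cells
counts = np.zeros((2**3, 2**3), dtype=int)
np.add.at(counts, (cells[:, 0], cells[:, 1]), 1)
print(counts)                                          # with t = 0, each cell holds exactly one point
```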

Appendix 2: Proof of Lemma 2

Using the Lipschitz property of \(F_{K_i}(\cdot |\cdot )\) for all \(i\in 1:d\), conditions (7) and (8) in the proof of Lemma 1 hold with \( \tilde{\delta }_i(\delta )=\delta (0.25\tilde{K}/C_K)\), \(i\in 1:d\). Hence, we can take \(v_K(\delta )=\delta (0.25\tilde{K}/C_K)^{d}\wedge \delta \) and thus \(\underline{S}_{\delta }= 0.5\tilde{K}\delta \big (1 \wedge (0.25\tilde{K}/C_K)^{d}\big )\). The expression for \(k_{\delta }\) then follows from (9), while the expression for \(\bar{\delta }_{K}\le 0.5\) results from the condition \(k_{\delta }\ge t+d\) for all \(\delta \in (0,\bar{\delta }_{K}]\).
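Explicitly, plugging this value of \(\underline{S}_{\delta }\) into (9) gives

$$\begin{aligned} k_{\delta }=\Big \lceil t+d-\frac{d}{\log b}\log \Big (\frac{\tilde{K}\delta }{6}\big (1\wedge (0.25\tilde{K}/C_K)^{d}\big )\Big )\Big \rceil . \end{aligned}$$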

Appendix 3: Proof of Lemma 3

We first state and prove three technical lemmas.

Lemma 6

Let \(\mathcal {X}=[0,1]^d\) and \(K:\mathcal {X}\rightarrow \mathcal {P}(\mathcal {X})\) be a Markov kernel which satisfies Assumptions (A1)–(A2). Then, for any \(\delta \in (0,\bar{\delta }_K]\), with \(\bar{\delta }_K\) as in Lemma 1, and any \((\tilde{\mathbf {x}},\mathbf {x}^{\prime })\in \mathcal {X}^2\), there exists a closed hypercube \(\bar{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\subset [0,1)^d\) of side \(\bar{S}_{\delta }=2.5\bar{K}\delta \), with \(\bar{K}=\max _{i\in 1:d}\{\sup _{\mathbf {x},\mathbf {y}\in \mathcal {X}}K_i(y_i|y_{1:i-1},\mathbf {x})\}\), such that

$$\begin{aligned} F_K(\mathbf {x},B_{v_K(\delta )}(\mathbf {x}^{\prime }))\subseteq \bar{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta ),\quad \forall \mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}}) \end{aligned}$$
(10)

where \(v_K(\cdot )\) is as in Lemma 1.

Proof

The proof of Lemma 6 is similar to the proof of Lemma 1; below, we use the same notation as in that proof.

Let \(\delta \in (0,\bar{\delta }_K]\), \((\tilde{\mathbf {x}},\,\mathbf {x}^{\prime })\in \mathcal {X}^2\) and note that, for any \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\),

$$\begin{aligned} F_{K_i}(x^{\prime }_i+\delta |\mathbf {x},y_{1:i-1})-F_{K_i}(x^{\prime }_i-\delta |\mathbf {x},y_{1:i-1})\le 2\bar{K}\delta ,\quad i\in 1:d. \end{aligned}$$
(11)

Let \(0<\delta _1\le \dots \le \delta _d\le \delta \) be as in the proof of Lemma 1 and, for \(i\in 1:d\), define

$$\begin{aligned} \underline{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})=\inf _{(\mathbf {x},\,\mathbf {y})\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })}F_{K_i}(x^{\prime }_i-\delta _i|\mathbf {x},y_{1:i-1}) \end{aligned}$$

and

$$\begin{aligned} \bar{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime }, \delta _{1:i})=\sup _{(\mathbf {x},\,\mathbf {y})\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })}F_{K_i}(x^{\prime }_i+\delta _i|\mathbf {x},y_{1:i-1}). \end{aligned}$$

Let \(i\in 1:d\) and \((\underline{\mathbf {x}}^i,\underline{\mathbf {y}}^i),(\bar{\mathbf {x}}^i,\bar{\mathbf {y}}^i)\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\times B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\) be such that

$$\begin{aligned} \underline{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})=F_{K_i}(x^{\prime }_i-\delta _i|\underline{\mathbf {x}}^i,\underline{y}^i_{1:i-1}),\quad \bar{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )=F_{K_i}(x^{\prime }_i+\delta _i|\bar{\mathbf {x}}^i,\bar{y}^i_{1:i-1}). \end{aligned}$$

Therefore, using (7), (8) and (11), we have, \(\forall i\in 1:d\),

$$\begin{aligned} 0<\bar{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})-\underline{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})&=F_{K_i}(x^{\prime }_i+\delta _i|\bar{\mathbf {x}}^i,\bar{y}^i_{1:i-1})-F_{K_i}(x^{\prime }_i-\delta _i|\underline{\mathbf {x}}^i,\underline{y}^i_{1:i-1})\\&\le F_{K_i}(x^{\prime }_i\!+\!\delta _i|\tilde{\mathbf {x}},x^{\prime }_{1:i-1})\!-\!F_{K_i}(x^{\prime }_i-\delta _i|\tilde{\mathbf {x}},x^{\prime }_{1:i-1})+0.5\tilde{K}\delta _i\\&\le \delta _i(2\bar{K}+0.5\tilde{K})\\&\le 2.5\delta _i\bar{K} \end{aligned}$$

where \(\tilde{K}\le \bar{K}\) is as in the proof of Lemma 1. (Note that \(\bar{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})-\underline{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta _{1:i})\) is indeed strictly positive because \(F_{K_i}(\cdot |\mathbf {x},y_{1:i-1})\) is strictly increasing on [0, 1] for any \((\mathbf {x},\mathbf {y})\in \mathcal {X}^2\) and because \(\delta _i>0\).)

This shows that for all \(\mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}})\) and for all \(\mathbf {y}\in B_{\delta _{1:i-1}}(\mathbf {x}^{\prime })\), we have

$$\begin{aligned} \big [F_{K_i}(x^{\prime }_i-\delta _i|\mathbf {x},\mathbf {y}),F_{K_i}(x^{\prime }_i+\delta _i|\mathbf {x},\mathbf {y})\big ]\subseteq \big [\underline{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta ),\bar{u}_i(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\big ],\quad \forall i\in 1:d \end{aligned}$$

and thus there exists a closed hypercube \(\bar{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta )\) of side \(\bar{S}_{\delta }=2.5\delta \bar{K}\) such that

$$\begin{aligned} F_K(\mathbf {x},B_{\delta _{1:d}}(\mathbf {x}^{\prime }))\subseteq \bar{W}(\tilde{\mathbf {x}},\mathbf {x}^{\prime },\delta ),\quad \forall \mathbf {x}\in B_{v_K(\delta )}(\tilde{\mathbf {x}}). \end{aligned}$$

To conclude the proof of Lemma 6, note that, because \(v_K(\delta )\le \delta _i\) for all \(i\in 1:d\),

$$\begin{aligned} F_K(\mathbf {x},B_{v_K(\delta )}(\mathbf {x}^{\prime }))\subseteq F_K(\mathbf {x},B_{\delta _{1:d}}(\mathbf {x}^{\prime })). \end{aligned}$$

Lemma 7

Consider the set-up of Lemma 3 and, for \((p,a,k)\in \mathbb {N}_+^3\), let

$$\begin{aligned} E^{p}_{a,k}&=\Big \{\exists n\in \{ab^{k},\dots ,(a+1)b^{k}-1\}:\ \mathbf {x}^{n}\ne \mathbf {x}^{ab^{k}-1},\,\,\varphi (\mathbf {x}^{ab^k-1})<\varphi ^*\Big \}\\&\cap \Big \{\forall n\in \{ab^{k},\dots ,(a+1)b^{k}-1\}:\, \mathbf {x}^{n} \in (\mathcal {X}_{\varphi (\mathbf {x}^{ab^k-1})})_{2^{-p}}\Big \}. \end{aligned}$$

Then, for all \(k\in \mathbb {N}\), there exists a \(p^*_k\in \mathbb {N}\) such that \(\mathrm {Pr}\big (\bigcap _{a\in \mathbb {N}}E^{p}_{a,k}\big )=0\) for all \(p\ge p^*_k\).

Proof

Let \(\epsilon >0\), \(a\in \mathbb {N}\) and \(l\in \mathbb {R}\) be such that \(l<\varphi ^*\), and, for \(k=2^{-p}\) with \(p\in \mathbb {N}_+\), let \(E(k)=\{E(j,k)\}_{j=1}^{k^{-d}}\) be the splitting of \(\mathcal {X}\) into closed hypercubes of volume \(k^{d}\).

Let \(p^{\prime }\in \mathbb {N}_+\), \(\delta =2^{-p^{\prime }}\) and \(P^l_{\epsilon ,\delta }\subseteq E(\delta )\) be the smallest coverage of \((\mathcal {X}_{l})_{\epsilon }\) by hypercubes in \(E(\delta )\); that is, \(|P^l_{\epsilon ,\delta }|\) is the smallest value in \(1:\delta ^{-d}\) such that \((\mathcal {X}_{l})_{\epsilon }\subseteq \cup _{W\in P^l_{\epsilon ,\delta }}W\). Let \(J^l_{\epsilon ,\delta }\subseteq 1:\delta ^{-d}\) be such that \(j\in J^l_{\epsilon ,\delta }\) if and only if \(E(j,\delta )\in P^l_{\epsilon ,\delta }\). We now bound \(|J^l_{\epsilon ,\delta }|\), following the same idea as in [20].

By assumption, there exists a constant \(\bar{M}<\infty \), independent of l, such that \(M(\mathcal {X}_{l})\le \bar{M}\). Hence, for any fixed \(w>1\) there exists an \(\epsilon ^*\in (0,1)\) (independent of l) such that \(\lambda _d\big ((\mathcal {X}_{l})_{\epsilon }\big )\le w M(\mathcal {X}_{l})\epsilon \le w \bar{M}\epsilon \) for all \(\epsilon \in (0,\epsilon ^*]\). Let \(\epsilon =2^{-p}\), with \(p\in \mathbb {N}\) such that \(2^{-p}\le 0.5\epsilon ^*\), and take \(\delta _{\epsilon }=2^{-p-1}\). Then, we have the inclusions \((\mathcal {X}_{l})_{\epsilon }\subseteq \cup _{W\in P^l_{\epsilon ,\delta _{\epsilon }}}W \subseteq (\mathcal {X}_{l})_{2\epsilon }\) and therefore, since \(2\epsilon \le \epsilon ^*\),

$$\begin{aligned} |J^l_{\epsilon ,\delta _{\epsilon }}|\le \frac{\lambda _d\big ((\mathcal {X}_{l})_{2\epsilon }\big )}{\lambda _d(E(j,\delta _{\epsilon }))}\le \frac{w \bar{M} (2\epsilon )^{d}}{\delta _{\epsilon }^{d}}\le \bar{C}\delta _{\epsilon }^{-(d-1)},\quad \bar{C}:=w \bar{M} 2^d \end{aligned}$$
(12)

where the right-hand side is independent of l.

Next, for \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\), let \(\bar{\mathbf {x}}^j\) be the center of \(E(j,\delta _{\epsilon })\) and \(W^l(j,\delta _{\epsilon })=\cup _{j^{\prime }\in J^l_{\epsilon ,\delta _{\epsilon }}} \bar{W}(\bar{\mathbf {x}}^j,\bar{\mathbf {x}}^{j^{\prime }},\delta _{\epsilon })\), with \(\bar{W}(\cdot ,\cdot ,\cdot )\) as in Lemma 6. Then, by Lemma 6, a necessary condition to move at iteration \(n+1\) of Algorithm 1 from a point \(\mathbf {x}^{n}\in E(j_{n},\delta _{\epsilon })\), with \(j_{n}\in J^{l}_{\epsilon ,\delta _{\epsilon }}\), to a point \(\mathbf {x}^{n+1}\ne \mathbf {x}^{n}\) such that \(\mathbf {x}^{n+1}\in E(j_{n+1},\delta _{\epsilon })\) for a \(j_{n+1}\in J^{l}_{\epsilon ,\delta _{\epsilon }}\) is that \(\mathbf {u}_R^{n+1}\in W^{l}(j_{n},\delta _{\epsilon })\).

Let \(k^{\delta _{\epsilon }}\) be the largest integer such that (i) \(b^{k}\le \bar{S}_{\delta _{\epsilon }}^{-d}b^t\), with \(\bar{S}_{\delta _{\epsilon }}=2.5\bar{K}\delta _{\epsilon }\), \(\bar{K}<\infty \), as in Lemma 6, and (ii) \((k-t)/d\) is a positive integer (if necessary reduce \(\epsilon \) to fulfil this last condition). Let \(E^{\prime }(\delta _{\epsilon })=\{E^{\prime }(k,\delta _{\epsilon })\}_{k=1}^{b^{k^{\delta _{\epsilon }}-t}}\) be the partition of \([0,1)^d\) into hypercubes of volume \(b^{t-k^{\delta _{\epsilon }}}\). Then, for all \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\), \(W^l(j,\delta _{\epsilon })\) is covered by at most \(2^d|J^l_{\epsilon ,\delta _{\epsilon }}|\) hypercubes of \(E^{\prime }(\delta _{\epsilon })\).

Let \(\epsilon \) be small enough so that \(k^{\delta _{\epsilon }}>t+dR\). Then, using the properties of \((t,s)_R\)-sequences (see Section 3.1), it is easily checked that, for all \(n\ge 0\),

$$\begin{aligned} \mathrm {Pr}\big (\mathbf {u}_R^{n}\in E^{\prime }(k,\delta _{\epsilon })\big )\le b^{t-k^{\delta _{\epsilon }}+dR},\quad \forall k\in 1:b^{k^{\delta _{\epsilon }}-t}. \end{aligned}$$
(13)

Thus, using (12)–(13) and the definition of \(k^{\delta _{\epsilon }}\), we obtain, for all \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\) and \(n\ge 0\),

$$\begin{aligned} \mathrm {Pr}\big (\mathbf {u}_R^{n}\in W^l(j,\delta _{\epsilon })\big )\le 2^d|J^l_{\epsilon ,\delta _{\epsilon }}|b^tb^{t-k^{\delta _{\epsilon }}+dR}\le C^*\delta _\epsilon ,\quad C^*=2^d\bar{C}b^{t+1}(2.5\bar{K})^{d}b^{dR}. \end{aligned}$$

Consequently, using the definition of \(\epsilon \) and \(\delta _{\epsilon }\), and the fact that there exist at most \(2^d\) values of \(j\in J^l_{\epsilon ,\delta _{\epsilon }}\) such that, for \(n\in \mathbb {N}\), we have \(\mathbf {x}^{n}\in E(j,\delta _{\epsilon })\), we deduce that, for a \(p^*\in \mathbb {N}\) large enough (i.e. for \(\epsilon =2^{-p^*}\) small enough)

$$\begin{aligned} \mathrm {Pr}\big (E^{p}_{a,k}|\, \varphi (\mathbf {x}^{ab^k-1})=l\big )\le b^{k} 2^dC^* 2^{-p-1},\quad \forall (a,k)\in \mathbb {N}^2,\quad \forall l<\varphi ^*,\quad \forall p\ge p^* \end{aligned}$$

implying that, for \(p\ge p^*\),

$$\begin{aligned} \mathrm {Pr}\big (E^{p}_{a,k}\big )\le b^{k} 2^dC^* 2^{-p-1},\quad \forall (a,k)\in \mathbb {N}^2. \end{aligned}$$

Finally, because the uniform random numbers \(\mathbf {z}^n\) in \([0,1)^s\) that enter into the definition of \((t,s)_R\)-sequences are IID, this shows that

$$\begin{aligned} \mathrm {Pr}\big (\cap _{j=a}^{a+m}E_{j,k}^p\big )\le (b^{k}2^dC^*2^{-p-1})^m,\quad \forall (a,m,k)\in \mathbb {N}^3,\quad \forall p\ge p^*. \end{aligned}$$

To conclude the proof, for \(k\in \mathbb {N}\) let \(\rho _k\in (0,1)\) and \(p^*_k\ge p^*\) be such that

$$\begin{aligned} b^{k}2^dC^*2^{-p-1}\le \rho _k,\quad \forall p>p^*_k. \end{aligned}$$

Then, \(\mathrm {Pr}\big (\cap _{a\in \mathbb {N}} E^p_{a,k}\big )=0\) for all \(p\ge p^*_k\), as required.

Lemma 8

Consider the set-up of Lemma 3. For \(k\in \mathbb {N}\), let \(\tilde{E}(dk)=\{\tilde{E}(j,k)\}_{j=1}^{b^{dk}}\) be the partition of \([0,1)^d\) into hypercubes of volume \(b^{-dk}\). Let \(k^{R} \in (dR+t):(dR+t+d)\) be the smallest integer k such that \((k-t)/d\) is an integer and such that \((k-t)/d\ge R\) and, for \(m\in \mathbb {N}\), let \(I_m=\{mb^{k^R},\dots ,(m+1)b^{k^R}-1\}\). Then, for any \(\delta \in (0,\bar{\delta }_K]\) satisfying \(k_\delta > t+d+dR\) (with \(\bar{\delta }_K\) and \(k_\delta \) as in Lemma 1), there exists a \(p(\delta )>0\) such that

$$\begin{aligned} \mathrm {Pr}\big (\exists n\in I_m:\,\, \mathbf {u}_R^{n}\in \tilde{E}(j, k_{\delta }-t_{\delta ,d})\big )\ge p(\delta ),\quad \forall j\in 1: b^{ k_{\delta }-t_{\delta ,d}},\quad \forall m\in \mathbb {N} \end{aligned}$$

where \(t_{\delta ,d}\in t:(t+d)\) is such that \((k_{\delta }-t_{\delta ,d})/d\in \mathbb {N}\).

Proof

Let \(m\in \mathbb {N}\) and note that, by the properties of \((t,s)_R\)-sequences, the point set \(\{\mathbf {u}_{\infty }^{n}\}_{n\in I_m}\) is a \((t,{k^R},d)\)-net in base b. Thus, for all \(j\in 1:b^{k^R-t}\), this point set contains \(b^t\) points in \(\tilde{E}(j, k^{R}-t)\) and, consequently, for all \(j\in 1:b^{dR}\), it contains \(b^tb^{k^R-t-dR}=b^{k^R-dR}\ge 1\) points in \(\tilde{E}(j,dR)\). This implies that, for all \(j\in 1:b^{dR}\), the point set \(\{\mathbf {u}_R^{n}\}_{n\in I_m}\) contains \(b^{k^R-dR}\ge 1\) points in \(\tilde{E}(j,dR)\), where, for all \( n\in I_{m}\), \(\mathbf {u}_R^{n}\) is uniformly distributed in \(\tilde{E}(j_{n},dR)\) for a \(j_n\in 1:b^{dR}\).

In addition, it is easily checked that each hypercube of the set \(\tilde{E}(dR)\) contains

$$\begin{aligned} b^{k_{\delta }-t_{\delta ,d}-dR}\ge b^{k_{\delta }-t-d-dR}> 1 \end{aligned}$$

hypercubes of the set \(\tilde{E}(k_{\delta }-t_{\delta ,d})\), where \(k_{\delta }\) and \(t_{\delta ,d}\) are as in the statement of the lemma. Note that the last inequality holds because \(\delta \) is chosen so that \(k_{\delta }> t+d+dR\). Consequently,

$$\begin{aligned} \mathrm {Pr}\big (\exists n\in I_m:\,\, \mathbf {u}_R^n\in \tilde{E}(j, k_{\delta }-t_{\delta ,d})\big )\ge p(\delta ):=b^{dR+t_{\delta ,d}-k_{\delta }}>0,\quad \forall j\in 1: b^{ k_{\delta }-t_{\delta ,d}} \end{aligned}$$

and the proof is complete.

Proof of Lemma 3:

To prove the lemma we need to introduce some additional notation. Let \(\varOmega =[0,1)^{\mathbb {N}}\), \(\mathcal {B}([0,1))\) be the Borel \(\sigma \)-algebra on \([0,1)\), \(\mathcal {F}= \mathcal {B}([0,1))^{\otimes \mathbb {N}}\) and \(\mathbb {P}\) be the probability measure on \((\varOmega ,\mathcal {F})\) defined by

$$\begin{aligned} \mathbb {P}(A)=\prod _{i\in \mathbb {N}}\lambda _1(A_i),\quad (A_1,\dots ,A_i,\dots )\in \mathcal {B}([0,1))^{\otimes \mathbb {N}}. \end{aligned}$$

Next, for \(\omega \in \varOmega \), we denote by \(\big (\mathbf {U}_R^n(\omega )\big )_{n\ge 0}\) the sequence of points in \([0,1)^{d}\) defined, for all \(n\ge 0\), by (using the convention that empty sums are null),

$$\begin{aligned} \mathbf {U}_R^n(\omega )=\big (U_{R,1}^n(\omega ),\dots ,U_{R,d}^n(\omega )\big ),\quad U_{R,i}^n(\omega )=\sum _{k=1}^{R}a_{ki}^nb^{-k}+b^{-R}\omega _{n d+i},\quad i\in 1:d. \end{aligned}$$

Note that, under \(\mathbb {P}\), \(\big (\mathbf {U}_R^n\big )_{n\ge 0}\) is a \((t,d)_R\)-sequence in base b. Finally, for \(\omega \in \varOmega \), we denote by \(\big (\mathbf {x}^n_\omega \big )_{n\ge 0}\) the sequence of points in \(\mathcal {X}\) generated by Algorithm 1 when the sequence \(\big (\mathbf {U}_R^n(\omega )\big )_{n\ge 0}\) is used as input.
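As an aside, this digit-truncation construction is straightforward to simulate. The sketch below takes the underlying \((t,d)\)-sequence to be a Sobol' sequence in base \(b=2\) (an illustrative choice): the first R base-b digits of each coordinate are kept and the remaining digits are replaced by IID uniform noise, as in the display above.

```python
# Sketch of the (t,d)_R construction: keep the first R base-b digits of each coordinate
# of a (t,d)-sequence and fill the tail with IID uniform noise scaled by b^{-R}.
import numpy as np
from scipy.stats import qmc

def tdr_sequence(n, d, R, b=2, seed=0):
    rng = np.random.default_rng(seed)
    u = qmc.Sobol(d=d, scramble=False).random(n)        # underlying (t,d)-sequence (Sobol', b = 2)
    kept = np.floor(u * b**R) / b**R                    # sum_{k=1}^{R} a_{ki}^n b^{-k}
    return kept + rng.uniform(size=(n, d)) * b**(-R)    # + b^{-R} * omega_{nd+i}

print(tdr_sequence(n=8, d=2, R=3))   # R = 0 gives IID uniforms; R -> infinity recovers the Sobol' points
```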

Under the assumptions of the lemma there exists a set \(\varOmega _1\in \mathcal {F}\) such that \(\mathbb {P}(\varOmega _1)=1\) and

$$\begin{aligned} \exists \bar{\varphi }_\omega \in \mathbb {R}\text { such that }\lim _{n\rightarrow \infty }\varphi \big (\mathbf {x}^n_\omega \big )=\bar{\varphi }_\omega ,\quad \forall \omega \in \varOmega _1. \end{aligned}$$

Let \(\omega \in \varOmega _1\). Since \(\varphi \) is continuous, for any \(\epsilon >0\) there exists an \(N_{\omega , \epsilon }\in \mathbb {N}\) such that \(\mathbf {x}^n_\omega \in (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }\) for all \(n\ge N_{\omega ,\epsilon }\), where we recall that \((\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }=\{\mathbf {x}\in \mathcal {X}:\exists \mathbf {x}^{\prime }\in \mathcal {X}_{\bar{\varphi }_\omega }\text { such that } \Vert \mathbf {x}-\mathbf {x}^{\prime }\Vert _{\infty }\le \epsilon \}\). In addition, because \(\varphi \) is continuous and \(\mathcal {X}\) is compact, there exists an integer \(p_{\omega , \epsilon }\in \mathbb {N}\) such that we have both \(\lim _{\epsilon \rightarrow 0}p_{\omega ,\epsilon }=\infty \) and

$$\begin{aligned} (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }\subseteq (\mathcal {X}_{\varphi (x^{\prime })})_{2^{-p_{\omega ,\epsilon }}},\quad \forall x^{\prime }\in (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }. \end{aligned}$$
(14)

Next, let \(\mathbf {x}^*\in \mathcal {X}\) be such that \(\varphi (\mathbf {x}^*)=\varphi ^*\), \(k^R \in (dR+t):(dR+t+d)\) be as in Lemma 8 and, for \((p,a,k)\in \mathbb {N}_+^3\), let

$$\begin{aligned} \tilde{E}^{p}_{a,k}&=\Big \{\omega \in \varOmega :\,\exists n\in \{ab^{k},\dots ,(a+1)b^{k}-1\}:\ \mathbf {x}_{\omega }^{n}\ne \mathbf {x}^{ab^{k}-1}_{\omega },\,\varphi (\mathbf {x}_{\omega }^{ab^k-1})<\varphi ^*\Big \}\\&\cap \Big \{\omega \in \varOmega :\,\forall n\in \{ab^{k},\dots ,(a+1)b^{k}-1\}:\mathbf {x}^{n}_{\omega }\in \big (\mathcal {X}_{\varphi (\mathbf {x}^{ab^k-1}_{\omega })}\big )_{2^{-p}}\Big \}. \end{aligned}$$

Then, by Lemma 7, there exists a \(p^*\in \mathbb {N}\) such that \(\mathbb {P}\big (\cap _{a\in \mathbb {N}} \tilde{E}^p_{a,k^R}\big )=0\) for all \(p\ge p^*\), and thus the set \(\tilde{\varOmega }_1=\cap _{p\ge p^*}\big (\varOmega \setminus \cap _{a\in \mathbb {N}} \tilde{E}^p_{a,k^R}\big )\) satisfies \(\mathbb {P}(\tilde{\varOmega }_1)=1\). Let \(\varOmega _2=\varOmega _1\cap \tilde{\varOmega }_1\) so that \(\mathbb {P}(\varOmega _2)=1\).

For \(\omega \in \varOmega _2\) let \(\epsilon _\omega >0\) be small enough so that, for any \(\epsilon \in (0,\epsilon _\omega ]\), we can take \(p_{\omega ,\epsilon }\ge p^*\) in (14). Then, for any \(\omega \in \varOmega _2\) such that \(\bar{\varphi }_\omega <\varphi ^*\), there exists a subsequence \((m_i)_{i\ge 1}\) of \((m)_{m\ge 1}\) such that, for all \(i\ge 1\), either

$$\begin{aligned} \mathbf {x}^{n}_\omega =\mathbf {x}^{m_ib^{k^R}-1}_\omega ,\quad \forall n\in I_{m_i}:=\big \{m_i b^{k^R},\dots , (m_i+1)b^{k^R}-1\big \} \end{aligned}$$

or

$$\begin{aligned} \exists n\in I_{m_i}\text { such that }\mathbf {x}^{n}_\omega \not \in \Big (\mathcal {X}_{\varphi (\mathbf {x}^{m_ib^{k^R}-1}_\omega )}\Big )_{2^{-p_{\omega ,\epsilon }}}. \end{aligned}$$
(15)

Assume first that there exist infinitely many \(i\in \mathbb {N}\) such that (15) holds. Since \(\mathbf {x}^n_\omega \in (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }\) for all \(n\ge N_{\omega ,\epsilon }\), (14) implies that (15) can hold for at most finitely many \(i\), a contradiction with the fact that \(\omega \in \varOmega _2\subseteq \varOmega _1\). Therefore, for any \(\omega \in \varOmega _2\) such that \(\bar{\varphi }_{\omega }<\varphi ^*\) there exists a subsequence \((m_i)_{i\ge 1}\) of \((m)_{m\ge 1}\) such that, for an \(i^*\) large enough,

$$\begin{aligned} \mathbf {x}_\omega ^{n}=\mathbf {x}_\omega ^{m_ib^{k^R}-1},\quad \forall n\in I_{m_i},\quad \forall i\ge i^*. \end{aligned}$$
(16)

Let \(\tilde{\varOmega }_2=\{\omega \in \varOmega _2:\,\bar{\varphi }_{\omega }<\varphi ^*\}\subseteq \varOmega _2\). Then, to conclude the proof, it remains to show that \(\mathbb {P}(\tilde{\varOmega }_2)=0\). We prove this result by contradiction and thus, henceforth, we assume that \(\mathbb {P}(\tilde{\varOmega }_2)>0\).

To this end, let \(\mathbf {x}^*\in \mathcal {X}\) be such that \(\varphi (\mathbf {x}^*)=\varphi ^*\), \(\mathbf {x}\in \mathcal {X}\) and \(\delta \in (0,\bar{\delta }_K]\), with \(\bar{\delta }_K\) as in Lemma 1. Then, by Lemma 1, a sufficient condition to have \(F_K^{-1}(\mathbf {x},\mathbf {U}_R^{n}(\omega ))\in B_{\delta }(\mathbf {x}^*)\), \(n\ge 1\), is that \(\mathbf {U}_R^{n}(\omega )\in \underline{W}(\mathbf {x},\mathbf {x}^*,\delta )\), with \(\underline{W}(\cdot ,\cdot ,\cdot )\) as in Lemma 1. From the proof of Lemma 1 we know that the hypercube \(\underline{W}(\mathbf {x},\mathbf {x}^*,\delta )\) contains at least one hypercube of the set \(\tilde{E}(k_{\delta }-t_{\delta ,d})\), where \(t_{\delta ,d}\in t:(t+d)\) is such that \((k_{\delta }-t_{\delta ,d})/d\in \mathbb {N}\) and, for \(k\in \mathbb {N}\), \(\tilde{E}(dk)\) is as in Lemma 8. Hence, by Lemma 8, for any \(\delta \in (0,\delta ^*]\), with \(\delta ^*\) such that \(k_{\delta ^*}>t+d+dR\) (where, for \(\delta >0\), \(k_\delta \) is defined in Lemma 1), there exists a \(p(\delta )>0\) such that

$$\begin{aligned} \mathbb {P}\Big (\omega \in \varOmega :\, \exists n\in I_m,\, F_K^{-1}\big (\mathbf {x}, \mathbf {U}_R^n(\omega )\big )\in B_{\delta }(\mathbf {x}^*)\Big )\ge p(\delta ),\quad \forall (\mathbf {x},m)\in \mathcal {X}\times \mathbb {N} \end{aligned}$$

and thus, using (16) and under Assumption (A2), it is easily checked that, for any \(\delta \in (0,\delta ^*]\),

$$\begin{aligned} \mathbb {P}_2\Big (\omega \in \tilde{\varOmega }_2: F_K^{-1}\big (\mathbf {x}_{\omega }^{n-1}, \mathbf {U}_R^n(\omega )\big )\in B_{\delta }(\mathbf {x}^*)\text { for infinitely many }n\in \mathbb {N}\Big )=1 \end{aligned}$$

where \(\mathbb {P}_2\) denotes the restriction of \(\mathbb {P}\) to \(\tilde{\varOmega }_2\) (recall that we assume \(\mathbb {P}(\tilde{\varOmega }_2)>0\)).

For \(\delta >0\), let

$$\begin{aligned} \varOmega ^{\prime }_{\delta }=\Big \{\omega \in \tilde{\varOmega }_2:\,\, F_K^{-1}(\mathbf {x}_{\omega }^{n-1}, \mathbf {U}_R^n(\omega ))\in B_{\delta }(\mathbf {x}^*)\text { for infinitely many }n\in \mathbb {N}\Big \} \end{aligned}$$

and let \(\tilde{p}^*\in \mathbb {N}\) be such that \(2^{-\tilde{p}^*}\le \delta ^*\). Then, the set \(\varOmega ^{\prime }=\cap _{\tilde{p}\ge \tilde{p}^*}\varOmega ^{\prime }_{2^{-\tilde{p}}}\) satisfies \(\mathbb {P}_2(\varOmega ^{\prime })=1\).

To conclude the proof, let \(\omega \in \varOmega ^{\prime }\). Then, because \(\varphi \) is continuous and \(\bar{\varphi }_\omega <\varphi ^*\), there exists a \(\tilde{\delta }_{\bar{\varphi }_\omega }>0\) such that \(\varphi (\mathbf {x})>\bar{\varphi }_\omega \) for all \(\mathbf {x}\in B_{\tilde{\delta }_{\bar{\varphi }_\omega }}(\mathbf {x}^*)\). Let \(\delta _{\bar{\varphi }_\omega }:=2^{-\tilde{p}_{\omega ,\epsilon }}\le \tilde{\delta }_{\bar{\varphi }_\omega }\wedge \bar{\delta }_K\) for an integer \(\tilde{p}_{\omega ,\epsilon }\ge \tilde{p}^*\). Next, take \(\epsilon \) small enough so that we have both \(B_{\delta _{\bar{\varphi }_\omega }}(\mathbf {x}^*)\cap (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }=\varnothing \) and \(\varphi (\mathbf {x})\ge \varphi (\mathbf {x}^{\prime })\) for all \((\mathbf {x},\mathbf {x}^{\prime })\in B_{\delta _{\bar{\varphi }_\omega }}(\mathbf {x}^*)\times (\mathcal {X}_{\bar{\varphi }_\omega })_{\epsilon }\).

By the above computations, the set \( B_{\tilde{\delta }_{\bar{\varphi }_\omega }}(\mathbf {x}^*)\) is visited infinitely many times and thus \(\varphi (\mathbf {x}_\omega ^n)>\bar{\varphi }_\omega \) for infinitely many \(n\in \mathbb {N}\), contradicting the fact that \(\varphi (\mathbf {x}_\omega ^n)\rightarrow \bar{\varphi }_\omega \) as \(n\rightarrow \infty \). Hence, the set \(\varOmega ^{\prime }\) is empty. On the other hand, as shown above, under the assumption \(\mathbb {P}(\tilde{\varOmega }_2)>0\) we have \(\mathbb {P}_2(\varOmega ^{\prime })=1\) and, consequently, \(\varOmega ^{\prime }\ne \varnothing \). Therefore, we must have \(\mathbb {P}(\tilde{\varOmega }_2)=0\) and the proof is complete.

Appendix 4: Proof of Theorem 2

Using Lemmas 4 and 5, we know that \(\varphi (x^n)\rightarrow \bar{\varphi }\in \mathbb {R}\) and thus it remains to show that \(\bar{\varphi }=\varphi ^*\).

Assume that \(\bar{\varphi }\ne \varphi ^*\) and, for \(\epsilon =2^{-p}\), \(p\in \mathbb {N}_+\), let \(N_\epsilon \in \mathbb {N}\), \(p_\epsilon \) and \(\delta _{\bar{\varphi }}>0\) be as in the proof of Lemma 3 (with the dependence of \(N_\epsilon \), \(p_\epsilon \) and \(\delta _{\bar{\varphi }}\) on \(\omega \in \varOmega \) suppressed in the notation for obvious reasons).

Let \(x^*\in \mathcal {X}\) be a global maximizer of \(\varphi \) and \(n=a_nb^{k_{\delta _{\bar{\varphi }}}}-1\) with \(a_n\in \mathbb {N}\) such that \(n>N_{\epsilon }\). For \(k=2^{-p}\) with \(p\in \mathbb {N}_+\), let \(E(k)=\{E(j,k)\}_{j=1}^{k^{-1}}\) be the splitting of \([0,1]\) into closed intervals of length \(k\). Then, by Lemma 6, a necessary condition to have a move at iteration \(n^{\prime }+1\ge 1\) of Algorithm 1 from \(x^{n^{\prime }}\in (\mathcal {X}_{\bar{\varphi }})_{\epsilon }\) to \(x^{n^{\prime }+1}\ne x^{n^{\prime }}\), \(x^{n^{\prime }+1}\in (\mathcal {X}_{\bar{\varphi }})_{\epsilon }\), is that

$$\begin{aligned} u_{\infty }^{n^{\prime }}\in \bar{W}(\epsilon ):=\bigcup _{j,j^{\prime }\in J^{\bar{\varphi }}_{\epsilon , \epsilon /2}} \bar{W}(\bar{x}^j,\bar{x}^{j^{\prime }},\epsilon /2) \end{aligned}$$

where, for \(j\in 1:(\epsilon /2)^{-d}\), \(\bar{x}^j\) denotes the center of \(E(j,\epsilon /2)\), \(J^{\bar{\varphi }}_{\epsilon , \epsilon /2}\) is as in the proof of Lemma 7 and \(\bar{W}(\cdot ,\cdot ,\cdot )\) is as in Lemma 6. Note that, using (12) with \(d=1\), \(|J^{\bar{\varphi }}_{\epsilon , \epsilon /2}|\le C^*\) for a constant \(C^*<\infty \) (independent of \(\epsilon \)).

Let \(k^{\delta _{\epsilon }}\) be the largest integer \(k\ge t\) such that \(b^{t-k}\ge \bar{S}_{\epsilon /2}^{d}\), with \(\bar{S}_{\epsilon /2}\) as in Lemma 6, and let \(\epsilon \) be small enough so that \(b^{k^{\delta _{\epsilon }}}>2^dC^*b^t\). The point set \(\{u_{\infty }^{n^{\prime }}\}_{n^{\prime }=a_nb^{k^{\delta _{\epsilon }}}}^{ (a_n+1)b^{k^{\delta _{\epsilon }}}-1}\) is a \((t, k^{\delta _{\epsilon }},d)\)-net in base b and thus the set \(\bar{W}(\epsilon )\) contains at most \(2^dC^* b^t\) points of this point set. Hence, if for \(n>N_{\epsilon }\) only moves inside the set \((\mathcal {X}_{\bar{\varphi }})_{\epsilon }\) occur, then, for a \(\tilde{n}\in a_nb^{k^{\delta _{\epsilon }}}:\big ((a_n+1)b^{k^{\delta _{\epsilon }}}-\eta _{\epsilon }-1\big )\), the point set \(\{x^{n^{\prime }}\}_{n^{\prime }=\tilde{n}}^{\tilde{n}+\eta _{\epsilon }}\) is such that \(x^{n^{\prime }}=x^{\tilde{n}}\) for all \(n^{\prime }\in \tilde{n}:(\tilde{n}+\eta _{\epsilon })\), where \(\eta _{\epsilon }\ge \big \lfloor \frac{b^{k^{\delta _{\epsilon }}}}{2^d(C^{*})^{2}b^t}\big \rfloor \); note that \(\eta _{\epsilon }\rightarrow \infty \) as \(\epsilon \rightarrow 0\).

Let \(k^{\epsilon }_0\) be the largest integer satisfying \(\eta _{\epsilon }\ge 2b^{k^{\epsilon }_0}\), so that \(\{u_{\infty }^n\}_{n=\tilde{n}}^{\tilde{n}+\eta _{\epsilon }}\) contains at least one \((t,k^{\epsilon }_0,d)\)-net in base b. Note that \(k^{\epsilon }_0\rightarrow \infty \) as \(\epsilon \rightarrow 0\), and let \(\epsilon \) be small enough so that \(k^{\epsilon }_0\ge k_{\delta _{\bar{\varphi }}}\), with \(k_{\delta }\) as in Lemma 1. Then, by Lemma 1, there exists at least one \(n^*\in (\tilde{n}+1):(\tilde{n}+\eta _{\epsilon })\) such that \(\tilde{y}^{n^*}:=F_K^{-1}(x^{\tilde{n}},u_{\infty }^{n^*})\in B_{\delta _{\bar{\varphi }}}(x^*)\). Since, by the definition of \(\delta _{\bar{\varphi }}\), for all \((x,x^{\prime })\in B_{\delta _{\bar{\varphi }}}(x^*)\times (\mathcal {X}_{\bar{\varphi }})_{\epsilon }\), and for \(\epsilon \) small enough, we have \(\varphi (x)>\varphi (x^{\prime })\), it follows that \(\varphi (\tilde{y}^{n^*})>\varphi (x^{\tilde{n}})\). Hence, there exists at least one \(n\in \tilde{n}:(\tilde{n}+\eta _{\epsilon })\) such that \(x^n\ne x^{\tilde{n}}\), which contradicts the fact that \(x^n=x^{\tilde{n}}\) for all \(n\in \tilde{n}:(\tilde{n}+\eta _{\epsilon })\). This shows that \(\bar{\varphi }\) is indeed the global maximum of \(\varphi \).

Appendix 5: Additional figures for the example of Sect. 6.1

Fig. 3

Minimization of \(\tilde{\varphi }_1\) defined by (2) for 1000 starting values sampled independently and uniformly on \(\mathcal {X}_1\). Results are presented for the Cauchy kernel \((K_n^{(2)})_{n\ge 1}\) with \((T_n^{(1)})_{n\ge 1}\) (top plots) and with \((T_n^{(2)})_{n\ge 1}\) (bottom plots). For \(m=1,2\), simulations are done for the smallest (left plots) and the highest (right plots) value of \(T_n^{(m)}\) given in (4). The plots show the minimum number of iterations needed for SA (white boxes) and QMC-SA to find an \(\mathbf {x}\in \mathcal {X}_1\) such that \(\tilde{\varphi }_1(\mathbf {x})<10^{-5}\). For each starting value, the Monte Carlo algorithm is run only once and the QMC-SA algorithm is based on the Sobol’ sequence

Fig. 4

Minimization of \(\tilde{\varphi }_1\) defined by (2) for 1000 starting values sampled independently and uniformly on \(\mathcal {X}_1\). Results are presented for the Gaussian kernel \((K_n^{(3)})_{n\ge 1}\) with \((T_n^{(1)})_{n\ge 1}\) (top plots) and with \((T_n^{(3)})_{n\ge 1}\) (bottom plots). For \(m=1,3\), simulations are done for the smallest (left plots) and the highest (right plots) value of \(T_n^{(m)}\) given in (4). The plots show the minimum number of iterations needed for SA (white boxes) and QMC-SA to find an \(\mathbf {x}\in \mathcal {X}_1\) such that \(\tilde{\varphi }_1(\mathbf {x})<10^{-5}\). For each starting value, the Monte Carlo algorithm is run only once and the QMC-SA algorithm is based on the Sobol’ sequence

About this article

Cite this article

Gerber, M., Bornn, L. Improving simulated annealing through derandomization. J Glob Optim 68, 189–217 (2017). https://doi.org/10.1007/s10898-016-0461-1
