Analysis of Surrogate-Assisted Information-Geometric Optimization Algorithms

Abstract

Surrogate functions are often employed to reduce the number of objective function evaluations in continuous optimization. However, their effects have seldom been investigated theoretically. This paper analyzes the effect of a surrogate function in the information-geometric optimization (IGO) framework, which includes as an algorithm instance a variant of the covariance matrix adaptation evolution strategy—a widely used solver for black-box continuous optimization. We derive a sufficient condition on the surrogate function for the parameter update in the IGO algorithms to point to a descent direction of the objective function expected over the search distribution. The condition is expressed in terms of three measures of correlation between the objective function and the surrogate function. Our result constitutes a partial justification for the use of a surrogate function in IGO algorithms.

References

  1. Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)

  2. Hansen, N., Müller, S.D., Koumoutsakos, P.: Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evol. Comput. 11(1), 1–18 (2003)

  3. Hansen, N., Kern, S.: Evaluating the CMA evolution strategy on multimodal test functions. In: Parallel Problem Solving from Nature—PPSN VIII, pp. 282–291 (2004)

  4. Akimoto, Y., Hansen, N.: Diagonal acceleration for covariance matrix adaptation evolution strategies. Evol. Comput. 28(3), 405–435 (2020)

  5. Jastrebski, G.A., Arnold, D.V.: Improving evolution strategies through active covariance matrix adaptation. In: 2006 IEEE International Conference on Evolutionary Computation, pp. 2814–2821 (2006)

  6. Hansen, N., Auger, A., Ros, R., Finck, S., Pošík, P.: Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In: Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 1689–1696 (2010)

  7. Rios, L.M., Sahinidis, N.V.: Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Global Optim. 56(3), 1247–1293 (2013)

  8. Urieli, D., MacAlpine, P., Kalyanakrishnan, S., Bentor, Y., Stone, P.: On optimizing interdependent skills: a case study in simulated 3D humanoid robot soccer. In: Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, vol. 11, pp. 769–776 (2011)

  9. Maki, A., Sakamoto, N., Akimoto, Y., Nishikawa, H., Umeda, N.: Application of optimal control theory based on the evolution strategy (CMA-ES) to automatic berthing. J. Mar. Sci. Technol. 25(1), 221–233 (2020)

  10. Schafroth, D., Bermes, C., Bouabdallah, S., Siegwart, R.: Modeling, system identification and robust control of a coaxial micro helicopter. Control. Eng. Pract. 18(7), 700–711 (2010)

  11. Fujii, G., Akimoto, Y., Takahashi, M.: Exploring optimal topology of thermal cloaks by CMA-ES. Appl. Phys. Lett. 112(6), 061108 (2018)

  12. Marsden, A.L., Wang, M., Dennis, J.E., Moin, P.: Optimal aeroacoustic shape design using the surrogate management framework. Optim. Eng. 5(2), 235–262 (2004)

  13. Hitz, G., Galceran, E., Garneau, M.-È., Pomerleau, F., Siegwart, R.: Adaptive continuous-space informative path planning for online environmental monitoring. J. Field Robot. 34(8), 1427–1449 (2017)

  14. Sadeghi, M., Kalantar, M.: Multi types DG expansion dynamic planning in distribution system under stochastic conditions using covariance matrix adaptation evolutionary strategy and Monte-Carlo simulation. Energy Convers. Manag. 87, 455–471 (2014)

  15. Bouzarkouna, Z., Ding, D.Y., Auger, A.: Well placement optimization with the covariance matrix adaptation evolution strategy and meta-models. Comput. Geosci. 16(1), 75–92 (2012)

  16. Ha, D., Schmidhuber, J.: Recurrent world models facilitate policy evolution. In: Advances in Neural Information Processing Systems, vol. 31, pp. 2455–2467 (2018)

  17. Chrabaszcz, P., Loshchilov, I., Hutter, F.: Back to basics: Benchmarking canonical evolution strategies for playing Atari. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 1419–1426 (2018)

  18. Volz, V., Schrum, J., Liu, J., Lucas, S.M., Smith, A., Risi, S.: Evolving Mario levels in the latent space of a deep convolutional generative adversarial network. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 221–228 (2018)

  19. Tanabe, T., Fukuchi, K., Sakuma, J., Akimoto, Y.: Level generation for angry birds with sequential VAE and latent variable evolution. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1052–1060 (2021)

  20. Nomura, M., Watanabe, S., Akimoto, Y., Ozaki, Y., Onishi, M.: Warm starting CMA-ES for hyperparameter optimization. Proc. AAAI Conf. Artif. Intell. 35(10), 9188–9196 (2021)

  21. Loshchilov, I., Schoenauer, M., Sebag, M.: Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 321–328 (2012)

  22. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft. Comput. 9(1), 3–12 (2005)

  23. Pitra, Z., Hanuš, M., Koza, J., Tumpach, J., Holeňa, M.: Interaction between model and its evolution control in surrogate-assisted CMA evolution strategy. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 528–536 (2021)

  24. Hansen, N.: A global surrogate assisted CMA-ES. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 664–672 (2019)

  25. Akimoto, Y., Shimizu, T., Yamaguchi, T.: Adaptive objective selection for multi-fidelity optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 880–888 (2019)

  26. Miyagi, A., Fukuchi, K., Sakuma, J., Akimoto, Y.: Adaptive scenario subset selection for min–max black-box continuous optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 697–705 (2021)

  27. Miyagi, A., Fukuchi, K., Sakuma, J., Akimoto, Y.: Black-box min–max continuous optimization using CMA-ES with worst-case ranking approximation. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 823–831 (2022)

  28. Akimoto, Y., Sakamoto, N., Ohtani, M.: Multi-fidelity optimization approach under prior and posterior constraints and its application to compliance minimization. In: Parallel Problem Solving from Nature—PPSN XVI, pp. 81–94 (2020)

  29. Miyagi, A., Akimoto, Y., Yamamoto, H.: Well placement optimization under geological statistical uncertainty. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1284–1292 (2019)

  30. Jin, Y.: Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol. Comput. 1(2), 61–70 (2011)

  31. Kayhani, A., Arnold, D.V.: Design of a surrogate model assisted (1+1)-ES. In: Parallel Problem Solving from Nature—PPSN XV, pp. 16–28 (2018)

  32. Yang, J., Arnold, D.V.: A surrogate model assisted (1+1)-ES with increased exploitation of the model. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 727–735 (2019)

  33. Ollivier, Y., Arnold, L., Auger, A., Hansen, N.: Information-geometric optimization algorithms: a unifying picture via invariance principles. J. Mach. Learn. Res. 18(1), 564–628 (2017)

  34. Hajek, B.: Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3), 502–525 (1982)

  35. Devroye, L.: Non-uniform Random Variate Generation, 1st edn. Springer, New York (1986)

  36. Akimoto, Y., Auger, A., Hansen, N.: An ODE method to prove the geometric convergence of adaptive stochastic algorithms. Stochastic Process. Appl. 145, 269–307 (2022)

  37. Arratia, R., Gordon, L.: Tutorial on large deviations for the binomial distribution. Bull. Math. Biol. 51(1), 125–131 (1989)

  38. Stanica, P.: Good lower and upper bounds on binomial coefficients. J. Inequal. Pure Appl. Math. 2(3), 30 (2001)

  39. Akimoto, Y., Nagata, Y., Ono, I., Kobayashi, S.: Bidirectional relation between CMA evolution strategies and natural evolution strategies. In: Parallel Problem Solving from Nature, PPSN XI, pp. 154–163 (2010)

  40. Harville, D.A.: Matrix Algebra From a Statistician’s Perspective, 1st edn. Springer, New York (1998)

  41. Ros, R., Hansen, N.: A simple modification in CMA-ES achieving linear time and space complexity. In: Proceedings of the 10th International Conference on Parallel Problem Solving from Nature—PPSN X, pp. 296–305 (2008)

  42. Akimoto, Y., Nagata, Y., Ono, I., Kobayashi, S.: Theoretical foundation for CMA-ES from information geometry perspective. Algorithmica 64(4), 698–716 (2012)

  43. Akimoto, Y., Auger, A., Hansen, N.: Comparison-based natural gradient optimization in high dimension. In: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 373–380 (2014)

  44. Akimoto, Y., Ollivier, Y.: Objective improvement in information-geometric optimization. In: Proceedings of the Twelfth Workshop on Foundations of Genetic Algorithms XII. FOGA XII ’13, pp. 1–10 (2013)

  45. Akimoto, Y.: Analysis of a natural gradient algorithm on monotonic convex-quadratic-composite functions. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 1293–1300 (2012)

  46. Uchida, K., Shirakawa, S., Akimoto, Y.: Finite-sample analysis of information geometric optimization with isotropic Gaussian distribution on convex quadratic functions. IEEE Trans. Evol. Comput. 24(6), 1035–1049 (2020)

  47. Lehmann, E.L., Casella, G.: Theory of Point Estimation, 2nd edn. Springer, New York (2006)

  48. Magnus, J.R.: The moments of products of quadratic forms in normal variables. Stat. Neerl. 32(4), 201–210 (1978)

Acknowledgements

This research is partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 19H04179 and the New Energy and Industrial Technology Development Organization (NEDO) Project Number JPNP18002.

Author information

Corresponding author

Correspondence to Youhei Akimoto.

Ethics declarations

Conflicts of interest

We have no conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Proofs

A.1 Proof of Proposition 18

Proof

Let \(\Delta ^{(t)}_g = (\Delta _m, {{\,\textrm{vec}\,}}(\Delta _\Sigma ))\). We have

$$\begin{aligned} \psi (\alpha \Delta ^{(t)}_g{;} \theta ^{(t)}) = J(\theta ^{(t)}+ \alpha \Delta ^{(t)}_g) - J(\theta ^{(t)}) - \alpha \nabla J(\theta ^{(t)})^\textrm{T}\Delta ^{(t)}_g = \frac{\alpha ^2}{2} \Delta _m^\textrm{T}A \Delta _{m}. \end{aligned}$$
(77)
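
As a quick numeric sanity check of (77), the following sketch (Python/NumPy, with arbitrary illustrative values) evaluates the second-order remainder of \(J\) directly, assuming, consistently with (84) and (87b), that \(f(x) = \frac{1}{2}(x - x^*)^\textrm{T}A(x - x^*)\) and hence \(J(\theta ) = \frac{1}{2}(m - x^*)^\textrm{T}A(m - x^*) + \frac{1}{2}{{\,\textrm{Tr}\,}}(A\Sigma )\).

```python
# Numeric check of (77): for J quadratic in m and linear in Sigma,
# J(theta + alpha*Delta) - J(theta) - alpha * <grad J(theta), Delta>
# equals (alpha^2 / 2) * Delta_m^T A Delta_m.
# All matrices and vectors below are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d)); A = M @ M.T + np.eye(d)      # positive definite A
L = rng.standard_normal((d, d)); Sigma = L @ L.T + np.eye(d)  # positive definite Sigma
m, x_star = rng.standard_normal(d), rng.standard_normal(d)
dm = rng.standard_normal(d)                                   # Delta_m
S = rng.standard_normal((d, d)); dS = (S + S.T) / 2           # Delta_Sigma (symmetric)
alpha = 0.3

def J(m, Sigma):
    return 0.5 * (m - x_star) @ A @ (m - x_star) + 0.5 * np.trace(A @ Sigma)

# gradient of J: A (m - x*) w.r.t. m, and A / 2 w.r.t. Sigma (entrywise)
grad_dot_delta = (A @ (m - x_star)) @ dm + 0.5 * np.trace(A @ dS)
psi = J(m + alpha * dm, Sigma + alpha * dS) - J(m, Sigma) - alpha * grad_dot_delta
print(psi, 0.5 * alpha**2 * dm @ A @ dm)   # the two printed numbers agree
```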

Noting that \(\mathbb {E}[g(X_\theta )] = {\tilde{\nu }}^\textrm{T}\mathbb {E}[s(\theta {;} X_\theta )] = 0\), we have

$$\begin{aligned} {{\,\textrm{Var}\,}}[g(X_\theta )]\,\mathbb {E}[ \Delta _m^\textrm{T}A \Delta _{m}] &= \frac{1}{\lambda } \mathbb {E}\left[ g(X_\theta )^2 (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] \\ &\quad + \frac{\lambda - 1}{\lambda } \mathbb {E}\left[ g(X_\theta ) (X_\theta - m) \right] ^\textrm{T}A\, \mathbb {E}\left[ g(X_\theta ) (X_\theta - m) \right] , \end{aligned}$$
(78)

where

$$\begin{aligned} \mathbb {E}\left[ g(X_\theta ) (X_\theta - m) \right] = \mathbb {E}[(X_\theta -m)(X_\theta -m)^\textrm{T}\Sigma ^{-1} {\tilde{\nu }}_m] = {\tilde{\nu }}_m. \end{aligned}$$
(79)

Let \(H_1 = \Sigma ^{-1/2} {\tilde{\nu }}_m {\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1/2}\), \(H_2 = \Sigma ^{-1/2} {\tilde{\nu }}_\Sigma \Sigma ^{-1/2}\), and \(H_3 = \Sigma ^{1/2} A \Sigma ^{1/2}\). Let \(Z = \Sigma ^{-1/2}(X_\theta - m)\). Then Z follows a standard normal distribution, and we have

$$\begin{aligned}&\mathbb {E}\left[ g(X_\theta )^2 (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] \end{aligned}$$
(80a)
$$\begin{aligned}&\quad =\ \mathbb {E}[(Z^\textrm{T}H_1 Z) (Z^\textrm{T}H_3 Z)] \end{aligned}$$
(80b)
$$\begin{aligned}&\qquad + \frac{1}{4}\mathbb {E}[(Z^\textrm{T}H_2 Z)^2 (Z^\textrm{T}H_3 Z)] \end{aligned}$$
(80c)
$$\begin{aligned}&\qquad - \frac{1}{2}{{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1})\mathbb {E}[(Z^\textrm{T}H_2 Z) (Z^\textrm{T}H_3 Z)] \end{aligned}$$
(80d)
$$\begin{aligned}&\qquad + \frac{1}{4}{{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1})^2\mathbb {E}[(Z^\textrm{T}H_3 Z)]. \end{aligned}$$
(80e)

By using the formula for the expectation of the product of quadratic forms of Gaussian random vectors [48, Theorem 5.1], we obtain

$$\begin{aligned} (80b)&= {{\,\textrm{Tr}\,}}(H_1){{\,\textrm{Tr}\,}}(H_3) + 2{{\,\textrm{Tr}\,}}(H_1 H_3), \end{aligned}$$
(81a)
$$\begin{aligned} (80c)&= \frac{1}{4} ({{\,\textrm{Tr}\,}}(H_2)^2 {{\,\textrm{Tr}\,}}(H_3) + 2{{\,\textrm{Tr}\,}}(H_2^2){{\,\textrm{Tr}\,}}(H_3) \nonumber \\&\quad + 4{{\,\textrm{Tr}\,}}(H_2 H_3) {{\,\textrm{Tr}\,}}(H_2) + 8 {{\,\textrm{Tr}\,}}(H_2^2 H_3) ), \end{aligned}$$
(81b)
$$\begin{aligned} (80d)&= - \frac{1}{2}{{\,\textrm{Tr}\,}}(H_2)({{\,\textrm{Tr}\,}}(H_2){{\,\textrm{Tr}\,}}(H_3) + 2{{\,\textrm{Tr}\,}}(H_2 H_3)), \end{aligned}$$
(81c)
$$\begin{aligned} (80e)&= \frac{1}{4}{{\,\textrm{Tr}\,}}(H_2)^2 {{\,\textrm{Tr}\,}}(H_3). \end{aligned}$$
(81d)
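
The quadratic-form moment identities used in (81a) and (81b) can be spot-checked by simulation; the same identity covers (81a) with \(H_1\) in place of \(H_2\). The sketch below uses arbitrary symmetric matrices (not values from the paper) and compares Monte Carlo estimates against the stated trace expressions.

```python
# Monte Carlo check of the Gaussian quadratic-form moment identities behind
# (81a)-(81b), for Z ~ N(0, I) and arbitrary symmetric H2, H3:
#   E[(Z'H2 Z)(Z'H3 Z)]    = Tr(H2)Tr(H3) + 2 Tr(H2 H3)
#   E[(Z'H2 Z)^2 (Z'H3 Z)] = Tr(H2)^2 Tr(H3) + 2 Tr(H2^2) Tr(H3)
#                            + 4 Tr(H2 H3) Tr(H2) + 8 Tr(H2^2 H3)
import numpy as np

rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d)); H2 = (B + B.T) / 2
C = rng.standard_normal((d, d)); H3 = (C + C.T) / 2
tr = np.trace

Z = rng.standard_normal((2_000_000, d))
q2 = np.einsum('ni,ij,nj->n', Z, H2, Z)   # Z^T H2 Z, one value per sample
q3 = np.einsum('ni,ij,nj->n', Z, H3, Z)   # Z^T H3 Z

print(np.mean(q2 * q3), tr(H2) * tr(H3) + 2 * tr(H2 @ H3))
print(np.mean(q2**2 * q3),
      tr(H2)**2 * tr(H3) + 2 * tr(H2 @ H2) * tr(H3)
      + 4 * tr(H2 @ H3) * tr(H2) + 8 * tr(H2 @ H2 @ H3))
```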

Rearranging these terms, we obtain

$$\begin{aligned}&\mathbb {E}\left[ g(X_\theta )^2 (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] \end{aligned}$$
(82a)
$$\begin{aligned}&= {{\,\textrm{Tr}\,}}(H_3) ({{\,\textrm{Tr}\,}}(H_1) + \frac{1}{2}{{\,\textrm{Tr}\,}}(H_2^2)) + 2 {{\,\textrm{Tr}\,}}(H_1 H_3) + 2 {{\,\textrm{Tr}\,}}(H_2^2 H_3) \end{aligned}$$
(82b)
$$\begin{aligned}&= {{\,\textrm{Var}\,}}[g(X_\theta )] {{\,\textrm{Tr}\,}}(A \Sigma ) + 2{\tilde{\nu }}_m^\textrm{T}A {\tilde{\nu }}_m + 2 {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma A), \end{aligned}$$
(82c)

for which we used the fact that \({{\,\textrm{Var}\,}}[g(X_\theta )] = {{\,\textrm{Tr}\,}}(H_1) + \frac{1}{2}{{\,\textrm{Tr}\,}}(H_2^2)\). Finally, from (78), we obtain

$$\begin{aligned} \mathbb {E}[ \Delta _m^\textrm{T}A \Delta _{m}] = \frac{1}{\lambda } {{\,\textrm{Tr}\,}}(A \Sigma ) + \frac{1}{\lambda }\frac{(\lambda + 1){\tilde{\nu }}_m^\textrm{T}A {\tilde{\nu }}_m + 2 {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma A)}{{\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1} {\tilde{\nu }}_m + \frac{1}{2} {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma \Sigma ^{-1})}. \end{aligned}$$
(83)

In light of Proposition 5 and the fact that \(\nabla J(\theta )^\textrm{T}{\mathcal {I}}(\theta )^{-1} \nabla J(\theta ) = {{\,\textrm{Var}\,}}[f(X_\theta )] = \nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ) = (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\), we have

$$\begin{aligned} \nabla J(\theta ^{(t)})^\textrm{T}\mathbb {E}_t[\Delta ^{(t)}_f] = - \left( (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\right) ^{1/2}. \end{aligned}$$
(84)

From (83) and (84), we obtain (71a).
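
Identity (82c) can likewise be verified by simulation. The following sketch assumes, consistently with the expansion in (80), that \(g(x) = {\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1}(x - m) + \frac{1}{2}\big ((x - m)^\textrm{T}\Sigma ^{-1}{\tilde{\nu }}_\Sigma \Sigma ^{-1}(x - m) - {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1})\big )\); all numerical values are arbitrary illustrative choices.

```python
# Monte Carlo check of (82c):
#   E[g(X)^2 (X-m)^T A (X-m)]
#     = Var[g] Tr(A Sigma) + 2 nu_m^T A nu_m + 2 Tr(nu_S Sigma^{-1} nu_S A),
# with g as stated in the lead-in (an assumption made explicit there).
import numpy as np

rng = np.random.default_rng(1)
d = 3
L = rng.standard_normal((d, d)); Sigma = L @ L.T + np.eye(d)
M = rng.standard_normal((d, d)); A = M @ M.T                  # positive semidefinite A
m = rng.standard_normal(d)
nu_m = rng.standard_normal(d)                                 # \tilde{nu}_m
S = rng.standard_normal((d, d)); nu_S = (S + S.T) / 2         # \tilde{nu}_Sigma (symmetric)
Si = np.linalg.inv(Sigma)

n = 2_000_000
X = rng.multivariate_normal(m, Sigma, size=n)
Y = X - m
W = Y @ Si
g = W @ nu_m + 0.5 * (np.einsum('ni,ij,nj->n', W, nu_S, W) - np.trace(nu_S @ Si))
qA = np.einsum('ni,ij,nj->n', Y, A, Y)

var_g = nu_m @ Si @ nu_m + 0.5 * np.trace(nu_S @ Si @ nu_S @ Si)
lhs = np.mean(g**2 * qA)
rhs = var_g * np.trace(A @ Sigma) + 2 * nu_m @ A @ nu_m + 2 * np.trace(nu_S @ Si @ nu_S @ A)
print(lhs, rhs)   # agree up to Monte Carlo error
```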

Inequality (71b) is derived as follows. Using the inequalities \(\frac{{\tilde{\nu }}_m^\textrm{T}A {\tilde{\nu }}_m}{{\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1} {\tilde{\nu }}_m} \le \sigma _{\max }(A \Sigma )\) and \(\frac{ {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma A) }{ {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma \Sigma ^{-1}) } \le \sigma _{\max }(A \Sigma )\), where \(\sigma _{\max }(A \Sigma )\) denotes the greatest singular value of \(A \Sigma \), we obtain

$$\begin{aligned} \frac{{{\bar{\alpha }}}(\theta {;} \alpha )}{\alpha / 2} \le \frac{\frac{1}{\lambda } {{\,\textrm{Tr}\,}}(A \Sigma ) + \frac{\lambda + 5}{\lambda } \sigma _{\max }(A \Sigma ) }{ ( (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma ))^{1/2} }. \end{aligned}$$
(85)

Moreover, by using \(\frac{{{\,\textrm{Tr}\,}}(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le d^{1/2}\) and \(\frac{\sigma _{\max }(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le 1\), we have

$$\begin{aligned} \frac{{{\bar{\alpha }}}(\theta {;} \alpha )}{\alpha / 2} \le 2^{1/2} \left( \frac{2 d^{1/2}}{\lambda } + \frac{\lambda + 5}{\lambda } \right) . \end{aligned}$$
(86)

This completes the proof. \(\square \)

A.2 Proof of Lemma 20

Proof

Note that \(\frac{\mathbb {E}[ (\nu (\theta )^\textrm{T}s(\theta {;} X_\theta ))^4 ]}{ (\nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ))^2 } = \frac{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^4]}{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^2]^2}\). We bound this ratio as follows. Let the eigenvalue decomposition of \(\sqrt{\Sigma } A \sqrt{\Sigma }\) be denoted by \(E D E^\textrm{T}\), where D is the diagonal matrix composed of the eigenvalues of \(\sqrt{\Sigma } A \sqrt{\Sigma }\), and E is the orthogonal matrix composed of the unit eigenvectors of \(\sqrt{\Sigma } A \sqrt{\Sigma }\). Let \(Z = E^\textrm{T}\sqrt{\Sigma }^{-1}(X_\theta - m)\) and \(v = E^\textrm{T}\sqrt{\Sigma } A (m - x^*)\). Then by a simple derivation, we have

$$\begin{aligned}&f(X_\theta ) - \mathbb {E}[f(X_\theta )] \end{aligned}$$
(87a)
$$\begin{aligned}&\quad = \frac{1}{2} (X_\theta - m)^\textrm{T}A (X_\theta - m) - \frac{1}{2} {{\,\textrm{Tr}\,}}(A \Sigma ) + (m - x^*)^\textrm{T}A (X_\theta - m) \end{aligned}$$
(87b)
$$\begin{aligned}&\quad = \frac{1}{2} Z^\textrm{T}E^\textrm{T}\sqrt{\Sigma } A \sqrt{\Sigma } E Z - \frac{1}{2} {{\,\textrm{Tr}\,}}( \sqrt{\Sigma } A \sqrt{\Sigma }) + v^\textrm{T}E^\textrm{T}\sqrt{\Sigma }^{-1} A^{-1} A \sqrt{\Sigma } E Z \end{aligned}$$
(87c)
$$\begin{aligned}&\quad = \frac{1}{2} Z^\textrm{T}D Z - \frac{1}{2} {{\,\textrm{Tr}\,}}( D ) + v^\textrm{T}Z \end{aligned}$$
(87d)
$$\begin{aligned}&\quad = \frac{1}{2} \sum _{i=1}^{d} \left( d_i (Z_i^2 - 1) + 2 v_i Z_i \right) . \end{aligned}$$
(87e)

Let

$$\begin{aligned} \mu _{i,2} = \mathbb {E}\left[ \left( d_i (Z_i^2 - 1) + 2 v_i Z_i \right) ^2 \right]&= 2 d_i^2 + 4 v_i^2, \end{aligned}$$
(88a)
$$\begin{aligned} \mu _{i,4} = \mathbb {E}\left[ \left( d_i (Z_i^2 - 1) + 2 v_i Z_i \right) ^4 \right]&= 60 d_i^4 + 240 d_i^2 v_i^2 + 48 v_i^4. \end{aligned}$$
(88b)

Note that the \(Z_i\) values are independent and follow a standard normal distribution. A simple derivation leads to

$$\begin{aligned} \mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^2]&= \frac{1}{4} \sum _{i=1}^{d} \mu _{i,2}, \end{aligned}$$
(89a)
$$\begin{aligned} \mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^4]&= \frac{1}{16}\sum _{i=1}^{d}\left( \mu _{i,4} - 3 \mu _{i,2}^2 \right) + \frac{3}{16}\left( \sum _{i=1}^d \mu _{i,2} \right) ^2 . \end{aligned}$$
(89b)

Using \(\mu _{i,4} - 3 \mu _{i,2}^2 \le 12 \mu _{i,2}^2\), we obtain

$$\begin{aligned} \frac{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^4]}{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^2]^2}&= \frac{\sum _{i=1}^{d}\left( \mu _{i,4} - 3 \mu _{i,2}^2 \right) }{ \left( \sum _{i=1}^d \mu _{i,2} \right) ^2 } + 3 \end{aligned}$$
(90a)
$$\begin{aligned}&\le \frac{12 \sum _{i=1}^{d} \mu _{i,2}^2}{ \left( \sum _{i=1}^d \mu _{i,2} \right) ^2 } + 3 \le 15. \end{aligned}$$
(90b)

Therefore, \(N_s \le 15\) for all \(\theta \in \Theta \). This completes the proof. \(\square \)
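
A small simulation also confirms the moment formulas (88a)–(88b) and the resulting bound of 15 on the fourth-moment ratio; the coefficients \(d_i\) and \(v_i\) below are arbitrary illustrative values.

```python
# Monte Carlo check of (88a)-(88b) and of the bound <= 15 from (90),
# using arbitrary illustrative coefficients d_i, v_i.
import numpy as np

rng = np.random.default_rng(2)
Z = rng.standard_normal(2_000_000)
d_i, v_i = 1.7, -0.6
W = d_i * (Z**2 - 1) + 2 * v_i * Z

print(np.mean(W**2), 2 * d_i**2 + 4 * v_i**2)                              # (88a)
print(np.mean(W**4), 60 * d_i**4 + 240 * d_i**2 * v_i**2 + 48 * v_i**4)    # (88b)

# Fourth-moment ratio of f(X) - E[f(X)] = (1/2) * sum_i W_i with independent
# coordinates: empirically it stays below the bound 15.
dvec = np.array([1.7, 0.3, 2.5]); vvec = np.array([-0.6, 1.1, 0.0])
Zm = rng.standard_normal((2_000_000, 3))
F = 0.5 * np.sum(dvec * (Zm**2 - 1) + 2 * vvec * Zm, axis=1)
print(np.mean(F**4) / np.mean(F**2)**2)
```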

A.3 Proof of Lemma 21

Proof

Note that \(\lambda \left| {\hat{W}}_i\right| \le N_w\) with probability one for all \(i = 1, \dots , \lambda \). Then

$$\begin{aligned} \mathbb {E}_t\left[ \Delta _m^\textrm{T}A \Delta _m \right]&= \sum _{i=1}^{\lambda }\sum _{j=1}^{\lambda } \mathbb {E}\left[ {\hat{W}}_i{\hat{W}}_j (X_i - m)^\textrm{T}A (X_j - m) \right] \end{aligned}$$
(91a)
$$\begin{aligned}&\le \sum _{i=1}^{\lambda }\sum _{j=1}^{\lambda } \mathbb {E}\left[ \left| {\hat{W}}_i\right| \left| {\hat{W}}_j\right| \left| (X_i - m)^\textrm{T}A (X_j - m)\right| \right] \end{aligned}$$
(91b)
$$\begin{aligned}&\le \frac{N_w^2}{\lambda ^2} \sum _{i=1}^{\lambda }\sum _{j=1}^{\lambda } \mathbb {E}\left[ \left| (X_i - m)^\textrm{T}A (X_j - m)\right| \right] \end{aligned}$$
(91c)
$$\begin{aligned}&= \frac{N_w^2}{\lambda ^2} \sum _{i=1}^{\lambda }\sum _{j \ne i} \mathbb {E}\left[ \left| (X_i - m)^\textrm{T}A (X_j - m)\right| \right] \end{aligned}$$
(91d)
$$\begin{aligned}&\quad + \frac{N_w^2}{\lambda ^2} \sum _{i=1}^{\lambda } \mathbb {E}\left[ (X_i - m)^\textrm{T}A (X_i - m) \right] \end{aligned}$$
(91e)
$$\begin{aligned}&\le \left( 1 - \frac{1}{\lambda }\right) N_w^2 {{\,\textrm{Tr}\,}}((A \Sigma )^2)^{1/2} + \frac{1}{\lambda }N_w^2 {{\,\textrm{Tr}\,}}\left( A \Sigma \right) . \end{aligned}$$
(91f)

To obtain the last inequality, we applied the Cauchy–Schwarz inequality to the first term, and we used the fact that \(\mathbb {E}\left[ (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] = {{\,\textrm{Tr}\,}}(\Sigma A )\). In light of (77), we obtain (73).
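
The Cauchy–Schwarz step above can also be checked numerically. The sketch below draws independent pairs \(X_i, X_j \sim \mathcal {N}(m, \Sigma )\) and compares the two sides; \(A\) and \(\Sigma \) are arbitrary illustrative choices, not values from the paper.

```python
# Monte Carlo check of the bounds used in (91f): for independent
# X_i, X_j ~ N(m, Sigma),
#   E[|(X_i - m)^T A (X_j - m)|] <= Tr(A Sigma A Sigma)^{1/2},
#   E[(X_i - m)^T A (X_i - m)]   =  Tr(A Sigma).
import numpy as np

rng = np.random.default_rng(3)
d = 3
L = rng.standard_normal((d, d)); Sigma = L @ L.T + np.eye(d)
M = rng.standard_normal((d, d)); A = M @ M.T

n = 1_000_000
Y1 = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # X_i - m
Y2 = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # X_j - m

cross = np.einsum('ni,ij,nj->n', Y1, A, Y2)
print(np.mean(np.abs(cross)), np.sqrt(np.trace(A @ Sigma @ A @ Sigma)))  # LHS <= RHS
print(np.mean(np.einsum('ni,ij,nj->n', Y1, A, Y1)), np.trace(A @ Sigma))
```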

In light of Proposition 16, Lemma 20, and the fact that \(\nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ) = (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\), we have

$$\begin{aligned} \nabla J(\theta ^{(t)})^\textrm{T}\mathbb {E}_t[\Delta ^{(t)}_f] \le - \frac{M_w}{3\cdot 2^{1/2}} \left( (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\right) ^{1/2}. \end{aligned}$$
(92)

From (73) and (92), we obtain

$$\begin{aligned} \frac{{{\bar{\alpha }}}(\theta {;} \alpha )}{\alpha / 2} \le \frac{6 N_w^2}{M_w} \bigg (\left( 1 - \frac{1}{\lambda }\right) + \frac{d^{1/2}}{\lambda } \bigg ), \end{aligned}$$
(93)

for which we used the fact that \(\frac{{{\,\textrm{Tr}\,}}(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le d^{1/2}\). This completes the proof. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Akimoto, Y. Analysis of Surrogate-Assisted Information-Geometric Optimization Algorithms. Algorithmica 86, 33–63 (2024). https://doi.org/10.1007/s00453-022-01087-8
