Analysis of Surrogate-Assisted Information-Geometric Optimization Algorithms

Abstract

Surrogate functions are often employed to reduce the number of objective function evaluations in continuous optimization. However, their effects have seldom been investigated theoretically. This paper analyzes the effect of a surrogate function in the information-geometric optimization (IGO) framework, which includes as an algorithm instance a variant of the covariance matrix adaptation evolution strategy—a widely used solver for black-box continuous optimization. We derive a sufficient condition on the surrogate function for the parameter update in the IGO algorithms to point to a descent direction of the objective function expected over the search distribution. The condition is expressed in terms of three measures of correlation between the objective function and the surrogate function. Our result constitutes a partial justification for the use of a surrogate function in IGO algorithms.

References

  1. Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)

  2. Hansen, N., Müller, S.D., Koumoutsakos, P.: Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evol. Comput. 11(1), 1–18 (2003)

  3. Hansen, N., Kern, S.: Evaluating the CMA evolution strategy on multimodal test functions. In: Parallel Problem Solving from Nature—PPSN VIII, pp. 282–291 (2004)

  4. Akimoto, Y., Hansen, N.: Diagonal acceleration for covariance matrix adaptation evolution strategies. Evol. Comput. 28(3), 405–435 (2020)

  5. Jastrebski, G.A., Arnold, D.V.: Improving evolution strategies through active covariance matrix adaptation. In: 2006 IEEE International Conference on Evolutionary Computation, pp. 2814–2821 (2006)

  6. Hansen, N., Auger, A., Ros, R., Finck, S., Pošík, P.: Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In: Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 1689–1696 (2010)

  7. Rios, L.M., Sahinidis, N.V.: Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Global Optim. 56(3), 1247–1293 (2013)

  8. Urieli, D., MacAlpine, P., Kalyanakrishnan, S., Bentor, Y., Stone, P.: On optimizing interdependent skills: a case study in simulated 3D humanoid robot soccer. In: Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, vol. 11, pp. 769–776 (2011)

  9. Maki, A., Sakamoto, N., Akimoto, Y., Nishikawa, H., Umeda, N.: Application of optimal control theory based on the evolution strategy (CMA-ES) to automatic berthing. J. Mar. Sci. Technol. 25(1), 221–233 (2020)

  10. Schafroth, D., Bermes, C., Bouabdallah, S., Siegwart, R.: Modeling, system identification and robust control of a coaxial micro helicopter. Control. Eng. Pract. 18(7), 700–711 (2010)

  11. Fujii, G., Akimoto, Y., Takahashi, M.: Exploring optimal topology of thermal cloaks by CMA-ES. Appl. Phys. Lett. 112(6), 061108 (2018)

  12. Marsden, A.L., Wang, M., Dennis, J.E., Moin, P.: Optimal aeroacoustic shape design using the surrogate management framework. Optim. Eng. 5(2), 235–262 (2004)

  13. Hitz, G., Galceran, E., Garneau, M.-È., Pomerleau, F., Siegwart, R.: Adaptive continuous-space informative path planning for online environmental monitoring. J. Field Robot. 34(8), 1427–1449 (2017)

  14. Sadeghi, M., Kalantar, M.: Multi types DG expansion dynamic planning in distribution system under stochastic conditions using covariance matrix adaptation evolutionary strategy and Monte-Carlo simulation. Energy Convers. Manag. 87, 455–471 (2014)

  15. Bouzarkouna, Z., Ding, D.Y., Auger, A.: Well placement optimization with the covariance matrix adaptation evolution strategy and meta-models. Comput. Geosci. 16(1), 75–92 (2012)

  16. Ha, D., Schmidhuber, J.: Recurrent world models facilitate policy evolution. In: Advances in Neural Information Processing Systems, vol. 31, pp. 2455–2467 (2018)

  17. Chrabaszcz, P., Loshchilov, I., Hutter, F.: Back to basics: Benchmarking canonical evolution strategies for playing Atari. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 1419–1426 (2018)

  18. Volz, V., Schrum, J., Liu, J., Lucas, S.M., Smith, A., Risi, S.: Evolving Mario levels in the latent space of a deep convolutional generative adversarial network. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 221–228 (2018)

  19. Tanabe, T., Fukuchi, K., Sakuma, J., Akimoto, Y.: Level generation for angry birds with sequential VAE and latent variable evolution. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1052–1060 (2021)

  20. Nomura, M., Watanabe, S., Akimoto, Y., Ozaki, Y., Onishi, M.: Warm starting CMA-ES for hyperparameter optimization. Proc. AAAI Conf. Artif. Intell. 35(10), 9188–9196 (2021)

  21. Loshchilov, I., Schoenauer, M., Sebag, M.: Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 321–328 (2012)

  22. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft. Comput. 9(1), 3–12 (2005)

  23. Pitra, Z., Hanuš, M., Koza, J., Tumpach, J., Holeňa, M.: Interaction between model and its evolution control in surrogate-assisted CMA evolution strategy. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 528–536 (2021)

  24. Hansen, N.: A global surrogate assisted CMA-ES. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 664–672 (2019)

  25. Akimoto, Y., Shimizu, T., Yamaguchi, T.: Adaptive objective selection for multi-fidelity optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 880–888 (2019)

  26. Miyagi, A., Fukuchi, K., Sakuma, J., Akimoto, Y.: Adaptive scenario subset selection for min–max black-box continuous optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 697–705 (2021)

  27. Miyagi, A., Fukuchi, K., Sakuma, J., Akimoto, Y.: Black-box min–max continuous optimization using CMA-ES with worst-case ranking approximation. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 823–831 (2022)

  28. Akimoto, Y., Sakamoto, N., Ohtani, M.: Multi-fidelity optimization approach under prior and posterior constraints and its application to compliance minimization. In: Parallel Problem Solving from Nature—PPSN XVI, pp. 81–94 (2020)

  29. Miyagi, A., Akimoto, Y., Yamamoto, H.: Well placement optimization under geological statistical uncertainty. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1284–1292 (2019)

  30. Jin, Y.: Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol. Comput. 1(2), 61–70 (2011)

  31. Kayhani, A., Arnold, D.V.: Design of a surrogate model assisted (1+1)-ES. In: Parallel Problem Solving from Nature—PPSN XV, pp. 16–28 (2018)

  32. Yang, J., Arnold, D.V.: A surrogate model assisted (1+1)-ES with increased exploitation of the model. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 727–735 (2019)

  33. Ollivier, Y., Arnold, L., Auger, A., Hansen, N.: Information-geometric optimization algorithms: a unifying picture via invariance principles. J. Mach. Learn. Res. 18(1), 564–628 (2017)

  34. Hajek, B.: Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3), 502–525 (1982)

  35. Devroye, L.: Non-uniform Random Variate Generation, 1st edn. Springer, New York (1986)

  36. Akimoto, Y., Auger, A., Hansen, N.: An ODE method to prove the geometric convergence of adaptive stochastic algorithms. Stochastic Process. Appl. 145, 269–307 (2022)

  37. Arratia, R., Gordon, L.: Tutorial on large deviations for the binomial distribution. Bull. Math. Biol. 51(1), 125–131 (1989)

  38. Stanica, P.: Good lower and upper bounds on binomial coefficients. J. Inequal. Pure Appl. Math. 2(3), 30 (2001)

  39. Akimoto, Y., Nagata, Y., Ono, I., Kobayashi, S.: Bidirectional relation between CMA evolution strategies and natural evolution strategies. In: Parallel Problem Solving from Nature, PPSN XI, pp. 154–163 (2010)

  40. Harville, D.A.: Matrix Algebra From a Statistician’s Perspective, 1st edn. Springer, New York (1998)

  41. Ros, R., Hansen, N.: A simple modification in CMA-ES achieving linear time and space complexity. In: Proceedings of the 10th International Conference on Parallel Problem Solving from Nature—PPSN X, pp. 296–305 (2008)

  42. Akimoto, Y., Nagata, Y., Ono, I., Kobayashi, S.: Theoretical foundation for CMA-ES from information geometry perspective. Algorithmica 64(4), 698–716 (2012)

  43. Akimoto, Y., Auger, A., Hansen, N.: Comparison-based natural gradient optimization in high dimension. In: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 373–380 (2014)

  44. Akimoto, Y., Ollivier, Y.: Objective improvement in information-geometric optimization. In: Proceedings of the Twelfth Workshop on Foundations of Genetic Algorithms XII. FOGA XII ’13, pp. 1–10 (2013)

  45. Akimoto, Y.: Analysis of a natural gradient algorithm on monotonic convex-quadratic-composite functions. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 1293–1300 (2012)

  46. Uchida, K., Shirakawa, S., Akimoto, Y.: Finite-sample analysis of information geometric optimization with isotropic Gaussian distribution on convex quadratic functions. IEEE Trans. Evol. Comput. 24(6), 1035–1049 (2020)

  47. Lehmann, E.L., Casella, G.: Theory of Point Estimation, 2nd edn. Springer, New York (2006)

  48. Magnus, J.R.: The moments of products of quadratic forms in normal variables. Stat. Neerl. 32(4), 201–210 (1978)

Acknowledgements

This research is partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 19H04179 and the New Energy and Industrial Technology Development Organization (NEDO) Project Number JPNP18002.

Author information

Corresponding author

Correspondence to Youhei Akimoto.

Ethics declarations

Conflicts of interest

We have no conflicts of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Proofs

A.1 Proof of Proposition 18

Proof

Let \(\Delta ^{(t)}_g = (\Delta _m, {{\,\textrm{vec}\,}}(\Delta _\Sigma ))\). We have

$$\begin{aligned} \psi (\alpha \Delta ^{(t)}_g{;} \theta ^{(t)}) = J(\theta ^{(t)}+ \alpha \Delta ^{(t)}_g) - J(\theta ^{(t)}) - \alpha \nabla J(\theta ^{(t)})^\textrm{T}\Delta ^{(t)}_g = \frac{\alpha ^2}{2} \Delta _m^\textrm{T}A \Delta _{m}. \end{aligned}$$
(77)
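
As a quick numeric sanity check of (77), the following sketch (Python/NumPy, with arbitrary illustrative values) evaluates the second-order remainder of \(J\) directly, assuming, consistently with (84) and (87b), that \(f(x) = \frac{1}{2}(x - x^*)^\textrm{T}A(x - x^*)\) and hence \(J(\theta ) = \frac{1}{2}(m - x^*)^\textrm{T}A(m - x^*) + \frac{1}{2}{{\,\textrm{Tr}\,}}(A\Sigma )\).

```python
# Numeric check of (77): for J quadratic in m and linear in Sigma,
# J(theta + alpha*Delta) - J(theta) - alpha * <grad J(theta), Delta>
# equals (alpha^2 / 2) * Delta_m^T A Delta_m.
# All matrices and vectors below are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d)); A = M @ M.T + np.eye(d)      # positive definite A
L = rng.standard_normal((d, d)); Sigma = L @ L.T + np.eye(d)  # positive definite Sigma
m, x_star = rng.standard_normal(d), rng.standard_normal(d)
dm = rng.standard_normal(d)                                   # Delta_m
S = rng.standard_normal((d, d)); dS = (S + S.T) / 2           # Delta_Sigma (symmetric)
alpha = 0.3

def J(m, Sigma):
    return 0.5 * (m - x_star) @ A @ (m - x_star) + 0.5 * np.trace(A @ Sigma)

# gradient of J: A (m - x*) w.r.t. m, and A / 2 w.r.t. Sigma (entrywise)
grad_dot_delta = (A @ (m - x_star)) @ dm + 0.5 * np.trace(A @ dS)
psi = J(m + alpha * dm, Sigma + alpha * dS) - J(m, Sigma) - alpha * grad_dot_delta
print(psi, 0.5 * alpha**2 * dm @ A @ dm)   # the two printed numbers agree
```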

Noting that \(\mathbb {E}[g(X_\theta )] = {\tilde{\nu }}^\textrm{T}\mathbb {E}[s(\theta {;} X_\theta )] = 0\), we have

$$\begin{aligned} {{\,\textrm{Var}\,}}[g(X_\theta )]\,\mathbb {E}[ \Delta _m^\textrm{T}A \Delta _{m}] &= \frac{1}{\lambda } \mathbb {E}\left[ g(X_\theta )^2 (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] \\ &\quad + \frac{\lambda - 1}{\lambda } \mathbb {E}\left[ g(X_\theta ) (X_\theta - m) \right] ^\textrm{T}A\, \mathbb {E}\left[ g(X_\theta ) (X_\theta - m) \right] , \end{aligned}$$
(78)

where

$$\begin{aligned} \mathbb {E}\left[ g(X_\theta ) (X_\theta - m) \right] = \mathbb {E}[(X_\theta -m)(X_\theta -m)^\textrm{T}\Sigma ^{-1} {\tilde{\nu }}_m] = {\tilde{\nu }}_m. \end{aligned}$$
(79)

Let \(H_1 = \Sigma ^{-1/2} {\tilde{\nu }}_m {\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1/2}\), \(H_2 = \Sigma ^{-1/2} {\tilde{\nu }}_\Sigma \Sigma ^{-1/2}\), and \(H_3 = \Sigma ^{1/2} A \Sigma ^{1/2}\). Let \(Z = \Sigma ^{-1/2}(X_\theta - m)\). Then Z follows a standard normal distribution, and we have

$$\begin{aligned}&\mathbb {E}\left[ g(X_\theta )^2 (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] \end{aligned}$$
(80a)
$$\begin{aligned}&\quad =\ \mathbb {E}[(Z^\textrm{T}H_1 Z) (Z^\textrm{T}H_3 Z)] \end{aligned}$$
(80b)
$$\begin{aligned}&\qquad + \frac{1}{4}\mathbb {E}[(Z^\textrm{T}H_2 Z)^2 (Z^\textrm{T}H_3 Z)] \end{aligned}$$
(80c)
$$\begin{aligned}&\qquad - \frac{1}{2}{{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1})\mathbb {E}[(Z^\textrm{T}H_2 Z) (Z^\textrm{T}H_3 Z)] \end{aligned}$$
(80d)
$$\begin{aligned}&\qquad + \frac{1}{4}{{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1})^2\mathbb {E}[(Z^\textrm{T}H_3 Z)]. \end{aligned}$$
(80e)

By using the formula for the expectation of the product of quadratic forms of Gaussian random vectors [48, Theorem 5.1], we obtain

$$\begin{aligned} (80b)&= {{\,\textrm{Tr}\,}}(H_1){{\,\textrm{Tr}\,}}(H_3) + 2{{\,\textrm{Tr}\,}}(H_1 H_3), \end{aligned}$$
(81a)
$$\begin{aligned} (80c)&= \frac{1}{4} ({{\,\textrm{Tr}\,}}(H_2)^2 {{\,\textrm{Tr}\,}}(H_3) + 2{{\,\textrm{Tr}\,}}(H_2^2){{\,\textrm{Tr}\,}}(H_3) \nonumber \\&\quad + 4{{\,\textrm{Tr}\,}}(H_2 H_3) {{\,\textrm{Tr}\,}}(H_2) + 8 {{\,\textrm{Tr}\,}}(H_2^2 H_3) ), \end{aligned}$$
(81b)
$$\begin{aligned} (80d)&= - \frac{1}{2}{{\,\textrm{Tr}\,}}(H_2)({{\,\textrm{Tr}\,}}(H_2){{\,\textrm{Tr}\,}}(H_3) + 2{{\,\textrm{Tr}\,}}(H_2 H_3)), \end{aligned}$$
(81c)
$$\begin{aligned} (80e)&= \frac{1}{4}{{\,\textrm{Tr}\,}}(H_2)^2 {{\,\textrm{Tr}\,}}(H_3). \end{aligned}$$
(81d)
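
The quadratic-form moment identities used in (81a) and (81b) can be spot-checked by simulation; the same identity covers (81a) with \(H_1\) in place of \(H_2\). The sketch below uses arbitrary symmetric matrices (not values from the paper) and compares Monte Carlo estimates against the stated trace expressions.

```python
# Monte Carlo check of the Gaussian quadratic-form moment identities behind
# (81a)-(81b), for Z ~ N(0, I) and arbitrary symmetric H2, H3:
#   E[(Z'H2 Z)(Z'H3 Z)]    = Tr(H2)Tr(H3) + 2 Tr(H2 H3)
#   E[(Z'H2 Z)^2 (Z'H3 Z)] = Tr(H2)^2 Tr(H3) + 2 Tr(H2^2) Tr(H3)
#                            + 4 Tr(H2 H3) Tr(H2) + 8 Tr(H2^2 H3)
import numpy as np

rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d)); H2 = (B + B.T) / 2
C = rng.standard_normal((d, d)); H3 = (C + C.T) / 2
tr = np.trace

Z = rng.standard_normal((2_000_000, d))
q2 = np.einsum('ni,ij,nj->n', Z, H2, Z)   # Z^T H2 Z, one value per sample
q3 = np.einsum('ni,ij,nj->n', Z, H3, Z)   # Z^T H3 Z

print(np.mean(q2 * q3), tr(H2) * tr(H3) + 2 * tr(H2 @ H3))
print(np.mean(q2**2 * q3),
      tr(H2)**2 * tr(H3) + 2 * tr(H2 @ H2) * tr(H3)
      + 4 * tr(H2 @ H3) * tr(H2) + 8 * tr(H2 @ H2 @ H3))
```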

Rearranging these terms, we obtain

$$\begin{aligned}&\mathbb {E}\left[ g(X_\theta )^2 (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] \end{aligned}$$
(82a)
$$\begin{aligned}&= {{\,\textrm{Tr}\,}}(H_3) ({{\,\textrm{Tr}\,}}(H_1) + \frac{1}{2}{{\,\textrm{Tr}\,}}(H_2^2)) + 2 {{\,\textrm{Tr}\,}}(H_1 H_3) + 2 {{\,\textrm{Tr}\,}}(H_2^2 H_3) \end{aligned}$$
(82b)
$$\begin{aligned}&= {{\,\textrm{Var}\,}}[g(X_\theta )] {{\,\textrm{Tr}\,}}(A \Sigma ) + 2{\tilde{\nu }}_m^\textrm{T}A {\tilde{\nu }}_m + 2 {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma A), \end{aligned}$$
(82c)

for which we used the fact that \({{\,\textrm{Var}\,}}[g(X_\theta )] = {{\,\textrm{Tr}\,}}(H_1) + \frac{1}{2}{{\,\textrm{Tr}\,}}(H_2^2)\). Finally, from (78), we obtain

$$\begin{aligned} \mathbb {E}[ \Delta _m^\textrm{T}A \Delta _{m}] = \frac{1}{\lambda } {{\,\textrm{Tr}\,}}(A \Sigma ) + \frac{1}{\lambda }\frac{(\lambda + 1){\tilde{\nu }}_m^\textrm{T}A {\tilde{\nu }}_m + 2 {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma A)}{{\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1} {\tilde{\nu }}_m + \frac{1}{2} {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma \Sigma ^{-1})}. \end{aligned}$$
(83)

In light of Proposition 5 and the fact that \(\nabla J(\theta )^\textrm{T}{\mathcal {I}}(\theta )^{-1} \nabla J(\theta ) = {{\,\textrm{Var}\,}}[f(X_\theta )] = \nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ) = (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\), we have

$$\begin{aligned} \nabla J(\theta ^{(t)})^\textrm{T}\mathbb {E}_t[\Delta ^{(t)}_f] = - \left( (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\right) ^{1/2}. \end{aligned}$$
(84)

From (83) and (84), we obtain (71a).
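
Identity (82c) can likewise be verified by simulation. The following sketch assumes, consistently with the expansion in (80), that \(g(x) = {\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1}(x - m) + \frac{1}{2}\big ((x - m)^\textrm{T}\Sigma ^{-1}{\tilde{\nu }}_\Sigma \Sigma ^{-1}(x - m) - {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1})\big )\); all numerical values are arbitrary illustrative choices.

```python
# Monte Carlo check of (82c):
#   E[g(X)^2 (X-m)^T A (X-m)]
#     = Var[g] Tr(A Sigma) + 2 nu_m^T A nu_m + 2 Tr(nu_S Sigma^{-1} nu_S A),
# with g as stated in the lead-in (an assumption made explicit there).
import numpy as np

rng = np.random.default_rng(1)
d = 3
L = rng.standard_normal((d, d)); Sigma = L @ L.T + np.eye(d)
M = rng.standard_normal((d, d)); A = M @ M.T                  # positive semidefinite A
m = rng.standard_normal(d)
nu_m = rng.standard_normal(d)                                 # \tilde{nu}_m
S = rng.standard_normal((d, d)); nu_S = (S + S.T) / 2         # \tilde{nu}_Sigma (symmetric)
Si = np.linalg.inv(Sigma)

n = 2_000_000
X = rng.multivariate_normal(m, Sigma, size=n)
Y = X - m
W = Y @ Si
g = W @ nu_m + 0.5 * (np.einsum('ni,ij,nj->n', W, nu_S, W) - np.trace(nu_S @ Si))
qA = np.einsum('ni,ij,nj->n', Y, A, Y)

var_g = nu_m @ Si @ nu_m + 0.5 * np.trace(nu_S @ Si @ nu_S @ Si)
lhs = np.mean(g**2 * qA)
rhs = var_g * np.trace(A @ Sigma) + 2 * nu_m @ A @ nu_m + 2 * np.trace(nu_S @ Si @ nu_S @ A)
print(lhs, rhs)   # agree up to Monte Carlo error
```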

Inequality (71b) is derived as follows. Using the inequalities \(\frac{{\tilde{\nu }}_m^\textrm{T}A {\tilde{\nu }}_m}{{\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1} {\tilde{\nu }}_m} \le \sigma _{\max }(A \Sigma )\) and \(\frac{ {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma A) }{ {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma \Sigma ^{-1}) } \le \sigma _{\max }(A \Sigma )\), where \(\sigma _{\max }(A \Sigma )\) denotes the greatest singular value of \(A \Sigma \), we obtain

$$\begin{aligned} \frac{{{\bar{\alpha }}}(\theta {;} \alpha )}{\alpha / 2} \le \frac{\frac{1}{\lambda } {{\,\textrm{Tr}\,}}(A \Sigma ) + \frac{\lambda + 5}{\lambda } \sigma _{\max }(A \Sigma ) }{ ( (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma ))^{1/2} }. \end{aligned}$$
(85)

Moreover, by using \(\frac{{{\,\textrm{Tr}\,}}(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le d^{1/2}\) and \(\frac{\sigma _{\max }(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le 1\), we have

$$\begin{aligned} \frac{{{\bar{\alpha }}}(\theta {;} \alpha )}{\alpha / 2} \le 2^{1/2} \left( \frac{2 d^{1/2}}{\lambda } + \frac{\lambda + 5}{\lambda } \right) . \end{aligned}$$
(86)

This completes the proof. \(\square \)

A.2 Proof of Lemma 20

Proof

Note that \(\frac{\mathbb {E}[ (\nu (\theta )^\textrm{T}s(\theta {;} X_\theta ))^4 ]}{ (\nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ))^2 } = \frac{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^4]}{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^2]^2}\). We bound this ratio as follows. Let the eigenvalue decomposition of \(\sqrt{\Sigma } A \sqrt{\Sigma }\) be denoted by \(E D E^\textrm{T}\), where D is the diagonal matrix composed of the eigenvalues of \(\sqrt{\Sigma } A \sqrt{\Sigma }\), and E is the orthogonal matrix composed of the unit eigenvectors of \(\sqrt{\Sigma } A \sqrt{\Sigma }\). Let \(Z = E^\textrm{T}\sqrt{\Sigma }^{-1}(X_\theta - m)\) and \(v = E^\textrm{T}\sqrt{\Sigma } A (m - x^*)\). Then by a simple derivation, we have

$$\begin{aligned}&f(X_\theta ) - \mathbb {E}[f(X_\theta )] \end{aligned}$$
(87a)
$$\begin{aligned}&\quad = \frac{1}{2} (X_\theta - m)^\textrm{T}A (X_\theta - m) - \frac{1}{2} {{\,\textrm{Tr}\,}}(A \Sigma ) + (m - x^*)^\textrm{T}A (X_\theta - m) \end{aligned}$$
(87b)
$$\begin{aligned}&\quad = \frac{1}{2} Z^\textrm{T}E^\textrm{T}\sqrt{\Sigma } A \sqrt{\Sigma } E Z - \frac{1}{2} {{\,\textrm{Tr}\,}}( \sqrt{\Sigma } A \sqrt{\Sigma }) + v^\textrm{T}E^\textrm{T}\sqrt{\Sigma }^{-1} A^{-1} A \sqrt{\Sigma } E Z \end{aligned}$$
(87c)
$$\begin{aligned}&\quad = \frac{1}{2} Z^\textrm{T}D Z - \frac{1}{2} {{\,\textrm{Tr}\,}}( D ) + v^\textrm{T}Z \end{aligned}$$
(87d)
$$\begin{aligned}&\quad = \frac{1}{2} \sum _{i=1}^{d} \left( d_i (Z_i^2 - 1) + 2 v_i Z_i \right) . \end{aligned}$$
(87e)

Let

$$\begin{aligned} \mu _{i,2} = \mathbb {E}\left[ \left( d_i (Z_i^2 - 1) + 2 v_i Z_i \right) ^2 \right]&= 2 d_i^2 + 4 v_i^2, \end{aligned}$$
(88a)
$$\begin{aligned} \mu _{i,4} = \mathbb {E}\left[ \left( d_i (Z_i^2 - 1) + 2 v_i Z_i \right) ^4 \right]&= 60 d_i^4 + 240 d_i^2 v_i^2 + 48 v_i^4. \end{aligned}$$
(88b)

Note that the \(Z_i\) values are independent and follow a standard normal distribution. A simple derivation leads to

$$\begin{aligned} \mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^2]&= \frac{1}{4} \sum _{i=1}^{d} \mu _{i,2}, \end{aligned}$$
(89a)
$$\begin{aligned} \mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^4]&= \frac{1}{16}\sum _{i=1}^{d}\left( \mu _{i,4} - 3 \mu _{i,2}^2 \right) + \frac{3}{16}\left( \sum _{i=1}^d \mu _{i,2} \right) ^2 . \end{aligned}$$
(89b)

Using \(\mu _{i,4} - 3 \mu _{i,2}^2 \le 12 \mu _{i,2}^2\), we obtain

$$\begin{aligned} \frac{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^4]}{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^2]^2}&= \frac{\sum _{i=1}^{d}\left( \mu _{i,4} - 3 \mu _{i,2}^2 \right) }{ \left( \sum _{i=1}^d \mu _{i,2} \right) ^2 } + 3 \end{aligned}$$
(90a)
$$\begin{aligned}&\le \frac{12 \sum _{i=1}^{d} \mu _{i,2}^2}{ \left( \sum _{i=1}^d \mu _{i,2} \right) ^2 } + 3 \le 15. \end{aligned}$$
(90b)

Therefore, \(N_s \le 15\) for all \(\theta \in \Theta \). This completes the proof. \(\square \)
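
A small simulation also confirms the moment formulas (88a)–(88b) and the resulting bound of 15 on the fourth-moment ratio; the coefficients \(d_i\) and \(v_i\) below are arbitrary illustrative values.

```python
# Monte Carlo check of (88a)-(88b) and of the bound <= 15 from (90),
# using arbitrary illustrative coefficients d_i, v_i.
import numpy as np

rng = np.random.default_rng(2)
Z = rng.standard_normal(2_000_000)
d_i, v_i = 1.7, -0.6
W = d_i * (Z**2 - 1) + 2 * v_i * Z

print(np.mean(W**2), 2 * d_i**2 + 4 * v_i**2)                              # (88a)
print(np.mean(W**4), 60 * d_i**4 + 240 * d_i**2 * v_i**2 + 48 * v_i**4)    # (88b)

# Fourth-moment ratio of f(X) - E[f(X)] = (1/2) * sum_i W_i with independent
# coordinates: empirically it stays below the bound 15.
dvec = np.array([1.7, 0.3, 2.5]); vvec = np.array([-0.6, 1.1, 0.0])
Zm = rng.standard_normal((2_000_000, 3))
F = 0.5 * np.sum(dvec * (Zm**2 - 1) + 2 * vvec * Zm, axis=1)
print(np.mean(F**4) / np.mean(F**2)**2)
```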

A.3 Proof of Lemma 21

Proof

Note that \(\lambda \left| {\hat{W}}_i\right| \le N_w\) with probability one for all \(i = 1, \dots , \lambda \). Then

$$\begin{aligned} \mathbb {E}_t\left[ \Delta _m^\textrm{T}A \Delta _m \right]&= \sum _{i=1}^{\lambda }\sum _{j=1}^{\lambda } \mathbb {E}\left[ {\hat{W}}_i{\hat{W}}_j (X_i - m)^\textrm{T}A (X_j - m) \right] \end{aligned}$$
(91a)
$$\begin{aligned}&\le \sum _{i=1}^{\lambda }\sum _{j=1}^{\lambda } \mathbb {E}\left[ \left| {\hat{W}}_i\right| \left| {\hat{W}}_j\right| \left| (X_i - m)^\textrm{T}A (X_j - m)\right| \right] \end{aligned}$$
(91b)
$$\begin{aligned}&\le \frac{N_w^2}{\lambda ^2} \sum _{i=1}^{\lambda }\sum _{j=1}^{\lambda } \mathbb {E}\left[ \left| (X_i - m)^\textrm{T}A (X_j - m)\right| \right] \end{aligned}$$
(91c)
$$\begin{aligned}&= \frac{N_w^2}{\lambda ^2} \sum _{i=1}^{\lambda }\sum _{j \ne i} \mathbb {E}\left[ \left| (X_i - m)^\textrm{T}A (X_j - m)\right| \right] \end{aligned}$$
(91d)
$$\begin{aligned}&\quad + \frac{N_w^2}{\lambda ^2} \sum _{i=1}^{\lambda } \mathbb {E}\left[ (X_i - m)^\textrm{T}A (X_i - m) \right] \end{aligned}$$
(91e)
$$\begin{aligned}&\le \left( 1 - \frac{1}{\lambda }\right) N_w^2 {{\,\textrm{Tr}\,}}((A \Sigma )^2)^{1/2} + \frac{1}{\lambda }N_w^2 {{\,\textrm{Tr}\,}}\left( A \Sigma \right) . \end{aligned}$$
(91f)

To obtain the last inequality, we applied the Cauchy–Schwarz inequality to the first term, and we used the fact that \(\mathbb {E}\left[ (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] = {{\,\textrm{Tr}\,}}(\Sigma A )\). In light of (77), we obtain (73).
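
The Cauchy–Schwarz step above can also be checked numerically. The sketch below draws independent pairs \(X_i, X_j \sim \mathcal {N}(m, \Sigma )\) and compares the two sides; \(A\) and \(\Sigma \) are arbitrary illustrative choices, not values from the paper.

```python
# Monte Carlo check of the bounds used in (91f): for independent
# X_i, X_j ~ N(m, Sigma),
#   E[|(X_i - m)^T A (X_j - m)|] <= Tr(A Sigma A Sigma)^{1/2},
#   E[(X_i - m)^T A (X_i - m)]   =  Tr(A Sigma).
import numpy as np

rng = np.random.default_rng(3)
d = 3
L = rng.standard_normal((d, d)); Sigma = L @ L.T + np.eye(d)
M = rng.standard_normal((d, d)); A = M @ M.T

n = 1_000_000
Y1 = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # X_i - m
Y2 = rng.multivariate_normal(np.zeros(d), Sigma, size=n)   # X_j - m

cross = np.einsum('ni,ij,nj->n', Y1, A, Y2)
print(np.mean(np.abs(cross)), np.sqrt(np.trace(A @ Sigma @ A @ Sigma)))  # LHS <= RHS
print(np.mean(np.einsum('ni,ij,nj->n', Y1, A, Y1)), np.trace(A @ Sigma))
```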

In light of Proposition 16, Lemma 20, and the fact that \(\nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ) = (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\), we have

$$\begin{aligned} \nabla J(\theta ^{(t)})^\textrm{T}\mathbb {E}_t[\Delta ^{(t)}_f] \le - \frac{M_w}{3\cdot 2^{1/2}} \left( (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\right) ^{1/2}. \end{aligned}$$
(92)

From (73) and (92), we obtain

$$\begin{aligned} \frac{{{\bar{\alpha }}}(\theta {;} \alpha )}{\alpha / 2} \le \frac{6 N_w^2}{M_w} \bigg (\left( 1 - \frac{1}{\lambda }\right) + \frac{d^{1/2}}{\lambda } \bigg ), \end{aligned}$$
(93)

for which we used the fact that \(\frac{{{\,\textrm{Tr}\,}}(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le d^{1/2}\). This completes the proof. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Akimoto, Y. Analysis of Surrogate-Assisted Information-Geometric Optimization Algorithms. Algorithmica 86, 33–63 (2024). https://doi.org/10.1007/s00453-022-01087-8
