Abstract
Surrogate functions are often employed to reduce the number of objective function evaluations in continuous optimization. However, their effects have seldom been investigated theoretically. This paper analyzes the effect of a surrogate function in the information-geometric optimization (IGO) framework, which includes as an algorithm instance a variant of the covariance matrix adaptation evolution strategy (CMA-ES), a widely used solver for black-box continuous optimization. We derive a sufficient condition on the surrogate function under which the parameter update in an IGO algorithm points to a descent direction of the objective function expected over the search distribution. The condition is expressed in terms of three measures of correlation between the objective function and the surrogate function. Our result constitutes a partial justification for the use of surrogate functions in IGO algorithms.
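To fix ideas, the following Python sketch shows an IGO-style update of a Gaussian search distribution in which candidates are ranked by a surrogate g in place of the objective f. It is a minimal illustration with simplified choices (plain rank-based weights, fixed learning rates, hypothetical function names), not the exact algorithm analyzed in this paper.

```python
import numpy as np

def igo_gaussian_step(m, Sigma, g, lam=20, eta_m=1.0, eta_S=0.5, rng=None):
    """One IGO-style update of a Gaussian N(m, Sigma), ranking candidates
    by a surrogate g in place of the true objective f.

    A simplified illustration (plain rank-based weights, fixed learning
    rates), not the exact update analyzed in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = rng.multivariate_normal(m, Sigma, size=lam)   # candidate solutions
    order = np.argsort(g(X))                          # rank by the surrogate (minimization)
    w = np.maximum(np.log(lam / 2 + 0.5) - np.log(np.arange(1, lam + 1)), 0.0)
    w /= w.sum()                                      # nonnegative weights, sum to 1
    Y = X[order] - m                                  # steps, best candidate first
    m_new = m + eta_m * (w @ Y)                       # mean update
    Sigma_new = (1 - eta_S) * Sigma + eta_S * (Y * w[:, None]).T @ Y  # covariance update
    return m_new, Sigma_new

# usage: minimize the sphere function through a rank-preserving surrogate
f = lambda X: np.sum(X**2, axis=1)
g = lambda X: np.sqrt(f(X))   # monotone transform of f, hence rank-preserving
m, Sigma = np.ones(5), np.eye(5)
rng = np.random.default_rng(0)
for _ in range(100):
    m, Sigma = igo_gaussian_step(m, Sigma, g, rng=rng)
print(np.sum(m**2))  # close to 0
```

Because IGO updates depend on f only through the ranking of the sampled candidates, a surrogate that preserves that ranking leaves the update unchanged; the question analyzed in the paper is how far the update may deviate when the ranking is only approximately preserved.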
References
Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001)
Hansen, N., Müller, S.D., Koumoutsakos, P.: Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evol. Comput. 11(1), 1–18 (2003)
Hansen, N., Kern, S.: Evaluating the CMA evolution strategy on multimodal test functions. In: Parallel Problem Solving from Nature—PPSN VIII, pp. 282–291 (2004)
Akimoto, Y., Hansen, N.: Diagonal acceleration for covariance matrix adaptation evolution strategies. Evol. Comput. 28(3), 405–435 (2020)
Jastrebski, G.A., Arnold, D.V.: Improving evolution strategies through active covariance matrix adaptation. In: 2006 IEEE International Conference on Evolutionary Computation, pp. 2814–2821 (2006)
Hansen, N., Auger, A., Ros, R., Finck, S., Pošík, P.: Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In: Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 1689–1696 (2010)
Rios, L.M., Sahinidis, N.V.: Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Global Optim. 56(3), 1247–1293 (2013)
Urieli, D., MacAlpine, P., Kalyanakrishnan, S., Bentor, Y., Stone, P.: On optimizing interdependent skills: a case study in simulated 3D humanoid robot soccer. In: Proceedings of 10th International Conference on Autonomous Agents and Multiagent Systems, vol. 11, pp. 769–776 (2011)
Maki, A., Sakamoto, N., Akimoto, Y., Nishikawa, H., Umeda, N.: Application of optimal control theory based on the evolution strategy (CMA-ES) to automatic berthing. J. Mar. Sci. Technol. 25(1), 221–233 (2020)
Schafroth, D., Bermes, C., Bouabdallah, S., Siegwart, R.: Modeling, system identification and robust control of a coaxial micro helicopter. Control. Eng. Pract. 18(7), 700–711 (2010)
Fujii, G., Akimoto, Y., Takahashi, M.: Exploring optimal topology of thermal cloaks by CMA-ES. Appl. Phys. Lett. 112(6), 061108 (2018)
Marsden, A.L., Wang, M., Dennis, J.E., Moin, P.: Optimal aeroacoustic shape design using the surrogate management framework. Optim. Eng. 5(2), 235–262 (2004)
Hitz, G., Galceran, E., Garneau, M.-È., Pomerleau, F., Siegwart, R.: Adaptive continuous-space informative path planning for online environmental monitoring. J. Field Robot. 34(8), 1427–1449 (2017)
Sadeghi, M., Kalantar, M.: Multi types DG expansion dynamic planning in distribution system under stochastic conditions using covariance matrix adaptation evolutionary strategy and Monte Carlo simulation. Energy Convers. Manag. 87, 455–471 (2014)
Bouzarkouna, Z., Ding, D.Y., Auger, A.: Well placement optimization with the covariance matrix adaptation evolution strategy and meta-models. Comput. Geosci. 16(1), 75–92 (2012)
Ha, D., Schmidhuber, J.: Recurrent world models facilitate policy evolution. In: Advances in Neural Information Processing Systems, vol. 31, pp. 2455–2467 (2018)
Chrabaszcz, P., Loshchilov, I., Hutter, F.: Back to basics: Benchmarking canonical evolution strategies for playing Atari. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 1419–1426 (2018)
Volz, V., Schrum, J., Liu, J., Lucas, S.M., Smith, A., Risi, S.: Evolving Mario levels in the latent space of a deep convolutional generative adversarial network. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 221–228 (2018)
Tanabe, T., Fukuchi, K., Sakuma, J., Akimoto, Y.: Level generation for angry birds with sequential VAE and latent variable evolution. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1052–1060 (2021)
Nomura, M., Watanabe, S., Akimoto, Y., Ozaki, Y., Onishi, M.: Warm starting CMA-ES for hyperparameter optimization. Proc. AAAI Conf. Artif. Intell. 35(10), 9188–9196 (2021)
Loshchilov, I., Schoenauer, M., Sebag, M.: Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 321–328 (2012)
Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft. Comput. 9(1), 3–12 (2005)
Pitra, Z., Hanuš, M., Koza, J., Tumpach, J., Holeňa, M.: Interaction between model and its evolution control in surrogate-assisted CMA evolution strategy. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 528–536 (2021)
Hansen, N.: A global surrogate assisted CMA-ES. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 664–672 (2019)
Akimoto, Y., Shimizu, T., Yamaguchi, T.: Adaptive objective selection for multi-fidelity optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 880–888 (2019)
Miyagi, A., Fukuchi, K., Sakuma, J., Akimoto, Y.: Adaptive scenario subset selection for min–max black-box continuous optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 697–705 (2021)
Miyagi, A., Fukuchi, K., Sakuma, J., Akimoto, Y.: Black-box min–max continuous optimization using CMA-ES with worst-case ranking approximation. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 823–831 (2022)
Akimoto, Y., Sakamoto, N., Ohtani, M.: Multi-fidelity optimization approach under prior and posterior constraints and its application to compliance minimization. In: Parallel Problem Solving from Nature—PPSN XVI, pp. 81–94 (2020)
Miyagi, A., Akimoto, Y., Yamamoto, H.: Well placement optimization under geological statistical uncertainty. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1284–1292 (2019)
Jin, Y.: Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol. Comput. 1(2), 61–70 (2011)
Kayhani, A., Arnold, D.V.: Design of a surrogate model assisted (1+1)-ES. In: Parallel Problem Solving from Nature—PPSN XV, pp. 16–28 (2018)
Yang, J., Arnold, D.V.: A surrogate model assisted (1+1)-ES with increased exploitation of the model. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 727–735 (2019)
Ollivier, Y., Arnold, L., Auger, A., Hansen, N.: Information-geometric optimization algorithms: a unifying picture via invariance principles. J. Mach. Learn. Res. 18(1), 564–628 (2017)
Hajek, B.: Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3), 502–525 (1982)
Devroye, L.: Non-uniform Random Variate Generation, 1st edn. Springer, New York (1986)
Akimoto, Y., Auger, A., Hansen, N.: An ODE method to prove the geometric convergence of adaptive stochastic algorithms. Stochastic Processes Appl. 145, 269–307 (2022)
Arratia, R., Gordon, L.: Tutorial on large deviations for the binomial distribution. Bull. Math. Biol. 51(1), 125–131 (1989)
Stanica, P.: Good lower and upper bounds on binomial coefficients. J. Inequal. Pure Appl. Math. 2(3), 30 (2001)
Akimoto, Y., Nagata, Y., Ono, I., Kobayashi, S.: Bidirectional relation between CMA evolution strategies and natural evolution strategies. In: Parallel Problem Solving from Nature, PPSN XI, pp. 154–163 (2010)
Harville, D.A.: Matrix Algebra From a Statistician’s Perspective, 1st edn. Springer, New York (1998)
Ros, R., Hansen, N.: A simple modification in CMA-ES achieving linear time and space complexity. In: Proceedings of the 10th International Conference on Parallel Problem Solving from Nature—PPSN X, pp. 296–305 (2008)
Akimoto, Y., Nagata, Y., Ono, I., Kobayashi, S.: Theoretical foundation for CMA-ES from information geometry perspective. Algorithmica 64(4), 698–716 (2012)
Akimoto, Y., Auger, A., Hansen, N.: Comparison-based natural gradient optimization in high dimension. In: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 373–380 (2014)
Akimoto, Y., Ollivier, Y.: Objective improvement in information-geometric optimization. In: Proceedings of the Twelfth Workshop on Foundations of Genetic Algorithms XII. FOGA XII ’13, pp. 1–10 (2013)
Akimoto, Y.: Analysis of a natural gradient algorithm on monotonic convex-quadratic-composite functions. In: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 1293–1300 (2012)
Uchida, K., Shirakawa, S., Akimoto, Y.: Finite-sample analysis of information geometric optimization with isotropic Gaussian distribution on convex quadratic functions. IEEE Trans. Evol. Comput. 24(6), 1035–1049 (2020)
Lehmann, E.L., Casella, G.: Theory of Point Estimation, 2nd edn. Springer, New York (2006)
Magnus, J.R.: The moments of products of quadratic forms in normal variables. Stat. Neerl. 32(4), 201–210 (1978)
Acknowledgements
This research is partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 19H04179 and the New Energy and Industrial Technology Development Organization (NEDO) Project Number JPNP18002.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
We have no conflicts of interest to disclose.
A Proofs
A.1 Proof of Proposition 18
Proof
Let \(\Delta ^{(t)}_g = (\Delta _m, {{\,\textrm{vec}\,}}(\Delta _\Sigma ))\). We have
Noting that \(\mathbb {E}[g(X_\theta )] = {\tilde{\nu }}^\textrm{T}\mathbb {E}[s(\theta {;} X_\theta )] = 0\), we have
where
Let \(H_1 = \Sigma ^{-1/2} {\tilde{\nu }}_m {\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1/2}\), \(H_2 = \Sigma ^{-1/2} {\tilde{\nu }}_\Sigma \Sigma ^{-1/2}\), and \(H_3 = \Sigma ^{1/2} A \Sigma ^{1/2}\). Let \(Z = \Sigma ^{-1/2}(X_\theta - m)\). Then Z follows a standard normal distribution, and we have
By using the formula for the expectation of the product of quadratic forms of Gaussian random vectors [48, Theorem 5.1], we obtain
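A representative special case of that formula is the identity \(\mathbb {E}[(Z^\textrm{T} B Z)(Z^\textrm{T} C Z)] = {{\,\textrm{Tr}\,}}(B){{\,\textrm{Tr}\,}}(C) + 2{{\,\textrm{Tr}\,}}(BC)\) for symmetric B and C and standard normal Z, which can be checked numerically; the Python sketch below uses hypothetical random test matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d)); B = (B + B.T) / 2   # random symmetric matrix
C = rng.standard_normal((d, d)); C = (C + C.T) / 2

Z = rng.standard_normal((1_000_000, d))              # standard normal samples
qB = np.einsum('ni,ij,nj->n', Z, B, Z)               # Z^T B Z per sample
qC = np.einsum('ni,ij,nj->n', Z, C, Z)               # Z^T C Z per sample
print(np.mean(qB * qC))                                  # Monte Carlo estimate
print(np.trace(B) * np.trace(C) + 2 * np.trace(B @ C))  # closed form
```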
Rearranging these terms, we obtain
for which we used the fact that \({{\,\textrm{Var}\,}}[g(X_\theta )] = {{\,\textrm{Tr}\,}}(H_1) + \frac{1}{2}{{\,\textrm{Tr}\,}}(H_2^2)\). Finally, from (78), we obtain
In light of Proposition 5 and the fact that \(\nabla J(\theta )^\textrm{T}{\mathcal {I}}(\theta )^{-1} \nabla J(\theta ) = {{\,\textrm{Var}\,}}[f(X_\theta )] = \nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ) = (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\), we have
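This expression for \({{\,\textrm{Var}\,}}[f(X_\theta )]\) can be verified by simulation; the sketch below assumes the convex quadratic objective \(f(x) = \frac{1}{2}(x - x^*)^\textrm{T} A (x - x^*)\), the normalization under which the stated formula holds exactly, with hypothetical test matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
G = rng.standard_normal((d, d)); A = G @ G.T + np.eye(d)      # SPD Hessian
H = rng.standard_normal((d, d)); Sigma = H @ H.T + np.eye(d)  # SPD covariance
m, xstar = rng.standard_normal(d), rng.standard_normal(d)

X = rng.multivariate_normal(m, Sigma, size=1_000_000)
fX = 0.5 * np.einsum('ni,ij,nj->n', X - xstar, A, X - xstar)  # f(X)
b = m - xstar
print(fX.var())                                               # Monte Carlo variance
print(b @ A @ Sigma @ A @ b + 0.5 * np.trace(A @ Sigma @ A @ Sigma))  # closed form
```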
From (83) and (84), we obtain (71a).
Inequality (71b) is derived as follows. Using the inequalities \(\frac{{\tilde{\nu }}_m^\textrm{T}A {\tilde{\nu }}_m}{{\tilde{\nu }}_m^\textrm{T}\Sigma ^{-1} {\tilde{\nu }}_m} \le \sigma _{\max }(A \Sigma )\) and \(\frac{ {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma A) }{ {{\,\textrm{Tr}\,}}({\tilde{\nu }}_\Sigma \Sigma ^{-1} {\tilde{\nu }}_\Sigma \Sigma ^{-1}) } \le \sigma _{\max }(A \Sigma )\), where \(\sigma _{\max }(A \Sigma )\) denotes the greatest singular value of \(A \Sigma \), we obtain
Moreover, by using \(\frac{{{\,\textrm{Tr}\,}}(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le d^{1/2}\) and \(\frac{\sigma _{\max }(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le 1\), we have
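Both auxiliary bounds follow from elementary eigenvalue inequalities: writing \(\lambda _1, \dots , \lambda _d \ge 0\) for the eigenvalues of the symmetric matrix \(\Sigma ^{1/2} A \Sigma ^{1/2}\) (which coincide with the eigenvalues of \(A \Sigma \)), and reading \(\sigma _{\max }(A \Sigma )\) as \(\max _i \lambda _i\) as in the application above,

\[ \frac{{{\,\textrm{Tr}\,}}(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} = \frac{\sum _i \lambda _i}{\bigl (\sum _i \lambda _i^2\bigr )^{1/2}} \le d^{1/2} \quad \text {(Cauchy--Schwarz)}, \qquad \frac{\sigma _{\max }(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} = \frac{\max _i \lambda _i}{\bigl (\sum _i \lambda _i^2\bigr )^{1/2}} \le 1. \]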
This completes the proof. \(\square \)
A.2 Proof of Lemma 20
Proof
Note that \(\frac{\mathbb {E}[ (\nu (\theta )^\textrm{T} s(\theta {;} X_\theta ))^4 ]}{ (\nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ))^2 } = \frac{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^4]}{\mathbb {E}[(f(X_\theta ) - \mathbb {E}[f(X_\theta )])^2]^2}\). We bound this ratio as follows. Let the eigenvalue decomposition of \(\sqrt{\Sigma } A \sqrt{\Sigma }\) be denoted by \(E D E^\textrm{T}\), where D is the diagonal matrix composed of the eigenvalues of \(\sqrt{\Sigma } A \sqrt{\Sigma }\), and E is the orthogonal matrix composed of the unit eigenvectors of \(\sqrt{\Sigma } A \sqrt{\Sigma }\). Let \(Z = E^\textrm{T}\sqrt{\Sigma }^{-1}(X_\theta - m)\) and \(v = E^\textrm{T}\sqrt{\Sigma } A (m - x^*)\). Then by a simple derivation, we have
Let
Note that the \(Z_i\) values are independent and follow a standard normal distribution. A simple derivation leads to
Using \(\mu _{i,4} - 3 \mu _{i,2}^2 \le 12 \mu _{i,2}^2\), we obtain
Therefore, \(N_s \le 15\) for all \(\theta \in \Theta \). This completes the proof. \(\square \)
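The constant 15 is attained in the limiting one-dimensional case \(f(x) = x^2\) with \(m = x^*\): there \(\mathbb {E}[(Z^2 - 1)^4] = 60\) and \(\mathbb {E}[(Z^2 - 1)^2] = 2\), so the ratio equals \(60/2^2 = 15\). The Python sketch below checks this numerically on hypothetical test instances.

```python
import numpy as np

rng = np.random.default_rng(2)

def kurtosis(y):
    c = y - y.mean()
    return np.mean(c**4) / np.mean(c**2) ** 2

# extreme case: f(x) = x^2 in one dimension, where the ratio equals 60/2^2 = 15
z = rng.standard_normal(2_000_000)
print(kurtosis(z**2))        # approx 15

# a generic convex quadratic of a Gaussian stays at or below 15
d = 5
G = rng.standard_normal((d, d)); A = G @ G.T + np.eye(d)   # SPD Hessian
X = rng.standard_normal((1_000_000, d)) + 1.0              # N(1, I) samples
fX = 0.5 * np.einsum('ni,ij,nj->n', X, A, X)
print(kurtosis(fX))          # <= 15 up to Monte Carlo error
```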
A.3 Proof of Lemma 21
Proof
Note that \(\lambda \left| {\hat{W}}_i\right| \le N_w\) with probability one for all \(i = 1, \dots , \lambda \). Then
To obtain the last inequality, we applied the Cauchy–Schwarz inequality to the first term, and we used the fact that \(\mathbb {E}\left[ (X_\theta - m)^\textrm{T}A (X_\theta - m) \right] = {{\,\textrm{Tr}\,}}(\Sigma A )\). In light of (77), we obtain (73).
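The identity \(\mathbb {E}[(X_\theta - m)^\textrm{T} A (X_\theta - m)] = {{\,\textrm{Tr}\,}}(\Sigma A)\) is standard for \(X_\theta \sim \mathcal {N}(m, \Sigma )\); a quick Monte Carlo check with hypothetical test matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
G = rng.standard_normal((d, d)); A = G @ G.T                  # symmetric PSD
H = rng.standard_normal((d, d)); Sigma = H @ H.T + np.eye(d)  # SPD covariance
m = rng.standard_normal(d)

X = rng.multivariate_normal(m, Sigma, size=500_000)
print(np.mean(np.einsum('ni,ij,nj->n', X - m, A, X - m)))     # Monte Carlo
print(np.trace(Sigma @ A))                                    # closed form
```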
In light of Proposition 16, Lemma 20, and the fact that \(\nu (\theta )^\textrm{T}{\mathcal {I}}(\theta ) \nu (\theta ) = (m - x^*)^\textrm{T}A \Sigma A (m - x^*) + \frac{1}{2} {{\,\textrm{Tr}\,}}(A\Sigma A\Sigma )\), we have
for which we used the fact that \(\frac{{{\,\textrm{Tr}\,}}(A \Sigma )}{{{\,\textrm{Tr}\,}}(A \Sigma A \Sigma )^{1/2}} \le d^{1/2}\). This completes the proof. \(\square \)