Abstract
Model-based optimization methods are a class of stochastic search methods that iteratively find candidate solutions by generating samples from a parameterized probabilistic model on the solution space. To capture the multi-modality of objective functions better than traditional model-based methods, which use only a single model, we propose a framework that maintains a population of models at every iteration, with an adaptive mechanism to propagate the population over iterations. The adaptive mechanism is derived from estimating the optimal parameter of the probabilistic model in a Bayesian manner, and thus provides a principled way to determine the diversity of the population of models. We provide theoretical justification for the convergence of this framework by showing that the posterior distribution of the parameter asymptotically converges to a degenerate distribution concentrating on the optimal parameter. Under this framework, we develop two practical algorithms by incorporating sequential Monte Carlo methods, and carry out numerical experiments to illustrate their performance.
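As a rough illustration of the framework described in the abstract, the following sketch maintains a population of Gaussian models, reweights and resamples them in a sequential Monte Carlo fashion, and jitters the survivors with shrinking uniform noise. Every concrete choice here (the objective `H`, the exponential weighting, the Gaussian models, and all constants) is an illustrative assumption, not one of the paper's two algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def H(x):
    # Illustrative multi-modal objective to maximize (not from the paper).
    return np.sin(3 * x) + np.exp(-x**2)

def pmo_maximize(n_models=50, n_samples=20, n_iters=30):
    """Sketch of a population model-based optimization loop.

    Each model is a Gaussian N(theta, sigma^2) on the solution space.
    Models are reweighted by the performance of their samples, then
    resampled and jittered, mimicking a sequential Monte Carlo update
    of the model population.
    """
    thetas = rng.uniform(-3, 3, size=n_models)   # population of model means
    sigma, delta = 0.5, 0.3
    best_x, best_val = 0.0, -np.inf
    for k in range(n_iters):
        # Sample candidate solutions from every model in the population.
        xs = rng.normal(thetas[:, None], sigma, size=(n_models, n_samples))
        vals = H(xs)
        idx_best = np.unravel_index(np.argmax(vals), vals.shape)
        if vals[idx_best] > best_val:
            best_val, best_x = vals[idx_best], xs[idx_best]
        # Weight each model by its samples' average performance
        # (exponentiation is an assumed, illustrative shape function).
        scores = vals.mean(axis=1)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        # SMC-style resampling of the model population by weight ...
        idx = rng.choice(n_models, size=n_models, p=w)
        # ... followed by uniform jitter on [-delta_k, delta_k], with
        # delta_k shrinking geometrically, echoing Theorem 2's condition.
        delta_k = delta * 0.9**k
        thetas = thetas[idx] + rng.uniform(-delta_k, delta_k, size=n_models)
    return best_x, best_val
```

The jitter keeps the population diverse early on while the geometric decay lets it concentrate as iterations proceed, which is the role the artificial noise \(\Gamma_k\) plays in the analysis.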
Acknowledgments
This work was supported by the National Science Foundation under Grant CMMI-1130273 and by the Air Force Office of Scientific Research under YIP Grant FA-9550-12-1-0250. A preliminary conference version of this paper, which presents the PMO framework and part of the numerical results, appeared in the Proceedings of the 2013 Winter Simulation Conference.
Appendix
1.1 Proof of Lemma 1
Proof
Define \(c_{k}\triangleq \mathbb {E}_{b_{k}}[H(X)]-\mathbb {E}_{\tilde{b}_{k}}[H(X)]\). First, we show that \(\mathbb {E}_{b_k}[H(X)]\ge \mathbb {E}_{b_{k-1}}[H(X)]-c_{k-1}\). By (11), the posterior distribution \(b_k(x,\theta )\) can be expressed by
Then, the expectation of \(H(X)\) with respect to \(b_k(x,\theta )\) is represented by
where the inequality follows from \(\mathbb {E}_{\tilde{b}_{k-1}}\left[ (H(X)-\mathbb {E}_{\tilde{b}_{k-1}}[H(X)])\varphi (H(X)-y_k)\right] \ge 0\), which can be proved as follows.
By Assumption 1, \(\varphi (\cdot )\) is strictly increasing on its support. We have
and
Then,
Therefore,
Then, we have
Thus, \(\{a_k,\ k=1,2,\ldots \}\) is monotonically increasing. Moreover, \(\{a_k\}\) is upper bounded, since for all \(k\ge 1\), \(a_k \le H^{u} + \sum _{i=1}^{k-1}c_i\) and
where the last inequality follows from Assumption 4 and the fact that \(\mathcal {X}\) is compact. Since \(\{a_k\}\) is monotonically increasing and upper bounded, \(\lim _{k\rightarrow \infty }a_k\) exists. Using the dominated convergence theorem, we conclude that \(\sum _{i=1}^{\infty }c_i\) exists and
Therefore, the limit of the right-hand side of \(\mathbb {E}_{b_k}[H(X)] = a_k - \sum _{i=1}^{k-1}c_i\) exists, which implies that \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]\) exists. \(\square \)
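The driving inequality of Lemma 1, \(\mathbb {E}[(H(X)-\mathbb {E}[H(X)])\varphi (H(X)-y)]\ge 0\) for an increasing \(\varphi\), is a covariance-type statement: reweighting a distribution by \(\varphi (H-y)\) cannot decrease the mean of \(H\). A minimal numerical check, taking \(\varphi = \exp\) and a standard normal \(H(X)\) purely as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: H(X) standard normal under the current belief,
# phi strictly increasing and positive (phi = exp is an assumed choice).
h = rng.normal(size=100_000)
y = -1.0
phi = np.exp(h - y)   # phi(H(X) - y)

# Covariance-type inequality: E[(H - E[H]) * phi(H - y)] >= 0.
lhs = np.mean((h - h.mean()) * phi)

# Equivalent reading: the phi-reweighted mean of H is no smaller than
# the unweighted mean, which is why E_{b_k}[H(X)] is monotone in k.
reweighted_mean = np.sum(h * phi) / np.sum(phi)
```

Because a larger value of \(H\) always receives a larger weight, mass shifts toward better solutions at every reweighting step.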
1.2 Proof of Theorem 1
Proof
Since \(y_k\) is monotonically increasing, upper bounded by \(H^*\), and updated only when \(\gamma _k\ge y_{k-1}+\epsilon \), there exists \(K<\infty \) such that \(y_k=y_K\), \(\forall k\ge K\). There are two cases to consider: (i) \(y_K=H^*\), and (ii) \(y_K<H^*\).
(i) Case 1: \(y_K=H^*\)
By (17), we have
Since \(\varphi (\cdot )\) has support on \([0,H^u-H^l]\), we have \(\varphi (H(x)-y_K)=0\) if \(H(x)<H^*\), which trivially gives us
Thus,
which completes the proof of case (i).
(ii) Case 2: \(y_K<H^*\)
By Lemma 1, the sequence \(\{\mathbb {E}_{b_k}[H(X)],k=1,2,\dots \}\) converges. Suppose \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]=H_*\); we will prove \(H_*=H^*\) by contradiction.
We define the set \(\mathcal {A}\) as
For any fixed \(x\in \mathcal {A}\) and any finite \(i\), since \(\varphi (H(x)-y_i)>0\) and, by Assumption 3, \(\tilde{b}_i(x)>0\), we have \(b_i(x)>0\) and \(\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]>0\). Therefore, by induction we may represent (18) by
and from Assumption 4, we have \(\lim _{i\rightarrow \infty }\frac{\tilde{b}_i(x)}{b_i(x)}=1\), almost everywhere in \(\mathcal {A}\).
Hence, almost everywhere in \(\mathcal {A}\),
where the last equality follows from the continuity of \(\varphi (\cdot )\) under Assumption 1, and \(\lim _{i\rightarrow \infty }\mathbb {E}_{\tilde{b}_{i-1}}[\varphi (H(X)-y_i)]=\lim _{i\rightarrow \infty }\mathbb {E}_{b_{i-1}}[\varphi (H(X)-y_i)]\) follows from the bounded convergence theorem and Assumption 4.
Suppose, for the sake of contradiction, that \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]=H_*<H^*\), which leads to
We can write
Taking the limit on both sides of the inequality, by the continuity of \(\varphi (\cdot )\) we have
We define the set \(\mathcal {B}\) as
Since \(C>0\) and \(\varphi (\cdot )\) is strictly increasing, \(\varphi (H_*-y_K)C+\varphi (H^*-y_K)(1-C)<\varphi (H^*-y_K)\). Thus, \(\mathcal {B}\) has strictly positive Lebesgue measure by Assumption 2.
Hence, almost everywhere in \(\mathcal {B}\),
by the definition of \(\mathcal {B}\). From the inequality above and (19), we have
By Fatou’s lemma and the positive Lebesgue measure of \(\mathcal {B}\), we have
which contradicts the fact that
Therefore, we conclude that \(H_*=H^*\), and \(\lim _{k\rightarrow \infty }\mathbb {E}_{b_k}[H(X)]=H^*\). \(\square \)
1.3 Proof of Theorem 2
Proof
We prove this theorem by showing that Assumption 4 is satisfied under Assumptions 5–7.
For any fixed \(\theta \in \Theta \), let \(S_k^m(\theta )=\left\{ \theta _k \in \Theta : |\theta _k^i-\theta ^i|<\delta _k,\ i=1,\ldots ,m \right\} \), where \(\theta ^i\) denotes the \(i\)-th element of \(\theta \) and \(m\) is the dimension of \(\theta \). Let \(V(S_k^m(\theta ))\) denote the volume of \(S_k^m(\theta )\). By Assumption 5, the artificial noise \(\Gamma _k\) is uniformly distributed on \([-\delta _k,\delta _k]^m\), so the p.d.f. \(p(\theta |\theta _k,y_{1:k})\) is
Plugging (21) into (20), we have
The joint posterior p.d.f. \(b_k(x,\theta )\) can be represented by
Then,
By Assumption 6, \(b_k(\theta )\) is continuous on \(S_k^m(\theta )\) and differentiable on the open set \(\{\theta _k \in \Theta :|\theta _k^i-\theta ^i|<\delta _k,\ i=1,\ldots ,m\}\). By the mean value theorem, \(\exists \ \xi \in S_k^m(\theta )\), such that
where \(\Vert \cdot \Vert _2\) denotes the Euclidean norm and \(\Vert \cdot \Vert \) denotes the maximum norm.
Define
and \(V(\Theta )\) is the volume of \(\Theta \), which is bounded. Now, since
to prove \(\sum _{k=1}^{\infty }|b_k(x)-\tilde{b}_k(x)|<\infty \) almost everywhere in \(\mathcal {X}\), it is sufficient to show \(\sum _{k=1}^{\infty } d_k <\infty \).
The gradient of \(b_k(\theta )\) is
Since there exists \(K<\infty \) such that \(y_k=y_K\), \(\forall k\ge K\), the gradient of \(b_k(\theta )\) is upper bounded by
\(\forall k\ge K,\)
\(\forall k< K,\)
where the inequalities follow from \(\varphi (0)\le \varphi (H(x)-y_k)\le \varphi (H^u-y_K)\), \(\forall k\ge K\), and \(\varphi (0)\le \varphi (H(x)-y_k)\le \varphi (H^u-H^l)\), \(\forall k< K\). Taking the maximum norm on both sides of (22) and (23), we have the following inequalities
\(\forall k\ge K,\)
\(\forall k< K,\)
Next, we prove \(\exists \ \eta _k\in \Theta \), such that \(\Vert \nabla _\theta \tilde{b}_{k}(\theta )\Vert \le \Vert \nabla _\theta b_{k}(\eta _k)\Vert \), where \(\eta _k\) is dependent on \(\theta \).
Let \(\vec {\varepsilon }^i=(0,0,\ldots ,0,\varepsilon ,0,\ldots ,0)\), where the i-th element of \(\vec {\varepsilon }^i\) is \(\varepsilon \) and other elements are 0. We denote \(\theta =(\theta ^1,\theta ^2,\ldots ,\theta ^m)\), and \(\bar{\theta }^i=(\theta ^1,\theta ^2,\ldots ,\theta ^{i-1},\theta ^{i+1},\ldots ,\theta ^m)\). With the definition of \(\bar{\theta }^i\), \(\tilde{b}_k(\theta )\) can be alternatively represented by
where \(S_k^{m-1}(\bar{\theta }^i)=\{\bar{\theta }^i_k\in \Theta : |\theta ^j_k-\theta ^j|<\delta _k,\ j=1,\ldots ,i-1,i+1,\ldots ,m\}\). Then,
Because \(S_k(\theta )\) is compact, \(\exists \ t\in S_k(\theta )\), such that \(\forall \theta _k\in S_k(\theta )\), we have
Thus,
By the mean value theorem, \(\exists \ \tau \in \Theta \), such that
Thus,
By the definition of the derivative, we have
It is easy to observe from the above inequality that \(\exists \eta _k \in \Theta \), such that
By (24)–(26), we may bound \(\Vert \nabla _\theta b_k(\theta )\Vert \) in terms of \(\Vert \nabla _\theta b_{k-1}(\cdot )\Vert \). Therefore, \(\exists \ \eta _{k-1}\in \Theta \), such that
and
By induction, we have
By Assumption 7, we have \(\Vert \nabla _\theta b_0(\theta )\Vert \le A\); hence the upper bound of \(d_k\) is
If \(\delta _k=\delta \alpha ^k\) and \(\alpha <\frac{\varphi (0)}{\varphi (H^u-y_K)}\), we have \(\sum _{k=1}^\infty d_k<\infty \), which implies that
as \(k\) goes to infinity. Therefore, \(\sum _{k=1}^{\infty }|b_k(x)-\tilde{b}_k(x)|<\infty \) almost everywhere in \(\mathcal {X}\), which is Assumption 4. \(\square \)
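The final summability step is a geometric-series argument: the proof bounds \(d_k\) by a constant multiple of \((r\alpha )^k\) with \(r=\varphi (H^u-y_K)/\varphi (0)\), which is summable precisely when \(\alpha <\varphi (0)/\varphi (H^u-y_K)\). A small numerical sketch, using illustrative values for \(\varphi (0)\) and \(\varphi (H^u-y_K)\) (not from the paper):

```python
import numpy as np

# Geometric jitter schedule delta_k = delta * alpha**k from Theorem 2.
# The proof bounds d_k by a constant times (r * alpha)**k, where
# r = phi(H^u - y_K) / phi(0) >= 1, so sum_k d_k < infinity iff alpha < 1/r.
phi0, phi_max = 1.0, 2.5          # illustrative values of phi(0), phi(H^u - y_K)
r = phi_max / phi0
delta, alpha = 0.5, 0.9 / r       # choose alpha < phi(0) / phi(H^u - y_K)

ks = np.arange(200)
d_bound = delta * (r * alpha) ** ks   # geometric bound on d_k
partial_sums = np.cumsum(d_bound)
total = delta / (1 - r * alpha)       # closed-form limit of the geometric series
```

The partial sums converge to the closed-form limit, mirroring the conclusion \(\sum _{k=1}^\infty d_k<\infty \) that delivers Assumption 4.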
Chen, X., Zhou, E. Population model-based optimization. J Glob Optim 63, 125–148 (2015). https://doi.org/10.1007/s10898-015-0288-1