Recursive Modified Pattern Search on High-Dimensional Simplex : A Blackbox Optimization Technique

Sankhya B

Abstract

In this paper, a novel derivative-free pattern search based algorithm for black-box optimization is proposed over a simplex-constrained parameter space. At each iteration, starting from the current solution, a new set of candidate solutions is generated by adding a set of derived step-size vectors to the current point. While deriving these step-size vectors, precautions and adjustments are taken so that the candidate points remain within the simplex-constrained space. Thus, no time is spent evaluating the (possibly expensive) objective function at infeasible points (points outside the unit simplex), which is the primary motivation for designing a customized optimization algorithm for parameters belonging to a unit simplex. While minimizing an objective function of m parameters, within each iteration the objective function is evaluated at 2m candidate points, so up to 2m parallel threads can be used, which makes the computation even faster when optimizing expensive objective functions over a high-dimensional parameter space. Once a local minimum is discovered, a novel ‘re-start’ strategy is employed to increase the likelihood of finding a better solution. Unlike existing pattern search based methods, a sparsity control parameter is introduced which can be used to induce sparsity in the solution when the solution is expected a priori to be sparse. A comparative study of the performance of the proposed algorithm and other existing algorithms is presented for a few low-, moderate- and high-dimensional optimization problems. Up to a 338-fold improvement in computation time, along with a better solution, is achieved using the proposed algorithm over the genetic algorithm. The proposed algorithm is used to estimate simultaneous quantiles of North Atlantic hurricane velocities during 1981–2006 by maximizing a non-closed-form likelihood function with (possibly) multiple maxima.


Notes

  1. https://github.com/priyamdas2/RMPSS

  2. http://weather.unisys.com/hurricanes

  3. https://www.mathworks.com/help/optim/ug/improving-performance-with-parallel-computing.html

References

  • Audet, C. (2014). A survey on direct search methods for blackbox optimization and their applications. In Mathematics Without Boundaries: Surveys in Interdisciplinary Research, Chapter 2, 31–56.

  • Audet, C., Bechard, V. and Le Digabel, S. (2008a). Nonsmooth optimization through mesh adaptive direct search and variable neighborhood search. J. Glob. Optim. 41, 2, 299–318.

  • Audet, C., Dennis, Jr., J. E. and Le Digabel, S. (2008b). Parallel space decomposition of the mesh adaptive direct search algorithm. SIAM J. Optim. 19, 3, 1150–1170.

  • Audet, C. and Dennis, Jr., J. E. (2006). Mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. 17, 1, 188–217.

  • Basford, K. E. and McLachlan, G. J. (1985). Likelihood estimation with normal mixture models. Journal of the Royal Statistical Society, Series C (Applied Statistics) 34, 3, 282–289.

  • Bethke, A. D. (1980). Genetic algorithms as function optimizers.

  • Boggs, P. T. and Tolle, J. W. (1996). Sequential quadratic programming. Acta Numerica, 1–52.

  • Boyd, S. and Vandenberghe, L. (2006). Convex Optimization. Cambridge University Press, Cambridge.

  • Conn, A. R., Scheinberg, K. and Vicente, L. N. (2009). Introduction to Derivative-Free Optimization. MOS-SIAM Series on Optimization, SIAM.

  • Custodio, A. L. and Madeira, J. F. A. (2015). GLODS: Global and local optimization using direct search. J. Glob. Optim. 62, 1, 1–28.

  • Das, P. (2016). Black-box optimization on hyper-rectangle using recursive modified pattern search and application to matrix completion problem with non-convex regularization. https://arxiv.org/pdf/1604.08616.pdf.

  • Das, P. and Ghosal, S. (2017a). Bayesian quantile regression using random B-spline series prior. Computational Statistics & Data Analysis 109, 121–143.

  • Das, P. and Ghosal, S. (2017b). Analyzing ozone concentration by Bayesian spatio-temporal quantile regression. Environmetrics 28, 4, e2443.

  • Das, P. and Ghosal, S. (2018). Bayesian non-parametric simultaneous quantile regression for complete and grid data. Computational Statistics & Data Analysis 127, 172–186.

  • Eberhart, R. and Kennedy, J. (1995). A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 39–43.

  • Elsner, J., Kossin, J. and Jagger, T. (2008). The increasing intensity of the strongest tropical cyclones. Nature 455, 92–95.

  • Fermi, E. and Metropolis, N. (1952). Numerical solution of a minimum problem. Los Alamos Unclassified Report LA-1492, Los Alamos National Laboratory, Los Alamos, USA.

  • Fraser, A. S. (1957). Simulation of genetic systems by automatic digital computers I. Introduction. Aust. J. Biol. Sci. 10, 484–491.

  • Geris, L. (2012). Computational Modeling in Tissue Engineering. Springer.

  • Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company.

  • Granville, V., Krivanek, M. and Rasson, J. P. (1994). Simulated annealing: a proof of convergence. IEEE Trans. Pattern Anal. Mach. Intell. 16, 652–656.

  • Hilbert, M. and Lopez, P. (2011). The world’s technological capacity to store, communicate, and compute information. Science 332, 60–65.

  • Hooke, R. and Jeeves, T. A. (1961). Direct search solution of numerical and statistical problems. J. Assoc. Comput. Mach. 8, 212–219.

  • Jones, D. R., Schonlau, M. and Welch, W. J. (1998). Efficient global optimization of expensive black box functions. J. Glob. Optim. 13, 4, 455–492.

  • Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. Combinatorica 4, 373–395.

  • Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, USA, 1942–1948.

  • Kerr, C. C., Smolinski, T. G., Dura-Bernal, S. and Wilson, D. P. (2014). Optimization by Bayesian adaptive locally linear stochastic descent. http://thekerrlab.com/ballsd/ballsd.pdf.

  • Kirkpatrick, S., Gelatt, Jr., C. D. and Vecchi, M. P. (1983). Optimization by simulated annealing. Science 220, 4598, 671–680.

  • Kolda, T. G., Lewis, R. M. and Torczon, V. (2003). Optimization by direct search: New perspectives on some classical and modern methods. SIAM Rev. 45, 3, 385–482.

  • Le Digabel, S. (2011). Algorithm 909: NOMAD: Nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. 37, 4, 44:1–44:15.

  • Lewis, R. M. and Torczon, V. (1999). Pattern search algorithms for bound constrained minimization. SIAM J. Optim. 9, 4, 1082–1099.

  • Lewis, R. M. and Torczon, V. (2000). Pattern search algorithms for linearly constrained minimization. SIAM J. Optim. 10, 917–941.

  • Martelli, E. and Amaldi, E. (2014). PGS-COM: A hybrid method for constrained non-smooth black-box optimization problems: Brief review, novel algorithm and comparative evaluation. Comput. Chem. Eng. 63, 108–139.

  • Martínez, J. M. and Sobral, F. N. C. (2013). Constrained derivative-free optimization on thin domains. J. Glob. Optim. 56, 3, 1217–1232.

  • Nocedal, J. and Wright, S. J. (2006). Numerical Optimization, 2nd edn. Operations Research Series, Springer.

  • Potra, F. A. and Wright, S. J. (2000). Interior-point methods. J. Comput. Appl. Math. 124, 281–302.

  • Rios, L. M. and Sahinidis, N. V. (2013). Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Glob. Optim. 56, 3, 1247–1293.

  • Torczon, V. (1997). On the convergence of pattern search algorithms. SIAM J. Optim. 7, 1–25.

  • Wright, M. H. (2005). The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bulletin of the American Mathematical Society 42, 39–56.

  • de Boor, C. (2001). A Practical Guide to Splines, revised edn. Springer.


Acknowledgements

I would like to thank Dr. Rudrodip Majumdar, Dr. Debraj Das, Dr. Suman Chakraborty and Dr. Kushal Dey for helping me edit earlier drafts of this paper and for their valuable suggestions for improvement. I would also like to acknowledge my Ph.D. adviser Dr. Subhashis Ghoshal for his valuable suggestions and for suggesting the statistical problems that led me to this algorithm. Finally, I would like to thank the reviewers for their valuable suggestions.

Author information

Correspondence to Priyam Das.


Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Discussion on the number of operations and objective function evaluations required at each iteration of RMPSS

Here we derive the order of the number of basic operations and of objective function evaluations required at each iteration, in terms of the dimension of the parameter space. To this end, we find an upper bound on the number of operations required in the worst-case scenario, which is sufficient to determine the order of the number of basic operations.

Suppose we want to minimize f(p) where p ∈ S. At the beginning of each iteration, four arrays of length m, namely s+, s−, f+ and f−, are initialized (see step (2) of STAGE 1 in Section 2). During each iteration, starting from the current value of the parameter, 2m candidate solutions are generated in such a way that each of them belongs to the domain S. The search procedure for the first m of these movements is described in step (3) of STAGE 1 of Section 2.

In step (3) of STAGE 1 of Section 2, note that it requires not more than m operations to find \(S_{i}^{+}\), and at most m operations to find \(K_{i}^{+}\). As we are considering the worst-case scenario in terms of maximizing the number of required operations, assume \(K_{i}^{+} \geq 1\). In steps (3.1) and (3.2) of STAGE 1, suppose the value of \(s_{i}^{+}\) is updated at most k times. So, we have \(\frac {s_{initial}}{\rho ^{k-1}} \leq \phi \) but \(\frac {s_{initial}}{\rho ^{k-2}} > \phi \). Hence \(k = 1+\left [\frac {\log (\frac {s_{initial}}{\phi })}{\log (\rho )}\right ]\), where [x] denotes the largest integer less than or equal to x. Corresponding to each update step of \(s_{i}^{+}\), first it is checked whether \(s_{i}^{+} \leq \phi \) or not, which involves a single operation. Then deriving \(\mathbf {q}_{i}^{+}\) involves not more than 2m steps, because the most complicated scenario occurs when updating the positions of \(\mathbf {q}_{i}^{+}\) which belong to \(S_{i}^{+}\); in that case, it takes two operations for each site, one to find \(\frac {s_{i}^{+}}{K_{i}^{+}}\) and one more to subtract that quantity from \(p_{i}^{(l)}\) for \(l \in S_{i}^{+}\). Checking whether \(\mathbf {q}_{i}^{+} \in \mathbf {S}\) or not requires m operations. For the worst-case scenario, we also add one more step required for updating \(s_{i}^{+} = \frac {s_{i}^{+}}{\rho }\). Hence the search procedure of any movement (i.e., for any i ∈ {1, … , m}) in step (3) of STAGE 1 requires \(m + m + k \times (1 + 2m + m + 1) = m \times (2 + 3k) + 2k\) operations. Hence, for the m movements (mentioned in step (3) of STAGE 1 in Section 2) it requires not more than \(m^{2} \times (2 + 3k) + 2mk\) operations. In a similar way, it can be shown that for step (4) also the maximum number of required operations is not more than \(m^{2} \times (2 + 3k) + 2mk\).
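The feasibility adjustment of step (3) can be made concrete with a short sketch. The Python snippet below is illustrative only, not the author's implementation; the helper name plus_move and its arguments are hypothetical. It generates one ‘+’ candidate by adding the current step size to coordinate i, removing the same total mass equally from the other positive coordinates, and shrinking the step size by the factor ρ until the candidate lies in the unit simplex or the step size falls below the threshold ϕ.

```python
import numpy as np

def plus_move(p, i, s, rho=2.0, phi=1e-6):
    """Sketch of one '+' movement of step (3), STAGE 1 (hypothetical helper).

    Coordinate i is increased by the step size s while the other strictly
    positive coordinates are each decreased by s/K (K = number of such
    coordinates), so the total mass is preserved.  If the candidate leaves
    the unit simplex, s is divided by rho and the move is retried until
    s falls below phi.
    """
    p = np.asarray(p, dtype=float)
    m = len(p)
    others = [l for l in range(m) if l != i and p[l] > 0]   # the set S_i^+
    K = len(others)                                         # K_i^+
    while s > phi and K > 0:
        q = p.copy()
        q[i] += s
        q[others] -= s / K
        if np.all(q >= 0):              # q stays in S (the coordinate sum is unchanged)
            return q
        s /= rho                        # shrink the step size
    return p                            # no feasible '+' move found
```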

In step (5) of STAGE 1 in Section 2, finding k1 or k2 takes (m − 1) operations each. The required number of steps for this step is maximized if \(\min \limits (f_{k_{1}}^{+},f_{k_{2}}^{-}) < Y^{(j)}\). Under this scenario, two more operations (i.e., comparisons) are required to find ptemp. So this step requires not more than 2 × (m − 1) + 2 = 2m operations.

In step (6) of STAGE 1 in Section 2, it takes at most m operations to find Supdated. If Kupdated is not m, the number of operations required in this step is larger than in the case Kupdated = m. To find the number of operations required in the worst-case scenario, assume Kupdated < m. To find the value of garbage, the maximum number of required steps is not more than m. Finally, note that in step (6.2), updating the value of the parameter of interest from p(j) to p(j+ 1) requires not more than 2m steps. So the maximum number of operations required for step (6) of STAGE 1 is not more than m + m + 2m = 4m.

In step (7) of STAGE 1 in Section 2, to find \((p^{(j)}(i) - p^{(j-1)}(i))^{2}\) for each i ∈ {1, … , m}, we need one operation for taking the difference and one for taking the square; together with the (m − 1) additions, computing the sum of squares therefore needs (3m − 1) operations. Comparing its value with tol_fun takes one more operation. In the worst-case scenario, two more operations are needed until the end of the iteration, i.e., the update of s(j) at step (7) and its comparison with ϕ at step (8). Hence after step (6), the required number of operations is at most (3m − 1) + 1 + 2 = 3m + 2.

Hence for each iteration, in the worst-case scenario, the number of required basic operations is not more than \(2\{m^{2}(2 + 3k) + 2mk\} + 2m + 4m + (3m + 2) = 2m^{2}(2 + 3k) + m(4k + 9) + 2\). So, the number of basic operations required per iteration of our algorithm is of O(\(m^{2}\)), where m is the number of parameters to estimate.

Note that the number of times the objective function is evaluated in each iteration is 2m + 1 (once at step (2), m times at step (3.3) and m times at step (4.3) of STAGE 1 in Section 2). Thus, we note that the order of the number of function evaluations at each iteration step is of O(m).
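Since these 2m evaluations within an iteration are mutually independent, they can be dispatched to parallel workers, as noted in the main text. A minimal sketch follows, assuming the objective f is a picklable top-level function; the helper name is hypothetical and the process pool is only one possible parallelization scheme.

```python
import numpy as np
from multiprocessing import Pool

def evaluate_candidates(f, candidates, n_workers=4):
    """Evaluate the (at most) 2m candidate points of one iteration in parallel
    and return the best one.  Illustrative sketch only."""
    with Pool(n_workers) as pool:
        values = pool.map(f, candidates)   # one objective evaluation per candidate
    best = int(np.argmin(values))
    return candidates[best], values[best]
```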

Appendix B: Model and Likelihood of Simultaneous Quantile Regression

Let \(Q_{y}(\tau |x)=\inf \{q : P(Y\leq q| X=x)\geq \tau \}\) denote the τ-th conditional quantile of a response Y at X = x for 0 ≤ τ ≤ 1, where X is the predictor. A linear simultaneous quantile regression model for Qy(τ|x) at a given τ is given by

$$Q_{y}(\tau|x)=\beta_{0}(\tau)+x\beta(\tau)$$

where β0(τ) denotes the intercept and β(τ) denotes the slope, both of which are smoothly varying functions of τ. After transforming the predictor and the response variables to the unit interval by monotonic transformations, as shown in Das and Ghosal (2017a), the linear quantile function can be represented as

$$ Q_{y}(\tau|x)= x\xi_{1}(\tau)+(1-x)\xi_{2}(\tau) \quad \text{for } \tau \in [0,1],\ x,y \in [0,1], $$
(B.1)

for some functions ξ1(τ) and ξ2(τ) which are monotonically increasing in τ for τ ∈ [0,1] satisfying ξ1(0) = ξ2(0) = 0, ξ1(1) = ξ2(1) = 1. Equation B.1 can be re-framed as

$$ Q_{y}(\tau|x)= \beta_{0}(\tau)+x\beta_{1}(\tau) \quad \text{for } \tau \in [0,1],\ x,y \in [0,1], $$

where β0(τ) = ξ2(τ) and β1(τ) = ξ1(τ) − ξ2(τ) denote the intercept and the slope of the quantile regression, respectively. The conditional density for Y is given by

$$ f_{y}(y|x) = \left( \frac{\partial}{\partial \tau}Q_{y}(\tau|x)\Big|_{\tau=\tau_{x}(y)}\right)^{-1}=\left( \frac{\partial}{\partial \tau}\beta_{0}(\tau)+x\frac{\partial}{\partial \tau}\beta_{1}(\tau)\Big|_{\tau=\tau_{x}(y)}\right)^{-1}, $$
(B.2)

where τx(y) solves the equation

$$ x\xi_{1}(\tau)+(1-x)\xi_{2}(\tau) = y. $$
(B.3)

Therefore for any given dataset \(\{(X_{i},Y_{i})\}_{i=1}^{n}\), the likelihood is given by \({\prod }_{i=1}^{n} f_{Y}(y_{i}|x_{i})\).
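Evaluating this likelihood therefore requires, for every observation, the numerical solution of Eq. B.3 followed by the evaluation of Eq. B.2. A minimal sketch is given below, assuming ξ1, ξ2 and their τ-derivatives are available as callables; these names are placeholders for illustration, not the author's code.

```python
import numpy as np
from scipy.optimize import brentq

def log_likelihood(x, y, xi1, xi2, dxi1, dxi2):
    """Log-likelihood of Appendix B: solve Eq. (B.3) for tau_x(y) by a root
    search on [0,1], then evaluate the density of Eq. (B.2) at that tau."""
    loglik = 0.0
    for xi_, yi in zip(x, y):
        # g(tau) = Q_y(tau | x_i) - y_i; monotone in tau, so a bracketed root search works
        g = lambda t, xi_=xi_, yi=yi: xi_ * xi1(t) + (1.0 - xi_) * xi2(t) - yi
        tau = brentq(g, 0.0, 1.0)                        # tau_x(y), Eq. (B.3)
        dQ = xi_ * dxi1(tau) + (1.0 - xi_) * dxi2(tau)   # dQ_y(tau|x)/dtau at tau_x(y)
        loglik += -np.log(dQ)                            # log of Eq. (B.2)
    return loglik
```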

Let 0 = t0 < t1 < … < tk = 1 be equidistant knots on the interval [0,1], i.e., t0 = 0, tk = 1 and \(t_{i+1}-t_{i}=\frac {1}{k}\) for all i = 0,1, … , k − 1. Suppose \(\{B_{j,h}(t)\}_{j=1}^{k+h}\) denote the basis functions of the hth degree B-splines on [0,1] with the above-mentioned set of equidistant knots. Now, the basis expansions of ξ1(⋅) and ξ2(⋅) are given by

$$ \begin{array}{@{}rcl@{}} && \xi_{1}(\tau)=\sum\limits_{j=1}^{k+h} \theta_{j} B_{j,h}(\tau) \text{ where } 0=\theta_{1}<\theta_{2}<\cdots<\theta_{k+h}=1, \\ && \xi_{2}(\tau)=\sum\limits_{j=1}^{k+h} \phi_{j} B_{j,h}(\tau) \text{ where } 0=\phi_{1}<\phi_{2}<\cdots<\phi_{k+h}=1. \end{array} $$
(B.4)

Note that estimating \(\{\theta _{j},\phi _{j}\}_{j=1}^{k+h}\) is equivalent to estimating the successive increments \(G =\{\gamma _{j}\}_{j=1}^{k+h-1}\) and \(D = \{\delta _{j}\}_{j=1}^{k+h-1}\), where γj = θj+ 1 − θj and δj = ϕj+ 1 − ϕj satisfy

$$ \gamma_{j},\delta_{j} \geq 0, \quad j=1,\cdots,k+h-1, \quad \text{and} \quad \sum\limits_{j=1}^{k+h-1}\gamma_{j}= \sum\limits_{j=1}^{k+h-1}\delta_{j}=1. $$
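For illustration, the map from a simplex-constrained increment vector γ to a monotone B-spline curve ξ in Eq. B.4 can be written as a cumulative sum of the increments followed by a standard B-spline evaluation. The sketch below uses a clamped, equidistant knot placement, which is an assumption for illustration and not necessarily the exact construction used in the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

def xi_from_gamma(gamma, h=3):
    """Build a monotone curve xi(tau) on [0,1] from increments gamma
    (gamma_j >= 0, sum gamma_j = 1), as in Eq. (B.4).

    The coefficients theta = (0, gamma_1, gamma_1 + gamma_2, ...) increase
    from 0 to 1, so xi is monotone with xi(0) = 0 and xi(1) = 1.
    """
    theta = np.concatenate(([0.0], np.cumsum(gamma)))      # length k + h
    k = len(theta) - h                                     # number of knot intervals
    interior = np.linspace(0.0, 1.0, k + 1)                # t_0, ..., t_k
    knots = np.concatenate((np.zeros(h), interior, np.ones(h)))  # clamped knot vector
    return BSpline(knots, theta, h)

# Example: xi1 = xi_from_gamma(np.full(6, 1.0 / 6), h=3); xi1(0.5), xi1.derivative()(0.5)
```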

Appendix C: Application of RMPSS to estimate the membership probability vector of mixture model

In this section, we discuss how RMPSS can be used to estimate the proportion vector of a mixture model (e.g., a Gaussian mixture). Suppose X is a random variable arising from a mixture of C classes with density functions \(\{f_{j}(\cdot |\theta _{j})\}_{j=1}^{C}\) and mixing probabilities p = (p1, … , pC) such that \(p_{j}\geq 0,{\sum }_{j=1}^{C}p_{j} = 1\). In this case, the density of the univariate mixture model is given by

$$ f(x|\mathbf{p},\boldsymbol{\theta}) = \sum\limits_{j=1}^{C} p_{j}f_{j}(x|\theta_{j}), $$
(C.1)

where θ = (θ1, … , θC) denotes the parameters of the C classes. For a given sample \(\{x_{i}\}_{i=1}^{n}\), the likelihood is given by

$$L(\mathbf{p},\boldsymbol{\theta}) = \prod\limits_{i=1}^{n}\sum\limits_{j=1}^{C} p_{j}f_{j}(x_{i}|\theta_{j}).$$

We consider the case where θjs are univariate.

Case 1: When θ is known

In this case, the likelihood is a function of the proportion vector p only. Since the θj’s are known and fixed, no identifiability problem arises while estimating p: assuming all θj’s are distinct, changing the order of the elements of p does not produce the same likelihood value for all \(x \sim X\). So the proportion vector can be estimated using RMPSS without further modification.

Case 2: When θ is unknown

In this case, both p = (p1, … , pC) and θ = (θ1, … , θC) need to be estimated. However, unlike the previous scenario, the likelihood function is not identifiable. For example, suppose \(\mathbf {p}^{*}=(q_{1},q_{2},q_{3}), \mathbf {p}^{**}=(q_{2},q_{1},q_{3}), \boldsymbol {\theta }^{*} = (\phi _{1},\phi _{2}, \phi _{3}), \boldsymbol {\theta }^{**} = (\phi _{2},\phi _{1}, \phi _{3})\) for C = 3. Then L(p∗, θ∗) = L(p∗∗, θ∗∗) holds for any given sample. In order to get rid of the identifiability problem, we impose a natural ordering on the θ parameter space such that θ1 > θ2 > ⋯ > θC and define βj = θj− 1 − θj > 0 for j = 2, ⋯ , C. So, in case θ ∈RC, the new set of parameters is given by \(\boldsymbol {\theta }^{\prime } = \{\theta _{1},\beta _{2},\beta _{3},\ldots ,\beta _{C}\} \in \mathrm {R}\times \mathrm {R^{+}}^{C-1}\). Now, to find the solution \(\hat {\mathbf {p}}\) and \(\hat {\boldsymbol {\theta }^{\prime }}\) maximizing the likelihood, RMPSS can be used along with any other Black-box algorithm (e.g., GA, SA, PSO, etc.): at any given iteration, RMPSS is used to maximize the likelihood with respect to p keeping \(\boldsymbol {\theta }^{\prime }\) fixed at its current value, and then Genetic Algorithm (or any other Black-box algorithm) is used to maximize the likelihood with respect to \(\boldsymbol {\theta }^{\prime }\) keeping p fixed at its current updated value. Suppose p(k) and \(\boldsymbol {\theta ^{\prime }}^{(k)}\) denote the updated values of p and \(\boldsymbol {\theta ^{\prime }}\) at the beginning of the k-th iteration (k ≥ 2). Then the following steps should be performed iteratively until the final solution is obtained; a minimal sketch of this alternating scheme is given at the end of this appendix.

  • While \(L_{1}(\mathbf {p}^{(k)},\boldsymbol {\theta ^{\prime }}^{(k)}) \neq L_{1}(\mathbf {p}^{(k-1)},\boldsymbol {\theta ^{\prime }}^{(k-1)})\)

    1. Set k = k + 1.

    2. Set \(\mathbf {p}^{(k)} = {\arg \max \limits }_{\mathbf {p}} L_{1}(\mathbf {p}, \boldsymbol {\theta ^{\prime }}^{(k-1)})\) by solving with RMPSS, where p ∈ ΔC− 1,

    3. Set \(\boldsymbol {\theta ^{\prime }}^{(k)} = {\arg \max \limits }_{\boldsymbol {\theta ^{\prime }}} L_{1}(\mathbf {p}^{(k)}, \boldsymbol {\theta ^{\prime }})\) by solving with GA (or any other Black-box optimization technique), where \(\boldsymbol {\theta ^{\prime }} \in \mathrm {R}\times \mathrm {R^{+}}^{C-1}\),

where \(L_{1}(\mathbf {p}, \boldsymbol {\theta ^{\prime }}) = L(\mathbf {p}, \boldsymbol {\theta })\) and

$$ \begin{array}{@{}rcl@{}} {\Delta}^{C-1} = \left\{(p_{1}, \ldots, p_{C}) \in \mathbb{R}^{C} | p_{i} \geq 0, i=1,\ldots,C, \sum\limits_{i=1}^{C}p_{i}= 1\right\}. \end{array} $$
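A minimal sketch of the alternating scheme above is given below. The handle rmpss_maximize stands for an RMPSS implementation maximizing over the unit simplex (a hypothetical name, not the author's code), and SciPy's differential evolution is used here only as a stand-in for GA or any other blackbox optimizer for the θ′ block; the bounds encoding θ′ ∈ R × R+C− 1 are supplied by the user.

```python
import numpy as np
from scipy.optimize import differential_evolution

def alternating_fit(loglik, p0, theta0, rmpss_maximize, theta_bounds,
                    max_rounds=100, tol=1e-8):
    """Alternate between maximizing over p (unit simplex, via RMPSS) and over
    theta' (via a blackbox optimizer), as in steps 1-3 of Appendix C, Case 2.

    `rmpss_maximize(fun, start)` is a hypothetical handle to an RMPSS routine;
    `loglik(p, theta)` is the log-likelihood L_1 of the mixture model.
    """
    p, theta = np.asarray(p0, float), np.asarray(theta0, float)
    prev = -np.inf
    for _ in range(max_rounds):
        p = rmpss_maximize(lambda q: loglik(q, theta), p)          # step 2: update p
        res = differential_evolution(lambda t: -loglik(p, t),      # step 3: update theta'
                                     bounds=theta_bounds, seed=0)
        theta = res.x
        cur = loglik(p, theta)
        if abs(cur - prev) < tol:     # stopping rule of the surrounding while loop
            break
        prev = cur
    return p, theta
```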


Cite this article

Das, P. Recursive Modified Pattern Search on High-Dimensional Simplex : A Blackbox Optimization Technique. Sankhya B 83 (Suppl 2), 440–483 (2021). https://doi.org/10.1007/s13571-020-00236-9

