Black-box optimization on hyper-rectangle using Recursive Modified Pattern Search and application to ROC-based Classification Problem

Abstract

In statistics, it is common to encounter multi-modal and non-smooth likelihood (or objective function) maximization problems where the parameters have known upper and lower bounds. This paper proposes a novel derivative-free global optimization technique that can be used to solve such problems even when the objective function is not known explicitly or its derivatives are difficult or expensive to obtain. The technique is based on the pattern search algorithm, which has been shown to be effective for black-box optimization problems. The proposed algorithm works by iteratively generating new solutions from the current solution, making movements along the coordinate axes of the constrained sample space. Before jumping from the current solution to a new one, the objective function is evaluated at several neighborhood points around the current solution, and the best of these points is chosen based on the corresponding objective function values. Parallel threading can be used to make the algorithm more scalable. The performance of the proposed method is evaluated by optimizing multi-modal benchmark functions of up to 5000 dimensions, where it is shown to be up to 40 and 368 times faster than the genetic algorithm (GA) and simulated annealing (SA), respectively. The proposed method is also used to estimate the optimal biomarker combination from Alzheimer’s disease data by maximizing the empirical estimate of the area under the receiver operating characteristic curve (AUC), outperforming the popular contextual alternative known as the step-down algorithm.

References

  • Audet, C.: A survey on direct search methods for blackbox optimization and their applications. Mathematics without boundaries: Surveys in interdisciplinary research chapter 2, 31–56 (2014)


  • Audet, C., Bechard, V., Digabel, S.L.: Nonsmooth optimization through mesh adaptive direct search and variable neighborhood search. Journal of Global Optimization 41(2), 299–318 (2008)


  • Audet, C., Dennis, J.: Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on Optimization 17(1), 188–217 (2006)


  • Audet, C., Dennis, J.E. Jr., Le Digabel, S.: Parallel space decomposition of the mesh adaptive direct search algorithm. SIAM Journal on Optimization 19(3), 1150–1170 (2008)

  • Bethke, A.D.: Genetic algorithms as function optimizers (1980). https://api.semanticscholar.org/CorpusID:60965631

  • Boggs, P., Tolle, J.: Sequential quadratic programming. Acta Numerica pp. 1–52 (1996)

  • Byrd, R., Gilbert, J., Nocedal, J.: A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming 89(1), 149–185 (2000)


  • Candes, E., Recht, B.: Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9, 717–772 (2009)


  • Candes, E., Tao, T.: The power of convex relaxation: Near-optimal matrix completion. IEEE transactions on information theory 56(5), 2053–2080 (2010)


  • Conn, A., Scheinberg, K., Vicente, L.: Introduction to Derivative-Free Optimization. MOS-SIAM Series on Optimization, SIAM (2009)


  • Custodio, A., Madeira, J.: Glods: Global and local optimization using direct search. Journal of Global Optimization 62(1), 1–28 (2015)


  • Das, P.: Recursive modified pattern search on high-dimensional simplex: a blackbox optimization technique. The Indian Journal of Statistics - Sankhya B 83, 440–483 (2021)


  • Das, P., De, D.: RMPSH: An R package for recursive modified pattern search on hyper-rectangle. R package, CRAN. https://CRAN.R-project.org/package=RMPSH

  • Das, P., De, D., Maiti, R., Kamal, M., Hutcheson, K.A., Fuller, C.D., Chakraborty, B., Peterson, C.B.: Estimating the optimal linear combination of predictors using spherically constrained optimization. BMC Bioinformatics 23(3)(436) (2022)

  • Das, P., Ghosal, S.: Analyzing ozone concentration by bayesian spatio-temporal quantile regression. Environmetrics 28(4), e2443 (2017)


  • Das, P., Sen, D., De, D., Hou, J., Abad, Z., Kim, N., Xia, Z., Cai, T.: Clustering sequence data with mixture Markov chains with covariates using multiple simplex constrained optimization routine (MSiCOR). Journal of Computational and Graphical Statistics (2023). https://doi.org/10.1080/10618600.2023.2257258

  • Das, P., Weisenfeld, D., Dahal, K., De, D., Feathers, V., Coblyn, J., Weinblatt, M., Shadick, N., Cai, T., Liao, K.: Utilizing biologic disease-modifying anti-rheumatic treatment sequences to subphenotype rheumatoid arthritis. Arthritis Research and Therapy 25(1), 1–7 (2023)


  • Le Digabel, S.: Algorithm 909: NOMAD: Nonlinear optimization with the MADS algorithm. ACM Transactions on Mathematical Software 37(4), Article 44, 1–15 (2011)

  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348–1360 (2001)


  • Fermi, E., Metropolis, N.: Numerical solution of a minimum problem. Los Alamos Unclassified Report LA-1492, Los Alamos National Laboratory, Los Alamos, USA (1952)

  • Fraser, A.: Simulation of genetic systems by automatic digital computers I. Introduction. Australian Journal of Biological Sciences 10, 484–491 (1957)

  • Geris, L.: Computational Modeling in Tissue Engineering. Springer (2012)

  • Goldberg, D.: Genetic Algorithms in Search, Optimization, and Machine Learning. Operations Research Series, Addison-Wesley Publishing Company (1989)


  • Goodner, J., Tsianos, G., Li, Y., Loeb, G.: Biosearch: A physiologically plausible learning model for the sensorimotor system. Proceedings of the Society for Neuroscience Annual Meeting (2012)

  • Granville, V., Krivanek, M., Rasson, J.P.: Simulated annealing: A proof of convergence. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 652–656 (1994)


  • Hsu, M., Chen, Y.: Optimal linear combination of biomarkers for multi-category diagnosis. Statistics in Medicine 35(2), 202–213 (2016)


  • Huyer, W., Neumaier, A.: Global optimization by multilevel coordinate search. Journal of Global Optimization 14, 331–355 (1999)


  • Jamil, M., Yang, X.: A literature survey of benchmark functions for global optimization problems. Int. J. of Mathematical Modelling and Numerical Optimisation 4(2) (2013)

  • Jones, D., Schonlau, M., Welch, W.: Efficient global optimization of expensive black box functions. Journal of Global Optimization 13(4), 455–492 (1998)


  • Kennedy, J., Eberhart, R.: Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, USA pp. 1942–1948 (1995)

  • Kirkpatrick, S., Gelatt, C., Vecchi, M.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)


  • Kolda, T., Lewis, R., Torczon, V.: Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review 45(3), 385–482 (2003)


  • Lewis, R., Torczon, V.: Pattern search algorithms for bound constrained minimization. SIAM Journal on Optimization 9(4), 1082–1099 (1999)


  • Li, J., Fine, J.: ROC analysis with multiple classes and multiple tests: methodology and its application in microarray studies. Biostatistics 9(3), 566–576 (2008)


  • Luo, J., Xiong, C.: DiagTest3Grp: An R package for analyzing diagnostic tests with three ordinal groups. Journal of Statistical Software 51(3), 1–24 (2012)


  • Maiti, R., Li, J., Das, P., Liu, X., Feng, L., Hausenloy, D.J., Chakraborty, B.: A distribution-free smoothed combination method to improve discrimination accuracy in multi-category classification. Statistical Methods in Medical Research 32(2), 242–266 (2023)


  • Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11, 431–441 (1963)


  • Martelli, E., Amaldi, E.: Pgs-com: A hybrid method for constrained non-smooth black-box optimization problems: Brief review, novel algorithm and comparative evaluation. Computers and Chemical Engineering 63, 108–139 (2014)


  • Martinez, J., Sobral, F.: Constrained derivative-free optimization on thin domains. Journal of Global Optimization 56(3), 1217–1232 (2013)


  • Pepe, M., Cai, T., Longton, G.: Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics 62(1), 221–229 (2006)


  • Pepe, M., Thompson, M.: Combining diagnostic test results to increase accuracy. Biostatistics 1(2), 123–140 (2000)


  • Potra, F., Wright, S.: Interior-point methods. Journal of Computational and Applied Mathematics 124, 281–302 (2000)


  • Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58(1), 267–288 (1996)

  • Torczon, V.: On the convergence of pattern search algorithms. SIAM Journal on Optimization 7, 1–25 (1997)


  • Youden, W.: Index for rating diagnostic tests. Cancer 3, 32–35 (1950)


  • Zhang, Y., Li, J.: Combining multiple markers for multi-category classification: An ROC surface approach. Australian & New Zealand Journal of Statistics 53(1), 63–78 (2011)



Acknowledgements

I would like to thank Dr. Rudrodip Majumdar, Dr. Debraj Das and Dr. Subhashis Ghoshal for their valuable comments, which helped improve the first version of this draft.

Author information


Corresponding author

Correspondence to Priyam Das.

Ethics declarations

Conflicts of interest

The author declares no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A : RMPS and Generalized Pattern Search

Here we discuss some key features of RMPS along with its differences from a few existing pattern search based optimization techniques (e.g., Torczon (1997)). Firstly, to the best of our knowledge, the restart strategy with a smaller step-size decay rate is proposed here for the first time in the context of pattern search. Secondly, unlike Algorithm 1 of Torczon (1997), the proposed algorithm performs constrained rather than unconstrained minimization, minimizing the black-box function on a hyper-rectangle. In Fermi and Metropolis (1952) and Torczon (1997), the coordinate-wise jump sizes are kept equal within an iteration, whereas in the proposed algorithm, since the domain of each coordinate is bounded, the local step sizes are modified separately for each coordinate and each direction (positive and negative) in every iteration, as required. In GPS, the coordinate-wise jump step sizes are determined using the ‘exploratory moves algorithm’ (see Torczon (1997)), while the corresponding step in the proposed algorithm is more straightforward and does not use the ‘exploratory moves algorithm’. When optimizing a function on a hyper-rectangle, the domain is first transformed into a unit hyper-cube, so the global step size is kept the same for each coordinate; hence, while determining the step sizes of the coordinate-wise movements, the proposed algorithm uses a different strategy than the ‘exploratory moves algorithm’. The most distinctive feature of the proposed algorithm is the restart strategy, described in detail in the main paper.
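
To make the coordinate-wise movement concrete, the following R sketch (an illustration written for this appendix, not the author's MATLAB code or the RMPSH package) evaluates the objective at the 2n neighborhood points of the current solution on the unit hyper-cube, shrinking the local step size of a coordinate-direction pair whenever the corresponding move would leave the domain; the function name and the shrinkage factor are illustrative choices.

```r
# One exploratory sweep of RMPS-style coordinate-wise moves (illustrative sketch).
# x: current solution in the unit hyper-cube; f: objective to minimize;
# step: global step size; shrink: factor used to pull an infeasible move back inside.
coordinate_sweep <- function(x, f, step, shrink = 2) {
  n <- length(x)
  best_x <- x
  best_val <- f(x)
  for (i in seq_len(n)) {
    for (dir in c(1, -1)) {
      s <- step
      # shrink the local step size for this coordinate/direction until the move is feasible
      while (s > 1e-12 && (x[i] + dir * s < 0 || x[i] + dir * s > 1)) s <- s / shrink
      if (x[i] + dir * s < 0 || x[i] + dir * s > 1) next   # skip an infeasible boundary move
      cand <- x
      cand[i] <- x[i] + dir * s
      val <- f(cand)
      if (val < best_val) {        # keep the best of the (at most) 2n neighborhood points
        best_val <- val
        best_x <- cand
      }
    }
  }
  list(x = best_x, value = best_val)
}
```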

Appendix B : Tuning parameters and their roles

In this section, a brief description of the tuning parameters and their roles is provided; a code sketch illustrating how they interact is given after the list.

  • step decay rate (\(\rho \)): \(\rho \) determines the rate at which the global step size decreases at the end of each iteration, so its value must be greater than 1. Smaller values of \(\rho \) make the decay of the step sizes slower, which allows a finer search within the domain at the cost of more computation time.

  • step size threshold (\(\phi \)): \(\phi \) controls the precision of the solution. It is the minimum possible value that the global step size and the local step sizes can take. Once the global step size falls below \(\phi \), the run is terminated. Taking \(\phi \) smaller yields better precision at the cost of higher computation time.

  • \(\textit{tol\_fun}\): \(\textit{tol\_fun}\) denotes the minimum improvement after an iteration that is required to keep the global step size unchanged. In other words, if the improvement obtained between two consecutive iterations is less than \(\textit{tol\_fun}\), the improvement is considered ‘not significant’ and the global step size is decreased for the next iteration to employ a finer search.

  • \(\textit{tol\_fun\_2}\): From the second run onwards, whenever a run terminates, the solution returned by the current run is compared with the solution returned by the previous run. Checking whether they are exactly equal would require matching them up to several decimal places, depending on the storage type used by the software, and could trigger many extra runs that merely improve the solution at distant decimal places of no practical interest. Therefore, once the Euclidean distance between the solution points obtained from two consecutive runs becomes less than \(\textit{tol\_fun\_2}\), the algorithm is terminated and the final result is returned.
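
The interplay of these tuning parameters can be summarised by the following simplified sketch of the outer loop (again an illustration rather than the exact RMPSH implementation; coordinate_sweep is the neighborhood move sketched in Appendix A, and max_runs and the initial step size of 1 are assumed defaults):

```r
rmps_sketch <- function(f, x0, rho = 2, phi = 1e-6, tol_fun = 1e-6,
                        tol_fun_2 = 1e-6, max_runs = 10) {
  x <- x0
  f_x <- f(x)
  prev_run_x <- NULL
  for (run in seq_len(max_runs)) {
    step <- 1                                  # global step size, reset at the start of each run
    while (step > phi) {                       # phi: step size threshold
      sweep <- coordinate_sweep(x, f, step)
      if (f_x - sweep$value < tol_fun) {
        step <- step / rho                     # improvement not significant: finer search (rho > 1)
        # (the actual RMPS restart strategy uses a smaller decay rate after each restart)
      }
      x <- sweep$x
      f_x <- sweep$value
    }
    # tol_fun_2: stop once two consecutive runs return nearly identical solutions
    if (!is.null(prev_run_x) && sqrt(sum((x - prev_run_x)^2)) < tol_fun_2) break
    prev_run_x <- x
  }
  list(x = x, value = f_x)
}

# usage example: rmps_sketch(function(z) sum((z - 0.3)^2), rep(0.5, 5))
# drives the iterate close to (0.3, ..., 0.3) on the unit hyper-cube
```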

Table 4 Domains of search regions of the benchmark functions considered in Section 4 in the main paper

Appendix C : Additional comparative performance study of RMPS

The domains of the optimization problems considered in Table 1 of the main paper are provided in Table 4. Figures 5 and 6 can be used as visual aids for the theoretical properties of RMPS shown in Section 3 of the main paper.

Fig. 5

\(\textbf{U} = \prod _{i=1}^n(u_i-r,u_i+r)\)

Fig. 6

Neighborhood of \(u_i\)

1.1 Exploiting convexity

Prior knowledge of convexity can be used to save computation time with RMPS. To improve the computation time for minimizing convex functions, we also consider RMPS with \(\rho =4\), which employs a steeper decrease in the global and local step sizes. Table 5 compares the performance of RMPS, RMPS modified with the prior knowledge of convexity (RMPS(c), i.e., max_runs=1 and \(\rho =4\) with default values of the other parameters), GA and SA for minimizing the Sphere and Sum squares functions in various dimensions, starting from 10 randomly generated starting points (under 10 random number generating seeds in MATLAB) in each case. In each case RMPS(c) is faster than RMPS. In terms of computation time, RMPS(c) yields up to a 40-fold improvement over GA and up to a 92-fold improvement over SA.

Table 6 reports the comparative performance of RMPS, GA and SA when the true solution lies at a boundary point of the domain.

Table 5 Comparative study of RMPS and RMPS(c) (i.e., max_runs=1, \(\rho =4\) with default values of other parameters)
Table 6 Comparative study of RMPS, GA and SA for minimizing 100 and 1000 dimensional Ackley, Griewank, Rastrigin, Schwefel, Sphere and Sum squares functions where the true solution is a boundary point. Benchmark functions are minimized starting from 10 randomly generated starting points from the corresponding domains using RMPS, GA and SA. The minimum of the 10 obtained minimum values of the objective functions is noted for each method. For RMPS, the maximum of these 10 obtained minimum objective function values is also noted

Appendix D : Optimization on Hypervolume under manifolds

Upper and Lower Bound Approach (ULBA):

Hsu and Chen (2016) showed that the following inequality holds:

$$\max \{0, (H-1)P_{A}(\varvec{\beta }) - (H-2)\} \le D(\varvec{\beta }) \le P_{H}(\varvec{\beta }),$$

where \(P_{A}(\varvec{\beta })\) and \(P_{H}(\varvec{\beta })\) are defined by

$$P_{A}(\varvec{\beta }) =\frac{1}{H-1} \displaystyle \sum _{j=1}^{H-1}P(\varvec{\beta }^{T}\textbf{X}_{j+1}>\varvec{\beta }^{T}\textbf{X}_{j}), P_{H}(\varvec{\beta }) = \min _{1 \le j \le H-1} P(\varvec{\beta }^{T}\textbf{X}_{j+1}>\varvec{\beta }^{T}\textbf{X}_{j}).$$

They proposed to maximize \(P_{A}(\varvec{\beta })\) in order to obtain the optimal biomarker combination. Compared to EHUM, the ULBA objective function is much less expensive to compute. However, the remaining challenges of maximizing EHUM (e.g., discontinuity, possible multi-modality) apply here as well.
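
For reference, the empirical version of \(P_{A}(\varvec{\beta })\), which can be fed to RMPS as a black-box objective, is simply the average of the pairwise empirical AUCs between consecutive ordered categories. A minimal R sketch is given below, assuming the biomarker data are supplied as a list X_list of per-category matrices ordered by disease severity (the names are illustrative):

```r
# Empirical estimate of P_A(beta): average of pairwise AUCs between
# consecutive categories (X_list[[j]] holds the biomarker matrix of category j).
p_a_hat <- function(beta, X_list) {
  H <- length(X_list)
  pair_auc <- vapply(seq_len(H - 1), function(j) {
    s_lo <- as.vector(as.matrix(X_list[[j]]) %*% beta)      # scores in category j
    s_hi <- as.vector(as.matrix(X_list[[j + 1]]) %*% beta)  # scores in category j + 1
    mean(outer(s_hi, s_lo, ">"))     # proportion of correctly ordered pairs
  }, numeric(1))
  mean(pair_auc)
}
```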

Step-down algorithm:

The step-down algorithm for maximizing any given objective function is given as follows:

  1. Step 1. EHUM values of the individual biomarkers are computed and, based on these values, the biomarkers are arranged in decreasing order. Let \(X_{(1)}\) and \(X_{(d)}\) denote the biomarkers with the highest and the lowest individual EHUM values, respectively.

  2. Step 2. The two biomarkers with the highest individual EHUM values are combined as \(V_2=X_{(1)} + \lambda _2 X_{(2)}\), where \(\lambda _2\) is a parameter to be estimated.

  3. Step 3. The objective function for the combined biomarker \(V_2\) is maximized with respect to \(\lambda _2\). Let \(\widehat{V}_2 = X_{(1)} + \widehat{\lambda }_2 X_{(2)}\) denote the updated combination vector.

  4. Step 4. For \(i = 3,\ldots ,d\), define \(V_i = \widehat{V}_{i-1}+\lambda _i X_{(i)}\) and maximize the objective function with respect to \(\lambda _i\). Letting \(\widehat{\lambda }_i\) denote the maximizer, the combination vector obtained at the i-th step is \(\widehat{V}_i = \widehat{V}_{i-1}+\widehat{\lambda }_i X_{(i)}\).

The estimated optimal marker using step-down algorithm is given by \(\widehat{V}_d = X_{(1)} + \widehat{\lambda }_2 X_{(2)} + \cdots + \widehat{\lambda }_d X_{(d)}\).
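
A minimal R sketch of the step-down procedure follows; the single-marker objective obj (e.g., an empirical EHUM or AUC computed from a combined score vector) and the search interval for each \(\lambda _i\) are user-supplied assumptions, and a generic one-dimensional optimizer stands in for whatever maximizer is used in practice:

```r
# Step-down algorithm sketch: greedily add one biomarker at a time, each time
# maximizing the objective over a single coefficient lambda_i.
step_down <- function(X, obj, lambda_interval = c(-10, 10)) {
  d <- ncol(X)
  # Step 1: rank biomarkers by their individual objective values
  ord <- order(vapply(seq_len(d), function(j) obj(X[, j]), numeric(1)),
               decreasing = TRUE)
  V <- X[, ord[1]]
  lambda_hat <- rep(NA_real_, d)
  for (i in 2:d) {
    # Steps 2-4: V_i = V_{i-1} + lambda_i * X_(i), maximized over lambda_i
    opt <- optimize(function(l) obj(V + l * X[, ord[i]]),
                    interval = lambda_interval, maximum = TRUE)
    lambda_hat[i] <- opt$maximum
    V <- V + lambda_hat[i] * X[, ord[i]]
  }
  list(order = ord, lambda = lambda_hat, combined_marker = V)
}
```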

Fig. 7

Original picture containing missing pixels

Fig. 8

Completed picture using SCAD penalized matrix completion with RMPS

Appendix E : On parallel computation with RMPS : Matrix Completion Problem with SCAD penalty

RMPS is parallelizable, and up to 2n parallel threads can be used when solving an n-dimensional black-box problem. In the simulation studies, however, the time required to optimize each function is reported for single-thread computation only. When performing parallel computing in MATLAB with a parfor loop instead of a for loop, depending on the operations performed within the loop (e.g., the cost of an objective function evaluation), the for loop may actually be faster: at the beginning of a parfor loop, time is spent distributing the parallelizable work to different workers, and after the parallel jobs terminate, additional time is spent gathering the results, whereas no such overhead is incurred when the job runs on a single thread (via a for loop). So, if the objective function is not computationally expensive enough relative to the time required for distributing the work and collecting the results, the multi-threaded job can take longer than the single-threaded computation. For all the simulation experiments considered here, single-thread computation is faster than parallel computation, so all reported computation times refer to single-threaded computing. This may no longer be the case when the objective function is genuinely expensive; to demonstrate the benefit of parallel computation with RMPS, we consider the following case study.
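
The trade-off can be illustrated with the short R sketch below (the experiments in the paper use MATLAB's parfor; this is only an analogous illustration with hypothetical names): the candidate neighborhood points of an iteration are evaluated either sequentially or on several forked workers, and the parallel route pays off only when a single evaluation of f is expensive relative to the dispatch and collection overhead. Note that mclapply requires a Unix-like system.

```r
library(parallel)

# Evaluate the objective at all candidate neighborhood points, either on a
# single thread or on n_threads workers (forked processes via mclapply).
evaluate_candidates <- function(candidates, f, n_threads = 1) {
  if (n_threads == 1) {
    vapply(candidates, f, numeric(1))                       # no dispatch/collection overhead
  } else {
    unlist(mclapply(candidates, f, mc.cores = n_threads))   # worth it only if f is expensive
  }
}
```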

The problem of recovering an unknown matrix from only a fraction of its entries is known as the matrix completion problem. Candes and Recht (2009) first proposed a method to recover a matrix from a few given entries by solving a convex optimization problem. Later, Candes and Tao (2010) solved this problem by minimizing the nuclear norm of the matrix subject to the constraint that the given entries are preserved. In other words, suppose we have a matrix \(\textbf{Y}=(y_{ij})_{n \times n}\) with some missing values. Then, as described in Candes and Tao (2010), the completed matrix can be obtained by solving the following problem,

$$\begin{aligned}&\text {minimize :} \; ||\textbf{X}||_* \\&\text {subject to :} \; x_{ij}=y_{ij} \;\text {for all observed} \; (i,j), \end{aligned}$$

where \(||\textbf{M}||_*=\sum _i\sigma _i(\textbf{M})\) denotes the nuclear norm, \(\sigma _i(\textbf{M})\) being the i-th singular value of the matrix \(\textbf{M}\). This problem can be solved using convex optimization techniques. On closer inspection, minimizing the nuclear norm in this fashion is similar to using the LASSO (Tibshirani (1996)) penalty. Fan and Li (2001) proposed the Smoothly Clipped Absolute Deviation (SCAD) penalty, which was shown to have more desirable properties than LASSO for shrinkage-based variable selection problems. Unlike LASSO, however, the SCAD penalty does not lead to a convex minimization (or concave maximization) problem. In this section, the matrix completion problem is solved using the SCAD penalty with RMPS. The matrix completion problem with the SCAD penalty can be re-formulated as

$$\begin{aligned}&\text {minimize :} \; \sum _i f(\sigma _i(\textbf{X})) \nonumber \\&\text {subject to :} \; x_{ij}=y_{ij} \;\text {for all observed} \; (i,j), \end{aligned}$$
(1)

where the \(\sigma _i(\textbf{X})\) are the singular values and \(f\) is the SCAD penalty function, which depends on the tuning parameters \(\lambda \) and \(a\;(=3.7)\) (see Fan and Li (2001)).
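
A minimal R sketch of this black-box objective is given below: candidate values for the missing pixels are plugged into the partially observed matrix and the SCAD penalty is summed over the resulting singular values. The variable names and the indexing of the missing entries are assumptions made for illustration; in the paper this objective is minimized with RMPS subject to the box constraint that each pixel lies in [0, 255].

```r
# SCAD penalty function of Fan and Li (2001) with a = 3.7
scad <- function(t, lambda, a = 3.7) {
  ifelse(t <= lambda, lambda * t,
         ifelse(t <= a * lambda,
                -(t^2 - 2 * a * lambda * t + lambda^2) / (2 * (a - 1)),
                (a + 1) * lambda^2 / 2))
}

# Black-box objective of Eq. 1: x holds candidate values for the missing pixels
# (indexed by miss_idx), Y is the partially observed image matrix.
scad_objective <- function(x, Y, miss_idx, lambda, a = 3.7) {
  X <- Y
  X[miss_idx] <- x                         # plug candidate pixel values into the missing cells
  sum(scad(svd(X, nu = 0, nv = 0)$d, lambda, a))
}
```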

We consider a picture (Fig. 7) with \(61\times 61\) pixels where approximately half of the pixels (1877, to be precise) are missing. The problem given by Eq. 1 can then be seen as a black-box function of dimension 1877 (the number of missing pixels). It is also known that the grey level of each pixel must lie between 0 and 255. This problem is solved using the RMPS method. We fit the model for 30 values of \(\lambda \), namely \(\{100,200, \ldots , 3000\}\), and only the best visual output (\(\lambda =900\)) is reported in Fig. 8.

Unlike the objective functions considered in the performance evaluation studies in the main paper, evaluating the SCAD penalty based on the singular values of the matrix is computationally intensive. Thus, unlike the previous cases, using parallel computing is beneficial here. Since this is a 1877-dimensional problem, up to 3754 parallel threads could be used when solving it with the RMPS algorithm; we use 4 parallel threads to derive the completed image given in Fig. 8. To compare the computation times required by single threading and parallel threading with 4 threads, Table 7 reports the times for the first 50, 100 and 200 iterations in all cases. We obtain more than a 3-fold improvement in computation time using parallel threading (with 4 threads) instead of single threading.

Table 7 Computation times (in seconds) required for first 50, 100 and 200 iterations of RMPS while solving matrix completion problem with SCAD penalty using single thread and 4 parallel threads

Cite this article

Das, P. Black-box optimization on hyper-rectangle using Recursive Modified Pattern Search and application to ROC-based Classification Problem. Sankhya B 85, 365–404 (2023). https://doi.org/10.1007/s13571-023-00312-w
