Abstract
In this paper, mixed-integer nonsmooth constrained optimization problems are considered, where objective/constraint functions are available only as the output of a black-box zeroth-order oracle that does not provide derivative information. A new derivative-free linesearch-based algorithmic framework is proposed to suitably handle those problems. First, a scheme for bound constrained problems that combines a dense sequence of directions to handle the nonsmoothness of the objective function with primitive directions to handle discrete variables is described. Then, an exact penalty approach is embedded in the scheme to suitably manage nonlinear (possibly nonsmooth) constraints. Global convergence properties of the proposed algorithms toward stationary points are analyzed and results of an extensive numerical experience on a set of mixed-integer test problems are reported.
1 Introduction
The following mixed-integer nonlinearly constrained problem is considered
where \(x\in {\mathbb {R}}^n\), \(l,u\in {\mathbb {R}}^n\), and \(I^c\cup I^z = \{1,2,\ldots ,n\}\), with \(I^c \cap I^z = \emptyset\) and \(I^c,I^z\ne \emptyset\). We assume \(l_i < u_i\) for all \(i \in I^c\cup I^z\), and \(l_i,u_i\in {\mathbb {Z}}\) for all \(i\in I^z\). Moreover, the functions \(f: {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and \(g: {\mathbb {R}}^n\rightarrow {\mathbb {R}}^m\), which may be nondifferentiable, are supposed to be Lipschitz continuous with respect to \(x_i\) for all \(i\in I^c\), i.e., for all \(x,y\in {\mathbb {R}}^n\) a constant \(L>0\) exists such that
The following sets are defined
Throughout the paper, X is assumed to be a compact set. Therefore, the bounds \(l_i\) and \(u_i\) are finite.
Remark 1
Note that \(X\cap {{\mathcal {Z}}}\) is compact too. In fact, let us consider any sequence \(\{x_k\}\subseteq X\cap {\mathcal {Z}}\) such that \(x_k\rightarrow {\bar{x}}\). Since X is compact, \({\bar{x}}\in X\). Furthermore, for k sufficiently large, the integer components of \(x_k\) are fixed, so that \({\bar{x}}\in {\mathcal {Z}}\). Hence, \({\bar{x}}\in X\cap {\mathcal {Z}}\), meaning that \(X\cap {\mathcal {Z}}\) is compact too.
Problem (1.1) can hence be reformulated as follows
The objective and constraint functions in (1.3) are assumed to be of black-box zeroth-order type, which is to say that the analytical expression is unknown, and the function value corresponding to a given point is the only available information. Therefore, black-box Mixed-Integer Nonlinear Programs (MINLPs) are considered, a class of challenging problems frequently arising in real-world applications. Those problems are usually solved through tailored derivative-free optimization algorithms (see, e.g., [8, 12, 15, 30] and references therein for further details) able to properly manage the presence of both continuous and discrete variables.
The optimization methods for black-box MINLPs considered here can be divided into two main classes: direct-search and model-based methods. The direct-search methods for MINLPs usually share two main features: they perform an alternate minimization between continuous and discrete variables, and they use a fixed neighborhood to explore the integer lattice. In particular, [4] adapts the Generalized Pattern Search (GPS), proposed in [50], to solve problems with categorical variables (which include integer variables as a special case), so-called mixed variable problems. The approach in [4] has then been extended to address problems with general constraints [1] and a stochastic objective function [49]. In [1], constraints are tackled by using a filter approach similar to the one described in [5]. Derivative-free methods for categorical variables and general constraints have also been studied in [37] and [2]. In particular, [37] proposes a general algorithmic framework whose global convergence holds for any continuous local search (e.g., a pattern search) satisfying suitable properties. The Mesh Adaptive Direct Search (MADS), originally introduced in [6] for nonsmooth problems under general constraints, is extended in [2] to solve mixed variable problems. Constraints are tackled through an extreme barrier approach in this case. The original MADS algorithm has recently been extended in [11] to solve problems with “granular” variables, i.e., variables with a fixed number of decimals, and a nonsmooth objective function over the continuous variables. In addition to the previous references [1, 2, 4, 5, 11, 37, 50], another work worth mentioning is [45], where a mesh-based direct-search algorithm is proposed for bound constrained mixed-integer problems involving nonsmooth and discontinuous objectives.
In [33], three algorithms are proposed for bound constrained MINLP problems. Unlike the aforementioned works, the discrete neighborhood does not have a fixed structure, but depends on a linesearch-type procedure. The first algorithm in [33] performs a distributed minimization over all the variables by updating the current iterate as soon as a point ensuring a sufficient decrease of the objective function is found. It was extended in [34], which deals with the constrained case by adopting a sequential penalty approach, and [52], where the maximal positive basis is replaced with a minimal positive basis based on a direction-rotation technique. Bound constrained MINLP problems are also considered in [23], which extends the algorithm for continuous smooth and nonsmooth objective functions introduced in [22].
Some other direct-search methods not directly connected with MINLP problems are reported for their influence on algorithm development. In [21], the authors propose a new linesearch-based method for nonsmooth nonlinearly constrained optimization problems, ensuring convergence towards Clarke-Jahn stationary points. The constraints are tackled through an exact penalty approach. In [17] and [18], the authors analyze the benefit in terms of efficiency deriving from different ways of incorporating the simplex gradient into direct-search algorithms (e.g., GPS and MADS) for minimizing objective functions that are not necessarily continuously differentiable. In [51], the authors analyze the convergence properties of direct-search methods applied to the minimization of discontinuous functions.
Model-based methods are also widely used in derivative-free optimization to solve MINLPs. In [16], the authors describe an open-source library, called RBFOpt, that uses surrogate models based on radial basis functions for handling bound constrained MINLPs. The same class of problems is also tackled in [44] through quadratic models. The latter work extends the trust-region derivative-free algorithm BOBYQA, introduced in [46] for continuous problems, to the mixed-integer case. Surrogate models employing radial basis functions are used in [43] to describe an algorithm, called SO-MI, able to converge to global optimizers of the problem almost surely. A similar algorithm, called SO-I, is proposed by the same authors in [42] to address integer global optimization problems. In [41], the authors propose an algorithm for MINLP problems that modifies the sampling strategy used in SO-MI and also uses an additional local search. Finally, Kriging models were effectively used in [27] and [25] to develop new sequential algorithms. Models can also be used to boost direct-search methods. For example, in NOMAD, i.e., the software package that implements the MADS algorithm (see [9, 32]), a surrogate-based model is used to generate promising points.
In [31] and [35] methods for black-box problems with only unrelaxable integer variables are devised. In particular, the authors in [31] propose a method for minimizing convex black-box integer problems that uses secant functions interpolating previous evaluated points. In [35], a new method based on a nonmonotone linesearch and primitive directions is proposed to solve a more general problem where the objective function is allowed to be nonconvex. The primitive directions allow the algorithm to escape bad local minima, thus providing the potential to find a global optimum, even if this typically requires the exploration of large neighborhoods.
In this paper, new derivative-free linesearch-type algorithms for mixed-integer nonlinearly constrained problems with possibly nonsmooth functions are proposed. The strategies successfully tested in [21] and [35] for continuous and integer problems, respectively, are combined to devise a globally convergent algorithmic framework that allows tackling the mixed-integer case. Continuous and integer variables are suitably handled by means of specific local searches in this case. On the one hand, a dense sequence of search directions is used to explore the subspace related to the continuous variables and detect descent directions, whose cone can be arbitrarily narrow due to nonsmoothness. On the other hand, a set of primitive discrete directions is adopted to guarantee a thorough exploration of the integer lattice in order to escape bad local minima. A first algorithm for bound constrained problems is developed, then it is adapted to handle the presence of general nonlinear constraints by using an exact penalty approach. Since only the violation of such constraints is included in the penalty function, the algorithm developed for bound constrained problems can be easily adapted to minimize the penalized problem.
With regard to the convergence results, it can be proved that particular sequences of iterates yielded by the two algorithms converge to suitably defined stationary points of the problem considered. In the generally constrained case, this result is based on the equivalence between the original problem and the penalized problem.
The paper is organized as follows. In Sect. 2, we report some definitions and preliminary results. In Sect. 3, we describe the algorithm proposed for mixed-integer problems with bound constraints and we analyze its convergence properties. The same type of analysis is reported in Sect. 4 for the algorithm addressing mixed-integer problems with general nonlinear constraints. Section 5 describes the results of extensive numerical experiments performed for both algorithms. Finally, in Sect. 6 we include some concluding remarks and we discuss future work.
2 Notation and preliminary results
Given a vector \(v\in {\mathbb {R}}^n\), we introduce the subvectors \(v_c\in {\mathbb {R}}^{|I^c|}\) and \(v_z\in {\mathbb {R}}^{|I^z|}\), given by
where \(v_i\) denotes the i-th component of v. When a vector is an element of an infinite sequence of vectors \(\{v_k\}\), the i-th component will be denoted as \((v_k)_i\), in order to avoid possible ambiguities. Moreover, throughout the paper we denote by \(\Vert \cdot \Vert\) the Euclidean norm.
The search directions considered in the algorithms proposed in the next sections have either a null continuous subvector or a null discrete subvector, meaning that we do not consider directions that update both continuous and discrete variables simultaneously. We first report the definition of primitive vector, used to characterize the subvectors of the search directions related to the integer variables. Then we move on to the properties of the subvectors related to the continuous variables.
From [35] we report the following definition of primitive vector.
Definition 1
(Primitive vector) A vector \(v\in {\mathbb {Z}}^n\) is called primitive if the greatest common divisor of its components \(\{v_1,v_2,\dots ,v_n\}\) is equal to 1.
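As an illustration, primitivity in the sense of Definition 1 can be verified directly via the greatest common divisor. The following sketch (the function name is ours, not the paper's) assumes an integer vector given as a list of components:

```python
from math import gcd
from functools import reduce

def is_primitive(v):
    """Return True if the integer vector v is primitive, i.e., the
    greatest common divisor of its components equals 1."""
    if all(c == 0 for c in v):
        return False  # the zero vector has no well-defined gcd equal to 1
    return reduce(gcd, (abs(c) for c in v)) == 1
```

For instance, \((2,3)\) is primitive while \((2,4)\) is not, since its components share the factor 2.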
Since the objective and constraint functions of the problem considered are assumed to be nonsmooth when fixing the discrete variables, proving convergence to a stationary point requires that particular subsequences of the continuous subvectors of the search directions satisfy a density property. Since the feasible descent directions can form an arbitrarily narrow cone (see, e.g., [3] and [6]), a finite number of search directions is indeed not sufficient. The unit sphere with respect to the continuous variables with center in the origin is denoted as
Then, the definition of a dense subsequence of directions given in [21] is extended to the mixed-integer case.
Definition 2
(Dense subsequence) Let K be an infinite subset of indices (possibly \(K=\{0,1,\dots \}\)) and \(\{s_k\} \subset S(0,1)\) a given sequence of directions. The subsequence \(\{s_k\}_K\) is said to be dense in S(0, 1) if, for any \({\bar{s}}\in S(0,1)\) and for any \(\epsilon > 0\), there exists an index \(k\in K\) such that \(\Vert s_k-{\bar{s}}\Vert \le \epsilon\).
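A dense sequence of directions in S(0, 1) can be generated in practice by normalizing a low-discrepancy sequence, e.g., a Halton sequence mapped from \([0,1]^n\) to \([-1,1]^n\). This is one common construction (the paper's actual generator may differ); the sketch below uses only the continuous components, with all names being ours:

```python
import math

def halton(index, base):
    """Van der Corput radical-inverse of `index` in the given base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)
        index //= base
        f /= base
    return result

def dense_sphere_sequence(n, num_points, primes=(2, 3, 5, 7, 11, 13)):
    """Generate points on the unit sphere S(0,1) in R^n by mapping the
    Halton sequence from [0,1]^n to [-1,1]^n and normalizing each point."""
    points, k = [], 1
    while len(points) < num_points:
        x = [2.0 * halton(k, primes[i]) - 1.0 for i in range(n)]
        norm = math.sqrt(sum(c * c for c in x))
        if norm > 1e-12:          # skip (unlikely) near-zero draws
            points.append([c / norm for c in x])
        k += 1
    return points
```

Every generated point lies on the unit sphere, and subsequences of this kind are typically used to satisfy Definition 2 in implementations.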
Similarly to what is done in [2], the definition of generalized directional derivative, which is also called Clarke directional derivative, given in [14] is extended to the mixed-integer case. This allows providing necessary optimality conditions for Problem (1.3). We also recall the definition of generalized gradient.
Definition 3
(Generalized directional derivative and generalized gradient) Let \(h: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) be a Lipschitz continuous function near \(x \in {\mathbb {R}}^n\) with respect to its continuous variables \(x_c\) [see, e.g., (1.2)]. The generalized directional derivative of h at x in the direction \(s \in {\mathbb {R}}^n\), with \(s_i = 0\) for \(i \in I^z\), is
To simplify the notation, the generalized gradient of h at x w.r.t. the continuous variables can be redefined as
Moreover, let us denote the orthogonal projection onto the set X as \([x]_{[l,u]}=\max \{l,\min \{u,x\}\}\) and the interior of a set \({\mathcal {C}}\) as \(\mathop {\mathcal{C}}\limits^{ \circ }\). These concepts will be used throughout the paper.
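The projection \([x]_{[l,u]}\) acts componentwise, clipping each coordinate to its bounds. A minimal sketch (function name is ours):

```python
def project_box(x, l, u):
    """Componentwise orthogonal projection [x]_{[l,u]} = max{l, min{u, x}}
    onto the box defined by the lower bounds l and upper bounds u."""
    return [max(li, min(ui, xi)) for xi, li, ui in zip(x, l, u)]
```

For example, with bounds \(l=(0,0,0)\) and \(u=(1,1,1)\), the point \((5,-2,0.5)\) projects to \((1,0,0.5)\).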
2.1 The bound constrained case
First, a simplified version of Problem (1.1) is considered, where only bound constraints are present in the definition of the feasible set. The resulting problem will also allow tackling the nonlinearly constrained case. An exact penalty approach is indeed adopted to deal with the general nonlinear constraints, thus giving rise to a bound constrained problem in the end. In particular, the following formulation is considered in this section:
Such a problem can be reformulated as follows
When considering Problem (2.2), the basic concept of feasible direction must be specialized to properly account for the presence of discrete variables alongside continuous ones. In particular, for discrete variables, primitive directions can be used to define the set of feasible directions. We hence suitably adapt the definition of feasible primitive direction set given in [35] to the mixed-integer case.
Definition 4
(Set of feasible primitive directions) Given a point \(x\in X\cap {{\mathcal {Z}}}\),
is the set of feasible primitive directions at x with respect to \(X\cap {\mathcal {Z}}\).
We introduce two definitions of neighborhood related to the discrete variables and we recall the definition of neighborhood related to the continuous variables. They are used to formally define local minimum points.
Definition 5
(Discrete neighborhood) Given a point \({\bar{x}}\in X\cap {\mathcal {Z}}\), the discrete neighborhood of \({\bar{x}}\) is
Definition 6
(Continuous neighborhood) Given a point \({\bar{x}}\in X\cap {\mathcal {Z}}\) and a scalar \(\rho > 0\), the continuous neighborhood of \({\bar{x}}\) is
Then, the definition of local minimum points for the bound constrained Problem (2.2) is reported. Basically, a point is referred to as a local minimum when:
-
(i)
it is a local minimum w.r.t. the continuous variables;
-
(ii)
it is a local minimum w.r.t. its discrete neighborhood.
Definition 7
(Local minimum point) A point \(x^*\in X\cap {{\mathcal {Z}}}\) is a local minimum point of Problem (2.2) if, for some \(\epsilon >0\),
Now, taking into account the presence of the bound constraints in Problem (2.2), the cone of feasible continuous directions is introduced. This set is used to define stationary points for Problem (2.2).
Definition 8
(Cone of feasible continuous directions) Given a point \(x\in X\cap {\mathcal {Z}}\), the set
is the cone of feasible continuous directions at x with respect to \(X\cap {\mathcal {Z}}\).
Now, the definition of Clarke–Jahn generalized directional derivative given in [28, Section 3.5] is extended to the mixed-integer case. As opposed to the Clarke directional derivative, in this definition the limit superior is considered only for points y and \(y + ts\) in \(X\cap {{\mathcal {Z}}}\), thus requiring stronger assumptions.
Definition 9
(Clarke–Jahn generalized directional derivative) Given a point \(x \in X\cap {{\mathcal {Z}}}\) with continuous subvector \(x_c\), the Clarke–Jahn generalized directional derivative of function f along direction \(s \in D^c(x)\) is given by:
Finally, a few basic stationarity definitions are reported which will be used in the convergence analysis. More specifically, it will be proved that limit points of the sequence generated by the proposed algorithmic framework exist which are stationary for Problem (2.2).
Definition 10
(Stationary point) A point \(x^*\in X\cap {{\mathcal {Z}}}\) is a stationary point of Problem (2.2) when
Definition 11
(Clarke stationary point) A point \(x^*\in X\cap {{\mathcal {Z}}}\) is a Clarke stationary point of Problem (2.2) when it satisfies
2.2 The nonsmooth nonlinearly constrained case
In this subsection, Problem (1.3) is considered. A local minimum point for this problem is defined as follows.
Definition 12
(Local minimum point) A point \(x^{*}\in {{\mathcal {F}}}\cap {{\mathcal {Z}}} \cap X\) is a local minimum point of Problem (1.3) if, for some \(\epsilon >0\),
Exploiting the necessary optimality conditions introduced in [21], the following KKT stationarity definition can be stated. This definition will also be used in the convergence analysis. More specifically, it will be proved that, using a simple penalty approach, limit points of the sequence generated by the proposed algorithmic framework exist which are KKT stationary for Problem (1.3).
Definition 13
(KKT stationary point) A point \(x^* \in {{\mathcal {F}}} \cap {{\mathcal {Z}}} \cap X\) is a KKT stationary point of Problem (1.3) if there exists a vector \(\lambda ^{*}\in {\mathbb {R}}^m\) such that, for every \(s\in D^c(x^{*})\),
and
3 An algorithm for bound constrained problems
In this section, an algorithm for solving the mixed-integer bound constrained problem defined in Problem (2.2) is proposed, and its convergence properties are analyzed. The optimization over the continuous and discrete variables is performed by means of two local searches based on linesearch algorithms that explore the feasible search directions, similarly to the procedures proposed in [21, 33, 34, 36]. In particular, the Projected Continuous Search described in Algorithm 1 and the Discrete Search described in Algorithm 2 are the methods adopted to investigate the directions associated with the continuous and discrete variables, respectively. The idea behind the line searches is to return a positive stepsize \(\alpha\), and hence update the current iterate, whenever a point providing a sufficient reduction of the objective function is found. In Algorithm 1 the sufficient decrease is controlled by the parameter \(\alpha\), while in Algorithm 2 the same role is played by \(\xi\). Once such a point is determined, an expansion step is performed to check whether the sufficient reduction can be achieved with a larger stepsize.
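The accept-then-expand mechanism shared by the two line searches can be sketched as follows for the discrete case: a stepsize is accepted when it yields a sufficient decrease \(f(x+\alpha d)\le f(x)-\xi\), and is then doubled as long as the decrease persists and feasibility holds. This is a simplified illustration of the mechanism, not the paper's Algorithm 2 (all names are ours, and the maximum-feasible-stepsize logic of Step 0 is folded into the feasibility test):

```python
def discrete_search(f, x, d, xi, alpha0, is_feasible):
    """Sketch of a discrete linesearch with expansion.  Returns a stepsize
    alpha > 0 such that f(x + alpha*d) <= f(x) - xi, doubling alpha while
    the sufficient decrease persists and the trial point stays feasible;
    returns 0 when the initial trial fails (linesearch failure)."""
    fx = f(x)
    alpha = alpha0
    y = [xc + alpha * dc for xc, dc in zip(x, d)]
    if not is_feasible(y) or f(y) > fx - xi:
        return 0  # failure: no sufficient decrease along d
    while True:   # expansion step: try a doubled stepsize
        beta = 2 * alpha
        y = [xc + beta * dc for xc, dc in zip(x, d)]
        if not is_feasible(y) or f(y) > fx - xi:
            return alpha
        alpha = beta
```

On a compact box the expansion terminates after finitely many doublings, mirroring the argument used in Proposition 2 below.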
The algorithm for bound constrained problems proposed in this section is called DFNDFL because its two main phases are inspired by two previously proposed algorithms. These two algorithms are named, respectively, DFN (an algorithm for continuous nonsmooth optimization, see [21]) and DFLINT (an algorithm for mixed-integer optimization, see [35]). Both can be freely downloaded from the DFL library at http://www.iasi.cnr.it/~liuzzi/DFL. DFNDFL performs an alternate minimization along continuous and discrete variables and is hence divided into two phases (see Algorithm 3 for a detailed scheme). Starting from a point \(x_0 \in X\cap {\mathcal {Z}}\), in Phase 1 the minimization over the continuous variables is performed by using the Projected Continuous Search (Step 7). If the line search fails, i.e., \(\alpha ^c_k = 0\), the tentative stepsize for continuous search directions is reduced (Step 9), otherwise the current iterate is updated (Step 11). Then, in Phase 2.A, the directions in the set \(D \subset D^z({\tilde{x}}_k)\), where \({\tilde{x}}_k\) is the current iterate obtained at the end of Phase 1, are investigated through the Discrete Search (Step 16). If the stepsize returned by the line search performed along a given primitive direction is 0, the corresponding tentative stepsize is halved (Step 18), otherwise the current iterate is updated (Step 20). The directions in D are explored until either a point leading to a sufficient decrease in the objective function is found or D contains no direction to explore. Note that the strategy starts with a subset of \(D^z(x_0)\), namely D, and gradually adds directions to it (see Phase 2.B) throughout the iterations. This choice reduces the computational cost of the algorithm.
If \({\tilde{x}}_k\) obtained at the end of Phase 1 is not modified in Phase 2.A, or if a direction in \(D_k\) exists along which the Discrete Search does not fail with \({\tilde{\alpha }}_{k}^{(d)}=1\), then \(D_{k+1}\) is set equal to \(D_k\) and D is set equal to \(D_{k+1}\) (Step 32). Otherwise, \(\xi _k\) is reduced (Step 24) and, if all the feasible primitive discrete directions at \({\tilde{x}}_k\) have been generated, \(D_{k+1}\) and D are not changed compared to the previous iteration (Step 26). Instead, when \(D_k \subset D^z({\tilde{x}}_k)\), \(D_k\) is enriched with new feasible primitive discrete directions (Steps 28–29) and the initial tentative stepsizes of the new directions are set equal to 1.
It is worth noticing that the positive parameter \(\xi _k\) plays an important role in the algorithm, since it governs the sufficient decrease of the objective function value within the Discrete Search. The update of the parameter is performed at Step 24. More specifically, we shrink the value of the parameter when both the current iterate is not updated in Phase 2.A and the Discrete Search fails (with \({\tilde{\alpha }}_{k}^{(d)}=1\)) along each direction in \(D_k\). Hence, Algorithm DFNDFL is not the mere union of the two algorithms DFN [21], for continuous nonsmooth problems, and DFLINT [35], for integer problems. In order to prove convergence of the proposed algorithm to stationary points, the optimization with respect to the discrete variables (i.e., Phase 2.A of DFNDFL) must indeed be carried out by guaranteeing a sufficient decrease with respect to the parameter \(\xi _k\). Such a parameter is possibly decreased in Phase 2.B of the algorithm (when the whole iteration “fails”) and eventually goes to zero.
The following propositions guarantee that the algorithm is well-defined. In particular, Proposition 1 follows the same reasoning as [21, Proposition 2.4].
Proposition 1
The Projected Continuous Search cannot infinitely cycle between Step 4 and Step 6.
Proof
We assume by contradiction that in the Projected Continuous Search an infinite monotonically increasing sequence of positive numbers \(\{\beta _j\}\) exists such that \(\beta _j \rightarrow \infty\) for \(j \rightarrow \infty\) and
Since, by the instructions of the procedure, we have that \([w+\beta _j {{\tilde{p}}}]_{[l,u]} \in X \cap {{\mathcal {Z}}}\), the previous relation contradicts the compactness of X and the continuity of the function f, thus concluding the proof. \(\square\)
Proposition 2
The Discrete Search cannot infinitely cycle between Step 2 and Step 4.
Proof
First, note that since \(X\cap {{\mathcal {Z}}}\) is compact, the maximum stepsize \({\bar{\alpha }}\) computed at Step 0 is finite. At the j-th iteration of the Discrete Search we have \(\beta = 2^j\min \{{\bar{\alpha }},{\tilde{\alpha }}\}\). Given the termination condition at Step 3, the index j cannot exceed \(\lceil \log _2({\bar{\alpha }}/\min \{{\bar{\alpha }},{\tilde{\alpha }}\})\rceil\), thus concluding the proof. \(\square\)
The following proposition shows that the Projected Continuous Search returns stepsizes that eventually go to zero. This proposition will be used to prove stationarity with respect to the continuous variables.
Proposition 3
Let \(\{\alpha _k^{c}\}\) and \(\{{\tilde{\alpha }}_k^{c}\}\) be the sequences yielded by Algorithm DFNDFL. Then
Proof
The proof follows with minor modifications from the proof of Proposition 2.5 in [21], by noting that inequality (2.5) of [21] also holds here. \(\square\)
The following proposition will be used to show that every limit point of a particular subsequence of iterates is a local minimum with respect to the discrete variables.
Proposition 4
Let \(\{\xi _k\}\) be the sequence produced by Algorithm DFNDFL. Then
Proof
By the instructions of Algorithm DFNDFL, it follows that \(0< \xi _{k+1} \le \xi _k\) for all k, meaning that the sequence \(\{\xi _k\}\) is monotonically nonincreasing. Hence, \(\{\xi _k\}\) converges to a limit \(M \ge 0\). Suppose, by contradiction, that \(M > 0\). This implies that an index \({\bar{k}} > 0\) exists such that
for all \(k\ge {\bar{k}}\). Moreover, for every index \(k\ge {\bar{k}}\), a direction \(d \in D^z({\tilde{x}}_k)\) exists such that
where the inequalities follow from the instructions of Algorithms 2–3 and the equality follows from (3.2). Relation (3.3) implies \(f(x_k)\rightarrow -\infty\), which contradicts the continuity of f on the compact set X. This concludes the proof. \(\square\)
Remark 2
By the preceding proposition and the updating rule of the parameter \(\xi _k\) used in Algorithm DFNDFL, it follows that the set
is infinite.
The previous result is used to prove the next lemma, which in turn is essential to prove the global convergence result related to the continuous variables. This lemma states that the asymptotic convergence properties of the sequence \(\{s_k\}\) still hold when the projection operator is adopted. Its proof closely resembles the proof in [21, Lemma 2.6].
Lemma 1
Let \(\{x_k\}\) and \(\{s_k\}\) be the sequence of points and the sequence of continuous search directions produced by Algorithm DFNDFL, respectively, and \(\{\eta _k\}\) be a sequence such that \(\eta _k>0\), for all k. Further, let K be a subset of indices such that
with \({\bar{x}}\in X\cap {\mathcal {Z}}\) and \({\bar{s}} \in D^c({\bar{x}})\), \({\bar{s}}\ne 0\). Then,
-
(i)
for all \(k \in K\) sufficiently large,
$$\begin{aligned} {[}x_k+\eta _k s_k]_{[l,u]}\ne x_k, \end{aligned}$$ -
(ii)
the following limit holds
$$\begin{aligned} \displaystyle \lim _{k\rightarrow \infty , k\in K}v_k = {\bar{s}}, \end{aligned}$$where
$$\begin{aligned} v_k=\frac{[x_k+\eta _k s_k]_{[l,u]}-x_k}{\eta _k}. \end{aligned}$$(3.7)
Proof
Since \(x_k\in X\cap {\mathcal {Z}}\) and (3.4) holds, we have necessarily that \((x_k)_z = {\bar{x}}_z\) for \(k\in K\) sufficiently large. Now, the proof follows with minor modifications from the proof of Lemma 2.6 in [21]. \(\square\)
The convergence result related to the continuous variables is proved in the next proposition. It will be used in the main convergence result at the end of the section. It basically states that every limit point of the subsequence of iterates defined by the set H (see Remark 2) is a stationary point with respect to the continuous variables.
Proposition 5
Let \(\{x_k\}\) be the sequence of points produced by Algorithm DFNDFL. Let \(H\subseteq \{1,2,\dots \}\) be defined as in Remark 2 and let \({\bar{x}} \in X \cap {{\mathcal {Z}}}\) be any accumulation point of \(\{x_k\}_H\). If the subsequence \(\{s_k\}_H\), with \((s_k)_i=0\) for \(i \in I^z\), is dense in the unit sphere (see Definition 2), then \({\bar{x}}\) satisfies
Proof
For any accumulation point \({\bar{x}}\) of \(\{x_k\}_H\), let \(K\subseteq H\) be an index set such that
Notice that, for all \(k\in K\), \(({\tilde{x}}_k)_z = (x_k)_z\) and \({\tilde{\alpha }}_k^{(d)}=1\), \(d \in D_k\), by the instructions of Algorithm DFNDFL. Hence, for all \(k\in K\), by recalling (3.9), the discrete variables are no longer updated.
Now, recalling Proposition 3 and Lemma 1, and repeating the proof of [21, Proposition 2.7], it can be shown that no direction \({\bar{s}}\in D^c({\bar{x}})\cap S(0,1)\) can exist such that
thus concluding the proof. \(\square\)
The next proposition states that every limit point of the subsequence of iterates defined by the set H (see Remark 2) is a local minimum with respect to the discrete variables. It will be used in the main convergence result at the end of the section.
Proposition 6
Let \(\{x_k\}\), \(\{{\tilde{x}}_k\}\), and \(\{\xi _k\}\) be the sequences produced by Algorithm DFNDFL. Let \(H\subseteq \{1,2,\dots \}\) be defined as in Remark 2 and \(x^* \in X \cap {{\mathcal {Z}}}\) be any accumulation point of \(\{x_k\}_H\), then
Proof
Let \(K \subseteq H\) be an index set such that
For every \(k\in K\subseteq H\), we have
meaning that the discrete variables are no longer updated by the Discrete Search.
Let us consider any point \({\bar{x}}\in {{\mathcal {B}}}^z(x^*)\). By the definition of discrete neighborhood \({{\mathcal {B}}}^z(x^*)\), a direction \({\bar{d}} \in D^z(x^*)\) exists such that
Recalling the steps in Algorithm DFNDFL, we have, for all \(k\in H\) sufficiently large, that
Further, by Proposition 3, we have
Then, for all \(k\in K\) sufficiently large, (3.11) and (3.12) imply
Hence, for all \(k\in K\) sufficiently large, by the definition of discrete neighborhood we have \({\bar{d}} \in D^z({\tilde{x}}_k)\) and
Then, since \(k \in K\subseteq H\), by the definition of H we have
Now, by Proposition 4, and taking the limit for \(k\rightarrow \infty\), with \(k\in K\), in (3.13), the result follows. \(\square\)
Now, the main convergence result of the algorithm can be proved.
Theorem 1
Let \(\{x_k\}\) be the sequence of points generated by Algorithm DFNDFL. Let \(H\subseteq \{1,2,\dots \}\) be defined as in Remark 2 and let \(\{s_k\}_H\), with \((s_k)_i=0\) for \(i \in I^z\), be a dense subsequence in the unit sphere. Then,
-
(i)
a limit point of \(\{x_k\}_H\) exists;
-
(ii)
every limit point \(x^*\) of \(\{x_k\}_H\) is stationary for Problem (2.2).
Proof
As regards point (i), since \(\{x_k\}_H\) belongs to the compact set \(X \cap {{\mathcal {Z}}}\), it admits limit points. The proof of point (ii) follows by considering Propositions 5 and 6. \(\square\)
4 An algorithm for nonsmooth nonlinearly constrained problems
In this section, the nonsmooth nonlinearly constrained problem defined in Problem (1.3) is considered. The nonlinear constraints are handled through a simple penalty approach (see, e.g., [21]). In particular, given a parameter \(\varepsilon > 0\), the following penalty function is introduced
which allows us to define the following bound constrained problem
Hence, only the nonlinear constraints are penalized and the minimization is performed over the set \(X \cap {\mathcal {Z}}\). The algorithm described in Sect. 3 is thus suited for solving this problem, as highlighted in the following remark.
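A penalty of this type can be sketched as follows, assuming the standard exact penalty structure used in [21], \(P(x;\varepsilon )=f(x)+\tfrac{1}{\varepsilon }\sum _{i=1}^m \max \{0,g_i(x)\}\) (the exact displayed formula in the paper may differ; all names below are ours):

```python
def penalty(f, g, x, eps):
    """Sketch of an exact penalty function: only the violation of the
    nonlinear constraints g_i(x) <= 0 is penalized, weighted by 1/eps;
    the bound constraints X are kept explicit and are not penalized."""
    violation = sum(max(0.0, gi) for gi in g(x))
    return f(x) + violation / eps
```

Minimizing this function over \(X\cap {\mathcal {Z}}\) is a problem of the same form as (2.2), which is why Algorithm DFNDFL applies with f replaced by \(P(\cdot ;\varepsilon )\).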
Remark 3
Observe that, for any \(\varepsilon >0\), the structure and properties of Problem (4.1) are the same as those of Problem (2.2). The Lipschitz continuity with respect to the continuous variables of the penalty function \(P(x;\varepsilon )\) follows from the Lipschitz continuity of f and \(g_i\), with \(i \in \{1,\dots ,m\}\). In particular, denoting by \(L_f\) and \(L_{g_i}\) the Lipschitz constants of f and \(g_i\), respectively, we have that the Lipschitz constant of the penalty function \(P(x;\varepsilon )\) is
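Under the penalty structure of (4.1), where only the constraint violation is penalized, a natural computation of this constant runs as follows (a sketch, assuming the penalty form \(P(x;\varepsilon )=f(x)+\tfrac{1}{\varepsilon }\sum _{i=1}^m \max \{0,g_i(x)\}\)):

```latex
% Each term max{0, g_i(.)} is L_{g_i}-Lipschitz, since max{0, .} is
% 1-Lipschitz; summing the constants of all terms of P(.;eps) gives
L_P \;=\; L_f \;+\; \frac{1}{\varepsilon }\sum _{i=1}^{m} L_{g_i}.
```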
To prove the equivalence between Problem (1.3) and Problem (4.1), we report an extended version of the Mangasarian–Fromovitz Constraint Qualification (EMFCQ) [24, 39] for Problem (1.3), which takes into account its mixed-integer structure. This condition states that, at any point that is infeasible for Problem (1.3), there exists a direction, feasible with respect to \(X \cap {\mathcal {Z}}\) (according to Definitions 4 and 8), that guarantees a reduction in the constraint violation.
Assumption 7
(EMFCQ for mixed-integer problems) Let us consider Problem (1.3). For any \(x\in (X\cap {\mathcal {Z}})\setminus {\mathop {{\mathcal {F}}}\limits ^{\circ }}\), one of the following conditions holds:
(i) a direction \(s\in D^c(x)\) exists such that
$$\begin{aligned} (\xi ^{g_i})^\top s < 0, \end{aligned}$$
for all \(\xi ^{g_i}\in {\partial _{c}} g_i(x)\) with \(i\in \{h \in \{1,2,\dots ,m\}: \ g_h(x)\ge 0\}\);
(ii) a direction \({\bar{d}} \in D^z(x)\) exists such that
$$\begin{aligned} \sum _{i=1}^m \max \{0, g_i(x+{\bar{d}})\} < \sum _{i=1}^m \max \{0, g_i(x)\}. \end{aligned}$$
In order to prove the main convergence properties of the algorithm in this case, the equivalence between the original constrained Problem (1.3) and the penalized Problem (4.1) must be established first. The proof of this result is rather technical and quite similar to that of analogous results from [21]; we report it in the Appendix for the sake of clarity.
Exploiting this technical result, the algorithm proposed in Sect. 3 can be used to solve Problem (4.1), provided that the penalty parameter is sufficiently small, as stated in the next proposition. The algorithmic scheme designed for solving Problem (1.3) is obtained from Algorithm DFNDFL by replacing f(x) with \(P(x;\varepsilon )\), where \(\varepsilon >0\) is a sufficiently small value; in particular, both linesearch procedures are performed on \(P(x;\varepsilon )\) as well. We refer to this new scheme as DFNDFL–CON.
Proposition 8
Let Assumption 7 hold and let \(\{x_k\}\) be the sequence produced by Algorithm DFNDFL–CON. Let \(H\subseteq \{1,2,\dots \}\) be defined as in Remark 2 and let \(\{s_k\}_H\), with \((s_k)_i=0\) for \(i \in I^z\), be a dense subsequence in the unit sphere. Then, \(\{x_k\}_H\) admits limit points. Furthermore, a threshold value \(\varepsilon ^*\) exists such that for all \(\varepsilon \in (0, \varepsilon ^*]\) every limit point \(x^*\) of \(\{x_k\}_H\) is stationary for Problem (1.3).
Proof
The proof follows from Proposition 14 and Theorem 1. \(\square\)
5 Numerical experiments
In this section, results of the numerical experiments performed on a set of test problems selected from the literature are reported. In particular, state-of-the-art solvers are used as benchmarks to test the efficiency and reliability of the proposed algorithm. First, the bound constrained case is considered, then nonlinearly constrained problems are tackled. In both cases, to improve the performance of DFNDFL, a modification to Phase 1 is introduced by drawing inspiration from the algorithm CS-DFN proposed in [21] for continuous nonsmooth problems. In particular, recalling that \(I^c \cup I^z = \{1,2,\ldots ,n\}\), the change consists in investigating the set of coordinate directions \(\{\pm e^1, \pm e^2, \ldots , \pm e^{|I^c|}\}\) before exploring a direction from the sequence \(\{s_k\}\). Since this set is constant over the iterations, the actual stepsizes \(\alpha _k^{(i)}\) and tentative stepsizes \({\tilde{\alpha }}_k^{(i)}\) can be stored for each coordinate direction i, with \(i \in \{1,2,\ldots ,n\}\). These stepsizes are reduced whenever the projected continuous linesearch, i.e., Algorithm 1, does not determine any point that satisfies the sufficient decrease condition. When their values become sufficiently small, a direction from the dense sequence \(\{s_k\}\) is explored. This improvement allows the algorithm to benefit from the stepsizes \(\alpha _k^{(i)}\) and \({\tilde{\alpha }}_k^{(i)}\), whose values reflect the sensitivity of the objective function along the coordinate directions as observed across the iterations. The use of these coordinate directions thus allows the algorithm to capture the local behaviour of the function through the actual/tentative stepsizes, so that the information gathered in previous iterations can be exploited when searching for a new point.
Furthermore, the dense set of directions is used only when really needed (i.e., only when approaching a point where the function is actually nonsmooth). Therefore, the modified DFNDFL is expected to be more efficient.
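The stepsize bookkeeping described above can be sketched as follows. This is an illustrative sketch: the helper names, the reduction rule, and the switching tolerance are assumptions, not the authors' exact implementation.

```python
def update_stepsize(alpha, alpha_tilde, i, accepted_step, theta=0.5):
    """Update the per-coordinate stepsize memory after a linesearch along +-e^i.

    alpha[i]       : actual stepsize accepted at the last successful search
    alpha_tilde[i] : tentative stepsize to try first at the next iteration
    accepted_step  : stepsize satisfying sufficient decrease, or None on failure
    """
    if accepted_step is not None:
        alpha[i] = accepted_step
        alpha_tilde[i] = accepted_step   # reuse the accepted step next time
    else:
        alpha[i] = 0.0
        alpha_tilde[i] *= theta          # shrink the tentative stepsize

def switch_to_dense(alpha_tilde, cont_indices, tol=1e-6):
    """Explore a direction from the dense sequence {s_k} only once all
    coordinate tentative stepsizes have become sufficiently small."""
    return all(alpha_tilde[i] <= tol for i in cont_indices)
```

In this way a failed linesearch along \(\pm e^i\) progressively shrinks \({\tilde{\alpha }}_k^{(i)}\), and the dense directions are explored only after every coordinate stepsize has collapsed below the tolerance.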
The codes related to the DFNDFL and DFNDFL–CON algorithms, together with the test problems used in the experiments, are freely available for download at the Derivative-Free Library web page http://www.iasi.cnr.it/~liuzzi/DFL/.
5.1 Algorithms for benchmarking
The algorithms selected as benchmarks are listed below:
- DFL box (see [33]), a derivative-free linesearch algorithm for bound constrained problems.
- DFL gen (see [34]), a derivative-free linesearch algorithm for nonlinearly constrained problems.
- RBFOpt (see [16]), an open-source library for solving black-box bound constrained optimization problems with expensive function evaluations.
- NOMAD v.3.9.1 (see [9]), a software package implementing the MADS algorithm.
- MISO (see [41]), the Mixed-Integer Surrogate Optimization framework, a model-based approach using surrogates.
All the algorithms reported above support mixed-integer problems, and are thus suited for a comparison with the algorithm proposed in this work. The maximum allowable number of function evaluations in each experiment is 5000. All the codes have been run on an Intel Core i7 10th generation CPU PC running Windows 10 with 16GB of memory. More precisely, all the test problems, DFNDFL, DFL box, DFL gen and RBFOpt are coded in Python and have been run using Python 3.8. NOMAD, on the other hand, is delivered as a collection of C++ codes and has been run using the provided PyNomad Python interface. As for MISO, it is coded in MATLAB and has been run using MATLAB R2020b, calling the Python-coded test problems through the MATLAB Python engine.
As regards the parameters used in both DFNDFL and DFL, the values used in the experiments are \(\gamma = 10^{-6}, \, \delta = 0.5, \, \xi _0 = 1\), and \(\theta = 0.5\). Moreover, the initial tentative stepsizes along the coordinate directions \(\pm e^i\) and \(s_k\) of the modified DFNDFL are
while for the discrete directions d the initial tentative stepsize \({\tilde{\alpha }}_0^{(d)}\) is fixed to 1.
Another computational aspect that needs to be further discussed is the generation of the continuous and discrete directions. Indeed, in Phases 1 and 2 of DFNDFL, new search directions might be generated to thoroughly explore neighborhoods of the current iterate. To this end, a dense sequence of directions \(\{s_k\}\) is required in Phase 1 to explore the continuous variables and, in particular, the Sobol sequence [13, 48] is used. Similarly, in Phase 2, new primitive discrete directions must be generated when some suitable conditions hold. In these cases, the Halton sequence [26] is used.
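As an illustration of how such low-discrepancy points can be generated, the following is a from-scratch Halton generator (the Sobol sequence used in Phase 1 requires direction numbers and is typically taken from a library); the function name is illustrative.

```python
def halton_point(index, bases):
    """Return the index-th point (index >= 1) of the Halton sequence in
    [0, 1)^n, using one coprime base (e.g. the first n primes) per coordinate."""
    point = []
    for b in bases:
        f, r, i = 1.0, 0.0, index
        while i > 0:                 # radical inverse of `index` in base b
            f /= b
            r += f * (i % b)
            i //= b
        point.append(r)
    return point
```

For example, in base 2 the sequence starts 1/2, 1/4, 3/4, 1/8, ...; a point in \([0,1)^n\) can then be mapped to a search direction, e.g. by shifting to \([-1,1)^n\) and normalizing.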
As concerns the parameters used for running RBFOpt and NOMAD, the former is executed using the default values, while for the latter two different configurations are considered. The first one is based on the default settings, while the second one disables the usage of models in the search phase, which is obtained by setting the DISABLE MODELS option. This second version is denoted in the remainder of this document as NOMAD (w/o mod).
5.2 Test problems
The comparison between Algorithm DFNDFL and some state-of-the-art solvers is reported for 24 bound constrained problems. The first 16 problems, which are related to minimax and nonsmooth unconstrained optimization problems, have been selected from [38, Sections 2 and 3], while the remaining 8 problems have been chosen from [41, 42]. The problems are listed in Table 1 along with the respective number of continuous (\(n_c\)) and discrete (\(n_z\)) variables.
Since the problems from [38] are unconstrained, in order to suit the class of problems addressed in this work, the following bound constraints are considered for each variable
where \({\tilde{x}}_0\) is the starting point. Furthermore, since the problems in [38] have only continuous variables, the rule applied to obtain mixed-integer problems is to consider a number of integer variables equal to \(n_z = \left\lfloor {n/2}\right\rfloor\) and a number of continuous variables equal to \(n_c = \left\lceil {n/2}\right\rceil\), where n denotes the dimension of each original problem and \(\left\lfloor {\cdot }\right\rfloor\) and \(\left\lceil {\cdot }\right\rceil\) are the floor and ceiling operators, respectively.
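The splitting rule above amounts to a one-line computation; the helper name is illustrative.

```python
def split_dimensions(n):
    """Split an n-dimensional problem into n_c continuous and n_z integer
    variables with n_z = floor(n/2) and n_c = ceil(n/2)."""
    n_z = n // 2        # floor(n / 2)
    n_c = n - n_z       # equals ceil(n / 2), since n_c + n_z = n
    return n_c, n_z
```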
More specifically, let us consider both the continuous bound constrained optimization problems from [38], whose formulation is
and the original mixed-integer bound constrained problems from [41, 42], which can be stated as
The resulting mixed-integer problem we deal with in here can be formulated as follows
where \(f(x) = {\tilde{f}}({\tilde{x}})\) with
Moreover, the starting point \(x_0\) adopted for Problem (5.3) is
As concerns constrained mixed-integer optimization problems, the performance of Algorithm DFNDFL–CON is assessed on 204 problems with general nonlinear constraints. Such problems are obtained by adding to the 34 bound-constrained problems defined by Problem (5.3) the 6 classes of constraints reported below, proposed in [29]
Thus, the number of general nonlinear constraints ranges from 1 to 59.
5.3 Data and performance profiles
The comparison among the algorithms is carried out by using data and performance profiles, which are benchmarking tools widely used in derivative-free optimization (see [20, 40]). In particular, given a set S of algorithms, a set P of problems, and a convergence test, data and performance profiles provide complementary information to assess the relative performance among the different algorithms in S when applied to solve problems in P. Specifically, data profiles allow gaining insight into the percentage of problems that are solved (according to the convergence test defined below) by each algorithm within a given budget of function evaluations. On the other hand, performance profiles allow assessing how well an algorithm performs with respect to the others. For each \(s\in S\) and \(p \in P\), the number of function evaluations required by algorithm s to satisfy the convergence condition on problem p is denoted as \(t_{p,s}\). Given a tolerance \(0< \tau < 1\), the convergence test is
where:
- \(f(x_k)\) is the objective function value computed at \(x_k\). When dealing with problems with general constraints, we set to \(+\infty\) the value of the objective function at infeasible points;
- \({\hat{f}}(x_0)\) is the objective function value of the worst feasible point determined by all the solvers (note that in the bound-constrained case, \({\hat{f}}(x_0) = f (x_0)\));
- \(f_L\) is the smallest feasible objective function value computed by any algorithm on the considered problem within the given number of 5000 function evaluations.
The above convergence test requires the best point to achieve a sufficient reduction from the value \({\hat{f}}(x_0)\) of the objective function at the starting point. Note that the smaller the value of the tolerance \(\tau\), the higher the accuracy required at the best point. In particular, three levels of accuracy are considered in this paper for the parameter \(\tau\), namely, \(\tau \in \{10^{-1}, 10^{-3},10^{-5}\}\).
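The convergence test can be written directly in code. This is a one-line sketch following the benchmarking methodology of [40]; the function name is illustrative.

```python
def converged(f_xk, f0_hat, f_L, tau):
    """More'-Wild sufficient-reduction test: the best point found must close
    at least a (1 - tau) fraction of the gap between f0_hat and f_L."""
    return f_xk <= f_L + tau * (f0_hat - f_L)
```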
Performance and data profiles of solver s can be formally defined as follows
where \(n_p\) is the dimension of problem p. The value \(\alpha\) indicates that the number of function evaluations required by algorithm s to achieve the best solution is \(\alpha\) times the number needed by the best algorithm, while \(\kappa\) denotes the number of simplex gradient estimates, \(n_p + 1\) being the number of evaluations associated with one simplex gradient. Important features for the comparison are \(\rho _s(1)\), which measures the efficiency of the algorithm, since it is the percentage of problems on which algorithm s performs best, and the height reached by each profile as the value of \(\alpha\) or \(\kappa\) increases, which measures the reliability of the solver.
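Given the counts \(t_{p,s}\), both profiles reduce to simple fractions over the problem set. A minimal dictionary-based sketch, with \(t_{p,s}=\infty\) when a solver never satisfies the convergence test; names are illustrative.

```python
import math

def make_profiles(t, dims):
    """t[p][s]: evaluations solver s needs to pass the convergence test on
    problem p (math.inf if never); dims[p]: dimension n_p of problem p."""
    best = {p: min(row.values()) for p, row in t.items()}  # best solver per problem
    n_prob = len(t)

    def rho(s, alpha):
        # performance profile: fraction of problems on which solver s needs
        # at most alpha times the evaluations of the best solver
        return sum(t[p][s] <= alpha * best[p] for p in t) / n_prob

    def data(s, kappa):
        # data profile: fraction of problems solved within kappa simplex-
        # gradient budgets, one budget being n_p + 1 evaluations
        return sum(t[p][s] <= kappa * (dims[p] + 1) for p in t) / n_prob

    return rho, data
```

Here \(\rho _s(1)\) is exactly the fraction of problems on which solver s matches the best evaluation count, i.e. the efficiency measure discussed above.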
5.3.1 The bound constrained case
Figure 1 reports performance and data profiles related to the comparison of the algorithms that do not employ models, namely DFNDFL, DFL box and NOMAD (w/o mod). In this case, DFNDFL turns out to be the most efficient and reliable algorithm, regardless of the accuracy required. DFL box is more efficient than NOMAD for all values of \(\tau\), whereas NOMAD is (slightly) more reliable for \(\tau = 10^{-1}\) and \(10^{-3}\). It is worth noticing that the remarkable percentage of problems solved for \(\alpha =1\) is an important result for DFNDFL, since it shows that using more sophisticated directions than DFL box does not lead to a significant loss of efficiency. We also point out that the initial continuous and primitive search directions used by DFNDFL are the coordinate directions, which are the same as those employed in DFL box, thus leading to the same behavior of the two algorithms in the first iterations. For each value of \(\tau\), despite its remarkable efficiency, DFL box does not show a strong reliability, which is significantly improved by DFNDFL.
Next we compare DFNDFL against solvers that make use of sophisticated models to improve the search, namely NOMAD (using models), RBFOpt and MISO. We point out that these three solvers exploit different kinds of models: NOMAD takes advantage of quadratic models, whereas RBFOpt and MISO make use of radial basis function models. It is important to highlight that both MISO and RBFOpt are designed for a low budget of function evaluations (i.e., to obtain large improvements at the very beginning of the search); they do not have the capability of finding very accurate local solutions, and thus are in general not expected to be competitive when high precision is required.
Figure 2 reports performance and data profiles for the three considered levels of accuracy when DFNDFL is compared with the algorithms that make use of models.
From the performance and data profiles, we can note that DFNDFL is competitive with the other methods for low precisions, and gives better results than the other methods, both in terms of efficiency and reliability, when precision gets higher.
These numerical results highlight that DFNDFL has remarkable efficiency and compares favorably to state-of-the-art solvers in terms of reliability, thus confirming and strengthening the properties of DFL box and providing a noticeable contribution to derivative-free optimization solvers for bound constrained problems.
5.3.2 The nonlinearly constrained case
The algorithms adopted for comparison in this case are DFL gen and the two versions of NOMAD (w/ and w/o models). We point out that NOMAD handles the constraints by the progressive/extreme barrier approach (see [7, 10, 32]), specified through the PEB constraints type. We only used NOMAD and the constrained version of DFL box (namely DFL gen) in this further comparison, due to their better performance in the bound constrained case and their explicit handling of nonlinearly constrained problems.
Figure 3 reports performance and data profiles, for the comparison of DFNDFL–CON, DFL gen and NOMAD (not using models). The figure quite clearly shows that DFNDFL–CON is the most efficient and reliable algorithm, and the difference with the other algorithms significantly grows as the level of accuracy increases. It is important to highlight that using the primitive directions allows our algorithm to improve the strategy of DFL, which only uses the set of coordinate directions. This results in a larger percentage of problems solved by DFNDFL–CON, even when compared with NOMAD (w/ mod).
Finally, Fig. 4 reports the comparison between DFNDFL–CON and NOMAD using models on the set of 204 constrained problems. Also in this case, it emerges that DFNDFL–CON is competitive with NOMAD both in terms of data and performance profiles.
To conclude, these numerical results show that DFNDFL–CON has remarkable efficiency and reliability when compared to state-of-the-art solvers.
6 Conclusions
In this paper, new linesearch-based methods for mixed-integer nonsmooth optimization problems have been developed, assuming that first-order information on the problem functions is not available. First, a general framework for bound constrained problems has been described. Then, an exact penalty approach has been proposed and embedded into this framework, thus allowing the method to handle nonlinear (possibly nonsmooth) constraints. Two different sets of directions are adopted to deal with the mixed variables: on the one hand, a dense sequence of continuous search directions is required to detect descent directions; on the other hand, primitive directions are employed to suitably explore the integer lattice, avoiding getting trapped at bad points.
Numerical experiments have been performed on both bound and nonlinearly constrained problems. The results highlight that the proposed algorithms perform well when compared with state-of-the-art solvers, thus providing a good tool for handling the considered class of derivative-free optimization problems.
Data availability
The datasets generated during and/or analysed during the current study are available in the DFL repository, http://www.iasi.cnr.it/~liuzzi/dfl as package DFNDFL.
Change history
22 July 2022
Missing Open Access funding information has been added in the Funding Note.
References
Abramson, M., Audet, C.: Filter pattern search algorithms for mixed variable constrained optimization problems. Pac. J. Optim. 3, 1–10 (2004)
Abramson, M., Audet, C., Chrissis, J., Walston, J.: Mesh adaptive direct search algorithms for mixed variable optimization. Optim. Lett. 3, 35–47 (2009). https://doi.org/10.1007/s11590-008-0089-2
Abramson, M., Audet, C., Dennis, J.E., Jr., Le Digabel, S.: OrthoMADS: a deterministic MADS instance with orthogonal directions. SIAM J. Optim. 20, 948–966 (2009). https://doi.org/10.1137/080716980
Audet, C., Dennis, J.E., Jr.: Pattern search algorithms for mixed variable programming. SIAM J. Optim. (2001). https://doi.org/10.1137/S1052623499352024
Audet, C., Dennis, J.E., Jr.: A pattern search filter method for nonlinear programming without derivatives. SIAM J. Optim. (2004). https://doi.org/10.1137/S105262340138983X
Audet, C., Dennis, J.E., Jr.: Mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. 17, 188–217 (2006). https://doi.org/10.1137/040603371
Audet, C., Dennis, J.E., Jr.: A progressive barrier for derivative-free nonlinear programming. SIAM J. Optim. 20(1), 445–472 (2009)
Audet, C., Hare, W.: Derivative-Free and Blackbox Optimization. Springer, New York (2017). https://doi.org/10.1007/978-3-319-68913-5
Audet, C., Le Digabel, S., Tribes, C., Montplaisir, V.R.: The NOMAD project. https://www.gerad.ca/nomad/
Audet, C., Dennis, J.E., Jr., Le Digabel, S.: Globalization strategies for mesh adaptive direct search. Comput. Optim. Appl. 46, 193–215 (2010). https://doi.org/10.1007/s10589-009-9266-1
Audet, C., Le Digabel, S., Tribes, C.: The mesh adaptive direct search algorithm for granular and discrete variables. SIAM J. Optim. 29, 1164–1189 (2019). https://doi.org/10.1137/18M1175872
Boukouvala, F., Misener, R., Floudas, C.: Global optimization advances in mixed-integer nonlinear programming, minlp, and constrained derivative-free optimization, cdfo. Eur. J. Oper. Res. (2015). https://doi.org/10.1016/j.ejor.2015.12.018
Bratley, P., Fox, B.L.: Algorithm 659: implementing Sobol’s quasirandom sequence generator. ACM Trans. Math. Softw. 14(1), 88–100 (1988). https://doi.org/10.1145/42288.214372
Clarke, F.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
Conn, A., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. MPS-SIAM Book Series on Optimization, SIAM, Philadelphia (2009)
Costa, A., Nannicini, G.: RBFOpt: an open-source library for black-box optimization with costly function evaluations. Math. Program. Comput. 10, 597–629 (2018). https://doi.org/10.1007/s12532-018-0144-7
Custódio, A.L., Vicente, L.N.: Using sampling and simplex derivatives in pattern search methods. SIAM J. Optim. 18, 537–555 (2007). https://doi.org/10.1137/050646706
Custódio, A.L., Dennis, J.E., Jr., Vicente, L.N.: Using simplex gradients of nonsmooth functions in direct search methods. IMA J. Numer. Anal. 28(4), 770–784 (2008). https://doi.org/10.1093/imanum/drn045
Di Pillo, G., Facchinei, F.: Exact barrier function methods for Lipschitz programs. Appl. Math. Optim. 32(1), 1–31 (1995)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Fasano, G., Liuzzi, G., Lucidi, S., Rinaldi, F.: A linesearch-based derivative-free approach for nonsmooth constrained optimization. SIAM J. Optim. 24, 959–992 (2014). https://doi.org/10.1137/130940037
Garcìa-Palomares, U.M., Rodriguez, J.F.: New sequential and parallel derivative-free algorithms for unconstrained minimization. SIAM J. Optim. 13, 79–96 (2002). https://doi.org/10.1137/S1052623400370606
Garcìa-Palomares, U.M., Costa-Montenegro, E., Asorey Cacheda, R., González-Castaño, F.: Adapting derivative free optimization methods to engineering models with discrete variables. Optim. Eng. (2012). https://doi.org/10.1007/s11081-011-9168-9
Gould, F.J., Tolle, J.W.: Geometry of optimality conditions and constraint qualifications. Math. Program. (1972). https://doi.org/10.1007/BF01584534
Halstrup, M.: Black-box optimization of mixed discrete-continuous optimization problems (2016). Retrieved from Eldorado - Repository of the TU Dortmund
Halton, J.H.: On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Num. Math. 2(1), 84–90 (1960)
Hemker, T., Fowler, K., Farthing, M., Von Stryk, O.: A mixed-integer simulation-based optimization approach with surrogate functions in water resources management. Optim. Eng. 9, 341–360 (2008). https://doi.org/10.1007/s11081-008-9048-0
Jahn, J.: Introduction to the Theory of Nonlinear Optimization, 3rd edn. Springer, Incorporated, New York (2014)
Karmitsa, N.: Test problems for large-scale nonsmooth minimization. Reports of the Department of Mathematical Information Technology, Series B, Scientific computing, B 4/2007 (2007)
Larson, J., Menickelly, M., Wild, S.M.: Derivative-free optimization methods. Acta Num. 28, 287–404 (2019). https://doi.org/10.1017/s0962492919000060
Larson, J., Leyffer, S., Palkar, P., Wild, S.M.: A method for convex black-box integer global optimization. J. Glob. Optim. 1, 1–39 (2021). https://doi.org/10.1007/s10898-020-00978-w
Le Digabel, S.: Algorithm 909: NOMAD: Nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. 37(4), 1–15 (2011)
Liuzzi, G., Lucidi, S., Rinaldi, F.: Derivative-free methods for bound constrained mixed-integer optimization. Comput. Optim. Appl. (2012). https://doi.org/10.1007/s10589-011-9405-3
Liuzzi, G., Lucidi, S., Rinaldi, F.: Derivative-free methods for mixed-integer constrained optimization problems. J. Optim. Theory Appl. (2014). https://doi.org/10.1007/s10957-014-0617-4
Liuzzi, G., Lucidi, S., Rinaldi, F.: An algorithmic framework based on primitive directions and nonmonotone line searches for black-box optimization problems with integer variables. Math. Program. Comput. 12, 673–702 (2020). https://doi.org/10.1007/s12532-020-00182-7
Lucidi, S., Sciandrone, M.: On the global convergence of derivative-free methods for unconstrained optimization. SIAM J. Optim. 13, 97–116 (2002). https://doi.org/10.1137/S1052623497330392
Lucidi, S., Piccialli, V., Sciandrone, M.: An algorithm model for mixed variable programming. SIAM J. Optim. (2005). https://doi.org/10.1137/S1052623403429573
Lukšan, L., Vlček, J.: Test problems for nonsmooth unconstrained and linearly constrained optimization. Technical report VT798-00, Institute of Computer Science, Academy of Sciences of the Czech Republic (2000)
Mangasarian, O., Fromovitz, S.: The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. J. Math. Anal. Appl. 17(1), 37–47 (1967)
Moré, J., Wild, S.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20, 172–191 (2009). https://doi.org/10.1137/080724083
Müller, J.: MISO: mixed-integer surrogate optimization framework. Optim. Eng. 17, 1–27 (2015). https://doi.org/10.1007/s11081-015-9281-2
Müller, J., Shoemaker, C., Piché, R.: SO-I: A surrogate model algorithm for expensive nonlinear integer programming problems including global optimization applications. J. Glob. Optim. (2013). https://doi.org/10.1007/s10898-013-0101-y
Müller, J., Shoemaker, C.A., Piché, R.: SO-MI: A surrogate model algorithm for computationally expensive nonlinear mixed-integer black-box global optimization problems. Comput. Oper. Res. 40(5), 1383–1400 (2013). https://doi.org/10.1016/j.cor.2012.08.022
Newby, E., Ali, M.: A trust-region-based derivative free algorithm for mixed integer programming. Comput. Optim. Appl. 60, 199–229 (2014). https://doi.org/10.1007/s10589-014-9660-1
Porcelli, M., Toint, P.: BFO, a trainable derivative-free brute force optimizer for nonlinear bound-constrained optimization and equilibrium computations with continuous and discrete variables. ACM Trans. Math. Softw. 44, 1–25 (2017). https://doi.org/10.1145/3085592
Powell, M.: The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report, Department of Applied Mathematics and Theoretical Physics (2009)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Sobol, I.: Uniformly distributed sequences with an additional uniform property. USSR Comput. Math. Math. Phys. 16(5), 236–242 (1976). https://doi.org/10.1016/0041-5553(76)90154-3
Sriver, T.A., Chrissis, J.W., Abramson, M.A.: Pattern search ranking and selection algorithms for mixed variable simulation-based optimization. Eur. J. Oper. Res. 198(3), 878–890 (2009). https://doi.org/10.1016/j.ejor.2008.10.020
Torczon, V.: On the convergence of pattern search algorithms. SIAM J. Optim. 7(1), 1–25 (1997). https://doi.org/10.1137/S1052623493250780
Vicente, L.N., Custódio, A.L.: Analysis of direct searches for discontinuous functions. Math. Program. 133, 1–27 (2009). https://doi.org/10.1007/s10107-010-0429-8
Yang, S., Liu, H., Pan, C.: An efficient derivative-free algorithm for bound constrained mixed-integer optimization. Evol. Intell. (2019). https://doi.org/10.1007/s12065-019-00326-2
Funding
Open access funding provided by Università degli Studi di Roma La Sapienza within the CRUI-CARE Agreement.
Appendix
Appendix
The following result simply establishes a connection between minimum points, stationary points and Clarke stationary points of Problem (2.2). It will be used here to relate Problem (1.3) and Problem (4.1).
Lemma 2
Any local minimum point of Problem (2.2) is a stationary point. Furthermore, any stationary point for Problem (2.2) is a Clarke stationary point.
Proof
By Definition 7 a local minimum is stationary with respect to the continuous variables. Thus, it is stationary according to Definition 10.
Since, by (2.1) and (2.5), we have that
it follows that a stationary point is also a Clarke stationary point. \(\square\)
First of all, we report a technical proposition which will be useful in the following.
Proposition 9
Let \(\{x_k\} \subset X\cap {\mathcal {Z}}\) for all k, and \(\{x_k\}\rightarrow {\bar{x}}\in X\cap {\mathcal {Z}}\) for \(k\rightarrow \infty\). Then, for k sufficiently large,
Proof
Since \(x_k\in X\cap {\mathcal {Z}}\) and \(x_k\rightarrow {\bar{x}}\), we have necessarily that \((x_k)_z = {\bar{x}}_z\) for k sufficiently large. Furthermore, for k sufficiently large, we also have that the following inclusions hold:
Now, the result follows by considering \({\bar{d}}\in D^c({\bar{x}})\) and recalling the definition of \(D^c({\bar{x}})\) (i.e., Definition 8) and the above inclusions. \(\square\)
We first prove the equivalence between local minimum points and global minimum points of the two problems. Then, we prove that any Clarke stationary point of Problem (4.1) (according to Definition 11) is a stationary point for Problem (1.3) (according to Definition 13).
Proposition 10
Let Assumption 7hold. A threshold value \(\varepsilon ^{*} > 0\) exists such that the function \(P(x;\varepsilon )\) has no Clarke stationary points in \((X\cap {\mathcal {Z}})\setminus {\mathcal {F}}\) for any \(\varepsilon \in (0,\varepsilon ^{*}]\).
Proof
The proof of this proposition is very similar to that of [21, Proposition B.1]. However, we report it since the presence of discrete variables slightly changes the reasoning. By contradiction, assume that for any integer k an \(\varepsilon _k \le 1/k\) and a point \(x_k\in (X\cap {\mathcal {Z}})\setminus {\mathcal {F}}\), stationary for Problem (4.1), exist. Then, let us consider a limit point \({\bar{x}}\in (X\cap {\mathcal {Z}})\setminus {\mathop {{\mathcal {F}}}\limits ^{\circ }}\) of the sequence \(\{x_k\}\) and, without loss of generality, denote the corresponding subsequence by \(\{x_k\}\) as well. Then, since \(x_k\rightarrow {\bar{x}}\), the discrete variables remain fixed, i.e., \((x_k)_i = {\bar{x}}_i\) for all \(i\in I^z\) and k sufficiently large. Now, the proof continues by separately assuming that point (i) or (ii) of Assumption 7 holds.
First we assume that point (i) of Assumption 7 holds at \({\bar{x}}\). Therefore, a direction \({\bar{s}}\in D^c({\bar{x}})\) exists such that
In particular, it follows that
where \(I({\bar{x}}) = \left\{ i\in \{1,2,\dots ,m\}: g_i({\bar{x}}) = {\bar{\phi }}({\bar{x}}) \right\}\), \({\bar{\phi }}(x) = \max \left\{ 0,g_1(x),g_2(x),\dots ,g_m(x)\right\}\) and \(\eta\) is a positive scalar. Note that \({\bar{x}}\not \in {\mathcal {F}}\) implies \({\bar{\phi }}({\bar{x}}) > 0\).
By Proposition 9, it follows that \({\bar{s}}\in D^c(x_k)\). Moreover, since \(x_k\) satisfies Definition 10 of a stationary point, we have that
By [14], we know that
and
where \(\beta ^i\ge 0\) for all \(i\in I(x_k)\) and Co(A) denotes the convex hull of a set A (see [47, Theorem 3.3]).
Therefore, by (A.2)–(A.4), \(\xi ^{f}_k \in {\partial _{c}} f(x_k)\), \(\xi ^{g_i}_k \in {\partial _{c}} g_i(x_k)\) and \(\beta ^i_k\) with \(i\in I(x_k)\) exist such that
Since m is finite, a subsequence of \(\{x_k\}\) exists (still denoted by \(\{x_k\}\)) along which \(I(x_k) = {\bar{I}}\) for all k.
Then, recalling that \((x_k)_i={\bar{x}}_i\) for all \(i\in I^z\) and k sufficiently large, and since the generalized gradient of a locally Lipschitz continuous function is locally bounded, it follows that the sequences \(\{\xi ^{f}_k\}\) and \(\{\xi ^{g_i}_k\}\), with \(i\in {\bar{I}}\), are bounded. Hence, we get that
Now the upper semicontinuity of \({\partial _{c}} f\) and \({\partial _{c}} g_i\), with \(i\in {\bar{I}}\), at \({\bar{x}}\) (see Proposition 2.1.5 in [14]) implies that \({\bar{\xi }}^{f}\in {\partial _{c}} f({\bar{x}})\) and \({\bar{\xi }}^{g_i}\in {\partial _{c}} g_i({\bar{x}})\) for all \(i\in {\bar{I}}\).
The continuity of the problem functions guarantees that for k sufficiently large
and, in turn, this implies that for k sufficiently large
Since \(I(x_k) \subseteq I({\bar{x}})\), we have that
Finally, for k sufficiently large, (A.1), (A.6), and (A.7) imply
and (A.5), multiplied by \(\varepsilon _k\), implies
Equations (A.8) and (A.9) yield
which, by using (A.6), gives rise to a contradiction when \(\varepsilon _k \rightarrow 0\).
Now we assume that point (ii) of Assumption 7 holds at \({\bar{x}}\). Let \({\bar{d}} \in D^z({\bar{x}})\) be the direction such that
recalling that \((x_k)_i={\bar{x}}_i\), for all \(i\in I^z\) and for k sufficiently large, we have that for k sufficiently large \(D^z({\bar{x}}) = D^z(x_k)\), so that \({\bar{d}}\in D^z(x_k)\). By definition of stationary point and discrete neighborhood, we have
Hence,
Multiplying by \(\varepsilon _k\) and considering that \(\varepsilon _k \rightarrow 0\), taking the limit for \(k \rightarrow \infty\) we obtain
The latter equation is in contradiction with (A.10). \(\square\)
Now we can prove that a threshold value \({\bar{\varepsilon }}\) for the penalty parameter exists such that, for any \(\varepsilon \in (0, {\bar{\varepsilon }}]\), any local minimum of the penalized problem is also a local minimum of the original problem. In particular, the following two propositions are analogous to [19, Theorems 10 and 11].
Proposition 11
Let Assumption 7 hold. Given Problem (1.3) and considering Problem (4.1), a threshold value \({\bar{\varepsilon }}>0\) exists such that, for every \(\varepsilon \in (0, {\bar{\varepsilon }}]\), any local minimum point \({\bar{x}}\) of Problem (4.1) is also a local minimum point of Problem (1.3).
Proof
Let \({\bar{x}}\) be any local minimum point of \(P(x;\varepsilon )\) on \(X\cap {{\mathcal {Z}}}\). By Lemma 2, \({\bar{x}}\) is also a Clarke stationary point.
Now Proposition 10 implies that a threshold value \(\varepsilon ^{*} > 0\) exists such that \({\bar{x}} \in {{\mathcal {F}}} \cap {{\mathcal {Z}}} \cap X\) for any \(\varepsilon \in (0,\varepsilon ^{*}]\). Therefore, \(P({\bar{x}};\varepsilon ) = f({\bar{x}})\), which implies that \({\bar{x}}\) is also a local minimum point of Problem (1.3). \(\square\)
Proposition 12
Let Assumption 7 hold. Given Problem (1.3) and considering Problem (4.1), a threshold value \({\bar{\varepsilon }}>0\) exists such that for every \(\varepsilon \in (0, {\bar{\varepsilon }})\), any global minimum point \({\bar{x}}\) of Problem (4.1) is also a global minimum point of Problem (1.3), and vice versa.
Proof
We start by proving that any global minimum point of Problem (4.1) is also a global minimum point of Problem (1.3). Proceeding by contradiction, let us assume that for any integer k a positive scalar \(\varepsilon _k < 1/k\) and a point \(x_k\) exist such that \(x_k\) is a global minimum point of \(P(x;\varepsilon _k)\) but not a global minimum point of f(x). Denoting by \(\hat{x}\) a global minimum point of f(x), we have that
Since the points \(x_k\) are global minima, by Lemma 2 they are also stationary points of \(P(x;\varepsilon _k)\) according to the Clarke definition for the continuous variables. By Proposition 10, a threshold value \(\varepsilon ^{*} > 0\) exists such that \(x_k \in {{\mathcal {F}}} \cap X \cap {{\mathcal {Z}}}\) whenever \(\varepsilon _k \in (0,\varepsilon ^{*}]\), which holds for k sufficiently large. Therefore, \(P(x_k;\varepsilon _k) = f(x_k)\) and, by (A.11), it follows that \(f(x_k) \le f({\hat{x}})\), contradicting the assumption that \(x_k\) is not a global minimum point of f(x).
Now we prove that any global minimum point \({\bar{x}}\) of Problem (1.3) is also a global minimum point of Problem (4.1) for any \(\varepsilon \in (0,{\bar{\varepsilon }})\). Since \({\bar{x}} \in {{\mathcal {F}}} \cap {{\mathcal {Z}}} \cap X\), we have that \(P({\bar{x}};\varepsilon ) = f({\bar{x}})\). By the first part of the proof, any global minimizer \(x_\varepsilon\) of \(P(x;\varepsilon )\) is feasible for Problem (1.3), hence \(P(x_\varepsilon ;\varepsilon ) = f(x_\varepsilon )\). Furthermore, \(x_\varepsilon\) is also a global minimum point of Problem (1.3), so \(f(x_\varepsilon ) = f({\bar{x}})\). Therefore, since \(P(x_\varepsilon ;\varepsilon ) = f({\bar{x}}) = P({\bar{x}};\varepsilon )\), \({\bar{x}}\) is also a global minimum point of \(P(x;\varepsilon )\). \(\square\)
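The threshold behavior established in Propositions 11 and 12 can be illustrated on a toy one-dimensional instance (a hypothetical example constructed here for illustration, not drawn from the paper): minimize \(f(x)=|x|\) subject to \(g(x)=1-x\le 0\) on \(X=[-2,2]\), whose constrained global minimum is \(x^{*}=1\). A simple grid search on the penalized function shows that above the threshold the penalized minimizer is infeasible, while below it the penalized and constrained global minimizers coincide:

```python
import numpy as np

# Toy instance: f(x) = |x|, g(x) = 1 - x <= 0 on X = [-2, 2].
# Feasible set is [1, 2]; constrained global minimum is x* = 1.
def f(x):
    return abs(x)

def g(x):
    return 1.0 - x

def P(x, eps):
    # Exact penalty: objective plus (1/eps) times the violation max{0, g(x)}.
    return f(x) + max(0.0, g(x)) / eps

grid = np.linspace(-2.0, 2.0, 4001)  # step 0.001, contains 0.0 and 1.0

for eps in (2.0, 0.5, 0.1):
    x_min = min(grid, key=lambda x: P(x, eps))
    print(f"eps = {eps}: argmin P = {x_min:.3f}, feasible = {g(x_min) <= 1e-8}")
```

Here the threshold value is \({\bar{\varepsilon }}=1\): for \(\varepsilon >1\) the penalty is too weak and the grid search returns the infeasible unconstrained minimizer \(x=0\), while for \(\varepsilon <1\) every penalized global minimizer is feasible and equals \(x^{*}=1\), as the propositions predict.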
In order to give stationarity results for Problem (4.1), we have the following proposition.
Proposition 13
For any \(\varepsilon >0\), every stationary point \({\bar{x}}\) of Problem (4.1) according to Clarke, such that \({\bar{x}}\in {{\mathcal {F}}} \cap {{\mathcal {Z}}} \cap X\), is also a stationary point of Problem (1.3).
Proof
Since \({\bar{x}}\) is, by assumption, a stationary point of Problem (4.1) according to Clarke (see Lemma 2), by the definition of Clarke stationarity we have that, for all \(s\in D^c({\bar{x}})\),
and
Hence, by (A.12), there exists \(\xi _s\in {\partial _{c}} P({\bar{x}};\varepsilon )\) such that \((\xi _s)^\top s \ge 0\) for all \(s\in D^c({\bar{x}})\). Now, we recall that
for some \(\beta _i\), with \(i\in I(x)\), such that \(\sum _{i\in I(x)}\beta _i=1\) and \(\beta _i\ge 0\) for all \(i\in I(x)\). Hence, we have that \(\xi _s \in {\partial _{c}} f({\bar{x}}) + \frac{1}{\varepsilon }\sum _{i\in I({\bar{x}})} \beta _i{\partial _{c}} g_i({\bar{x}})\). Then, denoting \(\lambda _i = \beta _i/\varepsilon\) with \(i\in I({\bar{x}})\), and assuming \(\lambda _i = 0\) for all \(i \notin I({\bar{x}})\), we can write, for all \(s \in D^c({\bar{x}})\),
Recalling that \({\bar{x}}\) is feasible for Problem (1.3), by (A.13) we have
Considering that \({\bar{x}}\in {{\mathcal {F}}} \cap {{\mathcal {Z}}} \cap X\), (A.14), (A.15), and (A.16) prove that \({\bar{x}}\) is a KKT stationary point for Problem (1.3), thus concluding the proof. \(\square\)
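In summary, the KKT-type stationarity conditions verified at \({\bar{x}}\) can be sketched as follows; this is a hedged reconstruction of the content of (A.14)–(A.16) from the surrounding text (the displayed equations are not reproduced here), with \(\xi ^{f}\in {\partial _{c}} f({\bar{x}})\), \(\xi ^{g_i}\in {\partial _{c}} g_i({\bar{x}})\), and the multipliers \(\lambda _i = \beta _i/\varepsilon\) for \(i\in I({\bar{x}})\), \(\lambda _i = 0\) otherwise, as introduced above:

```latex
\Bigl(\xi^{f} + \sum_{i=1}^{m}\lambda_i\,\xi^{g_i}\Bigr)^{\!\top} s \;\ge\; 0
\quad \text{for all } s\in D^c(\bar{x}),
\qquad
\lambda_i\,g_i(\bar{x}) = 0,\quad \lambda_i \ge 0,\quad i=1,\dots,m.
```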
Proposition 14
Let Assumption 7 hold. Then, a threshold value \(\varepsilon ^{*} >0\) exists such that, for every \(\varepsilon \in (0,\varepsilon ^{*}]\), every stationary point \({\bar{x}}\) of Problem (4.1) is stationary (according to Definition 13) for Problem (1.3).
Proof
Since \({\bar{x}}\) is stationary for Problem (4.1), we have by Definition 10 that
and
Then, by Definitions 3 and 9, we have that
By (A.17), it follows that
The proof then follows from Propositions 10 and 13 together with (A.18). \(\square\)
Cite this article
Giovannelli, T., Liuzzi, G., Lucidi, S. et al. Derivative-free methods for mixed-integer nonsmooth constrained optimization. Comput Optim Appl 82, 293–327 (2022). https://doi.org/10.1007/s10589-022-00363-1