Convergence of derivative-free nonmonotone Direct Search Methods for unconstrained and box-constrained mixed-integer optimization

This paper presents a class of nonmonotone Direct Search Methods that converge to stationary points of unconstrained and box-constrained mixed-integer optimization problems. A new concept is introduced: the quasi-descent direction (qdd). A point x is stationary on a set of search directions if there exists no feasible qdd on that set. The method requires neither the computation of derivatives nor the explicit manipulation of asymptotically dense matrices. Preliminary numerical experiments carried out on small to medium problems are encouraging.


Introduction
Direct Search Methods (DSM) have been extensively used for solving optimization problems with continuous variables. Despite their simplicity, encouraging numerical results have been reported for small and medium size problems, with or without first order derivative information (see [22] and references therein). DSMs solve the minimization problem minimize f(x), x ∈ F ⊆ R^n, as follows: at the i-th iteration an estimate x_i ∈ F of the solution is known. A search direction d_i ∈ D∞ = {d ∈ R^n : ||d|| = 1} and a stepsize τ_i ∈ R_+ complying with manageable conditions are found, and a new estimate x_{i+1} = x_i + τ_i d_i ∈ F is generated. Early applications of DSMs to optimization problems with continuous variables were monotone; i.e., they forced a sufficient decrease of the function values, namely f(x_{i+1}) ≤ f(x_i) − σ_i for some σ_i > 0. Steepest descent, variable metric and Newton methods are classical DSM instances which use or approximate first order information.
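The generic monotone iteration above can be sketched in a few lines. The following is a minimal illustration, not the paper's algorithm: the objective, the coordinate direction set, and the sufficient-decrease margin σ_i are all chosen here for demonstration only.

```python
import numpy as np

def direct_search_step(f, x, directions, tau, sigma):
    """One monotone direct-search step: scan the unit directions with
    stepsize tau and accept the first trial point achieving sufficient
    decrease f(x + tau*d) <= f(x) - sigma; return (x, False) if none does."""
    fx = f(x)
    for d in directions:
        trial = x + tau * d
        if f(trial) <= fx - sigma:
            return trial, True
    return x, False

# Toy run on f(x) = ||x||^2 with the coordinate directions +-e_j.
f = lambda v: float(np.dot(v, v))
n = 2
D = [s * np.eye(n)[j] for j in range(n) for s in (1.0, -1.0)]
x, tau = np.array([1.0, -1.0]), 1.0
for _ in range(60):
    x, moved = direct_search_step(f, x, D, tau, 1e-12 * tau ** 2)
    if not moved:
        tau *= 0.5  # no direction gave sufficient decrease: shrink the step
```

When no direction yields sufficient decrease, the stepsize is halved, mirroring the stepsize-reduction mechanism common to DSMs.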
Ubaldo M. García Palomares, ubaldo@gti.uvigo.es, Universidade de Vigo, Vigo, Spain.

The pattern search method (PSM), introduced in [42], is an instance of a monotone DSM with no derivative information. An extensive list of references for monotone algorithms is given in [5]. Reference [23] stands as the pioneering work on nonmonotone DSMs for problems with continuous variables. Several variants were suggested which inherit the properties of Newton's method with explicit derivative information [11,14,24,44]. These authors claim that nonmonotone DSMs might improve the performance of a monotone algorithm when the estimates enter a curved narrow valley. In smooth constrained optimization, nonmonotone algorithms are used to avoid the Maratos effect [45, and references therein]. The performance of nonmonotone DSMs is more stable on noisy functions than that of their monotone counterparts [5,16]. The authors of [14,16,17,19,20] employ nonmonotone DSMs to try to avoid convergence to nearby local minimizers. Finally, the authors in [21] claim that almost any deterministic monotone optimization algorithm for solving models with continuous variables has a nonmonotone counterpart. This paper analyzes the basic theory needed for solving minimize_{(x;z)∈F} g(x; z) with a non-monotone DSM, where F is an unspecified feasible set. The results are then adapted to solve the mixed integer optimization problem (1), formulated as

minimize g(x; z) subject to s ≤ (x; z) ≤ t, x ∈ R^n, z ∈ Z^p,   (1)

where g(·) might be nonsmooth. To facilitate our exposition, (x; z) stands for a column vector of n + p components, with x ∈ R^n and z ∈ Z^p. Thus, in (1), s and t are known vectors in R^{n+p}, which represent, respectively, the lower and upper bounds of all variables in (1). An initial effort to use PSMs for solving nonlinearly constrained problems with discrete (and/or categorical) variables can be found in [1,5]. This paper is an outgrowth of the nonmonotone DSM proposed in [19], where the discrete variables were assumed to lie on a regular grid of discrete points.
This work presents a convergence theory for a class of monotone and nonmonotone DSMs, which provides a common framework for solving problem (1) and its particular instances: unconstrained and box-constrained optimization problems with mixed integer variables. We assume that penalization, Lagrangian functions, filtering, or any other technique may transform a constrained problem into a single box-constrained problem, or a sequence of them, structured like (1). The authors in [1,2] propose a simple barrier function, which assigns an infinite objective value to all infeasible points. Since there exist nonsmooth functions that do not decrease along any direction from a point that is not a minimum [13, Exercise 2.6], we always consider a finite set of search directions; nonetheless, we show that our algorithm might detect directions of negative curvature. We solve the mixed optimization problem (1) under relaxed conditions, regardless of the availability of derivative information. Some degree of randomness is necessary to fully exploit the algorithm's capabilities, though the numerical tests show that a deterministic version of the algorithm performs well.
For nonsmooth problems, the nonmonotone DSM proposed in this paper shares convergence properties with frequently cited monotone DSMs. As our purpose is to apply this methodology in derivative free optimization (DFO), where differentiability and other function features may be unknown to the user, no effort is devoted to the rate of convergence.
The paper's primary goal is to analyze non-monotone DSM algorithms, which, in general, have been scarcely considered in the open literature. All convergence results hold either with explicit use of first order derivatives or for DFO. Moreover, if first order information is at hand, the algorithms simplify significantly, and, generally, a singleton search direction is easily identified.
The remainder of this paper is organized as follows. In the next section we prove some lemmas that are essential for the understanding of the algorithms. We state some useful definitions and display pseudocodes that will guide us in the analysis of the optimization problems exposed in Sects. 3, 4 and 5. Section 3 describes the DSM proposed for solving problems with continuous variables. We show that nonmonotone DSMs generate a sequence converging to a point satisfying a non-smooth stationarity condition which, to our knowledge, has been overlooked. For Lipschitzian functions the sequence converges to a point satisfying the stationarity condition in the Clarke sense. Section 3 conforms the theory developed in Sect. 2 to unconstrained and box-constrained problems. Compactness, Fréchet differentiability and continuous first order derivatives around limit points are sufficient conditions to achieve convergence. In Sects. 4 and 5 we assume that variables can take discrete values. The methodology persists. Convergence to a stationary point does not require detecting the best local value in the vicinity of a solution estimate. We suggest a variable separation strategy to solve Problem (1). In Sect. 6 we solve a group of low-dimensional academic problems. The results look promising and seem to validate the theoretical findings. We also solve a real application problem and report a new global solution to it. The paper concludes with some remarks and open questions on research topics that are under investigation.
In short, and for the sake of clarity, we analyze convergence of the algorithms that solve the basic unconstrained and box-constrained optimization problems. This analysis is in substance carried over to the mixed variable problem (1), which is formally studied in Sect. 5.
Notation Throughout the paper we use a rather standard notation with some peculiarities.
- R^n, R^p, R^r, and so forth, are Euclidean spaces.
- F ⊆ R^n is the feasible set.
- Superscripts stand for components and subindexes represent different vectors; d^T w is the inner product of vectors d, w defined in the same Euclidean space; uu^T is a matrix U with components U^{jk} = u^j u^k.
- (x; z) is a vector with n + p components, x ∈ R^n, z ∈ Z^p.
- When ẑ is fixed, f(x) denotes g(x; ẑ).
- σ(·) : R_+ → R_+ stands for a sigma-function.
- q represents a dummy integer, which plays different roles in the paper.
- I is the identity matrix and e_j, j = 1, . . . , n, are its columns.
- Other capital letters are matrices or finite sets.
- If S is a finite set, #S is its cardinality.
- i is the iteration number; it is usually implicitly assumed.
- We use the Matlab notation to invoke an algorithm.

Preliminaries
This work addresses non-smooth functions; however, the convergence theory might require some degree of differentiability. Some algorithms assume that g(x; z) is Lipschitz continuous, which implies that the Clarke generalized directional derivative exists and is finite. In this section we define concepts and prove several lemmas needed in the general analysis of our algorithms. The basic convergence assumptions are:

A1: The objective function g(x; z) is continuous and computable at all feasible points. The level set L = {(x; z) ∈ F : g(x; z) ≤ ϕ_0} is compact for any ϕ_0.
This seems to be the weakest condition required by optimization algorithms that solve (1).

A2: For fixed z, the objective function g(x; z) in (1) is Fréchet differentiable and continuously differentiable around limit points.

A3: g(x; z) is Lipschitz continuous in its first argument; that is, for fixed z, |g(y; z) − g(x; z)| ≤ ϑ||y − x|| for some ϑ > 0.
The algorithm for solving the mixed integer problem (1) inherits most of its properties from the algorithms specifically proposed in Sects. 3 and 4 for solving the pure versions.

Definition 2
The direction d̂ ∈ R^n is a quasi-descent direction (qdd) at (x̂; ẑ) when conditions (2)–(3) hold.
To facilitate the writing and the reading of this paper, we often denote f(x) ≜ g(x; ẑ) when ẑ remains fixed.

Lemma 1 Let A1, A2 hold and let the unit vector d be a strict descent direction at
Lemma 2 Under Assumptions A1–A3, let {x_j} → x̄ ∈ R^n and f°(x̄, d̂) < 0; then d̂ is a qdd at some x sufficiently close to x̄.
Proof By A3, f°(x̄, d̂) exists and is finite. If d̂ is not a qdd at any x close to x̄, we can identify sequences {τ_j}_{j∈N} ↓ 0 and {x_j} → x̄ along which the qdd condition fails. By taking limits, it follows that f°(x̄, d̂) ≥ 0, a contradiction.
Proof Since d̂ is a feasible qdd, (2)–(3) hold by definition for some τ̂ > 0. If (5) holds for τ = τ̂ and some γ > 0, the lemma is valid. We now proceed by contradiction: if the statement of the lemma is false, Algorithm 1 generates an infinite loop with a sequence of feasible points with strictly decreasing f(·) values, which contradicts A1.
Algorithm 1: choose any γ > 0; this pseudo-code is an infinite loop when f(·) is unbounded from below on the ray x = x̂ + τ d̂.

We would like to state that a point *x is non-smooth stationary (nss) on D when there exist no feasible qdds at *x; that is, when (6) holds, where D ⊆ D∞ = {d ∈ R^n : ||d|| = 1}.

Proof If there exists no nss point, then given any x ∈ F there exist λ > 0, d ∈ D∞ and a qdd defined by (3). The while loop in Algorithm 1 would generate an infinite sequence of strictly decreasing f(·) values, contradicting A1.
The implication (6) is, in general, difficult to implement. We resort to finite sets D and discrete λ-values. We define a Λ-set Λ = {λ_0, λ_1, . . .} with the following properties:

- it is a bounded set;
- it contains an infinite number of distinct elements in R_+ that can be lexicographically ordered, that is, (j < k) ⇒ (λ_j > λ_k);
- its elements λ_0, λ_1, . . . converge to 0, that is, {λ_j} ↓ 0.
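A geometric sequence λ_j = λ_0 μ^j with 0 < μ < 1 is one concrete realization satisfying all three properties. The sketch below assumes this choice; the names lam0, mu and count are illustrative, not taken from the paper.

```python
def make_lambda_set(lam0=1.0, mu=0.5, count=30):
    """One concrete Lambda-set: the geometric sequence lam0 * mu**j is
    bounded, strictly decreasing and converges to 0, matching the three
    required properties (lam0, mu, count are illustrative choices)."""
    assert lam0 > 0.0 and 0.0 < mu < 1.0
    return [lam0 * mu ** j for j in range(count)]

lam = make_lambda_set()
```

In practice the list is truncated at a finite count; the theory only needs that arbitrarily small elements are available.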
For practical convenience, stationarity is defined on the sets (Λ, D). Ideally, we would like {x i } to converge to a total nss point. We will prove below that in unconstrained minimization this can occur on a very large set, provided that {D i } are carefully constructed for large enough i. On the other hand, a singleton direction is enough to ensure stationarity of smooth functions with derivative information.
Algorithm 1 is just one way to prove Lemma 3. Many other alternatives are possible, but this algorithm is easy to implement, and it is close to its monotone counterpart algorithm proposed in [18] and the related nonmonotone versions developed afterwards [14,16,17].
The following lemmas will be useful for differentiable functions.
Lemma 5 Let x̄ ∈ R^n be a fixed point, let D be a set of search directions fulfilling the conditions required by Lemma 4, and let τ_0 ∈ R_+ be a positive number. If f(·) is differentiable and (11) holds, then ∇f(x̄) = 0.

Proof We proceed by contradiction and assume that ∇f(x̄) ≠ 0. Lemma 4 then ensures that some d ∈ D satisfies d^T ∇f(x̄) < 0. It follows by Lemma 1 that d is a qdd; consequently, some τ ∈ (0, τ_0) exists satisfying (5); therefore (11) can occur only if ∇f(x̄) = 0.
If ∇f(x̄) = 0, it is not possible to find a descent direction satisfying d^T ∇f(x̄) < 0. The next lemma shows that, under more stringent conditions, it is possible to identify directions of negative curvature.

The search directions
In the forthcoming sections we will see that the convergence analysis of smooth functions is simplified when (12) holds. So, we always force (12) to be valid when ∇ f (·) exists.
By Lemma 4, orthogonal directions implicitly enforce (12) with α = 1/√n. It is also known that the cosine value α = 1/√n cannot be improved by any D that positively spans R^n with either n + 1 or 2n search directions [35]. It has been argued that the performance of a DFO algorithm may improve with n + 1 search directions [4]. We suggest the directions ±d_1, . . . , ±d_n, where d_1, . . . , d_n are defined by Algorithm 2, Line 4. They are the columns of the Householder matrix H = I − 2uu^T generated by a unit random vector u ∈ R^n, as suggested in [18]. A list of nice properties of D = {d_1, . . . , d_n} is:

- D is a set of orthogonal search directions.
- Algorithm 2 describes an easy way to generate random orthogonal directions obtained from the columns of the Householder matrix.
- It has been argued that some degree of randomness may benefit the performance of a deterministic algorithm [22]. In [6] the authors claim that they did not find a deterministic strategy that improves their algorithm.
- It is relatively simple to distribute the workload among processors [18].
- It is not necessary to construct the whole matrix H explicitly. The vector u contains all the information needed to generate the search directions. This feature allows significant memory savings in medium and large problems.
- Algorithm 2 conforms the search directions to the geometry of the constraints. It merely assigns u^j = 0 in Line 2b whenever the variable x^j is close to either one of its bounds. Note that d_k^j = −(2u^j u^k)/ζ = 0 for all k ≠ j when u^j = 0. Hence, x^j + d_k^j = x^j for all k ≠ j, which prevents the j-th variable from getting closer to its bounds.
- The numerical results published since its inception in [18] have been highly competitive [14,16,17,19,20].
- u^j can be randomly generated in [−1, 1], j = 1, . . . , n. This is convenient when x ∈ R^n.
- A suitable vector u ∈ Z^n can likewise be randomly generated to handle integer variables. From Lines 3a and 4 of Algorithm 2 we obtain that ζ d_j = ζ e_j − (2u^j)u, where ζ = u^T u and d_j = He_j. It follows that ζ d_j has integer components whenever u ∈ Z^n.
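The matrix-free evaluation of the columns d_j = He_j can be sketched as follows; only the vector u is stored, as the bullet above suggests, and the function name is hypothetical.

```python
import numpy as np

def householder_direction(u, j):
    """j-th column d_j = H e_j of H = I - (2/u^T u) u u^T, computed from
    the stored generator u alone, without forming the n x n matrix H."""
    d = (-2.0 * u[j] / np.dot(u, u)) * u
    d[j] += 1.0
    return d

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 5)          # u^j drawn in [-1, 1], as in the text
D = np.column_stack([householder_direction(u, j) for j in range(5)])
# Since H is orthogonal, the columns form an orthonormal set.
```

Each column costs O(n) to produce, so the whole set of n directions costs O(n^2) time but only O(n) memory.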

The essential algorithm
The Iteration() algorithm, identified as Algorithm 3, will be the core iteration for those algorithms dealing with variables in R^n. Iterative calls to Iteration() are carried out until convergence criteria are met. It is invoked by

[x, ϕ, τ, D] = Iteration(x, ϕ)

The formal input parameters of Iteration() are the actual estimate x_{i−1} and the function-value bound ϕ. Iteration()'s task is to find a feasible qdd. Lines 2a–2c check whether the input estimate x_{i−1} admits a feasible qdd; if it does not, λ ≤ ε at Line 2d and Iteration() returns. Strictly speaking, x_{i−1} has no feasible qdd when (15) holds for any λ > 0 and any d ∈ D∞.

1a Define Λ (and μ) satisfying (7)
1b Choose λ ∈ Λ
1c Construct Q by Algorithm 2 and D ⊇ Q
2a while (λ > ε) ...

Algorithm 3 Iteration() is the core algorithm used in optimization problems with continuous variables in R^n. For unconstrained problems F = R^n.
In a practical implementation we admit that there is no qdd at x_{i−1} when (15) holds for λ > ε, λ ∈ Λ, and a finite D_i. When Iteration() finds a feasible qdd, it returns values satisfying (16); Lemma 3 ensures that (16) can be fulfilled. Iteration() also keeps the best estimate x_b at its Line 4a.
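The qdd search inside Iteration() might look as sketched below. This is an illustration only, not the paper's Algorithm 3: the acceptance test σ(λ) = γλ² is one admissible sigma-function assumed here, since the paper's test (5) is not reproduced in this excerpt, and all names are hypothetical.

```python
import numpy as np

def iteration_sketch(f, x, phi, Lambda, D, eps, gamma=1e-4):
    """Hedged sketch of a qdd search: scan the decreasing stepsizes lam in
    Lambda and the directions d in D, and accept the first trial point with
    f(x + lam*d) <= phi - gamma*lam**2, where phi >= f(x) is the nonmonotone
    reference value.  Returns (x, 0.0) when lam falls to eps unsuccessfully."""
    for lam in Lambda:
        if lam <= eps:
            break
        for d in D:
            trial = x + lam * d
            if f(trial) <= phi - gamma * lam ** 2:
                return trial, lam
    return x, 0.0

# Demo on f(x) = ||x||^2 with coordinate directions.
f = lambda v: float(np.dot(v, v))
x0 = np.array([1.0, 1.0])
Lambda = [2.0 * 0.5 ** j for j in range(20)]
D = [np.array(d) for d in ([1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0])]
x1, lam = iteration_sketch(f, x0, f(x0), Lambda, D, 1e-8)
```

Because ϕ may exceed f(x), the accepted trial point need not improve on the current iterate, which is the nonmonotone ingredient.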

Nonmonotone DSMs for optimization with variables in R n
This section applies the theory developed in Sect. 2 to our problems of interest. Section 3.1 deals with the unconstrained optimization problem (17) and describes algorithms to find non-smooth stationary points on a finite set of directions. Section 3.2 goes one step further: it is concerned with convergence

Algorithm 4 Continuous() is a non-monotone DSM that returns a stationary point of an optimization problem with variables in R^n.
to total stationary points characterized by Definition 6. Previous works have dealt with this issue: to ensure convergence, the authors in [6] need to proceed iteratively with an asymptotically dense matrix. We describe an algorithm which does not require explicit manipulation of an asymptotically dense matrix. Section 3.3 deals with the box-constrained optimization problem (18).

Unconstrained optimization
In this section we prove several convergence results for unconstrained problems. Under Assumptions A1, A2, convergence to a point fulfilling the classical first order necessary conditions is proved. If A1, A3 hold, a sequence of estimates {x_i}_{i∈N} converges to a stationary point fulfilling Definition 5 for D = *D, a limit point of {D_i}. With a somewhat more laborious algorithmic implementation we prove, in the limit, stationarity on the set D∞ = {d : ||d|| = 1}. Our first task is to impose conditions on the search sets {D_i} that ensure convergence to stationary points of smooth functions.
Theorem 1 Let {d_i} be the sequence of search directions satisfying (12). Proof From (16c) and (12) the result follows. We claim that the following proposition is valid.
Proposition 1 Let {D_i} be a sequence of search direction sets, each containing a direction of descent satisfying (12). Under Assumptions A1 and A2, Algorithm 4 applied to (17) generates a sequence converging to a stationary point. Proof It is obvious.
In general, there is no simple way to explicitly find a qdd at x; however, a qdd might appear in D. We should recall that even if f(·) is differentiable, its first order derivatives might not be computable, and there is no way to explicitly verify (12). Nonetheless, Lemma 4 shows that (12) holds when D_i is a set of orthogonal directions. Computability of the gradient simplifies the algorithm significantly, and convergence prevails for any strictly positive definite matrix P. Algorithm Continuous(), identified as Algorithm 4, is a non-monotone DSM that returns a stationary point of problem (17) regardless of the presence or absence of derivative information. It has 3 input arguments: the starting point x_0 ∈ F, the upper function value ϕ_0 ≥ f(x_0), and the accuracy ε. Continuous() declares an nss point when τ_i < ε at its Line 1, but it returns the best estimate x_b; the algorithm makes sure that all subsequent estimates have a function value below f(x_b). The convergence theory will require {x_i} ⊂ F, which is obvious in unconstrained minimization.
Proof Line 1 of Algorithm 4 invokes Iteration(). If x_i has no qdds, the if clause in Line 2b of Iteration() never holds and {λ} ↓ 0 within the while loop 2a–2c, while the solution estimate remains fixed; that is, x_{j+1} = x_j for all j ≥ i. By compactness we can identify a convergent subsequence. If the whole sequence {x_i}_{i∈N} possesses qdds, it follows by (16b) and A1 that the function values converge. We conclude that *x is stationary on the limit set *D.

We have just proved that Algorithm 4 generates a sequence {x_i} converging to a stationary point *x satisfying Definition 5 on a limit set *D. We now include material that will guide us in sketching a method to find stationarity on some D ⊃ *D. We will show stationarity on finite sets {D_i} → D∞. As stated earlier, it is more practical to implement the algorithm with discrete values of λ.

Remark 3
Let Λ̂ be given complying with Definition 4, and let x_i be a non-smooth stationary point satisfying (8) on the sets (Λ̂, D).

Algorithm 5 The input parameter y ∈ R^n is non-smooth stationary satisfying (8).

We close this section with a brief description of Algorithm 6, which improves the nss point found by Algorithm 5. Theoretically, {x_i} converges with probability 1 to an nss point complying with Definition 6.

Remark 4
A uniformly random set of q directions in D∞ can be efficiently obtained as follows: generate each component as an independent draw from the standard normal distribution and normalize the resulting vector to unit length. Algorithm 6 is invoked by [x, noqdd] = Non-smooth(x, ε, Λ, budget): given the starting point x ∈ F, the accuracy ε, a Λ-set and a positive budget, it repeatedly updates x ← x + λd within a while loop and then returns. In [32] the author suggested a probabilistic estimate, where P_r(E) is the probability of the event E and p is the number of generated directions.
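The normal-draw recipe can be sketched directly; the function name is illustrative.

```python
import numpy as np

def random_sphere_directions(q, n, rng=None):
    """q directions uniformly distributed on the unit sphere in R^n:
    draw every entry i.i.d. from the standard normal distribution and
    normalize each row (the classical rotation-invariance recipe)."""
    rng = rng or np.random.default_rng()
    G = rng.standard_normal((q, n))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

D = random_sphere_directions(100, 3, np.random.default_rng(1))
```

Uniformity follows from the rotational invariance of the multivariate standard normal distribution.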

Box-constrained optimization
In this subsection we deal with the box-constrained optimization problem (18), which is repeated here for easy reference:

minimize f(x) subject to s ≤ x ≤ t,   (18)

where s^k, t^k are, respectively, the lower and the upper bound of the variable x^k. We assume s^k < t^k; otherwise, when s^k = t^k, the variable x^k has a constant value and is no longer considered a variable. An approach for solving (18) is to consider the barrier function (24) and to use the same tools used for solving the unconstrained problem (17). The algorithm starts at x_0 ∈ F and applies Algorithm 4 to the function defined by (24). The algorithm also requires that the search directions conform to the geometry of the constrained set; Algorithm 2 takes care of this circumstance. To formalize the proof of convergence we need to introduce the following notation and definitions.
- Define the index set B_i of binding variables at the i-th iteration (variables at or near their bounds).
- Denote by S_i the subspace S_i = {x ∈ R^n : x^k = 0, k ∈ B_i}, and let v_i ∈ R^n be the gradient projected on S_i, given by v_i^k = ∇f(x_i)^k for k ∉ B_i, and v_i^k = 0 otherwise.
- Let E_i = {d_1, · · · , d_q} be a finite set of q vectors in S_i satisfying (12).
- Let the set D_i of search directions be given by (27).

Essentially, we apply Algorithm 4, slightly modified to deal with the binding variables pointed to by B_i. We now proceed to prove convergence. Let B̄ be a set that appears infinitely often in the sequence {B_i}_{i∈N}. Let K = {i : B_i = B̄} and let *x be a limit point, that is, {x_i}_{i∈K'} → *x for some K' ⊆ K.
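The binding-variable treatment (zeroing the matching components of the Householder generator u so that the other directions leave a bound-active variable untouched) can be illustrated as follows; the tolerance value and all names are assumptions for the sketch.

```python
import numpy as np

def conform_u_to_bounds(x, s, t, u, tol=1e-6):
    """Zero the components of the Householder generator u that belong to
    binding variables (x^j within tol of a bound), in the spirit of
    Algorithm 2, Line 2b as described in the text."""
    u = u.copy()
    binding = (x - s <= tol) | (t - x <= tol)
    u[binding] = 0.0
    return u, binding

# Demonstration on a 4-variable box [0, 1]^4 with x_0 and x_3 at bounds.
x = np.array([0.0, 0.5, 0.9, 1.0])
s, t = np.zeros(4), np.ones(4)
u, binding = conform_u_to_bounds(x, s, t, np.array([0.3, -0.7, 0.2, 0.5]))
H = np.eye(4) - 2.0 * np.outer(u, u) / np.dot(u, u)
# For a binding index j, every column d_k with k != j has d_k^j = 0,
# so trial points x + lam*d_k never move the binding variable.
```

This realizes the property x^j + d_k^j = x^j for k ≠ j quoted in the bullet list of Sect. "The search directions".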

Theorem 3 Let {D_i} be the sequence of search directions satisfying (27). Under Assumptions A1, A2, Algorithm 4 generates a sequence {x_i}_{i∈K} converging to a stationary point *x satisfying (28a)–(28b).
Proof Let *x be defined as above. To prove (28a) we take limits in (6). To prove (28b) we consider 2 cases:
a. k ∈ B̄. The two inequalities obtained together imply that ∇f(*x)^k = 0; besides, both e_k and −e_k satisfy (6).
b. k ∉ B̄. By construction, E_i is a finite set that contains a direction satisfying (12). If A1 and A2 hold, we mimic the convergence proof of Theorem 1, replacing D by E and ∇f(·) by v, and deduce that Algorithm 4 generates a sequence converging to a stationary point.

The integer optimization problem
This section is concerned with the integer optimization problem

minimize f(z) subject to z ∈ F,   (29)

where f(·) : R^p → R and F ⊂ Z^p collects the integer points satisfying the bound constraints. Problem (29) is an optimization problem with integer variables subject to bounds. It is combinatorial: an exhaustive enumeration of the feasible points might be computationally expensive. Besides, (29) is generally a nonconvex problem and normally no algorithm ensures convergence to the optimizer. To try to find a global solution, several strategies based on relaxations, cutting planes, branch and bound, surrogate models and heuristics have been devised. It seems out of the question to elaborate a common approach for obtaining the global solution to any instance of (29). However, some attempts have solved specific instances. In [7], the authors adapt MADS for solving a problem with grid variables regularly distributed along the coordinate axes. In [19], the authors applied a discretized version of Algorithms 3 and 4 for solving the same problem. We recall that this paper assumes that constraints other than bounds can be handled via penalization [12,33], Lagrangian [30], infinite barriers [1] or any other appropriate technique. Our algorithm is monotone with no σ-function involved. From the onset, we admit that we have at hand a feasible starting point z_0 ∈ F and only feasible points are considered as solution estimates. We also assume that A1 holds, which essentially means that the level set L = {z ∈ F : f(z) ≤ f(z_0)} is finite. We consider convergence to local minimizers. A local neighborhood N(z_i, ℓ) can be defined as the set of feasible points v with ||v − z_i|| ≤ ℓ, where ||·|| stands for any norm and ℓ > 0. Usually the iterate z_i is considered stationary for (29) if (31a) holds, that is, f(z_i) ≤ f(v) for all v ∈ N(z_i, ℓ). A naive artifice to verify (31a) constructs N(z_i, ℓ) and then evaluates f at all v ∈ N(z_i, ℓ). This is often an expensive procedure that precludes the use of ℓ > 2. The algorithm Greedy(), identified as Algorithm 7, is described below.
It tries to find better estimates that do not belong to the set N(z_i, ℓ); z_i will be accepted as stationary if (33) holds, where V(z_i) ⊇ N(z_i, ℓ). The aim is to build V(z_i) with V(z_i) − N(z_i, ℓ) ≠ ∅. This can be regarded as a strategy to try to escape from the local stationary point defined by (31a). We now briefly illustrate the way Greedy() constructs V(z_i).
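To see why exhaustively verifying (31a) is expensive, a naive enumeration of an ∞-norm neighborhood (one admissible choice of norm, used here only for illustration) already exhibits exponential growth in the dimension p.

```python
from itertools import product

def neighborhood(z, ell):
    """All integer points v != z with ||v - z||_inf <= ell.  The size
    (2*ell + 1)**p - 1 grows exponentially with the dimension p, which
    is why exhaustively checking (31a) precludes large ell."""
    p = len(z)
    offsets = range(-ell, ell + 1)
    return [tuple(z[k] + d[k] for k in range(p))
            for d in product(offsets, repeat=p) if any(d)]

N = neighborhood((0, 0, 0), 1)      # already 26 points for p = 3, ell = 1
```

For ℓ = 2 and p = 3 the count rises to 124 points, and each point costs one function evaluation.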
return y

Algorithm 7 Greedy() algorithm for solving (29), an optimization problem with integer variables. The output y is a stationary point satisfying (33).

Line 3g: This line is only executed when q = 1. Its purpose is to ensure that all points in N(y, ℓ) are evaluated.
Line 4a: Defines the set V_q with 2q members.
Lines 4b–4e: If y does not satisfy (33), the algorithm repeats the while loop with y = v, q = h. This means that v is now the candidate for being the stationary point.
Line 4d: Otherwise, q is reduced.
With no loss of generality we assume v^k = z_i^k for k > q and rewrite (32a) as (32b). In the actual implementation v^{k_1} = z_i^{k_1}, · · · , v^{k_q} = z_i^{k_q}, with k_1, . . . , k_q randomly chosen. The generality of (32b) is preserved, but the explanation that follows is more fluid. We make use of search directions and rewrite (32b) accordingly. We have not yet defined the set V_q(z_i); a promising candidate is built from the columns of the p × p Householder matrix H = I − (2/u^T u)uu^T, where u^k ∈ {−2, −1, 1, 2} for k ≤ q, and u^k = 0 for k > q. The set V_q(z_i) is the set E generated by Algorithm 2 specialized to discrete variables. Given h ≤ p, y is accepted as stationary if (33) holds on V(y) ⊇ N(y, ℓ). Greedy() is invoked by y = Greedy(z_i, η), where y is a stationary point satisfying (33), z_i is the variable under scrutiny, and ||y − z_i||_0 ≤ η.
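The lattice-preserving idea behind ζd_j = ζe_j − (2u^j)u can be sketched as follows. The exact scaling the paper uses inside V_q may differ; this only illustrates that integer generators u yield integer (scaled) Householder directions.

```python
import numpy as np

def discrete_directions(u):
    """Scaled integer Householder directions: with zeta = u^T u, the j-th
    column of W = zeta*I - 2*u*u^T equals zeta*d_j = zeta*e_j - (2u^j)*u,
    an integer vector whenever u is integer, so trial moves stay on the
    integer lattice."""
    u = np.asarray(u, dtype=int)
    zeta = int(u @ u)
    return zeta * np.eye(len(u), dtype=int) - 2 * np.outer(u, u)

W = discrete_directions([1, -2, 0, 2])  # u^k in {-2,-1,1,2} for k <= q, else 0
# Columns are mutually orthogonal, since W = zeta*H with H orthogonal.
```

Here ζ = 9, so W^T W = 81·I, confirming orthogonality of the scaled columns.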

Remark 5 In our implementation we define V(y) as described above. This implementation was effective on the numerical tests performed in Sect. 6; a global solution was often returned. A detailed algorithmic description of Greedy() is given just after its pseudocode is displayed in Algorithm 7.

Mixed integer optimization
In this section we analyze convergence of the Variable Separation (VS) scheme for solving problem (1), which is repeated here for easy reference:

minimize g(x; z) subject to s ≤ (x; z) ≤ t, x ∈ R^n, z ∈ Z^p.   (1)
The difficulties, the standard methodology and the software that have been used for solving mixed integer optimization problems are summarized in [8,28]. The authors in [19] suggested a uniform discretization of the continuous variables on a grid and solved (29). To improve the solution, the problem is solved again on a finer grid until convergence criteria are met. Details can be found in [19].

Neighborhood
The neighborhood definition is crucial to detect good optimizers. An algorithm that returns *z as a solution to a discrete problem must ensure that no better point is found in its neighborhood. A recent discussion on this issue was given in [36] in the context of DFO. We extract 2 definitions of local stationarity, depending on a specified neighborhood: (*x; *z) ∈ F is a local minimizer of a nonlinearly constrained problem with mixed variables if either A or B holds.
The authors in [36] claim some benefits from the use of B. We employ the neighborhood V(z_i) given by (32) in the previous section.

Variable separation for solving (1)
The standard approach for solving (1) is based on the VS scheme. VS is a powerful technique that has been used in optimization problems in multi-processing environments, and it is especially suitable for solving large systems [1,31]. Convergence for DFO with VS and continuous variables was first established in [3] for monotone algorithms and later for nonmonotone DSMs [16]. A typical VS scheme splits a problem into two or more subproblems. In mixed integer problems, it seems natural to divide problem (1) into the subproblems (37a)–(37b); each can be worked out with specific tools. In the rest of the paper, we commit a minor abuse of notation and B(x_{i−1}, ρ) substitutes for B(x_{i−1}, z_{i−1}, ρ).
Given (x_{i−1}; z_{i−1}) ∈ F, (37a) gives x_i, the best local solution for x provided that z_{i−1} ∈ Z^p stays fixed; then (37b) selects the best discrete variable in V(z_{i−1}), a finite set of feasible points that includes z_{i−1}. This iterative process is repeated until convergence conditions are met. In [36] the authors suggest looking for a global solution to (37b) and claim some benefits [36, Section 2]. However, their choice may be computationally expensive, since it seeks the global optimum of a problem with discrete variables. A local solution will hopefully require the evaluation of g(x_i; z) only at points in a small discrete set. In some applications the subproblem (37a) reduces to a linear model and z_i is chosen with heuristics linked to the structure of the problem [39]. It is advisable to use known efficient techniques whenever possible.
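The alternating (37a)/(37b) loop can be sketched generically. The two inner solvers below are toy stand-ins (a crude grid scan and a unit-neighbor search), not the paper's Algorithms 10 and 11, and the toy objective is chosen only for illustration.

```python
import numpy as np

def variable_separation(g, x0, z0, solve_continuous, solve_discrete,
                        max_outer=100):
    """Sketch of the VS scheme: alternately improve the continuous block x
    with z fixed (subproblem (37a)) and the discrete block z over a finite
    neighborhood with x fixed (subproblem (37b)); stop when neither block
    improves g."""
    x, z = x0, z0
    for _ in range(max_outer):
        x_new = solve_continuous(lambda v: g(v, z), x)
        z_new = solve_discrete(lambda w: g(x_new, w), z)
        if g(x_new, z_new) >= g(x, z):
            break                       # no progress: accept current point
        x, z = x_new, z_new
    return x, z

# Toy mixed problem: g(x, z) = (x - z)^2 + z^2, x in R, z in Z.
g = lambda x, z: (x - z) ** 2 + z ** 2

def solve_continuous(fx, x):
    grid = np.linspace(x - 4.0, x + 4.0, 1601)       # crude line scan
    return float(grid[np.argmin([fx(v) for v in grid])])

def solve_discrete(fz, z):
    return min((z - 1, z, z + 1), key=fz)            # best unit neighbor

x_star, z_star = variable_separation(g, 3.0, 3, solve_continuous, solve_discrete)
```

Each outer pass decreases g, matching the strictly decreasing sequence the convergence argument relies on.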
Lemma 10 Let ρ be large enough and let z j = z i−1 for some j > i. Under Assumptions A1, A4 Algorithm 8 stops at a stationary point.
Proof At iterations i and j, with j > i, we obtain a strict decrease by construction. We now assume z_j = z_{i−1} and reach a contradiction: equality (42b) holds because B(x_j, ρ) = B(x_{i−1}, ρ) for large enough ρ, and the resulting chain of relations contradicts the strict decrease.

Algorithm 9 Main() algorithm for solving the mixed integer optimization problem (1) by Variable Separation in continuous and discrete variables.

2a. if ∃{λ_j} → 0 such that (47) holds
2b. return
end-if

Algorithm 10 MixedContinuous(). Section of the Main() algorithm that handles the continuous variables x in the mixed integer problem (1).

Main() invokes MixedContinuous(), identified as Algorithm 10, within a repeat-until loop. The algorithm leaves the loop when g(x_i; z_i) < g(x_{i−1}; z_i). This strategy makes sure the algorithm generates a decreasing sequence: g(x_i; z_i) > g(x_j; z_j) for i < j. This feature is shared with Algorithm 8, and it was very convenient for proving finite termination. Besides, if the decreasing condition does not hold, the algorithm might not converge to the optimal solution (*x; *z) returned by standardVS().

Remark 7 Algorithms 10 and 11 are, respectively, Algorithms 3 and 7 adapted to the VS scheme for solving Problem (1). They are explicitly shown to facilitate the writing and reading of this section.
5. Pick q random indices
end-while
6. return z_i

Algorithm 11 MixedDiscrete(). Section of the Main() algorithm that handles the discrete variables z in the mixed integer problem (1).

Lemma 11 Given
Proof This is a replica of Lemma 8, replacing f(z) by g(x; z); we reach a similar result. Let x_{i+1} be returned by MixedContinuous(x_i, z_{i+1}, ϕ_i). If a qdd is not found by MixedContinuous(), then ∃{λ_j} ↓ 0 such that (47) holds, and Main() returns (x_i; z_i) as stationary. In a practical implementation we admit that a qdd at x_i does not exist when λ < ε in (47). In short, Main() returns (x_i; z_i) satisfying (47), or it generates an infinite sequence {x_i, z_i, τ_i, ϕ_i, g_i, D_i} with the properties (48). The following facts are inherited from the material analyzed in the previous sections of this paper; we should bear in mind that f(x) stands for g(x; ẑ) when ẑ is fixed.

Proof By (48e) and A1 it follows that {x_i} converges to a point *x satisfying (45).
Let the infinite sequence generated by Main() be indexed by S, and let (x*, D*) be an accumulation point of {x_i, D_i}. As the number of discrete variables is finite by A1, there exists a variable, say z_k, that appears infinitely often in S. Let us denote this variable as z* and extract from S the corresponding subsequence, which shows that (47) holds. Furthermore, from Lemma 11 we obtain that g(x_i; z*) ≤ g(x_i; z) for z ∈ V(z*, ·). We conclude that (45) holds. The proof is complete.

Numerical experiments
This section reports some results obtained with a code written in C and compiled with Dev-C++ version 5.11 on a computer equipped with an Intel Core CPU @ 3.3 GHz. These preliminary results address four relevant goals:

1. Continuous vs discrete performance. All problems tested have global integer optimizers. We ran two algorithms for each problem: Continuous() with x ∈ R^n and Greedy() with x ∈ Z^n. We would like to show numerically that both algorithms converge adequately to the global solution. We point out that it is common to add the constraint x ∈ Z^n to test integer optimization algorithms [37,38,43].

2. Global solution (glob). We carried out 100 runs for each problem and report the number of cases where the known global optimum was attained.

3. Search directions. The experiments were performed with the randomly generated directions described by Algorithm 2 and with a coordinate search.

4. Function evaluations (eval). This figure is often considered a performance index.
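The exact construction of Algorithm 2's random directions is not reproduced in this excerpt. A common way to obtain n random orthonormal search directions, sketched here purely as an assumption, is the QR factorization of a Gaussian matrix:

```python
import numpy as np

def random_orthonormal_directions(n, rng=None):
    """Return an n x n matrix whose columns are random orthonormal directions.

    Sketch only: Algorithm 2 of the paper is not reproduced here; this uses
    the standard QR-of-Gaussian construction for a random orthogonal basis.
    """
    rng = np.random.default_rng(rng)
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs (by the sign of R's diagonal) so the distribution of Q
    # is uniform (Haar) over the orthogonal group.
    Q *= np.sign(np.diag(R))
    return Q  # columns q_k satisfy ||q_k|| = 1 and q_i . q_j = 0 for i != j
```

A coordinate search corresponds to the special case where Q is the identity matrix.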
The choice of parameters may significantly influence any algorithm's behavior. The best selection is probably problem dependent. Nonetheless, we ran all examples with a common set of parameters:

- ε = 1.e-07 is the required accuracy. The smaller its value, the larger the value of eval. We use ε mainly to stop the algorithms.
- η = min(n, 6), where n is the number of variables. The value of η suggests the size of the local neighborhood where a better stationary point may be located. The larger its value, the larger the value of eval.
- … This parameter affects the non-monotone behavior of the algorithm.
- δ = 1.e-05. To inhibit stagnation we accept x_{i+1} as a new estimate only when it passes an acceptance test governed by δ.

The test problems have a small number of variables; however, they seem to represent a collection of hard problems. The starting point was a random feasible z ∈ Z^n. Whenever possible, we state some comparison-related issues with other algorithms that solve the same problem. The results are summarized in Tables 1–4. The algorithm also solved a small nonlinear real problem described in [10, Section 3.3]. The idea is to show an artifice to deal with discrete variables which are not integer variables. It is worth mentioning that our algorithm unveiled a better solution than that reported in [10]. This section ends with some remarks about the numerical results obtained.

[9]. Optimizer: y = (−3, 13); a = 1, b = 5.1/(4π²), c = 5.0/π, r = 6.0, s = 10.0, t = 1.0/(8π).

Branin function
The Branin function is a classical test problem in global optimization. The version shown above has an integer global solution. The results are given in Table 1.

[9]. Optimizer: x_k = 1, k = 1, …, 50.
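The paper's shifted Branin variant with integer optimizer (−3, 13) is not fully specified in this excerpt; as an assumption, the classical Branin form with the constants listed above reads:

```python
import math

# Classical Branin function with the constants listed in the text.
# The paper uses a shifted variant whose global optimizer is integer,
# (-3, 13); that shift is not reproduced here.
A, B, C = 1.0, 5.1 / (4 * math.pi ** 2), 5.0 / math.pi
R, S, T = 6.0, 10.0, 1.0 / (8 * math.pi)

def branin(x1, x2):
    """Three global minimizers, all with value s*t = 10/(8*pi) ~ 0.397887."""
    return A * (x2 - B * x1 ** 2 + C * x1 - R) ** 2 + S * (1 - T) * math.cos(x1) + S
```

The classical minimizers are (−π, 12.275), (π, 2.275) and (≈9.42478, 2.475); a shift of these coordinates yields the integer-optimum version used in the experiments.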

Rosenbrock function
The function (53) has been extensively used as a benchmark. It has long narrow valleys and local minimizers for n > 4 [27]. The global solution is all-integer, x = (1, 1, …, 1), regardless of the number of variables. The results shown in Table 2 with x ∈ Z^50 are remarkable when compared with [38]. The algorithm never gets trapped by any of the numerous local minimizers of the function [27].

Optimizer: y = (1, 1); x_1 = y_1 + 0.14, x_2 = y_2 − 0.1. Minimax problems are non-smooth problems that appear often in testing algorithms.
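Formula (53) is not reproduced in this excerpt; assuming it is the standard generalized Rosenbrock function, a minimal sketch:

```python
def rosenbrock(x):
    """Generalized Rosenbrock function: a long curved valley with its
    global minimum 0 at x = (1, ..., 1), for any dimension len(x) >= 2."""
    return sum(100.0 * (x[k + 1] - x[k] ** 2) ** 2 + (1.0 - x[k]) ** 2
               for k in range(len(x) - 1))
```

With x ∈ Z^50 the global integer solution is the all-ones vector, matching the optimizer stated above.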

Pintér function [40]
The Pintér problem was a difficult test for our algorithms. The author in [40, Section 4.4] justifies the need for global search strategies. Problem (55) possesses many local minimizers with close function values. The dimension and solution can be arbitrarily selected. We chose y_k = 1, k = 1, …, n, with n = 5 and n = 10. The results are shown in Table 4.
The authors in [29] could not solve (56) with any of the heuristic algorithms they tried. In [15] the author reports that his algorithm returns the global solution in 70 out of 100 runs, with 20,000 function evaluations on average. Our algorithms always found the global solution. The number of evaluations for the discrete version was remarkably low in comparison with [38].
The Shekel function (57) has 10 local discrete minimizers, which is a challenge for the Greedy() algorithm. Again, regarding function evaluations, the discrete version outperformed both the continuous version and the results reported in [15,38].
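Formula (57) is not reproduced in this excerpt. Assuming the classical four-variable Shekel function with m = 10 terms (consistent with the 10 local minimizers mentioned above), a sketch with the standard parameters:

```python
# Classical Shekel function, m = 10 terms in 4 variables. These standard
# parameters are an assumption; (57) itself is not shown in this excerpt.
SHEKEL_A = [
    [4, 4, 4, 4], [1, 1, 1, 1], [8, 8, 8, 8], [6, 6, 6, 6], [3, 7, 3, 7],
    [2, 9, 2, 9], [5, 5, 3, 3], [8, 1, 8, 1], [6, 2, 6, 2], [7, 3.6, 7, 3.6],
]
SHEKEL_C = [0.1, 0.2, 0.2, 0.4, 0.4, 0.6, 0.3, 0.7, 0.5, 0.5]

def shekel(x):
    """Each of the 10 terms creates a pit near the corresponding row of SHEKEL_A;
    the deepest pit (global minimizer) lies near (4, 4, 4, 4)."""
    return -sum(1.0 / (sum((xj - aij) ** 2 for xj, aij in zip(x, a)) + c)
                for a, c in zip(SHEKEL_A, SHEKEL_C))
```

The pit centers are (near-)integer points, which makes the function a natural discrete test case.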
Problem (59) is a real problem: the goal is to design a minimum-volume beam satisfying physical constraints. A detailed description is given in [10]. To solve (59) we use the exact penalty function 500 (max(g_1(x), 0) + max(g_2(x), 0)).
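The beam volume and the constraints g_1, g_2 of [10] are not reproduced here; the l1-type exact-penalty construction itself can be sketched with hypothetical placeholders for them:

```python
MU = 500.0  # penalty weight used in the text

def exact_penalty(volume, constraints, x, mu=MU):
    """l1-style exact penalty: objective plus mu times the total constraint
    violation.  `volume` and `constraints` are hypothetical placeholders for
    the beam model of [10], which is not reproduced in this excerpt."""
    return volume(x) + mu * sum(max(g(x), 0.0) for g in constraints)
```

At any feasible point the penalty term vanishes, so minimizing the penalized function recovers the constrained minimum when mu is large enough.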

Mixed beam
We now mention the mixed-variable problem suggested in [10]; x_4 becomes discrete with an additional constraint. To handle (60) as a mixed-variable problem we employ an integer variable z whose values are linked to x_4. Problem (59) is now a nonlinearly constrained problem with 3 continuous variables (x_1, x_2, x_3) and the integer variable z. This kind of problem was not an aim of this paper. Nonetheless, Main() solved it with fewer than 600 function evaluations.
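The exact link between z and x_4 is not reproduced in this excerpt. The usual artifice is to let an integer variable index a finite list of admissible discrete values; a minimal sketch, with a hypothetical value set:

```python
# The admissible values below are hypothetical; the actual discrete set
# used for x4 in [10] is not reproduced in this excerpt.
X4_VALUES = [0.5, 1.0, 1.5, 2.0]

def x4_from_z(z):
    """Map the integer variable z in {0, ..., len(X4_VALUES) - 1} to a
    discrete (non-integer) value of x4."""
    return X4_VALUES[z]
```

The optimizer then works on the integer z with simple bounds 0 ≤ z ≤ len(X4_VALUES) − 1, while the model evaluates x_4 = x4_from_z(z).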

Comments on the numerical results
Problems 6.1–6.7 were taken from the open literature. They were originally proposed in benchmarks for continuous optimization. Nowadays many researchers have added the constraint x ∈ Z^n for testing integer optimization algorithms. With this sample we hope to identify features to be improved.

Problem 6.1 The version of the Branin function exposed here has one global minimizer and two more local minimizers. This is the only problem in the sample whose continuous version was solved in fewer function evaluations than its discrete version.

Problem 6.2 The authors in [38] report that they successfully solved the discrete version of the Rosenbrock function in ≈1.6 million function evaluations. Our result is remarkable.

Problem 6.3 For the Lukšan function the results in R^n and with coordinate search were not appealing. This is probably linked to the geometry of the problem.

Problem 6.4 The Pintér problem was the most difficult to solve. Often the algorithms converge to a local minimizer. Global strategies are needed to improve the algorithm's performance.

Problem 6.5 In 400 runs our algorithms always returned the global minimizer of the Ackley function. This is an impressive result because few function evaluations were needed for solving the discrete version.

Problem 6.6 We had tested the Shekel function in [15]. The results have improved.

Problem 6.7 This problem was efficiently solved; however, we should pay attention to the dispersion in function evaluations shown in R^n with random directions.
Overall, our algorithms, especially the discrete version, look promising. Better results should be expected by introducing globalization strategies.

Conclusion and final remarks
We have formalized and described a common framework for a class of nonmonotone Direct Search Methods. Convergence is proved for non-smooth functions with no differentiability assumptions. A new concept, the quasi-descent direction, is introduced to establish non-smooth convergence. These methods converge under weak assumptions to classical first-order stationary points of unconstrained problems with continuous variables. Convergence prevails for box-constrained optimization with mixed variables. Derivative information is not required. For unconstrained problems with continuous variables we prove convergence to a point that has no quasi-descent direction. The role of a finite set of n randomly generated orthogonal search directions was crucial to ensure this result. Orthogonality was also an asset in the design of a variable separation algorithm for solving the mixed-integer optimization problem. All convergence properties imply convergence to a Clarke stationary point, provided the function is Lipschitz continuous.
For the sake of clarity, we have presented the theory complemented with a basic but unspecified implementation of the algorithms. Our framework is open to many variants that deserve further research. Nonetheless, we wrote a preliminary code and carried out numerical experiments on several academic problems and a 5-variable real application.
The results are encouraging. The next apparent line of research is the convergence analysis of DSMs for nonlinearly constrained mixed-integer optimization problems.
We have left for future study some important issues that are beyond the aims of this paper, such as strategies for accelerating convergence, global minimization, parallelism, and hybrid methods. A clear research topic is the analysis and impact of nonmonotone features in other successful approaches to DFO, like MADS [7], surrogate functions [25,26,41], and others.
In this paper, and also in [15], feasibility of the sequence of solution estimates was enforced at all iterations. If this condition can be removed, we would probably broaden the class of nonmonotone DSMs converging under the same conditions stated herein.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.