Globally convergent algorithms for finding zeros of duplomonotone mappings

We introduce a new class of mappings, called duplomonotone, which is strictly broader than the class of monotone mappings. We study some of the main properties of duplomonotone functions and provide various examples, including nonlinear duplomonotone functions arising from the study of systems of biochemical reactions. Finally, we present three variations of a derivative-free line search algorithm for finding zeros of systems of duplomonotone equations, and we prove their linear convergence to a zero of the function.

In mathematical models of biochemical reaction networks [3], a problem arises of finding a zero of functions that are typically not monotone (see Example 5). These functions seem to have a generalized monotonicity property that has not yet appeared in the literature but can be exploited to find a zero of such functions. In this paper we introduce this new class of generalized monotone mappings, which we call duplomonotone, and present a rather simple derivative-free line search algorithm that can be used to find a zero of a duplomonotone function.
The paper is organized as follows: in Sect. 2 we introduce duplomonotone mappings, analyze their basic properties and provide various illustrative examples; in Sect. 3 we present three variations of a derivative-free line search algorithm for finding a zero of a duplomonotone function, and we prove their linear convergence under strong duplomonotonicity plus some Lipschitz-type assumption on the points of the lower level set defined by the initial point.
Throughout, ‖·‖ denotes the Euclidean norm, while the usual inner product is denoted by ⟨·, ·⟩. We say that F is a set-valued mapping from R^m to R^n, denoted by F : R^m ⇒ R^n, if for every x ∈ R^m, F(x) is a subset of R^n. The gradient of a differentiable function f : R^m → R^n at some point x ∈ R^m is denoted by ∇f(x) ∈ R^{m×n}.

Duplomonotonicity
Recall that a function f : R^m → R^m is said to be monotone when ⟨f(x) − f(y), x − y⟩ ≥ 0 for all x, y ∈ R^m, and strictly monotone if this inequality is strict whenever x ≠ y. Further, f is called strongly monotone for some σ > 0 when
⟨f(x) − f(y), x − y⟩ ≥ σ‖x − y‖² for all x, y ∈ R^m.
We introduce next a new property that is implied by monotonicity.
Definition 1 A function f : R^m → R^m is called duplomonotone with constant τ̄ > 0 if
⟨f(x) − f(x − τ f(x)), f(x)⟩ ≥ 0 for all x ∈ R^m and all τ ∈ [0, τ̄], (1)
and strictly duplomonotone if this inequality is strict whenever f(x) ≠ 0. The function f is said to be strongly duplomonotone for some σ > 0 with constant τ̄ > 0 if
⟨f(x) − f(x − τ f(x)), f(x)⟩ ≥ στ‖f(x)‖² for all x ∈ R^m and all τ ∈ [0, τ̄]. (2)
The modulus of strong duplomonotonicity is the supremum of the constants σ for which (2) holds.

Remark 1
Letting σ be zero in (2) will allow us to handle both duplomonotonicity and strong duplomonotonicity at the same time. Hence, we refer to this as f being strongly duplomonotone with σ ≥ 0.
Obviously, every (strongly) monotone function is (strongly) duplomonotone. In the next simple example we show that the converse is not true in general: the class of duplomonotone functions is strictly broader than the class of monotone functions.

Example 1 Consider the linear function f(x) := Ax for some matrix A ∈ R^{m×m}. For any x ∈ R^m and any τ > 0 we have
⟨f(x) − f(x − τ f(x)), f(x)⟩ = τ⟨A²x, Ax⟩ = τ x^T (A^T A²)_s x;
that is, f is duplomonotone if and only if (A^T A²)_s, the symmetric part of A^T A², is positive semidefinite. Furthermore, f is strongly duplomonotone for σ > 0 if and only if for any x ∈ R^m and any positive τ one has
τ x^T (A^T (A − σ I) A)_s x ≥ 0,
where I denotes the identity mapping. Therefore, f is strongly duplomonotone for σ > 0 if and only if (A^T A²)_s − σ A^T A is positive semidefinite. If A is symmetric, then (A^T A²)_s = A³, whose eigenvalues have the same sign as those of A. Thus, for A symmetric, the function f is duplomonotone if and only if f is monotone. However, this may not be the case if A is asymmetric: for the matrix A given in (3), the symmetric part A_s is not positive semidefinite, while (A^T A²)_s is. Moreover, it is not difficult to check that f is not even quasimonotone.¹ In fact, f is strongly duplomonotone with modulus σ = 2: indeed, (A^T A²)_s − σ A^T A is positive semidefinite if and only if σ ≤ 2. ♦

A strictly monotone function has at most one zero. This is not the case for duplomonotone functions: even under strong duplomonotonicity, the function f(x) = Ax with A given by (3) has a zero at (0, y)^T for every y ∈ R. In fact, the zero function is strongly duplomonotone for any σ > 0.
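For a linear map f(x) = Ax, the test above only requires the eigenvalues of the symmetric part of A^T A², so it is easy to check numerically. A minimal sketch (the matrices below are our own illustrations, unrelated to the matrix in (3)):

```python
import numpy as np

def is_duplomonotone_linear(A, tol=1e-12):
    """For f(x) = A x, f is duplomonotone iff the symmetric part
    of A^T A^2 is positive semidefinite."""
    M = A.T @ A @ A                      # A^T A^2
    S = (M + M.T) / 2                    # its symmetric part
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

# A symmetric positive definite matrix is monotone, hence duplomonotone,
# while its negative is neither.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(is_duplomonotone_linear(A), is_duplomonotone_linear(-A))  # True False
```

For symmetric A the routine simply tests whether A³ (equivalently, A) is positive semidefinite, in agreement with the observation that duplomonotonicity and monotonicity coincide in the symmetric case.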
We have shown a function in Example 1 that is duplomonotone but not quasimonotone. It is interesting to note that there are also functions that are quasimonotone but not duplomonotone, e.g. f (x) = −|x| for x ∈ R.
For an affine function f(x) = Ax + b, a similar computation yields, for any x ∈ R^m and any τ > 0,
⟨f(x) − f(x − τ f(x)), f(x)⟩ = τ⟨A f(x), f(x)⟩ = τ f(x)^T A_s f(x);
that is, f is duplomonotone if and only if A_s is positive semidefinite on the range of f. For example, one can check that for A given in (3) and any b = (b₁, b₂)^T ∈ R², the function f is duplomonotone if and only if b₁ = b₂. ♦
Next we present a direct characterization of duplomonotonicity in terms of the Euclidean norm.
Proposition 1 A function f : R^m → R^m is strongly duplomonotone for σ ≥ 0 if and only if there is some constant τ̄ > 0 such that for all x ∈ R^m and all 0 ≤ τ ≤ τ̄ one has
‖f(x − τ f(x))‖² ≤ (1 − 2στ)‖f(x)‖² + ‖f(x) − f(x − τ f(x))‖².
Proof For any x ∈ R^m and any τ > 0, we have
‖f(x − τ f(x))‖² = ‖f(x)‖² − 2⟨f(x) − f(x − τ f(x)), f(x)⟩ + ‖f(x) − f(x − τ f(x))‖².
¹ A function f : R^m → R^m is quasimonotone if the following implication holds: ⟨f(x), y − x⟩ > 0 ⟹ ⟨f(y), y − x⟩ ≥ 0 for every x, y ∈ R^m. Monotonicity implies quasimonotonicity.
The stated equivalence follows then from the definition of strong duplomonotonicity of f .
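The proof rests on nothing more than the expansion ‖b‖² = ‖a‖² − 2⟨a − b, a⟩ + ‖a − b‖² with a = f(x) and b = f(x − τf(x)). A quick numerical sanity check of this identity (the test function below is an arbitrary illustration of our own):

```python
import numpy as np

def f(x):
    # arbitrary smooth test function (illustration only)
    return np.array([x[0] + np.sin(x[1]), np.exp(x[0]) - 1.0 + x[1]])

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
tau = 0.3
a, b = f(x), f(x - tau * f(x))          # a = f(x), b = f(x - tau f(x))
lhs = b @ b
rhs = a @ a - 2 * (a - b) @ a + (a - b) @ (a - b)
print(abs(lhs - rhs) < 1e-12)  # True: the identity behind Proposition 1
```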
The following example shows the importance of considering the constant τ̄ in the definition of duplomonotonicity: there are functions for which (1) does not hold for all τ > 0. One could also define a weaker notion of duplomonotonicity where the constant τ̄ in (1) depends on each point x. Nevertheless, this property might be too weak to guarantee the convergence of the line search algorithms in Sect. 3, as we need to ensure that the step size is bounded away from zero.
On the other hand, after some algebraic manipulation, one can show that the duplomonotonicity inequality holds for all x ∈ R² whenever 0 ≤ τ ≤ 2. If τ > 2, the expression can be negative for some x ∈ R²: indeed, choosing any ε > 0 and a point z with z₁² sufficiently big, the expression becomes negative. ♦
The next result shows that if a function is both Lipschitz continuous and strongly duplomonotone for σ > 0, then σ is bounded above by the Lipschitz constant.
Proposition 2 Let f : R^m → R^m be Lipschitz continuous with constant ℓ > 0 and strongly duplomonotone for σ > 0. If f is not identically zero, then σ ≤ ℓ.
Proof Because of the Lipschitz continuity, we have
‖f(x) − f(x − τ f(x))‖ ≤ ℓτ‖f(x)‖ for all x ∈ R^m and all τ > 0.
Let τ̄ > 0 be the strong duplomonotonicity constant in (2), and pick any z ∈ R^m such that f(z) ≠ 0. Then, for any τ ∈ (0, τ̄],
στ‖f(z)‖² ≤ ⟨f(z) − f(z − τ f(z)), f(z)⟩ ≤ ‖f(z) − f(z − τ f(z))‖ ‖f(z)‖ ≤ ℓτ‖f(z)‖²,
whence σ ≤ ℓ.
In the following result we show a direct consequence of duplomonotonicity for differentiable functions.
Proposition 3 Let f : R^m → R^m be differentiable. The following assertions hold.
(i) If f is duplomonotone, then ⟨∇f(x)^T f(x), f(x)⟩ ≥ 0 for all x ∈ R^m. (5)
(ii) If f is strongly duplomonotone for σ > 0, then ⟨∇f(x)^T f(x), f(x)⟩ ≥ σ‖f(x)‖² for all x ∈ R^m.
Proof Assume that f satisfies (2) with σ ≥ 0 and τ̄ > 0. Fix x ∈ R^m and choose an arbitrary τ ∈ (0, τ̄]. Dividing (2) by τ we get
⟨(f(x) − f(x − τ f(x)))/τ, f(x)⟩ ≥ σ‖f(x)‖²,
and taking the limit as τ ↓ 0 yields both assertions (with σ = 0 for (i)).
Remark 2 (i) In general, strict duplomonotonicity does not imply that equality in (5) is only attained when f(x) = 0, in the same way that strict monotonicity does not imply positive definiteness of ∇f(x). (ii) Observe that both assertions also hold under the weaker notion of duplomonotonicity where the constant τ̄ depends on each x ∈ R^m.
For differentiable functions in one dimension, the notions of (strong) duplomonotonicity and (strong) monotonicity agree. In fact, Proposition 4 establishes that the concepts of monotonicity and duplomonotonicity coincide for continuous functions in one dimension. 2 This is not the case if the function is not continuous, as we show in Example 4.

Corollary 1 Let f : R → R be differentiable. Then f is (strongly) monotone if and only if f is (strongly) duplomonotone.
Proof This is just a consequence of Proposition 3 and the fact that f is (strongly) monotone with constant σ ≥ 0 if and only if f′(x) ≥ σ for all x ∈ R.
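As a quick illustration of Corollary 1 (our own example, not from the text), the monotone one-dimensional function f(x) = x³ satisfies the duplomonotonicity inequality (f(x) − f(x − τf(x)))·f(x) ≥ 0 on a grid of points and step sizes:

```python
import numpy as np

f = lambda x: x**3  # monotone on R, hence duplomonotone by Corollary 1
ok = all((f(x) - f(x - tau * f(x))) * f(x) >= 0
         for x in np.linspace(-1.0, 1.0, 101)
         for tau in np.linspace(0.0, 0.5, 11))
print(ok)  # True
```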

Proposition 4 Let f : R → R be continuous. Then f is monotone if and only if f is duplomonotone.
Proof Suppose that f is duplomonotone with constant τ̄ > 0. If there is some z ∈ R such that f(z) > 0, then we claim that there is an open interval containing z on which f is both nondecreasing and positive. Indeed, by continuity of f, there is some δ > 0 such that f is positive on (z − δ, z + δ); then, for any x in this interval and any sufficiently small τ > 0, the duplomonotonicity condition (1) yields f(x) ≥ f(x − τ f(x)), where x − τ f(x) < x. Hence, f is nondecreasing and positive on (z − δ, z + δ), as claimed.
Observe now that f has to be positive and nondecreasing on (z − δ, +∞), again by continuity of f; arguing similarly where f is negative, one concludes that f is monotone.

Example 4 The function f of this example is not monotone (not even locally). On the other hand, f is duplomonotone: for any x ∈ Q the duplomonotonicity condition (1) trivially holds since f(x) = 0, while it can be checked directly for any x ∉ Q and any τ > 0. Furthermore, one can easily check that this function is not strongly duplomonotone. A slight modification of this example yields a function g : R → R that is strongly duplomonotone but still not monotone. Again, the function g is not monotone (not even locally). In this case, g is strongly duplomonotone for σ = 1 with constant τ̄ = 1, as can be verified for any x ∈ Q and any τ ∈ [0, 1]. Therefore, without differentiability, the concepts of monotonicity and duplomonotonicity may be quite different, even in one dimension. ♦

In the next proposition we introduce a property that implies duplomonotonicity but is still weaker than monotonicity (see Example 5). This property has a characterization for differentiable functions analogous to the positive semidefiniteness of the Jacobian for monotone functions; see e.g. [6, Proposition 12.3].
Proposition 5 Let f : R^m → R^m be differentiable. Then, for any σ ≥ 0, properties (i) and (ii) below are equivalent.
Proof Assume that (i) holds. Choose any x ∈ R^m and any τ ∈ [0, τ̄).
Thus, dividing by t and taking the limit as t ↓ 0, (ii) follows. Conversely, assume that (ii) holds. Pick any x ∈ R^m and any 0 ≤ τ₁ ≤ τ₂ ≤ τ̄, and consider the corresponding function of λ ∈ R. Then, by (ii), its derivative is nonnegative, which implies (i).
Our motivation to characterize duplomonotone mappings arose from mathematical modeling of networks of (bio)chemical reactions, an increasingly prominent application of mathematical and numerical optimization. The next example introduces a very simple (bio)chemical reaction network, involving three molecules and three reactions, where each row of x corresponds to the logarithmic abundance of some molecule and each row of − f (x) corresponds to the rate of change of abundance per unit time.
The function f is not monotone, since ∇f(x) fails to be positive semidefinite for some x ∈ R³; for instance, one can check that ⟨f(z) − f(w), z − w⟩ < 0 for z := (0, 0, log(2))^T and w := (3, 3, 2)^T. Nevertheless, the function f is duplomonotone: in fact, it satisfies Proposition 5(ii) with σ = 0. Indeed, after some algebraic manipulation, one obtains from (7) that Proposition 5(ii) holds for all τ > 0.
In fact, f is strictly duplomonotone: ∂ϕ/∂τ(x, τ) > 0 for all x ∈ Ω and all τ ≥ 0, whence ϕ(x, τ) > ϕ(x, 0) = 0 for all x ∈ Ω and all τ > 0; that is, f is strictly duplomonotone. ♦ The sum of two monotone operators is clearly monotone. Further, if a mapping F is monotone, one can easily show that for all α > 0 the mapping F + αI is strongly monotone. Do these properties also hold for duplomonotone functions? The answer is negative in general. As we show in the next example, duplomonotonicity can be destroyed by the addition of a monotone linear function of arbitrarily small slope.
It is straightforward to extend the definition of duplomonotonicity for set-valued mappings.
Definition 2 A set-valued mapping F : R^m ⇒ R^m is called duplomonotone with constant τ̄ > 0 if for all x ∈ R^m and all τ ∈ [0, τ̄] one has
⟨u − v, u⟩ ≥ 0 for all u ∈ F(x) and all v ∈ F(x − τu).
The mapping F is said to be strongly duplomonotone for some σ > 0 with constant τ̄ > 0 if for all x ∈ R^m and all τ ∈ [0, τ̄] one has
⟨u − v, u⟩ ≥ στ‖u‖² for all u ∈ F(x) and all v ∈ F(x − τu).
One can easily extend the characterization of duplomonotonicity given in Proposition 1 to set-valued mappings.

Proposition 6
A set-valued mapping F : R^m ⇒ R^m is strongly duplomonotone for σ ≥ 0 if and only if there is some τ̄ > 0 such that for all x ∈ R^m and all τ ∈ [0, τ̄] one has
‖v‖² ≤ (1 − 2στ)‖u‖² + ‖u − v‖² for all u ∈ F(x) and all v ∈ F(x − τu).
We will not explore duplomonotone set-valued mappings any further here, as this is beyond the scope of the present paper.

Derivative-free algorithms for systems of duplomonotone equations
In this section we consider the problem of finding solutions of systems of nonlinear equations
f(x) = 0, (9)
where f : R^m → R^m is strongly duplomonotone for σ > 0. Corollary 2 leads us to consider the following derivative-free line search algorithm for finding zeros of f.
The steepest descent algorithm could be applied to find solutions of nonlinear equations of type (9) whenever the function f has a computable Jacobian. The main advantage of Algorithm 1 over the steepest descent method is that no derivative information is needed. On the other hand, note that in general the steepest descent method is only guaranteed to converge to a critical point of ‖f(·)‖², not necessarily to a zero of the function f (for more details, see e.g. [5, Chapter 11]). This is not a concern under strong duplomonotonicity for σ > 0: in this case, any critical point of ‖f(·)‖² is a zero of f. Indeed, otherwise one would have ∇‖f(·)‖²(x̄) = 0 and f(x̄) ≠ 0 for some x̄ ∈ R^m. Then
0 = ⟨∇‖f(·)‖²(x̄), f(x̄)⟩ = 2⟨∇f(x̄)^T f(x̄), f(x̄)⟩,
whence, by Proposition 3(ii), 0 ≥ 2σ‖f(x̄)‖² > 0, which is a contradiction.
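The algorithm displays are not reproduced in this excerpt, so the following is only a hypothetical sketch of a derivative-free backtracking scheme in the spirit of Algorithm 1; the acceptance test ‖f(x − λf(x))‖² ≤ (1 − αλ)‖f(x)‖² is taken from the proof of Theorem 3, and all names and default values are our own assumptions:

```python
import numpy as np

def derivative_free_zero(f, x0, alpha=0.5, beta=0.5, lam0=1.0,
                         tol=1e-10, max_iter=2000):
    """Sketch: backtrack lam until
    ||f(x - lam f(x))||^2 <= (1 - alpha*lam) ||f(x)||^2,
    then step x <- x - lam f(x). Assumes 0 < alpha < 2*sigma for a
    strongly duplomonotone Lipschitz f, so the inner loop terminates."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        nfx2 = fx @ fx
        if nfx2 <= tol * tol:
            break
        lam = lam0
        trial = f(x - lam * fx)
        while trial @ trial > (1.0 - alpha * lam) * nfx2:
            lam *= beta
            trial = f(x - lam * fx)
        x = x - lam * fx
    return x

# Strongly monotone linear map (hence strongly duplomonotone).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
sol = derivative_free_zero(lambda v: A @ v, [1.0, -2.0])
print(np.linalg.norm(A @ sol) < 1e-8)  # True
```

Note that only evaluations of f are used: the search direction is −f(x) itself, which is what makes the scheme derivative-free.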
If f is Lipschitz continuous with a known constant ℓ > 0 and is also strongly duplomonotone for σ > 0 with constant τ̄ > 0, then, as a direct consequence of the characterization in Proposition 1, we get
‖f(x − τ f(x))‖² ≤ (1 − 2στ + ℓ²τ²)‖f(x)‖² (11)
for all x ∈ R^m and all 0 ≤ τ ≤ τ̄. The right-hand side of (11) attains its minimum (with respect to τ ∈ [0, τ̄]) at τ* := min{σ/ℓ², τ̄}. Thus, if σ/ℓ² ≤ τ̄, we have
‖f(x − τ* f(x))‖² ≤ (1 − σ²/ℓ²)‖f(x)‖². (12)
This makes us consider the following variation of Algorithm 1, where the step size is chosen constant.
As a direct consequence of (12) we have that Algorithm 2 is (globally) linearly convergent to a zero of f , and moreover, the Lipschitz assumption can be relaxed as follows.
Theorem 1 Let f : R^m → R^m be strongly duplomonotone for σ > 0 with constant τ̄ > 0. Let x^0 ∈ R^m be an initial point, and assume there exists some constant ℓ > 0 such that
‖f(x) − f(x − τ f(x))‖ ≤ ℓτ‖f(x)‖ for all x ∈ L(x^0) and all τ ∈ [0, τ̄], (13)
where L(x^0) := {x ∈ R^m : ‖f(x)‖ ≤ ‖f(x^0)‖}. Set λ := min{σ/ℓ², τ̄}. Then the iteration x^{k+1} := x^k − λ f(x^k) generates a sequence {x^k} such that
‖f(x^{k+1})‖² ≤ (1 − 2σλ + ℓ²λ²)‖f(x^k)‖²,
whence {‖f(x^k)‖} converges linearly to zero. Thus, if f is continuous, any accumulation point of the sequence {x^k} is a zero of f.
Proof This follows from the argument preceding the theorem.
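The constant-step iteration of Theorem 1 can be sketched as follows (a hypothetical illustration: for a symmetric positive definite A, f(x) = Ax is strongly monotone, hence strongly duplomonotone with σ = λ_min(A), and Lipschitz with ℓ = λ_max(A)):

```python
import numpy as np

def constant_step_zero(f, x0, sigma, ell, tau_bar=float("inf"), iters=500):
    """Sketch of the Theorem 1 iteration: x <- x - lam*f(x)
    with the constant step lam = min(sigma / ell**2, tau_bar)."""
    lam = min(sigma / ell**2, tau_bar)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - lam * f(x)
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
eig = np.linalg.eigvalsh(A)              # eigenvalues in ascending order
x = constant_step_zero(lambda v: A @ v, [1.0, -2.0],
                       sigma=eig[0], ell=eig[-1])
print(np.linalg.norm(A @ x) < 1e-8)  # True
```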
Even when f is known to be Lipschitz continuous, its Lipschitz constant might not be easy to compute. The next result shows that in this case Algorithm 1 can be used: the step size λ_k can always be found by a backtracking technique in which λ_k remains bounded away from zero, and the algorithm is linearly convergent. We denote by ⌈·⌉ the ceiling function, i.e., ⌈t⌉ is the smallest integer greater than or equal to t.
Theorem 2 Let f : R^m → R^m be strongly duplomonotone for σ > 0 with constant τ̄ > 0. Let x^0 ∈ R^m be an initial point, and assume that there is a positive constant ℓ such that (13) holds. Then, for all 0 < α < 2σ and all 0 < β < 1, Algorithm 1 generates a sequence {x^k} such that {‖f(x^k)‖} converges linearly to zero with rate √(1 − αβ^p), where p is a nonnegative integer determined by σ, ℓ, α, β and τ̄. Thus, if f is continuous, any accumulation point of the sequence {x^k} is a zero of f.
Theorem 3 Let f : R^m → R^m be strongly duplomonotone for σ > 0 with constant τ̄ > 0. Let x^0 ∈ R^m be an initial point, and assume that there is a positive constant ℓ such that (13) holds. Then, for all positive constants λ_min and λ_max for which there exists some integer q with λ_min ≤ β^q λ_max < min{2σ/ℓ², τ̄}, Algorithm 3 generates a sequence {x^k} such that {‖f(x^k)‖} converges linearly to zero with rate √(1 − αβ^{p+q}λ_max), where p is the nonnegative integer defined in (18). Thus, if f is continuous, any accumulation point of the sequence {x^k} is a zero of f.
Proof Denote by α₀ the initial value of α in Algorithm 3. Proposition 1 together with (13) gives us
‖f(x − τ f(x))‖² ≤ (1 − 2στ + ℓ²τ²)‖f(x)‖² for all x ∈ L(x^0) and all τ ∈ [0, τ̄].
Further, we have that 1 − 2στ + ℓ²τ² ≤ 1 − α₀β^p τ with 0 < τ ≤ τ̄ if and only if 0 < τ ≤ min{(2σ − α₀β^p)/ℓ², τ̄}. By assumption, there exists some positive integer q such that λ_min ≤ β^q λ_max < min{2σ/ℓ², τ̄}. By the definition of p in (18), we have β^q λ_max ≤ (2σ − α₀β^p)/ℓ². Hence, for all x ∈ L(x^0), we have
‖f(x − β^q λ_max f(x))‖² ≤ (1 − α₀β^p · β^q λ_max)‖f(x)‖².
Finally, observe that there is some positive integer s such that β^s λ_max < λ_min. Therefore, given x^k, a new point x^{k+1} is guaranteed to be found in a finite number of steps of Algorithm 3, because the double backtracking loop can only be executed a maximum of sp + q times (after a maximum of sp iterations the value of α will be equal to α₀β^p, after which a maximum of q iterations will be enough to find an appropriate step size λ_k). Thus, we have αλ_k ≥ α₀β^{p+q}λ_max. Consequently, by the acceptance criterion of the step size in Algorithm 3, we have
‖f(x^{k+1})‖² ≤ (1 − αλ_k)‖f(x^k)‖² ≤ (1 − α₀β^{p+q}λ_max)‖f(x^k)‖²,
and the claims follow.
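The double-backtracking mechanism described in this proof can be reconstructed hypothetically as follows (the acceptance test and the roles of α, β, λ_min, λ_max are taken from the text; the control flow and all names are our own assumptions):

```python
import numpy as np

def double_backtracking_zero(f, x0, alpha0=0.9, beta=0.5,
                             lam_min=0.01, lam_max=1.0,
                             tol=1e-10, max_iter=5000):
    """Sketch: try lam from lam_max downward by factors of beta; whenever
    lam would drop below lam_min, shrink alpha by beta instead and reset
    lam to lam_max. Accept lam once
    ||f(x - lam f(x))||^2 <= (1 - alpha*lam) ||f(x)||^2."""
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    for _ in range(max_iter):
        fx = f(x)
        nfx2 = fx @ fx
        if nfx2 <= tol * tol:
            break
        lam = lam_max
        trial = f(x - lam * fx)
        while trial @ trial > (1.0 - alpha * lam) * nfx2:
            lam *= beta
            if lam < lam_min:        # step too small: relax alpha instead
                alpha *= beta
                lam = lam_max
            trial = f(x - lam * fx)
        x = x - lam * fx
    return x

# Strongly monotone linear map (hence strongly duplomonotone).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
sol = double_backtracking_zero(lambda v: A @ v, [1.0, -2.0])
print(np.linalg.norm(A @ sol) < 1e-8)  # True
```

As in the proof, α is only ever decreased, so the accepted products αλ_k stay bounded below and the contraction factor per iteration is bounded away from one.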
Remark 4 (i) The condition λ_min ≤ β^q λ_max < min{2σ/ℓ², τ̄} in Theorem 3 is needed to avoid the possibility of an infinite loop in an iteration of the algorithm. Nevertheless, we believe this condition should not be too difficult to guarantee in practice, as it basically requires that λ_min is not "too big" and β is not "too small". (ii) Certainly, the constant β used for updating α can be chosen different from the constant β used for updating λ_k, and Theorem 3 would remain valid with slight changes. Nonetheless, we have decided to use the same constant to ease the notation and the analysis. (iii) In Algorithm 3, the constant α is required to be smaller than λ_max^{-1} to avoid unnecessary iterations (otherwise, the initial step λ_k = λ_max would always be too big, since 1 − αλ_k would be negative).