Abstract
We introduce a new class of mappings, called duplomonotone, which is strictly broader than the class of monotone mappings. We study some of the main properties of duplomonotone functions and provide various examples, including nonlinear duplomonotone functions arising from the study of systems of biochemical reactions. Finally, we present three variations of a derivative-free line search algorithm for finding zeros of systems of duplomonotone equations, and we prove their linear convergence to a zero of the function.
1 Introduction
Monotone mappings have been extensively studied in the literature; see for instance [6, Chapter 12] or the recent monograph [1]. In many practical problems, though, the monotonicity assumption turns out to be too strong. Consequently, several generalized notions of monotonicity have been introduced and thoroughly studied by various authors in order to relax it while keeping some of the useful properties of monotone mappings; see [2, 4] and the references therein.
In mathematical models of biochemical reaction networks [3], a problem arises of finding a zero of functions that are typically not monotone (see Example 5). These functions seem to have a generalized monotonicity property that has not yet appeared in the literature but can be exploited to find a zero of such functions. In this paper we introduce this new class of generalized monotone mappings, which we call duplomonotone, and present a rather simple derivative-free line search algorithm that can be used to find a zero of a duplomonotone function.
The paper is organized as follows: in Sect. 2 we introduce duplomonotone mappings, analyze their basic properties and provide various illustrative examples; in Sect. 3 we present three variations of a derivative-free line search algorithm for finding a zero of a duplomonotone function, and we prove their linear convergence under strong duplomonotonicity plus some Lipschitz-type assumption on the points of the lower level set defined by the initial point.
Throughout, \(\Vert \cdot \Vert \) denotes the Euclidean norm, while the usual inner product is denoted by \(\langle \cdot ,\cdot \rangle .\) We say that \(F\) is a set-valued mapping from \(\mathbb {R}^{m}\) to \(\mathbb {R}^{n}\), denoted by \(F:\mathbb {R}^{m}\rightrightarrows \mathbb {R}^{n}\), if for every \(x\in \mathbb {R}^{m}\), \(F(x)\) is a subset of \(\mathbb {R}^{n}\). The gradient of a differentiable function \(f:\mathbb {R}^m\rightarrow \mathbb {R}^n\) at some point \(x\in \mathbb {R}^m\) is denoted by \(\nabla f(x)\in \mathbb {R}^{m\times n}\).
2 Duplomonotonicity
Recall that a function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is said to be monotone when
$$\begin{aligned} \langle f(x)-f(y),x-y\rangle \ge 0\quad \text {for all }x,y\in \mathbb {R}^{m}, \end{aligned}$$
and strictly monotone if this inequality is strict whenever \(x\ne y\). Further, \(f\) is called strongly monotone for some \(\sigma >0\) when
$$\begin{aligned} \langle f(x)-f(y),x-y\rangle \ge \sigma \Vert x-y\Vert ^{2}\quad \text {for all }x,y\in \mathbb {R}^{m}. \end{aligned}$$
We introduce next a new property that is implied by monotonicity.
Definition 1
A function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is called duplomonotone with constant \(\bar{\tau }>0\) if
$$\begin{aligned} \langle f(x)-f(x-\tau f(x)),f(x)\rangle \ge 0\quad \text {for all }x\in \mathbb {R}^{m}\text { and all }\tau \in [0,\bar{\tau }], \end{aligned}$$(1)
and strictly duplomonotone if this inequality is strict whenever \(f(x)\ne 0\). The function \(f\) is said to be strongly duplomonotone for some \(\sigma >0\) with constant \(\bar{\tau }>0\) if
$$\begin{aligned} \langle f(x)-f(x-\tau f(x)),f(x)\rangle \ge \sigma \tau \Vert f(x)\Vert ^{2}\quad \text {for all }x\in \mathbb {R}^{m}\text { and all }\tau \in [0,\bar{\tau }]. \end{aligned}$$(2)
The modulus of strong duplomonotonicity is the supremum of the constants \(\sigma \) for which (2) holds.
Remark 1
Letting \(\sigma \) be zero in (2) will allow us to handle both duplomonotonicity and strong duplomonotonicity at the same time. Hence, we refer to this as \(f\) being strongly duplomonotone with \(\sigma \ge 0\).
Obviously, every (strongly) monotone function is (strongly) duplomonotone. In the next simple example we show that the converse is not true in general: the class of duplomonotone functions is strictly broader than the class of monotone functions.
Example 1
Given a matrix \(A\in \mathbb {R}^{m\times m}\), consider the linear function \(f(x):=Ax\). Recall that the symmetric part of \(A\) is the matrix \(A_{s}:=\frac{1}{2}(A+A^{T})\). The mapping \(f\) is monotone if and only if \(A_{s}\) is positive semidefinite (see e.g. [6, Example 12.2]). On the other hand, \(f\) is duplomonotone if and only if there is some \(\bar{\tau }>0\) such that, for any \(x\in \mathbb {R}^{m}\) and any \(\tau \in (0,\bar{\tau }]\), one has
$$\begin{aligned} \langle Ax-A(x-\tau Ax),Ax\rangle =\tau x^{T}A^{T}A^{2}x\ge 0; \end{aligned}$$
that is, \(f\) is duplomonotone if and only if \(\left( A^{T}A^{2}\right) _{s}\) is positive semidefinite. Furthermore, \(f\) is strongly duplomonotone for \(\sigma >0\) if and only if for any \(x\in \mathbb {R}^{m}\) and any positive \(\tau \), one has
$$\begin{aligned} \tau x^{T}A^{T}A^{2}x\ge \sigma \tau \Vert Ax\Vert ^{2},\quad \text {i.e.},\quad x^{T}A^{T}(A-\sigma I)Ax\ge 0, \end{aligned}$$
where \(I\) denotes the identity mapping. Therefore, \(f\) is strongly duplomonotone for \(\sigma >0\) if and only if \(\left( A^{T}(A-\sigma I)A\right) _{s}=A^{T}(A_{s}-\sigma I)A\) is positive semidefinite.
If \(A\) is symmetric, then \(\left( A^{T}A^{2}\right) _{s}=A^{3}\), whose eigenvalues have the same sign as those of \(A\). Thus, for \(A\) symmetric, the function \(f\) is duplomonotone if and only if \(f\) is monotone. However, this may not be the case if \(A\) is not symmetric. As a simple example, if we take
$$\begin{aligned} A:=\left( \begin{array}{cc} 2 &{} 0\\ 2 &{} 0 \end{array}\right) , \end{aligned}$$(3)
then,
$$\begin{aligned} A_{s}=\left( \begin{array}{cc} 2 &{} 1\\ 1 &{} 0 \end{array}\right) , \end{aligned}$$
which is not positive semidefinite, while
$$\begin{aligned} \left( A^{T}A^{2}\right) _{s}=\left( \begin{array}{cc} 16 &{} 0\\ 0 &{} 0 \end{array}\right) \end{aligned}$$
is positive semidefinite. Thus, \(f((x,y)^{T})=(2x,2x)^{T}\) is duplomonotone but is not monotone. Moreover, it is not difficult to check that \(f\) is not even quasimonotone.Footnote 1 In fact, \(f\) is strongly duplomonotone with modulus \(\sigma =2\). Indeed,
$$\begin{aligned} A^{T}(A_{s}-\sigma I)A=\left( \begin{array}{cc} 16-8\sigma &{} 0\\ 0 &{} 0 \end{array}\right) , \end{aligned}$$
which is positive semidefinite if and only if \(\sigma \le 2\).\(\Diamond \)
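The matrix computations in this example can be checked numerically. The following sketch (using NumPy; the helper name `sym` is ours) verifies each positive-semidefiniteness claim for the matrix \(A\) with \(f((x,y)^{T})=(2x,2x)^{T}\).

```python
import numpy as np

# The matrix with f((x, y)^T) = (2x, 2x)^T, i.e. f(x) = Ax.
A = np.array([[2.0, 0.0],
              [2.0, 0.0]])

def sym(M):
    """Symmetric part of a square matrix."""
    return (M + M.T) / 2

# f is monotone iff A_s is positive semidefinite: here it is not.
eig_As = np.linalg.eigvalsh(sym(A))
assert eig_As.min() < 0

# f is duplomonotone iff (A^T A^2)_s is positive semidefinite: here it is.
eig_dup = np.linalg.eigvalsh(sym(A.T @ A @ A))
assert eig_dup.min() >= -1e-12

# Strong duplomonotonicity: A^T (A_s - sigma*I) A is PSD iff sigma <= 2.
for sigma, expect_psd in [(1.5, True), (2.0, True), (2.5, False)]:
    M = sym(A.T @ (sym(A) - sigma * np.eye(2)) @ A)
    assert (np.linalg.eigvalsh(M).min() >= -1e-9) == expect_psd
```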
A strictly monotone function has at most one zero. This is not the case for duplomonotone functions: even under strong duplomonotonicity we can see that the function \(f(x)=Ax\) with \(A\) given by (3) has a zero at \((0,y)^{T}\) for every \(y\in \mathbb {R}\). In fact, the zero function is strongly duplomonotone for any \(\sigma >0\).
We have shown a function in Example 1 that is duplomonotone but not quasimonotone. It is interesting to note that there are also functions that are quasimonotone but not duplomonotone, e.g. \(f(x)=-|x|\) for \(x\in \mathbb {R}\).
Example 2
Given a matrix \(A\in \mathbb {R}^{m\times m}\) and a vector \(b\in \mathbb {R}^{m}\), consider the affine function \(f(x):=Ax+b\). By [6, Example 12.2], \(f\) is monotone if and only if \(A_{s}\) is positive semidefinite. On the other hand, since \(f(x)-f(x-\tau f(x))=\tau Af(x)\), the function \(f\) is duplomonotone if and only if
$$\begin{aligned} f(x)^{T}A_{s}f(x)\ge 0\quad \text {for all }x\in \mathbb {R}^{m}; \end{aligned}$$
that is, \(f\) is duplomonotone if and only if \(A_{s}\) is positive semidefinite on the range of \(f\). For example, one can check that for \(A\) given in (3) and any \(b=(b_{1},b_{2})^{T}\in \mathbb {R}^{2}\), the function \(f\) is duplomonotone if and only if \(b_{1}=b_{2}\).\(\Diamond \)
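The claim about \(b_{1}=b_{2}\) can be probed numerically. The sketch below (function and variable names are ours) samples the quadratic form \(f(x)^{T}A_{s}f(x)\) over the range of the affine map for the matrix \(A\) of the previous example.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [2.0, 0.0]])
A_s = (A + A.T) / 2

def min_form_on_range(b, samples):
    """Smallest sampled value of f(x)^T A_s f(x), where f(x) = Ax + b."""
    vals = []
    for x1 in samples:
        for x2 in samples:
            fx = A @ np.array([x1, x2]) + b
            vals.append(fx @ A_s @ fx)
    return min(vals)

grid = np.linspace(-2.0, 2.0, 81)
# b1 = b2: the quadratic form stays nonnegative on the range of f.
assert min_form_on_range(np.array([1.0, 1.0]), grid) >= -1e-9
# b1 != b2: it takes negative values, so f is not duplomonotone.
assert min_form_on_range(np.array([1.0, 0.0]), grid) < 0
```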
Next we present a direct characterization of duplomonotonicity in terms of the Euclidean norm.
Proposition 1
A function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is strongly duplomonotone for \(\sigma \ge 0\) if and only if there is some constant \(\bar{\tau }>0\) such that for all \(x\in \mathbb {R}^{m}\) and all \(0\le \tau \le \bar{\tau }\) one has
$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-2\sigma \tau )\Vert f(x)\Vert ^{2}+\Vert f(x-\tau f(x))-f(x)\Vert ^{2}. \end{aligned}$$(4)
Proof
For any \(x\in \mathbb {R}^{m}\) and any \(\tau >0\), we have
$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}=\Vert f(x)\Vert ^{2}-2\langle f(x)-f(x-\tau f(x)),f(x)\rangle +\Vert f(x)-f(x-\tau f(x))\Vert ^{2}. \end{aligned}$$
The stated equivalence follows then from the definition of strong duplomonotonicity of \(f\).\(\square \)
The following example shows the importance of considering the constant \(\bar{\tau }\) in the definition of duplomonotonicity: there are functions for which (1) does not hold for all \(\tau >0\). One could also define a weaker notion of duplomonotonicity where the constant \(\bar{\tau }\) in (1) depends on each point \(x\). Nevertheless, this property might be too weak to guarantee the convergence of the line search algorithms in Sect. 3, as we need to ensure that the step size is bounded away from zero.
Example 3
Let \(f:\mathbb {R}^{2}\rightarrow \mathbb {R}^{2}\) be given by \(f(x):=(x_{1}x_{2}^{2},x_{2})^{T}\) for \(x:=(x_{1},x_{2})^{T}\in \mathbb {R}^{2}\). It is easy to check that \(f\) is not monotone: if we take \(x:=(-3,0)^{T}\) and \(y:=(-1,1)^{T}\), we have
$$\begin{aligned} \langle f(x)-f(y),x-y\rangle =\langle (1,-1)^{T},(-2,-1)^{T}\rangle =-1<0. \end{aligned}$$
On the other hand, after some algebraic manipulation, one can show that for all \(x:=(x_{1},x_{2})^{T}\in \mathbb {R}^{2}\), one has
$$\begin{aligned} \langle f(x)-f(x-\tau f(x)),f(x)\rangle =\tau x_{2}^{2}+x_{1}^{2}x_{2}^{4}\left( 1-\left( 1-\tau x_{2}^{2}\right) (1-\tau )^{2}\right) , \end{aligned}$$
which is nonnegative for all \(\tau \in [0,2]\). Thus, \(f\) is duplomonotone with constant \(\bar{\tau }=2\). If \(\tau >2\), the expression above can be negative for some \(x\in \mathbb {R}^{2}\). Indeed, choose any \(\varepsilon >0\) and let \(z:=(z_{1},\sqrt{\varepsilon /2}/(\varepsilon +1))\) for some \(z_{1}\in \mathbb {R}\). Then, taking \(\tau :=2+\varepsilon \),
$$\begin{aligned} \langle f(z)-f(z-\tau f(z)),f(z)\rangle =(2+\varepsilon )\left( z_{2}^{2}-\frac{\varepsilon }{2}z_{1}^{2}z_{2}^{4}\right) , \end{aligned}$$
which is negative for \(z_{1}^{2}\) sufficiently big.\(\Diamond \)
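The two claims of this example can be sanity-checked numerically. In the sketch below (names are ours) we sample the quantity \(\langle f(x)-f(x-\tau f(x)),f(x)\rangle \) from the definition of duplomonotonicity over random points for \(\tau \in [0,2]\), and evaluate it at a point of the form suggested in the example for \(\tau >2\).

```python
import numpy as np

def f(x):
    """The map f(x) = (x1*x2^2, x2)^T from the example."""
    return np.array([x[0] * x[1] ** 2, x[1]])

def duplo_gap(x, tau):
    """<f(x) - f(x - tau*f(x)), f(x)>, the quantity in the definition."""
    fx = f(x)
    return float(np.dot(fx - f(x - tau * fx), fx))

rng = np.random.default_rng(0)
# For tau in [0, 2] the gap is nonnegative at every sampled point.
worst = min(duplo_gap(rng.uniform(-2.0, 2.0, size=2), tau)
            for tau in np.linspace(0.0, 2.0, 21) for _ in range(200))
assert worst >= -1e-9

# For tau > 2 the gap can be negative: a point z of the form in the example,
# with a large first component.
eps = 0.5
z = np.array([50.0, np.sqrt(eps / 2) / (eps + 1)])
assert duplo_gap(z, 2.0 + eps) < 0
```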
The next result shows that if a function is both Lipschitz continuous and strongly duplomonotone for \(\sigma >0\), then \(\sigma \) is bounded above by the Lipschitz constant.
Proposition 2
If a function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is Lipschitz continuous with constant \(\ell >0\) and strongly duplomonotone for \(\sigma >0\), with \(f\not \equiv 0,\) then \(\sigma \le \ell \).
Proof
Because of the Lipschitz continuity, we have
$$\begin{aligned} \Vert f(x)-f(x-\tau f(x))\Vert \le \ell \tau \Vert f(x)\Vert \quad \text {for all }x\in \mathbb {R}^{m}\text { and all }\tau >0. \end{aligned}$$
Let \(\bar{\tau }>0\) be the strong duplomonotonicity constant in (2), and pick any \(z\in \mathbb {R}^{m}\) such that \(f(z)\ne 0\). Then
$$\begin{aligned} \sigma \bar{\tau }\Vert f(z)\Vert ^{2}\le \langle f(z)-f(z-\bar{\tau }f(z)),f(z)\rangle \le \Vert f(z)-f(z-\bar{\tau }f(z))\Vert \Vert f(z)\Vert \le \ell \bar{\tau }\Vert f(z)\Vert ^{2}, \end{aligned}$$
whence \(\sigma \le \ell \). \(\square \)
In the following result we show a direct consequence of duplomonotonicity for differentiable functions.
Proposition 3
Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be differentiable. The following assertions hold.
-
(i)
If \(f\) is duplomonotone, then
$$\begin{aligned} f(x)^{T}\nabla f(x)f(x)\ge 0\quad \text {for all }\quad x\in \mathbb {R}^{m}. \end{aligned}$$(5)
-
(ii)
If \(f\) is strongly duplomonotone for \(\sigma >0\), then
$$\begin{aligned} f(x)^{T}\nabla f(x)f(x)\ge \sigma \Vert f(x)\Vert ^{2}\quad \text {for all }\quad x\in \mathbb {R}^{m}. \end{aligned}$$(6)
Proof
Assume that \(f\) satisfies (2) with \(\sigma \ge 0\) and \(\bar{\tau }>0\). Fix \(x\in \mathbb {R}^{m}\) and choose an arbitrary \(\tau \in (0,\bar{\tau }]\). Dividing (2) by \(\tau \) we get
$$\begin{aligned} \left\langle \frac{f(x)-f(x-\tau f(x))}{\tau },f(x)\right\rangle \ge \sigma \Vert f(x)\Vert ^{2}, \end{aligned}$$
and taking the limit as \(\tau \searrow 0\), we obtain \(f(x)^{T}\nabla f(x)f(x)\ge \sigma \Vert f(x)\Vert ^{2}\). \(\square \)
Remark 2
-
(i)
In general, strict duplomonotonicity does not imply that equality in (5) is only achieved when \(f(x)=0\), in the same way that strict monotonicity does not imply positive definiteness of \(\nabla f(x)\).
-
(ii)
Observe that both assertions also hold under the weaker notion of duplomonotonicity where the constant \(\bar{\tau }\) depends on each \(x\in \mathbb {R}^{m}\).
For differentiable functions in one dimension, the notions of (strong) duplomonotonicity and (strong) monotonicity agree. In fact, Proposition 4 establishes that the concepts of monotonicity and duplomonotonicity coincide for continuous functions in one dimension.Footnote 2 This is not the case if the function is not continuous, as we show in Example 4.
Corollary 1
Let \(f:\mathbb {R}\rightarrow \mathbb {R}\) be differentiable. Then \(f\) is (strongly) monotone if and only if \(f\) is (strongly) duplomonotone.
Proof
This is just a consequence of Proposition 3 and the fact that \(f\) is (strongly) monotone with constant \(\sigma \ge 0\) if and only if \(f'(x)\ge \sigma \) for all \(x\in \mathbb {R}\). \(\square \)
Proposition 4
Let \(f:\mathbb {R}\rightarrow \mathbb {R}\) be continuous. Then \(f\) is monotone if and only if \(f\) is duplomonotone.
Proof
Suppose that \(f\) is duplomonotone with constant \(\bar{\tau }>0\). If there is some \(z\in \mathbb {R}\) such that \(f(z)>0\), then we claim that there is an open interval containing \(z\) such that \(f\) is both nondecreasing and positive on it. Indeed, by continuity of \(f\), there is some \(\delta _{0}>0\) such that \(f(x)>f(z)/2>0\) for all \(x\in (z-\delta _{0},z+\delta _{0})\). Set \(\delta :=\min \left\{ \delta _{0},\bar{\tau }f(z)/4\right\} \). Choose any \(x,y\in (z-\delta ,z+\delta )\) with \(x>y\), and set \(\tau :=\frac{x-y}{f(x)}\in (0,\bar{\tau })\). Then, \(x-\tau f(x)=y\). From the duplomonotonicity of \(f\), we deduce
$$\begin{aligned} \left( f(x)-f(y)\right) f(x)=\left( f(x)-f(x-\tau f(x))\right) f(x)\ge 0, \end{aligned}$$
and, since \(f(x)>0\), this yields \(f(x)\ge f(y)\).
Hence, \(f\) is nondecreasing and positive on \((z-\delta ,z+\delta ),\) as claimed.
Observe now that \(f\) has to be positive and nondecreasing on \((z-\delta ,+\infty )\), again by continuity of \(f\). Therefore, if we set \(a:=\inf \left\{ x\in \mathbb {R}\mid f(x)>0\right\} \in \mathbb {R}\cup \left\{ -\infty ,+\infty \right\} \), it follows that \(\left\{ x\in \mathbb {R}\mid f(x)>0\right\} =(a,+\infty )\) and \(f\) is nondecreasing on \((a,+\infty )\). Using the same argument, we deduce that \(\left\{ x\in \mathbb {R}\mid f(x)<0\right\} =(-\infty ,b)\) with \(b\in \mathbb {R}\cup \left\{ -\infty ,+\infty \right\} \) and \(f\) is nondecreasing on \((-\infty ,b)\). Thus, \(f\) is monotone. \(\square \)
Example 4
Consider the function \(f:\mathbb {R}\rightarrow \mathbb {R}\) defined for \(x\in \mathbb {R}\) by
$$\begin{aligned} f(x):=\left\{ \begin{array}{ll} 0, &{} \text {if }x\in \mathbb {Q},\\ 1, &{} \text {if }x\not \in \mathbb {Q}. \end{array}\right. \end{aligned}$$
The function \(f\) is not monotone (not even locally): for any \(x\not \in \mathbb {Q}\) and any \(y\in \mathbb {Q}\) with \(y>x\),
$$\begin{aligned} \left( f(y)-f(x)\right) (y-x)=-(y-x)<0. \end{aligned}$$
On the other hand, \(f\) is duplomonotone: for any \(x\in \mathbb {Q}\) the duplomonotonicity condition (1) trivially holds since \(f(x)=0\), while for any \(x\not \in \mathbb {Q}\) and any \(\tau >0\) we have
$$\begin{aligned} \left( f(x)-f(x-\tau f(x))\right) f(x)=1-f(x-\tau )\ge 0. \end{aligned}$$
Furthermore, one can easily check that this function is not strongly duplomonotone. A slight modification of this example yields a function that is strongly duplomonotone, but still not monotone: let \(g:\mathbb {R}\rightarrow \mathbb {R}\) be defined for \(x\in \mathbb {R}\) by
$$\begin{aligned} g(x):=\left\{ \begin{array}{ll} 0, &{} \text {if }x\in \mathbb {Q},\\ x, &{} \text {if }x\not \in \mathbb {Q}. \end{array}\right. \end{aligned}$$
Again, the function \(g\) is not monotone (not even locally), since for any \(x\not \in \mathbb {Q}\) with \(x>0\) and any \(y\in \mathbb {Q}\) with \(y>x\),
$$\begin{aligned} \left( g(y)-g(x)\right) (y-x)=-x(y-x)<0. \end{aligned}$$
In this case, \(g\) is strongly duplomonotone for \(\sigma =1\) with constant \(\bar{\tau }=1\): for any \(x\not \in \mathbb {Q}\) and any \(\tau \in [0,1]\) we have
$$\begin{aligned} \left( g(x)-g(x-\tau g(x))\right) g(x)=\left( x-g((1-\tau )x)\right) x\ge \tau x^{2}=\tau g(x)^{2}, \end{aligned}$$
since \(g((1-\tau )x)\) equals either \((1-\tau )x\) or \(0\), and \(\tau \le 1\).
Therefore, without differentiability, the concepts of monotonicity and duplomonotonicity may be quite different, even in one dimension.\(\Diamond \)
In the next proposition we introduce a property that implies duplomonotonicity, but is still weaker than monotonicity (see Example 5). This property has a characterization for differentiable functions analogous to the positive-semidefiniteness of the Jacobian for monotone functions, see e.g. [6, Proposition 12.3].
Proposition 5
Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be differentiable. Then, for any \(\sigma \ge 0\), the following two properties are equivalent:
-
(i)
\(\langle f(x-\tau _{1}f(x))-f(x-\tau _{2}f(x)),f(x) \rangle \ge \sigma (\tau _{2}-\tau _{1})\Vert f(x)\Vert ^{2}\) for all \(x\in \mathbb {R}^{m}\), \(0\le \tau _{1}\le \tau _{2}\le \bar{\tau }\);
-
(ii)
\(f(x)^{T}\nabla f(x-\tau f(x))f(x)\ge \sigma \Vert f(x)\Vert ^{2}\) for all \(x\in \mathbb {R}^{m}\), \(\tau \in [0,\bar{\tau }]\).
Proof
Assume that (i) holds. Choose any \(x\in \mathbb {R}^{m}\) and any \(\tau \in [0,\bar{\tau })\). For all \(t\in (0,\bar{\tau }-\tau ]\) one has
$$\begin{aligned} \langle f(x-\tau f(x))-f(x-(\tau +t)f(x)),f(x)\rangle \ge \sigma t\Vert f(x)\Vert ^{2}. \end{aligned}$$
Thus, dividing by \(t\) and taking the limit as \(t\searrow 0\),
$$\begin{aligned} f(x)^{T}\nabla f(x-\tau f(x))f(x)\ge \sigma \Vert f(x)\Vert ^{2}, \end{aligned}$$
so (ii) follows.
Conversely, assume that (ii) holds. Pick any \(x\in \mathbb {R}^{m}\) and any \(0\le \tau _{1}\le \tau _{2}\le \bar{\tau }\). Consider the function
$$\begin{aligned} \varphi (\lambda ):=-\left\langle f\left( x-\left( \tau _{1}+\lambda (\tau _{2}-\tau _{1})\right) f(x)\right) ,f(x)\right\rangle \end{aligned}$$
for \(\lambda \in \mathbb {R}\). Then, by (ii),
$$\begin{aligned} \varphi '(\lambda )=(\tau _{2}-\tau _{1})f(x)^{T}\nabla f\left( x-\left( \tau _{1}+\lambda (\tau _{2}-\tau _{1})\right) f(x)\right) f(x)\ge \sigma (\tau _{2}-\tau _{1})\Vert f(x)\Vert ^{2} \end{aligned}$$
for all \(\lambda \in [0,1]\), whence,
$$\begin{aligned} \varphi (1)-\varphi (0)=\int _{0}^{1}\varphi '(\lambda )\,d\lambda \ge \sigma (\tau _{2}-\tau _{1})\Vert f(x)\Vert ^{2}, \end{aligned}$$
which implies (i). \(\square \)
Our motivation to characterize duplomonotone mappings arose from the mathematical modeling of networks of (bio)chemical reactions, an increasingly prominent application of mathematical and numerical optimization. The next example introduces a very simple (bio)chemical reaction network, involving three molecules and three reactions, where each component of \(x\) corresponds to the logarithmic abundance of a molecule and each component of \(-f(x)\) corresponds to its rate of change of abundance per unit time.
Example 5
Consider the function \(f:\mathbb {R}^{3}\rightarrow \mathbb {R}^{3}\) defined for \(x\in \mathbb {R}^{3}\) by \(f(x):=([F,R]-[R,F])\exp ([F,R]^{T}x),\) where \(\exp (\cdot )\) denotes the component-wise exponential,
and \([\,\cdot \,,\cdot \,]\) is the horizontal concatenation operator. That is, for any \(x:=(x_{1},x_{2},x_{3})^{T}\in \mathbb {R}^{3}\) we have
The function \(f\) is not monotone because \(\nabla f(x)\) is not positive semidefinite for all \(x\in \mathbb {R}^{3}\). For instance, if \(z:=(0,0,\log (2))^{T}\) and \(w:=\left( 3,3,2\right) ^{T}\), we have
Nevertheless, the function \(f\) is duplomonotone because, in fact, it satisfies Proposition 5(ii) with \(\sigma =0\). Indeed, if we define
we have
After some algebraic manipulation, we obtain
Thus,
and because of (7), we have that Proposition 5(ii) holds for all \(\tau >0\).
Indeed, the function \(f\) is strictly duplomonotone because \(\frac{\partial \varphi }{\partial \tau }(x,\tau )>0\) for all \(x\not \in \varOmega \), where
Hence, \(\varphi (x,\tau )>\varphi (x,0)=0\) for all \(x\not \in \varOmega \) and all \(\tau >0\); that is, \(f\) is strictly duplomonotone.\(\Diamond \)
The sum of two monotone operators is clearly monotone. Further, if a mapping \(F\) is monotone, one can easily show that for all \(\alpha >0\) the mapping \(F+\alpha I\) is strongly monotone. Do these properties also hold for duplomonotone functions? The answer is negative in general. As we show in the next example, duplomonotonicity can be destroyed by the addition of a monotone linear function of arbitrarily small slope.
Example 6
Consider the matrix
By Example 1, the function \(f(x):=Ax\) is duplomonotone, since \(A^{T}A^{2}=0_{2\times 2}\). Choose any \(\alpha >0\) and consider the function \(g(x):=Bx\), with \(B:=A+\alpha I\). Then,
The eigenvalues of \(\left( B^{T}B^{2}\right) _{s}\) are \(\alpha ^{3}+\alpha \pm \frac{1}{2}\alpha \sqrt{9\alpha ^{2}+4}\). If \(\alpha \in (0,1/2)\), we have that \(\alpha ^{3}+\alpha -\frac{1}{2}\alpha \sqrt{9\alpha ^{2}+4}<0\). Therefore, the function \(g=f+\alpha I\) is not duplomonotone for any \(\alpha \in (0,1/2)\).\(\Diamond \)
A direct consequence of Proposition 3 is that \(-f(x)\) is a descent direction for \(\Vert f(\cdot )\Vert ^{2}\) at any point \(x\in \mathbb {R}^{m}\) when \(f\) is duplomonotone. This property inspires the derivative-free algorithms in Sect. 3 for finding zeros of the function \(f\).
Corollary 2
Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be differentiable and strongly duplomonotone for \(\sigma >0\). Then, for all \(x\in \mathbb {R}^{m}\), either \(f(x)=0\) or the vector \(-f(x)\) provides a descent direction for the merit function \(\Vert f(\cdot )\Vert ^{2}\) at the point \(x\).
Proof
Observe that, for any \(x\in \mathbb {R}^{m},\) we have \(\nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (x)=2\nabla f(x)f(x)\). Thus, inequality (6) implies that
$$\begin{aligned} \left\langle \nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (x),-f(x)\right\rangle =-2f(x)^{T}\nabla f(x)f(x)\le -2\sigma \Vert f(x)\Vert ^{2}<0\quad \text {whenever }f(x)\ne 0. \end{aligned}$$
The assertion follows. \(\square \)
It is straightforward to extend the definition of duplomonotonicity for set-valued mappings.
Definition 2
A set-valued mapping \(F:\mathbb {R}^{m}\rightrightarrows \mathbb {R}^{m}\) is called duplomonotone with constant \(\bar{\tau }>0\) if for all \(x\in \mathbb {R}^{m}\) and all \(\tau \in [0,\bar{\tau }]\) one has
$$\begin{aligned} \langle u-v,u\rangle \ge 0\quad \text {for all }u\in F(x)\text { and all }v\in F(x-\tau u). \end{aligned}$$
The mapping \(F\) is said to be strongly duplomonotone for some \(\sigma >0\) with constant \(\bar{\tau }>0\) if for all \(x\in \mathbb {R}^{m}\) and all \(\tau \in [0,\bar{\tau }]\) one has
$$\begin{aligned} \langle u-v,u\rangle \ge \sigma \tau \Vert u\Vert ^{2}\quad \text {for all }u\in F(x)\text { and all }v\in F(x-\tau u). \end{aligned}$$
One can easily extend the characterization of duplomonotonicity given in Proposition 1 to set-valued mappings.
Proposition 6
A set-valued mapping \(F:\mathbb {R}^{m}\rightrightarrows \mathbb {R}^{m}\) is strongly duplomonotone for \(\sigma \ge 0\) if and only if there is some \(\bar{\tau }>0\) such that for all \(x\in \mathbb {R}^{m}\) and all \(\tau \in [0,\bar{\tau }]\) one has
$$\begin{aligned} \Vert v\Vert ^{2}\le (1-2\sigma \tau )\Vert u\Vert ^{2}+\Vert v-u\Vert ^{2}\quad \text {for all }u\in F(x)\text { and all }v\in F(x-\tau u). \end{aligned}$$
We will not explore duplomonotone set-valued mappings any further here, as this is beyond the scope of the present paper.
3 Derivative-free algorithms for systems of duplomonotone equations
In this section we consider the problem of finding solutions of systems of nonlinear equations
$$\begin{aligned} f(x)=0, \end{aligned}$$(9)
where \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is strongly duplomonotone for \(\sigma >0\). Corollary 2 leads us to consider the following derivative-free line search algorithm for finding zeros of \(f\).
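A minimal Python sketch of this line search, reconstructed from the acceptance criterion (10), \(\Vert f(x_{k}-\lambda _{k}f(x_{k}))\Vert ^{2}\le (1-\alpha \lambda _{k})\Vert f(x_{k})\Vert ^{2}\), and the backtracking step-size rule \(\lambda _{k}=(1/\alpha )\beta ^{p_{k}}\) used in Theorem 2 (function and parameter names are our assumptions):

```python
import numpy as np

def duplomonotone_linesearch(f, x0, alpha, beta, tol=1e-10, max_iter=1000):
    """Derivative-free line search: a sketch of Algorithm 1.

    At each iterate x_k, backtrack over lambda = (1/alpha) * beta**p,
    p = 0, 1, 2, ..., until the acceptance criterion (10) holds:
        ||f(x_k - lambda*f(x_k))||^2 <= (1 - alpha*lambda) * ||f(x_k)||^2,
    then set x_{k+1} = x_k - lambda * f(x_k).
    Theorem 2 guarantees the backtracking terminates when 0 < alpha < 2*sigma
    and 0 < beta < 1 for a strongly duplomonotone f satisfying (13).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        norm2 = np.dot(fx, fx)
        if norm2 <= tol ** 2:
            break
        lam = 1.0 / alpha
        while np.dot(f(x - lam * fx), f(x - lam * fx)) > (1 - alpha * lam) * norm2:
            lam *= beta  # backtrack: lambda = (1/alpha) * beta**p
        x = x - lam * fx
    return x

# The map f(x) = Ax with A = [[2, 0], [2, 0]] is strongly duplomonotone
# with modulus sigma = 2 (Example 1), so any alpha in (0, 4) is admissible.
A = np.array([[2.0, 0.0], [2.0, 0.0]])
x_star = duplomonotone_linesearch(lambda x: A @ x, np.array([1.0, 3.0]),
                                  alpha=1.0, beta=0.5)
assert np.linalg.norm(A @ x_star) < 1e-8
```

Note that only evaluations of \(f\) are used; no Jacobian information enters the iteration.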
Observe that, when \(f\) is differentiable, the step size acceptance criterion (10) is implied by the usual Armijo rule for the function \(\Vert f(\cdot )\Vert ^{2}\) and the direction \(d_{k}:=-f(x_{k})\). Indeed, given some constant \(c\in (0,1)\), the Armijo rule for \(\Vert f(\cdot )\Vert ^{2}\) will search for a step size \(\lambda _{k}\) satisfying
$$\begin{aligned} \Vert f(x_{k}-\lambda _{k}f(x_{k}))\Vert ^{2}\le \Vert f(x_{k})\Vert ^{2}-2c\lambda _{k}f(x_{k})^{T}\nabla f(x_{k})f(x_{k}). \end{aligned}$$
Proposition 3(ii) gives us
$$\begin{aligned} \Vert f(x_{k})\Vert ^{2}-2c\lambda _{k}f(x_{k})^{T}\nabla f(x_{k})f(x_{k})\le \left( 1-2\sigma c\lambda _{k}\right) \Vert f(x_{k})\Vert ^{2}. \end{aligned}$$
Taking \(\alpha :=2\sigma c\), we get \(0<\alpha <2\sigma \), and (10) follows.
The steepest descent algorithm could be applied to find solutions to nonlinear equations of type (9) whenever the function \(f\) has a computable Jacobian. The main advantage of Algorithm 1 relative to the steepest descent method is that no derivative information is needed. On the other hand, note that one cannot assure in general that the steepest descent method will converge to a zero of the function \(f\), but to a critical point of \(\Vert f(\cdot )\Vert ^{2}\) (for more details, see e.g. [5, Chapter 11]). This is not a concern under strong duplomonotonicity for \(\sigma >0\): in this case, any critical point of \(\Vert f(\cdot )\Vert ^{2}\) will be a zero of \(f\). Indeed, otherwise one would have \(\nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (\tilde{x})=0\) and \(f(\tilde{x})\ne 0\) for some \(\tilde{x}\in \mathbb {R}^{m}\). Then
$$\begin{aligned} 0=\left\langle \nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (\tilde{x}),f(\tilde{x})\right\rangle =2f(\tilde{x})^{T}\nabla f(\tilde{x})f(\tilde{x}), \end{aligned}$$
whence, by Proposition 3(ii),
$$\begin{aligned} 0\ge \sigma \Vert f(\tilde{x})\Vert ^{2}>0, \end{aligned}$$
which is a contradiction.
If \(f\) is Lipschitz continuous with a known constant \(\ell >0\) and is also strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\), then, as a direct consequence of the characterization in Proposition 1, we get
$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le \left( 1-2\sigma \tau +\ell ^{2}\tau ^{2}\right) \Vert f(x)\Vert ^{2} \end{aligned}$$(11)
for all \(x\in \mathbb {R}^{m}\) and all \(0\le \tau \le \bar{\tau }\). The right-hand side of (11) attains its minimum (with respect to \(\tau \in [0,\bar{\tau }]\)) at \(\tau ^{\star }:=\min \left\{ \sigma /\ell ^{2},\bar{\tau }\right\} \). Thus, if \(\sigma /\ell ^{2}\le \bar{\tau }\), we have
$$\begin{aligned} \Vert f\left( x-\tau ^{\star }f(x)\right) \Vert ^{2}\le \left( 1-\frac{\sigma ^{2}}{\ell ^{2}}\right) \Vert f(x)\Vert ^{2}. \end{aligned}$$(12)
This makes us consider the following variation of Algorithm 1, where the step size is chosen constant.
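A sketch of this constant-step variant, under the assumption that \(\sigma \), \(\ell \) and \(\bar{\tau }\) are known in advance (names are ours):

```python
import numpy as np

def constant_step_iteration(f, x0, sigma, ell, tau_bar, n_iter=60):
    """Constant-step variant: a sketch of Algorithm 2.

    With a known strong duplomonotonicity modulus sigma, Lipschitz-type
    constant ell and constant tau_bar, iterate
        x_{k+1} = x_k - lam * f(x_k)
    with the fixed step lam = min(sigma / ell**2, tau_bar)."""
    lam = min(sigma / ell ** 2, tau_bar)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - lam * f(x)
    return x

# For f(x) = Ax from Example 1: sigma = 2, ell = ||A||_2 = 2*sqrt(2), and the
# linear strong-duplomonotonicity condition holds for every tau, so tau_bar
# can be taken large; the step becomes sigma/ell^2 = 1/4.
A = np.array([[2.0, 0.0], [2.0, 0.0]])
x_star = constant_step_iteration(lambda x: A @ x, np.array([1.0, 3.0]),
                                 sigma=2.0, ell=2.0 * np.sqrt(2.0), tau_bar=10.0)
assert np.linalg.norm(A @ x_star) < 1e-6
```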
As a direct consequence of (12) we have that Algorithm 2 is (globally) linearly convergent to a zero of \(f\), and moreover, the Lipschitz assumption can be relaxed as follows.
Theorem 1
Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\). Let \(x_{0}\in \mathbb {R}^{m}\) be an initial point, and assume there exists some constant \(\ell >0\) such that
$$\begin{aligned} \Vert f(x)-f(x-\tau f(x))\Vert \le \ell \tau \Vert f(x)\Vert \quad \text {for all }x\in L(x_{0})\text { and all }\tau \in [0,\bar{\tau }], \end{aligned}$$(13)
where \(L(x_{0})\) is the lower level set defined by
$$\begin{aligned} L(x_{0}):=\left\{ x\in \mathbb {R}^{m}\mid \Vert f(x)\Vert \le \Vert f(x_{0})\Vert \right\} . \end{aligned}$$
Set \(\lambda :=\min \left\{ \sigma /\ell ^{2},\bar{\tau }\right\} \). Then the iteration \(x_{k+1}:=x_{k}-\lambda f(x_{k})\) satisfies
$$\begin{aligned} \Vert f(x_{k+1})\Vert \le \sqrt{1-\sigma \lambda }\,\Vert f(x_{k})\Vert \quad \text {for all }k\ge 0, \end{aligned}$$(14)
whence, \(f(x_{k})\) is linearly convergent to zero. Thus, if \(f\) is continuous, any accumulation point of the sequence \(x_{k}\) is a zero of \(f\).
Proof
This follows from the argument preceding the statement of the theorem. \(\square \)
Even when \(f\) is known to be Lipschitz continuous, its Lipschitz constant might not be easy to compute. The next result shows that in this case Algorithm 1 can be used: the step size \(\lambda _{k}\) can always be found by a backtracking technique, \(\lambda _{k}\) remains bounded away from zero, and the algorithm is linearly convergent. We denote by \(\left\lceil \cdot \right\rceil \) the ceiling function, i.e., the smallest integer greater than or equal to a given number.
Theorem 2
Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\). Let \(x_{0}\in \mathbb {R}^{m}\) be an initial point, and assume that there is a positive constant \(\ell \) such that (13) holds. Then, for all \(0<\alpha <2\sigma \) and all \(0<\beta <1\), Algorithm 1 generates a sequence \(x_{k}\) such that \(f(x_{k})\) is linearly convergent to zero with rate \(\sqrt{1-\beta ^{p}}\), where
$$\begin{aligned} p:=\left\lceil \log _{\beta }\left( \alpha \min \left\{ \frac{2\sigma -\alpha }{\ell ^{2}},\bar{\tau }\right\} \right) \right\rceil . \end{aligned}$$(15)
Thus, if \(f\) is continuous, any accumulation point of the sequence \(x_{k}\) is a zero of \(f\).
Proof
Let \(x\in L(x_{0})\). We will prove that the step size \((1/\alpha )\beta ^{p}\) with \(p\) as in (15) always satisfies (10), i.e., that we have
$$\begin{aligned} \Vert f\left( x-(1/\alpha )\beta ^{p}f(x)\right) \Vert ^{2}\le \left( 1-\beta ^{p}\right) \Vert f(x)\Vert ^{2}. \end{aligned}$$(16)
Proposition 1 gives us
$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-2\sigma \tau )\Vert f(x)\Vert ^{2}+\Vert f(x-\tau f(x))-f(x)\Vert ^{2} \end{aligned}$$(17)
for all \(\tau \in [0,\bar{\tau }]\). Take \(p\) as in (15), that is,
$$\begin{aligned} (1/\alpha )\beta ^{p}\le \min \left\{ \frac{2\sigma -\alpha }{\ell ^{2}},\bar{\tau }\right\} . \end{aligned}$$
Then (13) holds for all \(0<\tau \le (1/\alpha )\beta ^{p}\). This, together with (17), implies that
$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le \left( 1-2\sigma \tau +\ell ^{2}\tau ^{2}\right) \Vert f(x)\Vert ^{2}\quad \text {for all }0<\tau \le (1/\alpha )\beta ^{p}. \end{aligned}$$
Moreover, we have that \(1-2\sigma \tau +\ell ^{2}\tau ^{2}\le 1-\alpha \tau \) if and only if \(\tau \le (2\sigma -\alpha )/\ell ^{2}\). The definition of \(p\) implies that \((1/\alpha )\beta ^{p}\le (2\sigma -\alpha )/\ell ^{2}\); hence,
$$\begin{aligned} \Vert f\left( x-(1/\alpha )\beta ^{p}f(x)\right) \Vert ^{2}\le \left( 1-\alpha (1/\alpha )\beta ^{p}\right) \Vert f(x)\Vert ^{2}=\left( 1-\beta ^{p}\right) \Vert f(x)\Vert ^{2}, \end{aligned}$$
which implies (16). Therefore, given a point \(x_{k}\) generated by Algorithm 1, the integer \(p_{k}\) can always be found and it satisfies \(p_{k}\le p\). Thus, \(\lambda _{k}=(1/\alpha )\beta ^{p_{k}}\ge (1/\alpha )\beta ^{p}\), and we have
$$\begin{aligned} \Vert f(x_{k+1})\Vert ^{2}\le \left( 1-\alpha \lambda _{k}\right) \Vert f(x_{k})\Vert ^{2}\le \left( 1-\beta ^{p}\right) \Vert f(x_{k})\Vert ^{2}, \end{aligned}$$
which in particular yields \(x_{k+1}\in L(x_{0})\), and the claims in the statement follow. \(\square \)
Remark 3
Notice that: (i) the constant \(\ell \) in (13) does not need to be known in order to use Algorithm 1, but is involved in the rate of convergence, and (ii) Lipschitz continuity of the function \(f\) on \(L(x_{0})\) implies (13).
Even if \(f\) is known (or conjectured) to be both Lipschitz continuous and strongly duplomonotone for \(\sigma >0\), in practical situations the values of both the Lipschitz constant and \(\sigma \) might be unknown or difficult to compute. The following modification of Algorithm 1 permits finding an adequate step size by a double backtracking technique, where an additional backtracking is performed in order to find an appropriate value of the parameter \(\alpha \) in (10) such that \(\alpha <2\sigma \).
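A sketch of this double backtracking scheme, reconstructed from the description above and from the proof of Theorem 3 (the function name, parameter names and the exact reset rule are our assumptions):

```python
import numpy as np

def double_backtracking(f, x0, beta, lam_min, lam_max, alpha0=None,
                        tol=1e-10, max_iter=1000):
    """Double backtracking: a sketch of Algorithm 3.

    Neither sigma nor the Lipschitz constant is assumed known.  The trial
    step starts at lam_max and is reduced by beta; whenever it falls below
    lam_min, the parameter alpha in the acceptance criterion (10) is reduced
    by beta and the trial step is reset to lam_max.  The initial alpha must
    satisfy alpha0 < 1/lam_max."""
    x = np.asarray(x0, dtype=float)
    alpha = alpha0 if alpha0 is not None else 0.9 / lam_max
    for _ in range(max_iter):
        fx = f(x)
        norm2 = np.dot(fx, fx)
        if norm2 <= tol ** 2:
            break
        lam = lam_max
        while np.dot(f(x - lam * fx), f(x - lam * fx)) > (1 - alpha * lam) * norm2:
            lam *= beta
            if lam < lam_min:      # step too small: relax alpha instead
                alpha *= beta
                lam = lam_max
        x = x - lam * fx
    return x

A = np.array([[2.0, 0.0], [2.0, 0.0]])  # strongly duplomonotone (Example 1)
x_star = double_backtracking(lambda x: A @ x, np.array([1.0, 3.0]),
                             beta=0.5, lam_min=1e-3, lam_max=1.0)
assert np.linalg.norm(A @ x_star) < 1e-8
```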
Theorem 3
Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\). Let \(x_{0}\in \mathbb {R}^{m}\) be an initial point, and assume that there exists some positive constant \(\ell \) such that (13) holds. Then, for all positive constants \(\lambda _{\min }\) and \(\lambda _{\max }\) such that there exists some integer \(q\) with \(\lambda _{\min }\le \beta ^{q}\lambda _{\max }<\min \left\{ 2\sigma /\ell ^{2},\bar{\tau }\right\} \), Algorithm 3 generates a sequence \(x_{k}\) such that \(f(x_{k})\) is linearly convergent to zero with rate \(\sqrt{1-\alpha \beta ^{p+q}\lambda _{\max }}\), where
$$\begin{aligned} p:=\left\lceil \log _{\beta }\left( \frac{2\sigma -\ell ^{2}\beta ^{q}\lambda _{\max }}{\alpha }\right) \right\rceil . \end{aligned}$$(18)
Thus, if \(f\) is continuous, any accumulation point of the sequence \(x_{k}\) is a zero of \(f\).
Proof
Denote by \(\alpha _{0}\) the initial value of \(\alpha \) in Algorithm 3. Proposition 1 together with (13) gives us
$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le \left( 1-2\sigma \tau +\ell ^{2}\tau ^{2}\right) \Vert f(x)\Vert ^{2}\quad \text {for all }x\in L(x_{0})\text { and all }\tau \in [0,\bar{\tau }]. \end{aligned}$$
Further, we have that \(1-2\sigma \tau +\ell ^{2}\tau ^{2}\le 1-\alpha _{0}\beta ^{p}\tau \) with \(0<\tau \le \bar{\tau }\) if and only if \(0<\tau \le \min \left\{ (2\sigma -\alpha _{0}\beta ^{p})/\ell ^{2},\bar{\tau }\right\} \). By assumption, there exists some positive integer \(q\) such that \(\lambda _{\min }\le \beta ^{q}\lambda _{\max }<\min \left\{ 2\sigma /\ell ^{2},\bar{\tau }\right\} \). By the definition of \(p\) in (18), we have \(\beta ^{q}\lambda _{\max }\le (2\sigma -\alpha _{0}\beta ^{p})/\ell ^{2}\). Hence,
$$\begin{aligned} 0<\beta ^{q}\lambda _{\max }\le \min \left\{ \frac{2\sigma -\alpha _{0}\beta ^{p}}{\ell ^{2}},\bar{\tau }\right\} . \end{aligned}$$
Thus, for all \(x\in L(x_{0})\), we have
$$\begin{aligned} \Vert f\left( x-\beta ^{q}\lambda _{\max }f(x)\right) \Vert ^{2}\le \left( 1-\alpha _{0}\beta ^{p+q}\lambda _{\max }\right) \Vert f(x)\Vert ^{2}. \end{aligned}$$
Finally, observe that there is some positive integer \(s\) such that \(\beta ^{s}\lambda _{\max }<\lambda _{\min }\). Therefore, given \(x_{k}\), a new point \(x_{k+1}\) is guaranteed to be found in a finite number of steps of Algorithm 3, because the double backtracking loop can only be executed a maximum of \(sp+q\) times (after a maximum of \(sp\) iterations the value of \(\alpha \) will be equal to \(\alpha _{0}\beta ^{p}\), after which, a maximum of \(q\) iterations will be enough to find an appropriate step size \(\lambda _{k}\)). Thus, we have \(\alpha \lambda _{k}\ge \alpha _{0}\beta ^{p+q}\lambda _{\max }\). Consequently, by the acceptance criterion of the step size in Algorithm 3, we have
$$\begin{aligned} \Vert f(x_{k+1})\Vert ^{2}\le \left( 1-\alpha \lambda _{k}\right) \Vert f(x_{k})\Vert ^{2}\le \left( 1-\alpha _{0}\beta ^{p+q}\lambda _{\max }\right) \Vert f(x_{k})\Vert ^{2}, \end{aligned}$$
and the claims follow.\(\square \)
Remark 4
-
(i)
The condition \(\lambda _{\min }\le \beta ^{q}\lambda _{\max }<\min \left\{ 2\sigma /\ell ^{2},\bar{\tau }\right\} \) in Theorem 3 is needed to avoid the possibility of an infinite loop in an iteration of the algorithm. Nevertheless, we believe this condition should not be too difficult to guarantee in practice, as it basically requires that \(\lambda _{\min }\) is not “too big” and \(\beta \) is not “too small”.
-
(ii)
Certainly, the constant \(\beta \) used for updating \(\alpha \) can be chosen different from the constant \(\beta \) used for updating \(\lambda _{k}\), and Theorem 3 would remain valid with slight changes. Nonetheless, we have decided to use the same constant to ease the notation and the analysis.
-
(iii)
In Algorithm 3, the constant \(\alpha \) is required to be smaller than \(\lambda _{\max }^{-1}\) to avoid unnecessary iterations (otherwise, the initial step \(\lambda _{k}=\lambda _{\max }\) would always be too big, since \(1-\alpha \lambda _{k}\) would be negative).
Notes
A function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is quasimonotone if the following implication holds:
$$\begin{aligned} \langle f(x),y-x\rangle >0\Rightarrow \langle f(y),y-x\rangle \ge 0, \end{aligned}$$
for every \(x,y\in \mathbb {R}^{m}\). Monotonicity implies quasimonotonicity.
This result and the proof included here are due to the referee of this paper, who noticed that the Dirichlet function in Example 4 is not monotone because it is not continuous.
References
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
Cambini, A., Martein, L.: Generalized Convexity and Optimization: Theory and Applications. Springer, Berlin (2008)
Fleming, R.M.T., Thiele, I.: Mass conserved elementary kinetics is sufficient for the existence of a non-equilibrium steady state concentration. J. Theor. Biol. 314, 173–181 (2012)
Hadjisavvas, N., Komlósi, S., Schaible, S.S.: Handbook of Generalized Convexity and Generalized Monotonicity. Springer, New York (2005)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer, Berlin (1998)
Acknowledgments
We are indebted to the referee for his/her valuable comments and suggestions, especially for contributing to the paper with Proposition 4. We would also like to thank Professor Michael A. Saunders for his careful reading of the manuscript and useful suggestions. This work was supported by the National Research Fund, Luxembourg, co-funded under the Marie Curie Actions of the European Commission (FP7-COFUND), and by the U.S. Department of Energy, Offices of Advanced Scientific Computing Research and the Biological and Environmental Research as part of the Scientific Discovery Through Advanced Computing program, grant #DE-SC0010429.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Aragón Artacho, F.J., Fleming, R.M.T. Globally convergent algorithms for finding zeros of duplomonotone mappings. Optim Lett 9, 569–584 (2015). https://doi.org/10.1007/s11590-014-0769-z