1 Introduction

Monotone mappings have been extensively studied in the literature, see for instance [6, Chapter 12] or the recent monograph [1]. In many practical problems, though, the monotonicity assumption turns out to be too strong. Consequently, several generalized notions of monotonicity have been introduced and thoroughly studied by various authors in order to relax it while keeping some of the useful properties of monotone mappings, see [2, 4] and the references therein.

In mathematical models of biochemical reaction networks [3], the problem arises of finding a zero of functions that are typically not monotone (see Example 5). These functions appear to enjoy a generalized monotonicity property that has not yet appeared in the literature but can be exploited to find their zeros. In this paper we introduce this new class of generalized monotone mappings, which we call duplomonotone, and present a rather simple derivative-free line search algorithm that can be used to find a zero of a duplomonotone function.

The paper is organized as follows: in Sect. 2 we introduce duplomonotone mappings, analyze their basic properties and provide various illustrative examples; in Sect. 3 we present three variations of a derivative-free line search algorithm for finding a zero of a duplomonotone function, and we prove their linear convergence under strong duplomonotonicity together with a Lipschitz-type assumption on the lower level set defined by the initial point.

Throughout, \(\Vert \cdot \Vert \) denotes the Euclidean norm, while the usual inner product is denoted by \(\langle \cdot ,\cdot \rangle .\) We say that \(F\) is a set-valued mapping from \(\mathbb {R}^{m}\) to \(\mathbb {R}^{n}\), denoted by \(F:\mathbb {R}^{m}\rightrightarrows \mathbb {R}^{n}\), if for every \(x\in \mathbb {R}^{m}\), \(F(x)\) is a subset of \(\mathbb {R}^{n}\). The gradient of a differentiable function \(f:\mathbb {R}^m\rightarrow \mathbb {R}^n\) at some point \(x\in \mathbb {R}^m\) is denoted by \(\nabla f(x)\in \mathbb {R}^{m\times n}\).

2 Duplomonotonicity

Recall that a function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is said to be monotone when

$$\begin{aligned} \langle f(x)-f(y),x-y\rangle \ge 0\quad \text {for all }\quad x,y\in \mathbb {R}^{m}, \end{aligned}$$

and strictly monotone if this inequality is strict whenever \(x\ne y\). Further, \(f\) is called strongly monotone for some \(\sigma >0\) when

$$\begin{aligned} \langle f(x)-f(y),x-y\rangle \ge \sigma \Vert x-y\Vert ^{2}\quad \text {for all }\quad x,y\in \mathbb {R}^{m}. \end{aligned}$$

We introduce next a new property that is implied by monotonicity.

Definition 1

A function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is called duplomonotone with constant \(\bar{\tau }>0\) if

$$\begin{aligned} \langle f(x)-f(x-\tau f(x)),f(x)\rangle \ge 0\quad \text {whenever }x\in \mathbb {R}^{m},0\le \tau \le \bar{\tau }, \end{aligned}$$
(1)

and strictly duplomonotone if this inequality is strict whenever \(f(x)\ne 0\). The function \(f\) is said to be strongly duplomonotone for some \(\sigma >0\) with constant \(\bar{\tau }>0\) if

$$\begin{aligned} \langle f(x)-f(x-\tau f(x)),f(x)\rangle \ge \sigma \tau \Vert f(x)\Vert ^{2}\quad \text {whenever }x\in \mathbb {R}^{m},0\le \tau \le \bar{\tau }. \end{aligned}$$
(2)

The modulus of strong duplomonotonicity is the supremum of the constants \(\sigma \) for which (2) holds.

Remark 1

Letting \(\sigma \) be zero in (2) allows us to handle duplomonotonicity and strong duplomonotonicity at the same time. Hence, in this case we say that \(f\) is strongly duplomonotone for \(\sigma \ge 0\).
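As an illustration of Definition 1, condition (2) can be probed numerically by sampling points and step sizes. The following Python sketch is only meant as an illustration: the helper name, the sampling scheme and the test function are our own choices and are not part of the definition. A negative output certifies that (2) fails for the tested \(\sigma \) and \(\bar{\tau }\), whereas a nonnegative output is merely numerical evidence in favour of the property.

```python
import numpy as np

def duplo_defect(f, sigma, tau_bar, dim, n_points=1000, n_taus=50, seed=0):
    """Sample the left-hand side of (2) minus its right-hand side.

    A negative return value shows that f is NOT strongly duplomonotone
    for this sigma on [0, tau_bar]; a nonnegative value is only
    numerical evidence (sampling cannot prove the property).
    """
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(n_points):
        x = rng.normal(size=dim)
        fx = f(x)
        for tau in np.linspace(0.0, tau_bar, n_taus):
            lhs = np.dot(fx - f(x - tau * fx), fx)
            rhs = sigma * tau * np.dot(fx, fx)
            worst = min(worst, lhs - rhs)
    return worst

# Sanity check with the identity map, which satisfies (2) with sigma = 1
# and any tau_bar (the sampled defect is zero up to rounding).
print(duplo_defect(lambda x: x, sigma=1.0, tau_bar=5.0, dim=3))
```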

Obviously, every (strongly) monotone function is (strongly) duplomonotone. The next simple example shows that the converse is not true in general: the class of duplomonotone functions is strictly broader than the class of monotone functions.

Example 1

Given a matrix \(A\in \mathbb {R}^{m\times m}\), consider the linear function \(f(x):=Ax\). Recall that the symmetric part of \(A\) is the matrix \(A_{s}:=\frac{1}{2}(A+A^{T})\). The mapping \(f\) is monotone if and only if \(A_{s}\) is positive semidefinite (see e.g. [6, Example 12.2]). On the other hand, \(f\) is duplomonotone if and only if there is some \(\bar{\tau }>0\) such that, for any \(x\in \mathbb {R}^{m}\), one has

$$\begin{aligned} 0\le \langle f(x)-f(x-\tau f(x)),f(x)\rangle =\tau x^{T}A^{T}A^{2}x,\quad \text {whenever }0\le \tau \le \bar{\tau }; \end{aligned}$$

that is, \(f\) is duplomonotone if and only if \(\left( A^{T}A^{2}\right) _{s}\) is positive semidefinite. Furthermore, \(f\) is strongly duplomonotone for \(\sigma >0\) if and only if for any \(x\in \mathbb {R}^{m}\) and any positive \(\tau \), one has

$$\begin{aligned} 0\le \langle f(x)-f(x-\tau f(x)),f(x)\rangle -\sigma \tau \Vert f(x)\Vert ^{2}&= \tau x^{T}A^{T}A^{2}x-\sigma \tau x^{T}A^{T}Ax\\&= \tau x^{T}A^{T}(A-\sigma I)Ax, \end{aligned}$$

where \(I\) denotes the identity mapping. Therefore, \(f\) is strongly duplomonotone for \(\sigma >0\) if and only if \(\left( A^{T}(A-\sigma I)A\right) _{s}=A^{T}(A_{s}-\sigma I)A\) is positive semidefinite.

If \(A\) is symmetric, then \(\left( A^{T}A^{2}\right) _{s}=A^{3}\), whose eigenvalues have the same sign as those of \(A\). Thus, for \(A\) symmetric, the function \(f\) is duplomonotone if and only if \(f\) is monotone. However, this may not be the case if \(A\) is not symmetric. As a simple example, if we take

$$\begin{aligned} A:=\left[ \begin{array}{cc} 2 &{} 0\\ 2 &{} 0 \end{array}\right] , \end{aligned}$$
(3)

then,

$$\begin{aligned} A_{s}=\left[ \begin{array}{cc} 2 &{} 1\\ 1 &{} 0 \end{array}\right] , \end{aligned}$$

which is not positive semidefinite, while

$$\begin{aligned} \left( A^{T}A^{2}\right) _{s}=\left[ \begin{array}{cc} 16 &{} 0\\ 0 &{} 0 \end{array}\right] \end{aligned}$$

is positive semidefinite. Thus, \(f((x,y)^{T})=(2x,2x)^{T}\) is duplomonotone but is not monotone. Moreover, it is not difficult to check that \(f\) is not even quasimonotone. In fact, \(f\) is strongly duplomonotone with modulus \(\sigma =2\). Indeed,

$$\begin{aligned} A^{T}(A-\sigma I)A=\left[ \begin{array}{cc} 8(2-\sigma ) &{} 0\\ 0 &{} 0 \end{array}\right] , \end{aligned}$$

which is positive semidefinite if and only if \(\sigma \le 2\).\(\Diamond \)
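The matrix computations in this example are straightforward to reproduce; the following Python snippet is a minimal sketch of such a verification (using NumPy, with an eigenvalue test of the symmetric part standing in for positive semidefiniteness).

```python
import numpy as np

A = np.array([[2.0, 0.0], [2.0, 0.0]])   # the matrix in (3)

def sym(M):
    """Symmetric part M_s = (M + M^T)/2."""
    return 0.5 * (M + M.T)

def is_psd(M, tol=1e-12):
    """Positive semidefiniteness, tested via the spectrum of the symmetric part."""
    return bool(np.all(np.linalg.eigvalsh(sym(M)) >= -tol))

print(is_psd(A))                 # False: f(x) = Ax is not monotone
print(is_psd(A.T @ A @ A))       # True:  f is duplomonotone
for sigma in (1.0, 2.0, 2.5):    # strong duplomonotonicity test A^T (A - sigma I) A
    print(sigma, is_psd(A.T @ (A - sigma * np.eye(2)) @ A))
# True for sigma <= 2 and False for sigma = 2.5, matching the modulus sigma = 2
```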

A strictly monotone function has at most one zero. This is not the case for duplomonotone functions: even under strong duplomonotonicity we can see that the function \(f(x)=Ax\) with \(A\) given by (3) has a zero at \((0,y)^{T}\) for every \(y\in \mathbb {R}\). In fact, the zero function is strongly duplomonotone for any \(\sigma >0\).

In Example 1 we have exhibited a function that is duplomonotone but not quasimonotone. It is interesting to note that there are also functions that are quasimonotone but not duplomonotone, e.g. \(f(x)=-|x|\) for \(x\in \mathbb {R}\).

Example 2

Given a matrix \(A\in \mathbb {R}^{m\times m}\) and a vector \(b\in \mathbb {R}^{m}\), consider the affine function \(f(x):=Ax+b\). By [6, Example 12.2], \(f\) is monotone if and only if \(A_{s}\) is positive semidefinite. On the other hand, \(f\) is duplomonotone if and only if

$$\begin{aligned} (Ax+b)^{T}A(Ax+b)\ge 0\quad \text {for all }x\in \mathbb {R}^{m}; \end{aligned}$$

that is, \(f\) is duplomonotone if and only if \(A_{s}\) is positive semidefinite on the range of \(f\). For example, one can check that for \(A\) given in (3) and any \(b=(b_{1},b_{2})^{T}\in \mathbb {R}^{2}\), the function \(f\) is duplomonotone if and only if \(b_{1}=b_{2}\).\(\Diamond \)
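The affine case can be checked in the same way. In the sketch below, the choice \(b=(1,0)^{T}\) (so that \(b_{1}\ne b_{2}\)) and the sample points are illustrative; evaluating the quadratic form \((Ax+b)^{T}A(Ax+b)\) at a suitable point shows directly that duplomonotonicity fails.

```python
import numpy as np

A = np.array([[2.0, 0.0], [2.0, 0.0]])
b = np.array([1.0, 0.0])              # b_1 != b_2, so duplomonotonicity should fail

def q(x):
    """The quadratic form (Ax + b)^T A (Ax + b) from the duplomonotonicity test."""
    fx = A @ x + b
    return fx @ (A @ fx)

print(q(np.array([-0.3, 0.0])))       # -0.16 < 0: the condition is violated
print(q(np.array([ 1.0, 0.0])))       # 30.0: nonnegative at other points
```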

Next we present a direct characterization of duplomonotonicity in terms of the Euclidean norm.

Proposition 1

A function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is strongly duplomonotone for \(\sigma \ge 0\) if and only if there is some constant \(\bar{\tau }>0\) such that for all \(x\in \mathbb {R}^{m}\) and all \(0\le \tau \le \bar{\tau }\) one has

$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-2\sigma \tau )\Vert f(x)\Vert ^{2}+\Vert f(x-\tau f(x))-f(x)\Vert ^{2}. \end{aligned}$$
(4)

Proof

For any \(x\in \mathbb {R}^{m}\) and any \(\tau >0\), we have

$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}&= \Vert (f(x-\tau f(x))-f(x))+f(x)\Vert ^{2}\\&= \Vert f(x-\tau f(x))-f(x)\Vert ^{2}+\Vert f(x)\Vert ^{2}\\&+2\langle f(x-\tau f(x))-f(x),f(x)\rangle . \end{aligned}$$

The stated equivalence follows then from the definition of strong duplomonotonicity of \(f\).\(\square \)
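For the reader's convenience, the rearrangement used in the last step of the proof can be spelled out: substituting the expansion above into (4), cancelling the common terms and dividing by two shows that (4) is equivalent to

$$\begin{aligned} \langle f(x-\tau f(x))-f(x),f(x)\rangle \le -\sigma \tau \Vert f(x)\Vert ^{2}, \end{aligned}$$

which is precisely (2).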

The following example shows the importance of considering the constant \(\bar{\tau }\) in the definition of duplomonotonicity: there are functions for which (1) does not hold for all \(\tau >0\). One could also define a weaker notion of duplomonotonicity where the constant \(\bar{\tau }\) in (1) depends on each point \(x\). Nevertheless, this property might be too weak to guarantee the convergence of the line search algorithms in Sect. 3, as we need to ensure that the step size is bounded away from zero.

Example 3

Let \(f:\mathbb {R}^{2}\rightarrow \mathbb {R}^{2}\) be given by \(f(x):=(x_{1}x_{2}^{2},x_{2})^{T}\) for \(x:=(x_{1},x_{2})^{T}\in \mathbb {R}^{2}\). It is easy to check that \(f\) is not monotone: if we take \(x:=(-3,0)^{T}\) and \(y:=(-1,1)^{T}\), we have

$$\begin{aligned} \langle f(x)-f(y),x-y\rangle =-1. \end{aligned}$$

On the other hand, after some algebraic manipulation, one can show that for all \(x:=(x_{1},x_{2})^{T}\in \mathbb {R}^{2}\), one has

$$\begin{aligned} \langle f(x)-f(x-\tau f(x)),f(x)\rangle =\left( (\tau -1)^{2}x_{1}^{2}x_{2}^{4}+(2 -\tau )x_{1}^{2}x_{2}^{2}+1\right) \tau x_{2}^{2}, \end{aligned}$$

which is nonnegative for all \(\tau \in [0,2]\). Thus, \(f\) is duplomonotone with constant \(\bar{\tau }=2\). If \(\tau >2\), the expression above can be negative for some \(x\in \mathbb {R}^{2}\). Indeed, choose any \(\varepsilon >0\) and let \(z:=(z_{1},\sqrt{\varepsilon /2}/(\varepsilon +1))\) for some \(z_{1}\in \mathbb {R}\). Then,

$$\begin{aligned} \langle f(z)-f(z-(2+\varepsilon )f(z)),f(z)\rangle =\frac{-(\varepsilon ^{4} +2{\varepsilon }^{3})z_{1}^{2}+4\varepsilon ^{4} +16\varepsilon ^{3}+20\varepsilon ^{2}+8\varepsilon }{8(\varepsilon +1) ^{4}}, \end{aligned}$$

which is negative for \(z_{1}^{2}\) sufficiently large.\(\Diamond \)
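The "algebraic manipulation" behind the displayed identity is routine but tedious; a symbolic computation along the following lines (a sketch using SymPy, with illustrative variable names) confirms both the identity and the failure of (1) for \(\tau >2\).

```python
import sympy as sp

x1, x2, tau = sp.symbols('x1 x2 tau', real=True)

def f(v):
    return sp.Matrix([v[0] * v[1]**2, v[1]])

x = sp.Matrix([x1, x2])
phi = (f(x) - f(x - tau * f(x))).dot(f(x))

claimed = ((tau - 1)**2 * x1**2 * x2**4 + (2 - tau) * x1**2 * x2**2 + 1) * tau * x2**2
print(sp.expand(phi - claimed))                                # 0: the identity holds
print(phi.subs({x1: 10, x2: sp.Rational(1, 3), tau: sp.Rational(5, 2)}))
# -40/81 < 0: condition (1) indeed fails for a step size larger than tau_bar = 2
```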

The next result shows that if a function is both Lipschitz continuous and strongly duplomonotone for \(\sigma >0\), then \(\sigma \) is bounded above by the Lipschitz constant.

Proposition 2

If a function \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is Lipschitz continuous with constant \(\ell >0\) and strongly duplomonotone for \(\sigma >0\), with \(f\not \equiv 0,\) then \(\sigma \le \ell \).

Proof

Because of the Lipschitz continuity, we have

$$\begin{aligned} \Vert f(x-\tau f(x))-f(x)\Vert \le \ell \tau \Vert f(x)\Vert \quad \text {for all }\quad x\in \mathbb {R}^{m},\tau >0. \end{aligned}$$

Let \(\bar{\tau }>0\) be the strong duplomonotonicity constant in (2), and pick any \(z\in \mathbb {R}^{m}\) such that \(f(z)\ne 0\). Then

$$\begin{aligned} \sigma \bar{\tau }\Vert f(z)\Vert ^{2}&\le \langle f(z)-f(z -\bar{\tau }f(z)),f(z)\rangle \le \Vert f(z-\bar{\tau }f(z))-f(z)\Vert \Vert f(z)\Vert \\&\le \ell \bar{\tau }\Vert f(z)\Vert ^{2}, \end{aligned}$$

whence \(\sigma \le \ell \). \(\square \)

In the following result we show a direct consequence of duplomonotonicity for differentiable functions.

Proposition 3

Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be differentiable. The following assertions hold.

  1. (i)

    If \(f\) is duplomonotone, then

    $$\begin{aligned} f(x)^{T}\nabla f(x)f(x)\ge 0\quad \text {for all }\quad x\in \mathbb {R}^{m}. \end{aligned}$$
    (5)
  2. (ii)

    If \(f\) is strongly duplomonotone for \(\sigma >0\), then

    $$\begin{aligned} f(x)^{T}\nabla f(x)f(x)\ge \sigma \Vert f(x)\Vert ^{2}\quad \text {for all }\quad x\in \mathbb {R}^{m}. \end{aligned}$$
    (6)

Proof

Assume that \(f\) satisfies (2) with \(\sigma \ge 0\) and \(\bar{\tau }>0\). Fix \(x\in \mathbb {R}^{m}\) and choose an arbitrary \(\tau \in (0,\bar{\tau }]\). Dividing (2) by \(\tau \) we get

$$\begin{aligned} -\left\langle \frac{f(x-\tau f(x))-f(x)}{\tau },f(x)\right\rangle \ge \sigma \Vert f(x)\Vert ^{2}, \end{aligned}$$

and taking the limit as \(\tau \searrow 0\), we obtain \(f(x)^{T}\nabla f(x)f(x)\ge \sigma \Vert f(x)\Vert ^{2}\). \(\square \)

Remark 2

  1. (i)

    In general, strict duplomonotonicity does not imply that equality in (5) is only achieved when \(f(x)=0\), in the same way that strict monotonicity does not imply positive definiteness of \(\nabla f(x)\).

  2. (ii)

    Observe that both assertions also hold under the weaker notion of duplomonotonicity where the constant \(\bar{\tau }\) depends on each \(x\in \mathbb {R}^{m}\).

For differentiable functions in one dimension, the notions of (strong) duplomonotonicity and (strong) monotonicity agree. In fact, Proposition 4 establishes that the concepts of monotonicity and duplomonotonicity coincide for continuous functions in one dimension. This is not the case if the function is not continuous, as we show in Example 4.

Corollary 1

Let \(f:\mathbb {R}\rightarrow \mathbb {R}\) be differentiable. Then \(f\) is (strongly) monotone if and only if \(f\) is (strongly) duplomonotone.

Proof

This is just a consequence of Proposition 3 and the fact that \(f\) is (strongly) monotone with constant \(\sigma \ge 0\) if and only if \(f'(x)\ge \sigma \). \(\square \)

Proposition 4

Let \(f:\mathbb {R}\rightarrow \mathbb {R}\) be continuous. Then \(f\) is monotone if and only if \(f\) is duplomonotone.

Proof

Since every monotone function is duplomonotone, it suffices to prove the converse implication. Suppose that \(f\) is duplomonotone with constant \(\bar{\tau }>0\). If there is some \(z\in \mathbb {R}\) such that \(f(z)>0\), then we claim that there is an open interval containing \(z\) on which \(f\) is both nondecreasing and positive. Indeed, by continuity of \(f\), there is some \(\delta _{0}>0\) such that \(f(x)>f(z)/2>0\) for all \(x\in (z-\delta _{0},z+\delta _{0})\). Set \(\delta :=\min \left\{ \delta _{0},\bar{\tau }f(z)/4\right\} \). Choose any \(x,y\in (z-\delta ,z+\delta )\) with \(x>y\), and set \(\tau :=\frac{x-y}{f(x)}\in (0,\bar{\tau })\). Then, \(x-\tau f(x)=y\). From the duplomonotonicity of \(f\), and since \(f(x)>0\), we deduce

$$\begin{aligned} 0\le f(x)-f(x-\tau f(x))=f(x)-f(y). \end{aligned}$$

Hence, \(f\) is nondecreasing and positive on \((z-\delta ,z+\delta ),\) as claimed.

Observe now that \(f\) has to be positive and nondecreasing on \((z-\delta ,+\infty )\), again by continuity of \(f\). Therefore, if we set \(a:=\inf \left\{ x\in \mathbb {R}\mid f(x)>0\right\} \in \mathbb {R}\cup \left\{ -\infty ,+\infty \right\} \), it follows that \(\left\{ x\in \mathbb {R}\mid f(x)>0\right\} =(a,+\infty )\) and \(f\) is nondecreasing on \((a,+\infty )\). Using the same argument, we deduce that \(\left\{ x\in \mathbb {R}\mid f(x)<0\right\} =(-\infty ,b)\) with \(b\in \mathbb {R}\cup \left\{ -\infty ,+\infty \right\} \) and \(f\) is nondecreasing on \((-\infty ,b)\). Thus, \(f\) is monotone. \(\square \)

Example 4

Consider the function \(f:\mathbb {R}\rightarrow \mathbb {R}\) defined for \(x\in \mathbb {R}\) by

$$\begin{aligned} f(x):={\left\{ \begin{array}{ll} 0, &{} \text {if }x\in \mathbb {Q};\\ 1, &{} \text {if }x\not \in \mathbb {Q}. \end{array}\right. } \end{aligned}$$

The function \(f\) is not monotone (not even locally):

$$\begin{aligned} (f(\pi )-f(4))(\pi -4)=\pi -4<0. \end{aligned}$$

On the other hand, \(f\) is duplomonotone: for any \(x\in \mathbb {Q}\) the duplomonotonicity condition (1) trivially holds since \(f(x)=0\), while for any \(x\not \in \mathbb {Q}\) and any \(\tau >0\) we have

$$\begin{aligned} \left( f(x)-f(x-\tau f(x))\right) f(x)=1-f(x-\tau )\ge 0. \end{aligned}$$

Furthermore, one can easily check that this function is not strongly duplomonotone. A slight modification of this example yields a function that is strongly duplomonotone, but still not monotone: let \(g:\mathbb {R}\rightarrow \mathbb {R}\) be defined for \(x\in \mathbb {R}\) by

$$\begin{aligned} g(x):={\left\{ \begin{array}{ll} 0, &{} \text {if }x\in \mathbb {Q};\\ x, &{} \text {if }x\not \in \mathbb {Q}. \end{array}\right. } \end{aligned}$$

Again, the function \(g\) is not monotone (not even locally), since

$$\begin{aligned} (g(\pi )-g(4))(\pi -4)=\pi (\pi -4)<0. \end{aligned}$$

In this case, \(g\) is strongly duplomonotone for \(\sigma =1\) with constant \(\bar{\tau }=1\): for \(x\in \mathbb {Q}\) condition (2) holds trivially since \(g(x)=0\), while for any \(x\not \in \mathbb {Q}\) and any \(\tau \in [0,1]\) we have

$$\begin{aligned} \left( g(x)-g(x-\tau g(x))\right) g(x)-\tau g(x)^{2}&= (x-g((1-\tau )x))x-\tau x{}^{2}\\&= {\left\{ \begin{array}{ll} (1-\tau )x^{2}, &{} \text {if }(1-\tau )x\in \mathbb {Q}\\ 0, &{} \text {if }(1-\tau )x\not \in \mathbb {Q} \end{array}\right. }\\&\ge 0. \end{aligned}$$

Therefore, without differentiability, the concepts of monotonicity and duplomonotonicity may be quite different, even in one dimension.\(\Diamond \)

In the next proposition we introduce a property that implies duplomonotonicity, but is still weaker than monotonicity (see Example 5). This property has a characterization for differentiable functions analogous to the positive-semidefiniteness of the Jacobian for monotone functions, see e.g.  [6, Proposition 12.3].

Proposition 5

Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be differentiable. Then, for any \(\sigma \ge 0\), the following two properties are equivalent:

  1. (i)

    \(\langle f(x-\tau _{1}f(x))-f(x-\tau _{2}f(x)),f(x) \rangle \ge \sigma (\tau _{2}-\tau _{1})\Vert f(x)\Vert ^{2}\) for all \(x\in \mathbb {R}^{m}\), \(0\le \tau _{1}\le \tau _{2}\le \bar{\tau }\);

  2. (ii)

    \(f(x)^{T}\nabla f(x-\tau f(x))f(x)\ge \sigma \Vert f(x)\Vert ^{2}\) for all \(x\in \mathbb {R}^{m}\), \(\tau \in [0,\bar{\tau }]\).

Proof

Assume that (i) holds. Choose any \(x\in \mathbb {R}^{m}\) and any \(\tau \in [0,\bar{\tau })\). For all \(t\in (0,\bar{\tau }-\tau ]\) one has

$$\begin{aligned} -\langle f(x-(t+\tau )f(x))-f(x-\tau f(x)),f(x)\rangle \ge \sigma t\Vert f(x)\Vert ^{2}. \end{aligned}$$

Thus, dividing by \(t\) and taking the limit as \(t\searrow 0\),

$$\begin{aligned} \sigma \Vert f(x)\Vert ^{2}&\le -\left\langle \lim _{t\searrow 0} \frac{f(x-(t+\tau )f(x))-f(x-\tau f(x))}{t},f(x)\right\rangle \\&= \langle \nabla f(x-\tau f(x))^Tf(x),f(x)\rangle , \end{aligned}$$

so (ii) follows.

Conversely, assume that (ii) holds. Pick any \(x\in \mathbb {R}^{m}\) and any \(0\le \tau _{1}\le \tau _{2}\le \bar{\tau }\). Consider the function

$$\begin{aligned} h(\lambda ):=\langle f(x\!-\!(\lambda \tau _{1}\!+\!(1\!-\!\lambda )\tau _{2}) f(x))\!-\!f(x-\tau _{2}f(x))-\sigma \lambda (\tau _{2}-\tau _{1})f(x), f(x)\rangle \end{aligned}$$

for \(\lambda \in \mathbb {R}\). Then, by (ii),

$$\begin{aligned} h'(\lambda )&= \langle \nabla f(x-(\lambda \tau _{1} +(1-\lambda )\tau _{2}) f(x))^T(\tau _{2}-\tau _{1})f(x) -\sigma (\tau _{2}-\tau _{1})f(x),f(x)\rangle \\&\ge 0, \end{aligned}$$

for all \(\lambda \in [0,1]\), whence,

$$\begin{aligned} 0=h(0)\le h(1)=\langle f(x-\tau _{1}f(x))-f(x-\tau _{2}f(x))-\sigma (\tau _{2}-\tau _{1}) f(x),f(x)\rangle , \end{aligned}$$

which implies (i). \(\square \)

Our motivation to characterize duplomonotone mappings arose from mathematical modeling of networks of (bio)chemical reactions, an increasingly prominent application of mathematical and numerical optimization. The next example introduces a very simple (bio)chemical reaction network, involving three molecules and three reactions, where each entry of \(x\) corresponds to the logarithmic abundance of one molecule and the corresponding entry of \(-f(x)\) to the rate of change of its abundance per unit time.

Example 5

Consider the function \(f:\mathbb {R}^{3}\rightarrow \mathbb {R}^{3}\) defined for \(x\in \mathbb {R}^{3}\) by \(f(x):=([F,R]-[R,F])\exp ([F,R]^{T}x),\) where \(\exp (\cdot )\) denotes the component-wise exponential,

$$\begin{aligned} F:=\left[ \begin{array}{ccc} 0 &{} 0 &{} 1\\ 1 &{} 0 &{} 0\\ 0 &{} 1 &{} 0 \end{array}\right] ,\, R:=\left[ \begin{array}{ccc} 1 &{} 0 &{} 0\\ 0 &{} 1 &{} 0\\ 0 &{} 0 &{} 1 \end{array}\right] , \end{aligned}$$

and \([\,\cdot \,,\cdot \,]\) is the horizontal concatenation operator. That is, for any \(x:=(x_{1},x_{2},x_{3})^{T}\in \mathbb {R}^{3}\) we have

$$\begin{aligned} f(x)=\left[ \begin{array}{c} 2e^{x_{1}}-e^{x_{2}}-e^{x_{3}}\\ -e^{x_{1}}+2e^{x_{2}}-e^{x_{3}}\\ -e^{x_{1}}-e^{x_{2}}+2e^{x_{3}} \end{array}\right] . \end{aligned}$$

The function \(f\) is not monotone because \(\nabla f(x)\) is not positive semidefinite for all \(x\in \mathbb {R}^{3}\). For instance, if \(z:=(0,0,\log (2))^{T}\) and \(w:=\left( 3,3,2\right) ^{T}\), we have

$$\begin{aligned} w^{T}\nabla f(z)w=-2. \end{aligned}$$

Nevertheless, the function \(f\) is duplomonotone because, in fact, it satisfies Proposition 5(ii) with \(\sigma =0\). Indeed, if we define

$$\begin{aligned} \varphi (x,\tau ):=\langle f(x)-f(x-\tau f(x)),f(x)\rangle , \end{aligned}$$

we have

$$\begin{aligned} \frac{\partial \varphi }{\partial \tau }(x,\tau )=\langle \nabla f(x-\tau f(x))^Tf(x),f(x)\rangle . \end{aligned}$$
(7)

After some algebraic manipulation, we obtain

$$\begin{aligned} \varphi (x,\tau )&= 3e^{x_{1}+\tau (-2e^{x_{1}}+e^{x_{2}} +e^{x_{3}})}(-2e^{x_{1}}+e^{x_{2}}+e^{x_{3}})\\&+3e^{x_{2}+\tau (e^{x_{1}}-2e^{x_{2}}+e^{x_{3}})} (e^{x_{1}}-2e^{x_{2}}+e^{x_{3}})\\&+3e^{x_{3}+\tau (e^{x_{1}}+e^{x_{2}}-2e^{x_{3}})} (e^{x_{1}}+e^{x_{2}}-2e^{x_{3}})\\&+(-2e^{x_{1}}+e^{x_{2}}+e^{x_{3}})^{2}+(e^{x_{1}} -2e^{x_{2}}+e^{x_{3}})^{2}\\&+(e^{x_{1}}+e^{x_{2}}-2e^{x_{3}})^{2}. \end{aligned}$$

Thus,

$$\begin{aligned} \frac{\partial \varphi }{\partial \tau }(x,\tau )&= 3e^{x_{1} +\tau (-2e^{x_{1}}+e^{x_{2}}+e^{x_{3}})}(-2e^{x_{1}}+e^{x_{2}} +e^{x_{3}})^{2}\\&+3e^{x_{2}+\tau (e^{x_{1}}-2e^{x_{2}}+e^{x_{3}})}(e^ {x_{1}}-2e^{x_{2}}+e^{x_{3}})^{2}\\&+3e^{x_{3}+\tau (e^{x_{1}}+e^{x_{2}}-2e^{x_{3}})}(e^ {x_{1}}+e^{x_{2}}-2e^{x_{3}})^{2}\\&\ge 0, \end{aligned}$$

and because of (7), we have that Proposition 5(ii) holds for all \(\tau >0\).

Indeed, the function \(f\) is strictly duplomonotone because \(\frac{\partial \varphi }{\partial \tau }(x,\tau )>0\) for all \(x\not \in \varOmega \), where

$$\begin{aligned} \varOmega :=\left\{ x\in \mathbb {R}^{3}\mid f(x)=0\right\} =\left\{ (x_{1},x_{2},x_{3})^{T}\in \mathbb {R}^{3}\mid x_{1}=x_{2}=x_{3}\right\} . \end{aligned}$$

Hence, \(\varphi (x,\tau )>\varphi (x,0)=0\) for all \(x\not \in \varOmega \) and all \(\tau >0\); that is, \(f\) is strictly duplomonotone.\(\Diamond \)
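The computations in this example are easy to reproduce numerically; the following Python sketch evaluates \(f\), reproduces the value \(w^{T}\nabla f(z)w=-2\), and samples \(\varphi (x,\tau )\) at random points as supporting evidence of its nonnegativity (sampling, of course, does not replace the argument above).

```python
import numpy as np

def f(x):
    e = np.exp(x)
    return np.array([2*e[0] - e[1] - e[2],
                     -e[0] + 2*e[1] - e[2],
                     -e[0] - e[1] + 2*e[2]])

def jac(x):
    # Jacobian of f; the quadratic form below is unchanged by transposition,
    # so it coincides with w^T \nabla f(z) w in the notation of the paper.
    e = np.exp(x)
    return np.array([[ 2*e[0], -e[1],   -e[2]],
                     [-e[0],    2*e[1], -e[2]],
                     [-e[0],   -e[1],    2*e[2]]])

z = np.array([0.0, 0.0, np.log(2.0)])
w = np.array([3.0, 3.0, 2.0])
print(w @ jac(z) @ w)                         # -2.0: f is not monotone

def phi(x, tau):
    fx = f(x)
    return np.dot(fx - f(x - tau * fx), fx)

rng = np.random.default_rng(0)
worst = min(phi(rng.normal(size=3), tau)
            for _ in range(2000) for tau in np.linspace(0.0, 5.0, 20))
print(worst)                                  # nonnegative up to rounding error
```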

The sum of two monotone operators is clearly monotone. Further, if a mapping \(F\) is monotone, one can easily show that for all \(\alpha >0\) the mapping \(F+\alpha I\) is strongly monotone. Do these properties also hold for duplomonotone functions? The answer is negative in general. As we show in the next example, duplomonotonicity can be destroyed by the addition of a monotone linear function of arbitrarily small slope.

Example 6

Consider the matrix

$$\begin{aligned} A:=\left[ \begin{array}{cc} 0 &{} 1\\ 0 &{} 0 \end{array}\right] . \end{aligned}$$

By Example 1, the function \(f(x):=Ax\) is duplomonotone, since \(A^{T}A^{2}=0_{2\times 2}\). Choose any \(\alpha >0\) and consider the function \(g(x):=Bx\), with \(B:=A+\alpha I\). Then,

$$\begin{aligned} \left( B^{T}B^{2}\right) _{s}=\left[ \begin{array}{cc} \alpha ^{3} &{} \frac{3}{2}\alpha ^{2}\\ \frac{3}{2}\alpha ^{2} &{} \alpha ^{3}+2\alpha \end{array}\right] . \end{aligned}$$

The eigenvalues of \(\left( B^{T}B^{2}\right) _{s}\) are \(\alpha ^{3}+\alpha \pm \frac{\alpha }{2}\sqrt{9\alpha ^{2}+4}\). If \(\alpha \in (0,1/2)\), we have that \(\alpha ^{3}+\alpha -\frac{\alpha }{2}\sqrt{9\alpha ^{2}+4}<0\). Therefore, the function \(g=f+\alpha I\) is not duplomonotone for any \(\alpha \in (0,1/2)\).\(\Diamond \)
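A quick numerical check illustrates the loss of duplomonotonicity; the value \(\alpha =0.3\) below is an arbitrary choice in \((0,1/2)\).

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
alpha = 0.3
B = A + alpha * np.eye(2)

M = B.T @ B @ B                                # B^T B^2
print(np.linalg.eigvalsh(0.5 * (M + M.T)))
# one eigenvalue is (slightly) negative, so g(x) = Bx is not duplomonotone
```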

A direct consequence of Proposition 3 is that \(-f(x)\) is a descent direction for \(\Vert f(\cdot )\Vert ^{2}\) at any point \(x\in \mathbb {R}^{m}\) with \(f(x)\ne 0\) when \(f\) is strongly duplomonotone for \(\sigma >0\). This property inspires the derivative-free algorithms in Sect. 3 for finding zeros of the function \(f\).

Corollary 2

Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be differentiable and strongly duplomonotone for \(\sigma >0\). Then, for all \(x\in \mathbb {R}^{m}\), either \(f(x)=0\) or the vector \(-f(x)\) provides a descent direction for the merit function \(\Vert f(\cdot )\Vert ^{2}\) at the point \(x\).

Proof

Observe that, for any \(x\in \mathbb {R}^{m},\) we have \(\nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (x)=2\nabla f(x)f(x)\). Thus, inequality (6) implies that

$$\begin{aligned} \langle \nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (x),-f(x)\rangle =-2\langle \nabla f(x)f(x),f(x)\rangle \le -2\sigma \Vert f(x)\Vert ^{2}. \end{aligned}$$
(8)

The assertion follows. \(\square \)

It is straightforward to extend the definition of duplomonotonicity to set-valued mappings.

Definition 2

A set-valued mapping \(F:\mathbb {R}^{m}\rightrightarrows \mathbb {R}^{m}\) is called duplomonotone with constant \(\bar{\tau }>0\) if for all \(x\in \mathbb {R}^{m}\) and all \(\tau \in [0,\bar{\tau }]\) one has

$$\begin{aligned} \langle y_{0}-y_{1},y_{0}\rangle \ge 0\quad \text {whenever }y_{0}\in F(x),y_{1}\in F(x-\tau y_{0}). \end{aligned}$$

The mapping \(F\) is said to be strongly duplomonotone for some \(\sigma >0\) with constant \(\bar{\tau }>0\) if for all \(x\in \mathbb {R}^{m}\) and all \(\tau \in [0,\bar{\tau }]\) one has

$$\begin{aligned} \langle y_{0}-y_{1},y_{0}\rangle \ge \sigma \tau \Vert y_{0}\Vert ^{2} \quad \text {whenever }y_{0}\in F(x),y_{1}\in F(x-\tau y_{0}). \end{aligned}$$

One can easily extend the characterization of duplomonotonicity given in Proposition 1 to set-valued mappings.

Proposition 6

A set-valued mapping \(F:\mathbb {R}^{m}\rightrightarrows \mathbb {R}^{m}\) is strongly duplomonotone for \(\sigma \ge 0\) if and only if there is some \(\bar{\tau }>0\) such that for all \(x\in \mathbb {R}^{m}\) and all \(\tau \in [0,\bar{\tau }]\) one has

$$\begin{aligned} \Vert y_{1}\Vert ^{2}\le (1-2\sigma \tau )\Vert y_{0}\Vert ^{2} +\Vert y_{1}-y_{0}\Vert ^{2}\quad \text {whenever }y_{0}\in F(x),y_{1}\in F(x-\tau y_{0}). \end{aligned}$$

We will not explore duplomonotone set-valued mappings any further here, as it is beyond the scope of the present paper.

3 Derivative-free algorithms for systems of duplomonotone equations

In this section we consider the problem of finding solutions of systems of nonlinear equations

$$\begin{aligned} f(x)=0, \end{aligned}$$
(9)

where \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) is strongly duplomonotone for \(\sigma >0\). Corollary 2 leads us to consider the following derivative-free line search algorithm for finding zeros of \(f\).

Algorithm 1 (derivative-free line search with backtracking step sizes \(\lambda _{k}\) subject to the acceptance criterion (10); pseudocode figure not reproduced here)
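Since the pseudocode figure is not reproduced here, the following Python sketch records one natural reading of Algorithm 1, reconstructed from the acceptance criterion (10) and the step sizes \(\lambda _{k}=(1/\alpha )\beta ^{p_{k}}\) used in the proof of Theorem 2; the stopping test, iteration cap and the parameter values in the final lines are illustrative choices.

```python
import numpy as np

def algorithm1(f, x0, alpha, beta, tol=1e-10, max_iter=1000):
    """Sketch of a derivative-free line search for f(x) = 0. It assumes f is
    strongly duplomonotone for some sigma > 0, with 0 < alpha < 2*sigma and
    0 < beta < 1, so that the acceptance test (10) is eventually satisfied."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        norm2 = np.dot(fx, fx)
        if norm2 <= tol**2:
            break
        # backtracking: smallest p >= 0 such that lam = (1/alpha) * beta**p
        # satisfies ||f(x - lam*f(x))||^2 <= (1 - alpha*lam) * ||f(x)||^2
        lam = 1.0 / alpha
        fnext = f(x - lam * fx)
        while np.dot(fnext, fnext) > (1.0 - alpha * lam) * norm2:
            lam *= beta
            fnext = f(x - lam * fx)
        x = x - lam * fx
    return x

# Illustration with the nonmonotone linear map of Example 1, which is strongly
# duplomonotone with modulus sigma = 2; hence alpha = 1 < 2*sigma is admissible.
A = np.array([[2.0, 0.0], [2.0, 0.0]])
x_star = algorithm1(lambda x: A @ x, x0=[1.0, -3.0], alpha=1.0, beta=0.5)
print(x_star, A @ x_star)          # A @ x_star is (numerically) zero
```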

Observe that, when \(f\) is differentiable, the step size acceptance criterion (10) is implied by the usual Armijo rule for the function \(\Vert f(\cdot )\Vert ^{2}\) and the direction \(d_{k}:=-f(x_{k})\). Indeed, given some constant \(c\in (0,1)\), the Armijo rule for \(\Vert f(\cdot )\Vert ^{2}\) will search for a step size \(\lambda _{k}\) satisfying

$$\begin{aligned} \Vert f(x_{k}+\lambda _{k}d_{k})\Vert ^{2}&\le \Vert f(x_{k})\Vert ^{2} +2c\lambda _{k}d_{k}^{T}\nabla f(x_{k})f(x_{k})\\&= \Vert f(x_{k})\Vert ^{2}-2c\lambda _{k}f(x_{k})^{T}\nabla f(x_{k})f(x_{k}). \end{aligned}$$

Proposition 3(ii) gives us

$$\begin{aligned} \Vert f(x_{k})\Vert ^{2}-2c\lambda _{k}f(x_{k})^{T}\nabla f(x_{k}) f(x_{k})&\le (1-2\sigma c\lambda _{k})\Vert f(x_{k})\Vert ^{2}. \end{aligned}$$

Taking \(\alpha :=2\sigma c\), we get \(0<\alpha <2\sigma \), and (10) follows.

The steepest descent algorithm could be applied to find solutions to nonlinear equations of type (9) whenever the function \(f\) has a computable Jacobian. The main advantage of Algorithm 1 relative to the steepest descent method is that no derivative information is needed. On the other hand, note that in general the steepest descent method is only guaranteed to converge to a critical point of \(\Vert f(\cdot )\Vert ^{2}\), not necessarily to a zero of the function \(f\) (for more details, see e.g. [5, Chapter 11]). This is not a concern under strong duplomonotonicity for \(\sigma >0\): in this case, any critical point of \(\Vert f(\cdot )\Vert ^{2}\) will be a zero of \(f\). Indeed, otherwise one would have \(\nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (\tilde{x})=0\) and \(f(\tilde{x})\ne 0\) for some \(\tilde{x}\in \mathbb {R}^{m}\). Then

$$\begin{aligned} 0=\nabla \left( \Vert f(\cdot )\Vert ^{2}\right) (\tilde{x})=2\nabla f(\tilde{x})f(\tilde{x}), \end{aligned}$$

whence, by Proposition 3(ii),

$$\begin{aligned} 0=f(\tilde{x})^{T}\nabla f(\tilde{x})f(\tilde{x}) \ge \sigma \Vert f(\tilde{x})\Vert ^{2}>0, \end{aligned}$$

which is a contradiction.

If \(f\) is Lipschitz continuous with a known constant \(\ell >0\) and is also strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\), then, as a direct consequence of the characterization in Proposition 1, we get

$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-2\sigma \tau +\ell ^{2}\tau ^{2})\Vert f(x)\Vert ^{2}, \end{aligned}$$
(11)

for all \(x\in \mathbb {R}^{m}\) and all \(0\le \tau \le \bar{\tau }\). The right-hand side of (11) attains its minimum (with respect to \(\tau \in [0,\bar{\tau }]\)) at \(\tau ^{\star }:=\min \left\{ \sigma /\ell ^{2},\bar{\tau }\right\} \). Thus, if \(\sigma /\ell ^{2}\le \bar{\tau }\), we have

$$\begin{aligned} \Vert f(x-\tau ^{\star }f(x))\Vert ^{2}\le \left( 1-\frac{\sigma ^{2}}{\ell ^{2}}\right) \Vert f(x)\Vert ^{2}. \end{aligned}$$
(12)

This makes us consider the following variation of Algorithm 1, where the step size is chosen constant.

Algorithm 2 (variant of Algorithm 1 with constant step size; pseudocode figure not reproduced here)
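In the same spirit, a minimal sketch of the constant-step iteration of Algorithm 2 could look as follows; it assumes that \(\sigma \), \(\ell \) and \(\bar{\tau }\) are known, and the stopping rule and the constants used in the illustration are, again, our own choices.

```python
import numpy as np

def algorithm2(f, x0, sigma, ell, tau_bar, tol=1e-10, max_iter=1000):
    """Constant step size lam = min(sigma / ell**2, tau_bar), cf. (12) and Theorem 1."""
    lam = min(sigma / ell**2, tau_bar)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        x = x - lam * fx
    return x

# For f(x) = Ax with A from (3), sigma = 2 and ell = ||A|| = 2*sqrt(2) are valid
# constants, so lam = 1/4; the iterates approach the zero (0, -4) of f.
A = np.array([[2.0, 0.0], [2.0, 0.0]])
print(algorithm2(lambda x: A @ x, [1.0, -3.0], sigma=2.0, ell=2*np.sqrt(2), tau_bar=1.0))
```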

As a direct consequence of (12) we have that Algorithm 2 is (globally) linearly convergent to a zero of \(f\), and moreover, the Lipschitz assumption can be relaxed as follows.

Theorem 1

Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\). Let \(x_{0}\in \mathbb {R}^{m}\) be an initial point, and assume there exists some constant \(\ell >0\) such that

$$\begin{aligned} \Vert f(x-\tau f(x))-f(x)\Vert \le \ell \tau \Vert f(x)\Vert \quad \text {for all } x\in L(x_{0})\text { and all }0\le \tau \le \bar{\tau }, \end{aligned}$$
(13)

where \(L(x_{0})\) is the lower level set defined by

$$\begin{aligned} L(x_{0}):=\left\{ x\in \mathbb {R}^{m}:\Vert f(x)\Vert \le \Vert f(x_{0})\Vert \right\} . \end{aligned}$$
(14)

Set \(\lambda :=\min \left\{ \sigma /\ell ^{2},\bar{\tau }\right\} \). Then the iteration \(x_{k+1}:=x_{k}-\lambda f(x_{k})\) satisfies

$$\begin{aligned} \Vert f(x_{k+1})\Vert \le \sqrt{1-2\sigma \lambda +\ell ^{2} \lambda ^{2}}\,\Vert f(x_{k})\Vert ; \end{aligned}$$

whence, \(f(x_{k})\) is linearly convergent to zero. Thus, if \(f\) is continuous, any accumulation point of the sequence \(x_{k}\) is a zero of \(f\).

Proof

Proposition 1 together with (13) yields (11) for all \(x\in L(x_{0})\) and all \(0\le \tau \le \bar{\tau }\). Evaluating this bound at \(\tau =\lambda \) gives the stated inequality, which in particular shows that \(x_{k+1}\in L(x_{0})\), so the argument can be iterated. Since \(0<\lambda \le \sigma /\ell ^{2}\), we have \(1-2\sigma \lambda +\ell ^{2}\lambda ^{2}\le 1-\sigma \lambda <1\), whence the linear convergence. \(\square \)

Even when \(f\) is known (or conjectured) to be Lipschitz continuous, its Lipschitz constant might not be easy to compute. The next result shows that in this case Algorithm 1 can be used: the step size \(\lambda _{k}\) can always be found by backtracking, it remains bounded away from zero, and the algorithm is linearly convergent. We denote by \(\left\lceil \cdot \right\rceil \) the ceiling function, i.e., \(\left\lceil t\right\rceil \) is the smallest integer greater than or equal to \(t\).

Theorem 2

Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\). Let \(x_{0}\in \mathbb {R}^{m}\) be an initial point, and assume that there is a positive constant \(\ell \) such that (13) holds. Then, for all \(0<\alpha <2\sigma \) and all \(0<\beta <1\), Algorithm 1 generates a sequence \(x_{k}\) such that \(f(x_{k})\) is linearly convergent to zero with rate \(\sqrt{1-\beta ^{p}}\), where

$$\begin{aligned} p:=\left\lceil \frac{1}{\log \beta }\left( \log \alpha +\min \left\{ \log (\bar{\tau }),\log \left( 2\sigma -\alpha \right) -2\log \ell \right\} \right) \right\rceil . \end{aligned}$$
(15)

Thus, if \(f\) is continuous, any accumulation point of the sequence \(x_{k}\) is a zero of \(f\).

Proof

Let \(x\in L(x_{0})\). We will prove that the step size \((1/\alpha )\beta ^{p}\) with \(p\) as in (15) always satisfies (10), i.e., that we have

$$\begin{aligned} \Vert f(x-(1/\alpha )\beta ^{p}f(x))\Vert ^{2}\le (1-\beta ^{p}) \Vert f(x)\Vert ^{2}. \end{aligned}$$
(16)

Proposition 1 gives us

$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-2\sigma \tau )\Vert f(x)\Vert ^{2} +\Vert f(x-\tau f(x))-f(x)\Vert ^{2}, \end{aligned}$$
(17)

for all \(\tau \in [0,\bar{\tau }]\). Take \(p\) as in (15), that is,

$$\begin{aligned} p:=\left\lceil \max \left\{ \frac{\log (\alpha \bar{\tau })}{\log \beta },\frac{\log \alpha +\log \left( 2\sigma -\alpha \right) -2\log \ell }{\log \beta }\right\} \right\rceil . \end{aligned}$$

Then (13) holds for all \(0<\tau \le (1/\alpha )\beta ^{p}\). This, together with (17), implies that

$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-2\sigma \tau +\ell ^{2}\tau ^{2})\Vert f(x)\Vert ^{2}\quad \text { whenever }\quad 0<\tau \le \frac{1}{\alpha }\beta ^{p}. \end{aligned}$$

Moreover, we have that \(1-2\sigma \tau +\ell ^{2}\tau ^{2}\le 1-\alpha \tau \) if and only if \(\tau \le (2\sigma -\alpha )/\ell ^{2}\). The definition of \(p\) implies that \((1/\alpha )\beta ^{p}\le (2\sigma -\alpha )/\ell ^{2}\); hence,

$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-\alpha \tau )\Vert f(x)\Vert ^{2}\quad \text { whenever }\quad 0<\tau \le \frac{1}{\alpha }\beta ^{p}, \end{aligned}$$

which implies (16). Therefore, given a point \(x_{k}\) generated by Algorithm 1, the integer \(p_{k}\) can always be found and it satisfies \(p_{k}\le p\). Thus, \(\lambda _{k}=(1/\alpha )\beta ^{p_{k}}\ge (1/\alpha )\beta ^{p}\), and we have

$$\begin{aligned} \Vert f(x_{k+1})\Vert ^{2}=\Vert f(x_{k}-\lambda _{k}f(x_{k}))\Vert ^{2} \le (1-\alpha \lambda _{k})\Vert f(x_{k})\Vert ^{2}\le (1-\beta ^{p}) \Vert f(x_{k})\Vert ^{2}, \end{aligned}$$

which in particular yields \(x_{k+1}\in L(x_{0})\), and the claims in the statement follow. \(\square \)

Remark 3

Notice that: (i) the constant \(\ell \) in (13) does not need to be known in order to use Algorithm 1, but is involved in the rate of convergence, and (ii) Lipschitz continuity of the function \(f\) on \(L(x_{0})\) implies (13).

Even if \(f\) is known (or conjectured) to be both Lipschitz continuous and strongly duplomonotone for \(\sigma >0\), in practical situations the values of both the Lipschitz constant and \(\sigma \) might be unknown or difficult to compute. The following modification of Algorithm 1 permits finding an adequate step size by a double backtracking technique, where an additional backtracking is performed in order to find an appropriate value of the parameter \(\alpha \) in  (10) such that \(\alpha <2\sigma \).

Algorithm 3 (variant of Algorithm 1 with double backtracking on both \(\alpha \) and the step size; pseudocode figure not reproduced here)
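Again, in place of the pseudocode figure, the following Python sketch gives one natural reading of the double backtracking just described (and used in the proof of Theorem 3): the step is backtracked from \(\lambda _{\max }\); whenever it would fall below \(\lambda _{\min }\) without passing the acceptance test, \(\alpha \) itself is multiplied by \(\beta \) and the step is reset to \(\lambda _{\max }\). The stopping rule and the parameter values in the illustration are our own choices.

```python
import numpy as np

def algorithm3(f, x0, alpha, beta, lam_min, lam_max, tol=1e-10, max_iter=1000):
    """Double backtracking sketch: requires 0 < beta < 1, 0 < lam_min <= lam_max
    and 0 < alpha < 1/lam_max (cf. Remark 4(iii)); neither sigma nor the
    Lipschitz-type constant ell needs to be known."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        norm2 = np.dot(fx, fx)
        if norm2 <= tol**2:
            break
        lam = lam_max
        while True:
            fnext = f(x - lam * fx)
            if np.dot(fnext, fnext) <= (1.0 - alpha * lam) * norm2:
                break                  # step accepted
            lam *= beta
            if lam < lam_min:          # inner backtracking exhausted:
                alpha *= beta          # relax alpha and restart from lam_max
                lam = lam_max
        x = x - lam * fx
    return x

A = np.array([[2.0, 0.0], [2.0, 0.0]])
print(algorithm3(lambda x: A @ x, [1.0, -3.0], alpha=0.9, beta=0.5,
                 lam_min=1e-3, lam_max=1.0))   # again converges to (0, -4)
```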

Theorem 3

Let \(f:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\) be strongly duplomonotone for \(\sigma >0\) with constant \(\bar{\tau }>0\). Let \(x_{0}\in \mathbb {R}^{m}\) be an initial point, and assume that there exists some positive constant \(\ell \) such that (13) holds. Then, for all positive constants \(\lambda _{\min }\) and \(\lambda _{\max }\) such that there exists some integer \(q\) with \(\lambda _{\min }\le \beta ^{q}\lambda _{\max }<\min \left\{ 2\sigma /\ell ^{2},\bar{\tau }\right\} \), Algorithm 3 generates a sequence \(x_{k}\) such that \(f(x_{k})\) is linearly convergent to zero with rate \(\sqrt{1-\alpha \beta ^{p+q}\lambda _{\max }}\), where

$$\begin{aligned} p:=\left\lceil \frac{\log (2\sigma -\ell ^{2}\beta ^{q}\lambda _{\max })-\log ( \alpha )}{\log (\beta )}\right\rceil . \end{aligned}$$
(18)

Thus, if \(f\) is continuous, any accumulation point of the sequence \(x_{k}\) is a zero of \(f\).

Proof

Denote by \(\alpha _{0}\) the initial value of \(\alpha \) in Algorithm 3. Proposition 1 together with (13) gives us

$$\begin{aligned} \Vert f(x-\tau f(x))\Vert ^{2}\le (1-2\sigma \tau +\ell ^{2}\tau ^{2})\Vert f(x)\Vert ^{2}\quad \text { whenever }x\in L(x_{0}),0<\tau \le \bar{\tau }. \end{aligned}$$

Further, we have that \(1-2\sigma \tau +\ell ^{2}\tau ^{2}\le 1-\alpha _{0}\beta ^{p}\tau \) with \(0<\tau \le \bar{\tau }\) if and only if \(0<\tau \le \min \left\{ (2\sigma -\alpha _{0}\beta ^{p})/\ell ^{2},\bar{\tau }\right\} \). By assumption, there exists some integer \(q\) such that \(\lambda _{\min }\le \beta ^{q}\lambda _{\max }<\min \left\{ 2\sigma /\ell ^{2},\bar{\tau }\right\} \). By the definition of \(p\) in (18), we have \(\beta ^{q}\lambda _{\max }\le (2\sigma -\alpha _{0}\beta ^{p})/\ell ^{2}\). Hence,

$$\begin{aligned} \lambda _{\min }\le \beta ^{q}\lambda _{\max }\le \min \left\{ \frac{2\sigma -\alpha _{0}\beta ^{p}}{\ell ^{2}},\bar{\tau }\right\} . \end{aligned}$$

Thus, for all \(x\in L(x_{0})\), we have

$$\begin{aligned} \Vert f(x-(\beta ^{q}\lambda _{\max })f(x))\Vert ^{2}\le (1-\alpha _{0} \beta ^{p+q}\lambda _{\max })\Vert f(x)\Vert ^{2}. \end{aligned}$$
(19)

Finally, observe that there is some positive integer \(s\) such that \(\beta ^{s}\lambda _{\max }<\lambda _{\min }\). Therefore, given \(x_{k}\), a new point \(x_{k+1}\) is guaranteed to be found in a finite number of steps of Algorithm 3, because the double backtracking loop can only be executed a maximum of \(sp+q\) times (after a maximum of \(sp\) iterations the value of \(\alpha \) will be equal to \(\alpha _{0}\beta ^{p}\), after which a maximum of \(q\) iterations will be enough to find an appropriate step size \(\lambda _{k}\)). Thus, we have \(\alpha \lambda _{k}\ge \alpha _{0}\beta ^{p+q}\lambda _{\max }\). Consequently, by the acceptance criterion for the step size in Algorithm 3, we have

$$\begin{aligned} \Vert f(x_{k+1})\Vert ^{2}\le (1-\alpha \lambda _{k})\Vert f(x_{k})\Vert ^{2}\le (1-\alpha _{0}\beta ^{p+q}\lambda _{\max })\Vert f(x_{k})\Vert ^{2}, \end{aligned}$$

and the claims follow.\(\square \)

Remark 4

  1. (i)

    The condition \(\lambda _{\min }\le \beta ^{q}\lambda _{\max }<\min \left\{ 2\sigma /\ell ^{2},\bar{\tau }\right\} \) in Theorem 3 is needed to avoid the possibility of an infinite loop in an iteration of the algorithm. Nevertheless, we believe this condition should not be too difficult to guarantee in practice, as it basically requires that \(\lambda _{\min }\) is not “too big” and \(\beta \) is not “too small”.

  2. (ii)

    Certainly, the constant \(\beta \) used for updating \(\alpha \) can be chosen different from the constant \(\beta \) used for updating \(\lambda _{k}\), and Theorem 3 would remain valid with slight changes. Nonetheless, we have decided to use the same constant to ease the notation and the analysis.

  3. (iii)

    In Algorithm 3, the constant \(\alpha \) is required to be smaller than \(\lambda _{\max }^{-1}\) to avoid unnecessary iterations (otherwise, the initial step \(\lambda _{k}=\lambda _{\max }\) would always be too large, since \(1-\alpha \lambda _{k}\) would be negative).