1 Introduction

In this work we propose a variant of the derivative-free spectral residual method Pand-SR  presented in [16], for solving nonlinear systems of equations of the form:

$$\begin{aligned} F(x)=0, \end{aligned}$$
(1)

with the aim of obtaining stronger global convergence results when \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) is a continuously differentiable mapping. Indeed, the sequence generated by Pand-SR was proved to be convergent under mild standard assumptions, but only in a more specific setting was it shown in [16] that the limit point is also a solution of (1).

Inspired by [11], we adopt here a different linesearch strategy, which allows us to obtain a more general and nontrivial result for methods that make no use of derivatives of f, a result that was not established in [16]. Namely, we can prove that at every limit point \(x^*\) of the sequence \(\{x_k\}\) generated by the new algorithm, either \(F(x^*)=0\) or the gradient of the merit function

$$\begin{aligned} f(x) = \frac{1}{2}\Vert F(x)\Vert _2^2 \end{aligned}$$
(2)

is orthogonal to the residual F:

$$\begin{aligned} \big \langle \nabla f(x^*), F(x^*) \big \rangle = \big \langle J(x^*)^T F(x^*),F(x^*) \big \rangle = 0, \end{aligned}$$
(3)

where J denotes the Jacobian of F. Clearly the orthogonality condition (3) does not generally imply \(F(x^*)=0\); however, this result can be recovered under additional conditions, e.g. when \(J(x^*)\) is positive (negative) definite. We further remark that the improvement with respect to Pand-SR is not only theoretical; as discussed in Sect. 4, the numerical experiments show that the new linesearch also has a positive impact on the practical behaviour of the method.

Given the current iterate \(x_k\), spectral residual methods are linesearch-type methods which produce a new iterate \(x_{k+1}\) of the form

$$\begin{aligned} x_{k+1}= x_k \pm \lambda _k \beta _k F(x_k), \end{aligned}$$

where:
  • both the residual vectors \(\pm F(x_k)\) are used as search directions;

  • the spectral coefficient \(\beta _k\ne 0\) is generally the reciprocal of an appropriate Rayleigh quotient, approximating some eigenvalue of (suitable secant approximations of) the Jacobian [11, 15];

  • the steplength parameter \(\lambda _k>0\) is determined by suitable—typically nonmonotone—linesearch strategies to reduce the norm of F (or a smooth merit function as (2)).

Spectral residual methods have received considerable attention because of the low cost of their iterations and their low memory requirements, being matrix-free, see e.g. [7, 9,10,11, 16]. They are particularly attractive when the Jacobian matrix of F is not available analytically or its computation is burdensome. Indeed, distinguishing features of these methods are that the computation of the search directions does not involve the solution of linear systems, and that effective derivative-free linesearch conditions can be defined [6, 7, 11, 12, 15].

The paper is organized as follows. Our algorithm is presented in Sect. 2, where we describe the new linesearch strategy and recall the main features of the spectral residual method Pand-SR . Convergence analysis is developed in Sect. 3 and numerical experiments are discussed in Sect. 4. Some conclusions and perspectives are drawn in Sect. 5.

1.1 Notations

The symbol \(\Vert \cdot \Vert \) denotes the Euclidean norm and J denotes the Jacobian matrix of F. Given a sequence of vectors \(\{x_k\}\), we occasionally denote \(F(x_k)\) by \(F_k\).

2 The Srand2    algorithm

We present a spectral residual method that is a modification of the Projected Approximate Norm Descent algorithm with Spectral Residual step (Pand-SR ) proposed in [16]. Pand-SR  was developed for solving convexly constrained nonlinear systems; here it is applied in an unconstrained setting. A brief discussion on the constrained case is postponed to Sect. 5.

The new algorithm is denoted as Srand2  (Spectral Residual Approximate Norm Descent) and differs from Pand-SR  in the definition of the linesearch conditions and in the choice of the spectral stepsize \(\beta _k\).

Both Pand-SR  and Srand2  employ a nonmonotone linesearch strategy based on the so-called approximate norm descent property [12]. This means that the generated sequence of iterates \(\{x_k\}\) satisfies

$$\begin{aligned} \Vert F(x_{k+1}) \Vert \le (1+\eta _k) \Vert F(x_k) \Vert \end{aligned}$$
(4)

for all k, where \(\{\eta _k\}\) is a positive sequence of scalars such that

$$\begin{aligned} \sum _{k=0}^\infty \eta _k \le \eta < \infty . \end{aligned}$$
(5)

The idea behind such a condition is to allow a highly nonmonotone behaviour of \(\Vert F_k\Vert \) for (initial) large values of \(\eta _k\), while promoting a decrease of \(\Vert F \Vert \) for small (final) values of \(\eta _k\). A nonmonotone behaviour of the norm of F is crucial to avoid practical stagnation of methods based on spectral stepsizes (see e.g. [5, 11, 17]); at the same time, condition (4) ensures that the sequence \(\{ \Vert F_k \Vert \}\) is bounded (see Theorem 1 in Sect. 3).
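To make conditions (4)–(5) concrete, the short Python sketch below (an illustration with names of our own choosing, not part of the algorithms in [16]) encodes one admissible choice of \(\{\eta _k\}\), namely the geometric sequence later used in the experiments of Sect. 4, together with the corresponding approximate norm descent test.

```python
def eta_geometric(k, norm_F0, rho=0.99, c=100.0):
    """One admissible choice of eta_k: eta_k = rho^k * (c + ||F_0||^2).
    Since sum_k rho^k = 1/(1 - rho), condition (5) holds with
    eta = (c + norm_F0**2) / (1 - rho), which is finite."""
    return rho**k * (c + norm_F0**2)

def approximate_norm_descent(norm_F_new, norm_F_old, eta_k):
    """Approximate norm descent test (4): ||F(x_{k+1})|| <= (1 + eta_k) ||F(x_k)||."""
    return norm_F_new <= (1.0 + eta_k) * norm_F_old
```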

In detail, given the current iterate \(x_k\) and the initial stepsize \(\beta _k\), in [16] a new iterate \(x_{k+1}\) of form

$$\begin{aligned} x_{k+1}= x_k - \lambda _k \beta _k F(x_k)\quad \mathrm{or}\quad x_{k+1}= x_k + \lambda _k \beta _k F(x_k) \end{aligned}$$
(6)

is computed. The scalar \(\lambda _k \in (0,1]\) is fixed by using a backtracking strategy; starting from \(\lambda _k=1\), it is progressively reduced by a factor \(\sigma \in (0,1)\) (e.g. halved) until one of the following conditions is satisfied:

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1-\alpha (1+\lambda _k))\Vert F(x_k)\Vert , \end{aligned}$$
(7)

or

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1+\eta _k-\alpha \lambda _k)\Vert F(x_k)\Vert , \end{aligned}$$
(8)

where \(\alpha \in (0,1)\).

In Srand2  conditions (7) and (8) are respectively replaced by

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1-\alpha (1+\lambda _k^2))\Vert F(x_k)\Vert , \end{aligned}$$
(9)

and

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1+\eta _k-\alpha \lambda _k^2)\Vert F(x_k)\Vert . \end{aligned}$$
(10)

All these conditions are derivative-free. If F is continuously differentiable, as long as \(F_k^T J(x_k)F_k\ne 0\), either \(+\beta _k F_k\) or \(-\beta _k F_k\) is a descent direction for the merit function f in (2) and for \(\Vert F\Vert \) at \(x_k\); hence the first condition (9) (similarly (7)) promotes a sufficient decrease in \(\Vert F\Vert \) and is crucial for establishing results on the convergence of \(\{\Vert F_k\Vert \}\) to zero. On the other hand, the second condition (10) (similarly (8)) allows for an increase of \(\Vert F\Vert \) depending on the magnitude of \(\eta _k\). Trivially, (9) implies (10) and both imply the approximate norm descent condition (4); the same holds for conditions (7) and (8).

We observe that the only difference between conditions (9)–(10) and (7)–(8) is the \(\lambda _k^2\) term on the right-hand side of (9) and (10). This squared term is common to other linesearch strategies, such as those in [11, 12]. This small change in the linesearch conditions has a significant impact on the global convergence result for the overall algorithm, as shown in the forthcoming section.

As concerns the choice of the spectral coefficient \(\beta _k\) in (6), both Pand-SR and Srand2 use formulas closely related to the Barzilai–Borwein steplength employed in spectral gradient methods for optimization problems, see e.g. [2, 5]. However, differently from the optimization case, in spectral residual methods \(\beta _k\) may be positive or negative since both directions \(\pm F_k\) are attempted. Also, its absolute value is constrained to belong to a given interval \([\beta _{\min }, \beta _{\max }]\) so as to obtain a bounded sequence of stepsizes. As an example, \(\beta _k\) can be chosen by computing

$$\begin{aligned} \beta _{k,1}= & {} \frac{p_{k-1}^T p_{k-1}}{p_{k-1}^T y_{k-1}}, \end{aligned}$$
(11)

or

$$\begin{aligned} \beta _{k,2}= & {} \frac{p_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}, \end{aligned}$$

with \(p_{k-1} =x_{k}-x_{k-1}\) and \( y_{k-1}= F_k- F_{k-1},\) and then ensuring that \(\beta _{k,1}\) or \(\beta _{k,2}\) is such that \(|\beta _{k}| \in [\beta _{\min }, \beta _{\max }]\) by some thresholding rule. Alternative choices of \(\beta _k\) that suitably combine \(\beta _{k,1}\) and \(\beta _{k,2}\) can be found in [15], where a systematic analysis of the stepsize selection for spectral residual methods is addressed also in combination with an approximate norm descent linesearch. In Algorithm 2.1 we formally describe Srand2  for a general \(\beta _k\) such that \(|\beta _{k}| \in [\beta _{\min }, \beta _{\max }]\).

Algorithm 2.1 The Srand2 algorithm
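Since the formal statement of Algorithm 2.1 is given as a figure, we add a minimal Python sketch of a single Srand2 iteration as described above. This is an illustration under assumptions of our own: the spectral coefficient \(\beta _k\) is supplied with \(|\beta _k| \in [\beta _{\min }, \beta _{\max }]\), both trial points \(x_k \pm \lambda \beta _k F_k\) are tested at each backtracking step, the order in which the two signs are tried is arbitrary, and the bound on the number of backtracks mirrors the practical safeguard of Sect. 4 rather than the theoretical scheme.

```python
import numpy as np

def srand2_step(F, x_k, F_k, beta_k, eta_k, alpha=1e-4, sigma=0.5, max_backtracks=40):
    """One Srand2 iteration (sketch): backtrack on lambda, testing both trial
    points x_k -/+ lambda * beta_k * F(x_k) against conditions (9) and (10)."""
    norm_Fk = np.linalg.norm(F_k)
    lam = 1.0
    for _ in range(max_backtracks):
        for sign in (-1.0, +1.0):                    # which sign is tried first is our choice
            x_trial = x_k + sign * lam * beta_k * F_k
            F_trial = F(x_trial)
            norm_trial = np.linalg.norm(F_trial)
            cond9 = norm_trial <= (1.0 - alpha * (1.0 + lam**2)) * norm_Fk   # condition (9)
            cond10 = norm_trial <= (1.0 + eta_k - alpha * lam**2) * norm_Fk  # condition (10)
            if cond9 or cond10:
                return x_trial, F_trial, lam
        lam *= sigma                                 # reduce the steplength by the factor sigma
    raise RuntimeError("maximum number of backtracks reached")
```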

We observe that the Repeat loop at Step 2 terminates in a finite number of steps: indeed, from the continuity of F and the positivity of \(\eta _k\), there exists \(\bar{\lambda }>0\) such that

$$\begin{aligned} \Vert F(x_{k}\pm \lambda \beta _k F(x_k))\Vert \le \Vert F(x_k)\Vert +(\eta _k-\alpha \lambda ^2)\Vert F(x_k)\Vert , \end{aligned}$$

with \(\lambda \in (0, \bar{\lambda }]\); therefore, inequality (10) holds for small enough values of \(\lambda _k\).

3 Global convergence analysis

We now provide the convergence analysis of the Srand2  algorithm. Theorems 1 and 2 analyze the behaviour of the sequences \(\{\lambda _k\}\) and \(\{\Vert F_k\Vert \}\); they state general results which derive from the linesearch strategy and hold for Pand-SR  as well. Their proofs follow the lines of [16, Theorem 4.2] and therefore are not reported in this work. Theorem 2 in particular identifies situations where \(\{\Vert F_k\Vert \}\) may or may not converge to zero. Theorem 3 constitutes the main contribution of this work. It is related both to the linesearch strategy and to the choice of the spectral residual steps, and it does not rely on the specific choice of \(\beta _k\).

Theorem 1

Let \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) be a continuous map,  and let \(\{x_k\}\) and \(\{\lambda _k\}\) be the sequences of iterates and of linesearch stepsizes generated by the Srand2  algorithm. Then the sequence \(\{\Vert F_k\Vert \}\) is convergent and bounded by

$$\begin{aligned} \Vert F_{k}\Vert \le e^{\eta }\Vert F_0\Vert ,\quad \text {for all}\ k \ge 0, \end{aligned}$$
(12)

where \(\eta >0\) is given in (5). Moreover

$$\begin{aligned} \lim _{k \rightarrow \infty } \lambda _k^2\Vert F_k\Vert =0. \end{aligned}$$
(13)

Theorem 2

Let \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) be a continuous map,  and let \(\{x_k\}\) and \(\{\lambda _k\}\) be the sequences of iterates and of linesearch stepsizes generated by the Srand2  algorithm. Then

  (i) \(\mathop {\mathrm{liminf }}_{k \rightarrow \infty } \lambda _k^2 > 0\) implies that \(\lim _{k\rightarrow \infty } \Vert F_k\Vert =0\).

  (ii) If (9) is satisfied for infinitely many k, then \(\lim _{k \rightarrow \infty } \Vert F_k\Vert =0\).

  (iii) If \(\Vert F_k\Vert \le \Vert F_{k+1}\Vert \) for infinitely many iterations, then \(\mathop {\mathrm{liminf }}_{k \rightarrow \infty } \lambda _k^2=0\).

  (iv) If \(\Vert F_k\Vert \le \Vert F_{k+1}\Vert \) for all k sufficiently large, then \(\{ \Vert F_k\Vert \}\) does not converge to 0.

We now provide the main convergence result: at every limit point \(x^*\) of the sequence \(\{x_k\}\) generated by the Srand2 algorithm, either \(F(x^*)=0\) or the gradient of the merit function f in (2) is orthogonal to the residual \(F(x^*)\).

Theorem 3

Let F be continuously differentiable. Let \(\{x_k\}\) be the sequence generated by the Srand2  algorithm and let \(x^*\) be a limit point of \(\{x_k\}\). Then either

$$\begin{aligned} F(x^*)=0 \end{aligned}$$

or

$$\begin{aligned} \big \langle \nabla f(x^*), F(x^*) \big \rangle = \big \langle J(x^*)^TF(x^*), F(x^*) \big \rangle =0. \end{aligned}$$
(14)

Proof

Let \(\textit{K}\) be an infinite subset of indices such that \(\lim _{k \in \textit{K}} x_k=x^*.\) By Theorem 1 we know that \(\lim _{k \in \textit{K}} \lambda _k^2\Vert F_k\Vert =0\). Hence there are two possibilities:

$$\begin{aligned} \mathrm{either}~~~\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k^2 > 0 ~~~\mathrm{or}~~~ \mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k^2 =0. \end{aligned}$$

The first one implies \(\lim _{k \in \textit{K}} \Vert F_k\Vert =0\). Then using the continuity of F it follows easily that

$$\begin{aligned} \lim _{k \in \textit{K}} \Vert F(x_k)\Vert =\Vert F(x^*)\Vert =0. \end{aligned}$$

In the second case we have \(\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k^2 =\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k = 0\). Let \(\underline{\lambda }_k=\lambda _k/\sigma \) denote the last attempted value for the linesearch parameter before \(\lambda _k \) is accepted during the backtracking phase. Hence for sufficiently large values of \(k\in \textit{K}\) we have

$$\begin{aligned} \Vert F(x_k-\underline{\lambda }_k \beta _k F_k)\Vert> & {} (1+\eta _k-\alpha \underline{\lambda }_k^2)\Vert F(x_k)\Vert ,\\ \Vert F(x_k+\underline{\lambda }_k \beta _k F_k)\Vert> & {} (1+\eta _k-\alpha \underline{\lambda }_k^2)\Vert F(x_k)\Vert . \end{aligned}$$

Since \(\eta _k > 0\), and by virtue of (12), there is a positive constant \(c_1\) such that

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert - \Vert F(x_k)\Vert> (\eta _k-\alpha \underline{\lambda }_k^2)\Vert F(x_k)\Vert> -\alpha \underline{\lambda }_k^2 \Vert F(x_k)\Vert > -c_1 \alpha \underline{\lambda }_k^2, \end{aligned}$$
(15)

and multiplying both sides of (15) by \(\Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert +\Vert F(x_k)\Vert ,\) we obtain

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert ^2-\Vert F(x_k)\Vert ^2> -c_1 \alpha \underline{\lambda }_k^2 \big (\Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert +\Vert F(x_k)\Vert \big ). \end{aligned}$$
(16)

Now we observe that \(x_k \pm \lambda _k \beta _k F_k\) is bounded \(\forall k \in \textit{K}\); indeed, by hypothesis \(\lambda _k\in (0,1]\), \(|\beta _k|\le \beta _{\max }\), the subsequence \(\{x_k\}_{k\in K}\) is convergent to \(x^*\) and hence bounded, and \(\Vert F_k\Vert \) is bounded by Theorem 1. Then recalling the definition of \(\underline{\lambda }_k=\lambda _k/ \sigma \) and the continuity of F, we have

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert +\Vert F(x_k)\Vert \le c_2, ~~k\in \textit{K}, \end{aligned}$$
(17)

for some positive constant \(c_2\). Consequently, from (16) and (17), there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert ^2-\Vert F(x_k)\Vert ^2> -c \alpha \underline{\lambda }_k^2, \end{aligned}$$
(18)

for sufficiently large values of \( k \in \textit{K}\).

Now, we suppose that \(\beta _k >0\) for infinitely many indices \(k \in \textit{K}_1 \subseteq \textit{K}\), and we consider the two steps \( -\lambda _k \beta _k F_k\) and \(+\lambda _k \beta _k F_k\) separately.

  • Firstly, we consider the step \(-\underline{\lambda }_k \beta _k F_k\). By virtue of the Mean Value Theorem and (18), there exists \(\xi _k \in [0,1]\) such that

    $$\begin{aligned} \big \langle \nabla f (x_k-\xi _k \underline{\lambda }_k\beta _kF_k), -\underline{\lambda }_k \beta _k F_k\big \rangle > - c \alpha \underline{\lambda }_k^2, \end{aligned}$$

    for sufficiently large \( k \in \textit{K}\). Hence, for all large \( k \in \textit{K}_1\) we have that:

    $$\begin{aligned} \big \langle \nabla f(x_k-\xi _k \underline{\lambda }_k\beta _kF_k), F_k \big \rangle < c \alpha \frac{\underline{\lambda }_k}{\beta _k} \le c \alpha \frac{\underline{\lambda }_k}{\beta _{\text {min}}}. \end{aligned}$$
    (19)
  • Now we consider the step \(+\underline{\lambda }_k \beta _k F_k\). Similarly there exists \(\xi '_k \in [0,1]\) such that for all large \( k \in \textit{K}_1\)

    $$\begin{aligned} \big \langle \nabla f(x_k+\xi '_k \underline{\lambda }_k\beta _kF_k), F_k \big \rangle > -c \alpha \frac{\underline{\lambda }_k}{\beta _k} \ge -c \alpha \frac{\underline{\lambda }_k}{\beta _{\text {min}}}. \end{aligned}$$
    (20)

Since \(\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k= 0,\) taking limits in (19) and (20) we get

$$\begin{aligned} \big \langle \nabla f(x^*), F(x^*) \big \rangle = 0. \end{aligned}$$

We proceed in a similar way if \(\beta _k < 0\) for infinitely many indices. \(\square \)

Corollary 1

The orthogonality condition (14) implies \(F(x^*)=0\) in the following cases:

  (a) \(J(x^*)\) is positive (negative) definite;

  (b) \(v^T J(x^*)v \ne 0\) for all \(v \in {\mathbb {R}}^n\), \(v \ne 0\).

Case (a) in Corollary 1 includes the class of nonlinear monotone systems of equations of the form (1) with F continuously differentiable and strictly monotone, that is \((F(x)-F(y))^T(x-y)> 0\) for any \(x, y\in {\mathbb {R}}^n\) with \(x\ne y\) [4]. Nonlinear monotone systems of equations arise in several applications and tailored spectral type methods have been recently proposed, see e.g. [18].

Remark 1

A general result like Theorem 3 was not proved for Pand-SR, although the sequence generated by Pand-SR is known to be convergent. Moreover, if \(x^*\) is the limit point and \(x_0\) the starting guess, the following bound

$$\begin{aligned} \Vert x_0 - x^*\Vert \le \beta _{\max } \left( \frac{1}{\alpha }+\frac{\eta }{\alpha } e^{\eta }\right) \Vert F_0\Vert \end{aligned}$$
(21)

was provided in [16]. However, it cannot be proved in general that \(F(x^*)=0\). Such a result was obtained in [16] by basing the choice of \(\beta _k\) on (11), assuming the Jacobian J to be Lipschitz continuous, and focusing on specific classes of problems. For example, [16, Theorem 5.2] considers the case of \(J(x^*)\) with positive (negative) definite symmetric part and suitably bounded condition number. In a further result in [16], \(J(x^*)\) is instead assumed to be strongly diagonally dominant, with diagonal entries of constant sign.

We show in the forthcoming section that the stronger convergence properties of Srand2 correspond in practice to an algorithm that is potentially more robust than Pand-SR. Of course, we cannot expect a strong difference in the performance of the two methods, given the small change between them. Nevertheless, the new linesearch is able to recover a few cases, encountered with the previous one, in which \(\Vert F_k\Vert \) does not converge to zero.

4 Numerical illustration

We compare the performance of the Srand2 and Pand-SR algorithms on two problem sets. The first set (named set-Luksan) contains 17 nonlinear systems from the Lukšan test collection described in [13], which are commonly used as benchmarks for optimization algorithms. The second set (named set-contact) consists of nonlinear systems arising in the solution of rail-wheel contact models via the classical CONTACT algorithm [8]. These tests were described in detail and used in [15, Section 5.2]. We selected here the 153 problems generated with train speed of magnitude \(v = 16\) m/s, yielding systems whose dimensions vary from \(n=156\) to \(n=1394\).

Pand-SR  and Srand2  algorithms were implemented as described in Sect. 2 with parameters

$$\begin{aligned}&\beta _0=1,\quad \beta _{\text {min}}= 10^{-10},\quad \beta _{\text {max}}=10^{10},\quad \alpha = 10^{-4},\quad \sigma =0.5,\\&\eta _k = 0.99^k(100+\Vert F_0\Vert ^2)\quad \forall k\ge 0, \end{aligned}$$

see [16]. A maximum number of \(10^5\) iterations and F-evaluations was imposed, and a maximum number of backtracks equal to 40 was allowed at each iteration. The procedure was declared successful when

$$\begin{aligned} \Vert F_k \Vert \le 10^{-6}. \end{aligned}$$
(22)

Failure was declared when either the assigned maximum number of iterations or F-evaluations or backtracks was reached, or \(\Vert F\Vert \) was not reduced for 500 consecutive iterations. Such occurrences are denoted below as \(\mathtt{F_{it}}\), \(\mathtt{F_{fe}}\), \(\mathtt{F_{bt}}\), \(\mathtt{F_{in}}\), respectively.
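For illustration only, a possible outer loop matching this experimental setting might be organized as follows. Here srand2_step refers to the sketch given in Sect. 2, beta_rule to one of the \(\beta _k\) rules recalled below, the F-evaluation budget is omitted for brevity, and reading the stall test \(\mathtt{F_{in}}\) as "no improvement over the best value of \(\Vert F\Vert \) for 500 consecutive iterations" is our own interpretation.

```python
import numpy as np

def solve_srand2(F, x0, beta_rule, tol=1e-6, max_iter=10**5, max_no_decrease=500):
    """Outer Srand2 loop with the stopping rules described above (sketch)."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    norm_F0 = np.linalg.norm(Fx)
    best_norm, no_decrease = norm_F0, 0
    x_old, F_old = None, None
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:                     # success test (22)
            return x, "converged", k
        eta_k = 0.99**k * (100.0 + norm_F0**2)            # eta_k as in the experiments
        if x_old is None:
            beta_k = 1.0                                  # beta_0 = 1
        else:
            beta_k = beta_rule(k, x - x_old, Fx - F_old)  # uses p_{k-1} and y_{k-1}
        x_old, F_old = x, Fx
        x, Fx, _ = srand2_step(F, x, Fx, beta_k, eta_k)   # raises if the backtrack limit is hit
        norm_F = np.linalg.norm(Fx)
        no_decrease = 0 if norm_F < best_norm else no_decrease + 1
        best_norm = min(best_norm, norm_F)
        if no_decrease >= max_no_decrease:                # failure F_in
            return x, "stalled", k
    return x, "max_iter", max_iter                        # failure F_it
```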

Regarding the choice of \(\beta _k\), we used three classical rules based on \(\beta _{k,1}\), \(\beta _{k,2}\) and their alternation, respectively named BB1, BB2 and ALT in what follows. Given a scalar \(\beta \), let \(T(\beta )\) be the projection of \(|\beta |\) onto \({{I_{\beta }}}{\mathop {=}\limits ^\mathrm{def}}[ \beta _{\min }, \beta _{\max }]\), that is

$$\begin{aligned} T(\beta ) = \min \{\beta _{\text {max}}, \max \{ \beta _{\text {min}},|\beta |\}\}. \end{aligned}$$
(23)

We recall below the definition of BB1, BB2 and ALT as given in [15].

BB1 rule:

By [7, 9, 10, 16], at each iteration set

$$\begin{aligned} \beta _k= {\left\{ \begin{array}{ll} \beta _{k,1}&{} \text {if } \ |\beta _{k,1}|\in {{I_{\beta }}}\\ T(\beta _{k,1}) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(24)
BB2 rule:

At each iteration set

$$\begin{aligned} \beta _k= {\left\{ \begin{array}{ll} \beta _{k,2}&{} \text {if } \ |\beta _{k,2}|\in {{I_{\beta }}}\\ T(\beta _{k,2}) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(25)
ALT rule:

Following [1, 7], at each iteration alternate between \(\beta _{k,1}\) and \(\beta _{k,2}\), setting:

$$\begin{aligned}&\beta ^{{\small \mathrm{ALT}}}_k= {\left\{ \begin{array}{ll} \beta _{k,1}&{} \text {for } k \hbox { odd} \\ \beta _{k,2}&{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(26)
$$\begin{aligned}&\beta _k= {\left\{ \begin{array}{ll} \beta ^{{\small \mathrm{ALT}}}_k &{} \quad \text {if} \ |\beta ^{{\small \mathrm{ALT}}}_k| \in {{I_{\beta }}}\\ \beta _{k,1}&{} \quad \text {if}\ k\ \text {even},\ |\beta _{k,1}| \in {{I_{\beta }}},\ |\beta _{k,2}| \notin {{I_{\beta }}}\\ \beta _{k,2}&{} \quad \text {if}\ k\ \text {odd,} \ |\beta _{k,2}| \in {{I_{\beta }}},\ |\beta _{k,1}| \notin {{I_{\beta }}} \\ T(\beta ^{{\small \mathrm{ALT}}}_k) &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(27)
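The rules (23)–(27) can be transcribed directly. The sketch below uses our own naming, with a signature compatible with the beta_rule argument of the driver sketched above (it receives \(k\), \(p_{k-1}\) and \(y_{k-1}\)); as in (24)–(27), the sign of the unsafeguarded value is kept when its modulus is admissible, and the positive safeguarded value \(T(\beta )\) is returned otherwise. No special handling of a zero denominator is included.

```python
import numpy as np

BETA_MIN, BETA_MAX = 1e-10, 1e10   # the interval I_beta used in the experiments

def T(beta):
    """Safeguard (23): projection of |beta| onto I_beta = [beta_min, beta_max]."""
    return min(BETA_MAX, max(BETA_MIN, abs(beta)))

def in_interval(beta):
    return BETA_MIN <= abs(beta) <= BETA_MAX

def bb1_rule(k, p, y):
    """BB1 rule (24), with beta_{k,1} from (11)."""
    b1 = np.dot(p, p) / np.dot(p, y)
    return b1 if in_interval(b1) else T(b1)

def bb2_rule(k, p, y):
    """BB2 rule (25), with beta_{k,2} = p^T y / y^T y."""
    b2 = np.dot(p, y) / np.dot(y, y)
    return b2 if in_interval(b2) else T(b2)

def alt_rule(k, p, y):
    """ALT rule (26)-(27): prefer beta_{k,1} for k odd and beta_{k,2} for k even,
    falling back to the other value when the preferred one leaves I_beta."""
    b1 = np.dot(p, p) / np.dot(p, y)
    b2 = np.dot(p, y) / np.dot(y, y)
    b_alt = b1 if k % 2 == 1 else b2
    if in_interval(b_alt):
        return b_alt
    if k % 2 == 0 and in_interval(b1):   # k even: beta_{k,2} inadmissible, beta_{k,1} admissible
        return b1
    if k % 2 == 1 and in_interval(b2):   # k odd: beta_{k,1} inadmissible, beta_{k,2} admissible
        return b2
    return T(b_alt)
```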

We also experimented with Pand-SR and Srand2 using more elaborate, adaptive rules for \(\beta _k\), see e.g. [2, 15], but the qualitative behaviour of the two methods did not change; therefore we do not report the corresponding results.

Problems in set-Luksan were solved setting \(n=500\) and starting from the initial guess \(x_0\) suggested in [13]. Problem lu5 requires an odd value for n and therefore we set \(n=501\). For 16 out of 17 problems, Pand-SR  and Srand2  give the same results: Table 1 reports the number of F-evaluations varying the updating rule for \(\beta _k\). More interesting is the case of Problem lu16 reported in Table 2. Though performing a large number of F-evaluations, Srand2  is able to successfully solve Problem lu16 using BB2 and ALT, whereas Pand-SR  returns a failure with all the attempted \(\beta _k\) rules.

Table 1 set-Luksan: number of F-evaluations performed by Pand-SR  and Srand2  with different rules for \(\beta _k\)
Table 2 set-Luksan: number of F-evaluations performed by Pand-SR and Srand2 with different rules for \(\beta _k\) for Problem lu16

In Fig. 1 we give an insight into the convergence behaviour of both methods with BB2 on Problem lu16. We display \(\Vert F_k\Vert \) versus the iterations and the number of F-evaluations (top part), the number of backtracks performed by both algorithms (central part), and the values of \(\Vert F_k\Vert \) and \(\lambda _k\) versus the iterations for both algorithms (bottom part). All plots are obtained by disabling the stopping criterion on the number of consecutive increases of \(\Vert F\Vert \). In this setting Pand-SR fails since the maximum number of backtracks is reached, after 3278 iterations and 56883 F-evaluations, while Srand2 converges after 8456 iterations and 45624 F-evaluations. We observe that the sequence \(\{\Vert F_k\Vert \}\) generated by Pand-SR does not satisfy the stopping criterion (22), and the increasing number of backtracks along the iterations corresponds to the fact that \(\{\lambda _k\}\) tends to zero. On the contrary, the sequence \(\{\Vert F_k\Vert \}\) generated by Srand2 converges to zero and \(\lambda _k\) does not decrease with the iterations. Both situations are in accordance with the theory: at least one of the sequences \(\{\Vert F_k\Vert \}\) and \(\{\lambda _k\}\) converges to zero, but the linesearch adopted in Srand2 is more likely to generate a sequence \(\{\Vert F_k\Vert \}\) that goes to zero.

Fig. 1

set-Luksan: convergence history generated by Pand-SR  and Srand2  with BB2 for Problem lu16

This behaviour is also confirmed by the experiments performed on the set-contact problems. The results obtained for these problems are summarized in the F-evaluation performance profiles [3] of Fig. 2, where Pand-SR and Srand2, combined with rules BB2 (top plot) and ALT (bottom plot), are compared. Results with BB1 are not reported since the two algorithms give exactly the same numbers of F-evaluations. The plots clearly show that the two algorithms perform similarly and that Srand2 is slightly more robust. In detail, Pand-SR and Srand2 with BB2 solve 132 and 135 problems, respectively. Also in combination with the ALT rule, Srand2 solves 3 more problems than Pand-SR.

Fig. 2

set-contact: F-evaluation performance profile of Pand-SR  and Srand2  methods with BB2 (top) and with ALT (bottom)

In the 6 cases recovered by Srand2, the behaviour of the two methods was similar to what was observed for Problem lu16. As an illustration, the graphs reported in Fig. 3 refer to one of the cases where the BB2 rule was in use. Observations analogous to those for Fig. 1 can be drawn regarding the convergence to zero of the sequences \(\{ \lambda _k \}\) and \(\{ \Vert F_k\Vert \}\).

Fig. 3

set-contact: convergence history generated by Pand-SR  and Srand2  with BB2 for problem \(155\_3\_3\) in [15, Table B.5]

5 Conclusions and outlook

In this work we show how to modify the algorithm proposed in [16] in order to establish mild general conditions that guarantee the convergence of the sequence \(\{\Vert F_k\Vert \}\) to zero, and we illustrate the corresponding practical benefits in terms of robustness.

The Pand-SR algorithm in [16] was developed for solving constrained nonlinear systems of the form

$$\begin{aligned} F(x)=0, x\in \varOmega , \end{aligned}$$
(28)

where \(\varOmega \subset {\mathbb {R}}^n\) is a convex set whose relative interior is non-empty. Srand2 can also be adapted to the solution of constrained problems of the form (28) by relying on a suitable projection operator onto the feasible set \(\varOmega \), as follows. Proceeding as in [16], feasible iterates \(\{x_k\}\) can be defined by starting from a feasible \(x_0\) and by setting, for \(k\ge 0\),

$$\begin{aligned} x_{k+1} = P(x_k\pm \lambda _k\beta _k F_k), \end{aligned}$$

where P denotes a projection operator onto the considered domain. As an example, if \(\varOmega \) is an n-dimensional box \(\{x\in {\mathbb {R}}^n\,\,\, \hbox { s.t. } \,\,\, l\le x\le u\}\), where \(l\in ({\mathbb {R}}\cup \{-\infty \})^n\), \(u\in ({\mathbb {R}}\cup \{+\infty \})^n\), and the inequalities are meant component-wise, a projection map may be given by \(P(x)=\max \left\{ l, \min \left\{ x,u\right\} \right\} \).
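A minimal sketch of this componentwise projection and of the resulting feasible update, under the same illustrative conventions used in the previous sections, is the following.

```python
import numpy as np

def project_box(x, lower, upper):
    """P(x) = max{l, min{x, u}} componentwise; entries of lower/upper may be -inf/+inf."""
    return np.maximum(lower, np.minimum(x, upper))

def projected_trial_point(x_k, F_k, lam, beta_k, lower, upper, sign=+1.0):
    """Feasible trial point P(x_k +/- lam * beta_k * F(x_k)) for the constrained variant."""
    return project_box(x_k + sign * lam * beta_k * F_k, lower, upper)
```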

Such a modification of the Srand2 algorithm to handle constrained problems trivially enjoys the theoretical properties presented in Theorems 1 and 2. Remarkably, the new global convergence result of Theorem 3 can also be easily extended to problem (28) for limit points lying in the interior of \(\varOmega \). Convergence to solutions on the boundary of \(\varOmega \) is currently under investigation.