1 Introduction

In this work we propose a variant of the derivative-free spectral residual method Pand-SR  presented in [16], for solving nonlinear systems of equations of the form:

$$\begin{aligned} F(x)=0, \end{aligned}$$
(1)

with the aim of obtaining stronger global convergence results when \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) is a continuously differentiable mapping. Indeed, the sequence generated by Pand-SR was proved to be convergent under mild standard assumptions, but only in a more specific setting was it shown in [16] that the limit point is also a solution of (1).

Inspired by [11], we adopt here a different linesearch strategy, which allows us to obtain a more general and nontrivial result for methods that make no use of derivatives of f, a result that was not established in [16]. Namely, we can prove that at every limit point \(x^*\) of the sequence \(\{x_k\}\) generated by the new algorithm, either \(F(x^*)=0\) or the gradient of the merit function

$$\begin{aligned} f(x) = \frac{1}{2}\Vert F(x)\Vert _2^2 \end{aligned}$$
(2)

is orthogonal to the residual F:

$$\begin{aligned} \big \langle \nabla f(x^*), F(x^*) \big \rangle = \big \langle J(x^*)^T F(x^*),F(x^*) \big \rangle = 0, \end{aligned}$$
(3)

where J denotes the Jacobian of F. Clearly the orthogonality condition (3) does not generally imply \(F(x^*)=0\); however, this result can be recovered under additional conditions, e.g. when \(J(x^*)\) is positive (negative) definite. We further remark that the improvement with respect to Pand-SR is not only theoretical; as discussed in Sect. 4, the numerical experiments show that the new linesearch also has a positive impact on the practical behaviour of the method.

Given the current iterate \(x_k\), spectral residual methods are linesearch-type methods which produce a new iterate \(x_{k+1}\) of the form

$$\begin{aligned} x_{k+1}= x_k \pm \lambda _k \beta _k F(x_k), \end{aligned}$$

where:
  • both the residual vectors \(\pm F(x_k)\) are used as search directions;

  • the spectral coefficient \(\beta _k\ne 0\) is generally the reciprocal of an appropriate Rayleigh quotient, approximating some eigenvalue of (suitable secant approximations of) the Jacobian [11, 15];

  • the steplength parameter \(\lambda _k>0\) is determined by suitable—typically nonmonotone—linesearch strategies to reduce the norm of F (or a smooth merit function as (2)).

Spectral residual methods have received considerable attention because of the low cost of their iterations and their low memory requirements, being matrix-free, see e.g. [7, 9,10,11, 16]. They are particularly attractive when the Jacobian matrix of F is not available analytically or its computation is burdensome. Indeed, distinguishing features of these methods are that the computation of the search directions does not involve the solution of linear systems, and that effective derivative-free linesearch conditions can be defined [6, 7, 11, 12, 15].

The paper is organized as follows. Our algorithm is presented in Sect. 2, where we describe the new linesearch strategy and recall the main features of the spectral residual method Pand-SR . Convergence analysis is developed in Sect. 3 and numerical experiments are discussed in Sect. 4. Some conclusions and perspectives are drawn in Sect. 5.

1.1 Notations

The symbol \(\Vert \cdot \Vert \) denotes the Euclidean norm and J denotes the Jacobian matrix of F. Given a sequence of vectors \(\{x_k\}\), we occasionally denote \(F(x_k)\) by \(F_k\).

2 The Srand2    algorithm

We present a spectral residual method that is a modification of the Projected Approximate Norm Descent algorithm with Spectral Residual step (Pand-SR ) proposed in [16]. Pand-SR  was developed for solving convexly constrained nonlinear systems; here it is applied in an unconstrained setting. A brief discussion on the constrained case is postponed to Sect. 5.

The new algorithm is denoted as Srand2  (Spectral Residual Approximate Norm Descent) and differs from Pand-SR  in the definition of the linesearch conditions and in the choice of the spectral stepsize \(\beta _k\).

Both Pand-SR  and Srand2  employ a nonmonotone linesearch strategy based on the so-called approximate norm descent property [12]. This means that the generated sequence of iterates \(\{x_k\}\) satisfies

$$\begin{aligned} \Vert F(x_{k+1}) \Vert \le (1+\eta _k) \Vert F(x_k) \Vert \end{aligned}$$
(4)

for all k, where \(\{\eta _k\}\) is a positive sequence of scalars such that

$$\begin{aligned} \sum _{k=0}^\infty \eta _k \le \eta < \infty . \end{aligned}$$
(5)

The idea behind such a condition is to allow a highly nonmonotone behaviour of \(\Vert F_k\Vert \) for (initial) large values of \(\eta _k\), while promoting a decrease of \(\Vert F \Vert \) for small (final) values of \(\eta _k\). A nonmonotone behaviour of the norm of F is crucial to avoid practical stagnation of methods based on spectral stepsizes (see e.g. [5, 11, 17]); at the same time, condition (4) ensures that the sequence \(\{ \Vert F_k \Vert \}\) is bounded (see Theorem 1 in Sect. 3).
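To make conditions (4)–(5) concrete, the short Python sketch below (an illustration with names of our own choosing, not part of the algorithms in [16]) encodes one admissible choice of \(\{\eta _k\}\), namely the geometric sequence later used in the experiments of Sect. 4, together with the corresponding approximate norm descent test.

```python
def eta_geometric(k, norm_F0, rho=0.99, c=100.0):
    """One admissible choice of eta_k: eta_k = rho^k * (c + ||F_0||^2).
    Since sum_k rho^k = 1/(1 - rho), condition (5) holds with
    eta = (c + norm_F0**2) / (1 - rho), which is finite."""
    return rho**k * (c + norm_F0**2)

def approximate_norm_descent(norm_F_new, norm_F_old, eta_k):
    """Approximate norm descent test (4): ||F(x_{k+1})|| <= (1 + eta_k) ||F(x_k)||."""
    return norm_F_new <= (1.0 + eta_k) * norm_F_old
```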

In detail, given the current iterate \(x_k\) and the initial stepsize \(\beta _k\), in [16] a new iterate \(x_{k+1}\) of form

$$\begin{aligned} x_{k+1}= x_k - \lambda _k \beta _k F(x_k)\quad \mathrm{or}\quad x_{k+1}= x_k + \lambda _k \beta _k F(x_k) \end{aligned}$$
(6)

is computed. The scalar \(\lambda _k \in (0,1]\) is fixed by using a backtracking strategy; starting from \(\lambda _k=1\), it is progressively reduced by a factor \(\sigma \in (0,1)\) (e.g. halved) until one of the following conditions is satisfied:

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1-\alpha (1+\lambda _k))\Vert F(x_k)\Vert , \end{aligned}$$
(7)

or

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1+\eta _k-\alpha \lambda _k)\Vert F(x_k)\Vert , \end{aligned}$$
(8)

where \(\alpha \in (0,1)\).

In Srand2  conditions (7) and (8) are respectively replaced by

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1-\alpha (1+\lambda _k^2))\Vert F(x_k)\Vert , \end{aligned}$$
(9)

and

$$\begin{aligned} \Vert F(x_{k+1})\Vert \le (1+\eta _k-\alpha \lambda _k^2)\Vert F(x_k)\Vert . \end{aligned}$$
(10)

All these conditions are derivative-free. If F is continuously differentiable, as long as \(F_k^T J(x_k)F_k\ne 0\), either \(+\beta _k F_k\) or \(-\beta _k F_k\) is a descent direction for the merit function f in (2) and for \(\Vert F\Vert \) at \(x_k\); hence the first condition (9) (similarly (7)) promotes a sufficient decrease in \(\Vert F\Vert \) and is crucial for establishing results on the convergence of \(\{\Vert F_k\Vert \}\) to zero. On the other hand, the second condition (10) (similarly (8)) allows for an increase of \(\Vert F\Vert \) depending on the magnitude of \(\eta _k\). Trivially, (9) implies (10) and both imply the approximate norm descent condition (4); the same holds for conditions (7) and (8).

We observe that the only difference between conditions (9)–(10) and (7)–(8) is the \(\lambda _k^2\) term on the right-hand side of (9) and (10). This squared term is common to other linesearch strategies, such as those in [11, 12]. This small change in the linesearch conditions has a significant impact on the global convergence result for the overall algorithm, as shown in the forthcoming section.

As concerns the choice of the spectral coefficient \(\beta _k\) in (6), both Pand-SR and Srand2 use formulas closely related to the Barzilai–Borwein steplength employed in spectral gradient methods for optimization problems, see e.g. [2, 5]. However, differently from the optimization case, in spectral residual methods \(\beta _k\) may be positive or negative since both directions \(\pm F_k\) are attempted. Also, its absolute value is constrained to belong to a given interval \([\beta _{\min }, \beta _{\max }]\) so as to obtain a bounded sequence of stepsizes. As an example, \(\beta _k\) can be chosen by computing

$$\begin{aligned} \beta _{k,1}= & {} \frac{p_{k-1}^T p_{k-1}}{p_{k-1}^T y_{k-1}}, \end{aligned}$$
(11)

or

$$\begin{aligned} \beta _{k,2}= & {} \frac{p_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}, \end{aligned}$$

with \(p_{k-1} =x_{k}-x_{k-1}\) and \( y_{k-1}= F_k- F_{k-1},\) and then ensuring that \(\beta _{k,1}\) or \(\beta _{k,2}\) is such that \(|\beta _{k}| \in [\beta _{\min }, \beta _{\max }]\) by some thresholding rule. Alternative choices of \(\beta _k\) that suitably combine \(\beta _{k,1}\) and \(\beta _{k,2}\) can be found in [15], where a systematic analysis of the stepsize selection for spectral residual methods is addressed also in combination with an approximate norm descent linesearch. In Algorithm 2.1 we formally describe Srand2  for a general \(\beta _k\) such that \(|\beta _{k}| \in [\beta _{\min }, \beta _{\max }]\).

Algorithm 2.1 The Srand2 algorithm
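Since the formal statement of Algorithm 2.1 is given as a figure, we add a minimal Python sketch of a single Srand2 iteration as described above. This is an illustration under assumptions of our own: the spectral coefficient \(\beta _k\) is supplied with \(|\beta _k| \in [\beta _{\min }, \beta _{\max }]\), both trial points \(x_k \pm \lambda \beta _k F_k\) are tested at each backtracking step, the order in which the two signs are tried is arbitrary, and the bound on the number of backtracks mirrors the practical safeguard of Sect. 4 rather than the theoretical scheme.

```python
import numpy as np

def srand2_step(F, x_k, F_k, beta_k, eta_k, alpha=1e-4, sigma=0.5, max_backtracks=40):
    """One Srand2 iteration (sketch): backtrack on lambda, testing both trial
    points x_k -/+ lambda * beta_k * F(x_k) against conditions (9) and (10)."""
    norm_Fk = np.linalg.norm(F_k)
    lam = 1.0
    for _ in range(max_backtracks):
        for sign in (-1.0, +1.0):                    # which sign is tried first is our choice
            x_trial = x_k + sign * lam * beta_k * F_k
            F_trial = F(x_trial)
            norm_trial = np.linalg.norm(F_trial)
            cond9 = norm_trial <= (1.0 - alpha * (1.0 + lam**2)) * norm_Fk   # condition (9)
            cond10 = norm_trial <= (1.0 + eta_k - alpha * lam**2) * norm_Fk  # condition (10)
            if cond9 or cond10:
                return x_trial, F_trial, lam
        lam *= sigma                                 # reduce the steplength by the factor sigma
    raise RuntimeError("maximum number of backtracks reached")
```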

We observe that the Repeat loop at Step 2 terminates in a finite number of steps: indeed, from the continuity of F and the positivity of \(\eta _k\), there exists \(\bar{\lambda }>0\) such that

$$\begin{aligned} \Vert F(x_{k}\pm \lambda \beta _k F(x_k))\Vert \le \Vert F(x_k)\Vert +(\eta _k-\alpha \lambda ^2)\Vert F(x_k)\Vert , \end{aligned}$$

with \(\lambda \in (0, \bar{\lambda }]\); therefore, inequality (10) holds for small enough values of \(\lambda _k\).

3 Global convergence analysis

We now provide the convergence analysis of the Srand2  algorithm. Theorems 1 and 2 analyze the behaviour of the sequences \(\{\lambda _k\}\) and \(\{\Vert F_k\Vert \}\); they state general results which derive from the linesearch strategy and hold for Pand-SR  as well. Their proofs follow the lines of [16, Theorem 4.2] and therefore are not reported in this work. Theorem 2 in particular identifies situations where \(\{\Vert F_k\Vert \}\) may or may not converge to zero. Theorem 3 constitutes the main contribution of this work. It is related both to the linesearch strategy and to the choice of the spectral residual steps, and it does not rely on the specific choice of \(\beta _k\).

Theorem 1

Let \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) be a continuous map,  and let \(\{x_k\}\) and \(\{\lambda _k\}\) be the sequences of iterates and of linesearch stepsizes generated by the Srand2  algorithm. Then the sequence \(\{\Vert F_k\Vert \}\) is convergent and bounded by

$$\begin{aligned} \Vert F_{k}\Vert \le e^{\eta }\Vert F_0\Vert ,\quad \text {for all}\ k \ge 0, \end{aligned}$$
(12)

where \(\eta >0\) is given in (5). Moreover

$$\begin{aligned} \lim _{k \rightarrow \infty } \lambda _k^2\Vert F_k\Vert =0. \end{aligned}$$
(13)

Theorem 2

Let \(F: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) be a continuous map,  and let \(\{x_k\}\) and \(\{\lambda _k\}\) be the sequences of iterates and of linesearch stepsizes generated by the Srand2  algorithm. Then

  (i) \(\mathop {\mathrm{liminf }}_{k \rightarrow \infty } \lambda _k^2 > 0\) implies that \(\lim _{k\rightarrow \infty } \Vert F_k\Vert =0\).

  (ii) If (9) is satisfied for infinitely many k, then \(\lim _{k \rightarrow \infty } \Vert F_k\Vert =0\).

  (iii) If \(\Vert F_k\Vert \le \Vert F_{k+1}\Vert \) for infinitely many iterations, then \(\mathop {\mathrm{liminf }}_{k \rightarrow \infty } \lambda _k^2=0\).

  (iv) If \(\Vert F_k\Vert \le \Vert F_{k+1}\Vert \) for all k sufficiently large, then \(\{ \Vert F_k\Vert \}\) does not converge to 0.

We now provide the main convergence result: at every limit point \(x^*\) of the sequence \(\{x_k\}\) generated by the Srand2 algorithm, either \(F(x^*)=0\) or the gradient of the merit function f in (2) is orthogonal to the residual \(F(x^*)\).

Theorem 3

Let F be continuously differentiable. Let \(\{x_k\}\) be the sequence generated by the Srand2  algorithm and let \(x^*\) be a limit point of \(\{x_k\}\). Then either

$$\begin{aligned} F(x^*)=0 \end{aligned}$$

or

$$\begin{aligned} \big \langle \nabla f(x^*), F(x^*) \big \rangle = \big \langle J(x^*)^TF(x^*), F(x^*) \big \rangle =0. \end{aligned}$$
(14)

Proof

Let \(\textit{K}\) be an infinite subset of indices such that \(\lim _{k \in \textit{K}} x_k=x^*.\) By Theorem 1 we know that \(\lim _{k \in \textit{K}} \lambda _k^2\Vert F_k\Vert =0\). Hence there are two possibilities:

$$\begin{aligned} \mathrm{either}~~~\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k^2 > 0 ~~~\mathrm{or}~~~ \mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k^2 =0. \end{aligned}$$

The first one implies \(\lim _{k \in \textit{K}} \Vert F_k\Vert =0\). Then using the continuity of F it follows easily that

$$\begin{aligned} \lim _{k \in \textit{K}} \Vert F(x_k)\Vert =\Vert F(x^*)\Vert =0. \end{aligned}$$

In the second case we have \(\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k^2 =\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k = 0\). Let \(\underline{\lambda }_k=\lambda _k/\sigma \) denote the last attempted value for the linesearch parameter before \(\lambda _k \) is accepted during the backtracking phase. Hence for sufficiently large values of \(k\in \textit{K}\) we have

$$\begin{aligned} \Vert F(x_k-\underline{\lambda }_k \beta _k F_k)\Vert> & {} (1+\eta _k-\alpha \underline{\lambda }_k^2)\Vert F(x_k)\Vert ,\\ \Vert F(x_k+\underline{\lambda }_k \beta _k F_k)\Vert> & {} (1+\eta _k-\alpha \underline{\lambda }_k^2)\Vert F(x_k)\Vert . \end{aligned}$$

Since \(\eta _k > 0\), and by virtue of (12), there is a positive constant \(c_1\) such that

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert - \Vert F(x_k)\Vert> (\eta _k-\alpha \underline{\lambda }_k^2)\Vert F(x_k)\Vert> -\alpha \underline{\lambda }_k^2 \Vert F(x_k)\Vert > -c_1 \alpha \underline{\lambda }_k^2, \end{aligned}$$
(15)

and multiplying both sides of (15) by \(\Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert +\Vert F(x_k)\Vert ,\) we obtain

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert ^2-\Vert F(x_k)\Vert ^2> -c_1 \alpha \underline{\lambda }_k^2 \big (\Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert +\Vert F(x_k)\Vert \big ). \end{aligned}$$
(16)

Now we observe that \(x_k \pm \lambda _k \beta _k F_k\) is bounded \(\forall k \in \textit{K}\); indeed, by hypothesis \(\lambda _k\in (0,1]\), \(|\beta _k|\le \beta _{\max }\), the subsequence \(\{x_k\}_{k\in K}\) is convergent to \(x^*\) and hence bounded, and \(\Vert F_k\Vert \) is bounded by Theorem 1. Then recalling the definition of \(\underline{\lambda }_k=\lambda _k/ \sigma \) and the continuity of F, we have

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert +\Vert F(x_k)\Vert \le c_2, ~~k\in \textit{K}, \end{aligned}$$
(17)

for some positive constant \(c_2\). Consequently, from (16) and (17), there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert F(x_k\pm \underline{\lambda }_k \beta _k F_k)\Vert ^2-\Vert F(x_k)\Vert ^2> -c \alpha \underline{\lambda }_k^2, \end{aligned}$$
(18)

for sufficiently large values of \( k \in \textit{K}\).

Now, we suppose that \(\beta _k >0\) for infinitely many indices \(k \in \textit{K}_1 \subseteq \textit{K}\), and we consider the two steps \( -\lambda _k \beta _k F_k\) and \(+\lambda _k \beta _k F_k\) separately.

  • Firstly, we consider the step \(-\underline{\lambda }_k \beta _k F_k\). By virtue of the Mean Value Theorem and (18), there exists \(\xi _k \in [0,1]\) such that

    $$\begin{aligned} \big \langle \nabla f (x_k-\xi _k \underline{\lambda }_k\beta _kF_k), -\underline{\lambda }_k \beta _k F_k\big \rangle > - c \alpha \underline{\lambda }_k^2, \end{aligned}$$

    for sufficiently large \( k \in \textit{K}\). Hence, for all large \( k \in \textit{K}_1\) we have that:

    $$\begin{aligned} \big \langle \nabla f(x_k-\xi _k \underline{\lambda }_k\beta _kF_k), F_k \big \rangle < c \alpha \frac{\underline{\lambda }_k}{\beta _k} \le c \alpha \frac{\underline{\lambda }_k}{\beta _{\text {min}}}. \end{aligned}$$
    (19)
  • Now we consider the step \(+\underline{\lambda }_k \beta _k F_k\). Similarly there exists \(\xi '_k \in [0,1]\) such that for all large \( k \in \textit{K}_1\)

    $$\begin{aligned} \big \langle \nabla f(x_k+\xi '_k \underline{\lambda }_k\beta _kF_k), F_k \big \rangle > -c \alpha \frac{\underline{\lambda }_k}{\beta _k} \ge -c \alpha \frac{\underline{\lambda }_k}{\beta _{\text {min}}}. \end{aligned}$$
    (20)

Since \(\mathop {\mathrm{liminf }}_{k \in \textit{K}} \lambda _k= 0,\) taking limits in (19) and (20) we get

$$\begin{aligned} \big \langle \nabla f(x^*), F(x^*) \big \rangle = 0. \end{aligned}$$

We proceed in a similar way if \(\beta _k < 0\) for infinitely many indices. \(\square \)

Corollary 1

The orthogonality condition (14) implies \(F(x^*)=0\) in the following cases:

  (a) \(J(x^*)\) is positive (negative) definite;

  (b) \(v^T J(x^*)v \ne 0\) for all \(v \in {\mathbb {R}}^n\), \(v \ne 0\).

Case (a) in Corollary 1 includes the class of nonlinear monotone systems of equations of the form (1) with F continuously differentiable and strictly monotone, that is \((F(x)-F(y))^T(x-y)> 0\) for any \(x, y\in {\mathbb {R}}^n\) with \(x\ne y\) [4]. Nonlinear monotone systems of equations arise in several applications and tailored spectral type methods have been recently proposed, see e.g. [18].

Remark 1

A general result like Theorem 3 was not proved for Pand-SR, although the sequence generated by Pand-SR is known to be convergent. Moreover, if \(x^*\) is the limit point and \(x_0\) the starting guess, the following bound

$$\begin{aligned} \Vert x_0 - x^*\Vert \le \beta _{\max } \left( \frac{1}{\alpha }+\frac{\eta }{\alpha } e^{\eta }\right) \Vert F_0\Vert \end{aligned}$$
(21)

was provided in [16]. However, it cannot be proved in general that \(F(x^*)=0\). Such a result was obtained in [16] by basing the choice of \(\beta _k\) on (11), assuming the Jacobian J to be Lipschitz continuous, and focusing on specific classes of problems. For example, [16, Theorem 5.2] considers the case of \(J(x^*)\) with positive (negative) definite symmetric part and suitably bounded condition number. In a further result in [16], \(J(x^*)\) is instead assumed to be strongly diagonally dominant, with diagonal entries of constant sign.

We show in the forthcoming section that the stronger convergence properties of Srand2 correspond in practice to an algorithm that is potentially more robust than Pand-SR. Of course, we cannot expect a strong difference in the performance of the two methods, given the small change between them. Nevertheless, the new linesearch is able to recover a few cases, encountered with the previous one, in which \(\Vert F_k\Vert \) does not converge to zero.

4 Numerical illustration

We compare the performance of the Srand2 and Pand-SR algorithms on two problem sets. The first set (named set-Luksan) contains 17 nonlinear systems from the Lukšan test collection described in [13], which are commonly used as benchmarks for optimization algorithms. The second set (named set-contact) consists of nonlinear systems arising in the solution of rail-wheel contact models via the classical CONTACT algorithm [8]. These tests were described in detail and used in [15, Section 5.2]. We selected here the 153 problems generated with train speed of magnitude \(v = 16\) m/s, yielding systems whose dimensions vary from \(n=156\) to \(n=1394\).

Pand-SR  and Srand2  algorithms were implemented as described in Sect. 2 with parameters

$$\begin{aligned}&\beta _0=1,\quad \beta _{\text {min}}= 10^{-10},\quad \beta _{\text {max}}=10^{10},\quad \alpha = 10^{-4},\quad \sigma =0.5,\\&\eta _k = 0.99^k(100+\Vert F_0\Vert ^2)\quad \forall k\ge 0, \end{aligned}$$

see [16]. A maximum number of \(10^5\) iterations and F-evaluations was imposed, and a maximum number of backtracks equal to 40 was allowed at each iteration. The procedure was declared successful when

$$\begin{aligned} \Vert F_k \Vert \le 10^{-6}. \end{aligned}$$
(22)

Failure was declared when either the assigned maximum number of iterations or F-evaluations or backtracks was reached, or \(\Vert F\Vert \) was not reduced for 500 consecutive iterations. Such occurrences are denoted below as \(\mathtt{F_{it}}\), \(\mathtt{F_{fe}}\), \(\mathtt{F_{bt}}\), \(\mathtt{F_{in}}\), respectively.
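For illustration only, a possible outer loop matching this experimental setting might be organized as follows. Here srand2_step refers to the sketch given in Sect. 2, beta_rule to one of the \(\beta _k\) rules recalled below, the F-evaluation budget is omitted for brevity, and reading the stall test \(\mathtt{F_{in}}\) as "no improvement over the best value of \(\Vert F\Vert \) for 500 consecutive iterations" is our own interpretation.

```python
import numpy as np

def solve_srand2(F, x0, beta_rule, tol=1e-6, max_iter=10**5, max_no_decrease=500):
    """Outer Srand2 loop with the stopping rules described above (sketch)."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    norm_F0 = np.linalg.norm(Fx)
    best_norm, no_decrease = norm_F0, 0
    x_old, F_old = None, None
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:                     # success test (22)
            return x, "converged", k
        eta_k = 0.99**k * (100.0 + norm_F0**2)            # eta_k as in the experiments
        if x_old is None:
            beta_k = 1.0                                  # beta_0 = 1
        else:
            beta_k = beta_rule(k, x - x_old, Fx - F_old)  # uses p_{k-1} and y_{k-1}
        x_old, F_old = x, Fx
        x, Fx, _ = srand2_step(F, x, Fx, beta_k, eta_k)   # raises if the backtrack limit is hit
        norm_F = np.linalg.norm(Fx)
        no_decrease = 0 if norm_F < best_norm else no_decrease + 1
        best_norm = min(best_norm, norm_F)
        if no_decrease >= max_no_decrease:                # failure F_in
            return x, "stalled", k
    return x, "max_iter", max_iter                        # failure F_it
```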

Regarding the choice of \(\beta _k\), we used three classical rules based on \(\beta _{k,1}\), \(\beta _{k,2}\) and their alternation, respectively named BB1, BB2 and ALT in what follows. Given a scalar \(\beta \), let \(T(\beta )\) be the projection of \(|\beta |\) onto \({{I_{\beta }}}{\mathop {=}\limits ^\mathrm{def}}[ \beta _{\min }, \beta _{\max }]\), that is

$$\begin{aligned} T(\beta ) = \min \{\beta _{\text {max}}, \max \{ \beta _{\text {min}},|\beta |\}\}. \end{aligned}$$
(23)

We recall below the definition of BB1, BB2 and ALT as given in [15].

BB1 rule:

By [7, 9, 10, 16], at each iteration set

$$\begin{aligned} \beta _k= {\left\{ \begin{array}{ll} \beta _{k,1}&{} \text {if } \ |\beta _{k,1}|\in {{I_{\beta }}}\\ T(\beta _{k,1}) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(24)
BB2 rule:

At each iteration set

$$\begin{aligned} \beta _k= {\left\{ \begin{array}{ll} \beta _{k,2}&{} \text {if } \ |\beta _{k,2}|\in {{I_{\beta }}}\\ T(\beta _{k,2}) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(25)
ALT rule:

Following [1, 7], at each iteration alternate between \(\beta _{k,1}\) and \(\beta _{k,2}\), setting:

$$\begin{aligned}&\beta ^{{\small \mathrm{ALT}}}_k= {\left\{ \begin{array}{ll} \beta _{k,1}&{} \text {for } k \hbox { odd} \\ \beta _{k,2}&{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(26)
$$\begin{aligned}&\beta _k= {\left\{ \begin{array}{ll} \beta ^{{\small \mathrm{ALT}}}_k &{} \quad \text {if} \ |\beta ^{{\small \mathrm{ALT}}}_k| \in {{I_{\beta }}}\\ \beta _{k,1}&{} \quad \text {if}\ k\ \text {even},\ |\beta _{k,1}| \in {{I_{\beta }}},\ |\beta _{k,2}| \notin {{I_{\beta }}}\\ \beta _{k,2}&{} \quad \text {if}\ k\ \text {odd,} \ |\beta _{k,2}| \in {{I_{\beta }}},\ |\beta _{k,1}| \notin {{I_{\beta }}} \\ T(\beta ^{{\small \mathrm{ALT}}}_k) &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(27)
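The rules (23)–(27) can be transcribed directly. The sketch below uses our own naming, with a signature compatible with the beta_rule argument of the driver sketched above (it receives \(k\), \(p_{k-1}\) and \(y_{k-1}\)); as in (24)–(27), the sign of the unsafeguarded value is kept when its modulus is admissible, and the positive safeguarded value \(T(\beta )\) is returned otherwise. No special handling of a zero denominator is included.

```python
import numpy as np

BETA_MIN, BETA_MAX = 1e-10, 1e10   # the interval I_beta used in the experiments

def T(beta):
    """Safeguard (23): projection of |beta| onto I_beta = [beta_min, beta_max]."""
    return min(BETA_MAX, max(BETA_MIN, abs(beta)))

def in_interval(beta):
    return BETA_MIN <= abs(beta) <= BETA_MAX

def bb1_rule(k, p, y):
    """BB1 rule (24), with beta_{k,1} from (11)."""
    b1 = np.dot(p, p) / np.dot(p, y)
    return b1 if in_interval(b1) else T(b1)

def bb2_rule(k, p, y):
    """BB2 rule (25), with beta_{k,2} = p^T y / y^T y."""
    b2 = np.dot(p, y) / np.dot(y, y)
    return b2 if in_interval(b2) else T(b2)

def alt_rule(k, p, y):
    """ALT rule (26)-(27): prefer beta_{k,1} for k odd and beta_{k,2} for k even,
    falling back to the other value when the preferred one leaves I_beta."""
    b1 = np.dot(p, p) / np.dot(p, y)
    b2 = np.dot(p, y) / np.dot(y, y)
    b_alt = b1 if k % 2 == 1 else b2
    if in_interval(b_alt):
        return b_alt
    if k % 2 == 0 and in_interval(b1):   # k even: beta_{k,2} inadmissible, beta_{k,1} admissible
        return b1
    if k % 2 == 1 and in_interval(b2):   # k odd: beta_{k,1} inadmissible, beta_{k,2} admissible
        return b2
    return T(b_alt)
```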

We also experimented with Pand-SR and Srand2 using more elaborate, adaptive rules for \(\beta _k\), see e.g. [2, 15], but the qualitative behaviour of the two methods did not change; therefore we do not report the corresponding results.

Problems in set-Luksan were solved setting \(n=500\) and starting from the initial guess \(x_0\) suggested in [13]. Problem lu5 requires an odd value for n and therefore we set \(n=501\). For 16 out of 17 problems, Pand-SR  and Srand2  give the same results: Table 1 reports the number of F-evaluations varying the updating rule for \(\beta _k\). More interesting is the case of Problem lu16 reported in Table 2. Though performing a large number of F-evaluations, Srand2  is able to successfully solve Problem lu16 using BB2 and ALT, whereas Pand-SR  returns a failure with all the attempted \(\beta _k\) rules.

Table 1 set-Luksan: number of F-evaluations performed by Pand-SR  and Srand2  with different rules for \(\beta _k\)
Table 2 set-Luksan: number of F-evaluations performed by Pand-SR and Srand2 with different rules for \(\beta _k\) for Problem lu16

In Fig. 1 we give an insight into the convergence behaviour of both methods with BB2 on Problem lu16. We display \(\Vert F_k\Vert \) versus the iterations and the number of F-evaluations (top part), the number of backtracks performed by both algorithms (central part), and the values of \(\Vert F_k\Vert \) and \(\lambda _k\) versus the iterations for both algorithms (bottom part). All plots are obtained by disabling the stopping criterion on the number of consecutive increases of \(\Vert F\Vert \). In this setting Pand-SR fails since the maximum number of backtracks is reached, after 3278 iterations and 56883 F-evaluations, while Srand2 converges after 8456 iterations and 45624 F-evaluations. We observe that the sequence \(\{\Vert F_k\Vert \}\) generated by Pand-SR does not satisfy the stopping criterion (22), and the increasing number of backtracks along the iterations corresponds to the fact that \(\{\lambda _k\}\) tends to zero. On the contrary, the sequence \(\{\Vert F_k\Vert \}\) generated by Srand2 converges to zero and \(\lambda _k\) does not decrease with the iterations. Both situations are in accordance with the theory: at least one of the sequences \(\{\Vert F_k\Vert \}\) and \(\{\lambda _k\}\) converges to zero, but the linesearch adopted in Srand2 is more likely to generate a sequence \(\{\Vert F_k\Vert \}\) that goes to zero.

Fig. 1

set-Luksan: convergence history generated by Pand-SR  and Srand2  with BB2 for Problem lu16

This behaviour is also confirmed by the experiments performed on the set-contact problems. The results obtained for these problems are summarized in the F-evaluation performance profiles [3] of Fig. 2, where Pand-SR and Srand2, combined with rules BB2 (top plot) and ALT (bottom plot), are compared. Results with BB1 are not reported since the two algorithms give exactly the same numbers of F-evaluations. The plots clearly show that the two algorithms perform similarly and that Srand2 is slightly more robust. In detail, Pand-SR and Srand2 with BB2 solve 132 and 135 problems, respectively. Also in combination with the ALT rule, Srand2 solves 3 more problems than Pand-SR.

Fig. 2

set-contact: F-evaluation performance profile of Pand-SR  and Srand2  methods with BB2 (top) and with ALT (bottom)

In the 6 cases recovered by Srand2, the behaviour of the two methods was similar to what was observed for Problem lu16. As an illustration, the graphs reported in Fig. 3 refer to one of the cases where the BB2 rule was in use. Observations analogous to those for Fig. 1 can be drawn regarding the convergence to zero of the sequences \(\{ \lambda _k \}\) and \(\{ \Vert F_k\Vert \}\).

Fig. 3

set-contact: convergence history generated by Pand-SR  and Srand2  with BB2 for problem \(155\_3\_3\) in [15, Table B.5]

5 Conclusions and outlook

In this work we show how to modify the algorithm proposed in [16] in order to establish mild general conditions that guarantee the convergence of the sequence \(\{\Vert F_k\Vert \}\) to zero, and we illustrate the corresponding practical benefits in terms of robustness.

The Pand-SR algorithm in [16] was developed for solving constrained nonlinear systems of the form

$$\begin{aligned} F(x)=0, x\in \varOmega , \end{aligned}$$
(28)

where \(\varOmega \subset {\mathbb {R}}^n\) is a convex set whose relative interior is non-empty. Srand2 can also be adapted to the solution of constrained problems of the form (28) by relying on a suitable projection operator onto the feasible set \(\varOmega \), as follows. Proceeding as in [16], feasible iterates \(\{x_k\}\) can be defined by starting from a feasible \(x_0\) and by setting, for \(k\ge 0\),

$$\begin{aligned} x_{k+1} = P(x_k\pm \lambda _k\beta _k F_k), \end{aligned}$$

where P denotes a projection operator onto the considered domain. As an example, if \(\varOmega \) is an n-dimensional box \(\{x\in {\mathbb {R}}^n\,\,\, \hbox { s.t. } \,\,\, l\le x\le u\}\), where \(l\in ({\mathbb {R}}\cup \{-\infty \})^n\), \(u\in ({\mathbb {R}}\cup \{+\infty \})^n\), and the inequalities are meant component-wise, a projection map may be given by \(P(x)=\max \left\{ l, \min \left\{ x,u\right\} \right\} \).
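A minimal sketch of this componentwise projection and of the resulting feasible update, under the same illustrative conventions used in the previous sections, is the following.

```python
import numpy as np

def project_box(x, lower, upper):
    """P(x) = max{l, min{x, u}} componentwise; entries of lower/upper may be -inf/+inf."""
    return np.maximum(lower, np.minimum(x, upper))

def projected_trial_point(x_k, F_k, lam, beta_k, lower, upper, sign=+1.0):
    """Feasible trial point P(x_k +/- lam * beta_k * F(x_k)) for the constrained variant."""
    return project_box(x_k + sign * lam * beta_k * F_k, lower, upper)
```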

Such a modification of the Srand2 algorithm to handle constrained problems trivially enjoys the theoretical properties presented in Theorems 1 and 2. Remarkably, the new global convergence result of Theorem 3 can also be easily extended to problem (28) for limit points lying in the interior of \(\varOmega \). Convergence to solutions on the boundary of \(\varOmega \) is currently under investigation.