On the global convergence of a new spectral residual algorithm for nonlinear systems of equations

We present a derivative-free method for solving systems of nonlinear equations that belongs to the class of spectral residual methods. We show that by endowing a previous version of the algorithm with a suitable new linesearch strategy, standard global convergence results can be attained under mild general assumptions. The robustness of the new method is thus potentially improved with respect to the previous version, as shown by the reported numerical experiments.


Introduction
In this work we propose a variant of the derivative-free spectral residual method Pand-SR presented in [16] for solving nonlinear systems of equations of the form

F(x) = 0,  (1)

with the aim of obtaining stronger global convergence results when F : R^n → R^n is a continuously differentiable mapping. Indeed, the sequence generated by Pand-SR proved to be convergent under mild standard assumptions, but only in a more specific setting was it shown in [16] that the limit point is also a solution of (1). Inspired by [11], we adopt here a different linesearch strategy, which allows us to obtain a more general and nontrivial result for methods that do not make any use of derivatives of f, a result that was in fact not established in [16]. Namely, we can prove that at every limit point x* of the sequence {x_k} generated by the new algorithm, either F(x*) = 0 or the gradient of the merit function

f(x) = (1/2) ‖F(x)‖^2  (2)

is orthogonal to the residual F:

∇f(x*)^T F(x*) = F(x*)^T J(x*)^T F(x*) = 0,  (3)

being J the Jacobian of F. Clearly the orthogonality condition (3) does not generally imply F(x*) = 0; however, this result can be recovered under additional conditions, e.g. when J(x*) is positive (negative) definite. We further remark that the improvement with respect to Pand-SR is not only theoretical; as discussed in Sect. 4, the performed numerical experiments show that the new linesearch has a positive impact also on the practical behaviour of the method.

Given the current iterate x_k, spectral residual methods are methods of linesearch type which produce a new iterate x_{k+1} of the form

x_{k+1} = x_k ± λ_k β_k F(x_k),

where:
- both the residual vectors ±F(x_k) are used as search directions;
- the spectral coefficient β_k ≠ 0 is generally the reciprocal of an appropriate Rayleigh quotient, approximating some eigenvalue of (suitable secant approximations of) the Jacobian [11,15];
- the steplength parameter λ_k > 0 is determined by suitable, typically nonmonotone, linesearch strategies to reduce the norm of F (or a smooth merit function such as (2)).
Spectral residual methods have received considerable attention because of the low cost of their iterations and because, being matrix-free, they require low memory storage; see e.g. [7,9,10,11,16]. They are particularly attractive when the Jacobian matrix of F is not available analytically or its computation is burdensome. Indeed, distinguishing features of these methods are that the computation of the search directions does not involve the solution of linear systems, and that effective derivative-free linesearch conditions can be defined [6,7,11,12,15].

The paper is organized as follows. Our algorithm is presented in Sect. 2, where we describe the new linesearch strategy and recall the main features of the spectral residual method Pand-SR. The convergence analysis is developed in Sect. 3 and numerical experiments are discussed in Sect. 4. Some conclusions and perspectives are drawn in Sect. 5.

Notations
The symbol ‖·‖ denotes the Euclidean norm and J denotes the Jacobian matrix of F. Given a sequence of vectors {x_k}, we occasionally denote F(x_k) by F_k.

The Srand2 algorithm
We present a spectral residual method that is a modification of the Projected Approximate Norm Descent algorithm with Spectral Residual step (Pand-SR) proposed in [16]. Pand-SR was developed for solving convexly constrained nonlinear systems; here it is applied in an unconstrained setting. A brief discussion on the constrained case is postponed to Sect. 5.
The new algorithm is denoted as Srand2 (Spectral Residual Approximate Norm Descent) and differs from Pand-SR in the definition of the linesearch conditions and in the choice of the spectral stepsize β_k.
Both Pand-SR and Srand2 employ a nonmonotone linesearch strategy based on the so-called approximate norm descent property [12]. This means that the generated sequence satisfies

‖F_{k+1}‖ ≤ (1 + η_k) ‖F_k‖  for all k,  (4)

where {η_k} is a positive sequence of scalars such that

Σ_{k=0}^∞ η_k ≤ η < ∞.  (5)

The idea behind such a condition is to allow a highly nonmonotone behaviour of ‖F_k‖ for (initial) large values of η_k, while promoting a decrease of ‖F‖ for small (final) values of η_k. A nonmonotone behaviour of the norm of F is crucial to avoid practical stagnation of methods based on spectral stepsizes (see e.g. [5,11,17]); at the same time, condition (4) ensures that the sequence {‖F_k‖} is bounded (see Theorem 1 in Sect. 3).
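To see why the summability requirement (5) keeps {‖F_k‖} bounded, note that chaining (4) over the iterations gives ‖F_k‖ ≤ Π_{j<k}(1 + η_j) ‖F_0‖ ≤ e^η ‖F_0‖. A minimal numerical check of this growth-factor bound, with an illustrative geometric choice of η_k (the particular sequence is our own, not prescribed by the paper), can be sketched as:

```python
import math

# With a summable sequence {eta_k}, the cumulative growth factor
# prod_k (1 + eta_k) stays below e**(sum_k eta_k), so condition (4)
# can only inflate ||F_k|| by a bounded factor e**eta over ||F_0||.
etas = [0.1 * 0.5 ** k for k in range(60)]  # illustrative: sum ≈ 0.2

growth = 1.0
for e in etas:
    growth *= 1.0 + e          # worst-case inflation allowed by (4)

bound = math.exp(sum(etas))    # the bound e**eta from (5)
```

Here `growth` is the largest factor by which ‖F_k‖ could have grown after 60 iterations, and it indeed stays below `bound`.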
In detail, given the current iterate x_k and the initial stepsize β_k, in [16] a new iterate of the form

x_{k+1} = x_k ± λ_k β_k F_k  (6)

is computed. The scalar λ_k ∈ (0, 1] is fixed by using a backtracking strategy; starting from λ_k = 1, it is progressively reduced by a factor σ ∈ (0, 1) (e.g. halved) until one of the following conditions is satisfied:

‖F(x_k ± λ_k β_k F_k)‖ ≤ (1 − α λ_k) ‖F_k‖,  (7)

or

‖F(x_k ± λ_k β_k F_k)‖ ≤ (1 + η_k − α λ_k) ‖F_k‖,  (8)

where α ∈ (0, 1). In Srand2, conditions (7) and (8) are respectively replaced by

‖F(x_k ± λ_k β_k F_k)‖ ≤ (1 − α λ_k^2) ‖F_k‖,  (9)

and

‖F(x_k ± λ_k β_k F_k)‖ ≤ (1 + η_k − α λ_k^2) ‖F_k‖.  (10)

All these conditions are derivative-free. If F is continuously differentiable then, as long as F_k^T J(x_k) F_k ≠ 0, either +β_k F_k or −β_k F_k is a descent direction at x_k for the merit function f in (2), and hence for ‖F‖; the first condition (9) (similarly (7)) thus promotes a sufficient decrease in ‖F‖ and is crucial for establishing results on the convergence of {‖F_k‖} to zero. On the other hand, the second condition (10) (similarly (8)) allows for an increase of ‖F‖ depending on the magnitude of η_k. Trivially, (9) implies (10) and both imply the approximate norm descent condition (4); the same holds for conditions (7) and (8).
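The backtracking scheme just described can be sketched as follows. This is an illustrative reading of the linesearch, not the authors' reference implementation: since (9) implies (10), the sketch accepts a trial point as soon as the relaxed condition (10) holds for either sign of the direction; all function and parameter names are ours.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))

def linesearch_srand2(F, x, beta, eta_k, alpha=1e-4, sigma=0.5, max_backtracks=50):
    """Backtracking linesearch sketch based on condition (10):
       ||F(x +/- lam*beta*F(x))|| <= (1 + eta_k - alpha*lam**2) * ||F(x)||.
    Both directions +beta*F(x) and -beta*F(x) are tried for each steplength."""
    Fx = F(x)
    nFx = norm(Fx)
    lam = 1.0
    for _ in range(max_backtracks):
        for sign in (1.0, -1.0):
            trial = [xi + sign * lam * beta * fi for xi, fi in zip(x, Fx)]
            # acceptance test (10); (9) implies it, so checking (10) suffices
            if norm(F(trial)) <= (1.0 + eta_k - alpha * lam ** 2) * nFx:
                return trial, lam
        lam *= sigma  # reduce the steplength and try both signs again
    raise RuntimeError("linesearch failed within max_backtracks")
```

For instance, on the trivial system F(x) = x the direction −β_k F_k succeeds immediately with λ = 1.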
We observe that the change in conditions (9) and (10) with respect to (7) and (8) derives only from the λ_k^2 term on the right-hand side of (9) and (10). This squared term is common to other linesearch strategies, e.g. those in [11,12]. This small change in the linesearch conditions has a great impact on the global convergence result of the overall algorithm, as shown in the forthcoming section.
As concerns the choice of the spectral coefficient β_k in (6), both Pand-SR and Srand2 use formulas closely related to the Barzilai-Borwein steplength employed in spectral gradient methods for optimization problems, see e.g. [2,5]. However, differently from the optimization case, in spectral residual methods β_k may be positive or negative since both directions ±F_k are attempted. Also, its absolute value is constrained to belong to a given interval [β_min, β_max] in order to get a bounded sequence of stepsizes. As an example, β_k can be chosen by computing

β_{k,1} = (s_{k−1}^T s_{k−1})/(s_{k−1}^T y_{k−1})  or  β_{k,2} = (s_{k−1}^T y_{k−1})/(y_{k−1}^T y_{k−1}),  (11)

with s_{k−1} = x_k − x_{k−1} and y_{k−1} = F_k − F_{k−1}, and then ensuring that β_{k,1} or β_{k,2} is such that |β_k| ∈ [β_min, β_max] by some thresholding rule. Alternative choices of β_k that suitably combine β_{k,1} and β_{k,2} can be found in [15], where a systematic analysis of the stepsize selection for spectral residual methods is addressed, also in combination with an approximate norm descent linesearch. In Algorithm 2.1 we formally describe Srand2 for a general β_k such that |β_k| ∈ [β_min, β_max].
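For a linear residual F(x) = Ax − b the quotient β_{k,1} = s^T s / (s^T A s) is the reciprocal of a Rayleigh quotient of A, so it approximates the inverse of an eigenvalue of the Jacobian. A minimal sketch of the two quotients (a helper of ours, with s and y as defined above; the thresholding step is omitted here):

```python
def spectral_coefficients(x_prev, x_curr, F_prev, F_curr):
    """Barzilai-Borwein-type quotients beta_{k,1} and beta_{k,2}:
       s = x_k - x_{k-1},  y = F_k - F_{k-1},
       beta_{k,1} = s^T s / s^T y,  beta_{k,2} = s^T y / y^T y."""
    s = [a - b for a, b in zip(x_curr, x_prev)]
    y = [a - b for a, b in zip(F_curr, F_prev)]
    sty = sum(si * yi for si, yi in zip(s, y))
    beta1 = sum(si * si for si in s) / sty
    beta2 = sty / sum(yi * yi for yi in y)
    return beta1, beta2
```

On F(x) = 2x (Jacobian 2I) both quotients return 1/2, the reciprocal of the only eigenvalue, as expected.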
We observe that the Repeat loop at Step 2 terminates in a finite number of steps: indeed, from the continuity of F and the positivity of η_k, there exists λ̄ > 0 such that inequality (10) holds for all λ ∈ (0, λ̄]; therefore (10) is satisfied for small enough values of λ_k.
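The overall scheme of this section — spectral step (6) tried with both signs, acceptance test (10), summable η_k, and a thresholded BB-type β_k — can be combined into a compact, self-contained sketch. All parameter values and the BB1-like update below are illustrative choices of ours, not the paper's Algorithm 2.1:

```python
import math

def srand2_sketch(F, x0, beta_min=1e-5, beta_max=1e5, alpha=1e-4, sigma=0.5,
                  eta0=0.1, tol=1e-8, max_iter=100, max_backtracks=60):
    """Illustrative Srand2-type driver: derivative-free and matrix-free."""
    norm = lambda v: math.sqrt(sum(t * t for t in v))
    x, Fx, beta = list(x0), F(x0), 1.0
    for k in range(max_iter):
        nFx = norm(Fx)
        if nFx <= tol:
            break
        eta_k = eta0 * 0.5 ** k               # summable sequence, cf. (5)
        lam, accepted = 1.0, None
        for _ in range(max_backtracks):
            for sign in (1.0, -1.0):          # try both directions +/- beta*F_k
                trial = [xi + sign * lam * beta * fi for xi, fi in zip(x, Fx)]
                if norm(F(trial)) <= (1.0 + eta_k - alpha * lam ** 2) * nFx:
                    accepted = trial
                    break
            if accepted is not None:
                break
            lam *= sigma                      # backtrack
        if accepted is None:
            break                             # linesearch failure (sketch only)
        x_new, F_new = accepted, F(accepted)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(F_new, Fx)]
        sty = sum(si * yi for si, yi in zip(s, y))
        if sty != 0.0:                        # BB1-like spectral coefficient,
            raw = sum(si * si for si in s) / sty  # thresholded into [beta_min, beta_max]
            beta = math.copysign(min(max(abs(raw), beta_min), beta_max), raw)
        x, Fx = x_new, F_new
    return x, norm(Fx)
```

On the strictly monotone toy system F(x) = x + x^3, for example, the sketch drives the residual norm below the tolerance in a handful of iterations.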

Global convergence analysis
We now provide the convergence analysis of the Srand2 algorithm. Theorems 1 and 2 analyze the behaviour of the sequences {λ_k} and {‖F_k‖}; they state general results which derive from the linesearch strategy and hold for Pand-SR as well. Their proofs follow the lines of [16, Theorem 4.2] and are therefore not reported in this work. Theorem 2, in particular, identifies situations where {‖F_k‖} may or may not converge to zero. Theorem 3 constitutes the main contribution of this work. It derives both from the linesearch strategy and from the use of spectral residual steps, but it does not rely on the specific choice of β_k.
Theorem 1 Let F : R^n → R^n be a continuous map, and let {x_k} and {λ_k} be the sequences of iterates and of linesearch stepsizes generated by the Srand2 algorithm. Then

‖F_k‖ ≤ e^η ‖F_0‖  for all k,  (12)

where η > 0 is given in (5). Moreover

lim_{k→∞} λ_k^2 ‖F_k‖ = 0.  (13)

Theorem 2 Let F : R^n → R^n be a continuous map, and let {x_k} and {λ_k} be the sequences of iterates and of linesearch stepsizes generated by the Srand2 algorithm. Then:
(i) If (9) is satisfied for infinitely many k, then lim_{k→∞} ‖F_k‖ = 0.
(ii) If ‖F_k‖ ≤ ‖F_{k+1}‖ for infinitely many iterations, then liminf_{k→∞} λ_k^2 = 0.
(iii) If ‖F_k‖ ≤ ‖F_{k+1}‖ for all k sufficiently large, then {‖F_k‖} does not converge to 0.
We now provide the main convergence result, namely that at every limit point x* of the sequence {x_k} generated by the Srand2 algorithm, either F(x*) = 0 or the gradient of the merit function f in (2) is orthogonal to the residual F(x*).
Theorem 3 Let F be continuously differentiable. Let {x_k} be the sequence generated by the Srand2 algorithm and let x* be a limit point of {x_k}. Then either F(x*) = 0 or

∇f(x*)^T F(x*) = F(x*)^T J(x*)^T F(x*) = 0.  (14)

Proof Let K be an infinite subset of indices such that lim_{k∈K} x_k = x*. By Theorem 1 we know that lim_{k∈K} λ_k^2 ‖F_k‖ = 0. Hence there are two possibilities: either liminf_{k∈K} ‖F_k‖ = 0 or liminf_{k∈K} λ_k^2 = 0.

The first one implies lim_{k∈K} ‖F_k‖ = 0; then, using the continuity of F, it follows easily that F(x*) = 0.

In the second case we have liminf_{k∈K} λ_k^2 = liminf_{k∈K} λ_k = 0. Let λ̄_k = λ_k/σ denote the last attempted value for the linesearch parameter before λ_k is accepted during the backtracking phase. Since λ̄_k satisfies neither (9) nor (10) for either sign, for sufficiently large values of k ∈ K we have

‖F(x_k ± λ̄_k β_k F_k)‖ > (1 + η_k − α λ̄_k^2) ‖F_k‖.  (15)

Being η_k > 0, and by virtue of (12), there is a positive constant c_1 such that

‖F(x_k ± λ̄_k β_k F_k)‖ − ‖F_k‖ > −α λ̄_k^2 ‖F_k‖ ≥ −c_1 λ̄_k^2,  (16)

and we may multiply both sides of (16) by the positive quantity ‖F(x_k ± λ̄_k β_k F_k)‖ + ‖F_k‖ in order to pass to squared norms. Now we observe that x_k ± λ̄_k β_k F_k is bounded for all k ∈ K; indeed, λ_k ∈ (0, 1] so that λ̄_k = λ_k/σ ≤ 1/σ, |β_k| ≤ β_max, the subsequence {x_k}_{k∈K} is convergent to x* and hence bounded, and ‖F_k‖ is bounded by Theorem 1. Then, recalling the definition λ̄_k = λ_k/σ and the continuity of F,

‖F(x_k ± λ̄_k β_k F_k)‖ + ‖F_k‖ ≤ c_2,  (17)

for some positive constant c_2. Consequently, from (16) and (17), there exists a constant c > 0 such that

‖F(x_k ± λ̄_k β_k F_k)‖^2 − ‖F_k‖^2 > −c λ̄_k^2,  (18)

for sufficiently large values of k ∈ K. Now, we suppose that β_k > 0 for infinitely many indices k ∈ K_1 ⊆ K, and we consider the two steps −λ̄_k β_k F_k and +λ̄_k β_k F_k separately.
- Firstly, we consider the step −λ̄_k β_k F_k. By virtue of the Mean Value Theorem and (18), there exists ξ_k ∈ [0, 1] such that

‖F(x_k − λ̄_k β_k F_k)‖^2 − ‖F_k‖^2 = −2 λ̄_k β_k F(w_k)^T J(w_k) F_k,  with w_k = x_k − ξ_k λ̄_k β_k F_k,

for sufficiently large k ∈ K. Hence, for all large k ∈ K_1 we have that

2 β_k F(w_k)^T J(w_k) F_k < c λ̄_k.  (19)

- Now we consider the step +λ̄_k β_k F_k. Similarly, there exists ξ'_k ∈ [0, 1] such that, for all large k ∈ K_1,

2 β_k F(v_k)^T J(v_k) F_k > −c λ̄_k,  with v_k = x_k + ξ'_k λ̄_k β_k F_k.  (20)

Since liminf_{k∈K} λ_k = 0 and β_k ∈ [β_min, β_max], taking limits in (19) and (20) along a suitable subsequence of K_1 we get

F(x*)^T J(x*)^T F(x*) = 0,

that is, (14) holds. We proceed in a quite similar way if β_k < 0 for infinitely many indices.

Corollary 1
The orthogonality condition (14) implies F(x*) = 0 in particular in the following case: (a) J(x*) is positive (negative) definite.

Case (a) in Corollary 1 includes the class of nonlinear monotone systems of equations of the form (1) with F continuously differentiable and strictly monotone, that is, (F(x) − F(y))^T (x − y) > 0 for any x, y ∈ R^n with x ≠ y [4]. Nonlinear monotone systems of equations arise in several applications and tailored spectral-type methods have been recently proposed, see e.g. [18].
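The mechanism behind case (a) can be checked directly on a toy strictly monotone system. For F(x) = x + x^3 componentwise (an illustrative choice of ours), J(x) = diag(1 + 3 x_i^2) is positive definite everywhere, so the quadratic form appearing in (14) vanishes only where F itself vanishes:

```python
def residual_quadratic_form(x):
    """Evaluate F(x)^T J(x) F(x) for F(x) = x + x^3 componentwise,
    whose Jacobian J(x) = diag(1 + 3*x_i^2) is positive definite;
    the form is therefore zero exactly when F(x) = 0, i.e. x = 0."""
    F = [xi + xi ** 3 for xi in x]
    return sum(fi * (1.0 + 3.0 * xi ** 2) * fi for fi, xi in zip(F, x))
```

At any nonzero point the form is strictly positive, so the orthogonality condition (14) can hold only at a root of F, which is the content of case (a).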

Remark 1
A general result like Theorem 3 was not proved for Pand-SR, although the latter is known to be convergent. Moreover, if x* is the limit point and x_0 the starting guess, the bound

‖F(x*)‖ ≤ e^η ‖F(x_0)‖

was provided in [16]. However, it cannot be proved in general that F(x*) = 0. Such a result was obtained in [16] by basing the choice of β_k on (11), assuming the Jacobian J to be Lipschitz continuous, and focusing on specific classes of problems. For example, [16, Theorem 5.2] considers the case of J(x*) with positive (negative) definite symmetric part and suitably bounded condition number. In [16, Theorem 5.2] instead, J(x*) is assumed to be strongly diagonally dominant, with diagonal entries of constant sign. We show in the forthcoming section that the stronger convergence properties of Srand2 correspond in practice to an algorithm potentially more robust than Pand-SR. Of course, we cannot expect a strong difference in the performance of the two methods, given the small change between the two. Nevertheless, the new linesearch is able to recover a few cases, encountered with the previous one, where ‖F_k‖ does not converge to zero.

Numerical illustration
We compare the performance of the Srand2 and Pand-SR algorithms on two problem sets. The first set (named set-Luksan) contains 17 nonlinear systems from the Lukšan test collection described in [13] that are commonly used as benchmarks for optimization algorithms. The second set (named set-contact) consists of nonlinear systems arising in the solution of rail-wheel contact models via the classical CONTACT algorithm [8]. These tests were described in detail and used in [15, Section 5.2]. We selected here the 153 problems generated with train speed of magnitude v = 16 m/s, yielding systems whose dimensions vary from n = 156 to n = 1394.
Failure was declared when either the assigned maximum number of iterations, F-evaluations or backtracks was reached, or ‖F‖ was not reduced for 500 consecutive iterations. Such occurrences are denoted below as F_it, F_fe, F_bt, F_in, respectively. Regarding the choice of β_k, we used three classical rules based on β_{k,1}, β_{k,2} and their alternation, respectively named BB1, BB2 and ALT in what follows. Given a scalar β, let T(β) be the projection of |β| onto I_β := [β_min, β_max], that is

T(β) = min{ max{β_min, |β|}, β_max }.

We recall below the definition of BB1, BB2 and ALT as given in [15].
BB1 rule. Following [7,9,10,16], at each iteration set β_k such that |β_k| = T(β_{k,1}), with the sign of β_{k,1}.
BB2 rule. At each iteration set β_k such that |β_k| = T(β_{k,2}), with the sign of β_{k,2}.
ALT rule. Following [1,7], at each iteration alternate between β_{k,1} and β_{k,2}, applying the thresholding T and the corresponding sign.

We also experimented with Pand-SR and Srand2 using more elaborate, adaptive rules for β_k, see e.g. [2,15], but the qualitative behaviour of the two methods did not change; therefore we do not report the corresponding results. Problems in set-Luksan were solved setting n = 500 and starting from the initial guess x_0 suggested in [13]. Problem lu5 requires an odd value of n and therefore we set n = 501. For 16 out of 17 problems, Pand-SR and Srand2 give the same results: Table 1 reports the number of F-evaluations varying the updating rule for β_k. More interesting is the case of Problem lu16, reported in Table 2. Though performing a large number of F-evaluations, Srand2 is able to successfully solve Problem lu16 using BB2 and ALT, whereas Pand-SR returns a failure with all the attempted β_k rules.
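The three rules can be sketched as follows; the even/odd alternation used for ALT and the sign-preserving convention are our assumptions for illustration, not necessarily the exact rule of [15]:

```python
import math

def T(beta, beta_min=1e-10, beta_max=1e10):
    """Projection of |beta| onto I_beta = [beta_min, beta_max]."""
    return min(max(abs(beta), beta_min), beta_max)

def choose_beta(k, beta1, beta2, rule="ALT"):
    """BB1/BB2/ALT selection sketch: pick the raw quotient according to
    the rule, then threshold its magnitude while keeping its sign."""
    if rule == "BB1":
        raw = beta1
    elif rule == "BB2":
        raw = beta2
    else:  # ALT: alternate between the two quotients along the iterations
        raw = beta1 if k % 2 == 0 else beta2
    return math.copysign(T(raw), raw)
```

Note that the thresholding acts on |β| only, so a negative quotient yields a negative β_k with magnitude in [β_min, β_max], consistent with the requirement |β_k| ∈ [β_min, β_max].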
In Fig. 1 we give an insight into the convergence behaviour of both methods with BB2 on Problem lu16. We display ‖F_k‖ versus the iterations and the number of F-evaluations (top part), the number of backtracks performed by both algorithms (central part), and the values of ‖F_k‖ and λ_k versus the iterations for both algorithms (bottom part). All plots are obtained by disabling the stopping criterion on the number of consecutive increases of ‖F‖. In this setting Pand-SR fails, since the maximum number of backtracks is reached after 3278 iterations and 56883 F-evaluations, while Srand2 converges after 8456 iterations and 45624 F-evaluations. We observe that the sequence {‖F_k‖} generated by Pand-SR does not satisfy the stopping criterion (22), and the increasing number of backtracks along the iterations corresponds to the fact that {λ_k} is going to zero. On the contrary, the sequence {‖F_k‖} generated by Srand2 converges to zero and λ_k does not decrease with the iterations. Both situations are in accordance with the theory: at least one of the sequences {‖F_k‖} and {λ_k} converges to zero, but the linesearch adopted in Srand2 more likely generates a sequence {‖F_k‖} that goes to zero.

This behaviour is also confirmed by the experiments performed on the set-contact problems. Results obtained for these problems are summarized in the F-evaluation performance profiles [3] of Fig. 2, where Pand-SR and Srand2, combined with rules BB2 (top plot) and ALT (bottom plot), are compared. Results with BB1 are not reported since the two algorithms give exactly the same number of F-evaluations. The plots clearly show that the two algorithms perform similarly and that Srand2 is slightly more robust. In detail, Pand-SR and Srand2 with BB2 solve 132 and 135 problems, respectively. Also in combination with the ALT rule, Srand2 solves 3 problems more than Pand-SR.
In the 6 cases recovered by Srand2, the behaviour of the two methods was similar to what was observed with Problem lu16. As an illustration, the graphs reported in Fig. 3 refer to one of the cases where the BB2 rule was in use. Observations analogous to those for Fig. 1 can be drawn regarding the convergence to zero of the sequences {λ_k} and {‖F_k‖}.

Conclusions and outlook
In this work we have shown how to modify the algorithm proposed in [16] in order to establish mild general conditions that guarantee the convergence of the sequence {‖F_k‖} to zero, and we have illustrated the corresponding practical benefits in terms of robustness.
The Pand-SR algorithm in [16] was developed for solving constrained nonlinear systems of the form

F(x) = 0,  x ∈ Ω,  (28)

where Ω ⊂ R^n is a convex set whose relative interior is non-empty. Srand2 can also be adapted to the solution of constrained problems of the form (28) by relying on a suitable projection operator onto the feasible set Ω, as follows. Proceeding as in [16], feasible iterates {x_k} can be defined by starting from a feasible x_0 and by setting, for k ≥ 0,

x_{k+1} = P(x_k ± λ_k β_k F_k),

where P denotes a projection operator onto the considered domain. As an example, if Ω is an n-dimensional box {x ∈ R^n s.t. l ≤ x ≤ u}, where l ∈ (R ∪ {−∞})^n, u ∈ (R ∪ {+∞})^n, and the inequalities are meant component-wise, a projection map is given by P(x) = max{l, min{x, u}}. Such a modification of the Srand2 algorithm trivially enjoys the theoretical properties presented in Theorems 1 and 2. Remarkably, the new global convergence result of Theorem 3 can also be easily extended to problem (28) for limit points lying in the interior of Ω. Convergence to solutions on the boundary of Ω is currently under investigation.
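For the box-constrained case, the projection P and one projected spectral residual step can be sketched as follows (helper names are ours; ±inf encodes unbounded sides of the box):

```python
def project_box(x, l, u):
    """Componentwise projection P(x) = max(l, min(x, u)) onto the box
    {l <= x <= u}; use float('inf') / -float('inf') for unbounded sides."""
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, l, u)]

def projected_step(x, Fx, lam, beta, l, u, sign=1.0):
    """One feasible iterate x_{k+1} = P(x_k + sign*lam*beta*F_k) (sketch)."""
    trial = [xi + sign * lam * beta * fi for xi, fi in zip(x, Fx)]
    return project_box(trial, l, u)
```

Starting from a feasible x_0, every iterate produced this way remains in the box, which is what allows the approximate norm descent analysis to carry over.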
Funding Open access funding provided by Alma Mater Studiorum -Università di Bologna within the CRUI-CARE Agreement.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.