An inexact derivative-free quasi-Newton method for large-scale nonlinear systems of equations

In this work, we propose a derivative-free quasi-Newton method for solving large-scale nonlinear systems of equations. We introduce a two-stage line search procedure and develop a global convergence theory for the method. In addition, we prove that the method enjoys a superlinear convergence rate. Finally, numerical experiments show that the proposed method is competitive with Newton-Krylov methods and other well-known methods for solving large-scale nonlinear systems of equations.


Introduction
Let F : R^n → R^n be a nonlinear continuously differentiable function. We are interested in solving the problem

F(x) = 0,   (1)

especially when n is considerably large. The most popular method for solving (1) is Newton's method [1,2]. This method has very good convergence properties, but its greatest difficulty is having to compute the Jacobian matrix F' of F and evaluate it at each iteration, which is computationally very expensive. One strategy to avoid this computation and these evaluations is to use an approximation to F'(x), which leads to a variety of methods known as quasi-Newton methods [3][4][5][6][7][8].

(C.A. Arias and C. Gómez contributed equally to this work. Corresponding author: C. A. Arias, carlosarias@unicauca.edu.co; C. Gómez, chcgomezmo@gmail.com. 1: Departamento de Matemáticas, Universidad del Cauca, Popayán, Colombia. 2: Facultad de Ciencias Básicas e Ingeniería, Universidad de los Llanos, Villavicencio, Colombia.)
However, both Newton and quasi-Newton methods need to solve a linear system of equations at each iteration; thus, if n is large, even quasi-Newton methods are expensive. It is important to say that although many quasi-Newton methods admit a cheap formula for updating the inverse of B_k, these formulas generally depend on a term whose denominator can be zero or very small, which in practice can lead to an ill-conditioned matrix. To counter this, iterative Krylov methods [9][10][11] were introduced into Newtonian algorithms, which helped decrease their computational cost. This strategy consists of finding the search direction only approximately, so the linear system is solved with some tolerable error.
For Newton-Krylov methods, fast convergence properties have been proved, whereas for quasi-Newton-Krylov methods fast convergence has been proved only when the Krylov method is the conjugate gradient method [12][13][14][15] or when a Jacobian restart strategy is employed [16]. The greatest difficulty of inexact quasi-Newton methods is that, so far, it has not been possible to prove that inexact quasi-Newton directions are descent directions for the merit function associated with (1).
Several studies are concerned with solving (1). In [17,18], the authors proposed a different approach: a nonmonotone derivative-free spectral method. This is an ingenious proposal, based on the spectral gradient method, that systematically uses ±F(x_k) as the search direction. Due to the simplicity of this method, its computational cost is very low, and although global convergence has been proved, a linear or superlinear convergence rate has not.
In this paper, we establish a globally convergent inexact quasi-Newton method of low computational cost for solving (1). We propose a two-stage line search procedure for obtaining a descent direction, and we ensure global convergence without falling into infinite cycles. Also, under reasonable conditions, we prove fast convergence properties for the method. This paper is organized as follows. In Section 2 we introduce the new algorithm and make some remarks. In Section 3 we develop the convergence theory of the algorithm, proving global convergence of the method and its linear and superlinear convergence rates. In Section 4 we present numerical experiments that show the robustness and competitiveness of the new algorithm; in particular, we compare its performance with the algorithms proposed in [17,18]. Finally, we make some closing remarks in Section 5.
Taking into account that one of our interests is to propose a globally convergent algorithm to solve (1), we consider the following minimization problem:

min_{x ∈ R^n} f(x) := (1/2) ||F(x)||^2,   (2)

where f is the merit function associated with (1) and ||·|| is the Euclidean norm in R^n.
Below we show the IFDQ algorithm, whose main innovation is the two-stage line search procedure.

Algorithm 1 IFDQ.
Step 1: Find an inexact quasi-Newton direction d_k satisfying ||B_k d_k + F(x_k)|| ≤ θ_k ||F(x_k)||.   (3)
Step 2: Two-stage line search procedure:
1. Test the trial points x_k − α_k d_k and x_k + α_k d_k. If one of them yields a sufficient decrease of the merit function, set x_{k+1} to that trial point and go to Step 3;
2. else, set α_k = βα_k. If α_k ≥ λ, go to line 1 of the two-stage line search and repeat the procedure; else, break.
Step 3: Update B_k such that B_{k+1} is a nonsingular matrix.

Remark 1
So far it has not been possible to prove, in general, that the inexact quasi-Newton direction obtained in Step 1 is a descent direction for the merit function f given in (2). For this reason, it is necessary to try both d_k and −d_k in the line search procedure.

Remark 2 In Step 2 we set two trial directions, d_k and −d_k. Note that unless ∇f(x_k)^T d_k = 0, at least one of the trial directions is a descent direction; hence, the two-stage line search procedure will not fall into an infinite cycle.

Remark 3
In the line search procedure we seek a sufficient decrease of the merit function f, but if the step length α_k becomes too small the algorithm breaks down, thereby avoiding small steps with a poor decrease. Note that the lower bound λ for the step length is set by the user and can be as small as desired. To prevent both frequent breakdowns of the algorithm and small steps with a poor decrease, we recommend taking λ = 10^{-4}. With this value, the algorithm showed good performance.

Convergence theory
In this section, we present the main theoretical results obtained for the IFDQ algorithm. In the following lemmas and theorems we show that, under reasonable assumptions, the algorithm converges to a solution of problem (1) and locally enjoys good convergence properties: the full inexact quasi-Newton step is accepted and the method can even attain superlinear convergence.
The hypotheses under which we develop the convergence theory of the proposed algorithm, H1 through H4, are classical hypotheses for quasi-Newton methods. Hypotheses H1 and H3 depend on the problem to be solved, whereas H2 and H4 depend on the approximations to F'(x_k).
An immediate consequence of H3 is a Lipschitz-type bound that holds for all x, y ∈ R^n. It is important to mention that (4) in H4 is known as the Dennis-Moré condition. The first lemma of our theoretical development ensures that if the IFDQ algorithm does not break down, then it generates a sequence whose images under F converge to zero.
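For reference, the classical statement of the Dennis-Moré condition (written here in its standard textbook form, not reproduced from the source) is:

```latex
\lim_{k \to \infty}
\frac{\bigl\| \bigl(B_k - F'(x^*)\bigr)(x_{k+1} - x_k) \bigr\|}
     {\| x_{k+1} - x_k \|} = 0 .
```

Intuitively, it requires B_k to agree with the true Jacobian F'(x*) only along the step directions, which is strictly weaker than B_k → F'(x*).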

Lemma 1 If {x_k} is a sequence generated by the IFDQ algorithm, then lim_{k→∞} F(x_k) = 0.

Proof Let {x_k} be a sequence generated by the IFDQ algorithm. From the acceptance test in Step 2, the merit function decreases by a fixed factor at each iteration; recalling that 1 − λ^2 < 1, the result is established.
The next result is immediate since F is a continuous function.

Corollary 1
If {x_k} is a sequence generated by the IFDQ algorithm and x* is a cluster point of {x_k}, then x* is a solution of (1).
The following theorem ensures convergence of the sequence generated by the IFDQ algorithm under the assumption that the Jacobian matrix F' is nonsingular at the solution of the problem.
Theorem 1 Assume H2. Let {x_k} be a sequence generated by the IFDQ algorithm. If F'(x*) is nonsingular at a cluster point x* of {x_k}, then x_k → x*.

Proof Recalling that F is a continuous function and using Lemma 1, we have that F(x*) = 0. On the other hand, let K = ||F'(x*)^{-1}|| and let δ > 0 be small enough; the existence of such a δ can be guaranteed thanks to Lemmas 1.1 and 1.2 in [15]. Observe that if y ∈ B(x*, δ) then the corresponding local bounds hold, and by combining the last two inequalities we can infer (6). On the other hand, let ε ∈ (0, δ/4). Since x* is a cluster point of {x_k} and F(x*) = 0, there exists k large enough such that x_k ∈ S_ε, where M = max{K, T} and T is the constant of hypothesis H2. Hence, we conclude that x_{k+1} ∈ B(x*, δ). Moreover, since IFDQ attempts a monotone decrease and x_k ∈ S_ε, from (6) and the last inequality we infer (8) and (9); hence x_{k+1} ∈ S_ε, with which we prove that x_k ∈ S_ε for all k large enough, and since ε is arbitrary, x_k → x*.

In the next lemma, we show that the trial directions in Step 1 remain bounded for all k.
Lemma 2 Assume H2. Let {x_k} be a sequence generated by the IFDQ algorithm; then the sequence of trial directions {d_k} is bounded and, in fact, d_k → 0.

Proof The first part of this lemma follows from (7) and the fact that θ_k ∈ (0, 1). On the other hand, by Lemma 1 and the last inequality, we have the desired result.
The next theorem ensures convergence of the sequence generated by the IFDQ algorithm without the nonsingularity condition on F'(x*).

Theorem 2 Let {x_k} be a sequence generated by the algorithm. If x* is an isolated cluster point of the sequence, then lim_{k→∞} x_k = x*.

Proof Taking into account Lemmas 1 and 2, this proof follows the same ideas as the proof of Theorem 3 in [18].

The next theorem ensures that, at least locally, the IFDQ algorithm shows good performance in the sense that the full quasi-Newton step is accepted in the two-stage line search procedure. In the proof we assume a hypothesis weaker than the Dennis-Moré condition. This hypothesis is related to the bounded deterioration property and ensures that, at least for all k large enough, the distance from the approximation B_k to the Jacobian matrix F'(x_k) is bounded above by a constant. As in the proof of Theorem 1, let M = max{K, T}, where T is the constant in H2.

Theorem 3 Assume hypotheses H1, H2 and H3. Let {x_k} be a sequence generated by the IFDQ algorithm and let x* be a cluster point of the sequence. If F'(x*) is nonsingular and, for all k large enough, θ_k < 1/3 − λ and ||F'(x_k) − B_k|| < 1/(24M^3), then d_k and α_k = 1 will be accepted in the two-stage line search procedure.

Proof From Lemma 2, and taking into account that the above bounds hold for all k large enough and that F(x_k) converges to zero, we obtain, for all k large enough, the estimates (10) and (11). Hence, by (10), (11) and since θ_k < 1/3 − λ, for all k large enough the acceptance test is satisfied; thus α_k = 1 and d_k will be accepted.
The following theorem is the first in which we establish a convergence rate for the IFDQ algorithm. For this purpose, we additionally bound the inexactness parameters θ_k.

Theorem 4 Assume the hypotheses of the previous theorem and, in addition, a suitable bound on θ_k. Then {x_k} converges to x* linearly.

Proof By Theorem 3, the full quasi-Newton step is accepted for all k large enough, and by Theorem 1, x_k → x*; so for all k large enough F'(x_k)^{-1} exists and ||F'(x_k)^{-1}|| ≤ 2M. Hence, by (5) and Step 1 of the algorithm, by Lemma 2, and by the mean value theorem, we obtain, for all k large enough, that ||x_{k+1} − x*|| ≤ R ||x_k − x*||, where 0 < R < 1, which completes the proof.
Finally, to complete our theoretical development, the next theorem ensures, under reasonable assumptions, superlinear convergence of the IFDQ algorithm.

Theorem 5 Assume H1, H2, H3 and H4. Let {x_k} be a sequence generated by the IFDQ algorithm and let x* be a cluster point of the sequence. If F'(x*) is nonsingular and θ_k → 0, then x_k → x* superlinearly.

Proof From (12), together with Lemma 2 and the mean value theorem, the desired result follows from hypothesis H4 and the fact that x_k → x* and θ_k → 0 as k → ∞.

Numerical experiments
In this section we report numerical results for the IFDQ algorithm on twenty problems. Sixteen of the problems were taken from [17] and references therein; the rest were taken from [19][20][21]. It is important to say that we did not include problems 13, 14, 15 and 18 of [17]. First, problems 13 and 14 involve many random parameters that hinder the reproducibility of the experiments. Second, the poor performance of the algorithms on problems 15 and 18 did not allow us to draw relevant conclusions.
The experiments were carried out in Matlab on an Intel Core 2 laptop with 4 GB of RAM. To evaluate the performance of the IFDQ algorithm, we ran experiments and compared the results with four other algorithms: SANE [17], DF-SANE [18], Ac-DFSANE [22] and NITSOL [23].
SANE and DF-SANE are derivative-free spectral algorithms. The trial descent direction at each iteration of these methods is ±F(x_k), and the main difference between them is the line search.
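To make the contrast with IFDQ concrete, here is a rough sketch of the spectral residual idea behind these methods. The safeguard below is our simplification and stands in for the actual (nonmonotone) line searches of SANE/DF-SANE, which are more elaborate; all names and details are assumptions, not code from [17,18].

```python
import numpy as np

def dfsane_sketch(F, x0, max_iter=300, tol=1e-6):
    """Spectral residual sketch: trial points x -/+ sigma_k F(x_k), no linear
    system solved, with the spectral (Barzilai-Borwein) coefficient
    sigma_{k+1} = s^T s / s^T y. Illustrative only."""
    x, sigma = np.asarray(x0, dtype=float), 1.0
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        # Two trial directions, +/- F(x_k); pick whichever reduces ||F|| more.
        trials = [x - sigma * Fx, x + sigma * Fx]
        x_new = min(trials, key=lambda z: np.linalg.norm(F(z)))
        F_new = F(x_new)
        if np.linalg.norm(F_new) >= np.linalg.norm(Fx):
            sigma *= 0.5            # crude safeguard instead of a line search
            continue
        s, y = x_new - x, F_new - Fx
        sigma = (s @ s) / (s @ y)   # spectral coefficient for the next step
        x, Fx = x_new, F_new
    return x
```

Since no linear system is solved and no matrix is stored, each iteration costs only a few function evaluations, which is why these methods are so cheap per iteration.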
Ac-DFSANE is a recently proposed accelerated version of the DF-SANE algorithm. It chooses the new iterate in a very ingenious way, often improving the decrease achieved in the line search.
Finally, NITSOL is a practical and efficient implementation of the classical inexact Newton method that uses a GMRES procedure to find the search direction. This algorithm approximates derivatives by finite differences when they are not available.
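The finite-difference idea used by matrix-free Newton-Krylov codes can be sketched as follows. This is a generic one-sided approximation, not NITSOL's exact stepsize heuristic; the value of `eps` and the scaling of `h` are our assumptions.

```python
import numpy as np

def jac_vec_fd(F, x, v, eps=1e-7):
    """Finite-difference Jacobian-vector product:
    F'(x) v ~ (F(x + h v) - F(x)) / h, with h scaled by ||x|| and ||v||.
    This lets GMRES run on F'(x) without ever forming the Jacobian."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(np.asarray(x, dtype=float))
    h = eps * max(1.0, np.linalg.norm(x)) / nv
    return (F(x + h * v) - F(x)) / h
```

GMRES only ever needs products F'(x) v, so one extra F-evaluation per Krylov iteration replaces the whole Jacobian computation.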
For all algorithms we used ||F(x_k)|| < 10^{-6} and k < 300 as stopping criteria. For the IFDQ algorithm we took B_0 = I_n as the initial approximation to F'(x_0), θ_k = 1/(k+2) as the inexactness parameter in Step 1, and λ = 10^{-4} and β = 0.5 for the two-stage line search procedure in Step 2. To find d_k in Step 1, we used the GMRES procedure.
Likewise, in Step 3 we used the "good" Broyden update [24]. It is important to say that this update is a least-change secant update and therefore satisfies the well-known bounded deterioration property [2,25], with which Dennis and Schnabel [2] showed that the sequence of matrices {B_k} satisfies the Dennis-Moré condition (4), i.e., the sequence {B_k} satisfies hypothesis H4.
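For concreteness, the "good" Broyden update has the standard rank-one form below. This snippet is illustrative (it is not the authors' Matlab code); the formula itself is the classical one from the literature.

```python
import numpy as np

def broyden_good_update(B, s, y):
    """'Good' Broyden least-change secant update:
    B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k),
    where s_k = x_{k+1} - x_k and y_k = F(x_{k+1}) - F(x_k).
    By construction the secant equation B_{k+1} s_k = y_k holds."""
    return B + np.outer(y - B @ s, s) / (s @ s)

# Quick check of the secant equation B_{k+1} s = y:
B = np.eye(3)
s = np.array([1.0, 2.0, 0.5])
y = np.array([0.5, -1.0, 2.0])
B1 = broyden_good_update(B, s, y)
assert np.allclose(B1 @ s, y)
```

Being a rank-one correction, the update costs only O(n^2) per iteration, and the denominator s^T s (unlike the denominators of some inverse-update formulas mentioned in the introduction) is zero only when the step itself is zero.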
On the other hand, SANE, DF-SANE, Ac-DFSANE and NITSOL algorithms were carried out with the same parameters as in respective references given above.
In Table 1, we show the complete list of problems with which we ran our experiments. Starting points were the same as in the respective references.

Table 1 Test problems:
2. Exponential function 2 [17]
3. Exponential function 3 [17]
4. Diagonal function premultiplied by a quasi-orthogonal matrix [17,26]
5. Extended Rosenbrock function [17,26]
6. Chandrasekhar's H-equation [17]
7. Badly scaled augmented Powell's function [17,26]
8. Trigonometric function [17]
9. Singular function [17]
10. Logarithmic function [17]
11. Broyden tridiagonal function [17,27]
12. Trigexp function [17,27]
13. Strictly function 1 [17,28]
14. Strictly function 2 [17,28]
15. Zero Jacobian function [17]
16. Geometric programming function [17]
17. Extended Wood function [19]
18. Brent function [19]
19. Yamamura function [20]
20. Zhou function [21], Problem 2

In Tables 2 and 3 we report the results obtained, using the following conventions: F: function from Table 1; Method: algorithm used to solve the problem; n: size of the problem; k: number of iterations required by the algorithm to solve the problem.

The results in Tables 2 and 3 show good performance of the IFDQ algorithm. First, the IFDQ algorithm required the same number of iterations as its counterparts, or fewer, in 17.5% of the problems. Second, our algorithm converged in thirty-eight of the runs, that is, in 95% of the experiments carried out. It is important to mention that, in general, when SANE, DF-SANE, Ac-DFSANE and NITSOL converged, they were faster than IFDQ in terms of CPU time, but in most cases the difference was only a few seconds. This behavior is due to the fact that SANE, DF-SANE and Ac-DFSANE use ±F(x_k) as trial directions, so they neither solve a linear system of equations like (3) nor perform matrix-vector products such as those made by IFDQ to update B_k. On the other hand, NITSOL performs a single line search, so it generally requires fewer function evaluations than IFDQ. Likewise, NITSOL approximates derivatives by finite differences when these are not available; we think this could be affecting the convergence of the method, since it was the one with the lowest success rate. For the above reasons, IFDQ appears to be a competitive algorithm.
In the second and third columns of Table 4 we show the percentage of experiments in which each algorithm won in terms of CPU time and number of iterations, respectively. In the last column of this table we show each algorithm's success rate over the experiments. The results show that the IFDQ algorithm is a well-balanced method, since it was the most successful without requiring a large number of iterations or much CPU time compared with the other algorithms. To test the global convergence of the IFDQ algorithm, we experimented with some of the problems by randomly varying the starting point. In all cases, we ran the algorithm with 500 starting points whose components were uniformly distributed in the interval [−100, 100]. In Table 5 we show, for each experiment, the problem, its size and the success rate of the method. The success rate of the algorithm on the selected problems shows a robust method; thus, it would be a good option for solving large-scale nonlinear systems of equations.
To finish our experiments, we want to show the inner behavior of the IFDQ algorithm when solving the problem given by the Extended Rosenbrock function.
In Table 6, we show the behavior of the most important parameters of the algorithm when solving the above-mentioned problem; this helps us analyze the convergence rate of the algorithm. As we can see in Table 6, Rel Res converges to zero as θ_k converges to zero, which suggests superlinear convergence of the algorithm, as we proved in Theorem 5. On the other hand, α_k = 1 for k > 18, just as we proved in Theorem 3.

Final remarks
In this work we proposed a new derivative-free method for solving, especially, large-scale nonlinear systems of equations. This method takes inexact quasi-Newton directions to build the new iterate and, since so far it has not been possible to prove that such a direction is a descent direction, we proposed a two-stage line search procedure in order to establish global convergence. We showed that the new method enjoys good convergence properties: IFDQ performs very well locally and, under reasonable hypotheses, attains superlinear convergence.
Numerical experiments showed that IFDQ performed as expected and that it is a competitive method for solving large-scale nonlinear systems of equations.

Declarations
Ethics approval The authors declare that they followed all the rules of a good scientific practice.
Consent to participate All authors approve their participation in this work.

Consent for publication
The authors approve the publication of this research.

Competing interests
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.