A Dai–Liao conjugate gradient method via modified secant equation for system of nonlinear equations

In this paper, we propose a Dai–Liao (DL) conjugate gradient method for solving large-scale systems of nonlinear equations. The method incorporates, in the DL approach, an extended secant equation developed from the modified secant equations proposed by Zhang et al. (J Optim Theory Appl 102(1):147–157, 1999) and Wei et al. (Appl Math Comput 175(2):1156–1188, 2006). It is shown that the proposed scheme satisfies the sufficient descent condition. The global convergence of the method is established under mild conditions, and computational experiments on benchmark test problems show that the method is efficient and robust.


Introduction
A typical system of nonlinear equations has the general form

F(x) = 0, (1)

where F : R^n → R^n is a nonlinear mapping assumed to be continuously differentiable. Systems of nonlinear equations play an important role in science and engineering; therefore, solving (1) has become a subject of interest to researchers in those areas. Numerous algorithms have been developed for solving such systems. Notable among them are the Newton and quasi-Newton schemes [14,22,34,52], which converge rapidly from a sufficiently good starting point. However, the requirement to compute and store the Jacobian matrix, or an approximation of it, at every iteration makes these two methods unattractive for large-scale nonlinear systems [51]. A well-suited method for solving large-scale systems is the conjugate gradient (CG) method, which forms an important class of algorithms for large-scale unconstrained optimization. The method is popular with mathematicians and engineers working on large-scale problems because of its low memory requirement and strong global convergence properties [19]. Generally, the nonlinear conjugate gradient method is used to solve large-scale problems of the form

min f(x), x ∈ R^n, (2)

where f : R^n → R is a continuously differentiable function that is bounded from below and whose gradient is available. The method generates a sequence of iterates x_k from an initial point x_0 ∈ R^n using the iterative formula

x_{k+1} = x_k + s_k, s_k = α_k d_k, k = 0, 1, . . . , (3)

where x_k is the current iterate, α_k > 0 is a step length computed using a suitable line search technique, and d_k is the CG search direction defined by

d_0 = −F_0, d_k = −F_k + β_k d_{k−1}, k ≥ 1, (4)

where β_k is a scalar known as the CG update parameter and F_k = ∇f(x_k).
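As an illustration of the iteration (3)–(4), the following Python sketch runs the scheme on a strictly convex quadratic, where the step length α_k has a closed form that stands in for a line search; the Fletcher–Reeves choice of β_k is used as a placeholder (the matrix A and the starting point are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np

# Nonlinear CG skeleton for min f(x), specialized to the quadratic
# f(x) = 0.5 x^T A x so that the step length has a closed form.
A = np.diag([1.0, 2.0, 3.0])
grad = lambda x: A @ x

x = np.array([1.0, 1.0, 1.0])
g = grad(x)
d = -g                                   # d_0 = -g_0
for k in range(3):
    alpha = -(g @ d) / (d @ (A @ d))     # exact minimizer of f along d
    x = x + alpha * d                    # x_{k+1} = x_k + alpha_k d_k, as in (3)
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves choice of beta_k
    d = -g_new + beta * d                # d_{k+1} = -g_{k+1} + beta d_k, as in (4)
    g = g_new
```

On a symmetric positive-definite quadratic with exact steps, this iteration reduces to linear CG and drives the gradient to numerical zero in at most n steps.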
It is worth noting that a crucial element in any CG algorithm is the formula defining the update parameter β_k [4], which is why different CG algorithms corresponding to different choices of β_k in (4) have been proposed (see [8,10–14,17,33,50,51,53,65]). Moreover, some CG methods for unconstrained optimization are not globally convergent, so researchers have worked to develop CG methods that are both globally convergent and numerically efficient. These newer methods are based on secant equations. For nonlinear conjugate gradient methods, the conjugacy condition is given by

d_k^T y_{k−1} = 0, (5)

where y_{k−1} = F_k − F_{k−1}. Perry [44] extended (5) by exploiting the secant condition of quasi-Newton schemes,

B_k s_{k−1} = y_{k−1}, (6)

and the quasi-Newton search direction d_k given by

d_k = −B_k^{−1} F_k, (7)

where B_k is a square matrix that approximates the Hessian ∇²f(x_k). Using (6) and (7), Perry extended (5) to

d_k^T y_{k−1} = −F_k^T s_{k−1}, (8)

and, using (4), obtained the Perry search direction (9) with the associated quantities (10) and (11). Following Perry's approach, Dai and Liao [18] incorporated a nonnegative parameter t to propose the following extension of (8):

d_k^T y_{k−1} = −t F_k^T s_{k−1}. (12)

For t = 0, (12) reduces to (5), and for t = 1 we obtain Perry's condition (8). Consequently, by substituting (4) into (12), Dai and Liao [18] proposed the CG update parameter

β_k^{DL} = F_k^T (y_{k−1} − t s_{k−1}) / (d_{k−1}^T y_{k−1}). (13)

Numerical results have shown that the DL method is effective; however, it depends strongly on the nonnegative parameter t, for which no optimal value is known [4], and it does not necessarily generate descent directions [8].
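For concreteness, the DL update parameter can be written as a one-line function; the check below uses small hand-picked vectors (purely illustrative) to confirm that t = 0 recovers the Hestenes–Stiefel parameter, whose resulting direction satisfies the classical conjugacy condition (5) exactly:

```python
import numpy as np

def beta_dl(g_new, s_prev, y_prev, d_prev, t):
    """Dai-Liao update parameter:
    beta = g_k^T (y_{k-1} - t s_{k-1}) / (d_{k-1}^T y_{k-1})."""
    return (g_new @ (y_prev - t * s_prev)) / (d_prev @ y_prev)

# illustrative data (not from the paper)
g = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 0.0, 2.0, -1.0, 1.0])
d = np.array([2.0, 1.0, 0.0, 1.0, 3.0])
s = np.array([0.5, -0.5, 1.0, 0.0, 0.25])

# t = 0: Hestenes-Stiefel; the new direction is conjugate to y_{k-1}
d_new = -g + beta_dl(g, s, y, d, t=0.0) * d
```

By construction, d_new^T y = −g^T y + (g^T y / d^T y) d^T y = 0, so the conjugacy condition holds up to rounding whenever d^T y ≠ 0.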
That is, the method may not satisfy the descent condition F_k^T d_k < 0 or the sufficient descent condition, namely, that there exists a constant λ > 0 such that

F_k^T d_k ≤ −λ ‖F_k‖² for all k.

Based on the DL conjugacy condition (12), conjugate gradient methods have been proposed over the years using modified secant equations. For example, Babaie-Kafaki et al. [13] and Yabe and Takano [55] proposed CG methods by applying a revised form of the modified secant equations proposed by Zhang and Xu [63] and Zhang et al. [64] and the modified secant equation proposed by Li and Fukushima [36]. Li et al. [37] applied the modified secant equation proposed by Wei et al. [54], while Ford et al. [26] employed the multi-step quasi-Newton conditions proposed by Ford and Moghrabi [27,28]. CG methods based on modified secant equations have also been studied by Narushima and Yabe [57] and Arazm et al. [7]. These methods have been found to be numerically efficient and globally convergent under suitable conditions; but, like the DL method, they fail to ensure sufficient descent. Recently, by employing Perry's idea [44], efficient CG methods with descent directions have been proposed. Liu and Shang [39] proposed a Perry conjugate gradient method that provides prototypes for developing other special forms of the Perry method, such as the HS method and the DL method [18]. Liu and Xu [40] presented a new Perry CG method with the sufficient descent property, independent of any line search. Also, based on the self-scaling memoryless BFGS update, Andrei [6] proposed an accelerated adaptive class of Perry conjugate gradient algorithms, whose search direction is determined by symmetrization of the scaled Perry CG direction [44].
CG methods for systems of nonlinear equations are rare, as most CG methods target unconstrained optimization. Over the years, however, the approach has been extended to large-scale nonlinear systems. Combining the Polak–Ribière–Polyak (PRP) conjugate gradient method for unconstrained optimization [45,47] with the hyperplane projection method of Solodov and Svaiter [48], Cheng [16] proposed a PRP-type method for systems of monotone equations. Yu [58,59] extended the PRP method [45] to solve large-scale nonlinear systems with monotone line search strategies that modify the Grippo–Lampariello–Lucidi [29] and Li–Fukushima [35] schemes. As further research on Perry's conjugate gradient method, Dai et al. [21] combined the modified Perry conjugate gradient method [41] with the hyperplane projection technique of Solodov and Svaiter [48] to propose a derivative-free method for solving large-scale nonlinear monotone equations. By combining the descent Dai–Liao CG method of Babaie-Kafaki and Ghanbari [54] with the projection method in [48], Abubakar and Kumam [2] proposed a descent Dai–Liao CG method for nonlinear equations; numerical results show the method to be efficient. Based on the projection strategy [48], Liu and Feng [38] proposed a derivative-free iterative method for large-scale nonlinear monotone equations, which can also be used to solve large-scale non-smooth problems owing to its low storage requirement and derivative-free nature. Abubakar and Kumam [1] proposed an improved three-term derivative-free method for solving large-scale nonlinear equations, based on a modified HS method combined with the projection technique of Solodov and Svaiter [48]. Abubakar et al. [3] proposed a descent Dai–Liao CG method for solving nonlinear monotone equations with convex constraints, extending the method in [2].
By using a convex combination of two different positive spectral coefficients, Mohammed and Abubakar [42] proposed a positive spectral gradient-like method, combined with the projection method, for solving nonlinear monotone equations. Awwal et al. [43] proposed a hybrid spectral gradient algorithm for systems of nonlinear monotone equations with convex constraints; the scheme combines a convex combination of two different positive spectral parameters with the projection technique.
Here, based on the work of Babaie-Kafaki and Ghanbari [9] and the Dai–Liao (DL) approach [18], we propose a Dai–Liao conjugate gradient method for systems of nonlinear equations by incorporating an extended secant equation in the classical DL update.
Throughout this work, ‖·‖ denotes the Euclidean norm of vectors. We also assume that F in (1) is Lipschitz continuous and that f in (2) is specified by the merit function

f(x) = ½ ‖F(x)‖².

The paper is organized as follows: in Sect. 2, we present details of the method. Convergence analysis is presented in Sect. 3. Numerical results are presented in Sect. 4. Finally, conclusions are drawn in Sect. 5.
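The merit function f(x) = ½‖F(x)‖² ties problem (1) to the optimization form (2): every solution of F(x) = 0 is a global minimizer of f, and ∇f(x) = J(x)^T F(x), where J is the Jacobian of F. A small self-contained check (the mapping F below is an arbitrary toy example, not one of the paper's test problems):

```python
import numpy as np

# Toy mapping F: R^2 -> R^2 and its Jacobian (illustrative only)
F = lambda x: np.array([x[0] ** 2 - 1.0, x[0] * x[1]])
J = lambda x: np.array([[2 * x[0], 0.0], [x[1], x[0]]])
f = lambda x: 0.5 * F(x) @ F(x)          # merit function f = 0.5 ||F||^2

x = np.array([1.3, -0.7])
g_analytic = J(x).T @ F(x)               # grad f(x) = J(x)^T F(x)

# central finite-difference check of grad f
h = 1e-6
g_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
```

At a root of F, such as x = (1, 0) for this toy mapping, the merit function attains its global minimum value zero.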

Proposed method and its algorithm
Following the Dai–Liao approach, Babaie-Kafaki and Ghanbari [9] proposed an extension of the PRP update parameter built from the classical PRP parameter β_k^{PRP} and a nonnegative parameter t, whose values were determined by carrying out an eigenvalue analysis. Motivated by this, and employing a similar approach, we propose a modification of the classical DL update parameter. In what follows, we suggest an extension of some previously proposed modified secant equations.
Here, we propose the following secant equation as an extension of (6), (18), and (20), namely (22), where φ is a nonnegative parameter, ϑ_{k−1} is defined by (21), and s_{k−1}^T μ_{k−1} ≠ 0. We observe that for φ = 0, (22) becomes the standard secant equation (6); for φ = 3/2, (22) reduces to (19); and for φ = 1/2, (22) reduces to the modified secant equation proposed by Zhang et al. [64]. Substituting u_{k−1} from (22) for y_{k−1} in (13), we obtain the version of the DL update parameter given in (23). Observe that, in general, the denominator d_{k−1}^T u_{k−1} need not be nonzero, since ϑ_{k−1} as defined in (22) may be non-positive. Therefore, we redefine u_{k−1}, obtaining the revised vector z_{k−1} in (24), and consequently the revised form β̂_k of (23) in (25). Andrei [4] noted that the parameter t has no optimal choice; so, to obtain descent directions for our proposed method, we proceed to derive appropriate values of t. From (4), and after some algebra, our search direction takes the form (27). Following Perry's approach [44], the search direction of our proposed method can be written in the Perry form

d_k = −Ĥ_k F_k, (29)

where Ĥ_k, called the search direction matrix, is given by (30), and z_{k−1} is as defined by (24).
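The quantities above can be sketched numerically. Since the exact definitions of (21) and (24) are not reproduced here, the snippet below is a hedged reconstruction: it uses the Wei et al. [54] scalar ϑ = 2(f_{k−1} − f_k) + (g_k + g_{k−1})^T s_{k−1} and a max(ϑ, 0) guard of the kind described for the revised vector, with μ_{k−1} = s_{k−1} as in the numerical experiments. A useful sanity check is that ϑ vanishes on quadratics, where the modified secant vector collapses to the ordinary y_{k−1}:

```python
import numpy as np

def modified_secant_vector(f_prev, f_curr, g_prev, g_curr, s, mu=None, phi=1.0):
    """Hedged sketch of a modified secant vector:
    theta = 2(f_{k-1} - f_k) + (g_k + g_{k-1})^T s_{k-1}  (Wei et al. scalar),
    z = y + phi * max(theta, 0) / (s^T mu) * mu  (max() keeps the
    correction, and hence the denominator d^T z, well behaved)."""
    if mu is None:
        mu = s                       # the experiments use mu_{k-1} = s_{k-1}
    y = g_curr - g_prev
    theta = 2.0 * (f_prev - f_curr) + (g_curr + g_prev) @ s
    return y + phi * max(theta, 0.0) / (s @ mu) * mu, theta

# For a quadratic f(x) = 0.5 x^T A x, theta vanishes, so z reduces to y.
A = np.diag([1.0, 2.0, 3.0])
f = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x
x_prev, x_curr = np.array([1.0, 1.0, 1.0]), np.array([0.5, -0.2, 0.3])
s = x_curr - x_prev
z, theta = modified_secant_vector(f(x_prev), f(x_curr), g(x_prev), g(x_curr), s)
```

The vanishing of ϑ on quadratics reflects the purpose of modified secant equations: the correction term carries higher-order (non-quadratic) information about f.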

Proposition 2.1 The matrix Ĥ_k defined by (30) is a symmetric matrix.

Proof By direct computation, we see that Ĥ_k = Ĥ_k^T. Hence, Ĥ_k is symmetric.

To analyze the descent property of our method, we need the eigenvalues of Ĥ_k and their structure.

Theorem 2.2 Let the matrix Ĥ_k be defined by (30). Then the eigenvalues of Ĥ_k consist of 1, with multiplicity n − 2, together with λ_k^+ and λ_k^− given by (31) and (32), where

b_k = ‖s_{k−1}‖² ‖z_{k−1}‖² / (s_{k−1}^T z_{k−1})² and a_k = t ‖s_{k−1}‖² / (s_{k−1}^T z_{k−1}).

Proof Multiplying both sides of (30) by τ_{k−1}^i, where the τ_{k−1}^i (i = 1, . . . , n − 2) are mutually orthogonal unit vectors orthogonal to both s_{k−1} and z_{k−1}, we obtain an eigenvector equation: each τ_{k−1}^i is an eigenvector of Ĥ_k with eigenvalue 1. Let λ_k^+ and λ_k^− be the remaining two eigenvalues. Observe that (30) shows Ĥ_k to be a rank-two update of the identity, so its determinant follows from the fundamental algebra formula (see inequality (1.2.70) of [49]). Since the sum of the eigenvalues of a square symmetric matrix equals its trace, from (30) we have λ_k^+ + λ_k^− = tr(Ĥ_k) − (n − 2). Using the relationship between the trace and determinant of a matrix and its eigenvalues, λ_k^+ and λ_k^− are obtained as the roots of the quadratic polynomial

λ² − (tr(Ĥ_k) − (n − 2)) λ + det(Ĥ_k) = 0. (40)

Applying the quadratic formula with some rearrangement, we obtain (41), which can be written as (42); this proves (31) and (32).
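The spectral structure claimed in Theorem 2.2 is easy to verify numerically for a symmetric rank-two update of the identity built from the two vectors s_{k−1} and z_{k−1}. The matrix H below is an illustrative stand-in (a symmetrized Perry/DL-type update, not necessarily the paper's exact Ĥ_k): it acts as the identity on the orthogonal complement of span{s, z}, so n − 2 of its eigenvalues equal 1, and the remaining two are the roots of λ² − (tr(H) − (n − 2))λ + det(H) = 0:

```python
import numpy as np

n = 6
s = np.arange(1.0, 7.0)                              # s_{k-1} (illustrative)
z = s + np.array([0.5, -0.3, 0.2, 0.1, -0.4, 0.3])   # z_{k-1}, with s^T z > 0
t = 1.0

# symmetrized rank-two update of the identity (illustrative stand-in)
H = (np.eye(n)
     - (np.outer(s, z) + np.outer(z, s)) / (s @ z)
     + t * np.outer(s, s) / (s @ z))

eigs = np.linalg.eigvalsh(H)
unit = [v for v in eigs if abs(v - 1.0) < 1e-8]      # eigenvalue 1, n-2 times
rest = [v for v in eigs if abs(v - 1.0) >= 1e-8]     # the two outlying eigenvalues
```

With this arbitrary data and t = 1 the stand-in matrix is in fact indefinite (one outlying eigenvalue is negative), which illustrates exactly why the analysis must derive a lower bound on t to force both outlying eigenvalues positive.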
For λ_k^+ and λ_k^− to be real numbers, we must have Δ = (a_k − 1)² + b_k − 1 ≥ 0. By the Cauchy–Schwarz inequality, b_k = ‖s_{k−1}‖² ‖z_{k−1}‖² / (s_{k−1}^T z_{k−1})² ≥ 1, so Δ > 0. Consequently, both eigenvalues are real numbers, and λ_k^+ > 0 since 1 + a_k is nonnegative. To obtain λ_k^− > 0, condition (43) must be satisfied; after some algebra, we obtain the estimate (44) for the parameter t, which satisfies (43). So λ_k^− > 0 if (44) is satisfied; in addition, for t satisfying (44), Ĥ_k is nonsingular. Therefore, all the eigenvalues of the symmetric matrix Ĥ_k are positive real numbers, which ensures that it is a positive-definite matrix. Moreover, using (42) and (44), we obtain the estimates (45) for λ_k^+ and λ_k^−, and the proof is complete.

Hence, from (29), we have

F_k^T d_k = −F_k^T Ĥ_k F_k ≤ −min{1, λ_k^−} ‖F_k‖² < 0, (46)

which shows that the descent condition is satisfied. We therefore propose the formula (47) for the parameter t in the modified DL method, where ξ > 1/4 and γ < 1/4.

Remark 2.3 Since the DL parameter t is nonnegative, we restrict the values of the parameter γ in (47) to be negative so as to avoid a numerically unreasonable approximation [32].

Based on the above remark, we can write the modified DL update parameter as in (48), with ξ ≥ 1/4 and γ < 0 satisfying (47) and guaranteeing the descent condition. The search direction of the proposed method is then given by (49). We use the derivative-free line search proposed by Li and Fukushima [34] to compute the step length α_k.
Let σ_1 > 0, σ_2 > 0 and r ∈ (0, 1) be constants, and let {η_k} be a given positive sequence satisfying the summability condition (50). Let i_k be the smallest non-negative integer i such that the line-search condition (51) holds for α = r^i, and set α_k = r^{i_k}. We now describe the algorithm of the proposed method (ADLCG) as follows:

Step 1 Choose an initial point x_0 ∈ R^n and a stopping tolerance ε > 0; set d_0 = −F_0 and k := 0.
Step 2 If ‖F_k‖ ≤ ε, stop.
Otherwise, compute the search direction d_k by (49).
Step 3 Compute α_k via the line search (51).
Step 4 Set x_{k+1} = x_k + α_k d_k.
Step 5 Set k := k + 1 and go to Step 2.
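The five steps can be assembled into a compact solver. Since the exact forms of (49) and (51) are not reproduced above, this sketch substitutes the classical DL parameter (13) for the paper's β̂_k and a common form of the Li–Fukushima acceptance test, ‖F(x_k + αd_k)‖ − ‖F(x_k)‖ ≤ −σ_1‖αd_k‖² − σ_2‖αF(x_k)‖² + η_k‖F(x_k)‖, with η_k = 1/(k + 1)² as in the experiments; it is a hedged illustration of the algorithm's shape, not the ADLCG method itself:

```python
import numpy as np

def adlcg_like_solver(F, x0, t=1.0, tol=1e-8, max_iter=200,
                      sigma1=1e-4, sigma2=1e-4, r=0.2):
    """Steps 1-5 with a classical DL beta (placeholder for beta-hat)
    and a Li-Fukushima-type derivative-free backtracking line search."""
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    d = -Fx                                   # Step 1: d_0 = -F_0
    for k in range(max_iter):
        nF = np.linalg.norm(Fx)
        if nF <= tol:                         # Step 2: stopping test
            break
        eta = 1.0 / (k + 1) ** 2
        a = 1.0                               # Step 3: backtracking, alpha = r^i
        while (np.linalg.norm(F(x + a * d)) - nF
               > -sigma1 * a * a * (d @ d) - sigma2 * a * a * (Fx @ Fx) + eta * nF
               and a > 1e-12):
            a *= r
        x_new = x + a * d                     # Step 4: x_{k+1} = x_k + alpha_k d_k
        F_new = F(x_new)
        s, y = x_new - x, F_new - Fx
        denom = d @ y
        beta = 0.0 if abs(denom) < 1e-16 else (F_new @ (y - t * s)) / denom
        d = -F_new + beta * d                 # next search direction
        x, Fx = x_new, F_new                  # Step 5: k := k + 1
    return x
```

For the trivially monotone mapping F(x) = x, the full step α = 1 is accepted at the first iteration and the solver terminates at the exact root.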

Convergence analysis
The following assumptions are required to analyze the convergence of the ADLCG algorithm.

Assumption 3.1 The level set

Ω = {x ∈ R^n : ‖F(x)‖ ≤ ‖F(x_0)‖}

is bounded.
(3) F is Lipschitz continuous in some neighborhood N of Ω; namely, there exists a positive constant L > 0 such that

‖F(x) − F(y)‖ ≤ L ‖x − y‖ for all x, y ∈ N.
Assumption 3.1 and condition (3) imply that there exists a positive constant ω such that ‖F(x)‖ ≤ ω for all x ∈ Ω (see Proposition 1.3 of [13]).
(4) The Jacobian of F is bounded, symmetric and positive-definite on Ω₁, which implies that there exist constants m_2 ≥ m_1 > 0 such that

m_1 ‖d‖² ≤ d^T ∇F(x) d ≤ m_2 ‖d‖² for all x ∈ Ω₁ and d ∈ R^n.

These conditions imply that {x_k} ⊂ Ω and that

lim_{k→∞} ‖α_k d_k‖ = 0 (59) and lim_{k→∞} ‖α_k F(x_k)‖ = 0. (60)

Proof From the line search (51), for all k > 0, we obtain an inequality that, when summed over the first k iterations, bounds the partial sums of σ_1 ‖α_i d_i‖². Therefore, by (52), and since {η_i} satisfies (50), the series ∑_{i=0}^{∞} ‖α_i d_i‖² is convergent, which implies that (59) holds. Using the same argument, with σ_2 ‖α_k F(x_k)‖² on the left-hand side, we obtain (60).
On the other hand, from (46) and (45), we obtain a positive lower bound on −F_k^T d_k / ‖F_k‖², while (45) bounds the eigenvalues λ_k^+ and λ_k^− from above and below. Combining these bounds with (59) and (60) and applying the sandwich theorem, we obtain lim_{k→∞} ‖F(x_k)‖ = 0, and the proof is completed.

Numerical result
In this section, we test the efficiency and robustness of our proposed approach by comparing it with the following method from the literature: a new derivative-free conjugate gradient method for solving large-scale nonlinear systems of equations (NDFCG) [24]. All codes were written in MATLAB R2014a and run on a personal computer (2.20 GHz CPU, 8 GB RAM). The two algorithms were implemented with the same line search procedure, with parameters σ_1 = σ_2 = 10^{−4}, α_0 = 0.1, r = 0.2 and η_k = 1/(k + 1)². In addition, we set ξ = 0.5, γ = −0.5 and μ_{k−1} = s_{k−1} for the ADLCG method. The iteration was set to terminate if the number of iterations exceeds 2000 or the inequality ‖F_k‖ ≤ 10^{−10} is satisfied (Table 1).
The two algorithms were tested using the following problems with various dimensions; the element-wise definitions of F(x) are given in the cited references:

Problem 4.1 [2], Problem 4.2 [2], Problem 4.3 [67], Problem 4.4 [56], Problem 4.5 [38], Problem 4.6 [51], Problem 4.7 [61], Problem 4.8 [2], Problem 4.9 [2], Problem 4.10 [2].
Using the performance profile of Dolan and Moré [23], we generate Figs. 1 and 2 to show the performance and efficiency of the two methods. To further illustrate the comparison, a summary of the results is presented in Table 2: it reports the number of problems for which each method wins in terms of number of iterations and CPU time, respectively, together with the corresponding percentages of problems solved. In Figs. 1 and 2, we observe that the curve for the ADLCG method lies above the curve for the NDFCG method, indicating the greater efficiency of the ADLCG method compared with the NDFCG scheme.
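The Dolan–Moré profile used for Figs. 1 and 2 can be computed in a few lines: for each problem p and solver s, the ratio r_{p,s} divides the solver's cost (iterations or CPU time) by the best cost any solver achieved on that problem, and ρ_s(τ) is the fraction of problems solved within a factor τ of the best. A sketch with toy data (the 3×2 cost table below is invented for illustration):

```python
import numpy as np

def performance_profile(T):
    """Dolan-More profile.  T[p, s] = cost of solver s on problem p
    (np.inf marks a failure).  Returns the ratio matrix R and a step
    function rho(s, tau) = fraction of problems with ratio <= tau."""
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    R = T / best                          # performance ratios r_{p,s}
    def rho(s, tau):
        return np.mean(R[:, s] <= tau)
    return R, rho

# toy data: 3 problems, 2 solvers (e.g. iteration counts)
T = np.array([[10.0, 12.0],
              [20.0, 18.0],
              [5.0,  np.inf]])            # solver 2 fails on problem 3
R, rho = performance_profile(T)
```

The value ρ_s(1) is the fraction of problems on which solver s is (tied for) best, which is exactly the "winner" count summarized in Table 2, and ρ_s(τ) for large τ measures robustness.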
Similarly, the summary reported in Table 2 indicates that the ADLCG method wins with respect to both the number of iterations and CPU time. The table shows that the ADLCG method solves 95% (76 out of 80) of the problems with fewer iterations, compared to the NDFCG method, which wins on only 3.75% (3 out of 80); both methods solve 1 problem (1.25%) with the same number of iterations, and this case is reported as undecided. The summary also indicates that the ADLCG method outperforms the NDFCG scheme in CPU time, winning on 72.5% (58 out of 80) of the problems compared to 27.5% (22 out of 80) for NDFCG. Therefore, it is clear from Figs. 1 and 2 and the summary in Table 2 that our method is more efficient than the NDFCG method and well suited for large-scale nonlinear systems.

Conclusion
In this work, we proposed a Dai–Liao conjugate gradient method via a modified secant equation for systems of nonlinear equations. This was achieved by finding appropriate values for the nonnegative DL parameter using an extended secant equation developed from the work of Zhang et al. [64] and Wei et al. [54]. Global convergence was established under mild conditions, and numerical comparisons with an existing method show that the proposed method is efficient and robust.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.