Symmetric Perry conjugate gradient method
Abstract
A family of new conjugate gradient methods is proposed based on Perry’s idea; the family satisfies the descent property or the sufficient descent property for any line search. In addition, based on a scaling technique and a restarting strategy, a family of scaling symmetric Perry conjugate gradient methods with restarting procedures is presented. The memoryless BFGS method and the SCALCG method are special forms of the two families of new methods, respectively. Moreover, several concrete new algorithms are suggested. Under the Wolfe line searches, the global convergence of the two families of new methods is proven by spectral analysis, for both uniformly convex and nonconvex functions. Preliminary numerical comparisons with the CG_DESCENT and SCALCG algorithms show that the new algorithms are very effective for large-scale unconstrained optimization problems. Finally, a remark for further research is given.
Keywords
Conjugate gradient method · Descent property · Spectral analysis · Global convergence

1 Introduction
This paper is organized as follows. In Sect. 2, the family of symmetric Perry conjugate gradient methods is first deduced; then the spectra of the iteration matrix are analyzed, from which the sufficient descent property is proved, and several concrete algorithms are proposed. In Sect. 3, the scaling technique and the restarting strategy are applied to the symmetric Perry conjugate gradient methods, yielding a family of scaling symmetric Perry conjugate gradient methods with restarting procedures. In Sect. 4, the global convergence of the two families of new methods under the Wolfe line searches is proven by spectral analysis of the conjugate gradient iteration matrix. In Sect. 5, preliminary numerical results are reported. A remark for further research is given in Sect. 6.
2 The symmetric Perry conjugate gradient method
The method formulated by (1) and (5) is called the symmetric Perry conjugate gradient method, denoted by SPCG. The directions generated by (5) are called symmetric Perry conjugate gradient directions; they will be proven to be descent directions in Sect. 2.2.
From the above discussions, a family of new nonlinear conjugate gradient algorithms can be obtained as follows:
Algorithm 1
(SPCG)
Step 1. Give an initial point x _{1} and ε≥0. Set k=1.
Step 2. Calculate g _{1}=g(x _{1}). If ∥g _{1}∥≤ε, then stop; otherwise let d _{1}=−g _{1}.
Step 3. Calculate the steplength α _{ k } with line searches.
Step 4. Set x _{ k+1}=x _{ k }+α _{ k } d _{ k }.
Step 5. Calculate g _{ k+1}=g(x _{ k+1}). If ∥g _{ k+1}∥≤ε, then stop.
Step 6. Calculate the direction d _{ k+1} via (5) with different σ.
Step 7. Set k=k+1, then go to Step 3.
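To make Algorithm 1 concrete, here is a minimal Python sketch. Since the direction formula (5) and the matrix (6) are not reproduced in this excerpt, the update below assumes the common symmetric Perry form Q _{ k+1}=I−(s _{ k } y _{ k }^T+y _{ k } s _{ k }^T)/(s _{ k }^T y _{ k })+σ s _{ k } s _{ k }^T/(s _{ k }^T y _{ k }) with the memoryless BFGS choice σ=1+y _{ k }^T y _{ k }/(s _{ k }^T y _{ k }) (a special case the paper mentions); the helper `wolfe_backtrack` is a simplified hypothetical stand-in for the line searches of Step 3, not the paper's procedure.

```python
import numpy as np

def wolfe_backtrack(f, grad, x, d, b1=1e-4, b2=0.9, alpha=1.0):
    """Simplified backtracking search for the (weak) Wolfe conditions.
    A hypothetical stand-in for Step 3; halving only enforces sufficient
    decrease in general, which suffices for the convex test below."""
    g0d = grad(x) @ d
    for _ in range(60):
        if (f(x + alpha * d) <= f(x) + b1 * alpha * g0d
                and grad(x + alpha * d) @ d >= b2 * g0d):
            return alpha
        alpha *= 0.5
    return alpha

def spcg(f, grad, x1, eps=1e-8, max_iter=500):
    """Hedged sketch of Algorithm 1 (SPCG) with an assumed form of (5)-(6)."""
    x = np.asarray(x1, dtype=float)
    g = grad(x)
    d = -g                                      # Step 2: steepest descent start
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:            # Steps 2/5: stopping test
            break
        alpha = wolfe_backtrack(f, grad, x, d)  # Step 3: steplength
        x_new = x + alpha * d                   # Step 4
        g_new = grad(x_new)                     # Step 5
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy <= 1e-12:
            d = -g_new                          # safeguard: fall back to restart
        else:
            sigma = 1.0 + (y @ y) / sy          # memoryless BFGS choice of sigma
            # Step 6: d = -Q g_new, expanded matrix-free
            d = (-g_new + ((s @ g_new) * y + (y @ g_new) * s) / sy
                 - sigma * (s @ g_new) / sy * s)
        x, g = x_new, g_new                     # Step 7
    return x
```

On a convex quadratic this produces quasi-Newton-like steps; the behavior of the published method depends on the actual (5), (6) and the chosen σ.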
Remark 1
2.1 Spectral analysis
Here, we analyze the spectra of the Perry matrix and the symmetric Perry matrix.
Theorem 1
Proof
According to Theorem 1, the following theorem for the symmetric Perry matrix Q _{ k+1} defined by (6) can be deduced.
Theorem 2
Proof
From the above theorem, we can easily obtain the following corollary.
Corollary 1
Proof
2.2 Descent property
3 Scaling technology and restarting strategy
Hence, we can introduce the following scaling symmetric Perry conjugate gradient method with restarting procedures (SSPCGRP).
Algorithm 2
(SSPCGRP)
Step 1. Give an initial point x _{1} and ε≥0. Set k=1 and Nrestart=0.
Step 2. Calculate g _{1}=g(x _{1}). If ∥g _{1}∥≤ε, then stop; otherwise let d _{1}=−g _{1}.
Step 3. Calculate the steplength α _{ k } using the Wolfe line searches (15) and (16) with initial guess α _{ k,0}, where α _{1,0}=1/∥g _{1}∥ and α _{ k,0}=α _{ k−1}∥d _{ k−1}∥/∥d _{ k }∥ when k≥2.
Step 4. Set x _{ k+1}=x _{ k }+α _{ k } d _{ k }.
Step 5. Calculate g _{ k+1}=g(x _{ k+1}). If ∥g _{ k+1}∥≤ε, then stop.
Step 6. If the Powell restarting criterion (39) holds, then calculate the direction d _{ k+1} via (38) with different σ and ρ, let y _{ r }=y _{ k } and s _{ r }=s _{ k } (store y _{ r } and s _{ r }), set Nrestart=Nrestart+1 and k=k+1, and go to Step 3. Otherwise, go to Step 7.
Step 7. If Nrestart=0, then calculate the direction d _{ k+1} via (38) with different σ and ρ; otherwise, calculate d _{ k+1} via (48), where \(\widehat{y}_{k}\) and \(\widehat{g}_{k+1}\) are computed by (46) and (47), respectively, and \(\widetilde{\sigma}\), \(\widehat{\sigma}\) and \(\widehat{\rho}\) are preset parameters.
Step 8. Set k=k+1 and go to Step 3.
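Two ingredients of Algorithm 2 can be sketched directly. The restarting criterion (39) is not reproduced in this excerpt; the form shown below, |g _{ k+1}^T g _{ k }| ≥ 0.2∥g _{ k+1}∥², is the classical Powell test from [24] and may differ in detail from (39). The initial steplength guess follows Step 3 exactly.

```python
import numpy as np

def powell_restart(g_new, g_old, c=0.2):
    """Powell-style restarting test [24]: restart when successive gradients
    are far from orthogonal. The threshold form |g_{k+1}^T g_k| >= c*||g_{k+1}||^2
    with c = 0.2 is the classical choice; the paper's (39) may differ."""
    return abs(g_new @ g_old) >= c * (g_new @ g_new)

def initial_steplength(k, alpha_prev, d_prev, d, g1):
    """Initial guess alpha_{k,0} for the Wolfe search in Step 3 of Algorithm 2."""
    if k == 1:
        return 1.0 / np.linalg.norm(g1)
    return alpha_prev * np.linalg.norm(d_prev) / np.linalg.norm(d)
```

For instance, `powell_restart` fires when consecutive gradients are nearly parallel (as happens when conjugacy has degraded) and stays quiet when they are nearly orthogonal, which is the situation exact-line-search CG maintains.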
In Algorithm 2, k and Nrestart record the number of iterations and the number of restarting procedures, respectively.
When ρ=1 and \(\sigma=c_{1}\frac{y_{k}^{\mathrm{T}}y_{k}}{s_{k}^{\mathrm{T}}y_{k}}\) in (38), and \(\widehat{\rho}=1\), \(\widetilde{\sigma}=c_{2}\frac{y_{k}^{\mathrm{T}}H_{r+1}y_{k}}{s_{k}^{\mathrm{T}}y_{k}}\) and \(\widehat{\sigma}=c_{2}\frac{y_{r}^{\mathrm{T}}y_{r}}{s_{r}^{\mathrm{T}}y_{r}}\) in (46)–(48), the SSPCGRP algorithm is denoted by SPDRP, or SPDRP(c _{1},c _{2}) to indicate the dependence on the positive constants c _{1} and c _{2}. In particular, when c _{1}=c _{2}=1, i.e., \(\sigma=\frac{y_{k}^{\mathrm{T}}y_{k}}{s_{k}^{\mathrm{T}}y_{k}}\) in (38) and \(\widetilde{\sigma}\) and \(\widehat{\sigma}\) are computed by (45), the condition numbers κ _{2}(Q _{ k+1})=κ _{2}(ρQ _{ k+1}), κ _{2}(H _{ k+1}) and κ _{2}(H _{ r+1}) are optimal, where Q _{ k+1} is defined by (6) with \(\sigma=\frac{y_{k}^{\mathrm{T}}y_{k}}{s_{k}^{\mathrm{T}}y_{k}}\). Hence, the SSPCGRP algorithm is called the symmetric Perry descent conjugate gradient method with optimal condition numbers and restarting procedures, denoted by SPDOCRP.
When ρσ=1, \(\rho=\frac{s_{k}^{\mathrm{T}}s_{k}}{y_{k}^{\mathrm{T}}s_{k}}\), \(\widehat{\rho}\,\widehat{\sigma}=1\), \(\widehat{\rho}=\frac{s_{r}^{\mathrm{T}}s_{r}}{y_{r}^{\mathrm{T}}s_{r}}\) and \(\widetilde{\sigma}=1\), the formulas (38), (46), (47) and (48) reduce to those used by N. Andrei in [2], and the SSPCGRP algorithm becomes the SCALCG algorithm with the spectral choice for θ _{ k+1} [2]; it is also called the Andrei-Perry conjugate gradient method with restarting procedures.
4 Convergence
 H1.

f is bounded below in \(\mathbb{R}^{n}\) and f is continuously differentiable in a neighborhood \(\mathcal{N}\) of the level set \(\mathcal{L} \stackrel{def}{=}\{x : f(x)\le f(x_{0})\}\), where x _{0} is the starting point of the iteration.
 H2.
The gradient of f is Lipschitz continuous in \(\mathcal{N}\), that is, there exists a constant L>0 such that
$$ \bigl\|\nabla f(\bar{x})-\nabla f(x)\bigr\| \leq L\|\bar{x}-x\|, \quad \forall\bar{x},x \in\mathcal{N}. $$(49)
Next, we introduce the spectral condition lemma of the global convergence for an objective function satisfying H1 and H2, which comes from [18], Theorem 4.1.
Lemma 1
Remark 2
In what follows, the convergence of the resulting algorithms is proved by evaluating the spectral bounds of the iteration matrix and applying Lemma 1. This proof technique is called the spectral method. It should be pointed out that the technique can also be applied to nonsymmetric conjugate gradient methods, if the maximum eigenvalue of M _{ k } in Lemma 1 is replaced by the positive square root of the maximum eigenvalue of \(M_{k}^{\mathrm{T}}M_{k}\), that is, by the maximum singular value of M _{ k } (see Theorem 3.1 in [17]).
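The substitution just described can be checked numerically: for a nonsymmetric iteration matrix M, the norm growth ∥Md∥ is bounded by the maximum singular value of M, which equals the positive square root of the largest eigenvalue of M^T M and dominates the spectral radius. A small NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))          # a generic nonsymmetric iteration matrix

# Maximum singular value = positive square root of the largest eigenvalue
# of M^T M; it bounds the norm growth ||M d|| <= smax * ||d||.
smax = np.sqrt(np.linalg.eigvalsh(M.T @ M).max())

d = rng.standard_normal(5)
assert np.linalg.norm(M @ d) <= smax * np.linalg.norm(d) + 1e-12
# It coincides with the largest singular value from the SVD.
assert np.isclose(smax, np.linalg.svd(M, compute_uv=False).max())
```

For a symmetric matrix the maximum singular value reduces to the largest absolute eigenvalue, which is why the symmetric case in Lemma 1 needs no substitution.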
4.1 The convergence for uniformly convex functions
 H3.
There exists a constant m>0 such that
$$ \bigl(\nabla f(\bar{x})-\nabla f(x)\bigr)^{\mathrm{T}}(\bar{x}-x)\ge m\|\bar{x}-x\|^{2} \quad \forall\bar{x},x \in\mathcal{N}. $$(54)
Theorem 3
Assume that H1, H2 and H3 hold. Let ν _{0} and ν _{1} be two positive constants. For the symmetric Perry conjugate gradient method (1) and (5) with ν _{0}≤σ≤ν _{1}, the Wolfe line searches (15) and (16) are implemented. If g _{1}≠0 and steplength α _{ k }>0 for k≥1, then g _{ k }=0 for some k>1, or lim_{ k→∞}∥g _{ k }∥=0.
Proof
Assume that g _{ k }≠0, \(\forall k\in\mathbb{N}\). Below, by induction, we first prove that the line search direction d _{ k }, defined by (5), satisfies the sufficient descent property (8).
When k=1, \(d_{1}^{\mathrm{T}}g_{1}=-\|g_{1}\|^{2}<0\). From (16), it follows that \(s_{1}^{\mathrm{T}}y_{1}\ge-(1-b_{2})\alpha_{1} d_{1}^{\mathrm{T}}g_{1}>0\).
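The base case of this induction can be verified numerically. The sketch below checks s _{1}^T y _{1} ≥ −(1−b _{2})α _{1} d _{1}^T g _{1} > 0 on a two-dimensional quadratic, with a steplength that satisfies the curvature condition (16); the quadratic and steplength are illustrative choices, not from the paper.

```python
import numpy as np

A = np.diag([1.0, 4.0])                  # illustrative convex quadratic f = x^T A x / 2
grad = lambda x: A @ x
b2 = 0.9                                 # curvature parameter of the Wolfe conditions

x1 = np.array([1.0, 1.0])
g1 = grad(x1)
d1 = -g1                                 # steepest descent: d1^T g1 = -||g1||^2 < 0
alpha1 = 0.2                             # a steplength satisfying the curvature condition
x2 = x1 + alpha1 * d1
g2 = grad(x2)
assert g2 @ d1 >= b2 * (g1 @ d1)         # Wolfe curvature condition (16) holds

s1, y1 = x2 - x1, g2 - g1
# The inequality used in the proof: s1^T y1 >= -(1 - b2) * alpha1 * d1^T g1 > 0
assert s1 @ y1 >= -(1.0 - b2) * alpha1 * (d1 @ g1) - 1e-12
assert s1 @ y1 > 0
```

The chain is exactly the one in the proof: (16) gives y _{1}^T d _{1} ≥ −(1−b _{2})g _{1}^T d _{1}, and multiplying by α _{1}>0 yields the displayed bound.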
Remark 3
Next, we prove the global convergence of the SSPCGRP method for uniformly convex functions.
Theorem 4
Assume that H1, H2 and H3 hold, and that ν _{0} and ν _{1} are two positive constants. Let the sequence {x _{ k }} be generated by the SSPCGRP algorithm (Algorithm 2), where the five different parameters σ, ρ, \(\widetilde{\sigma}\), \(\widehat{\sigma}\) and \(\widehat{\rho}\) satisfy \(\nu_{0}\le\sigma, \rho, \widetilde{\sigma},\widehat{\sigma },\widehat {\rho}\le \nu_{1}\). If g _{ 1}≠0, and steplength α _{ k }>0 for k≥1, then g _{ k }=0 for some k>1, or lim_{ k→∞}∥g _{ k }∥=0.
Proof
In what follows, by induction, we prove that \(\widetilde{y}_{k}^{\mathrm{T}}\widetilde{s}_{k}>0\) and the sufficient descent property (8) is true for all k.
If the Powell restarting criterion (39) never holds for all k≥1, the iteration matrix is ρQ _{ k+1}. Thus, similar to Theorem 3, it can be easily shown that the results of Theorem 4 are true.
Hence, the spectral condition number of the iteration matrix of Algorithm 2 is uniformly bounded above, which implies that the results of Theorem 4 are true according to Remark 2. □
From (56) and this theorem, it can be shown that the SPDRP algorithm and the SCALCG algorithm with the spectral choice [2] are globally convergent for uniformly convex functions under the Wolfe line searches.
4.2 The convergence for general nonlinear functions
For general nonlinear functions, we first have the following result for the symmetric Perry conjugate gradient method.
Theorem 5
Assume that H1 and H2 hold. For the symmetric Perry conjugate gradient method (1) and (5) with \(\sigma=c\frac{y_{k}^{\mathrm{T}}y_{k}}{s_{k}^{\mathrm{T}}y_{k}}\), where c is a positive constant, if the line searches satisfy the Wolfe conditions (15) and (16), then lim_{ k→∞}∥y _{ k }∥=0 implies that lim inf_{ k→∞}∥g _{ k }∥=0.
Proof
Next, we prove the global convergence of the SSPCGRP algorithm (Algorithm 2) for general nonlinear functions.
Theorem 6
Assume that H1 and H2 hold. Let the sequence {x _{ k }} be generated by the SSPCGRP algorithm with \(\sigma=c\frac{y_{k}^{\mathrm{T}}y_{k}}{s_{k}^{\mathrm{T}}y_{k}}\) and ν _{0}≤ρ≤ν _{1} in (38), where c, ν _{0} and ν _{1} are positive constants. If the line searches satisfy the Wolfe conditions (15) and (16), then lim_{ k→∞}∥y _{ k }∥=0 implies that lim inf_{ k→∞}∥g _{ k }∥=0.
Proof
The above two theorems show that the SPDCG(c) algorithm and the SPDRP(c _{1},c _{2}) algorithm are globally convergent for nonconvex functions under the Wolfe line searches, provided lim_{ k→∞}∥y _{ k }∥=0. This condition for global convergence was used by J.Y. Han et al. in [14].
5 Numerical experiments
The numerical experiments use two groups of test functions. One group (145 test functions) is taken from the CUTEr [9] library, referring to the website:
which is only used to test the mBFGS, SPDCG, RSPDCGs and CG_DESCENT algorithms. In order to compare with the SCALCG algorithm, the second group consists of 72 of the 73 unconstrained problems (all but the 71st) in the SCALCG Fortran software package coded by N. Andrei, referring to the website:
http://camo.ici.ro/forum/SCALCG/.
For the second group, each test function is run in ten experiments with the number of variables set to 1000, 2000, …, 10000, respectively. The starting points used are those given in the SCALCG code.
The SPDCG, mBFGS and RSPDCGs algorithms are coded on the basis of the package CG_DESCENT (C language, Version 5.3), with minor revisions, and implement the approximate Wolfe line searches with the default parameters of CG_DESCENT [10, 12]. The package CG_DESCENT can be obtained from Hager’s web page at
http://www.math.ufl.edu/~hager/.
In addition, in order to compare with the SCALCG algorithm, all subroutines of the SPDRP algorithm are written in Fortran 77 in double precision, and the SPDRP algorithm uses the Wolfe line searches from the SCALCG Fortran code.
The termination criterion of all algorithms is that ∥g∥_{∞}<10^{−6}, where ∥⋅∥_{∞} is the infinity norm of a vector. The maximum number of iterations is 500n, where n is the number of variables. The tests are performed on PC (Dell Inspiron 530), Intel^{®} Core™ 2 Duo, E4600, 2.40 GHz, 2.39 GHz, RAM 2.00 GB, with the gcc and g77 compilers.
For the first group of test functions, to compare the mBFGS and SPDOC algorithms with the RSPDCGs and CG_DESCENT algorithms, we divide the group into two parts: large-scale problems, whose numbers of variables are not less than 100 (72 test functions), and small-scale problems, whose numbers of variables are less than 100 (73 test functions).
So, the preliminary numerical experiments show that SPDOC and SPDOCRP are very effective algorithms for large-scale unconstrained optimization problems.
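Comparisons of this kind are commonly summarized with the Dolan-Moré performance profiles [6] (the acknowledgements mention Moré's Matlab code perf.m, so the paper presumably uses them). A minimal sketch of the profile computation, with a hypothetical toy cost table:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile [6].

    T[i, s] is the cost (e.g. CPU time) of solver s on problem i, with
    np.inf marking a failure. Returns rho[t, s], the fraction of problems
    solved by solver s within a factor taus[t] of the best solver."""
    ratios = T / T.min(axis=1, keepdims=True)        # performance ratios r_{i,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Hypothetical toy table: 3 problems, 2 solvers; solver 2 fails on problem 3.
T = np.array([[1.0, 2.0],
              [3.0, 1.5],
              [2.0, np.inf]])
rho = performance_profile(T, taus=[1.0, 2.0, 4.0])
```

Here rho[0] gives each solver's win rate (fraction of problems on which it is fastest), and the profile at large τ approaches each solver's overall success rate.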
In addition, for the SPDCG algorithm, inequality (33) shows that the descent degree of the line search directions increases as the value of c increases, but the performance of the algorithm is not directly proportional to c. In fact, the line search directions generated by the SPDCG algorithm vary with the value of c. What kind of criterion can be used to evaluate the performance of an algorithm? Does such a criterion exist? These are still open problems. Of course, the condition number and the descent property are two important factors.
6 Conclusion
For the parameter σ in SPCG algorithm, besides the cases mentioned above, there also exist other choices, such as \(\sigma =c^{2}\frac{s_{k}^{\mathrm{T}}s_{k}}{s_{k}^{\mathrm{T}} y_{k}}\), \(\sigma =c\frac {s_{k}^{\mathrm{T}} y_{k}}{s_{k}^{\mathrm{T}}s_{k}}\), \(\sigma=c \frac{s_{k}^{\mathrm {T}}y_{k}}{y_{k}^{\mathrm{T}}y_{k}}\), \(\sigma=c \frac{s_{k}^{\mathrm{T}}s_{k}}{y_{k}^{\mathrm{T}}y_{k}}\), \(\sigma=c \frac{y_{k}^{\mathrm{T}}y_{k}}{s_{k}^{\mathrm{T}}s_{k}}\), and so on, where c>0.
For the SSPCGRP algorithm, when ρσ=1, \(\sigma=\frac{y_{k}^{\mathrm{T}}y_{k}}{s_{k}^{\mathrm{T}}y_{k}}\), \(\widehat{\sigma}=\frac{y_{r}^{\mathrm{T}}y_{r}}{s_{r}^{\mathrm{T}}y_{r}}\) and \(\widehat{\rho}\,\widehat{\sigma}=\widetilde{\sigma}=1\), the formulas (38), (46), (47) and (48) reduce to those suggested by D.F. Shanno in [25] and [26]. When \(\rho=\sigma=\widehat{\rho}=\widetilde{\sigma}=\widehat{\sigma}=1\), the SSPCGRP algorithm becomes the memoryless BFGS conjugate gradient method with restarting procedures. Therefore, it is worth studying further how the parameters σ and ρ should be chosen to construct more effective nonlinear conjugate gradient algorithms.
The condition number of Q _{ k+1} defined by (6) depends only on the parameter σ, and the condition number of ρQ _{ k+1} is the same as that of Q _{ k+1} (see (37)); hence we let ρ=1 and \(\widehat{\rho}=1\) in the SSPCGRP algorithm. That is to say, σ can scale the symmetric Perry iteration matrix Q _{ k+1}, so the symmetric Perry conjugate gradient methods have the self-scaling property. Similarly, σ can also alter the maximum and minimum eigenvalues of the Perry iteration matrix P _{ k+1} defined by (4), and P _{ k+1} is a self-scaling matrix. Thus, the parameter σ in the condition (12) is a self-scaling factor, which can alter the condition number of the iteration matrix of the conjugate gradient method.
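This self-scaling effect can be observed numerically. Since (6) is not reproduced in this excerpt, the sketch below assumes the common symmetric Perry form Q(σ)=I−(sy^T+ys^T)/(s^T y)+σ ss^T/(s^T y); under that assumption, κ _{2} varies with σ while a positive scalar multiple ρQ leaves it unchanged, as (37) states.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(6)
y = rng.standard_normal(6)
if s @ y < 0:
    y = -y                                # ensure s^T y > 0, as the Wolfe search does

def Q(sigma, n=6):
    """Assumed symmetric Perry matrix, a hedged stand-in for (6)."""
    sy = s @ y
    return (np.eye(n) - (np.outer(s, y) + np.outer(y, s)) / sy
            + sigma * np.outer(s, s) / sy)

def kappa2(M):
    """Spectral condition number of a symmetric matrix."""
    ev = np.abs(np.linalg.eigvalsh(M))
    return ev.max() / ev.min()

# sigma rescales the spectrum, so the condition number changes with sigma ...
k_bfgs = kappa2(Q(1.0 + (y @ y) / (s @ y)))   # memoryless BFGS choice of sigma
k_big = kappa2(Q(10.0))
# ... while a positive scalar multiple rho*Q leaves kappa_2 unchanged (cf. (37)).
assert np.isclose(kappa2(3.0 * Q(2.0)), kappa2(Q(2.0)))
```

Comparing k_bfgs and k_big for various random s, y illustrates why choosing σ to optimize κ _{2}, as in SPDOCRP, is attractive.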
Acknowledgements
The authors are grateful to the anonymous referees and the Editor-in-Chief, Prof. W.W. Hager, for their valuable comments and suggestions on the original version of this paper. The authors also thank N. Andrei for the Fortran code SCALCG, W.W. Hager and H. Zhang for the C code CG_DESCENT (Version 5.3), and J.J. Moré for the Matlab code perf.m. Finally, the authors thank Dr. Kuiting Zhang (Department of Computer Science, Weifang University) for his help with the C language and the Linux system.
References
1. Al-Baali, M.: Descent property and global convergence of the Fletcher-Reeves method with inexact line search. IMA J. Numer. Anal. 5, 121–124 (1985)
2. Andrei, N.: Scaled conjugate gradient algorithms for unconstrained optimization. Comput. Optim. Appl. 38, 401–416 (2007)
3. Andrei, N.: A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimization. Appl. Math. Lett. 20, 645–650 (2007)
4. Dai, Y.H., Liao, L.Z.: New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 43, 87–101 (2001)
5. Dai, Y.H., Yuan, Y.X.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10, 177–182 (1999)
6. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program., Ser. A 91, 201–213 (2002)
7. Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. Comput. J. 7, 149–154 (1964)
8. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2(1), 21–42 (1992)
9. Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEr (and SifDec), a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29(4), 373–394 (2003)
10. Hager, W.W., Zhang, H.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16, 170–192 (2005)
11. Hager, W.W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2, 35–58 (2006)
12. Hager, W.W., Zhang, H.: Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. 32(1), 113–137 (2006)
13. Hager, W.W., Zhang, H.: The limited memory conjugate gradient method (2012). www.math.ufl.edu/~hager/papers/CG/lcg.pdf
14. Han, J.Y., Liu, G.H., Yin, H.X.: Convergence of Perry and Shanno’s memoryless quasi-Newton method for nonconvex optimization problems. OR Trans. 1, 22–28 (1997)
15. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–439 (1952)
16. Liu, Y., Storey, C.: Efficient generalized conjugate gradient algorithms, part 1: theory. J. Optim. Theory Appl. 69, 129–137 (1991)
17. Liu, D.Y., Shang, Y.F.: A new Perry conjugate gradient method with the generalized conjugacy condition. In: 2010 International Conference on Computational Intelligence and Software Engineering (CiSE), 10–12 Dec. 2010. doi:10.1109/CISE.2010.5677114
18. Liu, D.Y., Xu, G.Q.: Applying Powell’s symmetrical technique to conjugate gradient methods. Comput. Optim. Appl. 49(2), 319–334 (2011). doi:10.1007/s10589-009-9302-1
19. Liu, D.Y., Xu, G.Q.: A Perry descent conjugate gradient method with restricted spectrum. Optimization Online, Nonlinear Optimization (Unconstrained Optimization), March 2011. http://www.optimization-online.org/DB_HTML/2011/03/2958.html
20. Oren, S.S., Spedicato, E.: Optimal conditioning of self-scaling variable metric algorithms. Math. Program. 10, 70–90 (1976)
21. Perry, A.: A modified conjugate gradient algorithm. Oper. Res. 26(6), 1073–1078 (1978)
22. Polak, E., Ribière, G.: Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Oper. 3(16), 35–43 (1969)
23. Polyak, B.T.: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 9, 94–112 (1969)
24. Powell, M.J.D.: Restart procedures for the conjugate gradient method. Math. Program. 12, 241–254 (1977)
25. Shanno, D.F.: Conjugate gradient methods with inexact searches. Math. Oper. Res. 3, 244–256 (1978)
26. Shanno, D.F.: On the convergence of a new conjugate gradient algorithm. SIAM J. Numer. Anal. 15, 1247–1257 (1978)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.