1 Introduction

The conjugate gradient (CG) methods have played an important role in solving nonlinear optimization problems due to their simple iterations and very low memory requirements [1, 2]. The CG methods are not among the fastest or most robust optimization algorithms available today, but they remain very popular among engineers and mathematicians for solving nonlinear optimization problems [3–5]. The origin of the methods dates back to 1952, when Hestenes and Stiefel introduced a CG method [6] for solving symmetric positive definite linear systems of equations. In the 1960s, Fletcher and Reeves [7] extended this approach to unconstrained nonlinear optimization, which led to the FR conjugate gradient method.

The conjugate gradient methods deflect the steepest descent direction [8] by adding to it a positive multiple of the direction used in the previous step. They require only first-order derivatives and overcome the slow convergence of the steepest descent method. By imposing conjugacy on successive search directions, they enhance the efficiency and reliability of the algorithm. Different conjugate gradient algorithms correspond to different choices of the scalar parameter \(\beta _{k}\) [6, 7, 9]. The parameter \(\beta _{k}\) is selected so as to minimize a convex quadratic function over a subspace spanned by a set of mutually conjugate descent directions, but the effectiveness of the algorithm depends on the accuracy of the line searches.

Quantum calculus, known as q-calculus, is the study of calculus without limits, where the classical formulas are recovered as q approaches 1. In q-calculus, the classical derivative is replaced by the q-difference operator. Jackson [10, 11] gave some of the first applications of q-calculus and introduced the q-analogues of the classical derivative and integral operators. Applications of q-calculus play an important role in various fields of mathematics and physics [12–20].

In 1969, Polak and Ribière [21] and Polyak [22] independently proposed a conjugate gradient method, later called the Polak–Ribière–Polyak (PRP) method. In practical computation, the PRP method performs much better than the FR method on many unconstrained optimization problems because it automatically recovers once a small step-length is generated, although its global convergence was proved only for strictly convex functions [23]. For general nonlinear functions, Powell showed that the PRP method can cycle infinitely without approaching a solution even if the step-length is chosen to be the least positive minimizer of the line search function [24]. To remedy this, Gilbert and Nocedal [25] took up Powell’s suggestion [26] to modify the PRP method and showed that the modification is globally convergent for exact and inexact line searches.

In 2019, Yuan et al. proposed a new modified three-term conjugate gradient algorithm based on the modified Armijo line search technique [27]. In 2020, they designed a modified conjugate gradient method with a sufficient descent property and a trust region property [28]. The authors in [29] proposed a modified Hestenes–Stiefel (HS) conjugate gradient algorithm for solving large-scale complex smooth and nonsmooth optimization problems.

In 2020, Yuan et al. further studied the PRP method and established its global convergence under the modified weak Wolfe–Powell line search technique for nonconvex functions. The numerical results demonstrated the competitiveness of the method compared to existing methods, and the engineering Muskingum model and image restoration problems were used to illustrate the practical performance of the algorithm [30]. Generalized conjugate gradient algorithms have been studied for solving large-scale unconstrained optimization problems arising in real-world applications, and two open problems were formulated [31–33].

The preliminary experimental optimization results using q-calculus were first shown in the field of global optimization [34]. This idea was later utilized in stochastic q-neurons, where activation functions are converted into corresponding stochastic q-activation functions to improve the effectiveness of the algorithm. The q-gradient concept was further utilized in the least mean square algorithm to inherit its fast convergence with less dependency on the eigenvalues of the input correlation matrix [35]. A modified least mean square algorithm based on q-calculus was also proposed, which automatically adapts the learning rate with respect to the error and was shown to converge quickly [36]. In optimization, q-calculus has been employed in Newton, modified Newton, BFGS, and limited memory BFGS methods for solving unconstrained nonlinear optimization problems [19, 37–40], typically with a reduced number of iterations. In the field of conjugate gradient methods, the q-analogue of the Fletcher–Reeves method was developed [41] to optimize unimodal and multimodal functions, where Gaussian perturbations were used in some iterations to ensure global convergence in a probabilistic sense.

In this paper, we propose a q-variant of the PRP method, called q-PRP, whose sufficient descent property holds independently of the line search and of any convexity assumption on the objective function. Under a condition on the q-gradient of the objective function and some other appropriate conditions, the proposed method is globally convergent. Numerical experiments are conducted to show the effectiveness of the q-PRP algorithm. For a set of test functions with different starting points, the method was able to escape from many local minima and reach global minima due to the q-gradient.

The remainder of this paper is organized as follows: In the next section, we present the essential preliminaries. The main results are presented in Sect. 3, and their convergence proofs are given in Sect. 4. Numerical examples illustrating the theoretical results are analyzed in Sect. 5. The paper ends with a conclusion and directions for future work.

2 Essential preliminaries

In this section, we recall the principal notions of q-calculus, assuming \(0< q<1\). The q-integer \([n]_{q}\) is defined by

$$ [n]_{q} = \textstyle\begin{cases} \frac{1-q^{n}}{ 1-q},& q\ne 1, \\ n,& q=1, \end{cases} $$

for all \(n\in \mathbb{N}\). The q-analogue of \((1+x)_{q}^{n}\) is the polynomial given by

$$ (1+x)_{q}^{n} = \textstyle\begin{cases} 1, & n=0, \\ \prod_{k=0}^{n-1} (1+q^{k}x), & n\geq 1. \end{cases} $$

The q-derivative of \(x^{n}\) with respect to x is \([n]_{q}x^{n-1}\). The q-derivative \(D_{q}f\) of a function f is given by

$$ D_{q}f(x) = \frac{f(qx)- f(x)}{qx- x}, $$

if \(x\ne 0\), and \(D_{q}f(0)=f'(0)\), provided \(f'(0)\) exists. Note that

$$ \lim_{q\to 1}D_{q}f(x)=\lim_{q\to 1} \frac{f(qx)-f(x)}{(q-1)x} = \frac{{\mathrm{d}}f(x)}{{\mathrm{d}} x}, $$

if f is differentiable.

Example 2.1

Let the function \(f : \mathbb{R}\to \mathbb{R}\) be such that \(f(x)=\ln x\). Then, we have

$$ \biggl( \frac{{\mathrm{d}}}{ {\mathrm{d}} x} \biggr)_{q} \ln x = \frac{ \ln x - \ln ( qx) }{ (1-q)x } = \frac{ \ln \frac{1}{q}}{(1-q)x}. $$
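To make the operator concrete, the following sketch in R (the language used for the numerical experiments in Sect. 5) implements the one-variable q-derivative and checks it against Example 2.1. The helper name q_derivative and the finite-difference fallback at \(x=0\) and \(q=1\) are our own illustrative choices, not part of the original development.

```r
# A minimal sketch of the q-derivative D_q f(x) = (f(qx) - f(x)) / ((q - 1) x).
# The limiting cases x = 0 and q = 1 fall back to a central finite difference for f'(x).
q_derivative <- function(f, x, q = 0.9, h = 1e-8) {
  if (x == 0 || q == 1) {
    return((f(x + h) - f(x - h)) / (2 * h))
  }
  (f(q * x) - f(x)) / ((q - 1) * x)
}

# Check against Example 2.1: for f(x) = log(x), the q-derivative equals log(1/q) / ((1 - q) x).
x <- 2; q <- 0.5
q_derivative(log, x, q)      # numerical value
log(1 / q) / ((1 - q) * x)   # closed form from Example 2.1; the two values agree
```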

It is obvious that the q-derivative of a function is a linear operator, that is, for any constant a and b, we have [42]

$$ D_{q} \bigl\{ af(x) + bg(x) \bigr\} = aD_{q} f(x) + b D_{q} g(x). $$

Let \(f(x)\) be a continuous function on \([a, b]\), where \(a, b \in \mathbb{R}\). Then, there exist \(\hat{q} \in (0, 1)\) and \(x \in (a,b)\) [43] such that

$$ f(b) - f(a) = D_{q} f(x) (b-a), $$

for all \(q \in (\hat{q}, 1) \cup (1, \hat{q}^{-1})\). The q-partial derivative of a function \(f : \mathbb{R}^{n} \to \mathbb{R}\) at \(x\in \mathbb{R}^{n}\) with respect to \(x_{i}\), where scalar \(q \in (0,1)\), is given as [34]

$$ D_{q, x_{i}} f(x) = \textstyle\begin{cases} \frac{1}{(1-q) x_{i}} [ f ( x_{1}, x_{2},\ldots , x_{i-1}, x_{i}, x_{i+1},\ldots , x_{n} ) \\ \quad {}- f (x_{1}, x_{2},\ldots , x_{i-1}, q x_{i},x_{i+1},\ldots , x_{n} ) ], & x_{i}\ne 0, q\ne 1, \\ \frac{\partial }{\partial x_{i}} f ( x_{1}, x_{2},\ldots , x_{i-1}, 0, x_{i+1},\ldots , x_{n} ),& x_{i}=0, \\ \frac{ \partial }{\partial x_{i}} f ( x_{1}, x_{2},\ldots , x_{i-1}, x_{i}, x_{i+1},\ldots , x_{n} ),& q=1. \end{cases} $$

We now choose the parameter q as a vector, that is,

$$ q=(q_{1},\ldots , q_{i},\ldots , q_{n})^{T} \in \mathbb{R}^{n}. $$

Then, the q-gradient vector [34] of f is

$$ \nabla _{q} f(x)^{T} = \begin{bmatrix} D_{q_{1}, x_{1}} f(x) & \ldots & D_{q_{i}, x_{i}} f(x) & \ldots & D_{q_{n}, x_{n}} f(x) \end{bmatrix} . $$

Let \(\{ q^{k}_{i} \}\) be a real sequence defined by

$$ q^{k+1}_{i} = 1- \frac{ q^{k}_{i}}{ (k+1)^{2}}, $$
(1)

for each \(i=1,\ldots ,n\), where \(k=0,1,2,\ldots \) , and a fixed starting number \(0< q^{0}_{i} < 1\). The sequence \(\{q^{k}\}\) converges to \((1,\ldots , 1)^{T}\) as \(k \to \infty \) [38], and thus the q-gradient reduces to the classical gradient in the limit. For the sake of convenience, we represent the q-gradient vector of f at \(x^{k}\) as

$$ g_{q^{k}} \bigl( x^{k} \bigr) = \nabla _{q^{k}} f \bigl( x^{k} \bigr). $$
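A minimal R sketch of recurrence (1) is given below; the function name and the matrix layout (row \(k+1\) holding \(q^{k}\)) are illustrative choices.

```r
# Recurrence (1): q_i^{k+1} = 1 - q_i^k / (k + 1)^2, i = 1, ..., n, starting from q^0.
q_sequence <- function(q0, kmax) {
  Q <- matrix(NA_real_, nrow = kmax + 1, ncol = length(q0))
  Q[1, ] <- q0                              # row k + 1 stores q^k
  for (k in 0:(kmax - 1)) {
    Q[k + 2, ] <- 1 - Q[k + 1, ] / (k + 1)^2
  }
  Q
}

Q <- q_sequence(q0 = c(0.32, 0.32), kmax = 50)
tail(Q, 3)   # the components approach 1, so the q-gradient approaches the classical gradient
```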

Example 2.2

Consider the function \(f : \mathbb{R}^{2} \to \mathbb{R}\) defined by

$$ f(x) = x_{1} x_{2}^{2} + 4x_{1}^{2}. $$

Then, the q-gradient is given as

$$ \nabla _{q^{k}} f(x)^{T} = \begin{bmatrix} 4(1+q^{k}_{1})x_{1}+x_{2}^{2} & x_{1}(1+q^{k}_{2})x_{2} \end{bmatrix} . $$
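The q-gradient can also be evaluated numerically directly from the q-partial derivative definition above. The following R sketch does so and compares the result with the analytic expression of Example 2.2; the helper name q_gradient, the test point, and the parameter values are illustrative choices.

```r
# Numerical q-gradient of f at x for a parameter vector q; the fallback for x_i = 0 or
# q_i = 1 uses a central finite difference for the classical partial derivative.
q_gradient <- function(f, x, q, h = 1e-8) {
  g <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] == 0 || q[i] == 1) {
      xp <- x; xm <- x; xp[i] <- x[i] + h; xm[i] <- x[i] - h
      g[i] <- (f(xp) - f(xm)) / (2 * h)
    } else {
      xq <- x; xq[i] <- q[i] * x[i]
      g[i] <- (f(x) - f(xq)) / ((1 - q[i]) * x[i])
    }
  }
  g
}

# Check against Example 2.2: f(x) = x1 * x2^2 + 4 * x1^2.
f <- function(x) x[1] * x[2]^2 + 4 * x[1]^2
x <- c(1.5, -2); q <- c(0.7, 0.9)
q_gradient(f, x, q)                    # numerical q-gradient
c(4 * (1 + q[1]) * x[1] + x[2]^2,      # analytic first component
  x[1] * (1 + q[2]) * x[2])            # analytic second component; the two vectors agree
```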

In the next section, we present the q-PRP method. To improve the efficiency, we utilize the q-gradient in inexact line search methods to generate a step-length that ensures a reduction of the objective function value.

3 On q-Polak–Ribière–Polyak conjugate gradient algorithm

Consider the following unconstrained nonlinear optimization problem:

$$ (P) \quad \min_{x\in \mathbb{R}^{n}} f(x), $$

where \(f: \mathbb{R}^{n} \to \mathbb{R}\) is a continuously q-differentiable function. Numerical optimization algorithms for general objective functions differ mainly in how they generate the search directions. In the conjugate gradient algorithms, a sequence of iterates is generated from a given starting point \(x^{0} \in \mathbb{R}^{n}\) by the following scheme:

$$ x^{k+1}=x^{k}+p^{k}, \qquad p^{k}=\alpha _{k}d_{q^{k}}^{k}, $$
(2)

for all \(k\ge 0\), where \(x^{k}\) is the current iterate, \(d_{q^{k}}^{k}\) is a descent direction of f at \(x^{k}\) and \(\alpha _{k}>0\) is the step-length. Note that the descent direction \(d_{q^{k}}^{k} = -g_{q^{k}}^{k}\) leads to the q-steepest descent method [34]. In the case \(q^{k}\) approaches

$$ (1,1,\ldots , 1)^{T} $$

as \(k\to \infty \), the method reduces to the classical steepest descent method [7]. The search direction \(d_{q^{k}}^{k}\) is a descent direction provided that

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}< 0. $$
(3)

The directions \(d_{q^{k}}^{k}\) are generated in the light of classical conjugate direction methods [7, 9, 21, 44, 45] as

$$ d_{ q^{k}}^{k} = \textstyle\begin{cases} -g_{q^{k}}^{k},& k=0, \\ -g_{q^{k}}^{k}+\beta _{k}^{q-\mathrm{PRP}}d_{q^{k-1}}^{k-1},& k \ge 1, \end{cases} $$
(4)

where \(\beta _{k}^{q-\mathrm{PRP}}\in \mathbb{R}\) is the q-analogue of the scalar \(\beta _{k}\) of the PRP method and is given by

$$ \beta _{k}^{q-\mathrm{PRP}} = \frac{ (g_{q^{k}}^{k} )^{T} (g_{q^{k}}^{k}-g_{q^{k-1}}^{k-1} )}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. $$
(5)

Some well-known conjugate gradient methods are the FR (Fletcher–Reeves) [7], PRP (Polak–Ribière–Polyak) [9, 21], and HS (Hestenes–Stiefel) [6] methods. Among these, the PRP method is considered the best in practical computation. In order to guarantee global convergence, we choose \(d_{q^{k}}^{k}\) to satisfy the sufficient descent condition:

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \le - c \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(6)

where \(c>0\) is a constant. There are several approaches to finding the step-length. Among them, the exact line search [46, 47] is time consuming and sometimes difficult to carry out. Therefore, researchers adopt inexact line search techniques such as the Wolfe line search [48], the Goldstein line search [49], or the Armijo line search with backtracking [50]. The most widely used conditions for determining the step-length are the so-called standard Wolfe line search conditions:

$$ f \bigl( x^{k}+\alpha _{k}d_{q^{k}}^{k} \bigr) \le f\bigl(x^{k}\bigr) + \delta \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} $$
(7)

and

$$ g_{q^{k}} \bigl( x^{k} + \alpha _{k}d_{q^{k}}^{k} \bigr)^{T}d_{q^{k}}^{k} \ge \sigma \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}, $$
(8)

where \(0<\delta <\sigma <1\). The first condition (7) is called the Armijo condition and ensures a sufficient reduction of the objective function value, while the second condition (8) is called the curvature condition and rules out unacceptably short step-lengths. To investigate the global convergence of the PRP method, a modified Armijo line search was proposed in [51]. For given constants \(\mu >0\) and \(\delta , \rho \in (0, 1)\), this line search aims to find

$$ \alpha _{k}=\max \biggl\{ \rho ^{j} \frac{\mu \lvert (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} \rvert }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} : j = 0, 1,\ldots \biggr\} $$

such that (2) and (4) satisfy

$$ f \bigl(x^{k+1} \bigr) \le f \bigl(x^{k} \bigr) - \delta \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(9)

and

$$ - C_{1} \bigl\lVert g_{q^{k+1}} \bigl(x^{k+1} \bigr) \bigr\rVert ^{2} \le \bigl( g_{q^{k+1}} \bigl( x^{k+1} \bigr) \bigr)^{T} d_{q^{k+1}}^{k+1} \le -C_{2} \bigl\lVert g_{q^{k+1}} \bigl(x^{k+1} \bigr) \bigr\rVert ^{2}, $$

where \(0< C_{2}<1<C_{1}\) are constants. Accordingly, since by (9) the sequence \(\{ f(x^{k})\}_{k\ge 0}\) is nonincreasing and (under the assumptions of Sect. 4) bounded below, summing (9) over k gives

$$ \sum_{k=0}^{\infty } \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} < \infty . $$

In particular,

$$ \lim_{k\to \infty } \alpha _{k} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert =0. $$
(10)
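For illustration, the following R sketch shows one way to realize the modified Armijo-type line search just described: the trial steps \(\rho ^{j}\mu \lvert (g_{q^{k}}^{k})^{T}d_{q^{k}}^{k}\rvert /\lVert d_{q^{k}}^{k}\rVert ^{2}\) are tried for \(j=0,1,\ldots \) and the first one satisfying (9) is accepted. The function name, the default constants, and the safeguard jmax are assumptions of the sketch only.

```r
# Modified Armijo-type line search on condition (9); gk and dk are the current q-gradient
# and q-descent direction, and fk = f(xk).
armijo_mod <- function(f, xk, fk, gk, dk, mu = 1, rho = 0.5, delta = 1e-4, jmax = 50) {
  base <- mu * abs(sum(gk * dk)) / sum(dk * dk)
  for (j in 0:jmax) {
    alpha <- rho^j * base
    if (f(xk + alpha * dk) <= fk - delta * alpha^2 * sum(dk * dk)) {
      return(alpha)                   # (9) holds; accept this trial step
    }
  }
  alpha                               # fallback: return the smallest trial step examined
}
```

Since the trial steps are examined in decreasing order, the accepted \(\alpha _{k}\) is the largest member of the trial set satisfying (9), as required above.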

It is worth mentioning that a step-length computed by the standard Wolfe line search conditions (7)–(8) may not be sufficiently close to a minimizer of the line search function \(\varphi (\alpha ) = f ( x^{k} + \alpha d_{q^{k}}^{k} )\). Instead, the strong Wolfe line search conditions can be used, which consist of (7) together with the following strengthened version of (8):

$$ \bigl\lvert g_{q^{k}} \bigl(x^{k} + \alpha _{k} d_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert \le -\sigma \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} $$
(11)

From (11), we see that as \(\sigma \to 0\), a step-length satisfying (7) and (11) tends to the exact (optimal) step-length [2]. Note that appropriate choices for a starting point have a positive effect on the computational cost and the convergence speed of the algorithm. The modified PRP conjugate gradient-like method introduced in [52] is presented in the context of q-calculus as:

$$\begin{aligned} d_{q^{k}}^{k} = \textstyle\begin{cases} - g_{q^{k}}^{k}, & k=0, \\ -g_{q^{k}}^{k} + \beta _{k}^{q-\mathrm{PRP}} d_{q^{k-1}}^{k-1} - \theta ^{k} ( g_{q^{k}}^{k}-g_{q^{k-1}}^{ k-1} ),& k>0. \end{cases}\displaystyle \end{aligned}$$
(12)

With the q-gradient, we can have a modification of [52] by taking

$$\begin{aligned} \theta ^{k} = \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1}}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. \end{aligned}$$
(13)

From (12) and (13) for \(k\ge 1\), we obtain

$$ d_{q^{k}}^{k} = -g_{q^{k}}^{k} + \frac{ ( g_{q^{k}}^{k} )^{T} ( g_{q^{k}}^{k}-g_{q^{k-1}}^{k-1} ) }{ \lVert g_{ q^{k-1}}^{k-1} \rVert ^{2} } d_{q^{k-1}}^{k-1} - \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1} }{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl( g_{q^{k}}^{k} - g_{q^{k-1}}^{ k-1} \bigr), $$

that is,

$$\begin{aligned} \bigl( d_{q^{k}}^{k} \bigr)^{T} g_{q^{k}}^{k} &= - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}. \end{aligned}$$
(14)

This implies that \(d_{q^{k}}^{k}\) is a q-descent direction of the objective function at \(x^{k}\). It is worth mentioning that if the exact line search [53] is used to compute the step-length \(\alpha _{k}\), then \(\theta ^{k}=0\); moreover, since \(q^{k}\to (1, 1,\ldots , 1)^{T}\) as \(k\to \infty \), the q-PRP method eventually reduces to the classical PRP method.
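In R, the scalars (5) and (13) and the three-term direction (12) can be sketched as follows; g and g_prev denote the q-gradients at the current and previous iterates, d_prev the previous direction, and the function names are illustrative.

```r
beta_qprp <- function(g, g_prev) sum(g * (g - g_prev)) / sum(g_prev * g_prev)   # (5)

direction_qprp <- function(g, g_prev = NULL, d_prev = NULL) {
  if (is.null(d_prev)) return(-g)                      # k = 0: steepest q-descent direction
  beta  <- beta_qprp(g, g_prev)
  theta <- sum(g * d_prev) / sum(g_prev * g_prev)      # (13)
  -g + beta * d_prev - theta * (g - g_prev)            # (12)
}

# By construction, sum(g * direction_qprp(g, g_prev, d_prev)) equals -sum(g * g),
# which is exactly the q-descent identity (14).
```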

The number of iterations required by the algorithm differs from one problem to another. We present the following Algorithm 1 to solve problem \((P)\).

Algorithm 1 (q-PRP conjugate gradient algorithm)
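The algorithm itself appears only as a figure in the original, so the following self-contained R sketch should be read as one plausible realization of Algorithm 1 rather than the authors' exact code: it combines the q-parameter recurrence (1), a numerical q-gradient, the three-term q-PRP direction (12)–(13), and a backtracking line search on the modified Armijo condition (9). The function names, the default constants, the starting value of q, and the stopping rule (gradient norm below \(10^{-6}\) or 1000 iterations, as in Sect. 5) are assumptions.

```r
qprp <- function(f, x0, q0 = rep(0.5, length(x0)), tol = 1e-6, maxit = 1000,
                 mu = 1, rho = 0.5, delta = 1e-4) {
  qgrad <- function(x, q, h = 1e-8) {                  # numerical q-gradient (Sect. 2)
    sapply(seq_along(x), function(i) {
      if (x[i] == 0 || q[i] == 1) {
        xp <- x; xm <- x; xp[i] <- x[i] + h; xm[i] <- x[i] - h
        (f(xp) - f(xm)) / (2 * h)                      # classical partial derivative
      } else {
        xq <- x; xq[i] <- q[i] * x[i]
        (f(x) - f(xq)) / ((1 - q[i]) * x[i])           # q-partial derivative
      }
    })
  }
  x <- x0; q <- q0
  g <- qgrad(x, q); d <- -g
  for (k in 0:(maxit - 1)) {
    if (sqrt(sum(g * g)) < tol) break
    alpha <- mu * abs(sum(g * d)) / sum(d * d)         # initial trial step of the Armijo-type search
    while (f(x + alpha * d) > f(x) - delta * alpha^2 * sum(d * d) && alpha > 1e-20) {
      alpha <- rho * alpha                             # backtrack until (9) holds
    }
    x_new <- x + alpha * d
    q <- 1 - q / (k + 1)^2                             # recurrence (1): q -> (1, ..., 1)
    g_new <- qgrad(x_new, q)
    beta  <- sum(g_new * (g_new - g)) / sum(g * g)     # (5)
    theta <- sum(g_new * d) / sum(g * g)               # (13)
    d <- -g_new + beta * d - theta * (g_new - g)       # (12)
    x <- x_new; g <- g_new
  }
  list(x = x, f = f(x), iterations = k, gradient = g)
}

# Illustrative call on the Rosenbrock function of Example 5.2 (the starting point is arbitrary):
rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
qprp(rosenbrock, x0 = c(-1.2, 1))
```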

4 Global convergence

In this section, we prove the global convergence of Algorithm 1 under the following assumptions.

Assumption 4.1

The level set

$$ \Omega = \bigl\{ x \in \mathbb{R}^{n} : f(x) \le f \bigl(x^{0}\bigr) \bigr\} , $$

is bounded, where \(x^{0}\) is a starting point.

Assumption 4.2

In some neighborhood N of Ω, f has a continuous q-derivative and there exists a constant \(L>0\) such that

$$ \bigl\lVert g_{q}(x) - g_{q}(y) \bigr\rVert \le L \lVert x-y \rVert , $$
(15)

for \(x, y \in N\).

Since \(\{f(x^{k})\}\) is nonincreasing, it is clear that the sequence \(\{ x^{k} \}\) generated by Algorithm 1 is contained in Ω. From Assumptions 4.1 and 4.2, there is a constant \(\eta >0\) such that

$$ \bigl\lVert g_{q^{k}}(x) \bigr\rVert \le \eta , $$
(16)

for each \(x\in \Omega \). Based on Assumption 4.1, there exists a positive constant \(\mathcal{B}\) such that \(\lVert x\rVert \le \mathcal{B}\) for all \(x\in \Omega \). Throughout, let \(\{x^{k}\}\) and \(\{d_{q^{k}}^{k}\}\) be the iterate sequence and the q-descent direction sequence generated by Algorithm 1. We now present the following lemma.

Lemma 4.1

If there exists a constant \(\epsilon >0\), and \(\{q^{k}\}\) generated by (1) is such that

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert \ge \epsilon , $$
(17)

for all k, then there exists a constant \(\mathcal{M} > 0\) such that the q-descent direction satisfies

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \mathcal{M}, $$
(18)

for all k.

Proof

From (12) and (13) for \(k\ge 1\), we obtain

$$ d_{q^{k}}^{k} = -g_{q^{k}}^{k} + \frac{ ( g_{q^{k}}^{k} )^{T} ( g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} )}{ \lVert g_{q^{k-1}}^{ k-1} \rVert ^{2}} d_{q^{k-1}}^{k-1} - \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1}}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl( g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} \bigr). $$

Taking the norm of both sides of the above equation and using (16), we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \eta + 2 \eta \frac{ \lVert g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} \rVert \lVert d_{q^{k-1}}^{k-1} \rVert }{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. $$

From Assumption 4.2 and (17), we have

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \eta + 2 \eta \frac{ L\alpha _{k-1} \lVert d_{q^{k-1}}^{k-1} \rVert }{ \epsilon ^{2}} \bigl\lVert d_{q^{k-1}}^{ k-1} \bigr\rVert . $$
(19)

From (10), \(\alpha _{k-1}d_{q^{k-1}}^{k-1}\to 0\) and since \(\{q^{k}\}\) approaches \((1,\ldots , 1)^{T}\) as \(k\to \infty \), there exist a constant \(s\in (0, 1)\) and an integer \(k_{0}\) such that the following inequality holds for all \(k\ge k_{0}\):

$$ 2 \eta \frac{ L\alpha _{k-1}}{ \epsilon ^{2}} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert \le s. $$

From (19), we get for any \(k>k_{0}\),

$$\begin{aligned} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert &\le \eta + s \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert \\ &\le \eta ( 1+s) +s^{2} \bigl\lVert d_{q^{k-2}}^{k-2} \bigr\rVert \\ & \quad \vdots \\ &\le \eta \bigl( 1 + s + s^{2} + \cdots + s^{k-k_{0}-1} \bigr) + s^{k-k_{0}} \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert . \end{aligned}$$

For k sufficiently large with \(s\in (0, 1)\), the second term of the above inequality can satisfy

$$ s^{k-k_{0}} \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert < \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert . $$

Thus, we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert < \frac{\eta }{ 1 - s} + \bigl\lVert d_{ q^{k_{0}}}^{k_{0}} \bigr\rVert . $$

Choosing

$$ \mathcal{M} = \max \biggl\{ \bigl\lVert d_{q^{1}}^{1} \bigr\rVert , \bigl\lVert d_{q^{2}}^{2} \bigr\rVert ,\ldots , \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert , \frac{\eta }{1-s}+ \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert \biggr\} , $$

thus we get (18). □

We now show that the modified q-PRP method with the modified Armijo-type line search introduced in [51], adapted to the q-gradient, is globally convergent.

Theorem 4.2

Assume that Assumptions 4.1 and 4.2 hold. Then Algorithm 1 generates an infinite sequence \(\{x^{k}\}\) such that

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$
(20)

Proof

For the sake of obtaining a contradiction, we suppose that the given conclusion is not true. Then, there exists a constant \(\epsilon >0\) such that

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert \ge \epsilon , $$
(21)

for all k. If \(\liminf_{ k \to \infty } \alpha _{k} > 0\), then from (10) and (14), we get

$$ \liminf_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$

This contradicts (21). Suppose now that \(\liminf_{k\to \infty }\alpha _{k}=0\), that is, there is an infinite index set \(\mathcal{K}\) such that

$$ \lim_{\substack{k \to \infty ,\\ k \in \mathcal{K}}} \alpha _{k} = 0. $$

Suppose that Step 9 of Algorithm 1 uses (9) to generate the step-length. For \(k\in \mathcal{K}\) sufficiently large, the trial step \(\rho ^{-1}\alpha _{k}\) with \(\rho \in (0 , 1)\) [52] does not satisfy (9), so we must have

$$ f \bigl( x^{k}+\rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - f \bigl( x^{k} \bigr) > - \delta \rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$
(22)

From the q-mean value theorem, there is \(\gamma _{k} \in (0,1)\) such that

$$ f \bigl( x^{k}+\rho ^{-1} \alpha _{k} d_{q^{k}}^{k} \bigr) - f \bigl(x^{k} \bigr) = \rho ^{-1} \alpha _{k}g_{q^{k}} \bigl( x^{k} + \gamma _{k} \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}, $$

that is,

$$\begin{aligned} f \bigl( x^{k}+\rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) -f \bigl( x^{k} \bigr) &= \rho ^{-1} \alpha _{k} \bigl(g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ & \quad{} + \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}} \bigl( x^{k} + \gamma _{k} \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - g_{q^{k}} \bigl(x^{k} \bigr) \bigr)^{T} d_{q^{k}}^{k}. \end{aligned}$$

From Lemma 4.1 and Assumption 4.2, we have

$$ f \bigl( x^{k} + \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - f \bigl(x^{k} \bigr) \le \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} + L\rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(23)

where \(L>0\). From (22) and (23),

$$ - \delta \rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} \le \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} + L \rho ^{-2} \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$

Using (14), we get

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \le \alpha _{k} ( \delta +L) \rho ^{-1} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$

Since \(\{d_{q^{k}}^{k}\}\) is bounded and \(\lim_{k\in \mathcal{K}, k \to \infty } \alpha _{k}=0\),

$$ \lim_{\substack{k \to \infty ,\\ k \in \mathcal{K}}} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$

This gives a contradiction. The proof is complete. □

The following important result introduced by Zoutendijk [54] can be expressed in the sense of q-calculus as follows:

Lemma 4.3

Suppose that Assumptions 4.1 and 4.2 hold. Consider the iteration methods (2) and (4), where \(d_{q^{k}}^{k}\) satisfies (3) and \(\alpha _{k}\) is obtained by the standard Wolfe line search conditions (7)–(8) or by the strong Wolfe line search conditions (7) and (11). Then,

$$ \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< + \infty . $$
(24)

We now present the convergence analysis of Algorithm 1 with standard Wolfe conditions, which is a modification of [55, 56] in the sense of q-calculus. In this case, the step-lengths are bounded below by a positive constant.

Theorem 4.4

Assume that the line search fulfills the standard Wolfe conditions (7)(8). If there exists a positive constant \(\alpha _{0}\in (0,1]\) such that \(\alpha _{k}\ge \alpha _{0}\) for all \(k\ge 0\), then

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$
(25)

Proof

From (3) and the first Wolfe condition (7), we have

$$\begin{aligned} f \bigl( x^{k+1} \bigr) &\le f \bigl(x^{k} \bigr) + \delta \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ &\le f \bigl( x^{k} \bigr)\le f \bigl( x^{k-1} \bigr)\le \cdots \le f \bigl( x^{0} \bigr). \end{aligned}$$

This means that the sequence \(\{f(x^{k})\}_{k\ge 0}\) is bounded. From the second standard Wolfe condition (8) and Assumption 4.2, we get

$$\begin{aligned} - ( 1 - \sigma ) \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} &\le \bigl( g_{q^{k+1}}^{k+1} - g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ &\le \bigl\lVert g_{q^{k+1}}^{k+1} - g_{q^{k}}^{k} \bigr\rVert \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \alpha _{k} L \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, \end{aligned}$$

that is,

$$ - \frac{ (1-\sigma ) (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k}}{ L \lVert d_{q^{k}}^{k} \rVert ^{2}} \le \alpha _{k}. $$

Multiplying both sides by the positive quantity \(-\delta ( g_{q^{k}}^{k} )^{T}d_{q^{k}}^{k}\), we get

$$ \frac{ ( 1 - \sigma ) \delta ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ L \lVert d_{q^{k}}^{k} \rVert ^{2}} \le - \alpha _{k} \delta \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}. $$

From the first standard Wolfe condition (7), \(-\delta \alpha _{k} ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} \le f (x^{k} ) - f (x^{k+1} )\), and therefore

$$ \frac{ \delta ( 1 - \sigma )}{ L } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \le f \bigl(x^{k} \bigr) - f \bigl(x^{k+1} \bigr). $$

Since \(\{f(x^{k})\}_{k\ge 0}\) is nonincreasing and bounded, the limit \(\lim_{k\to \infty } f(x^{k})\) exists, and hence

$$\begin{aligned} \begin{aligned} \frac{ \delta (1 - \sigma )}{L} \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} &\le f \bigl(x^{0} \bigr) - f \bigl(x^{1} \bigr) + \bigl( f \bigl(x^{1} \bigr) - f \bigl( x^{2} \bigr) \bigr) + \cdots \\ &= f \bigl(x^{0} \bigr) - \lim_{ k \to \infty } f \bigl(x^{k} \bigr)< +\infty . \end{aligned} \end{aligned}$$

Thus, Zoutendijk condition (24) holds, that is,

$$ \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< + \infty . $$
(26)

From Assumption 4.1, there exists a constant \(\mathcal{B}\) such that

$$ \bigl\lVert p^{k} \bigr\rVert = \bigl\lVert \alpha _{k} d_{q^{k}}^{k} \bigr\rVert \le \mathcal{B}. $$

Since \(\alpha _{k}\ge \alpha _{0}\), we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \frac{\mathcal{B}}{\alpha _{0}}. $$

This, together with (6) and (26), leads to (25). □

We present the following theorem, which is a modification of that in [57] using the q-gradient, for the q-PRP method with the strong Wolfe conditions.

Theorem 4.5

Suppose that \(x^{0}\) is a starting point and Assumptions 4.1 and 4.2 hold. Let \(\{x^{k}\}\) be the sequence generated by Algorithm 1. If \(\beta _{k}^{q-\mathrm{PRP}}\) is such that the step-length \(\alpha _{k}\) satisfies the strong Wolfe conditions (7) and (11), then either

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0 \quad \textit{or}\quad \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< \infty . $$
(27)

Proof

From (4), for all \(k\ge 1\), we have

$$ d_{q^{k}}^{k} + g_{q^{k}}^{k} = \beta _{k}^{q-\mathrm{PRP}} d_{q^{k-1}}^{k-1}. $$

Squaring both sides of the above equation, we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} + \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + 2 \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} = \bigl( \beta _{k}^{ q - \mathrm{PRP}} \bigr)^{2} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert ^{2}. $$

Since \(d_{q^{k}}^{k}\) satisfies the descent condition \((g_{q^{k}}^{k})^{T} d_{q^{k}}^{k} < 0\),

$$\begin{aligned} \bigl\lVert d_{q^{k}}^{ k} \bigr\rVert ^{2} \ge - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + \bigl( \beta _{k}^{q-\mathrm{PRP}} \bigr)^{2} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert ^{2}. \end{aligned}$$
(28)

Pre-multiplying (4) for \(k\ge 1\) by \(( g_{q^{k}}^{k} )^{T}\), we get

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} = - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + \beta _{k}^{ q - \mathrm{PRP}} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k-1}}^{k-1}. $$
(29)

From (29) and the second strong Wolfe condition (11), one obtains

$$ \bigl\lvert \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert + \sigma \bigl\lvert \beta _{k}^{q-\mathrm{PRP}} \bigr\rvert \bigl\lvert \bigl( g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr\rvert \ge \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}. $$
(30)

From the inequality

$$ ( a + \sigma b)^{2} \le \bigl( 1 + \sigma ^{2} \bigr) \bigl( a^{2}+b^{2} \bigr), $$

for all a, b, \(\sigma \ge 0\), with

$$ a = \bigl\lvert \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert $$

and

$$ b = \bigl\lvert \beta _{k}^{q- \mathrm{PRP}} \bigr\rvert \bigl\lvert \bigl( g_{q^{k-1}}^{ k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr\rvert , $$

we can express (30) as

$$\begin{aligned} \bigl( \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \bigl( \beta _{k}^{q-\mathrm{PRP}} \bigr)^{2} \bigl( \bigl( g_{q^{k-1} }^{ k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \ge c_{1} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{4}, \end{aligned}$$
(31)

where \(c_{1} = \frac{1}{(1+\sigma ^{2})}\). Note that

$$\begin{aligned} &\frac{ ( (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ &\quad = \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ \bigl( \bigl(g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \frac{ \lVert d_{q^{k}}^{k} \rVert ^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} } \bigl( \bigl(g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \biggr]. \end{aligned}$$

From (28) one gets

$$\begin{aligned} &\frac{ ( (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ & \quad \ge \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ \bigl( \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \bigl( \beta _{k}^{q - \mathrm{PRP}} \bigr)^{2} \bigl( \bigl( g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \\ &\quad \quad{} - \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \biggr]. \end{aligned}$$

Using (31), we obtain

$$ \begin{aligned} &\frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ &\quad \ge \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ c_{1} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{4} - \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \biggr]. \end{aligned} $$
(32)

Suppose that (27) is not true, so that \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero while \(\sum_{k=1}^{\infty } \lVert g_{q^{k}}^{k} \rVert ^{4}/ \lVert d_{q^{k}}^{k} \rVert ^{2} = +\infty \). By the Zoutendijk condition (24), \(( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}/ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} \to 0\), so, together with (16), the last term in the bracket of (32) does not exceed \(\frac{c_{1}}{2} \lVert g_{q^{k}}^{k} \rVert ^{4}\) for all k sufficiently large with \(q^{k}\) approaching \((1,\ldots ,1)^{T}\). Hence (32) yields

$$\begin{aligned} \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \ge \frac{c_{1}}{2} \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \end{aligned}$$
(33)

for all such k. Summing (33) over k and using (24) once more gives \(\sum_{k=1}^{\infty } \lVert g_{q^{k}}^{k} \rVert ^{4}/ \lVert d_{q^{k}}^{k} \rVert ^{2}< \infty \), which contradicts our assumption. The proof is complete. □

The following lemma immediately follows from the above convergence theorem.

Lemma 4.6

Suppose that Assumptions 4.1 and 4.2 hold and that, in Algorithm 1, the step-length is determined by the strong Wolfe conditions. If

$$ \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{r}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} = + \infty , $$
(34)

for some \(r\in [0, 4]\), then the method converges in the sense that \(\lim_{k\to \infty } \lVert g_{q^{k}}^{k} \rVert =0\).

Proof

If the conclusion is not true, then \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero and, from Theorem 4.5, it follows that

$$\begin{aligned} \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} < + \infty . \end{aligned}$$
(35)

Because \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero and \(r\in [0, 4]\), it is easy to see that (35) contradicts (34). Therefore, the lemma is true. □

Lemma 4.6 shows that if a conjugate gradient method fails to converge, then the lengths of the search directions diverge to infinity. Observe that in the above developments the sufficient descent condition is assumed. The lemma is also useful for proving the global convergence of some conjugate gradient methods without assuming the sufficient descent condition.

5 Numerical illustration

In this section, we investigate the computational efficiency of Algorithm 1 using the standard Wolfe conditions (7) and (8) and the strong Wolfe conditions (7) and (11), respectively, in comparison with the classical PRP method under the same two sets of conditions.

All codes for Algorithm 1 and the classical PRP method are written in R version 3.6.1, installed on a laptop with an Intel(R) Core(TM) i3-4005U 1.70 GHz CPU and 4 GB RAM. The iterations were set to terminate when the iteration count exceeded 1000 or the norm of the gradient fell below \(10^{-6}\).

Example 5.1

Consider a function (Mishra 6) [58] \(f : \mathbb{R}^{2}\to \mathbb{R}\) given by

$$\begin{aligned} f(x) &= -\log \bigl( \sin ^{2} \bigl( ( \cos x_{1} + \cos x_{2} )^{2} \bigr)- \cos ^{2} \bigl( ( \sin x_{1} + \sin x_{2} )^{2} \bigr) \bigr) \\ & \quad{} + 0.1 \bigl[ (x_{1}-1)^{2} + (x_{2}-1)^{2} \bigr]. \end{aligned}$$

We find the q-gradient of the above function at the point

$$ x=(2.88 , 1.82)^{T}, $$

with the starting parameter value

$$ q^{1}=(0.32 , 0.32)^{T}. $$

We run the q-gradient algorithm [39] for \(k=1,\ldots ,50\) iterations so that \(q^{50}\) approaches

$$ (0.999607921, 0.999607921)^{T}, $$

and in the 50th iteration we get the q-gradient

$$ g_{q^{50}}^{50}=(-0.41348771, -0.63704079)^{T}. $$

The complete computational details are given in Table 1 and depicted graphically in Fig. 1. Note that Fig. 2 provides a three-dimensional view of the Mishra 6 test function.
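The computation of Example 5.1 can be sketched in R as follows. Since the exact indexing of recurrence (1) used to reach the reported value of \(q^{50}\) is not fully specified here, the loop below is one plausible reading, and its output is not claimed to match Table 1 digit for digit.

```r
# Mishra 6 objective as written above and a plain dilation-type q-gradient (x_i != 0 here).
mishra6 <- function(x) {
  -log(sin((cos(x[1]) + cos(x[2]))^2)^2 - cos((sin(x[1]) + sin(x[2]))^2)^2) +
    0.1 * ((x[1] - 1)^2 + (x[2] - 1)^2)
}
qgrad <- function(f, x, q) {
  sapply(seq_along(x), function(i) {
    xq <- x; xq[i] <- q[i] * x[i]
    (f(x) - f(xq)) / ((1 - q[i]) * x[i])
  })
}

x <- c(2.88, 1.82)
q <- c(0.32, 0.32)                       # starting parameter q^1
for (k in 1:49) q <- 1 - q / (k + 1)^2   # advance recurrence (1) up to q^50
q                                        # close to (1, 1)
qgrad(mishra6, x, q)                     # q-gradient at the 50th parameter value
```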

Figure 1 Graphical representation of the q-gradient of the Mishra 6 function based on Table 1

Figure 2 Three-dimensional view of the Mishra 6 function

Table 1 q-Gradient of Example 5.1

Example 5.2

Consider a function \(f : \mathbb{R}^{2} \to \mathbb{R}\) given by

$$ f(x_{1}, x_{2}) =(1-x_{1})^{2}+100 \bigl(x_{2}-x_{1}^{2} \bigr)^{2}. $$

The Rosenbrock function, also called Rosenbrock’s valley or banana function, is a nonconvex, unimodal, and nonseparable function. Finding its global minimum numerically is difficult. It has only one global minimizer located at the point

$$ x^{*}=(1 , 1)^{T}, $$

with the search range \([-100, 100]\) for \(x_{1}\) and \(x_{2}\). For the experiment, we first generated 37 different starting points from the interval \([-5, 5]\) for the above Rosenbrock function. The numerical results are shown in Table 2 for Algorithm 1 and in Table 3 for the classical PRP algorithm. From these tables, we observe that the number of iterations \((NI)\) is smaller for Algorithm 1 than for the classical PRP method. The columns of the two tables have the same meanings. Figure 3 compares the numbers of iterations graphically.

Figure 3 Graphical representation of the q-PRP and PRP algorithms based on Tables 2 and 3

Table 2 Numerical results of Example 5.2 using Algorithm 1
Table 3 Numerical results of Example 5.2 using classical PRP Algorithm

Example 5.3

Consider the following Rastrigin function \(f : \mathbb{R}^{2} \to \mathbb{R}\), that is,

$$ f(x_{1}, x_{2}) = 20 + x_{1}^{2}+x_{2}^{2} - 10 \bigl( \cos 2\pi x_{1} + \cos 2\pi x_{2} \bigr). $$

The Rastrigin test function is a nonconvex, multimodal, and separable function, which has several local minimizers arranged in a regular lattice, but it has only one global minimizer located at the point

$$ x^{*}=(0, 0)^{T}. $$

The search range for the Rastrigin function is \([-5.12, 5.12]\) in both \(x_{1}\) and \(x_{2}\). This function poses a fairly difficult problem due to its large search space and its large number of local minima. With the chosen starting point \((0.2, 0.2)^{T}\), we minimize this function through Algorithm 1 using the strong Wolfe conditions. Note that q-PRP terminates after 5 iterations with

$$ g_{q^{5}}^{5}=(0.0001900418 , 0.0001900418)^{T}, $$

and step-length \(\alpha _{5}=0.252244535\). Thus, we get the global minimizer

$$ x^{*} =x^{5}= \bigl(-2.05643\times 10^{-8} , -2.05643\times 10^{-8} \bigr)^{T}, $$

with minimum function value

$$ f\bigl(x^{*}\bigr) = 1.669775\times 10^{-13}, $$

while running the classical PRP method using strong Wolfe conditions from the same chosen starting point, it terminates in 5 iterations with

$$ g_{q^{5}}^{5}=\bigl(1.776357\times 10^{-10} , 2.66453\times 10^{-10}\bigr)^{T}, $$

\(\alpha _{5}=0.002547382\), but it fails to reach the global minimizer, since

$$ x^{*}=x^{5}= (-1.990911 , -1.990911)^{T}, $$

and

$$ f\bigl(x^{*}\bigr)=7.967698, $$

which correspond to a local minimizer rather than the global minimizer. This is one of the advantages of using the q-gradient in our proposed method over the classical method.
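The following R sketch contrasts the q-gradient and the classical gradient of the Rastrigin function at the starting point \((0.2, 0.2)^{T}\) used above; the parameter value \(q=(0.5, 0.5)^{T}\) is an illustrative choice. The two gradients differ away from \(q=(1,1)^{T}\), which is the effect the paper credits for the ability of q-PRP to move past local minimizers.

```r
rastrigin <- function(x) 20 + sum(x^2) - 10 * sum(cos(2 * pi * x))
x <- c(0.2, 0.2)

# classical gradient: 2 * x_i + 20 * pi * sin(2 * pi * x_i)
grad_classical <- 2 * x + 20 * pi * sin(2 * pi * x)

# q-gradient via the dilation formula (x_i != 0 here)
q <- c(0.5, 0.5)
grad_q <- sapply(1:2, function(i) {
  xq <- x; xq[i] <- q[i] * x[i]
  (rastrigin(x) - rastrigin(xq)) / ((1 - q[i]) * x[i])
})

rbind(classical = grad_classical, q_gradient = grad_q)
```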

We now execute Algorithm 1 on a set of test functions taken from the CUTEr library [59] with 51 different starting points under the standard and strong Wolfe conditions, respectively. Note that the direction \(d_{q^{k}}^{k}\) generated by the proposed method is a q-descent direction due to the involvement of the q-gradient. Tables 4 and 5 list the numerical results for the 51 different starting points on the set of test problems, and Figs. 4 and 5 compare the q-PRP and classical PRP methods graphically under the standard and strong Wolfe conditions, respectively. We conclude that our method requires fewer iterations than the classical method on the selected set of test problems.

Figure 4 Graphical representation of the q-PRP and PRP algorithms under standard Wolfe conditions based on Tables 4 and 5

Figure 5 Graphical representation of the q-PRP and PRP algorithms under strong Wolfe conditions based on Tables 4 and 5

Table 4 Numerical results using Algorithm 1
Table 5 Numerical results using classical PRP

6 Conclusion and future work

This paper proposed the q-PRP conjugate gradient method, which is an improvement of the classical PRP conjugate gradient method. The global convergence of the proposed method was established under the standard and strong Wolfe line searches. The effectiveness of the proposed method has been shown by several numerical examples. We find that, owing to the q-gradient, the proposed method converges quickly for a set of test problems with different starting points. The inclusion of q-calculus in other conjugate gradient methods deserves further investigation.