1 Introduction

The conjugate gradient (CG) methods have played an important role in solving nonlinear optimization problems due to their simple iterations and very low memory requirements [1, 2]. The CG methods are not among the fastest or most robust optimization algorithms available today, but they remain very popular among engineers and mathematicians for solving nonlinear optimization problems [3–5]. The origin of the methods dates back to 1952, when Hestenes and Stiefel introduced a CG method [6] for solving symmetric positive definite linear systems of equations. In the 1960s, Fletcher and Reeves [7] extended this approach to unconstrained nonlinear optimization, which led to the FR conjugate gradient method.

The conjugate gradient methods deflect the steepest descent direction [8] by adding to it a positive multiple of the direction used in the previous step. They require only first-order derivatives and overcome the slow convergence of the steepest descent method. By imposing conjugacy on successive search directions, they enhance the efficiency and reliability of the algorithm. Different conjugate gradient algorithms correspond to different choices of the scalar parameter \(\beta _{k}\) [6, 7, 9]. The parameter \(\beta _{k}\) is selected so as to minimize a convex quadratic function over a subspace spanned by a set of mutually conjugate descent directions, but the effectiveness of the algorithm depends on the accuracy of the line searches.

Quantum calculus, known as q-calculus, is the study of calculus without limits, where the classical formulas are recovered as q approaches 1. In q-calculus, the classical derivative is replaced by the q-difference operator. Jackson [10, 11] gave some of the first applications of q-calculus and introduced the q-analogues of the classical derivative and integral operators. Applications of q-calculus play an important role in various fields of mathematics and physics [12–20].

In 1969, Polak and Ribière [21] and Polyak [22] independently proposed a conjugate gradient method, later called the Polak–Ribière–Polyak (PRP) method. In practical computation, the PRP method performs much better than the FR method on many unconstrained optimization problems because it automatically recovers once a small step-length is generated, although its global convergence was proved only for strictly convex functions [23]. For general nonlinear functions, Powell showed that the PRP method can cycle infinitely without approaching a solution even if the step-length is chosen to be the least positive minimizer of the line search function [24]. To remedy this, Gilbert and Nocedal [25] took up Powell’s suggestion [26] to modify the PRP method and showed that the modification is globally convergent for exact and inexact line searches.

In 2019, Yuan et al. proposed a new modified three-term conjugate gradient algorithm based on the modified Armijo line search technique [27]. In 2020, they designed a modified conjugate gradient method with a sufficient descent property and a trust region property [28]. The authors in [29] proposed a modified Hestenes–Stiefel (HS) conjugate gradient algorithm for solving large-scale complex smooth and nonsmooth optimization problems.

In 2020, Yuan et al. further studied the PRP method and established its global convergence under the modified weak Wolfe–Powell line search technique for nonconvex functions. The numerical results demonstrated the competitiveness of the method compared to existing methods, and the engineering Muskingum model and image restoration problems were used to illustrate the practical performance of the algorithm [30]. Generalized conjugate gradient algorithms have been studied for solving large-scale unconstrained optimization problems arising in real-world applications, and two open problems were formulated [31–33].

The preliminary experimental optimization results using q-calculus were first shown in the field of global optimization [34]. This idea was later utilized in stochastic q-neurons, where activation functions are converted into corresponding stochastic q-activation functions to improve the effectiveness of the algorithm. The q-gradient concept was further utilized in the least mean square algorithm to inherit its fast convergence with less dependency on the eigenvalues of the input correlation matrix [35]. A modified least mean square algorithm based on q-calculus was also proposed, which automatically adapts the learning rate with respect to the error and was shown to converge quickly [36]. In optimization, q-calculus has been employed in Newton, modified Newton, BFGS, and limited memory BFGS methods for solving unconstrained nonlinear optimization problems [19, 37–40], typically with a reduced number of iterations. In the field of conjugate gradient methods, the q-analogue of the Fletcher–Reeves method was developed [41] to optimize unimodal and multimodal functions, where Gaussian perturbations were used in some iterations to ensure global convergence in a probabilistic sense.

In this paper, we propose a q-variant of the PRP method, called q-PRP, whose sufficient descent property holds independently of the line search and of any convexity assumption on the objective function. Under a condition on the q-gradient of the objective function and some other appropriate conditions, the proposed method is globally convergent. Numerical experiments are conducted to show the effectiveness of the q-PRP algorithm. For a set of test functions with different starting points, the method was able to escape from many local minima and reach global minima due to the q-gradient.

The remainder of this paper is organized as follows: In the next section, we present the essential preliminaries. The main results are presented in Sect. 3, and their convergence proofs are given in Sect. 4. Numerical examples illustrating the theoretical results are analyzed in Sect. 5. The paper ends with a conclusion and directions for future work.

2 Essential preliminaries

In this section, we recall the principal notions of q-calculus, assuming \(0< q<1\). The q-integer \([n]_{q}\) is defined by

$$ [n]_{q} = \textstyle\begin{cases} \frac{1-q^{n}}{ 1-q},& q\ne 1, \\ n,& q=1, \end{cases} $$

for all \(n\in \mathbb{N}\). The q-analogue of \((1+x)_{q}^{n}\) is the polynomial given by

$$ (1+x)_{q}^{n} = \textstyle\begin{cases} 1, & n=0, \\ \prod_{k=0}^{n-1} (1+q^{k}x), & n\geq 1. \end{cases} $$

The q-derivative of \(x^{n}\) with respect to x is \([n]_{q}x^{n-1}\). The q-derivative \(D_{q}f\) of a function f is given by

$$ D_{q}f(x) = \frac{f(qx)- f(x)}{qx- x}, $$

if \(x\ne 0\), and \(D_{q}f(0)=f'(0)\), provided \(f'(0)\) exists. Note that

$$ \lim_{q\to 1}D_{q}f(x)=\lim_{q\to 1} \frac{f(qx)-f(x)}{(q-1)x} = \frac{{\mathrm{d}}f(x)}{{\mathrm{d}} x}, $$

if f is differentiable.

Example 2.1

Let the function \(f : \mathbb{R}\to \mathbb{R}\) be such that \(f(x)=\ln x\). Then, we have

$$ \biggl( \frac{{\mathrm{d}}}{ {\mathrm{d}} x} \biggr)_{q} \ln x = \frac{ \ln x - \ln ( qx) }{ (1-q)x } = \frac{ \ln \frac{1}{q}}{(1-q)x}. $$
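To make the operator concrete, the following sketch in R (the language used for the numerical experiments in Sect. 5) implements the one-variable q-derivative and checks it against Example 2.1. The helper name q_derivative and the finite-difference fallback at \(x=0\) and \(q=1\) are our own illustrative choices, not part of the original development.

```r
# A minimal sketch of the q-derivative D_q f(x) = (f(qx) - f(x)) / ((q - 1) x).
# The limiting cases x = 0 and q = 1 fall back to a central finite difference for f'(x).
q_derivative <- function(f, x, q = 0.9, h = 1e-8) {
  if (x == 0 || q == 1) {
    return((f(x + h) - f(x - h)) / (2 * h))
  }
  (f(q * x) - f(x)) / ((q - 1) * x)
}

# Check against Example 2.1: for f(x) = log(x), the q-derivative equals log(1/q) / ((1 - q) x).
x <- 2; q <- 0.5
q_derivative(log, x, q)      # numerical value
log(1 / q) / ((1 - q) * x)   # closed form from Example 2.1; the two values agree
```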

It is obvious that the q-derivative of a function is a linear operator, that is, for any constant a and b, we have [42]

$$ D_{q} \bigl\{ af(x) + bg(x) \bigr\} = aD_{q} f(x) + b D_{q} g(x). $$

Let \(f(x)\) be a continuous function on \([a, b]\), where \(a, b \in \mathbb{R}\). Then, there exist \(\hat{q} \in (0, 1)\) and \(x \in (a,b)\) [43] such that

$$ f(b) - f(a) = D_{q} f(x) (b-a), $$

for all \(q \in (\hat{q}, 1) \cup (1, \hat{q}^{-1})\). The q-partial derivative of a function \(f : \mathbb{R}^{n} \to \mathbb{R}\) at \(x\in \mathbb{R}^{n}\) with respect to \(x_{i}\), where scalar \(q \in (0,1)\), is given as [34]

$$ D_{q, x_{i}} f(x) = \textstyle\begin{cases} \frac{1}{(1-q) x_{i}} [ f ( x_{1}, x_{2},\ldots , x_{i-1}, x_{i}, x_{i+1},\ldots , x_{n} ) \\ \quad {}- f (x_{1}, x_{2},\ldots , x_{i-1}, q x_{i},x_{i+1},\ldots , x_{n} ) ], & x_{i}\ne 0, q\ne 1, \\ \frac{\partial }{\partial x_{i}} f ( x_{1}, x_{2},\ldots , x_{i-1}, 0, x_{i+1},\ldots , x_{n} ),& x_{i}=0, \\ \frac{ \partial }{\partial x_{i}} f ( x_{1}, x_{2},\ldots , x_{i-1}, x_{i}, x_{i+1},\ldots , x_{n} ),& q=1. \end{cases} $$

We now choose the parameter q as a vector, that is,

$$ q=(q_{1},\ldots , q_{i},\ldots , q_{n})^{T} \in \mathbb{R}^{n}. $$

Then, the q-gradient vector [34] of f is

$$ \nabla _{q} f(x)^{T} = \begin{bmatrix} D_{q_{1}, x_{1}} f(x) & \ldots & D_{q_{i}, x_{i}} f(x) & \ldots & D_{q_{n}, x_{n}} f(x) \end{bmatrix} . $$

Let \(\{ q^{k}_{i} \}\) be a real sequence defined by

$$ q^{k+1}_{i} = 1- \frac{ q^{k}_{i}}{ (k+1)^{2}}, $$
(1)

for each \(i=1,\ldots ,n\), where \(k=0,1,2,\ldots \) , and a fixed starting number \(0< q^{0}_{i} < 1\). The sequence \(\{q^{k}\}\) converges to \((1,\ldots , 1)^{T}\) as \(k \to \infty \) [38], and thus the q-gradient reduces to the classical gradient in the limit. For the sake of convenience, we represent the q-gradient vector of f at \(x^{k}\) as

$$ g_{q^{k}} \bigl( x^{k} \bigr) = \nabla _{q^{k}} f \bigl( x^{k} \bigr). $$
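A minimal R sketch of recurrence (1) is given below; the function name and the matrix layout (row \(k+1\) holding \(q^{k}\)) are illustrative choices.

```r
# Recurrence (1): q_i^{k+1} = 1 - q_i^k / (k + 1)^2, i = 1, ..., n, starting from q^0.
q_sequence <- function(q0, kmax) {
  Q <- matrix(NA_real_, nrow = kmax + 1, ncol = length(q0))
  Q[1, ] <- q0                              # row k + 1 stores q^k
  for (k in 0:(kmax - 1)) {
    Q[k + 2, ] <- 1 - Q[k + 1, ] / (k + 1)^2
  }
  Q
}

Q <- q_sequence(q0 = c(0.32, 0.32), kmax = 50)
tail(Q, 3)   # the components approach 1, so the q-gradient approaches the classical gradient
```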

Example 2.2

Consider the function \(f : \mathbb{R}^{2} \to \mathbb{R}\) defined by

$$ f(x) = x_{1} x_{2}^{2} + 4x_{1}^{2}. $$

Then, the q-gradient is given as

$$ \nabla _{q^{k}} f(x)^{T} = \begin{bmatrix} 4(1+q^{k}_{1})x_{1}+x_{2}^{2} & x_{1}(1+q^{k}_{2})x_{2} \end{bmatrix} . $$
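The q-gradient can also be evaluated numerically directly from the q-partial derivative definition above. The following R sketch does so and compares the result with the analytic expression of Example 2.2; the helper name q_gradient, the test point, and the parameter values are illustrative choices.

```r
# Numerical q-gradient of f at x for a parameter vector q; the fallback for x_i = 0 or
# q_i = 1 uses a central finite difference for the classical partial derivative.
q_gradient <- function(f, x, q, h = 1e-8) {
  g <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] == 0 || q[i] == 1) {
      xp <- x; xm <- x; xp[i] <- x[i] + h; xm[i] <- x[i] - h
      g[i] <- (f(xp) - f(xm)) / (2 * h)
    } else {
      xq <- x; xq[i] <- q[i] * x[i]
      g[i] <- (f(x) - f(xq)) / ((1 - q[i]) * x[i])
    }
  }
  g
}

# Check against Example 2.2: f(x) = x1 * x2^2 + 4 * x1^2.
f <- function(x) x[1] * x[2]^2 + 4 * x[1]^2
x <- c(1.5, -2); q <- c(0.7, 0.9)
q_gradient(f, x, q)                    # numerical q-gradient
c(4 * (1 + q[1]) * x[1] + x[2]^2,      # analytic first component
  x[1] * (1 + q[2]) * x[2])            # analytic second component; the two vectors agree
```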

In the next section, we present the q-PRP method. To improve the efficiency, we utilize the q-gradient in inexact line search methods to generate a step-length that ensures a reduction of the objective function value.

3 On q-Polak–Ribière–Polyak conjugate gradient algorithm

Consider the following unconstrained nonlinear optimization problem:

$$ (P) \quad \min_{x\in \mathbb{R}^{n}} f(x), $$

where \(f: \mathbb{R}^{n} \to \mathbb{R}\) is a continuously q-differentiable function. Numerical optimization algorithms for general objective functions differ mainly in how they generate the search directions. In the conjugate gradient algorithms, a sequence of iterates is generated from a given starting point \(x^{0} \in \mathbb{R}^{n}\) by the following scheme:

$$ x^{k+1}=x^{k}+p^{k}, \qquad p^{k}=\alpha _{k}d_{q^{k}}^{k}, $$
(2)

for all \(k\ge 0\), where \(x^{k}\) is the current iterate, \(d_{q^{k}}^{k}\) is a descent direction of f at \(x^{k}\) and \(\alpha _{k}>0\) is the step-length. Note that the descent direction \(d_{q^{k}}^{k} = -g_{q^{k}}^{k}\) leads to the q-steepest descent method [34]. In the case \(q^{k}\) approaches

$$ (1,1,\ldots , 1)^{T} $$

as \(k\to \infty \), the method reduces to the classical steepest descent method [7]. The search direction \(d_{q^{k}}^{k}\) is a descent direction provided that

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}< 0. $$
(3)

The directions \(d_{q^{k}}^{k}\) are generated in the light of classical conjugate direction methods [7, 9, 21, 44, 45] as

$$ d_{ q^{k}}^{k} = \textstyle\begin{cases} -g_{q^{k}}^{k},& k=0, \\ -g_{q^{k}}^{k}+\beta _{k}^{q-\mathrm{PRP}}d_{q^{k-1}}^{k-1},& k \ge 1, \end{cases} $$
(4)

where \(\beta _{k}^{q-\mathrm{PRP}}\in \mathbb{R}\) is the q-analogue of the scalar \(\beta _{k}\) of the PRP method and is given by

$$ \beta _{k}^{q-\mathrm{PRP}} = \frac{ (g_{q^{k}}^{k} )^{T} (g_{q^{k}}^{k}-g_{q^{k-1}}^{k-1} )}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. $$
(5)

Some well-known conjugate gradient methods are the FR (Fletcher–Reeves) [7], PRP (Polak–Ribière–Polyak) [9, 21], and HS (Hestenes–Stiefel) [6] methods. Among these, the PRP method is considered the best in practical computation. In order to guarantee global convergence, we choose \(d_{q^{k}}^{k}\) to satisfy the sufficient descent condition:

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \le - c \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(6)

where \(c>0\) is a constant. There are several approaches to finding the step-length. Among them, the exact line search [46, 47] is time consuming and sometimes difficult to carry out. Therefore, researchers adopt inexact line search techniques such as the Wolfe line search [48], the Goldstein line search [49], or the Armijo line search with backtracking [50]. The most widely used conditions for determining the step-length are the so-called standard Wolfe line search conditions:

$$ f \bigl( x^{k}+\alpha _{k}d_{q^{k}}^{k} \bigr) \le f\bigl(x^{k}\bigr) + \delta \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} $$
(7)

and

$$ g_{q^{k}} \bigl( x^{k} + \alpha _{k}d_{q^{k}}^{k} \bigr)^{T}d_{q^{k}}^{k} \ge \sigma \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}, $$
(8)

where \(0<\delta <\sigma <1\). The first condition (7) is called the Armijo condition and ensures a sufficient reduction of the objective function value, while the second condition (8) is called the curvature condition and rules out unacceptably short step-lengths. To investigate the global convergence of the PRP method, a modified Armijo line search was proposed in [51]. For given constants \(\mu >0\) and \(\delta , \rho \in (0, 1)\), this line search aims to find

$$ \alpha _{k}=\max \biggl\{ \rho ^{j} \frac{\mu \lvert (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} \rvert }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} : j = 0, 1,\ldots \biggr\} $$

such that (2) and (4) satisfy

$$ f \bigl(x^{k+1} \bigr) \le f \bigl(x^{k} \bigr) - \delta \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(9)

and

$$ - C_{1} \bigl\lVert g_{q^{k+1}} \bigl(x^{k+1} \bigr) \bigr\rVert ^{2} \le \bigl( g_{q^{k+1}} \bigl( x^{k+1} \bigr) \bigr)^{T} d_{q^{k+1}}^{k+1} \le -C_{2} \bigl\lVert g_{q^{k+1}} \bigl(x^{k+1} \bigr) \bigr\rVert ^{2}, $$

where \(0< C_{2}<1<C_{1}\) are constants. Accordingly, since by (9) the sequence \(\{ f(x^{k})\}_{k\ge 0}\) is nonincreasing and (under the assumptions of Sect. 4) bounded below, summing (9) over k gives

$$ \sum_{k=0}^{\infty } \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} < \infty . $$

In particular,

$$ \lim_{k\to \infty } \alpha _{k} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert =0. $$
(10)
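For illustration, the following R sketch shows one way to realize the modified Armijo-type line search just described: the trial steps \(\rho ^{j}\mu \lvert (g_{q^{k}}^{k})^{T}d_{q^{k}}^{k}\rvert /\lVert d_{q^{k}}^{k}\rVert ^{2}\) are tried for \(j=0,1,\ldots \) and the first one satisfying (9) is accepted. The function name, the default constants, and the safeguard jmax are assumptions of the sketch only.

```r
# Modified Armijo-type line search on condition (9); gk and dk are the current q-gradient
# and q-descent direction, and fk = f(xk).
armijo_mod <- function(f, xk, fk, gk, dk, mu = 1, rho = 0.5, delta = 1e-4, jmax = 50) {
  base <- mu * abs(sum(gk * dk)) / sum(dk * dk)
  for (j in 0:jmax) {
    alpha <- rho^j * base
    if (f(xk + alpha * dk) <= fk - delta * alpha^2 * sum(dk * dk)) {
      return(alpha)                   # (9) holds; accept this trial step
    }
  }
  alpha                               # fallback: return the smallest trial step examined
}
```

Since the trial steps are examined in decreasing order, the accepted \(\alpha _{k}\) is the largest member of the trial set satisfying (9), as required above.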

It is worth mentioning that a step-length computed by the standard Wolfe line search conditions (7)–(8) may not be sufficiently close to a minimizer of the line search function \(\varphi (\alpha ) = f ( x^{k} + \alpha d_{q^{k}}^{k} )\). Instead, the strong Wolfe line search conditions can be used, which consist of (7) together with the following strengthened version of (8):

$$ \bigl\lvert g_{q^{k}} \bigl(x^{k} + \alpha _{k} d_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert \le -\sigma \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} $$
(11)

From (11), we see that as \(\sigma \to 0\), a step-length satisfying (7) and (11) tends to the exact (optimal) step-length [2]. Note that appropriate choices for a starting point have a positive effect on the computational cost and the convergence speed of the algorithm. The modified PRP conjugate gradient-like method introduced in [52] is presented in the context of q-calculus as:

$$\begin{aligned} d_{q^{k}}^{k} = \textstyle\begin{cases} - g_{q^{k}}^{k}, & k=0, \\ -g_{q^{k}}^{k} + \beta _{k}^{q-\mathrm{PRP}} d_{q^{k-1}}^{k-1} - \theta ^{k} ( g_{q^{k}}^{k}-g_{q^{k-1}}^{ k-1} ),& k>0. \end{cases}\displaystyle \end{aligned}$$
(12)

With the q-gradient, we can have a modification of [52] by taking

$$\begin{aligned} \theta ^{k} = \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1}}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. \end{aligned}$$
(13)

From (12) and (13) for \(k\ge 1\), we obtain

$$ d_{q^{k}}^{k} = -g_{q^{k}}^{k} + \frac{ ( g_{q^{k}}^{k} )^{T} ( g_{q^{k}}^{k}-g_{q^{k-1}}^{k-1} ) }{ \lVert g_{ q^{k-1}}^{k-1} \rVert ^{2} } d_{q^{k-1}}^{k-1} - \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1} }{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl( g_{q^{k}}^{k} - g_{q^{k-1}}^{ k-1} \bigr), $$

that is,

$$\begin{aligned} \bigl( d_{q^{k}}^{k} \bigr)^{T} g_{q^{k}}^{k} &= - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}. \end{aligned}$$
(14)

This implies that \(d_{q^{k}}^{k}\) is a q-descent direction of the objective function at \(x^{k}\). It is worth mentioning that if the exact line search [53] is used to compute the step-length \(\alpha _{k}\), then \(\theta ^{k}=0\); moreover, since \(q^{k}\to (1, 1,\ldots , 1)^{T}\) as \(k\to \infty \), the q-PRP method eventually reduces to the classical PRP method.
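In R, the scalars (5) and (13) and the three-term direction (12) can be sketched as follows; g and g_prev denote the q-gradients at the current and previous iterates, d_prev the previous direction, and the function names are illustrative.

```r
beta_qprp <- function(g, g_prev) sum(g * (g - g_prev)) / sum(g_prev * g_prev)   # (5)

direction_qprp <- function(g, g_prev = NULL, d_prev = NULL) {
  if (is.null(d_prev)) return(-g)                      # k = 0: steepest q-descent direction
  beta  <- beta_qprp(g, g_prev)
  theta <- sum(g * d_prev) / sum(g_prev * g_prev)      # (13)
  -g + beta * d_prev - theta * (g - g_prev)            # (12)
}

# By construction, sum(g * direction_qprp(g, g_prev, d_prev)) equals -sum(g * g),
# which is exactly the q-descent identity (14).
```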

The number of iterations required by the algorithm differs from one problem to another. We present the following Algorithm 1 to solve problem \((P)\).

Algorithm 1 (q-PRP conjugate gradient algorithm)
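The algorithm itself appears only as a figure in the original, so the following self-contained R sketch should be read as one plausible realization of Algorithm 1 rather than the authors' exact code: it combines the q-parameter recurrence (1), a numerical q-gradient, the three-term q-PRP direction (12)–(13), and a backtracking line search on the modified Armijo condition (9). The function names, the default constants, the starting value of q, and the stopping rule (gradient norm below \(10^{-6}\) or 1000 iterations, as in Sect. 5) are assumptions.

```r
qprp <- function(f, x0, q0 = rep(0.5, length(x0)), tol = 1e-6, maxit = 1000,
                 mu = 1, rho = 0.5, delta = 1e-4) {
  qgrad <- function(x, q, h = 1e-8) {                  # numerical q-gradient (Sect. 2)
    sapply(seq_along(x), function(i) {
      if (x[i] == 0 || q[i] == 1) {
        xp <- x; xm <- x; xp[i] <- x[i] + h; xm[i] <- x[i] - h
        (f(xp) - f(xm)) / (2 * h)                      # classical partial derivative
      } else {
        xq <- x; xq[i] <- q[i] * x[i]
        (f(x) - f(xq)) / ((1 - q[i]) * x[i])           # q-partial derivative
      }
    })
  }
  x <- x0; q <- q0
  g <- qgrad(x, q); d <- -g
  for (k in 0:(maxit - 1)) {
    if (sqrt(sum(g * g)) < tol) break
    alpha <- mu * abs(sum(g * d)) / sum(d * d)         # initial trial step of the Armijo-type search
    while (f(x + alpha * d) > f(x) - delta * alpha^2 * sum(d * d) && alpha > 1e-20) {
      alpha <- rho * alpha                             # backtrack until (9) holds
    }
    x_new <- x + alpha * d
    q <- 1 - q / (k + 1)^2                             # recurrence (1): q -> (1, ..., 1)
    g_new <- qgrad(x_new, q)
    beta  <- sum(g_new * (g_new - g)) / sum(g * g)     # (5)
    theta <- sum(g_new * d) / sum(g * g)               # (13)
    d <- -g_new + beta * d - theta * (g_new - g)       # (12)
    x <- x_new; g <- g_new
  }
  list(x = x, f = f(x), iterations = k, gradient = g)
}

# Illustrative call on the Rosenbrock function of Example 5.2 (the starting point is arbitrary):
rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
qprp(rosenbrock, x0 = c(-1.2, 1))
```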

4 Global convergence

In this section, we prove the global convergence of Algorithm 1 under the following assumptions.

Assumption 4.1

The level set

$$ \Omega = \bigl\{ x \in \mathbb{R}^{n} : f(x) \le f \bigl(x^{0}\bigr) \bigr\} , $$

is bounded, where \(x^{0}\) is a starting point.

Assumption 4.2

In some neighborhood N of Ω, f has a continuous q-derivative and there exists a constant \(L>0\) such that

$$ \bigl\lVert g_{q}(x) - g_{q}(y) \bigr\rVert \le L \lVert x-y \rVert , $$
(15)

for \(x, y \in N\).

Since \(\{f(x^{k})\}\) is nonincreasing, it is clear that the sequence \(\{ x^{k} \}\) generated by Algorithm 1 is contained in Ω. From Assumptions 4.1 and 4.2, there is a constant \(\eta >0\) such that

$$ \bigl\lVert g_{q^{k}}(x) \bigr\rVert \le \eta , $$
(16)

for each \(x\in \Omega \). Based on Assumption 4.1, there exists a positive constant \(\mathcal{B}\) such that \(\lVert x\rVert \le \mathcal{B}\) for all \(x\in \Omega \). Throughout, let \(\{x^{k}\}\) and \(\{d_{q^{k}}^{k}\}\) be the iterate sequence and the q-descent direction sequence generated by Algorithm 1. We now present the following lemma.

Lemma 4.1

If there exists a constant \(\epsilon >0\), and \(\{q^{k}\}\) generated by (1) is such that

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert \ge \epsilon , $$
(17)

for all k, then there exists a constant \(\mathcal{M} > 0\) such that the q-descent direction satisfies

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \mathcal{M}, $$
(18)

for all k.

Proof

From (12) and (13) for \(k\ge 1\), we obtain

$$ d_{q^{k}}^{k} = -g_{q^{k}}^{k} + \frac{ ( g_{q^{k}}^{k} )^{T} ( g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} )}{ \lVert g_{q^{k-1}}^{ k-1} \rVert ^{2}} d_{q^{k-1}}^{k-1} - \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1}}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl( g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} \bigr). $$

Taking the norm of both sides of the above equation and using (16), we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \eta + 2 \eta \frac{ \lVert g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} \rVert \lVert d_{q^{k-1}}^{k-1} \rVert }{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. $$

From Assumption 4.2 and (17), we have

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \eta + 2 \eta \frac{ L\alpha _{k-1} \lVert d_{q^{k-1}}^{k-1} \rVert }{ \epsilon ^{2}} \bigl\lVert d_{q^{k-1}}^{ k-1} \bigr\rVert . $$
(19)

From (10), \(\alpha _{k-1}d_{q^{k-1}}^{k-1}\to 0\) and since \(\{q^{k}\}\) approaches \((1,\ldots , 1)^{T}\) as \(k\to \infty \), there exist a constant \(s\in (0, 1)\) and an integer \(k_{0}\) such that the following inequality holds for all \(k\ge k_{0}\):

$$ 2 \eta \frac{ L\alpha _{k-1}}{ \epsilon ^{2}} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert \le s. $$

From (19), we get for any \(k>k_{0}\),

$$\begin{aligned} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert &\le \eta + s \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert \\ &\le \eta ( 1+s) +s^{2} \bigl\lVert d_{q^{k-2}}^{k-2} \bigr\rVert \\ & \quad \vdots \\ &\le \eta \bigl( 1 + s + s^{2} + \cdots + s^{k-k_{0}-1} \bigr) + s^{k-k_{0}} \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert . \end{aligned}$$

For k sufficiently large with \(s\in (0, 1)\), the second term of the above inequality can satisfy

$$ s^{k-k_{0}} \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert < \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert . $$

Thus, we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert < \frac{\eta }{ 1 - s} + \bigl\lVert d_{ q^{k_{0}}}^{k_{0}} \bigr\rVert . $$

Choosing

$$ \mathcal{M} = \max \biggl\{ \bigl\lVert d_{q^{1}}^{1} \bigr\rVert , \bigl\lVert d_{q^{2}}^{2} \bigr\rVert ,\ldots , \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert , \frac{\eta }{1-s}+ \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert \biggr\} , $$

thus we get (18). □

We now show that the modified q-PRP method with the modified Armijo-type line search introduced in [51], adapted to the q-gradient, is globally convergent.

Theorem 4.2

Assume that Assumptions 4.1 and 4.2 hold. Then Algorithm 1 generates an infinite sequence \(\{x^{k}\}\) such that

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$
(20)

Proof

For the sake of obtaining a contradiction, we suppose that the given conclusion is not true. Then, there exists a constant \(\epsilon >0\) such that

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert \ge \epsilon , $$
(21)

for all k. If \(\liminf_{ k \to \infty } \alpha _{k} > 0\), then from (10) and (14), we get

$$ \liminf_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$

This contradicts (21). Suppose now that \(\liminf_{k\to \infty }\alpha _{k}=0\), that is, there is an infinite index set \(\mathcal{K}\) such that

$$ \lim_{\substack{k \to \infty ,\\ k \in \mathcal{K}}} \alpha _{k} = 0. $$

Suppose that Step 9 of Algorithm 1 uses (9) to generate the step-length. For \(k\in \mathcal{K}\) sufficiently large, the trial step \(\rho ^{-1}\alpha _{k}\) with \(\rho \in (0 , 1)\) [52] does not satisfy (9), so we must have

$$ f \bigl( x^{k}+\rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - f \bigl( x^{k} \bigr) > - \delta \rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$
(22)

From the q-mean value theorem, there is \(\gamma _{k} \in (0,1)\) such that

$$ f \bigl( x^{k}+\rho ^{-1} \alpha _{k} d_{q^{k}}^{k} \bigr) - f \bigl(x^{k} \bigr) = \rho ^{-1} \alpha _{k}g_{q^{k}} \bigl( x^{k} + \gamma _{k} \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}, $$

that is,

$$\begin{aligned} f \bigl( x^{k}+\rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) -f \bigl( x^{k} \bigr) &= \rho ^{-1} \alpha _{k} \bigl(g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ & \quad{} + \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}} \bigl( x^{k} + \gamma _{k} \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - g_{q^{k}} \bigl(x^{k} \bigr) \bigr)^{T} d_{q^{k}}^{k}. \end{aligned}$$

From Lemma 4.1 and Assumption 4.2, we have

$$ f \bigl( x^{k} + \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - f \bigl(x^{k} \bigr) \le \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} + L\rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(23)

where \(L>0\). From (22) and (23),

$$ - \delta \rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} \le \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} + L \rho ^{-2} \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$

Using (14), we get

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \le \alpha _{k} ( \delta +L) \rho ^{-1} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$

Since \(\{d_{q^{k}}^{k}\}\) is bounded and \(\lim_{k\in \mathcal{K}, k \to \infty } \alpha _{k}=0\),

$$ \lim_{\substack{k \to \infty ,\\ k \in \mathcal{K}}} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$

This gives a contradiction. The proof is complete. □

The following important result introduced by Zoutendijk [54] can be expressed in the sense of q-calculus as follows:

Lemma 4.3

Suppose that Assumptions 4.1 and 4.2 hold. Consider the iteration methods (2) and (4), where \(d_{q^{k}}^{k}\) satisfies (3) and \(\alpha _{k}\) is obtained by the standard Wolfe line search conditions (7)–(8) or by the strong Wolfe line search conditions (7) and (11). Then,

$$ \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< + \infty . $$
(24)

We now present the convergence analysis of Algorithm 1 with standard Wolfe conditions, which is a modification of [55, 56] in the sense of q-calculus. In this case, the step-lengths are bounded below by a positive constant.

Theorem 4.4

Assume that the line search fulfills the standard Wolfe conditions (7)(8). If there exists a positive constant \(\alpha _{0}\in (0,1]\) such that \(\alpha _{k}\ge \alpha _{0}\) for all \(k\ge 0\), then

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$
(25)

Proof

From (3) and the first Wolfe condition (7), we have

$$\begin{aligned} f \bigl( x^{k+1} \bigr) &\le f \bigl(x^{k} \bigr) + \delta \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ &\le f \bigl( x^{k} \bigr)\le f \bigl( x^{k-1} \bigr)\le \cdots \le f \bigl( x^{0} \bigr). \end{aligned}$$

This means that the sequence \(\{f(x^{k})\}_{k\ge 0}\) is bounded. From the second standard Wolfe condition (8) and Assumption 4.2, we get

$$\begin{aligned} - ( 1 - \sigma ) \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} &\le \bigl( g_{q^{k+1}}^{k+1} - g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ &\le \bigl\lVert g_{q^{k+1}}^{k+1} - g_{q^{k}}^{k} \bigr\rVert \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \alpha _{k} L \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, \end{aligned}$$

that is,

$$ - \frac{ (1-\sigma ) (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k}}{ L \lVert d_{q^{k}}^{k} \rVert ^{2}} \le \alpha _{k}. $$

Multiplying both sides by the positive quantity \(-\delta ( g_{q^{k}}^{k} )^{T}d_{q^{k}}^{k}\), we get

$$ \frac{ ( 1 - \sigma ) \delta ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ L \lVert d_{q^{k}}^{k} \rVert ^{2}} \le - \alpha _{k} \delta \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}. $$

From the first standard Wolfe condition (7), \(-\delta \alpha _{k} ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} \le f (x^{k} ) - f (x^{k+1} )\), and therefore

$$ \frac{ \delta ( 1 - \sigma )}{ L } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \le f \bigl(x^{k} \bigr) - f \bigl(x^{k+1} \bigr). $$

Since \(\{f(x^{k})\}_{k\ge 0}\) is nonincreasing and bounded, the limit \(\lim_{k\to \infty } f(x^{k})\) exists, and hence

$$\begin{aligned} \begin{aligned} \frac{ \delta (1 - \sigma )}{L} \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} &\le f \bigl(x^{0} \bigr) - f \bigl(x^{1} \bigr) + \bigl( f \bigl(x^{1} \bigr) - f \bigl( x^{2} \bigr) \bigr) + \cdots \\ &= f \bigl(x^{0} \bigr) - \lim_{ k \to \infty } f \bigl(x^{k} \bigr)< +\infty . \end{aligned} \end{aligned}$$

Thus, Zoutendijk condition (24) holds, that is,

$$ \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< + \infty . $$
(26)

From Assumption 4.1, there exists a constant \(\mathcal{B}\) such that

$$ \bigl\lVert p^{k} \bigr\rVert = \bigl\lVert \alpha _{k} d_{q^{k}}^{k} \bigr\rVert \le \mathcal{B}. $$

Since \(\alpha _{k}\ge \alpha _{0}\), we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \frac{\mathcal{B}}{\alpha _{0}}. $$

This, together with (6) and (26), leads to (25). □

We present the following theorem, which is a modification of that in [57] using the q-gradient, for the q-PRP method with the strong Wolfe conditions.

Theorem 4.5

Suppose that \(x^{0}\) is a starting point and Assumptions 4.1 and 4.2 hold. Let \(\{x^{k}\}\) be the sequence generated by Algorithm 1. If \(\beta _{k}^{q-\mathrm{PRP}}\) is such that the step-length \(\alpha _{k}\) satisfies the strong Wolfe conditions (7) and (11), then either

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0 \quad \textit{or}\quad \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< \infty . $$
(27)

Proof

From (4), for all \(k\ge 1\), we have

$$ d_{q^{k}}^{k} + g_{q^{k}}^{k} = \beta _{k}^{q-\mathrm{PRP}} d_{q^{k-1}}^{k-1}. $$

Squaring both sides of the above equation, we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} + \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + 2 \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} = \bigl( \beta _{k}^{ q - \mathrm{PRP}} \bigr)^{2} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert ^{2}. $$

Since \(d_{q^{k}}^{k}\) satisfies the descent condition \((g_{q^{k}}^{k})^{T} d_{q^{k}}^{k} < 0\),

$$\begin{aligned} \bigl\lVert d_{q^{k}}^{ k} \bigr\rVert ^{2} \ge - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + \bigl( \beta _{k}^{q-\mathrm{PRP}} \bigr)^{2} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert ^{2}. \end{aligned}$$
(28)

Pre-multiplying (4) for \(k\ge 1\) by \(( g_{q^{k}}^{k} )^{T}\), we get

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} = - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + \beta _{k}^{ q - \mathrm{PRP}} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k-1}}^{k-1}. $$
(29)

From (29) and the second strong Wolfe condition (11), one obtains

$$ \bigl\lvert \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert + \sigma \bigl\lvert \beta _{k}^{q-\mathrm{PRP}} \bigr\rvert \bigl\lvert \bigl( g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr\rvert \ge \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}. $$
(30)

From the inequality

$$ ( a + \sigma b)^{2} \le \bigl( 1 + \sigma ^{2} \bigr) \bigl( a^{2}+b^{2} \bigr), $$

for all a, b, \(\sigma \ge 0\), with

$$ a = \bigl\lvert \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert $$

and

$$ b = \bigl\lvert \beta _{k}^{q- \mathrm{PRP}} \bigr\rvert \bigl\lvert \bigl( g_{q^{k-1}}^{ k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr\rvert , $$

we can express (30) as

$$\begin{aligned} \bigl( \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \bigl( \beta _{k}^{q-\mathrm{PRP}} \bigr)^{2} \bigl( \bigl( g_{q^{k-1} }^{ k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \ge c_{1} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{4}, \end{aligned}$$
(31)

where \(c_{1} = \frac{1}{(1+\sigma ^{2})}\). Note that

$$\begin{aligned} &\frac{ ( (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ &\quad = \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ \bigl( \bigl(g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \frac{ \lVert d_{q^{k}}^{k} \rVert ^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} } \bigl( \bigl(g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \biggr]. \end{aligned}$$

From (28) one gets

$$\begin{aligned} &\frac{ ( (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ & \quad \ge \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ \bigl( \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \bigl( \beta _{k}^{q - \mathrm{PRP}} \bigr)^{2} \bigl( \bigl( g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \\ &\quad \quad{} - \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \biggr]. \end{aligned}$$

Using (31), we obtain

$$ \begin{aligned} &\frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ &\quad \ge \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ c_{1} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{4} - \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \biggr]. \end{aligned} $$
(32)

Suppose that (27) is not true, so that \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero while \(\sum_{k=1}^{\infty } \lVert g_{q^{k}}^{k} \rVert ^{4}/ \lVert d_{q^{k}}^{k} \rVert ^{2} = +\infty \). By the Zoutendijk condition (24), \(( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}/ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} \to 0\), so, together with (16), the last term in the bracket of (32) does not exceed \(\frac{c_{1}}{2} \lVert g_{q^{k}}^{k} \rVert ^{4}\) for all k sufficiently large with \(q^{k}\) approaching \((1,\ldots ,1)^{T}\). Hence (32) yields

$$\begin{aligned} \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \ge \frac{c_{1}}{2} \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \end{aligned}$$
(33)

for all such k. Summing (33) over k and using (24) once more gives \(\sum_{k=1}^{\infty } \lVert g_{q^{k}}^{k} \rVert ^{4}/ \lVert d_{q^{k}}^{k} \rVert ^{2}< \infty \), which contradicts our assumption. The proof is complete. □

The following lemma immediately follows from the above convergence theorem.

Lemma 4.6

Suppose that Assumptions 4.1 and 4.2 hold and that, in Algorithm 1, the step-length is determined by the strong Wolfe conditions. If

$$ \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{r}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} = + \infty , $$
(34)

for some \(r\in [0, 4]\), then the method converges in the sense that \(\lim_{k\to \infty } \lVert g_{q^{k}}^{k} \rVert =0\).

Proof

If the conclusion is not true, then \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero and, from Theorem 4.5, it follows that

$$\begin{aligned} \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} < + \infty . \end{aligned}$$
(35)

Because \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero and \(r\in [0, 4]\), it is easy to see that (35) contradicts (34). Therefore, the lemma is true. □

Lemma 4.6 shows that if a conjugate gradient method fails to converge, then the lengths of the search directions diverge to infinity. Observe that in the above developments the sufficient descent condition is assumed. The lemma is also useful for proving the global convergence of some conjugate gradient methods without assuming the sufficient descent condition.

5 Numerical illustration

In this section, we investigate the computational efficiency of Algorithm 1 using the standard Wolfe conditions (7) and (8) and the strong Wolfe conditions (7) and (11), respectively, in comparison with the classical PRP method under the same two sets of conditions.

All codes for Algorithm 1 and the classical PRP method are written in R version 3.6.1, installed on a laptop with an Intel(R) Core(TM) i3-4005U 1.70 GHz CPU and 4 GB RAM. The iterations were set to terminate when the iteration count exceeded 1000 or the norm of the gradient fell below \(10^{-6}\).

Example 5.1

Consider a function (Mishra 6) [58] \(f : \mathbb{R}^{2}\to \mathbb{R}\) given by

$$\begin{aligned} f(x) &= -\log \bigl( \sin ^{2} \bigl( ( \cos x_{1} + \cos x_{2} )^{2} \bigr)- \cos ^{2} \bigl( ( \sin x_{1} + \sin x_{2} )^{2} \bigr) \bigr) \\ & \quad{} + 0.1 \bigl[ (x_{1}-1)^{2} + (x_{2}-1)^{2} \bigr]. \end{aligned}$$

We find the q-gradient of the above function at the point

$$ x=(2.88 , 1.82)^{T}, $$

with the starting parameter value

$$ q^{1}=(0.32 , 0.32)^{T}. $$

We run the q-gradient algorithm [39] for \(k=1,\ldots ,50\) iterations so that \(q^{50}\) approaches

$$ (0.999607921, 0.999607921)^{T}, $$

and in the 50th iteration we get the q-gradient

$$ g_{q^{50}}^{50}=(-0.41348771, -0.63704079)^{T}. $$

The complete computational details are given in Table 1 and depicted graphically in Fig. 1. Note that Fig. 2 provides a three-dimensional view of the Mishra 6 test function.
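The computation of Example 5.1 can be sketched in R as follows. Since the exact indexing of recurrence (1) used to reach the reported value of \(q^{50}\) is not fully specified here, the loop below is one plausible reading, and its output is not claimed to match Table 1 digit for digit.

```r
# Mishra 6 objective as written above and a plain dilation-type q-gradient (x_i != 0 here).
mishra6 <- function(x) {
  -log(sin((cos(x[1]) + cos(x[2]))^2)^2 - cos((sin(x[1]) + sin(x[2]))^2)^2) +
    0.1 * ((x[1] - 1)^2 + (x[2] - 1)^2)
}
qgrad <- function(f, x, q) {
  sapply(seq_along(x), function(i) {
    xq <- x; xq[i] <- q[i] * x[i]
    (f(x) - f(xq)) / ((1 - q[i]) * x[i])
  })
}

x <- c(2.88, 1.82)
q <- c(0.32, 0.32)                       # starting parameter q^1
for (k in 1:49) q <- 1 - q / (k + 1)^2   # advance recurrence (1) up to q^50
q                                        # close to (1, 1)
qgrad(mishra6, x, q)                     # q-gradient at the 50th parameter value
```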

Figure 1 Graphical representation of the q-gradient of the Mishra 6 function based on Table 1

Figure 2 Three-dimensional view of the Mishra 6 function

Table 1 q-Gradient of Example 5.1

Example 5.2

Consider a function \(f : \mathbb{R}^{2} \to \mathbb{R}\) given by

$$ f(x_{1}, x_{2}) =(1-x_{1})^{2}+100 \bigl(x_{2}-x_{1}^{2} \bigr)^{2}. $$

The Rosenbrock function, also called Rosenbrock’s valley or banana function, is a nonconvex, unimodal, and nonseparable function. Finding its global minimum numerically is difficult. It has only one global minimizer located at the point

$$ x^{*}=(1 , 1)^{T}, $$

with the search range \([-100, 100]\) for \(x_{1}\) and \(x_{2}\). For the experiment, we first generated 37 different starting points from the interval \([-5, 5]\) for the above Rosenbrock function. The numerical results are shown in Table 2 for Algorithm 1 and in Table 3 for the classical PRP algorithm. From these tables, we observe that the number of iterations \((NI)\) is smaller for Algorithm 1 than for the classical PRP method. The columns of the two tables have the same meanings. Figure 3 compares the numbers of iterations graphically.

Figure 3 Graphical representation of the q-PRP and PRP algorithms based on Tables 2 and 3

Table 2 Numerical results of Example 5.2 using Algorithm 1
Table 3 Numerical results of Example 5.2 using classical PRP Algorithm

Example 5.3

Consider the following Rastrigin function \(f : \mathbb{R}^{2} \to \mathbb{R}\), that is,

$$ f(x_{1}, x_{2}) = 20 + x_{1}^{2}+x_{2}^{2} - 10 \bigl( \cos 2\pi x_{1} + \cos 2\pi x_{2} \bigr). $$

The Rastrigin test function is a nonconvex, multimodal, and separable function, which has several local minimizers arranged in a regular lattice, but it has only one global minimizer located at the point

$$ x^{*}=(0, 0)^{T}. $$

The search range for the Rastrigin function is \([-5.12, 5.12]\) in both \(x_{1}\) and \(x_{2}\). This function poses a fairly difficult problem due to its large search space and its large number of local minima. With the chosen starting point \((0.2, 0.2)^{T}\), we minimize this function through Algorithm 1 using the strong Wolfe conditions. Note that q-PRP terminates after 5 iterations with

$$ g_{q^{5}}^{5}=(0.0001900418 , 0.0001900418)^{T}, $$

and step-length \(\alpha _{5}=0.252244535\). Thus, we get the global minimizer

$$ x^{*} =x^{5}= \bigl(-2.05643\times 10^{-8} , -2.05643\times 10^{-8} \bigr)^{T}, $$

with minimum function value

$$ f\bigl(x^{*}\bigr) = 1.669775\times 10^{-13}, $$

while running the classical PRP method using strong Wolfe conditions from the same chosen starting point, it terminates in 5 iterations with

$$ g_{q^{5}}^{5}=\bigl(1.776357\times 10^{-10} , 2.66453\times 10^{-10}\bigr)^{T}, $$

\(\alpha _{5}=0.002547382\), but it fails to reach the global minimizer, since

$$ x^{*}=x^{5}= (-1.990911 , -1.990911)^{T}, $$

and

$$ f\bigl(x^{*}\bigr)=7.967698, $$

which correspond to a local minimizer rather than the global minimizer. This is one of the advantages of using the q-gradient in our proposed method over the classical method.
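The following R sketch contrasts the q-gradient and the classical gradient of the Rastrigin function at the starting point \((0.2, 0.2)^{T}\) used above; the parameter value \(q=(0.5, 0.5)^{T}\) is an illustrative choice. The two gradients differ away from \(q=(1,1)^{T}\), which is the effect the paper credits for the ability of q-PRP to move past local minimizers.

```r
rastrigin <- function(x) 20 + sum(x^2) - 10 * sum(cos(2 * pi * x))
x <- c(0.2, 0.2)

# classical gradient: 2 * x_i + 20 * pi * sin(2 * pi * x_i)
grad_classical <- 2 * x + 20 * pi * sin(2 * pi * x)

# q-gradient via the dilation formula (x_i != 0 here)
q <- c(0.5, 0.5)
grad_q <- sapply(1:2, function(i) {
  xq <- x; xq[i] <- q[i] * x[i]
  (rastrigin(x) - rastrigin(xq)) / ((1 - q[i]) * x[i])
})

rbind(classical = grad_classical, q_gradient = grad_q)
```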

We now execute Algorithm 1 on a set of test functions taken from the CUTEr library [59] with 51 different starting points under the standard and strong Wolfe conditions, respectively. Note that the direction \(d_{q^{k}}^{k}\) generated by the proposed method is a q-descent direction due to the involvement of the q-gradient. Tables 4 and 5 list the numerical results for the 51 different starting points on the set of test problems, and Figs. 4 and 5 compare the q-PRP and classical PRP methods graphically under the standard and strong Wolfe conditions, respectively. We conclude that our method requires fewer iterations than the classical method on the selected set of test problems.

Figure 4 Graphical representation of the q-PRP and PRP algorithms under standard Wolfe conditions based on Tables 4 and 5

Figure 5 Graphical representation of the q-PRP and PRP algorithms under strong Wolfe conditions based on Tables 4 and 5

Table 4 Numerical results using Algorithm 1
Table 5 Numerical results using classical PRP

6 Conclusion and future work

This paper proposed the q-PRP conjugate gradient method, which is an improvement of the classical PRP conjugate gradient method. The global convergence of the proposed method was established under the standard and strong Wolfe line searches. The effectiveness of the proposed method has been shown by several numerical examples. We find that, owing to the q-gradient, the proposed method converges quickly for a set of test problems with different starting points. The inclusion of q-calculus in other conjugate gradient methods deserves further investigation.