Abstract
This paper focuses on a class of nonlinear optimization problems subject to linear inequality constraints whose objective functions have unavailable derivatives. We propose a derivative-free trust-region method with an interior backtracking technique for this class of problems. The proposed algorithm has four properties. First, the derivative-free strategy reduces the algorithm’s requirement for first- or second-order derivative information. Second, the interior backtracking technique not only reduces the number of iterations needed to solve the trust-region subproblem but also ensures global convergence to standard stationary points. Third, the local convergence rate is analyzed under some reasonable assumptions. Finally, numerical experiments demonstrate that the new algorithm is effective.
1 Introduction
In this paper, we analyze the solution of the following nonlinear optimization problem:
$$ \min_{x\in\Re^{n}} f(x) \quad \text{s.t. } Ax\geqslant b, $$(1)
where \(f(x)\) is a nonlinear twice continuously differentiable function, but its first-order or second-order derivatives are not explicitly available, \(A\stackrel{\mathrm{def}}{=}[a_{1}^{T},a_{2}^{T},\ldots,a_{m}^{T}]^{T} \in\Re^{m\times n}\) with \(a_{i}^{T} \in\Re^{n}\) and \(b\stackrel{\mathrm{def}}{=}[b_{1},b_{2},\ldots,b_{m}]^{T} \in\Re ^{m}\). The feasible set, in (1), is denoted by \(\Omega\stackrel{\mathrm{def}}{=} \{ x\in\Re^{n}| Ax\geqslant b \} \) and the strict interior feasible set is \(\operatorname {int}(\Omega)\stackrel{\mathrm{def}}{=} \{ x\in\Re^{n}| Ax> b \} \).
1.1 Affine-scaling matrix for inequality constraints
The KKT system of (1) is
where \(\lambda_{f}\in\Re^{m}\). A feasible point \(x^{*}\) is said to be a stationary point of problem (1) if there exists a vector \(0\leqslant\lambda_{f^{*}}\in\Re^{m}\) such that the KKT system (2) holds.
To solve this KKT system, several effective affine-scaling algorithms have been designed. Reference [1] proposed an affine-scaling trust-region method with interior-point technique for bound-constrained semismooth equations. Reference [2] introduced affine-scaling interior-point Newton methods for bound-constrained nonlinear optimization. In particular, [3] proved the superlinear and quadratic convergence properties of affine-scaling interior-point Newton methods for bound optimization problems without the strict complementarity assumption. Different affine-scaling matrices lead to different algorithms. In [4], the Dikin affine scaling was defined by
Moreover, the diagonal matrix \(C_{f_{k}}\stackrel{\mathrm{def}}{=} \operatorname {diag}\{ \vert \lambda_{f_{k}} \vert \}\) was presented in [4]. Then \(\lambda_{f_{k}}\) can be obtained as a least-squares Lagrangian multiplier approximation computed by
One efficient affine-scaling interior-point trust-region model, presented in [5] and [6], is written in the form
where \(\nabla f(x_{k})\) is the gradient of \(f(x)\) at the current iteration, \(H_{f_{k}}\) is either \(\nabla^{2} f(x_{k})\) or its approximation. Furthermore, \(\Vert \nabla f_{k}^{T}h_{f_{k}} \Vert \leqslant \varepsilon\), where
and ε is a small enough constant, is usually considered as the termination criterion in this class of algorithms.
Motivation
The above discussion illustrates that the affine-scaling interior-point trust-region method is an effective way to solve nonlinear optimization problems with inequality constraints. The trust-region framework guarantees stable numerical performance. However, in Eqs. (4)–(6) the first- and second-order derivatives play important roles in the computation, so these methods may fail on problems like (1), whose derivatives are not available. If both the feasibility and the stability of the algorithm need to be guaranteed, we should consider derivative-free trust-region methods.
1.2 Derivative-free technique for trust-region subproblem
Since the first- or second-order derivatives of objective functions are not explicitly available, derivative-free optimization algorithms have been favored by researchers for some time. The forms in which derivative-free theory is applied are diverse [7, 8] and widely used. Reference [9] proposed a derivative-free algorithm for least-squares minimization, and proved the local convergence in [10]. Reference [11] presented a derivative-free approach to constrained multiobjective nonsmooth optimization. Reference [12] presented a higher-order contingent derivative of perturbation maps in multiobjective optimization. In [13], Conn et al. proposed an unconstrained derivative-free trust-region method. They constructed the trust-region subproblem
by using a polynomial interpolation technique, where \(\nabla m(x_{k})=g_{k}\), and \(\nabla^{2} m(x_{k})=H_{m_{k}}\). Following this idea, we consider that \(Y _{k}=\{y_{k}^{0}, y_{k}^{1},\ldots, y_{k}^{t}\}\) is an interpolation sample set around the current iteration point \(x_{k}\), and we construct the trust-region subproblem
\(C_{m_{k}}\stackrel{\mathrm{def}}{=}\operatorname {diag}\{ \vert \lambda _{m_{k}} \vert \}\) with \(\lambda_{m_{k}}\) obtained from
We should note that the gradient and Hessian in (5) and (7), (4) and (8), (6) and (9) are different. Meanwhile, since the algorithm in this paper adopts both the decrease direction p and the stepsize α to update the iteration point, we give a new definition of the error bounds between the objective function \(f(x_{k}+\alpha p)\) and the approximation function \(m(x_{k}+\alpha p)\) to ensure the global convergence. We shall show the details after assumption (A1).
Assumption
-
(A1)
Suppose that a level set \(\mathcal{L}(x_{0})\) and a maximal radius \(\Delta_{\max}\) are given. Assume that f is twice continuously differentiable with Lipschitz continuous Hessian in an appropriate open domain containing the \(\Delta_{\max}\) neighborhood \(\bigcup_{x\in\mathcal{L}(x_{0})}B(x, \Delta_{\max})\) of the set \(\mathcal {L}(x_{0})\).
Definition 1
Let f be a function satisfying (A1), and let \(\mathcal{M}= \{ m:\Re ^{n}\rightarrow\Re, m \in C^{2} \} \) be a set of model functions. Suppose there exist positive constants \(\kappa_{ef}\), \(\kappa_{eg}\), \(\kappa _{eh}\), and \(\kappa_{blh}\), such that, for any \(x\in\mathcal {L}(x_{0})\), \(\Delta\in(0,\Delta_{\max}]\), and \(\alpha\in(0,1]\), there is a model function \(m(x+\alpha p) \in\mathcal{M}\), with Lipschitz continuous Hessian and corresponding Lipschitz constant bounded by \(\kappa_{blh}\), such that:
1. the error between the Hessian of the model \(m(x +\alpha p)\) and the Hessian of the function \(f(x +\alpha p)\) satisfies
$$ \bigl\Vert \nabla^{2}f(x+\alpha p)-\nabla ^{2}m(x+\alpha p) \bigr\Vert \leqslant\kappa_{eh} \alpha \Delta, \quad \forall p \in B(0,\Delta); $$(10)
2. the error between the gradient of the model \(m(x +\alpha p)\) and the gradient of the function \(f(x +\alpha p)\) satisfies
$$ \bigl\Vert \nabla f(x+\alpha p)-\nabla m(x +\alpha p) \bigr\Vert \leqslant\kappa_{eg} \alpha^{2} \Delta ^{2}, \quad \forall p \in B(0,\Delta); $$(11)
3. the error between the model \(m(x +\alpha p)\) and the function \(f(x +\alpha p)\) satisfies
$$ \bigl\Vert f(x +\alpha p)- m(x +\alpha p) \bigr\Vert \leqslant \kappa_{ef} \alpha^{3} \Delta^{3}, \quad \forall p \in B(0, \Delta). $$(12)
Such a model m is called fully quadratic on \(B(x,\Delta)\).
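To make Definition 1 concrete, the following minimal Python sketch builds a quadratic interpolation model in two variables and measures the function-value error of (12) with \(\alpha=1\) on two radii. The sample set, the test function, and the names `quad_features`, `fit_quadratic_model` and `max_model_error` are illustrative assumptions, not the construction used in Algorithm 1.

```python
import numpy as np


def quad_features(d):
    """Basis [1, d1, d2, d1^2/2, d1*d2, d2^2/2] of quadratics in a 2-D displacement d."""
    d1, d2 = d
    return np.array([1.0, d1, d2, 0.5 * d1 * d1, d1 * d2, 0.5 * d2 * d2])


def fit_quadratic_model(f, x, delta):
    """Interpolate f on six points of B(x, delta).

    Returns (c, g, H) with m(x + d) = c + g^T d + 0.5 d^T H d.
    """
    s = np.sqrt(0.5)
    # Assumed sample set: centre, four axis points, one diagonal point.
    D = delta * np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1], [s, s]], dtype=float)
    Phi = np.array([quad_features(d) for d in D])
    vals = np.array([f(x + d) for d in D])
    coef = np.linalg.solve(Phi, vals)          # exact interpolation conditions
    c, g = coef[0], coef[1:3]
    H = np.array([[coef[3], coef[4]], [coef[4], coef[5]]])
    return c, g, H


def max_model_error(f, x, delta, alpha=1.0, n_dirs=200, seed=0):
    """Largest sampled |f(x + alpha p) - m(x + alpha p)| over p in B(0, delta)."""
    rng = np.random.default_rng(seed)
    c, g, H = fit_quadratic_model(f, x, delta)
    worst = 0.0
    for _ in range(n_dirs):
        p = rng.normal(size=2)
        p *= delta * rng.uniform() / np.linalg.norm(p)   # random point of B(0, delta)
        d = alpha * p
        m_val = c + g @ d + 0.5 * d @ H @ d
        worst = max(worst, abs(f(x + d) - m_val))
    return worst


# Illustrative smooth test function and base point (both assumptions).
f = lambda z: np.exp(z[0]) + np.sin(z[1]) + z[0] * z[1]
x = np.array([0.3, -0.2])
e_big = max_model_error(f, x, 0.2)
e_small = max_model_error(f, x, 0.1)
print(e_small / e_big)   # ratio of sampled errors when the radius is halved
```

If the model behaves as a fully quadratic one, halving \(\Delta\) should reduce the sampled error by a factor of roughly eight, in line with the \(\Delta^{3}\) term in (12).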
In this paper, we aim to present a class of derivative-free trust-region methods for nonlinear programming with linear inequality constraints. The main features of this paper are:
-
We use the derivatives of the approximation function \(m(x_{k}+\alpha p)\) in place of the derivatives of the objective function \(f(x_{k}+\alpha p)\) to reduce the algorithm’s requirement for the gradient and Hessian at the iterates. We solve an affine-scaling trust-region subproblem to find a feasible search direction in each iteration.
-
In the kth iteration, a feasible search direction p is obtained from an affine-scaling trust-region subproblem. Meanwhile, an interior backtracking technique is applied both to determine the stepsize α and to guarantee the feasibility of the iterate.
-
We will show that the iterates generated by the proposed algorithm converge to stationary points of (1).
-
Local convergence will be given under some reasonable assumptions.
This paper is organized as follows. We describe a class of derivative-free trust-region methods in Sect. 2. The main results, including the global convergence property and the local convergence rate, are discussed in Sect. 3. The numerical results are illustrated in Sect. 4. Finally, we give some conclusions.
Notation
In this paper, \(\Vert \cdot \Vert \) is the 2-norm for a vector and the induced 2-norm for a matrix. \(B\subset\Re^{n}\) is a closed ball and \(B(x,\Delta)\) is the closed ball centered at x, with radius \(\Delta>0\). Y is a sample set and \(\mathcal{L}(x_{0})= \{ x\in\Re ^{n}|f(x)\leqslant f(x_{0}),A x\geqslant b \} \) is the level set about the objective function f. We use the subscript \(f_{k}\) and subscript \(m_{k}\) to distinguish the relevant information between the original function and the approximate function. For example, \(H_{f_{k}}\) is the Hessian of f at kth iteration and \(H_{m_{k}}\) is the Hessian of \(m_{k}\) at kth iteration.
2 A derivative-free trust-region method with interior backtracking technique
To solve the optimization problem (1) when not all first- or second-order derivatives are available, we design a derivative-free trust-region method. The affine-scaling matrix for the linear inequality constraints is defined by (3). We choose a stepsize \(\alpha_{k}\) satisfying the following inequalities:
Moreover, set
where \(\theta_{k}\in(\theta_{0},1]\), for some \(0<\theta_{0}<1\). The parameter \(\theta_{k}\) ensures that the iterates generated by the algorithm are strictly interior. Combined with (13a), (13b) and (14), this interior backtracking technique guarantees the feasibility of the iterates. The algorithm possesses the trust-region property, and the derivative-free technique is reflected in the trust-region subproblem (7), since the gradient \(g_{k}\) and Hessian \(H_{m_{k}}\) come from the approximation function; they differ from \(\nabla f_{k}\) and \(H_{f_{k}}\) in (5) but satisfy the error bounds (11) and (12). We adopt \(\Vert g_{k}^{T}h_{m_{k}} \Vert \) as the termination criterion. Now we present the derivative-free trust-region method in detail (see Algorithm 1).
Remark 1
We add a backtracking interior line-search technique to the algorithm, which helps to reduce the number of iterations. Inequality (13a) guarantees the descent property of \(f(x)\), and (13b) ensures the feasibility of \(x_{k}+\alpha_{k} p_{k}\).
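The role of the two conditions can be sketched in a few lines of Python. The sketch assumes that (13a) is the Armijo-type test \(f(x_{k}+\alpha p_{k})\leqslant f(x_{k})+\kappa_{1}\alpha g_{k}^{T}p_{k}\) (the form that appears with \(\alpha=1\) in the proof of Theorem 3) and that (13b) is \(A(x_{k}+\alpha p_{k})\geqslant b\). The function name `backtrack`, the reduction factor `shrink` and the default values are hypothetical and are not taken from Algorithm 1.

```python
def backtrack(f, x, p, g, A, b, kappa1=1e-4, shrink=0.5, max_trials=50):
    """Return the first alpha in {1, shrink, shrink^2, ...} that is feasible and
    gives sufficient decrease, or None if no trial step is accepted.

    f : objective (only function values are used)
    g : model gradient g_k, used in place of the unavailable gradient of f
    A, b : data of the constraints A x >= b, given as lists of rows / numbers
    """
    fx = f(x)
    slope = sum(gi * pi for gi, pi in zip(g, p))          # g_k^T p_k
    alpha = 1.0
    for _ in range(max_trials):
        trial = [xi + alpha * pi for xi, pi in zip(x, p)]
        # Assumed form of (13b): every constraint a_i^T x >= b_i still holds.
        feasible = all(
            sum(aij * tj for aij, tj in zip(row, trial)) >= bi
            for row, bi in zip(A, b)
        )
        # Assumed form of (13a): Armijo-type sufficient decrease.
        if feasible and f(trial) <= fx + kappa1 * alpha * slope:
            return alpha
        alpha *= shrink
    return None


# Small usage example with an illustrative quadratic and one constraint x_0 <= 3.
f = lambda z: (z[0] - 1.0) ** 2 + z[1] ** 2
alpha = backtrack(f, [0.0, 0.0], [4.0, 0.0], [-2.0, 0.0], [[-1.0, 0.0]], [-3.0])
print(alpha)
```

In this example the full step leaves the feasible set and the halved step gives no decrease, so two reductions are needed before a trial step is accepted.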
Remark 2
The scalar \(\alpha_{k}\), given in step 5, denotes the stepsize along \(p_{k}\) to the boundary (13b) of the linear inequality constraints
with \(\Gamma_{k}\stackrel{\mathrm{def}}{=}+\infty\) if \(-(a_{i}^{T}x_{k}-b_{i})/(a_{i}^{T}p_{k})\leqslant0\) for all \(i=1,2,\ldots,m\). A key property of the scalar \(\alpha_{k}\) is that an arbitrary step \(\alpha_{k} p_{k}\) to the point \(x_{k}+\alpha_{k} p_{k}\) does not violate any linear inequality constraints.
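A minimal Python sketch of this boundary step follows. It assumes that \(\Gamma_{k}\) is the smallest positive ratio \(-(a_{i}^{T}x_{k}-b_{i})/(a_{i}^{T}p_{k})\) over the constraints, which is consistent with the convention \(\Gamma_{k}=+\infty\) stated above, and that \(x_{k}\) is strictly interior. The function name `step_to_boundary` is hypothetical.

```python
import math


def step_to_boundary(A, b, x, p):
    """Largest step gamma such that A (x + t p) >= b for all t in [0, gamma].

    Assumes x is strictly interior (A x > b), so a ratio is positive exactly
    when the direction p decreases the corresponding slack.
    """
    gamma = math.inf
    for row, bi in zip(A, b):
        slack = sum(a * xi for a, xi in zip(row, x)) - bi   # a_i^T x - b_i > 0
        rate = sum(a * pi for a, pi in zip(row, p))         # a_i^T p
        if rate < 0.0:                                      # moving toward constraint i
            gamma = min(gamma, -slack / rate)
    return gamma


# Usage: the nonnegative orthant x >= 0 in two variables.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(step_to_boundary(A, b, [1.0, 2.0], [-2.0, -1.0]))
print(step_to_boundary(A, b, [1.0, 2.0], [1.0, 1.0]))
```

Any step strictly shorter than the returned value keeps the new point strictly interior, which is the purpose of damping the boundary step by a factor \(\theta_{k}<1\).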
Remark 3
Let
The first-order necessary conditions of (7) imply that there exists \(v_{m_{k}} \geqslant0\) such that
In order to obtain a suitable approximation function, Algorithm 1 updates the objective function of the trust-region subproblem when necessary. The model-improvement algorithm is applied only if \(\Vert g_{k}^{T}h_{m_{k}} \Vert \leqslant\varepsilon\) and at least one of the following holds: the model \(m(x_{k}+ \alpha p)\) is not certifiably fully quadratic on \(B(x_{k},\Delta_{k} )\), or \(\Delta_{k} > \iota \Vert g_{k}^{T}h_{m_{k}} \Vert \). It improves the current approximation function \(m(x_{k}+\alpha p)\) to meet the error bounds, so that the model function becomes fully quadratic. We display the model-improvement mechanism in Algorithm 2, which follows the same principle as Algorithm 2 proposed in [14], with a constant \(\omega \in(0, 1)\).
3 Main results and discussion
In this section, we discuss some properties of the proposed algorithm, including the error bounds, the sufficient descent property, and the global and local convergence properties. First of all, we make the following assumptions.
Assumptions
-
(A2)
The level set \(\mathcal{L}(x_{0})\) is bounded.
-
(A3)
There exist positive constants \(\kappa_{g_{f}}\) and \(\kappa_{g_{m}}\) such that \(\Vert \nabla{f_{k}} \Vert \leqslant\kappa_{g_{f}}\) and \(\Vert g_{k} \Vert \leqslant \kappa_{g_{m}}\), respectively, for all \(x_{k}\in\mathcal{L}(x_{0})\).
-
(A4)
There exist positive constants \(\kappa_{H_{f}}\) and \(\kappa_{H_{m}}\) such that \(\Vert H_{f_{k}} \Vert \leqslant \kappa_{H_{f}}\) and \(\Vert H_{m_{k}} \Vert \leqslant\kappa _{H_{m}}\), respectively, for all \(x_{k}\in\mathcal{L}(x_{0})\).
-
(A5)
\([ \begin{matrix} A & -D_{k}^{\frac{1}{2}} \end{matrix} ] \) is full row rank for all \(x_{k}\in\mathcal{L}(x_{0})\).
3.1 Error bounds
Observe first that some error bounds hold immediately.
Lemma 1
Suppose that (A1)–(A5), the error bounds (10)–(12) and the fact that \(\Delta_{k}\leqslant\Delta_{\max}\) hold. If \(m_{k}\) is a fully quadratic model on \(B(x_{k},\Delta_{k})\), then the following bound is true:
Proof
Using the theory of matrix perturbation analysis, Eqs. (4) and (8), we obtain
where \(AA^{T}\) is a positive definite matrix and \(D_{k}\) is a diagonal matrix depending on \(x_{k}\in\mathcal{L}(x_{0})\). By (A2), there exists a constant \(\kappa_{\lambda}>0\) such that \(\Vert (AA^{T}+D_{k})^{-1} \Vert \Vert A \Vert \leqslant \kappa_{\lambda}\). Thus, from (6), (9) and the error bound (11), one has the fact that
Clearly, the conclusion holds with \(\kappa_{h}=(1+\kappa_{\lambda} \Vert A \Vert )\kappa_{eg}\Delta_{\max}\). □
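The perturbation step of this proof can be checked numerically. The sketch below assumes that the multiplier estimates of (4) and (8) take the regularized least-squares form \(\lambda=(AA^{T}+D_{k})^{-1}Ag\), with \(g=\nabla f_{k}\) or \(g=g_{k}\); this form is an assumption suggested by the matrix \((AA^{T}+D_{k})^{-1}\) used in the proof, and the random data and the name `ls_multiplier` are purely illustrative.

```python
import numpy as np


def ls_multiplier(A, D, grad):
    """Solve (A A^T + D) lam = A grad (assumed form of the multiplier estimate)."""
    return np.linalg.solve(A @ A.T + D, A @ grad)


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))                    # m = 3 constraints, n = 5 variables
D = np.diag(rng.uniform(0.1, 1.0, size=3))     # positive diagonal, stands in for D_k
grad_f = rng.normal(size=5)                    # plays the role of the true gradient
g_model = grad_f + 1e-3 * rng.normal(size=5)   # model gradient with a small error

lam_f = ls_multiplier(A, D, grad_f)
lam_m = ls_multiplier(A, D, g_model)

# kappa_lambda bounds ||(A A^T + D)^{-1}|| ||A|| in the induced 2-norm.
kappa_lambda = np.linalg.norm(np.linalg.inv(A @ A.T + D), 2) * np.linalg.norm(A, 2)
lhs = np.linalg.norm(lam_f - lam_m)
rhs = kappa_lambda * np.linalg.norm(grad_f - g_model)
print(lhs, rhs)
```

Under the stated assumption, the multiplier error `lhs` is bounded by `rhs`, so a gradient error of order \(\Delta^{2}\) carries over to the multipliers with the same order.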
Lemma 2
Suppose that (A1)–(A5), the error bounds (10)–(12) and the fact that \(\Delta_{k}\leqslant\Delta_{\max}\) hold. If \(m_{k}\) is a fully quadratic model on \(B(x_{k},\Delta_{k})\), for some constant \(\kappa_{2}\), one has
Proof
Using the triangle inequality, Cauchy–Schwarz inequality, (18), (A3), the error bounds (10)–(12) and the fact that \(\alpha_{k}\in(0,1]\) and \(\Delta_{k}\leqslant\Delta_{\max}\) successively, we obtain
which implies that the inequality (20) holds with \(\kappa _{2}=\kappa_{eg}\kappa_{g_{f}}\Delta_{\max}+\kappa_{g_{m}}\kappa_{h}\). □
Lemma 3
Suppose that (A1)–(A5), the error bounds (10)–(12) and the fact that \(\Delta_{k}\leqslant\Delta_{\max}\) hold. If \(\Vert \nabla f_{k}^{T}h_{f_{k}} \Vert \neq0\), then step 3 of Algorithm 1 will stop in a finite number of improvement steps.
Proof
We prove that \(\Vert \nabla f_{k}^{T}h_{f_{k}} \Vert \) must be zero if the loop of Algorithm 2 is infinite.
In fact, two cases can cause Algorithm 2 to be invoked. One is that \(m_{k}\) is not fully quadratic; the other is that the radius \(\Delta_{k}>\iota \Vert g_{k}^{T}h_{m_{k}} \Vert \). Then set \(m_{k}^{(0)}=m_{k}\), and improve the model to be fully quadratic on \(B(x_{k}, \Delta_{k})\); the result is denoted by \(m^{(1)}_{k}\). If \((g_{k}^{T}h_{m_{k}})^{(1)}\) of \(m_{k}^{(1)}\) satisfies the inequality \(\iota \Vert (g_{k}^{T}h_{m_{k}})^{(1)} \Vert \geqslant\Delta_{k}\), Algorithm 2 stops with \(\widetilde{\Delta}_{k}=\Delta_{k} \leqslant\iota \Vert (g_{k}^{T}h_{m_{k}})^{(1)} \Vert \).
Otherwise, \(\iota \Vert (g_{k}^{T}h_{m_{k}})^{(1)} \Vert <\Delta _{k} \) holds. Algorithm 2 then improves the model on \(B(x_{k},\omega\Delta_{k})\), and the resulting model is denoted by \(m^{(2)}_{k}\). If \(m^{(2)}_{k}\) satisfies \(\iota \Vert (g_{k}^{T}h_{m_{k}})^{(2)} \Vert \geqslant\omega\Delta_{k}\), the procedure stops. If not, the radius is multiplied by ω again and Algorithm 2 improves the model on \(B(x_{k},\omega^{2} \Delta_{k})\), and so on.
The only case for Algorithm 2 to be infinite is if
It implies
By the bound (20) \(\Vert \nabla f_{k}^{T}h_{f_{k}}-(g_{k}^{T}h_{m_{k}})^{(i)} \Vert \leqslant\kappa_{2}\omega^{i-1} \alpha_{k}\Delta_{k} \quad \text{for all } i\geqslant1\), we obtain
By the choice of \(\omega\in(0,1)\), the above inequality means that \(\Vert \nabla f_{k} ^{T}h_{f_{k}} \Vert =0\). Hence step 3 will stop after a finite number of improvements. □
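The loop analyzed in this proof can be sketched as follows. The callback `crit_measure` is a hypothetical stand-in for the step "improve the model to be fully quadratic on the current ball and return \(\Vert g_{k}^{T}h_{m_{k}} \Vert \)", and the default values of `iota` and `omega` are assumptions; the sketch only illustrates the radius-reduction logic, not Algorithm 2 itself.

```python
def criticality_loop(crit_measure, delta, iota=1.0, omega=0.5, max_loops=100):
    """Shrink the radius by omega until delta <= iota * crit_measure(delta).

    crit_measure(delta) is assumed to return the criticality measure of a model
    that is fully quadratic on the ball of radius delta.
    Returns (final radius, final measure, number of reductions).
    """
    for reductions in range(max_loops):
        chi = crit_measure(delta)
        if delta <= iota * chi:
            return delta, chi, reductions
        delta *= omega
    raise RuntimeError("radius kept shrinking; the true measure may be zero")


# Usage: the true measure is 0.3 and the model error is proportional to the
# radius, mimicking a bound of the form (20).
final_delta, final_chi, n_red = criticality_loop(lambda d: 0.3 + 0.1 * d, 2.0)
print(final_delta, n_red)
```

When the true measure is nonzero, the model measure stays bounded away from zero while the radius decreases geometrically, so the test is met after finitely many reductions, as Lemma 3 states.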
3.2 Sufficient descent property
In order to guarantee the global convergence of the proposed algorithm, it is necessary to show that a sufficient descent condition is satisfied at the kth iteration. It was shown in [6] that if the step \(p_{k}\) is the optimal solution of the trust-region subproblem (7), then there is a constant \(\kappa_{3}>0\) such that
Lemma 4
Suppose that (A1)–(A5) and the error bounds (10)–(12) hold, and that \(p_{k}\) is the solution of the trust-region subproblem (7). Then there exists an appropriate \(\alpha_{k}>0\) satisfying inequality (13a).
Proof
We start by considering the maximal step length along the descent direction of the trust-region subproblem that preserves sufficient feasibility in the sense of (13a). Using the mean value theorem and (11) successively, we obtain
where \(\xi_{k} \in(x_{k},x_{k}+s_{k})\).
Two cases must be considered. The first is \(p_{k}^{T}\nabla^{2} f(\xi_{k}) p_{k}\leqslant0\). Dropping the last term of (22) and using (21), the bound \(\frac{ \Vert g_{k}^{T}h_{m_{k}} \Vert ^{\frac{1}{2}}}{\kappa _{\lambda_{m}}\kappa_{H_{m}}}\leqslant\Delta_{k}\) for large enough k, and the fact that \(\Delta_{k}\leqslant\Delta _{\max}\), we see that there exists an \(\alpha^{*}=[\frac{\kappa_{3}(1-\kappa_{1})\kappa _{H_{m}}\kappa_{\lambda_{m}} }{\kappa_{eg}\Delta_{\max}}]^{\frac {1}{2}}>0\) such that (13a) holds. The second case is \(p_{k}^{T}\nabla^{2} f(\xi_{k}) p_{k}>0\). Using the Cauchy–Schwarz inequality and the fact that \(\alpha_{k}\in(0,1]\) and \(\Delta_{k}\leqslant \Delta_{\max}\), we deduce that
when \(\alpha^{*}=\frac{\kappa_{3}(1-\kappa_{1}) \kappa _{H_{m}}\kappa_{\lambda_{m}}}{\kappa_{eg}\Delta_{\max}+\frac {1}{2}\kappa_{H_{f}} }>0\). Thus the conclusion follows. □
We therefore see that the line-search criterion in step 5 is reasonable; it provides a nonincreasing sequence \(\{f(x_{k})\}\).
Lemma 5
Let step \(p_{k}\) be the solution of the trust-region subproblem (7). Suppose that (A1)–(A5) hold. Then there exists a positive constant \(\kappa_{4}\) such that step \(p_{k}\) satisfies the following sufficient descent condition:
for all \(g_{k}\), \(h_{m_{k}}\), \(\Vert M_{m_{k}} \Vert \), and \(\Delta_{k}\).
Proof
Combining now (7), (17), Lemma 4, \(\theta_{k}\in (\theta_{0},1]\) and the fact that \(\alpha_{k}\leqslant1\), we get
□
3.3 Global convergence
Every iterate in the \((k+1)\)th iteration is chosen in the region \(B(x_{k}, \alpha_{k} \Delta_{k})\). The following lemma shows that the current iteration must be successful if \(\alpha_{k} \Delta_{k}\) is small enough.
Lemma 6
Suppose that (A1)–(A5) and the error bounds (10)–(12) hold, that \(m_{k}\) is fully quadratic on \(B(x_{k},\Delta_{k})\), that \(\Vert g_{k}^{T}h_{m_{k}} \Vert \neq0\), and that
where \(\kappa_{\lambda_{m}}\) is the bound of \(C_{m_{k}}\), for all \(x \in\mathcal{L}(x_{0})\). Then the kth iteration is successful.
Proof
We notice that, for all k and the model function \(m_{k}\), one has \(f(x_{k})=m(x_{k})\). Let \(M_{f_{k}}=\bigl[ {\scriptsize\begin{matrix}{} H_{f_{k}} & 0 \cr 0 & C_{f_{k}} \end{matrix}} \bigr] \), from (16) and (A3), we know that \(\Vert M_{m_{k}} \Vert \leqslant\kappa_{H_{m}}\kappa_{\lambda_{m}}\). Thus combining \(\Delta_{k} \leqslant\frac{ \Vert g_{k}^{T}h_{m_{k}} \Vert ^{\frac{1}{2}}}{\kappa_{H_{m}}\kappa _{\lambda_{m}}}\) with the sufficient decrease condition (23), we immediately get
Using Eqs. (12), (23), the fact that \(\alpha_{k}\in (0,1]\) and \(\theta_{k}\in(0,1]\), we have
Thus \(\rho_{k} \geqslant\eta_{1}\) and the iteration is successful. □
Lemma 7
Suppose that (A1)–(A5) and the error bounds (10)–(12) hold. If the number of successful iterations is finite, then
Proof
Assume that the number of model-improving iterations before \(m_{k}\) becomes fully quadratic is bounded by a constant N. Suppose that the current iteration comes after the last successful one. Then infinitely many iterations are acceptable or unsuccessful, and in both cases \(\Delta_{k}\) shrinks. Furthermore, \(\Delta_{k}\) is reduced by a factor ζ at least once every N iterations, which implies \(\Delta_{k}\rightarrow0\).
For the jth iteration, we denote the ith iteration after j by the index \(i_{j}\), then
Using the triangle inequality, we obtain
It remains to show that all the terms on the right-hand side converge to zero. Because of the Lipschitz continuity of ∇f and the fact that \(\Vert x_{i_{j}} -x_{j} \Vert \rightarrow0\), the first and second terms converge to zero. The inequalities (10) and (11) imply that the third and fourth terms on the right-hand side converge to zero. According to Lemma 3, if \(\Vert g_{i_{j}}^{T}h_{m_{i_{j}}} \Vert \nrightarrow0\) for small enough \(\Delta_{i_{j}}\), \(i_{j}\) would be a successful iteration, which yields a contradiction. Thus the last term converges to zero. □
Lemma 8
Suppose that (A1)–(A5), the error bounds (10)–(12) and (23) hold. Suppose furthermore that the strict complementarity of the problem (1) holds. Then
Proof
The key is that we may find a contradiction with the fact that \(\{f(x_{k})\}\) is a nonincreasing bounded sequence unless \(x_{k}\) is a stationary point. We thus have to verify that there exists some \(\epsilon>0\) such that \(\{f(x_{k})\}\) is not convergent under the assumption of \(\Vert g_{k}^{T}h_{m_{k}} \Vert \geqslant\epsilon^{2}\). We observe from (13a), Lemma 4 and (21) that
Thus from (24), two cases should be considered next, that is,
and
We now start the proof of (25). On the one hand, \(\alpha_{k}\) may be determined by (13b), the boundary of the inequality constraints along \(p_{k}\). From Eq. (15)
with \(\alpha_{k}=+\infty\) if \(-(a_{i}^{T}x_{k}-b_{i})/(a_{i}^{T}p_{k})\leqslant0\) for all \(i=1,2,\ldots,m\), \(\hat{p}_{k}=D_{k}^{-\frac{1}{2}}Ap_{k}\) and (17), we know that there exists \(\lambda_{m_{k+1}}\) such that
where \(\hat{p}_{k}^{i}\) and \(\lambda^{i}_{m_{k+1}}\) are the ith components of the vectors \(\hat{p}_{k}\) and \(\lambda_{m_{k+1}}\), respectively. Hence, there exists \(j\in\{1,\ldots,m\}\) such that
From (17), we have
Since \([ \begin{matrix} A & -D_{k}^{\frac{1}{2}} \end{matrix} ] ^{T}\) is full row rank for all \(x\in\mathcal{L}(x_{0})\), \({\lambda_{m_{k}}}\) is bounded and \(m(x)\) is twice continuously differentiable, there exist \(\kappa _{5}>0\) and \(\kappa_{6}>0\) such that
Using the fact that \(v_{m_{k}}( \Delta_{k}-\Vert \bigl( {\scriptsize\begin{matrix}{} p_{k} \cr \hat{p}_{k} \end{matrix}} \bigr) \Vert ) =0\) and taking the norm to both sides of (17), we deduce that
And noting \(\Vert (p_{k};\hat{p}_{k}) \Vert \leqslant\Delta_{k}\), we can obtain
Combining the assumption \(\Vert g_{k}^{T}h_{m_{k}} \Vert >\epsilon^{2}\) with \(\Delta_{k}\rightarrow0\) deduced from (24), it is clear from the fact \(\Vert M_{m_{k}} \Vert \leqslant\kappa_{\lambda _{m}}\kappa_{H_{m}}\) that, for all k,
Thus (27) implies that
Furthermore, \(\Delta_{k}\rightarrow0\) means that \(\lim_{k\rightarrow \infty} \Vert p_{k} \Vert =0\), from which we deduce that, for some \(0< \theta_{0} <1 \) and \(\theta_{k}-1=O( \Vert p_{k} \Vert ^{2})\), the strictly feasible stepsize \(\theta_{k} \in(\theta _{0},1]\rightarrow1\). From the above, we have already seen that (25) holds in the case that \(\alpha_{k}\) is determined by (13b).
In the other case, \(\alpha_{k}\) is determined by (13a). In this case, we are able to verify that \(\alpha _{k}=1\) is acceptable when k is sufficiently large. If not,
must hold. Applying the Taylor series, (10)–(11), (A3) and the fact that \(\Delta_{k}\leqslant\Delta_{\max}\), we deduce that
where \(\xi_{k} \in(x_{k},x_{k}+s_{k})\). This inequality is equivalent to the form of
Moreover, (21) and \(\Vert g_{k}^{T}h_{m_{k}} \Vert \geqslant\epsilon^{2}\) imply that
Thus if \(\Delta_{k}\leqslant\frac{2(1-\kappa_{1})\kappa_{3}\epsilon }{2\kappa_{eg}\Delta_{\max}+\kappa_{eh}\Delta_{\max}+\kappa_{H_{m}}} \leqslant\frac{\epsilon}{\kappa_{H_{m}}}\) we deduce from the inequality
that
Clearly, a contradiction appears here. It implies that \(\alpha_{k}=1\) for k sufficiently large. Therefore (25) always holds.
On the other hand, we should prove that (26) is true. From step 3 of Algorithm 1, we know that
By the assumption that \(\Vert g_{k}^{T}h_{m_{k}} \Vert \geqslant\epsilon^{2}\), we obtain
Whenever \(\Delta_{k}\) falls below a constant \(\bar{\kappa}_{7}\) given by
the kth iteration is either successful or model-improving, and hence from step 9, we are able to deduce both that \(\Delta_{k+1} \geqslant \Delta_{k}\) and \(\Delta_{k+1} \geqslant\zeta\Delta_{k}\). Combining with the rules of step 9 we conclude that \(\Delta_{k+1} \geqslant \min \{ \iota\epsilon,\zeta\bar{\kappa}_{7} \} =\kappa_{7}\). It means that \(\Delta_{k}\nrightarrow0\), if \(\Vert g_{k}^{T}h_{m_{k}} \Vert \geqslant\epsilon^{2}\).
In conclusion, the sequence \(\{f(x_{k})\}\) is not convergent if we suppose that \(\Vert g_{k}^{T}h_{m_{k}} \Vert \geqslant\epsilon ^{2}\), which contradicts the fact that \(\{f(x_{k})\}\) is a nonincreasing bounded sequence. It implies that
□
Lemma 9
For any subsequence \(\{k_{i}\}\) such that
we also have
Proof
First, we note that, by (28), \(\Vert g_{k_{i}}^{T}h_{m_{k_{i}}} \Vert \leqslant\varepsilon\) when i is sufficiently large. Thus the criticality step ensures that the model \(m_{k_{i}}\) is a fully quadratic function on the ball \(B(x_{k_{i}}, \Delta_{k_{i}} )\), with \(\Delta _{k_{i}}\leqslant\iota \Vert g_{k_{i}}^{T}h_{m_{k_{i}}} \Vert \) for all i sufficiently large (if \(\Vert \nabla f_{k_{i}}^{T}h_{f_{k_{i}}} \Vert \neq0\)). Then, using the bound (20) on the error between the terminal conditions of function and model, we have
As a consequence, we have
for all i sufficiently large. But \(\Vert g_{k_{i}}^{T}h_{m_{k_{i}}} \Vert \rightarrow0\) implies (29) holds. □
Then we obtain the global convergence derived from Lemmas 8 and 9.
Theorem 1
Suppose that (A1)–(A5), the error bounds (10)–(12) and (23) hold. Suppose furthermore that the strict complementarity of the problem (1) holds. Let \(\{x_{k}\}\subset\Re^{n}\) be the sequence generated by Algorithm 1. Then
The above theorem shows that there exists a limit point that is first-order critical. In fact, we are able to prove that all limit points of the sequence of iterates are first-order critical.
Theorem 2
Suppose that (A1)–(A5), the error bounds (10)–(12) and (23) hold. Suppose furthermore that the strict complementarity of the problem (1) holds. Let \(\{x_{k}\}\subset\Re^{n}\) be the sequence generated by Algorithm 1. Then
Proof
Lemma 7 shows that the theorem holds when S is finite. Hence, we assume that S is infinite. To derive a contradiction, we suppose that there exists a subsequence \(\{k_{i}\}\) of successful or acceptable iterations such that
for some \(\epsilon_{1}>0\) and for all i. Then, because of Lemma 9, we obtain
for some \(\epsilon_{2} >0\) and for all i sufficiently large. Without loss of generality, we pick \(\epsilon_{2}\) such that
Lemma 8 then ensures the existence, for each \(k_{i}\) in the subsequence, of a first iteration \(\ell_{i} > k_{i}\) such that \(\Vert g_{\ell_{i}}^{T} h_{m_{\ell_{i}}} \Vert < \epsilon^{2}_{2} \). By removing elements from \(\{k_{i}\}\), without loss of generality and without a change of notation, we thus see that there exists another subsequence indexed by \(\{\ell_{i}\}\) such that
for sufficiently large i, with inequality (30) being retained.
We now restrict our attention to the set \(\mathcal{K}\) corresponding to the subsequence of iterations whose indices are in the set
where \(k_{i}\) and \(\ell_{i}\) belong to the two subsequences given above in (31).
We know that \(\Vert g_{k}^{T}h_{m_{k}} \Vert \geqslant\epsilon ^{2}_{2} \) for \(k\in\mathcal{K}\). From Lemma 8, \(\lim_{k\rightarrow+\infty}\alpha_{k} \Delta _{k}=0\), and by Lemma 5 we conclude that, for any large enough \(k\in\mathcal{K}\), the iteration k is either successful, if the model is fully quadratic, or model-improving otherwise. Moreover, for each \(k\in\mathcal{K}\cap S\) we have
and, for any such k large enough, \(\Delta_{k}\leqslant\frac{\epsilon_{2}}{\kappa_{H_{m}}\kappa _{\lambda_{m}}}\). Hence, we have \(\alpha_{k} \theta_{k}\Delta_{k}\leqslant\frac {f(x_{k})-f(x_{k}+s_{k})}{\eta_{1} \kappa_{4}\epsilon_{2}}\) for \(k\in\mathcal{K}\cap S\) sufficiently large. Since for any \(k\in\mathcal{K}\) large enough the iteration is either successful or model-improving and since for a model-improving iteration \(x_{k+1}=x_{k}+s_{k}\), we have, for all i sufficiently large,
Because the sequence \(\{f(x_{k})\}\) is bounded below and monotonically decreasing, we see that the right-hand side of this inequality must converge to zero, and we therefore obtain
Now,
Since ∇f is Lipschitz continuous, we see that the first term of the above inequality \(\Vert \nabla f(x_{k_{i}})^{T}h_{f_{k_{i}}}-\nabla f(x_{\ell_{i}})^{T}h_{f_{\ell _{i}}} \Vert \rightarrow0\) and is bounded by \(\epsilon^{2}_{2}\) for i sufficiently large. Equation (32) implies the third term \(\Vert g_{\ell _{i}}^{T}h_{m_{\ell_{i}}} \Vert \leqslant\epsilon^{2}_{2}\). From (31) we see that \(m_{\ell_{i}}\) is a fully quadratic function on \(B(x_{\ell_{i}}, \iota \Vert g_{\ell_{i}}^{T}h_{m_{\ell_{i}}} \Vert )\). Using (11) and (32), we deduce that \(\Vert \nabla f(x_{\ell_{i}})^{T}h_{f_{\ell_{i}}}-g_{\ell _{i}}^{T}h_{m_{\ell_{i}}} \Vert \leqslant\kappa_{eg} \iota \epsilon^{2}_{2}\) for i sufficiently large. Combining these bounds, we obtain
for i large enough. This result contradicts (30), which implies the initial assumption is false and the theorem follows. □
3.4 Local convergence
Having proved the global convergence, we now focus on the local convergence rate. To this end, we make the following additional assumptions.
Assumptions
-
(A6)
\(x_{*}\) is the solution of problem (1) and satisfies the strong second-order sufficient condition; that is, if the columns of \(Z_{*}\) denote an orthogonal basis for the null space of \([ \begin{matrix} A & -D^{\frac{1}{2}}_{*} \end{matrix} ] \), then there exists \(\varpi>0\) such that
$$ d^{T}\bigl({Z_{*}^{T}}M_{f_{*}}{Z_{*}}\bigr)d \geqslant\varpi \Vert d \Vert ^{2}, \quad \forall d. $$(33)
-
(A7)
Assume that
$$ \lim_{k \rightarrow\infty}\frac{ \Vert (M_{m_{k}}-M_{f_{k}})Z_{k} p_{k} \Vert }{ \Vert p_{k} \Vert }=0. $$(34)
This means that for large k
$$\begin{aligned} p_{k}^{T} \bigl(Z_{k}^{T} M_{m_{k}}Z_{k} \bigr)p_{k} =&p_{k}^{T} \bigl(Z_{k}^{T}M_{f_{k}}Z_{k} \bigr)p_{k}+o \bigl( \Vert p_{k} \Vert ^{2} \bigr). \end{aligned}$$
Theorem 3
Suppose that (A1)–(A7), the error bounds (10)–(12) and (23) hold. \(\{x_{k}\}\) is a sequence generated by Algorithm 1. Suppose furthermore that the strict complementarity of the problem (1) holds. Then, for sufficiently large k, the stepsize \(\alpha_{k}\equiv1\) and there exists \(\hat{\Delta}>0\) such that \(\Delta_{k}\geqslant\Delta_{K'}\geqslant\hat{\Delta}\), \(\forall k\geqslant K'\), where \(K'\) is a large enough index.
Proof
According to the algorithm, the stepsize \(\alpha_{k}\) is given in (15)
From \(\hat{p}_{k}=D_{k}^{-\frac{1}{2}}Ap_{k}\) and (17), there exists \(\lambda_{m_{k+1}}\) such that
where \(\hat{p}_{k}^{i}\) and \(\lambda^{i}_{m_{k+1}}\) are the ith component of the vectors \(\hat{p}_{k}\) and \(\lambda_{m_{k+1}}\), respectively.
If \(\Vert p_{k} \Vert <\Delta_{k}\), then \(v_{m_{k}}=0\). Since the strict complementarity of the problem (1) holds at every limit point of \(\{x_{k}\}\), i.e., \(\vert \lambda_{m_{k+1}}^{j} \vert + \vert a_{j}^{T}x_{k}-b_{j} \vert >0\), for all large k, \(\lambda_{m_{k+1}}=\lambda_{m_{k+1}}^{N}>0\) when \(v_{m_{k}}=0\). So, \(\lambda_{m_{k+1}}^{j}=(\lambda_{m_{k+1}}^{N})^{j}>0\). From (35), it is clear that \(\lim_{k \rightarrow\infty}\alpha_{k}=1\).
If \(\Vert p_{k} \Vert =\Delta_{k} \rightarrow0\), then \(v_{m_{k+1}}\rightarrow\infty\). From (35),
From the above, if \(\Vert g_{k}^{T}h_{m_{k}} \Vert \geqslant\varepsilon^{2}\) holds and \(\Delta_{k}\rightarrow 0\), then \(\lim_{k\rightarrow\infty}\alpha_{k}=+\infty\) and \(\lim_{k\rightarrow\infty}\theta_{k}=1\).
Further, by the condition on the strictly feasible stepsize \(\theta_{k}-1=O( \Vert p_{k} \Vert )\), and \(\lim_{k\rightarrow \infty}p_{k}=0\), we have \(\lim_{k\rightarrow\infty}\theta_{k}=1\).
We can obtain from above that \(\lim_{k\rightarrow\infty}\Gamma _{k}=+\infty\) when \(\alpha_{k}\) is given in (15) along \(p_{k}\). It means that if \(\alpha_{k}\) is determined by (13b), \(\alpha _{k}\equiv1\) for sufficiently large k. Thus
The error bound (11) shows us \((g_{k}-\nabla f_{k})^{T} p_{k}=o( \Vert p_{k} \Vert ^{2})\). Hence we see from (36) that \(f(x_{k}+p_{k})\leqslant f(x_{k})+\kappa_{1}g_{k}^{T}p_{k} \) at the kth iteration.
Combining with the fact that \(p_{k}^{T}A^{T}D^{-1}_{k}C_{m_{k}}Ap_{k} \rightarrow0\), we know that \(x_{k+1}=x_{k}+p_{k}\). So
By assumptions (A1)–(A7), we can obtain
Let the columns of \(Z_{k}\) denote an orthogonal basis for the null space of \([ \begin{matrix} A & -D_{k}^{\frac{1}{2}} \end{matrix} ] \). We get \(g_{k}^{T}p_{k}\leqslant-[ \begin{matrix} p^{T}_{k} & \hat{p}_{k}^{T} \end{matrix} ] M_{m_{k}}\bigl[ \begin{matrix} p_{k} \\ \hat{p}_{k} \end{matrix} \bigr] =-p^{T}_{k}Z_{k}^{T}M_{m_{k}}Z_{k} p_{k} \). Therefore, from (33)–(34), we see that for all large k
Hence, one has
By a similar argument, we obtain \(p_{k}\rightarrow0\). Combining (37) with (38), we have \(\rho_{k}\rightarrow1\). Hence there exists \(\hat{\Delta}>0\) such that, when \(\Vert p_{k} \Vert \leqslant\hat{\Delta}\), \(\hat{\rho}_{k}\geqslant\rho_{k} \geqslant\eta_{2}\) and \(\Delta _{k+1}\geqslant\Delta_{k}\). As \(p_{k}\rightarrow0\), there exists an index \(K'\) such that \(\Vert p_{k} \Vert \leqslant\hat{\Delta}\) whenever \(k\geqslant K'\). Thus, the conclusion holds. □
Theorem 3 implies that the local convergence rate of Algorithm 1 depends on the Hessian at \(x_{*}\) and on the local convergence rate of \(p_{k}\). Moreover, if \(p_{k}\) is a quasi-Newton step for sufficiently large k, then the sequence \(\{x_{k}\}\) converges superlinearly to the optimal point \(x_{*}\).
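The role of the step to the boundary in the proof of Theorem 3 can be illustrated numerically. The following is a minimal sketch, assuming that \(\Gamma_{k}\) denotes the largest step along \(p_{k}\) that keeps \(A(x_{k}+tp_{k})\geqslant b\), and that the interior backtracking rule takes the full step when \(x_{k}+p_{k}\) is strictly feasible and the damped step \(\theta_{k}\Gamma_{k}\) otherwise. Formula (15) is not reproduced here; the function names, the fixed value of theta and the test data are hypothetical.

```python
import numpy as np


def step_to_boundary(A, b, x, p):
    """Largest t >= 0 with A(x + t p) >= b, for a strictly interior x (Ax > b)."""
    slack = A @ x - b
    if np.any(slack <= 0):
        raise ValueError("x must be strictly interior: Ax > b")
    Ap = A @ p
    decreasing = Ap < 0  # only these constraints can become active along p
    if not np.any(decreasing):
        return np.inf
    return float(np.min(-slack[decreasing] / Ap[decreasing]))


def interior_stepsize(A, b, x, p, theta=0.95):
    """Assumed rule: full step if x + p stays strictly feasible, else theta * Gamma."""
    gamma = step_to_boundary(A, b, x, p)
    return 1.0 if gamma > 1.0 else theta * gamma


# Illustrative data (not from the paper): the positive orthant x >= 0 in R^2.
A = np.eye(2)
b = np.zeros(2)
x = np.array([1.0, 1.0])

long_step = np.array([-2.0, 0.5])   # would leave the feasible set
short_step = np.array([-0.5, 0.0])  # stays strictly inside

alpha_long = interior_stepsize(A, b, x, long_step)
alpha_short = interior_stepsize(A, b, x, short_step)
print(alpha_long, alpha_short)
```

As the step shrinks relative to the slack of the inactive constraints, the boundary step grows without bound and the full step is accepted, which is the mechanism behind \(\alpha_{k}\equiv1\) for large k.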
4 Numerical experiments
We now demonstrate the numerical performance of the proposed derivative-free trust-region method.
Environment: The algorithms are written in Matlab R2009a and run on a PC with a 2.66 GHz Intel(R) Core(TM)2 Quad CPU and 4 GB of DDR2 memory.
Initialization: The values \(\Delta_{0}=2\), \(\eta_{0}=0.25\), \(\eta_{1}=0.75\), \(\zeta=0.5\), \(\varsigma=1.5\), \(\iota=0.5\), \(\beta=0.25\), \(\alpha=0.2\), \(\varepsilon= 10^{-8}\) and \(\omega=0.3\) are used. \(\Delta_{\max}\) is set to 4, 6 and 8 in turn.
Termination criterion: \(\Vert g_{k} ^{T}h_{m_{k}} \Vert \leqslant\varepsilon\).
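For orientation, a generic trust-region radius update using the numerical values listed above might look like the following. This is a sketch and not Algorithm 1 itself: reading 0.25 and 0.75 as acceptance thresholds for the ratio \(\rho_{k}\) of actual to predicted reduction, and 0.5 and 1.5 as shrink and expansion factors, is our assumption, and all function and parameter names are hypothetical.

```python
def reduction_ratio(f_old, f_new, predicted_reduction):
    """Ratio of actual to predicted reduction for one trial step."""
    if predicted_reduction <= 0:
        raise ValueError("the model must predict a strict decrease")
    return (f_old - f_new) / predicted_reduction


def update_radius(rho, delta, eta_low=0.25, eta_high=0.75,
                  shrink=0.5, grow=1.5, delta_max=6.0):
    """Assumed rule: shrink on poor agreement, expand (capped) on good agreement."""
    if rho < eta_low:
        return shrink * delta
    if rho >= eta_high:
        return min(grow * delta, delta_max)
    return delta


# Illustrative ratios (not measured data): agreement improves as rho tends to 1.
delta = 2.0
history = []
for rho in [0.1, 0.5, 0.9, 0.95, 0.99]:
    delta = update_radius(rho, delta)
    history.append(delta)
print(history)
```

Under such a rule, once \(\rho_{k}\) stays close to 1 the radius is never reduced, which matches the conclusion \(\Delta_{k+1}\geqslant\Delta_{k}\) of Theorem 3.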
Problems: We first test 20 linear inequality constrained optimization problems (listed in Table 1) from Test Examples for Nonlinear Programming Codes [15, 16]. It is worth noting that assumptions (A2)–(A5) play very important roles in the theoretical proof. Here (A2) is a standard assumption in optimization, and (A5) is satisfied as long as the iterates are not optimal. According to the definitions of the error bounds in our algorithm, the gradient (or Hessian) of the model function is bounded whenever the norm of the gradient (or Hessian) of the objective function is bounded by a constant. Therefore, most of the above test problems satisfy assumptions (A2)–(A5). For example (HS21)
In practice, we restrict attention to the level set during program execution, so the bound on \(\Vert \nabla f(x) \Vert \) is much smaller than this value. Even if the gradient and the Hessian of the objective function are not both bounded everywhere, their boundedness within the level set can at least be guaranteed.
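The boundedness argument for HS21 can be checked with a few lines of code. The sketch below assumes the standard statement of problem 21 in the Hock and Schittkowski collection as we recall it (objective \(0.01x_{1}^{2}+x_{2}^{2}-100\), constraint \(10x_{1}-x_{2}\geqslant10\), bounds \(2\leqslant x_{1}\leqslant50\), \(-50\leqslant x_{2}\leqslant50\)), written in the form \(Ax\geqslant b\) of problem (1). Since \(\Vert \nabla f \Vert ^{2}\) is convex, its maximum over the box is attained at a corner.

```python
import itertools

import numpy as np


def f(x):
    """HS21 objective (assumed standard form)."""
    return 0.01 * x[0] ** 2 + x[1] ** 2 - 100.0


def grad_f(x):
    """Analytic gradient, used here only to measure the bound."""
    return np.array([0.02 * x[0], 2.0 * x[1]])


# Constraints in the form A x >= b: the linear constraint, then the four bounds.
A = np.array([[10.0, -1.0],
              [1.0, 0.0],
              [-1.0, 0.0],
              [0.0, 1.0],
              [0.0, -1.0]])
b = np.array([10.0, 2.0, -50.0, -50.0, -50.0])

# Largest gradient norm over the corners of the bounding box.
corners = [np.array(c) for c in itertools.product([2.0, 50.0], [-50.0, 50.0])]
gradient_bound = max(np.linalg.norm(grad_f(c)) for c in corners)

x_star = np.array([2.0, 0.0])  # known minimizer of HS21
print(gradient_bound, f(x_star), np.all(A @ x_star >= b))
```

The corner \((50,50)\) is feasible for the linear constraint, so the bound over the box is also the bound over \(\Omega\).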
We use the performance profiles of Dolan and Moré [17] to analyze the efficiency of the given algorithm. Figures 1 and 2 show that Algorithm 1 is practical and robust.
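For readers unfamiliar with performance profiles, the computation can be sketched as follows. For each problem, the cost of every solver is divided by the best cost on that problem, and the profile of a solver at \(\tau\) is the fraction of problems whose ratio is at most \(\tau\). The cost matrix below is invented purely for illustration and is not the data behind Figures 1 and 2; the function name is hypothetical.

```python
import numpy as np


def performance_profile(costs, taus):
    """Dolan-More profile.

    costs: array of shape (n_problems, n_solvers) with positive entries,
           np.inf marking a failure.
    Returns an array of shape (len(taus), n_solvers).
    """
    costs = np.asarray(costs, dtype=float)
    best = np.min(costs, axis=1, keepdims=True)
    if not np.all(np.isfinite(best)):
        raise ValueError("each problem must be solved by at least one solver")
    ratios = costs / best  # failures stay at inf
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])


# Invented costs (e.g. function evaluations) for 4 problems and 2 solvers.
costs = [[10.0, 20.0],
         [30.0, 15.0],
         [8.0, np.inf],   # the second solver fails on this problem
         [50.0, 50.0]]
taus = [1.0, 2.0, 4.0]
profile = performance_profile(costs, taus)
print(profile)
```

The value at \(\tau=1\) is the fraction of problems on which a solver is the fastest, while the value for large \(\tau\) measures robustness.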
Furthermore, we test five simple linear inequality constrained optimization problems from [16] and compare the results for different upper bounds \(\Delta_{\max}\) on the trust-region radius. Table 2 shows the results, where nf is the number of function evaluations, n is the dimension of the test problem, and F means that the algorithm terminated because the iteration count exceeded the maximum allowed. The CPU times for the test problems are also reported. Table 2 indicates that Algorithm 1 is able to reach the optimal point, and that the choice \(\Delta_{\max}=6\) gives more satisfactory results. However, the results also show that the number of iterations may be higher than that of derivative-based algorithms. We believe the reason is that the derivatives of most of the chosen test problems are available, so a derivative-free technique requires additional function evaluations and hence more iterations.
5 Conclusions
In this paper, we propose an affine-scaling derivative-free method for linear inequality constrained optimization.
(1) This algorithm is mainly designed to solve optimization problems in engineering whose derivatives are unavailable. The proposed algorithm adopts an interior backtracking technique and possesses the trust-region property.
(2) Global convergence is proved by using the definition of fully quadratic models. It shows that the iterates generated by the proposed algorithm converge to the optimal points of (1). Meanwhile, the local convergence rate of the proposed algorithm depends on \(p_{k}\). If \(p_{k}\) becomes the quasi-Newton step, then the sequence \(\{x_{k}\}\) generated by the algorithm converges to \(x_{*}\) superlinearly.
(3) The preliminary numerical experiments verify that the proposed algorithm is feasible and effective for solving linear inequality constrained optimization problems with unavailable derivatives.
References
Kanzow, C., Klug, A.: An interior-point affine-scaling trust-region method for semismooth equations with box constraints. Comput. Optim. Appl. 37(3), 329–353 (2007)
Kanzow, C., Klug, A.: On affine-scaling interior-point Newton methods for nonlinear minimization with bound constraints. Comput. Optim. Appl. 35(2), 177–197 (2006)
Heinkenschloss, M., Ulbrich, M., Ulbrich, S.: Superlinear and quadratic convergence of affine-scaling interior-point Newton methods for problems with simple bounds without strict complementarity assumption. Math. Program. 86(3), 615–635 (1999)
Liuzzi, G., Lucidi, S., Sciandrone, M.: Sequential penalty derivative-free methods for nonlinear constrained optimization. SIAM J. Optim. 20(5), 2614–2635 (2010)
Coleman, T.F., Li, Y.: A trust region and affine scaling interior point method for nonconvex minimization with linear inequality constraints. Math. Program. 88(1), 1–31 (1997)
Zhu, D.: A new affine scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints. J. Comput. Appl. Math. 161(1), 1–25 (2003)
Sahu, D.R., Yao, J.C.: A generalized hybrid steepest descent method and applications. J. Nonlinear Var. Anal. 1, 111–126 (2017)
Gibali, A.: Two simple relaxed perturbed extragradient methods for solving variational inequalities in Euclidean spaces. J. Nonlinear Var. Anal. 2, 49–61 (2018)
Zhang, H., Conn, A.R., Scheinberg, K.: A derivative-free algorithm for least-squares minimization. SIAM J. Optim. 20(6), 3555–3576 (2010)
Zhang, H., Conn, A.R.: On the local convergence of a derivative-free algorithm for least-squares minimization. Comput. Optim. Appl. 51(2), 481–507 (2012)
Liuzzi, G., Lucidi, S., Rinaldi, F.: A derivative-free approach to constrained multiobjective nonsmooth optimization. SIAM J. Optim. 26(4), 2744–2774 (2016)
Tung, L.T.: Higher-order contingent derivative of perturbation maps in multiobjective optimization. J. Nonlinear Funct. Anal. 2015, 19 (2015)
Conn, A.R., Scheinberg, K., Vicente, L.N.: Global convergence of general derivative-free trust-region algorithms to first- and second-order critical points. SIAM J. Optim. 20(1), 387–415 (2006)
Jing, G., Zhu, D.: An affine scaling derivative-free trust region method with interior backtracking technique for bounded-constrained nonlinear programming. J. Syst. Sci. Complex. 27(3), 537–564 (2014)
Hock, W., Schittkowski, K.: Test Examples for Nonlinear Programming Codes. Springer, Bayreuth (1987)
Schittkowski, K.: More test examples for nonlinear programming codes (1987)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Acknowledgements
This work is supported by the National Science Foundation of China under Grant No. 11626037, 13th five-year Science and Technology Project of Education Department of Jilin Province under Grant No. JJKH20170036KJ, the PhD Start-up Fund of Natural Science Foundation of Beihua University and Youth Training Project Foundation of Beihua University.
Contributions
All authors contributed equally and significantly in writing this article. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Gao, J., Cao, J. A class of derivative-free trust-region methods with interior backtracking technique for nonlinear optimization problems subject to linear inequality constraints. J Inequal Appl 2018, 108 (2018). https://doi.org/10.1186/s13660-018-1698-7