1 Introduction

In this paper, we consider the following low order-value optimization (LOVO) [1] problem:

$$\begin{aligned} \min \quad {{S}_{p}}\left( x \right) =\sum \limits _{k=1}^{p}{{{R}_{{{i}_{k}}\left( x \right) }}}\left( x \right) \end{aligned}$$
(1.1)

where \({R_{i}}:{{\mathbb {R}}^{n}}\rightarrow \mathbb {R}, i=1,\ldots ,r\). In detail, given a model \(\varphi (t)=\phi (x,t)\) that depends on n parameters \(x\in {{\mathbb {R}}^{n}}\), we want to find a set \(\mathcal {P}\subset \mathcal {R}\) with p elements and parameters \(\bar{x}\in {{\mathbb {R}}^{n}}\), where \(\mathcal {R}=\left\{ \left( {{t}_{i}},{{y}_{i}} \right) , i=1,\ldots ,r \right\} \subset {{\mathbb {R}}^{m}}\times \mathbb {R}\) is the given dataset, such that \(\phi \left( \bar{x},{{t}_{i}} \right) \approx {{y}_{i}}\) for all \(\left( {{t}_{i}},{{y}_{i}} \right) \in \mathcal {P}\). The remaining \(r-p\) elements of \(\mathcal {R}\setminus \mathcal {P}\) are the possible outliers. Given \(x\in {{\mathbb {R}}^{n}}\) and an integer \(p\in \{ 1,\ldots , r \}\), we sort \(\{ {R_{i}}( x ), i=1,\ldots ,r \}\) in ascending order as follows:

$$\begin{aligned} {{R}_{{{i}_{1}}\left( x \right) }}\left( x \right) \le {{R}_{{{i}_{2}}\left( x \right) }}\left( x \right) \le \cdots \le {{R}_{{{i}_{k}}\left( x \right) }}\left( x \right) \le \cdots \le {{R}_{{{i}_{r}}\left( x \right) }}\left( x \right) . \end{aligned}$$
(1.2)

where \({i_{k}}( x )\) denotes the index of the k-th smallest element of that set, for the given value of x.

This problem is of great importance in the field of robust estimation [2]. It is well known that measurement errors often occur when acquiring experimental data, which can lead to inconsistent values (outliers). The presence of such values can distort the fitted model, so it is necessary to ignore them during the model adjustment process. Methods for dealing with outliers are discussed in [3].

We can consider \(\varphi (t)=\phi (x,t)\) as the fitted model and define \({R_{i}}(x)=\frac{1}{2}({{F}_{i}}(x))^{2}\), where \({{F}_{i}}(x)={{y}_{i}}-\phi (x,{{t}_{i}}),i=1,\ldots ,r\). Then, the corresponding LOVO problem is as follows:

$$\begin{aligned} \textrm{min}{{S}_{p}}\left( x \right) =\textrm{min}\sum \limits _{k=1}^{p}{{{R}_{{{i}_{k}}\left( x \right) }}\left( x \right) }=\textrm{min}\sum \limits _{k=1}^{p}{\frac{1}{2}}{{\left( {{F}_{{{i}_{k}}\left( x \right) }}\left( x \right) \right) }^{2}}. \end{aligned}$$
(1.3)

where each \({{R}_{i}},i=1,\ldots ,r\), is a residual function.

When \(p=r\), the LOVO problem is the classical least squares problem. When \(p<r\), the parameter \(\bar{x}\in {{\mathbb {R}}^{n}}\) solving (1.3) defines a model \(\phi (\bar{x},t)\) that is free from the influence of the worst \(r-p\) deviations. When \(p\ll r\), the LOVO problem is able to discover hidden structures in which a large number of erroneous observations are mixed with a small number of correct data.

As a generalization of the classical nonlinear least squares problem, the LOVO problem has many applications, including robust estimation [4], hidden pattern recognition [1], and protein structure comparison [5]. The LOVO problem has been studied by many scholars both theoretically and algorithmically, and considerable results have been obtained. Andreani et al. [1, 5,6,7] interpreted the protein structure alignment problem as a LOVO problem and proposed solving it by a dynamic programming line search algorithm, a Gauss–Newton algorithm, and a Newton trust-region method. They also defined two types of optimality conditions for the LOVO problem and proposed two essentially smooth algorithms that converge to weakly and strongly critical points, respectively. Birgin et al. [8] solved the Value-at-Risk constrained optimization problem by modeling it as a LOVO problem with low order-value function constraints, and introduced an augmented Lagrangian algorithm to solve it. Jiang et al. [9] proposed KKT necessary conditions and KKT sufficient conditions for the LOVO problem under a convexity assumption, and also presented a smooth reconstruction of the LOVO problem that locally satisfies the KKT necessary conditions. Martinez [10] gave examples of LOVO problems that may be analyzed in the context of piecewise-smooth optimization, and proved that the coordinate search method ensures that the limit points satisfy the strong optimality condition.

Recently, Castelani et al. [11] proposed a new LM algorithm [12,13,14] to solve the LOVO problem. This algorithm effectively avoids the use of second-order information about the function. However, relatively few existing studies provide an in-depth analysis of algorithm complexity. As an upper bound on the number of iterations required to obtain an approximate solution, the global complexity bound is one of the important factors in selecting an appropriate algorithm. Several authors have studied it: Argyros et al. [15] analyzed the complexity of the Newton iteration, and Zhao et al. [16, 17] analyzed the complexity of the LM algorithm. Inspired by these works, this paper proposes a modified LM algorithm and focuses on the analysis of its complexity. The contributions are as follows:

(a)

    The modified LM algorithm is applied to the LOVO problem. The algorithm uses the sum of an LM step and an approximate LM step as the next search direction and solves the subproblems by QR or Cholesky decomposition, which effectively saves a large amount of Jacobian matrix computation and accelerates convergence.

(b)

    The global convergence of the algorithm is given, and the worst-case complexity bound is discussed.

2 The modified Levenberg–Marquardt algorithm for LOVO problems

In order to solve the LOVO problem more conveniently, another form of this problem is given below. Let \(\mathcal {C}=\left\{ {{\mathcal {C}}_{1}},\ldots ,{{\mathcal {C}}_{q}} \right\} \) denote the set of all combinations of p elements chosen from \(\left\{ 1,2,\ldots ,r \right\} \); then for each \(i\in \left\{ 1,\ldots ,q \right\} \) we define the following function:

$$\begin{aligned} {{f}_{i}}\left( x \right) =\sum \limits _{k\in {{\mathcal {C}}_{i}}}{{{R}_{k}}\left( x \right) } \end{aligned}$$
(2.1)

and

$$\begin{aligned} {{f}_{\min }}\left( x \right) =\min \left\{ {{f}_{i}}\left( x \right) ,i=1,\ldots ,q \right\} . \end{aligned}$$
(2.2)

It follows from (1.1) and (2.2) that \({{S}_{p}}(x)={{f}_{\min }}(x)\). Since each \({{f}_{i}}\) is a sum of continuous functions, \({{f}_{i}}\) is continuous, and hence \({{S}_{p}}={{f}_{\min }}\) is also continuous. However, even if all the functions \({{f}_{i}}\) are differentiable, \({{S}_{p}}\) is generally nonsmooth.
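Although (2.2) is written as a minimum over all q combinations, this minimum is attained by the p smallest values \({{R}_{i}}(x)\), so \({{f}_{\min }}(x)\) can be evaluated by a single sort instead of enumerating the combinations. The following Julia sketch illustrates this; the function and variable names are illustrative only and are not taken from the original code.

```julia
# Minimal sketch: evaluate f_min(x) = S_p(x) by sorting, instead of
# enumerating all q = binomial(r, p) subsets C_i.
# R is the vector (R_1(x), ..., R_r(x)) of residual values at the current x.
function fmin_with_subset(R::AbstractVector{<:Real}, p::Integer)
    perm = sortperm(R)        # indices of R_i(x) in ascending order
    Ci   = perm[1:p]          # one combination C_i attaining the minimum
    return sum(R[Ci]), Ci     # f_min(x) and the corresponding index set
end

# Example: r = 5 residual values, p = 3 trusted points
fval, Ci = fmin_with_subset([0.4, 2.0, 0.1, 9.0, 0.3], 3)   # fval = 0.8, Ci = [3, 5, 1]
```

The returned index set corresponds to one combination \({{\mathcal {C}}_{i}}\) attaining the minimum, which is also how an index \(i\in {{I}_{\min }}\left( x \right) \) can be selected in practice (see Definition 1 below).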

Recalling from (1.3) that \({{R}_{k}}(x)=\frac{1}{2}{{\left( {{F}_{k}}(x) \right) }^{2}}\) for \(k\in {{\mathcal {C}}_{i}}, i=1,\ldots ,q\), the function (2.1) can be written as:

$$\begin{aligned} {{f}_{i}}\left( x \right) =\frac{1}{2}\sum \limits _{k\in {{\mathcal {C}}_{i}}}{{{F}_{k}}{{\left( x \right) }^{2}}}=\frac{1}{2}\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( x \right) \right\| _{2}^{2}. \end{aligned}$$
(2.3)

In (2.3), \({{F}_{{{\mathcal {C}}_{i}}}}(x)\) denotes the mapping of x to a vector of size p whose components are the functions \({{F}_{k}}(x)\) defined in (1.3), \(k \in \mathcal {C}_i\), in arbitrary order. Similarly, \({{J}_{{{\mathcal {C}}_{i}}}}({{x}_{k}})\) denotes the Jacobian matrix of this map at \({{x}_{k}}\).

In this section, a modified version of the LM algorithm is used to solve the LOVO problem. It mainly follows the modified LM framework for solving nonlinear equations proposed in [18]: in addition to an LM step, an approximate LM step is also computed at each iteration:

$$\begin{aligned} {{\hat{d}}_{k}}=-{{\left( {{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) +{{\gamma }_{k}}I \right) }^{-1}}{{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{F}_{{{\mathcal {C}}_{i}}}}\left( {{y}_{k}} \right) ,{{y}_{k}}={{x}_{k}}+{{d}_{k}}. \end{aligned}$$
(2.4)

where \({\gamma }_{k}>0\) is the damping parameter.

The sum of the LM step and the approximate LM step is used as the trial step of each iteration, which effectively saves a considerable amount of Jacobian matrix computation. At the same time, the trust-region idea is used to decide whether to accept the trial step and how to update the LM parameter, which accelerates the convergence of the traditional LM method. The resulting algorithm is referred to as MLM-LOVO. Since the LM algorithm is closely related to the trust-region method, some relevant definitions are introduced in the following.

According to [11], we have the following definition.

Definition 1

Given \(x\in {{\mathbb {R}}^{n}}\), we define the minimal function set of \({{f}_{\min }}\) in x by

$$\begin{aligned}{{I}_{\min }}\left( x \right) =\left\{ i\in \left\{ 1,\ldots ,q \right\} :{{f}_{\min }}\left( x \right) ={{f}_{i}}\left( x \right) \right\} .\end{aligned}$$

In order to define the search direction of MLM-LOVO at the current point \({{x}_{k}}\), we choose an index \(i\in {{I}_{\min }}({{x}_{k}})\) and use the direction computed by the improved LM algorithm in [11]. The original LM search direction \({{d}_{k}}\in {{\mathbb {R}}^{n}}\) is defined as the solution of:

$$\begin{aligned} \underset{d\in {{\mathbb {R}}^{n}}}{\mathop {\min }}\,{{m}_{k,i}}\left( d \right) =\frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) +{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) d \right\| }^{2}}+\frac{{{\gamma }_{k}}}{2}{{\left\| d \right\| }^{2}}, \end{aligned}$$
(2.5)

The approximate LM search direction \({{\hat{d}}_{k}}\in {{\mathbb {R}}^{n}}\) is defined as the solution of:

$$\begin{aligned} \underset{d\in {{\mathbb {R}}^{n}}}{\mathop {\min }}\,{{m}_{k,i}}\left( {\hat{d}} \right) =\frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{y}_{k}} \right) +{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \hat{d} \right\| }^{2}}+\frac{{{\gamma }_{k}}}{2}{{\left\| {\hat{d}} \right\| }^{2}},{{y}_{k}}={{x}_{k}}+{{d}_{k}}, \end{aligned}$$
(2.6)

Then the search direction is denoted as \({{s}_{k}}\):

$$\begin{aligned} {{s}_{k}}={{d}_{k}}+{{\hat{d}}_{k}}. \end{aligned}$$

Define the actual reduction of \({{f}_{\min }}\) at the k-th iteration as:

$$\begin{aligned} Are{{d}_{k,i}}= & {} {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \\= & {} \frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right\| }^{2}}-\frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \right\| }^{2}}, \end{aligned}$$

and define the predicted reduction as

$$\begin{aligned} Pre{{d}_{k,i}}={{m}_{k,i}}\left( 0 \right) -{{m}_{k,i}}\left( {{d}_{k}} \right) +{{m}_{k,i}}\left( \hat{0} \right) -{{m}_{k,i}}\left( {{{\hat{d}}}_{k}} \right) , \end{aligned}$$

and the ratio of the actual reduction to the predicted reduction as

$$\begin{aligned} {{\rho }_{k,i}}=\frac{Are{{d}_{k,i}}}{Pre{{d}_{k,i}}}. \end{aligned}$$

Similar to the trust-region method, \({{\rho }_{k,i}}\) is used to decide whether to accept the trial step and how to update the LM parameter.

The detailed steps of MLM-LOVO are given in Algorithm 1.
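Since Algorithm 1 is stated separately, the following Julia sketch is given only to illustrate how the iteration described above can be organized; it is not the authors' implementation. The evaluation of \({{f}_{\min }}\), the damping rule \({{\gamma }_{k}}={{\lambda }_{k}}{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}\), the acceptance threshold \(P_{0}\), and the factor-4 updates of \({{\lambda }_{k}}\) are taken from the conventions used in the convergence analysis of Sect. 3; all function and variable names are placeholders.

```julia
using LinearAlgebra

# Illustrative sketch of the MLM-LOVO iteration (placeholder names; parameter
# rules follow the conventions used in the convergence analysis of Sect. 3).
# Fres(x) returns the vector (F_1(x), ..., F_r(x)); Jres(x) its r×n Jacobian.
function mlm_lovo_sketch(Fres, Jres, x0, p; ε = 1e-4, P0 = 1e-4,
                         λ = 1.0, λmin = 1e-8, maxit = 200)
    x = copy(x0)
    fmin(F) = 0.5 * sum(sort(abs2.(F))[1:p])         # f_min = sum of p smallest ½F_i²
    for _ in 1:maxit
        Fx = Fres(x)
        Ci = sortperm(abs2.(Fx))[1:p]                # combination C_{i_k}, i_k ∈ I_min(x_k)
        Fc = Fx[Ci]                                  # F_{C_i}(x_k)
        Jc = Jres(x)[Ci, :]                          # J_{C_i}(x_k)
        g  = Jc' * Fc                                # ∇f_{i_k}(x_k)
        norm(g) ≤ ε && return x
        γ  = λ * norm(g)^2                           # damping parameter γ_k (assumed rule)
        A  = cholesky(Symmetric(Jc' * Jc + γ * I))   # one factorization, reused twice
        d  = -(A \ g)                                # LM step, subproblem (2.5)
        Fy = Fres(x + d)[Ci]                         # F_{C_i}(y_k), y_k = x_k + d_k
        dh = -(A \ (Jc' * Fy))                       # approximate LM step, subproblem (2.6)
        s  = d + dh                                  # trial step s_k = d_k + d̂_k
        m  = (dd, Fv) -> 0.5 * sum(abs2, Fv + Jc * dd) + 0.5 * γ * sum(abs2, dd)
        Pred = (m(zero(d), Fc) - m(d, Fc)) + (m(zero(d), Fy) - m(dh, Fy))
        Ared = fmin(Fx) - fmin(Fres(x + s))
        if Ared / Pred ≥ P0                          # successful iteration (Step 3)
            x = x + s
            λ = max(λ / 4, λmin)                     # assumed update constants
        else                                         # unsuccessful: enlarge λ, keep x
            λ = 4λ
        end
    end
    return x
end
```

Note that the two linear systems for \({{d}_{k}}\) and \({{\hat{d}}_{k}}\) share the same coefficient matrix, so a single factorization is reused; this is the source of the savings in Jacobian and factorization work mentioned above.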

3 Convergence analysis

3.1 Global convergence

The following basic assumption is needed to prove the global convergence of the algorithm.

Assumption 1

The level set

$$\begin{aligned}{{C}_{\left( {{x}_{0}} \right) }}=\left\{ x\in {{\mathbb {R}}^{n}}|{{f}_{\min }}\left( x \right) \le {{f}_{\min }}\left( {{x}_{0}} \right) \right\} \end{aligned}$$

is a bounded subset of \({{\mathbb {R}}^{n}}\), and each function \({{f}_{i}}, i=1,\ldots ,q\), is continuously differentiable with Lipschitz continuous gradient (with Lipschitz constant \({{L}_{i}}>0\)) on an open set containing \({{C}_{\left( {{x}_{0}} \right) }}\).

From Proposition 1 of [11], the following can be deduced.

Given \({{x}_{k}}\in {{\mathbb {R}}^{n}},\) \({\gamma }_{k}>0,\) and \({{i}_{k}}\in \left\{ 1,\ldots ,q \right\} ,\) the Cauchy step is obtained from

$$\begin{aligned}{{t}_{k}}=\underset{t\in \mathbb {R}}{\mathop {\arg \min }}\,\left\{ {{m}_{k,{{i}_{k}}}}\left( -t\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right) \right\} \end{aligned}$$

and expressed as \({{d}^{C}}\left( {{x}_{k}} \right) =-{{t}_{k}}\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \in {{\mathbb {R}}^{n}}.\)

Similarly, defining \({{\hat{d}}^{C}}\left( {{y}_{k}} \right) =-{{t}_{k}}\nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \in {{\mathbb {R}}^{n}},\) these steps satisfy

$$\begin{aligned} \begin{aligned} \begin{aligned}&{{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}^{C}}\left( {{x}_{k}} \right) \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{\hat{d}}^{C}}\left( {{y}_{k}} \right) \right) \\&\ge \frac{\left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) }, \\ \end{aligned} \end{aligned} \end{aligned}$$
(3.1)

Since the Cauchy step minimizes the model along the gradient direction, while \({{d}_{k}}\) and \({{\hat{d}}_{k}}\) minimize the models (2.5) and (2.6) over all of \({{\mathbb {R}}^{n}}\), it can be derived that

$$\begin{aligned} \begin{aligned} \begin{aligned}&{{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{\hat{d}}_{k}} \right) \\&\ge \frac{\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) }. \\ \end{aligned} \end{aligned} \end{aligned}$$
(3.2)

where \(\theta >0\) is independent of k.

Lemma 1

Let \({{x}_{k}}\in {{\mathbb {R}}^{n}}\) and \({{i}_{k}}\in {{I}_{\min }}\left( {{x}_{k}} \right) \) be fixed in Step 1 of Algorithm 1. Then the unsuccessful iterations of Step 3 of Algorithm 1 are executed only a finite number of times.

Proof

For each \(\lambda \) fixed in Step 1 of Algorithm 1, noting that \({{m}_{k,{{i}_{k}}}}\left( 0 \right) ={{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) ={{f}_{\min }}\left( {{x}_{k}} \right) \) because \({{i}_{k}}\in {{I}_{\min }}\left( {{x}_{k}} \right) \), we have

$$\begin{aligned}\begin{aligned}&1-\frac{{{\rho }_{k,{{i}_{k}}}}}{2}=1-\frac{{{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) }{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \\&\quad =\frac{2{{m}_{k,{{i}_{k}}}}\left( 0 \right) -2{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -2{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k}} \right) +{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) }{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \\&\quad =\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( 0 \right) -2{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -2{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) }{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) }. \\ \end{aligned}\end{aligned}$$

By Taylor series expansion and Lipschitz continuity of \(\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \)

$$\begin{aligned} {{f}_{{{i}_{k}}}}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \le {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) +\frac{{{L}_{{{i}_{k}}}}}{2}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}. \end{aligned}$$
(3.3)

By the definition of \({{f}_{\min }}\) and (3.3), it follows that

$$\begin{aligned} \begin{aligned} \begin{aligned}&{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \le {{f}_{{{i}_{k}}}}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \\&\text { }\le {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) +\frac{{{L}_{{{i}_{k}}}}}{2}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}. \\ \end{aligned} \end{aligned} \end{aligned}$$
(3.4)

and it can be obtained from (2.3) and Step 3 of Algorithm 1 that

$$\begin{aligned} {{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{d}_{k}} \right\| }^{2}}+{{\gamma }_{k}}{{\left\| {{d}_{k}} \right\| }^{2}}=2{{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{d}_{k}}. \end{aligned}$$
(3.5)

Similarly,

$$\begin{aligned} {{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{y}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{{\hat{d}}}_{k}} \right\| }^{2}}+{{\gamma }_{k}}{{\left\| {{{\hat{d}}}_{k}} \right\| }^{2}}=2{{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{\hat{d}}_{k}}. \end{aligned}$$
(3.6)

According to (3.5) and (3.6), we have

$$\begin{aligned} 1-\frac{{{\rho }_{k,{{i}_{k}}}}}{2}&=\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{d}_{k}} \right\| }^{2}}-{{\gamma }_{k}}{{\left\| {{d}_{k}} \right\| }^{2}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\quad +\frac{2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{y}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{{\hat{d}}}_{k}} \right\| }^{2}}-{{\gamma }_{k}}{{\left\| {{{\hat{d}}}_{k}} \right\| }^{2}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&=\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( 0 \right) -2{{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) -\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{d}_{k}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\quad +\frac{2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -2{{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) -\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&=\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) -{{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) -\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{d}_{k}}-\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\le \frac{\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}-\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}+\frac{{{L}_{{{i}_{k}}}}}{2}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\le \frac{{{L}_{{{i}_{k}}}}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}}{4\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) }. \end{aligned}$$
(3.7)

By the closed form of the LM step \({{d}_{k}}\) and the definition of \({{\gamma }_{k}}\), it follows that

$$\begin{aligned} \left\| {{d}_{k}} \right\| \le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }{{{\sigma }_{k}}+{{\gamma }_{k}}}\le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }{{{\gamma }_{k}}}=\frac{1}{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| \lambda }, \end{aligned}$$
(3.8)

and similarly

$$\begin{aligned} \left\| {{{\hat{d}}}_{k}} \right\| \le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}\lambda }, \end{aligned}$$
(3.9)
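For completeness, the first inequality in (3.8) follows from the normal-equations form of the LM step, and the final equality corresponds to the damping rule \({{\gamma }_{k}}=\lambda {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}\) read off from (3.8) itself; the bound (3.9) is obtained analogously for \({{\hat{d}}_{k}}\):

$$\begin{aligned} \left\| {{d}_{k}} \right\| =\left\| {{\left( {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) +{{\gamma }_{k}}I \right) }^{-1}}\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| \le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }{{{\sigma }_{k}}+{{\gamma }_{k}}},\qquad {{\gamma }_{k}}=\lambda {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}, \end{aligned}$$

where \({{\sigma }_{k}}+{{\gamma }_{k}}\) is the smallest eigenvalue of the coefficient matrix (see the definition of \({{\sigma }_{k}}\) below).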

Combining (3.8) and (3.9), we obtain

$$\begin{aligned} \begin{aligned} {{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}&={{\left( \sqrt{{{\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) }^{T}}\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) } \right) }^{2}} \\&\le {{\left( \sqrt{\frac{{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}} \right) }^{2}}\\&=\frac{{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}, \\ \end{aligned} \end{aligned}$$
(3.10)

where \({{\sigma }_{k}}={{\sigma }_{\min }}\left( {{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right) \) and \({{\sigma }_{\min }}\left( A \right) \) denotes the smallest eigenvalue of A.

Substituting (3.10) into (3.7) and using the definition of \({{\gamma }_{k}}\), it can be obtained that

$$\begin{aligned} \begin{aligned}&\frac{\frac{{{L}_{{{i}_{k}}}}{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}}{4\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \\&\text { }\le \frac{\frac{{{L}_{{{i}_{k}}}}{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}}{\frac{4\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) }}\le \frac{\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) {{L}_{{{i}_{k}}}}}{2\theta {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}} \\&\text { }\le \left( \frac{{{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}}+\frac{1}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}} \right) \frac{{{L}_{{{i}_{k}}}}}{2\theta \lambda }, \\ \end{aligned} \end{aligned}$$
(3.11)

where it is assumed that \(\lambda \ge 1\).

According to (3.11), it can be obtained that

$$\begin{aligned}\underset{\lambda \rightarrow \infty }{\mathop {\lim }}\,1-\frac{{{\rho }_{k,{{i}_{k}}}}}{2}\le 0,\end{aligned}$$

which is equivalent to

$$\begin{aligned}\underset{\lambda \rightarrow \infty }{\mathop {\lim }}\,{{\rho }_{k,{{i}_{k}}}}\ge 2. \end{aligned}$$

Hence, for all sufficiently large \(\lambda \), the acceptance test \({{\rho }_{k,{{i}_{k}}}}\ge {{P}_{0}}\) in Step 3 of Algorithm 1 is satisfied, so only a finite number of unsuccessful iterations can occur.

\(\square \)

Theorem 1

Let \({{\left\{ {{x}_{k}} \right\} }_{k\in \mathbb {N}}}\) be the sequence generated by Algorithm 1 with \(\varepsilon =0\), and let \({{x}^{*}}\) be a limit point of this sequence. Suppose Assumption 1 holds, and let \({\kappa }'=\left\{ k\,|\,{{i}_{k}}=i \right\} \subset \mathbb {N}\) be an infinite index subset associated with some \(i\in \left\{ 1,\ldots ,q \right\} \) such that \({{\lim }_{k\in {\kappa }'}}{{x}_{k}}={{x}^{*}}\). Then

$$\begin{aligned}\underset{k\in {\kappa }'}{\mathop {\lim }}\,\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| =0\end{aligned}$$

and \(i\in {{I}_{\min }}\left( {{x}^{*}} \right) \).

Proof

The proof is by contradiction. Since \(\left\{ 1,\ldots ,q \right\} \) is a finite set, there is an index i chosen infinitely many times by Algorithm 1. Suppose that for this index i there exist \(\varepsilon >0\) and an infinite subset \({\kappa }_{1} \subset {\kappa }'\) such that \(\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| \ge \varepsilon \) for all \(k\in {{\kappa }_{1}}\).

By the continuity of the Jacobian matrix \({{J}_{{{\mathcal {C}}_{i}}}}\) and the boundedness of the level set \({{C}_{\left( {{x}_{0}} \right) }}\) (Assumption 1), it follows that for all \(k\in {{\kappa }_{1}}\)

$$\begin{aligned} \left\| {{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right\| \le \underset{k\in {{\kappa }_{1}}}{\mathop {\sup }}\,\left\{ \left\| {{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right\| \right\} ={{J}_{i}}<\infty . \end{aligned}$$
(3.12)

According to (3.11), for \(\lambda \ge 1\), we have

$$\begin{aligned} \begin{aligned}&1-\frac{{{\rho }_{k,i}}}{2}\le \left( \frac{{{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}}+\frac{1}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}} \right) \frac{{{L}_{i}}}{2\theta \lambda }\le \left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{2\theta \lambda } \\&\quad \Rightarrow {{\rho }_{k,i}}\ge 2-\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta \lambda }, \\ \end{aligned} \end{aligned}$$
(3.13)

According to (3.13), we can conclude that when \(\lambda \ge b=\max \left\{ 1,\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta } \right\} \), the successful-iteration test in Step 3 of Algorithm 1 is satisfied for all \(k\in {{\kappa }_{1}}\), since

$$\begin{aligned} {{\rho }_{k,i}}\ge 2-\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta \lambda }\ge 2-\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta b}\ge 1\ge {{P}_{0}}, \end{aligned}$$
(3.14)

By Step 4 of Algorithm 1, \({{\lambda }_{k}}\) is bounded above by \({{\lambda }_{k}}\le M=4b\) for all \(k\in {{\kappa }_{1}}\). It follows that

$$\begin{aligned} \begin{aligned}&\frac{{{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) }{{{m}_{k,i}}\left( 0 \right) -{{m}_{k,i}}\left( {{d}_{k}} \right) +{{m}_{k,i}}\left( \hat{0} \right) -{{m}_{k,i}}\left( {{{\hat{d}}}_{k}} \right) }\ge {{P}_{0}} \\&\quad \Leftrightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge {{P}_{0}}\left( {{m}_{k,i}}\left( 0 \right) -{{m}_{k,i}}\left( {{d}_{k}} \right) +{{m}_{k,i}}\left( \hat{0} \right) -{{m}_{k,i}}\left( {{{\hat{d}}}_{k}} \right) \right) \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge {{P}_{0}}\left( \frac{\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) } \right) \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\lambda }_{k}}{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\lambda }_{k}}{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta }{2\left( \frac{{{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{{{\varepsilon }^{2}}}+{{\lambda }_{k}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\varepsilon }^{2}}{{\lambda }_{k}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2\left( {{J}_{i}}^{2}+{{\varepsilon }^{2}}M \right) }\\&\quad \Leftrightarrow {{f}_{\min }}\left( {{x}_{k+1}} \right) -{{f}_{\min }}\left( {{x}_{k}} \right) \le -\frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2C}, \end{aligned} \end{aligned}$$

where \(C={{J}_{i}}^{2}+{{\varepsilon }^{2}}M\).

Since \({{\kappa }_{1}}\) is infinite, summing these inequalities drives \({{f}_{\min }}\left( {{x}_{k}} \right) \) to \(-\infty \), which contradicts the fact that \({{f}_{\min }}\) is bounded below. Therefore no such \({{\kappa }_{1}}\) exists, and hence

$$\begin{aligned}\underset{k\in {\kappa }'}{\mathop {\lim }}\,\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| =0.\end{aligned}$$

See [11] for the proof of the second part of the theorem. \(\square \)

3.2 Worst-case complexity bound

In the following, denote by \(\mathcal {S}\) and \(\mathcal {U}\) the sets of successful and unsuccessful iterations, respectively: \(\mathcal {S} = \{ k : {{\rho }_{k,{{i}_{k}}}} \ge {{P}_{0}} \}\) and \(\mathcal {U} = \{ k : {{\rho }_{k,{{i}_{k}}}} < {{P}_{0}} \}\).

The worst-case complexity bound for Algorithm 1 is established below. That is, given a tolerance \(\varepsilon \in (0,1)\), the goal is to derive an upper bound, in the worst case, on the number of iterations required to reach

$$\begin{aligned}\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| <\varepsilon .\end{aligned}$$

Proposition 1

Suppose Assumption 1 holds, and let \(k_{\varepsilon }\) be the first iteration index such that the above inequality holds. Let \(\mathcal {S}_{\varepsilon }\) denote the set of successful iteration indices before \(k_{\varepsilon }\). Then

$$\begin{aligned}\left| {{\mathcal {S}}_{\varepsilon }} \right| \le K,\qquad K=\frac{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }{{{P}_{0}}\theta {{\varepsilon }^{2}}}{{f}_{\min }}\left( {{x}_{1}} \right) .\end{aligned}$$

Proof

By the proof of Theorem 1 and the fact that \(\left\| \nabla f_{{i_k}}(x_k) \right\| \ge \varepsilon \) for \(k \in \mathcal {S}_{\varepsilon }\), we have

$$\begin{aligned}{{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge {{P}_{0}}\frac{\theta {{\varepsilon }^{2}}}{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }. \end{aligned}$$

Therefore, summing over all indices in \({{\mathcal {S}}_{\varepsilon }}\) and using the fact that \({{f}_{\min }}\ge 0\), it can be obtained that

$$\begin{aligned}{{f}_{\min }}\left( {{x}_{1}} \right) \ge \sum \limits _{k\in {{\mathcal {S}}_{\varepsilon }}}{\left[ {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \right] }\ge \left| {{\mathcal {S}}_{\varepsilon }} \right| \frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }, \end{aligned}$$

which gives the stated bound on \(\left| {{\mathcal {S}}_{\varepsilon }} \right| \).

\(\square \)

Lemma 2

Suppose Assumption 1 holds, and let \({{\mathcal {U}}_{\varepsilon }}\) denote the set of unsuccessful iterations whose index is less than or equal to \(k_{\varepsilon }\). Then

$$\begin{aligned}\left| {{\mathcal {U}}_{\varepsilon }} \right| \le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) \left| {{\mathcal {S}}_{\varepsilon }} \right| .\end{aligned}$$

Proof

By the definition of \(k_{\varepsilon }\), we have \(k_{\varepsilon }\in {{\mathcal {S}}_{\varepsilon }}\). The goal is to bound the number of unsuccessful iterations between two successful iterations. Let \(\left\{ {{k}_{1}},\ldots ,{{k}_{t}}={{k}_{\varepsilon }} \right\} \) be an increasing ordering of \({{\mathcal {S}}_{\varepsilon }}\). For \(l\in \left\{ 1,\ldots ,t-1 \right\} \), the update formula for \({{\lambda }_{k}}\) at successful iterations gives

$$\begin{aligned}{{\lambda }_{{{k}_{l}}+1}}\ge \max \left\{ \frac{{{\lambda }_{{{k}_{l}}}}}{4},{{\lambda }_{\min }} \right\} \ge {{\lambda }_{\min }}.\end{aligned}$$

Similarly, \({{\lambda }_{k}}<b=\max \left\{ 1,\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta } \right\} \) for any unsuccessful iteration \(k\in \left\{ {{k}_{l}}+1,\ldots ,{{k}_{l+1}}-1 \right\} \). Using the update rule for \({{\lambda }_{k}}\) at unsuccessful iterations, it can be obtained that

$$\begin{aligned}\forall k\in \left\{ {{k}_{l}}+1,\ldots ,{{k}_{l+1}}-1 \right\} :\quad {{\lambda }_{k}}={{4}^{k-{{k}_{l}}-1}}{{\lambda }_{{{k}_{l}}+1}}\ge {{4}^{k-{{k}_{l}}-1}}{{\lambda }_{\min }}.\end{aligned}$$

Thus, the number of unsuccessful iterations between \({{k}_{l}}\) and \({{k}_{l+1}}\), which equals \({{k}_{l+1}}-{{k}_{l}}-1\), satisfies

$$\begin{aligned}{{k}_{l+1}}-{{k}_{l}}-1\le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) .\end{aligned}$$

Summing over \(l=1,\ldots ,t-1\), we have

$$\begin{aligned}\sum \limits _{l=1}^{t-1}{\left( {{k}_{l+1}}-{{k}_{l}}-1 \right) }\le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) \left[ \left| {{\mathcal {S}}_{\varepsilon }} \right| -1 \right] .\end{aligned}$$

It remains to count the possible unsuccessful iterations between the iteration at index 1 and the first successful iteration \({{k}_{1}}\). Since \({{\lambda }_{1}}\ge {{\lambda }_{\min }}\), a similar argument gives

$$\begin{aligned}{{k}_{1}}-1\le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) .\end{aligned}$$

\(\square \)

Combining Proposition 1 and Lemma 2 yields the following complexity estimate.

Theorem 2

Under Assumption 1, the total number of iterations performed by Algorithm 1 satisfies

$$\begin{aligned}\left| {{\mathcal {S}}_{\varepsilon }} \right| +\left| {{\mathcal {U}}_{\varepsilon }} \right| \le \left( {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) +1 \right) \frac{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }{{{P}_{0}}\theta {{\varepsilon }^{2}}}{{f}_{\min }}\left( {{x}_{1}} \right) \end{aligned}$$

before an iterate satisfying \(\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| <\varepsilon \) is produced.

4 Numerical example

We now report numerical experiments with the proposed algorithm and compare the results with those of the classical GN algorithm [19] and the LM algorithm [11]. The algorithm is implemented in Julia version 1.0.4, and the test environment is an Intel(R) Core(TM) i5-11300H CPU @ 3.10 GHz with 16.0 GB RAM.

In the experiments, MLM-LOVO uses the simple decrease test \({{f}_{\min }}\left( {{x}_{k}}+{{s}_{k}} \right) <{{f}_{\min }}\left( {{x}_{k}} \right) \) in Step 3. The direction \({{s}_{k}}\) is computed by solving the two linear systems in Step 3 of the algorithm with the Cholesky factorization of the matrix \({{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) +{{\gamma }_{k}}I\). A QR decomposition was also tested and produced equivalent results. In the experiments, MLM-LOVO was used as a subroutine of RAFF to solve the problem; details on RAFF can be found in [11].
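As an illustration of the QR alternative mentioned above, the damped subproblem (2.5) can equivalently be solved as an augmented least-squares problem. The sketch below is illustrative only, with an assumed formulation and placeholder names; it does not reproduce the code used in the experiments.

```julia
using LinearAlgebra

# Illustrative QR route for the damped subproblem (2.5):
#   min_d ‖F + J d‖² + γ‖d‖²  ⇔  min_d ‖ [J; √γ I] d + [F; 0] ‖²
function lm_step_qr(J::AbstractMatrix, F::AbstractVector, γ::Real)
    n = size(J, 2)
    A = [J; sqrt(γ) * Matrix(I, n, n)]   # augmented (p+n)×n matrix
    b = [-F; zeros(n)]                   # augmented right-hand side
    return qr(A) \ b                     # least-squares solution = step d
end
```

Both routes compute the same step; the augmented QR form avoids explicitly forming \({{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \), which can be preferable when this matrix is ill-conditioned.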

A solution \(\overline{x}={{x}_{k}}\) is said to be successful if

$$\begin{aligned}\left\| \nabla {{f}_{{{i}_{k}}}}\left( \overline{x} \right) \right\| \le \varepsilon \end{aligned}$$

is satisfied for some \({{i}_{k}}\in {{I}_{\min }}\left( \overline{x} \right) \), where \({{f}_{{{i}_{k}}}}\) is given by (2.1). The algorithm also stops if the gradient cannot be evaluated due to numerical errors.

Fig. 1 Test problems for \(r=10, p=9\) and \(r=100, p=90\), simulated according to the cubic model

The cubic model mentioned above is given by

$$\begin{aligned}\phi \left( x,t \right) ={{x}_{1}}{{t}^{3}}+{{x}_{2}}{{t}^{2}}+{{x}_{3}}t+{{x}_{4}},\end{aligned}$$

where \(x\in {{\mathbb {R}}^{4}}\) represents the model parameters and \(t\in \mathbb {R}\) represents the model variable. The solid line represents the model adjusted by the MLM-LOVO algorithm, and the dashed line is the model of the "correct solution". The tests use \(r=10, p=9\) and \(r=100, p=90\), the accuracy parameter \(\varepsilon ={{10}^{-4}}\), and the exact solution \({{x}^{*}}=(2,0,-4,-10)\). Figure 1 shows that the proposed MLM-LOVO algorithm correctly identifies and ignores the outliers.
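To give a concrete flavor of these test problems, a data set of this type can be generated as follows. This is an assumed setup for illustration only: the sampling interval, noise level, and outlier perturbation are not taken from the paper.

```julia
using Random

# Cubic test model φ(x, t) = x₁t³ + x₂t² + x₃t + x₄ with planted outliers.
φ(x, t) = x[1]*t^3 + x[2]*t^2 + x[3]*t + x[4]

function make_cubic_dataset(r, p; xstar = [2.0, 0.0, -4.0, -10.0], seed = 1)
    rng = MersenneTwister(seed)
    t = collect(range(-3.0, 3.0; length = r))
    y = [φ(xstar, ti) + 0.1 * randn(rng) for ti in t]   # noisy "correct" data
    outliers = randperm(rng, r)[1:r - p]                # the r - p planted outliers
    y[outliers] .+= 10.0                                # perturb them strongly
    return t, y, outliers
end

t, y, outliers = make_cubic_dataset(10, 9)   # e.g. r = 10, p = 9 as in Fig. 1
```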

Table 1 MLM-LOVO Numerical results

The fitting problems solved by MLM-LOVO are characterized by small dimension and large data sets. For the cubic model mentioned above, the MLM-LOVO algorithm proposed in this paper, the LM-LOVO algorithm of reference [11], and the GN-LOVO algorithm are tested on data sets containing 10, 100, 1000 and 10,000 data points. The randomly generated problems include: 10 points and 1 outlier, 10 points and 2 outliers, 100 points and 2 outliers, 100 points and 10 outliers, 1000 points and 10 outliers, 1000 points and 100 outliers, and 10,000 points and 100 outliers. Numerical results of the MLM-LOVO algorithm on part of these data sets are shown in Table 1, where r and p denote the number of data points and the number of trusted points, respectively, F denotes the number of detected outliers, T denotes the number of correctly detected outliers, and IT denotes the number of iterations.

Table 2 compares the running time and number of iterations of the three algorithms. As can be seen from Table 2, for the same initial points and data sets of the same size, the proposed algorithm has certain advantages in both time and number of iterations. In addition, the number of outliers appears to have little effect on the running time and the iteration count.

Table 2 Numerical results of solving the cubic model

Table 3 compares the numbers of function evaluations and Jacobian evaluations of the three algorithms. As can be seen from Table 3, the proposed algorithm requires fewer Jacobian evaluations than the GN-LOVO and LM-LOVO algorithms, that is, it reduces the Jacobian matrix computation to some extent. The numerical results show the feasibility and effectiveness of the proposed algorithm for the LOVO problem.

Table 3 Numerical results of solving the cubic model

5 Conclusion

In this paper, we employ a modified LM algorithm to tackle the LOVO problem. The algorithm computes an LM step and an approximate LM step within each iteration, consequently reducing the Jacobian computation overhead. We provide a numerical example to validate the algorithm and present the numerical results. Comparative analysis against the GN algorithm and the LM algorithm shows clear advantages in terms of both time efficiency and iteration count for the proposed approach.