1 Introduction

In this paper, we consider the following low order-value optimization (LOVO) [1] problem:

$$\begin{aligned} \min \quad {{S}_{p}}\left( x \right) =\sum \limits _{k=1}^{p}{{{R}_{{{i}_{k}}\left( x \right) }}}\left( x \right) \end{aligned}$$
(1.1)

where \({R_{i}}:{{\mathbb {R}}^{n}}\rightarrow \mathbb {R}, i=1,\ldots ,r\). In detail, given a model \(\varphi (t)=\phi (x,t)\) that depends on n parameters \(x\in {{\mathbb {R}}^{n}}\), we want to find a set \(\mathcal {P}\subset \mathcal {R}\) with p elements and parameters \(\bar{x}\in {{\mathbb {R}}^{n}}\), where \(\mathcal {R}=\left\{ \left( {{t}_{i}},{{y}_{i}} \right) , i=1,\ldots ,r \right\} \subset {{\mathbb {R}}^{m}}\times \mathbb {R}\) is the given dataset, such that \(\phi \left( \bar{x},{{t}_{i}} \right) \approx {{y}_{i}}\) for all \(\left( {{t}_{i}},{{y}_{i}} \right) \in \mathcal {P}\). The remaining \(r-p\) elements of \(\mathcal {R}\setminus \mathcal {P}\) are the possible outliers. Given \(x\in {{\mathbb {R}}^{n}}\) and an integer \(p\in \{ 1,\ldots , r \}\), we sort \(\{ {R_{i}}( x ), i=1,\ldots ,r \}\) in ascending order as follows:

$$\begin{aligned} {{R}_{{{i}_{1}}\left( x \right) }}\left( x \right) \le {{R}_{{{i}_{2}}\left( x \right) }}\left( x \right) \le \cdots \le {{R}_{{{i}_{k}}\left( x \right) }}\left( x \right) \le \cdots \le {{R}_{{{i}_{r}}\left( x \right) }}\left( x \right) . \end{aligned}$$
(1.2)

where \({i_{k}}( x )\) denotes the index of the k-th smallest element of that set, for the given value of x.

This problem is of great importance in the field of robust estimation [2]. It is well known that measurement errors often occur when acquiring experimental data, which can lead to inconsistent values (outliers). The presence of such values can distort the fitted model, so it is necessary to ignore them during the model adjustment process. Methods for dealing with outliers are discussed in [3].

We can consider \(\varphi (t)=\phi (x,t)\) as the fitted model and define \({R_{i}}(x)=\frac{1}{2}({{F}_{i}}(x))^{2}\), where \({{F}_{i}}(x)={{y}_{i}}-\phi (x,{{t}_{i}}),i=1,\ldots ,r\). Then, the corresponding LOVO problem is as follows:

$$\begin{aligned} \textrm{min}{{S}_{p}}\left( x \right) =\textrm{min}\sum \limits _{k=1}^{p}{{{R}_{{{i}_{k}}\left( x \right) }}\left( x \right) }=\textrm{min}\sum \limits _{k=1}^{p}{\frac{1}{2}}{{\left( {{F}_{{{i}_{k}}\left( x \right) }}\left( x \right) \right) }^{2}}. \end{aligned}$$
(1.3)

where each \({{R}_{i}},i=1,\ldots ,r\), is a residual function.

When \(p=r\), the LOVO problem is the classical least squares problem. When \(p<r\), the parameter \(\bar{x}\in {{\mathbb {R}}^{n}}\) solving (1.3) defines a model \(\phi (\bar{x},t)\) that is free from the influence of the worst \(r-p\) deviations. When \(p\ll r\), the LOVO problem is able to discover hidden structures in which a large number of erroneous observations are mixed with a small number of correct data.

As a generalization of the classical nonlinear least squares problem, the LOVO problem has many applications, including robust estimation [4], hidden pattern recognition [1], and protein structure comparison [5]. The LOVO problem has been studied by many scholars both theoretically and algorithmically, and considerable results have been obtained. Andreani et al. [1, 5,6,7] interpreted the protein structure alignment problem as a LOVO problem and proposed solving it by a dynamic programming line search algorithm, a Gauss–Newton algorithm, and a Newton trust-region method. They also defined two types of optimality conditions for the LOVO problem and proposed two essentially smooth algorithms that converge to weakly and strongly critical points, respectively. Birgin et al. [8] solved the Value-at-Risk constrained optimization problem by modeling it as a LOVO problem with low order-value function constraints, and introduced an augmented Lagrangian algorithm to solve it. Jiang et al. [9] proposed KKT necessary conditions and KKT sufficient conditions for the LOVO problem under a convexity assumption, and also presented a smooth reconstruction of the LOVO problem that locally satisfies the KKT necessary conditions. Martinez [10] gave examples of LOVO problems that may be analyzed in the context of piecewise-smooth optimization, and proved that the coordinate search method ensures that the limit points satisfy the strong optimality condition.

Recently, Castelani et al. [11] proposed a new LM algorithm [12,13,14] to solve the LOVO problem. This algorithm effectively avoids the use of second-order information about the function. However, relatively few existing studies provide an in-depth analysis of algorithm complexity. As an upper bound on the number of iterations required to obtain an approximate solution, the global complexity bound is one of the important factors in selecting an appropriate algorithm. Several authors have studied it: Argyros et al. [15] analyzed the complexity of the Newton iteration, and Zhao et al. [16, 17] analyzed the complexity of the LM algorithm. Inspired by these works, this paper proposes a modified LM algorithm and focuses on the analysis of its complexity. The contributions are as follows:

(a)

    The modified LM algorithm is applied to the LOVO problem. The algorithm uses the sum of an LM step and an approximate LM step as the next search direction and solves the subproblems by QR or Cholesky decomposition, which effectively saves a large amount of Jacobian matrix computation and accelerates convergence.

(b)

    The global convergence of the algorithm is given, and the worst-case complexity bound is discussed.

2 The modified Levenberg–Marquardt algorithm for LOVO problems

In order to solve the LOVO problem more conveniently, another form of this problem is given below. Let \(\mathcal {C}=\left\{ {{\mathcal {C}}_{1}},\ldots ,{{\mathcal {C}}_{q}} \right\} \) denote the set of all combinations of p elements chosen from \(\left\{ 1,2,\ldots ,r \right\} \); then for each \(i\in \left\{ 1,\ldots ,q \right\} \) we define the following function:

$$\begin{aligned} {{f}_{i}}\left( x \right) =\sum \limits _{k\in {{\mathcal {C}}_{i}}}{{{R}_{k}}\left( x \right) } \end{aligned}$$
(2.1)

and

$$\begin{aligned} {{f}_{\min }}\left( x \right) =\min \left\{ {{f}_{i}}\left( x \right) ,i=1,\ldots ,q \right\} . \end{aligned}$$
(2.2)

It follows from (1.1) and (2.2) that \({{S}_{p}}(x)={{f}_{\min }}(x)\). Since each \({{f}_{i}}\) is a sum of continuous functions, \({{f}_{i}}\) is continuous, and hence \({{S}_{p}}={{f}_{\min }}\) is also continuous. However, even if all the functions \({{f}_{i}}\) are differentiable, \({{S}_{p}}\) is generally nonsmooth.
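Although (2.2) is written as a minimum over all q combinations, this minimum is attained by the p smallest values \({{R}_{i}}(x)\), so \({{f}_{\min }}(x)\) can be evaluated by a single sort instead of enumerating the combinations. The following Julia sketch illustrates this; the function and variable names are illustrative only and are not taken from the original code.

```julia
# Minimal sketch: evaluate f_min(x) = S_p(x) by sorting, instead of
# enumerating all q = binomial(r, p) subsets C_i.
# R is the vector (R_1(x), ..., R_r(x)) of residual values at the current x.
function fmin_with_subset(R::AbstractVector{<:Real}, p::Integer)
    perm = sortperm(R)        # indices of R_i(x) in ascending order
    Ci   = perm[1:p]          # one combination C_i attaining the minimum
    return sum(R[Ci]), Ci     # f_min(x) and the corresponding index set
end

# Example: r = 5 residual values, p = 3 trusted points
fval, Ci = fmin_with_subset([0.4, 2.0, 0.1, 9.0, 0.3], 3)   # fval = 0.8, Ci = [3, 5, 1]
```

The returned index set corresponds to one combination \({{\mathcal {C}}_{i}}\) attaining the minimum, which is also how an index \(i\in {{I}_{\min }}\left( x \right) \) can be selected in practice (see Definition 1 below).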

Recalling from (1.3) that \({{R}_{k}}(x)=\frac{1}{2}{{\left( {{F}_{k}}(x) \right) }^{2}}\) for \(k\in {{\mathcal {C}}_{i}}, i=1,\ldots ,q\), the function (2.1) can be written as:

$$\begin{aligned} {{f}_{i}}\left( x \right) =\frac{1}{2}\sum \limits _{k\in {{\mathcal {C}}_{i}}}{{{F}_{k}}{{\left( x \right) }^{2}}}=\frac{1}{2}\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( x \right) \right\| _{2}^{2}. \end{aligned}$$
(2.3)

In (2.3), \({{F}_{{{\mathcal {C}}_{i}}}}(x)\) denotes the mapping of x to a vector of size p whose components are the functions \({{F}_{k}}(x)\) defined in (1.3), \(k \in \mathcal {C}_i\), in arbitrary order. Similarly, \({{J}_{{{\mathcal {C}}_{i}}}}({{x}_{k}})\) denotes the Jacobian matrix of this map at \({{x}_{k}}\).

In this section, a modified version of the LM algorithm is used to solve the LOVO problem. It mainly follows the modified LM framework for solving nonlinear equations proposed in [18]: in addition to an LM step, an approximate LM step is also computed at each iteration:

$$\begin{aligned} {{\hat{d}}_{k}}=-{{\left( {{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) +{{\gamma }_{k}}I \right) }^{-1}}{{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{F}_{{{\mathcal {C}}_{i}}}}\left( {{y}_{k}} \right) ,{{y}_{k}}={{x}_{k}}+{{d}_{k}}. \end{aligned}$$
(2.4)

where \({\gamma }_{k}>0\) is the damping parameter.

The sum of the LM step and the approximate LM step is used as the trial step of each iteration, which effectively saves a considerable amount of Jacobian matrix computation. At the same time, the trust-region idea is used to decide whether to accept the trial step and how to update the LM parameter, which accelerates the convergence of the traditional LM method. The resulting algorithm is referred to as MLM-LOVO. Since the LM algorithm is closely related to the trust-region method, some relevant definitions are introduced in the following.

According to [11], we have the following definition.

Definition 1

Given \(x\in {{\mathbb {R}}^{n}}\), we define the minimal function set of \({{f}_{\min }}\) in x by

$$\begin{aligned}{{I}_{\min }}\left( x \right) =\left\{ i\in \left\{ 1,\ldots ,q \right\} :{{f}_{\min }}\left( x \right) ={{f}_{i}}\left( x \right) \right\} .\end{aligned}$$

In order to define the search direction of MLM-LOVO at the current point \({{x}_{k}}\), we choose an index \(i\in {{I}_{\min }}({{x}_{k}})\) and use the direction computed by the improved LM algorithm in [11]. The original LM search direction \({{d}_{k}}\in {{\mathbb {R}}^{n}}\) is defined as the solution of:

$$\begin{aligned} \underset{d\in {{\mathbb {R}}^{n}}}{\mathop {\min }}\,{{m}_{k,i}}\left( d \right) =\frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) +{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) d \right\| }^{2}}+\frac{{{\gamma }_{k}}}{2}{{\left\| d \right\| }^{2}}, \end{aligned}$$
(2.5)

The approximate LM search direction \({{\hat{d}}_{k}}\in {{\mathbb {R}}^{n}}\) is defined as the solution of:

$$\begin{aligned} \underset{d\in {{\mathbb {R}}^{n}}}{\mathop {\min }}\,{{m}_{k,i}}\left( {\hat{d}} \right) =\frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{y}_{k}} \right) +{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \hat{d} \right\| }^{2}}+\frac{{{\gamma }_{k}}}{2}{{\left\| {\hat{d}} \right\| }^{2}},{{y}_{k}}={{x}_{k}}+{{d}_{k}}, \end{aligned}$$
(2.6)

Then the search direction is denoted as \({{s}_{k}}\):

$$\begin{aligned} {{s}_{k}}={{d}_{k}}+{{\hat{d}}_{k}}. \end{aligned}$$

Define the actual reduction of \({{f}_{\min }}\) at the k-th iteration as:

$$\begin{aligned} Are{{d}_{k,i}}= & {} {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \\= & {} \frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right\| }^{2}}-\frac{1}{2}{{\left\| {{F}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \right\| }^{2}}, \end{aligned}$$

and define the predicted reduction as

$$\begin{aligned} Pre{{d}_{k,i}}={{m}_{k,i}}\left( 0 \right) -{{m}_{k,i}}\left( {{d}_{k}} \right) +{{m}_{k,i}}\left( \hat{0} \right) -{{m}_{k,i}}\left( {{{\hat{d}}}_{k}} \right) , \end{aligned}$$

and the ratio of the actual reduction to the predicted reduction as

$$\begin{aligned} {{\rho }_{k,i}}=\frac{Are{{d}_{k,i}}}{Pre{{d}_{k,i}}}. \end{aligned}$$

Similar to the trust-region method, \({{\rho }_{k,i}}\) is used to decide whether to accept the trial step and how to update the LM parameter.

The detailed steps of MLM-LOVO are given in Algorithm 1.
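Since Algorithm 1 is stated separately, the following Julia sketch is given only to illustrate how the iteration described above can be organized; it is not the authors' implementation. The evaluation of \({{f}_{\min }}\), the damping rule \({{\gamma }_{k}}={{\lambda }_{k}}{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}\), the acceptance threshold \(P_{0}\), and the factor-4 updates of \({{\lambda }_{k}}\) are taken from the conventions used in the convergence analysis of Sect. 3; all function and variable names are placeholders.

```julia
using LinearAlgebra

# Illustrative sketch of the MLM-LOVO iteration (placeholder names; parameter
# rules follow the conventions used in the convergence analysis of Sect. 3).
# Fres(x) returns the vector (F_1(x), ..., F_r(x)); Jres(x) its r×n Jacobian.
function mlm_lovo_sketch(Fres, Jres, x0, p; ε = 1e-4, P0 = 1e-4,
                         λ = 1.0, λmin = 1e-8, maxit = 200)
    x = copy(x0)
    fmin(F) = 0.5 * sum(sort(abs2.(F))[1:p])         # f_min = sum of p smallest ½F_i²
    for _ in 1:maxit
        Fx = Fres(x)
        Ci = sortperm(abs2.(Fx))[1:p]                # combination C_{i_k}, i_k ∈ I_min(x_k)
        Fc = Fx[Ci]                                  # F_{C_i}(x_k)
        Jc = Jres(x)[Ci, :]                          # J_{C_i}(x_k)
        g  = Jc' * Fc                                # ∇f_{i_k}(x_k)
        norm(g) ≤ ε && return x
        γ  = λ * norm(g)^2                           # damping parameter γ_k (assumed rule)
        A  = cholesky(Symmetric(Jc' * Jc + γ * I))   # one factorization, reused twice
        d  = -(A \ g)                                # LM step, subproblem (2.5)
        Fy = Fres(x + d)[Ci]                         # F_{C_i}(y_k), y_k = x_k + d_k
        dh = -(A \ (Jc' * Fy))                       # approximate LM step, subproblem (2.6)
        s  = d + dh                                  # trial step s_k = d_k + d̂_k
        m  = (dd, Fv) -> 0.5 * sum(abs2, Fv + Jc * dd) + 0.5 * γ * sum(abs2, dd)
        Pred = (m(zero(d), Fc) - m(d, Fc)) + (m(zero(d), Fy) - m(dh, Fy))
        Ared = fmin(Fx) - fmin(Fres(x + s))
        if Ared / Pred ≥ P0                          # successful iteration (Step 3)
            x = x + s
            λ = max(λ / 4, λmin)                     # assumed update constants
        else                                         # unsuccessful: enlarge λ, keep x
            λ = 4λ
        end
    end
    return x
end
```

Note that the two linear systems for \({{d}_{k}}\) and \({{\hat{d}}_{k}}\) share the same coefficient matrix, so a single factorization is reused; this is the source of the savings in Jacobian and factorization work mentioned above.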

3 Convergence analysis

3.1 Global convergence

The following basic assumption is needed to prove the global convergence of the algorithm.

Assumption 1

The level set

$$\begin{aligned}{{C}_{\left( {{x}_{0}} \right) }}=\left\{ x\in {{\mathbb {R}}^{n}}|{{f}_{\min }}\left( x \right) \le {{f}_{\min }}\left( {{x}_{0}} \right) \right\} \end{aligned}$$

is a bounded subset of \({{\mathbb {R}}^{n}}\), and each function \({{f}_{i}}, i=1,\ldots ,q\), is continuously differentiable with Lipschitz continuous gradient (with Lipschitz constant \({{L}_{i}}>0\)) on an open set containing \({{C}_{\left( {{x}_{0}} \right) }}\).

From Proposition 1 of [11], the following can be deduced.

Given \({{x}_{k}}\in {{\mathbb {R}}^{n}},\) \({\gamma }_{k}>0,\) and \({{i}_{k}}\in \left\{ 1,\ldots ,q \right\} ,\) the Cauchy step is obtained from

$$\begin{aligned}{{t}_{k}}=\underset{t\in \mathbb {R}}{\mathop {\arg \min }}\,\left\{ {{m}_{k,{{i}_{k}}}}\left( -t\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right) \right\} \end{aligned}$$

and expressed as \({{d}^{C}}\left( {{x}_{k}} \right) =-{{t}_{k}}\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \in {{\mathbb {R}}^{n}}.\)

Similarly, defining \({{\hat{d}}^{C}}\left( {{y}_{k}} \right) =-{{t}_{k}}\nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \in {{\mathbb {R}}^{n}},\) these steps satisfy

$$\begin{aligned} \begin{aligned} \begin{aligned}&{{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}^{C}}\left( {{x}_{k}} \right) \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{\hat{d}}^{C}}\left( {{y}_{k}} \right) \right) \\&\ge \frac{\left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) }, \\ \end{aligned} \end{aligned} \end{aligned}$$
(3.1)

Since the Cauchy step minimizes the model along the gradient direction, while \({{d}_{k}}\) and \({{\hat{d}}_{k}}\) minimize the models (2.5) and (2.6) over all of \({{\mathbb {R}}^{n}}\), it can be derived that

$$\begin{aligned} \begin{aligned} \begin{aligned}&{{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{\hat{d}}_{k}} \right) \\&\ge \frac{\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) }. \\ \end{aligned} \end{aligned} \end{aligned}$$
(3.2)

where \(\theta >0\) is independent of k.

Lemma 1

Let \({{x}_{k}}\in {{\mathbb {R}}^{n}}\) and \({{i}_{k}}\in {{I}_{\min }}\left( {{x}_{k}} \right) \) be fixed in Step 1 of Algorithm 1. Then the unsuccessful iterations of Step 3 of Algorithm 1 are executed only a finite number of times.

Proof

For each \(\lambda \) fixed in Step 1 of Algorithm 1, noting that \({{m}_{k,{{i}_{k}}}}\left( 0 \right) ={{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) ={{f}_{\min }}\left( {{x}_{k}} \right) \) because \({{i}_{k}}\in {{I}_{\min }}\left( {{x}_{k}} \right) \), we have

$$\begin{aligned}\begin{aligned}&1-\frac{{{\rho }_{k,{{i}_{k}}}}}{2}=1-\frac{{{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) }{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \\&\quad =\frac{2{{m}_{k,{{i}_{k}}}}\left( 0 \right) -2{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -2{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k}} \right) +{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) }{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \\&\quad =\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( 0 \right) -2{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -2{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) }{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) }. \\ \end{aligned}\end{aligned}$$

By Taylor series expansion and Lipschitz continuity of \(\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \)

$$\begin{aligned} {{f}_{{{i}_{k}}}}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \le {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) +\frac{{{L}_{{{i}_{k}}}}}{2}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}. \end{aligned}$$
(3.3)

By the definition of \({{f}_{\min }}\) and (3.3), it follows that

$$\begin{aligned} \begin{aligned} \begin{aligned}&{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \le {{f}_{{{i}_{k}}}}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) \\&\text { }\le {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) +\frac{{{L}_{{{i}_{k}}}}}{2}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}. \\ \end{aligned} \end{aligned} \end{aligned}$$
(3.4)

and it can be obtained from (2.3) and Step 3 of Algorithm 1 that

$$\begin{aligned} {{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{d}_{k}} \right\| }^{2}}+{{\gamma }_{k}}{{\left\| {{d}_{k}} \right\| }^{2}}=2{{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{d}_{k}}. \end{aligned}$$
(3.5)

Similarly,

$$\begin{aligned} {{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{y}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{{\hat{d}}}_{k}} \right\| }^{2}}+{{\gamma }_{k}}{{\left\| {{{\hat{d}}}_{k}} \right\| }^{2}}=2{{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) +\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{\hat{d}}_{k}}. \end{aligned}$$
(3.6)

According to (3.5) and (3.6), we have

$$\begin{aligned} 1-\frac{{{\rho }_{k,{{i}_{k}}}}}{2}&=\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{d}_{k}} \right\| }^{2}}-{{\gamma }_{k}}{{\left\| {{d}_{k}} \right\| }^{2}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\quad +\frac{2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{\left\| {{F}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{y}_{k}} \right) +{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) {{{\hat{d}}}_{k}} \right\| }^{2}}-{{\gamma }_{k}}{{\left\| {{{\hat{d}}}_{k}} \right\| }^{2}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&=\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( 0 \right) -2{{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) -\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{d}_{k}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\quad +\frac{2{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -2{{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) -\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&=\frac{{{f}_{\min }}\left( {{x}_{k}}+{{d}_{k}}+{{{\hat{d}}}_{k}} \right) -{{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) -\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{d}_{k}}-\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\le \frac{\nabla {{f}_{{{i}_{k}}}}{{\left( {{x}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}-\nabla {{f}_{{{i}_{k}}}}{{\left( {{y}_{k}} \right) }^{T}}{{{\hat{d}}}_{k}}+\frac{{{L}_{{{i}_{k}}}}}{2}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}}{2\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \nonumber \\&\le \frac{{{L}_{{{i}_{k}}}}{{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}}{4\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) }. \end{aligned}$$
(3.7)

By the closed form of the LM step \({{d}_{k}}\) and the definition of \({{\gamma }_{k}}\), it follows that

$$\begin{aligned} \left\| {{d}_{k}} \right\| \le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }{{{\sigma }_{k}}+{{\gamma }_{k}}}\le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }{{{\gamma }_{k}}}=\frac{1}{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| \lambda }, \end{aligned}$$
(3.8)

and similarly

$$\begin{aligned} \left\| {{{\hat{d}}}_{k}} \right\| \le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}\lambda }, \end{aligned}$$
(3.9)
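For completeness, the first inequality in (3.8) follows from the normal-equations form of the LM step, and the final equality corresponds to the damping rule \({{\gamma }_{k}}=\lambda {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}\) read off from (3.8) itself; the bound (3.9) is obtained analogously for \({{\hat{d}}_{k}}\):

$$\begin{aligned} \left\| {{d}_{k}} \right\| =\left\| {{\left( {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) +{{\gamma }_{k}}I \right) }^{-1}}\nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| \le \frac{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }{{{\sigma }_{k}}+{{\gamma }_{k}}},\qquad {{\gamma }_{k}}=\lambda {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}, \end{aligned}$$

where \({{\sigma }_{k}}+{{\gamma }_{k}}\) is the smallest eigenvalue of the coefficient matrix (see the definition of \({{\sigma }_{k}}\) below).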

Combining (3.8) and (3.9), we obtain

$$\begin{aligned} \begin{aligned} {{\left\| {{d}_{k}}+{{{\hat{d}}}_{k}} \right\| }^{2}}&={{\left( \sqrt{{{\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) }^{T}}\left( {{d}_{k}}+{{{\hat{d}}}_{k}} \right) } \right) }^{2}} \\&\le {{\left( \sqrt{\frac{{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}} \right) }^{2}}\\&=\frac{{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}, \\ \end{aligned} \end{aligned}$$
(3.10)

where \({{\sigma }_{k}}={{\sigma }_{\min }}\left( {{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right) \) and \({{\sigma }_{\min }}\left( A \right) \) denotes the smallest eigenvalue of A.

Substituting (3.10) into (3.7) and using the definition of \({{\gamma }_{k}}\), it can be obtained that

$$\begin{aligned} \begin{aligned}&\frac{\frac{{{L}_{{{i}_{k}}}}{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}}{4\left( {{m}_{k,{{i}_{k}}}}\left( 0 \right) -{{m}_{k,{{i}_{k}}}}\left( {{d}_{k}} \right) +{{m}_{k,{{i}_{k}}}}\left( \hat{0} \right) -{{m}_{k,{{i}_{k}}}}\left( {{{\hat{d}}}_{k}} \right) \right) } \\&\text { }\le \frac{\frac{{{L}_{{{i}_{k}}}}{{\left( \left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| +\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| \right) }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}}}{\frac{4\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) }}\le \frac{\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) {{L}_{{{i}_{k}}}}}{2\theta {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}{{\lambda }^{2}}} \\&\text { }\le \left( \frac{{{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}}+\frac{1}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}} \right) \frac{{{L}_{{{i}_{k}}}}}{2\theta \lambda }, \\ \end{aligned} \end{aligned}$$
(3.11)

where it is assumed that \(\lambda \ge 1\).

According to (3.11), it can be obtained that

$$\begin{aligned}\underset{\lambda \rightarrow \infty }{\mathop {\lim }}\,1-\frac{{{\rho }_{k,{{i}_{k}}}}}{2}\le 0,\end{aligned}$$

which is equivalent to

$$\begin{aligned}\underset{\lambda \rightarrow \infty }{\mathop {\lim }}\,{{\rho }_{k,{{i}_{k}}}}\ge 2. \end{aligned}$$

Hence, for all sufficiently large \(\lambda \), the acceptance test \({{\rho }_{k,{{i}_{k}}}}\ge {{P}_{0}}\) in Step 3 of Algorithm 1 is satisfied, so only a finite number of unsuccessful iterations can occur.

\(\square \)

Theorem 1

Let \({{\left\{ {{x}_{k}} \right\} }_{k\in \mathbb {N}}}\) be the sequence generated by Algorithm 1 with \(\varepsilon =0\), and let \({{x}^{*}}\) be a limit point of this sequence. Suppose Assumption 1 holds, and let \({\kappa }'=\left\{ k\,|\,{{i}_{k}}=i \right\} \subset \mathbb {N}\) be an infinite index subset associated with some \(i\in \left\{ 1,\ldots ,q \right\} \) such that \({{\lim }_{k\in {\kappa }'}}{{x}_{k}}={{x}^{*}}\). Then

$$\begin{aligned}\underset{k\in {\kappa }'}{\mathop {\lim }}\,\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| =0\end{aligned}$$

and \(i\in {{I}_{\min }}\left( {{x}^{*}} \right) \).

Proof

The proof is by contradiction. Since \(\left\{ 1,\ldots ,q \right\} \) is a finite set, there is an index i chosen infinitely many times by Algorithm 1. Suppose that for this index i there exist \(\varepsilon >0\) and an infinite subset \({\kappa }_{1} \subset {\kappa }'\) such that \(\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| \ge \varepsilon \) for all \(k\in {{\kappa }_{1}}\).

By the continuity of the Jacobian matrix \({{J}_{{{\mathcal {C}}_{i}}}}\) and the boundedness of the level set \({{C}_{\left( {{x}_{0}} \right) }}\) (Assumption 1), it follows that for all \(k\in {{\kappa }_{1}}\)

$$\begin{aligned} \left\| {{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right\| \le \underset{k\in {{\kappa }_{1}}}{\mathop {\sup }}\,\left\{ \left\| {{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \right\| \right\} ={{J}_{i}}<\infty . \end{aligned}$$
(3.12)

According to (3.11), for \(\lambda \ge 1\), we have

$$\begin{aligned} \begin{aligned}&1-\frac{{{\rho }_{k,i}}}{2}\le \left( \frac{{{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{4}}}+\frac{1}{{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}} \right) \frac{{{L}_{i}}}{2\theta \lambda }\le \left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{2\theta \lambda } \\&\quad \Rightarrow {{\rho }_{k,i}}\ge 2-\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta \lambda }, \\ \end{aligned} \end{aligned}$$
(3.13)

According to (3.13), we can conclude that when \(\lambda \ge b=\max \left\{ 1,\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta } \right\} \), the successful-iteration test in Step 3 of Algorithm 1 is satisfied for all \(k\in {{\kappa }_{1}}\), since

$$\begin{aligned} {{\rho }_{k,i}}\ge 2-\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta \lambda }\ge 2-\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta b}\ge 1\ge {{P}_{0}}, \end{aligned}$$
(3.14)

By Step 4 of Algorithm 1, \({{\lambda }_{k}}\) is bounded above by \({{\lambda }_{k}}\le M=4b\) for all \(k\in {{\kappa }_{1}}\). It follows that

$$\begin{aligned} \begin{aligned}&\frac{{{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) }{{{m}_{k,i}}\left( 0 \right) -{{m}_{k,i}}\left( {{d}_{k}} \right) +{{m}_{k,i}}\left( \hat{0} \right) -{{m}_{k,i}}\left( {{{\hat{d}}}_{k}} \right) }\ge {{P}_{0}} \\&\quad \Leftrightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge {{P}_{0}}\left( {{m}_{k,i}}\left( 0 \right) -{{m}_{k,i}}\left( {{d}_{k}} \right) +{{m}_{k,i}}\left( \hat{0} \right) -{{m}_{k,i}}\left( {{{\hat{d}}}_{k}} \right) \right) \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge {{P}_{0}}\left( \frac{\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\gamma }_{k}} \right) } \right) \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta \left( {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{y}_{k}} \right) \right\| }^{2}} \right) }{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\lambda }_{k}}{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta {{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\lambda }_{k}}{{\left\| \nabla {{f}_{{{i}_{k}}}}\left( {{x}_{k}} \right) \right\| }^{2}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta }{2\left( \frac{{{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}}{{{\varepsilon }^{2}}}+{{\lambda }_{k}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2\left( {{\left\| {{J}_{{{\mathcal {C}}_{{{i}_{k}}}}}}\left( {{x}_{k}} \right) \right\| }^{2}}+{{\varepsilon }^{2}}{{\lambda }_{k}} \right) } \\&\quad \Rightarrow {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge \frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2\left( {{J}_{i}}^{2}+{{\varepsilon }^{2}}M \right) }\\&\quad \Leftrightarrow {{f}_{\min }}\left( {{x}_{k+1}} \right) -{{f}_{\min }}\left( {{x}_{k}} \right) \le -\frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2C}, \end{aligned} \end{aligned}$$

where \(C={{J}_{i}}^{2}+{{\varepsilon }^{2}}M\).

Since \({{\kappa }_{1}}\) is infinite, summing these inequalities drives \({{f}_{\min }}\left( {{x}_{k}} \right) \) to \(-\infty \), which contradicts the fact that \({{f}_{\min }}\) is bounded below. Therefore no such \({{\kappa }_{1}}\) exists, and hence

$$\begin{aligned}\underset{k\in {\kappa }'}{\mathop {\lim }}\,\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| =0.\end{aligned}$$

See [11] for the proof of the second part of the theorem. \(\square \)

3.2 Worst-case complexity bound

In the following, denote by \(\mathcal {S}\) and \(\mathcal {U}\) the sets of successful and unsuccessful iterations, respectively: \(\mathcal {S} = \{ k : {{\rho }_{k,{{i}_{k}}}} \ge {{P}_{0}} \}\) and \(\mathcal {U} = \{ k : {{\rho }_{k,{{i}_{k}}}} < {{P}_{0}} \}\).

The worst-case complexity bound for Algorithm 1 is established below. That is, given a tolerance \(\varepsilon \in (0,1)\), the goal is to derive an upper bound, in the worst case, on the number of iterations required to reach

$$\begin{aligned}\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| <\varepsilon .\end{aligned}$$

Proposition 1

Suppose Assumption 1 holds, and let \(k_{\varepsilon }\) be the first iteration index such that the above inequality holds. Let \(\mathcal {S}_{\varepsilon }\) denote the set of successful iteration indices before \(k_{\varepsilon }\). Then

$$\begin{aligned}\left| {{\mathcal {S}}_{\varepsilon }} \right| \le K,\qquad K=\frac{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }{{{P}_{0}}\theta {{\varepsilon }^{2}}}{{f}_{\min }}\left( {{x}_{1}} \right) .\end{aligned}$$

Proof

By the proof of Theorem 1 and the fact that \(\left\| \nabla f_{{i_k}}(x_k) \right\| \ge \varepsilon \) for \(k \in \mathcal {S}_{\varepsilon }\), we have

$$\begin{aligned}{{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \ge {{P}_{0}}\frac{\theta {{\varepsilon }^{2}}}{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }. \end{aligned}$$

Therefore, summing over all indices in \({{\mathcal {S}}_{\varepsilon }}\) and using the fact that \({{f}_{\min }}\ge 0\), it can be obtained that

$$\begin{aligned}{{f}_{\min }}\left( {{x}_{1}} \right) \ge \sum \limits _{k\in {{\mathcal {S}}_{\varepsilon }}}{\left[ {{f}_{\min }}\left( {{x}_{k}} \right) -{{f}_{\min }}\left( {{x}_{k+1}} \right) \right] }\ge \left| {{\mathcal {S}}_{\varepsilon }} \right| \frac{{{P}_{0}}\theta {{\varepsilon }^{2}}}{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }, \end{aligned}$$

which gives the stated bound on \(\left| {{\mathcal {S}}_{\varepsilon }} \right| \).

\(\square \)

Lemma 2

Suppose Assumption 1 holds, and let \({{\mathcal {U}}_{\varepsilon }}\) denote the set of unsuccessful iterations whose index is less than or equal to \(k_{\varepsilon }\). Then

$$\begin{aligned}\left| {{\mathcal {U}}_{\varepsilon }} \right| \le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) \left| {{\mathcal {S}}_{\varepsilon }} \right| .\end{aligned}$$

Proof

By the definition of \(k_{\varepsilon }\), we have \(k_{\varepsilon }\in {{\mathcal {S}}_{\varepsilon }}\). The goal is to bound the number of unsuccessful iterations between two successful iterations. Let \(\left\{ {{k}_{1}},\ldots ,{{k}_{t}}={{k}_{\varepsilon }} \right\} \) be an increasing ordering of \({{\mathcal {S}}_{\varepsilon }}\). For \(l\in \left\{ 1,\ldots ,t-1 \right\} \), the update formula for \({{\lambda }_{k}}\) at successful iterations gives

$$\begin{aligned}{{\lambda }_{{{k}_{l}}+1}}\ge \max \left\{ \frac{{{\lambda }_{{{k}_{l}}}}}{4},{{\lambda }_{\min }} \right\} \ge {{\lambda }_{\min }}.\end{aligned}$$

Similarly, \({{\lambda }_{k}}<b=\max \left\{ 1,\left( \frac{{{J}_{i}}^{2}}{{{\varepsilon }^{4}}}+\frac{1}{{{\varepsilon }^{2}}} \right) \frac{{{L}_{i}}}{\theta } \right\} \) for any unsuccessful iteration \(k\in \left\{ {{k}_{l}}+1,\ldots ,{{k}_{l+1}}-1 \right\} \). Using the update rule for \({{\lambda }_{k}}\) at unsuccessful iterations, it can be obtained that

$$\begin{aligned}\forall k\in \left\{ {{k}_{l}}+1,\ldots ,{{k}_{l+1}}-1 \right\} :\quad {{\lambda }_{k}}={{4}^{k-{{k}_{l}}-1}}{{\lambda }_{{{k}_{l}}+1}}\ge {{4}^{k-{{k}_{l}}-1}}{{\lambda }_{\min }}.\end{aligned}$$

Thus, the number of unsuccessful iterations between \({{k}_{l}}\) and \({{k}_{l+1}}\), which equals \({{k}_{l+1}}-{{k}_{l}}-1\), satisfies

$$\begin{aligned}{{k}_{l+1}}-{{k}_{l}}-1\le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) .\end{aligned}$$

Summing over \(l=1,\ldots ,t-1\), we have

$$\begin{aligned}\sum \limits _{l=1}^{t-1}{\left( {{k}_{l+1}}-{{k}_{l}}-1 \right) }\le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) \left[ \left| {{\mathcal {S}}_{\varepsilon }} \right| -1 \right] .\end{aligned}$$

It remains to count the possible unsuccessful iterations between the iteration at index 1 and the first successful iteration \({{k}_{1}}\). Since \({{\lambda }_{1}}\ge {{\lambda }_{\min }}\), a similar argument gives

$$\begin{aligned}{{k}_{1}}-1\le {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) .\end{aligned}$$

\(\square \)

Combining Proposition 1 and Lemma 2 yields the following complexity estimate.

Theorem 2

Under Assumption 1, the total number of iterations performed by Algorithm 1 satisfies

$$\begin{aligned}\left| {{\mathcal {S}}_{\varepsilon }} \right| +\left| {{\mathcal {U}}_{\varepsilon }} \right| \le \left( {{\log }_{\lambda }}\left( \frac{b}{{{\lambda }_{\min }}} \right) +1 \right) \frac{2\left( J_{i}^{2}+M{{\varepsilon }^{2}} \right) }{{{P}_{0}}\theta {{\varepsilon }^{2}}}{{f}_{\min }}\left( {{x}_{1}} \right) \end{aligned}$$

before an iterate satisfying \(\left\| \nabla {{f}_{i}}\left( {{x}_{k}} \right) \right\| <\varepsilon \) is produced.

4 Numerical example

We now report numerical experiments with the proposed algorithm and compare the results with those of the classical GN algorithm [19] and the LM algorithm [11]. The algorithm is implemented in Julia version 1.0.4, and the test environment is an Intel(R) Core(TM) i5-11300H CPU @ 3.10 GHz with 16.0 GB RAM.

In the experiments, MLM-LOVO uses the simple decrease test \({{f}_{\min }}\left( {{x}_{k}}+{{s}_{k}} \right) <{{f}_{\min }}\left( {{x}_{k}} \right) \) in Step 3. The direction \({{s}_{k}}\) is computed by solving the two linear systems in Step 3 of the algorithm with the Cholesky factorization of the matrix \({{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) +{{\gamma }_{k}}I\). A QR decomposition was also tested and produced equivalent results. In the experiments, MLM-LOVO was used as a subroutine of RAFF to solve the problem; details on RAFF can be found in [11].
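As an illustration of the QR alternative mentioned above, the damped subproblem (2.5) can equivalently be solved as an augmented least-squares problem. The sketch below is illustrative only, with an assumed formulation and placeholder names; it does not reproduce the code used in the experiments.

```julia
using LinearAlgebra

# Illustrative QR route for the damped subproblem (2.5):
#   min_d ‖F + J d‖² + γ‖d‖²  ⇔  min_d ‖ [J; √γ I] d + [F; 0] ‖²
function lm_step_qr(J::AbstractMatrix, F::AbstractVector, γ::Real)
    n = size(J, 2)
    A = [J; sqrt(γ) * Matrix(I, n, n)]   # augmented (p+n)×n matrix
    b = [-F; zeros(n)]                   # augmented right-hand side
    return qr(A) \ b                     # least-squares solution = step d
end
```

Both routes compute the same step; the augmented QR form avoids explicitly forming \({{J}_{{{\mathcal {C}}_{i}}}}{{\left( {{x}_{k}} \right) }^{T}}{{J}_{{{\mathcal {C}}_{i}}}}\left( {{x}_{k}} \right) \), which can be preferable when this matrix is ill-conditioned.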

A solution \(\overline{x}={{x}_{k}}\) is said to be successful if

$$\begin{aligned}\left\| \nabla {{f}_{{{i}_{k}}}}\left( \overline{x} \right) \right\| \le \varepsilon \end{aligned}$$

is satisfied for some \({{i}_{k}}\in {{I}_{\min }}\left( \overline{x} \right) \), where \({{f}_{{{i}_{k}}}}\) is given by (2.1). The algorithm also stops if the gradient cannot be evaluated due to numerical errors.

Fig. 1 Test problems for \(r=10, p=9\) and \(r=100, p=90\), simulated according to the cubic model

The cubic model mentioned above is given by

$$\begin{aligned}\phi \left( x,t \right) ={{x}_{1}}{{t}^{3}}+{{x}_{2}}{{t}^{2}}+{{x}_{3}}t+{{x}_{4}},\end{aligned}$$

where \(x\in {{\mathbb {R}}^{4}}\) represents the model parameters and \(t\in \mathbb {R}\) represents the model variable. The solid line represents the model adjusted by the MLM-LOVO algorithm, and the dashed line is the model of the "correct solution". The tests use \(r=10, p=9\) and \(r=100, p=90\), the accuracy parameter \(\varepsilon ={{10}^{-4}}\), and the exact solution \({{x}^{*}}=(2,0,-4,-10)\). Figure 1 shows that the proposed MLM-LOVO algorithm correctly identifies and ignores the outliers.
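To give a concrete flavor of these test problems, a data set of this type can be generated as follows. This is an assumed setup for illustration only: the sampling interval, noise level, and outlier perturbation are not taken from the paper.

```julia
using Random

# Cubic test model φ(x, t) = x₁t³ + x₂t² + x₃t + x₄ with planted outliers.
φ(x, t) = x[1]*t^3 + x[2]*t^2 + x[3]*t + x[4]

function make_cubic_dataset(r, p; xstar = [2.0, 0.0, -4.0, -10.0], seed = 1)
    rng = MersenneTwister(seed)
    t = collect(range(-3.0, 3.0; length = r))
    y = [φ(xstar, ti) + 0.1 * randn(rng) for ti in t]   # noisy "correct" data
    outliers = randperm(rng, r)[1:r - p]                # the r - p planted outliers
    y[outliers] .+= 10.0                                # perturb them strongly
    return t, y, outliers
end

t, y, outliers = make_cubic_dataset(10, 9)   # e.g. r = 10, p = 9 as in Fig. 1
```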

Table 1 MLM-LOVO Numerical results

The fitting problems solved by MLM-LOVO are characterized by small dimension and large data sets. For the cubic model mentioned above, the MLM-LOVO algorithm proposed in this paper, the LM-LOVO algorithm of reference [11], and the GN-LOVO algorithm are tested on data sets containing 10, 100, 1000 and 10,000 data points. The randomly generated problems include: 10 points and 1 outlier, 10 points and 2 outliers, 100 points and 2 outliers, 100 points and 10 outliers, 1000 points and 10 outliers, 1000 points and 100 outliers, and 10,000 points and 100 outliers. Numerical results of the MLM-LOVO algorithm on part of these data sets are shown in Table 1, where r and p denote the number of data points and the number of trusted points, respectively, F denotes the number of detected outliers, T denotes the number of correctly detected outliers, and IT denotes the number of iterations.

Table 2 compares the running time and number of iterations of the three algorithms. As can be seen from Table 2, for the same initial points and data sets of the same size, the proposed algorithm has certain advantages in both time and number of iterations. In addition, the number of outliers appears to have little effect on the running time and the iteration count.

Table 2 Numerical results of solving the cubic model

Table 3 compares the numbers of function evaluations and Jacobian evaluations of the three algorithms. As can be seen from Table 3, the proposed algorithm requires fewer Jacobian evaluations than the GN-LOVO and LM-LOVO algorithms, that is, it reduces the Jacobian matrix computation to some extent. The numerical results show the feasibility and effectiveness of the proposed algorithm for the LOVO problem.

Table 3 Numerical results of solving the cubic model

5 Conclusion

In this paper, we employ a modified LM algorithm to tackle the LOVO problem. The algorithm computes an LM step and an approximate LM step within each iteration, consequently reducing the Jacobian computation overhead. We provide a numerical example to validate the algorithm and present the numerical results. Comparative analysis against the GN algorithm and the LM algorithm shows clear advantages in terms of both time efficiency and iteration count for the proposed approach.