1 Introduction

In this paper, we propose a new long-step interior point algorithm (IPA) for linear optimization. We consider the primal-dual linear programming (LP) problem pair in the following standard form:

$$\begin{aligned} \left. \begin{aligned} \min \&{\mathbf {c}}^T {\mathbf {x}} \\ A {\mathbf {x}}&={\mathbf {b}} \\ {\mathbf {x}}&\ge {\mathbf {0}} \end{aligned} \right\} \left. \begin{aligned} \max \&{\mathbf {b}}^T {\mathbf {y}} \\ A^T {\mathbf {y}}+{\mathbf {s}}&={\mathbf {c}} \\ {\mathbf {s}}&\ge {\mathbf {0}} \end{aligned} \right\} \end{aligned}$$
(1)

where \(A \in {\mathbb {R}}^{m \times n}\) with full row rank, \({\mathbf {b}} \in {\mathbb {R}}^{m}\) and \({\mathbf {c}} \in {\mathbb {R}}^{n}\) are given.

The simplex method for solving linear optimization problems was developed by Dantzig (1951). Although there were different attempts to propose new methods, for many years it remained the only numerically efficient algorithm for solving LP problems. Khachian (1979) proved that the ellipsoid method can solve the linear optimization problem in polynomial time. This result received much attention because of its theoretical importance, but it turned out that in practice, its performance is significantly worse than that of the simplex algorithm. Karmarkar (1984) proposed a new polynomial algorithm for LP, and this result started a new era in operations research. This algorithm generates a sequence of points in the interior of the feasible polyhedron (i.e., it is an IPA) and therefore follows an entirely different approach from the simplex method, which generates a sequence of vertices of the feasible set. Since then, this approach has received much attention, and numerous new IPAs have been introduced not just for linear optimization but also for many other problem classes, such as linear complementarity problems (LCPs), convex optimization, symmetric optimization, and second-order cone optimization.

Based on the step length, IPAs can be divided into two main groups, short-step and long-step methods. Long-step methods perform better in practice, but in general short-step variants have better theoretical complexity \(O(\sqrt{n}L)\). Here, n denotes the dimension of the problem and \(L=\log \frac{{\mathbf {x}}_0^T {\mathbf {s}}_0}{\varepsilon }\), where \(({\mathbf {x}}_0, {\mathbf {y}}_0, {\mathbf {s}}_0)\) is the given starting point and \(\varepsilon \) is the required precision. This discrepancy was pointed out by Renegar (2001) as the "irony of IPAs". In the last twenty years, different attempts have been made to overcome this issue (e.g., Bai et al. 2008; Peng et al. 2002; Potra 2004).

The wide neighborhood \({\mathcal {N}}_{\infty }^-\) (to be defined in Sect. 3) was proposed by Kojima et al. (1989). Their algorithm turned out to be efficient in practice, and its complexity is O(nL). Ai and Zhang (2005) introduced an IPA that works in a new wide neighborhood of the central path. They proved that the method has the same theoretical complexity as the short-step variants.

Using the wide neighborhood applied by Ai and Zhang, several authors proposed new long-step methods with the best known theoretical complexity. There are related results for linear programming (Darvay and Takács 2018; Liu et al. 2011; Yang et al. 2016), for horizontal linear complementarity problems (Potra 2014), and also for semidefinite optimization (Feng and Fang 2014; Li and Terlaky 2010; Pirhaji et al. 2017).

To be able to determine new search directions in IPAs, Darvay (2003) introduced the method of algebraic equivalent transformation. His main idea was to apply a strictly increasing, continuously differentiable function \(\varphi \) to the centering equation of the central path system and then apply Newton’s method to determine the new search directions. In his paper, Darvay applied the function \(\varphi (t)=\sqrt{t}\) and introduced a new short-step algorithm for linear optimization. Most algorithms in the literature can be considered special cases of this technique with \(\varphi (t)=t\), i.e., the identity map. The function \(\varphi (t)=t-\sqrt{t}\) was introduced by Darvay et al. (2016), also in the context of linear optimization, and has recently been investigated in several papers by Darvay and his coauthors: they presented a corrector-predictor IPA for linear optimization (Darvay et al. 2020a), proposed another corrector-predictor IPA for sufficient LCPs (Darvay et al. 2020b), and introduced a short-step IPA for sufficient LCPs (Darvay et al. 2021). Furthermore, the function \(\varphi (t)=\frac{\sqrt{t}}{2(1+\sqrt{t})}\) was proposed by Kheirfam and Haghighi (2016) to solve \({\mathcal {P}}^* (\kappa )\) linear complementarity problems. In this paper, we investigate a new long-step IPA for linear optimization, based on the function \(\varphi (t)=t-\sqrt{t}\).

Most of the algorithms based on the algebraic equivalent transformation technique are short-step variants, except for the method of Darvay and Takács (2018), which is based on the function \(\varphi (t)=\sqrt{t}\) and applies an Ai-Zhang type wide neighborhood.

Throughout this paper, we use the following notations. Scalars and indices are denoted by lowercase Latin letters. Vectors are denoted by bold lowercase Latin letters, and we use uppercase Latin letters to denote matrices. Sets are denoted by capital calligraphic letters. Let \(\mathbf {x,s}\in {\mathbb {R}}^n\) be two vectors; then \(\mathbf {xs}\) denotes their componentwise (Hadamard) product. \({\mathbf {x}}^+\) and \({\mathbf {x}}^-\) represent the positive and negative parts of the vector \({\mathbf {x}}\), i.e.,

$$\begin{aligned} {\mathbf {x}}^+=\max \{{\mathbf {x}},{\mathbf {0}} \} \in {\mathbb {R}}^n \quad \text { and } \quad {\mathbf {x}}^-=\min \{{\mathbf {x}},{\mathbf {0}} \} \in {\mathbb {R}}^n, \end{aligned}$$

where the maximum and minimum are taken componentwise.

If \(\alpha \in {\mathbb {R}}\), then \({\mathbf {x}}^{\alpha }=[x_1^{\alpha }, x_2^{\alpha }, \dots , x_n^{\alpha }]^T\). If \(s_i \ne 0\) holds for all \(i \in \{1, \dots , n\}\), then the componentwise quotient of \({\mathbf {x}}\) and \({\mathbf {s}}\) is the vector \({\mathbf {x}}/{\mathbf {s}}=[x_1/s_1, x_2/s_2, \dots , x_n/s_n]^T\). The vector of ones is denoted by \({\mathbf {e}}\). \(\Vert {\mathbf {x}} \Vert \) is the Euclidean norm of \({\mathbf {x}}\), \(\Vert {\mathbf {x}} \Vert _1=\sum _{i=1}^n |x_i|\) denotes the \(L^1\) (Manhattan) norm of \({\mathbf {x}}\), and \(\Vert {\mathbf {x}} \Vert _{\infty }=\max _{1 \le i \le n} |x_i|\) is the infinity norm of \({\mathbf {x}}\). \(\text {diag} ({\mathbf {x}})\) is the diagonal matrix with the elements of the vector \({\mathbf {x}}\) on its diagonal. Finally, \({\mathcal {I}}\) denotes the index set \({\mathcal {I}}=\{1, \dots , n\}\).
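For readers who wish to implement the method, the following minimal NumPy sketch illustrates these conventions; the numerical values are arbitrary and serve only as an example.

```python
import numpy as np

# Illustrative values only; in the algorithm, x and s are the primal and
# dual iterates.
x = np.array([2.0, 0.5, 1.0])
s = np.array([1.0, 4.0, 3.0])

xs = x * s                      # Hadamard (componentwise) product xs
quotient = x / s                # componentwise quotient x/s (s has no zeros)
p = np.array([0.3, -0.2, 0.0])
p_plus = np.maximum(p, 0.0)     # positive part p^+
p_minus = np.minimum(p, 0.0)    # negative part p^-; p = p_plus + p_minus
mu = x @ s / x.size             # normalized duality gap mu = x^T s / n
print(np.linalg.norm(p_plus))   # Euclidean norm ||p^+||
```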

The paper is organized as follows. In Sect. 2 we give an overview of Darvay’s algebraic equivalent transformation technique. In Sect. 3 we define a new wide neighborhood and introduce the new large-update IPA. In Sect. 4 we prove the correctness of the method, and in Sect. 5 we show that its iteration complexity is \(O(\sqrt{n}L)\). In Sect. 6 we present our preliminary numerical results. Section 7 summarizes our conclusions.

2 The algebraic equivalent transformation technique

The optimality criteria of the primal-dual pair (1) can be formulated as:

$$\begin{aligned} \left. \begin{aligned} A \mathbf {x}&= \mathbf {b}, \\ A^T \mathbf {y} + \mathbf {s}&= \mathbf {c}, \\ \mathbf {xs}&= \mathbf {0}. \end{aligned} \begin{aligned} \mathbf {x}&\ge \mathbf {0} \\ \mathbf {s}&\ge \mathbf {0} \\ \end{aligned} \right\} \end{aligned}$$

In the case of IPAs, instead of the third equation of the optimality criteria (the complementarity condition), we consider a perturbed version

$$\begin{aligned} \left. \begin{aligned} A \mathbf {x}&= \mathbf {b}, \\ A^T \mathbf {y} + \mathbf {s}&= \mathbf {c}, \\ \mathbf {xs}&= \nu \mathbf {e}, \end{aligned} \begin{aligned} \mathbf {x}&\ge \mathbf {0} \\ \mathbf {s}&\ge \mathbf {0} \\ \end{aligned} \right\} \end{aligned}$$
(2)

where \(\nu \) is a given positive parameter.

Let \({\mathcal {F}}=\{({\mathbf {x}},{\mathbf {y}},{\mathbf {s}}):\ A {\mathbf {x}} = {\mathbf {b}},\ A^T {\mathbf {y}} + {\mathbf {s}} = {\mathbf {c}},\ {\mathbf {x}} \ge {\mathbf {0}},\ {\mathbf {s}} \ge {\mathbf {0}} \} \) denote the set of primal-dual feasible solutions and \({\mathcal {F}}_+=\{({\mathbf {x}},{\mathbf {y}},{\mathbf {s}}) \in {\mathcal {F}}: \ {\mathbf {x}}> {\mathbf {0}},\ {\mathbf {s}} > {\mathbf {0}} \}\) the set of strictly feasible solutions.

If \({\mathcal {F}}_+ \ne \emptyset \), then for each \(\nu >0\) system (2) has a unique solution (Sonnevend 1986), the \(\nu \)-center. The set of \(\nu \)-centers forms a path called the central path, and system (2) is called the central path problem. Furthermore, as \(\nu \) tends to 0, the \(\nu \)-centers converge to a solution of the linear programming problem (1).

To be able to find new search directions, Darvay (2003) introduced the algebraic equivalent transformation technique (AET). His main idea was to transform the central path problem (2) to an equivalent form:

$$\begin{aligned} \left. \begin{aligned} A {\mathbf {x}}&= {\mathbf {b}}, {\mathbf {x}}\ge {\mathbf {0}} \\ A^T {\mathbf {y}} + {\mathbf {s}}&= {\mathbf {c}}, {\mathbf {s}} \ge {\mathbf {0}} \\ \varphi \left( \frac{\mathbf {xs}}{\nu } \right)&= \varphi \left( {\mathbf {e}} \right) , \end{aligned} \right\} \end{aligned}$$
(3)

where \(\varphi : (\xi , \infty ) \rightarrow {\mathbb {R}}\) is a continuously differentiable function with \(\varphi ' (t)>0\) for all \(t \in (\xi , \infty )\), \(\xi \in [0,1)\). It is important to note that the transformed system (3) does not modify the central path; it determines different search directions depending on the function \(\varphi \). More precisely, if we are at the point \(({\mathbf {x}},{\mathbf {y}},{\mathbf {s}})\in \mathcal {F}_+ \subset \mathbb {R}^{n+m+n}\) and take a step toward the \(\nu =\tau \mu \)-center, where \(\mu ={\mathbf {x}}^T{\mathbf {s}}/n\) and \(\tau \in (0,1)\) is a given update parameter, then applying Newton’s method to (3), the search direction (\(\varDelta {\mathbf {x}},\varDelta {\mathbf {y}},\varDelta {\mathbf {s}}\)) is the solution of the following system:

$$\begin{aligned} \left. \begin{aligned} A \varDelta {\mathbf {x}}&= {\mathbf {0}} \\ A^T \varDelta {\mathbf {y}}+\varDelta {\mathbf {s}}&= {\mathbf {0}} \\ {\mathbf {s}} \varDelta {\mathbf {x}}+ {\mathbf {x}} \varDelta {\mathbf {s}}&= \tau \mu \frac{\varphi ({\mathbf {e}}) - \varphi \left( \frac{\mathbf {xs}}{\tau \mu }\right) }{\varphi '\left( \frac{\mathbf {xs}}{\tau \mu }\right) }. \end{aligned}\right\} \end{aligned}$$
(4)

Traditionally, in the analysis of Ai-Zhang type methods, the value of the update parameter \(\tau \) is included in the formulation of the Newton-system; this is the main reason why we chose the value of \(\nu \) as \(\tau \mu \). The value of \(\tau \) does not depend on the dimension of the problem; i.e., we propose a large-update IPA.

Since we assumed that A has full row rank and \({\mathbf {x}}\) and \({\mathbf {s}}\) are strictly positive vectors, the Newton-directions are uniquely determined by the system (4).
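As an illustration of how these directions can be computed, we give a minimal NumPy sketch; the function names are ours, the matrix \(A\) is assumed dense, and the reduction to the normal equations \(\left( A \, \text {diag}({\mathbf {x}}/{\mathbf {s}}) \, A^T\right) \varDelta {\mathbf {y}} = -A({\mathbf {r}}/{\mathbf {s}})\), with \({\mathbf {r}}\) the right-hand side of the third equation, follows by eliminating \(\varDelta {\mathbf {s}}=-A^T \varDelta {\mathbf {y}}\) and \(\varDelta {\mathbf {x}}\) from (4).

```python
import numpy as np

def newton_direction(A, x, s, tau, phi, dphi):
    # Sketch of the AET Newton step (4) for a strictly feasible pair (x, s);
    # phi and dphi act componentwise on the coordinates of x*s/(tau*mu).
    n = x.size
    mu = x @ s / n
    w = x * s / (tau * mu)                  # w = xs/(tau*mu) = v^2
    r = tau * mu * (phi(np.ones(n)) - phi(w)) / dphi(w)   # right-hand side
    d = x / s                               # diagonal scaling diag(x/s)
    dy = np.linalg.solve(A @ (d[:, None] * A.T), -A @ (r / s))
    ds = -A.T @ dy
    dx = (r - x * ds) / s
    return dx, dy, ds

# The function investigated in this paper and its derivative:
phi  = lambda t: t - np.sqrt(t)             # phi(t) = t - sqrt(t)
dphi = lambda t: 1.0 - 0.5 / np.sqrt(t)     # phi'(t) = 1 - 1/(2 sqrt(t))
```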

To facilitate the analysis of IPAs, we consider a scaled version of (4). Let

$$\begin{aligned} {\mathbf {v}}=\sqrt{\frac{\mathbf {xs}}{\tau \mu }}, \quad \mathbf {dx}=\frac{{\mathbf {v}} \varDelta {\mathbf {x}}}{ {\mathbf {x}}}, \quad \mathbf {ds}=\frac{{\mathbf {v}} \varDelta {\mathbf {s}}}{ {\mathbf {s}}}, \text { and } \bar{A}=A \text { diag} \left( \frac{{\mathbf {v}}}{{\mathbf {s}}} \right) . \end{aligned}$$

With these notations, the scaled Newton-system can be written as:

$$\begin{aligned} \left. \begin{aligned} \bar{A} \mathbf {dx}&= {\mathbf {0}} \\ \bar{A}^T \varDelta {\mathbf {y}}+ \mathbf {ds}&= {\mathbf {0}} \\ \mathbf {dx} +\mathbf {ds}&= {\mathbf {p}}_{\varphi }, \end{aligned}\right\} \end{aligned}$$

where

$$\begin{aligned} {\mathbf {p}}_{\varphi }=\frac{\varphi ({\mathbf {e}})- \varphi ({\mathbf {v}}^2)}{{\mathbf {v}} \varphi ' ({\mathbf {v}}^2)}.\end{aligned}$$

In this paper, we investigate the function \(\varphi (t)=t-\sqrt{t}\), \(t>1/2\) (i.e., \(\xi =1/2\)) introduced by Darvay et al. (2016). Since we fixed the function \(\varphi \), from now on, we omit the subscript \(\varphi \) and simply write

$$\begin{aligned} {\mathbf {p}}=\frac{2({\mathbf {v}}-{\mathbf {v}}^2)}{2 {\mathbf {v}}-{\mathbf {e}}}. \end{aligned}$$
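This form follows by direct substitution: for \(\varphi (t)=t-\sqrt{t}\) we have \(\varphi ({\mathbf {e}})={\mathbf {0}}\) and \(\varphi '(t)=1-\frac{1}{2\sqrt{t}}\), hence

$$\begin{aligned} {\mathbf {p}}=\frac{\varphi ({\mathbf {e}})- \varphi ({\mathbf {v}}^2)}{{\mathbf {v}} \varphi ' ({\mathbf {v}}^2)}=\frac{-({\mathbf {v}}^2-{\mathbf {v}})}{{\mathbf {v}} \left( {\mathbf {e}}-\frac{{\mathbf {e}}}{2{\mathbf {v}}}\right) }=\frac{{\mathbf {v}}-{\mathbf {v}}^2}{{\mathbf {v}}-\frac{1}{2}{\mathbf {e}}}=\frac{2({\mathbf {v}}-{\mathbf {v}}^2)}{2 {\mathbf {v}}-{\mathbf {e}}}. \end{aligned}$$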

Our goal is to introduce a new long-step IPA based on this function. To be able to prove the correctness of this method, we need to ensure that \({\mathbf {p}}\) is well-defined. Therefore, we assume that \(v_i > 1/2 \) is satisfied for all \(i \in {\mathcal {I}}\).

Let p be the function for which \(p(v_i)=p_i\) holds for all \(v_i \in (1/2, \infty )\), i.e.,

$$\begin{aligned}p: \left( \frac{1}{2}, \infty \right) \rightarrow {\mathbb {R}}, \quad p(t)=\frac{2(t-t^2)}{2t-1}. \end{aligned}$$

Throughout the analysis, we will also use several estimates of the function p(t).

3 The new algorithm

The main idea of Ai and Zhang (2005) was to decompose the Newton-directions into positive and negative parts and use different step lengths with the two components. If we apply this approach to the system (4), we get the following two systems:

$$\begin{aligned} \left. \begin{aligned} A \varDelta {\mathbf {x}}_-&={\mathbf {0}} \\ A^T \varDelta {\mathbf {y}}_- + \varDelta {\mathbf {s}}_-&={\mathbf {0}} \\ {\mathbf {s}} \varDelta {\mathbf {x}}_- + {\mathbf {x}} \varDelta {\mathbf {s}}_-&=\tau \mu {\mathbf {v}} {\mathbf {p}}^- \end{aligned} \right\} \left. \begin{aligned} A \varDelta {\mathbf {x}}_+&={\mathbf {0}} \\ A^T \varDelta {\mathbf {y}}_+ + \varDelta {\mathbf {s}}_+&={\mathbf {0}} \\ {\mathbf {s}} \varDelta {\mathbf {x}}_+ + {\mathbf {x}} \varDelta {\mathbf {s}}_+&=\tau \mu {\mathbf {v}} {\mathbf {p}}^+, \end{aligned} \right\} \end{aligned}$$
(5)

and the new point with step lengths \(\alpha =(\alpha _1,\alpha _2)\) will be \({\mathbf {x}}(\alpha )={\mathbf {x}}+\alpha _1\varDelta {\mathbf {x}}_-+\alpha _2\varDelta {\mathbf {x}}_+\), \({\mathbf {y}}(\alpha )={\mathbf {y}}+\alpha _1\varDelta {\mathbf {y}}_-+\alpha _2\varDelta {\mathbf {y}}_+\) and \({\mathbf {s}}(\alpha )={\mathbf {s}}+\alpha _1\varDelta {\mathbf {s}}_-+\alpha _2\varDelta {\mathbf {s}}_+\). For both systems, the coefficient matrix is exactly the same as in system (4); therefore, by the same reasoning, it is easy to see that both systems have unique solutions.

It is important to notice that \(\varDelta {\mathbf {x}}_+\) is not the positive part of \(\varDelta {\mathbf {x}}\) (in this case the sign \(+\) is a subscript instead of a superscript); it is the solution of the system with \({\mathbf {p}}^+\) on its right-hand side. The notation is similar for the other solutions of these systems.

We introduce the index sets \({\mathcal {I}}_+=\{ i \in {\mathcal {I}}: x_i s_i \le \tau \mu \}=\{i \in {\mathcal {I}}: v_i \le 1\}\), and \({\mathcal {I}}_-={\mathcal {I}} \setminus {\mathcal {I}}_+\). Under the technical assumption \(v_i > \frac{1}{2}\), the nonnegativity of a coordinate \(p_i\) is equivalent to \(i \in {\mathcal {I}}_+\).

To facilitate the analysis of the algorithm, we introduce the scaled search directions

$$\begin{aligned} \mathbf {dx}_-=\frac{{\mathbf {v}} \varDelta {\mathbf {x}}_-}{{\mathbf {x}}}, \ \mathbf {ds}_-=\frac{{\mathbf {v}} \varDelta {\mathbf {s}}_-}{{\mathbf {s}}},\ \mathbf {dx}_+=\frac{{\mathbf {v}} \varDelta {\mathbf {x}}_+}{{\mathbf {x}}}, \ \mathbf {ds}_+=\frac{{\mathbf {v}} \varDelta {\mathbf {s}}_+}{{\mathbf {s}}}.\end{aligned}$$

The systems (5) then transform to the following systems

$$\begin{aligned} \left. \begin{aligned} \bar{A} \mathbf {dx}_-&={\mathbf {0}} \\ \bar{A}^T \varDelta {\mathbf {y}}_- + \mathbf {ds}_-&={\mathbf {0}} \\ \mathbf {dx}_- + \mathbf {ds}_-&={\mathbf {p}}^-, \end{aligned} \right\} \left. \begin{aligned} \bar{A} \mathbf {dx}_+&={\mathbf {0}} \\ \bar{A}^T \varDelta {\mathbf {y}}_+ + \mathbf {ds}_+&={\mathbf {0}} \\ \mathbf {dx}_+ + \mathbf {ds}_+&={\mathbf {p}}^+. \end{aligned} \right\} \end{aligned}$$
(6)

The wide neighborhood \({\mathcal {N}}_{\infty }^-\) was introduced by Kojima et al. (1989). It is defined as follows:

$$\begin{aligned} {\mathcal {N}}_{\infty }^- (1-\tau )=\{ ({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {F}}_+: \mathbf {xs} \ge \tau \mu {\mathbf {e}} \}=\{({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {F}}_+: {\mathbf {v}}\ge {\mathbf {e}}\}.\end{aligned}$$

Notice that this means that a point is in the neighborhood \({\mathcal {N}}_\infty ^-(1-\tau )\) if and only if \({\mathbf {p}}^+={\mathbf {0}}\), i.e., if and only if \(x_i s_i \ge \tau \mu \) holds for every \(i \in {\mathcal {I}}\). In the analysis, we are going to use a new neighborhood that depends only on the positive part of the vector \({\mathbf {p}}\):

$$\begin{aligned} {\mathcal {W}} (\tau , \beta ) = \left\{ ({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {F}}_+: \Vert {\mathbf {p}}^+ \Vert \le \beta \text { and } {\mathbf {v}} > \frac{1}{2} {\mathbf {e}}\right\} , \end{aligned}$$

where \(0<\beta <1/2\) is a given parameter value. The role of the technical condition \({\mathbf {v}} > {\mathbf {e}}/2\) was discussed at the end of Sect. 2. This neighborhood is a modification of the one introduced by Ai and Zhang (2005) (they require \(\Vert \mathbf {vp}^+\Vert \le \beta \)), and it is equivalent to the one used by Darvay and Takács (2018) for the function \(\varphi (t)=\sqrt{t}\).
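The membership test for \({\mathcal {W}}(\tau ,\beta )\) is cheap to evaluate; a minimal sketch follows (the function name is ours, and the dual variable \({\mathbf {y}}\) does not enter the test):

```python
import numpy as np

def in_wide_neighborhood(x, s, tau, beta):
    # Checks ||p^+|| <= beta and v > e/2 for a strictly feasible pair (x, s).
    mu = x @ s / x.size
    v = np.sqrt(x * s / (tau * mu))
    if not np.all(v > 0.5):           # technical condition v > e/2
        return False
    p = 2 * (v - v**2) / (2 * v - 1)
    return np.linalg.norm(np.maximum(p, 0.0)) <= beta
```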

Following the idea of Ai and Zhang (2005), the next lemma verifies that \({\mathcal {W}}(\tau ,\beta )\) is indeed a wide neighborhood:

Lemma 1

Let \(0<\beta <1/2\) and \(0< \tau <1\) be given parameters, and let \(\gamma =\frac{1}{4} \left( 1+\sqrt{1-2 \beta }\right) ^2 \tau \). Then

$$\begin{aligned} {\mathcal {N}}_{\infty }^- (1- \tau ) \subseteq {\mathcal {W}}(\tau ,\beta ) \subseteq {\mathcal {N}}_{\infty }^- (1- \gamma ).\end{aligned}$$

Proof

If \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {N}}_{\infty }^- (1- \tau )\), then \(\Vert {\mathbf {p}}^+ \Vert =0 < \beta \) and \({\mathbf {v}}\ge {\mathbf {e}}> 1/2 {\mathbf {e}}\).

For the second inclusion, let \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}} (\tau , \beta )\) and assume indirectly that there exists an index \(i \in {\mathcal {I}}\) for which \(x_i s_i < \gamma \mu \), i.e.,

$$v_i^2< \frac{\gamma }{\tau }=\frac{1}{4} \left( 1+\sqrt{1-2 \beta }\right) ^2.$$

Since p(t) is a strictly decreasing function,

$$\begin{aligned} \begin{aligned} p_i&=p(v_i)> \frac{2 \left( \sqrt{\frac{\gamma }{\tau }} - \frac{\gamma }{\tau } \right) }{2 \sqrt{\frac{\gamma }{\tau }}-1}= \frac{\beta }{\sqrt{1-2 \beta }} > \beta , \end{aligned}\end{aligned}$$

which is a contradiction. \(\square \)

The following lower and upper bounds on the coordinates of the vector \({\mathbf {v}}\) will be useful for several estimates during the analysis.

Corollary 1

Let \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau ,\beta )\). Then

$$\begin{aligned} \frac{1+\sqrt{1-2\beta }}{2} \le&v_i \le 1&\forall i&\in {\mathcal {I}}_+ ,\\ 1 <&v_i\le \sqrt{n/\tau }&\forall i&\in {\mathcal {I}}_-. \end{aligned}$$

Proof

The first statement follows directly from Lemma 1. The upper bound \(v_i \le \sqrt{n/\tau }\) holds for all \(i \in {\mathcal {I}}\) since

$$\begin{aligned} \sum _{i \in {\mathcal {I}}} v_i^2=\sum _{i \in {\mathcal {I}}} \frac{x_i s_i}{\tau \mu }= \frac{1}{\tau \mu } {\mathbf {x}}^T {\mathbf {s}}=\frac{n}{\tau }. \end{aligned}$$
(7)

\(\square \)

Before presenting the analysis, we give the pseudocode of the IPA.

[Algorithm 1: pseudocode of the new long-step interior point algorithm]
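Complementing the pseudocode, the following is a minimal Python sketch of the method with the fixed step lengths used in the analysis (\(\alpha _1=\sqrt{\beta \tau /n}\), \(\alpha _2=1\)); Algorithm 1 instead chooses the largest \(\alpha _1\) that keeps the iterate in the neighborhood. The function names are ours, the matrix \(A\) is assumed dense, and the starting point is assumed to be strictly feasible and to lie in \({\mathcal {W}}(\tau ,\beta )\).

```python
import numpy as np

def solve_newton(A, x, s, rhs):
    # Solve A dx = 0, A^T dy + ds = 0, s*dx + x*ds = rhs (cf. systems (5)).
    d = x / s
    dy = np.linalg.solve(A @ (d[:, None] * A.T), -A @ (rhs / s))
    ds = -A.T @ dy
    dx = (rhs - x * ds) / s
    return dx, dy, ds

def ipa_long_step(A, x, y, s, tau=0.125, beta=0.125, eps=1e-6):
    # Fixed step lengths from the analysis: alpha_1 = sqrt(beta*tau/n),
    # alpha_2 = 1; the iterates stay in W(tau, beta), so v > e/2 holds.
    n = x.size
    alpha1 = np.sqrt(beta * tau / n)
    while x @ s > eps:                         # stop when duality gap <= eps
        mu = x @ s / n
        v = np.sqrt(x * s / (tau * mu))
        p = 2 * (v - v**2) / (2 * v - 1)
        dx_m, dy_m, ds_m = solve_newton(A, x, s, tau * mu * v * np.minimum(p, 0))
        dx_p, dy_p, ds_p = solve_newton(A, x, s, tau * mu * v * np.maximum(p, 0))
        x = x + alpha1 * dx_m + dx_p           # alpha_2 = 1
        y = y + alpha1 * dy_m + dy_p
        s = s + alpha1 * ds_m + ds_p
    return x, y, s
```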

During the analysis, we consider the case of \(\alpha _2=1\), i.e., we take a full Newton-step in the direction \((\varDelta {\mathbf {x}}_+,\varDelta {\mathbf {y}}_+,\varDelta {\mathbf {s}}_+)\), and determine a value of \(\alpha _1\) so that the desired complexity of the algorithm can be achieved.

From now on, we assume that a point \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}} (\tau , \beta )\) is given, and in the next section, we prove the correctness of the algorithm.

4 Analysis of the algorithm

Let us introduce the following notations:

$$\begin{aligned}&\mathbf {dx}(\alpha )=\alpha _1 \mathbf {dx}_-+ \alpha _2 \mathbf {dx}_+, \quad \mathbf {ds}(\alpha )=\alpha _1 \mathbf {ds}_-+ \alpha _2 \mathbf {ds}_+, \\&{\mathbf {h}}(\alpha )=\tau \mu {\mathbf {v}}^2+\alpha _1 \tau \mu {\mathbf {v}} {\mathbf {p}}^-+\alpha _2 \tau \mu {\mathbf {v}} {\mathbf {p}}^+, \end{aligned}$$

where \(\alpha _1,\alpha _2 \in [0,1]\) are given step lengths, whose values will be specified later. With these notations, the equation

$$\begin{aligned} {\mathbf {x}}(\alpha ) {\mathbf {s}} (\alpha )=({\mathbf {x}}+ \alpha _1 \varDelta {\mathbf {x}}_- + \alpha _2 \varDelta {\mathbf {x}}_+)({\mathbf {s}}+ \alpha _1 \varDelta {\mathbf {s}}_- + \alpha _2 \varDelta {\mathbf {s}}_+) \end{aligned}$$

can be written as

$$\begin{aligned} {\mathbf {x}}(\alpha ) {\mathbf {s}} (\alpha )={\mathbf {h}} (\alpha )+ \tau \mu \mathbf {dx}(\alpha )\mathbf {ds}(\alpha ).\end{aligned}$$

It is important to note that the search directions are orthogonal, as is usual for LP problems, since

$$\begin{aligned} \mathbf {dx}(\alpha )^T \mathbf {ds}(\alpha )= \alpha _1^2 \mathbf {dx}_- ^T \mathbf {ds}_-+ \alpha _1 \alpha _2 (\mathbf {dx}_- ^T \mathbf {ds}_+ + \mathbf {dx}_+ ^T \mathbf {ds}_-)+ \alpha _2^2 \mathbf {dx}_+ ^T \mathbf {ds}_+.\end{aligned}$$

Furthermore, \(\mathbf {dx}_+\) and \(\mathbf {dx}_-\) are in the kernel of the matrix \(\bar{A}\), while \(\mathbf {ds}_+\) and \(\mathbf {ds}_-\) are in the rowspace of \(\bar{A}\) (see system (6)). Therefore, all four scalar products are 0 in the previous expression.

The next two lemmas give lower bounds on the value of \({\mathbf {h}}(\alpha )\).

Lemma 2

Let \(\alpha \in [0,1]^2\). Then \(h_i (\alpha ) \ge \tau \mu \) for all \(i \in {\mathcal {I}}_-\).

Proof

In the case of \(i \in {\mathcal {I}}_-\), \(v_i>1\) and \(h_i(\alpha )=\tau \mu v_i(v_i+ \alpha _1 p_i)\). We need to prove that \(v_i(v_i+ \alpha _1 p_i) \ge 1\), i.e., \(\alpha _1 \le \frac{1-v_i^2}{v_i p_i}\) holds.

Let us examine the expression \(\frac{1-t^2}{t p(t)}\) over the interval \((1,\infty )\):

$$\begin{aligned} \frac{1-t^2}{t p(t)}=\frac{1-t^2}{t}\frac{2t-1}{2t(1-t)}=\frac{2t^2+t-1}{2t^2}=1+\frac{t-1}{2t^2} > 1.\end{aligned}$$

On the other hand, \(\alpha _1 \le 1\) by definition. Thus, \(h_i (\alpha ) \ge \tau \mu \) holds for all \(i \in {\mathcal {I}}_-\). \(\square \)

We show that \({\mathbf {h}} (\alpha )\) is a componentwise strictly positive vector.

Lemma 3

Let \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau , \beta )\) and \(\alpha \in [0,1]^2\). Then \({\mathbf {h}}(\alpha ) \ge \gamma \mu {\mathbf {e}}\), and consequently \({\mathbf {h}}(\alpha ) > {\mathbf {0}}\).

Proof

By Lemma 1, \(\tau \mu v_i^2=x_i s_i \ge \gamma \mu \) for all \(i\in {\mathcal {I}}\). Furthermore, if \(i\in {\mathcal {I}}_+\), then \(v_ip_i\ge 0\), so \( h_i(\alpha ) \ge \tau \mu v_i^2\ge \gamma \mu \).

In the case of \(i \in {\mathcal {I}}_-\), the statement is a consequence of Lemma 2, since \( h_i(\alpha ) \ge \tau \mu \ge \gamma \mu \). \(\square \)

To be able to prove the feasibility of the new iterates and ensure that they stay in the neighborhood \({\mathcal {W}}(\tau ,\beta )\), we need the following technical lemma:

Lemma 4

Let \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau , \beta )\), \(\alpha _1= \sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\). Then

$$\begin{aligned} \Vert \left[ \mathbf {dx}(\alpha )\mathbf {ds}(\alpha )\right] ^- \Vert _1 = \Vert \left[ \mathbf {dx}(\alpha )\mathbf {ds}(\alpha )\right] ^+ \Vert _1 \le \frac{1}{2} \beta .\end{aligned}$$

Proof

According to Lemma 3.5 of Ai and Zhang (2005) and using the orthogonality of \(\mathbf {dx}(\alpha )\) and \(\mathbf {ds}(\alpha )\), we have

$$\begin{aligned} \Vert [\mathbf {dx}(\alpha )\mathbf {ds}(\alpha )]^- \Vert _1 = \Vert [\mathbf {dx}(\alpha )\mathbf {ds}(\alpha )]^+ \Vert _1&\le \frac{1}{4} \Vert \mathbf {dx}(\alpha )+\mathbf {ds}(\alpha ) \Vert ^2 \\&= \frac{1}{4} \Vert \alpha _1 (\mathbf {dx}_-+\mathbf {ds}_-) +\alpha _2 (\mathbf {dx}_++\mathbf {ds}_+) \Vert ^2 \\&= \frac{1}{4} \left( \alpha _1^2 \Vert {\mathbf {p}}^- \Vert ^2 + \alpha _2^2 \Vert {\mathbf {p}}^+ \Vert ^2 \right) .\end{aligned}$$

By the definition of \({\mathcal {W}}(\tau ,\beta )\), we have \(\Vert {\mathbf {p}}^+ \Vert \le \beta \). We need to estimate the term \(\Vert {\mathbf {p}}^- \Vert ^2\). According to (7), we have

$$\begin{aligned} \Vert {\mathbf {p}}^- \Vert ^2 = \sum _{i \in {\mathcal {I}}_-} \left( v_i - \frac{v_i}{2 v_i-1} \right) ^2 \le \sum _{i \in {\mathcal {I}}_-} v_i^2 \le \sum _{i \in {\mathcal {I}}} v_i^2 =\frac{n}{\tau }. \end{aligned}$$

Using these two estimations and substituting the values of \(\alpha _1\) and \(\alpha _2\), we can write

$$\begin{aligned} \frac{1}{4} \left( \alpha _1^2 \Vert {\mathbf {p}}^- \Vert ^2 + \alpha _2^2 \Vert {\mathbf {p}}^+ \Vert ^2 \right) \le \frac{1}{4} \frac{\beta \tau }{n} \frac{n}{\tau }+\frac{1}{4} \beta ^2=\frac{1}{4} \beta +\frac{1}{4} \beta ^2 \le \frac{1}{2} \beta .\end{aligned}$$

\(\square \)

The next lemma gives a positive lower bound on the vector \({\mathbf {x}}(\alpha ){\mathbf {s}}(\alpha )\), which is the first step to prove the strict feasibility of the new point.

Lemma 5

Let \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}} (\tau , \beta )\), \(\alpha _1= \sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\). Then

$${\mathbf {x}}(\alpha ){\mathbf {s}}(\alpha ) \ge \frac{1-2\beta +\sqrt{1-2\beta }}{2} \tau \mu {\mathbf {e}}$$

holds.

Proof

By Lemma 3, we have \({\mathbf {h}}(\alpha ) \ge \gamma \mu {\mathbf {e}}\). Using Lemma 4 and substituting the value of \(\gamma \), we get

$$\begin{aligned}\begin{aligned} {\mathbf {x}}(\alpha ){\mathbf {s}}(\alpha )&={\mathbf {h}}(\alpha )+\tau \mu \mathbf {dx}(\alpha )\mathbf {ds}(\alpha ) \ge \gamma \mu {\mathbf {e}}-\tau \mu \Vert [\mathbf {dx}(\alpha )\mathbf {ds}(\alpha )]^- \Vert _1 {\mathbf {e}} \\&\ge \gamma \mu {\mathbf {e}}-\tau \mu \frac{1}{2} \beta {\mathbf {e}}= \tau \mu \left( \frac{\gamma }{\tau }-\frac{\beta }{2} \right) {\mathbf {e}}=\frac{1-2\beta +\sqrt{1-2\beta }}{2} \tau \mu {\mathbf {e}}. \end{aligned}\end{aligned}$$

\(\square \)

The following statement is the linear programming analogue of Proposition 3.2 by Ai and Zhang (2005) (they proposed it for monotone linear complementarity problems). The proof remains the same.

Lemma 6

Let \((\mathbf {x,y,s})\in {\mathcal {F}}_+\) and \(( \varDelta {\mathbf {x}}, \varDelta {\mathbf {y}}, \varDelta {\mathbf {s}})\) be the solution of the system

$$\begin{aligned} \begin{aligned} A \varDelta {\mathbf {x}}&= {\mathbf {0}} \\ A^T \varDelta {\mathbf {y}}+\varDelta {\mathbf {s}}&= {\mathbf {0}} \\ {\mathbf {s}} \varDelta {\mathbf {x}}+ {\mathbf {x}} \varDelta {\mathbf {s}}&= {\mathbf {z}}. \end{aligned} \end{aligned}$$

If \({\mathbf {z}}+\mathbf {xs}>{\mathbf {0}}\) and \(({\mathbf {x}}+t_0 \varDelta {\mathbf {x}})({\mathbf {s}}+t_0 \varDelta {\mathbf {s}})>{\mathbf {0}}\) hold for some \(t_0 \in (0,1]\), then \({\mathbf {x}}+t \varDelta {\mathbf {x}}>{\mathbf {0}}\) and \({\mathbf {s}}+t \varDelta {\mathbf {s}}>{\mathbf {0}}\) for all \(t \in (0,t_0]\).

We have already proved that \(\mathbf {h}(\alpha )>\mathbf {0}\) for all \(\alpha \in [0,1]^2\) (see Lemma 3), and \({\mathbf {x}}(\alpha ){\mathbf {s}}(\alpha )>{\mathbf {0}}\) for \(\alpha _1=\sqrt{\beta \tau /n}\) and \(\alpha _2=1\) (see Lemma 5); therefore, by Lemma 6, the new points are also strictly positive, namely \({\mathbf {x}}(\alpha )> {\mathbf {0}}\) and \({\mathbf {s}}(\alpha )> {\mathbf {0}}\).

The following two statements give bounds on the duality gap of the new point: \(\mu (\alpha )={\mathbf {x}}(\alpha )^T {\mathbf {s}} (\alpha )/n\).

Lemma 7

Let \(\alpha _1=\sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\). Then \(\mu (\alpha ) \ge \left( 1- \alpha _1 \right) \mu \).

Proof

Since \({\mathbf {v}}^T{\mathbf {p}}^+\ge 0\) and \(-v_ip_i=v_i^2-\frac{v_i^2}{2v_i-1}\le v_i^2\) for all \(i\in {\mathcal {I}}_-\), by (7) we have

$$\begin{aligned} \begin{aligned} \mu (\alpha )&=\frac{{\mathbf {x}}(\alpha )^T {\mathbf {s}} (\alpha )}{n} = \mu + \frac{\alpha _1 \tau \mu }{n} {\mathbf {v}}^T {\mathbf {p}}^-+\frac{\alpha _2 \tau \mu }{n} {\mathbf {v}}^T {\mathbf {p}}^+ \ge \mu +\frac{\alpha _1 \tau \mu }{n} {\mathbf {v}}^T {\mathbf {p}}^-\\&= \mu - \frac{\alpha _1 \tau \mu }{n} \sum _{i \in {\mathcal {I}}_-} \frac{2 v_i(v_i^2-v_i)}{2v_i-1} \ge \mu - \frac{\alpha _1 \tau \mu }{n} \sum _{i \in {\mathcal {I}}} v_i^2 = \left( 1- \alpha _1 \right) \mu . \end{aligned}\end{aligned}$$

\(\square \)

The following lemma guarantees the proper reduction of the duality gap after an iteration:

Lemma 8

Assume that \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau , \beta )\), \(\alpha _1 = \sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\). Then

$$\begin{aligned} \mu (\alpha ) \le \left( 1- \sqrt{\frac{\beta \tau }{n}} \left[ \frac{8}{9} \left( 1- \tau \right) -\sqrt{\beta \tau } \right] \right) \mu . \end{aligned}$$
(8)

Proof

Observe that

$$\begin{aligned} \mu (\alpha ) =\frac{{\mathbf {x}}(\alpha )^T {\mathbf {s}} (\alpha )}{n}= \mu + \frac{\alpha _1 \tau \mu }{n} {\mathbf {v}}^T{\mathbf {p}}^- + \frac{\alpha _2 \tau \mu }{n} {\mathbf {v}}^T {\mathbf {p}}^+. \end{aligned}$$

First, let us estimate the term \({\mathbf {v}}^T {\mathbf {p}}^+ \):

$$\begin{aligned} {\mathbf {e}}^T \left( {\mathbf {v}} {\mathbf {p}}^+ \right) = \left\| {\mathbf {v}} {\mathbf {p}}^+ \right\| _1 \le \sqrt{n} \left\| {\mathbf {v}} {\mathbf {p}}^+ \right\| \le \sqrt{n} \beta . \end{aligned}$$
(9)

The first equality holds since \({\mathbf {v}}\) is positive and we consider only the positive part of \({\mathbf {p}}\). The first inequality follows from the Cauchy-Schwarz inequality, and the last one from the property \( v_i \le 1\) for \(i \in {\mathcal {I}}_+\) together with the definition of the neighborhood \({\mathcal {W}}(\tau ,\beta )\).

To obtain an upper bound on the expression \({\mathbf {v}}^T{\mathbf {p}}^- \), recall that \(v_i>1\), and hence \(1-v_i^2<0\), for all \(i \in {\mathcal {I}}_-\), and note that the function \(t \mapsto \frac{2t^2}{(1+t)(2t-1)}\) attains its minimum \(\frac{8}{9}\) on \((1,\infty )\) at \(t=2\):

$$\begin{aligned} {\mathbf {v}}^T{\mathbf {p}}^-&={\mathbf {e}}^T \left( {\mathbf {v}} \left[ \frac{2({\mathbf {v}}-{\mathbf {v}}^2)}{2 {\mathbf {v}}-{\mathbf {e}}} \right] ^- \right) = \sum _{i \in {\mathcal {I}}_-} \frac{2 v_i(v_i-v_i^2)}{2v_i-1} \nonumber \\&=\sum _{i \in {\mathcal {I}}_-} \frac{2 v_i^2}{(1+v_i)(2v_i-1)}(1-v_i^2) \nonumber \\&\le \sum _{i \in {\mathcal {I}}_-} \frac{8}{9} (1-v_i^2) \le \sum _{i \in {\mathcal {I}}} \frac{8}{9} (1-v_i^2)=\frac{8}{9} n \left( 1-\frac{1}{\tau } \right) . \end{aligned}$$
(10)

Using (9) and (10) we obtain

$$\begin{aligned} \mu (\alpha ) \le \mu + \frac{\alpha _1 \tau \mu }{n} \frac{8}{9} n \left( 1-\frac{1}{\tau } \right) +\frac{\alpha _2 \tau \mu }{n} \sqrt{n} \beta =\left( 1- \alpha _1 \left[ \frac{8}{9} \left( 1- \tau \right) -\sqrt{\beta \tau } \right] \right) \mu . \end{aligned}$$

\(\square \)

Notice that the upper bound on \(\mu (\alpha )\) in (8) is positive for all \(\beta , \tau \in (0,1)\). Indeed,

$$\begin{aligned} 1- \sqrt{\frac{\beta \tau }{n}} \left[ \frac{8}{9} \left( 1- \tau \right) -\sqrt{\beta \tau } \right] \ge 1-\frac{8}{9}(1-\tau )>\frac{1}{9}.\end{aligned}$$

With a suitable parameter setting, we can ensure that the duality gap decreases strictly monotonically, i.e., \(\mu (\alpha )<\mu \).

Corollary 2

Let \(\tau \le 1/2\) and \(\beta \le 1/4\). If \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau , \beta )\), \(\alpha _1 = \sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\), then \(\mu (\alpha )< \mu \) holds.

Proof

We need to check whether the multiplier of \(\mu \) in inequality (8) is less than 1, i.e., that \(\frac{8}{9}(1-\tau )-\sqrt{\beta \tau }>0\). This holds when \(\beta <\frac{64}{81} \frac{(1-\tau )^2}{\tau }\), which is satisfied for our choice of parameter values: since \((1-\tau )^2/\tau \) is decreasing in \(\tau \), for \(\tau \le 1/2\) we have \(\frac{64}{81}\frac{(1-\tau )^2}{\tau } \ge \frac{32}{81}>\frac{1}{4}\ge \beta \). \(\square \)

In addition to strict feasibility, we also need to prove the fulfillment of the technical condition \({\mathbf {v}}(\alpha ) =\sqrt{\frac{{\mathbf {x}}(\alpha ) {\mathbf {s}}(\alpha )}{\tau \mu (\alpha )}} > \frac{1}{2} {\mathbf {e}}\).

Lemma 9

Let \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau , \beta )\), \(\alpha _1=\sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\). If \(\beta < \frac{\sqrt{3}}{4}\), then \({\mathbf {v}} (\alpha )> \frac{1}{2} {\mathbf {e}}\) holds.

Proof

From Lemma 5 and Corollary 2, we have

$$\begin{aligned} {\mathbf {v}}^2(\alpha )=\frac{{\mathbf {x}}(\alpha ) {\mathbf {s}}(\alpha )}{\tau \mu (\alpha )} \ge \frac{1-2\beta +\sqrt{1-2\beta }}{2} {\mathbf {e}}.\end{aligned}$$
(11)

Since \(\frac{1-2\beta +\sqrt{1-2\beta }}{2} > \frac{1}{4}\) if \(\beta < \frac{\sqrt{3}}{4}\), we have proved the statement. \(\square \)

To show that the new iterates remain in the neighborhood \({\mathcal {W}}(\tau ,\beta )\), we need another technical lemma:

Lemma 10

Let \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau , \beta )\), \(\alpha _1=\sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\). Then

$$\Vert [\tau \mu (\alpha ){\mathbf {e}}-{\mathbf {h}}(\alpha )]^+ \Vert \le \beta \tau \mu (\alpha ) \left( 1- \frac{1+\sqrt{1-2\beta }}{2} \right) .$$

Proof

Based on Lemma 2 and Corollary 2, \(\tau \mu (\alpha )- h_i(\alpha ) \le \tau \mu - h_i(\alpha ) \le 0\) for all \(i \in {\mathcal {I}}_-\). Therefore, we need to examine indices only from the set \({\mathcal {I}}_+\).

Since \(1/2<v_i \le 1\) for all \(i \in {\mathcal {I}}_+\), we have

$$\begin{aligned} 1-v_i^2 \le \frac{2(v_i-v_i^2)}{2v_i-1}=p_i, \end{aligned}$$
(12)

because \(p_i-\left( 1-v_i^2\right) =\frac{(1-v_i)^2(2v_i+1)}{2v_i-1} \ge 0\) holds whenever \(v_i>1/2\).

Using Corollary 2 and (12), we obtain that

$$\begin{aligned} \begin{aligned} \tau \mu (\alpha )-h_i(\alpha )&= \tau \mu (\alpha )-\tau \mu \left( v_i^2+ v_i p_i \right) \le \tau \mu (\alpha ) \left( 1-v_i^2-v_i p_i \right) \\&\le \tau \mu (\alpha ) \left( p_i-v_i p_i \right) = \tau \mu (\alpha ) p_i (1- v_i) \le \tau \mu (\alpha ) p_i \left( 1- \frac{1+\sqrt{1-2\beta }}{2} \right) \ \forall \ i\in {\mathcal {I}}_+, \end{aligned} \end{aligned}$$

where in the last estimation, we used the first statement of Corollary 1.

Using the definition of \({\mathcal {W}}(\tau ,\beta )\), we obtain

$$\begin{aligned} \Vert [\tau \mu (\alpha ){\mathbf {e}}-{\mathbf {h}}(\alpha )]^+ \Vert \le \tau \mu (\alpha ) \left( 1- \frac{1+\sqrt{1-2\beta }}{2} \right) \Vert {\mathbf {p}}^+ \Vert \le \beta \tau \mu (\alpha ) \left( 1- \frac{1+\sqrt{1-2\beta }}{2} \right) , \end{aligned}$$

which concludes the proof. \(\square \)

Now we are ready to prove that after an iteration, if the right-hand side of the third equation in the Newton system (6) is denoted by \({\mathbf {p}}(\alpha )\), then \(\Vert {\mathbf {p}}(\alpha )^+ \Vert \le \beta \) holds. Together with Lemma 9, this means that the new iterates remain in the neighborhood \({\mathcal {W}}(\tau ,\beta )\) after the Newton step.

Lemma 11

Let \(\beta \le \frac{1}{8}\), \(\tau \le \frac{1}{8}\). If \(({\mathbf {x}}, {\mathbf {y}}, {\mathbf {s}}) \in {\mathcal {W}}(\tau , \beta )\), \(\alpha _1=\sqrt{\frac{\beta \tau }{n}}\) and \(\alpha _2=1\), then the new iterate stays in the same neighborhood, namely \(({\mathbf {x}}(\alpha ), {\mathbf {y}}(\alpha ), {\mathbf {s}}(\alpha )) \in {\mathcal {W}}(\tau , \beta )\).

Proof

By the definition of \({\mathcal {W}}(\tau ,\beta )\) and Lemma 9, we need to prove

$$\begin{aligned} \left\| {\mathbf {p}}(\alpha )^+ \right\| =\left\| \left[ \frac{2 {\mathbf {v}}(\alpha )({\mathbf {e}}-{\mathbf {v}}(\alpha ))}{2 {\mathbf {v}}(\alpha )-{\mathbf {e}}} \right] ^+ \right\| \le \beta . \end{aligned}$$

Since \( \frac{2 {\mathbf {v}}(\alpha )}{ 2 {\mathbf {v}}^2(\alpha )+{\mathbf {v}}(\alpha )-{\mathbf {e}}} > {\mathbf {0}}\) when \({\mathbf {v}}(\alpha )> 1/2 {\mathbf {e}}\), we have

$$\begin{aligned} \left\| {\mathbf {p}}(\alpha )^+ \right\|&= \left\| \frac{2 {\mathbf {v}}(\alpha )}{\left( 2 {\mathbf {v}}(\alpha )-{\mathbf {e}} \right) \left( {\mathbf {e}}+{\mathbf {v}}(\alpha ) \right) } \left[ {\mathbf {e}}-{\mathbf {v}}^2 (\alpha ) \right] ^+ \right\| \nonumber \\&\le \left\| \frac{2 {\mathbf {v}}(\alpha )}{ 2 {\mathbf {v}}^2(\alpha )+{\mathbf {v}}(\alpha )-{\mathbf {e}}} \right\| _{\infty } \left\| \left[ {\mathbf {e}}-{\mathbf {v}}^2 (\alpha ) \right] ^+ \right\| . \end{aligned}$$
(13)

Let \(q: \left( \frac{1}{2}, \infty \right) \rightarrow {\mathbb {R}}\) be defined by \(q(t)=\frac{2t}{2t^2+t-1}\). This function is strictly decreasing on its domain; therefore, using (11), the first term in (13) can be estimated as

$$\begin{aligned} \left\| \frac{2 {\mathbf {v}}(\alpha )}{ 2 {\mathbf {v}}^2(\alpha )+{\mathbf {v}}(\alpha )-{\mathbf {e}}} \right\| _{\infty } \le q\left( \sqrt{\frac{1-2\beta +\sqrt{1-2 \beta }}{2}}\right) , \end{aligned}$$
(14)

where the expression \(\sqrt{(1-2\beta +\sqrt{1-2 \beta })/2}\) is strictly decreasing in \(\beta \), implying that the upper bound is strictly increasing in \(\beta \).

To give an upper bound on \(\left\| \left[ {\mathbf {e}}-{\mathbf {v}}^2 (\alpha ) \right] ^+ \right\| \), we use Lemmas 10, 4 and then 7:

$$\begin{aligned} \left\| \left[ {\mathbf {e}}-{\mathbf {v}}^2 (\alpha ) \right] ^+ \right\|&=\frac{1}{\tau \mu (\alpha )} \left\| \left[ \tau \mu (\alpha ){\mathbf {e}}-{\mathbf {x}}(\alpha ) {\mathbf {s}}(\alpha )\right] ^+ \right\| \nonumber \\&\le \frac{1}{\tau \mu (\alpha )} \Big ( \left\| [\tau \mu (\alpha ) {\mathbf {e}}-{\mathbf {h}}(\alpha )]^+ \right\| + \tau \mu \left\| [\mathbf {dx}(\alpha )\mathbf {ds}(\alpha )]^- \right\| \Big ) \nonumber \\&\le \frac{1}{\tau \mu (\alpha )} \left( \beta \tau \mu (\alpha ) \left( 1-\frac{1+\sqrt{1-2\beta }}{2} \right) +\tau \mu \frac{\beta }{2} \right) \nonumber \\&\le \beta \left( \frac{1-\sqrt{1-2\beta }}{2}+ \frac{1}{2-2\sqrt{\beta \tau } } \right) , \end{aligned}$$
(15)

where the last term is strictly increasing in both \(\beta \) and \(\tau \).

Using the just proved inequalities (13), (14) and (15), we obtain

$$\begin{aligned} \left\| {\mathbf {p}}(\alpha )^+ \right\| \le \beta \left[ q\left( \sqrt{\frac{1-2\beta +\sqrt{1-2 \beta }}{2}}\right) \left( \frac{1-\sqrt{1-2\beta }}{2}+ \frac{1}{2-2\sqrt{\beta \tau }} \right) \right] . \end{aligned}$$
(16)

To prove that this expression is less than or equal to \(\beta \), we need to ensure that the value of the term in square brackets is at most 1. Notice that by the monotonicity of the estimations (14) and (15), their product is also strictly increasing both in \(\beta \) and \(\tau \). Moreover, substituting \(\beta =\tau =1/8\), the coefficient of \(\beta \) on the right-hand side of (16) is less than 0.77, which concludes the proof. \(\square \)
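This constant can also be verified numerically; a short script (not part of the proof) evaluating the product of the bounds (14) and (15) at \(\beta =\tau =1/8\):

```python
import numpy as np

beta = tau = 1 / 8
t = np.sqrt((1 - 2 * beta + np.sqrt(1 - 2 * beta)) / 2)
q = 2 * t / (2 * t**2 + t - 1)       # upper bound (14)
c = (1 - np.sqrt(1 - 2 * beta)) / 2 + 1 / (2 - 2 * np.sqrt(beta * tau))
print(q * c)                         # approx. 0.758 < 0.77
```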

5 Iteration bound of the algorithm

Theorem 1

Let \(\beta =\tau =\frac{1}{8}\), \(\alpha _1=\sqrt{\frac{\beta \tau }{n}}\), \(\alpha _2=1\), and suppose that a starting point \(({\mathbf {x}}_0, {\mathbf {y}}_0, {\mathbf {s}}_0) \in {\mathcal {W}}(\tau ,\beta )\) is given. The algorithm then provides an \(\varepsilon \)-optimal solution of the primal-dual pair of LPs in

$$\begin{aligned} O\left( \sqrt{n} \log \frac{{\mathbf {x}}_0^T {\mathbf {s}}_0}{\varepsilon } \right) \end{aligned}$$

iterations.

Proof

Let \(({\mathbf {x}}_k,{\mathbf {y}}_k,{\mathbf {s}}_k)\) denote the point given by the algorithm in the \(k^\mathrm{th}\) iteration, and let \(\mu _k={\mathbf {x}}_k^T {\mathbf {s}}_k/n\). According to Lemma 8, the following inequality holds for the duality gap in the \(k^\mathrm{th}\) iteration:

$$\begin{aligned} \mu _k \le \left( 1- \sqrt{\frac{\beta \tau }{n}} \left[ \frac{8}{9} \left( 1- \tau \right) -\sqrt{\beta \tau } \right] \right) ^k \mu _0. \end{aligned}$$

From the above inequality, we get that \({\mathbf {x}}_k^T {\mathbf {s}}_k \le \varepsilon \) holds if

$$\begin{aligned}\left( 1- \sqrt{\frac{\beta \tau }{n}} \left[ \frac{8}{9} (1-\tau )-\sqrt{\tau \beta } \right] \right) ^k \mu _0 n \le \varepsilon \end{aligned}$$

is satisfied. Taking the logarithm of both sides, we obtain

$$\begin{aligned} k \log \left[ 1- \sqrt{\frac{\beta \tau }{n}} \left( \frac{8}{9} (1-\tau )-\sqrt{\tau \beta } \right) \right] + \log (\mu _0 n) \le \log \varepsilon .\end{aligned}$$

Using the inequality \(-\log (1-\vartheta ) \ge \vartheta \), we can require the fulfillment of the stronger inequality

$$\begin{aligned}-k \sqrt{\frac{\beta \tau }{n}} \left( \frac{8}{9} (1-\tau )-\sqrt{\tau \beta } \right) + \log (\mu _0 n) \le \log \varepsilon . \end{aligned}$$

The last inequality is satisfied when

$$\begin{aligned} k \ge \sqrt{\frac{n}{\beta \tau }} \frac{1}{\frac{8}{9} (1-\tau )-\sqrt{\tau \beta }} \log \left( \frac{{\mathbf {x}}_0^T {\mathbf {s}}_0}{\varepsilon } \right) , \end{aligned}$$

and this proves the statement. \(\square \)
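For the concrete parameter choice of Theorem 1, the bound can be made explicit: with \(\beta =\tau =\frac{1}{8}\), we have \(\sqrt{\frac{n}{\beta \tau }}=8\sqrt{n}\) and \(\frac{8}{9}(1-\tau )-\sqrt{\tau \beta }=\frac{7}{9}-\frac{1}{8}=\frac{47}{72}\), so

$$\begin{aligned} k \ge \frac{576}{47}\, \sqrt{n}\, \log \left( \frac{{\mathbf {x}}_0^T {\mathbf {s}}_0}{\varepsilon } \right) \approx 12.26\, \sqrt{n}\, \log \left( \frac{{\mathbf {x}}_0^T {\mathbf {s}}_0}{\varepsilon } \right) \end{aligned}$$

iterations suffice.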

In the analysis, we applied the fixed step lengths \(\alpha _1=\sqrt{\frac{\beta \tau }{n}}\), \(\alpha _2=1\). When describing the IPA in Algorithm 1, we chose \(\alpha _1\) as the largest value so that the new iterate remains in the neighborhood \({\mathcal {W}}(\tau ,\beta )\). Since the duality gap is strictly decreasing in \(\alpha _1\) and \(\sqrt{\frac{\beta \tau }{n}}\) is a lower bound on the value of \(\alpha _1\) in Algorithm 1, its complexity is at least as good as the analyzed case, i.e., the derived complexity result holds for Algorithm 1 as well.

Table 1 Size of the selected LP instances before and after preprocessing and the time required for the preprocessing and postsolving procedures
Table 2 Numerical results for the selected Netlib LP instances

As can be seen from Theorem 1, the investigated method can produce an \(\varepsilon \)-optimal solution to LP problems in polynomial time. There are different results in the literature on rounding the solutions provided by an IPA to an exact solution in polynomial time, see, e.g., Mehrotra and Ye (1993), Roos et al. (1997).

6 Numerical results

To illustrate that the method can be applied to solve LP problems in practice, we implemented it in Matlab and solved selected linear programming problem instances from the Netlib library (Gay 1985). The numerical experiments were carried out on a Dell laptop with an Intel i7 processor and 16 GB RAM.

First, we transformed the problems to the standard form, then eliminated the redundant constraints using the procedure eliminateRedundantRows.m by Ploskas and Samaras (2017). After these reformulations, we applied a method similar to the procedure CLEAN of Adler et al. (1989) to eliminate variables with fixed values from the linear programming problems.

To be able to give strictly feasible initial points in the neighborhood \({\mathcal {W}}(\tau ,\beta )\), we first transformed the problems into symmetric form and then applied the self-dual embedding technique (Ye et al. 1994). To avoid doubling the number of constraints in the first transformation step, we carried out this reformulation according to the last Remark of Jansen et al. (1994, p. 232).

The numbers of rows and columns of the original LP problems (in standard form) are denoted by \(m_0\) and \(n_0\), respectively, while the sizes after the reformulations and the embedding procedure are denoted by m and n. These are shown in the second to fifth columns of Table 1. We note that \(m_0\) and \(n_0\) differ from the number of rows and columns given on the Netlib site, since the original formulation of the Netlib LP problems possibly contains lower and upper bounds on the variables. In these cases, the sizes were modified when we reformulated the problems in standard form. The times required to clean and embed the problem (preprocessing) and retrieve the solution of the original optimization problem (postsolve) are also shown in Table 1, in the columns "Prep. (s)" and "Posts. (s)".

For the embedded problem, we may choose \({\mathbf {x}}={\mathbf {e}}\) and \({\mathbf {s}}={\mathbf {e}}\) as proper initial points, since they are strictly feasible and are included in the neighborhood \({\mathcal {W}}(\tau ,\beta )\). The step lengths \(\alpha _1\) and \(\alpha _2\) were calculated in the following greedy way. We fixed the value of \(\alpha _2\) as 1 and determined the largest value \(\alpha _1\) so that the new point \(({\mathbf {x}}(\alpha ), {\mathbf {y}}(\alpha ), {\mathbf {s}}(\alpha ))\) remains in the neighborhood \({\mathcal {W}}(\tau ,\beta )\).
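A minimal sketch of this greedy rule (the function name is ours; the bisection assumes that the step lengths keeping the iterate in the neighborhood form an interval \([0,\bar{\alpha }]\)):

```python
import numpy as np

def greedy_alpha1(x, s, dx_m, ds_m, dx_p, ds_p, tau, beta, tol=1e-8):
    # Largest alpha_1 in [0, 1] (with alpha_2 = 1 fixed) keeping the new
    # point strictly positive and inside W(tau, beta), found by bisection.
    def in_W(a1):
        xn = x + a1 * dx_m + dx_p
        sn = s + a1 * ds_m + ds_p
        if np.any(xn <= 0) or np.any(sn <= 0):
            return False
        mu = xn @ sn / xn.size
        v = np.sqrt(xn * sn / (tau * mu))
        if not np.all(v > 0.5):
            return False
        p = 2 * (v - v**2) / (2 * v - 1)
        return np.linalg.norm(np.maximum(p, 0.0)) <= beta
    lo, hi = 0.0, 1.0
    if in_W(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if in_W(mid) else (lo, mid)
    return lo
```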

We compared three variants of Algorithm 1, based on the functions \(\varphi (t)=t\), \(\varphi (t)=\sqrt{t}\) and \(\varphi (t)=t-\sqrt{t}\). The first IPA is a moderately modified version of the original method of Ai and Zhang (2005) (we used a slightly different neighborhood definition \(\mathcal {W}(\tau ,\beta )\)). The second case is the IPA proposed by Darvay and Takács (2018). The third IPA is the algorithm introduced in this paper.

The value of the precision parameter \(\varepsilon \) was \(10^{-6}\). The number of iterations and the running time (in seconds) required to achieve this precision (i.e., to find a point for which the duality gap is less than \(\varepsilon \)) for the different algorithm variants are shown in Table 2.

According to our numerical results, there is no significant difference in the performance of the three algorithms for linear programming problems; however, the second variant is moderately better on this test problem set, both in terms of average number of iterations and average running time. It can also be observed that the new variant performs slightly better than the algorithm based on the function \(\varphi (t)=t\).

7 Conclusion

We investigated a new long-step IPA based on the algebraic equivalent transformation technique, using the function \(\varphi (t)=t-\sqrt{t}\) and a new Ai-Zhang-type wide neighborhood \({\mathcal {W}}(\tau ,\beta )\).

We proved that the algorithm is well-defined and provides an \(\varepsilon \)-optimal solution in at most \(O \left( \sqrt{n} \log \left( \frac{{\mathbf {x}}_0^T {\mathbf {s}}_0}{\varepsilon } \right) \right) \) steps, therefore, it has the same theoretical complexity as the best short-step variants. According to our preliminary numerical results, the new algorithm performs well in practice.

To extend our results, we would like to propose a similar long-step algorithm for \(\mathcal {P}_*(\kappa )\) linear complementarity problems, based on the function \(\varphi (t)=t-\sqrt{t}\). In that setting, we expect the choice of the function \(\varphi \) to make a significant difference in the performance of the different variants.

Another interesting question for further research is investigating an infeasible variant of the proposed IPA to avoid applying the self-dual embedding technique when determining the starting points.