1 Introduction

We consider the linear optimization (LO) problem in standard format, given by (P), and its dual problem given by (D) (see Sect. 3). We assume that the problems (P) and (D) are feasible. As is well known, this implies that both problems have optimal solutions and the same optimal value. In Sect. 8, we discuss how infeasibility and/or unboundedness of (P) and (D) can be detected.

Interior-point methods (IPMs) for LO are divided into two classes: feasible interior-point methods (FIPMs) and infeasible interior-point methods (IIPMs). FIPMs assume that a strictly feasible primal-dual point is available from which the algorithm can start immediately. Several initialization methods have been proposed to obtain such a point, e.g., by Megiddo [1] and Anstreicher [2]. The so-called big-M method of Megiddo has not been well received, due to the numerical instabilities caused by the use of huge coefficients. A disadvantage of the so-called combined phase-I and phase-II method of Anstreicher is that the feasible region must be nonempty and a lower bound on the optimal objective value must be known.

IIPMs have the advantage that they can start with an arbitrary point and try to achieve feasibility and optimality simultaneously. It is worth mentioning that the best-known iteration bound for IIPMs is due to Mizuno [3]. We refer to, e.g., [4] for a survey of IIPMs.

Recently, the second author proposed an IIPM with full-Newton steps [4]. Each main iteration of that algorithm improves optimality and feasibility simultaneously by a constant factor, until a solution with a given accuracy is obtained. The amount of reduction per iteration is, however, too small for practical purposes. The aim of this paper is to design a more aggressive variant of the algorithm, one that improves optimality and feasibility faster; this is indeed what happens in practice. As we will see, however, our algorithm suffers from the same irony as FIPMs, namely that the theoretical convergence rate of large-update methods is worse than that of full-Newton methods.

In the analysis of our algorithm, we use a so-called kernel-function-based barrier function. Any such barrier function is determined by a univariate function, called its kernel function. Such functions can be found in [5] and are closely related to the so-called self-regular functions introduced in [6]. In these references only FIPMs are considered, and it is shown that these functions are much more efficient for the process of re-centering, which is a crucial part of every FIPM, especially when an iterate is far from the central path. Not surprisingly, it turned out that these functions are also useful in our algorithm, where re-centering is again a crucial ingredient.

The paper is organized as follows. In Sect. 2, we describe the notations which we use throughout the paper. In Sect. 3, we explain the properties of a kernel function-based barrier function. In Sect. 4, as a preparation to our large neighborhood-based IIPM, we briefly recall the use of kernel-based barrier functions in large-update FIPMs, as presented in [5]. It will become clear in this section that the convergence rate highly depends on the underlying kernel function.

In Sect. 5, we describe our algorithm in detail. In our description we use a search direction based on a general kernel function. The algorithm uses two types of damped Newton steps: a so-called feasibility step and a number of centering steps. The feasibility step serves to reduce the infeasibility, whereas the centering steps keep the infeasibility fixed but improve the optimality. This procedure is repeated until a solution with sufficient feasibility and optimality is obtained. Though many parts of our analysis are valid for a general kernel function, at some places we restrict ourselves to a specific kernel function. We show that the complexity of our algorithm based on this kernel function is slightly worse than the complexity of the algorithm introduced and discussed by Salahi et al. [7], who used another variant of this function. Note that the best-known iteration bound for IIPMs that use a large neighborhood of the central path is due to Potra and Stoer [8], for a class of superlinearly convergent IIPMs for sufficient linear complementarity problems (LCPs).

2 Notations

We use the following notational conventions. If \(x, s \in \mathbb {R}^n\), then xs denotes the coordinatewise (or Hadamard) product of the vectors x and s. The nonnegative orthant and positive orthant are denoted as \(\mathbb {R}^n_+\) and \(\mathbb {R}^n_{++}\), respectively. If \(z\in \mathbb {R}_+^n\) and \(f: \mathbb {R}_+\rightarrow \mathbb {R}_+\), then \(f\left( z\right) \) denotes the vector in \(\mathbb {R}_+^n\) whose i-th component is \(f\left( z_i\right) \), with \(1\le i \le n\). We write \(f(x)=O(g(x))\) if \(f(x)\le cg(x)\) for some positive constant c and all x in the domain of f. For \(z\in \mathbb {R}^n\), we denote the \(l_1\)-norm by \(\left\| z\right\| _1\) and the Euclidean norm by \(\left\| z\right\| \).

3 Barrier Functions Based on Kernel Functions

Let the primal LO problem be defined as

$$\begin{aligned} \min \left\{ c^Tx : Ax=b,\quad x \ge 0\right\} , \end{aligned}$$
(P)

and its dual problem as

$$\begin{aligned} \max \left\{ b^Ty : A^Ty+s=c,\quad s \ge 0\right\} . \end{aligned}$$
(D)

Here, A is a full row rank matrix in \(\mathbb {R}^{m\times n}\), \(b,y\in \mathbb {R}^m\), and \(c,x,s\in \mathbb {R}^n\); x, y and s are the vectors of variables.

It is well known that the triple \(\left( x,y,s\right) \) is optimal for (P) and (D) if and only if

$$\begin{aligned} Ax= & {} b,\quad x\ge 0,\nonumber \\ A^Ty + s= & {} c,\quad s\ge 0,\nonumber \\ xs= & {} 0. \end{aligned}$$
(1)

The first two equations require primal and dual feasibility, respectively, and \(xs = 0\) is the so-called complementarity condition. Note that since x and s are nonnegative, \(xs = 0\) holds if and only if \(x^Ts=0\). Therefore, since the feasibility conditions imply \(x^Ts = c^Tx-b^Ty\), the complementarity condition expresses the fact that at optimality the primal and dual objective values coincide.

Interior-point methods (IPMs) replace the complementarity condition by \(xs=\mu {\mathbf{e}},\) where \(\mu >0\) and \({\mathbf{e}}\) denotes the all-one vector. This gives rise to the following system:

$$\begin{aligned} Ax= & {} b,\quad x\ge 0,\nonumber \\ A^Ty + s= & {} c,\quad s\ge 0,\nonumber \\ xs= & {} \mu {\mathbf{e}}, \end{aligned}$$
(2)

where \(xs=\mu {\mathbf{e}}\) is called the centering condition. It has been established (see, e.g., [9, Theorem II.4]) that the system (2) has a solution for some \(\mu >0\) if and only if it has a solution for every \(\mu >0\). Moreover, this holds if and only if (P) and (D) satisfy the interior-point condition (IPC), i.e., there exist a feasible \(x>0\) for (P) and a feasible (y,s) with \(s>0\) for (D). If the matrix A has full row rank, then this solution is unique. It is denoted by \(\left( x(\mu ),y(\mu ),s(\mu )\right) \) and called the \(\mu \)-center of (P) and (D). The \(\mu \)-centers form a virtual path inside the feasible region leading to an optimal solution of (P) and (D); this path is called the central path of (P) and (D). As \(\mu \) is driven to zero, the central path converges to an optimal triple (x,y,s) of (P) and (D).

We proceed by showing that the \(\mu \)-center can be characterized as the minimizer of a suitably chosen primal-dual barrier function. In fact we will define a wide class of such barrier functions, each of which is determined by a kernel function. A kernel function is just a univariate nonnegative function \(\psi (t)\), where \(t>0\), which is strictly convex, minimal at \(t=1\) and such that \(\psi (1)=0\), whereas \(\psi (t)\) goes to infinity both when t goes to zero and when t goes to infinity.

Now let (x,y,s) be a feasible triple with \(x>0\) and \(s>0\). We call such a triple strictly feasible. We define the variance vector for this triple with respect to \(\mu \) as follows:

$$\begin{aligned} v := \sqrt{\frac{xs}{\mu }}. \end{aligned}$$

Observe that \(v={\mathbf{e}}\) holds if and only if (x,y,s) is the \(\mu \)-center of (P) and (D). Given any kernel function \(\psi \), we extend its definition to \(\mathbb {R}^n_{++}\) according to

$$\begin{aligned} \varPsi (v):=\sum \limits _{i=1}^n\psi (v_i). \end{aligned}$$
(3)

It is obvious that \(\varPsi (v)\) is nonnegative everywhere and \(\varPsi ({\mathbf{e}})=0\). Now we can define a barrier function \(\varPhi (x,s,\mu )\) as follows:

$$\begin{aligned} \varPhi (x,s,\mu ) := \varPsi (v), \end{aligned}$$
(4)

where v is the variance vector of (x,y,s) with respect to \(\mu \). It is now obvious that \(\varPhi (x,s,\mu )\) is well defined, nonnegative for every strictly feasible triple, and moreover,

$$\begin{aligned} \varPhi (x,s,\mu )=0 \Leftrightarrow \varPsi (v) = 0 \Leftrightarrow v = {\mathbf{e}}\Leftrightarrow (x,y,s) = (x(\mu ),y(\mu ),s(\mu )). \end{aligned}$$

This implies that \((x(\mu ),y(\mu ),s(\mu ))\) is the (unique) minimizer of \(\varPhi (x,s,\mu )\).

As in [5], we call the kernel function \(\psi \) eligible iff it satisfies the following technical conditions.

$$\begin{aligned} t\psi ^{\prime \prime }(t)+\psi ^{\prime }(t)> & {} 0,\,\quad t< 1,\\ t\psi ^{\prime \prime }(t)-\psi ^{\prime }(t)> & {} 0,\,\quad t> 1,\\ \psi ^{\prime \prime \prime }(t)< & {} 0,\,\quad t>0,\\ 2\psi ^{\prime \prime }(t)^2 -\psi ^{\prime }(t)\psi ^{\prime \prime \prime }(t)> & {} 0,\,\quad t< 1. \end{aligned}$$

In the sequel it is always assumed that \(\psi \) is an eligible kernel function. Properties of eligible kernel functions will be recalled from [5] without repeating their proofs.

We would like to mention that the kernel functions \(\psi \) introduced and discussed in [5] have the following format:

$$\begin{aligned} \psi (t) = \tfrac{t^2-1}{2} + \psi _b(t),\,\quad t>0. \end{aligned}$$

The term \(\tfrac{t^2-1}{2}\) is called the growth term; it goes to infinity as t goes to infinity. The term \(\psi _b\), which goes to infinity as t tends to zero, is called the barrier term of \(\psi \).
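For instance, for the classical logarithmic kernel function

$$\begin{aligned} \psi _1(t) = \frac{t^2-1}{2} - \log t,\,\quad t>0, \end{aligned}$$

the barrier term is \(\psi _b(t) = -\log t\); this is the kernel function underlying the classical primal-dual logarithmic barrier method.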

4 A Class of Large-Update FIPMs for LO

In this section we recall from [5] some results for a large-update FIPM for solving (P) and (D) using a kernel function-based barrier function.

Let us start with a definition. We call a triple (x,y,s), with \(x>0,\,s>0,\) an \(\varepsilon \)-solution of (P) and (D) iff the duality gap \(x^Ts\) and the norms of the residual vectors \(r_b:= b - Ax\) and \(r_c:= c-A^Ty - s\) do not exceed \(\varepsilon \). In other words, defining \(\epsilon (x,y,s):=\max \{x^Ts,\Vert r_b\Vert ,\Vert r_c\Vert \},\) we say that (x,y,s) is an \(\varepsilon \)-solution iff \(\epsilon (x,y,s)\le \varepsilon .\)

We assume, without any loss of generality, that the triple

$$\begin{aligned} (x^0,y^0,s^0) = ({\mathbf{e}},0,{\mathbf{e}}), \end{aligned}$$
(5)

is a primal-dual feasible solution. We then have \(x^0s^0=\mu ^0{\mathbf{e}}\) for \(\mu ^0=1\). This means that \((x^0,y^0,s^0)\) is the 1-center, and hence \(\varPhi (x^0,s^0,\mu ^0)=0\). We use this triple to initialize our algorithm.

Each main (or outer) iteration of the algorithm starts with a strictly feasible triple (x,y,s) that satisfies \(\varPhi (x,s,\mu )\le \tau \) for some \(\mu \in ]0,1]\), where \(\tau \) is a fixed positive constant. It then constructs a new triple \((x^+,y^+,s^+)\) such that \(\varPhi (x^+,s^+,\mu ^+)\le \tau \) with \(\mu ^+<\mu \). When taking \(\tau \) small enough, we obtain in this way a sequence of strictly feasible triples that belong to small neighborhoods of a sequence of \(\mu \)-centers, for a decreasing sequence of \(\mu \)’s. As a consequence, the sequence of constructed triples (x,y,s) converges to an optimal solution of (P) and (D).

We will assume that \(\mu ^+ = (1-\theta )\mu \), where \(\theta \in ]0,1[\) is a fixed constant, e.g., \(\theta =0.5\) or \(\theta =0.99\). The larger \(\theta \) is, the more aggressive the algorithm. In particular, when \(\theta \) is large, each outer iteration will require several so-called inner iterations.

It is easy to compute the number of outer iterations. Using \(\mu ^0=1\), one has the following result.

Lemma 4.1

(cf. [9, Lemma II.17]) The algorithm needs at most

$$\begin{aligned} \left\lceil \frac{1}{\theta }\log \frac{n}{\varepsilon }\right\rceil , \end{aligned}$$

outer iterations to generate a strictly feasible \(\varepsilon \)-solution.
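For instance, for \(\theta =0.5\), \(n=10^4\) and \(\varepsilon =10^{-6}\), this bound evaluates to \(\left\lceil 2\log 10^{10}\right\rceil = 47\) outer iterations (with \(\log \) denoting the natural logarithm, as elsewhere in this paper).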

The main task is therefore to get a sharp upper estimate for the number of inner iterations during an outer iteration. We now describe how such an estimate is obtained. We go into some detail, though without repeating proofs, because the results that we recall below are relevant for the IIPM that we discuss in the next section.

As said before, at the start of each outer iteration we have a strictly feasible triple (x,y,s) and \(\mu >0\) such that \(\varPhi (x,s,\mu ) \le \tau \). We first need to estimate the increase in \(\varPhi \) when \(\mu \) is updated to \(\mu ^+ = (1-\theta )\mu \). For this we need the following lemma.

Lemma 4.2

(cf. [5, Theorem 3.2]) Let \(\rho :[0,\infty [\rightarrow [1,\infty [\) be the inverse function of \(\psi (t)\) for \(t\ge 1\). Then we have for any positive vector v and any \(\beta \ge 1\):

$$\begin{aligned} \varPsi (\beta v)\le n\psi \left( \beta \rho \left( \frac{\varPsi (v)}{n}\right) \right) . \end{aligned}$$

Now let v be the variance vector of (x,y,s) with respect to \(\mu \). Then one easily verifies that the variance vector \(v^+\) of (x,y,s) with respect to \(\mu ^+\) is given by \(v^+:=v/\sqrt{1-\theta }\). Hence, using Lemma 4.2 with \(\beta = 1/\sqrt{1-\theta }\) we may write

$$\begin{aligned} \varPhi (x,s,\mu ^+) = \varPsi (v^+) = \varPsi \left( \frac{v}{\sqrt{1-\theta }}\right) \le n\psi \left( \frac{\rho \left( \frac{\varPsi (v)}{n}\right) }{\sqrt{1-\theta }}\right) \le n\psi \left( \frac{\rho \left( \frac{\tau }{n}\right) }{\sqrt{1-\theta }}\right) , \end{aligned}$$

where the last inequality holds because \(\rho \) is monotonically increasing and \(\varPsi (v)=\varPhi (x,s,\mu ) \le \tau \). Hence the number \(\bar{\tau }\), defined by

$$\begin{aligned} \bar{\tau } := n\psi \left( \frac{\rho \left( \frac{\tau }{n}\right) }{\sqrt{1-\theta }}\right) , \end{aligned}$$
(6)

is an upper bound for the value of \(\varPsi \) after a \(\mu \)-update. Note that this bound does not depend on the triple (x,y,s), but only on the kernel function \(\psi \) and the parameters \(n,\tau \) and \(\theta \).

To simplify the notation we redefine \(\mu \) according to \(\mu :=\mu ^+\). Thus we need to deal with the following question: given a triple (x,y,s) such that \(\varPhi (x,s,\mu ) \le \bar{\tau }\), how many inner iterations are needed to generate a triple (x,y,s) such that \(\varPhi (x,s,\mu ) \le \tau \)? To answer this question we have to describe an inner iteration. It has been argued in Section 2.2 of [5] that it is natural to define search directions \((\varDelta x,\varDelta y,\varDelta s)\) by the system

$$\begin{aligned} A\varDelta x= & {} 0,\\ A^T\varDelta y + \varDelta s= & {} 0,\\ s\varDelta x + x\varDelta s= & {} -\mu v\nabla \varPsi (v). \end{aligned}$$

This system has a unique solution. It may be worth pointing out that if \(\psi (t) = \psi _1(t)\), the logarithmic kernel function introduced in Sect. 3, then \(-\mu v\nabla \varPsi (v)=\mu {\mathbf{e}}-xs\), and hence the resulting direction is the primal-dual Newton direction that is used in all primal-dual FIPMs. By performing a line search in this direction with respect to \(\varPsi \) we get new iterates

$$\begin{aligned} (x,y,s):= (x,y,s) + \alpha (\varDelta x,\varDelta y,\varDelta s), \end{aligned}$$

where \(\alpha \) is the step size. According to [5, Lemma 4.4], we use below the following default step size:

$$\begin{aligned} \alpha =\frac{1}{\psi ^{\prime \prime }(\varrho (2\delta (v)))}, \end{aligned}$$

where \(\varrho \) is the inverse function of \(-\tfrac{1}{2}\psi ^{\prime }(t)\), and

$$\begin{aligned} \delta (v) := \frac{1}{2}\left\| \nabla \varPsi (v)\right\| . \end{aligned}$$

Fig. 1 shows a formal description of the algorithm.

Fig. 1 Large-update FIPM

The closeness of (x,y,s) to the \(\mu \)-center is measured by \(\varPsi (v)\), where v is the variance vector of (x,y,s) with respect to the current value of \(\mu \). The initial triple (x,y,s) is given by (5), with \(\mu =1\), so at the start we have \(\varPsi (v)=0 \le \tau \). After a \(\mu \)-update we have \(\varPsi (v) \le \bar{\tau }\). Then a sequence of inner iterations is performed to restore the inequality \(\varPsi (v)\le \tau \), after which \(\mu \) is updated again, and so on. This process is repeated until \(n\mu \) falls below the accuracy parameter \(\varepsilon \), at which point we have obtained an \(\varepsilon \)-solution.
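To make the inner iteration concrete, the following minimal sketch (the function names are ours) implements one damped Newton step for the logarithmic kernel function \(\psi _1\), for which \(\varrho \), the inverse of \(-\tfrac{1}{2}\psi ^{\prime }\), has the closed form \(\varrho (s)=\sqrt{s^2+1}-s\). It uses dense linear algebra and is meant only to illustrate the formulas of this section.

```python
import numpy as np

def psi(t):     # logarithmic kernel psi_1(t) = (t^2 - 1)/2 - log t
    return (t**2 - 1)/2 - np.log(t)

def dpsi(t):    # psi'(t)
    return t - 1/t

def ddpsi(t):   # psi''(t)
    return 1 + 1/t**2

def varrho(s):  # inverse of -psi'(t)/2: positive root of t^2 + 2*s*t - 1 = 0
    return np.sqrt(s**2 + 1) - s

def inner_iteration(A, x, y, s, mu):
    """One inner iteration: a damped Newton step with the default step size."""
    m, n = A.shape
    v = np.sqrt(x*s/mu)                  # variance vector
    Psi_v = psi(v).sum()                 # barrier value; inner loop runs while Psi_v > tau
    grad = dpsi(v)                       # coordinates of nabla Psi(v)
    delta = 0.5*np.linalg.norm(grad)     # proximity measure delta(v)
    # Newton system: A dx = 0, A^T dy + ds = 0, s dx + x ds = -mu v psi'(v)
    K = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                  [np.zeros((n, n)), A.T, np.eye(n)],
                  [np.diag(s), np.zeros((n, m)), np.diag(x)]])
    rhs = np.concatenate([np.zeros(m + n), -mu*v*grad])
    sol = np.linalg.solve(K, rhs)
    dx, dy, ds = sol[:n], sol[n:n+m], sol[n+m:]
    alpha = 1/ddpsi(varrho(2*delta))     # default step size
    return x + alpha*dx, y + alpha*dy, s + alpha*ds, Psi_v
```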

To estimate the number of inner iterations we proceed as follows. Denoting the decrease in the value of \(\varPsi \) as \(\varDelta \varPsi \), it was shown in [5, Theorem 4.6] that

$$\begin{aligned} \varDelta \varPsi \ge \alpha \delta (v)^2 = \frac{\delta (v)^2}{\psi ^{\prime \prime }(\varrho (2\delta (v)))}. \end{aligned}$$
(7)

Since the kernel function \(\psi \) is eligible, the last expression is increasing in \(\delta (v)\) [5, Lemma 4.7]. Besides, by [5, Theorem 4.9], \(\delta (v)\) is bounded from below as follows:

$$\begin{aligned} \delta (v)\ge \tfrac{1}{2}\psi ^{\prime }\left( \rho \left( \varPsi (v)\right) \right) . \end{aligned}$$
(8)

Combining (7) and (8), we arrive at

$$\begin{aligned} \varDelta \varPsi \ge \frac{(\psi ^{\prime }\left( \rho \left( \varPsi (v)\right) \right) )^2}{4\psi ^{\prime \prime }\left( \varrho \left( \psi ^{\prime }\left( \rho \left( \varPsi \left( v\right) \right) \right) \right) \right) }. \end{aligned}$$
(9)

Following [5], let \(\gamma \) be the smallest number such that

$$\begin{aligned} \varPsi (v)^{\gamma -1}\frac{(\psi ^{\prime }\left( \rho \left( \varPsi (v)\right) \right) )^2}{4\psi ^{\prime \prime }\left( \varrho \left( \psi ^{\prime }\left( \rho \left( \varPsi \left( v\right) \right) \right) \right) \right) } \ge \kappa \end{aligned}$$
(10)

for some positive constant \(\kappa \), whenever \(\varPsi (v)\ge \tau \). In [10], it is established that such constants \(\kappa \) and \(\gamma \) exist for each kernel function. Denoting the value of the barrier function after the \(\mu \)-update as \(\varPsi _0\) and the value after the k-th inner iteration as \(\varPsi _k\), it follows from (9) and (10) that

$$\begin{aligned} \varPsi _0\le \bar{\tau },\quad \varPsi _k \le \varPsi _{k-1} - \kappa \varPsi _{k-1}^{1-\gamma },\quad k=1,2,\ldots , \end{aligned}$$
(11)

with \(\bar{\tau }\) as in (6). At this stage we may point out why the use of kernel functions other than the logarithmic kernel function may be advantageous. The reason is that if \(\psi \) is the logarithmic kernel function, then \(\gamma =1\), whence we obtain \(\varPsi _k \le \varPsi _{k-1} - \kappa \) for each \(k\ge 1\), provided that \(\varPsi _{k-1}\ge \tau \). This reflects the well-known fact that the best lower bound for the decrease in the logarithmic barrier function is a fixed constant, no matter what the value of \(\varPsi (v)\) is. As we will see, smaller values of \(\gamma \) can be obtained for other kernel functions, which leads to larger reductions in the barrier function value, and hence to lower iteration numbers.

By [5, Lemma 5.1], (11) implies that the number of inner iterations will not exceed

$$\begin{aligned} \frac{\bar{\tau }^{\gamma }}{\kappa \gamma } = \frac{1}{\kappa \gamma }\left( n\psi \left( \frac{\rho \left( \frac{\tau }{n}\right) }{\sqrt{1-\theta }}\right) \right) ^\gamma . \end{aligned}$$
(12)

Multiplying this number by the number of outer iterations, as given by Lemma 4.1, we obtain the following upper bound for the total number of iterations:

$$\begin{aligned} \frac{1}{\theta \kappa \gamma }\left( n\psi \left( \frac{\rho \left( \frac{\tau }{n}\right) }{\sqrt{1-\theta }}\right) \right) ^\gamma \log \frac{n}{\varepsilon }. \end{aligned}$$

Given a kernel function \(\psi \), it is now straightforward to compute the resulting iteration bound from this expression.

Let now \(\psi (t) = \psi _q(t)\), where

$$\begin{aligned} \psi _q(t):= \frac{t^2-1}{2} + \frac{t^{1-q}-1}{q-1},\quad \text {with}\quad q>1. \end{aligned}$$

This function was introduced and discussed in [11], where it is established that \(\gamma = \tfrac{q+1}{2q}\) and \(\kappa = \tfrac{1}{124\,q}\). As a result, the iteration number turns out to be

$$\begin{aligned} O\left( qn^{\tfrac{q+1}{2q}}\log \tfrac{n}{\varepsilon }\right) . \end{aligned}$$

The expression \(qn^{\tfrac{q+1}{2q}}\) is minimal at \(q = \tfrac{1}{2}\log n\) and then it is equal to \(\tfrac{1}{2} e \log n \sqrt{n}\). Hence we obtain the iteration bound

$$\begin{aligned} O\left( \sqrt{n}\log n \log \tfrac{n}{\varepsilon }\right) , \end{aligned}$$

for the algorithm, which is the best-known iteration bound for large-update FIPMs based on kernel functions. It should be mentioned that the best-known iteration bound for the FIPMs that use a large neighborhood of the central path is due to Potra [12] who designed a superlinearly convergent predictor–corrector algorithm for linear complementarity problems that has an \(O(\sqrt{n}L)\) iteration bound, with L denoting the length of a binary string encoding the problem data.

In this paper, we consider an IIPM based on the use of a kernel function. Although many of the results below hold for any eligible kernel function, we will concentrate on the kernel function \(\psi _q\).

5 A Class of Large-Update IIPMs for LO

As usual in the theoretical analysis of IIPMs, we take the initial iterates as follows:

$$\begin{aligned} \left( x^0,y^0,s^0\right) = \zeta ({\mathbf{e}},0,{\mathbf{e}}), \end{aligned}$$
(13)

where \({\mathbf{e}}\) denotes the all-one vector in \(\mathbb {R}^n\), and \(\zeta \) is a number such that

$$\begin{aligned} \left\| x^*+s^*\right\| _\infty \le \zeta , \end{aligned}$$
(14)

for some optimal solutions \(\left( x^*,y^*,s^*\right) \) of (P) and (D). It is worth noting that if the data A, b and c are integral and L denotes the length of a binary string encoding the triple (A,b,c), then it is well known that there exist optimal solutions \(x^*\) of (P) and \((y^*,s^*)\) of (D) that satisfy (14) with \(\zeta =2^L\) [4, Section 4.7].

Following [4], for any \(0<\nu \le 1\) we consider the primal-dual perturbed pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)), defined as follows:

$$\begin{aligned} \min \left\{ \left( c-\nu r_c^0\right) ^Tx \,:\, Ax=b-\nu r_b^0,\quad x \ge 0\right\} , \end{aligned}$$
(\(\hbox {P}_\nu \))

and

$$\begin{aligned} \max \left\{ \left( b-\nu r_b^0\right) ^Ty \,:\, A^Ty+s=c-\nu r_c^0,\quad s \ge 0\right\} , \end{aligned}$$
(\(\hbox {D}_\nu \))

where \( r_b^0\) and \( r_c^0\) denote the initial primal and dual residual vectors, respectively. Note that, if \(\nu =0\), then the perturbed pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) coincides with the original pair (P) and (D). Moreover, \((x^0,y^0,s^0)\) is strictly feasible for the perturbed problems (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) if \(\nu = 1\). In other words, if \(\nu = 1\), then (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfy the IPC and since \(x^0s^0 = \mu ^0{\mathbf{e}}\), with \(\mu ^0 = \zeta ^2\), the \(\mu ^0\)-center of \((P_1)\) and \((D_1)\) is given by \(\left( x^0,y^0,s^0\right) \). The following Lemma is crucial.

Lemma 5.1

(cf. [13, Theorem 5.13]) The original problems (P) and (D) are feasible if and only if for each \(\nu \) satisfying \(0 < \nu \le 1\) the perturbed problems (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfy the IPC.

Hence, for any \(\nu \in ]0,1]\), the problems (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfy the IPC. This implies that the central path of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) exists. The \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) is uniquely determined by the system

$$\begin{aligned} b - Ax= & {} \nu r_b^0, \qquad x\ge 0,\\ c - A^Ty - s= & {} \nu r_c^0,\qquad s\ge 0,\\ xs= & {} \mu {\mathbf{e}}. \end{aligned}$$

In the sequel, the parameters \(\nu \) and \(\mu \) will always satisfy \(\mu =\nu \zeta ^2\). Therefore, we feel free to denote the \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) as \(\left( x(\nu ),y(\nu ),s(\nu )\right) \). The set of these \(\mu \)-centers, as \(\nu \) runs through the interval ]0, 1], is called the homotopy path of (P) and (D) with respect to \(\zeta \). By driving \(\nu \) to zero, the homotopy path converges to an optimal solution of (P) and (D) [14]. Our algorithm starts at the \(\mu \)-center for \(\nu =1\) and then follows this homotopy path to obtain an \(\varepsilon \)-solution of (P) and (D). Note that along the homotopy path the residual vectors are given by \(\nu r_b^0\) and \(\nu r_c^0\), whereas the duality gap equals \(\nu n \zeta ^2\). So these quantities converge to zero at the same rate. As a consequence we have

$$\begin{aligned} \epsilon \left( x(\nu ),y(\nu ),s(\nu )\right) = \nu \epsilon \left( x(1),y(1),s(1)\right) = \nu \epsilon \left( \zeta {\mathbf{e}},0,\zeta {\mathbf{e}}\right) ,\,\quad \nu \in ]0,1]. \end{aligned}$$

5.1 An Outer Iteration of the Algorithm

Just as in the case of large-update FIPMs we use a primal-dual barrier function to measure closeness of the iterates to the \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)). So, just as in Sect. 4, \(\varPsi (v)\) will denote the barrier function based on the kernel function \(\psi (t)\), as given in (3). Here, v denotes the variance vector of a triple (x,y,s) with respect to \(\mu >0\), and we define \(\varPhi (x,s,\mu )\) as in (4). The algorithm is designed in such a way that at the start of each outer iteration we have \(\varPsi (v) \le \tau \) for some threshold value \(\tau = O(1)\). As \(\varPsi (v) = 0\) at the starting point (13), the condition \(\varPsi (v) \le \tau \) is certainly satisfied at the start of the first outer iteration.

Each outer iteration of the algorithm consists of a feasibility step and some centering steps. At the start of the outer iteration we have a triple (x,y,s) that is strictly feasible for (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)), for some \(\nu \in ]0,1]\), and that belongs to the \(\tau \)-neighborhood of the \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)), where \(\mu = \nu \zeta ^2\). We first perform a feasibility step during which we generate a triple \(\left( x^f,y^f,s^f\right) \) that is strictly feasible for the perturbed problems (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), where \(\nu ^+ = (1-\theta )\nu \) with \(\theta \in ]0,1[\), and, moreover, close enough to the \(\mu ^+\)-center of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), with \(\mu ^+ = \nu ^+\zeta ^2\), i.e., \(\varPhi \left( x^f,s^f,\mu ^+\right) \le \tau ^f\), for some suitable value of \(\tau ^f\).

After the feasibility step, we perform some centering steps to get a strictly feasible triple \(\left( x^+,y^+,s^+\right) \) of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)) in the \(\tau \)-neighborhood of the \(\mu ^+\)-center of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), where \(\mu ^+ = (1-\theta )\mu = \nu ^+\zeta ^2\). During the centering steps the iterates stay feasible for (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)). Hence for the analysis of the centering steps we can use the analysis for FIPMs, presented in Sect. 4. From this analysis we derive that the number of centering steps will not exceed

$$\begin{aligned} \frac{\left( \varPhi \left( x^f,s^f,\mu ^+\right) \right) ^{\gamma }}{\kappa \gamma }, \end{aligned}$$

where the parameters \(\gamma \) and \(\kappa \) depend on the kernel function \(\psi \). Hence we are left with the problem of defining a suitable search direction \(\left( \varDelta ^f x, \varDelta ^f y, \varDelta ^f s\right) \) for the feasibility step and of determining \(\theta \) such that after the feasibility step we have \(\varPhi \left( x^f,s^f,\mu ^+\right) \le \tau ^f\) for some suitable value of \(\tau ^f\). The number of outer iterations will be at most \(\tfrac{1}{\theta }\log \left( \frac{\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})}{\varepsilon }\right) \). Thus, the total number of iterations will not exceed

$$\begin{aligned} \frac{(\tau ^f)^\gamma }{\theta \kappa \gamma }\log \tfrac{\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})}{\varepsilon }. \end{aligned}$$
(15)
Fig. 2 Algorithm

5.2 Feasibility Step

For the search direction in the feasibility step, we use the triple \(\left( \varDelta ^f x, \varDelta ^f y, \varDelta ^f s\right) \) that is (uniquely) defined by the following system:

$$\begin{aligned} A\varDelta ^f x= & {} \nu r_b^0, \end{aligned}$$
(16a)
$$\begin{aligned} A^T\varDelta ^f y + \varDelta ^f s= & {} \nu r_c^0,\end{aligned}$$
(16b)
$$\begin{aligned} s\varDelta ^f x+x\varDelta ^f s= & {} 0. \end{aligned}$$
(16c)

Then, defining the new iterates according to

$$\begin{aligned} x^f = x + \theta \varDelta ^f x,\quad y^f =y+\theta \varDelta ^f y,\quad s^f = s + \theta \varDelta ^f s, \end{aligned}$$

we have, due to (16a),

$$\begin{aligned} b-Ax^f = b - A\left( x + \theta \varDelta ^f x\right) = b-Ax - \theta \nu r_b^0 = \nu r_b^0 - \theta \nu r_b^0 = (1-\theta )\nu r_b^0 = \nu ^+ r_b^0. \end{aligned}$$

In the same way one shows that \(c-A^Ty^f - s^f = \nu ^+ r_c^0\). Hence it remains to find \(\theta \) such that \(x^f\) and \(s^f\) are positive and \(\varPhi (x^f,s^f,\mu ^+) \le \tau ^f\). This is the hard part of the analysis of our algorithm, which we defer to Sect. 5.3. We conclude this part with a formal description of the algorithm in Fig. 2, and a graphical illustration in Fig. 3.

The straight lines in Fig. 3 depict the central paths of the pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) and the pair (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)). The \(\tau \)-neighborhoods of the \(\mu \)- and \(\mu ^+\)-centers are shown as dark gray circles. The light gray region is the \(\tau ^f\)-neighborhood of the \(\mu ^+\)-center of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)). The feasibility step is depicted by the first arrow on the right-hand side; the remaining arrows represent the centering steps. The iterates are shown as small circles.

Fig. 3 An illustration of an iteration of the algorithm presented in Fig. 2

5.3 Analysis of the Feasibility Step

The feasibility step starts with a strictly feasible triple (x,y,s) for (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) and \(\mu =\nu \zeta ^2\) such that

$$\begin{aligned} \varPsi (v) \le \tau \quad \text{ with } v:=\sqrt{\frac{xs}{\mu }}. \end{aligned}$$

Our goal is to find \(\theta \) such that after the feasibility step, with step size \(\theta \), the iterates \(\left( x^f,y^f,s^f\right) \) lie in the \(\tau ^f\)-neighborhood of the \(\mu ^+\)-center of the new perturbed pair (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), with \(\tau ^f=O(n)\). This means that \(\left( x^f,y^f,s^f\right) \) is such that

$$\begin{aligned} \varPsi ( v^f) \le \tau ^f\quad \text{ where }\quad v^f:=\sqrt{\frac{x^fs^f}{\mu ^+}},\,\quad \mu ^+ = (1-\theta )\mu . \end{aligned}$$

To proceed, we define scaled feasibility directions \(d_x^f\) and \(d_s^f\) as follows:

$$\begin{aligned} d_x^f:=\frac{v\varDelta ^f x}{x} \text{ and } d_s^f:=\frac{v\varDelta ^f s}{s}. \end{aligned}$$
(17)

Using these notations, we may write

$$\begin{aligned} x^f= & {} x+\theta \varDelta ^f x = x\left( {\mathbf{e}}+ \theta \frac{\varDelta ^f x}{x}\right) = x\left( {\mathbf{e}}+ \theta \frac{d_x^f}{v}\right) = \frac{x}{v}\left( v + \theta d_x^f\right) , \end{aligned}$$
(18)
$$\begin{aligned} s^f= & {} s+\theta \varDelta ^f s = s\left( {\mathbf{e}}+ \theta \frac{\varDelta ^f s}{s}\right) = s\left( {\mathbf{e}}+ \theta \frac{d_s^f}{v}\right) = \frac{s}{v}\left( v + \theta d_s^f\right) . \end{aligned}$$
(19)

This shows that \(x^f\) and \(s^f\) are positive if and only if \(v + \theta d_x^f\) and \(v + \theta d_s^f\) are positive. On the other hand, using (17), we can reformulate (16c) as follows:

$$\begin{aligned} x\varDelta ^f s + s\varDelta ^f x =0 \,\,\Leftrightarrow \,\, \frac{\varDelta ^f s}{s} + \frac{\varDelta ^f x}{x} = 0 \,\,\Leftrightarrow \,\, \frac{v\varDelta ^f s}{s} + \frac{v\varDelta ^f x}{x} = 0 \,\,\Leftrightarrow \,\, d_x^f + d_s^f = 0. \end{aligned}$$

Therefore, \(d_s^f=- d_x^f\). As a consequence, \(x^f\) and \(s^f\) are positive if and only if \(v \pm \theta d_x^f>0\). Since \(v>0\), this is equivalent to \(v^2 - \theta ^2( d_x^f)^2>0\). We conclude that \(x^f\) and \(s^f\) are positive if and only if \(0\le \theta <\theta _{\max }\), where

$$\begin{aligned} \theta _{\max }=\frac{1}{\left\| \frac{ d_x^f}{v}\right\| _\infty }. \end{aligned}$$
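As an illustration, the following sketch (in the same dense-algebra style as the sketch in Sect. 4, with function names that are ours) computes the feasibility step defined by system (16), the scaled direction \(d_x^f\) of (17), and the bound \(\theta _{\max }\) derived above.

```python
import numpy as np

def feasibility_step(A, x, y, s, rb0, rc0, nu, theta, mu):
    """Feasibility step (16), damped by theta; also returns theta_max."""
    m, n = A.shape
    # system (16): A dx = nu*rb0, A^T dy + ds = nu*rc0, s dx + x ds = 0
    K = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                  [np.zeros((n, n)), A.T, np.eye(n)],
                  [np.diag(s), np.zeros((n, m)), np.diag(x)]])
    sol = np.linalg.solve(K, np.concatenate([nu*rb0, nu*rc0, np.zeros(n)]))
    dxf, dyf, dsf = sol[:n], sol[n:n+m], sol[n+m:]
    v = np.sqrt(x*s/mu)
    dxf_scaled = v*dxf/x                        # d_x^f as in (17)
    theta_max = 1/np.max(np.abs(dxf_scaled/v))  # x^f, s^f > 0 iff theta < theta_max
    return x + theta*dxf, y + theta*dyf, s + theta*dsf, theta_max
```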

We now turn to the requirement that \(\varPsi ( v^f) \le \tau ^f\). Using (18), (19) and \(xs=\mu v^2\), we write

$$\begin{aligned} \left( v^f\right) ^2 = \frac{x^fs^f}{\mu ^+} = \frac{\left( v + \theta d_x^f\right) \left( v - \theta d_x^f\right) }{1-\theta } = \frac{v^2 - \theta ^2( d_x^f)^2}{1-\theta }. \end{aligned}$$

Hence, if \(\theta < \theta _{\max }\) then we may write

$$\begin{aligned} v^f = \frac{\hat{v}}{\sqrt{1-\theta }},\,\quad \text{ where }\quad \hat{v} = \sqrt{v^2 - \theta ^2( d_x^f)^2}. \end{aligned}$$

Lemma 5.2

Let \(\theta \) be such that \(1/\sqrt{1-\theta }=O(1)\). Then \(\varPsi (\hat{v})= O(n)\) implies \(\varPsi ( v^f)= O(n)\).

Proof

By Lemma 4.2, we have

$$\begin{aligned} \varPsi ( v^f)\le n\psi \left( \frac{\rho \left( \frac{\varPsi (\hat{v})}{n}\right) }{\sqrt{1-\theta }}\right) . \end{aligned}$$

Let \(\varPsi (\hat{v})= O(n)\). Then \(\varPsi (\hat{v})\le Cn\) for some positive constant C. Hence \(\varPsi (\hat{v})/n\le C\). Recall that \(\rho (s)\ge 1\) for all \(s\ge 0\) and \(\rho (s)\) is monotonically increasing. Furthermore, \(\psi (t)\) is monotonically increasing for \(t\ge 1\). Hence we obtain \(\varPsi ( v^f)\le n\psi \left( {\rho \left( C\right) }/{\sqrt{1-\theta }}\right) \). Since \({1}/{\sqrt{1-\theta }}=O(1)\), the coefficient of n in the above upper bound for \(\varPsi ( v^f)\) does not depend on n. Hence the Lemma follows. \(\square \)

Due to Lemma 5.2 it suffices for our goal to find \(\theta \) such that \(\varPsi ({\hat{v}}) \le {\hat{\tau }}\) where \(\hat{\tau }=O(n)\). In the sequel, we consider \(\varPsi (\hat{v})\) as a function of \(\theta \), denoted as \(f_1(\theta )\). So we have

$$\begin{aligned} f_1(\theta ) := \varPsi (\hat{v}) = \varPsi \left( \sqrt{{v}^2 - \theta ^2( d_x^f)^2}\right) . \end{aligned}$$

We proceed by deriving a tight upper bound for \(f_1(\theta )\), thereby using similar arguments as in [5]. Since the kernel function \(\psi (t)\) is eligible, \(\varPsi (v)\) is e-convex (cf. [5, Lemma 2.1]), whence we have

$$\begin{aligned} f_1(\theta ) \le f(\theta ):= \frac{1}{2}\left[ \varPsi \left( v+\theta d_x^f\right) +\varPsi \left( v-\theta d_x^f\right) \right] . \end{aligned}$$

The first and the second derivatives of \(f(\theta )\) are as follows:

$$\begin{aligned} f^{\prime }(\theta )= & {} \frac{1}{2}\sum _{i=1}^n \left[ \psi ^{\prime }\left( v_i+\theta { d_x^f}_i\right) -\psi ^{\prime }\left( v_i-\theta { d_x^f}_i\right) \right] { d_x^f}_i, \end{aligned}$$
(20)
$$\begin{aligned} f^{\prime \prime }(\theta )= & {} \frac{1}{2}\sum _{i=1}^n \left[ \psi ^{\prime \prime }\left( v_i+\theta { d_x^f}_i\right) +{\psi ^{\prime \prime }\left( v_i-\theta { d_x^f}_i\right) }\right] \left( { d_x^f}_i\right) ^2. \end{aligned}$$
(21)

Since \(\psi ^{\prime \prime \prime }(t) < 0,\, \forall t > 0\), it follows that \(\psi ^{\prime \prime }(t)\) is monotonically decreasing. From this we deduce that

$$\begin{aligned} \psi ^{\prime \prime }\left( v_i+\theta { d_x^f}_i\right) +\psi ^{\prime \prime }\left( v_i-\theta { d_x^f}_i\right) \le 2\psi ^{\prime \prime }\left( v_i - \theta \left| { d_x^f}_i\right| \right) \le 2\psi ^{\prime \prime }\left( v_{\min }-\theta \Vert d_x^f\Vert \right) , \end{aligned}$$

where \(v_{\min }:=\min (v)\) and \(\theta \) is small enough, i.e., such that \(v_{\min }-\theta \Vert d_x^f\Vert >0\). Substitution into (21) gives

$$\begin{aligned} f^{\prime \prime }(\theta )\le \Vert d_x^f\Vert ^2\psi ^{\prime \prime }\left( v_{\min }-\theta \Vert d_x^f\Vert \right) . \end{aligned}$$

By integrating both sides of this inequality with respect to \(\theta \), while using that \(f^{\prime }(0)=0\), as follows from (20), we obtain

$$\begin{aligned} f^{\prime }(\theta )= & {} f^{\prime }(0)+\int _0^\theta f^{\prime \prime }(\xi )\,d\xi \le \Vert d_x^f\Vert ^2 \int _0^\theta \psi ^{\prime \prime }\left( v_{\min }-\xi \Vert d_x^f\Vert \right) \,d\xi \\= & {} -\Vert d_x^f\Vert \int _0^\theta \psi ^{\prime \prime }\left( v_{\min }-\xi \Vert d_x^f\Vert \right) \,d\left( v_{\min }-\xi \Vert d_x^f\Vert \right) \\= & {} \; \Vert d_x^f\Vert \left[ \psi ^{\prime }\left( v_{\min }\right) -\psi ^{\prime }\left( v_{\min }-\theta \Vert d_x^f\Vert \right) \right] . \end{aligned}$$

Integrating once more, we get

$$\begin{aligned} f(\theta ) - f(0)= & {} \int _0^\theta f^{\prime }(\xi )\,d\xi \le \Vert d_x^f\Vert \int _0^\theta \left[ \psi ^{\prime }\left( v_{\min }\right) -\psi ^{\prime }\left( v_{\min } - \xi \Vert d_x^f\Vert \right) \right] \,d\xi \nonumber \\= & {} \psi ^{\prime }(v_{\min })\theta \Vert d_x^f\Vert + \psi \left( v_{\min }-\theta \Vert d_x^f\Vert \right) - \psi (v_{\min }) \nonumber \\\le & {} \left[ \psi ^{\prime }(v_{\min }) - \psi ^{\prime }\left( v_{\min }-\theta \Vert d_x^f\Vert \right) \right] \theta \Vert d_x^f\Vert , \end{aligned}$$
(22)

where the last inequality holds because \(\psi \) is convex.

The first derivative with respect to \(v_{\min }\) of the right-hand side expression in this inequality is given by \((\psi ^{\prime \prime }(v_{\min }) - \psi ^{\prime \prime }(v_{\min } - \theta \Vert d_x^f\Vert ))\theta \Vert d_x^f\Vert \). Since \(\psi ^{\prime \prime }\) is (strictly) decreasing, this derivative is negative, and hence the expression is decreasing in \(v_{\min }\): when \(\theta \) and \(\Vert d_x^f\Vert \) are fixed, the smaller \(v_{\min }\) is, the larger the expression becomes. Below we establish how small \(v_{\min }\) can be when \(\delta (v)\) is given.

For each coordinate \(v_i\) of v we have \(|\psi ^{\prime }(v_i)|/2 \le \Vert \nabla \varPsi (v)\Vert /2 = \delta (v)\), which means that

$$\begin{aligned} -\delta (v) \le -\tfrac{1}{2}\psi ^{\prime }(v_i) \le \delta (v),\,\quad 1\le i \le n. \end{aligned}$$

Since the inverse function \(\varrho \) of \(-\psi ^{\prime }/2\) is monotonically decreasing, this is equivalent to

$$\begin{aligned} \varrho (\delta (v)) \le v_i \le \varrho (-\delta (v)),\,\quad 1\le i \le n. \end{aligned}$$
(23)

Hence the smallest possible value of \(v_{\min }\) is \(\varrho (\delta (v))\), and this value is attained in the (exceptional) case where \(v_{\min }\) is the only coordinate of the vector v that differs from 1. So we may assume that \(v_{\min } = \varrho (\delta (v))\). This implies \(-\psi ^{\prime }(v_{\min })/2 = \delta (v)\) and hence \(\psi ^{\prime }(v_{\min })\le 0\), whence \(v_{\min }\le 1\).

In the sequel, we denote \(\delta (v)\) simply as \(\delta \). Substitution into (22) gives that

$$\begin{aligned} f(\theta ) - f(0) \le \left[ -2\delta - \psi ^{\prime }\left( \varrho (\delta )-\theta \Vert d_x^f\Vert \right) \right] \theta \Vert d_x^f\Vert . \end{aligned}$$

Hence we certainly have \(f(\theta )\le \hat{\tau }\) if

$$\begin{aligned} f(0)+\left[ -2\delta - \psi ^{\prime }\left( \varrho (\delta )-\theta \Vert d_x^f\Vert \right) \right] \theta \Vert d_x^f\Vert \le \hat{\tau }. \end{aligned}$$

Since \(f(0)=\varPsi (v) \le \tau \), it can be verified that the last inequality holds if

$$\begin{aligned} - \tfrac{1}{2}\psi ^{\prime }\left( \varrho (\delta )-\theta \Vert d_x^f\Vert \right) \le \delta +\frac{\hat{\tau }- \tau }{2\theta \Vert d_x^f\Vert }. \end{aligned}$$

Since \(\varrho \) is decreasing, this is equivalent to

$$\begin{aligned} \varrho (\delta )-\theta \Vert d_x^f\Vert \ge \varrho \left( \delta +\frac{\hat{\tau }- \tau }{2\theta \Vert d_x^f\Vert }\right) . \end{aligned}$$
(24)

Note that if \(\theta \) approaches zero, then the left-hand side expression converges to \(\varrho (\delta )\) and the right-hand side expression to zero. The left-hand side is decreasing in \(\theta \), whereas the right-hand side is increasing. The largest possible \(\theta \) makes both sides equal. In order to get a tight approximation for this value, we first need to estimate \(\Vert d_x^f\Vert \). This is done in the next subsection, where we follow an approach similar to that in [15].

5.3.1 Upper Bound for \(\Vert d_{x}^{f}\Vert \)

One may easily check that the system (16), which defines the search directions \(\varDelta ^{f}x\), \(\varDelta ^{f}y\), and \(\varDelta ^{f}s\), can be expressed in terms of the scaled search directions \( d_{x}^{f}\) and \( d_{s}^{f}\) as follows:

$$\begin{aligned} \bar{A} d_{x}^{f}= & {} \nu r_b^0, \end{aligned}$$
(25)
$$\begin{aligned} \bar{A}^T \frac{\varDelta ^{f}y}{\mu }+ d_{s}^{f}= & {} \nu {vs^{-1}r_c^0}, \end{aligned}$$
(26)
$$\begin{aligned} d_{x}^{f} + d_{s}^{f}= & {} 0, \end{aligned}$$
(27)

where

$$\begin{aligned} \bar{A}=AV^{-1}X, \quad V=\mathrm{diag}(v), \quad X=\mathrm{diag}(x). \end{aligned}$$

At this stage we use that the initial iterates are given by (13) and (14), so we have

$$\begin{aligned} x^{0}=s^{0}=\zeta {\mathbf{e}}, \quad y^{0}=0, \quad \mu ^{0}=\zeta ^{2}, \end{aligned}$$
(28)

where \(\zeta > 0\) is such that

$$\begin{aligned} \Vert x^{*}+s^{*}\Vert _{\infty }\le \zeta \end{aligned}$$
(29)

for some optimal solutions \(x^{*}\) of (P) and \((y^{*}, s^{*})\) of (D).

Using a similar argument as given in [15, Section 4.3], it can be proven that

$$\begin{aligned} \Vert d_x^f\Vert \le \frac{{\mathbf{e}}^{T}(x+s)}{\zeta v_{\min }}. \end{aligned}$$
(30)

In [15, Lemma 4.3], it is shown that

$$\begin{aligned} \nu \zeta {\mathbf{e}}^T(x+s) \le x^T s + n \nu \zeta ^2. \end{aligned}$$
(31)

Using \(xs=\mu v^2\) and (23) we may write

$$\begin{aligned} x^Ts = \mu {\mathbf{e}}^Tv^2 \le \mu n\varrho (-\delta )^2 = n\nu \zeta ^2\varrho (-\delta )^2. \end{aligned}$$

Substitution in (31) yields that

$$\begin{aligned} \zeta {\mathbf{e}}^{T}(x+s)\le n\zeta ^2 (1+\varrho (-\delta )^2). \end{aligned}$$

Substituting this in (30), and also using that \(v_{\min }\ge \varrho (\delta )\), we obtain

$$\begin{aligned} \Vert d_{x}^{f}\Vert \le \frac{n(1+\varrho (-\delta )^2)}{\varrho (\delta )}. \end{aligned}$$
(32)

5.3.2 Condition for \(\theta \)

We now return to the condition (24) on \(\theta \):

$$\begin{aligned} \varrho (\delta ) \ge \theta \Vert d_x^f\Vert + \varrho \left( \delta +\frac{\hat{\tau }- \tau }{2\theta \Vert d_x^f\Vert }\right) . \end{aligned}$$

The right-hand side is increasing in \(\Vert d_{x}^{f}\Vert \). Therefore, due to (32), it suffices if

$$\begin{aligned} \varrho (\delta ) \ge \frac{\theta n(1+\varrho (-\delta )^2)}{\varrho (\delta )}+\varrho \left( \delta +\frac{\varrho (\delta )\left( \hat{\tau }- \tau \right) }{2\theta n(1+\varrho (-\delta )^2)}\right) . \end{aligned}$$
(33)

Obviously, this implies that \(\theta n(1+\varrho (-\delta )^2)< \varrho (\delta )^2\). Therefore, there exists \(\alpha \in ]0,1[\), independent of n, such that

$$\begin{aligned} \theta = \frac{\alpha \varrho (\delta )^2}{n(1+\varrho (-\delta )^2)}. \end{aligned}$$
(34)

Clearly, using this \(\theta \), the expression (33) can be restated as

$$\begin{aligned} \varrho (\delta ) \ge \alpha \varrho (\delta ) + \varrho \left( \delta +\frac{\hat{\tau }- \tau }{2\alpha \varrho (\delta )}\right) . \end{aligned}$$
(35)

Our objective is to find the largest possible \(\alpha \) satisfying this inequality. In order to proceed we need bounds for \(\delta =\delta (v)\) and \(\varrho (\delta )\). The next section presents some technical lemmas that are useful in deriving these bounds.

6 Some Technical Lemmas

Recall that \(\varrho \) is defined as the inverse function of \(-\tfrac{1}{2}\psi ^{\prime }(t)\), and \(\rho \) as the inverse function of \(\psi (t)\) for \(t\ge 1\). We also need the inverse function of \(\psi (t)\) for \(t\in ]0,1]\), which we denote as \(\chi \). To get tight estimates for these inverse functions, we need the inverse functions \(\bar{\chi }\) and \(\bar{\rho }\) of \(\psi _b(t)\) and \(-\psi ^{\prime }_b(t)\), respectively. Recall that, as shown in [5], one has

$$\begin{aligned} \psi ^{\prime }_b(t)<0,\,\quad \psi ^{\prime \prime }_b(t)> 0,\,\quad \psi ^{\prime \prime \prime }_b(t)<0,\,\quad t>0. \end{aligned}$$

Hence \(\psi _b(t)\) is monotonically decreasing and \(\psi ^{\prime }_b(t)\) is monotonically increasing. This implies that \(\psi _b(t)\) and \(-\psi ^{\prime }_b(t)\) have inverse functions, and these functions are monotonically decreasing. We denote these inverse functions as \(\bar{\chi }\) and \(\bar{\rho }\), respectively.

Lemma 6.1

With \(\bar{\chi }\) denoting the inverse function of \(\psi _b(t)\), one has

$$\begin{aligned} \bar{\chi }\left( s+\frac{1}{2}\right) \le \chi (s) \le \bar{\chi }\left( s\right) ,\quad s\ge 0. \end{aligned}$$

Proof

Let \(t\in ]0,1]\). Then one has

$$\begin{aligned} t=\chi (s) \quad \Leftrightarrow \quad \psi (t) = s \quad \Leftrightarrow \quad \psi _b(t) = s+\tfrac{1-t^2}{2} \quad \Leftrightarrow \quad \chi (s) = \bar{\chi }\left( s+\tfrac{1-\chi (s)^2}{2}\right) . \end{aligned}$$

Since \(\chi (s)\in ]0,1]\), this implies the inequalities in the lemma. \(\square \)

Lemma 6.2

With \(\bar{\rho }\) denoting the inverse function of \(- \psi ^{\prime }_b(t)\) for \(t >0\), one has

$$\begin{aligned} \bar{\rho }(1 + 2s) \le \varrho (s) \le \bar{\rho }(2s),\quad s\ge 0. \end{aligned}$$

Proof

Since \(\psi ^{\prime }(t) = t+\psi _b^{\prime }(t)\), one has

$$\begin{aligned} -\tfrac{1}{2}\psi ^{\prime }(t) = s \quad \Leftrightarrow \quad -\psi _b^{\prime }(t) = 2s + t. \end{aligned}$$

If \(s\ge 0\), then \(\varrho (s)=t\in ]0,1]\), and hence \(\bar{\rho }\left( 2s\right) \ge \varrho (s) \ge \bar{\rho }\left( 2s+1\right) \), proving the lemma. \(\square \)

Recall that \(\rho \) is the inverse function of \(\psi (t)\) for \(t\ge 1\). The following two results are less trivial than the preceding two lemmas.

Lemma 6.3

(cf. [5, Lemma 6.2]) For \(s\ge 0\), one has

$$\begin{aligned} \sqrt{1+2s}\le \rho (s)\le 1 + \sqrt{2s}. \end{aligned}$$

Lemma 6.4

One has, for any \(v\in \mathbb {R}^n_{++}\),

$$\begin{aligned} \tfrac{1}{2}\psi ^{\prime }\left( \rho \left( \varPsi (v)\right) \right) \le \ \delta (v)\le -\tfrac{1}{2}\psi ^{\prime }\left( \chi \left( \varPsi (v)\right) \right) . \end{aligned}$$

Proof

The left-hand side inequality in the lemma is due to [5, Theorem 4.9]. The proof of the right-hand side inequality can be obtained by slightly changing the proof of [5, Theorem 4.9] and is therefore omitted. \(\square \)

7 Iteration Bound of the IIPM Based on \(\psi _q\)

Since the kernel function \(\psi _q\) led to the best-known iteration bound for large-update FIPMs based on kernel functions, from now on we restrict ourselves to the case where \(\psi =\psi _q\). Thus, we have

$$\begin{aligned} \psi _b(t) = \frac{t^{1-q}-1}{q-1},\,\quad \psi ^{\prime }_b(t) = -t^{-q},\,\quad \psi ^{\prime \prime }_b(t) = qt^{-q-1} ,\,\quad \psi ^{\prime \prime \prime }_b(t) = -q(q+1)t^{-q-2}. \end{aligned}$$

In the current case the inverse functions \(\bar{\chi }\) and \(\bar{\rho }\) are defined as follows:

$$\begin{aligned} \bar{\chi }(s)= & {} \frac{1}{[1+(q-1)s]^\frac{1}{q-1}},\,\quad s > \frac{-1}{q-1}, \end{aligned}$$
(36)
$$\begin{aligned} \bar{\rho }(s)= & {} \frac{1}{s^\frac{1}{q}},\,\quad s > 0. \end{aligned}$$
(37)
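The formulas (36) and (37) are easily verified symbolically. The following small check (with the fixed value \(q=3\), an arbitrary choice of ours; any \(q>1\) works) substitutes the claimed inverses back into \(\psi _b\) and \(-\psi ^{\prime }_b\):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
q = 3                                     # any fixed q > 1 serves for the check
psi_b = (t**(1 - q) - 1)/(q - 1)          # barrier term of psi_q

chi_bar = (1 + (q - 1)*s)**sp.Rational(-1, q - 1)   # claimed inverse (36)
rho_bar = s**sp.Rational(-1, q)                     # claimed inverse (37)

print(sp.simplify(psi_b.subs(t, chi_bar) - s))               # prints 0
print(sp.simplify(-sp.diff(psi_b, t).subs(t, rho_bar) - s))  # prints 0
```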

Since \(\chi \) and \(-\tfrac{1}{2}\psi ^{\prime }\) are both decreasing, the composition \(-\tfrac{1}{2}\psi ^{\prime }\circ \chi \) is increasing. Hence, if \(\varPsi (v)\le \tau \), Lemma 6.4 implies that

$$\begin{aligned} \delta =\delta (v)\le -\tfrac{1}{2}\psi ^{\prime }\left( \chi (\tau )\right) . \end{aligned}$$
(38)

By Lemma 6.1 we have \(\chi (\tau ) \ge \bar{\chi }\left( \tau +\tfrac{1}{2}\right) \). Using once more that \(-\tfrac{1}{2}\psi ^{\prime }\) is decreasing we obtain

$$\begin{aligned} 2\delta \le -\psi ^{\prime }\left( \bar{\chi }\left( \tau +\tfrac{1}{2}\right) \right) . \end{aligned}$$

Since \(-\psi ^{\prime }(t) = t^{-q}-t \le t^{-q}\), and due to (36), it follows that

$$\begin{aligned} 2\delta\le & {} \bar{\chi }\left( \tau +\tfrac{1}{2}\right) ^{-q}\\= & {} {\left[ 1+(q-1)\left( \tau +\tfrac{1}{2}\right) \right] ^\frac{q}{q-1}} \\= & {} \left[ 1+(q-1)\left( \tau +\tfrac{1}{2}\right) \right] {\left[ 1+(q-1)\left( \tau +\tfrac{1}{2}\right) \right] ^\frac{1}{q-1}}\\\le & {} \left[ 1+(q-1)\left( \tau +\tfrac{1}{2}\right) \right] e^{\tau +\tfrac{1}{2}}, \end{aligned}$$

where the last inequality is due to \((1+ax)^{1/x}\le e^a\) for \(x> 0\) and \(1+ax>0\). Hence, when taking \(\tau \le 1/2\), we have

$$\begin{aligned} \delta \le \tfrac{1}{2} qe. \end{aligned}$$
(39)

Since \(\varrho \) is decreasing, applying \(\varrho \) to both sides of (38), and using Lemma 6.1 and (36), we obtain

$$\begin{aligned} \varrho (\delta ) \ge \chi (\tau ) \ge \bar{\chi }\left( \tau +\tfrac{1}{2}\right) = \frac{1}{[1+(q-1)\left( \tau +\tfrac{1}{2}\right) ]^\frac{1}{q-1}}. \end{aligned}$$

If \(\tau \le 1/2\) this implies

$$\begin{aligned} \varrho (\delta ) \ge \frac{1}{q^\frac{1}{q-1}} = \frac{1}{e^\frac{\log q}{q-1}} \ge \frac{1}{e}.\end{aligned}$$
(40)

Using that \(\varrho \) is decreasing, together with Lemma 6.2 and (37), we have

$$\begin{aligned} \varrho \left( \delta +\frac{\hat{\tau }- \tau }{2\alpha \varrho (\delta )}\right)\le & {} \varrho \left( \frac{\hat{\tau }- \tau }{2\alpha \varrho (\delta )}\right) \le \bar{\rho }\left( \frac{\hat{\tau }- \tau }{\varrho (\delta )}\right) = \left( \frac{\varrho (\delta )}{\hat{\tau }- \tau }\right) ^{\frac{1}{q}} \le \frac{1}{\left( \hat{\tau }- \tau \right) ^{\frac{1}{q}}}. \end{aligned}$$

Furthermore, using \(\varrho (\delta )\ge 1/e\) we conclude that (35) will certainly hold if

$$\begin{aligned} \frac{1-\alpha }{e} \ge \frac{1}{\left( \hat{\tau }- \tau \right) ^{\frac{1}{q}}}. \end{aligned}$$

Now taking \(\alpha = 1/2\) it follows that (35) will be satisfied if

$$\begin{aligned} q \le \frac{\log (\hat{\tau }- \tau )}{\log (2e)}. \end{aligned}$$
(41)

Substitution of \(\alpha = 1/2\) into (34) yields

$$\begin{aligned} \theta = \frac{\varrho (\delta )^2}{2n\left( 1+\varrho (-\delta )^2\right) }. \end{aligned}$$

Due to (40) we have \(\varrho (\delta )\ge 1/e\). In order to find an upper bound for \(\varrho (-\delta )\) we proceed as follows. For \(s\ge 0\),

$$\begin{aligned} t=\varrho (-s) \;\Leftrightarrow \; -\frac{1}{2}\psi ^{\prime }(t)= -s \;\Leftrightarrow \; 2s = t-t^{-q} \;\Leftrightarrow \; t = 2s + t^{-q},\,\quad t\ge 1. \end{aligned}$$

Since \(t\ge 1\) we have \(t^{-q}\in ]0,1]\). Hence \(t=\varrho (-s)\) implies \( 2s \le \varrho (-s) \le 2s+1\). Thus, \(\varrho (-\delta ) \le 2\delta +1\), and by (39), \(\varrho (-\delta ) \le 1+qe\). We therefore may conclude that (33) certainly holds if we take

$$\begin{aligned} \theta = \frac{1}{2e^2n\left( 1+(1+qe)^2\right) }. \end{aligned}$$
(42)

This is the value that will be used in the sequel.
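To get a feeling for its size: for \(n=1000\), with the choice \(q=\log n/6\approx 1.15\) made in Sect. 7.1 below, (42) gives \(\theta \approx 4\cdot 10^{-6}\). This illustrates how conservative the theoretical update is, and motivates the practical variant of Sect. 9.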

7.1 Complexity Analysis

As we established in (15), the total number of iterations is at most

$$\begin{aligned} \frac{\left( \tau ^f\right) ^\gamma }{\theta \kappa \gamma }\log \frac{n\zeta ^2}{\varepsilon }. \end{aligned}$$

We assume that \(\hat{\tau }= O(n)\). Due to Lemma 5.2 we then also have \(\tau ^f = O(n)\), provided that \({1}/{\sqrt{1-\theta }}=O(1)\); due to (42) the latter condition is satisfied. To simplify the presentation we use \(\tau ^f=n\) in the analysis below, but our results are easily adapted to any \(\tau ^f=O(n)\). With \(\theta \) as given by (42), we have

$$\begin{aligned} \theta = \frac{1}{2e^2n\left( 1+(1+qe)^2\right) }. \end{aligned}$$

Hence, substitution of \(\gamma = \tfrac{q+1}{2q}\) and \(\kappa = \tfrac{1}{124\,q}\) yields that the total number of iterations does not exceed

$$\begin{aligned} 248e^2qn\left( 1+(1+qe)^2\right) {n^{\frac{q+1}{2q}}}\,\log \frac{n\zeta ^2}{\varepsilon }= O\left( q^3n\sqrt{n}{\,n^\frac{1}{2q}}\,\log \frac{n\zeta ^2}{\varepsilon }\right) . \end{aligned}$$
(43)

The expression \(q^3n^\frac{1}{2q}\) is minimal if \(q = \log n /6\), and then it is equal to \(e^3 (\log n)^3/216\). This value of q satisfies (41), since \(\log (2e) \le 6\). Hence we obtain the following iteration bound:

$$\begin{aligned} O\left( n\sqrt{n}\,(\log n)^3\,\log \frac{n\zeta ^2}{\varepsilon }\right) . \end{aligned}$$
(44)

It is worth mentioning that if \(\psi _1\) were used, then the iteration bound of our algorithm would be \(O\left( n^2\log \frac{n\zeta ^2}{\varepsilon }\right) \), which is a factor \(\sqrt{n}/(\log n)^3\) worse than (44).

8 Detecting Infeasibility or Unboundedness

As indicated in the introduction, the algorithm will detect infeasibility and/or unboundedness of (P) and (D) if no optimal solutions exist. In that case Lemma 5.1 implies the existence of \(\bar{\nu } > 0\) such that the perturbed pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfies the IPC if and only if \(\nu \in ]\bar{\nu },1]\). As long as \(\nu ^+ = (1-\theta ) \nu >\bar{\nu }\) the algorithm will run as it should, with \(\theta \) given by (42). However, if \(\varepsilon \) is small enough, at some stage it will happen that \(\nu > \bar{\nu } \ge \nu ^+\). At this stage the new perturbed pair does not satisfy the IPC. This will reveal itself since at that time we necessarily have \(\theta _{\max } < \theta \), with \(\theta \) as given by (42). If this happens, we may conclude that there is no optimal pair \((x^*,s^*)\) satisfying \(\Vert x^*+s^*\Vert _\infty \le \zeta \). We refer to [4] for a more detailed discussion.

9 Computational Results

In this section, we present the numerical results for a practical variant of the algorithm described in Sect. 5. Theoretically, the barrier parameter \(\mu \) is updated by the factor \((1-\theta )\), with \(\theta \) given by (42), and the iterates are kept very close to the \(\mu \)-centers, namely in the \(\tau \)-neighborhood of the \(\mu \)-centers, with \(\tau =\tfrac{1}{8}\). In practice this is neither efficient nor necessary. We present a variant of the algorithm which uses a predictor–corrector step in the feasibility step. Moreover, for the parameter \(\tau \), defined in Sect. 5.1, we allow a larger value than \(\tfrac{1}{8}\), e.g., \(\tau = O(n)\). We set \(\tau = \hat{\tau } = O(n)\) with \(\hat{\tau }\) defined as in Sect. 5.3. As a consequence, the algorithm does not need centering steps. We choose \(\hat{\tau }\) according to the following heuristics: if \(n < 500\), then \(\hat{\tau } = 100n\); if \(500\le n< 5000\), we choose \(\hat{\tau } = 10n\); and if \(n\ge 5000\), we set \(\hat{\tau } = 3n\). We compare the performance of the algorithm with the well-known LIPSOL package [16].

9.1 Starting Point and Stopping Criterion

A critical issue when implementing a primal-dual method is to find a suitable starting point. It seems sensible to use a starting point which is well centered and as close to a feasible primal-dual point as possible. The one suggested by theory, i.e., given by (13), is nicely centered, but it may be quite far from the feasible region. Moreover, finding a suitable \(\zeta \) is another issue.

In our implementation, we use a starting point \(\left( x^0,y^0,s^0\right) \) proposed by Lustig et al. [17] and inspired by the starting point used by Mehrotra [18]. Since we are interested in a point lying in the \(\tau \)-neighborhood of the \(\mu ^0\)-center, if \(\varPhi (x^0,s^0,\mu ^0) > \tau \), we increase \(\mu ^0\) until \(\varPhi (x^0,s^0,\mu ^0) \le \tau \).

As in LIPSOL, our algorithm terminates if, for \(\varepsilon =10^{-6}\), either

$$\begin{aligned} E(x,y,s):=\frac{\Vert r_b\Vert }{\max (1,\Vert b\Vert )} + \frac{\Vert r_c\Vert }{\max (1,\Vert c\Vert )} + \frac{|c^Tx - b^Ty|}{\max (1,|c^Tx|,|b^Ty|)}\le \varepsilon , \end{aligned}$$
(45)

or \(|x^Ts - (x^+)^Ts^+| < \varepsilon \) occurs, where \(\left( x^+,y^+,s^+\right) \) denotes the next iterate. The condition (45) measures the total relative error in the optimality conditions (1), while the latter criterion terminates the program if only a tiny improvement in the duality gap is obtained; in fact, it prevents the program from stalling. We include this criterion following Lustig [19].
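For reference, a minimal sketch of the stopping measure (45), assuming dense data (the function name is ours):

```python
import numpy as np

def total_relative_error(A, b, c, x, y, s):
    """E(x, y, s) of (45): relative residual norms plus relative duality gap."""
    rb = b - A @ x                       # primal residual
    rc = c - A.T @ y - s                 # dual residual
    return (np.linalg.norm(rb)/max(1.0, np.linalg.norm(b))
            + np.linalg.norm(rc)/max(1.0, np.linalg.norm(c))
            + abs(c @ x - b @ y)/max(1.0, abs(c @ x), abs(b @ y)))
```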

9.2 Feasibility Step Size

As in other efficient implementations, e.g., [16, 17], and regardless of the theoretical result, we apply different step sizes along the primal step \(\varDelta x \) and the dual step \((\varDelta y,\varDelta s)\). As a result, feasibility improves much faster than when identical step sizes are used. Letting (x,y,s) be the current iterate and \((\varDelta x,\varDelta y,\varDelta s)\) the Newton step, we obtain the maximum step sizes \(\theta ^p_{\max }\) and \(\theta ^d_{\max }\) in, respectively, the primal and the dual spaces as follows:

$$\begin{aligned} \theta ^p_{\max } = \min \limits _{\varDelta x_i<0}\left\{ 1,-\frac{x_i}{\varDelta x_i}\right\} ,\quad \theta ^d_{\max } = \min \limits _{\varDelta s_i<0}\left\{ 1,-\frac{s_i}{\varDelta s_i}\right\} . \end{aligned}$$

The goal is to keep the iterates close to the \(\mu \)-center, i.e., in its \(\hat{\tau }\)-neighborhood, where \(\hat{\tau }\) is defined in Sect. 5.1. Thus, letting \(\bar{\theta }\) be such that

$$\begin{aligned} \varPhi (x+\bar{\theta }\theta ^p_{\max }\varDelta x,s+\bar{\theta }\theta ^d_{\max }\varDelta s,\mu )\le \hat{\tau }, \end{aligned}$$

the primal and the dual step sizes \(\theta _p\) and \(\theta _d\) are defined as follows:

$$\begin{aligned} \theta _p = \bar{\theta }\theta ^p_{\max } \quad \text {and}\quad \theta _d = \bar{\theta }\theta ^d_{\max }. \end{aligned}$$
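The following sketch computes the maximum step sizes and the common damping factor \(\bar{\theta }\) by simple backtracking, assuming again a routine Phi evaluating \(\varPhi \); the backtracking factor 0.9 and the interiority safety factor 0.9995 are our illustrative choices:

```python
import numpy as np

def max_step(z, dz):
    """Largest step in [0, 1] keeping z + theta*dz nonnegative,
    as in the formulas for theta^p_max and theta^d_max above."""
    neg = dz < 0
    if not np.any(neg):
        return 1.0
    return min(1.0, float(np.min(-z[neg] / dz[neg])))

def damped_step_sizes(Phi, x, s, dx, ds, mu, tau_hat,
                      safety=0.9995, shrink=0.9):
    """Backtrack on theta-bar until the trial point stays in the
    tau-hat-neighborhood of the mu-center."""
    tp = safety * max_step(x, dx)   # stay strictly interior
    td = safety * max_step(s, ds)
    theta_bar = 1.0
    while Phi(x + theta_bar * tp * dx, s + theta_bar * td * ds, mu) > tau_hat:
        theta_bar *= shrink
    return theta_bar * tp, theta_bar * td
```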

9.3 Solving the Linear System

We apply the backslash command of MATLAB ('\(\backslash \)') to solve the normal equations

$$\begin{aligned} AD^2A^T \varDelta y = \hat{b}, \end{aligned}$$
(46)

where \(\hat{b}\) is some right-hand side and D, in the LO case, has the form

$$\begin{aligned} D:=\mathrm{diag}\left( \sqrt{\frac{x}{s}}\right) . \end{aligned}$$

Denoting \(M:=AD^2A^T\) in (46), whenever the matrix M is ill-conditioned, we can obtain a more accurate solution by perturbing M as

$$\begin{aligned} M = M + 10^{-9}I, \end{aligned}$$

where I is the identity matrix of the size of M.
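A dense-algebra sketch of this solve follows; a serious implementation would use a sparse Cholesky factorization, and the condition-number threshold below is our illustrative choice:

```python
import numpy as np

def solve_normal_equations(A, x, s, b_hat, reg=1e-9):
    """Solve A D^2 A^T dy = b_hat with D = diag(sqrt(x/s)),
    adding the perturbation 1e-9 * I when M is ill-conditioned."""
    d2 = x / s                          # diagonal of D^2
    M = A @ (d2[:, None] * A.T)         # M = A D^2 A^T
    if np.linalg.cond(M) > 1e14:        # illustrative ill-conditioning test
        M = M + reg * np.eye(M.shape[0])
    return np.linalg.solve(M, b_hat)    # analogue of MATLAB's backslash
```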

9.4 A Rule for Updating \(\mu \)

Motivated by the numerical results, and considering the fact that Mehrotra's PC method has become the most efficient method in practice and is used in most IPM-based software packages, e.g., [16, 20–22], we present the numerical results of the variant of our algorithm that uses Mehrotra's PC direction in the feasibility step.

At the feasibility step, we apply the system

$$\begin{aligned} A\varDelta ^a_x= & {} r_b,\\ A^T\varDelta ^a_y + \varDelta ^a_s= & {} r_c,\\ s\varDelta ^a_x + x\varDelta ^a_s= & {} -xs, \end{aligned}$$

to obtain the affine-scaling directions \((\varDelta ^a_x,\varDelta ^a_y,\varDelta ^a_s)\). The maximum step sizes \(\theta ^p_{\max }\) and \(\theta ^d_{\max }\) in, respectively, the primal and the dual spaces are then calculated as described in Sect. 9.2. Then, defining

$$\begin{aligned} \mu ^a := \frac{(x + \theta ^p_{\max }\varDelta ^a_x)^T(s + \theta ^d_{\max }\varDelta ^a_s)}{n}, \end{aligned}$$

we let

$$\begin{aligned} \sigma = \left( \bar{\sigma }\frac{\mu ^a}{\mu }\right) ^3,\quad \bar{\sigma }\in ]0,1[. \end{aligned}$$

We use \(\bar{\sigma }=0.3\) as the default value of \(\bar{\sigma }\). If \(\sigma <1\), we calculate the new barrier update parameter \(\mu \) as follows:

$$\begin{aligned} \mu _{\mathrm{new}} = \sigma \mu ^a. \end{aligned}$$

Then, if necessary, by repeatedly increasing \(\mu _{\mathrm{new}}\) by a constant factor, say 1.1, we obtain some \(\mu _{\mathrm{new}}\) for which

$$\begin{aligned} \varPhi (x,s,\mu _{\mathrm{new}})\le \hat{\tau }. \end{aligned}$$

Note that \(\mu _{\mathrm{new}}\) is acceptable only if \(\mu _{\mathrm{new}} < \mu \).

In the case where \(\mu _{\mathrm{new}} \ge \mu \) or \(\sigma \ge 1\), we do not perform any \(\mu \)-update and proceed with the following corrector step which attempts to center toward the current \(\mu \)-center:

$$\begin{aligned} A\varDelta ^c_x= & {} 0,\nonumber \\ A^T\varDelta ^c_y + \varDelta ^c_s= & {} 0,\nonumber \\ s\varDelta ^c_x + x\varDelta ^c_s= & {} \mu {\mathbf{e}}- \varDelta ^a_x\varDelta ^a_s. \end{aligned}$$
(47)

Recall that if \(\sigma \ge 1\), then we have an increase in the duality gap and hence the use of \(\mu _{\mathrm{new}}\) is no longer sensible. If \(\sigma < 1\) and \(\mu _{\mathrm{new}} < \mu \), we use the system (47) with \(\mu =0\) as a corrector step.
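The update logic described above can be summarized in the following sketch (assuming again a routine Phi; returning None signals that no \(\mu \)-update is performed and the corrector (47) is used with the current \(\mu \)):

```python
def mu_update(Phi, x, s, mu, mu_a, tau_hat, sigma_bar=0.3):
    """Mehrotra-style barrier update following the rule above."""
    sigma = (sigma_bar * mu_a / mu) ** 3
    if sigma >= 1.0:
        return None                      # duality gap increased: no update
    mu_new = sigma * mu_a
    while Phi(x, s, mu_new) > tau_hat:   # restore the neighborhood condition
        mu_new *= 1.1
    return mu_new if mu_new < mu else None
```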

The feasibility step \(\left( \varDelta ^f x,\varDelta ^f y,\varDelta ^f s\right) \) is obtained as follows:

$$\begin{aligned} \varDelta ^f x = \varDelta ^a_x + \varDelta ^c_x,\quad \varDelta ^f y=\varDelta ^a_y+\varDelta ^c_y,\quad \varDelta ^f s = \varDelta ^a_s + \varDelta ^c_s. \end{aligned}$$

Next, we calculate the maximum step sizes \(\theta ^p_{\max }\) and \(\theta ^d_{\max }\) in, respectively, the primal and the dual spaces along the step \(\left( \varDelta ^f x,\varDelta ^f y,\varDelta ^f s\right) \), as described in Sect. 9.2. Assuming that \(\mu \) is the outcome of the predictor step, we obtain a \(\theta \) such that

$$\begin{aligned} \varPhi (x+\theta \theta ^p_{\max }\varDelta ^f x,s+\theta \theta ^d_{\max }\varDelta ^f s,\mu ) \le \hat{\tau }, \end{aligned}$$

with \(\hat{\tau }\) chosen as described at the beginning of Sect. 9. By setting \(\theta _p = \theta \theta ^p_{\max }\) and \(\theta _d=\theta \theta ^d_{\max }\), we calculate the new iterates \(\left( x^f, y^f, s^f\right) \) as follows:

$$\begin{aligned} x^f = x + \theta _p\varDelta ^f x,\quad y^f=y+\theta _d\varDelta ^f y \quad \text {and}\quad s^f=s+\theta _d\varDelta ^f s. \end{aligned}$$
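Putting the pieces together, a sketch of this composite update (the direction names are ours; the step sizes come from a routine such as damped_step_sizes above):

```python
def apply_feasibility_step(x, y, s, dxa, dya, dsa, dxc, dyc, dsc,
                           theta_p, theta_d):
    """Combine the affine-scaling and corrector directions into the
    feasibility step and apply different primal/dual step sizes."""
    dx, dy, ds = dxa + dxc, dya + dyc, dsa + dsc
    return x + theta_p * dx, y + theta_d * dy, s + theta_d * ds
```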
Table 1 Numerical results \(\left( q=\tfrac{\log n}{6} \text{ in } \psi _q\right) \)

9.5 Results

In this section, we present our numerical results. Motivated by the theoretical result that the kernel function \(\psi _q\) gives rise to the best-known theoretical iteration bound for large-update IIPMs based on kernel functions, we compare the performance of the algorithm described in the preceding subsections based on both the logarithmic barrier function and the \(\psi _q\)-based barrier function. As the theory suggests, we use \(q=\frac{\log n}{6}\) in \(\psi _q\).

Our tests were run on a standard PC with an Intel® Core™2 Duo CPU and 3.25 GB of RAM. The code was implemented in MATLAB® version 7.11.0 (R2010b) on the Windows XP Professional operating system. The problems chosen for our tests are from the NETLIB set. To simplify the study, we choose the problems which are in (P) format, i.e., with no nonzero lower bounds or finite upper bounds on the decision variables.

Before solving a problem with the algorithm, we first presolve it to eliminate singleton variables. This helps to reduce the size of the problem.

Numerical results are tabulated in Table 1. In the second and fourth columns, we list the total number of iterations of the algorithm based on, respectively, \(\psi _1\), the kernel function of the logarithmic barrier function, and \(\psi _q\). The third and fifth columns contain the corresponding quantity E(x,y,s). The iteration numbers of the LIPSOL package are given in the sixth column. Finally, in the seventh column, we list the quantity E(x,y,s) for the LIPSOL package. In each row, the italic number denotes the smallest of the iteration numbers of the three algorithms, namely our algorithm based on \(\psi _1\) and \(\psi _q\), and LIPSOL, and the bold number denotes the smaller of the iteration numbers of the \(\psi _1\)-based and \(\psi _q\)-based algorithms. As can be seen from the last row of the table, the overall performance of the algorithm based on \(\psi _1\) is much better than that of the variant based on \(\psi _q\). However, on some problems, the \(\psi _q\)-based algorithm outperforms the \(\psi _1\)-based algorithm; this is the case for AGG, BANDM, DEGEN2, DEGEN3, SCSD1, SCSD6, SCSD8 and SHARE2B. The columns related to LIPSOL show that it is still the best; however, our \(\psi _1\)-based algorithm saves one iteration compared with LIPSOL on AGG2 and AGG3, and two iterations on STOCFOR1.

10 Conclusions

In this paper, we analyze infeasible interior-point methods (IIPMs) for LO based on a large neighborhood. Our work is motivated by [4], in which Roos presents a full-Newton step IIPM for LO. Since the analysis of our algorithm requires properties of barrier functions based on kernel functions that are used in large-update feasible interior-point methods (FIPMs), we also present primal-dual large-update FIPMs for LO based on kernel functions.

In Roos' algorithm, the iterates move within a small neighborhood of the \(\mu \)-centers of the perturbed problem pairs. As in many IIPMs, the algorithm reduces the infeasibility and the duality gap at the same rate. His algorithm has the advantages that it uses full-Newton steps, so that no step-size calculation is needed, and that its theoretical iteration bound is \(O(n\log (\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})/\varepsilon ))\), which coincides with the best-known iteration bound for IIPMs. Nevertheless, it has the deficiency that it is too slow in practice, as it reduces the quantity \(\epsilon (x,y,s)\) by a factor \((1-\theta )\), with \(\theta = O(\tfrac{1}{n})\), per iteration.

We attempt to design a large-update version of Roos' algorithm that reduces \(\epsilon (x,y,s)\) per iteration by more than the factor \(1-\theta \) with \(\theta =O(1/n)\). This requires that the parameter \(\theta \) be larger than O(1/n), ideally \(\theta = O(1)\). Unfortunately, the result of Sect. 5 implies that \(\theta \) is \(O(1/(n(\log n)^2))\), which yields an \(O(n\sqrt{n}(\log n)^3\log (\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})/\varepsilon ))\) iteration bound for this variant. Since the theoretical result for the algorithm is disappointing, we rely on the numerical results to establish that our algorithm behaves as a large-update method. A practically efficient version of the algorithm is presented, and its numerical results are compared with those of the well-known LIPSOL package. Fortunately, the numerical results seem promising, as our algorithm has iteration numbers close to those of LIPSOL and, in a few cases, outperforms it. This means that IIPMs suffer from the same irony as FIPMs, i.e., despite their nice practical performance, the theoretical complexity of large-update methods is worse. Recall that the best-known iteration bound for large-update IIPMs is \(O(n\sqrt{n}\log n\log (\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})/\varepsilon ))\), which is due to Salahi et al. [7].

As in LIPSOL [16], different step sizes in the primal and the dual spaces are used in our implementation. This gives rise to faster attainment of feasibility than when identical step sizes are used. Moreover, inspired by the LIPSOL package, we use a predictor–corrector step in the feasibility step of the algorithm.

Future research may focus on the following areas:

  • As mentioned before, the iteration bound of our algorithm is a factor \((\log n)^2\) worse than the best-known iteration bound for large-update IIPMs. One may consider how to modify the analysis so that the iteration bound of our algorithm is improved by this factor \((\log n)^2\).

  • As mentioned above, according to the analysis of our algorithm presented in Sect. 5, the barrier-updating parameter \(\theta \) is \(O(1/(n(\log n)^2))\). This yields the loose iteration bound given by (44). This small value of \(\theta \) results from certain difficulties in the analysis of the algorithm, which uses the largest value of \(\theta \) satisfying (24) to guarantee that \(\varPsi (\hat{v}) = O(n)\). This value of \(\theta \) is much smaller than the best value we could choose. For \(n=60\), the largest value of \(\theta \) satisfying \(\varPsi (\hat{v}) = n\) is 0.849114, while the value of \(\theta \) suggested by the theory is 0.141766. Future research may focus on a new analysis of the algorithm that yields a larger value of \(\theta \).

  • Roos' full-Newton step IIPM was extended to semidefinite optimization (SDO) by Mansouri and Roos [23], to symmetric optimization (SO) by Gu et al. [24], and to LCP by Mansouri et al. [25]. An extension of large-update FIPMs based on kernel functions to SDO was presented by El Ghami [10]. One may consider how our algorithm behaves in theory and in practice when extended to SDO, SO, and LCP.