1 Introduction

This work considers bound-constrained nonlinear optimization problems of the form

$$\begin{aligned} \begin{array}{cl} \text {minimize } &{} \> f(x) \\ \text {subject to } &{} \> l \le x \le u, \end{array} \end{aligned}$$
(NLP)

where \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) is twice continuously differentiable, \(\nabla ^2 f(x)\) is locally Lipschitz continuous and \(l, u\in \left( {\mathbb {R}} \cup \{ -\infty , \infty \} \right) ^n\) are such that \(l< u\) componentwise. However, to make the work and its ideas more comprehensible, we initially describe the theoretical framework and the corresponding results for problems of the form

$$\begin{aligned} \begin{array}{cl} \text {minimize } &{} \> f(x) \\ \text {subject to } &{} \> x \ge 0. \end{array} \end{aligned}$$
(P)

For completeness, analogous results for problems of the form (NLP), together with complementary remarks, are given in “Appendix A”.

Bound-constrained optimization problems appear in many different applications and frequently arise as subproblems in augmented Lagrangian methods. For a general overview of solution methods, see [15], and see, e.g., the introduction in [18] for a thorough review of previous work. Common solution techniques are: active-set methods, which aim to determine the active constraints and solve a reduced problem in the inactive variables, e.g., [8, 18]; and methods involving projections onto the feasible set, such as projected-gradient methods, e.g., [1, 27], projected-Newton or trust-region methods, e.g., [2, 6, 7, 22], and projected quasi-Newton methods, e.g., [4, 21, 34]. We are not aware of any primal-dual interior-point methods specialized for bound-constrained optimization, apart from more general methods, e.g., [9, 12, 28, 29, 30]. Other techniques, related to trust-region and interior methods, are affine-scaling interior-point methods, which are based upon a reformulation of the first-order necessary optimality conditions combined with a Newton-like method, e.g., [5, 19, 20].

In contrast, we consider the classical primal-dual interior-point framework. This means solving, or approximately solving, a sequence of systems of nonlinear equations, for which we consider Newton’s method as the model method. As interior methods converge, the Newton systems typically become increasingly ill-conditioned due to large diagonal elements in the Schur complement. This is not harmful for direct solvers, but it may degrade the performance of iterative solvers. We propose a strategy for generating approximate solutions to Newton systems which in general involves solving smaller systems of linear equations. In the ideal case, these systems do not become increasingly ill-conditioned as the barrier parameter approaches zero. The specific approximate solutions, and the size of the system that needs to be solved at each iteration, are determined by estimates of the active and inactive constraints at the solution. However, in general these sets are unknown and have to be estimated as the iterations proceed. In this work we use basic heuristics to determine the considered sets, but other approaches may also be used, e.g., approaches similar to those in [8, 18]. In addition, we motivate and suggest two Newton-like approaches which utilize an intermediate step in combination with the solution of a Newton-like system. The intermediate step is built from the proposed partial approximate solutions.

The work is meant to contribute to the theoretical and numerical understanding of approximate solutions to systems of linear equations arising in interior-point methods. The approach is mainly intended for, but not limited to, bound-constrained problems; e.g., the work may also be interpreted in the framework of linear complementarity problems, see e.g., [32]. We envisage the use of the approximate solution procedure as an accelerator for a direct solver, in particular when solving a sequence of Newton systems for a given value of the barrier parameter \(\mu\), e.g., when the direct solver and the approximate solution procedure can be run in parallel. To give an indication of the potential of the approximate solutions, we report numerical simulations on randomly generated problems as well as on problems from the CUTEst test collection [16].

The manuscript is organized as follows: Sect. 2 contains a brief background to primal-dual interior-point methods and an introduction to the theoretical framework; in Sect. 3 we propose partial and full approximate solutions to Newton systems arising in interior-point methods, and motivate two Newton-like approaches; Sect. 4 contains numerical results on convex bound-constrained quadratic optimization problems, both randomly generated and from the CUTEst test collection; finally, in Sect. 5 we give some concluding remarks.

2 Background

We are interested in the asymptotic behavior of primal-dual interior-point methods in the vicinity of a local minimizer \(x^*\) and its corresponding multipliers \(\lambda ^*\). In particular, we assume that the iterates of the method converge to a vector \(\left( x^{*T}, \lambda ^{*T} \right) ^T \triangleq (x^*,\lambda ^*)\) that satisfies

$$\begin{aligned} \nabla f(x^*) - \lambda ^*&= 0, \quad \text { (stationarity) } \end{aligned}$$
(1a)
$$\begin{aligned} x^*&\ge 0, \quad \text { (feasibility)} \end{aligned}$$
(1b)
$$\begin{aligned} \lambda ^*&\ge 0, \quad \text { (non-negativity of multipliers)} \end{aligned}$$
(1c)
$$\begin{aligned} x^* \cdot \lambda ^*&= 0, \quad \text { (complementarity)} \end{aligned}$$
(1d)
$$\begin{aligned} Z(x^*)^T \nabla ^2 f(x^*) Z (x^*)&\succ 0 , \end{aligned}$$
(1e)
$$\begin{aligned} x^* + \lambda ^*&> 0, \quad \text { (strict complementarity)} \end{aligned}$$
(1f)

where “\(\cdot\)” denotes component-wise multiplication and \(Z(x^*)\) is a matrix whose columns span the nullspace of the Jacobian corresponding to the constraints with strictly positive multipliers \(\lambda _i^*\). Equations (1a)–(1d) constitute first-order necessary optimality conditions for a local minimizer of (P). These conditions together with (1e) form second-order sufficient conditions [17]. For the theoretical framework we also assume that \((x^*, \lambda ^*)\) satisfies (1f). We are particularly interested in the function \(F_{\mu }:{\mathbb {R}}^{2n} \rightarrow {\mathbb {R}}^{2n}\) defined by

$$\begin{aligned} F_{\mu }(x,\lambda ) = \begin{bmatrix} \nabla f(x) - \lambda \\ \varLambda X e - \mu e \end{bmatrix}, \end{aligned}$$

where \(\mu \in {\mathbb {R}}\) is the barrier parameter, \(X \in {\mathbb {R}}^{n \times n}\) and \(\varLambda \in {\mathbb {R}}^{n \times n}\) with \(X=\text {diag}(x)\) and \(\varLambda = \text {diag}(\lambda )\), and e is a vector of ones of appropriate size. A vector \((x,\lambda )\) with \(x\ge 0\), \(\lambda \ge 0\) and \(F_{\mu }(x, \lambda ) = 0\) for \(\mu =0\) satisfies the first-order optimality conditions (1a)–(1d). Primal-dual interior-point methods aim to solve, or approximately solve, \(F_{\mu }(x,\lambda ) = 0\) for a decreasing sequence of \(\mu >0\), while maintaining \(x>0\) and \(\lambda > 0\). This is typically done with Newton-like methods, which means solving a sequence of systems of linear equations of the form

$$\begin{aligned} F'(x, \lambda ) \begin{bmatrix} \varDelta x^N \\ \varDelta \lambda ^N \end{bmatrix} = -F_{\mu }(x,\lambda ), \end{aligned}$$
(2)

where \(F': {\mathbb {R}}^{2n} \rightarrow {\mathbb {R}}^{2n}\) is the Jacobian of \(F_{\mu }\). The Jacobian is given by

$$\begin{aligned} F'(x, \lambda ) = \begin{bmatrix} H &{} -I \\ \varLambda &{} X \end{bmatrix}, \end{aligned}$$
(3)

where \(H=\nabla ^2 f(x)\) and the subscript \(\mu\) is omitted since \(F'\) is independent of the barrier parameter. For each \(\mu\), iterations are performed until a specified measure of improvement is achieved, whereupon \(\mu\) is decreased and the process is repeated. A natural measure in our setting is \(\Vert F_{\mu }(x,\lambda ) \Vert _2\), where \(\Vert F_{\mu }(x,\lambda ) \Vert _2 =0\) gives the exact solution. To improve efficiency, many algorithms seek approximate solutions; a basic condition for the reduction of \(\mu\) is \(\Vert F_{\mu }(x,\lambda ) \Vert _2 < \mu\) [24, Ch. 17, p. 572]. Herein, we consider a possibly weaker version, namely \(\Vert F_{\mu }(x,\lambda ) \Vert _2 < C_1 \mu\) for some constant \(C_1 >0\). Moreover, it will throughout be assumed that all considered vectors \((x,\lambda )\) satisfy \(x>0\) and \(\lambda > 0\). The subscript in the norms will hereafter be omitted since all norms in this work are 2-norms.
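As a concrete illustration, the following minimal Julia sketch evaluates \(F_\mu\) and computes one Newton step (2) with dense linear algebra. The gradient and Hessian callbacks g and H are assumptions made for the sake of the example and are not part of the implementation used later in the paper.

```julia
using LinearAlgebra

# F_mu(x, lambda) as defined above; g returns the gradient of f.
F_mu(g, x, lam, mu) = vcat(g(x) - lam, lam .* x .- mu)

# One Newton step (2); H returns the Hessian of f. The Jacobian (3)
# is independent of mu, so mu enters only through the right-hand side.
function newton_step(g, H, x, lam, mu)
    n = length(x)
    In = Matrix{Float64}(I, n, n)
    J = [H(x) -In; Diagonal(lam) Diagonal(x)]
    d = -(J \ F_mu(g, x, lam, mu))
    return d[1:n], d[n+1:2n]          # (Δx^N, Δλ^N)
end
```

In practice a direct solver would factorize the Jacobian, or the reduced systems discussed in Sect. 3 would be solved instead.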

Definition 1

(Order-notation) Let \(\alpha\), \(\gamma \in {\mathbb {R}}\) be two positive related quantities. If there exists a constant \(C_2>0\) such that \(\gamma \ge C_2 \alpha\) for sufficiently small \(\alpha\), then \(\gamma = \varOmega (\alpha )\). Similarly, if there exists a constant \(C_3>0\) such that \(\gamma \le C_3 \alpha\) for sufficiently small \(\alpha\), then \(\gamma = {\mathcal {O}}(\alpha )\). If there exist constants \(C_2, C_3 > 0\) such that \(C_2 \alpha \le \gamma \le C_3 \alpha\) for sufficiently small \(\alpha\), then \(\gamma = \varTheta (\alpha )\).

Definition 2

(Neighborhood) For a given \(\delta >0\), let the neighborhood around \((x^*, \lambda ^*)\) be defined by \({\mathcal {B}}( (x^*, \lambda ^*), \delta ) = \{ (x,\lambda ) : \Vert (x,\lambda )-(x^*, \lambda ^*) \Vert \leq \delta \}\).

Assumption 1

(Strict local minimizer) The vector \((x^*, \lambda ^*)\) satisfies (1), i.e., second-order sufficient optimality conditions and strict complementarity.

The following two results provide the theoretical framework and additional definitions of various quantities. In particular, they establish the existence of a neighborhood in which the Jacobian is nonsingular, and of a Lipschitz continuous barrier trajectory parameterized by the barrier parameter \(\mu\). The results are well known and can be found in, e.g., the work of Ortega and Rheinboldt [26] and of Byrd et al. [3], whose setting is similar to the one in this work.

Lemma 1

Under Assumption 1 there exists \(\delta >0\) such that \(F'(x,\lambda )\) is continuous and nonsingular for \((x,\lambda ) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\) and

$$\begin{aligned} \Vert F'(x, \lambda ) ^{-1}\Vert \le M, \end{aligned}$$

for some constant \(M>0\).

Proof

See [26, p. 46]. \(\square\)

Lemma 2

Let Assumption 1 hold and let \({\mathcal {B}}((x^*, \lambda ^*), \delta )\) be defined by Lemma 1. Then there exists \({\hat{\mu }}>0\) such that for each \(0<\mu \le {\hat{\mu }}\) there is a Lipschitz continuous function \((x^{\mu },\lambda ^{\mu }) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\) that satisfies \(F_{\mu }(x^{\mu },\lambda ^{\mu }) = 0\) and

$$\begin{aligned} \left\| \left( x^{\mu }, \lambda ^{\mu } \right) - \left( x^*, \lambda ^* \right) \right\| \le C_4 \mu , \end{aligned}$$

where \(C_4 = \inf _{(x,\lambda ) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )} \Vert F'(x,\lambda ) ^{-1}\frac{ \partial F_{\mu } (x,\lambda )}{\partial \mu } \Vert\).

Proof

The result follows from the implicit function theorem, see e.g., [26, p. 128].\(\square\)

The next lemma relates the measure \(\Vert F_\mu (x,\lambda ) \Vert\) to the distance between the barrier trajectory and vectors \((x,\lambda )\) that are sufficiently close. An analogous result is given by Byrd et al. [3].

Lemma 3

Under Assumption 1, let \({\mathcal {B}}((x^*, \lambda ^*), \delta )\) and \({\hat{\mu }}\) be defined by Lemma 1 and Lemma 2 respectively. For \(0<\mu \le {\hat{\mu }}\) and \((x,\lambda )\) sufficiently close to \((x^{\mu },\lambda ^{\mu }) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\) there exist constants \(C_5, C_6 > 0\) such that

$$\begin{aligned} C_5 \left\| \left( x , \lambda \right) - \left( x^{\mu } , \lambda ^{\mu } \right) \right\| \le \Vert F_{\mu }(x,\lambda ) \Vert \le C_6 \left\| \left( x ,\lambda \right) - \left( x^{\mu }, \lambda ^{\mu } \right) \right\| . \end{aligned}$$

Proof

See [3, p. 43]. \(\square\)

Recall that the reduction of \(\mu\) can be determined with the condition \(\Vert F_\mu (x,\lambda )\Vert <C_1 \mu\), for some constant \(C_1>0\). It can be shown that vectors \((x,\lambda )\) which satisfy this condition, and are sufficiently close to the barrier trajectory, have their individual components bounded within certain intervals for sufficiently small \(\mu\). The component indices can be partitioned into two sets depending on how close the corresponding components are to their feasibility bounds, see Definition 3. The orders of magnitude of the individual components, which are given in Lemma 4 below, will be of importance in the derivation of various approximate solutions to (2).

Definition 3

(Active/inactive constraint) For a given \(x^* \ge 0\), constraint \(i\in \{1, \dots , n \}\) is defined as active if \(x_i^* = 0\) and inactive if \(x_i^* > 0\). The corresponding active and inactive sets are defined as \({\mathcal {A}} = \{i\in \{1, \dots , n \} : x^*_i = 0 \}\) and \({\mathcal {I}}=\{i\in \{1,\dots ,n\}:x^*_i>0\}\) respectively.

Lemma 4

Under Assumption 1, let \({\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) and \({\hat{\mu }}\) be defined by Lemma 1 and Lemma 2 respectively. Then there exists \({\bar{\mu }}\), with \(0 <{\bar{\mu }}\le {\hat{\mu }}\), such that for \(0 < \mu \le {{\bar{\mu }}}\) and \((x,\lambda )\) sufficiently close to \((x^{\mu }, \lambda ^{\mu }) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\) so that \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\) it holds that

$$\begin{aligned} x_i = {\left\{ \begin{array}{ll} {\mathcal {O}}(\mu ) &{} i \in {\mathcal {A}}, \\ \varTheta (1) &{} i \in {\mathcal {I}}, \end{array}\right. } \quad \lambda _i = {\left\{ \begin{array}{ll} \varTheta (1) &{} i \in {\mathcal {A}}, \\ {\mathcal {O}}(\mu ) &{} i \in {\mathcal {I}}. \end{array}\right. } \end{aligned}$$
(4)

Proof

Under Assumption 1 it holds that

$$\begin{aligned} x_i^* = {\left\{ \begin{array}{ll} 0 &{} i \in {\mathcal {A}}, \\ c_i &{} i \in {\mathcal {I}}, \end{array}\right. } \quad \lambda _i^* = {\left\{ \begin{array}{ll} c_i &{} i \in {\mathcal {A}}, \\ 0 &{} i \in {\mathcal {I}}, \end{array}\right. } \end{aligned}$$

where \(c_i = \varTheta (1)\), \(i = 1,\dots , n\). The function \((x^{\mu }, \lambda ^{\mu })\) is Lipschitz continuous in \(\mu\) and hence for each \(\mu \le {\hat{\mu }}\) it holds that \((x^{\mu }, \lambda ^{\mu }) \in {\mathcal {B}}\left( (x^*, \lambda ^*), L \mu \right)\), where \(L\) is the Lipschitz constant of \((x^{\mu }, \lambda ^{\mu })\) on \({\mathcal {B}}((x^*, \lambda ^*), \delta )\). There exists \({\bar{\mu }}_1\), with \(0 < {\bar{\mu }}_1 \le {\hat{\mu }}\), such that for \(0 < \mu \le {\bar{\mu }}_1\) it holds that

$$\begin{aligned} x_i^\mu = {\left\{ \begin{array}{ll} {\mathcal {O}}(\mu ) &{} i \in {\mathcal {A}}, \\ \varTheta (1) &{} i \in {\mathcal {I}}, \end{array}\right. } \quad \lambda _i^\mu = {\left\{ \begin{array}{ll} \varTheta (1) &{} i \in {\mathcal {A}}, \\ {\mathcal {O}}(\mu ) &{} i \in {\mathcal {I}}. \end{array}\right. } \end{aligned}$$

The condition \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\) implies that there exists a constant \(C_1>0\) such that \(\Vert F_{\mu } (x,\lambda ) \Vert \le C_1 \mu\). Lemma 3 and \(\Vert F_{\mu } (x,\lambda ) \Vert \le C_1 \mu\) give

$$\begin{aligned} \left\| \left( x ,\lambda \right) - \left( x^{\mu } , \lambda ^{\mu } \right) \right\| \le \frac{1}{ C_5 }\Vert F_{\mu }(x,\lambda ) \Vert \le \frac{C_1}{C_5} \mu , \end{aligned}$$

which implies that \((x,\lambda ) \in {\mathcal {B}}\left( (x^\mu , \lambda ^\mu ), \frac{C_1}{ C_5} \mu \right)\). By the same argument as above, there exists \({\bar{\mu }}_2\), with \(0 < {\bar{\mu }}_2 \le {\hat{\mu }}\), such that the result follows for \(0<\mu \le {\bar{\mu }}\) with \({\bar{\mu }}= \min \{{\bar{\mu }}_1, {\bar{\mu }}_2 \}\). \(\square\)

The result of Lemma 4 involves two regions of \(\mu\). The first region, \(0<\mu \le {\hat{\mu }}\), is where the barrier trajectory \((x^\mu , \lambda ^\mu )\) exists, and the second region, \(0 < \mu \le {\bar{\mu }}\le {\hat{\mu }}\), is where the asymptotic behavior occurs.

3 Approximate solutions

This section first introduces the groundwork for the ideas that underlie the results. It is followed by a subsection containing approximate solutions for specific components of the solution of (2), together with related results. The last subsection contains procedures for approximating the full solution of (2), as well as related results. Under Assumption 1 it holds that

$$\begin{aligned} \lim _{\mu \rightarrow 0} x^\mu _i = 0, \; i \in {\mathcal {A}}, \; \text{ and } \; \lim _{\mu \rightarrow 0} \lambda _i^\mu = 0, \quad i \in {\mathcal {I}}, \end{aligned}$$

in consequence, the Schur complement of X in (2) becomes increasingly ill-conditioned as \(\mu \rightarrow 0\). These properties have been utilized by several authors before, e.g., in the development of preconditioners [10, 13]. The idea in this work is to exploit them, together with the additional property that (P) only has bound constraints, to obtain partial or full approximate solutions of (2), in particular by utilizing the structure and the asymptotic behavior of the coefficients in the arising systems of linear equations. With the partition \((\varDelta x^N, \varDelta \lambda ^N) = (\varDelta x_{\mathcal {A}}^N, \varDelta x_{\mathcal {I}}^N, \varDelta \lambda _{\mathcal {A}}^N, \varDelta \lambda _{\mathcal {I}}^N)\), (2) can be written as

$$\begin{aligned} \begin{bmatrix} H_{{\mathcal {A}} {\mathcal {A}}} &{} H_{{\mathcal {A}} {\mathcal {I}}} &{} -I_{{\mathcal {A}} {\mathcal {A}}} &{} \\ H_{{\mathcal {I}} {\mathcal {A}}} &{} H_{{\mathcal {I}} {\mathcal {I}}} &{} &{} -I_{{\mathcal {I}}{\mathcal {I}}} \\ \varLambda _{{\mathcal {A}} {\mathcal {A}}} &{} &{} X_{{\mathcal {A}} {\mathcal {A}}} &{} \\ &{} \varLambda _{{\mathcal {I}} {\mathcal {I}}} &{} &{} X_{{\mathcal {I}} {\mathcal {I}}} \end{bmatrix} \begin{bmatrix} \varDelta x_{\mathcal {A}}^N \\ \varDelta x_{\mathcal {I}}^N \\ \varDelta \lambda _{\mathcal {A}}^N \\ \varDelta \lambda _{\mathcal {I}}^N \end{bmatrix} = -\begin{bmatrix} \nabla f(x)_{\mathcal {A}} - \lambda _{\mathcal {A}} \\ \nabla f(x)_{\mathcal {I}} - \lambda _{\mathcal {I}} \\ \varLambda _{{\mathcal {A}} {\mathcal {A}}} X_{{\mathcal {A}} {\mathcal {A}}} e - \mu e \\ \varLambda _{{\mathcal {I}} {\mathcal {I}}} X_{{\mathcal {I}} {\mathcal {I}}} e - \mu e \end{bmatrix}, \end{aligned}$$
(5)

where the first and second sets in the matrix subscripts give the indices of rows and columns respectively. Elimination of \(\varDelta \lambda ^N = -\lambda + \mu X ^{-1}e - X ^{-1}\varLambda \varDelta x^N\) from the first two blocks by means of the last two gives the Schur complement of \(X_{{\mathcal {A}} {\mathcal {A}}}\) and \(X_{{\mathcal {I}} {\mathcal {I}}}\) in (5),

$$\begin{aligned} \begin{bmatrix} H_{{\mathcal {A}} {\mathcal {A}}} +X_{{\mathcal {A}} {\mathcal {A}}} ^{-1}\varLambda _{{\mathcal {A}} {\mathcal {A}}} &{} H_{{\mathcal {A}} {\mathcal {I}}} \\ H_{{\mathcal {I}} {\mathcal {A}}} &{} H_{{\mathcal {I}} {\mathcal {I}}} + X_{{\mathcal {I}} {\mathcal {I}}}^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \end{bmatrix} \begin{bmatrix} \varDelta x_{\mathcal {A}}^N \\ \varDelta x_{\mathcal {I}}^N \end{bmatrix} = - \begin{bmatrix} \nabla f(x)_{\mathcal {A}} - \mu X_{{\mathcal {A}} {\mathcal {A}}} ^{-1}e \\ \nabla f(x)_{\mathcal {I}} - \mu X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}e \\ \end{bmatrix}. \end{aligned}$$
(6)

By continuity of \((x^{\mu }, \lambda ^{\mu })\) it follows that \(x_i \rightarrow 0\), \(i\in {\mathcal {A}}\), and \(\lambda _i \rightarrow 0\), \(i\in {\mathcal {I}}\), as \(\mu \rightarrow 0\). In consequence, \(\varLambda _{{\mathcal {A}} {\mathcal {A}}}\) and \(X_{{\mathcal {I}} {\mathcal {I}}}\) dominate the coefficients of the third and fourth blocks of (5) respectively for sufficiently small \(\mu\) under strict complementarity. Similarly, \(X_{{\mathcal {A}} {\mathcal {A}}} ^{-1}\varLambda _{{\mathcal {A}} {\mathcal {A}}}\) dominates the coefficients of the first block of (6). Consequently, approximate solutions of \(\varDelta x_{\mathcal {A}}^N\) and \(\varDelta \lambda _{\mathcal {I}}^N\) can be obtained from the third and fourth blocks of (5), and an approximate solution of \(\varDelta x_{\mathcal {A}}^N\) from the first block of (6). These approximations can then be inserted into (5), or (6), to obtain a reduced system of size \(| {\mathcal {I}} |\) \(\times\) \(| {\mathcal {I}} |\) that involves \(H_{{\mathcal {I}} {\mathcal {I}}}\). The solution of this system gives an approximation of \(\varDelta x_{\mathcal {I}}^N\). These observations, together with Lemma 4 and Lemma 5 below, provide the foundation for the results. The essence of Lemma 5 is that the norm of the solution of (2) is bounded by a constant times \(\mu\).

Lemma 5

Under Assumption 1, let \({\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) and \({\hat{\mu }}\) be defined by Lemma 1 and Lemma 2 respectively. For \(0< \mu \le {\hat{\mu }}\) and \((x,\lambda ) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\), let \((\varDelta x^N, \varDelta \lambda ^N)\) be the solution of (2) with \(\mu ^+ = \sigma \mu\), where \(0< \sigma < 1\). If \((x,\lambda )\) is sufficiently close to \((x^{\mu }, \lambda ^{\mu }) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\) such that \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\) then

$$\begin{aligned} \left\| \left( \varDelta x^N ,\varDelta \lambda ^N \right) \right\| = {\mathcal {O}}(\mu ). \end{aligned}$$

Proof

By (2) it holds that

$$\begin{aligned} \left\| \left( \varDelta x^N ,\varDelta \lambda ^N \right) \right\|&= \left\| F'(x,\lambda ) ^{-1}F_{\mu ^+} (x, \lambda ) \right\| \\&= \big \Vert F'(x,\lambda ) ^{-1}\left[ F_{\mu ^+} (x, \lambda ) - F_{\mu ^+} (x^{\mu ^+}, \lambda ^{\mu ^+}) \right] \big \Vert . \end{aligned}$$

Continuity of \(F'\) on \({\mathcal {B}}((x^*, \lambda ^*), \delta )\) implies that \(F_{\mu ^+}\) is Lipschitz continuous. Moreover, both \((x,\lambda )\) and \((x^{\mu ^+}, \lambda ^{\mu ^+})\) belong to \({\mathcal {B}}((x^*, \lambda ^*), \delta )\). Lipschitz continuity of \(F_{\mu ^+}\) and Lemma 1 yield

$$\begin{aligned} \left\| \left( \varDelta x^N ,\varDelta \lambda ^N \right) \right\| \le M L_{F_{\mu^+}}\big \Vert \left( x , \lambda \right) - ( x^{\mu ^+} , \lambda ^{\mu ^+} ) \big \Vert . \end{aligned}$$

Addition and subtraction of \((x^\mu , \lambda ^\mu )\) in the norm of the right-hand side give

$$\begin{aligned} \left\| \left( \varDelta x^N ,\varDelta \lambda ^N \right) \right\|&\le M L_{F_{\mu^+}} \big \Vert \left( x , \lambda \right) - \left( x^{\mu } , \lambda ^{\mu } \right) + \left( x^{\mu } , \lambda ^{\mu } \right) - ( x^{\mu ^+} , \lambda ^{\mu ^+} ) \big \Vert \\&\le M L_{F_{\mu^+}} \left( \big \Vert \left( x , \lambda \right) - \left( x^{\mu } , \lambda ^{\mu } \right) \big \Vert + \big \Vert \left( x^{\mu }, \lambda ^{\mu } \right) - ( x^{\mu ^+} , \lambda ^{\mu ^+} ) \big \Vert \right) \\&\le M L_{F_{\mu^+}} \left( \frac{1}{C_5} \big \Vert F_\mu (x,\lambda ) \big \Vert + C_4(1-\sigma )\mu \right) \\&\le M L_{F_{\mu^+}}\left( \frac{C_1}{C_5} + C_4(1-\sigma ) \right) \mu , \end{aligned}$$

where the second-to-last inequality follows from Lemma 3 and Lipschitz continuity of \((x^{\mu }, \lambda ^{\mu })\). The last inequality follows from \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\), i.e., there exists a constant \(C_1>0\) such that \(\Vert F_{\mu } (x,\lambda ) \Vert \le C_1 \mu\). \(\square\)

3.1 Partial approximate solutions

In this section we initially propose an approximate solution of \(\varDelta x_{\mathcal {A}}^N\) which originates from the Schur complement form (6). This approximate solution will be labeled with superscript “S” due to its origin. As \(\mu \rightarrow 0\), the diagonal elements of the (1,1)-block become large and dominate the coefficients of the matrix under strict complementarity. In Proposition 1 we show that an approximate solution of \(\varDelta x_{\mathcal {A}}^N\) can be obtained by neglecting all off-diagonal coefficients in the first block of (6). Thereafter, we propose another approximate solution of \(\varDelta x_{\mathcal {A}}^N\), as well as one of \(\varDelta \lambda _{\mathcal {I}}^N\), which originate from the complementarity blocks of (5). These approximate solutions will be labeled with superscript “C” due to their origin. The solutions are obtained by neglecting the coefficients in the complementarity blocks which approach zero as \(\mu \rightarrow 0\), i.e., those in \(X_{{\mathcal {A}} {\mathcal {A}}}\) and \(\varLambda _{{\mathcal {I}} {\mathcal {I}}}\). The resulting partial approximate solutions are given below in Proposition 2. The essence of both results is that, under certain conditions, the asymptotic component error bounds are in the order of \(\mu ^2\). Finally, we motivate and propose two Newton-like approaches which we later investigate numerically.

Proposition 1

Under Assumption 1, let \({\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) and \({\hat{\mu }}\) be defined by Lemma 1 and Lemma 2 respectively. For \((x,\lambda ) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\), let \((\varDelta x^N, \varDelta \lambda ^N)\) be the solution of (2) with \(\mu ^+ = \sigma \mu\), where \(0< \sigma < 1\). If the search direction components are defined as

$$\begin{aligned} \varDelta x_i^S = -\frac{ x_i [\nabla f(x)]_i - \mu ^+}{ x_i \left[ \nabla ^2 f(x)\right] _{ii} + \lambda _i}, \quad i=1,\dots ,n, \end{aligned}$$
(7)

then

$$\begin{aligned} \varDelta x_i^S - \varDelta x_i^N = \frac{x_i}{ x_i \left[ \nabla ^2 f(x)\right] _{ii} + \lambda _i} \sum _{j \ne i} \left[ \nabla ^2 f(x)\right] _{ij} \varDelta x_j^N, \quad i=1,\dots ,n. \end{aligned}$$
(8)

Assume in addition that \(0 < \mu \le {\hat{\mu }}\) and \((x,\lambda )\) is sufficiently close to \((x^{\mu }, \lambda ^{\mu })\in {\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) such that \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\). Then there exists \({\bar{\mu }}\), with \(0 <{\bar{\mu }}\le {\hat{\mu }}\), such that for \(0 < \mu \le {\bar{\mu }}\) it holds that

$$\begin{aligned} \frac{1}{x_i \left[ \nabla ^2 f(x)\right] _{ii} + \lambda _i} = \varTheta (1), \quad i=1,\dots ,n, \end{aligned}$$
(9)

and

$$\begin{aligned} | \varDelta x_i^S - \varDelta x^N_i | = {\mathcal {O}}(\mu ^2), \quad i \in {\mathcal {A}}. \end{aligned}$$
(10)

Proof

The solution of (2) for \(\varDelta x^N\) is equivalent to the solution of (6), whose i’th row, \(i=1,\dots ,n\), is

$$\begin{aligned} \sum _{j \ne i}^n \left[ \nabla ^2 f(x) \right] _{ij} \varDelta x_{j}^N + \left( \left[ \nabla ^2 f(x) \right] _{ii} + \frac{\lambda _i}{x_i} \right) \varDelta x_i^N = -\left( \left[ \nabla f(x) \right] _i - \frac{\mu ^+}{x_i} \right) . \end{aligned}$$
(11)

If \(x_i \left[ \nabla ^2 f(x)\right] _{ii} + \lambda _i \ne 0\) then (11) can be written as

$$\begin{aligned} \varDelta x_i^N&= \frac{x_i}{ x_i \left[ \nabla ^2 f(x) \right] _{ii} + \lambda _i} \left( -\left( \left[ \nabla f(x) \right] _i - \frac{\mu ^+}{x_i} \right) - \sum _{j \ne i}^n \left[ \nabla ^2 f(x) \right] _{ij} \varDelta x_{j}^N \right) \nonumber \\&= -\frac{ x_i [\nabla f(x)]_i - \mu ^+}{ x_i \left[ \nabla ^2 f(x)\right] _{ii} + \lambda _i} -\frac{x_i}{ x_i \left[ \nabla ^2 f(x)\right] _{ii} + \lambda _i} \sum _{j \ne i}^n \left[ \nabla ^2 f(x) \right] _{ij} \varDelta x_{j}^N. \end{aligned}$$
(12)

Subtraction of (12) from (7) gives (8). By Lemma 4 there exists \({\bar{\mu }}_3\), with \(0<{\bar{\mu }}_3 \le {\hat{\mu }}\), such that the components of \((x,\lambda )\) satisfy (4). Due to the boundedness of \(\nabla ^2 f\) on \({\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) there exists \({\bar{\mu }}_4\), with \(0<{\bar{\mu }}_4 \le {\hat{\mu }}\), such that (9) holds for \(0<\mu \le {\bar{\mu }}\) with \({\bar{\mu }}= \min \{{\bar{\mu }}_3, {\bar{\mu }}_4\}\). The result of (10) follows from application of Lemma 4 and Lemma 5 to (8) while taking (9) into account.\(\square\)

The approximate solution \(\varDelta x^S\) in (7) of Proposition 1 and its corresponding component error (8) may be undefined for certain components. However, the essence is that the expressions are well-defined sufficiently close to the barrier trajectory for sufficiently small \(\mu\), as shown by (9). In particular, the component errors of (10) are bounded by a constant times \(\mu ^2\) only for components \(i \in {\mathcal {A}}\), although the expressions (7) and associated errors (8) hold for all components \(i=1,\dots ,n\). An approximate solution that is guaranteed to have all its components well-defined can be obtained from the complementarity blocks of (5). This approximate solution, together with an approximate solution of \(\varDelta \lambda _{\mathcal {I}}^N\), is given in the proposition below.

Proposition 2

Under Assumption 1, let \({\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) and \({\hat{\mu }}\) be defined by Lemma 1 and Lemma 2 respectively. For \((x,\lambda ) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\), let \((\varDelta x^N, \varDelta \lambda ^N)\) be the solution of (2) with \(\mu ^+ = \sigma \mu\), where \(0< \sigma < 1\). If the search direction components are defined as

$$\begin{aligned} \varDelta x_i^C&= - x_i + \frac{\mu ^+}{\lambda _i},&i=1,\dots ,n, \end{aligned}$$
(13a)
$$\begin{aligned} \varDelta \lambda _i^C&= - \lambda _i + \frac{\mu ^+}{x_i},&i=1,\dots ,n, \end{aligned}$$
(13b)

then

$$\begin{aligned} \varDelta x_i^C - \varDelta x_i^N&= \frac{x_i}{\lambda _i} \varDelta \lambda _i^N,&i=1,\dots ,n, \end{aligned}$$
(14a)
$$\begin{aligned} \varDelta \lambda _i^C - \varDelta \lambda _i^N&= \frac{\lambda _i}{x_i} \varDelta x_i^N,&i=1,\dots ,n. \end{aligned}$$
(14b)

Assume in addition that \(0 < \mu \le {\hat{\mu }}\) and \((x,\lambda )\) is sufficiently close to \((x^{\mu }, \lambda ^{\mu })\in {\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) such that \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\). Then there exists \({\bar{\mu }}\), with \(0 <{\bar{\mu }}\le {\hat{\mu }}\), such that for \(0 < \mu \le {\bar{\mu }}\) it holds that

$$\begin{aligned}&| \varDelta x_i^C - \varDelta x_i^N | = {\mathcal {O}}(\mu ^2),&i \in {\mathcal {A}}, \end{aligned}$$
(15a)
$$\begin{aligned}&| \varDelta \lambda _i^C - \varDelta \lambda _i^N | = {\mathcal {O}}(\mu ^2),&i \in {\mathcal {I}}. \end{aligned}$$
(15b)

Proof

The i’th row, \(i=1,\dots ,n\), of the second block of (2) is

$$\begin{aligned} \lambda _i \varDelta x_i^N + x_i \varDelta \lambda _i^N = - \lambda _i x_i + \mu ^+. \end{aligned}$$

For \(x_i > 0\), \(\lambda _i > 0\), \(i=1,\dots , n\), it holds that

$$\begin{aligned} \varDelta x_i^N&= - x_i + \frac{\mu ^+}{\lambda _i} - \frac{x_i}{\lambda _i} \varDelta \lambda _i^N, \end{aligned}$$
(16a)
$$\begin{aligned} \varDelta \lambda _i^N&= - \lambda _i + \frac{\mu ^+}{x_i} - \frac{\lambda _i}{x_i} \varDelta x_i^N. \end{aligned}$$
(16b)

Subtraction of (16a) from (13a) and subtraction of (16b) from (13b) give (14a) and (14b) respectively. By Lemma 4 there exists \({\bar{\mu }}\), with \(0<{\bar{\mu }}\le {\hat{\mu }}\), such that the components of \((x,\lambda )\) satisfy (4) for \(0<\mu \le {\bar{\mu }}\). The result of (15) then follows from application of Lemma 5 to (14) while taking (4) into account.\(\square\)

The expressions for \(\varDelta x^C_i\) and \(\varDelta \lambda ^C_i\), (13a) and (13b) respectively, and their associated component errors (14a) and (14b) respectively, hold for all components. The essence of the results in Proposition 2 is that the component errors are bounded by a constant times \(\mu ^2\) only for certain components. Specifically, for \(\varDelta x^C_i\), \(i \in {\mathcal {A}}\), and \(\varDelta \lambda _i^C\), \(i \in {\mathcal {I}}\). Both \(\varDelta x^S_i\) given by (7) and \(\varDelta x^C_i\) given by (13a) provide approximate solutions of \(\varDelta x_i^N\), \(i \in {\mathcal {A}}\), with similar asymptotic error bounds. Note that the order of the approximation error, \(\Vert \varDelta x_{\mathcal {A}} - \varDelta x_{\mathcal {A}}^N \Vert\), is maintained even if some components \(i \in {\mathcal {A}}\) are updated with (7) and others with (13a). Which expression to use can hence be chosen individually for each index \(i \in {\mathcal {A}}\). The factors in front of \(\varDelta x_i^N\) and \(\varDelta \lambda _i^N\), \(i =1,\dots , n\), in the component errors of (8) and (14) respectively may be used as an indicator for which of the approximations to use, and also whether either expression is likely to provide an accurate approximation. Note also that the approximate solution \(\varDelta x^C\) given by (13a) does not take into account any information from the first block equation of (2), whereas \(\varDelta x^S\) given by (7) includes information from both blocks.
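To emphasize how inexpensive the partial approximate solutions are, the component formulas (7) and (13) can be written as the following Julia one-liners for a single index i. The scalar arguments gi and Hii, denoting \([\nabla f(x)]_i\) and \([\nabla ^2 f(x)]_{ii}\), and the name mu_plus for \(\mu ^+ = \sigma \mu\), are illustrative assumptions.

```julia
# Partial approximate steps for one component i; mu_plus = σμ.
dx_S(xi, li, gi, Hii, mu_plus) = -(xi * gi - mu_plus) / (xi * Hii + li)  # (7)
dx_C(xi, li, mu_plus) = -xi + mu_plus / li                               # (13a)
dl_C(xi, li, mu_plus) = -li + mu_plus / xi                               # (13b)
```

Each formula costs a handful of floating-point operations per component, in contrast to the full Newton solve (2).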

Provided that the norm of the combined steps \(\varDelta x_{\mathcal {A}}^N\) and \(\varDelta \lambda _{\mathcal {I}}^N\) is not smaller than the approximation error, stepping in these components with (7) or (13) gives a vector which is no further from the Newton iterate. This is formalized in Proposition 3 below.

Proposition 3

Under Assumption 1, let \({\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) and \({\hat{\mu }}\) be defined by Lemma 1 and Lemma 2 respectively. For \((x,\lambda ) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\), define \((x_+^{N},\lambda _+^{N})=(x,\lambda )+ (\varDelta x^N,\varDelta \lambda ^N)\), where \((\varDelta x^N, \varDelta \lambda ^N)\) is the solution of (2) with \(\mu ^+ = \sigma \mu\) for \(0< \sigma < 1\). Moreover, let \((x_+, \lambda _+) = (x,\lambda ) + (\varDelta x, \varDelta \lambda )\) where

$$\begin{aligned} \varDelta x_i = {\left\{ \begin{array}{ll} \varDelta x_i^S \text{ or } \varDelta x_i^C &{} i \in {\mathcal {A}}, \\ 0 &{} i \in {\mathcal {I}}, \end{array}\right. } \quad \varDelta \lambda _i = {\left\{ \begin{array}{ll} 0 &{} i \in {\mathcal {A}}, \\ \varDelta \lambda _i^C &{} i \in {\mathcal {I}}, \end{array}\right. } \end{aligned}$$
(17)

with \(\varDelta x_i^C\), \(\varDelta \lambda _i^C\) and \(\varDelta x_i^S\) given by (13) and (7) respectively. Assume that \(0 < \mu \le {\hat{\mu }}\), \(\Vert (\varDelta x_{\mathcal {A}}^N, \varDelta \lambda _{\mathcal {I}}^N) \Vert =\varOmega ( \mu ^\gamma )\) for \(0\le \gamma < 2\), and \((x,\lambda )\) is sufficiently close to \((x^{\mu }, \lambda ^{\mu })\in {\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) such that \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\). Then there exists \({\bar{\mu }}\), with \(0 <{\bar{\mu }}\le {\hat{\mu }}\), such that for \(0 < \mu \le {\bar{\mu }}\) it holds that

$$\begin{aligned} \Vert (x_+^N,\lambda _+^N) - (x_+,\lambda _+) \Vert \le \Vert (x_+^N,\lambda _+^N) - (x,\lambda ) \Vert . \end{aligned}$$
(18)

Proof

With \((\varDelta x, \varDelta \lambda )\) defined as in (17) of the proposition it holds that

$$\begin{aligned}&\Vert (x_+^N,\lambda _+^N) - (x_+,\lambda _+) \Vert ^2 - \Vert (x_+^N,\lambda _+^N) - (x,\lambda ) \Vert ^2 \nonumber \\&=\Vert (\varDelta x^N - \varDelta x, \varDelta \lambda ^N - \varDelta \lambda ) \Vert ^2 - \Vert (\varDelta x^N, \varDelta \lambda ^N) \Vert ^2 \nonumber \\&=\Vert (\varDelta x_{\mathcal {A}}^N - \varDelta x_{\mathcal {A}}, \varDelta \lambda _{\mathcal {I}}^N - \varDelta \lambda _{\mathcal {I}}) \Vert ^2 - \Vert (\varDelta x_{\mathcal {A}}^N, \varDelta \lambda _{\mathcal {I}}^N) \Vert ^2. \end{aligned}$$
(19)

By Proposition 1 and Proposition 2 there exist \({\bar{\mu }}_5\) and \({\bar{\mu }}_6\) respectively, with \(0 < {\bar{\mu }}_i \le {\hat{\mu }}\), \(i=5,6,\) such that for \(\varDelta x_i\) equal to \(\varDelta x_i^S \text{ or } \varDelta x_i^C\) it holds that \(| \varDelta x_i - \varDelta x_i^N | ={\mathcal {O}}(\mu ^2)\), \(i \in {\mathcal {A}}\), for \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). By Proposition 2 it also holds that \(| \varDelta \lambda _i^C - \varDelta \lambda _i^N | = {\mathcal {O}}(\mu ^2)\), \(i \in {\mathcal {I}}\), for \(0 < \mu \le {\bar{\mu }}_6\). Hence, for \(0<\mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\), there exist constants \(C_7>0\) and \(C_8>0\), where \(C_8\) comes from the condition \(\Vert (\varDelta x_{\mathcal {A}}^N, \varDelta \lambda _{\mathcal {I}}^N) \Vert =\varOmega ( \mu ^\gamma )\), \(0\le \gamma < 2\), such that

$$\begin{aligned} \Vert (\varDelta x_{\mathcal {A}}^N - \varDelta x_{\mathcal {A}}, \varDelta \lambda _{\mathcal {I}}^N - \varDelta \lambda _{\mathcal {I}}) \Vert ^2 - \Vert (\varDelta x_{\mathcal {A}}^N, \varDelta \lambda _{\mathcal {I}}^N) \Vert ^2 \le C_7^2 \mu ^4 - C_8^2 \mu ^{2\gamma }. \end{aligned}$$
(20)

The right-hand side of (20) is non-positive for \(0<\mu \le (C_8/C_7)^{\frac{1}{2-\gamma }}\), \(0\le \gamma < 2\). Combining (19)–(20) with \({\bar{\mu }}= \min \{{\bar{\mu }}_5, {\bar{\mu }}_6, (C_8/C_7)^{\frac{1}{2-\gamma }} \}\) gives the result. \(\square\)

The partial approximate solution (17) of Proposition 3 is computationally inexpensive compared to solving (2). In consequence, (18) motivates the study of Newton-like approaches which make use of (17). We will construct two such approaches where the idea is to utilize the intermediate iterate

$$\begin{aligned} (x^{E}, \lambda ^{E}) = (x+\varDelta x^E, \lambda +\varDelta \lambda ^E), \end{aligned}$$
(21)

with \((\varDelta x^E, \varDelta \lambda ^E)\) as in (17). It is thus only the active components of x and the inactive components of \(\lambda\) that are updated in the step to \((x^E, \lambda ^E)\). For simplicity we describe the ideas for unit step length; in practice the iterates would be required to be strictly feasible.
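The intermediate step can be sketched in Julia as follows. The index sets Aset and Iset are estimates of \({\mathcal {A}}\) and \({\mathcal {I}}\), and the fraction-to-boundary factor 0.995 is our assumed safeguard for strict feasibility; the description above uses unit steps.

```julia
# Intermediate step (21): update x on the (estimated) active set and
# λ on the inactive set using (13); damp to stay strictly feasible.
function intermediate_step(x, lam, mu_plus, Aset, Iset)
    dx = zeros(length(x)); dl = zeros(length(lam))
    for i in Aset
        dx[i] = -x[i] + mu_plus / lam[i]      # Δx_i^C, (13a)
    end
    for i in Iset
        dl[i] = -lam[i] + mu_plus / x[i]      # Δλ_i^C, (13b)
    end
    # Fraction-to-boundary step lengths (assumed safeguard).
    aP = min(1.0, 0.995 * minimum((dx[i] < 0 ? -x[i] / dx[i] : Inf) for i in eachindex(x)))
    aD = min(1.0, 0.995 * minimum((dl[i] < 0 ? -lam[i] / dl[i] : Inf) for i in eachindex(lam)))
    return x .+ aP .* dx, lam .+ aD .* dl
end
```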

The first approach is based on the observation that solving a Newton system from the iterate \((x^E,\lambda ^E)\) offers potential improvement when \((x^E, \lambda ^E)\) is strictly feasible and lies in \({\mathcal {B}}((x^*, \lambda ^*), \delta )\). A full iteration in the approach consists of the approximate intermediate step (21) together with the solution of

$$\begin{aligned} F'(x^E,\lambda ^E) \begin{bmatrix} \varDelta x \\ \varDelta \lambda \end{bmatrix} = - F_\mu (x^E, \lambda ^E), \end{aligned}$$
(22)

and the step \((x^E + \varDelta x, \lambda ^E + \varDelta \lambda )\).

The idea of the second approach is to update the coefficients in the complementarity blocks of the matrix in (2). The approach may hence, under strict complementarity, be interpreted as an approximate higher-order method. A full iteration consists of the step (21), the solution of

$$\begin{aligned} \begin{bmatrix} H &{} -I \\ \varLambda ^E &{} X^E \end{bmatrix} \begin{bmatrix} \varDelta x \\ \varDelta \lambda \end{bmatrix} = - F_\mu (x, \lambda ), \end{aligned}$$
(23)

where \(\varLambda ^E = \text{ diag }(\lambda ^E)\) and \(X^E = \text{ diag }(x^E)\), together with the step \((x+\varDelta x, \lambda +\varDelta \lambda )\). The approach may hence also be interpreted as a modified Newton method where the Jacobian of each Newton system is altered.
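A minimal Julia sketch of one iteration of the second approach, under the same assumed callbacks g and H as before, is:

```julia
using LinearAlgebra

# One iteration of the second approach, cf. (23): the complementarity
# blocks of the Jacobian are evaluated at (x^E, λ^E) from (21), while
# the right-hand side -F_mu is kept at (x, λ).
function modified_newton_step(g, H, x, lam, xE, lamE, mu)
    n = length(x)
    In = Matrix{Float64}(I, n, n)
    J = [H(x) -In; Diagonal(lamE) Diagonal(xE)]
    d = -(J \ vcat(g(x) - lam, lam .* x .- mu))
    return x .+ d[1:n], lam .+ d[n+1:2n]
end
```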

Numerical results for the approximate intermediate step and the approximate higher-order approach are shown in Sect. 4. The results are for bound-constrained quadratic optimization problems where strict complementarity typically does not hold. The complexity of each iteration in both approaches is the same as for Newton’s method. The hope is thus to reduce the total number of iterations necessary for convergence. See the work by Gondzio and Sobral [14] for quasi-Newton approaches for quadratic problems where each iteration is inexpensive in comparison to the approaches above.

3.2 Full approximate solution

In this section we propose approximate solutions of (2) that, in the considered framework, have an asymptotic error bound in the order of \(\mu ^2\). The full approximate solutions are obtained by utilizing either of the partial approximate solutions of \(\varDelta x_{\mathcal {A}}^N\) in Proposition 1 or Proposition 2 while exploiting structure in the systems that arise. Specifically, suppose that an approximate \(\varDelta x_{{\mathcal {A}}}\) is given, e.g., \(\varDelta x^S_{\mathcal {A}}\) given by (7) or \(\varDelta x^C_{\mathcal {A}}\) given by (13a). Insertion of the approximate \(\varDelta x_{{\mathcal {A}}}\) into (5) yields

$$\begin{aligned} \begin{bmatrix} H_{{\mathcal {A}} {\mathcal {I}}} &{} -I_{{\mathcal {A}} {\mathcal {A}}} &{} \\ H_{{\mathcal {I}} {\mathcal {I}}} &{} &{} -I_{{\mathcal {I}}{\mathcal {I}}} \\ &{} X_{{\mathcal {A}} {\mathcal {A}}} &{} \\ \varLambda _{{\mathcal {I}} {\mathcal {I}}} &{} &{} X_{{\mathcal {I}} {\mathcal {I}}} \end{bmatrix} \begin{bmatrix} \varDelta x_{\mathcal {I}}^{ls} \\ \varDelta \lambda _{\mathcal {A}}^{ls} \\ \varDelta \lambda _{\mathcal {I}}^{ls} \end{bmatrix} = -\begin{bmatrix} \nabla f(x)_{\mathcal {A}} - \lambda _{\mathcal {A}}+ H_{{\mathcal {A}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}} \\ \nabla f(x)_{\mathcal {I}} - \lambda _{\mathcal {I}}+ H_{{\mathcal {I}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}} \\ \varLambda _{{\mathcal {A}} {\mathcal {A}}} X_{{\mathcal {A}} {\mathcal {A}}} e - \mu e + \varLambda _{{\mathcal {A}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}} \\ \varLambda _{{\mathcal {I}} {\mathcal {I}}} X_{{\mathcal {I}} {\mathcal {I}}} e - \mu e \end{bmatrix}, \end{aligned}$$
(24)

where the solution is given the superscript “ls” since it will lead to a least-squares system. The second and fourth blocks of (24) provide unique solutions of \(\varDelta x^{ls}_{\mathcal {I}}\) and \(\varDelta \lambda ^{ls}_{\mathcal {I}}\) which satisfy

$$\begin{aligned} \begin{bmatrix} H_{{\mathcal {I}} {\mathcal {I}}} &{} -I_{{\mathcal {I}}{\mathcal {I}}} \\ \varLambda _{{\mathcal {I}} {\mathcal {I}}} &{} X_{{\mathcal {I}} {\mathcal {I}}} \end{bmatrix} \begin{bmatrix} \varDelta x_{\mathcal {I}}^{ls} \\ \varDelta \lambda _{\mathcal {I}}^{ls} \end{bmatrix} = -\begin{bmatrix} \nabla f(x)_{\mathcal {I}} - \lambda _{\mathcal {I}} + H_{{\mathcal {I}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}} \\ \varLambda _{{\mathcal {I}} {\mathcal {I}}} X_{{\mathcal {I}} {\mathcal {I}}} e - \mu e \end{bmatrix}. \end{aligned}$$
(25)

The solution of (25) can be obtained by first solving the system corresponding to the Schur complement of \(X_{{\mathcal {I}} {\mathcal {I}}}\),

$$\begin{aligned} \left( H_{{\mathcal {I}} {\mathcal {I}}} + X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \right) \varDelta x_{\mathcal {I}}^{ls} = -\left( \nabla f(x)_{\mathcal {I}} + H_{{\mathcal {I}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}} \right) + \mu X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}e, \end{aligned}$$
(26)

and then

$$\begin{aligned} \varDelta \lambda _{{\mathcal {I}}}^{ls} = - X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\left( \varLambda _{{\mathcal {I}} {\mathcal {I}}} X_{{\mathcal {I}} {\mathcal {I}}} e - \mu e \right) - X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \varDelta x_{{\mathcal {I}}}^{ls}. \end{aligned}$$
(27)

Note that (26) can also be obtained by insertion of the given \(\varDelta x_{\mathcal {A}}\) into the second block of (6). The matrix of (26) is by Assumption 1 a symmetric positive definite \(( |{\mathcal {I}}|\) \(\times\) \(|{\mathcal {I}}| )\)-matrix. Moreover, under strict complementarity the matrix does not become increasingly ill-conditioned due to large elements in \(X ^{-1}\varLambda\) as \(\mu \rightarrow 0\), in contrast to the matrix of (6). The remaining part of the solution of (24), that is \(\varDelta \lambda _{\mathcal {A}}^{ls}\), is then given by

$$\begin{aligned} \begin{bmatrix} -I_{{\mathcal {A}} {\mathcal {A}}} \\ X_{{\mathcal {A}} {\mathcal {A}}} \end{bmatrix} \varDelta \lambda _{{\mathcal {A}}}^{ls} = -\begin{bmatrix} \nabla f(x)_{\mathcal {A}} - \lambda _{\mathcal {A}} + H_{{\mathcal {A}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}}+ H_{{\mathcal {A}} {\mathcal {I}}} \varDelta x_{{\mathcal {I}}}^{ls}\\ \varLambda _{{\mathcal {A}} {\mathcal {A}}} X_{{\mathcal {A}} {\mathcal {A}}} e - \mu e +\varLambda _{{\mathcal {A}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}} \end{bmatrix}. \end{aligned}$$
(28)

If the approximate \(\varDelta x_{\mathcal {A}}\) is exact, i.e., if \(\varDelta x_{\mathcal {A}} = \varDelta x_{\mathcal {A}}^N\), then \(\varDelta x_{\mathcal {I}}^{ls}= \varDelta x_{\mathcal {I}}^N\) by (26). In consequence, the over-determined system (28) has a unique solution that satisfies all equations, i.e., \(\varDelta \lambda _{\mathcal {A}}^{ls}\) is the corresponding part of the solution of (2). The solutions corresponding to the first and second block equations of (28) will be assigned superscripts “b” and “−” respectively. These are given by

$$\begin{aligned} \varDelta \lambda _{{\mathcal {A}}}^b = \nabla f(x)_{\mathcal {A}} - \lambda _{\mathcal {A}} + H_{{\mathcal {A}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}}+ H_{{\mathcal {A}} {\mathcal {I}}} \varDelta x_{{\mathcal {I}}}^{ls}, \end{aligned}$$
(29a)

and

$$\begin{aligned} \varDelta \lambda _{{\mathcal {A}}}^{-} = - \lambda _{\mathcal {A}} + \mu X_{{\mathcal {A}} {\mathcal {A}}} ^{-1}e - X_{{\mathcal {A}} {\mathcal {A}}} ^{-1}\varLambda _{{\mathcal {A}} {\mathcal {A}}} \varDelta x_{{\mathcal {A}}}. \end{aligned}$$
(29b)

Alternatively, \(\varDelta \lambda _{{\mathcal {A}}}^{ls}\) can be obtained as the least-squares solution of (28), that is,

$$\begin{aligned} \varDelta \lambda _{{\mathcal {A}}}^{ls} =&\left( I_{{\mathcal {A}} {\mathcal {A}}} + X_{{\mathcal {A}} {\mathcal {A}}}^2 \right) ^{-1}\Big [ \nabla f(x)_{\mathcal {A}} - \lambda _{\mathcal {A}} + H_{\mathcal {A} \mathcal {A}} \varDelta x_{\mathcal {A}} + H_{\mathcal {A} \mathcal {I}} \varDelta x_{\mathcal {I}}^{ls} \nonumber \\&- X_{\mathcal {A} \mathcal {A}} \left( \varLambda _{\mathcal {A} \mathcal {A}} X_{\mathcal {A} \mathcal {A}} e - \mu e +\varLambda _{\mathcal {A} \mathcal {A}} \varDelta x_{\mathcal {A}} \right) \Big ]. \end{aligned}$$
(30)

In Theorem 1 it is shown that, under certain conditions, both \(\varDelta \lambda _{\mathcal {A}}^b\) given by (29a) and \(\varDelta \lambda _{\mathcal {A}}^{ls}\) given by (30) can be used to approximate \(\varDelta \lambda _{\mathcal {A}}^N\) without affecting the order of the asymptotic error. Note however that this is not true for \(\varDelta \lambda _{\mathcal {A}}^{-}\) given by (29b), since its last term contains \(X_{\mathcal {A} \mathcal {A}} ^{-1}\), whose elements grow at least as fast as \(1/\mu\), multiplying the approximation error in \(\varDelta x_{\mathcal {A}}\).
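The assembly of the full approximate solution can be sketched in Julia as follows, for given estimates A and Iset of the index sets and a given partial approximation dxA from (7) or (13a). The dense solves and the names Hmat and g, denoting \(\nabla ^2 f(x)\) and \(\nabla f(x)\) evaluated at the current iterate, are illustrative assumptions.

```julia
using LinearAlgebra

# Full approximate solution of Sect. 3.2: solve (26) for Δx_I, then
# evaluate (27) for Δλ_I and (29a) for Δλ_A. Hmat = ∇²f(x), g = ∇f(x);
# A and Iset are the estimated active and inactive index sets.
function full_approx_step(Hmat, g, x, lam, dxA, A, Iset, mu)
    XI = Diagonal(x[Iset]); LI = Diagonal(lam[Iset])
    # (26): SPD system of size |I|; well-conditioned in the ideal case.
    K   = Hmat[Iset, Iset] + XI \ LI
    dxI = K \ (-(g[Iset] + Hmat[Iset, A] * dxA) + mu * (XI \ ones(length(Iset))))
    # (27): Δλ_I from the inactive complementarity block.
    dlI = XI \ (mu .- lam[Iset] .* x[Iset]) - (XI \ LI) * dxI
    # (29a): Δλ_A from the first block of (28).
    dlA = g[A] - lam[A] + Hmat[A, A] * dxA + Hmat[A, Iset] * dxI
    return dxA, dxI, dlA, dlI
end
```

Only the system in (26), of size \(|{\mathcal {I}}|\), requires a factorization; the remaining components are obtained by matrix-vector operations.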

Theorem 1

Under Assumption 1, let \({\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) and \({\hat{\mu }}\) be defined by Lemma 1 and Lemma 2 respectively. For \(0< \mu \le {\hat{\mu }}\) and \((x,\lambda ) \in {\mathcal {B}}((x^*, \lambda ^*), \delta )\), let \((\varDelta x^N, \varDelta \lambda ^N)\) be the solution of (2) with \(\mu ^+ = \sigma \mu\), where \(0< \sigma < 1\). Moreover, let the search direction components be defined as

$$\begin{aligned} \varDelta x_i = {\left\{ \begin{array}{ll} \varDelta x_i^S \text { or } \varDelta x_i^C &{} i \in {\mathcal {A}}, \\ \varDelta x_i^{ls} &{} i \in {\mathcal {I}}, \end{array}\right. } \quad \varDelta \lambda _i = {\left\{ \begin{array}{ll} \varDelta \lambda _i^{ls} \text { or } \varDelta \lambda _i^b &{} i \in {\mathcal {A}},\\ \varDelta \lambda _i^{ls} \text { or } \varDelta \lambda _i^{C} &{} i \in {\mathcal {I}}, \end{array}\right. } \end{aligned}$$

where \(\varDelta x_i^S\) is given by (7), \(\varDelta x_i^C\) by (13a), \(\varDelta x_i^{ls}\) by (26), \(\varDelta \lambda _i^{ls}\) by (30), \(\varDelta \lambda _i^b\) by (29a), \(\varDelta \lambda _i^{ls}\) by (27) and \(\varDelta \lambda _i^C\) by (13b). Assume that \(0 < \mu \le {\hat{\mu }}\) and \((x,\lambda )\) is sufficiently close to \((x^{\mu }, \lambda ^{\mu })\in {\mathcal {B}}\left( (x^*, \lambda ^*), \delta \right)\) such that \(\Vert F_{\mu } (x,\lambda ) \Vert = {\mathcal {O}}(\mu )\). Then there exists \({\bar{\mu }}\), with \(0 <{\bar{\mu }}\le {\hat{\mu }}\), such that for \(0 < \mu \le {\bar{\mu }}\) it holds that

$$\begin{aligned} \left\| (\varDelta x, \varDelta \lambda ) - (\varDelta x^N , \varDelta \lambda ^N) \right\| = {\mathcal {O}}(\mu ^2). \end{aligned}$$

Proof

The proof proceeds similarly to that of Proposition 3. By Proposition 1 and Proposition 2 there exist \({\bar{\mu }}_5\) and \({\bar{\mu }}_6\) respectively, with \(0 < {\bar{\mu }}_i \le {\hat{\mu }}\), \(i=5,6,\) such that for \(\varDelta x_i\) equal to \(\varDelta x_i^S \text{ or } \varDelta x_i^C\) it holds that \(| \varDelta x_i - \varDelta x_i^N | ={\mathcal {O}}(\mu ^2)\), \(i \in {\mathcal {A}}\), for \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). In consequence, \(\Vert \varDelta x_{\mathcal {A}} - \varDelta x_{\mathcal {A}}^N \Vert = {\mathcal {O}}(\mu ^2)\), \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). By Proposition 2 it also holds that \(| \varDelta \lambda _i^C - \varDelta \lambda _i^N | = {\mathcal {O}}(\mu ^2)\), \(i \in {\mathcal {I}}\), \(0 < \mu \le {\bar{\mu }}_6\). The backward error with \(\varDelta x^{ls}_{\mathcal {I}}\) as given in (26) is

$$\begin{aligned} \varDelta x^{ls}_{\mathcal {I}} - \varDelta x_{\mathcal {I}}^N = - \left( H_{\mathcal {I} \mathcal {I}} + X_{\mathcal {I} \mathcal {I}} ^{-1}\varLambda _{\mathcal {I} \mathcal {I}} \right) ^{-1}H_{\mathcal {I} \mathcal {A}} \left( \varDelta x_{\mathcal {A}} - \varDelta x_{\mathcal {A}}^N \right) , \end{aligned}$$

which gives

$$\begin{aligned} \left\| \varDelta x^{ls}_{\mathcal {I}} - \varDelta x_{\mathcal {I}}^N \right\|&\le \Vert \left( H_{{\mathcal {I}} {\mathcal {I}}} + X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \right) ^{-1}\Vert \Vert H_{{\mathcal {I}} {\mathcal {A}}} \Vert \Vert \varDelta x_{{\mathcal {A}}} - \varDelta x_{{\mathcal {A}}}^N \Vert \\&\le \frac{1}{\sigma _{min}\left( H_{{\mathcal {I}} {\mathcal {I}}} + X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \right) } \Vert H_{{\mathcal {I}} {\mathcal {A}}} \Vert \Vert \varDelta x_{{\mathcal {A}}} - \varDelta x_{\mathcal {A}}^N \Vert . \end{aligned}$$

Due to the assumptions on f, the elements of \(H_{{\mathcal {I}} {\mathcal {A}}}\) are bounded. Moreover, the smallest singular value of \(H_{{\mathcal {I}} {\mathcal {I}}} + X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}}\) is bounded away from zero since the matrix is positive definite by Assumption 1. Hence it follows that \(\| \varDelta x^{ls}_{\mathcal {I}} - \varDelta x_{\mathcal {I}}^N \| = {\mathcal {O}}(\mu ^2)\), \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). Note that \(\varDelta \lambda _{\mathcal {I}}^N\) is the solution of (27) with \(\varDelta x_{\mathcal {I}}^N\). Subtraction of (27) with \(\varDelta x_{\mathcal {I}}^N\) from (27) with the approximate solution \(\varDelta x_{\mathcal {I}}^{ls}\) gives \(\varDelta \lambda ^{ls}_{\mathcal {I}} - \varDelta \lambda ^{N}_{\mathcal {I}} = -X_{\mathcal {I} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \left( \varDelta x_{\mathcal {I}}^{ls} - \varDelta x^{N}_{\mathcal {I}} \right)\), and hence

$$\begin{aligned} \Vert \varDelta \lambda ^{ls}_{\mathcal {I}} - \varDelta \lambda ^{N}_{\mathcal {I}} \Vert \le \Vert X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \Vert \Vert \varDelta x_{{\mathcal {I}}}^{ls} - \varDelta x^{N}_{\mathcal {I}} \Vert . \end{aligned}$$

By Lemma 4 it holds that \(\Vert X_{{\mathcal {I}} {\mathcal {I}}} ^{-1}\varLambda _{{\mathcal {I}} {\mathcal {I}}} \Vert = {\mathcal {O}}(\mu )\), \(0 < \mu \le \max \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). With \(\| \varDelta x^{ls}_{\mathcal {I}} - \varDelta x_{\mathcal {I}}^N \| = {\mathcal {O}}(\mu ^2)\), \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\) it then follows that \(\Vert \varDelta \lambda ^{ls}_{\mathcal {I}} - \varDelta \lambda _{\mathcal {I}}^N \Vert = {\mathcal {O}}(\mu ^3)\), and also \(| \varDelta \lambda ^{ls}_i - \varDelta \lambda _i^N| = {\mathcal {O}}(\mu ^3)\), \(i \in {\mathcal {I}}\), \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). Similarly, \(\varDelta \lambda ^N_{\mathcal {A}}\) is the solution to (30) with \(\varDelta x^N_{\mathcal {A}}\) and \(\varDelta x^N_{\mathcal {I}}\). Subtraction of (30), with \(\varDelta x^N_{\mathcal {A}}\) and \(\varDelta x^N_{\mathcal {I}}\), from (30) with the approximated solutions gives

$$\begin{aligned} \left( I_{\mathcal {A} \mathcal {A}} + X_{\mathcal {A} \mathcal {A}}^2 \right) \left( \varDelta \lambda ^{ls}_{\mathcal {A}} - \varDelta \lambda _{\mathcal {A}}^N \right)&= \left( H_{\mathcal {A} \mathcal {A}} - X_{\mathcal {A} \mathcal {A}} \varLambda _{\mathcal {A} \mathcal {A}} \right) \left( \varDelta x_{\mathcal {A}} - \varDelta x_{\mathcal {A}}^N \right) \\&\quad + H_{\mathcal {A} \mathcal {I}} \left( \varDelta x^{ls}_{\mathcal {I}}- \varDelta x_{\mathcal {I}}^N \right) . \end{aligned}$$

The largest singular value of \(\left( I_{\mathcal {A} \mathcal {A}} + X_{\mathcal {A} \mathcal {A}}^2 \right) ^{-1}\) is bounded by 1 and hence

$$\begin{aligned} \Vert \varDelta \lambda ^{ls}_{\mathcal {A}} - \varDelta \lambda _{\mathcal {A}}^N \Vert \le \left( \Vert H_{\mathcal {A} \mathcal {A}} \Vert + \Vert X_{\mathcal {A} \mathcal {A}} \varLambda _{\mathcal {A} \mathcal {A}} \Vert \right) \Vert \varDelta x_{\mathcal {A}} - \varDelta x_{\mathcal {A}}^N \Vert + \Vert H_{\mathcal {A} \mathcal {I}} \Vert \Vert \varDelta x^{ls}_{\mathcal {I}}- \varDelta x_{\mathcal {I}}^N \Vert . \end{aligned}$$

The elements of \(H_{\mathcal {A} \mathcal {A}}\) and \(H_{\mathcal {A} \mathcal {I}}\) are bounded and by Lemma 4 it holds that \(\Vert X_{\mathcal {A} \mathcal {A}} \varLambda _{\mathcal {A} \mathcal {A}} \Vert = {\mathcal {O}}(\mu )\), \(0 < \mu \le \max \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). Thus it follows that \(\Vert \varDelta \lambda ^{ls}_{\mathcal {A}} - \varDelta \lambda _{\mathcal {A}}^N \Vert = {\mathcal {O}}(\mu ^2)\), and also \(| \varDelta \lambda ^{ls}_i - \varDelta \lambda _i^N| = {\mathcal {O}}(\mu ^2)\), \(i \in {\mathcal {A}}\), \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). Similarly, (29a) gives the backward error

$$\begin{aligned} \varDelta \lambda ^b_{\mathcal {A}} - \varDelta \lambda _{\mathcal {A}}^N = H_{\mathcal {A} \mathcal {A}} \left( \varDelta x_{\mathcal {A}} - \varDelta x_{\mathcal {A}}^N \right) + H_{\mathcal {A} \mathcal {I}} \left( \varDelta x^{ls}_{\mathcal {I}} - \varDelta x_{\mathcal {I}}^N \right) . \end{aligned}$$

Hence

$$\begin{aligned} \Vert \varDelta \lambda ^b_{\mathcal {A}} - \varDelta \lambda _{\mathcal {A}}^N \Vert \le \Vert H_{\mathcal {A} \mathcal {A}} \Vert \Vert \varDelta x_{\mathcal {A}} - \varDelta x_{\mathcal {A}}^N \Vert + \Vert H_{\mathcal {A} \mathcal {I}} \Vert \Vert \varDelta x^{ls}_{\mathcal {I}} - \varDelta x_{\mathcal {I}}^N \Vert , \end{aligned}$$

from which it follows that \(\Vert \varDelta \lambda ^b_{\mathcal {A}} - \varDelta \lambda _{\mathcal {A}}^N \Vert = {\mathcal {O}}(\mu ^2)\), and also \(| \varDelta \lambda ^b_i - \varDelta \lambda _i^N | = {\mathcal {O}}(\mu ^2)\), \(i \in {\mathcal {A}}\), \(0 < \mu \le \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). Thus the result holds for \({\bar{\mu }}= \min \{ {\bar{\mu }}_5, {\bar{\mu }}_6 \}\). \(\square\)

Information is discarded in the calculation of the components \(\varDelta x^S_i\), \(\varDelta x_i^C\), \(i\in {\mathcal {A}}\), and \(\varDelta \lambda _i^C\), \(i \in {\mathcal {I}}\), with (7) and (13) respectively. The equations for the approximate solution in Theorem 1 show that it is essential to obtain a good approximate solution of \(\varDelta x_{\mathcal {A}}^N\); it is the error in this approximation that propagates through the suggested solutions labeled with ls and b. In contrast to all other components of the proposed full approximate solution, the components \(\varDelta \lambda _i^{ls}\), \(i\in {\mathcal {I}}\), actually have asymptotic component error bounds of order \(\mu ^3\), as can be seen in the proof of Theorem 1.

In general the active and inactive sets at the optimal solution are unknown and have to be estimated as the iterations proceed. The quality of the approximate solution of \(\varDelta x_{\mathcal {A}}^N\) will hence also depend on these estimates. There is a trade-off when estimating the set of active constraints. A restrictive strategy may lead to a more accurate approximation \(\varDelta x_{\mathcal {A}}\); however, it increases the cardinality of the inactive set and in consequence the size of the system (26) that needs to be solved at each iteration. In theory, the cardinality of the inactive set is determined by the number of inactive constraints at the solution of the specific problem, whereas in practice it is determined by the estimate. The size of the system that needs to be solved at each iteration may thus range from 0 to n. A restrictive strategy may also increase the size of some coefficients on the diagonal of the matrix of (26), or (43) in the general case, which may increase the condition number. A generous strategy, on the other hand, decreases the size of the system that has to be solved but may increase the error in the approximation \(\varDelta x_{\mathcal {A}}\), which then propagates to other components of the approximate solution. In the ideal case with the true inactive set, (25) and (26) are composed of the inactive parts of (2), or equivalently (5), and (6) respectively. Consequently, the inactive part of the Schur complement in (26) does not become increasingly ill-conditioned as \(\mu\) approaches zero, in contrast to the complete Schur complement in (6). However, in practice the behavior will depend on the estimate of the inactive set; a simple estimate is sketched below.
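One simple possibility, stated here as an illustrative assumption rather than as the heuristic used in our experiments, is to threshold each pair \((x_i, \lambda _i)\) on the scale suggested by Lemma 4:

```julia
# Hypothetical active-set estimate: by Lemma 4, x_i = O(μ) and
# λ_i = Θ(1) for active i, so a threshold between the two scales
# (here sqrt(μ), a common choice) separates the estimated sets.
estimate_active(x, lam, mu) =
    [i for i in eachindex(x) if x[i] <= sqrt(mu) && lam[i] > sqrt(mu)]
estimate_inactive(x, lam, mu) =
    setdiff(collect(eachindex(x)), estimate_active(x, lam, mu))
```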

Note also that the system that needs to be solved for the full approximate solution has the same structure as the original one. In consequence, our analysis may be interpreted in the framework of previous work on stability and effects of finite-precision arithmetic for interior-point methods, e.g., [11, 31,32,33]. In the case of quadratic problems, see also [23].

To increase the comprehensibility of the work we have described the theoretical foundation for problems on the form (P). Analogous results for problems on the more general form (NLP) together with complementary remarks are given in “Appendix A”.

4 Numerical results

As an initial numerical study we consider convex quadratic optimization problems with lower and upper bounds; in particular, randomly generated problems and a selection from the corresponding class in the CUTEst test collection [16]. The minimizers of the randomly generated problems satisfy strict complementarity, whereas the minimizers of the CUTEst problems typically do not. The simulations were done in Julia and all systems of linear equations were solved by its built-in solver. Moreover, the benchmark problems were initially processed using the packages CUTEst.jl and NLPModels.jl by Orban and Siqueira [25].

The purpose of the first part of this section is to compare the proposed approximate solutions in Theorem 2. The intent is also to give a rough indication of how the approximation errors develop for practical values of \(\mu\). We consider a setting where a vector \((x,\lambda )\) satisfying \(\Vert F_{\mu }(x,\lambda ) \Vert < \mu\) is found by an interior-point method. Thereafter, \(\mu\) is decreased by a factor \(\sigma = 0.1\) to \(\mu ^+=\sigma \mu\) and the approximate solution of (2) is calculated. This procedure was then repeated for different values of \(\mu\). Mean errors with one standard deviation error bars for the proposed approximate solutions are shown in Fig. 1. As mentioned, the results are for the approximate solutions given in Theorem 2 of “Appendix A” since the problems in general include lower and upper bounds. In order to avoid double subscripts in the approximate solutions, we have throughout this section omitted the second subscript. Furthermore, \(\varDelta x^{S}_{\mathcal {A}}\) was used in the equations which require an initial approximation of \(\varDelta x_{\mathcal {A}}^N\). Figure 1 also shows the mean improvement in terms of the measure \(\Vert F_{\mu ^+} \Vert\) for two new iterates \((x_+^S, \lambda ^S_+)\) and \((x_+^C, \lambda ^C_+)\) defined by

$$\begin{aligned} (x_+^{ S,C }, \lambda _+^{ S,C }) = ( x+\alpha ^P \varDelta x, \lambda +\alpha ^D \varDelta \lambda ), \> \> \> (\varDelta x, \varDelta \lambda ) = \left( \begin{pmatrix} \varDelta x_{\mathcal {A}}^{ S,C } \\ \varDelta x_{\mathcal {I}}^{ls} \end{pmatrix}, \begin{pmatrix} \varDelta \lambda _{\mathcal {A}}^{ls} \\ \varDelta \lambda _{\mathcal {I}}^{ls} \end{pmatrix} \right) , \end{aligned}$$

with step lengths \(\alpha ^P\) and \(\alpha ^D\) as in Algorithm 1. Specifically, the search direction is composed of (35) or (36) combined with (43), (47) and (44). The figure also contains the mean improvement of the Newton iterate \((x^{N}_+, \lambda ^{N}_+)\), which is defined analogously. The results are for \(10^2\) randomly generated problems, with \(10^3\) variables, whose minimizers satisfy (31). For each problem, both the specific bounds and the specific active and inactive constraints were chosen at random. Moreover, the elements of the Hessian were uniformly distributed around zero with a sparsity level corresponding to approximately 40% non-zero elements. The condition numbers were in the order of magnitude \(10^7\)–\(10^{10}\) and the largest singular values in the order of magnitude \(10^3\).
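To make the protocol above concrete, a minimal Julia sketch of the simulation loop could look as follows. The helpers F, newton_step, newton_direction and approximate_direction are hypothetical stand-ins for the residual \(F_\mu\), one interior-point iteration, the exact solution of (2) and the approximate solution of Theorem 2, respectively; they are not part of any published implementation.

```julia
using LinearAlgebra

# Sketch of the simulation protocol (hypothetical helper names): for each
# barrier value mu, iterate until ||F_mu(x, lam)|| < mu, decrease mu by the
# factor sigma, and compare the approximate direction with the Newton one.
function simulate(problem, x, lam; sigma = 0.1, mus = [10.0^(-k) for k in 0:8])
    errors = Float64[]
    for mu in mus
        while norm(F(problem, x, lam, mu)) >= mu
            x, lam = newton_step(problem, x, lam, mu)
        end
        mu_plus = sigma * mu
        dN = newton_direction(problem, x, lam, mu_plus)       # exact solution of (2)
        dA = approximate_direction(problem, x, lam, mu_plus)  # Theorem 2 approximation
        push!(errors, norm(dA - dN))
    end
    return errors
end
```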

Fig. 1 Mean approximation error and mean progress with measure \(\Vert F_{\mu ^+}\Vert\) with one standard deviation error bars for randomly generated quadratic problems. The top and the bottom correspond to problems where approximately 3/4 and 1/4 of the variables, respectively, are inactive at the solution

The least accurate approximate solutions in Fig. 1 are those corresponding to active \(\lambda\) and inactive x. This is anticipated as their error bounds rely more heavily on the size of the elements of H. Moreover, it can be seen that \(\varDelta \lambda _{\mathcal {I}}^{ls}\) is preferable to \(\varDelta \lambda _{\mathcal {I}}^C\) for the problems considered. This is anticipated as \(\varDelta \lambda _{\mathcal {I}}^{ls}\) has asymptotic error bounds in the order of magnitude \(\mu ^3\), in contrast to the bounds corresponding to \(\varDelta \lambda _{\mathcal {I}}^C\), which are in the order of magnitude \(\mu ^2\), as mentioned in Sect. 3.2. In general, Fig. 1 gives an indication of which equation is preferable for each partial approximate solution if only one is to be chosen. However, as mentioned, more sophisticated choices can be made by carefully considering the known quantities in the individual error terms for specific components. The right side of Fig. 1 shows that the iterates \((x_+^S, \lambda ^S_+)\) and \((x_+^C, \lambda ^C_+)\) perform similarly to \((x_+^N, \lambda ^N_+)\) in terms of the measure \(\Vert F_{\mu ^+} \Vert\) for a wide range of \(\mu\). The error bars show that the results are not sensitive to changes in the specific bounds, in which constraints are active or inactive, or in the initial solutions. Numerical simulations have shown, as the theory also predicts, that the results can be improved (or worsened) by increasing (or decreasing) the size of the coefficients of the matrix H as well as its sparsity level.

Next we show results for a selection of problems in the CUTEst test collection in the analogous setting. For the problems with a variable-dimension option, the number of primal variables, \(n_x\), was typically chosen as approximately 5000, resulting in a total number of primal-dual variables in the order of \(10^4\). The number of primal variables of each specific problem is shown in Table 1. Each problem was initially solved by an interior-point method with stopping criterion \(\Vert F_{0} (x,\lambda )\Vert < 10^{-14}\), i.e., the first-order optimality conditions given by (32) for \(\mu = 0\). This was done to determine the selection of problems as well as estimates of the active and inactive sets. Problems with an unconstrained optimal solution or an optimal solution with only degenerate active constraints were not considered: in the first case the proposed approximate solutions are equivalent to the true solution, and in the second case it is not clear how to deduce active/inactive sets. A constraint was considered active if the corresponding variable was closer than \(10^{-10}\) to its bound. An active constraint was deemed degenerate if the corresponding multiplier value was below \(10^{-6}\). An exception was made for problem ODNAMUR, due to its larger size, for which the tolerances above were increased by factors of \(10^1\) and \(10^2\) respectively. Figure 2 shows mean errors with the approximate solutions of Theorem 2 on each CUTEst problem. The results are for three different values of \(\mu\) with 10 different random initial solutions. The figure also shows the measure \(\Vert F_{\mu ^+} \Vert\) for \((x,\lambda )\), \((x_{+}^S, \lambda _{+}^S)\), \((x_{+}^C, \lambda _{+}^C)\) and \((x^{N}_+, \lambda ^{N}_+)\). Simulations with the set-estimation heuristic above have shown that the behavior of the approximate solution varies over three different regions depending on \(\mu\): approximately \([10^2,10^{-2})\), \([10^{-2}, 10^{-6}]\) and \((10^{-6},0)\). The \(\mu\)-values in Fig. 2 correspond to representative behavior in their respective regions. The problems are ordered such that the fraction of estimated active constraints at the solution decreases from left to right.
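In code, this classification amounts to two one-line tests. The following Julia sketch uses our own, hypothetical names together with the default tolerances stated above:

```julia
# A constraint is taken as active if x_i is within tol of one of its
# bounds; an active constraint is degenerate if its multiplier is small.
is_active(xi, li, ui; tol = 1e-10) = min(xi - li, ui - xi) < tol
is_degenerate(lami; tol = 1e-6)    = lami < tol
```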

Fig. 2 Mean approximation error and mean progress with measure \(\Vert F_{\mu ^+}\Vert\) with one standard deviation error bars for a collection of CUTEst test problems

The partial approximate solution errors in Fig. 2 are significantly larger than those in Fig. 1. This is expected since the optimal solutions of the CUTEst test problems typically do not satisfy strict complementarity. Moreover, with the above strategy for determining the active and inactive sets, the smallest active multipliers may be in the order of \(10^{-5}\). Small active multipliers may cause inaccurate components in the approximate solution of \(\varDelta x_{\mathcal {A}}^N\). Nevertheless, the approximate solutions perform asymptotically similarly to the Newton solution in terms of the measure \(\Vert F_{\mu ^+} \Vert\), as shown in Fig. 2. The figure also shows that the approximation error and the progress measure are not particularly sensitive to different initial solutions for smaller \(\mu\), whereas some effects can be seen for larger \(\mu\). The results may be improved or worsened depending on how the active constraints at the solution are estimated. We chose to give the results for the strategy described above, which gives a potentially significant reduction in the computational cost per iteration.

In practice the active constraints at the optimal solution are unknown and have to be estimated as the iterations proceed. The purpose of the following simulations is to give an initial indication of the performance of the proposed approximate solutions within a primal-dual interior-point framework. In particular, we focus on the behavior on problems that do not satisfy the assumptions under which the theoretical results are valid, but also on the robustness with respect to how the set of active constraints is estimated. Algorithms 1 and 2 were considered in order not to drown, or mix, the approximation effects with effects from more advanced features of more sophisticated methods. Algorithm 1 should here be seen as the reference method, as it only contains Newton steps.

Algorithm 1 (\(\texttt {N}\): Newton steps)
Algorithm 2 (\(\texttt {aN}\): Newton-like steps with approximate solutions)

At iteration k of Algorithm 1 and Algorithm 2, \(\alpha ^P_{max,k}\) and \(\alpha ^D_{max,k}\) are the maximum feasible step lengths for \(x_k\) along \(\varDelta x_k\) and for \(\lambda _k\) along \(\varDelta \lambda _k\) respectively. Table 1 contains a comparison of Algorithm 1 and two versions of Algorithm 2, which differ in how \(\varDelta x_{\mathcal {A}}\) is computed. The versions are denoted by \(\texttt {aN}^\texttt {S}\) and \(\texttt {aN}^\texttt {C}\) as they use the approximations \(\varDelta x_{\mathcal {A}}^S\) and \(\varDelta x_{\mathcal {A}}^C\) respectively. In Algorithm 2, a constraint was considered active if the distance to its bound was smaller than both the value of its multiplier and a threshold \(\tau _{\mathcal {A}}\); see the sketch below. The procedure is thus a basic heuristic aimed at determining the non-degenerate active constraints. In essence, the heuristic gives an estimate of the set \({\mathcal {A}}_x\); compare Definition 4 in the theoretical setting. The thresholds of the two versions \({\texttt {aN}}^{\texttt {S}}\) and \({\texttt {aN}}^{\texttt {C}}\) were chosen as \(\tau _{\mathcal {A}} = \mu ^{2/3}\) and the more restrictive \(\tau _{\mathcal {A}} = \mu ^{3/4}\) respectively. This was done to show the effects of two different thresholds \(\tau _{\mathcal {A}}\), but also because numerical experiments have shown that steps with the Schur-based approximation are more robust at larger \(\mu\), see Fig. 2. Table 1 gives a comparison of the number of iterations for different values of \(\mu\) as well as the average cardinality of \({\mathcal {I}}_x\), the set of indices corresponding to the estimated inactive components of x, i.e., the size of the system that has to be solved at each iteration. The symbol “–” denotes that the method failed to converge within 50 iterations for the corresponding \(\mu\). If the method failed at a specific \(\mu\), Newton steps were performed instead until \(\Vert F_\mu (x,\lambda ) \Vert < \mu\). The order of the problems is the same as in Fig. 2.
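A minimal Julia sketch of this heuristic, written for lower bounds only and with hypothetical names, could read:

```julia
# Estimate of the non-degenerate active set A_x for lower bounds: index i
# is taken as active when the distance of x_i to its bound is smaller than
# both the corresponding multiplier and the threshold tau.
estimate_active(x, l, lam, tau) =
    [i for i in eachindex(x) if x[i] - l[i] < min(lam[i], tau)]

tau_aNS(mu) = mu^(2 / 3)  # threshold of version aN^S
tau_aNC(mu) = mu^(3 / 4)  # more restrictive threshold of version aN^C
```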

Table 1 Comparison of Algorithm 1 (\(\texttt {N}\)) and two versions of Algorithm 2 (\(\texttt {aN}^\texttt {S}\) and \(\texttt {aN}^\texttt {C}\)) on a selection of CUTEst test problems

The results in Table 1 display characteristics similar to those in Fig. 2. The version associated with the Schur-based approximate solution, \(\texttt {aN}^\texttt {S}\) of Algorithm 2, makes sufficient progress at \(\mu \in [10^2,10^{-2})\), often at a relatively low computational cost. Version \(\texttt {aN}^\texttt {S}\) converges at \(\mu \in [10^{-2}, 10^{-6}]\), however often while solving relatively large systems due to the difficulty of estimating \({\mathcal {A}}_x\). At \(\mu \in (10^{-6},0)\) the asymptotic behavior becomes more pronounced. Consequently, \(\texttt {aN}^\texttt {S}\) performs similarly to Algorithm 1 in terms of iteration count while solving systems of reduced size. Version \(\texttt {aN}^\texttt {S}\) converges at all considered \(\mu\) in all problems of Table 1, except on HARKERP2 for larger \(\mu\). The version associated with the complementarity-based approximate solution, \(\texttt {aN}^\texttt {C}\) of Algorithm 2, tends to perform poorly for \(\mu \in [10^2,10^{-2})\) and parts of \([10^{-2}, 10^{-6}]\). Although \(\texttt {aN}^\texttt {C}\) converges for large \(\mu\), this is often at the expense of either solving relatively large systems or performing many iterations. In general, \(\texttt {aN}^\texttt {C}\) performs similarly to Algorithm 1 for \(\mu\) in the approximate region \([10^{-5},0)\) while solving systems of reduced size. The versions \(\texttt {aN}^\texttt {S}\) and \(\texttt {aN}^\texttt {C}\) have similar asymptotic performance; however, in general \(\texttt {aN}^\texttt {S}\) performs better for larger values of \(\mu\), as also indicated by the previous results in Fig. 2.

Finally we show results for the two Newton-like approaches, mentioned in Sect. 3.1, in a simple primal-dual interior-point setting. The approximate intermediate step method and the approximate higher-order method are described in Algorithm 3 and Algorithm 4 respectively. In contrast to Sect. 3.1, here the intermediate iterate is required to be strictly feasible. The total number of iterations required at different intervals of \(\mu\) with the two Newton-like approaches is shown in Fig. 3. The figure shows results for three different choices of \((\varDelta x^E, \varDelta \lambda ^E)\). Moreover, the selection of which components to update was done as the iterations proceeded, similarly to above. Note, however, that in this case it is not necessary to label each constraint and each component of \(\lambda\) as active or inactive; some may be labeled as neither. The set of indices corresponding to active constraints, \({\mathcal {A}}_x\), was estimated as above, and the sets of indices corresponding to inactive \(\lambda\), \({\mathcal {I}}_l\) and \({\mathcal {I}}_u\), see Definition 4, were estimated analogously; that is, a multiplier was considered inactive if its value was smaller than both the distance of the corresponding x to its feasibility bound and a threshold \(\tau _{\mathcal {I}}\), see the sketch below. Table 2 shows how the nonzero components of \((\varDelta x^E, \varDelta \lambda ^E)\) were chosen in the different versions of the approaches as well as the different thresholds \(\tau _{\mathcal {A}}\) and \(\tau _{\mathcal {I}}\).
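A Julia sketch of this mirrored test, again for lower bounds only and with hypothetical names:

```julia
# Estimate of the set of inactive multipliers: index i is taken as
# inactive when lam_i is smaller than both the distance of x_i to its
# bound and the threshold tau.
estimate_inactive_lambda(x, l, lam, tau) =
    [i for i in eachindex(lam) if lam[i] < min(x[i] - l[i], tau)]
```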

Algorithm 3 (approximate intermediate step method)
Algorithm 4 (approximate higher-order method)
Table 2 Thresholds and nonzero components of the steps to \((x^E, \lambda ^E)\) for the three versions compared in Fig. 3

At iteration k of Algorithm 3 and Algorithm 4, \(\alpha ^P_{max,k}\), \(\alpha ^D_{max,k}\), \(\alpha ^{E,P}_{max,k}\) and \(\alpha ^{E,D}_{max,k}\) are the maximum feasible step lengths for the prescribed steps, defined analogously to those in Algorithm 1.
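Such a maximum feasible step length can be computed with a standard ratio test. The following Julia sketch is our own formulation, assuming bounds \(l \le x \le u\) and capping the step at 1; strict feasibility is then maintained by taking a fraction of the returned length. For the multipliers, the same test applies with \(l = 0\) and \(u = \infty\).

```julia
# Largest alpha in (0, 1] such that x + alpha*dx satisfies l .<= x .<= u.
function max_feasible_step(x, dx, l, u)
    alpha = 1.0
    for i in eachindex(x)
        if dx[i] < 0 && isfinite(l[i])
            alpha = min(alpha, (l[i] - x[i]) / dx[i])
        elseif dx[i] > 0 && isfinite(u[i])
            alpha = min(alpha, (u[i] - x[i]) / dx[i])
        end
    end
    return alpha
end
```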

Fig. 3 Number of iterations required at different intervals of \(\mu\) for three versions of the Newton-like approaches and Algorithm 1 (Newton)

The total iteration count for \(\mu \in [10^1,\> 10^{-10}]\) in Fig. 3 shows that the approximate higher-order approach requires the same number of iterations as, or fewer than, the approach with the approximate intermediate step. The iteration count for the approaches with the Schur-based approximation is similar to that of Algorithm 1 for this range of \(\mu\). Also here, numerical experiments show indications of three regions. For \(\mu\) in the approximate region \([10^1,10^{-2})\), the versions with the Schur-based approximation yield a potentially reduced number of iterations. Their performance varies in the region of intermediate-sized \(\mu\); however, it cannot be ruled out that this is an effect of the relatively simple set-estimation heuristics. For \(\mu \in [10^{-5},\> 10^{-10}]\), all versions of both approaches give an iteration count less than or equal to that of Algorithm 1 on all problems in Fig. 3, with the exception of ODNAMUR, hence providing potential savings in computational cost. The results may be improved with more flexible set-estimation heuristics, e.g., more restrictive thresholds for intermediate-sized \(\mu\). However, we chose not to include another layer of detail and instead give the results for a relatively simple setting to obtain an initial evaluation of the potential performance.

5 Conclusions

In this work we have given approximate solutions to systems of linear equations that arise in interior-point methods for bound-constrained optimization; in particular, partial approximate solutions, where the asymptotic component error bounds are in the order of \(\mu ^2\), and full approximate solutions with asymptotic error bounds in the order of \(\mu ^2\). Numerical simulations on randomly generated bound-constrained convex quadratic optimization problems, whose minimizers satisfy strict complementarity, have shown that the approximate solutions perform similarly to Newton solutions for sufficiently small \(\mu\). Simulations on convex bound-constrained quadratic problems from the CUTEst test collection, whose minimizers typically do not satisfy strict complementarity, have shown that the predicted asymptotic behavior still occurs, however at significantly smaller values of \(\mu\).

We have also performed numerical simulations in a simple yet more realistic setting; specifically, in a primal-dual interior-point framework where the active and inactive sets were estimated with basic heuristics as the iterations proceeded. These simulations were done on a selection of CUTEst benchmark problems. The results showed that the behavior roughly varied over three regions determined by the size of \(\mu\). The Schur-based approximate solutions showed potential in the region of larger \(\mu\); in the region of intermediate-sized \(\mu\) the performance varied, partly due to difficulties in determining the active and inactive sets. For sufficiently small \(\mu\) the approximate solutions showed performance similar to our reference method while solving systems of reduced size.

Finally, we showed numerical results for two Newton-like approaches, which include an approximate intermediate step consisting of partial approximate solutions, on the considered CUTEst benchmark problems. The simulations showed characteristics similar to the previous results, as well as a potential for reducing the overall iteration count of interior-point methods.

The results of this work are meant to contribute to the theoretical and numerical understanding of approximate solutions to systems of linear equations that arise in interior-point methods. We hope that the work can lead to further research on approximate solutions and approximate higher-order methods for optimization problems with linear inequality constraints.