1 Introduction

This work is intended for inequality-constrained quadratic programs of the form

$$\begin{aligned} \min _{x \in {\mathbb {R}}^{n}} \ \tfrac{1}{2} x^T H x + c^T x \quad \text {subject to } Ax \ge b, \end{aligned}$$
(IQP)

where \(H \in {\mathbb {R}}^{n \times n}\), \(A \in {\mathbb {R}}^{m \times n}\), \(x \in {\mathbb {R}}^{n}\), \(c \in {\mathbb {R}}^{n}\) and \(b \in {\mathbb {R}}^{m}\). We consider primal–dual interior-point methods, which means solving, or approximately solving, a sequence of systems of nonlinear equations. Applying Newton iterations to each system of nonlinear equations gives an unreduced unsymmetric block 3-by-3 system of linear equations, of dimension \(n+2m\), to be solved at each iteration. This system can be put in an equivalent form which contains a reduced symmetric block 2-by-2 system of dimension \(n+m\), or a condensed system of dimension n, see Sect. 2. Both of these systems typically become increasingly ill-conditioned as the iterates converge, whereas the unreduced system may stay well conditioned throughout; see e.g., [1, 2] for analysis of the spectral properties of systems arising in interior-point methods for convex quadratic programs on standard form. The sparsity structure of the unreduced system is maintained in the reduced system, whereas the condensed system is typically dense if A contains dense rows. The most computationally expensive part of an interior-point iteration is the solution of the Newton systems that arise; see e.g., [3, 4] for details on the solution of the systems and [5] for a comparison of the solution of unreduced and reduced systems.

We are particularly interested in the convergence towards a solution when the Jacobian of each Newton system is modified so that the system becomes computationally less expensive to solve. In general, there is a trade-off between solving many modified Newton systems, which are computationally less expensive but typically give lower-quality solutions, and solving Newton systems, which give high-quality solutions. We propose an approach where each modified Jacobian is composed of a Jacobian at a previous iteration, whose factorization is assumed to be known, plus one low-rank update matrix per succeeding iteration. A similar strategy has been studied by Gondzio and Sobral [6] in the context of quasi-Newton approaches, where Broyden’s rank-1 updates are performed on the Jacobian approximation. If the proposed quasi-Newton approach is started with an exact Jacobian, then the sparsity pattern of the first two block rows is maintained; however, the sparsity pattern of the third block row is typically lost. In contrast, we consider low-rank update matrices of variable rank which capture the sparsity pattern of all block rows of the Jacobian. Each modified Jacobian may hence be viewed as a Jacobian at a different point. Consequently, the modified Newton approach may also be interpreted in the framework of previous work on, e.g., effects of finite-precision arithmetic, stability, convergence and solution techniques for interior-point methods [1,2,3, 5, 7,8,9,10,11,12,13]. The idea of low-rank update matrices in the context of a primal barrier method for linear programming has been considered by Gonzaga [14].

The updates and the theory are given for the unreduced Jacobian, but we also discuss how analogous updates can be made on both reduced and condensed systems. The modified Newton approach is also compatible with certain regularization strategies, see e.g., [15,16,17], although a treatment of these is outside the scope of this first study.

The work is meant to be an initial study of the structured modified Newton approach. We derive theoretical results in ideal settings to support the choice of update matrix. In addition, we produce numerical results with a basic interior-point algorithm to investigate the practical performance within and beyond the theoretical framework. The numerical simulations were performed on benchmark problems from the repository of convex quadratic programming problems by Maros and Mészáros [18]. We envisage the use of the modified Newton approach as an accelerator for a Newton approach, e.g., the two may be run in parallel for a specific value of the barrier parameter, with the modified Newton approach utilizing factorizations from the Newton approach when appropriate.

The manuscript is organized as follows: Sect. 2 contains a brief background to primal–dual interior-point methods and an introduction to the theoretical framework; in Sect. 3 we propose a modified Newton approach and discuss how it relates to some previous work on interior-point methods; Sect. 4 contains a description of the implementation along with two heuristics and a refactorization strategy; in Sect. 5, we give numerical results on convex quadratic programs; finally, in Sect. 6 we give some concluding remarks.

1.1 Notation

Throughout, \(\rho (M)\) denotes the spectral radius of a matrix M and \(\vert {\mathcal {S}}\vert \) denotes the cardinality of a set \({\mathcal {S}}\). The notation “\(\cdot \)” denotes the component-wise multiplication operator, “\(\succ 0\)” positive definiteness, and “\(\wedge \)” the logical and. Quantities associated with Newton iterations are throughout labeled with “\(\hat{ }\)”. Vector subscripts and superscripts denote component index and iteration index respectively. The only exception is \(e_i\), which denotes the ith unit vector of appropriate dimension. All norms are 2-norms unless otherwise stated.

2 Background

The theoretical setting is analogous to the setting in a previous work of ours on bound-constrained nonlinear problems [19]. For completeness, we review the background adapted to inequality-constrained quadratic programs in this section. Our interest is focused on the situation as primal–dual interior-point methods converge to a local minimizer \(x^* \in {\mathbb {R}}^n\) with its corresponding multipliers \(\lambda ^* \in {\mathbb {R}}^m\) and slack variables \(s^* \in {\mathbb {R}}^m \). Specifically we assume that the iterates of the method converge to a vector \(\left( x^{*T}, \lambda ^{*T}, s^{*T} \right) ^T \triangleq (x^*,\lambda ^*,s^*)\) that satisfies

$$\begin{aligned} Hx^* + c - A^T \lambda ^* = 0,&\quad \text { (stationarity) } \end{aligned}$$
(1a)
$$\begin{aligned} Ax^*-s^*-b = 0,&\quad \text { (feasibility)} \end{aligned}$$
(1b)
$$\begin{aligned} s^* \ge 0,&\quad \text { (non-negativity of slack variables)} \end{aligned}$$
(1c)
$$\begin{aligned} \lambda ^* \ge 0,&\quad \text { (non-negativity of multipliers)} \end{aligned}$$
(1d)
$$\begin{aligned} s^* \cdot \lambda ^* = 0,&\quad \text { (complementarity)} \end{aligned}$$
(1e)
$$\begin{aligned} Z(x^*)^T H Z(x^*) \succ 0,&\end{aligned}$$
(1f)
$$\begin{aligned} s^* + \lambda ^* > 0,&\quad \text { (strict complementarity)} , \end{aligned}$$
(1g)
$$\begin{aligned} A_{\mathcal {A}}(x^*) \text { of full row rank},&\quad \text { (regularity)}, \end{aligned}$$
(1h)

where \(A_{\mathcal {A}}(x^*)\) denotes the Jacobian corresponding to the active constraints and \(Z(x^*)\) denotes a matrix whose columns span the nullspace of \(A_{\mathcal {A}}(x^*)\). First-order necessary conditions for a local minimizer of (IQP) are given by (1a)–(1e). The first-order conditions together with (1f) and (1g) constitute second-order sufficient conditions for a local minimizer of (IQP) [20]. In the theoretical results, we also assume that \((x^*, \lambda ^*, s^*)\) satisfies (1h).

To simplify the notation, we let z denote the triplet \((x, \lambda , s)\). For a given barrier parameter \(\mu \in {\mathbb {R}}\), we are interested in the function \(F_{\mu }:{\mathbb {R}}^{n+2m} \rightarrow {\mathbb {R}}^{n+2m}\) given by

$$\begin{aligned} F_{\mu }(z) = \begin{pmatrix} Hx + c - A^T \lambda \\ Ax - s - b \\ S \Lambda e - \mu e \end{pmatrix}, \end{aligned}$$
(2)

where \(S=\text {diag}(s)\), \(\Lambda =\text {diag}(\lambda )\), and e is a vector of ones of appropriate size. First-order necessary conditions for a local minimizer of (IQP), (1a)–(1e), are satisfied by a vector z, with \(s\ge 0\) and \(\lambda \ge 0\), that fulfills \(F_{\mu }(z) = 0\) for \(\mu =0\). In interior-point methods, \(F_{\mu }(z) = 0\) is solved or approximately solved for a sequence of values \(\mu >0\) approaching zero, while preserving \(s>0\) and \(\lambda > 0\). Applying Newton iterations means solving a sequence of systems of linear equations of the form

$$\begin{aligned} F'(z) \Delta {\hat{z}} = -F_{\mu }(z), \end{aligned}$$
(3)

where \(\Delta {\hat{z}} = (\Delta {\hat{x}}, \Delta {\hat{\lambda }}, \Delta {\hat{s}})\) and \(F': {\mathbb {R}}^{n+2m} \rightarrow {\mathbb {R}}^{(n+2m) \times (n+2m)}\) is the Jacobian of \(F_{\mu }(z)\), defined by

$$\begin{aligned} F'(z) = \begin{pmatrix} H &{} -A^T &{} 0 \\ A &{} 0 &{} -I \\ 0 &{} S &{} \Lambda \end{pmatrix}. \end{aligned}$$
(4)
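To make the setting concrete, the following is a minimal NumPy sketch of assembling \(F_{\mu }(z)\) of (2) and the unreduced Jacobian (4), and of computing the Newton step (3). All names are ours, the assembly is dense for exposition only, and the experiments of Sects. 4–5 use MATLAB rather than Python.

```python
# A minimal sketch (our notation) of F_mu(z) in (2) and F'(z) in (4);
# dense assembly for exposition only.
import numpy as np

def F_mu(H, A, b, c, x, lam, s, mu):
    """Residual F_mu(z) of (2) with z = (x, lam, s), stacked as one vector."""
    return np.concatenate([
        H @ x + c - A.T @ lam,   # stationarity block
        A @ x - s - b,           # feasibility block
        s * lam - mu,            # perturbed complementarity, S Lambda e - mu e
    ])

def F_prime(H, A, lam, s):
    """Unreduced block 3-by-3 Jacobian F'(z) of (4)."""
    n, m = H.shape[0], A.shape[0]
    return np.block([
        [H,                -A.T,             np.zeros((n, m))],
        [A,                np.zeros((m, m)), -np.eye(m)],
        [np.zeros((m, n)), np.diag(s),       np.diag(lam)],
    ])

# Newton step (3) towards the target barrier parameter mu_plus:
# dz = np.linalg.solve(F_prime(H, A, lam, s), -F_mu(H, A, b, c, x, lam, s, mu_plus))
```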

In consequence it follows that the Jacobian at \(z+\Delta z\) can be written as

$$\begin{aligned} F'(z+\Delta z) = F'(z)+ \Delta F'( \Delta z), \end{aligned}$$

where

$$\begin{aligned} \Delta F'( \Delta z) = \begin{pmatrix} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 \\ 0 &{} \Delta S &{} \Delta \Lambda \end{pmatrix}, \end{aligned}$$
(5)

with \(\Delta \Lambda = \text {diag}(\Delta \lambda )\) and \(\Delta S = \text {diag}(\Delta s)\). The subscript \(\mu \) has been omitted since \(F'\) is independent of the barrier parameter. Under the assumption that \(\Lambda \) is nonsingular, the unreduced block 3-by-3 system (3) can be reformulated as the reduced system

$$\begin{aligned} \begin{pmatrix} H &{} -A^T \\ -A &{} -\Lambda ^{-1} S \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \lambda \end{pmatrix} = - \begin{pmatrix} Hx + c - A^T \lambda \\ b - Ax + \mu \Lambda ^{-1} e \end{pmatrix}, \end{aligned}$$
(6)

together with \(\Delta s = \Lambda ^{-1}\left( \mu e - S \Lambda e - S \Delta \lambda \right) = \mu \Lambda ^{-1} e - s - \Lambda ^{-1} S \Delta \lambda \). If in addition S is nonsingular, then a Schur complement reduction of \(-\Lambda ^{-1}S\) in (6) gives the condensed system

$$\begin{aligned} \left( H + A^T S^{-1} \Lambda A \right) \Delta x = -\left( Hx + c - A^T \lambda \right) + A^T S^{-1} \Lambda \left( b - Ax \right) + \mu A^T S^{-1} e, \end{aligned}$$
(7)

together with \(\Delta \lambda = S^{-1} \Lambda \left( b - Ax - A \Delta x \right) + \mu S^{-1} e\) and \(\Delta s\) as above. The focus in the manuscript will mainly be on the unreduced block 3-by-3 system (3). However, analogous reductions of the modified Newton system, similar to those of (6) and (7), will also be discussed.
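As a quick consistency check of the reductions, the following sketch verifies, on small random data with \(\lambda , s > 0\) (our arbitrary choice, not from the paper), that (3), (6) and (7) produce the same \(\Delta x\).

```python
# Sanity-check sketch: (3), (6) and (7) yield the same Delta x on random data.
import numpy as np

rng = np.random.default_rng(0)
n, m, mu = 4, 3, 0.1
H = np.eye(n)
A = rng.standard_normal((m, n))
b, c = rng.standard_normal(m), rng.standard_normal(n)
x = rng.standard_normal(n)
lam, s = rng.uniform(1, 2, m), rng.uniform(1, 2, m)

# unreduced system (3)
J = np.block([[H, -A.T, np.zeros((n, m))],
              [A, np.zeros((m, m)), -np.eye(m)],
              [np.zeros((m, n)), np.diag(s), np.diag(lam)]])
F = np.concatenate([H @ x + c - A.T @ lam, A @ x - s - b, s * lam - mu])
dx = np.linalg.solve(J, -F)[:n]

# reduced system (6)
K = np.block([[H, -A.T], [-A, -np.diag(s / lam)]])
rhs6 = -np.concatenate([H @ x + c - A.T @ lam, b - A @ x + mu / lam])
assert np.allclose(dx, np.linalg.solve(K, rhs6)[:n])

# condensed system (7)
M = H + A.T @ np.diag(lam / s) @ A
rhs7 = -(H @ x + c - A.T @ lam) + A.T @ ((lam * (b - A @ x) + mu) / s)
assert np.allclose(dx, np.linalg.solve(M, rhs7))
```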

To improve efficiency, many methods seek approximate solutions of \(F_\mu (z)= 0\) for each \(\mu \). There are different strategies to update \(\mu \), e.g., dynamically every iteration, or keeping \(\mu \) fixed until sufficient decrease of a merit function is achieved. Herein, our model method uses the latter. In particular, our model method is similar to the basic interior-point method of Algorithm 19.1 in Nocedal and Wright [21, Ch. 19, p. 567]. However, termination and reduction of \(\mu \) are based on the merit function \(\phi _\mu (z) = \Vert F_{\mu }(z) \Vert \). Similarly, in the theoretical framework we consider the basic condition \(\Vert F_{\mu } (z) \Vert \le C\mu \), for some constant \(C>0\), see, e.g., [21, Ch. 17, p. 572]. The additional assumption that all vectors z satisfy \(s>0\) and \(\lambda > 0\) is made throughout.

In the remaining part of this section we give some definitions and provide the details for the theoretical framework.

Definition 1

(Order-notation) Let \(\alpha \), \(\gamma \in {\mathbb {R}}\) be two positive related quantities. If there exists a constant \(C_1>0\) such that \(\gamma \ge C_1 \alpha \) for sufficiently small \(\alpha \), then \(\gamma = \Omega (\alpha )\). Similarly, if there exists a constant \(C_2>0\) such that \(\gamma \le C_2 \alpha \) for sufficiently small \(\alpha \), then \(\gamma = {\mathcal {O}}(\alpha )\). If there exist constants \(C_1, C_2 > 0\) such that \(C_1 \alpha \le \gamma \le C_2 \alpha \) for sufficiently small \(\alpha \), then \(\gamma = \Theta (\alpha )\).

Definition 2

(Neighborhood) For a given \(\delta >0\), let the neighborhood around \(z^*\) be defined by \({\mathcal {B}}( z^*, \delta ) = \{ z: \Vert z-z^* \Vert < \delta \}\).

Assumption 1

(Strict local minimizer) The vector \(z^*\) satisfies (1), i.e., second-order sufficient optimality conditions, strict complementarity and regularity hold.

The first of the following two lemmas provides the existence of a neighborhood where the Jacobian is nonsingular. The second lemma gives the existence of a Lipschitz continuous barrier trajectory \(z^\mu \) in the neighborhood where the Jacobian is nonsingular. The results are well known and can be found in e.g., Ortega and Rheinboldt [22]. See also Byrd, Liu and Nocedal [23] for the corresponding results in a setting similar to the one considered here.

Lemma 1

Under Assumption 1 there exists \(\delta >0\) such that \(F'(z)\) is continuous and nonsingular for \(z \in {\mathcal {B}}(z^*, \delta )\) and

$$\begin{aligned} \Vert F'(z) ^{-1}\Vert \le M, \end{aligned}$$

for some constant \(M>0\).

Proof

See [22, p. 46]. \(\square \)

Lemma 2

Let Assumption 1 hold and let \({\mathcal {B}}(z^*, \delta )\) be defined by Lemma 1. Then there exists \({{\hat{\mu }}}>0\) and a Lipschitz continuous function \(z^{\mu }: (0, \ {{\hat{\mu }}}] \rightarrow {\mathcal {B}}(z^*, \delta )\) that satisfies \(F_{\mu }(z^{\mu }) = 0\) and

$$\begin{aligned} \left\| z^{\mu } - z^* \right\| \le C_3 \mu , \end{aligned}$$

where \(C_3 = \sup _{z\in {\mathcal {B}}(z^*, \delta )} \Vert F'(z) ^{-1}\frac{ \partial F_{\mu } (z)}{\partial \mu } \Vert \).

Proof

The result follows from the implicit function theorem, see e.g., [22, p. 128]. \(\square \)

The following lemma provides a relation between the distance of vectors z to the barrier trajectory and the quantity \(\Vert F_\mu (z)\Vert \), when the distance is sufficiently small. A corresponding result is also given by Byrd et al. [23].

Lemma 3

Under Assumption 1, let \({\mathcal {B}}(z^*, \delta )\) and \({{\hat{\mu }}}\) be defined by Lemmas 1 and 2 respectively. For \(0<\mu \le {{\hat{\mu }}}\) and z sufficiently close to \(z^{\mu } \in {\mathcal {B}}(z^*, \delta )\) there exist constants \(C_4, C_5 > 0\) such that

$$\begin{aligned} C_4 \left\| z- z^{\mu } \right\| \le \Vert F_{\mu }(z) \Vert \le C_5 \left\| z- z^\mu \right\| . \end{aligned}$$

Proof

See [23, p. 43]. \(\square \)

The next lemma provides a bound on the Newton direction, \(\Delta {\hat{z}}\), for z sufficiently close to the barrier trajectory.

Lemma 4

Under Assumption 1, let \({\mathcal {B}}\left( z^*, \delta \right) \) and \({{\hat{\mu }}}\) be defined by Lemmas 1 and 2 respectively. For \(0< \mu \le {{\hat{\mu }}}\) and \(z \in {\mathcal {B}}(z^*, \delta )\), let \(\Delta {\hat{z}}\) be the solution of (3) with \(\mu ^+ = \sigma \mu \), where \(0< \sigma < 1\). If z is sufficiently close to \(z^{\mu } \in {\mathcal {B}}(z^*, \delta )\) such that \(\Vert F_{\mu } (z) \Vert = {\mathcal {O}}(\mu )\) then

$$\begin{aligned} \left\| \Delta {\hat{z}} \right\| = {\mathcal {O}}(\mu ). \end{aligned}$$

Proof

Analogous to [19, Lemma 5]. \(\square \)

3 A structured modified Newton approach

In order to describe the approach and its ideas, we first consider a simple setting with one iteration. For a given \(\mu > 0\), consider the interior-point iterate \(z^+ \in {\mathcal {B}}(z^*, \delta )\) defined by \(z^+ = z + \Delta {\hat{z}}\), where \(z \in {\mathcal {B}}(z^*, \delta )\) and \(\Delta {\hat{z}}\) satisfies (3) with \(\mu ^+ = \sigma \mu \), \(0<\sigma <1\). Since \(\Delta {\hat{z}}\) has been computed with (3) we assume that a factorization of \(F'(z)\) is known. Instead of performing another Newton step \(\Delta {\hat{z}}^+\) at \(z^+\) for some \(\mu ^{++} = \sigma ^+ \mu ^+\), \(0 < \sigma ^+ \le 1\), which requires the solution of (3) with \(\mu ^{++}\) and \(z^+\), we would like to compute an approximate solution, which is computationally less expensive, from

$$\begin{aligned} B^+ \Delta z^+ = - F_{\mu ^{++}}(z^+), \quad \text{ where } B^+ = F'(z) + U, \end{aligned}$$
(8)

and U is some low-rank update matrix. A natural question is then how to choose the update matrix U. Gondzio and Sobral [6] consider rank-1 update matrices such that the distance, in Frobenius norm, between \(B^+\) and the previous Jacobian approximation is minimized, when \(B^+\) in addition satisfies the secant condition. They show that the sparsity pattern of the first two block rows is maintained; however, the sparsity pattern of the third block row is typically lost. In contrast, our strategy is, for a given rank restriction r on U, to choose U such that the distance, in both 2-norm and Frobenius norm, between \(B^+\) and the actual Jacobian \(F'(z^+)\) is minimized. The sparsity of the Jacobian is maintained; however, there is no requirement for \(B^+\) to fulfill the secant condition.

To further support the choice of update matrix we give some additional theoretical results. First, we show that there is a region where the modified Newton approach produces small errors with respect to the Newton direction; in particular, a region depending on \(\mu \) in which the modified Newton direction approaches the Newton direction as \(\mu \rightarrow 0\). Later, we also discuss general errors, descent directions with respect to our merit function \(\phi _\mu (z)\), and conditions for local convergence.

The error of using the modified Jacobian \(B^+\) of (8) is

$$\begin{aligned} E^+= F'(z^+) - B^+ = \Delta F'(\Delta {\hat{z}}) - U. \end{aligned}$$
(9)

Given a rank restriction r, \(0\le r \le m\), on U, the Eckart–Young–Mirsky theorem gives the update matrix U that minimizes the Jacobian error \(E^+\), in both \(\Vert \cdot \Vert _2\) and \(\Vert \cdot \Vert _F\). In Proposition 1 below, we give an expression for U and show that the resulting modified Jacobian may be viewed as a Jacobian evaluated at a point \({\bar{z}}^+ = ( {\bar{x}}^+, {\bar{\lambda }}^+, {\bar{s}}^+)\).

Proposition 1

For \(z = (x, \lambda , s)\) and \(\Delta z = ( \Delta x, \Delta \lambda , \Delta s)\), let \(F'(z)\) and \(\Delta F'( \Delta z)\) be defined by (4) and (5) respectively, and let \(z^+=z+\Delta z\). For a given rank r, \( 0 \le r \le m\), let \({\mathcal {U}}_r\) be the set of indices corresponding to the r largest quantities of \(\sqrt{(\Delta \lambda _i)^2 + (\Delta s_i)^2}\), \(i=1,\dots ,m\). The optimal solution of

$$\begin{aligned} \min _{U \in {\mathbb {R}}^{(n+2m) \times (n+2m)}} \left\| F'(z^+) - \left( F'(z) + U \right) \right\| \quad \text {subject to } {\text {rank}}(U) \le r, \end{aligned}$$

where \(\Vert \cdot \Vert \) is either \(\Vert \cdot \Vert _2\) or \(\Vert \cdot \Vert _F\), is

$$\begin{aligned} U_* = \sum _{i \in {\mathcal {U}}_r} e_{n+m+i} \left( ( s_i^+ - s_i ) e_{n+i} + (\lambda _i^+ - \lambda _i) e_{m+n+i} \right) ^T. \end{aligned}$$

In consequence, it holds that

$$\begin{aligned}B^+ = F'({\bar{z}}^+), \text{ with } ({\bar{x}}^+_{i}, {\bar{\lambda }}^+_{i}, {\bar{s}}^+_{i}) = {\left\{ \begin{array}{ll} (x^{+}_{i}, \lambda ^{+}_{i}, s^{+}_{i}) &{} i \in {\mathcal {U}}_r, \\ (x^{+}_{i}, \lambda _i, s_i) &{} i \in \{1,\dots , m\} \setminus {\mathcal {U}}_r. \end{array}\right. } \end{aligned}$$

Proof

Note that \(\Vert F'(z^+) - B^+ \Vert = \Vert \Delta F'(\Delta z) - U \Vert = \Vert E^+\Vert \) by (9). The result then follows from the Eckart–Young–Mirsky theorem, stated in Theorem 5, together with Lemma 6. The last part of the proposition follows directly from performing the update. \(\square \)

Proposition 1 shows that each rank-1 term of the sum in \(U_*\) added to \(F'(z)\) is equivalent to the update of one component-pair, \((\lambda , s)\), in the \(\Lambda \) and S blocks of the Jacobian. The essence is that adding the rank-r update matrix \(U_*\) to \(F'(z)\) is equivalent to updating pairs \((\lambda _i, s_i)\) to \(\left( \lambda ^{+}_{i}, s^{+}_{i} \right) \), \(i \in {\mathcal {U}}_r\), and that the modified Jacobian at \(z^+\) may be viewed as a Jacobian evaluated at \({\bar{z}}^+\). In particular, \(r=m\) gives \({\bar{z}}^+ = z^+\) and \(B^+ = F'(z^+)\).
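For illustration, a NumPy sketch of Proposition 1 follows: the set \({\mathcal {U}}_r\) collects the r largest pair-steps, and applying \(U_*\) amounts to overwriting the corresponding entries of the S and \(\Lambda \) blocks. Function and variable names are ours.

```python
# Sketch of Proposition 1 (our names): the rank-r update U_* refreshes the
# r component pairs (lam_i, s_i) with the largest sqrt(dlam_i^2 + ds_i^2).
import numpy as np

def U_r_indices(dlam, ds, r):
    """Index set U_r: the r largest quantities sqrt(dlam_i^2 + ds_i^2)."""
    return np.argsort(-np.hypot(dlam, ds))[:r]

def apply_rank_r_update(B, n, m, lam_plus, s_plus, idx):
    """Add U_*: overwrite the S and Lambda entries of rows n+m+i, i in idx."""
    B = B.copy()
    for i in idx:
        B[n + m + i, n + i] = s_plus[i]        # S-block entry becomes s_i^+
        B[n + m + i, n + m + i] = lam_plus[i]  # Lambda-block entry, lam_i^+
    return B
```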

Before we give the analogous result of Proposition 1 in a more general framework, we show that there exists a region where the modified Newton approach may be started without causing large errors in the search direction. In particular, we give a bound on the search direction error \( \Vert \Delta {\hat{z}}^+ - \Delta z^+ \Vert \), where \(\Delta z^+\) satisfies (8) with update matrix \(U_*\) of rank r as given in Proposition 1. In the derivation, the inverse of \(B^+\) is expressed as a Neumann series, which requires \(\rho ( F'(z^+) ^{-1}E^+) < 1\). We first show that among U such that \({\text {rank}}(U) \le r\), \(U_*\) is sound with regard to reducing an upper bound of \(\rho (F'(z^+) ^{-1}E^+)\). Thereafter, in Lemma 5 we show that, for iterates z sufficiently close to the barrier trajectory, the quantity \(\Vert F'(z^+) ^{-1}E^+ \Vert \), and consequently also \(\rho (F'(z^+) ^{-1}E^+)\), is bounded above by a constant times \(\mu \). This gives the existence of a region, depending on the barrier parameter, where \(\rho ( F'(z^+) ^{-1}E^+ ) < 1\).

By assumption \(z^+ \in {\mathcal {B}}(z^*, \delta )\), hence by Lemma 1 there exists a constant \(M>0\) such that \(\Vert F'(z^+) ^{-1}\Vert \le M\), and it holds that

$$\begin{aligned} \rho ( F'(z^+) ^{-1}E^+ ) \le \Vert F'(z^+) ^{-1}E^+ \Vert \le M \sigma _{max}(\Delta F'(\Delta {\hat{z}} ) - U). \end{aligned}$$
(10)

Lemma 6 shows that the singular values of \(\Delta F'(\Delta {\hat{z}})\) are given by \(\sqrt{(\Delta {\hat{\lambda }}_i)^2 + (\Delta {\hat{s}}_i)^2}\), \(i=1,\dots ,m\). The largest reduction of the upper bound in (10), among U such that \({\text {rank}}(U) \le r\), is achieved with the rank-r update matrix \(U_*\) of Proposition 1, which gives

$$\begin{aligned} \rho ( F'(z^+) ^{-1}E^+ )&\le \Vert F'(z^+) ^{-1}E^+ \Vert \le M \max _{i=r+1,\dots , m} \sqrt{(\Delta {\hat{\lambda }}_i)^2 + (\Delta {\hat{s}}_i)^2} \nonumber \\&= M \sqrt{(\Delta {\hat{\lambda }}_{r+1})^2 +( \Delta {\hat{s}}_{r+1})^2}, \end{aligned}$$
(11)

where the indices \(i=1,\dots ,m\) are ordered such that \( \sqrt{(\Delta {\hat{\lambda }}_i)^2 + (\Delta {\hat{s}}_i)^2}\) are in descending order. This supports the choice of update matrix with regard to reducing the upper bound on the spectral radius, and the 2-norm, of \(F'(z^+)^{-1}E^+\).

Lemma 5

Under Assumption 1, let \({\mathcal {B}}\left( z^*, \delta \right) \) and \({{\hat{\mu }}}\) be defined by Lemmas 1 and 2 respectively. For \(0<\mu \le {{\hat{\mu }}}\) and \(z \in {\mathcal {B}}(z^*, \delta )\), define \(z^+= z + \Delta {\hat{z}}\), where \(\Delta {\hat{z}}\) is the solution of (3) with \(\mu ^+ = \sigma \mu \), \(0< \sigma < 1\). Moreover, let \(E^+= \Delta F'(\Delta {\hat{z}}) -U_*\), with \(U_*\) the rank-r, \(0 \le r<m\), update matrix of Proposition 1. If z is sufficiently close to \(z^{\mu } \in {\mathcal {B}}(z^*, \delta )\) such that \(\Vert F_{\mu } (z) \Vert = {\mathcal {O}}(\mu )\) and \(z^+ \in {\mathcal {B}}(z^*, \delta )\), then

$$\begin{aligned} \Vert F'(z^+) ^{-1}E^+ \Vert \le M C^{(r+1)} \mu , \end{aligned}$$
(12)

where M is defined by Lemma 1 and \(C^{(r+1)} > 0\) is a constant such that \( \sqrt{(\Delta {\hat{\lambda }}_{r+1})^2 +( \Delta {\hat{s}}_{r+1})^2} \le C^{(r+1)} \mu \) with \( \sqrt{(\Delta {\hat{\lambda }}_i)^2 + (\Delta {\hat{s}}_i)^2}\), \(i=1,\dots ,m\), ordered in descending order. In addition, \(C^{(r+1)}\) decreases as r increases.

Proof

The point z and direction \(\Delta {\hat{z}}\) satisfy the conditions of Lemma 4, hence there exists a constant \(C>0\) such that \(\Vert \Delta {\hat{z}} \Vert \le C \mu \). In consequence, there exist constants \(C^{(i)}>0\), \(i=1,\dots , m\), such that

$$\begin{aligned} \sqrt{(\Delta {\hat{\lambda }}_i)^2 + (\Delta {\hat{s}}_i)^2} \le C^{(i)} \mu , \quad i=1,\dots ,m. \end{aligned}$$
(13)

If in addition, \(\sqrt{(\Delta {\hat{\lambda }}_i)^2 + (\Delta {\hat{s}}_i)^2}\), \(i=1,\dots ,m\), are ordered in descending order, then \(C^{(i)}\), \(i=1,\dots ,m\), may be chosen such that \(C^{(1)} \ge \cdots \ge C^{(m)}\). A combination of (11) and (13) gives the result. \(\square \)

The bound in (12) of Lemma 5 shows that \(\Vert F'(z^+) ^{-1}E^+ \Vert \), and by (10) also \(\rho ( F'(z^+) ^{-1}E^+ )\), will be less than unity for sufficiently small \(\mu \). Indeed, this is also true when U is a zero matrix, i.e., for a simplified Newton strategy. The derivation of the result of Lemma 5 utilizes that, for a given rank restriction r, \(U_*\) of Proposition 1 is the update matrix that gives the tightest bound in (11). In consequence, \(U_*\) is also the rank-r update matrix that gives the tightest upper bound in the result of the lemma, with our analysis. Moreover, \(C^{(1)} \ge \cdots \ge C^{(m)}\), and consequently \(C^{(r+1)}\) decreases with increasing r. In addition, (12) provides an explicit sufficient condition on \(\mu \), depending on M, or \(\Vert F'(z^+)^{-1}\Vert \), and \(C^{(r+1)}\), for \(\rho ( F'(z^+) ^{-1}E^+ ) < 1\).

Next we give a bound on the search direction error at \(z^+\) with the modified Newton equation (8) relative to the Newton equation (3) with \(\mu ^{++}\). It is shown that the error is bounded by a constant times \(\mu ^3\) when \(\mu ^{++} = \mu ^+\) and a constant times \(\mu ^2\) when \(\mu ^{++} < \mu ^+\). As may be anticipated, the bound is tighter when \(\mu \) is not decreased in the corresponding iteration.

Theorem 2

Under Assumption 1, let \({\mathcal {B}}\left( z^*, \delta \right) \), M, \(C_3\), \({{\hat{\mu }}}\), be defined by Lemmas 1 and 2. For \(0<\mu \le {{\hat{\mu }}}\), assume that \(z \in {\mathcal {B}}(z^*, \delta )\) is sufficiently close to \(z^{\mu } \in {\mathcal {B}}(z^*, \delta )\) such that \(\Vert F_{\mu } (z) \Vert = {\mathcal {O}}(\mu )\). Define \(z^+= z + \Delta {\hat{z}}\) where \(\Delta {\hat{z}}\) is the solution of (3) with \(\mu ^+ = \sigma \mu \), \(0< \sigma < 1\). Moreover, let \(\Delta z^+\) be defined by (8) with \(\mu ^{++} = \sigma ^+ \mu ^+\), \(0 <\sigma ^+ \le 1\), and U as the rank-r, \(0 \le r<m\), update matrix \(U_*\) of Proposition 1. If \(z^+ \in {\mathcal {B}}(z^*, \delta )\), then there exists \({{\bar{\mu }}}\), \(0<{{\bar{\mu }}}\le {{\hat{\mu }}}\), such that for \(0<\mu \le {{\bar{\mu }}}\)

$$\begin{aligned} \left\| \Delta {\hat{z}}^+ - \Delta z^+ \right\| \le \frac{M C^{(r+1)}}{1-M C^{(r+1)}\mu } \big ( C_3(1 - \sigma ^+) \sigma \mu ^2 + {\mathcal {O}}(\mu ^3) \big ), \end{aligned}$$
(14)

where \(\Delta {\hat{z}}^+\) is the Newton step at \( z^+\), given by \(F'(z^+) \Delta {\hat{z}}^+= -F_{\mu ^{++}}(z^+)\), and \(C^{(r+1)} > 0\) is such that \( \sqrt{(\Delta {\hat{\lambda }}_{r+1})^2 +( \Delta {\hat{s}}_{r+1})^2} \le C^{(r+1)} \mu \) with \( \sqrt{(\Delta {\hat{\lambda }}_i)^2 + (\Delta {\hat{s}}_i)^2}\), \(i=1,\dots ,m\), ordered in descending order. In addition, \(C^{(r+1)}\) decreases as r increases.

Proof

See Appendix A. \(\square \)

As for Lemma 5, the result of Theorem 2 is also valid for U as a zero matrix. The essence is again that, among update matrices U such that \({\text {rank}}(U) \le r\), the rank-r update matrix \(U_*\) of Proposition 1 is the matrix that provides the tightest bound in (14) with our analysis. As mentioned, \(U_*\) is also the matrix that gives the tightest upper bound in (12), and in addition, \(C^{(r+1)}\) decreases with increasing r. Consequently, the bound in (14) decreases with increasing r, and larger values of r may thus also increase the region where the result of Theorem 2 is valid, i.e., the region where the proposed modified Newton approach may be a viable alternative.

3.1 At a general iteration k

In this section we give a result analogous to Proposition 1, at iteration k, \(k \ge 1\), in a damped modified Newton setting. Consider the sequence \(\{z^i\}_{i=0}^k\) generated by \(z^{i+1} = z^{i} + \alpha ^i \Delta z^i\), \(i=0,\dots , k-1\), where \(\alpha ^i\) is the step size. Suppose that each \(\Delta z^i\) satisfies

$$\begin{aligned} B^{i} \Delta z^{i} = - F_{\mu ^i}(z^i), \quad \text{ with } B^{i} = {\left\{ \begin{array}{ll}F'(z^0) &{} i=0, \\ B^{i-1} + U^i &{} i = 1,\dots , k-1,\end{array}\right. } \end{aligned}$$
(15)

for some \(\mu ^i>0\) and update matrix \(U^{i}\) of rank \(r^i\). If at \(k=1\), for a given rank \(r^k\), the update matrix is chosen as the optimal solution of the optimization problem in Proposition 1, then \(B^1 = F'({\bar{z}}^1)\), for some \({\bar{z}}^1\). Inductively, at an iteration k, \(k\ge 1\), for a given \({\bar{z}}^{k-1}\) and rank \(r^k\), \(0\le r^k \le m\), we wish to choose \(U^k\) as the optimal solution of

$$\begin{aligned} \min _{U^k \in {\mathbb {R}}^{(n+2m) \times (n+2m)}} \left\| F'(z^k) - \left( F'({\bar{z}}^{k-1}) + U^k \right) \right\| \quad \text {subject to } {\text {rank}}(U^k) \le r^k, \end{aligned}$$
(16)

where \(\Vert \cdot \Vert \) is either \(\Vert \cdot \Vert _2\) or \(\Vert \cdot \Vert _F\). The optimal solution of (16), the update of \({\bar{z}}^k\) from \({{\bar{z}}}^{k-1}\), and the resulting optimal \(B^k\) are shown in Proposition 3. This is analogous to the update from \(z^0\) to \({{\bar{z}}}^1\) given in Proposition 1. The essence is that the rank-\(r^k\) update matrix, defined by the solution of (16), is equivalent to updating information corresponding to the \(r^k\) largest quantities \(\sqrt{ (\lambda ^k_i - {\bar{\lambda }}_i^{k-1})^2 + (s^k_i - {\bar{s}}^{k-1}_i )^2}\), i.e., the \(r^k\) largest deviations from the exact Jacobian are corrected, and \(r^k=m\) gives \(B^k=F'(z^k)\).

Proposition 3

At iteration k, \(k \ge 1\), for given vectors \(z^k\), \({\bar{z}}^{k-1}\) and rank \(r^k\), \(0 \le r^k \le m\), consider optimization problem (16). The optimal solution of (16) is

$$\begin{aligned} U^k_* = \sum _{i \in {\mathcal {U}}_{r^k}} e_{n+m+i} \left( (s^k_i - {\bar{s}}^{k-1}_i ) e_{n+i} + (\lambda ^k_i - {\bar{\lambda }}_i^{k-1}) e_{m+n+i} \right) ^T, \end{aligned}$$

where \({\mathcal {U}}_{r^k}\) is the set of indices corresponding to the \(r^k\) largest quantities of \(\sqrt{ (\lambda ^k_i - {\bar{\lambda }}_i^{k-1})^2 + (s^k_i - {\bar{s}}^{k-1}_i )^2}\), \(i=1,\dots ,m\). In consequence, it holds that

$$\begin{aligned} B^{k} = F'({\bar{z}}^{k}), \text { with } ({\bar{x}}^k_{i}, {\bar{\lambda }}^k_{i}, {\bar{s}}^k_{i}) = {\left\{ \begin{array}{ll} (x^{k}_{i}, \lambda ^{k}_{i}, s^{k}_{i}) &{} i \in {\mathcal {U}}_{r^k}, \\ (x^{k}_{i}, {\bar{\lambda }}^{k-1}_i, {\bar{s}}^{k-1}_i) &{} i \in \{1,\dots , m\} \setminus {\mathcal {U}}_{r^k}. \end{array}\right. } \end{aligned}$$

Proof

The proof is analogous to that of Proposition 1 with \(z={\bar{z}}^{k-1}\), \(z^+= z^k\) and \(\Delta z = z^k - {\bar{z}}^{k-1}\). \(\square \)
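Since \(B^k = F'({\bar{z}}^k)\) with \({\bar{x}}^k = x^k\), an implementation only needs to store the barred pairs \(({\bar{\lambda }}, {\bar{s}})\) between iterations; a minimal sketch of this bookkeeping (names are ours) is:

```python
# Sketch of the recursion behind Proposition 3 (our names): B^k = F'(zbar^k)
# is represented by the pairs (lam_bar, s_bar), since xbar^k = x^k.
import numpy as np

def update_barred_pairs(lam_bar, s_bar, lam_k, s_k, r_k):
    """Refresh the r^k pairs with the largest deviation from (lam^k, s^k)."""
    dev = np.hypot(lam_k - lam_bar, s_k - s_bar)
    idx = np.argsort(-dev)[:r_k]               # the index set U_{r^k}
    lam_bar, s_bar = lam_bar.copy(), s_bar.copy()
    lam_bar[idx], s_bar[idx] = lam_k[idx], s_k[idx]
    return lam_bar, s_bar
```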

In the described approach, \(U_*^k\) of Proposition 3 gives \(B^k = F'( {\bar{z}}^k)\) for some \({\bar{z}}^k\). A direct consequence is that iterates which become primal–dual feasible, i.e., satisfy the first two block equations of (3), will remain so. Moreover, at iteration k, the error of using the modified Jacobian is

$$\begin{aligned} E^k = F'(z^k) - B^k = F'(z^k) - (B^{k-1}+ U^k_*) = F'(z^k) - F'( {\bar{z}}^k), \end{aligned}$$
(17)

and it holds that

$$\begin{aligned} \Vert E^k \Vert _{2} \le \Vert E^k \Vert _F = \sqrt{\sum _{i,j} \vert E^k_{ij} \vert ^2} = \sqrt{\sum _{i=1}^m \left( (s_i^k - {\bar{s}}_i^k)^2 + (\lambda _i^k - {\bar{\lambda }}_i^k)^2 \right) } = \Vert z^k - {\bar{z}}^k \Vert _2, \end{aligned}$$

since \(x^k = {\bar{x}}^k\). Hence it follows that

$$\begin{aligned} \Vert F'(z^k) - F'( {\bar{z}}^k) \Vert _{2} \le \Vert F'(z^k) - F'( {\bar{z}}^k) \Vert _{F} = \Vert z^k - {\bar{z}}^k \Vert _2. \end{aligned}$$
(18)

In fact, for any z and \({\tilde{z}}\) it holds that

$$\begin{aligned} \Vert F'(z) - F'({\tilde{z}}) \Vert _{2} \le \Vert F'(z) - F'({\tilde{z}}) \Vert _{F} \le \Vert z -{\tilde{z}} \Vert _2, \end{aligned}$$

which implies that the Lipschitz constant of \(F'\) may be chosen as one. Recall that, among the update matrices \(U^k\) such that \({\text {rank}}(U^k) \le r^k\), \(U_*^k\) of Proposition 3 is the update matrix that minimizes \(\Vert F'(z^k) - F'( {\bar{z}}^k) \Vert _{F}\). Consequently, by (18), \(U_*^k\) is also the update matrix that minimizes \(\Vert z^k - {\bar{z}}^k \Vert _2\).

Next we show that the update matrix \(U_*^k\) of Proposition 3 is sound with respect to reducing an upper bound of the relative error of the search direction \(\Delta z^k\). At iteration k, \(k\ge 1\), \(\Delta z^k\) satisfies (15) which equivalently can be written as

$$\begin{aligned} (F'(z^k) - E^k) ( \Delta {\hat{z}}^k - \epsilon ^k) = - F_{\mu ^k} (z^k), \quad \epsilon ^k = \Delta {\hat{z}}^k-\Delta z^k, \end{aligned}$$

where \(\Delta {\hat{z}}^k\) satisfies \(F'(z^k) \Delta {\hat{z}}^k = - F_{\mu ^k} (z^k)\). Standard perturbation analysis gives

$$\begin{aligned} \frac{\Vert \Delta {\hat{z}}^k - \Delta z^k \Vert }{ \Vert \Delta z^k \Vert } \le \Vert F'(z^k) ^{-1}\Vert \Vert E^k \Vert . \end{aligned}$$
(19)

Given the restrictions on the update, \(U_*^k\) of Proposition 3 is the update that minimizes \(\Vert E^k \Vert \), and in consequence gives the largest reduction in the upper bound on the relative error (19).

3.2 Convergence

In this section we discuss convergence towards the barrier trajectory, i.e., convergence of the inner loop of Algorithm 1. We first give a condition for a descent direction with respect to our merit function, as our setting is compatible with linesearch strategies. Thereafter, results are given in a setting where unit steps are assumed to be tractable. We give conditions on \(B^k\), and thus \(r^k\), so that the modified Newton approach converges locally. The theoretical setting is here widened slightly compared to the previous sections, in that we wish to quantify the effects of the modified Newton approach also for larger values of \(\mu \). For a given \(\mu >0\), iterate z and vector \(\Delta z\), consider the merit function and the univariate function

$$\begin{aligned} \phi _\mu (z) = \Vert F_\mu (z) \Vert , \text { and } \varphi _\mu (\alpha ) = \Vert F_\mu (z+\alpha \Delta z) \Vert , \text { respectively.} \end{aligned}$$

The directional derivative is then

$$\begin{aligned} \nabla \phi _\mu (z)^T \Delta z = \varphi _\mu '(\alpha ) \vert _{\alpha =0} = \frac{d }{d \alpha } \Vert F_\mu (z+\alpha \Delta z) \Vert |_{\alpha =0} = \frac{\Delta z^T F'(z)^T F_\mu (z)}{ \Vert F_\mu (z) \Vert }. \end{aligned}$$

At iteration k, \(k\ge 0\), \(z^k\) and \(\Delta z^k\) in the modified Newton approach satisfy \(F'({\bar{z}}^k) \Delta z^k = -F_\mu (z^k)\), for some \({\bar{z}}^k\). If \(F'({\bar{z}}^k)\) is nonsingular, then the directional derivative is

$$\begin{aligned} \nabla \phi _\mu (z^k)^T \Delta z^k&= - \frac{1}{ \Vert F_\mu (z^k) \Vert } F_\mu (z^k)^T F'(z^k) F'({\bar{z}}^k) ^{-1}F_\mu (z^k) \nonumber \\&= - \Vert F_\mu (z^k) \Vert - \frac{1}{ \Vert F_\mu (z^k) \Vert } F_\mu (z^k)^T E^k F'({\bar{z}}^k) ^{-1}F_\mu (z^k), \end{aligned}$$
(20)

with \(E^k\) as in (17). From (20) it follows that \(\Delta z^k\) is a descent direction with respect to \(\phi _\mu \) if

$$\begin{aligned} \Vert F_\mu (z^k) \Vert ^2 > - F_\mu (z^k)^T E^k F'({\bar{z}}^k) ^{-1}F_\mu (z^k). \end{aligned}$$
(21)

Under the restrictions on the update, the rank-\(r^k\) matrix \(U^k_*\) of Proposition 3 is chosen such that \(\Vert E^k \Vert \) is minimized. In addition, \(\Vert E^k \Vert = 0\) for \(r^k = m\). A descent direction can hence always be ensured for \(r^k\) sufficiently large. Moreover, in our theoretical setting, Lemma 5 gives the existence of a region, depending on \(\mu \), where the modified Newton approach may be initiated so that \(\Vert E^k \Vert \) is sufficiently small for (21) to hold, even when \(r^k=0\). However, the essence is again that \(U_*^k\) gives the largest reduction of \(\Vert E^k \Vert \), among \(U^k\) such that \({\text {rank}}(U^k) \le r^k\).
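In an implementation, (21) can be tested directly from quantities available once the modified step has been computed; a minimal sketch, assuming \(F'({\bar{z}}^k)\) is nonsingular and, for simplicity, solving with it densely:

```python
# Sketch of the descent test (21) for a computed modified direction.
import numpy as np

def is_descent(F, E, Fp_bar):
    """F = F_mu(z^k); E = E^k of (17); Fp_bar = F'(zbar^k), nonsingular."""
    w = np.linalg.solve(Fp_bar, F)   # F'(zbar^k)^{-1} F_mu(z^k)
    return F @ F > -(F @ (E @ w))    # condition (21)
```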

Next we give a result on local convergence and discuss the modified Newton approach in the framework of inexact Newton methods. Note that the local convergence result in the proposition shows the balance between a quadratic rate of convergence, which would follow if \(B^k=F'(z^k)\), i.e., \(C_6=0\), and \(\Vert F'(z^k)^{-1}\Vert \le C_7\) for all k, and a linear rate of convergence, which follows when \(B^k\) differs from \(F'(z^k)\).

Proposition 4

For a given \(\mu >0\), assume that \(z^\mu \) exists. At an iteration \(k_0\), consider the sequence of iterates generated by \(z^{k+1} = z^k + \Delta z^k\), \(k = k_0, k_0 +1, \dots ,\) where each \(\Delta z^k\) satisfies (15), with \(\mu ^k = \mu \) and update matrix of rank-\(r^k\), \(0\le r^k \le m\), given by \(U^k_*\) of Proposition 3. Assume that at each iteration k, \((B^k) ^{-1}\) exists and that \(r^k\) is chosen such that for all k, \(\Vert (B^k)^{-1}\Vert \Vert F'(z^k) - B^k \Vert \le C_6\) for some \(C_6<1\). If in addition, there is a \(C_7\) such that for all k, \(\Vert (B^k)^{-1}\Vert \le C_7\), then it holds that

$$\begin{aligned} \Vert z^{k+1}-z^{\mu } \Vert \le \frac{C_7}{2} \Vert z^k - z^\mu \Vert ^2 + C_6 \Vert z^k - z^\mu \Vert , \end{aligned}$$
(22)

so that \(z^k\) converges to \(z^\mu \) if

$$\begin{aligned} \Vert z^{k_0}-z^\mu \Vert \le \frac{1-C_6}{C_7}. \end{aligned}$$
(23)

Proof

Under the conditions of the proposition, \(\Delta z^k\) and \(z^k\) satisfy \(B^k \Delta z^k = - F_\mu (z^k)\). At iteration \(k+1\) the error may be written as

$$\begin{aligned} z^{k+1}-z^{\mu }&= z^k - (B^k) ^{-1}F_\mu (z^k) - z^{\mu } \\&= (B^k) ^{-1}\big ( F_\mu (z^\mu ) - F_\mu (z^k)-F'(z^k)(z^\mu - z^k) \big ) \\&\quad - (B^k) ^{-1}\big ( B^k - F'(z^k) \big ) ( z^{\mu } - z^k), \end{aligned}$$

where \((B^k) ^{-1}F'(z^k) ( z^{\mu } - z^k)\) has been added and subtracted in the second equality. Taking 2-norm while considering Lipschitz continuity of \(F'\), see end of Sect. 3.1, and norm inequalities give

$$\begin{aligned} \Vert z^{k+1}-z^{\mu } \Vert \le \frac{\Vert (B^k) ^{-1}\Vert }{2} \Vert z^k - z^\mu \Vert ^2 + \Vert (B^k) ^{-1}\Vert \Vert B^k - F'(z^k) \Vert \Vert z^k - z^\mu \Vert . \end{aligned}$$
(24)

Insertion of the assumed \(C_6\) and \(C_7\) into (24) gives (22). Finally, if \(\Vert z^{k}-z^\mu \Vert \le (1-C_6)/C_7\), then (22) gives

$$\begin{aligned} \Vert z^{k+1}-z^{\mu } \Vert\le & {} \frac{C_7}{2} \Vert z^k - z^\mu \Vert ^2 + C_6 \Vert z^k - z^\mu \Vert \le \left( \frac{1-C_6}{2} +C_6\right) \Vert z^k - z^\mu \Vert \\\le & {} \frac{1+C_6}{2} \Vert z^k - z^\mu \Vert , \end{aligned}$$

so that \(z^{k_0}\) satisfying (23) converges to \(z^\mu \), as \(C_6<1\). \(\square \)

Conditions for local convergence may also be obtained when the modified Newton approach is interpreted in the context of inexact Newton methods [24]. For a given \(\mu >0\), such steps may be viewed as being of the form

$$\begin{aligned} F'(z^k) \Delta z^k = - F_{\mu }(z^k) + q^k, \text{ where } \Vert q^k \Vert / \Vert F_{\mu }(z^k) \Vert \le \eta ^k. \end{aligned}$$
(25)

The sequence of iterates \(z^k + \Delta z^k\) converges to \(z^\mu \), with at least linear rate, for \(z^0\) sufficiently close to \(z^\mu \) if \(\eta ^k < 1\) uniformly. Given that the iterates converge, the convergence is superlinear if and only if \( \Vert q^k \Vert = o(\Vert F_{\mu }(z^k) \Vert ), \text{ as } k \rightarrow \infty \).

The modified Newton approach can be put onto the form of (25) under the assumption that \(I-E^k F'(z^{k})^{-1}\) is nonsingular, where \(E^k\) is given by (17). Nonsingularity may be ensured with an update matrix \(U^k_*\) of Proposition 3 of sufficiently large rank \(r^k\), \(0\le r^k \le m\), or by starting the modified Newton approach when \(\mu \) is sufficiently small, as shown by Lemma 5. A straightforward calculation shows that (15) at iteration k, with \(\mu ^k = \mu \), can be written as

$$\begin{aligned} F'(z^k) \Delta z^k = - F_{\mu }(z^k) + \big ( I - (I-E^k F'(z^k) ^{-1})^{-1}\big ) F_{\mu }(z^k). \end{aligned}$$
(26)

Identification of terms in (25) and (26) gives

$$\begin{aligned} q^k = \big ( I - (I-E^k F'(z^k) ^{-1})^{-1}\big ) F_{\mu }(z^k). \end{aligned}$$

If in addition \(\Vert E^k F' (z^k)^{-1}\Vert < 1\), then \(q^k = -\sum _{j=1}^\infty (E^k F'(z^k)^{-1})^j F_{\mu }(z^k)\). Norm inequalities and standard geometric series results give

$$\begin{aligned} \Vert q^k \Vert \le \frac{ \Vert E^k F'(z^k) ^{-1}\Vert }{1 - \Vert E^k F'(z^k) ^{-1}\Vert } \Vert F_\mu (z^k) \Vert . \end{aligned}$$

Local convergence towards the barrier trajectory follows if, at each iteration k, \(r^k\) of Proposition 3 is chosen such that \( \Vert E^k F'(z^k) ^{-1}\Vert / ( 1 - \Vert E^k F'(z^k) ^{-1}\Vert ) < 1\) uniformly. Moreover, the convergence is superlinear if in addition \(r^k\) is chosen such that \( \Vert E^k F'(z^k) ^{-1}\Vert / ( 1 - \Vert E^k F'(z^k) ^{-1}\Vert ) \rightarrow 0 \) as \(k\rightarrow \infty \).

Similarly, the modified Newton approach may also be viewed in the framework of inexact interior-point methods in order to study conditions for global convergence, see e.g., [25,26,27]. However, our analysis has only yielded further technical conditions, which are outside the scope of this initial work. Instead, we have chosen to limit our focus to basic supporting results and to the study of practical performance with low-rank update matrices at larger values of \(\mu \).

3.3 Reduced systems

The ideas presented so far have been for the unreduced unsymmetric block 3-by-3 system (3). In this section we describe the corresponding reduced and condensed systems, which are similar to those of (6) and (7).

In essence, the system of linear equations to be solved for each iterate z in the modified Newton approach takes the form

$$\begin{aligned} F'({\bar{z}}) \Delta z = - F_\mu (z), \end{aligned}$$
(27)

for some \({\bar{z}} = ({\bar{x}}, {\bar{\lambda }}, {\bar{s}})\), with \({\bar{\Lambda }} = \text {diag}({\bar{\lambda }})\) and \({\bar{S}} = \text {diag}({\bar{s}})\). System (27) may be reformulated in the reduced form

$$\begin{aligned} \begin{pmatrix} H &{} -A^T \\ -A &{} -{\bar{\Lambda }}^{-1} {\bar{S}} \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \lambda \end{pmatrix} = - \begin{pmatrix} Hx + c - A^T \lambda \\ b - Ax + s + {\bar{\Lambda }}^{-1} \left( \mu e - S \Lambda e \right) \end{pmatrix}, \end{aligned}$$
(28)

together with \(\Delta s = {\bar{\Lambda }}^{-1}\left( \mu e - S \Lambda e - {\bar{S}} \Delta \lambda \right) \). Schur complement reduction of \(-{\bar{\Lambda }}^{-1}{\bar{S}}\) in (28) gives the condensed form

$$\begin{aligned} \left( H + A^T {\bar{S}}^{-1} {\bar{\Lambda }} A \right) \Delta x = -\left( Hx + c - A^T \lambda \right) + A^T {\bar{S}}^{-1} \left( {\bar{\Lambda }} \left( b - Ax + s \right) + \mu e - S \Lambda e \right) , \end{aligned}$$
(29)

together with \(\Delta \lambda = {\bar{S}}^{-1} {\bar{\Lambda }} \left( b - Ax + s - A \Delta x \right) + {\bar{S}}^{-1} \left( \mu e - S \Lambda e \right) \) and \(\Delta s\) as above. As mentioned, the proposed rank-r update matrix of Proposition 3 is equivalent to updating r component pairs \(({\bar{\lambda }}, {\bar{s}})\). The change between iterations in the matrices of (28) and (29) is thus of rank r. In consequence, low-rank updates on the factorization of the matrix of (28), or (29), may also be considered.
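As an illustration of the last remark, and not of the paper's implementation (which refactorizes instead, see Sect. 4), the following sketch reuses a Cholesky factorization of the condensed matrix of (29) via the Sherman–Morrison–Woodbury identity. It assumes the condensed matrix is symmetric positive definite and that the diagonal change is nonsingular.

```python
# Woodbury sketch for the rank-r change of the condensed matrix in (29);
# assumes M = H + A^T Sbar^{-1} Lambdabar A is symmetric positive definite
# and that the diagonal change d (in lam_bar_i / s_bar_i) has no zeros.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def woodbury_solve(cho_M, A_r, d, rhs):
    """Solve (M + A_r^T diag(d) A_r) y = rhs, reusing cho_M = cho_factor(M).

    A_r holds the r rows of A whose pairs (lam_bar_i, s_bar_i) changed.
    """
    Minv_rhs = cho_solve(cho_M, rhs)
    Minv_Ar = cho_solve(cho_M, A_r.T)               # n-by-r block solve
    cap = np.diag(1.0 / d) + A_r @ Minv_Ar          # r-by-r capacitance matrix
    return Minv_rhs - Minv_Ar @ np.linalg.solve(cap, A_r @ Minv_rhs)
```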

3.4 Compatibility with previous work on interior-point methods

In order to have simple notation, we have chosen to formulate our problem on the form (IQP), with inequality constraints only. Analogous results hold for quadratic programs on standard form, as considered in [2, 6, 13, 15, 16]. However, when working with reduced systems, then the update will be on the diagonal of the H-matrix in the symmetric block 2-by-2 indefinite system.

Moreover, the proposed approach is also compatible with regularized methods for quadratic programming, e.g., [15,16,17], as long as the scaling of the regularization is not changed at iterations where the modified Jacobian is updated by a low-rank matrix. The scaling of the regularization may be changed at a refactorization step, e.g., of the form suggested in (34) of Sect. 4.

As each modified Jacobian may be viewed as a Jacobian evaluated at a different point, the modified Newton approach may also be interpreted in the framework of previous work on stability, effects of finite-precision arithmetic and spectral properties of the arising systems, e.g., [1,2,3, 5, 7,8,9,10,11,12,13].

4 Implementation

All numerical simulations were performed in MATLAB on benchmark problems from the repository of convex quadratic programming problems by Maros and Mészáros [18]. Many of these problems contain both linear equality and linear inequality constraints. However, in order not to complicate the description of the implementation with further technical details, we give the description for problems of the same form as in previous sections. Note, however, that some of the parameters depend on quantities related to the format of the benchmark problems.

4.1 Basic algorithm

The aim is to study the fundamental behavior of the modified Newton approach as primal–dual interior-point methods converge. In particular, when each search direction is generated with a modified Newton equation of the form (15), with update matrix \(U^k_*\) of Proposition 3, relative to a Newton equation of the form (3). In order not to risk combining effects of the proposed approach with effects from other features in more advanced methods, we chose to implement the modified Newton approach in a simple interior-point framework. Moreover, all systems of linear equations were solved with MATLAB's built-in direct solver. No low-rank update of factorizations was used or implemented for the numerical tests. In consequence, the results are not dependent on the particular factorization or the procedure used to update the factors. For low-rank updates of matrix factorizations, see e.g., [28,29,30,31]. Our basic interior-point algorithm is similar to Algorithm 19.1 in Nocedal and Wright [21, Ch. 19, p. 567]; however, termination and the update of \(\mu \) are based on the merit function \(\phi _\mu (z) = \Vert F_\mu (z) \Vert \).

[Algorithm 1: model primal–dual interior-point method]

In Algorithm 1 at iteration k, \(\alpha _P^{max,k}\) and \(\alpha _D^{max,k}\) are the maximum feasible step sizes for \(s^k\) along \(\Delta s^k\) and \(\lambda ^k\) along \(\Delta \lambda ^k\) respectively.
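For concreteness, a compact Python analogue of the model method is sketched below; the paper's MATLAB implementation is not reproduced here, and the solver interface, the fraction-to-boundary factor tau, and the constant C are our illustrative choices.

```python
# Illustrative Python analogue of the model method (Algorithm 1);
# solve_step returns the direction from (3) or (15), residual evaluates F_mu.
import numpy as np

def max_step(v, dv, tau=0.995):
    """Damped largest step keeping v + alpha*dv > 0 (tau is our choice)."""
    neg = dv < 0
    return min(1.0, tau * np.min(-v[neg] / dv[neg])) if neg.any() else 1.0

def model_method(solve_step, residual, x, lam, s,
                 mu=1.0, sigma=0.1, C=1.0, eps_tol=1e-6, max_iter=1000):
    for _ in range(max_iter):
        if np.linalg.norm(residual(x, lam, s, 0.0)) <= eps_tol:
            break                                  # terminate on ||F_0(z)||
        dx, dlam, ds = solve_step(x, lam, s, mu)   # Newton or modified Newton
        aP = max_step(s, ds)                       # alpha_P^{max,k}
        aD = max_step(lam, dlam)                   # alpha_D^{max,k}
        x, s = x + aP * dx, s + aP * ds            # primal step
        lam = lam + aD * dlam                      # dual step
        if np.linalg.norm(residual(x, lam, s, mu)) <= C * mu:
            mu *= sigma                            # sufficient decrease: cut mu
    return x, lam, s
```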

Our reference method, which in all experiments is denoted by Newton, is defined by Algorithm 1 where the search direction at iteration k, \(\Delta z^k = (\Delta x^k, \Delta \lambda ^k, \Delta s^k)\), satisfies (3). The method, whose behavior we aim to study, is defined by Algorithm 1 where the search direction satisfies (15), with an update matrix of rank \(r^k= r\) given by \(U^k_*\) of Proposition 3. This method is in the numerical experiments denoted by mN-r(r). Although the rank of the update matrices can be varied between the iterations, this initial study is limited to update matrices of constant rank in order to keep the comparisons clean.

4.2 Benchmark problems

Each problem was pre-processed and put in an equivalent form with n x-variables, \(m_{in}\) inequality constraints and \(m_{eq}\) equality constraints. The total number of variables in the primal–dual formulation is thus \(N = n + m_{eq}+ 2 m_{in}\); see Appendix A for a description and formulation of the systems that arise. A trivial equality constraint that fixed a variable at one of its bounds was removed from the problem along with the variable. A problem was accepted if \(m_{in} \ge 4\) and, in addition, if Newton converged from a given initial solution. Owing to the simplicity of Newton, convergence was not achieved for some problems, for reasons such as non-trivial equality constraints fixing variables at their bounds, singular Jacobians caused by linearly dependent equality constraints, etc. Moreover, we were not able to run CONT-300, BOYD1 and BOYD2 due to memory restrictions. These conditions reduced the benchmark set, \({\mathcal {P}}\), to 90 problems (out of 138). The problems were divided into three subsets: small, \({\mathcal {S}}\), medium, \({\mathcal {M}}\), and large, \({\mathcal {L}}\), defined as follows: \({\mathcal {S}} = \{ p \in {\mathcal {P}}: N < 500 \}\), \({\mathcal {M}} = \{ p \in {\mathcal {P}}: 500 \le N < 10000 \}\) and \({\mathcal {L}} = \{ p \in {\mathcal {P}}: N \ge 10000 \}\). Consequently, \(\vert {\mathcal {S}}\vert = 25\), \(\vert {\mathcal {M}}\vert = 37\) and \(\vert {\mathcal {L}}\vert = 28\). The specific problems of each group and details on their individual sizes can be found in Appendix A.

4.3 Heuristics

The theoretical results in Sect. 3 concern iterates sufficiently close to the trajectory for sufficiently small \(\mu \), or when the rank of each update matrix is sufficiently large. However, we are also interested in studying the behavior of the modified Newton approach beyond this setting, in particular for larger values of \(\mu \) and when each update matrix is of low rank. In this section, we discuss the behavior in these cases. In order to improve performance we also suggest two heuristics and a refactorization strategy. In essence, the heuristics allow for changes of indices in the set \({\mathcal {U}}_{r^k}\) of Proposition 3, and the refactorization strategy limits the total rank change on an initial Jacobian.

Numerical experiments with mN-r(r), for small r, have shown that convergence may slow down due to small step sizes \(\alpha ^k_P\) and \(\alpha ^k_D\). Small step sizes can be caused by a few components of the modified Newton direction which differ considerably from those of the Newton direction. We first show some numerical evidence of this behavior and suggest a partial explanation, on which we base two heuristics. The effectiveness of each heuristic is then illustrated, and finally a refactorization strategy is included in the modified Newton approach. Step sizes and convergence, in terms of the measure \(\Vert F_\mu \Vert \) with \(\mu = 0\), for Newton and mN-r(r) with \(r =[0,\> 2, \> 4]\) are shown in the left-hand side of Fig. 1. The results are for benchmark problem qafiro with parameters \(\mu ^0 = 1\), \(\sigma = 0.1\) and \(\varepsilon ^{tol} = 10^{-6}\). The right-hand side of the figure shows the inverse of the limiting step sizes and the relative error in the search direction at the iteration marked by the red circle of mN-r(2); hence large spikes imply small step sizes. Moreover, the figure only contains negative components of the modified Newton direction. The result for \(r= 0\), i.e., simplified Newton, is given to illustrate that low-rank updates can indeed make a difference compared to a simplified Newton approach, for which some of our theoretical results are still valid, although in a smaller region.

Fig. 1 The left-hand side shows step sizes and convergence on benchmark problem qafiro. The right-hand side shows the inverse of the limiting step sizes and the relative error in the search direction for negative components of the modified Newton direction in mN-r(2), at the iteration marked by the red circle

The results in the left-hand side of Fig. 1 indicate that convergence may slow down with the low-rank modified Newton approach due to small step sizes. The right-hand side of Fig. 1 suggests that small steps may be caused by large relative errors in certain components of the search direction. The results are similar to those shown by Gondzio and Sobral [6] for quasi-Newton approaches, hence indicating that the proposed modified Newton approach suffers from the same phenomenon as quasi-Newton approaches. In theory, zero steps are not harmful for the modified Newton approach, as long as the Newton step makes progress from this point, since after m/r iterations with zero steps the modified Jacobian will indeed be the Jacobian at that point. In practice, however, close-to-zero steps have negative effects on the convergence. In consequence, we would like to understand what causes these steps and how to avoid them.

The partial solution \(\Delta x\) of (27) satisfies (29). For z sufficiently close to \(z^*\), (29) may be approximated by

$$\begin{aligned} \left( H + A_{\mathcal {A}}^T {\bar{S}}_{\mathcal {A}}^{-1} {\bar{\Lambda }}_{\mathcal {A}} A_{\mathcal {A}} \right) \Delta x = -\left( Hx + c - A^T \lambda \right) + A_{\mathcal {A}}^T {\bar{S}}_{\mathcal {A}}^{-1} \left( {\bar{\Lambda }}_{\mathcal {A}} \left( b - Ax + s \right) _{\mathcal {A}} + \mu e - S_{\mathcal {A}} \Lambda _{\mathcal {A}} e \right) , \end{aligned}$$
(30)

where \({\mathcal {A}}\) and \({\mathcal {I}}\) are sets of indices corresponding to active and inactive constraints at the solution \(z^*\) respectively, i.e., \({\mathcal {A}} = \{ i: s_i^* = 0, \ i=1, \dots , m \}\) and \({\mathcal {I}} = \{ i: \lambda _i^* = 0, \ i=1, \dots , m \}\), and subscript \({\mathcal {A}}\) denotes the corresponding rows or components. If the modified Newton approach is initiated for small \(\mu \), or if the rank of each update matrix is sufficiently large, then \({\bar{z}} \approx z\). If, in addition, \(A_{\mathcal {A}} \Delta x\) is sufficiently large, i.e., \(\Delta x\) is not in, or almost in, the null-space of \(A_{\mathcal {A}}\) (if it is, then the search direction will not cause limiting steps), then the dominating terms of (30) are

$$\begin{aligned} A_{\mathcal {A}}^T {\bar{S}}_{\mathcal {A}}^{-1} {\bar{\Lambda }}_{\mathcal {A}} A_{\mathcal {A}} \Delta x = A_{\mathcal {A}}^T {\bar{S}}_{\mathcal {A}}^{-1} \left( \mu e - S_{\mathcal {A}} \Lambda _{\mathcal {A}} e \right) . \end{aligned}$$
(31)

In essence, (31) is an approximation of (30) in this case. By our assumptions \(A_{\mathcal {A}}\) has full row rank, so that \(A_{\mathcal {A}}^T\) has full column rank. Consequently, component-wise (31) gives

$$\begin{aligned} \frac{{\bar{\lambda }}_i}{{\bar{s}}_i} A_i \Delta x = \frac{ \mu - \lambda _i s_i}{{\bar{s}}_i}, \qquad i \in {\mathcal {A}}, \end{aligned}$$
(32)

where \( A_i\) denotes the ith row of A. Equation (32) gives an approximate description of how each pair \(({\bar{\lambda }}_i,{\bar{s}}_i)\), \(i\in {\mathcal {A}}\), affects \(A_i \Delta x\), the inner product of the search direction \(\Delta x\) and the corresponding constraint row \(A_i\). This means that each pair \(({\bar{\lambda }}_i,{\bar{s}}_i)\), \(i\in {\mathcal {A}}\), affects the angle between \(\Delta x\) and constraint \(A_i\), and/or \( \Vert \Delta x \Vert \). Both errors in the angle and large \(\Vert \Delta x \Vert \) may cause small step sizes. In the proposed modified Newton approach, depending on the rank of the update matrix, some of the factors in the modified Jacobian, i.e., some components \({\bar{\lambda }}_i / {\bar{s}}_i\), \(i\in {\mathcal {A}}\), may contain information from previous iterates. The analysis above suggests that it may be important to update certain component-pairs \((\lambda ,s)\) in order to avoid limiting steps. Such pairs are not necessarily selected by the update matrix of Proposition 1 or Proposition 3.

In light of the discussion above and the results of Fig. 1, we construct two heuristics in an attempt to decrease negative effects on convergence caused by small step sizes. Both heuristics have an update matrix \(U^k\) analogous to the one given by Proposition 3, with

$$\begin{aligned} U^k = \sum _{i \in {\mathcal {U}}^k} e_{n+m+i} \left( (s^k_i - {\bar{s}}^{k-1}_i ) e_{n+i} + (\lambda ^k_i - {\bar{\lambda }}_i^{k-1}) e_{m+n+i} \right) ^T, \end{aligned}$$
(33)

where \({\mathcal {U}}^k\) is an index set of cardinality r. However, not all indices in \({\mathcal {U}}^k\) are chosen according to the criteria of Proposition 3, so that \({\mathcal {U}}^k\) of the heuristics may differ from \({\mathcal {U}}_r\) of Proposition 3. (Note that \(r^k = r\), for all k, in the numerical tests.) The first heuristic can have at most two indices that differ between \({\mathcal {U}}^k\) and \({\mathcal {U}}_r\), whereas the second is more flexible and can potentially change all r indices. We choose to replace indices instead of adding indices in order to obtain a fair comparison in the study of the heuristics.

4.3.1 Heuristic H1

The idea of the first heuristic is to ensure that information corresponding to component-pairs \((\lambda ,s)\) is updated if either limited the step size in the previous iteration. At iteration k, \(k \ge 1\), \({\mathcal {U}}^k\) is based on \({\mathcal {U}}_{r}\) of Proposition 3, but the last one or two indices are replaced by

$$\begin{aligned} {\hat{i}}_1 = \mathop {\mathrm {arg\,min}}\limits _{i: \Delta \lambda ^{k-1}_{i}< 0} \ \frac{\lambda ^{k-1}_{i}}{-\Delta \lambda ^{k-1}_{i}} \quad \text { and/or } \quad {\hat{i}}_2 = \mathop {\mathrm {arg\,min}}\limits _{i: \Delta s^{k-1}_{i}< 0} \ \frac{s^{k-1}_{i}}{-\Delta s^{k-1}_{i}}, \end{aligned}$$

if \(\min _{i: \Delta \lambda ^{k-1}_{i}< 0} \frac{\lambda ^{k-1}_{i}}{-\Delta \lambda ^{k-1}_{i}} <1 \wedge {\hat{i}}_1 \notin {\mathcal {U}}_{r}\) and/or \(\min _{i: \Delta s^{k-1}_{i}< 0} \frac{s^{k-1}_{i}}{-\Delta s^{k-1}_{i}} <1 \wedge {\hat{i}}_2 \notin {\mathcal {U}}_{r}\), respectively.
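A sketch of H1 in code, using the reconstruction of \({\hat{i}}_1\) and \({\hat{i}}_2\) above (helper names are ours):

```python
# Sketch of heuristic H1 (our helper names): swap the step-limiting indices
# i_hat_1 / i_hat_2, if any, into the tail of U_r.
import numpy as np

def h1_index_set(U_r, lam_prev, dlam_prev, s_prev, ds_prev):
    U, swaps = list(U_r), []
    for v, dv in ((lam_prev, dlam_prev), (s_prev, ds_prev)):
        neg = np.flatnonzero(dv < 0)
        if neg.size:
            i_hat = neg[np.argmin(v[neg] / -dv[neg])]
            if v[i_hat] / -dv[i_hat] < 1 and i_hat not in U and i_hat not in swaps:
                swaps.append(i_hat)
    if swaps:
        U[-len(swaps):] = swaps   # replace the last one or two indices
    return U
```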

4.3.2 Heuristic H2

The principle of the second heuristic is based on the observation in the analysis above. Similarly as in H1, the idea is to ensure that certain component-pairs \((\lambda , s)\) are updated; in particular, the components with the largest relative error of the ratio in the left-hand side of (32). However, the set of active constraints at the solution is unknown; instead, all components which could have limited the step size in the previous iteration are considered in the selection. At iteration k, \(k \ge 1\), \({\mathcal {U}}^k\) is based on \({\mathcal {U}}_{r}\) of Proposition 3, but at most r indices are replaced by the indices corresponding to the, at most, r largest quantities of

$$\begin{aligned} \frac{ \left| \, {\bar{\lambda }}^{k-1}_{i}/{\bar{s}}^{k-1}_{i} - \lambda ^{k}_{i}/s^{k}_{i} \, \right| }{ \lambda ^{k}_{i}/s^{k}_{i} }, \quad i \in {\mathcal {H}}^k, \end{aligned}$$

where \({\mathcal {H}}^k = \{ i: \Delta \lambda ^{k-1}_{i}<0 \wedge \frac{\lambda ^{k-1}_{i}}{-\Delta \lambda ^{k-1}_{i}}< 1 \} \cup \{ i: \Delta s^{k-1}_{i}<0 \wedge \frac{s^{k-1}_{i}}{-\Delta s^{k-1}_{i}} < 1 \}\).
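A corresponding sketch of H2, with our reading of the relative-error measure above (names are ours):

```python
# Sketch of heuristic H2 (our helper names): among H^k, swap in up to r
# indices with the largest relative error in lam_bar/s_bar vs lam^k/s^k.
import numpy as np

def h2_index_set(U_r, lam_prev, dlam_prev, s_prev, ds_prev,
                 lam_bar, s_bar, lam_k, s_k):
    limited = np.zeros(lam_prev.size, dtype=bool)
    for v, dv in ((lam_prev, dlam_prev), (s_prev, ds_prev)):
        neg = dv < 0
        limited[neg] |= v[neg] / -dv[neg] < 1   # could have limited the step
    Hk = np.flatnonzero(limited)                # the set H^k
    ratio_k = lam_k[Hk] / s_k[Hk]
    rel_err = np.abs(lam_bar[Hk] / s_bar[Hk] - ratio_k) / ratio_k
    swaps = Hk[np.argsort(-rel_err)[:len(U_r)]]
    kept = [i for i in U_r if i not in set(swaps)]
    return list(swaps) + kept[:len(U_r) - len(swaps)]
```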

4.3.3 Heuristic test

To demonstrate the impact of heuristics H1 and H2, we show results in Fig. 2 which are analogous to those in the left-hand side of Fig. 1. The methods mN-r(r)-H1 and mN-r(r)-H2 denote mN-r(r), \(r = [2, 4]\), combined with heuristics H1 and H2, respectively. In addition, Table 1 shows the average of the sum \((\alpha ^k_P + \alpha ^k_D)/2\) for a subset of the benchmark problems. The subset contains problems from each of the sets \({\mathcal {S}}\), \({\mathcal {M}}\) and \({\mathcal {L}}\).

Fig. 2 Step sizes and convergence for mN-r(r), \(r = [2, 4]\), combined with heuristic H1 and H2 on benchmark problem qafiro

The results of Fig. 2 show that mN-r(r)-H1 and mN-r(r)-H2, \(r = [2, 4]\), use larger step sizes, and converge in fewer iterations, compared to mN-r(r) in Fig. 1. This shows that the heuristics H1 and H2 have the intended effect on benchmark problem qafiro.

Table 1 Average of the sum \((\alpha ^k_P + \alpha ^k_D)/2\) for a subset of the benchmark problems

The results in Table 1 indicate that H1 and H2 have the intended effect on more benchmark problems, but also that they are not effective on all problems. Part of the reason is that the rank of each update matrix is restricted to 2 and 4 respectively, whereas for some problems there are many components which limit the step size. For instance, H1 can replace at most two of these and H2 can replace at most as many as the maximum rank of the update. The results for problem yao are an example where H2 does worse than without the heuristic. The heuristic replaced indices in \({\mathcal {U}}_r\) which caused low quality in the search direction, indicating that it may be beneficial to also update the information that is suggested in Proposition 3. Numerical experiments have further shown that small step sizes can be avoided by allowing update matrices of varying rank, in particular update matrices where the rank is determined by the components which potentially limit the step size. However, numerical experiments have also shown that avoiding small step sizes is not sufficient to obtain increased convergence speed. Similarly as in the results of Gondzio and Sobral for quasi-Newton approaches [6], numerical experiments have shown that it is occasionally important to use the Jacobian at \(z^k\) instead of \({{\bar{z}}}^k\) to improve convergence. In light of this, we limit the number of allowed steps before the modified Jacobian is refactorized. In consequence, the total rank change of an actual Jacobian is limited. In the following numerical simulations, the modified Newton approaches mN-r(r), mN-r(r)-H1 and mN-r(r)-H2 include a refactorization strategy of the form

$$\begin{aligned} B^k = \left\{ \begin{array}{ll} F'(z^k) & \quad k = 0,\, l+1,\, 2l+2,\, 3l+3, \dots , \\ B^{k-1} + U^k & \quad k \ne 0,\, l+1,\, 2l+2,\, 3l+3, \dots , \end{array} \right. \end{aligned}$$
(34)

where \(U^k\) is given by (33) with the index sets \({\mathcal {U}}^k\) corresponding to H1 and H2 for mN-r(r)-H1 and mN-r(r)-H2 respectively, and with \({\mathcal {U}}_r\) of Proposition 3 for mN-r(r).
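As a minimal sketch, the schedule in (34) amounts to refactorizing whenever k is a multiple of \(l+1\); the callables `jacobian` and `update_matrix` below are hypothetical stand-ins for \(F'(\cdot )\) and \(U^k\) of (33).

def modified_jacobian(k, l, z, B_prev, jacobian, update_matrix):
    # Refactorization strategy (34): rebuild B^k = F'(z^k) at
    # k = 0, l+1, 2l+2, 3l+3, ..., i.e., every l+1 iterations;
    # otherwise apply the low-rank update B^k = B^{k-1} + U^k.
    if k % (l + 1) == 0:
        return jacobian(z)
    return B_prev + update_matrix(k)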

In general, other refactorization strategies or dynamic procedures may be considered: e.g., instead of accepting all steps, one may refactorize or increase the rank of the update matrix whenever a particular step is deemed to be of low quality.

5 Numerical results

In this section we give results in the form of numbers of iterations and factorizations. The results are meant to give an initial indication of the performance of the proposed modified Newton approach in a basic interior-point framework. The results are for the methods Newton, mN-r(r), mN-r(r)-H1 and mN-r(r)-H2, with \(r =[2, \> 16]\), described in Sect. 4. In essence, the methods differ in how the search direction is computed: the direction at iteration k satisfies (3) in Newton and (15) in the mN-methods. In contrast to Sect. 4, here the mN-methods also include the refactorization strategy (34). Due to the large variation in the number of inequality constraints and the number of variables across benchmark problems, the parameter l of (34) was defined as the closest integer to \(l_{{\mathcal {S}}}\), \(l_{{\mathcal {M}}}\) and \(l_{{\mathcal {L}}} \) for \(p \in {\mathcal {S}}\), \(p \in {\mathcal {M}}\) and \(p \in {\mathcal {L}}\) respectively; see Table 2 for the specific values. The computational cost of a refactorization of the unreduced, reduced and condensed systems all depend on the sparsity structure of the specific problem. We therefore choose \(l_{{\mathcal {S}}}\), \(l_{{\mathcal {M}}}\) and \(l_{{\mathcal {L}}}\) such that they relate to the full rank change that corresponds to a new factorization. The values of Table 2 were chosen such that a low-rank update is performed as long as the total rank change of an actual Jacobian is not larger than a factor 1/2, 1/10 and 1/100 of the full rank change for the small, medium and large problems respectively. Moreover, the parameters of Algorithm 1 were chosen as follows: \(\sigma = 0.1\), and termination tolerance \(\varepsilon ^{tol} = 10^{-6}\) for the small and medium sized problems and \(\varepsilon ^{tol} = 10^{-5}\) for the large problems. In each run, the initial \((x^0, \lambda ^0, s^0)\) was found with Newton, with stopping criteria corresponding to the requirement on the initial solution in Algorithm 1.
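To make the setup concrete, the following is a minimal Python sketch of how the refactorization strategy (34) enters a basic interior-point iteration. It is not a transcription of Algorithm 1: the callables `jacobian`, `update_matrix` and `residual` are hypothetical stand-ins, the barrier-parameter update is simplified, and a dense solve is used purely for illustration.

import numpy as np

def max_step(v, dv, tau=0.995):
    # Largest alpha in (0, 1] keeping v + alpha * dv > 0
    # (the ratio test behind the step sizes alpha_P and alpha_D).
    neg = dv < 0
    if not neg.any():
        return 1.0
    return min(1.0, tau * float((v[neg] / -dv[neg]).min()))

def run_mn(x, lam, s, mu, l, jacobian, update_matrix, residual,
           sigma=0.1, eps_tol=1e-6, max_iter=1000):
    # Schematic mN loop: B^k follows the refactorization strategy (34);
    # sigma and eps_tol match the values stated above.
    B = None
    for k in range(max_iter):
        F = residual(x, lam, s, mu)
        if np.linalg.norm(F, np.inf) <= eps_tol:
            break
        if k % (l + 1) == 0:
            B = jacobian(x, lam, s)          # first branch of (34)
        else:
            B = B + update_matrix(k)         # second branch of (34)
        d = np.linalg.solve(B, -F)           # dense solve for illustration
        n, m = x.size, lam.size
        dx, dlam, ds = d[:n], d[n:n + m], d[n + m:]
        alpha_P = max_step(s, ds)            # primal step size
        alpha_D = max_step(lam, dlam)        # dual step size
        x, s = x + alpha_P * dx, s + alpha_P * ds
        lam = lam + alpha_D * dlam
        mu = sigma * mu                      # simplified barrier update
    return x, lam, s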

Table 2 Refactorization parameter for the different problem sizes; \(m_{in}\) is the number of inequality constraints

Results are first shown for problems in the set \({\mathcal {S}}\) (Tables 3 and 4), thereafter for problems in \({\mathcal {M}}\) (Tables 5, 6 and 7), and finally for problems in \({\mathcal {L}}\) (Tables 8, 9 and 10). The results are for three different regions depending on \(\mu ^0\), namely \(\mu ^0 = [1, \> 10^{-3}, \> 10^{-6}]\). The intention is to illustrate the performance of the modified Newton approach both close to a solution and in a larger region where the theoretical results are not expected to hold. The results corresponding to \(r = 16\) for problems in \({\mathcal {S}}\) are omitted, since the refactorization strategy renders the performance similar to that of the other mN-methods. In all tables the initial factorization of \(B^0 = F'(z^0)\) is counted as one factorization; in essence, “1” in the factorization column, \(\texttt {F}\), means that no refactorization was performed. Moreover, “-” denotes that the method failed to converge within a maximum number of iterations. For each problem the maximum number of iterations was set to 10N, where N depends on the number of variables; see Appendix A for the value of N associated with each problem.

Table 3 Number of factorizations and iterations for problems in \({\mathcal {S}}\) with \(\mu ^0 = 1\)
Table 4 Number of factorizations and iterations for problems in \({\mathcal {S}}\) with \(\mu ^0 = 10^{-3}\) to the left and \(\mu ^0 = 10^{-6}\) to the right
Table 5 Number of factorizations and iterations for problems in \({\mathcal {M}}\) with \(\mu ^0 = 1\)
Table 6 Number of factorizations and iterations for problems in \({\mathcal {M}}\) with \(\mu ^0 = 10^{-3}\)
Table 7 Number of factorizations and iterations for problems in \({\mathcal {M}}\) with \(\mu ^0 = 10^{-6}\)
Table 8 Number of factorizations and iterations for problems in \({\mathcal {L}}\) with \(\mu ^0 = 1\)
Table 9 Number of factorizations and iterations for problems in \({\mathcal {L}}\) with \(\mu ^0 = 10^{-3}\)
Table 10 Number of factorizations and iterations for problems in \({\mathcal {L}}\) with \(\mu ^0 = 10^{-6}\)

The results in Tables 3, 4, 5, 6, 7, 8, 9 and 10 indicate that the number of factorizations may be reduced, compared to those done by Newton, by instead performing low-rank updates, as with mN-r(r), \(r = [2, \> 16]\). The reduced number of factorizations is, however, often at the expense of performing additional iterations with low-rank updates. The total number of iterations and/or factorizations is for many problems, but not for all, further reduced with heuristics H1 and H2, as shown by the results corresponding to mN-r(r)-H1 and mN-r(r)-H2, \(r = [2, \> 16]\). This behavior is most significant in the simulations with larger values of \(\mu \), as shown in Tables 3, 5 and 8. Comparing the number of iterations across the mN-methods gives an indication of whether the heuristics have been active on a specific problem. The results in Tables 3, 4, 5, 6, 7, 8, 9 and 10 show that the performance of the heuristics varies with each problem, and in addition with \(\mu ^0\). In particular, the results show that the heuristics are most effective, and hence more likely to be active, at larger values of \(\mu \). For smaller values of \(\mu \), the mN-methods show similar performance, see Tables 7, 10 and the right-hand side of Table 4, which indicates that the heuristics are less likely to have been active. Consequently, the mN-methods are less likely to produce limiting steps at small values of \(\mu \) on the benchmark problems, an observation in line with the results of the theoretical sections. Overall, mN-r(2) fails to converge within the maximum number of iterations for two problems due to small step sizes; this is overcome with both H1 and H2, as shown by the corresponding results in Table 5.

Numerical experiments have further shown that decreasing the refactorization parameters of Table 2 decreases the number of iterations, at the cost of an increased number of factorizations for the mN-methods. In general, an increased rank of each update matrix reduces the overall number of iterations but, due to the refactorization strategy, requires the methods to refactorize more often.

Tables 7, 10 and the right-hand side of Table 4 show that low-rank updates give convergence for small values of \(\mu \) on many of the benchmark problems, even with rank-two update matrices on large-scale problems.

Finally, we mention simplified Newton, i.e., mN-r(0) without a refactorization strategy. Our simulations showed that this approach is significantly less robust. It is not clear to us how to devise a refactorization strategy for mN-r(0) that would give a fair comparison with the results of Tables 3, 4, 5, 6, 7, 8, 9 and 10; we have therefore omitted specific results.

6 Conclusion

In this work we have proposed and motivated a structured modified Newton approach for solving the systems of nonlinear equations that arise in interior-point methods for quadratic programming. In essence, the Jacobian of each Newton system is modified to be a previous Jacobian plus one low-rank update matrix per succeeding iteration. The modified Jacobian maintains the sparsity pattern of the Jacobian and may thus be viewed as a Jacobian evaluated at a different point. The approach may in consequence be interpreted in the framework of previous work on primal–dual interior-point methods, e.g., effects of finite-precision arithmetic, stability, convergence and solution techniques.

Numerical simulations have shown that small step sizes can have negative effects on convergence with the modified Newton approach, especially at larger values of \(\mu \). To mitigate these effects, we have constructed and motivated two heuristics. Further numerical simulations have shown that the two heuristics often increase the step sizes, but also that this is not always sufficient to improve convergence. We have therefore also suggested a refactorization strategy. The heuristics and the refactorization strategy that we have proposed are merely options; the framework allows for different versions of these as well as for other heuristics and strategies.

In addition, we have performed numerical simulations on a set of convex quadratic benchmark problems. The results indicated that the number of factorizations can be reduced compared to the Newton-based method, often at the expense of performing more iterations with low-rank updates. The total number of iterations and/or factorizations was for many problems, but not for all, further reduced with the two heuristics. Although the theoretical results hold in the asymptotic region as \(\mu \rightarrow 0\), or when the rank of each update matrix is sufficiently large, we still obtain interesting numerical results for larger values of \(\mu \) and update matrices of low rank.

Our work is meant to contribute to the theoretical and numerical understanding of modified Newton approaches for solving the systems of nonlinear equations that arise in interior-point methods. We have laid a foundation that may be adapted and included in more sophisticated interior-point solvers, and that may contribute to the development of preconditioners. We have limited ourselves to a numerical study of the accuracy of the approaches in a high-level language. To get a full understanding of the practical performance, the precise ways of solving the updated modified Newton systems would have to be investigated further.