1 Introduction

Positive Semidefinite Programming (SDP) problems have attracted a lot of attention in the literature for more than two decades, and have been used to model a plethora of different problems arising from control theory [3, Chapter 14], power systems [13], stochastic optimization [6], truss optimization [28], and many other application areas (e.g. see [3, 27]). More recently, SDP has been extensively used for building tight convex relaxations of NP-hard combinatorial optimization problems (see [3, Chapter 12], and the references therein).

As a result of the seemingly unlimited applicability of SDP, numerous contributions have been made to optimization techniques suitable for solving such problems. The most remarkable milestone was achieved by Nesterov and Nemirovskii [17], who designed a polynomially convergent Interior Point Method (IPM) for the class of SDP problems. This led to the development of numerous successful IPM variants for SDP; some of a theoretical (e.g. [15, 29, 31]) and others of a practical nature (e.g. [4, 5, 16]). While IPMs enjoy fast convergence, in theory and in practice, each IPM iteration requires the solution of a very large-scale linear system, even for small-scale SDP problems. What is worse, such linear systems are inherently ill-conditioned. A viable and successful alternative to IPMs for SDP problems (e.g. see [30]), which circumvents the issue of ill-conditioning without significantly compromising convergence speed, is based on the so-called Augmented Lagrangian method (ALM), which can be seen as the dual application of the proximal point method (as shown in [20]). The issue with ALMs is that, unlike IPMs, a consistent strategy for tuning the algorithm parameters is not known. Furthermore, polynomial complexity is lost and is replaced by a mere finite-termination guarantee. An IPM scheme combined with the Proximal Method of Multipliers (PMM) for solving SDP problems was proposed in [9], and was interpreted as a primal-dual regularized IPM. The authors established global convergence, and numerically demonstrated the efficiency of the method. However, the latter is not guaranteed to converge to an \(\epsilon \)-optimal solution in a polynomial number of iterations, or even to find a global optimum in a finite number of steps. Finally, viable alternatives based on proximal splitting algorithms have been studied in [12, 25]. Such schemes are very efficient and require significantly fewer computations and less memory per iteration, as compared to IPMs or ALMs. However, as they are first-order methods, their convergence to high accuracy might be slow. Hence, such methodologies are only suitable for finding approximate solutions of low accuracy.

In this paper, we extend the Interior Point-Proximal Method of Multipliers (IP-PMM) presented in [19]. In particular, the algorithm in [19] was developed for convex quadratic programming problems and assumed that the resulting linear systems are solved exactly. Under this framework, it was proved that IP-PMM converges in a polynomial number of iterations, under mild assumptions, and an infeasibility detection mechanism was established. An important feature of this method is that it provides a reliable tuning for the penalty parameters of the PMM; indeed, the reliability of the algorithm is established numerically on a wide variety of convex problems in [7, 8, 10, 19]. In particular, the IP-PMMs proposed in [7, 8, 10] use preconditioned iterative methods for the solution of the resulting linear systems, and are very robust despite the use of inexact Newton directions. In what follows, we develop and analyze an IP-PMM for linear SDP problems, which furthermore allows for inexactness in the solution of the linear systems that have to be solved at every iteration. We show that the method converges polynomially under standard assumptions. Subsequently, we provide a necessary condition for lack of strong duality, which can serve as a basis for constructing implementable detection mechanisms for pathological cases (following the developments in [19]). As is verified in [19], IP-PMM is competitive with standard non-regularized IPM schemes, and is significantly more robust. This is because the introduction of regularization prevents severe ill-conditioning and rank deficiency of the associated linear systems solved within standard IPMs, which can hinder their convergence and numerical stability. For detailed discussions on the effectiveness of regularization within IPMs, the reader is referred to [1, 2, 18], and the references therein. A particularly important benefit of using regularization is that the resulting Newton systems can be preconditioned effectively (e.g. see the developments in [7, 10]), allowing for more efficient implementations, with significantly lowered memory requirements. We note that the paper is focused on the theoretical aspects of the method, and an efficient, scalable, and reliable implementation would require a separate study. Nevertheless, the practical effectiveness of IP-PMM (in terms of efficiency, scalability, and robustness) has already been demonstrated for linear, convex quadratic [7, 10, 19], and nonlinear convex problems [8].

The rest of the paper is organized as follows. In Sect. 2, we provide some preliminary background and introduce our notation. Then, in Sect. 3, we provide the algorithmic framework of the method. In Sect. 4, we prove polynomial complexity of the algorithm, and establish its global convergence. In Sect. 5, a necessary condition for lack of strong duality is derived, and we discuss how it can be used to construct an implementable detection mechanism for pathological cases. Finally, we derive some conclusions in Sect. 6.

2 Preliminaries and Notation

2.1 Primal-Dual Pair of SDP Problems

Let the vector space \({\mathcal {S}}^n :=\{B \in {\mathbb {R}}^{n\times n} :B = B^\top \}\) be given, endowed with the inner product \(\langle A, B \rangle = {\text {Tr}}(AB)\), where \({\text {Tr}}(\cdot )\) denotes the trace of a matrix. In this paper, we consider the following primal-dual pair of linear positive semi-definite programming problems, in the standard form:

$$\begin{aligned} \underset{X \in {\mathcal {S}}^n}{\text {min}} \ \langle C,X\rangle , \ \ \text {s.t.} \ {\mathcal {A}}X = b, \ X \in {\mathcal {S}}^n_{+}, \end{aligned}$$
(P)
$$\begin{aligned} \underset{y \in {\mathbb {R}}^m,\ Z \in {\mathcal {S}}^n}{\text {max}} \ b^\top y , \ \ \text {s.t.}\ {\mathcal {A}}^*y + Z = C,\ Z \in {\mathcal {S}}^n_{+}, \end{aligned}$$
(D)

where \({\mathcal {S}}^{n}_{+} :=\{B \in {\mathcal {S}}^{n} :B \succeq 0\}\), \(C,X,Z \in {\mathcal {S}}^{n}\), \(b,y \in {\mathbb {R}}^m\), \({\mathcal {A}}\) is a linear operator on \({\mathcal {S}}^n\), \({\mathcal {A}}^*\) is the adjoint of \({\mathcal {A}}\), and \(X\succeq 0\) denotes that X is positive semi-definite. We note that the norm induced by the inner product \(\langle A, B \rangle = {\text {Tr}}(AB)\) is in fact the Frobenius norm, denoted by \(\Vert \cdot \Vert _{F}\). Furthermore, the adjoint \({\mathcal {A}}^* :{\mathbb {R}}^m \mapsto {\mathcal {S}}^n\) is such that \(y^\top {\mathcal {A}}X = \langle {\mathcal {A}}^*y, X\rangle ,\ \ \forall \ y \in {\mathbb {R}}^m, \ \forall \ X \in {\mathcal {S}}^n\).

For the rest of this paper, except for Sect. 5, we will assume that the linear operator \({\mathcal {A}}\) is onto and that problems (P) and (D) are both strictly feasible (that is, Slater’s constraint qualification holds for both problems). It is well-known that under the previous assumptions, the primal-dual pair (P)–(D) is guaranteed to have an optimal solution for which strong duality holds (see [17]). Such a solution can be found by solving the Karush–Kuhn–Tucker (KKT) optimality conditions for (P)–(D), which read as follows:

$$\begin{aligned} \begin{bmatrix} {\mathcal {A}}^* y + Z -C\\ {\mathcal {A}} X - b\\ XZ \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix},\qquad X,\ Z \in {\mathcal {S}}^n_+. \end{aligned}$$
(1)

2.2 A Proximal Method of Multipliers

The Proximal Method of Multipliers (PMM) was first presented in [20], as a method for solving general convex programming problems. Let us derive this method for the pair (P)–(D). Given an arbitrary starting point \((X_0,y_0) \in {\mathcal {S}}^n_{+}\times {\mathbb {R}}^m\), the PMM can be summarized by the following iteration:

$$\begin{aligned} \begin{aligned} X_{k+1} =&\ \underset{X \in {\mathcal {S}}^n_+}{\arg \min }\bigg \{\langle C,X\rangle - y_k^\top ({\mathcal {A}}X - b) + \frac{\mu _k}{2}\Vert X-X_k\Vert _F^2 + \frac{1}{2\mu _k}\Vert {\mathcal {A}}X-b\Vert _2^2 \bigg \},\\ y_{k+1} =&\ y_k - \frac{1}{\mu _k} ({\mathcal {A}}X_{k+1} - b), \end{aligned} \end{aligned}$$
(2)

where \(\{\mu _k\}\) is a sequence of positive penalty parameters. The previous iteration admits a unique solution, for all k.
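For readers who prefer a computational view, the following minimal sketch illustrates a single iteration of (2). It is purely illustrative (it is not the algorithm analyzed in this paper) and assumes that the Python modelling package cvxpy, together with a conic solver, is available; the operator \({\mathcal {A}}\) is represented by a list of symmetric matrices \(A_1,\ldots ,A_m\), as formalized in Sect. 2.4.

```python
import numpy as np
import cvxpy as cp

def pmm_step(C, A_mats, b, X_k, y_k, mu_k):
    """One idealized PMM iteration (2): minimize the proximal augmented
    Lagrangian over the PSD cone, then update the multipliers.
    A_mats is a list of symmetric matrices A_1, ..., A_m representing
    the operator A via (A X)_i = <A_i, X>."""
    n = C.shape[0]
    X = cp.Variable((n, n), PSD=True)                      # X in S^n_+
    AX = cp.hstack([cp.sum(cp.multiply(Ai, X)) for Ai in A_mats])
    obj = (cp.sum(cp.multiply(C, X))                       # <C, X>
           - cp.sum(cp.multiply(y_k, AX - b))              # -y_k^T (A X - b)
           + (mu_k / 2) * cp.sum_squares(X - X_k)          # primal proximal term
           + (1.0 / (2 * mu_k)) * cp.sum_squares(AX - b))  # augmented Lagrangian term
    cp.Problem(cp.Minimize(obj)).solve()
    X_next = X.value
    AX_next = np.array([np.trace(Ai @ X_next) for Ai in A_mats])
    y_next = y_k - (AX_next - b) / mu_k                    # dual update in (2)
    return X_next, y_next
```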

We can write (2) equivalently by making use of the maximal monotone operator \(T_{{\mathcal {L}}} :{\mathbb {R}}^m\times {\mathcal {S}}^n \rightrightarrows {\mathbb {R}}^m\times {\mathcal {S}}^n\) (see [20, 21]), whose graph is defined as:

$$\begin{aligned} T_{{\mathcal {L}}}(X,y) :=\{(V,u): V \in C - {\mathcal {A}}^*y + \partial \delta _{S^n_{+}}(X),\ u = {\mathcal {A}}X-b \}, \end{aligned}$$
(3)

where \(\delta _{S^n_{+}}(\cdot )\) is an indicator function defined as:

$$\begin{aligned} \delta _{S^n_{+}}(X) :={\left\{ \begin{array}{ll} 0, &{}\quad \text {if } X \in {\mathcal {S}}^n_+, \\ \infty , &{}\quad \text {otherwise,} \\ \end{array}\right. } \end{aligned}$$
(4)

and \(\partial (\cdot )\) denotes the sub-differential of a function, hence (from [22, Corollary 23.5.4]):

$$\begin{aligned} Z \in \partial \delta _{S^n_{+}}(X) \Leftrightarrow -Z \in {\mathcal {S}}^n_+,\ \langle X,Z\rangle = 0. \end{aligned}$$

By convention, we have that \(\partial \delta _{{\mathcal {S}}_+^n}(X^*) = \emptyset \) if \(X^* \notin {\mathcal {S}}_+^n\). Given a bounded pair \((X^*,y^*)\) such that \((0,0) \in T_{{\mathcal {L}}}(X^*,y^*)\), we can retrieve a matrix \(Z^* \in \partial \delta _{S^n_{+}}(X^*)\), using which \((X^*,y^*,-Z^*)\) is an optimal solution for (P)–(D). By defining the proximal operator:

$$\begin{aligned} {\mathcal {P}}_k :=\bigg (I_{n+m} + \frac{1}{\mu _k}T_{{\mathcal {L}}}\bigg )^{-1}, \end{aligned}$$
(5)

where \(I_{n+m}\) is the identity operator of size \(n+m\), which is the direct sum of the identity operators of \({\mathcal {S}}^n\) and \({\mathbb {R}}^m\), we can express (2) as:

$$\begin{aligned} (X_{k+1},y_{k+1}) = {\mathcal {P}}_k(X_k,y_k), \end{aligned}$$
(6)

and it can be shown that \({\mathcal {P}}_k\) is single valued and firmly non-expansive (see [21]).

2.3 An Infeasible Interior Point Method

In what follows we present a basic infeasible IPM suitable for solving the primal-dual pair (P)–(D). Such methods handle the conic constraints by introducing a suitable logarithmic barrier in the objective (for an extensive study of logarithmic barriers, the reader is referred to [17]). At each iteration, we choose a barrier parameter \(\mu > 0\) and form the logarithmic barrier primal-dual pair:

$$\begin{aligned}&\underset{X \in {\mathcal {S}}^n}{\text {min}} \ \langle C,X\rangle - \mu \ln (\det (X)), \ \ \text {s.t.} \ {\mathcal {A}}X = b, \end{aligned}$$
(7)
$$\begin{aligned}&\underset{y \in {\mathbb {R}}^m,\ Z \in {\mathcal {S}}^n}{\text {max}} \ b^\top y + \mu \ln (\det (Z)) , \ \ \text {s.t.}\ {\mathcal {A}}^*y + Z = C. \end{aligned}$$
(8)

The first-order (barrier) optimality conditions of (7)–(8) read as follows:

$$\begin{aligned} \begin{bmatrix} {\mathcal {A}}^* y + Z -C\\ {\mathcal {A}} X - b\\ XZ - \mu I_n \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix},\qquad X,\ Z \in {\mathcal {S}}^n_{++}, \end{aligned}$$
(9)

where \({\mathcal {S}}^n_{++} :=\{B \in {\mathcal {S}}^n: B\succ 0\}\). For every chosen value of \(\mu \), we want to approximately solve the following nonlinear system of equations:

$$\begin{aligned} F_{\sigma ,\mu }^{IPM}(w) :=\begin{bmatrix} {\mathcal {A}}^* y + Z -C\\ {\mathcal {A}} X - b\\ XZ - \sigma \mu I_n \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}, \end{aligned}$$

where, with a slight abuse of notation, we set \(w = (X,y,Z)\). Notice that \(F_{\sigma ,\mu }^{IPM}(w) = 0\) is a perturbed form of the barrier optimality conditions. In particular, \(\sigma \in (0,1)\) is a centering parameter which determines how fast \(\mu \) will be forced to decrease at the next IPM iteration. For \(\sigma = 1\) we recover the barrier optimality conditions in (9), while for \(\sigma = 0\) we recover the optimality conditions in (1).
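As a simple illustration (not part of the analysis), the residual \(F_{\sigma ,\mu }^{IPM}(w)\) can be evaluated as follows, with the operator \({\mathcal {A}}\) again represented by a list of symmetric matrices \(A_1,\ldots ,A_m\):

```python
import numpy as np

def ipm_residual(A_mats, b, C, X, y, Z, sigma, mu):
    """Evaluate F^{IPM}_{sigma,mu}(w) for w = (X, y, Z)."""
    dual = sum(yi * Ai for yi, Ai in zip(y, A_mats)) + Z - C     # A^* y + Z - C
    primal = np.array([np.trace(Ai @ X) for Ai in A_mats]) - b   # A X - b
    compl = X @ Z - sigma * mu * np.eye(X.shape[0])              # XZ - sigma*mu*I
    return dual, primal, compl
```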

In the IPM literature it is common to apply Newton's method to approximately solve the system of nonlinear equations \(F_{\sigma ,\mu }^{IPM}(w) = 0\). Newton's method is favored for systems of this form due to the self-concordance of the logarithmic barrier (see [17]). However, a well-known issue in the literature is that the matrix XZ is not necessarily symmetric. A common approach to tackle this issue is to employ a symmetrization operator \(H_P : {\mathbb {R}}^{n\times n} \mapsto {\mathcal {S}}^n\), such that \(H_P(XZ) = \mu I\) if and only if \(XZ = \mu I\), given that \(X,\ Z \in {\mathcal {S}}_+^n\). Following Zhang [29], we employ the following operator:

$$\begin{aligned} H_P(B) :=\frac{1}{2}(PBP^{-1} + (PBP^{-1})^\top ), \end{aligned}$$
(10)

where P is a non-singular matrix. It can be shown that the central path (a key notion used in IPMs; see [17]) can be equivalently defined as \(H_P(XZ) = \mu I\), for any non-singular matrix P. In this paper, we will make use of the choice \(P_k = Z_k^{\frac{1}{2}}\), which is consistent with the vectorized Newton system given in (21). For a plethora of alternative choices, the reader is referred to [26]. We should note that the analysis in this paper can be tailored to different symmetrization strategies, and this choice is made for simplicity of exposition.
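A small numerical sketch of the operator (10), included only for illustration:

```python
import numpy as np

def H(B, P):
    """Zhang's symmetrization (10): H_P(B) = (P B P^{-1} + (P B P^{-1})^T) / 2."""
    PBPinv = P @ B @ np.linalg.inv(P)
    return 0.5 * (PBPinv + PBPinv.T)

# With the choice P_k = Z_k^{1/2}, one gets H_P(XZ) = Z^{1/2} X Z^{1/2}, which
# equals mu*I exactly when XZ = mu*I (for symmetric positive definite X and Z).
```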

At the beginning of the k-th iteration, we have \(w_k = (X_k, y_k, Z_k)\) and \(\mu _k\) available. The latter is defined as \(\mu _k = \frac{\langle X_k, Z_k \rangle }{n}\). By substituting the symmetrized complementarity in the last block equation and applying Newton's method, we obtain the following system of equations:

$$\begin{aligned} \begin{bmatrix} 0 &{} {\mathcal {A}}^* &{} I_n\\ {\mathcal {A}} &{} 0 &{} 0 \\ {\mathcal {E}}_k &{} 0 &{} {\mathcal {F}}_k \end{bmatrix} \begin{bmatrix} \varDelta X\\ \varDelta y\\ \varDelta Z \end{bmatrix} = \begin{bmatrix} C - {\mathcal {A}}^*y_k - Z_k\\ b - {\mathcal {A}}X_k\\ \mu I_n - H_{P_k}(X_kZ_k) \end{bmatrix}, \end{aligned}$$
(11)

where \({\mathcal {E}}_k :=\nabla _X H_{P_k}(X_kZ_k)\), and \({\mathcal {F}}_k :=\nabla _Z H_{P_k}(X_kZ_k)\).

2.4 Vectorized Format

In what follows we vectorize the associated operators, in order to work with matrices. In particular, given any matrix \(B \in {\mathbb {R}}^{m \times n}\), we denote its vectorized form as \(\varvec{B}\), which is a vector of size mn, obtained by stacking the columns of B, from the first to the last. For the rest of this manuscript, any boldface letter denotes a vectorized matrix. Furthermore, if \({\mathcal {A}}: {\mathcal {S}}^n \mapsto {\mathbb {R}}^m\) is a linear operator, we can define it component-wise as \(({\mathcal {A}}X)_i :=\langle A_i , X \rangle \), for \(i = 1,\ldots ,m\), and any \(X \in {\mathcal {S}}^n\), where \(A_i \in {\mathcal {S}}^n\). The adjoint of this operator, \({\mathcal {A}}^*: {\mathbb {R}}^m \mapsto {\mathcal {S}}^n\), is defined as \({\mathcal {A}}^*y :=\sum _{i = 1}^{m} y_i A_i\), for all \(y \in {\mathbb {R}}^m\). Using this notation, we can equivalently write (P)–(D) in the following form:

$$\begin{aligned}&\underset{X \in {\mathcal {S}}^n}{\text {min}} \ \langle C,X\rangle , \qquad \text {s.t.} \ \langle A_i, X \rangle = b_i, \quad i = 1,\ldots ,m, \qquad \ X \in {\mathcal {S}}^n_{+}, \end{aligned}$$
(12)
$$\begin{aligned}&\underset{y \in {\mathbb {R}}^m,\ Z \in {\mathcal {S}}^n}{\text {max}} \ \ b^\top y , \qquad \text {s.t.} \ \sum _{i=1}^m y_i A_i + Z = C, \qquad \ Z \in {\mathcal {S}}^n_{+}. \end{aligned}$$
(13)

The first-order optimality conditions can be re-written as:

$$\begin{aligned} \begin{bmatrix} A^\top y + \varvec{Z} - \varvec{C} \\ A\varvec{X} - b\\ XZ \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}, \qquad X,\ Z \in {\mathcal {S}}^n_+, \end{aligned}$$

where \(A^\top = [\varvec{A}_1\ \varvec{A}_2\ \cdots \ \varvec{A}_m]\).
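The vectorization conventions above can be made concrete with the following illustrative helpers (the names vec, build_A, and A_adjoint are ours, not part of the paper's notation):

```python
import numpy as np

def vec(B):
    """Column-stacking vectorization (the boldface B of the text)."""
    return B.flatten(order='F')

def build_A(A_mats):
    """Stack the rows vec(A_i)^T, so that build_A(A_mats) @ vec(X) equals
    (<A_1, X>, ..., <A_m, X>)."""
    return np.vstack([vec(Ai) for Ai in A_mats])

def A_adjoint(A_mats, y):
    """Adjoint operator: A^* y = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A_mats))

# For symmetric X: y @ (build_A(A_mats) @ vec(X)) == <A_adjoint(A_mats, y), X>.
```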

Below we summarize any additional notation that is used later in the paper. An iteration of the algorithm is denoted by \(k \in {\mathbb {N}}\). Given an arbitrary matrix A (resp., vector x), \(A_k\) (resp., \(x_k\)) denotes that the matrix (resp., vector) depends on the iteration k. An optimal solution to the pair (P)–(D) will be denoted as \((X^*,y^*,Z^*)\). Optimal solutions of different primal-dual pairs will be denoted using an appropriate subscript, in order to distinguish them (e.g. \((X_r^*,y_r^*,Z_r^*)\) will denote an optimal solution for a PMM sub-problem). Any norm (resp., semi-norm) is denoted by \(\Vert \cdot \Vert _{\chi }\), where \(\chi \) is used to distinguish between different norms (e.g. \(\Vert \cdot \Vert _2\) denotes the Euclidean norm). Given two matrices \(X,\ Y \in {\mathcal {S}}^n_+\), we write \(X \succeq Y\) when X is larger than Y with respect to the Loewner ordering. Given two logical statements \(T_1,\ T_2\), the condition \(T_1 \wedge T_2\) is true only when both \(T_1\) and \(T_2\) are true. Given two real-valued positive increasing functions \(T(\cdot )\) and \(f(\cdot )\), we say that \(T(x) = O(f(x))\) (resp., \(T(x) = \varOmega (f(x))\)) if there exist \(x_0\ge 0,\ c_1 > 0\), such that \(T(x) \le c_1 f(x)\) (resp., \(c_2 > 0\) such that \(T(x) \ge c_2 f(x)\)), for all \(x \ge x_0\). We write \(T(x) = \varTheta (f(x))\) if and only if \(T(x) = O(f(x))\) and \(T(x) = \varOmega (f(x))\). Finally, let an arbitrary matrix A be given. The maximum (resp., minimum) singular value of A is denoted by \(\eta _{\max }(A)\) (resp., \(\eta _{\min }(A)\)). Similarly, the maximum (resp., minimum) eigenvalue of a square matrix A is denoted by \(\nu _{\max }(A)\) (resp., \(\nu _{\min }(A)\)).

3 An Interior Point-Proximal Method of Multipliers for SDP

In this section we present an inexact extension of the IP-PMM presented in [19], suitable for solving problems of the form of (P)–(D). Assume that we have available an estimate \(\lambda _k\) for a Lagrange multiplier vector at iteration k. Similarly, denote by \(\varXi _k \in {\mathcal {S}}_+^n\) an estimate of a primal solution. As we discuss later, these estimate sequences (i.e. \(\{\lambda _k\}, \{\varXi _k\}\)) are produced by the algorithm, and represent the dual and primal proximal estimates, respectively. During the k-th iteration of the PMM, applied to (P), the following proximal penalty function has to be minimized:

$$\begin{aligned} \begin{aligned} {\mathcal {L}}^{PMM}_{\mu _k} (X;\varXi _k, \lambda _k) :=\langle C, X \rangle -\lambda _k^\top ({\mathcal {A}}X - b) + \frac{1}{2\mu _k}\Vert {\mathcal {A}}X-b\Vert _{2}^2 + \frac{\mu _k}{2}\Vert X-\varXi _k\Vert _{F}^2, \end{aligned} \end{aligned}$$
(14)

with \(\{\mu _k\}\) being some non-increasing sequence of positive penalty parameters. Notice that this is equivalent to the iteration (2). We approximately minimize (14) by applying one (or a few) iterations of the previously presented infeasible IPM. We alter (14) by adding a logarithmic barrier:

$$\begin{aligned} {\mathcal {L}}^{IP-PMM}_{\mu _k} (X;\varXi _k, \lambda _k) :={\mathcal {L}}^{PMM}_{\mu _k} (X;\varXi _k, \lambda _k) - \mu _k \ln (\det (X)), \end{aligned}$$
(15)

and we treat \(\mu _k\) as the barrier parameter. In order to form the optimality conditions of this sub-problem, we equate the gradient of \({\mathcal {L}}^{IP-PMM}_{\mu _k}(\cdot ;\varXi _k,\lambda _k)\) to the zero vector, i.e.:

$$\begin{aligned} C - {\mathcal {A}}^* \lambda _k + \frac{1}{\mu _k}{\mathcal {A}}^*({\mathcal {A}}X - b) + \mu _k (X - \varXi _k) - \mu _k X^{-1} = 0. \end{aligned}$$

Introducing the variables \(y = \lambda _k - \frac{1}{\mu _k}({\mathcal {A}}X - b)\) and \(Z = \mu _k X^{-1}\), yields:

$$\begin{aligned} \begin{aligned} \begin{bmatrix} C - {\mathcal {A}}^*y - Z + \mu _k(X-\varXi _k)\\ {\mathcal {A}}X + \mu _k (y - \lambda _k) - b\\ XZ - \mu _k I_n \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \Leftrightarrow \begin{bmatrix} \varvec{C}- A^\top y - \varvec{Z} + \mu _k(\varvec{X}-\varvec{\varXi }_k)\\ A\varvec{X}+ \mu _k (y - \lambda _k) - b\\ H_{P_k}(XZ) - \mu _k I_n \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}, \end{aligned} \end{aligned}$$
(16)

where the second system is obtained by introducing the symmetrization in (10), and by vectorizing the associated matrices and operators.
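As a sanity check of the derivation above, a short sketch evaluating the gradient of (15) at a positive definite X (illustrative only; it reuses the list-of-matrices representation of \({\mathcal {A}}\) from Sect. 2.4):

```python
import numpy as np

def ip_pmm_gradient(A_mats, b, C, X, Xi, lam, mu):
    """Gradient of (15) at X > 0, matching the stationarity condition
    displayed before (16)."""
    AX = np.array([np.trace(Ai @ X) for Ai in A_mats])
    Astar = lambda v: sum(vi * Ai for vi, Ai in zip(v, A_mats))
    return (C - Astar(lam) + Astar(AX - b) / mu
            + mu * (X - Xi) - mu * np.linalg.inv(X))
```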

Given an arbitrary vector \(b \in {\mathbb {R}}^m\), and matrix \(C \in {\mathbb {R}}^{n \times n}\), we define the semi-norm:

$$\begin{aligned} \Vert (b,\varvec{C})\Vert _{{\mathcal {S}}} :=\min _{X,y,Z}\bigg \{\Vert (\varvec{X},\varvec{Z})\Vert _2\ :\begin{matrix} A\varvec{X} = b, \\ A^\top y + \varvec{Z} = \varvec{C}\end{matrix}\bigg \}. \end{aligned}$$
(17)

A similar semi-norm was used in [15] as a way to measure infeasibility in the case of linear programming problems. For a discussion of the properties of the aforementioned semi-norm, as well as how to evaluate it (using an appropriate QR factorization, which can be computed in polynomial time), the reader is referred to [15, Section 4].
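Since the two constraints in (17) decouple, the semi-norm can be obtained from a minimum-norm solution of \(A\varvec{X} = b\) and a least-squares fit of \(\varvec{C}\) by the columns of \(A^\top \). The sketch below uses dense least squares purely for illustration; the QR-based procedure of [15, Section 4] is the reference approach.

```python
import numpy as np

def semi_norm(A, b, c):
    """||(b, C)||_S from (17), with C passed in vectorized form c.
    Assumes A has full row rank, so A x = b is consistent."""
    x_min, *_ = np.linalg.lstsq(A, b, rcond=None)    # minimum-norm x with A x = b
    y_ls, *_ = np.linalg.lstsq(A.T, c, rcond=None)   # y minimizing ||A^T y - c||_2
    z_min = c - A.T @ y_ls                           # smallest feasible z
    return np.sqrt(np.linalg.norm(x_min) ** 2 + np.linalg.norm(z_min) ** 2)
```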

Starting Point Let us define the starting point for IP-PMM. For that, we set \((X_0,Z_0) = \rho (I_n,I_n)\), for some \(\rho > 0\). We also set \(y_0\) to some arbitrary value (e.g. \(y_0 = 0\)), and \(\mu _0 = \frac{\langle X_0, Z_0 \rangle }{n}\). Using the aforementioned triple, we have:

$$\begin{aligned} A\varvec{X}_0 = b + \bar{b},\ A^\top y_0 + \varvec{Z}_0 = \varvec{C} + \varvec{\bar{C}},\ \varXi _0 = X_0,\ \lambda _0 = y_0. \end{aligned}$$
(18)

for some \(\bar{b} \in {\mathbb {R}}^m\), and \(\bar{C} \in {\mathcal {S}}^{n}\).
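For illustration, the starting point (18) and the induced residuals \(\bar{b},\ \bar{C}\) can be formed as follows (reusing the helper vec from Sect. 2.4; the function name is ours):

```python
import numpy as np

def starting_point(A, b, c, n, rho):
    """Starting point (18): X_0 = Z_0 = rho*I_n, y_0 = 0, and the induced
    residuals bar_b, bar_C (the latter returned in vectorized form)."""
    m = A.shape[0]
    X0, Z0, y0 = rho * np.eye(n), rho * np.eye(n), np.zeros(m)
    mu0 = np.trace(X0 @ Z0) / n                      # equals rho**2
    bar_b = A @ vec(X0) - b
    bar_c = A.T @ y0 + vec(Z0) - c
    return X0, y0, Z0, mu0, bar_b, bar_c
```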

Neighbourhood Below, we describe a neighbourhood in which the iterates of the method should lie. Unlike most path-following methods, we have to define a family of neighbourhoods that depend on the PMM sub-problem parameters.

Given (18), some \(\mu _k\), \(\lambda _k\), and \(\varXi _k\), we define the regularized set of centers:

$$\begin{aligned}&\mathscr {P}_{\mu _k}(\varXi _k,\lambda _k) :=\{(X,y,Z)\in {\mathscr {C}}_{\mu _k}(\varXi _k,\lambda _k)\ :\ X \in {\mathcal {S}}^n_{++},\ Z \in {\mathcal {S}}^n_{++},\ XZ = \mu _k I_n \}, \\&\quad \mathscr {C}_{\mu _k}(\varXi _k,\lambda _k) :=\left\{ (X,y,Z)\ :\quad \begin{matrix} A\varvec{X} + \mu _k (y-\lambda _k) = b + \frac{\mu _k}{\mu _0} \bar{b},\\ A^\top y + \varvec{Z} - \mu _k(\varvec{X}- \varvec{\varXi }_k) = \varvec{C} + \frac{\mu _k}{\mu _0}\varvec{\bar{C}} \end{matrix} \right\} , \end{aligned}$$

where \(\bar{b},\ \bar{C}\) are as in (18). The term set of centers originates from [15].

We enlarge the previous set, by defining the following set:

$$\begin{aligned} \begin{aligned} \tilde{\mathscr {C}}_{\mu _k}(\varXi _k,\lambda _k) :=\left\{ (X,y,Z)\ :\quad \begin{matrix} A\varvec{X} + \mu _k(y-\lambda _k) = b + \frac{\mu _k}{\mu _0} (\bar{b}+\tilde{b}_{k}),\\ A^\top y + \varvec{Z} - \mu _k (\varvec{X}- \varvec{\varXi }_k) =\varvec{C} + \frac{\mu _k}{\mu _0}(\varvec{\bar{C}}+\varvec{\tilde{C}}_{k})\\ \Vert (\tilde{b}_{k},\varvec{\tilde{C}}_{k})\Vert _2 \le K_N,\ \Vert (\tilde{b}_{k},\varvec{\tilde{C}}_{k})\Vert _{{\mathcal {S}}} \le \gamma _{{\mathcal {S}}} \rho \end{matrix} \right\} , \end{aligned} \end{aligned}$$

where \(K_N > 0\) is a constant, \(\gamma _{{\mathcal {S}}} \in (0,1)\) and \(\rho >0\) is as defined in the starting point. The vector \(\tilde{b}_{k}\) and the matrix \(\tilde{C}_{k}\) represent the current scaled (by \(\frac{\mu _0}{\mu _k}\)) infeasibilities, and will vary depending on the iteration k. While these can be defined recursively, it is not necessary. Instead it suffices to know that they satisfy the bounds given in the definition of the previous set. In essence, the previous set requires these scaled infeasibilities to be bounded above by some constants, with respect to the 2-norm as well as the semi-norm defined in (17). We can now define a family of neighbourhoods:

$$\begin{aligned} \mathscr {N}_{\mu _k}(\varXi _k,\lambda _k) :=\bigg \{(X,y,Z) \in \tilde{\mathscr {C}}_{\mu _k}(\varXi _k,\lambda _k)\ :\quad \begin{matrix}\ X \in {\mathcal {S}}^n_{++},\ Z \in {\mathcal {S}}^n_{++},\\ \Vert H_P(XZ) - \mu _k I_n\Vert _F \le \gamma _{\mu } \mu _k \end{matrix}\bigg \}, \end{aligned}$$
(19)

where \(\gamma _{\mu } \in (0,1)\) is a constant restricting the symmetrized complementarity products. Obviously, the starting point defined in (18) belongs to the neighbourhood \(\mathscr {N}_{\mu _0}(\varXi _0,\lambda _0)\), with \((\tilde{b}_{0},\varvec{\tilde{C}}_{0}) = (0,0)\). Notice that the neighbourhood depends on the choice of the constants \(K_N\), \(\gamma _{{\mathcal {S}}}\), \(\gamma _{\mu }\). However, as it also depends on the parameters \(\mu _k,\ \lambda _k,\ \varXi _k\), the dependence on the constants is omitted for simplicity of notation.
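The following sketch checks membership in the neighbourhood (19) for a given triple. It is illustrative only, reuses the helpers vec and semi_norm introduced earlier, and takes the constants \(K_N,\ \gamma _{{\mathcal {S}}},\ \gamma _{\mu },\ \rho \) as inputs:

```python
import numpy as np

def in_neighbourhood(X, y, Z, A, b, c, Xi_vec, lam, mu, mu0,
                     bar_b, bar_c, K_N, gamma_S, gamma_mu, rho):
    """Illustrative membership test for N_{mu_k}(Xi_k, lambda_k) in (19);
    c, bar_c and the proximal estimate Xi_vec are given in vectorized form."""
    n = X.shape[0]
    # Recover the scaled infeasibilities (tilde_b_k, tilde_C_k) from the residuals.
    tilde_b = (mu0 / mu) * (A @ vec(X) + mu * (y - lam) - b) - bar_b
    tilde_c = (mu0 / mu) * (A.T @ y + vec(Z) - mu * (vec(X) - Xi_vec) - c) - bar_c
    # Symmetrized complementarity with P = Z^{1/2}.
    w, V = np.linalg.eigh(Z)
    Zh = V @ np.diag(np.sqrt(w)) @ V.T
    pd_ok = np.linalg.eigvalsh(X).min() > 0 and w.min() > 0
    compl_ok = np.linalg.norm(Zh @ X @ Zh - mu * np.eye(n), 'fro') <= gamma_mu * mu
    infeas_ok = (np.linalg.norm(np.concatenate([tilde_b, tilde_c])) <= K_N
                 and semi_norm(A, tilde_b, tilde_c) <= gamma_S * rho)
    return pd_ok and compl_ok and infeas_ok
```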

Newton System As discussed earlier, we employ the Newton method for approximately solving a perturbed form of system (16), for all k. We perturb (16) in order to take into consideration the target reduction of the barrier parameter \(\mu _k\) (by introducing the centering parameter \(\sigma _k\)), as well as to incorporate the initial infeasibility, given our starting point in (18). In particular, we would like to approximately solve the following system:

$$\begin{aligned} \begin{aligned} \begin{bmatrix} -(\mu _k I_n) &{} \mathcal {A^*} &{} I_n\\ {\mathcal {A}} &{} \mu _k I_m &{} 0\\ Z_k &{} 0 &{}X_k \end{bmatrix} \begin{bmatrix} \varDelta X_k\\ \varDelta y_k\\ \varDelta Z_k \end{bmatrix} = \begin{bmatrix} (C + \frac{\sigma _k \mu _k}{\mu _0}\bar{C}) - {\mathcal {A}}^* y_k - Z_k +\sigma _k\mu _k (X_k - \varXi _k)\\ -{\mathcal {A}}X_k -\sigma _k\mu _k (y_k - \lambda _k)+ (b +\frac{\sigma _k \mu _k}{\mu _0}\bar{b}) \\ -X_kZ_k + \sigma _{k} \mu _k I_n \end{bmatrix}, \end{aligned} \end{aligned}$$
(20)

where \(\bar{b},\ \bar{C}\) are as in (18). We note that we could either first linearize the last block equation of (16) and then apply the symmetrization, defined in (10), or first apply the symmetrization directly to the last block equation of (16) and then linearize it. Both approaches are equivalent. Hence, following the former approach, we obtain the vectorized Newton system, that has to be solved at every iteration of IP-PMM:

$$\begin{aligned} \begin{aligned}&\begin{bmatrix} -(\mu _k I_{n^2}) &{} A^\top &{} I_{n^2}\\ A &{} \mu _k I_m &{} 0\\ E_k &{} 0 &{}F_k \end{bmatrix} \begin{bmatrix} \varvec{\varDelta X}_k\\ \varDelta y_k\\ \varvec{\varDelta Z}_k \end{bmatrix} \\&\quad = \begin{bmatrix} (\varvec{C} + \frac{\sigma _k \mu _k}{\mu _0}\varvec{\bar{C}}) -A^\top y_k - \varvec{Z}_k +\sigma _k\mu _k (\varvec{X}_k - \varvec{\varXi }_k)\\ -A \varvec{X}_k -\sigma _k\mu _k (y_k - \lambda _k)+ (b +\frac{\sigma _k \mu _k}{\mu _0}\bar{b}) \\ -(Z_k^{\frac{1}{2}}\otimes Z_k^{\frac{1}{2}})\varvec{X}_k + \sigma _{k} \mu _k \varvec{I}_{n} \end{bmatrix} + \begin{bmatrix} \varvec{\mathsf {E}}_{d,k}\\ \epsilon _{p,k}\\ \varvec{\mathsf {E}}_{\mu ,k} \end{bmatrix}, \end{aligned} \end{aligned}$$
(21)

where \(E_k = (Z_k^{\frac{1}{2}} \otimes Z_k^{\frac{1}{2}})\), \(F_k = \frac{1}{2}\big (Z_k^{\frac{1}{2}}X_k \otimes Z_k^{-\frac{1}{2}} + Z_k^{-\frac{1}{2}} \otimes Z_k^{\frac{1}{2}}X_k \big )\), and \((\mathsf {E}_{d,k},\epsilon _{p,k},\mathsf {E}_{\mu ,k})\) models the potential errors which occur when the symmetrized version of system (20) is solved inexactly (e.g. by using a Krylov subspace method). In order to make sure that the computed direction is accurate enough, we impose the following accuracy conditions:

$$\begin{aligned} \Vert \varvec{\mathsf {E}}_{\mu ,k}\Vert _2 = 0,\qquad \Vert (\epsilon _{p,k},\varvec{\mathsf {E}}_{d,k})\Vert _2 \le \frac{\sigma _{\min }}{4\mu _0} K_N \mu _k,\qquad \Vert (\epsilon _{p,k},\varvec{\mathsf {E}_{d,k}})\Vert _{{\mathcal {S}}} \le \frac{\sigma _{\min }}{4\mu _0} \gamma _{{\mathcal {S}}}\rho \mu _k, \end{aligned}$$
(22)

where \(\sigma _{\min }\) is the minimum allowed value for \(\sigma _k\), \(K_N,\ \gamma _{{\mathcal {S}}}\) are constants defined by the neighbourhood in (19), and \(\rho \) is defined in the starting point in (18). Notice that the condition \(\Vert \varvec{\mathsf {E}}_{\mu ,k}\Vert _2 = 0\) is imposed without loss of generality, since it can be easily satisfied in practice. For more on this, see the discussions in [31, Section 3] and [11, Lemma 4.1]. Furthermore, as we will observe in Sect. 4, the bound on the error with respect to the semi-norm defined in (17) is required to ensure polynomial complexity of the method. While evaluating this semi-norm is not particularly practical (and is never performed in practice, e.g. see [7, 8, 19]), it can be done in polynomial time (see [15, Section 4]), and hence does not affect the polynomial nature of the algorithm. The algorithmic scheme of the method is summarized in Algorithm IP–PMM.

[Algorithm IP–PMM: pseudocode figure]
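To make the vectorized system (21) concrete, the sketch below assembles the exact Newton matrix and right-hand side with dense Kronecker products (the error terms \(\mathsf {E}_{d,k},\ \epsilon _{p,k},\ \mathsf {E}_{\mu ,k}\) are simply absent). This is for illustration only; a practical implementation would never form these matrices explicitly. The helper vec from Sect. 2.4 is reused.

```python
import numpy as np

def newton_system(A, b, c, X, y, Z, Xi, lam, bar_b, bar_c, mu, mu0, sigma):
    """Assemble the exact version of the vectorized Newton system (21)."""
    n = X.shape[0]; m = A.shape[0]
    w, V = np.linalg.eigh(Z)
    Zh = V @ np.diag(np.sqrt(w)) @ V.T               # Z_k^{1/2}
    Zhinv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T      # Z_k^{-1/2}
    E = np.kron(Zh, Zh)
    F = 0.5 * (np.kron(Zh @ X, Zhinv) + np.kron(Zhinv, Zh @ X))
    K = np.block([
        [-mu * np.eye(n * n),   A.T,                  np.eye(n * n)],
        [A,                     mu * np.eye(m),       np.zeros((m, n * n))],
        [E,                     np.zeros((n * n, m)), F],
    ])
    rd = (c + (sigma * mu / mu0) * bar_c) - A.T @ y - vec(Z) \
         + sigma * mu * (vec(X) - vec(Xi))
    rp = -(A @ vec(X)) - sigma * mu * (y - lam) + (b + (sigma * mu / mu0) * bar_b)
    rc = -(E @ vec(X)) + sigma * mu * vec(np.eye(n))
    return K, np.concatenate([rd, rp, rc])
```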

Algorithm IP–PMM deviates from standard IPM schemes due to the solution of a different Newton system, as well as due to the possible updates of the proximal estimates, i.e. \(\varXi _k\) and \(\lambda _k\). Notice that when these estimates are updated, the neighbourhood in (19) changes as well, since it is parametrized by them. Intuitively, when this happens, the algorithm accepts the current iterate as a sufficiently accurate solution to the associated PMM sub-problem. However, as we will see in Sect. 4, it is not necessary for these estimates to converge to a primal-dual solution, for Algorithm IP–PMM to converge. Instead, it suffices to ensure that these estimates will remain bounded. In light of this, Algorithm IP–PMM is not studied as an inner-outer scheme, but rather as a standard IPM scheme. We will return to this point at the end of Sect. 4.

4 Convergence Analysis

In this section we prove polynomial complexity of Algorithm IP–PMM, and establish its global convergence. The analysis is modeled after that in [19]. We make use of the following two standard assumptions, generalizing those employed in [19] to the SDP case.

Assumption 1

The problems (P) and (D) are strictly feasible, that is, Slater’s constraint qualification holds for both problems. Furthermore, there exists an optimal solution \((X^*, y^*, Z^*)\) and a constant \(K_* > 0\) independent of n and m such that \(\Vert (\varvec{X}^*,y^*,\varvec{Z}^*)\Vert _F \le K_* \sqrt{n}\).

Assumption 2

The vectorized constraint matrix A of (P) has full row rank, that is \({\text {rank}}(A) = m\). Moreover, there exist constants \(K_{A,1} > 0\), \(K_{A,2} > 0\), \(K_{r,1} >0 \), and \(K_{r,2} > 0\), independent of n and m, such that:

$$\begin{aligned} \eta _{\min }(A) \ge K_{A,1},\quad \eta _{\max }(A) \le K_{A,2},\quad \Vert b\Vert _{\infty }\le K_{r,1},\quad \Vert C\Vert _2 \le K_{r,2} \sqrt{n}. \end{aligned}$$

Remark 1

Note that the independence of the previous constants from n and m is assumed for simplicity of exposition. In particular, as long as these constants depend polynomially on n (or m), the analysis still holds, simply by altering the worst-case polynomial bound for the number of iterations of the algorithm (given later in Theorem 4.2).

Remark 2

Assumption 1 is a direct extension of that in [19, Assumption 1]. The positive semi-definiteness of \(X^*\) and \(Z^*\) implies that \({\text {Tr}}(X^*) + {\text {Tr}}(Z^*) \le 2 K_* n\) (from equivalence of the norms \(\Vert \cdot \Vert _1\) and \(\Vert \cdot \Vert _2\)), which is one of the assumptions employed in [29, 31]. Notice that we assume \(n > m\), without loss of generality. The theory in this section would hold if \(m > n\), simply by replacing n by m in the upper bound of the norm of the optimal solution as well as of the problem data.

Before proceeding with the convergence analysis, we briefly provide an outline of it, for the convenience of the reader. Firstly, it should be noted that polynomial complexity as well as global convergence of Algorithm IP–PMM is proven by induction on the iterations k of the method. To that end, we provide some necessary technical results in Lemmas 4.1–4.3. Then, in Lemma 4.4 we are able to show that the iterates \((X_k,y_k,Z_k)\) of Algorithm IP–PMM will remain bounded for all k. Subsequently, we provide some additional technical results in Lemmas 4.5–4.7, which are then used in Lemma 4.8, where we show that the Newton direction computed at every iteration k is also bounded. All the previous are utilized in Lemmas 4.9–4.10, where we provide a lower bound for the step-length \(\alpha _k\) chosen by Algorithm IP–PMM at every iteration k. Then, Q-linear convergence of \(\mu _k\) (with R-linear convergence of the regularized residuals) is shown in Theorem 4.1. Polynomial complexity is proven in Theorem 4.2, and finally, global convergence is established in Theorem 4.3.

Let us now use the properties of the proximal operator defined in (5).

Lemma 4.1

Given Assumption 1, and for all \(\lambda \in {\mathbb {R}}^m\), \(\varXi \in {\mathcal {S}}_+^n\) and \(0 \le \mu < \infty \), there exists a unique pair \((X_r^*,y_r^*)\), such that \((X_r^*,y_r^*) = {\mathcal {P}}(\varXi ,\lambda ),\) \(X_r^* \in {\mathcal {S}}_+^n\), and

$$\begin{aligned} \Vert (\varvec{X}_r^*,y_r^*)-(\varvec{X}^*,y^*)\Vert _{2} \le \Vert (\varvec{\varXi },\lambda )-(\varvec{X}^*,y^*)\Vert _{2}, \end{aligned}$$
(23)

where \({\mathcal {P}}(\cdot )\) is defined as in (5), and \((X^*,y^*)\) is such that \((0,0) \in T_{{\mathcal {L}}}(X^*,y^*)\).

Proof

The thesis follows from the developments in [21, Proposition 1]. \(\square \)

In the following lemma, we bound the solution of every PMM sub-problem encountered by Algorithm IP–PMM, while establishing bounds for the proximal estimates \(\varXi _k\), and \(\lambda _k\).

Lemma 4.2

Given Assumptions 1, 2, there exists a triple \((X_{r_k}^*,y_{r_k}^*,Z_{r_k}^*)\), satisfying:

$$\begin{aligned} \begin{aligned} A \varvec{X}_{r_k}^* + \mu (y_{r_k}^*-\lambda _k) -b&= 0,\\ -\varvec{C} + A^\top y_{r_k}^* + \varvec{Z}_{r_k}^* - \mu (\varvec{X}_{r_k}^* - \varvec{\varXi }_k)&= 0,\\ \langle X_{r_k}^*, Z_{r_k}^*\rangle&= 0, \end{aligned} \end{aligned}$$
(24)

with \(X_{r_k}^*, Z_{r_k}^* \in {\mathcal {S}}^n_+\), and \(\Vert (\varvec{X}_{r_k}^*,y_{r_k}^*,\varvec{Z}_{r_k}^*)\Vert _2 = O(\sqrt{n})\), for all \(\lambda _k \in {\mathbb {R}}^m\), \(\varXi _k \in {\mathcal {S}}_+^n\), produced by Algorithm IP–PMM, and any \(\mu \in [0,\infty )\). Moreover, \(\Vert (\varvec{\varXi }_k,\lambda _k)\Vert _2 = O(\sqrt{n})\), for all \(k \ge 0\).

Proof

We prove the claim by induction on the iterates, \(k \ge 0\), of Algorithm IP–PMM. At iteration \(k = 0\), we have that \(\lambda _0 = y_0\) and \(\varXi _0 = X_0\). But from the construction of the starting point in (18), we know that \(\Vert (X_0,y_0)\Vert _2 = O(\sqrt{n})\). Hence, \(\Vert (\varXi _0,\lambda _0)\Vert _2 = O(\sqrt{n})\) (assuming \(n > m\)). Invoking Lemma 4.1, there exists a unique pair \((X_{r_0}^*,y_{r_0}^*)\) satisfying:

$$\begin{aligned} (X_{r_0}^*,y_{r_0}^*) = {\mathcal {P}}_0(\varXi _0,\lambda _0),\qquad \Vert (\varvec{X}_{r_0}^*,y_{r_0}^*) - (\varvec{X}^*,y^*)\Vert _{2} \le \Vert (\varvec{\varXi }_0,\lambda _0)-(\varvec{X}^*,y^*)\Vert _{2}, \end{aligned}$$

where \((X^*,y^*,Z^*)\) solves (P)–(D), and from Assumption 1, is such that \(\Vert (\varvec{X}^*,y^*,\varvec{Z}^*)\Vert _2 = O(\sqrt{n})\). Using the triangle inequality, and combining the latter inequality with our previous observations, yields that \(\Vert (\varvec{X}_{r_0}^*,y_{r_0}^*)\Vert _2 = O(\sqrt{n})\). From the definition of the operator in (6), we know that:

$$\begin{aligned} -C + {\mathcal {A}}^* y_{r_0}^* - \mu (X_{r_0}^* - \varXi _0) \ \in \partial \delta _{{\mathcal {S}}_+^n}(X_{r_0}^*),\qquad {\mathcal {A}}X_{r_0}^* + \mu (y_{r_0}^*-\lambda _0) - b \ = 0, \end{aligned}$$

where \(\partial (\delta _{{\mathcal {S}}_+^n}(\cdot ))\) is the sub-differential of the indicator function defined in (4). Hence, there must exist \(-Z_{r_0}^* \in \partial \delta _{{\mathcal {S}}_+^n}(X_{r_0}^*)\) (and thus, \(Z_{r_0}^* \in {\mathcal {S}}^n_{+}\), \(\langle X_{r_0}^*,Z_{r_0}^* \rangle = 0\)), such that:

$$\begin{aligned}Z_{r_0}^* = C - {\mathcal {A}}^* y_{r_0}^* + \mu (X_{r_0}^* - \varXi _0),\quad \langle X^*_{r_0},Z^*_{r_0}\rangle = 0,\quad \Vert \varvec{Z}_{r_0}^*\Vert _2 = O(\sqrt{n}),\end{aligned}$$

where \(\Vert \varvec{Z}_{r_0}^*\Vert _2 = O(\sqrt{n})\) follows from Assumption 2, combined with \(\Vert (\varvec{X}^*_{r_0},y^*_{r_0})\Vert _2 = O(\sqrt{n})\).

Let us now assume that at some iteration k of Algorithm IP–PMM, we have \(\Vert (\varvec{\varXi }_k,\lambda _k)\Vert _2 = O(\sqrt{n})\). There are two cases for the subsequent iterations:

1. The proximal estimates are updated, that is, \((\varXi _{k+1},\lambda _{k+1}) = (X_{k+1},y_{k+1})\), or

2. the proximal estimates stay the same, i.e. \((\varXi _{k+1},\lambda _{k+1}) = (\varXi _k,\lambda _k)\).

Case 1 We know by construction that this occurs only if the following is satisfied:

$$\begin{aligned} \Vert (r_p,\varvec{R}_d)\Vert _2 \le K_N \frac{\mu _{k+1}}{\mu _0}, \end{aligned}$$

where \(r_p,\ R_d\) are defined in Algorithm IP–PMM. However, from the neighbourhood conditions in (19), we know that:

$$\begin{aligned} \Vert \big (r_p + \mu _{k+1}(y_{k+1}-\lambda _k), \varvec{R}_d + \mu _{k+1}(\varvec{X}_{k+1}-\varvec{\varXi }_k)\big )\Vert _2 \le K_N \frac{\mu _{k+1}}{\mu _0}. \end{aligned}$$

Combining the last two inequalities by applying the triangle inequality, and using the inductive hypothesis (\(\Vert (\varvec{\varXi }_k,\lambda _k)\Vert _2 = O(\sqrt{n})\)), yields that

$$\begin{aligned} \Vert (\varvec{X}_{k+1},y_{k+1})\Vert _2 \le \frac{2K_N}{\mu _0} + \Vert (\varvec{\varXi }_k,\lambda _k)\Vert _2 = O(\sqrt{n}).\end{aligned}$$

Hence, \(\Vert (\varvec{\varXi }_{k+1},\lambda _{k+1})\Vert _2 = O(\sqrt{n})\). Then, we can invoke Lemma 4.1, with \(\lambda = \lambda _{k+1}\), \(\varXi = \varXi _{k+1}\) and any \(\mu \ge 0\), which gives

$$\begin{aligned} \Vert (\varvec{X}_{r_{k+1}}^*,y_{r_{k+1}}^*) - (\varvec{X}^*,y^*)\Vert _{2} \le \Vert (\varvec{\varXi }_{k+1},\lambda _{k+1})-(\varvec{X}^*,y^*)\Vert _{2}. \end{aligned}$$

A simple manipulation shows that \(\Vert (\varvec{X}_{r_{k+1}}^*,y_{r_{k+1}}^*)\Vert _2 = O(\sqrt{n})\). As before, we use (6) alongside Assumption 2 to show the existence of \(-Z_{r_{k+1}}^* \in \partial \delta _{{\mathcal {S}}^n_{+}}(X_{r_{k+1}}^*)\), such that the triple \((X_{r_{k+1}}^*,y_{r_{k+1}}^*,Z_{r_{k+1}}^*)\) satisfies (24) with \(\Vert \varvec{Z}_{r_{k+1}}^*\Vert _2 = O(\sqrt{n})\).

Case 2 In this case, we have \((\varXi _{k+1},\lambda _{k+1}) = (\varXi _k,\lambda _k)\), and hence the inductive hypothesis gives us directly that \(\Vert (\varvec{\varXi }_{k+1},\lambda _{k+1})\Vert _2 = O(\sqrt{n})\). As before, there exists a triple \((X_{r_{k+1}}^*,y_{r_{k+1}}^*,Z_{r_{k+1}}^*)\) satisfying (24), with \(\Vert (\varvec{X}_{r_{k+1}}^*,y_{r_{k+1}}^*,\varvec{Z}_{r_{k+1}}^*)\Vert _2 = O(\sqrt{n})\). \(\square \)

In the next lemma we define and bound a triple solving a particular parametrized nonlinear system of equations, which is then used in Lemma 4.4 in order to prove boundedness of the iterates of Algorithm IP–PMM.

Lemma 4.3

Given Assumptions 1, 2, a pair \((\varXi _k,\lambda _k)\), produced at an arbitrary iteration \(k \ge 0\) of Algorithm IP–PMM, and any \(\mu \in [0,\infty )\), there exists a triple \((\tilde{X},\tilde{y},\tilde{Z})\) which satisfies the following system of equations:

$$\begin{aligned} \begin{aligned} A \varvec{\tilde{X}} + \mu \tilde{y}&= b + \bar{b} + \mu \lambda _k + \tilde{b}_k,\\ A^\top \tilde{y} + \varvec{\tilde{Z}} - \mu \varvec{\tilde{X}}&= \varvec{C} + \varvec{\bar{C}} - \mu \varvec{\varXi }_k + \varvec{\tilde{C}}_k,\\ \tilde{X}\tilde{Z}&= \theta I_n, \end{aligned} \end{aligned}$$
(25)

for some arbitrary \(\theta > 0\) (\(\theta = \varTheta (1)\)), with \(\tilde{X},\ \tilde{Z} \in {\mathcal {S}}^n_{++}\) and \(\Vert (\varvec{\tilde{X}},\tilde{y},\varvec{\tilde{Z}})\Vert _2 = O(\sqrt{n})\), where \(\tilde{b}_{k},\ \tilde{C}_{k}\) are defined in (19), while \(\bar{b},\ \bar{C}\) are defined with the starting point in (18). Furthermore, \(\nu _{\min }(\tilde{X}) \ge \xi \) and \(\nu _{\min }(\tilde{Z}) \ge \xi \), for some positive \(\xi = \varTheta (1)\).

Proof

Let \(k \ge 0\) denote an arbitrary iteration of Algorithm IP–PMM. Let also \(\bar{b},\ \bar{C}\) be as defined in (18), and \(\tilde{b}_{k},\ \tilde{C}_{k}\) be as defined in the neighbourhood conditions in (19). Given an arbitrary positive constant \(\theta > 0\), we consider the following barrier primal-dual pair:

$$\begin{aligned}&\underset{X \in {\mathcal {S}}^n}{\text {min}} \ \big ( \langle C+\bar{C} + \tilde{C}_k,X\rangle -\theta \ln (\det (X)) \big ), \ \ \text {s.t.} \ {\mathcal {A}} X= b + \bar{b} + \tilde{b}_k, \end{aligned}$$
(26)
$$\begin{aligned}&\underset{y \in {\mathbb {R}}^m,Z \in {\mathcal {S}}^n}{\text {max}} \ \big ((b + \bar{b} + \tilde{b}_k)^\top y +\theta \ln (\det (Z))\big ), \ \ \text {s.t.}\ {\mathcal {A}}^*y + Z = C+\bar{C} + \tilde{C}_k. \end{aligned}$$
(27)

Let us now define the following triple:

$$\begin{aligned} (\hat{X},\hat{y},\hat{Z}) :=\arg \min _{(X,y,Z)} \big \{\Vert (\varvec{X},\varvec{Z})\Vert _2: A\varvec{X} = \tilde{b}_k,\ A^\top y + \varvec{Z} = \tilde{C}_k \}. \end{aligned}$$

From the neighbourhood conditions (19), we know that \(\Vert (\tilde{b}_k,\varvec{\tilde{C}}_k)\Vert _{{\mathcal {S}}} \le \gamma _{{\mathcal {S}}}\rho \), and from the definition of the semi-norm in (17), we have that \(\Vert (\varvec{\hat{X}},\varvec{\hat{Z}})\Vert _2 \le \gamma _{{\mathcal {S}}} \rho \). Using (17) alongside Assumption 2, we can also show that \(\Vert \hat{y}\Vert _2 = \varTheta (\Vert (\varvec{\hat{X}},\varvec{\hat{Z}})\Vert _2)\). On the other hand, from the definition of the starting point, we have that \((X_0,Z_0) = \rho (I_n,I_n)\). By defining the following auxiliary point:

$$\begin{aligned} (\bar{X},\bar{y},\bar{Z}) = (X_0,y_0,Z_0) + (\hat{X},\hat{y},\hat{Z}), \end{aligned}$$

we have that \((1 + \gamma _{{\mathcal {S}}})\rho (I_n,I_n) \succeq (\bar{X},\bar{Z}) \succeq (1-\gamma _{{\mathcal {S}}})\rho (I_n,I_n)\), that is, the eigenvalues of these matrices are bounded by constants that are independent of the problem under consideration. By construction, the triple \((\bar{X},\bar{y},\bar{Z})\) is a feasible solution for the primal-dual pair in (26)–(27), giving bounded primal and dual objective values, respectively. This, alongside Weierstrass’s theorem on a potential function, can be used to show that the solution of problem (26)–(27) is bounded. In other words, for any choice of \(\theta > 0\), there must exist a bounded triple \((X_s^*,y_s^*,Z_s^*)\) solving (26)–(27), i.e.:

$$\begin{aligned} \begin{aligned} A\varvec{X}_s^* = b + \bar{b} + \tilde{b}_k,\quad A^\top y_s^* + \varvec{Z}_s^* = \varvec{C} + \varvec{\bar{C}} + \varvec{\tilde{C}}_k,\quad X_s^* Z_s^* = \theta I_n, \end{aligned} \end{aligned}$$

such that \(\nu _{\max }(X_s^*) \le K_{s^*}\) and \(\nu _{\max }(Z_s^*) \le K_{s^*}\), where \(K_{s^*} > 0\) is a positive constant. In turn, combining this with Assumption 2 implies that \(\Vert (\varvec{X}_s^*,y_s^*,\varvec{Z}_s^*)\Vert _2 = O(\sqrt{n})\).

Let us now apply the PMM to (26)–(27), given the estimates \(\varXi _k,\ \lambda _k\). We should note at this point that the proximal operator used here is different from that in (5), since it is based on a different maximal monotone operator from that in (3). In particular, we associate a single-valued maximal monotone operator to (26)–(27), with graph:

$$\begin{aligned} \tilde{T}_{{\mathcal {L}}}(X,y) :=\big \{(V,u): V = (C + \bar{C} + \tilde{C}_k) - {\mathcal {A}}^*y - \theta X^{-1}, u = {\mathcal {A}}X-(b+\bar{b}+\tilde{b}_k) \big \}. \end{aligned}$$

As before, the proximal operator is defined as \(\tilde{\mathcal {P}} :=\left( I_{n+m}+ \frac{1}{\mu }\tilde{T}_{\mathcal {L}}\right) ^{-1}\), and is single-valued and non-expansive. We let any \(\mu \in [0,\infty )\) and define the following penalty function:

$$\begin{aligned} \begin{aligned} \tilde{{\mathcal {L}}}_{\mu ,\theta }(X;\varXi _k,\lambda _k) :=\&\langle C + \bar{C} + \tilde{C}_k, X\rangle + \frac{1}{2}\mu \Vert X-\varXi _k\Vert _{F}^2 + \frac{1}{2\mu }\Vert {\mathcal {A}}X-(b+\bar{b}+\tilde{b}_k)\Vert _{2}^2 \\&- (\lambda _k)^\top ({\mathcal {A}}X - (b+\bar{b}+\tilde{b}_k))-\theta \ln (\det (X)). \end{aligned} \end{aligned}$$

By defining the variables \(y = \lambda _k - \frac{1}{\mu }({\mathcal {A}}X - (b+\bar{b}+\tilde{b}_k))\) and \(Z = \theta X^{-1}\), we can see that the optimality conditions of this PMM sub-problem are exactly those stated in (25). Equivalently, we can find a pair \((\tilde{X},\tilde{y})\) such that \((\tilde{X},\tilde{y}) = \tilde{{\mathcal {P}}}(\varXi _k,\lambda _k)\) and set \(\tilde{Z} = \theta \tilde{X}^{-1}\). We can now use the non-expansiveness of \(\tilde{{\mathcal {P}}}\), as in Lemma 4.1, to obtain:

$$\begin{aligned} \Vert (\varvec{\tilde{X}},\tilde{y})-(\varvec{X}_s^*,y_s^*)\Vert _{2} \le \Vert (\varXi _k,\lambda _k)-(\varvec{X}_s^*,y_s^*)\Vert _{2}. \end{aligned}$$

But we know, from Lemma 4.2, that \(\Vert (\varvec{\varXi }_k,\lambda _k)\Vert _2 = O(\sqrt{n})\), \(\forall \ k \ge 0\). Combining this with our previous observations, yields that \(\Vert (\varvec{\tilde{X}},\tilde{y})\Vert _2 = O(\sqrt{n})\). Setting \(\tilde{Z} = \theta \tilde{X}^{-1}\), gives a triple \((\tilde{X},\tilde{y},\tilde{Z})\) that satisfies (25), while \(\Vert (\varvec{\tilde{X}},\tilde{y},\varvec{\tilde{Z}})\Vert _2 = O(\sqrt{n})\) (from dual feasibility).

To conclude the proof, let us notice that the value of \(\tilde{{\mathcal {L}}}_{\mu ,\theta }(X;\varXi _k,\lambda _k)\) will grow unbounded as \(\nu _{\min }(X) \rightarrow 0\) or \(\nu _{\max }(X) \rightarrow \infty \). Hence, there must exist a constant \(\tilde{K} > 0\), such that the minimizer of this function satisfies \(\frac{1}{\tilde{K}} \le \nu _{\min }(\tilde{X}) \le \nu _{\max }(\tilde{X}) \le \tilde{K}\). The relation \(\tilde{X}\tilde{Z} = \theta I_n\) then implies that \(\frac{\theta }{\tilde{K}} \le \nu _{\min }(\tilde{Z}) \le \nu _{\max }(\tilde{Z}) \le \theta \tilde{K}\). Hence, there exists some \(\xi = \varTheta (1)\) such that \(\nu _{\min }(\tilde{X}) \ge \xi \) and \(\nu _{\min }(\tilde{Z}) \ge \xi \). \(\square \)

In the following lemma, we derive boundedness of the iterates of Algorithm IP–PMM.

Lemma 4.4

Given Assumptions 1 and 2, the iterates \((X_k,y_k,Z_k)\) produced by Algorithm IP–PMM, for all \(k \ge 0\), are such that:

$$\begin{aligned} {\text {Tr}}(X_k) = O(n),\qquad {\text {Tr}}(Z_k) = O(n),\qquad \Vert (\varvec{X}_k,y_k,\varvec{Z}_k)\Vert _2 = O(n). \end{aligned}$$

Proof

Let an iterate \((X_k,y_k,Z_k) \in \mathscr {N}_{\mu _k}(\varXi _k,\lambda _k)\), produced by Algorithm IP–PMM during an arbitrary iteration \(k \ge 0\), be given. Firstly, we invoke Lemma 4.3, from which we have a triple \((\tilde{X},\tilde{y},\tilde{Z})\) satisfying (25), for \(\mu = \mu _k\). Similarly, by invoking Lemma 4.2, we know that there exists a triple \((X_{r_k}^*,y_{r_k}^*,Z_{r_k}^*)\) satisfying (24), with \(\mu = \mu _k\). Consider the following auxiliary point:

$$\begin{aligned} \left( \left( 1-\frac{\mu _k}{\mu _0}\right) X_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{X} - X_k,\ \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* +\frac{\mu _k}{\mu _0} \tilde{y} - y_k,\ \left( 1-\frac{\mu _k}{\mu _0}\right) Z_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{Z} - Z_k\right) . \end{aligned}$$
(28)

Using (28) and (24)–(25) (for \(\mu = \mu _k\)), one can observe that:

$$\begin{aligned} \begin{aligned}&A\left( \left( 1-\frac{\mu _k}{\mu _0}\right) \varvec{X}_{r_k}^* + \frac{\mu _k}{\mu _0} \varvec{\tilde{X}} - \varvec{X}_k\right) + \mu _k \left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{y} - y_k\right) \\&\quad = \left( 1-\frac{\mu _k}{\mu _0}\right) (A\varvec{X}_{r_k}^* + \mu _k y_{r_k}^*) + \frac{\mu _k}{\mu _0} (A\varvec{\tilde{X}}+ \mu _k \tilde{y}) - A\varvec{X}_k -\mu _k y_k \\&\quad =\left( 1-\frac{\mu _k}{\mu _0}\right) (b + \mu _k \lambda _k) + \frac{\mu _k}{\mu _0} (b + \mu _k \lambda _k + \tilde{b}_k +\bar{b}) - A\varvec{X}_k - \mu _k y_k \\&\quad = b +\mu _k \lambda _k + \frac{\mu _k}{\mu _0}(\tilde{b}_k+\bar{b}) - A\varvec{X}_k - \mu _k y_k = \ 0, \end{aligned} \end{aligned}$$

where the last equality follows from the definition of the neighbourhood \(\mathscr {N}_{\mu _k}(\varXi _k,\lambda _k)\). Similarly, one can show that:

$$\begin{aligned}&-\mu _k \left( \left( 1-\frac{\mu _k}{\mu _0}\right) \varvec{X}_{r_k}^* + \frac{\mu _k}{\mu _0} \varvec{\tilde{X}} - \varvec{X}_k\right) + A^\top \left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* +\frac{\mu _k}{\mu _0} \tilde{y} - y_k\right) \\&\quad + \left( \left( 1-\frac{\mu _k}{\mu _0}\right) \varvec{Z}_{r_k}^* + \frac{\mu _k}{\mu _0} \varvec{\tilde{Z}} - \varvec{Z}_k\right) = 0. \end{aligned}$$

By combining the previous two relations, we obtain:

$$\begin{aligned} \begin{aligned}&\left( \left( 1-\frac{\mu _k}{\mu _0}\right) \varvec{X}_{r_k}^* + \frac{\mu _k}{\mu _0} \varvec{\tilde{X}} - \varvec{X}_k\right) ^\top \left( \left( 1-\frac{\mu _k}{\mu _0}\right) \varvec{Z}_{r_k}^* + \frac{\mu _k}{\mu _0} \varvec{\tilde{Z}} - \varvec{Z}_k\right) \\&\quad = \mu _k\left( \left( 1-\frac{\mu _k}{\mu _0}\right) \varvec{X}_{r_k}^* + \frac{\mu _k}{\mu _0}\varvec{\tilde{X}} - \varvec{X}_k\right) ^\top \left( \left( 1-\frac{\mu _k}{\mu _0}\right) \varvec{X}_{r_k}^* + \frac{\mu _k}{\mu _0} \varvec{\tilde{X}} - \varvec{X}_k\right) \\&\qquad + \mu _k \left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{y} - y_k\right) ^\top \left( \left( 1-\frac{\mu _k}{\mu _0}\right) y_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{y} - y_k\right) \ge \ 0. \end{aligned} \end{aligned}$$
(29)

Observe that (29) can equivalently be written as:

$$\begin{aligned} \begin{aligned}&\left\langle \left( 1-\frac{\mu _k}{\mu _0}\right) X_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{X}, Z_k \right\rangle + \left\langle \left( 1-\frac{\mu _k}{\mu _0}\right) Z_{r_k}^* + \frac{\mu _k}{\mu _0}\tilde{Z}, X_k \right\rangle \le \\&\quad \left\langle \left( 1-\frac{\mu _k}{\mu _0}\right) X_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{X},\left( 1-\frac{\mu _k}{\mu _0}\right) Z_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{Z}\right\rangle + \langle X_k, Z_k\rangle . \end{aligned} \end{aligned}$$

However, from Lemmas 4.2 and 4.3, we have that \(\tilde{X} \succeq \xi I_n\) and \(\tilde{Z} \succeq \xi I_n\), for some positive constant \(\xi = \varTheta (1)\), \(\langle X_{r_k}^*,Z_k \rangle \ge 0\), \(\langle Z_{r_k}^*, X_k\rangle \ge 0\), while \(\Vert (X_{r_k}^*,Z_{r_k}^*)\Vert _F = O(\sqrt{n})\), and \(\Vert (\tilde{X},\tilde{Z})\Vert _F = O(\sqrt{n})\). Furthermore, by definition we have that \( n \mu _k = \langle X_k,Z_k \rangle \). By combining all the previous, we obtain:

$$\begin{aligned} \begin{aligned}&\frac{\mu _k}{\mu _0} \xi \big ({\text {Tr}}(X_k) + {\text {Tr}}(Z_k) \big )\\&\quad = \frac{\mu _k}{\mu _0} \xi \big (\langle I_n, X_k\rangle + \langle I_n, Z_k\rangle \big ) \\&\quad \le \left\langle \left( 1-\frac{\mu _k}{\mu _0}\right) X_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{X}, Z_k \right\rangle + \left\langle \left( 1-\frac{\mu _k}{\mu _0}\right) Z_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{Z}, X_k\right\rangle \\&\quad \le \left\langle \left( 1-\frac{\mu _k}{\mu _0}\right) X_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{X},\left( 1-\frac{\mu _k}{\mu _0}\right) Z_{r_k}^* + \frac{\mu _k}{\mu _0} \tilde{Z}\right\rangle + \langle X_k, Z_k \rangle \\&\quad =\frac{\mu _k}{\mu _0}\left( 1-\frac{\mu _k}{\mu _0}\right) \langle X_{r_k}^*, \tilde{Z}\rangle + \frac{\mu _k}{\mu _0} \left( 1-\frac{\mu _k}{\mu _0}\right) \langle \tilde{X}, Z_r^*\rangle \\&\qquad + \left( \frac{\mu _k}{\mu _0}\right) ^2 \langle \tilde{X}, \tilde{Z} \rangle + \langle X_k, Z_k \rangle \\&\quad = O(n \mu _k ), \end{aligned} \end{aligned}$$
(30)

where the first inequality follows since \(X_{r_k}^*,\ Z_{r_k}^*,\ \tilde{X},\ \tilde{Z} \in {\mathcal {S}}^n_+\) and \((\tilde{X},\tilde{Z}) \succeq \xi (I_n,I_n)\). In the penultimate equality we used (24) (i.e. \(\langle X_{r_k}^*,Z_{r_k}^*\rangle = 0\)). Hence, (30) implies that:

$$\begin{aligned} {\text {Tr}}(X_k) = O(n), \qquad {\text {Tr}}(Z_k) = O(n). \end{aligned}$$

From positive definiteness we have that \(\Vert (X_k,Z_k)\Vert _F = O(n)\). Finally, from the neighbourhood conditions we know that:

$$\begin{aligned} \varvec{C} - A^\top y_k - \varvec{Z}_k + \mu _k (\varvec{X}_k - \varvec{\varXi }_k) + \frac{\mu _k}{\mu _0} (\varvec{\tilde{C}}_k + \varvec{\bar{C}}) = 0. \end{aligned}$$

All terms above (except for \(y_k\)) have a 2-norm that is bounded by some quantity that is O(n) (note that \(\Vert (\varvec{\bar{C}},\bar{b})\Vert _2 = O(\sqrt{n})\) using Assumption 2 and the definition in (18)). Hence, using again Assumption 2 (i.e. A is full rank, with singular values independent of n and m) yields that \(\Vert y_k\Vert _2 = O(n)\), and completes the proof. \(\square \)

In what follows, we provide Lemmas 4.5–4.7, which we use to prove boundedness of the Newton direction computed at every iteration of Algorithm IP–PMM, in Lemma 4.8.

Lemma 4.5

Let \(D_k = S_k^{-\frac{1}{2}}F_k = S_k^{\frac{1}{2}}E_k^{-1}\), where \(S_k = E_k F_k\), and \(E_k,\ F_k\) are defined as in the Newton system in (21). Then, for any \(M \in {\mathbb {R}}^{n\times n}\),

$$\begin{aligned} \Vert D_k^{-T} \varvec{M}\Vert ^2_2 \le \frac{1}{(1-\gamma _{\mu })\mu _k}\Vert Z_k^{\frac{1}{2}}M Z_k^{\frac{1}{2}}\Vert _F^2,\quad \Vert D_k\varvec{M}\Vert _2^2 \le \frac{1}{(1-\gamma _{\mu })\mu _k} \Vert X_k^{\frac{1}{2}}M X_k^{\frac{1}{2}}\Vert _F^2, \end{aligned}$$

where \(\gamma _{\mu }\) is defined in (19). Moreover, we have that:

$$\begin{aligned} \Vert D_k^{-T}\Vert _2^2 \le \frac{1}{(1-\gamma _{\mu })\mu _k} \Vert Z_k\Vert _F^2 = O\bigg (\frac{n^2}{\mu _k}\bigg ),\quad \Vert D_k\Vert _2^2 \le \frac{1}{(1-\gamma _{\mu })\mu _k} \Vert X_k \Vert _F^2 = O\bigg ( \frac{n^2}{\mu _k}\bigg ). \end{aligned}$$

Proof

The proof of the first two inequalities follows exactly the developments in [31, Lemma 5]. The bound on the 2-norm of the matrix \(D_k^{-T}\) follows by choosing M such that \(\varvec{M}\) is a unit eigenvector, corresponding to the largest eigenvalue of \(D_k^{-T}\). Then, \(\Vert D_k^{-T}\varvec{M}\Vert _2^2 = \Vert D_k^{-T}\Vert _2^2\). But, we have that:

$$\begin{aligned} \begin{aligned} \Vert D_k^{-T}\varvec{M}\Vert _2^2 \le \&\frac{1}{(1-\gamma _{\mu })\mu _k} \Vert Z_k^{\frac{1}{2}}MZ_k^{\frac{1}{2}}\Vert _F^2 \\ =\&\frac{1}{(1-\gamma _{\mu })\mu _k}{\text {Tr}}(Z_k M^\top Z_k M)\\ \le \&\frac{1}{(1-\gamma _{\mu })\mu _k} \Vert Z_k\Vert _F^2 = O\bigg ( \frac{n^2}{\mu _k}\bigg ) \end{aligned} \end{aligned}$$

where we used the cyclic property of the trace as well as Lemma 4.4. The same reasoning applies to deriving the bound for \(\Vert D_k\Vert _2^2\). \(\square \)

Lemma 4.6

Let \(D_k\) and \(S_k\) be defined as in Lemma 4.5. Then, we have that:

$$\begin{aligned} \Vert D_k^{-T} \varvec{\varDelta X}_k\Vert _2^2 + \Vert D_k \varvec{\varDelta Z}_k\Vert _2^2 + 2\langle \varDelta X_k, \varDelta Z_k \rangle = \Vert S_k^{-\frac{1}{2}} \varvec{R}_{\mu ,k}\Vert _2^2, \end{aligned}$$

where \(R_{\mu ,k} = \sigma _k \mu _k I_n - Z_k^{\frac{1}{2}} X_k Z_k^{\frac{1}{2}}\). Furthermore,

$$\begin{aligned} \Vert H_{P_k}(\varDelta X_k \varDelta Z_k) \Vert _F \le \frac{\sqrt{\frac{1+\gamma _{\mu }}{1-\gamma _{\mu }}}}{2}\big (\Vert D_k^{-T}\varvec{\varDelta X}_k\Vert ^2_2 + \Vert D_k \varvec{\varDelta Z}_k\Vert _2^2 \big ), \end{aligned}$$

where \(\gamma _{\mu }\) is defined in (19).

Proof

The equality follows directly by pre-multiplying by \(S^{-\frac{1}{2}}\) on both sides of the third block equation of the Newton system in (21) and by then taking the 2-norm (see [29, Lemma 3.1]). For a proof of the inequality, we refer the reader to [29, Lemma 3.3]. \(\square \)

Lemma 4.7

Let \(S_k\) be as defined in Lemma 4.5, and \(R_{\mu ,k}\) as defined in Lemma 4.6. Then,

$$\begin{aligned} \Vert S_k^{-\frac{1}{2}} \varvec{R}_{\mu ,k}\Vert _2^2 = O(n \mu _k). \end{aligned}$$

Proof

The proof is omitted since it follows exactly the developments in [31, Lemma 7]. \(\square \)

We are now ready to derive bounds for the Newton direction computed at every iteration of Algorithm IP–PMM.

Lemma 4.8

Given Assumptions 1 and 2, and the Newton direction \((\varDelta X_k, \varDelta y_k, \varDelta Z_k)\) obtained by solving system (21) during an arbitrary iteration \(k \ge 0\) of Algorithm IP–PMM, we have that:

$$\begin{aligned} \Vert H_{P_k}(\varDelta X_k \varDelta Z_k)\Vert _F = O(n^{4}\mu _k),\qquad \Vert (\varvec{\varDelta X}_k,\varDelta y_k,\varvec{\varDelta Z}_k)\Vert _2 = O(n^{3}). \end{aligned}$$

Proof

Consider an arbitrary iteration k of Algorithm IP–PMM. We invoke Lemmas 4.2, 4.3, for \(\mu = \sigma _k \mu _k\). That is, there exists a triple \((X_{r_k}^*,y_{r_k}^*,Z_{r_k}^*)\) satisfying (24), and a triple \((\tilde{X},\tilde{y},\tilde{Z})\) satisfying (25), for \(\mu = \sigma _k \mu _k\). Using the centering parameter \(\sigma _k\), define:

$$\begin{aligned} \begin{aligned} \varvec{\hat{C}} =&-\left( \frac{\sigma _k}{\mu _0} \varvec{\bar{C}} - (1-\sigma _k)\left( \varvec{X}_k - \varvec{\varXi }_k + \frac{\mu _k}{\mu _0}(\varvec{\tilde{X}}-\varvec{X}_{r_k}^*)\right) +\frac{1}{\mu _k}\varvec{\mathsf {E}}_{d,k}\right) ,\\ \hat{b} =&-\left( \frac{\sigma _k}{\mu _0} \bar{b} + (1-\sigma _k)\left( y_k - \lambda _k +\frac{\mu _k}{\mu _0}(\tilde{y}-y_{r_k}^*) \right) + \frac{1}{\mu _k}\epsilon _{p,k}\right) , \end{aligned} \end{aligned}$$
(31)

where \(\bar{b},\ \bar{C},\ \mu _0\) are given by (18) and \(\epsilon _{p,k}\), \(\mathsf {E}_{d,k}\) model the errors which occur when system (20) is solved inexactly. Notice that these errors are required to satisfy (22) at every iteration k. Using Lemmas 4.2, 4.3, 4.4, relation (22), and Assumption 2, we know that \(\Vert (\varvec{\hat{C}},\hat{b})\Vert _2 = O(n)\). Then, by applying again Assumption 2, we know that there must exist a matrix \(\hat{X} \in {\mathbb {R}}^{n\times n}\) such that \({\mathcal {A}}\hat{X} = \hat{b},\ \Vert \hat{X}\Vert _F = O(n)\), and by setting \(\hat{Z} = \hat{C} + \mu _k \hat{X}\), we have that \(\Vert \hat{Z}\Vert _F = O(n)\) and:

$$\begin{aligned} {\mathcal {A}}\hat{X} = \hat{b},\qquad \hat{Z} - \mu _k \hat{X} = \hat{C}. \end{aligned}$$
(32)

Using \((X_{r_k}^*,y_{r_k}^*,Z_{r_k}^*)\), \((\tilde{X},\tilde{y},\tilde{Z})\), as well as the triple \((\hat{X},0,\hat{Z})\), where \((\hat{X},\hat{Z})\) is defined in (32), we define the following auxiliary triple:

$$\begin{aligned} (\bar{X},\bar{y},\bar{Z}) = (\varDelta X_k, \varDelta y_k, \varDelta Z_k) + \frac{\mu _k}{\mu _0} (\tilde{X}, \tilde{y}, \tilde{Z}) - \frac{\mu _k}{\mu _0} (X_{r_k}^*, y_{r_k}^*,Z_{r_k}^*) + \mu _k (\hat{X},0,\hat{Z}). \end{aligned}$$
(33)

Using (33), (31), and the second block equation of (21):

$$\begin{aligned} \begin{aligned} A\varvec{\bar{X}} + \mu _k \bar{y} =&\ (A \varvec{\varDelta X}_k + \mu _k \varDelta y_k) + \frac{\mu _k}{\mu _0}((A\varvec{\tilde{X}}+ \mu _k \tilde{y})- (A\varvec{X}_{r_k}^*+ \mu _k y_{r_k}^*)) + \mu _k A\varvec{\hat{X}}\\ =&\left( b + \sigma _k\frac{\mu _k}{\mu _0}\bar{b}-A\varvec{X}_k - \sigma _k \mu _k (y_k-\lambda _k) + \epsilon _{p,k}\right) \\&\quad + \frac{\mu _k}{\mu _0}((A\varvec{\tilde{X}} + \mu _k \tilde{y})- (A\varvec{X}_{r_k}^*+ \mu _k y_{r_k}^*))\\&\quad - \mu _k \left( \sigma _k \frac{\bar{b}}{\mu _0} + (1-\sigma _k)(y_k-\lambda _k)\right) - \frac{\mu _k}{\mu _0}(1-\sigma _k)\mu _k (\tilde{y}-y_{r_k}^*) - \epsilon _{p,k}. \end{aligned} \end{aligned}$$

Then, by cancelling opposite terms on the right-hand side, and employing (24)–(25) (evaluated at \(\mu = \sigma _k \mu _k\) from the definition of \((X_{r_k}^*,y_{r_k}^*,Z_{r_k}^*)\) and \((\tilde{X},\tilde{y},\tilde{Z})\)), we have

$$\begin{aligned} \begin{aligned} A\varvec{\bar{X}} + \mu _k \bar{y}&= \left( b + \sigma _k\frac{\mu _k}{\mu _0}\bar{b}-A\varvec{X}_k - \sigma _k \mu _k (y_k-\lambda _k)\right) + \frac{\mu _k}{\mu _0}(b+\sigma _k\mu _k \lambda _k+\bar{b}+\tilde{b}_k)\\&\quad - \frac{\mu _k}{\mu _0} (\sigma _k \mu _k \lambda _k + b) - \mu _k \left( \sigma _k \frac{\bar{b}}{\mu _0} + (1-\sigma _k)(y_k-\lambda _k)\right) \\&= b + \frac{\mu _k}{\mu _0}(\bar{b}+\tilde{b}_k) - A\varvec{X}_k - \mu _k (y_k-\lambda _k)\\&= 0, \end{aligned} \end{aligned}$$

where the last equation follows from the neighbourhood conditions (i.e. \((X_k,y_k,Z_k) \in \mathscr {N}_{\mu _k}(\varXi _k,\lambda _k)\)). Similarly, we can show that:

$$\begin{aligned} A^\top \bar{y} + \varvec{\bar{Z}}-\mu _k \varvec{\bar{X}} = 0. \end{aligned}$$

The previous two equalities imply that:

$$\begin{aligned} \begin{aligned} \langle \bar{X},\bar{Z}\rangle = \langle \bar{X}, - {\mathcal {A}}^* \bar{y}+\mu _k \bar{X}\rangle = \mu _k \langle \bar{X}, \bar{X} \rangle + \mu _k \bar{y}^\top \bar{y} \ge 0. \end{aligned} \end{aligned}$$
(34)

On the other hand, using the last block equation of the Newton system (21), we have:

$$\begin{aligned} E_k\varvec{\bar{X}} + F_k \varvec{\bar{Z}} = \varvec{R_{\mu ,k}}+ \frac{\mu _k}{\mu _0} E_k(\varvec{\tilde{X}}-\varvec{X}_{r_k}^* + \mu _0 \varvec{\hat{X}})+\frac{\mu _k}{\mu _0} F_k(\varvec{\tilde{Z}}- \varvec{Z}_{r_k}^* + \mu _0 \varvec{\hat{Z}}), \end{aligned}$$

where \(R_{\mu ,k}\) is defined as in Lemma 4.6. Let \(S_k\) be defined as in Lemma 4.5. By multiplying both sides of the previous equation by \(S_k^{-\frac{1}{2}}\), we get:

$$\begin{aligned} D_k^{-T}\varvec{\bar{X}} + D_k\varvec{\bar{Z}} = S_k^{-\frac{1}{2}}\varvec{R_{\mu ,k}}+ \frac{\mu _k}{\mu _0} \big (D_k^{-T}(\varvec{\tilde{X}}-\varvec{X}_{r_k}^* + \mu _0 \varvec{\hat{X}}) + D_k(\varvec{\tilde{Z}}-\varvec{Z}_{r_k}^* + \mu _0 \varvec{\hat{Z}})\big ). \end{aligned}$$
(35)

But from (34) we know that \(\langle \bar{X}, \bar{Z}\rangle \ge 0\) and hence:

$$\begin{aligned} \Vert D_k^{-T}\varvec{\bar{X}} + D_k \varvec{\bar{Z}}\Vert _2^2 \ge \Vert D_k^{-T} \varvec{\bar{X}}\Vert _2^2 + \Vert D_k \varvec{\bar{Z}}\Vert _2^2. \end{aligned}$$

Combining (35) with the previous inequality, gives:

$$\begin{aligned} \begin{aligned} \Vert D_k^{-T}\varvec{\bar{X}}\Vert _2^2 \le \ \bigg \{&\Vert S_k^{-\frac{1}{2}}\varvec{R}_{\mu ,k}\Vert _2 +\frac{\mu _k}{\mu _0} \bigg (\Vert D_k^{-T}(\varvec{\tilde{X}}-\varvec{X}_{r_k}^* + \mu _0 \varvec{\hat{X}})\Vert _2 \\&\ + \Vert D_k(\varvec{\tilde{Z}}-\varvec{Z}_{r_k}^* + \mu _0 \varvec{\hat{Z}})\Vert _2\bigg ) \bigg \}^2. \end{aligned} \end{aligned}$$

We take square roots, use (33), and apply the triangle inequality, to get:

$$\begin{aligned} \begin{aligned} \Vert D_k^{-T} \varvec{\varDelta X}_k \Vert _2 \le&\ \Vert S_k^{-\frac{1}{2}} \varvec{R}_{\mu ,k}\Vert _2 + \frac{\mu _k}{\mu _0}\bigg ( 2\Vert D_k^{-T} (\varvec{\tilde{X}}-\varvec{X}_{r_k}^* + \mu _0 \varvec{\hat{X}})\Vert _2 \\&\ +\Vert D_k(\varvec{\tilde{Z}}-\varvec{Z}_{r_k}^* + \mu _0 \varvec{\hat{Z}})\Vert _2\bigg ). \end{aligned} \end{aligned}$$
(36)

We now proceed to bounding the terms on the right-hand side of (36). A bound for the first term is given by Lemma 4.7, that is:

$$\begin{aligned} \Vert S_k^{-\frac{1}{2}} \varvec{R}_{\mu ,k}\Vert _2 = O(n^{\frac{1}{2}}\mu _k^{\frac{1}{2}}).\end{aligned}$$

On the other hand, we have (from Lemma 4.5) that

$$\begin{aligned}\Vert D_k^{-T}\Vert _2 = O\Bigg ( \frac{n}{\mu _k^{\frac{1}{2}}}\Bigg ),\qquad \Vert D_k\Vert _2 = O\Bigg ( \frac{n}{\mu _k^{\frac{1}{2}}}\Bigg ).\end{aligned}$$

Hence, using the previous bounds, as well as Lemmas 4.2, 4.3, and (32), we obtain:

$$\begin{aligned} \begin{aligned} 2\frac{\mu _k}{\mu _0}\Vert D_k^{-T}(\varvec{\tilde{X}}-\varvec{X}_{r_k}^* + \mu _0 \varvec{\hat{X}})\Vert _2 +\frac{\mu _k}{\mu _0}\Vert D_k(\varvec{\tilde{Z}}-\varvec{Z}_{r_k}^* + \mu _0 \varvec{\hat{Z}})\Vert _2 = O\big (n^{2}\mu _k^{\frac{1}{2}}\big ). \end{aligned} \end{aligned}$$

Combining all the previous bounds yields that \(\Vert D_k^{-T}\varvec{\varDelta X}_k\Vert _2 = O(n^2 \mu _k^{\frac{1}{2}})\). One can bound \(\Vert D_k \varvec{\varDelta Z}_k\Vert _2\) in the same way. The latter is omitted for ease of presentation.

Furthermore, we have that:

$$\begin{aligned} \Vert \varvec{\varDelta X}_k\Vert _2 = \Vert D_k D_k^{-T} \varvec{\varDelta X}_k\Vert _2 \le \Vert D_k\Vert _2\Vert D_k^{-T} \varvec{\varDelta X}_k\Vert _2 = O(n^{3}). \end{aligned}$$

Similarly, we can show that \(\Vert \varvec{\varDelta Z}_k \Vert _2 = O(n^{3})\). From the first block equation of the Newton system in (21), alongside Assumption 2, we can show that \(\Vert \varDelta y_k\Vert _2 = O(n^{3})\).

Finally, using the previous bounds, as well as Lemma 4.6, we obtain the desired bound on \(\Vert H_{P_k}(\varDelta X_k \varDelta Z_k)\Vert _F\), that is:

$$\begin{aligned}\Vert H_{P_k}(\varDelta X_k \varDelta Z_k)\Vert _F = O(n^4\mu _k),\end{aligned}$$

which completes the proof. \(\square \)

We can now prove (Lemmas 4.9–4.10) that at every iteration of Algorithm IP–PMM there exists a step-length \(\alpha _k > 0\) such that the new iterate satisfies the conditions required by the algorithm. The lower bound on any such step-length will later determine the polynomial complexity of the method. To that end, we adopt the following notation:

$$\begin{aligned}\big (X_k(\alpha ),y_k(\alpha ),Z_k(\alpha )\big ) \equiv (X_k + \alpha \varDelta X_k, y_k + \alpha \varDelta y_k, Z_k + \alpha \varDelta Z_k).\end{aligned}$$

Lemma 4.9

Given Assumptions 1, 2, and by letting \({P_k}(\alpha ) = Z_k(\alpha )^{\frac{1}{2}}\), there exists a step-length \({\alpha ^*} \in (0,1)\), such that for all \(\alpha \in [0,{\alpha ^*}]\) and for all iterations \(k \ge 0\) of Algorithm IP–PMM, the following relations hold:

$$\begin{aligned}&\langle X_k + \alpha \varDelta X_k,Z_k + \alpha \varDelta Z_k\rangle \ge (1-\alpha (1-\beta _1))\langle X_k,Z_k \rangle , \end{aligned}$$
(37)
$$\begin{aligned}&\Vert H_{{P_k}(\alpha )}(X_k(\alpha )Z_k(\alpha )) - \mu _k(\alpha ) I_n\Vert _F \le \gamma _{\mu }\mu _k(\alpha ), \end{aligned}$$
(38)
$$\begin{aligned}&\langle X_k + \alpha \varDelta X_k, Z_k + \alpha \varDelta Z_k \rangle \le (1-\alpha (1-\beta _2))\langle X_k, Z_k \rangle , \end{aligned}$$
(39)

where, without loss of generality, \(\beta _1 = \frac{\sigma _{\min }}{2}\) and \(\beta _2 = 0.99\). Moreover, \({\alpha ^*} \ge \frac{{\kappa ^*}}{n^{4}}\) for all \(k\ge 0\), where \({\kappa ^*} > 0\) is independent of n, m.

Proof

From Lemma 4.8, there exist constants \(K_{\varDelta } >0\) and \(K_{H\varDelta } > 0\), independent of n and m, such that:

$$\begin{aligned}&\langle \varDelta X_k, \varDelta Z_k \rangle = (D_k^{-T} \varvec{\varDelta X}_k)^\top (D_k \varvec{\varDelta Z}_k) \le \Vert D_k^{-T} \varvec{\varDelta X}_k\Vert _2 \Vert D_k \varvec{\varDelta Z}_k\Vert _2 \le K_{\varDelta }^2 n^4 \mu _k,\\&\Vert H_{P_k}(\varDelta X_k \varDelta Z_k)\Vert _F \le K_{H\varDelta } n^4 \mu _k.\end{aligned}$$

From the last block equation of the Newton system (20), we can show that:

$$\begin{aligned} \langle Z_k, \varDelta X_k\rangle + \langle X_k, \varDelta Z_k\rangle = (\sigma _k - 1) \langle X_k, Z_k \rangle . \end{aligned}$$
(40)

The latter can also be obtained from (21), since we require \(\mathsf {E}_{\mu ,k}= 0\). Furthermore:

$$\begin{aligned} H_{P_k}(X_k(\alpha )Z_k(\alpha )) = (1-\alpha )H_{P_k}(X_k Z_k) + \alpha \sigma _k \mu _k I_n +\alpha ^2 H_{P_k}(\varDelta X_k \varDelta Z_k), \end{aligned}$$
(41)

where we recall that \((X_k(\alpha ),y_k(\alpha ),Z_k(\alpha )) = (X_k + \alpha \varDelta X_k,\ y_k + \alpha \varDelta y_k,\ Z_k + \alpha \varDelta Z_k)\).

We proceed by proving (37). Using (40), we have:

$$\begin{aligned} \begin{aligned}&\langle X_k + \alpha \varDelta X_k,Z_k + \alpha \varDelta Z_k\rangle - (1-\alpha (1 -\beta _1))\langle X_k, Z_k\rangle \\&\quad =\langle X_k, Z_k\rangle +\alpha (\sigma _k - 1)\langle X_k, Z_k \rangle + \alpha ^2 \langle \varDelta X_k, \varDelta Z_k \rangle - (1-\alpha )\langle X_k, Z_k \rangle -\alpha \beta _1 \langle X_k, Z_k \rangle \\&\quad \ge \alpha (\sigma _k - \beta _1) \langle X_k, Z_k\rangle - \alpha ^2 K_{\varDelta }^2 n^4 \mu _k \ge \alpha \left( \frac{\sigma _{\min }}{2}\right) n \mu _k - \alpha ^2 K_{\varDelta }^2 n^4 \mu _k, \end{aligned} \end{aligned}$$

where we set (without loss of generality) \(\beta _1 = \frac{\sigma _{\min }}{2}\). The rightmost side of the previous inequality is non-negative for every \(\alpha \) satisfying:

$$\begin{aligned} \alpha \le \frac{\sigma _{\min }}{2 K_{\varDelta }^2 n^3}. \end{aligned}$$

In order to prove (38), we use (41) and the fact that, from the neighbourhood conditions, \(\Vert H_{P_k}(X_k Z_k) - \mu _k I_n\Vert _F \le \gamma _{\mu } \mu _k\). For that, we use the result in [29, Lemma 4.2], stating that:

$$\begin{aligned} \Vert H_{{P_k}(\alpha )}(X_k(\alpha ) Z_k(\alpha )) - \mu _k(\alpha ) I_n\Vert _F \le \Vert H_{P_k}(X_k(\alpha ) Z_k(\alpha )) - \mu _k(\alpha ) I_n\Vert _F. \end{aligned}$$

By combining all the previous, we have:

$$\begin{aligned} \begin{aligned}&\Vert H_{{P_k}(\alpha )}(X_k(\alpha ) Z_k(\alpha )) - \mu _k(\alpha ) I_n\Vert _F - \gamma _{\mu }\mu _k(\alpha ) \\&\quad \le \Vert H_{P_k}(X_k(\alpha ) Z_k(\alpha )) - \mu _k(\alpha ) I_n\Vert _F - \gamma _{\mu } \mu _k(\alpha ) \\&\quad =\Vert (1-\alpha )(H_{P_k}(X_k Z_k)-\mu _k I_n) + \alpha ^2 H_{P_k}(\varDelta X_k \varDelta Z_k)\\&\qquad - \frac{\alpha ^2}{n} \langle \varDelta X_k, \varDelta Z_k \rangle I_n\Vert _F - \gamma _{\mu } \mu _k(\alpha ) \\&\quad \le (1-\alpha )\Vert H_{P_k}(X_kZ_k) - \mu _k I_n\Vert _F + \alpha ^2\mu _k \bigg (\frac{K_{\varDelta }^2}{n} + K_{H\varDelta }\bigg )n^4 \\&\qquad - \gamma _{\mu } \bigg ((1-\alpha )\mu _k + \alpha \sigma _k \mu _k + \frac{\alpha ^2}{n}\langle \varDelta X_k, \varDelta Z_k \rangle \bigg ) \\&\quad \le -\gamma _{\mu } \alpha \sigma _{\min } \mu _k + \alpha ^2 \mu _k \bigg (\frac{2K_{\varDelta }^2}{n} + K_{H\varDelta } \bigg )n^4, \end{aligned} \end{aligned}$$

where we used the neighbourhood conditions in (19), the equality \(\mu _k(\alpha ) = (1-\alpha )\mu _k + \alpha \sigma _k \mu _k + \frac{\alpha ^2}{n} \langle \varDelta X_k, \varDelta Z_k \rangle \) (which can be derived from (40)), and the third block equation of the Newton system (21). The rightmost side of the above is non-positive for every \(\alpha \) satisfying:

$$\begin{aligned} \alpha \le \frac{\sigma _{\min }\gamma _{\mu }}{\left( \frac{2K_{\varDelta }^2}{n} + K_{H\varDelta }\right) n^4}. \end{aligned}$$

Finally, to prove (39), we set (without loss of generality) \(\beta _2 = 0.99\). We know, from Algorithm IP–PMM, that \(\sigma _{\max } \le 0.5\). With the previous two remarks in mind, we have:

$$\begin{aligned} \begin{aligned}&\frac{1}{n}\langle X_k + \alpha \varDelta X_k, Z_k + \alpha \varDelta Z_k \rangle - (1-0.01\alpha )\mu _k \\&\quad \le (1-\alpha )\mu _k + \alpha \sigma _k \mu _k + \alpha ^2 \frac{K_{\varDelta }^2 n^4}{n}\mu _k - (1-0.01 \alpha )\mu _k \\&\quad \le -0.99\alpha \mu _k + 0.5\alpha \mu _k + \alpha ^2 \frac{K_{\varDelta }^2 n^4}{n} \mu _k \\&\quad =-0.49\alpha \mu _k +\alpha ^2\frac{K_{\varDelta }^2 n^4}{n}\mu _k. \end{aligned} \end{aligned}$$

The rightmost side of the previous inequality is non-positive for every \(\alpha \) satisfying:

$$\begin{aligned} \alpha \le \frac{0.49 }{K_{\varDelta }^2 n^3}. \end{aligned}$$

By combining all the previous bounds on the step-length, we have that (37)–(39) hold for every \(\alpha \in (0,\alpha ^*)\), where:

$$\begin{aligned} \alpha ^* :=\min \bigg \{ \frac{\sigma _{\min }}{2 K_{\varDelta }^2 n^3},\ \frac{\sigma _{\min }\gamma _{\mu }}{\big (\frac{2K_{\varDelta }^2}{n} + K_{H\varDelta }\big ) n^4},\ \frac{0.49 }{K_{\varDelta }^2 n^3},\ 1\bigg \}. \end{aligned}$$
(42)

Since \({\alpha ^*} = \varOmega \big (\frac{1}{n^{4}}\big )\), we know that there must exist a constant \({\kappa ^*} > 0\), independent of n, m and of the iteration k, such that \({\alpha ^*} \ge \frac{\kappa ^*}{n^{4}}\), for all \(k \ge 0\), and this completes the proof. \(\square \)
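
For illustration only, the following minimal sketch evaluates the worst-case step-length bound (42). The constants \(\sigma _{\min }\), \(\gamma _{\mu }\), \(K_{\varDelta }\) and \(K_{H\varDelta }\) are treated as given inputs with arbitrary illustrative values (in practice the constants of Lemma 4.8 are not known explicitly), and the function name is hypothetical; the point is merely to expose how \(\alpha ^*\) scales with n.

```python
# A minimal sketch (not part of Algorithm IP-PMM): evaluates the worst-case
# step-length bound alpha^* of (42). The constants below are illustrative
# placeholders, since the constants of Lemma 4.8 are unknown in practice.

def alpha_star(n, sigma_min=0.1, gamma_mu=0.5, K_Delta=1.0, K_HDelta=1.0):
    """Worst-case step-length bound of (42); behaves like Theta(1/n^4)."""
    return min(
        sigma_min / (2.0 * K_Delta**2 * n**3),
        sigma_min * gamma_mu / ((2.0 * K_Delta**2 / n + K_HDelta) * n**4),
        0.49 / (K_Delta**2 * n**3),
        1.0,
    )

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, alpha_star(n))  # decreases roughly like 1/n^4
```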

Lemma 4.10

Given Assumptions 1, 2, and by letting \({P_k}(\alpha ) = Z_k(\alpha )^{\frac{1}{2}}\), there exists a step-length \(\bar{\alpha } \in (0,1)\), with \(\bar{\alpha } \ge \frac{\bar{\kappa }}{n^{4}}\) for some \(\bar{\kappa } > 0\) independent of n, m, such that for all iterations \(k \ge 0\) of Algorithm IP–PMM, if \((X_k,y_k,Z_k) \in \mathscr {N}_{\mu _k}(\varXi _k,\lambda _k)\), then letting:

$$\begin{aligned} (X_{k+1},y_{k+1},Z_{k+1}) = (X_k + \alpha \varDelta X_k,y_k + \alpha \varDelta y_k, Z_k + \alpha \varDelta Z_k),\ \mu _{k+1} = \frac{\langle X_{k+1},Z_{k+1}\rangle }{n}, \end{aligned}$$

for any \(\alpha \in (0,\bar{\alpha }]\), gives \((X_{k+1},y_{k+1},Z_{k+1}) \in \mathscr {N}_{\mu _{k+1}}(\varXi _{k+1},\lambda _{k+1})\), where \(\varXi _k,\) and \(\lambda _k\) are updated as in Algorithm IP–PMM.

Proof

Let \(\alpha ^*\) be given as in Lemma 4.9 (i.e. in (42)) such that (37)–(39) are satisfied. We would like to find the maximum \(\bar{\alpha } \in (0,\alpha ^*)\), such that:

$$\begin{aligned} (X_k(\alpha ),y_k(\alpha ),Z_k(\alpha )) \in \mathscr {N}_{\mu _k(\alpha )}(\varXi _k,\lambda _k),\ \text {for all}\ \alpha \in (0,\bar{\alpha }), \end{aligned}$$

where \(\mu _k(\alpha ) = \frac{\langle X_k(\alpha ), Z_k(\alpha ) \rangle }{n}\). Let:

$$\begin{aligned} \tilde{r}_p(\alpha ) = A\varvec{X}_k(\alpha )+ \mu _k(\alpha )(y_k(\alpha )-\lambda _k) - \left( b + \frac{\mu _k(\alpha )}{\mu _0}\bar{b}\right) , \end{aligned}$$
(43)

and

$$\begin{aligned} \varvec{\tilde{R}}_d(\alpha ) = A^\top y_k(\alpha ) + \varvec{Z}_k(\alpha ) - \mu _k(\alpha )(\varvec{X}_k(\alpha )- \varvec{\varXi }_k) - \left( \varvec{C} + \frac{\mu _k(\alpha )}{\mu _0}\varvec{\bar{C}}\right) . \end{aligned}$$
(44)

In other words, we need to find the maximum \(\bar{\alpha } \in (0,\alpha ^*)\), such that:

$$\begin{aligned} \Vert (\tilde{r}_p(\alpha ),\varvec{\tilde{R}}_d(\alpha ))\Vert _2 \le K_N \frac{\mu _k(\alpha )}{\mu _0},\ \ \Vert (\tilde{r}_p(\alpha ),\varvec{\tilde{R}}_d(\alpha ))\Vert _{{\mathcal {S}}} \le \gamma _{{\mathcal {S}}} \rho \frac{\mu _k(\alpha )}{\mu _0} ,\ \text {for all}\ \alpha \in (0,\bar{\alpha }). \end{aligned}$$
(45)

If the latter two conditions hold, then \((X_k(\alpha ),y_k(\alpha ),Z_k(\alpha )) \in \mathscr {N}_{\mu _k(\alpha )}(\varXi _k,\lambda _k),\ \text {for all}\ \alpha \in (0,\bar{\alpha })\). Then, if Algorithm IP–PMM updates \(\varXi _k\), and \(\lambda _k\), it does so only when similar conditions (as in (45)) hold for the new parameters. Indeed, notice that the estimates \(\varXi _k\) and \(\lambda _k\) are only updated if the last conditional of Algorithm IP–PMM is satisfied. But this is equivalent to saying that (45) is satisfied after setting \(\varvec{\varXi }_k = \varvec{X}_k(\alpha )\) and \(\lambda _k = y_k(\alpha )\). On the other hand, if the parameters are not updated, the new iterate lies in the desired neighbourhood because of (45), alongside (37)–(39).

We start by rearranging \(\tilde{r}_p(\alpha )\). Specifically, we have that:

$$\begin{aligned} \begin{aligned} \tilde{r}_p(\alpha )&=A(\varvec{X}_k + \alpha \varvec{\varDelta X}_k) +\big (\mu _k + \alpha (\sigma _k-1)\mu _k\\&\qquad +\frac{\alpha ^2}{n}\langle \varDelta X_k,\varDelta Z_k\rangle \big )\left( (y_k + \alpha \varDelta y_k -\lambda _k)-\frac{\bar{b}}{\mu _0} \right) -b \\&\quad =\left( A\varvec{X}_k +\mu _k (y_k -\lambda _k)-b -\frac{\mu _k}{\mu _0}\bar{b}\right) + \alpha (A \varvec{\varDelta X}_k + \mu _k \varDelta y_k) \\&\qquad + \left( \alpha (\sigma _k-1)\mu _k + \frac{\alpha ^2}{n}\langle \varDelta X_k, \varDelta Z_k\rangle \right) \left( (y_k - \lambda _k + \alpha \varDelta y_k) - \frac{\bar{b}}{\mu _0}\right) \\&\quad =\frac{\mu _k}{\mu _0}\tilde{b}_k + \alpha \left( b- A\varvec{X}_k - \sigma _k\mu _k\left( (y_k-\lambda _k)-\frac{\bar{b}}{\mu _0} \right) + \epsilon _{p,k}\right. \\&\left. \qquad + \mu _k \big ((y_k-\lambda _k)-\frac{\bar{b}}{\mu _0} \big )-\ \mu _k \left( (y_k-\lambda _k)-\frac{\bar{b}}{\mu _0} \right) \right) \\&\qquad +\left( \alpha (\sigma _k-1)\mu _k + \frac{\alpha ^2}{n}\langle \varDelta X_k, \varDelta Z_k\rangle \right) \left( (y_k - \lambda _k + \alpha \varDelta y_k) - \frac{\bar{b}}{\mu _0}\right) , \end{aligned} \end{aligned}$$

where we used the definition of \(\tilde{b}_k\) in the neighbourhood conditions in (19), and the second block equation in (21). Using the neighbourhood conditions again, and then cancelling the opposite terms in the previous equation, we obtain:

$$\begin{aligned} \begin{aligned} \tilde{r}_p(\alpha )&= \ (1-\alpha )\frac{\mu _k}{\mu _0}\tilde{b}_k + \alpha \epsilon _{p,k} + \alpha ^2(\sigma _k - 1)\mu _k \varDelta y_k\\&\qquad + \frac{\alpha ^2}{n}\langle \varDelta X_k, \varDelta Z_k\rangle \left( y_k - \lambda _k + \alpha \varDelta y_k - \frac{\bar{b}}{\mu _0} \right) . \end{aligned} \end{aligned}$$
(46)

Similarly, we can show that:

$$\begin{aligned} \begin{aligned} \varvec{\tilde{R}}_d(\alpha )&= (1-\alpha )\frac{\mu _k}{\mu _0}\varvec{\tilde{C}}_k + \alpha \varvec{\mathsf {E}}_{d,k}- \alpha ^2(\sigma _k-1)\mu _k \varvec{\varDelta X}_k\\&\qquad - \frac{\alpha ^2}{n}\langle \varDelta X_k, \varDelta Z_k \rangle \left( \varvec{X}_k - \varvec{\varXi }_k + \alpha \varvec{\varDelta X}_k + \frac{1}{\mu _0} \varvec{\bar{C}}\right) . \end{aligned} \end{aligned}$$
(47)

Recall (Lemma 4.8) that \(\langle \varDelta X_k, \varDelta Z_k \rangle \le K_{\varDelta }^2 n^4 \mu _k\), and define the following quantities

$$\begin{aligned} \begin{aligned} \xi _2&= \ \mu _k \Vert (\varDelta y_k,\varvec{\varDelta X}_k)\Vert _2 + K_{\varDelta }^2n^3 \mu _{k}\bigg (\Vert (y_k - \lambda _k,\varvec{X}_k-\varvec{\varXi }_k)\Vert _2\ \\&\quad + \alpha ^* \Vert (\varDelta y_k,\varvec{\varDelta X}_k)\Vert _2 + \frac{1}{\mu _0}\Vert (\bar{b},\varvec{\bar{C}})\Vert _2\bigg ),\\ \xi _{{\mathcal {S}}} =&\ \mu _k \Vert (\varDelta y_k,\varvec{\varDelta X}_k)\Vert _{{\mathcal {S}}} + K_{\varDelta }^2n^3 \mu _{k}\bigg (\Vert (y_k - \lambda _k,\varvec{X}_k-\varvec{\varXi }_k)\Vert _{{\mathcal {S}}} \ \\&\quad + \alpha ^* \Vert (\varDelta y_k,\varvec{\varDelta X}_k)\Vert _{{\mathcal {S}}} + \frac{1}{\mu _0}\Vert (\bar{b},\varvec{\bar{C}})\Vert _{{\mathcal {S}}}\bigg ), \end{aligned} \end{aligned}$$
(48)

where \(\alpha ^*\) is given by (42). Using the definition of the starting point in (18), as well as results in Lemmas 4.4, 4.8, we can observe that \(\xi _2 = O(n^{4} \mu _k)\). On the other hand, using Assumption 2, we know that for every pair \((r_1,\varvec{R}_2) \in {\mathbb {R}}^{m+n^2}\) (where \(R_2 \in {\mathbb {R}}^{n \times n}\) is an arbitrary matrix), if \(\Vert (r_1,\varvec{R}_2)\Vert _2 = \varTheta (f(n))\), where \(f(\cdot )\) is a positive polynomial function of n, then \(\Vert (r_1,R_2)\Vert _{{\mathcal {S}}} = \varTheta (f(n))\). Hence, we have that \(\xi _{{\mathcal {S}}} = O(n^{4}\mu _k)\). Using the quantities in (48), equations (46), (47), as well as the neighbourhood conditions, we have that:

$$\begin{aligned} \begin{aligned} \Vert (\tilde{r}_p(\alpha ),\varvec{\tilde{R}}_d(\alpha ))\Vert _2 \le&\ (1-\alpha )K_N \frac{\mu _k}{\mu _0} + \alpha \mu _k \Vert (\epsilon _{p,k},\varvec{\mathsf {E}}_{d,k})\Vert _2+ \alpha ^2 \mu _k \xi _2,\\ \Vert (\tilde{r}_p(\alpha ),\varvec{\tilde{R}}_d(\alpha ))\Vert _{{\mathcal {S}}} \le&\ (1-\alpha )\gamma _{{\mathcal {S}}}\rho \frac{\mu _k}{\mu _0} + \alpha \mu _k \Vert (\epsilon _{p,k},\varvec{\mathsf {E}}_{d,k})\Vert _{{\mathcal {S}}} + \alpha ^2 \mu _k \xi _{{\mathcal {S}}}, \end{aligned} \end{aligned}$$

for all \(\alpha \in (0,\alpha ^*)\), where \(\alpha ^*\) is given by (42) and the error occurring from the inexact solution of (20), \((\epsilon _{p,k},\mathsf {E}_{d,k})\), satisfies (22). From (37), we know that:

$$\begin{aligned} \mu _k(\alpha ) \ge (1-\alpha (1-\beta _1))\mu _k,\ \text {for all}\ \alpha \in (0,\alpha ^*). \end{aligned}$$

By combining the last three inequalities, using (22) and setting \(\beta _1 = \frac{\sigma _{\min }}{2}\), we obtain

$$\begin{aligned} \begin{aligned} \Vert (\tilde{r}_p(\alpha ),\varvec{\tilde{R}}_d(\alpha ))\Vert _2 \le \frac{\mu _k(\alpha )}{\mu _0} K_N,\ \text {for all}\ \alpha \in \bigg (0, \min \big \{\alpha ^*,\frac{\sigma _{\min } K_N}{4\xi _2 \mu _0}\big \}\bigg ]. \end{aligned} \end{aligned}$$

Similarly,

$$\begin{aligned} \begin{aligned} \Vert (\tilde{r}_p(\alpha ),\varvec{\tilde{R}}_d(\alpha ))\Vert _{{\mathcal {S}}} \le \frac{\mu _k(\alpha )}{\mu _0} \gamma _{{\mathcal {S}}} \rho ,\ \text {for all}\ \alpha \in \bigg (0, \min \big \{\alpha ^*,\frac{\sigma _{\min } \gamma _{{\mathcal {S}}} \rho }{4\xi _{{\mathcal {S}}} \mu _0}\big \}\bigg ]. \end{aligned} \end{aligned}$$

Hence, we have:

$$\begin{aligned} \bar{\alpha } :=\min \bigg \{\alpha ^*,\frac{\sigma _{\min } K_N}{4\xi _2 \mu _0}, \frac{\sigma _{\min } \gamma _{{\mathcal {S}}} \rho }{4\xi _{{\mathcal {S}}} \mu _0} \bigg \}. \end{aligned}$$
(49)

Since \(\bar{\alpha } = \varOmega \big (\frac{1}{n^{4}}\big )\), we know that there must exist a constant \(\bar{\kappa } > 0\), independent of n, m and of the iteration k, such that \(\bar{\alpha } \ge \frac{\bar{\kappa }}{n^{4}}\), for all \(k \ge 0\), and this completes the proof. \(\square \)

The following theorem summarizes our results.

Theorem 4.1

Given Assumptions 1, 2, the sequence \(\{\mu _k\}\) generated by Algorithm IP–PMM converges Q-linearly to zero, and the sequences of regularized residual norms

$$\begin{aligned} \left\{ \left\| A\varvec{X}_k + \mu _k (y_k-\lambda _k) - b-\frac{\mu _k}{\mu _0}\bar{b}\right\| _2\right\} \ \text {and}\ \left\{ \left\| A^\top y_k + \varvec{Z}_k - \mu _k (\varvec{X}_k - \varvec{\varXi }_k) - \varvec{C} - \frac{\mu _k}{\mu _0}\varvec{\bar{C}}\right\| _2\right\} \end{aligned}$$

converge R-linearly to zero.

Proof

From (39) we have that:

$$\begin{aligned} \mu _{k+1} \le (1-0.01\alpha _k)\mu _k, \end{aligned}$$

while, from (49), we know that \(\forall \ k \ge 0\), \(\exists \ \bar{\alpha } \ge \frac{\bar{\kappa }}{n^4}\) such that \(\alpha _k \ge \bar{\alpha }\). Hence, we can easily see that \(\mu _k \rightarrow 0\). On the other hand, from the neighbourhood conditions, we know that for all \(k \ge 0\):

$$\begin{aligned} \bigg \Vert A\varvec{X}_k + \mu _k (y_k-\lambda _k) - b - \frac{\mu _k}{\mu _0}\bar{b}\bigg \Vert _2 \le K_N \frac{\mu _k}{\mu _0} \end{aligned}$$

and

$$\begin{aligned} \bigg \Vert A^\top y_k + \varvec{Z}_k - \mu _k (\varvec{X}_k - \varvec{\varXi }_k) - \varvec{C}- \frac{\mu _k}{\mu _0}\varvec{\bar{C}}\bigg \Vert _2 \le K_N \frac{\mu _k}{\mu _0}. \end{aligned}$$
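
Combining these bounds with Lemma 4.10 (which guarantees \(\alpha _k \ge \bar{\alpha } \ge \frac{\bar{\kappa }}{n^{4}}\)), we obtain, as a short worked consequence,

$$\begin{aligned} \mu _k \le \bigg (1-0.01\frac{\bar{\kappa }}{n^{4}}\bigg )^{k}\mu _0,\qquad \bigg \Vert A\varvec{X}_k + \mu _k (y_k-\lambda _k) - b - \frac{\mu _k}{\mu _0}\bar{b}\bigg \Vert _2 \le K_N \bigg (1-0.01\frac{\bar{\kappa }}{n^{4}}\bigg )^{k}, \end{aligned}$$

and similarly for the dual residual. The first bound is the claimed Q-linear rate for \(\{\mu _k\}\), while the second provides a geometric envelope for the residual norms, which is precisely R-linear convergence.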

This completes the proof. \(\square \)

The polynomial complexity of Algorithm IP–PMM is established in the following theorem.

Theorem 4.2

Let \(\varepsilon \in (0,1)\) be a given error tolerance. Choose a starting point for Algorithm IP–PMM as in (18), such that \(\mu _0 \le \frac{K}{\varepsilon ^{\omega }}\) for some positive constants \(K,\ \omega \). Given Assumptions 1 and 2, there exists an index \(k_0 \ge 0\) with:

$$\begin{aligned} k_0 = O\bigg (n^{4}\big |\log \frac{1}{\varepsilon }\big |\bigg ) \end{aligned}$$

such that the iterates \(\{(X_k,y_k,Z_k)\}\) generated from Algorithm IP–PMM satisfy:

$$\begin{aligned} \mu _k \le \varepsilon ,\ \ \ \ \text {for all}\ k\ge k_0. \end{aligned}$$

Proof

The proof can be found in [19, Theorem 3.8]. \(\square \)
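
For intuition, we briefly sketch the standard counting argument behind this bound, using only the geometric decrease of \(\mu _k\) established after Theorem 4.1; this is a summary sketch rather than the full proof. Since \(\mu _k \le \big (1-0.01\frac{\bar{\kappa }}{n^{4}}\big )^{k}\mu _0\) and \(-\log (1-x) \ge x\) for all \(x \in (0,1)\), the condition \(\mu _k \le \varepsilon \) is guaranteed whenever

$$\begin{aligned} k \ \ge \ \frac{n^{4}}{0.01\,\bar{\kappa }}\,\log \bigg (\frac{\mu _0}{\varepsilon }\bigg ). \end{aligned}$$

Moreover, since \(\mu _0 \le \frac{K}{\varepsilon ^{\omega }}\), we have \(\log \big (\frac{\mu _0}{\varepsilon }\big ) \le \log K + (\omega +1)\log \frac{1}{\varepsilon } = O\big (|\log \frac{1}{\varepsilon }|\big )\), which yields \(k_0 = O\big (n^{4}|\log \frac{1}{\varepsilon }|\big )\).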

Finally, we present the global convergence guarantee of Algorithm IP–PMM.

Theorem 4.3

Suppose that Algorithm IP–PMM terminates when a limit point is reached. Then, if Assumptions 1 and 2 hold, every limit point of \(\{(X_k,y_k,Z_k)\}\) determines a primal-dual solution of the non-regularized pair (P)–(D).

Proof

From Theorem 4.1, we know that \(\{\mu _k\} \rightarrow 0\), and hence, there exists a sub-sequence \({\mathcal {K}} \subseteq {\mathbb {N}}\), such that:

$$\begin{aligned}&\left\{ A\varvec{X}_k + \mu _k (y_k - \lambda _k) -b -\frac{\mu _k}{\mu _0}\bar{b}\right\} _{{\mathcal {K}}} \rightarrow 0,\\&\quad \left\{ A^\top y_k + \varvec{Z}_k - \mu _k (\varvec{X}_k - \varvec{\varXi }_k)-\varvec{C}-\frac{\mu _k}{\mu _0}\varvec{\bar{C}}\right\} _{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

However, since Assumptions 1 and 2 hold, we know from Lemma 4.4 that \(\{(X_k,y_k,Z_k)\}\) is a bounded sequence. Hence, we obtain that:

$$\begin{aligned} \{ A\varvec{X}_k - b\}_{{\mathcal {K}}} \rightarrow 0,\ \{A^\top y_k +\varvec{Z}_k -\varvec{C}\}_{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

One can readily observe that the limit point of the algorithm satisfies the optimality conditions of (P)–(D), since \(\langle X_k, Z_k \rangle \rightarrow 0\) and \(X_k,\ Z_k \in {\mathcal {S}}^n_+\). \(\square \)

Remark 4.1

As mentioned at the end of Sect. 3, we do not study the conditions under which one can guarantee that \(X_k - \varXi _k \rightarrow 0\) and \(y_k - \lambda _k \rightarrow 0\), although this could be possible. This is because the method is shown to converge globally even if this is not the case. Indeed, notice that if one were to choose \(\varXi _0 = 0\) and \(\lambda _0 = 0\), and simply ignore the last conditional statement of Algorithm IP–PMM, the convergence analysis established in this section would still hold. In this case, the method would be interpreted as an interior point-quadratic penalty method, and we could consider the regularization as a diminishing primal-dual Tikhonov regularizer (i.e. a variant of the regularization proposed in [23]).

5 A Sufficient Condition for Strong Duality

We now drop Assumptions 1, 2, in order to analyze the behaviour of the algorithm when solving problems that are strongly (or weakly) infeasible, problems for which strong duality does not hold (i.e. weakly feasible problems), or problems for which the primal or the dual optimal solution is not attained. For a formal definition and a comprehensive study of these types of problems we refer the reader to [14], and the references therein. Below we provide a well-known result, stating that strong duality holds if and only if there exists a KKT point.

Proposition 1

Let (P)–(D) be given. Then, \(\mathrm{val(P)} \ge \mathrm{val(D)}\), where \(\mathrm{val}(\cdot )\) denotes the optimal objective value of a problem. Moreover, \(\mathrm{val(P)} = \mathrm{val(D)}\) and \((X^*,y^*,Z^*)\) is an optimal solution for (P)–(D), if and only if \((X^*,y^*,Z^*)\) satisfies the KKT optimality conditions in (1).

Proof

This is a well-known fact, the proof of which can be found in [24, Proposition 2.1]. \(\square \)

Let us employ the following two premises:

Premise 1

During the iterations of Algorithm IP–PMM, the sequences \(\{\Vert y_k - \lambda _k\Vert _2\}\) and \(\{\Vert X_k - \varXi _k\Vert _F\}\) remain bounded.

Premise 2

There does not exist a primal-dual triple satisfying the KKT conditions in (1), associated with the primal-dual pair (P)–(D).

The following analysis extends the result presented in [19, Section 4], and is based on the developments in [9, Sections 10 & 11]. In what follows, we show that Premises 1 and 2 are contradictory. In other words, if Premise 2 holds (which means that strong duality does not hold for the problem under consideration), then Premise 1 cannot hold, and hence Premise 1 is a sufficient condition for strong duality (and its negation is a necessary condition for Premise 2). We show that if Premise 1 holds, then the algorithm converges to an optimal solution. If Premise 1 does not hold, however, this does not necessarily mean that the problem under consideration is infeasible. For example, this could happen if either (P) or (D) is strongly infeasible, weakly infeasible, or, in some cases, even weakly feasible (e.g. see [14, 24]). As we discuss later, the knowledge that Premise 1 does not hold could be useful in detecting pathological problems.

Lemma 5.1

Given Premise 1, and by assuming that \(\langle X_k, Z_k \rangle > \varepsilon \), for some \(\varepsilon >0\), for all iterations k of Algorithm IP–PMM, the Newton direction produced by (21) is uniformly bounded by a constant dependent only on n and/or m.

Proof

The proof is omitted since it follows exactly the developments in [9, Lemma 10.1]. We notice that the regularization terms (blocks (1,1) and (2,2) in the Jacobian matrix in (21)) depend on \(\mu _k\), which by assumption is always bounded away from zero: \(\mu _k \ge \frac{\varepsilon }{n}\). \(\square \)

In the following Lemma, we prove by contradiction that the parameter \(\mu _k\) of Algorithm IP–PMM converges to zero, given that Premise 1 holds.

Lemma 5.2

Given Premise 1, and a sequence \((X_k,y_k,Z_k) \in {\mathcal {N}}_{\mu _k}(\varXi _k,\lambda _k)\) produced by Algorithm IP–PMM, the sequence \(\{\mu _k\}\) converges to zero.

Proof

Assume, by virtue of contradiction, that \(\mu _k> \varepsilon > 0\), \(\text {for all}\ k \ge 0\). Then, we know (from Lemma 5.1) that the Newton direction obtained by the algorithm at every iteration, after solving (21), will be uniformly bounded by a constant dependent only on n, that is, there exists a positive constant \(K^{\dagger }\), such that \(\Vert (\varDelta X_k,\varDelta y_k,\varDelta Z_k)\Vert _2 \le K^{\dagger }\). We define \(\tilde{r}_p(\alpha )\) and \(\varvec{\tilde{R}}_d(\alpha )\) as in (43) and (44), respectively, for which we know that equalities (46) and (47) hold, respectively. Take any \(k \ge 0\) and define the following functions:

$$\begin{aligned} \begin{aligned} f_1(\alpha ) :=&\ \langle X_k(\alpha ), Z_k(\alpha )\rangle - \left( 1 -\alpha \left( 1-\frac{\sigma _{\min }}{2}\right) \right) \langle X_k, Z_k\rangle ,\\ f_2(\alpha ) :=&\ \gamma _{\mu }\mu _k(\alpha ) - \Vert H_{{P_k}(\alpha )}(X_k(\alpha )Z_k(\alpha )) - \mu _k(\alpha ) I_n\Vert _F, \\ f_3(\alpha ) :=&\ (1-0.01\alpha )\langle X_k,Z_k\rangle - \langle X_k(\alpha ), Z_k(\alpha )\rangle ,\\ g_{2}(\alpha ) :=&\ \frac{\mu _k(\alpha )}{\mu _0}K_N - \Vert (\tilde{r}_p(\alpha ),\varvec{\tilde{R}}_d(\alpha ))\Vert _2, \end{aligned} \end{aligned}$$

where \(\mu _k(\alpha ) = \frac{\langle X_k + \alpha \varDelta X_k, Z_k + \alpha \varDelta Z_k\rangle }{n}\), \((X_k(\alpha ),y_k(\alpha ),Z_k(\alpha )) = (X_k + \alpha \varDelta X_k,y_k + \alpha \varDelta y_k,Z_k + \alpha \varDelta Z_k)\). We would like to show that there exists \(\alpha ^* > 0\), such that:

$$\begin{aligned} f_1(\alpha ) \ge 0,\quad f_2(\alpha ) \ge 0,\quad f_3(\alpha ) \ge 0,\quad g_2(\alpha ) \ge 0,\ \text {for all}\ \alpha \in (0,\alpha ^*]. \end{aligned}$$

These conditions model the requirement that the next iterate of Algorithm IP–PMM must lie in the updated neighbourhood \({\mathcal {N}}_{\mu _{k+1}}(\varXi _k,\lambda _{k})\) (notice however that the restriction with respect to the semi-norm defined in (17) is not required here, and indeed it cannot be incorporated unless \({\text {rank}}(A) = m\)). Since Algorithm IP–PMM updates the parameters \(\lambda _k,\ \varXi _k\) only if the selected new iterate belongs to the new neighbourhood, defined using the updated parameters (again, ignoring the restrictions with respect to the semi-norm), it suffices to show that \((X_{k+1},y_{k+1},Z_{k+1}) \in {\mathcal {N}}_{\mu _{k+1}}(\varXi _k,\lambda _{k})\).

Proving the existence of \(\alpha ^* > 0\), such that each of the aforementioned functions is non-negative, follows exactly the developments in Lemmas 4.9–4.10, with the only difference being that the bounds on the directions are not explicitly specified in this case. Using the same methodology as in the aforementioned lemmas, while keeping in mind our assumption, namely \(\langle X_k, Z_k \rangle > \varepsilon \), we can show that:

$$\begin{aligned} \alpha ^* :=\min \bigg \{1,\ \frac{\sigma _{\min }\varepsilon }{2(K^{\dagger })^2},\ \frac{(1-\gamma _{\mu })\sigma _{\min }\gamma _{\mu }\varepsilon }{2n(K^{\dagger })^2},\ \frac{0.49\varepsilon }{2(K^{\dagger })^2},\ \frac{\sigma _{\min }K_N \varepsilon }{4\mu _0 \xi _2} \bigg \}, \end{aligned}$$
(50)

where \(\xi _2\) is a bounded constant, defined as in (48), and dependent on \(K^{\dagger }\). However, using the inequality:

$$\begin{aligned} \mu _{k+1} \le (1-0.01 \alpha )\mu _k,\ \text {for all}\ \alpha \in [0,\alpha ^*] \end{aligned}$$

we get that \(\mu _k \rightarrow 0\), which contradicts our assumption that \(\mu _k > \varepsilon ,\ \forall \ k\ge 0\), and completes the proof. \(\square \)

Finally, in the following theorem, we derive a necessary condition for lack of strong duality.

Theorem 5.1

Suppose that Premise 2 holds, i.e. there does not exist a KKT triple for the pair (P)–(D). Then Premise 1 fails to hold.

Proof

By virtue of contradiction, let Premise 1 hold. In Lemma 5.2, we proved that given Premise 1, Algorithm IP–PMM produces iterates that belong to the neighbourhood (19) and \(\mu _k \rightarrow 0\). But from the neighbourhood conditions we can observe that:

$$\begin{aligned} \left\| A\varvec{X}_k + \mu _k(y_k - \lambda _k) - b - \frac{\mu _k}{\mu _0}\bar{b} \right\| _2 \le K_N\frac{\mu _k}{\mu _0}, \end{aligned}$$

and

$$\begin{aligned} \left\| A^\top y_k + \varvec{Z}_k - \mu _k(\varvec{X}_k - \varvec{\varXi }_k)-\varvec{C}-\frac{\mu _k}{\mu _0}\varvec{\bar{C}}\right\| _2 \le K_N \frac{\mu _k}{\mu _0}. \end{aligned}$$

Hence, we can choose a sub-sequence \({\mathcal {K}} \subseteq {\mathbb {N}}\), for which:

$$\begin{aligned}&\{A\varvec{X}_k + \mu _k(y_k - \lambda _k) - b - \frac{\mu _k}{\mu _0}\bar{b} \}_{{\mathcal {K}}} \rightarrow 0,\ \text {and} \\&\quad \{A^\top y_k + \varvec{Z}_k - \mu _k(\varvec{X}_k - \varvec{\varXi }_k)-\varvec{C}-\frac{\mu _k}{\mu _0}\varvec{\bar{C}}\}_{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

But since \(\Vert y_k-\lambda _k\Vert _2\) and \(\Vert X_k - \varXi _k\Vert _F\) are bounded, while \(\mu _k \rightarrow 0\), we have that:

$$\begin{aligned} \{A\varvec{X}_k - b\}_{{\mathcal {K}}} \rightarrow 0,\ \{\varvec{C} - A^\top y_k - \varvec{Z}_k\}_{{\mathcal {K}}} \rightarrow 0,\ \text {and}\ \{\langle X_k, Z_k\rangle \}_{{\mathcal {K}}} \rightarrow 0. \end{aligned}$$

This contradicts Premise 2, i.e. that the pair (P)–(D) does not have a KKT triple, and completes the proof. \(\square \)

In the previous theorem, we proved that the negation of Premise 1 is a necessary condition for Premise 2. Nevertheless, this does not mean that the condition is also sufficient. In order to obtain a more reliable algorithmic test for lack of strong duality, we have to use the properties of Algorithm IP–PMM. In particular, we can notice that if there does not exist a KKT point, then the PMM sub-problems will stop being updated after a finite number of iterations. In that case, we know from Theorem 5.1 that the sequence \(\Vert (\varvec{X}_k- \varvec{\varXi }_k,y_k -\lambda _k)\Vert _2\) will grow unbounded. Hence, we can define a maximum number of iterations per PMM sub-problem, say \(k_{\dagger } > 0\), as well as a very large constant \(K_{\dagger }\). Then, if \(\Vert (\varvec{X}_k- \varvec{\varXi }_k,y_k -\lambda _k)\Vert _2 > K_{\dagger }\) and \(k_{in} \ge k_{\dagger }\) (where \(k_{in}\) counts the number of IPM iterations per PMM sub-problem), the algorithm is terminated with a guess that there does not exist a KKT point for (P)–(D). A sketch of such a test is given below.
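
The following minimal sketch illustrates this heuristic test in isolation. The thresholds \(k_{\dagger }\) and \(K_{\dagger }\), the function name, and the dummy data are illustrative assumptions rather than prescriptions of the analysis above; the test simply combines the two conditions just described.

```python
import numpy as np

# A minimal sketch of the heuristic "no KKT point" test described above.
# k_dagger and K_dagger are user-chosen thresholds (illustrative values here);
# X, Xi are n-by-n symmetric matrices, y, lam are m-vectors.

def suspect_no_kkt_point(X, Xi, y, lam, k_in, k_dagger=200, K_dagger=1e10):
    """Return True if the proximal estimates look stalled and divergent,
    i.e. ||(vec(X - Xi), y - lam)||_2 > K_dagger and k_in >= k_dagger."""
    distance = np.sqrt(np.linalg.norm(X - Xi, ord='fro')**2
                       + np.linalg.norm(y - lam)**2)
    return distance > K_dagger and k_in >= k_dagger

# Example usage with dummy data (purely illustrative):
if __name__ == "__main__":
    n, m = 5, 3
    X, Xi = np.eye(n), np.zeros((n, n))
    y, lam = np.zeros(m), np.zeros(m)
    print(suspect_no_kkt_point(X, Xi, y, lam, k_in=10))  # expected: False
```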

Remark 5.1

Let us notice that the analysis in Sect. 4 employs the standard assumptions used when analyzing a non-regularized IPM. However, the method could still be useful if these assumptions were not met. Indeed, if for example the constraint matrix was not of full row rank, one could still prove global convergence of the method, using the methodology employed in this section by assuming that Premise 1 holds and Premise 2 does not (or by following the developments in [9]). Furthermore, in practice the method would not encounter any numerical issues with the inversion of the Newton system (see [2]). Nevertheless, showing polynomial complexity in this case is still an open problem. The aim of this work is to show that under the standard Assumptions 1, 2, Algorithm IP–PMM is able to enjoy polynomial complexity, while having to solve better conditioned systems than those solved by standard IPMs at each iteration, thus ensuring better stability (and as a result better robustness and potentially better efficiency).

6 Conclusions

In this paper we developed and analyzed an interior point-proximal method of multipliers, suitable for solving linear positive semi-definite programs, without requiring the exact solution of the associated Newton systems. By generalizing appropriately some previous results on convex quadratic programming, we show that IP-PMM inherits the polynomial complexity of standard non-regularized IPM schemes when applied to SDP problems, under standard assumptions, while having to approximately solve better-conditioned Newton systems, compared to their non-regularized counterparts. Furthermore, we provide a tuning for the proximal penalty parameters based on the well-studied barrier parameter, which can be used to guide any subsequent implementation of the method. Finally, we study the behaviour of the algorithm when applied to problems for which no KKT point exists, and give a necessary condition which can be used to construct detection mechanisms for identifying such pathological cases.

A future research direction would be to construct a robust and efficient implementation of the method, which should utilize some Krylov subspace solver alongside an appropriate preconditioner for the solution of the associated linear systems. Given previous implementations of IP-PMM for other classes of problems appearing in the literature, we expect that the theory can successfully guide the implementation, yielding a competitive, robust, and efficient method.