1 Introduction

Semidefinite programming problems (SDPs) are a generalization of linear programming problems (LPs). While capturing a much larger set of problems, SDPs are solvable up to fixed precision in polynomial time in terms of the input data, and linear in terms of the precision [17]; see [10] for the complexity in the Turing model of computation.

Practical computation is, however, more complicated. While we are able to solve linear programs with millions of variables and constraints routinely, SDPs become intractable already for a few tens of thousands of constraints and for \(n\times n\) matrix variables of the order \(n \approx 1,000\). The reason is that each iteration of a typical interior point algorithm for SDP requires \(\mathcal {O}(n^3m+n^2m^2 + m^3)\) operations, where n is the size of the matrix variable and m is the number of equality constraints; see e.g. [15] or [3]. However, solving large instances of SDPs is of growing interest, due to applications in power flow problems on large power grids, SDP-based hierarchies for polynomial and combinatorial problems, etc. (see [13, 23, 24]). In the following we will revisit a relaxation of a given SDP, where the cone of positive semidefinite matrices is replaced by a more tractable cone, namely the cone of matrices of constant factor width [7]. The simplest examples of matrices of constant factor width are non-negative diagonal matrices (corresponding to linear programs), and scaled diagonally dominant matrices (corresponding to second order cone programming) [4]. We then review how iteratively rescaling the cone and solving the given optimization problem over this new set leads to a non-increasing sequence of optimal values lower bounded by the optimum of the sought SDP. This iterative procedure, due to [1], does not in general lead to a convergent algorithm. However, its essence can be used to construct a convergent predictor-corrector interior point method, as was done in [19]. Our paper is inspired by ideas from [1, 2, 4, 5, 19]. In particular, we will extend the results in [19], and give a more concise complexity analysis in our extended setting.

1.1 Iterative Approximation Scheme

Let the set of symmetric \(n \times n\) matrices be given by \(\mathbb {S}^n\), where \(n \in \mathbb {N}\) is a positive integer. We write [m] for the set \(\{1, 2, \ldots , m\}\), where \(m \in \mathbb {N}\). Consider a set \(\{A_i \in \mathbb {S}^{n}: i \in [m] \}\) of symmetric data matrices and define the linear operator

$$ \mathcal {A}(X) = (\langle A_1,X \rangle ,\ldots ,\langle A_m,X\rangle ) \in \mathbb {R}^m, $$

where \(\langle X, Y \rangle := \textrm{tr}(XY)\) for \(X,Y \in \mathbb {S}^n\). Furthermore, define for \(b \in \mathbb {R}^m\) the affine subspace

$$\begin{aligned} L = \{X \in \mathbb {S}^n : \mathcal {A}(X) = b\}. \end{aligned}$$

Consider the following semidefinite program

$$\begin{aligned} v^*_{\textrm{SDP}} = \inf \left\{ \langle A_0, X \rangle : \mathcal {A}(X) = b, X \in \mathbb {S}^n_+\right\} , \end{aligned}$$
(1)

which we assume to be strictly feasible. Replacing the cone of positive semidefinite (psd) matrices in (1) by a cone \(\mathcal {K} \subseteq \mathbb {S}^n_+\), which is more tractable, leads to the following program

$$\begin{aligned} v_{\mathcal {K}} = \inf \left\{ \langle A_0, X \rangle : \mathcal {A}(X) = b , X \in \mathcal {K}\right\} ,\quad \text { where } \mathcal {K} \subseteq \mathbb {S}^n_+. \end{aligned}$$
(2)

Clearly, \(v_{\mathcal {K}} \ge v^*_{\textrm{SDP}}\). The quality of the approximation depends on the chosen cone \(\mathcal {K}\). In [4], while focusing on sums-of-squares optimization, the authors consider the cones of diagonally dominant and scaled diagonally dominant matrices. Ahmadi and Hall developed the idea of replacing the psd cone by a simpler cone further in [1], by leveraging an optimal solution of the relaxation. Essentially, the idea is as follows. Define the feasible set for (1) as

$$ \mathcal {F}_{\textrm{SDP}} = \left\{ X \succeq 0 : \mathcal {A}(X) = b \right\} . $$

We will consider a sequence of strictly feasible points for (2), denoted by \(X_\ell \) for \(\ell = 0, 1, \ldots \). Since \(X_\ell \succeq 0\), the matrix \(X_\ell ^{1/2}\) is well-defined. One can update the data matrices in the following way

$$ A_i^{(\ell )} = X^{1/2}_\ell A_i X^{1/2}_\ell \quad (i \in \{0,1,\ldots ,m\},\, \ell = 0,1, \ldots ), $$

giving rise to a new linear operator

$$ \mathcal {A}^{(\ell )}(X) = (\langle A_1^{(\ell )}, X\rangle ,\ldots ,\langle A_m^{(\ell )}, X\rangle ) \in \mathbb {R}^m. $$

We may also refer to this operation as rescaling with respect to \(X_\ell \). Via this rescaling one obtains the following sequence of reformulations of (1):

$$\begin{aligned} v^{*}_{\textrm{SDP}} =\min \left\{ \langle A_0^{(\ell )}, X \rangle : \mathcal {A}^{(\ell )}(X) = b, X \in \mathbb {S}^n_+\right\} , \end{aligned}$$
(3)

whose feasible set we define as

$$ \mathcal {F}_{\textrm{SDP}_\ell } = \left\{ X \succeq 0 : \mathcal {A}^{(\ell )}(X) = b \right\} . $$

For each \(\ell \) the identity matrix is feasible, i.e., we have \(X = I \in \mathcal {F}_{\textrm{SDP}_\ell }\). To see this, note that for all \(i \in [m]\) we have

$$ \langle A_i^{(\ell )}, I \rangle = \langle (X_\ell )^{\frac{1}{2}}A_i (X_\ell )^{\frac{1}{2}}, I \rangle = \langle A_i ,X_\ell \rangle = b_i. $$

Similarly, the identity leads to the same objective value in (3) as \(X_\ell \) in (2). Let \(X_0\) be an optimal solution to (2). Rescaling with respect to \(X_0\) we find by the same reasoning that \(v_{\mathcal {K}}^{(0)} \le v_{\mathcal {K}}\), where

$$\begin{aligned} v^{(\ell )}_\mathcal {K} =\min \left\{ \langle A_0^{(\ell )}, X \rangle : \mathcal {A}^{(\ell )}(X) = b, X \in \mathcal {K} \right\} . \end{aligned}$$
(4)

Reiterating this procedure leads to a non-increasing sequence of values \(\left\{ v^{(\ell )}_{\mathcal {K}} \right\} _{\ell \in \mathbb {N}}\) lower bounded by \(v^{*}_{\textrm{SDP}}\). Unfortunately, this procedure does not always converge to the true optimum of (1) if \(\mathcal {K}\) is a cone of matrices of constant factor width, as mentioned in [19]. Indeed, it can happen that \(\liminf _{\ell \rightarrow \infty } v_{\mathcal {K}}^{(\ell )} > v^*_{\textrm{SDP}}\). The rest of this paper is devoted to the development and analysis of an interior point algorithm, which converges to the optimal value \(v^*_{\textrm{SDP}}\). We thereby refine and extend results from [19], where a different interior point method (based on the factor width cone) was introduced. Our contribution is to give a concise polynomial-time convergence analysis, since the iteration complexity bounds given in [19] involve constants that depend on the data, but the dependence is not made explicit; see e.g. [19, Theorem 4.12]. Moreover, the authors of [19] only consider factor width at most 2 (i.e., the scaled diagonally dominant matrices), while we analyse the general case.
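To fix ideas, here is a minimal sketch of one rescaling step in Python/NumPy on synthetic data (all names and the random instance are illustrative, not part of any solver): it rescales the data matrices with respect to a strictly feasible \(X_\ell \) and checks that the identity matrix is feasible for the rescaled constraints with the same objective value as \(X_\ell \).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3

# Random symmetric data matrices A_0 (objective) and A_1, ..., A_m (constraints).
A = [(lambda M: (M + M.T) / 2)(rng.standard_normal((n, n))) for _ in range(m + 1)]

# A strictly feasible X_ell: any positive definite matrix, with b := A(X_ell),
# so that X_ell is feasible by construction.
G = rng.standard_normal((n, n))
X_ell = G @ G.T + n * np.eye(n)
b = np.array([np.trace(A[i] @ X_ell) for i in range(1, m + 1)])

# Rescaling: A_i^{(ell)} = X_ell^{1/2} A_i X_ell^{1/2}.
w, V = np.linalg.eigh(X_ell)
X_half = V @ np.diag(np.sqrt(w)) @ V.T
A_resc = [X_half @ Ai @ X_half for Ai in A]

# The identity is feasible for the rescaled constraints ...
assert np.allclose([np.trace(Ai) for Ai in A_resc[1:]], b)
# ... and attains the same objective value as X_ell in the original problem.
assert np.isclose(np.trace(A_resc[0]), np.trace(A[0] @ X_ell))
```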

1.2 Outline of the Paper

This paper is conceptually divided into two parts. The first part contains Sections 1 and 2 and is devoted to introducing the setting as well as the algorithm. Our aim with the first part is to convey the concept in a comprehensible way. The second part consists of the remaining Sections 3–6. It is more technical and contains the derivation of objects used in the algorithm as well as the formal complexity analysis.

1.3 The Factor Width Cone

Fix \(n \in \mathbb {N}\). The cone of \(n\times n\) matrices of factor width k, denoted by \(\textrm{FW}_n(k)\), is defined as

$$ \textrm{FW}_n(k) = \left\{ Y \in \mathbb {S}^n : Y = \sum _{i \in \mathbb {N}} x_i x_i^T~ \text { for } x_i \in \mathbb {R}^n,~ |\text {supp}(x_i)| \le k,~\forall i\,\right\} . $$

The notion of factor width was first used in [7], where the authors proved that \(\textrm{FW}_n(2)\) is the cone of scaled diagonally dominant matrices. Trivially, \(\textrm{FW}_n(1)\) is the cone of non-negative \(n \times n\) diagonal matrices. Clearly, we have that

$$ \textrm{FW}_n(k) \subseteq \textrm{FW}_n(k+1) \subseteq \mathbb {S}^n_+ \quad \forall k \in [n-1]. $$

Moreover, \(\textrm{FW}_n(n) = \mathbb {S}^n_+\). It is easy to see these cones are proper. As they define an inner approximation of the cone \(\mathbb {S}^n_+\) we may use them in the aforementioned iterative scheme. Define \(\mathcal {J} := \{J \subset [n] : |J| = k\}\) for fixed \(n,k \in \mathbb {N}\) with \(k \vert n\), and set

$$ \mathbb {S}^{(n,k)} := (\mathbb {S}^k)^{\mathcal {J}}\quad \text { and }\quad \mathbb {S}_+^{(n,k)} := (\mathbb {S}_+^k)^{\mathcal {J}}. $$

An optimization problem over the cone \(\textrm{FW}_n(k)\) may be formulated as an optimization problem over the cone product \(\mathbb {S}_+^{(n,k)}\). To see this we need to consider principal submatrices. For a matrix \(S \in \mathbb {R}^{n \times n}\) we define the principal submatrix \(S_{J,J}\) for \(J \subseteq [n]\) to be the restriction of S to rows and columns whose indices appear in J. Furthermore, for a set \(J = \{ i_1, \dots , i_{|J|} \} \subseteq [n]\) and a matrix \(S \in \mathbb {R}^{ J \times J}\) we define the \(n \times n\) matrix \(S_J^{\rightarrow n}\) as follows for \(i,j \in [n]\)

$$\begin{aligned} (S_J^{\rightarrow n})_{i,j} = \left\{ \begin{array}{ll} S_{p,q} &{}\quad \text {if }~i = i_p,\ j = i_q,\\ 0 &{}\quad \text {otherwise.} \end{array}\right. \end{aligned}$$
(5)

In other words, \(S_J^{\rightarrow n}\) has \(S\) as its principal submatrix indexed by J, and zeros elsewhere. Now, to write a program over \(\textrm{FW}_n(k)\) as an SDP, note the following observation. It is easy to see that, for any \(X \in \textrm{FW}_n(k)\), we have

$$ X = \sum _{J \in \mathcal {J}} Y_J^{\rightarrow n} $$

for suitable \(Y_J \in \mathbb {S}^k_+\) indexed by \(\mathcal {J}\). Thus, we can write

$$\begin{aligned} \inf \{\langle C, X \rangle : \mathcal {A}(X) = b, X \in \textrm{FW}_n(k)\} \end{aligned}$$
(6)

as

$$\begin{aligned} \inf \left\{ \sum _{|J|=k}\langle C_{J,J}, Y_J \rangle : \sum _{|J|=k} \langle (A_i)_{J,J}, Y_J \rangle = b_i, Y_J \in \mathbb {S}^k_+,~ \forall |J| = k\right\} . \end{aligned}$$
(7)
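To illustrate the reformulation (6)–(7), the following sketch (Python/NumPy, with synthetic data; the helper lift plays the role of the map in (5)) assembles a matrix of factor width k from psd \(k\times k\) blocks \(Y_J\) and checks that the objective \(\langle C, X \rangle \) equals the block objective appearing in (7).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, k = 5, 2
J_list = list(combinations(range(n), k))   # all index sets J of size k

def lift(S, J, n):
    """Place the k x k matrix S as the principal submatrix indexed by J of an n x n matrix."""
    out = np.zeros((n, n))
    out[np.ix_(J, J)] = S
    return out

# Random psd blocks Y_J; their lifted sum X lies in FW_n(k) by definition.
Y = {}
for J in J_list:
    G = rng.standard_normal((k, k))
    Y[J] = G @ G.T
X = sum(lift(Y[J], J, n) for J in J_list)

# <C, X> equals the sum of the block objectives <C_{J,J}, Y_J> from (7).
C = rng.standard_normal((n, n)); C = (C + C.T) / 2
lhs = np.trace(C @ X)
rhs = sum(np.trace(C[np.ix_(J, J)] @ Y[J]) for J in J_list)
assert np.isclose(lhs, rhs)
```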

It is straightforward to show that the dual cone is given by

$$ \textrm{FW}_n(k)^*= \{S \in \mathbb {S}^n : S_{J,J} \succeq 0 ~\text { for } J \subseteq [n], |J| = k \}. $$

The dual cone has been studied in the context of semidefinite optimization in [8], where it was shown that the distance between \(\text {FW}_n(k)^*\) and \(\mathbb {S}^n_+\) in the Frobenius norm can be upper bounded by \(\frac{n-k}{n+k-2}\) for matrices of trace 1. For \(k\ge 3n/4\) and \(n\ge 97\) this bound can be improved to \(O(n^{-3/2})\) (see [8]).
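Membership in the dual cone can be tested directly from this characterization by checking all \(k\times k\) principal submatrices; a brute-force sketch (only sensible for small n, names illustrative):

```python
import numpy as np
from itertools import combinations

def in_fw_dual(S, k, tol=1e-10):
    """Check S in FW_n(k)^*: every k x k principal submatrix must be psd."""
    n = S.shape[0]
    for J in combinations(range(n), k):
        if np.linalg.eigvalsh(S[np.ix_(J, J)]).min() < -tol:
            return False
    return True

# Example: a matrix that lies in FW_3(2)^* but is not positive semidefinite.
S = np.array([[1.0,  0.9, -0.9],
              [0.9,  1.0,  0.9],
              [-0.9, 0.9,  1.0]])
print(in_fw_dual(S, 2), np.linalg.eigvalsh(S).min() >= 0)   # True False
```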

2 Interior Point Methods and the Central Path

Interior point methods (IPMs) are among the most commonly used algorithms to solve conic optimization problems in practice. Notable software for IPMs include Mosek [16], CSDP [9], SDPA [12, 22], SeDuMi [20] and SDPT3 [21]. In the remainder of this section, we will closely follow the notation used in [18], since we will make use of several results from this book. Consider the following conic optimization problem for a proper convex cone \(\mathcal {K} \subset \mathbb {R}^n\):

$$\begin{aligned} \min \left\{ \langle c, x \rangle : \langle a_i, x \rangle = b_i, i \in [m], x \in \mathcal {K}\right\} . \end{aligned}$$

In IPMs the cone membership constraint is replaced by adding a convex penalty function f to the objective. This function f is a so-called self-concordant barrier function. Loosely speaking, the function f returns larger values the closer the input is to the boundary of the cone and tends to infinity as the boundary is approached. In order to formally define self-concordant barrier functionals, let \(f : \mathbb {R}^n \supset D_f \rightarrow \mathbb {R}\) be such that its Hessian H(x) is positive definite (pd) for all \(x \in D_f\). With respect to this function, we can define a local inner product as follows

$$ \langle u,v \rangle _x := \langle u, H(x)v \rangle , $$

where \(u,v \in \mathbb {R}^n\) and \(\langle \cdot , \cdot \rangle \) is some reference inner product. Let \(B_x(y,r)\) be the open ball centered at y with radius \(r>0\), measured in the norm \(\Vert \cdot \Vert _x\) arising from the local inner product at x.

Definition 1

(see [18, Section 2.2.1]) A functional f is called (strongly non-degenerate) self-concordant if for all \(x \in D_f\) we have that \(B_x(x,1) \subset D_f\) and whenever \(y \in B_x(x,1)\) we have

$$ 1-\Vert y-x\Vert _x \le \frac{\Vert v\Vert _y}{\Vert v\Vert _x} \le \frac{1}{1-\Vert y-x\Vert _x} \quad \text { for all }v \ne 0. $$

A functional f is called a self-concordant barrier functional if f is self-concordant and additionally satisfies

$$ \vartheta _f := \sup _{x \in D_f} \Vert H(x)^{-1}g(x)\Vert _x^2<\infty , $$

where g(x) is the gradient of f.

We refer to \(\vartheta _f\) as the complexity value of f (see [18, p. 35]), which will become crucial in our complexity analysis. Henceforth, let f be a self-concordant barrier functional for \(\mathcal {K}\) and consider the following family of problems for positive \(\eta \in \mathbb {R}_+\)

$$\begin{aligned} z_{\eta } = \mathop {\text {argmin}}\limits _{x} \left\{ \eta \langle c, x \rangle + f(x) \;:\; \langle a_i, x \rangle = b_i, \ i \in [m]\right\} . \end{aligned}$$
(8)

The minimizers \(z_\eta \) of (8) define a curve, parametrized by \(\eta \), in the interior of \(\mathcal {K}\). This curve is called the central path. For \(\eta \rightarrow \infty \) one can show that \(z_\eta \rightarrow x^*\), where \(x^*\) denotes an optimal solution. Interior point methods work by successively approximating a sequence of points \(\{z_{\eta _i} : i = 1, \dots , N\}\) on the central path, where \(\eta _1< \eta _2 < \cdots \) such that \(z_{\eta _N}\) is within the desired distance to the optimal solution. The type of interior point method we consider is an adaptation of the (primal) predictor-corrector method (see [18, § 2.4.4]). This method uses the ordinary affine scaling direction to produce a new point inside the cone with decreased objective value. Afterwards, a series of corrector steps is performed to obtain feasible solutions with the same objective value that lie increasingly close to the central path. Interior point methods typically rely on Newton’s method in each step, where the convergence rate depends on the so-called Newton decrement.

Definition 2

If \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) has a gradient g(x) and positive definite Hessian \(H(x) \succ 0\) at a point x in its domain, then the Newton decrement of f at x is defined as

$$ \Delta (f,x) = \sqrt{\langle g(x), H^{-1}(x)g(x)\rangle } = \Vert H^{-1}(x)g(x)\Vert _x. $$

For self-concordant functions f, a sufficiently small value of \(\Delta (f,x)\), e.g., \(\Delta (f,x) < 1/9\), implies that x is close to the minimizer of f (cf. [18, Theorem 2.2.5]).

Suppose we are given a starting point \(x_0\), which is close to \(z_{\eta _0}\) for some \(\eta _0 \in \mathbb {R}\). The affine-scaling direction is given by \(- H(x_0)^{-1}c\) and points approximately tangential to the central path in the direction of decreasing the objective value \(\langle c, x \rangle \) (\(-H^{-1}(z_{\eta _0})c\) is exactly tangential to the central path). The predictor step moves from \(x_0\) a fixed fraction \(\sigma \in (0,1)\) of the distance towards the boundary of the feasible set in the affine-scaling direction, thereby producing a new point \(x_1\) satisfying \(\langle c, x_1 \rangle < \langle c, x_0 \rangle \). The new point \(x_1\) is not necessarily close to the central path. The algorithm then proceeds to produce a sequence of feasible points \(x_2, x_3, \dots \) satisfying \(\langle c, x_1\rangle = \langle c , x_i \rangle \) for \(i = 2, 3, \dots \) while each \(x_i\) for \(i = 2, 3, \dots \) is closer to the central path than its predecessor \(x_{i-1}\). In other words, the algorithm targets the point \(z_{\eta _1}\) on the central path with the same objective value as \(x_1\) and produces a sequence of points converging to \(z_{\eta _1}\). Once an \(x_j\) is found such that \(\Delta (f, x_j) < 1/9\), the next predictor step is taken. This procedure is repeated until an \(\varepsilon \)-optimal solution is found. The corrector phase works by minimizing the self-concordant barrier restricted to the feasible affine space intersected with the set of all \(x \in \mathbb {R}^n\) such that \(\langle c, x\rangle = \langle c, x_i\rangle \), where \(x_i\) is the point produced by the most recent predictor step. This minimization problem is solved iteratively by performing line searches along the direction given by the Newton step for the restricted functional. We provide a visualization of the predictor-corrector method in Fig. 1.

2.1 Newton Decrements for Functions Restricted to Subspaces

If a self-concordant function f is restricted to a (translated) linear subspace L, and denoted by \(f_{\vert L}\), then the Newton decrement at x becomes

$$ \Delta \left( f_{\vert L},x\right) = \Vert P_{L,x}H^{-1}(x)g(x)\Vert _x, $$

where \(\Vert \cdot \Vert _x\) is the norm induced by the inner product \(\langle u,v\rangle _x = \langle u, H(x)v \rangle \), and \(P_{L,x}\) is the orthogonal projection onto L for the \(\Vert \cdot \Vert _x\) norm; see [18, § 1.6].

Note that we have

$$\begin{aligned} \Delta (f, x)&= \langle g(x), H^{-1}(x)g(x)\rangle ^{1/2} = \langle g(x), - n(x)\rangle ^{1/2} \\&= \langle n(x), n(x)\rangle _{x}^{1/2} = \Vert n(x)\Vert _{x} = \sup _{\Vert d\Vert _{x} = 1} \langle d, n(x)\rangle _{x}, \end{aligned}$$

where n(x) is the Newton step at x, i.e., \(n(x)= -H(x)^{-1}g(x)\). Hence, restricting the function f to a subspace L we find

$$\begin{aligned} \Delta \left( f_{|_L}, x\right)&= \underset{\Vert d\Vert _{x} = 1}{\sup } \langle d, P_{L,x}\, n(x) \rangle _{x} = \underset{\Vert d\Vert _{x} = 1,\ d \in L}{\sup } \langle d, n(x) \rangle _{x} \nonumber \\&= \underset{0 \ne d \in L}{\sup }\frac{\langle d, n(x) \rangle _{x}}{\Vert d\Vert _{x}} \ge \frac{\langle d, n(x) \rangle _{x}}{\Vert d\Vert _{x}}\quad \text { for all } d \in L\setminus \{0\}. \end{aligned}$$
(9)
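For concreteness, a small sketch (Python/NumPy, for a function of a vector variable with explicit gradient and Hessian; all names are illustrative) of how the restricted Newton decrement \(\Delta (f_{\vert L}, x)\) can be computed: project the Newton step onto \(\text {null}(A)\) orthogonally with respect to \(\langle \cdot ,\cdot \rangle _x\) and measure it in the local norm.

```python
import numpy as np

def restricted_newton_decrement(grad, hess, A, x):
    """Newton decrement of f restricted to {y : A y = A x}, evaluated at x."""
    g, H = grad(x), hess(x)
    n_full = -np.linalg.solve(H, g)                 # unrestricted Newton step
    # H-orthogonal projection of n_full onto null(A):
    Hinv_At = np.linalg.solve(H, A.T)
    n_restr = n_full - Hinv_At @ np.linalg.solve(A @ Hinv_At, A @ n_full)
    return np.sqrt(n_restr @ H @ n_restr)           # local norm ||n_restr||_x

# Example: log-barrier of the nonnegative orthant, f(x) = -sum(log x_i).
grad = lambda x: -1.0 / x
hess = lambda x: np.diag(1.0 / x**2)
A = np.array([[1.0, 1.0, 1.0]])
x = np.array([0.2, 0.3, 0.5])
print(restricted_newton_decrement(grad, hess, A, x))
```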
Fig. 1: Visualization of predictor-corrector method. Initial feasible solution close to central path (red) is given by \(x_1\). Algorithm performs predictor step returning \(x_2\). Corrector steps are taken until point close enough to central path (\(x_4\)) is found. Next predictor step returns \(x_5\). Corrector steps are taken until \(x_8\) is found, which is close enough to central path to perform next predictor step returning \(x_9\). After one corrector step the final point \(x_{10}\) is \(\varepsilon \)-close to \(x^*\).

2.2 A Predictor-Corrector Method Using FW(k)

In this subsection we propose our algorithm which makes use of the rescaling introduced in Section 1.1; see Algorithm 1 below. Our aim is to provide a comprehensible exposition, while the details are postponed to the second part of the paper, beginning with Section 3.

Algorithm 1 is an adaptation of the predictor-corrector method as described in [18, Section 2.4.4]. Before describing the algorithm in detail we fix some notation. Let \(\mathcal {Y} \in \mathbb {S}^{(n,k)}\) be a collection of \({{n}\atopwithdelims (){k}}\) matrices of size \(k \times k\). We define the operator \(\Psi \) as

$$ \Psi (\mathcal {Y}) = \sum _{ J \in \mathcal {J}}Y^{\rightarrow n}_J, $$

where we made use of the notation defined in (5). Hence, if \(\mathcal {Y}\) is a collection of positive semidefinite \(k\times k\) matrices, then \(\Psi (\mathcal {Y}) \in \text {FW}_n(k)\). Furthermore, let

$$\begin{aligned} \mathcal {Y}_0 = \left\{ Y_J = {{n-1}\atopwithdelims (){k-1}}^{-1} I_{k \times k} : J \subset [n], |J| = k \right\} , \end{aligned}$$
(10)

so that \(\Psi (\mathcal {Y}_0) = I\). Now let \(X_\ell \) be a strictly feasible solution to a problem of form (1) and rescale the data matrices with respect to \(X_\ell \). Recall that the feasible set of the resulting SDP is contained in the following affine space

$$\begin{aligned} L_\ell = \left\{ X \in \mathbb {S}^n : \mathcal {A}^{(\ell )}(X) = b\right\} . \end{aligned}$$
(11)

Likewise, the feasible set of the factor width relaxation written over \(\mathbb {S}^{(n,k)}_+\) (cf. (7)) is constrained to lie in the affine space

$$\begin{aligned} L^{\Psi }_{\ell } = \left\{ \mathcal {Y} \in \mathbb {S}^{(n,k)} : (\mathcal {A}^{(\ell )} \circ \Psi ) (\mathcal {Y}) = b\right\} . \end{aligned}$$

Note that \(I \in L_\ell \) and \(\mathcal {Y}_0 \in L^{\Psi }_\ell \). We emphasize that, by definition, for any element \(\mathcal {Y} \in L_{\ell }^{\Psi }\) we have \(\Psi (\mathcal {Y}) \in L_\ell \).

2.3 Main Method

The algorithm requires a feasible starting point \(X_0\) close to the central path, which is used in the first rescaling step. We also require a desired accuracy \(\varepsilon > 0\), as well as a \(\sigma \in (0,1)\) used in the predictor step. In the following let \(f^{\text {FW}(k)}\) be a self-concordant barrier function for \(\mathbb {S}^{(n,k)}_+\) (we postpone its derivation to Section 3; for now we assume it exists and is efficiently computable). In the algorithm we denote the restriction of \(f^{\text {FW}(k)}\) to the subspace \(\text {null}(\mathcal {A}^{(\ell )}\circ \Psi )\) by \(f^{\text {FW}(k)}_{\vert \text {null}(L^{\Psi }_\ell )}\). The algorithm initializes \(\ell = 0\). The outer while loop repeats until an \(\varepsilon \)-optimal solution is found. If after rescaling with respect to \(X_\ell \) the Newton decrement at \(\mathcal {Y}_0\) satisfies

$$ \Delta \left( f^{\text {FW}(k)}_{\vert \text {null}(L^{\Psi }_\ell )}, \mathcal {Y}_0\right) \le 1/14 $$

the predictor subroutine is called. Here, the affine-scaling direction is projected onto \(\text {null}(\mathcal {A}^{(\ell )}\circ \Psi )\), i.e., the linear subspace parallel to \(L_\ell ^{\Psi }\); call the projected direction \(\mathcal {Z}\). Clearly, \(\mathcal {Y}_0 + s \mathcal {Z} \in L_\ell ^{\Psi }\) for all \(s \in \mathbb {R}\). Then the subroutine computes

$$\begin{aligned} s^*= \sup \left\{ s : \mathcal {Y}_0 + s \mathcal {Z} \in \mathbb {S}^{(n,k)}_+ \right\} , \end{aligned}$$
(12)

which provides the necessary notion of distance to the boundary in terms of \(\mathcal {Y}_0\) and \(\mathcal {Z}\). The returned point \(\mathcal {Y}_\ell := \mathcal {Y}_0 + \sigma s^*\mathcal {Z}\) is feasible and decreases the objective value, as shown in Section 5.
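A sketch of how the step length \(s^*\) in (12) can be computed blockwise via eigenvalues (Python/NumPy; Y0 and Z stand for the block collections \(\mathcal {Y}_0\) and \(\mathcal {Z}\) stored as dictionaries of \(k\times k\) matrices, and the sign convention follows (12)):

```python
import numpy as np

def max_step_to_boundary(Y0, Z):
    """Largest s such that Y0[J] + s * Z[J] stays psd for every block J."""
    s_max = np.inf
    for J, YJ in Y0.items():
        w, V = np.linalg.eigh(YJ)
        Y_inv_half = V @ np.diag(w**-0.5) @ V.T     # YJ^{-1/2}, YJ is positive definite
        lam_min = np.linalg.eigvalsh(Y_inv_half @ Z[J] @ Y_inv_half).min()
        if lam_min < 0:
            s_max = min(s_max, -1.0 / lam_min)
    return s_max

# Predictor step: move a fixed fraction sigma of the way to the boundary, e.g.
#   s_star = max_step_to_boundary(Y0, Z)
#   Y_new  = {J: Y0[J] + sigma * s_star * Z[J] for J in Y0}
```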

If the Newton decrement is not small enough, the corrector subroutine is called. Let \(v_\ell = \langle A_0, X_\ell \rangle \), i.e., the objective value of the previous iteration, and define

$$ L_\ell ^{\Psi }(v_\ell ) = \left\{ \mathcal {Y} \in \mathbb {S}^{(n,k)} : \langle A^{(\ell )}_0 , \Psi (\mathcal {Y}) \rangle = v_\ell , \ \mathcal {A}^{(\ell )}(\Psi (\mathcal {Y}))=b\right\} . $$

Let \(x_0 := \mathcal {Y}_0\). Denote by \(n_{\vert L^{\Psi }_\ell (v_{\ell })}(x_i)\) the Newton step of \(f^{\text {FW}(k)}_{\vert L^{\Psi }_\ell (v_{\ell })}\) at a point \(x_i\). The corrector step now computes

$$ x_{i+1} = x_i + t^*\, n_{\vert L^{\Psi }_\ell (v_{\ell })}(x_i), \quad \text {where } t^*= \mathop {\text {argmin}}\limits _{t}\, f^{\text {FW}(k)}\left( x_i + t\, n_{\vert L^{\Psi }_\ell (v_{\ell })}(x_i)\right) , $$

until \(x_{i+1}\) is close enough to the central path of the rescaled problem over \(\mathbb {S}^{(n,k)}_+\) and returns \(\mathcal {Y}_\ell := x_{i+1}\). We will prove in Section 4 how this leads to a decrease in distance to the central path of the original SDP. Note that multiple calls of the corrector step may be necessary as after rescaling the Newton decrement might not be small enough anymore. However, as we prove later on, the maximum number of corrector steps can be bounded in terms of the problem data. Let \(\mathcal {Y}_\ell \) be the point returned by one of the subroutines. We set

$$ X_{\ell +1} = X_{\ell }^{1/2} \Psi (\mathcal {Y}_\ell )X_{\ell }^{1/2}. $$

Then

$$ \langle A^{(\ell +1)}_i, I \rangle = \langle A^{(\ell )}_i, \Psi (\mathcal {Y}_\ell ) \rangle = \langle A_i , X_{\ell +1} \rangle $$

for all \(i = 0,1, \dots , m\).
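As an illustration of the corrector step described above, the following sketch performs the one-dimensional minimization along the restricted Newton direction; a simple grid search stands in for the exact line search, and all names (including the maximal step length t_max, which is assumed to keep the iterate inside the cone) are illustrative.

```python
import numpy as np

def f_fw(Y):
    """Block barrier f^{FW(k)}: minus the sum of log det over the k x k blocks."""
    return -sum(np.linalg.slogdet(YJ)[1] for YJ in Y.values())

def corrector_line_search(Y, N, t_max, num=200):
    """Approximately minimize t -> f^{FW(k)}(Y + t N) over (0, t_max].

    Y: current blocks, N: restricted Newton step (as blocks), t_max: a step
    length keeping all blocks of Y + t N positive definite.
    """
    ts = np.linspace(t_max / num, t_max, num)
    vals = [f_fw({J: Y[J] + t * N[J] for J in Y}) for t in ts]
    t_best = ts[int(np.argmin(vals))]
    return {J: Y[J] + t_best * N[J] for J in Y}
```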

2.4 Termination Criterion

In the predictor as well as in the corrector subroutine we solve a linear system for \(y \in \mathbb {R}^m\). The solution of this linear system may be interpreted as a dual feasible solution provided the current iterate is sufficiently close to the central path. Hence, we can approximate the duality gap of our problem by calculating the difference

$$ \langle A_0, X_\ell \rangle - y^Tb \ge 0, $$

where y is calculated in every subroutine call. We may use this as a termination criterion. Once this quantity falls below some \(\varepsilon > 0\) chosen beforehand, we terminate with an \(\varepsilon \) optimal solution.

Algorithm 1: Predictor-corrector SDP algorithm using \(\textrm{FW}_n(k)\).

Algorithm 2: Subroutine Predictor_Step.

Algorithm 3: Subroutine Corrector_Step.

3 Barrier Functionals for \(\mathbb {S}^{n}_+\) and \(\text {FW}_n(k)\)

In this section we derive the self-concordant barrier functional for the cone \(\mathbb {S}^{(n,k)}_+\) which is used in the algorithm. Note that the ordinary self-concordant barrier for \(\mathbb {S}^n_+\) is given by \(f^{\text {SDP}}(X) = - \log (\det (X))\). We will emphasize parallels to the work of Roig-Solvas and Sznaier [19].

In order to construct a self-concordant barrier function for our underlying set, we introduce the notions of hyper-graphs and edge colorings as well as a well-known result about these objects.

Definition 3

A hyper-graph \(\mathcal {H} = (V,E)\) consists of a set \(V = \{1, \dots , n\}\) of vertices and a set of hyper-edges \(E \subseteq \{ J \subseteq V : |J| \ge 2\}\), which are subsets of the vertex set V. If all elements in E contain exactly k vertices, we call the corresponding hyper-graph k-uniform.

Definition 4

Let \(\mathcal {H} = (V,E)\) be a hyper-graph. A proper hyper-edge coloring with m colors is a partition of the hyper-edge set E into m disjoint sets (color classes), say \(E= \cup _{i \in [m]} S_i\) such that \(S_i \cap S_j = \emptyset \) if \(i \ne j\), and two hyper-edges that share a vertex are not in the same color class. In other words, a proper hyper-edge coloring assigns a color to every hyper-edge such that, if a given vertex appears in two different hyper-edges, they have different colors.

Theorem 1

(Baranyai’s theorem [6]) Let \(k,n \in \mathbb {N}\) be such that \(k \ge 2\) and \(k \vert n\), and let \(K^n_k\) be the complete k-uniform hyper-graph on n vertices. Then \(K^n_k\) has a proper hyper-edge coloring using \({{n-1}\atopwithdelims (){k-1}}\) colors.
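For \(k = 2\) the coloring guaranteed by Baranyai's theorem is a 1-factorization of the complete graph \(K_n\), which can be written down explicitly by the classical round-robin construction; the following sketch (Python, for even n; the general case \(k > 2\) is more involved and not needed to follow the argument) produces the \(n-1\) color classes and verifies they form a proper coloring.

```python
from itertools import combinations

def round_robin_one_factorization(n):
    """Partition the edges of K_n (n even) into n-1 perfect matchings."""
    assert n % 2 == 0
    classes = []
    for r in range(n - 1):
        matching = [(r, n - 1)]                     # vertex n-1 is fixed
        for i in range(1, n // 2):                  # the remaining vertices rotate
            u, v = (r + i) % (n - 1), (r - i) % (n - 1)
            matching.append((min(u, v), max(u, v)))
        classes.append(matching)
    return classes

classes = round_robin_one_factorization(6)
# every edge appears in exactly one color class ...
all_edges = sorted(e for m in classes for e in m)
assert all_edges == sorted(combinations(range(6), 2))
# ... and each class is a perfect matching (every vertex covered exactly once)
assert all(sorted(v for e in m for v in e) == list(range(6)) for m in classes)
```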

In (7) we wrote a program over \(\text {FW}_n(k)\) as an equivalent program over the cone product \(\mathbb {S}^{(n,k)}_+\). The algorithm uses a self-concordant barrier function over said cone product. The mapping \(\Psi \) from \(\mathbb {S}_+^{(n,k)}\) to \(\text {FW}_n(k)\) is surjective, but not bijective, since multiple elements in the former may give rise to the same element in the latter set.

Assumption 1

Throughout we will assume \(k \vert n\) for some given \(n \in \mathbb {N}\) and \(2 \le k \in \mathbb {N}\).

This assumption is not without loss of generality, but one can always border the data matrices of the SDP problem (3) with \(k - (n \bmod k)\) extra rows and columns (when \(k \nmid n\)) in a suitable way to ensure the assumption holds.

In the following we will let \(\mathcal {Y} \in \mathbb {S}^{(n,k)}\) be a collection of \({{n}\atopwithdelims (){k}}\) matrices of size \(k \times k\). We recall the operator \(\Psi \) is defined as

$$ \Psi (\mathcal {Y}) = \sum _{J \in \mathcal {J}}Y^{\rightarrow n}_J. $$

The following generalizes Lemma 4.4 in [19], where a similar result is proved for \(k=2\). It will be crucial in our analysis as it allows us to compare the values taken by the barrier functionals on \(\mathbb {S}^{(n,k)}_+\) and \(\mathbb {S}^n_+\) at \(\mathcal {Y}\) and \(\Psi (\mathcal {Y})\), respectively. In particular, it will allow us to bound the reduction in the SDP barrier function in terms of the reduction of the barrier for FW(k).

Lemma 2

Let

$$ f^{\textrm{FW}(k)}(\mathcal {Y}) = -\sum _{J \in \mathcal {J}} \log (\det (Y_J))\, , \, \mathcal {Y} \in \textrm{int}\left( \mathbb {S}^{(n,k)}_+ \right) . $$

The barrier \(f^{\textrm{FW}(k)}(\mathcal {Y})\) is self-concordant on \(\textrm{int}\left( \mathbb {S}^{(n,k)}_+ \right) \). Furthermore, if \(X = \Psi (\mathcal {Y})\) then

$$\begin{aligned} f^{\textrm{FW}(k)}(\mathcal {Y})&\ge -{{n-1}\atopwithdelims (){k-1}} \log (\det (X)) + n {{n-1}\atopwithdelims (){k-1}} \log \left( {{n-1}\atopwithdelims (){k-1}}\right) \\&=: {{n-1}\atopwithdelims (){k-1}} f^{\textrm{SDP}}(X) +n {{n-1}\atopwithdelims (){k-1}} \log \left( {{n-1}\atopwithdelims (){k-1}} \right) . \end{aligned}$$

Let us emphasize here that \(f^{\textrm{FW}(k)}\) is a self-concordant barrier for \(\mathbb {S}^{(n,k)}_+\), not for \(\text {FW}_n(k)\). Before proving Lemma 2 we need an auxiliary result which extends Lemma A.1 from [19] to general values of k such that \(k\vert n\). To prove it we will make use of Theorem 1.

Lemma 3

Consider a \(\mathcal {Y} = (Y_J) \in \mathbb {S}^{(n,k)}\) consisting of positive definite \(k \times k\) matrices and let \(\Psi (\mathcal {Y}) \in \textrm{FW}_n(k)\). Then there exists a set of \({{n-1}\atopwithdelims (){k-1}}\) matrices \(Z_i \succ 0\) of size \(n \times n\) such that \(\Psi (\mathcal {Y}) = \sum _{i = 1}^{{{n-1}\atopwithdelims (){k-1}}}Z_i\) and \(f^{\textrm{FW}(k)}(\mathcal {Y})=- \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}} \log (\det (Z_i))\).

Proof

Let \(K^n_k\) be the complete k-uniform hyper-graph on n vertices. We can identify each hyper-edge \(\{i_1,i_2,\dots , i_k\}\subset [n]\) in \(K^n_k\) with exactly one element \(Y_J \in \mathcal {Y}\), namely the one where \(\{i_1,i_2,\dots , i_k\} =J\). Let \(\left\{ S_1, \dots , S_{{{n-1}\atopwithdelims (){k-1}}}\right\} \) be the color classes of a proper hyper-edge coloring of \(K^n_k\) as in Theorem 1. Define \(\mathcal {Y}_i := \{Y_J : J \in S_i \}\) and set \(Z_i := \Psi (\mathcal {Y}_i)\). Then \(\Psi (\mathcal {Y}) = \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}}Z_i\) since \(S_i \cap S_j = \emptyset \) for \(i \ne j\) and \(\cup _{i} S_i = \mathcal {J}\). Moreover, since each \(S_i\) consists of pairwise disjoint index sets in \(\mathcal {J}\) which together partition [n] (each color class contains exactly n/k hyper-edges), there exists a permutation matrix \(P_i\) for every \(i = 1, \dots , {{n-1}\atopwithdelims (){k-1}}\) such that \(P_i Z_i P_i^T\) is a block-diagonal matrix with blocks \(Y_J\) on the diagonal for \(J \in S_i\). This shows that \(Z_i \succ 0\).

From this we find

$$ \log (\det (Z_i)) = \log \left( \det \left( P_i Z_i P_i^T\right) \right) = \sum _{J \in S_i}\log (\det (Y_J)). $$

Hence,

$$\begin{aligned} \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}}\log (\det (Z_i))&= \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}} \sum _{J \in S_i}\log (\det (Y_J)) \\&= \sum _{J \in \mathcal {J}} \log (\det (Y_J)) = -f^{\text {FW}(k)}(\mathcal {Y}), \end{aligned}$$

completing the proof. \(\square \)

We continue to prove Lemma 2.

Proof of Lemma 2

The self-concordance of \(f^{\text {FW}(k)}\) on \(\text {int}\big (\mathbb {S}^{(n,k)}_+ \big )\) follows immediately from the self-concordance of \(-\log \det (\cdot )\) on \(\text {int}\left( \mathbb {S}^k_+\right) \). By assumption \(X = \Psi (\mathcal {Y})\), and by Lemma 3 we may write \(X = \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}}Z_i \in \text {FW}_n(k)\) with \(Z_i \succ 0\). Therefore,

$$\begin{aligned} - \log \left( \det (X)\right)&= - \log \left( \det \left( \frac{1}{{{n-1}\atopwithdelims (){k-1}}} \sum _{i= 1}^{{{n-1}\atopwithdelims (){k-1}}} {{n-1}\atopwithdelims (){k-1}} Z_i \right) \right) \\&\le - \sum _{i = 1}^{{{n-1}\atopwithdelims (){k-1}}} \frac{1}{{{n-1}\atopwithdelims (){k-1}}} \log \left( \det \left( {{n-1}\atopwithdelims (){k-1}} Z_i \right) \right) \\&= - \sum _{i = 1}^{{{n-1}\atopwithdelims (){k-1}}} \frac{1}{{{n-1}\atopwithdelims (){k-1}}} \log \left( {{n-1}\atopwithdelims (){k-1}}^n \det \left( Z_i \right) \right) , \end{aligned}$$

where the inequality follows by convexity of the function \(-\log \det (\cdot )\) on int\(\left( \mathbb {S}^n_+\right) \). Hence, we find

$$\begin{aligned} -{{n-1}\atopwithdelims (){k-1}} \log \left( \det \left( X\right) \right)&\le - \sum _{i = 1}^{{{n-1}\atopwithdelims (){k-1}}} \left( n \log \left( {{n-1}\atopwithdelims (){k-1}} \right) + \log \left( \det \left( Z_i \right) \right) \right) \\&= - \sum _{i = 1}^{{{n-1}\atopwithdelims (){k-1}}} \log \left( \det \left( Z_i \right) \right) - {{n-1}\atopwithdelims (){k-1}}\, n \log \left( {{n-1}\atopwithdelims (){k-1}} \right) , \end{aligned}$$

and the claim follows. \(\square \)
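A quick numerical sanity check of the inequality in Lemma 2 on random data (Python/NumPy; binom denotes \({{n-1}\atopwithdelims (){k-1}}\) and the instance is synthetic):

```python
import numpy as np
from itertools import combinations
from math import comb, log

rng = np.random.default_rng(2)
n, k = 6, 3                                   # note k | n (Assumption 1)
J_list = list(combinations(range(n), k))
binom = comb(n - 1, k - 1)

# Random positive definite blocks Y_J and the corresponding X = Psi(Y).
Y = {}
X = np.zeros((n, n))
for J in J_list:
    G = rng.standard_normal((k, k))
    Y[J] = G @ G.T + 0.1 * np.eye(k)
    X[np.ix_(J, J)] += Y[J]

f_fw = -sum(np.linalg.slogdet(YJ)[1] for YJ in Y.values())
f_sdp = -np.linalg.slogdet(X)[1]
assert f_fw >= binom * f_sdp + n * binom * log(binom) - 1e-9
```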

The following corollary is analogous to Corollary 4.5 from [19].

Corollary 4

If

$$ \mathcal {Y}_0 = \left\{ Y_J = {{n-1}\atopwithdelims (){k-1}}^{-1} I_{k \times k} : J \subset [n], |J| = k \right\} , $$

then

$$\begin{aligned} f^{\textrm{FW}(k)}(\mathcal {Y}_0)&= {{n-1}\atopwithdelims (){k-1}} f^{\textrm{SDP}}(I)+n {{n-1}\atopwithdelims (){k-1}}\log \left( {{n-1}\atopwithdelims (){k-1}} \right) \\&= n {{n-1}\atopwithdelims (){k-1}}\log \left( {{n-1}\atopwithdelims (){k-1}} \right) . \end{aligned}$$

Proof

The first statement follows when noting that each \(i \in [n]\) lies in exactly \(\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) \) subsets of [n] of size k. The reason is that when fixing i, there are \(n-1\) elements left out of which we want to choose \(k-1\) more elements to make a set of size k. For the second statement note that

$$ \log \left( \det \left( \frac{1}{{{n-1}\atopwithdelims (){k-1}}}I_{k \times k}\right) \right) =\log \left( {{n-1}\atopwithdelims (){k-1}}^{-k} \right) = -k \log \left( {{n-1}\atopwithdelims (){k-1}}\right) . $$

The result follows when noting that \(k \left( {\begin{array}{c}n\\ k\end{array}}\right) = n {{n-1}\atopwithdelims (){k-1}}\). \(\square \)

4 Further Properties of the Barrier Functions

To prove convergence of our algorithm we need two essential ingredients. First, we need to prove that the predictor step reduces the current objective value sufficiently, and secondly, we must prove that the corrector step converges to a point close to the central path. Moreover, we have to show that our criterion to decide which subroutine to call is valid. The issue here is that we compute the Newton decrement of \(f^{\text {FW}(k)}\) at \(\mathcal {Y}_0\), but we need to be able to assert that the Newton decrement of \(f^{\text {SDP}}\) at \(X_\ell \) is small enough.

The next result we present will allow us to lower bound the progress made by the corrector step. For this we need to be able to compare the barrier functions for \(\mathbb {S}^n_+\) and \(\mathbb {S}^{(n,k)}_+\). We assume we have a given feasible solution \(X_\ell \) such that \(\langle A^{(\ell )}_0, I \rangle = v\). Define the vector \(b(v) := (v, b_1, \ldots , b_m)^T\). For further reference, consider

$$\begin{aligned} \min \left\{ f^{\text {SDP}}(X) : \langle A^{(\ell )}_i, X \rangle = b(v)_i~ \forall i = 0, 1, \ldots ,m, X \in \mathbb {S}^n_+\right\} , \end{aligned}$$
(13)

which we would like to compare to

$$\begin{aligned} \min \left\{ f^{\text {FW}(k)}(\mathcal {Y}) : \mathcal {Y} \in L_{\ell }^{\Psi }(v) \cap \mathbb {S}^{(n,k)}_+ \right\} . \end{aligned}$$
(14)

Suppose \(\mathcal {Y}^*\) is an approximate solution to (14). Defining

$$ X_{\ell +1} = X_{\ell }^{1/2} \Psi (\mathcal {Y}^*) X_{\ell }^{1/2}, $$

we find that \(X_{\ell +1} \in \mathcal {F}_{\textrm{SDP}}\). In other words, the points \(X_\ell \) we obtain via this procedure are all feasible for the original SDP (1). The following lemma allows us to lower bound the decrease achieved by one corrector step in terms of an element in \(\mathbb {S}^{(n,k)}_+\).

Lemma 5

Let \(\mathcal {Y}^*\) be a feasible solution to (14) and \(\mathcal {Y}_0\) as in (10). Furthermore, let \(X_{\ell +1} = X_{\ell }^{1/2}\Psi (\mathcal {Y}^*)X_{\ell }^{1/2}\) for \(X_\ell \) a feasible solution. Then

$$ {{n-1}\atopwithdelims (){k-1}}\left( f^{\textrm{SDP}}(X_{\ell })-f^{\textrm{SDP}}(X_{\ell +1}) \right) \ge f^{\textrm{FW}(k)}(\mathcal {Y}_{0})-f^{\textrm{FW}(k)}(\mathcal {Y}^*). $$

Proof

The proof follows immediately from the multiplicativity of the determinant, when noting that

$$\begin{aligned}&\quad \,\,\,{{n-1}\atopwithdelims (){k-1}} \left( f^{\textrm{SDP}}(X_{\ell })-f^{\textrm{SDP}}(X_{\ell +1}) \right) \\&= {{n-1}\atopwithdelims (){k-1}} \left( f^{\textrm{SDP}}(X_{\ell }) -f^{\textrm{SDP}}\left( X_{\ell }^{1/2} \Psi (\mathcal {Y}^*)X_{\ell }^{1/2}\right) \right) \\&= \underbrace{n {{n-1}\atopwithdelims (){k-1}} \log \left( {{n-1}\atopwithdelims (){k-1}}\right) }_{= f^{\textrm{FW}(k)}(\mathcal {Y}_0)~\text {by Cor. 4}} \underbrace{-{{n-1}\atopwithdelims (){k-1}} f^{\textrm{SDP}}(\Psi (\mathcal {Y}^*))-n{{n-1}\atopwithdelims (){k-1}} \log \left( {{n-1}\atopwithdelims (){k-1}}\right) .}_{\ge -f^{\textrm{FW}(k)}(\mathcal {Y}^*) \text { by Lemma 2}} \end{aligned}$$

\(\square \)

4.1 Relation of the Newton Decrements

In this subsection we will prove that we can upper bound the Newton decrement of \(f^{\textrm{SDP}}\) at the identity in terms of the Newton decrement of \(f^{\textrm{FW}(k)}\) at \(\mathcal {Y}_0\). We now define the following operator

$$ \Psi ^{\dagger } : \mathbb {S}^{n} \rightarrow \mathbb {S}^{(n,k)} $$

via

$$ \left( \Psi ^{\dagger }(X) \right) _J = \left( \frac{1}{{{n-1}\atopwithdelims (){k-1}}} I +\frac{1}{{{n-2}\atopwithdelims (){k-2}}}(ee^T-I)\right) \circ X_{J,J} \quad \text { for } J \subset [n],~ |J| = k, $$

where \(\circ \) denotes the Hadamard product and e denotes the all-ones vector in \(\mathbb {R}^k\). See Fig. 2 for a visualization of the surjection from \(\mathbb {S}^{(n,k)}_+\) to \(\textrm{FW}_n(k)\).

Fig. 2: Visualization of the surjection from \(\mathbb {S}^{(n,k)}_+\) to \(\textrm{FW}_n(k)\).

This operator satisfies

$$ \Psi (\Psi ^{\dagger }(X)) = X\quad \text { for all } X \in \mathbb {S}^n. $$

An inner product on \(\mathbb {S}^{(n,k)}\) is given by

$$ \langle \mathcal {X}, \mathcal {Y} \rangle _{(n,k)} := \sum _{ J \in \mathcal {J}} \langle X_J, Y_J \rangle ,\quad \left( \mathcal {X} = (X_J) \in \mathbb {S}^{(n,k)}, \; \mathcal {Y} = (Y_J) \in \mathbb {S}^{(n,k)}\right) . $$

It is straightforward to verify the following relation between the norms induced by this inner product, and the Frobenius norm on \(\mathbb {S}^n\).

Lemma 6

For any \(X \in \mathbb {S}^n\) we have

$$ \Vert \Psi ^\dagger (X) \Vert _{(n,k)} \le \Vert X\Vert . $$
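A sketch of \(\Psi ^{\dagger }\) and a numerical check of the identity \(\Psi (\Psi ^{\dagger }(X)) = X\) and of Lemma 6 (Python/NumPy; the weight matrix W implements the Hadamard factor defined above):

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(3)
n, k = 6, 3
J_list = list(combinations(range(n), k))
W = np.eye(k) / comb(n - 1, k - 1) + (np.ones((k, k)) - np.eye(k)) / comb(n - 2, k - 2)

X = rng.standard_normal((n, n)); X = (X + X.T) / 2

# Psi^dagger: weighted principal submatrices of X.
Yd = {J: W * X[np.ix_(J, J)] for J in J_list}

# Psi(Psi^dagger(X)) = X
X_rec = np.zeros((n, n))
for J, YJ in Yd.items():
    X_rec[np.ix_(J, J)] += YJ
assert np.allclose(X_rec, X)

# Lemma 6: ||Psi^dagger(X)||_{(n,k)} <= ||X|| (both in the Frobenius sense)
norm_blocks = np.sqrt(sum(np.sum(YJ**2) for YJ in Yd.values()))
assert norm_blocks <= np.linalg.norm(X) + 1e-12
```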

Suppose now \(X_\ell \) is a feasible solution to (1) such that \(\langle A_0, X_\ell \rangle = v\). We define the two subspaces

$$\begin{aligned} L^{\Psi }_{\ell } = \left\{ \mathcal {Y} \in \mathbb {S}^{(n,k)} : (\mathcal {A}^{(\ell )} \circ \Psi ) (\mathcal {Y}) = b\right\} \end{aligned}$$

and

$$\begin{aligned} L_\ell = \left\{ X \in \mathbb {S}^n : \mathcal {A}^{(\ell )}(X) = b\right\} . \end{aligned}$$

We may also add an equality for the objective, in which case we will refer to the following operator

$$ \mathcal {A}_0^{(\ell )}(X) = \left( \langle A^{(\ell )}_0, X\rangle ,\langle A^{(\ell )}_1,X\rangle ,\ldots ,\langle A^{(\ell )}_m, X\rangle \right) \in \mathbb {R}^{m+1}. $$

The respective subspaces will be denoted as follows

$$\begin{aligned} L^{\Psi }_{\ell }(v) = \left\{ \mathcal {Y} \in \mathbb {S}^{(n,k)} : (\mathcal {A}_0^{(\ell )} \circ \Psi ) (\mathcal {Y}) = b(v)\right\} \end{aligned}$$
(15)

and

$$\begin{aligned} L_\ell (v) = \left\{ X \in \mathbb {S}^n : \mathcal {A}^{(\ell )}_0(X) = b(v)\right\} , \end{aligned}$$
(16)

where \(b(v) := (v, b_1, \ldots , b_m)^T\).

When we consider the subspaces defined via the operator with respect to the initial data matrices, we omit the subscript \(\ell \), e.g.,

$$ L^{\Psi } = \left\{ \mathcal {Y} \in \mathbb {S}^{(n,k)} : \langle A_i, \Psi (\mathcal {Y}) \rangle = b_i,~\forall i \in [m] \right\} . $$

The following lemma corresponds to Lemma A.2 in [19], and allows us to bound the Newton decrement of \(f^{\textrm{SDP}}_{\vert L}\) in terms of that of \(f^{\textrm{FW}(k)}_{\vert L^{\Psi }}\).

Lemma 7

Assume \(\mathcal {Y}_0 \in L^{\Psi }\) and \(I \in L\). At \(\mathcal {Y}_0\) one has

$$ \Delta \left( f^{\textrm{FW}(k)}_{\vert _{L^{\Psi }}},\,\mathcal {Y}_0\right) \ge \frac{\Delta \left( {{n-1}\atopwithdelims (){k-1}} f^{\textrm{SDP}}_{\vert _L},I\right) }{\sqrt{{{n-1}\atopwithdelims (){k-1}}}} = \Delta \left( f^{\textrm{SDP}}_{\vert _L},\,I\right) . $$

Proof

Following (9) we have

$$ \Delta \left( f^{\textrm{FW}(k)}_{\vert _{L^\Psi }},\, \mathcal {Y}\right) \ge \frac{\langle d, n^{\textrm{FW}}(\mathcal {Y}) \rangle _{(n,k),\mathcal {Y}}}{\Vert d\Vert _{(n,k),\mathcal {Y}}}\quad \text { for all }d \in L^{\Psi } \setminus \{ 0\}. $$

Choosing \(d = \Psi ^{\dagger }(n^{\textrm{SDP}}_L(X)) \in L^{\Psi }\) leads to

$$\begin{aligned} \Delta \left( f^{\textrm{FW}(k)}_{\vert _{L^\Psi }}, \mathcal {Y}\right) \ge \frac{\langle \Psi ^\dagger (n^{\textrm{SDP}}_L(X)),n^{\textrm{FW}}(\mathcal {Y}) \rangle _{{(n,k),\mathcal {Y}}}}{\Vert \Psi ^\dagger (n^{\textrm{SDP}}_L(X))\Vert _{{(n,k),\mathcal {Y}}}}, \end{aligned}$$

and evaluating the expression at \(\mathcal {Y}_0\) we find

$$\begin{aligned} \Delta \left( f^{\textrm{FW}(k)}_{\vert _{L^\Psi }}, \mathcal {Y}_0\right)&\ge \frac{\langle \Psi ^\dagger (n^{\textrm{SDP}}_L(X)),n^{\textrm{FW}}(\mathcal {Y}_0) \rangle _{{(n,k),\mathcal {Y}_0}}}{\Vert \Psi ^\dagger (n^{\textrm{SDP}}_L(X))\Vert _{{(n,k),\mathcal {Y}_0}}} \\&= \frac{\langle \Psi ^\dagger (n^{\textrm{SDP}}_L(X)) , -g^{\textrm{FW}}(\mathcal {Y}_0) \rangle _{(n,k)}}{ {{n-1}\atopwithdelims (){k-1}} \Vert \Psi ^\dagger (n^{\textrm{SDP}}_L(X)) \Vert _{(n,k)}} \\&\ge \frac{ \langle \Psi ^\dagger (n^{\textrm{SDP}}_L(X)), (I, I , \ldots , I) \rangle _{(n,k)}}{\Vert n^{\textrm{SDP}}_L(X) \Vert } \\&= \frac{\textrm{tr}(n_L^{\textrm{SDP}}(X))}{\Vert n_L^{\textrm{SDP}}(X)\Vert }, \end{aligned}$$

where the second inequality follows from Lemma 6. Setting \(X = I\) and noting

$$\begin{aligned} \textrm{tr}(n_L^{\textrm{SDP}}(I)) = \langle I, n_L^{\textrm{SDP}}(I) \rangle&= \frac{1}{{{n-1}\atopwithdelims (){k-1}}}\langle g^{\textrm{SDP}}(I), -n_L^{\textrm{SDP}}(I) \rangle \\&= \frac{1}{{{n-1}\atopwithdelims (){k-1}}}\left( \Delta \left( {{n-1}\atopwithdelims (){k-1}} f^{\textrm{SDP}}_{\vert _L}, I \right) \right) ^2, \end{aligned}$$

we conclude

$$\begin{aligned} \Delta \left( f^{\textrm{FW}(k)}_{\vert _{L^\Psi }}, \mathcal {Y}_0\right) \ge \frac{1}{{{n-1}\atopwithdelims (){k-1}}}\frac{\Delta \left( {{n-1}\atopwithdelims (){k-1}} f^{\textrm{SDP}}_{\vert _L},I\right) ^2}{\Vert n_L^{\textrm{SDP}}(I)\Vert } = \frac{\Delta \left( {{n-1}\atopwithdelims (){k-1}} f^{\textrm{SDP}}_{\vert _L},I\right) }{\sqrt{{{n-1}\atopwithdelims (){k-1}}}}, \end{aligned}$$

because

$$ \Vert n_L^{\textrm{SDP}}(I)\Vert = \frac{\Delta \left( {{n-1}\atopwithdelims (){k-1}} f_{\vert L}^{\textrm{SDP}}, I\right) }{\sqrt{{{n-1}\atopwithdelims (){k-1}} }} = \Delta \left( f_{\vert L}^{\textrm{SDP}}, I\right) . $$

\(\square \)

5 Complexity Analysis

We begin the complexity analysis with the following lemma, which helps us to check whether the current point is close enough to the central path of the SDP.

Lemma 8

Let \(X_{\ell }\) be a feasible iterate for the SDP (13) and let the objective value at \(X_{\ell }\) be v, i.e., \(\langle A_0, X_\ell \rangle = v\). Define the two subspaces \(L^{\Psi }_{\ell }(v)\), \(L_\ell \) as in (15), (11) respectively. Then, if

$$ \Delta \left( f^{\textrm{FW}(k)}_{\vert L^{\Psi }_{\ell }(v)},\, \mathcal {Y}_0\right) \le \frac{1}{14}, $$

one has

$$ \Delta \left( {f^{\textrm{SDP}}_{{\eta _{v}}\vert _{L_\ell }}},\, I\right) \le \frac{1}{9}, $$

where

$$ f^{\textrm{SDP}}_{\eta _{v}}(X) = \eta _v \langle A^{(\ell )}_0, X \rangle - \log \det (X), $$

and \(\eta _v\) is the value of the central path parameter that corresponds to the objective value v.

Proof

By Lemma 7 we know that

$$ \frac{1}{14} \ge \Delta \left( f^{\textrm{FW}(k)}_{\vert L^{\Psi }_{\ell }(v)},\, \mathcal {Y}_0\right) \ge \Delta \left( f^{\textrm{SDP}}_{\vert L_{\ell }(v)},\, I\right) . $$

Let now z(v) be the point on the central path of the rescaled SDP with objective value v and let the corresponding parameter be \(\eta _v\). By Theorem 2.2.5 from [18] we have

$$\begin{aligned} \Vert z(v)-I \Vert _I \le \Delta \left( f^{\textrm{SDP}}_{\vert L_{\ell }(v)},\, I\right) +\frac{3 \Delta \left( f^{\textrm{SDP}}_{\vert L_{\ell }(v)},\, I\right) ^2}{\left( 1- \Delta \left( f^{\textrm{SDP}}_{\vert L_{\ell }(v)},\, I\right) \right) ^3} \le \frac{1}{11}. \end{aligned}$$

Let \(X_+\) be the point returned by taking a Newton step at \(X = I\) with respect to the function \(f^{\textrm{SDP}}_{\eta _v}\) restricted to \(L_\ell \). By Theorem 2.2.3 in [18] we have

$$ \frac{\Vert z(v)-I\Vert ^2_I}{1- \Vert z(v)-I\Vert _I} \ge \Vert X_+ -z(v)\Vert _I $$

and hence

$$\begin{aligned} \Delta \left( f^{\textrm{SDP}}_{\eta _{v}\vert _{L_\ell }}, I\right) = \Vert X_+-I\Vert _I&\le \Vert X_+ -z(v)\Vert _I + \Vert z(v)- I\Vert _I \\&\le \frac{\Vert z(v)-I\Vert ^2_I}{1-\Vert z(v)-I\Vert _I} + \Vert z(v)- I\Vert _I \le \frac{1}{9}. \end{aligned}$$

\(\square \)

The Newton decrement of the rescaled SDP being smaller than 1/9 means we can safely perform the next predictor step. If the current point is too far away from the central path and one were to perform the predictor step, the direction may not be approximately tangential to the central path. Hence, once the Newton decrement of the factor width program is small enough, so is that of the SDP, and we can perform the next predictor step, knowing the direction will be approximately tangential to the central path. After each predictor step we may have to take several corrector steps to get back close to the central path.

5.1 Corrector Step

We will now find an upper bound on the number of corrector steps needed to get close to the central path. We know from Lemma 5 that a decrease in the barrier for the factor width cone will lead to a decrease in the barrier function for our original SDP, meaning we made progress towards its central path. The following lemma asserts that if we are too far away from the central path we can attain at least a constant reduction in the barrier of the factor width cone and therefore obtain a constant reduction in the SDP barrier as well.

Lemma 9

Let \(X_{\ell }\) be a feasible iterate for the SDP (13) and let the objective value at \(X_{\ell }\) be v. Define the subspace \(L^{\Psi }_{\ell }(v)\) as in (15). If

$$ \Delta \left( f^{\textrm{FW}(k)}_{\vert L^{\Psi }_{\ell }(v)}, \mathcal {Y}_0\right) > \frac{1}{14}, $$

then

$$ f^{\textrm{FW}(k)}_{\vert L^{\Psi }_{\ell }(v)}(\mathcal {Y}_0)-f^{\textrm{FW}(k)}_{\vert L^{\Psi }_{\ell }(v)}(\mathcal {Y}^{*}) \ge \frac{1}{2688}. $$

Proof

If \(\Delta \left( f^{\textrm{FW}(k)}_{ \vert L^{\Psi }_{\ell }(v)}, \mathcal {Y}_0\right) > \frac{1}{14}\) the corrector step will employ a line search to find \(\mathcal {Y}^{*}\), i.e. the point in \(L^{\Psi }_{\ell }(v)\) that minimizes \(f^{\textrm{FW}(k)}\). Let \(n_{L^{\Psi }_{\ell }(v)}(\mathcal {Y}_0)\) be the Newton step taken from \(\mathcal {Y}_0\) and let \(t = \frac{1}{8 \Vert n_{L^{\Psi }_{\ell }(v)}(\mathcal {Y}_0) \Vert _{(n,k),\mathcal {Y}_0}}\), where the norm in the denominator is the local norm at \(\mathcal {Y}_0\) induced by \(\langle \cdot , \cdot \rangle _{(n,k)}\). Then, for

$$ \tilde{\mathcal {Y}} = \mathcal {Y}_0 + t \, n_{L^{\Psi }_{\ell }(v)}(\mathcal {Y}_0) $$

we find by Theorem 2.2.2 in [18]

$$\begin{aligned} f^{\textrm{FW}(k)}(\tilde{\mathcal {Y}})&\le f^{\textrm{FW}(k)}(\mathcal {Y}_0)-\frac{1}{14}\cdot \frac{1}{8}+\frac{1}{2}\left( \frac{1}{8} \right) ^2+ \frac{(1/8)^3}{3(1-1/8)} \\&\le f^{\textrm{FW}(k)}(\mathcal {Y}_0)-\frac{1}{2688}. \end{aligned}$$

\(\square \)
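For the record, the arithmetic behind the constant, with \(\Delta = 1/14\) and step length 1/8, is

$$ \frac{1}{14}\cdot \frac{1}{8}-\frac{1}{2}\left( \frac{1}{8} \right) ^2- \frac{(1/8)^3}{3(1-1/8)} = \frac{24}{2688}-\frac{21}{2688}-\frac{2}{2688} = \frac{1}{2688}. $$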

Note that this implies together with Lemma 5 that

$$\begin{aligned} \frac{1}{2688} \le f^{\textrm{FW}(k)}(\mathcal {Y}_0)-f^{\textrm{FW}(k)}(\tilde{\mathcal {Y}})&\le f^{\textrm{FW}(k)}(\mathcal {Y}_0)- f^{\textrm{FW}(k)}(\mathcal {Y}^*) \\&\le {{n-1}\atopwithdelims (){k-1}} \left( f^{\textrm{SDP}}(X_\ell )-f^{\textrm{SDP}}(X_{\ell +1}) \right) . \end{aligned}$$

Knowing that each line search reduces the distance to the targeted point on the central path by at least a constant amount allows us to bound the number of line searches needed to get close enough, provided we have an upper bound on the distance between the result of the predictor step and the corresponding point on the central path of the SDP.

Lemma 10

Let \(X_1\) be close to a point \(z(v_1)\) on the central path of the SDP in the sense that \(\Delta \left( f^{\textrm{SDP}}_{L_{\ell }(v_1)}, X_1\right) \le \frac{1}{9}\). Furthermore, let \(X_2\) be the result of the predictor step and \(z(v_2)\) be the point on the central path with the same objective value as \(X_2\). Then

$$ f^{\textrm{SDP}}(X_{2})-f^{\textrm{SDP}}(z(v_2)) \le n \left( \log \frac{1}{1-\sigma }\right) +\frac{1}{154}. $$

Proof

A proof of this statement for generic self-concordant barriers may be found on page 54 of [18]. We have used that the barrier parameter for the barrier of the psd cone is given by \(\vartheta _{f^{\textrm{SDP}}} = n\). \(\square \)

Lemma 11

Let \(v_2\) be the objective value of the result \(X_2\) of the predictor step. The maximum number K of line searches needed to find a point \(X_{K+2}\) which is close enough to \(z(v_2)\) in the sense that \(\Delta \left( f^{\textrm{SDP}}_{\vert L_{\ell }(v_2)}, X_{K+2}\right) \le \frac{1}{9}\) is

$$ K = \left\lceil 2688 {{n-1}\atopwithdelims (){k-1}} \left( n \log \left( \frac{1}{1-\sigma }\right) +\frac{1}{154} \right) \right\rceil , $$

where \(z(v_2)\) is the point on the central path with objective value \(v_2\).

Proof

We know that the distance between the result of the predictor phase and the targeted point on the central path is at most \(n \left( \log \frac{1}{1-\sigma }\right) +\frac{1}{154}\) by Lemma 10. Moreover, using Lemmas 9 and 5 we find that in each corrector step we reduce this distance by at least \(\frac{1}{2688{{n-1}\atopwithdelims (){k-1}}}\), unless the SDP Newton decrement at I is already small enough to perform the next predictor step. If after rescaling the Newton decrement of the factor width program satisfies

$$ \Delta \left( f^{\textrm{FW}(k)}_{\vert L^{\Psi }_{\ell }(v)}, \mathcal {Y}_0\right) > \frac{1}{14}, $$

in which case we cannot certify via Lemma 8 that I is close to the central path of the SDP, we can perform another corrector step, yielding a decrease of at least \(\frac{1}{2688{{n-1}\atopwithdelims (){k-1}}}\) in the distance to the central path, and rescale again. This process can be continued until we no longer obtain such a constant decrease, at which point we know we must be close enough to the central path, in the sense of Lemma 8. This is because, if the decrease is not greater than \(\frac{1}{2688{{n-1}\atopwithdelims (){k-1}}}\), we know that the Newton decrement cannot satisfy

$$ \Delta \left( f^{\textrm{FW}(k)}_{\vert L^{\Psi }_{\ell }(v)}, \mathcal {Y}_0\right) > \frac{1}{14}, $$

from which follows by Lemma 8 that

$$ \Delta \left( f^{\textrm{SDP}}_{L_{\ell }(v)}, I\right) \le \frac{1}{9}. $$

This implies we are close enough to the central path to perform the next predictor step. Hence, after at most

$$ K = \left\lceil 2688 {{n-1}\atopwithdelims (){k-1}} \left( n \log \left( \frac{1}{1-\sigma }\right) +\frac{1}{154} \right) \right\rceil $$

corrector steps we are close enough to the central path so that we can perform the next predictor step. \(\square \)

5.2 Predictor Step

We will make use of the analysis of the short-step interior point method discussed in Section 2.4.2 in [18]. We will show that each predictor step reduces the objective value by an amount at least as large as the objective decrease by the short-step interior point method. This will allow us to bound the maximum number of predictor steps needed to obtain an \(\varepsilon \) optimal solution of the given SDP. Note that the decrease in objective value obtained by our predictor method is as follows. Let X be the point from where the predictor method starts and \(-(A_0)_X := -H(X)^{-1}A_0\) be the direction. Then for \(\sigma \ge \frac{1}{4}\) and \(s^*\) as in (12) we find

$$\begin{aligned} \langle A_0, X-s^*\sigma \, (A_0)_X \rangle&= \langle A_0, X \rangle - s^*\sigma \langle A_0, (A_0)_X \rangle \\&\le \langle A_0, X \rangle -\frac{1}{4}\Vert (A_0)_X \Vert _X. \end{aligned}$$

This implies the decrease is at least as large as the one obtained in one iteration of the short-step method, as discussed in [18, §2.4.2]. Renegar’s analysis shows that the short-step method leads to an \(\varepsilon \) optimal solution in at most

$$ K = 10\sqrt{\vartheta _f}\log (\vartheta _f /(\varepsilon \, \eta _0)) $$

steps, where \(\eta _0\) is such that our starting point \(X_0\) is close to \(z_{\eta _0}\). By an \(\varepsilon \) optimal solution we mean a feasible solution X such that

$$ v^*_\textrm{SDP} \le \langle A_0 , X \rangle \le v^*_\textrm{SDP} + \varepsilon . $$

5.3 Predictor and Corrector Steps Combined

Combining the complexity analysis of predictor and corrector steps we arrive at the following theorem.

Theorem 12

Let \(X_0\) be a feasible solution of the SDP (1) and assume it is close to some point \(z_{\eta _0}\) on the corresponding central path in the sense that \(\Delta \left( f_{\vert _{L(v)}}^{\textrm{SDP}},X_0\right) <1/14\), where \(L(v)\) is as in (16), with the subscript \(\ell \) omitted as described in Section 4.1, for \(v = \langle A_0, X_0 \rangle \). Algorithm 1 converges to an \(\varepsilon \) optimal solution in at most

$$\begin{aligned} K&= \left\lceil 2688 {{n-1}\atopwithdelims (){k-1}} \left( n \log \left( \frac{1}{1-\sigma }\right) +\frac{1}{154} \right) \right\rceil 10\sqrt{n}\log (n /(\varepsilon \, \eta _0)) \\&= O\left( {n-1 \atopwithdelims ()k-1}n^{3/2}\log \left( \frac{1}{1-\sigma }\right) \log \left( \frac{n}{\varepsilon \eta _0}\right) \right) \end{aligned}$$

steps.

The assumption of a starting point “close to the central path” may be satisfied by the self-dual embedding strategy [11]. Alternatively, one may first solve an auxiliary SDP problem, as in [18, Section 2.4.2], by using the algorithm we have presented. The solution of this auxiliary problem then yields a point close to the central path of the original SDP problem.
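As an illustration, the iteration bound of Theorem 12 is easy to evaluate for concrete parameters; the following small sketch (Python; it simply evaluates the formula above, and the chosen parameter values are arbitrary) does so.

```python
from math import comb, log, ceil, sqrt

def iteration_bound(n, k, sigma, eps, eta0):
    """Upper bound of Theorem 12 on the total number of corrector/predictor steps."""
    corrector = ceil(2688 * comb(n - 1, k - 1) * (n * log(1 / (1 - sigma)) + 1 / 154))
    predictor = 10 * sqrt(n) * log(n / (eps * eta0))
    return corrector * predictor

print(iteration_bound(n=50, k=2, sigma=0.5, eps=1e-6, eta0=1.0))
```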

6 Discussion and Future Prospects

We finish with a brief discussion on various topics surrounding Algorithm 1.

6.1 Replacing the Predictor Step

In their paper [19], the authors propose to perform a fixed number of decrease steps, where a decrease step consists of solving (6) and rescaling with respect to the optimal solution. In our algorithm we considered a different method to decrease the objective value, namely the predictor step, which uses the traditional SDP affine-scaling direction.

6.2 Tractability of Factor Width Cones

Some recent ideas regarding factor width cones that could influence future research in this area are:

  • the idea to optimize over the dual cone of \(\textrm{FW}_n(k)\) by utilizing clique trees [24].

  • a variation on the factor width cone involving fewer blocks [25].

In addition, it would be very helpful to know a computable self-concordant barrier functional for the cone \(\textrm{FW}_n(k)\), as well as its complexity parameter.