Abstract
We propose an interior point method (IPM) for solving semidefinite programming problems (SDPs). The standard interior point algorithms used to solve SDPs work in the space of positive semidefinite matrices. In contrast, the proposed algorithm works in the cone of matrices of constant factor width. We prove global convergence and provide a complexity analysis. Our work is inspired by a series of papers by Ahmadi, Dash, Majumdar and Hall, and builds upon a recent preprint by Roig-Solvas and Sznaier [arXiv:2202.12374, 2022].
1 Introduction
Semidefinite programming problems (SDPs) are a generalization of linear programming problems (LPs). While capturing a much larger set of problems, SDPs are solvable up to fixed precision in polynomial time in terms of the input data, and linear in terms of the precision [17]; see [10] for the complexity in the Turing model of computation.
Practical computation is, however, more complicated. While we are able to solve linear programs with millions of variables and constraints routinely, SDPs become intractable already for a few tens of thousands of constraints and for \(n\times n\) matrix variables of the order \(n \approx 1,000\). The reason is that each iteration of a typical interior point algorithm for SDP requires \(\mathcal {O}(n^3m+n^2m^2 + m^3)\) operations, where n is the size of the matrix variable and m is the number of equality constraints; see e.g. [15] or [3]. However, solving large instances of SDPs is of growing interest, due to applications in power flow problems on large power grids, SDP-based hierarchies for polynomial and combinatorial problems, etc. (see [13, 23, 24]). In the following we will revisit a relaxation of a given SDP, where the cone of positive semidefinite matrices is replaced by a more tractable cone, namely the cone of matrices of constant factor width [7]. The simplest examples of matrices of constant factor width are non-negative diagonal matrices (corresponding to linear programs), and scaled diagonally dominant matrices (corresponding to second order cone programming) [4]. We then review how iteratively rescaling the cone and solving the given optimization problem over this new set leads to a non-increasing sequence of optimal values lower bounded by the optimum of the sought SDP. This iterative procedure, due to [1], does not lead to a convergent algorithm. However, its essence can be used to construct a convergent predictor-corrector interior point method, as was done in [19]. Our paper is inspired by ideas from [1, 2, 4, 5, 19]. In particular, we will extend the results in [19], and give a more concise complexity analysis in our extended setting.
1.1 Iterative Approximation Scheme
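Let the set of symmetric \(n \times n\) matrices be given by \(\mathbb {S}^n\), where \(n \in \mathbb {N}\) is a positive integer. We write [m] for the set \(\{1, 2, \ldots , m\}\), where \(m \in \mathbb {N}\). Consider a set \(\{A_i \in \mathbb {S}^{n}: i \in [m] \}\) of symmetric data matrices and define the linear operator
\[ \mathcal {A} : \mathbb {S}^n \rightarrow \mathbb {R}^m, \quad \mathcal {A}(X) := \left( \langle A_1, X \rangle , \ldots , \langle A_m, X \rangle \right) ^T, \]
where \(\langle X, Y \rangle := \textrm{tr}(XY)\) for \(X,Y \in \mathbb {S}^n\). Furthermore, define for \(b \in \mathbb {R}^m\) the affine subspace
\[ L := \left\{ X \in \mathbb {S}^n : \mathcal {A}(X) = b \right\} . \]
Consider, for an objective matrix \(A_0 \in \mathbb {S}^n\), the following semidefinite program
\[ v^*_{\textrm{SDP}} := \inf _{X} \left\{ \langle A_0, X \rangle : X \in L, \ X \succeq 0 \right\} , \tag{1} \]
which we assume to be strictly feasible. Replacing the cone of positive semidefinite (psd) matrices in (1) by a cone \(\mathcal {K} \subseteq \mathbb {S}^n_+\), which is more tractable, leads to the following program
\[ v_{\mathcal {K}} := \inf _{X} \left\{ \langle A_0, X \rangle : X \in L, \ X \in \mathcal {K} \right\} . \tag{2} \]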
Clearly, \(v_{\mathcal {K}} \ge v^*_{\textrm{SDP}}\). The quality of the approximation depends on the chosen cone \(\mathcal {K}\). In [4], while focusing on sums-of-squares optimization, the authors consider the cones of diagonally dominant and scaled diagonally dominant matrices. Ahmadi and Hall developed the idea of replacing the psd cone by a simpler cone further in [1], leveraging an optimal solution of the relaxation. Essentially, the idea is as follows. Define the feasible set for (1) as
\[ \mathcal {F}_{\textrm{SDP}} := \left\{ X \in \mathbb {S}^n : \mathcal {A}(X) = b, \ X \succeq 0 \right\} . \]
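We will consider a sequence of strictly feasible points for (2), denoted by \(X_\ell \) for \(\ell = 0, 1, \ldots \). Since \(X_\ell \succeq 0\), the matrix \(X_\ell ^{1/2}\) is well-defined. One can update the data matrices in the following way
\[ A_i^{(\ell )} := X_\ell ^{1/2} A_i X_\ell ^{1/2}, \quad i = 0, 1, \ldots , m, \]
giving rise to a new linear operator
\[ \mathcal {A}^{(\ell )}(X) := \left( \langle A_1^{(\ell )}, X \rangle , \ldots , \langle A_m^{(\ell )}, X \rangle \right) ^T. \]
We may also refer to this operation as rescaling with respect to \(X_\ell \). Via this rescaling one obtains the following sequence of reformulations of (1):
\[ \inf _{X} \left\{ \langle A_0^{(\ell )}, X \rangle : \mathcal {A}^{(\ell )}(X) = b, \ X \succeq 0 \right\} , \tag{3} \]
whose feasible set we define as
\[ \mathcal {F}_{\textrm{SDP}_\ell } := \left\{ X \in \mathbb {S}^n : \mathcal {A}^{(\ell )}(X) = b, \ X \succeq 0 \right\} . \]
For each \(\ell \) the identity matrix is feasible, i.e., we have \(X = I \in \mathcal {F}_{\textrm{SDP}_\ell }\). To see this, note that for all \(i \in [m]\) we have
\[ \langle A_i^{(\ell )}, I \rangle = \textrm{tr}\left( X_\ell ^{1/2} A_i X_\ell ^{1/2} \right) = \textrm{tr}(A_i X_\ell ) = \langle A_i, X_\ell \rangle = b_i. \]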
Similarly, the identity leads to the same objective value in (3) as \(X_\ell \) in (2). Let \(X_0\) be an optimal solution to (2). Rescaling with respect to \(X_0\) we find by the same reasoning that \(v_{\mathcal {K}}^{(0)} \le v_{\mathcal {K}}\), where
Reiterating this procedure leads to a non-increasing sequence of values \(\left\{ v^{(\ell )}_{\mathcal {K}} \right\} _{\ell \in \mathbb {N}}\) lower bounded by \(v^{*}_{\textrm{SDP}}\). Unfortunately, this procedure does not always converge to the true optimum of (1) if \(\mathcal {K}\) is a cone of matrices of constant factor width, as mentioned in [19]. Indeed, it can happen that \(\liminf _{\ell \rightarrow \infty } v_{\mathcal {K}}^{(\ell )} > v^*_{\textrm{SDP}}\). The rest of this paper is devoted to the development and analysis of an interior point algorithm, which converges to the optimal value \(v^*_{\textrm{SDP}}\). We thereby refine and extend results from [19], where a different interior point method (based on the factor width cone) was introduced. Our contribution is to give a concise polynomial-time convergence analysis, since the iteration complexity bounds given in [19] involve constants that depend on the data, but the dependence is not made explicit; see e.g. [19, Theorem 4.12]. Moreover, the authors of [19] only consider factor width at most 2 (i.e. the scaled, diagonally dominant matrices), while we analyse the general case.
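The rescaling step can be checked numerically. The following is a minimal sketch, assuming the rescaled data matrices are \(X_\ell ^{1/2} A_i X_\ell ^{1/2}\); the helper names `psd_sqrt` and `rescale` and the random data are illustrative assumptions, not part of the original method description. It confirms that the identity matrix attains, in the rescaled problem, the same constraint and objective values as \(X_\ell \) in the original one.

```python
import numpy as np

def psd_sqrt(X):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def rescale(data, X):
    """Return X^{1/2} A X^{1/2} for every matrix A in data (hypothetical helper)."""
    R = psd_sqrt(X)
    return [R @ A @ R for A in data]

rng = np.random.default_rng(0)
n, m = 5, 3
# Illustrative data: entry 0 plays the role of A_0, entries 1..m of A_1..A_m.
data = [(B + B.T) / 2 for B in rng.standard_normal((m + 1, n, n))]
G = rng.standard_normal((n, n))
X_l = G @ G.T + np.eye(n)  # positive definite stand-in for the iterate X_l

rescaled = rescale(data, X_l)
lhs = np.array([np.trace(A_res) for A_res in rescaled])  # <A_i^(l), I>
rhs = np.array([np.trace(A @ X_l) for A in data])        # <A_i, X_l>
print(np.allclose(lhs, rhs))
```

The check relies only on the cyclic property of the trace, so it holds for any symmetric data and any psd iterate.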
1.2 Outline of the Paper
This paper is conceptually divided into two parts. The first part contains Sections 1 and 2 and is devoted to introducing the setting as well as the algorithm. Our aim with the first part is to convey the concept in a comprehensible way. The second part consists of the remaining Sections 3–6. It is more technical and contains the derivation of objects used in the algorithm as well as the formal complexity analysis.
1.3 The Factor Width Cone
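Fix \(n \in \mathbb {N}\). The cone of \(n\times n\) matrices of factor width k, denoted by \(\textrm{FW}_n(k)\), is defined as
\[ \textrm{FW}_n(k) := \left\{ Y \in \mathbb {S}^n : Y = \sum _{i} x_i x_i^T \text { for some } x_i \in \mathbb {R}^n \text { with } |\textrm{supp}(x_i)| \le k \right\} . \]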
The notion of factor width was first used in [7] where the authors proved that \(\textrm{FW}_n(2)\) is the cone of scaled diagonally dominant matrices. Trivially, \(\textrm{FW}_n(1)\) is the cone of non-negative \(n \times n\) diagonal matrices. Clearly, we have that
\[ \textrm{FW}_n(1) \subseteq \textrm{FW}_n(2) \subseteq \cdots \subseteq \textrm{FW}_n(n). \]
Moreover, \(\textrm{FW}_n(n) = \mathbb {S}^n_+\). It is easy to see these cones are proper. As they define an inner approximation of the cone \(\mathbb {S}^n_+\) we may use them in the aforementioned iterative scheme. Define \(\mathcal {J} := \{J \subset [n] : |J| = k\}\) for fixed \(n,k \in \mathbb {N}\) with \(k \vert n\).
An optimization problem over the cone \(\textrm{FW}_n(k)\) may be formulated as an optimization problem over the cone product \(\mathbb {S}_+^{(n,k)}\). To see this we need to consider principal submatrices. For a matrix \(S \in \mathbb {R}^{n \times n}\) we define the principal submatrix \(S_{J,J}\) for \(J \subseteq [n]\) to be the restriction of S to rows and columns whose indices appear in J. Furthermore, for a set \(J = \{ i_1, \dots , i_{|J|} \} \subseteq [n]\) and a matrix \(S \in \mathbb {R}^{ J \times J}\) we define the \(n \times n\) matrix \(S_J^{\rightarrow n}\) as follows for \(i,j \in [n]\)
In other words, \(S_J^{\rightarrow n}\) has \(S_J\) as principal sub-matrix indexed by J, and zeros elsewhere. Now, to write a program over \(\textrm{FW}_n(k)\) as an SDP note the following observation. It is easy to see that, for any \(X \in \textrm{FW}_n(k)\), we have
for suitable \(Y_J \in \mathbb {S}^k_+\) indexed by \(\mathcal {J}\). Thus, we can write
as
It is straightforward to show that the dual cone is given by
The dual cone has been studied in the context of semidefinite optimization in [8], where it was shown that the distance between \(\text {FW}_n(k)^*\) and \(\mathbb {S}^n_+\) in the Frobenius norm can be upper bounded by \(\frac{n-k}{n+k-2}\) for matrices of trace 1. For \(k\ge 3n/4\) and \(n\ge 97\) this bound can be improved to \(O(n^{-3/2})\) (see [8]).
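To make the passage from \(k \times k\) blocks to an element of \(\textrm{FW}_n(k)\) concrete, here is a minimal sketch of the embedding \(S_J^{\rightarrow n}\) and of the sum over all index sets. The function names `embed` and `psi` are hypothetical, indices are 0-based rather than 1-based, and the blocks are random psd matrices chosen purely for illustration.

```python
import numpy as np
from itertools import combinations

def embed(S, J, n):
    """Place the |J| x |J| matrix S into an n x n zero matrix at rows/columns J."""
    out = np.zeros((n, n))
    out[np.ix_(J, J)] = S
    return out

def psi(blocks, n):
    """Sum of the embedded blocks; blocks maps index tuples J to k x k matrices."""
    return sum(embed(Y, J, n) for J, Y in blocks.items())

rng = np.random.default_rng(1)
n, k = 4, 2
blocks = {}
for J in combinations(range(n), k):
    G = rng.standard_normal((k, k))
    blocks[J] = G @ G.T  # random psd k x k block

X = psi(blocks, n)
min_eig = np.linalg.eigvalsh(X).min()
print(min_eig >= -1e-10)  # a sum of embedded psd blocks is psd
```

Each embedded block is psd, so their sum is psd as well, which is the inclusion \(\textrm{FW}_n(k) \subseteq \mathbb {S}^n_+\) in numerical form.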
2 Interior Point Methods and the Central Path
Interior point methods (IPMs) are among the most commonly used algorithms to solve conic optimization problems in practice. Notable IPM software includes Mosek [16], CSDP [9], SDPA [12, 22], SeDuMi [20] and SDPT3 [21]. In the remainder of this section, we will closely follow the notation used in [18], since we will make use of several results from this book. Consider the following conic optimization problem for a proper convex cone \(\mathcal {K} \subset \mathbb {R}^n\):
In IPMs the cone membership constraint is replaced by adding a convex penalty function f to the objective. This function f is a so-called self-concordant barrier function. Loosely speaking, the function f returns larger values the closer the input is to the boundary of the cone and tends to infinity as the boundary is approached. In order to formally define self-concordant barrier functionals, let \(f : \mathbb {R}^n \supset D_f \rightarrow \mathbb {R}\) be such that its Hessian H(x) is positive definite (pd) for all \(x \in D_f\). With respect to this function, we can define a local inner product as follows
\[ \langle u, v \rangle _x := \langle u, H(x) v \rangle , \]
where \(u,v \in \mathbb {R}^n\) and \(\langle \cdot , \cdot \rangle \) is some reference inner product. Let \(B_x(y,r)\) be the open ball centered at y with radius \(r>0\) whose radius is measured by \(\Vert \cdot \Vert _x\), i.e., the norm arising from the local inner product at x.
Definition 1
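(see [18, Section 2.2.1]) A functional f is called (strongly non-degenerate) self-concordant if for all \(x \in D_f\) we have that \(B_x(x,1) \subset D_f\) and whenever \(y \in B_x(x,1)\) we have
\[ 1 - \Vert y - x \Vert _x \le \frac{\Vert v \Vert _y}{\Vert v \Vert _x} \le \frac{1}{1 - \Vert y - x \Vert _x} \quad \text {for all } v \ne 0. \]
A functional f is called a self-concordant barrier functional if f is self-concordant and additionally satisfies
\[ \vartheta _f := \sup _{x \in D_f} \Vert H(x)^{-1} g(x) \Vert _x^2 < \infty , \]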
where g(x) is the gradient of f.
We refer to \(\vartheta _f\) as the complexity value of f (see [18, p. 35]), which will become crucial in our complexity analysis. Henceforth, let f be a self-concordant barrier functional for \(\mathcal {K}\) and consider the following family of problems for positive \(\eta \in \mathbb {R}_+\)
The minimizers \(z_\eta \) of (8) define a curve, parametrized by \(\eta \) in the interior of \(\mathcal {K}\). This curve is called the central path. For \(\eta \rightarrow \infty \) one can show that \(z_\eta \rightarrow x^*\), where \(x^*\) denotes an optimal solution. Interior point methods work by successively approximating a sequence of points \(\{z_{\eta _i} : i = 1, \dots , N\}\) on the central path, where \(\eta _1< \eta _2 < \cdots \) such that \(z_{\eta _N}\) is within the desired distance to the optimal solution. The type of interior point method we consider is an adaptation of the (primal) predictor-corrector method (see [18, § 2.4.4]). This method uses the ordinary affine scaling direction to produce a new point inside the cone with decreased objective value. Afterwards, a series of corrector steps is performed to obtain feasible solutions with the same objective value that lie increasingly close to the central path. Interior point methods typically rely on Newton’s method in each step, where the convergence rate depends on the so-called Newton decrement.
Definition 2
If \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) has a gradient g(x) and positive definite Hessian \(H(x) \succ 0\) at a point x in its domain, then the Newton decrement of f at x is defined as
\[ \Delta (f,x) := \sqrt{\langle g(x), H(x)^{-1} g(x) \rangle }. \]
For self-concordant functions f, a sufficiently small value of \(\Delta (f,x)\), e.g., \(\Delta (f,x) < 1/9\), implies that x is close to the minimizer of f (cf. [18, Theorem 2.2.5]).
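As an illustration, the Newton decrement can be evaluated in closed form for a toy problem. The sketch below assumes the standard expression \(\sqrt{\langle g, H^{-1} g \rangle }\) and uses the logarithmic barrier of the non-negative orthant (the \(\textrm{FW}_n(1)\) case) without affine constraints; the function \(f_\eta (x) = \eta \langle c, x \rangle - \sum _i \log x_i\), the vector `c` and the value of `eta` are assumptions made for this example only.

```python
import numpy as np

def newton_decrement(g, H):
    """sqrt(<g, H^{-1} g>) for a gradient g and a positive definite Hessian H."""
    return float(np.sqrt(g @ np.linalg.solve(H, g)))

def grad_hess(x, c, eta):
    """Gradient and Hessian of f_eta(x) = eta*<c, x> - sum(log x_i) on x > 0."""
    return eta * c - 1.0 / x, np.diag(1.0 / x**2)

c = np.array([1.0, 2.0, 4.0])
eta = 10.0
z_eta = 1.0 / (eta * c)  # the gradient vanishes here, so this is the minimizer

d_center = newton_decrement(*grad_hess(z_eta, c, eta))      # on the central path
d_off = newton_decrement(*grad_hess(1.5 * z_eta, c, eta))   # away from it
print(d_center, d_off)
```

In this toy setting the minimizers \(z_\eta = 1/(\eta c)\) trace out the central path and tend to the optimal solution 0 as \(\eta \) grows, while the decrement grows as the point moves away from the path.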
Suppose we are given a starting point \(x_0\), which is close to \(z_{\eta _0}\) for some \(\eta _0 \in \mathbb {R}\). The affine-scaling direction is given by \(- H(x_0)^{-1}c\) and points approximately tangential to the central path in the direction of decreasing the objective value \(\langle c, x \rangle \) (\(-H^{-1}(z_{\eta _0})c\) is exactly tangential to the central path). The predictor step moves from \(x_0\) a fixed fraction \(\sigma \in (0,1)\) of the distance towards the boundary of the feasible set in the affine-scaling direction, thereby producing a new point \(x_1\) satisfying \(\langle c, x_1 \rangle < \langle c, x_0 \rangle \). The new point \(x_1\) is not necessarily close to the central path. The algorithm then proceeds to produce a sequence of feasible points \(x_2, x_3, \dots \) satisfying \(\langle c, x_1\rangle = \langle c , x_i \rangle \) for \(i = 2, 3, \dots \) while each \(x_i\) for \(i = 2, 3, \dots \) is closer to the central path than its predecessor \(x_{i-1}\). In other words, the algorithm targets the point \(z_{\eta _1}\) on the central path with the same objective value as \(x_1\) and produces a sequence of points converging to \(z_{\eta _1}\). Once an \(x_j\) is found such that \(\Delta (f, x_j) < 1/9\), the next predictor step is taken. This procedure is repeated until an \(\varepsilon \)-optimal solution is found. The corrector phase works by minimizing the self-concordant barrier restricted to the feasible affine space intersected with the set of all \(x \in \mathbb {R}^n\) such that \(\langle c, x\rangle = \langle c, x_i\rangle \), where \(x_i\) is the point produced by the most recent predictor step. This minimization problem is solved iteratively by performing line searches along the direction given by the Newton step for the restricted functional. We provide a visualization of the predictor-corrector method in Fig. 1.
2.1 Newton Decrements for Functions Restricted to Subspaces
If a self-concordant function f is restricted to a (translated) linear subspace L, and denoted by \(f_{\vert L}\), then the Newton decrement at x becomes
where \(\Vert \cdot \Vert _x\) is the norm induced by the inner product \(\langle u,v\rangle _x = \langle u, H(x)v \rangle \), and \(P_{L,x}\) is the orthogonal projection onto L for the \(\Vert \cdot \Vert _x\) norm; see [18, § 1.6].
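A minimal sketch of how the restricted Newton step and its decrement might be computed when the translated subspace is given as \(\{x : Ax = b\}\). Solving the KKT system below is one standard way to obtain the projection of the Newton step onto the null space of A in the local norm; the function name `restricted_newton`, the barrier and the constraint are illustrative assumptions.

```python
import numpy as np

def restricted_newton(g, H, A):
    """Newton step and decrement of f restricted to {x : A x = b}.

    Solves the KKT system  H d + A^T y = -g,  A d = 0.
    """
    m = A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
    d = sol[: H.shape[0]]
    return d, float(np.sqrt(d @ H @ d))

x = np.array([0.2, 0.3, 0.5])
g, H = -1.0 / x, np.diag(1.0 / x**2)  # barrier -sum(log x_i) at x
A = np.ones((1, 3))                   # hypothetical constraint: sum(x) = 1

d, dec = restricted_newton(g, H, A)
d_free = -np.linalg.solve(H, g)       # unrestricted Newton step
dec_free = float(np.sqrt(d_free @ H @ d_free))
print(dec, dec_free)
```

Because the restricted step is an orthogonal projection in the local norm, the restricted decrement never exceeds the unrestricted one.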
Note that we have
where n(x) is the Newton step at x, i.e., \(n(x)= -H(x)^{-1}g(x)\). Hence, restricting the function f to a subspace L we find
2.2 A Predictor-Corrector Method Using FW(k)
In this subsection we propose our algorithm which makes use of the rescaling introduced in Section 1.1; see Algorithm 1 below. Our aim is to provide a comprehensible exposition, while the details are postponed to the second part of the paper, beginning with Section 3.
Algorithm 1 is an adaptation of the predictor-corrector method as described in [18, Section 2.2.4]. Before describing the algorithm in detail we fix some notation. Let \(\mathcal {Y} \in \mathbb {S}^{(n,k)}\) be a collection of \({{n}\atopwithdelims (){k}}\) matrices of size \(k \times k\). We define the operator \(\Psi \) as
where we made use of the notation defined in (5). Hence, if \(\mathcal {Y}\) is a collection of positive semidefinite \(k\times k\) matrices, then \(\Psi (\mathcal {Y}) \in \text {FW}_n(k)\). Furthermore, let
so that \(\Psi (\mathcal {Y}_0) = I\). Now let \(X_\ell \) be a strictly feasible solution to a problem of form (1) and rescale the data matrices with respect to \(X_\ell \). Recall that the feasible set of the resulting SDP is contained in the following affine space
Likewise, the feasible set of the factor width relaxation written over \(\mathbb {S}^{(n,k)}_+\) (cf. (7)) is constrained to lie in the affine space
Note that \(I \in L_\ell \) and \(\mathcal {Y}_0 \in L^{\Psi }_\ell \). We emphasize that, by definition, for any element \(\mathcal {Y} \in L_{\ell }^{\Psi }\) we have \(\Psi (\mathcal {Y}) \in L_\ell \).
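One natural collection that is mapped to the identity is the scaled identity in every block. The sketch below assumes \(\Psi \) sums the embedded blocks over all k-element index sets and takes each block equal to \(I_k\) divided by \({{n-1}\atopwithdelims (){k-1}}\); this scaling is our reading of the requirement \(\Psi (\mathcal {Y}_0) = I\), and the names `psi` and `Y0` are hypothetical.

```python
import numpy as np
from itertools import combinations
from math import comb

def psi(blocks, n):
    """Add every k x k block into the n x n matrix at its index set J."""
    X = np.zeros((n, n))
    for J, Y in blocks.items():
        X[np.ix_(J, J)] += Y
    return X

n, k = 6, 3
scale = 1.0 / comb(n - 1, k - 1)  # each index lies in comb(n-1, k-1) subsets
Y0 = {J: scale * np.eye(k) for J in combinations(range(n), k)}
X0 = psi(Y0, n)
print(np.allclose(X0, np.eye(n)))
```

Off-diagonal entries vanish because every block is diagonal, and each diagonal entry collects the same number of equal contributions.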
2.3 Main Method
The algorithm requires a feasible starting point \(X_0\) close to the central path, which is used in the first rescaling step. We also require an \(\varepsilon > 0\), i.e., our desired accuracy, as well as a \(\sigma \in (0,1)\) used in the predictor step. In the following let \(f^{\text {FW}(k)}\) be a self-concordant barrier function for \(\mathbb {S}^{(n,k)}_+\) (we postpone its derivation to Section 3; for now we assume it exists and is efficiently computable). In the algorithm we denote the restriction of \(f^{\text {FW}(k)}\) to the subspace \(\text {null}(\mathcal {A}^{(\ell )}\circ \Psi )\) by \(f^{\text {FW}(k)}_{\vert \text {null}(L^{\Psi }_\ell )}\). The algorithm initializes \(\ell = 0\). The outer while loop repeats until an \(\varepsilon \)-optimal solution is found. If after rescaling with respect to \(X_\ell \) the Newton decrement at \(\mathcal {Y}_0\) satisfies
the predictor subroutine is called. Here, the affine-scaling direction is projected onto the null space of \(L_\ell ^{\Psi }\), call it \(\mathcal {Z}\). Clearly, \(\mathcal {Y}_0 + s \mathcal {Z} \in L_\ell ^{\Psi }\) for all \(s \in \mathbb {R}\). Then the subroutine computes
which provides the necessary notion of distance to the boundary in terms of \(\mathcal {Y}_0\) and \(\mathcal {Z}\). The returned point \(\mathcal {Y}_\ell := \mathcal {Y}_0 + \sigma s^*\mathcal {Z}\) is feasible and decreases the objective value, as shown in Section 5.
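The step length to the boundary of a product of psd cones can be obtained from an eigenvalue computation per block. The following is a sketch under the assumption that \(s^*\) is the largest \(s \ge 0\) for which every block of \(\mathcal {Y}_0 + s \mathcal {Z}\) remains psd; the function names and the small example blocks are illustrative and not taken from the algorithm's formal statement.

```python
import numpy as np

def max_step_psd(Y, Z):
    """sup{s >= 0 : Y + s Z is psd} for Y positive definite and Z symmetric."""
    L = np.linalg.cholesky(Y)
    Linv = np.linalg.inv(L)
    # Y + s Z = L (I + s L^{-1} Z L^{-T}) L^T, so only the smallest eigenvalue matters.
    lam_min = np.linalg.eigvalsh(Linv @ Z @ Linv.T).min()
    return np.inf if lam_min >= 0 else -1.0 / lam_min

def max_step(blocks_Y, blocks_Z):
    """Smallest per-block step length over a product of psd cones."""
    return min(max_step_psd(Y, Z) for Y, Z in zip(blocks_Y, blocks_Z))

Y_blocks = [np.eye(2), 2.0 * np.eye(2)]
Z_blocks = [np.diag([-2.0, 1.0]), -np.eye(2)]
s_star = max_step(Y_blocks, Z_blocks)

sigma = 0.9  # fraction of the distance to the boundary
moved = [Y + sigma * s_star * Z for Y, Z in zip(Y_blocks, Z_blocks)]
print(s_star, all(np.linalg.eigvalsh(M).min() > 0 for M in moved))
```

Taking a fraction \(\sigma < 1\) of the step keeps every block strictly positive definite, so the new point stays in the interior of the cone product.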
If the Newton decrement is not small enough, the corrector subroutine is called. Let \(v_\ell = \langle A_0, X_\ell \rangle \), i.e., the objective value of the previous iteration, and define
Let \(x_0 := \mathcal {Y}_0\). Denote by \(n_{\vert L^{\Psi }_\ell (v_{\ell })}(x_i)\) the Newton step of \(f^{\text {FW}(k)}_{\vert L^{\Psi }_\ell (v_{\ell })}\) at a point \(x_i\). The corrector step now computes
until \(x_{i+1}\) is close enough to the central path of the rescaled problem over \(\mathbb {S}^{(n,k)}_+\) and returns \(\mathcal {Y}_\ell := x_{i+1}\). We will prove in Section 4 how this leads to a decrease in distance to the central path of the original SDP. Note that multiple calls of the corrector step may be necessary as after rescaling the Newton decrement might not be small enough anymore. However, as we prove later on, the maximum number of corrector steps can be bounded in terms of the problem data. Let \(\mathcal {Y}_\ell \) be the point returned by one of the subroutines. We set
Then
for all \(i = 0,1, \dots , m\).
2.4 Termination Criterion
In the predictor as well as in the corrector subroutine we solve a linear system for \(y \in \mathbb {R}^m\). The solution of this linear system may be interpreted as a dual feasible solution provided the current iterate is sufficiently close to the central path. Hence, we can approximate the duality gap of our problem by calculating the difference
where y is calculated in every subroutine call. We may use this as a termination criterion. Once this quantity falls below some \(\varepsilon > 0\) chosen beforehand, we terminate with an \(\varepsilon \)-optimal solution.
3 Barrier Functionals for \(\mathbb {S}^{n}_+\) and \(\text {FW}_n(k)\)
In this section we derive the self-concordant barrier functional for the cone \(\mathbb {S}^{(n,k)}_+\) which is used in the algorithm. Note that the ordinary self-concordant barrier for \(\mathbb {S}^n_+\) is given by \(f^{\text {SDP}}(X) = - \log (\det (X))\). We will emphasize parallels to the work of Roig-Solvas and Sznaier [19].
In order to construct a self-concordant barrier function for our underlying set, we introduce the notions of hyper-graphs and edge colorings as well as a well-known result about these objects.
Definition 3
A hyper-graph \(\mathcal {H} = (V,E)\) consists of a set \(V = \{1, \dots , n\}\) of vertices and a set of hyper-edges \(E \subseteq \{ J \subseteq V : |J| \ge 2\}\), which are subsets of the vertex set V. If all elements in E contain exactly k vertices, we call the corresponding hyper-graph k-uniform.
Definition 4
Let \(\mathcal {H} = (V,E)\) be a hyper-graph. A proper hyper-edge coloring with m colors is a partition of the hyper-edge set E into m disjoint sets (color classes), say \(E= \cup _{i \in [m]} S_i\) such that \(S_i \cap S_j = \emptyset \) if \(i \ne j\), and two hyper-edges that share a vertex are not in the same color class. In other words, a proper hyper-edge coloring assigns a color to every hyper-edge such that, if a given vertex appears in two different hyper-edges, they have different colors.
Theorem 1
(Baranyai’s theorem [6]) Let \(k,n \in \mathbb {N}\) be such that \(k \ge 2\) and \(k \vert n\), and let \(K^n_k\) be the complete k-uniform hyper-graph on n vertices. Then \(K^n_k\) has a proper hyper-edge coloring using \({{n-1}\atopwithdelims (){k-1}}\) colors.
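For \(k = 2\) the theorem reduces to the classical fact that the complete graph on an even number of vertices has a proper edge coloring with \(n-1\) colors. The sketch below constructs such a coloring with the round-robin (circle) method; it covers only this special case, not general k, and the function name is an assumption made for the example.

```python
def round_robin_coloring(n):
    """Proper edge coloring of the complete graph K_n (n even) with n-1 colors.

    Circle method: vertex n-1 stays fixed while the other vertices rotate.
    Returns a list of n-1 color classes, each a perfect matching.
    """
    if n % 2:
        raise ValueError("n must be even")
    classes = []
    for r in range(n - 1):
        matching = [frozenset({n - 1, r})]
        for j in range(1, n // 2):
            a = (r + j) % (n - 1)
            b = (r - j) % (n - 1)
            matching.append(frozenset({a, b}))
        classes.append(matching)
    return classes

classes = round_robin_coloring(6)
all_edges = {e for matching in classes for e in matching}
print(len(classes), len(all_edges))
```

Each color class is a set of pairwise disjoint edges covering all vertices, which is exactly the block-diagonal structure exploited later when the blocks of one color class are summed.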
In (7) we wrote a program over \(\text {FW}_n(k)\) as an equivalent program over the cone product \(\mathbb {S}^{(n,k)}_+\). The algorithm uses a self-concordant barrier function over said cone product. The mapping \(\Psi \) from \(\mathbb {S}_+^{(n,k)}\) to \(\text {FW}_n(k)\) is surjective, but not bijective, since multiple elements in the former may give rise to the same element in the latter set.
Assumption 1
Throughout we will assume \(k \vert n\) for some given \(n \in \mathbb {N}\) and \(2 \le k \in \mathbb {N}\).
This assumption is not without loss of generality, but if \(k \not \mid n\) one can always border the data matrices of the SDP problem (3) with \(k - (n \bmod k)\) extra rows and columns in a suitable way to ensure the assumption holds.
In the following we will let \(\mathcal {Y} \in \mathbb {S}^{(n,k)}\) be a collection of \({{n}\atopwithdelims (){k}}\) matrices of size \(k \times k\). We recall the operator \(\Psi \) is defined as
The following generalizes Lemma 4.4 in [19], where a similar result is proved for \(k=2\). It will be crucial in our analysis as it allows us to compare the values taken by the barrier functionals on \(\mathbb {S}^{(n,k)}_+\) and \(\mathbb {S}^n_+\) at \(\mathcal {Y}\) and \(\Psi (\mathcal {Y})\), respectively. In particular, it will allow us to bound the reduction in the SDP barrier function in terms of the reduction of the barrier for FW(k).
Lemma 2
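Let
\[ f^{\textrm{FW}(k)}(\mathcal {Y}) := - \sum _{J \in \mathcal {J}} \log (\det (Y_J)) \quad \text {for } \mathcal {Y} = (Y_J)_{J \in \mathcal {J}} \in \textrm{int}\left( \mathbb {S}^{(n,k)}_+ \right) . \]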
The barrier \(f^{\textrm{FW}(k)}(\mathcal {Y})\) is self-concordant on \(\textrm{int}\left( \mathbb {S}^{(n,k)}_+ \right) \). Furthermore, if \(X = \Psi (\mathcal {Y})\) then
Let us emphasize here that \(f^{\textrm{FW}(k)}\) is a self-concordant barrier for \(\mathbb {S}^{(n,k)}_+\), not for \(\text {FW}_n(k)\). Before proving Lemma 2 we need an auxiliary result which extends Lemma A.1 from [19] to general values of k such that \(k\vert n\). To prove it we will make use of Theorem 1.
Lemma 3
Consider a \(\mathcal {Y} = (Y_J) \in \mathbb {S}^{(n,k)}\) consisting of positive definite \(k \times k\) matrices and let \(\Psi (\mathcal {Y}) \in \textrm{FW}_n(k)\). Then there exists a set of \({{n-1}\atopwithdelims (){k-1}}\) matrices \(Z_i \succ 0\) of size \(n \times n\) such that \(\Psi (\mathcal {Y}) = \sum _{i = 1}^{{{n-1}\atopwithdelims (){k-1}}}Z_i\) and \(f^{\textrm{FW}(k)}(\mathcal {Y})=- \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}} \log (\det (Z_i))\).
Proof
Let \(K^n_k\) be the complete k-uniform hyper-graph on n vertices. We can identify each hyper-edge \(\{i_1,i_2,\dots , i_k\}\subset [n]\) in \(K^n_k\) with exactly one element \(Y_J \in \mathcal {Y}\), namely the one where \(\{i_1,i_2,\dots , i_k\} =J\). Let \(\left\{ S_1, \dots , S_{{{n-1}\atopwithdelims (){k-1}}}\right\} \) be the color classes of a proper hyper-edge coloring of \(K^n_k\), which exists by Theorem 1. Define \(\mathcal {Y}_i := \{Y_J : J \in S_i \}\) and set \(Z_i := \Psi (\mathcal {Y}_i)\). Then \(\Psi (\mathcal {Y}) = \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}}Z_i\) since \(S_i \cap S_j = \emptyset \) for \(i \ne j\) and \(\cup _{i} S_i = \mathcal {J}\). Moreover, since each \(S_i\) corresponds to disjoint index sets in \(\mathcal {J}\), there exists a permutation matrix \(P_i\) for every \(i = 1, \dots , {{n-1}\atopwithdelims (){k-1}}\) such that \(P_i Z_i P_i^T\) is a block-diagonal matrix with blocks \(Y_J\) on the diagonal for \(J \in S_i\). This shows that \(Z_i \succ 0\).
From this we find
Hence,
completing the proof. \(\square \)
We continue to prove Lemma 2.
Proof of Lemma 2
The self-concordance of \(f^{\text {FW}(k)}\) on \(\text {int}\big (\mathbb {S}^{(n,k)}_+ \big )\) follows immediately from the self-concordance of \(-\log \det (\cdot )\) on \(\text {int}\left( \mathbb {S}^k_+\right) \). By assumption \(X = \Psi (\mathcal {Y}) = \sum _{i=1}^{{{n-1}\atopwithdelims (){k-1}}}Z_i \in \text {FW}_n(k)\). Therefore,
where the inequality follows by convexity of the function \(-\log \det (\cdot )\) on int\(\left( \mathbb {S}^n_+\right) \). Hence, we find
and the claim follows. \(\square \)
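The convexity argument can be checked numerically. The sketch below assumes \(f^{\textrm{FW}(k)}(\mathcal {Y}) = -\sum _J \log \det (Y_J)\) and tests the bound obtained by applying convexity of \(-\log \det \) directly to the average of the \(Z_i\), namely \(f^{\textrm{SDP}}(X) \le f^{\textrm{FW}(k)}(\mathcal {Y})/N - n \log N\) with \(N = {{n-1}\atopwithdelims (){k-1}}\). This form of the bound is our own reading of the proof and may be arranged differently from the lemma's statement; all names and the random blocks are illustrative.

```python
import numpy as np
from itertools import combinations
from math import comb, log

def psi(blocks, n):
    """Add every k x k block into the n x n matrix at its index set J."""
    X = np.zeros((n, n))
    for J, Y in blocks.items():
        X[np.ix_(J, J)] += Y
    return X

def f_fw(blocks):
    """Sum of -log det over all blocks (assumed form of the FW(k) barrier)."""
    return -sum(np.linalg.slogdet(Y)[1] for Y in blocks.values())

rng = np.random.default_rng(2)
n, k = 4, 2                      # k divides n, as required by Assumption 1
N = comb(n - 1, k - 1)
blocks = {}
for J in combinations(range(n), k):
    G = rng.standard_normal((k, k))
    blocks[J] = G @ G.T + 0.1 * np.eye(k)  # positive definite block

X = psi(blocks, n)
f_sdp = -np.linalg.slogdet(X)[1]
bound = f_fw(blocks) / N - n * log(N)
print(f_sdp <= bound + 1e-9)
```

For the scaled identity blocks \(I_k/N\) both sides equal zero, so the bound is tight at that point.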
The following corollary is analogous to Corollary 4.5 from [19].
Corollary 4
If
then
Proof
The first statement follows when noting that each \(i \in [n]\) lies in exactly \(\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) \) subsets of [n] of size k. The reason is that when fixing i, there are \(n-1\) elements left out of which we want to choose \(k-1\) more elements to make a set of size k. For the second statement note that
The result follows when noting that \(k \left( {\begin{array}{c}n\\ k\end{array}}\right) = n {{n-1}\atopwithdelims (){k-1}}\). \(\square \)
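Both counting facts used in this proof are easy to confirm by enumeration. The following small sketch, with a hypothetical helper `membership_counts` and 0-based indices, counts how many k-element subsets contain each index and checks the binomial identity.

```python
from itertools import combinations
from math import comb

def membership_counts(n, k):
    """For each index i, count the k-element subsets of {0, ..., n-1} containing i."""
    counts = [0] * n
    for J in combinations(range(n), k):
        for i in J:
            counts[i] += 1
    return counts

n, k = 6, 3
counts = membership_counts(n, k)
print(counts, comb(n - 1, k - 1))
print(k * comb(n, k) == n * comb(n - 1, k - 1))
```

Summing the counts over all indices counts every subset k times, which is another way to see the identity.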
4 Further Properties of the Barrier Functions
To prove convergence of our algorithm we need two essential ingredients. First, we need to prove that the predictor step reduces the current objective value sufficiently, and secondly, we must prove that the corrector step converges to a point close to the central path. Moreover, we have to show that our criterion to decide which subroutine to call is valid. The issue here is that we compute the Newton decrement of \(f^{\text {FW}(k)}\) at \(\mathcal {Y}_0\), but we need to be able to assert that the Newton decrement of \(f^{\text {SDP}}\) at \(X_\ell \) is small enough.
The next result we present will allow us to lower bound the progress made by the corrector step. For this we need to be able to compare the barrier functions for \(\mathbb {S}^n_+\) and \(\mathbb {S}^{(n,k)}_+\). We assume we have a given feasible solution \(X_\ell \) such that \(\langle A^{(\ell )}_0, I \rangle = v\). Define the vector \(b(v) := (v, b_1, \ldots , b_m)^T\). For further reference, consider
which we would like to compare to
Suppose \(\mathcal {Y}^*\) is an approximate solution to (14). Defining
we find that \(X_{\ell } \in \mathcal {F}_{\textrm{SDP}}\) for all \(\ell \). In other words, the points \(X_\ell \) we obtain via this procedure are all feasible for the original SDP (1). The following lemma allows us to lower bound the decrease achieved by one corrector step in terms of an element in \(\mathbb {S}^{(n,k)}_+\).
Lemma 5
Let \(\mathcal {Y}^*\) be a feasible solution to (14) and \(\mathcal {Y}_0\) as in (10). Furthermore, let \(X_{\ell +1} = X_{\ell }^{1/2}\Psi (\mathcal {Y}^*)X_{\ell }^{1/2}\) for \(X_\ell \) a feasible solution. Then
Proof
The proof follows immediately when noting that
\(\square \)
4.1 Relation of the Newton Decrements
In this subsection we will prove that we can upper bound the Newton decrement of \(f^{\textrm{SDP}}\) at the identity in terms of the Newton decrement of \(f^{\textrm{FW}(k)}\) at \(\mathcal {Y}_0\). We now define the following operator
via
where \(\circ \) denotes the Hadamard product. See Fig. 2 for a visualization of the surjection from \(\mathbb {S}^{(n,k)}_+\) to \(\textrm{FW}_n(k)\).
This operator satisfies
An inner product on \(\mathbb {S}^{(n,k)}\) is given by
It is straightforward to verify the following relation between the norms induced by this inner product, and the Frobenius norm on \(\mathbb {S}^n\).
Lemma 6
For any \(X \in \mathbb {S}^n\) we have
Suppose now \(X_\ell \) is a feasible solution to (4) such that \(\langle A_0, X_\ell \rangle = v\). We define the two subspaces
and
We may also add an equality for the objective, in which case we will refer to the following operator
The respective subspaces will be denoted as follows
and
where \(b(v) := (v, b_1, \ldots , b_m)^T\).
When we consider the subspaces defined via the operator with respect to the initial data matrices, we omit the subscript \(\ell \), e.g.,
The following lemma corresponds to Lemma A.2 in [19], and allows us to bound the Newton decrement of \(f^{\textrm{SDP}}_{\vert L}\) in terms of \(f^{\textrm{FW}(k)}_{\vert L}\).
Lemma 7
Assume \(\mathcal {Y}_0 \in L^{\Psi }\) and \(I \in L\). At \(\mathcal {Y}_0\) one has
Proof
Following (9) we have
Choosing \(d = \Psi ^{\dagger }(n^{\textrm{SDP}}_L(X)) \in L\) leads to
and evaluating the expression at \(\mathcal {Y}_0\) we find
where the second inequality follows from Lemma 6. Setting \(X = I\) and noting
we conclude
because
\(\square \)
5 Complexity Analysis
We begin the complexity analysis with the following lemma, which helps us to check whether the current point is close enough to the central path of the SDP.
Lemma 8
Let \(X_{\ell }\) be a feasible iterate for the SDP (13) and let the objective value at \(X_{\ell }\) be v, i.e., \(\langle A_0, X_\ell \rangle = v\). Define the two subspaces \(L^{\Psi }_{\ell }(v)\), \(L_\ell \) as in (15), (11) respectively. Then, if
one has
where
and \(\eta _v\) is the value of the central path parameter that corresponds to the objective value v.
Proof
By Lemma 7 we know that
Let now z(v) be the point on the central path of the rescaled SDP with objective value v and let the corresponding parameter be \(\eta _v\). By Theorem 2.2.5 from [18] we have
Let \(X_+\) be the point returned by taking a Newton step at \(X = I\) with respect to the function \(f^{\textrm{SDP}}_{\eta _v}\) restricted to \(L_\ell \). By Theorem 2.2.3 in [18] we have
and hence
\(\square \)
The Newton decrement of the rescaled SDP being smaller than 1/9 means we can safely perform the next predictor step. If the current point is too far away from the central path and one were to perform the predictor step, the direction might not be approximately tangential to the central path. Hence, once the Newton decrement of the factor width program is small enough, so is the one of the SDP, and we can perform the next predictor step knowing the direction will be approximately tangential to the central path. After each predictor step we may have to take several corrector steps to get back close to the central path.
5.1 Corrector Step
We will now find an upper bound on the number of corrector steps needed to get close to the central path. We know from Lemma 5 that a decrease in the barrier for the factor width cone will lead to a decrease in the barrier function for our original SDP, meaning we made progress towards its central path. The following lemma asserts that if we are too far away from the central path we can attain at least a constant reduction in the barrier of the factor width cone and therefore obtain a constant reduction in the SDP barrier as well.
Lemma 9
Let \(X_{\ell }\) be a feasible iterate for the SDP (13) and let the objective value at \(X_{\ell }\) be v. Define the subspace \(L^{\Psi }_{\ell }(v)\) as in (15). If
then
Proof
If \(\Delta \left( f^{\textrm{FW}(k)}_{ \vert L^{\Psi }_{\ell }(v)}, \mathcal {Y}_0\right) > \frac{1}{14}\) the corrector step will employ a line search to find \(\mathcal {Y}^{*}\), i.e. the point in \(L^{\Psi }_{\ell }(v)\) that minimizes \(f^{\textrm{FW}(k)}\). Let \(n_{L^{\Psi }_{\ell }(v)}(\mathcal {Y}_0)\) be the Newton step taken from \(\mathcal {Y}_0\) and let \(t = \frac{1}{8 \Vert n_{L^{\Psi }_{\ell }(v)}(\mathcal {Y}_0) \Vert _{(n,k),\mathcal {Y}_0}}\), where the norm in the denominator is the local norm at \(\mathcal {Y}_0\) induced by \(\langle \cdot , \cdot \rangle _{(n,k)}\). Then, for
we find by Theorem 2.2.2 in [18]
\(\square \)
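As a sanity check on the constant, the value \(\frac{1}{2688}\) that is used in the proof of Lemma 11 can be recovered by a short calculation. The sketch below assumes that the estimate invoked is the quadratic-approximation bound of [18, Theorem 2.2.2], namely \(\vert f(y) - q_x(y) \vert \le \frac{\Vert y-x \Vert _x^3}{3(1-\Vert y-x \Vert _x)}\) for \(\Vert y-x \Vert _x < 1\), and that the Newton decrement equals the local norm of the Newton step. The symbols f (the restricted barrier), \(\delta \) (the local norm of the Newton step at \(\mathcal {Y}_0\)), \(\mathcal {Y}_t\) (the point reached with step length t) and q (the quadratic model of f at \(\mathcal {Y}_0\)) are shorthand introduced here only. With \(t = \frac{1}{8\delta }\) and \(\delta > \frac{1}{14}\),

\[
\Vert \mathcal {Y}_t - \mathcal {Y}_0 \Vert _{(n,k),\mathcal {Y}_0} = t\delta = \frac{1}{8}, \qquad q(\mathcal {Y}_t) = f(\mathcal {Y}_0) - t\delta ^2 + \frac{t^2}{2}\delta ^2 = f(\mathcal {Y}_0) - \frac{\delta }{8} + \frac{1}{128},
\]

\[
f(\mathcal {Y}_t) \le q(\mathcal {Y}_t) + \frac{(1/8)^3}{3(1-1/8)} = q(\mathcal {Y}_t) + \frac{1}{1344},
\]

\[
f(\mathcal {Y}_0) - f(\mathcal {Y}_t) \ge \frac{\delta }{8} - \frac{1}{128} - \frac{1}{1344} > \frac{1}{112} - \frac{1}{128} - \frac{1}{1344} = \frac{24-21-2}{2688} = \frac{1}{2688}.
\]

Since a line search along the Newton direction does at least as well as this fixed step length, the same lower bound would hold for the decrease it achieves.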
Note that this implies together with Lemma 5 that
Since each line search reduces the distance to the targeted point on the central path by at least a constant amount, we can bound the number of line searches needed to get close enough, provided we have an upper bound on the distance between the result of the predictor step and the corresponding point on the central path of the SDP.
Lemma 10
Let \(X_1\) be close to a point \(z(v_1)\) on the central path of the SDP in the sense that \(\Delta \left( f^{\textrm{SDP}}_{\vert L_{\ell }(v_1)}, X_1\right) \le \frac{1}{9}\). Furthermore, let \(X_2\) be the result of the predictor step and \(z(v_2)\) be the point on the central path with the same objective value as \(X_2\). Then
Proof
A proof of this statement for generic self-concordant barriers may be found on page 54 of [18]. We have used that the barrier parameter for the barrier of the psd cone is given by \(\vartheta _{f^{\textrm{SDP}}} = n\). \(\square \)
Lemma 11
Let \(v_2\) be the objective value of the result \(X_2\) of the predictor step. The maximum number K of line searches needed to find a point \(X_{K+2}\) which is close enough to \(z(v_2)\) in the sense that \(\Delta \left( f^{\textrm{SDP}}_{\vert L_{\ell }(v_2)}, X_{K+2}\right) \le \frac{1}{9}\) is
where \(z(v_2)\) is the point on the central path with objective value \(v_2\).
Proof
We know that the distance between the result of the predictor phase and the targeted point on the central path is at most \(n \left( \log \frac{1}{1-\sigma }\right) +\frac{1}{154}\) by Lemma 10. Moreover, using Lemmas 9 and 5 we find that in each corrector step we reduce this distance by at least \(\frac{1}{2688{{n-1}\atopwithdelims (){k-1}}}\), unless the SDP Newton decrement at I is already small enough to perform the next predictor step. If after rescaling the Newton decrement of the factor width program satisfies
thereby implying by Lemma 8 that I is not close to the central path of the SDP, we can perform another corrector step, yielding at least a constant decrease of \(\frac{1}{2688{{n-1}\atopwithdelims (){k-1}}}\) in the distance to the central path, and rescale again. This process can be continued until we no longer get such a constant decrease, at which point we know we must be close enough to the central path in the sense of Lemma 8. This is because, if the decrease is not greater than \(\frac{1}{2688{{n-1}\atopwithdelims (){k-1}}}\), the Newton decrement cannot satisfy
from which follows by Lemma 8 that
This implies we are close enough to the central path to perform the next predictor step. Hence, after at most
corrector steps we are close enough to the central path so that we can perform the next predictor step.
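To get a feel for the size of this bound, the two quantities quoted in the proof can be combined numerically. The sketch below simply divides the initial distance \(n \log \frac{1}{1-\sigma } + \frac{1}{154}\) by the per-step decrease \(\frac{1}{2688{{n-1}\atopwithdelims (){k-1}}}\); the function names are hypothetical, and whether the displayed bound contains additional rounding or constants is not assumed here.

```python
from fractions import Fraction
from math import comb, log


def per_step_decrease(n: int, k: int) -> Fraction:
    """Guaranteed decrease per corrector step, 1 / (2688 * C(n-1, k-1))."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return Fraction(1, 2688 * comb(n - 1, k - 1))


def initial_distance(n: int, sigma: float) -> float:
    """Distance after a predictor step as quoted from Lemma 10."""
    if not 0.0 < sigma < 1.0:
        raise ValueError("sigma must lie in (0, 1)")
    return n * log(1.0 / (1.0 - sigma)) + 1.0 / 154.0


def corrector_step_estimate(n: int, k: int, sigma: float) -> float:
    """Initial distance divided by the per-step decrease (illustrative)."""
    return initial_distance(n, sigma) / float(per_step_decrease(n, k))
```

For fixed k the estimate grows like \(n^k\) up to constants, since \({{n-1}\atopwithdelims (){k-1}}\) is a polynomial of degree \(k-1\) in n and the initial distance is linear in n.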
5.2 Predictor Step
We will make use of the analysis of the short-step interior point method discussed in Section 2.4.2 in [18]. We will show that each predictor step reduces the objective value by at least as much as one iteration of the short-step interior point method. This will allow us to bound the maximum number of predictor steps needed to obtain an \(\varepsilon \) optimal solution of the given SDP. The decrease in objective value obtained by our predictor method is as follows. Let X be the point from where the predictor method starts and \(-(A_0)_X := -H(X)A_0\) be the direction. Then for \(\sigma \ge \frac{1}{4}\) and \(s^*\) as in (12) we find
This implies the decrease is at least as large as the one obtained in one iteration of the short-step method, as discussed in [18, §2.4.2]. Renegar’s analysis shows that the short-step method leads to an \(\varepsilon \) optimal solution in at most
steps, where \(\eta _0\) is such that our starting point \(X_0\) is close to \(z_{\eta _0}\). By an \(\varepsilon \) optimal solution we mean a feasible solution X such that
5.3 Predictor and Corrector Steps Combined
Combining the complexity analysis of predictor and corrector steps we arrive at the following theorem.
Theorem 12
Let \(X_0\) be a feasible solution of the SDP (1) and assume it is close to some point \(z_{\eta _0}\) on the corresponding central path in the sense that \(\Delta \left( f_{\vert _{L(v)}}^{\textrm{SDP}},X_0\right) <1/14\), where L is as in (16) for \(v = \langle A_0, X_0 \rangle \). Algorithm 1 converges to an \(\varepsilon \) optimal solution in at most
steps.
The assumption of a starting point “close to the central path” may be satisfied by the self-dual embedding strategy [11]. Alternatively, one may first solve an auxiliary SDP problem, as in [18, Section 2.4.2], by using the algorithm we have presented. The solution of this auxiliary problem then yields a point close to the central path of the original SDP problem.
6 Discussion and Future Prospects
We finish with a brief discussion on various topics surrounding Algorithm 1.
6.1 Replacing the Predictor Step
In their paper [19], the authors propose to perform a fixed number of decrease steps, where a decrease step consists of solving (6) and rescaling with respect to the optimal solution. In our algorithm we considered a different method to decrease the objective value, i.e., the predictor method, where we use the traditional SDP affine scaling direction.
6.2 Tractability of Factor Width Cones
Some recent ideas regarding factor width cones that could influence future research in this area are:
- the idea to optimize over the dual cone of \(\textrm{FW}_n(k)\) by utilizing clique trees [24];
- a variation on the factor width cone involving fewer blocks [25].
In addition, it would be very helpful to know a computable self-concordant barrier functional for the cone \(\textrm{FW}_n(k)\), as well as its complexity parameter.
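One reason the number of blocks matters can be seen by counting. Assuming the usual description from [7], in which a matrix of factor width at most k is a sum of positive semidefinite matrices each supported on a \(k\times k\) principal submatrix, the sketch below enumerates these index sets. The helper names are hypothetical. The count of blocks containing a fixed index equals \({{n-1}\atopwithdelims (){k-1}}\), the same binomial coefficient that appears in the per-step decrease in Lemma 11; whether this is exactly how the factor arises in Lemma 5 is not shown in this section.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def principal_blocks(n: int, k: int) -> List[Tuple[int, ...]]:
    """All index sets of k x k principal submatrices of an n x n matrix."""
    return list(combinations(range(n), k))


def blocks_containing(n: int, k: int, index: int) -> int:
    """Number of k-element index sets that contain the given index."""
    return sum(1 for block in combinations(range(n), k) if index in block)


# Small illustration: n = 5, k = 2 gives 10 blocks, 4 of which touch index 0.
n_example, k_example = 5, 2
total_blocks = len(principal_blocks(n_example, k_example))
touching_zero = blocks_containing(n_example, k_example, 0)
```

Because the total number of blocks is \({n\atopwithdelims ()k}\), it grows quickly with k, which is what variations with fewer blocks aim to mitigate.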
References
Ahmadi, A.A., Dash, S., Hall, G.: Optimization over structured subsets of positive semidefinite matrices via column generation. Discrete Optim. 24, 129–151 (2017)
Ahmadi, A.A., Hall, G.: Sum of squares basis pursuit with linear and second order cone programming. In: Harrington, H.A., Omar, M., Wright, M. (eds.) Algebraic and Geometric Methods in Discrete Mathematics. Contemporary Mathematics, vol. 685, pp. 27–53. American Mathematical Society, Providence, RI (2017)
Alizadeh, F., Haeberly, J.-P.A., Overton, M.L.: Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results. SIAM J. Optim. 8, 746–768 (1998)
Ahmadi, A.A., Majumdar, A.: DSOS and SDSOS optimization: LP and SOCP-based alternatives to sum of squares optimization. 2014 48th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, pp. 1–5. IEEE (2014)
Ahmadi, A.A., Majumdar, A.: DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. SIAM J. Appl. Algebra Geom. 3, 193–230 (2019)
Baranyai, Z.: On the factorization of the complete uniform hypergraphs. Infinite and Finite Sets 1, 91–108 (1975). Proceedings of a Colloquium held at Keszthely, June 25–July 1 (1973)
Boman, E.G., Chen, D., Parekh, O., Toledo, S.: On factor width and symmetric \(H\)-matrices. Linear Algebra Appl. 405, 239–248 (2005)
Blekherman, G., Dey, S.S., Molinaro, M., Sun, S.: Sparse PSD approximation of the PSD cone. Math. Program. 191, 981–1004 (2022)
Borchers, B.: CSDP, A C library for semidefinite programming. Optim. Methods Softw. 11, 613–623 (1999)
de Klerk, E., Vallentin, F.: On the Turing model complexity of interior point methods for semidefinite programming. SIAM J. Optim. 26, 1944–1961 (2016)
de Klerk, E., Roos, C., Terlaky, T.: Initialization in semidefinite programming via a self-dual skew-symmetric embedding. Oper. Res. Lett. 20, 213–221 (1997)
Fujisawa, K., Kojima, M., Nakata, K., Yamashita, M.: SDPA (semidefinite programming algorithm) user’s manual—version 6.00. Res. Rep. Math. Comput. Sci. Ser. B: Oper. Res. 12 (2002)
Lasserre, J.B.: Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11, 796–817 (2001)
Marcus, M., Minc, H.: A Survey of Matrix Theory and Matrix Inequalities. Allyn and Bacon, Inc. (1964)
Monteiro, R.D.C., Zanjácomo, P.: Implementation of primal-dual methods for semidefinite programming based on Monteiro and Tsuchiya Newton directions and their variants. Optim. Methods Softw. 11, 91–140 (1999)
MOSEK, ApS.: MOSEK Optimization Software. Technical report, Version 9.1.9 (2019). http://docs.mosek.com/9.1/toolbox/index.html
Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics, Philadelphia (1994)
Renegar, J.: A Mathematical View of Interior-Point Methods in Convex Optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2001)
Roig-Solvas, B., Sznaier, M.: A globally convergent LP and SOCP-based algorithm for semidefinite programming. arXiv:2202.12374 (2022)
Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11, 625–653 (1999)
Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3 - a MATLAB software package for semidefinite programming, version 1.3. Optim. Methods Softw. 11, 545–581 (1999)
Yamashita, M., Fujisawa, K., Fukuda, M., Kobayashi, K., Nakata, K., Nakata, M.: Latest developments in the SDPA family for solving large-scale SDPs. In: Anjos, M.F., Lasserre, J.B. (eds.) Handbook on Semidefinite, Conic and Polynomial Optimization, pp. 687–713. Springer, New York (2012)
Zohrizadeh, F., Josz, C., Jin, M., Madani, R., Lavaei, J., Sojoudi, S.: A survey on conic relaxations of optimal power flow problem. Eur. J. Oper. Res. 287, 391–409 (2020)
Zhang, R.Y., Lavaei, J.: Sparse semidefinite programs with guaranteed near-linear time complexity via dualized clique tree conversion. Math. Program. 188, 351–393 (2021)
Zheng, Y., Sootla, A., Papachristodoulou, A.: Block factor-width-two matrices and their applications to semidefinite and sum-of-squares optimization. IEEE Trans. Autom. Control 68, 943–958 (2023)
Acknowledgements
The authors would like to thank Georgina Hall for insightful discussions on the topic on multiple occasions. Moreover, the authors thank Michaël Gabay and Arefeh Kavand for fruitful conversations on different angles of the subject matter.
Funding
This work is supported by the European Union’s Framework Programme for Research and Innovation Horizon 2020 under the Marie Skłodowska-Curie grant agreement N. 813211 (POEMA).
Additional information
This paper is dedicated to professor Tamás Terlaky, who was one of the PhD supervisors of the second author. We would like to pay tribute to the many contributions he has made to the theory and practice of interior point methods for convex optimization, in addition to his notable influence in other areas and applications of mathematical programming.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kirschner, F., Klerk, E.d. A Predictor-Corrector Algorithm for Semidefinite Programming that Uses the Factor Width Cone. Vietnam J. Math. (2024). https://doi.org/10.1007/s10013-023-00666-8
DOI: https://doi.org/10.1007/s10013-023-00666-8