1 Packing and Covering Semidefinite Programs

We denote by \(\mathbb S^n\) the set of all \(n\times n\) real symmetric matrices and by \(\mathbb S^n_+\subseteq \mathbb S^n\) the set of all \(n\times n\) positive semidefinite (psd) matrices. We consider the following pairs of packing-covering semidefinite programs (SDPs):

$$\begin{aligned} z_I^* = \max \ &C\bullet X \qquad \qquad ({\textsc {Packing-I}})\\ \text {s.t.}\ \ &A_i\bullet X\le b_i, \ \ \forall i\in [m]\\ &X\in \mathbb S^{n},~X\succeq 0 \end{aligned}$$
$$\begin{aligned} z_I^* = \min \ &b^Ty \qquad \qquad ({\textsc {Covering-I}})\\ \text {s.t.}\ \ &\sum _{i=1}^my_iA_i\succeq C\\ &y\in \mathbb R^m,~y\ge 0, \end{aligned}$$
$$\begin{aligned} z_{II}^* = \min \ &C\bullet X \qquad \qquad ({\textsc {Covering-II}})\\ \text {s.t.}\ \ &A_i\bullet X\ge b_i, \ \ \forall i\in [m]\\ &X\in \mathbb S^{n},~X\succeq 0 \end{aligned}$$
$$\begin{aligned} z_{II}^* = \max \ &b^Ty \qquad \qquad ({\textsc {Packing-II}})\\ \text {s.t.}\ \ &\sum _{i=1}^my_iA_i\preceq C\\ &y\in \mathbb R^m,~y\ge 0, \end{aligned}$$

where \(C,A_1,\ldots ,A_m \in \mathbb S_+^n\) are (non-zero) psd matrices, and \(b=(b_1,\ldots ,b_m)^T\in \mathbb R^m_+\) is a non-negative vector. In the above, \(C\bullet X:=\text {Tr}(CX)=\sum _{i=1}^n\sum _{j=1}^n c_{ij}x_{ij}\), and “\(\succeq \)” is the Löwner order on matrices: \(A\succeq B\) if and only if \(A-B\) is psd. SDPs of this type arise in many applications; see, for example, [14, 15] and the references therein.
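As a concrete illustration of this notation, here is a minimal numpy sketch (ours, for illustration only) of the inner product \(C\bullet X\) and the Löwner-order test:

```python
import numpy as np

def frob(C, X):
    """Frobenius inner product C . X = Tr(CX) = sum_ij c_ij x_ij."""
    return np.trace(C @ X)        # for symmetric C, X also equals (C * X).sum()

def is_psd(A, tol=1e-9):
    """Test A >= 0 (psd) via the smallest eigenvalue."""
    return np.linalg.eigvalsh(A).min() >= -tol

# A >= B in the Loewner order iff A - B is psd:
A = np.array([[2.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0, 0.5], [0.5, 1.0]])
print(is_psd(A - B))              # True: eigenvalues of A - B are 0.5 and 1.5
```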

We assume the following throughout this chapter:

  1. (A)

    \(b_i>0\) for all \(i\in [m]\); hence, by rescaling the constraints, we may assume that \(b_i=1\) for all \(i\in [m]\).

It is known that, under assumption (A), strong duality holds for problems (Packing-I) and (Covering-I) (resp., (Packing-II) and (Covering-II)). Let \(\epsilon \in (0,1]\) be a given constant. We say that \((X,y)\) is an \(\epsilon \)-optimal primal-dual solution for (Packing-I)-(Covering-I) if \((X,y)\) is a primal-dual feasible pair such that

$$\begin{aligned} C\bullet X\ge (1-\epsilon )b^Ty\ge (1-\epsilon )z^*_I. \end{aligned}$$
(4.1)

Similarly, we say that \((X,y)\) is an \(\epsilon \)-optimal primal-dual solution for (Packing-II)-(Covering-II) if \((X,y)\) is a primal-dual feasible pair such that

$$\begin{aligned} C\bullet X\le (1+\epsilon )b^Ty\le (1+\epsilon )z^*_{II}. \end{aligned}$$
(4.2)

In this chapter, we allow the number of constraints m in (Packing-I) (resp., (Covering-II)) to be exponentially (or even infinitely) large, so we assume the availability of the following oracle:

 

Max(Y) (resp., Min(Y)): Given \(Y\in \mathbb S_+^n\), find \(i\in {\text {argmax}}_{i\in [m]}A_i\bullet Y\) (resp., \(i\in {\text {argmin}}_{i\in [m]}A_i\bullet Y\)).

Note that an approximation oracle computing the above maximum (resp., minimum) within a factor of \((1-\epsilon )\) (resp., \((1+\epsilon )\)) is also sufficient for our purposes. A primal-dual solution \((X,y)\) to (Covering-I) (resp., (Packing-II)) is said to be \(\eta \)-sparse if the size of \({\text {supp}}(y):=\{i\in [m]:y_i>0\}\) is at most \(\eta \).
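For concreteness, when the constraint matrices are given explicitly as a list, the oracle is a trivial scan; the following sketch (ours) is illustrative only, since the point of the oracle model is precisely to replace this scan by a problem-specific routine when m is huge:

```python
import numpy as np

def max_oracle(A_list, Y):
    """Max(Y): an index i maximizing A_i . Y over an explicit list of matrices."""
    return int(np.argmax([np.trace(A @ Y) for A in A_list]))

def min_oracle(A_list, Y):
    """Min(Y): an index i minimizing A_i . Y over an explicit list of matrices."""
    return int(np.argmin([np.trace(A @ Y) for A in A_list]))
```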

When \(C=I=I_n\) (which is the identity matrix in \(\mathbb R^{n\times n}\)) and \(b=\mathbf{1}_m\) (which is the vector containing all ones in \(\mathbb R^m\)), we say that the packing-covering SDPs are in normalized form. It can be shown (see, e.g., [7, 16]) that, to within a multiplicative factor of \((1+\epsilon )\) in the objective, any pair of packing-covering SDPs of the form (Packing-I)-(Covering-I) can be brought to normalized form in \(O(n^3)\) time while increasing the oracle time by only \(O(n^\omega )\), where \(\omega \) is the exponent of matrix multiplication, under the following assumption:

  1. (B-I)

    There exist r matrices, say \(A_{1},\ldots ,A_{r}\), such that \(\hat{A}:=\sum _{i=1}^rA_{i}\succ 0\). In particular, \(\text {Tr}(X)\le \tau :=\frac{r}{\lambda _{\min }(\hat{A})}\) for any optimal solution X for (Packing-I), and we may assume that \(r=1\) and \(A_1=\frac{1}{\tau }I\).

Similarly, it can be shown that, to within a multiplicative factor of \((1+\epsilon )\) in the objective, any pair of packing-covering SDPs of the form (Packing-II)-(Covering-II) can be brought to normalized form in \(O(n^3)\) time, while increasing the oracle time by only \(O(n^\omega )\). Moreover, we may assume in this normalized form that

  1. (B-II)

    \(\lambda _{\min }(A_i)=\Omega \big (\frac{\epsilon }{n}\cdot \min _{i'}\lambda _{\max }(A_{i'})\big )\) for all \(i\in [m]\),

where, for a psd matrix \(B\in \mathbb S_+^{n}\), we denote by \(\{\lambda _j(B):~j=1,\ldots ,n\}\) the eigenvalues of B, and by \(\lambda _{\min }(B)\) and \(\lambda _{\max }(B)\) the minimum and maximum eigenvalues of B, respectively. Given additional \(O(mn^2)\) time, we may also assume that

  1. (B-II’)

    \(\frac{\lambda _{\max }(A_i)}{\lambda _{\min }(A_i)}= O\big (\frac{n^2}{\epsilon ^2}\big )\) for all \(i\in [m]\).

Thus, the remainder of this chapter focuses on normalized problems.

Mixed packing and covering SDPs.

We also consider the following mixed packing-covering feasibility SDPs:

$$\begin{aligned} &A_i\bullet X\le b_i, \ \ \forall i\in [m_p] \qquad \qquad ({\textsc {Mix-Pack-Cover}})\\ &B_i\bullet X\ge d_i, \ \ \forall i\in [m_c]\\ &X\in \mathbb S^{n},~X\succeq 0, \end{aligned}$$

where \(A_1,\ldots ,A_{m_p}, B_1,\ldots ,B_{m_c} \in \mathbb S_+^{n}\) are psd matrices, and \(b=(b_1,\ldots ,b_{m_p})^T\), \(d=(d_1,\ldots ,d_{m_c})^T\) are non-negative real vectors.

A matrix \(X\in \mathbb S_+^n\) is an \(\epsilon \)-approximate solution for (Mix-Pack-Cover) if \(A_i\bullet X\le b_i\) for all \(i\in [m_p]\) and \(B_i\bullet X\ge (1-\epsilon )d_i\) for all \(i\in [m_c]\).
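The \(\epsilon \)-approximation condition is easy to check given X; a small sketch (ours, with explicit constraint lists) follows:

```python
import numpy as np

def is_eps_approximate(X, A_pack, b, B_cover, d, eps):
    """Check the eps-approximate condition for (Mix-Pack-Cover):
    A_i . X <= b_i for every packing constraint, and
    B_i . X >= (1 - eps) d_i for every covering constraint."""
    pack_ok = all(np.trace(A @ X) <= b_i for A, b_i in zip(A_pack, b))
    cover_ok = all(np.trace(B @ X) >= (1.0 - eps) * d_i for B, d_i in zip(B_cover, d))
    return pack_ok and cover_ok
```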

2 Applications

2.1 SDP Relaxation for Robust MaxCut

Given a simple undirected graph \(G=(V,E)\) on \(n=|V|\) vertices with non-negative edge weights \(w\in \mathbb R_+^E\), the objective in the well-known MaxCut problem is to find a subset of the vertices \(X\subset V\) that maximizes the weight of the cut: \( w(X,V\setminus X):=\sum _{u\in X,~v\in V\setminus X}w_{uv}\). The best-known approximation algorithm (with approximation ratio \(0.878\ldots \)) [10] for MaxCut is based on the following SDP relaxation:

$$\begin{aligned} \max \ &L(w)\bullet X \qquad \qquad ({\textsc {MaxCut-SDP}})\\ \text {s.t.}\ \ &\mathbf{1}_{i}\mathbf{1}^T_i\bullet X=1, \ \ \forall i\in [n]\\ &X\in \mathbb R^{n\times n},~X\succeq 0. \end{aligned}$$
(4.3)

By simply changing the equality in (4.3) into an inequality, this can be written in the form (Packing-I), with \(A_i:=\mathbf{1}_i\mathbf{1}_i^T\) (where \(\mathbf{1}_i\) denotes the ith standard basis vector) and \(C:=L(w)\succeq 0\) being the Laplacian matrix of G, defined as follows:

$$ L_{ij}(w)=\left\{ \begin{array}{ll} \sum _{k=1}^nw_{ik}&\text { if }i=j,\\ -w_{ij}&\text { if }\{i,j\}\in E,\\ 0&\text { otherwise.} \end{array} \right. $$
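A short numpy sketch (ours) of this definition, taking the graph as a list of weighted edges with 0-indexed vertices:

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Graph Laplacian L(w) from (i, j, w_ij) triples: L_ii = sum_k w_ik,
    L_ij = -w_ij for {i, j} in E, and 0 otherwise."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L  # psd, since x^T L(w) x = sum_{{i,j} in E} w_ij (x_i - x_j)^2 >= 0
```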

Based on this relaxation, the following result is obtained using the scalar multiplicative weights update (MWU) method:

Theorem 4.1

([18]) There is a randomized algorithm for finding an \(\epsilon \)-optimal solution for (MaxCut-SDP) in time \(\tilde{O}(\frac{nm}{\epsilon ^3})\), where n and m respectively denote the number of vertices and edges in a given graph.

Under the robust optimization framework, one assumes that the weights are not known precisely but instead belong to a convex uncertainty set \(\mathcal W\subseteq \mathbb R^{E}_+\), and one seeks a (near-)optimal solution under the worst-case choice of \(w\in \mathcal W\) in the uncertainty set:

$$\begin{aligned} \max \ &\min _{w\in \mathcal W}\, L(w)\bullet X \qquad \qquad ({\textsc {Robust-MaxCut-SDP}})\\ \text {s.t.}\ \ &\mathbf{1}_{i}\mathbf{1}^T_i\bullet X\le 1, \ \ \forall i\in [n]\\ &X\in \mathbb R^{n\times n},~X\succeq 0. \end{aligned}$$
(4.4)

By “guessing” the value \(\tau \) of an optimal solution (via binary search), (4.4) can be reduced to

$$\begin{aligned} \min \ &I\bullet X\\ \text {s.t.}\ \ &\mathbf{1}_{i}\mathbf{1}^T_i\bullet X\ge 1, \ \ \forall i\in [n]\\ &\frac{1}{\tau }L(w)\bullet X\ge 1, \ \ \forall w\in \mathcal W\\ &X\in \mathbb R^{n\times n},~X\succeq 0. \end{aligned}$$

Thus, we obtain a covering SDP (of type (Covering-II)) with an infinite number of constraints, given by a minimization oracle over the convex set \(\mathcal W\). We can use the matrix logarithmic-potential method to obtain the following result:

Theorem 4.2

There is a randomized algorithm that finds an \(\epsilon \)-optimal solution for (4.4) in time \(\tilde{O}\big (\frac{n^{\omega +1}}{\epsilon ^{2.5}}+\frac{n\mathcal T}{\epsilon ^2}\big )\), where \(\mathcal T\) is the time needed to optimize a linear function over \(\mathcal W\).

Note that for this reduction to remain valid, it is sufficient to find an \(\epsilon \)-optimal solution to (4.4) for any \(\epsilon =o\big (\frac{1}{n}\big )\).
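Schematically, the binary search looks as follows; covering_solver is a hypothetical routine (not from the source) that, given a guess \(\tau \), either returns a near-feasible solution X of the covering SDP above with \(I\bullet X\le n\) (to the accuracy just noted) or reports failure:

```python
def robust_maxcut_binary_search(covering_solver, lo, hi, tol):
    """Schematic sketch of the reduction: binary search for the optimal
    cut value tau of (4.4).  covering_solver(tau) is assumed to return a
    matrix X, or None when it certifies that the guess tau is too large."""
    best_X = None
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        X = covering_solver(tau)
        if X is not None:
            lo, best_X = tau, X   # cut value tau is (near-)achievable
        else:
            hi = tau
    return lo, best_X
```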

2.2 Mahalanobis Distance Learning

Given a psd matrix \(X\in \mathbb S^{n}_+\), the X-Mahalanobis distance between two points \(a,b\in \mathbb R^n\) is defined as

$$ d_X(a,b):=\sqrt{(a-b)^TX(a-b)}. $$

The distance function \(d_X(\cdot ,\cdot )\) is a semi-metric; that is, it is symmetric (\(d_X(a,b)=d_X(b,a)\)) and satisfies the triangle inequality (\(d_X(a,c)\le d_X(a,b)+d_X(b,c)\)). It is moreover a metric if \(X\succ 0\) (as in this case, \(d_X(a,b)=0\) if and only if \(a=b\)).

The Mahalanobis distance learning problem is defined as follows [28]: Given sets \(\mathcal C_s\) and \(\mathcal C_d\) of similar and dissimilar pairs of points in \(\mathbb R^n\), respectively, a similarity parameter \(\sigma _s\in \mathbb R_+\) and a dissimilarity parameter \(\sigma _d\in \mathbb R_+\), the objective is to find a matrix X such that all the pairs in \(\mathcal C_s\) are “close” and all the pairs in \(\mathcal C_d\) are “far” with respect to the distance function \(d_X(\cdot ,\cdot )\):

$$\begin{aligned} \quad&\displaystyle (a-b)^TX(a-b)\le \sigma _s, \ \forall (a,b)\in \mathcal C_s \end{aligned}$$
(4.5)
$$\begin{aligned} \quad&\displaystyle (a-b)^TX(a-b)\ge \sigma _d, \ \forall (a,b)\in \mathcal C_d \end{aligned}$$
(4.6)
$$\begin{aligned} \qquad&X\in \mathbb S^{n},~X\succeq 0. \end{aligned}$$
(4.7)

Note that this can be written in the form (Mix-Pack-Cover), with \(|\mathcal C_s|\) packing constraints of the form \(A_{a,b}\bullet X \le \sigma _s\), where \(A_{a,b}=(a-b)(a-b)^T\) for \((a,b)\in \mathcal C_s\), and \(|\mathcal C_d|\) covering constraints of the form \(B_{a,b}\bullet X \ge \sigma _d\), where \(B_{a,b}=(a-b)(a-b)^T\) for \((a,b)\in \mathcal C_d\).
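Since \(d_X(a,b)^2=(a-b)^TX(a-b)=A_{a,b}\bullet X\), both constraint families are linear in X. A small sketch (ours) building the rank-one constraint matrices:

```python
import numpy as np

def rank_one_constraints(pairs):
    """Constraint matrices A_{a,b} = (a - b)(a - b)^T for a list of point
    pairs, as in (4.5)-(4.6); each is rank-one and psd."""
    mats = []
    for a, b in pairs:
        v = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
        mats.append(np.outer(v, v))  # d_X(a,b)^2 = A_{a,b} . X
    return mats
```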

We can use the scalar MWU method to obtain the following result:

Theorem 4.3

There is a deterministic algorithm that finds an \(\epsilon \)-feasible solution for (4.5)-(4.7) in time \(\tilde{O}(\frac{m(m+n^3)}{\epsilon ^2})\), where n is the dimension of the point sets and \(m:=|\mathcal C_s|^2+|\mathcal C_d|^2\).

We remark that it is plausible that further improvements (possibly by another factor of O(m)) are possible via rank-one tricks and the use of approximate eigenvalue computations.

2.3 Related Work

Problems (Packing-I)-(Covering-I) and (Packing-II)-(Covering-II) can be solved using general SDP solvers, such as interior-point methods. For example, the barrier method (see, e.g., [22]) can compute a solution within an additive error \(\epsilon \) of the optimum in time \(O(\sqrt{n}m(n^3+mn^2+m^2)\log \frac{1}{\epsilon })\) (see also [1, 27]). However, due to the special nature of (Packing-I)-(Covering-I) and (Packing-II)-(Covering-II), better algorithms can be obtained. Most of the improvements are obtained by using first-order methods [2, 3, 5, 6, 8, 15, 16, 17, 18, 21, 23, 24] or second-order methods [13, 14]. In general, we can classify these algorithms according to whether they are (semi) width-independent, are parallel, output sparse solutions, or are oracle-based, as follows.

  1. (I)

    (Semi) width-independent: The running time of the algorithm depends polynomially on the bit length of the input. For example, in the case of (Packing-I)-(Covering-I), the running time is \({\text {poly}}(n,m,\mathcal L,\log \tau ,\frac{1}{\epsilon })\), where \(\mathcal L\) is the maximum bit length needed to represent any number in the input. In contrast, the running time of a width-dependent algorithm depends polynomially on a “width parameter” \(\rho \), which is polynomial in \(\mathcal L\) and \(\tau \).

  2. (II)

    Parallel: The algorithm takes \({\text {polylog}}(n,m,\mathcal L,\log \tau )\cdot {\text {poly}}(\frac{1}{\epsilon })\) time on \({\text {poly}}(n,m,\mathcal L,\log \tau ,\frac{1}{\epsilon })\) processors.

  3. (III)

    Sparse: The algorithm outputs an \(\eta \)-sparse solution to (Covering-I) (resp., (Packing-II)) for \(\eta ={\text {poly}}(n,\log m,\mathcal L,\log \tau ,\frac{1}{\epsilon })\) (resp., \(\eta ={\text {poly}}(n,\log m,\mathcal L,\frac{1}{\epsilon })\)), where \(\tau \) is a parameter that bounds the trace of any optimal solution X.

  4. (IV)

    Oracle-based: The only access the algorithm has to the matrices \(A_1,\ldots ,A_m\) is via the maximization/minimization oracle, and hence the running time is independent of m.

Table 4.1 below gives a summary of the most relevant results together with their classifications according to the four criteria above. We note that almost all of these algorithms for packing/covering SDPs are generalizations of similar algorithms for packing/covering linear programs (LPs), and most of them are essentially based on an exponential potential function in the form of scalar exponentials, such as [3, 18], or matrix exponentials [2, 5, 6, 15, 17]. For instance, several of these results use the scalar or matrix versions of the MWU method (see, e.g., [4]), which are extensions of similar methods for packing/covering LPs [9, 11, 25, 29].

In [12], a different type of algorithm was given for covering LPs (indeed, more generally, for a class of concave covering inequalities) based on a logarithmic potential function. In [7], it was shown that this approach could be extended to provide sparse solutions for both versions of packing and covering SDPs.

Table 4.1 Different algorithms for packing/covering SDPs

As we can see from the table, among all the algorithms, only the matrix (MWU and logarithmic-potential) algorithms are oracle-based (and hence produce sparse solutions) in the sense described above. However, the overall running time of the matrix MWU algorithm is larger by a factor of (roughly) \(\Omega (n^{3-\omega })\) than that of the logarithmic-potential algorithm, where \(\omega \) is the exponent of matrix multiplication. Moreover, it is not known how to extend the matrix MWU algorithm to solve (Packing-I)-(Covering-I) (in particular, it seems difficult to bound the number of iterations).

3 General Framework for Packing-Covering SDPs

Given a pair of packing-covering SDPs (Packing-I)-(Covering-I) or (Covering-II)-(Packing-II), we consider the following general framework, in which each constraint is assigned a weight reflecting how well it is satisfied by the current solution:

Algorithm 1 (figure not reproduced)

We obtain different algorithms depending on how the weights are defined. We write \(a_i:=A_i\bullet X\ge 0\). Since \(a_{\max }:=\max \{a_1,\ldots ,a_m\}\) (resp., \(a_{\min }:=\min \{a_1,\ldots ,a_m\}\)) is not a smooth function of X, it is more convenient to work with a smooth approximation of it, which is provided by the weighted average formed in step 3 of the framework. There are several ways to do this, for example (both weightings below are also sketched in code after Lemma 4.2):

  • Exponential averaging: The weights are \(\overline{p}_i:=\frac{(1+\epsilon )^{a_i}}{\sum _{i'=1}^m(1+\epsilon )^{a_{i'}}}\) (resp., \(\overline{p}_i:=\frac{(1-\epsilon )^{a_i}}{\sum _{i'=1}^m(1-\epsilon )^{a_{i'}}}\)). The following claim justifies the use of these sets of weights.

Lemma 4.1

If \(a_{\max }\ge \frac{1+\epsilon }{\epsilon }\log _{1+\epsilon }\frac{m}{\epsilon }\) (resp., \(a_{\min }\ge \frac{1}{\epsilon }\log _{\frac{1}{1-\epsilon }}\left( \frac{m \cdot a_{\max }}{\epsilon \cdot a_{\min }}\right) \)), then

$$ \frac{a_{\max }}{1+\epsilon }\le \sum _{i=1}^m\overline{p}_i a_i\le a_{\max } ~~~\Bigl ( \text {resp., }\, a_{\min }\le \sum _{i=1}^m\overline{p}_i a_i\le (1+\epsilon )a_{\min }\Bigr ). $$
  • Logarithmic potential averaging: The weights are \(\overline{p}_i=\frac{\epsilon }{m}\cdot \frac{\theta ^*}{\theta ^*-a_i}\) (resp., \(\overline{p}_i=\frac{\epsilon }{m}\cdot \frac{\theta ^*}{a_i-\theta ^*}\)), where \(\theta ^*\) is the minimizer (resp., maximizer) of the potential function

    $$\begin{aligned} \Phi (\theta )=\ln \left( \theta \cdot \left( \prod _{i=1}^m\frac{1}{\theta -a_i}\right) ^{\epsilon /m}\right) \ \ \left( \text {resp., } \Phi (\theta )=\ln \left( \theta \cdot \left( \prod _{i=1}^m(a_i-\theta )\right) ^{\epsilon /m}\right) \right) . \end{aligned}$$

    (It can be easily verified that \(\sum _i\overline{p}_i=1\).) The following claim justifies the use of these sets of weights.

Lemma 4.2

$$ \frac{(1-\epsilon )a_{\max }}{1-\epsilon /m}\le \sum _{i=1}^m\overline{p}_i a_i\le a_{\max } \ \ \ \Bigl ( \text {resp., }\, a_{\min }\le \sum _{i=1}^m\overline{p}_i a_i\le \frac{(1+\epsilon )a_{\min }}{1+\epsilon /m}\Bigr ). $$
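As promised above, here is a numpy sketch (ours) of both weightings for given values \(a_i=A_i\bullet X\); the shift by \(a_{\max }\) in the exponential case is a numerical-stability device of our own and cancels in the normalization:

```python
import numpy as np

def exponential_weights(a, eps):
    """Exponential averaging (packing case): p_i proportional to (1+eps)^{a_i}."""
    p = np.power(1.0 + eps, a - a.max())  # shift cancels after normalization
    return p / p.sum()

def log_potential_weights(a, eps, theta):
    """Logarithmic-potential weights (packing case), for theta > max(a):
    p_i = (eps/m) * theta / (theta - a_i); at theta = theta* they sum to 1."""
    m = len(a)
    return (eps / m) * theta / (theta - a)
```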

4 Scalar Algorithms

4.1 Scalar MWU Algorithm for (Packing-I)-(Covering-I)

Given a normalized pair of packing-covering SDPs of type I (Packing-I)-(Covering-I), and a feasible primal solution X, we use the exponential weight \(p_i:=(1+\epsilon )^{A_i\bullet X}\), for \(i\in [m]\). Averaging the inequalities with respect to the weights \(\overline{p}_i:=\frac{p_i}{\sum _{i}p_i}\), we arrive at the following problem:

$$\begin{aligned} \max \ &I\bullet X\\ \text {s.t.}\ \ &\sum _{i=1}^m\overline{p}_iA_i\bullet X\le 1\\ &X\in \mathbb R^{n\times n},~X\succeq 0. \end{aligned}$$
(4.8)

Letting \(\overline{A}:=\sum _i\overline{p}_iA_i\) and writing \(X=\sum _{v\in B_n}\lambda _vvv^T\), where \( B_n:=\{v\in \mathbb R^n:~\Vert v\Vert =1\} \) and \(\lambda _v\ge 0\) for all \(v\in B_n\), we obtain the following (infinite-dimensional) knapsack problem

$$\begin{aligned} \max \ &\sum _{v\in B_n} \lambda _v\\ \text {s.t.}\ \ &\sum _{v\in B_n}\lambda _v\,\overline{A}\bullet vv^T\le 1\\ &\lambda _v\ge 0, \ \ \forall v\in B_n. \end{aligned}$$
(4.9)

An optimal solution is attained at a vector \(v\in B_n\) that minimizes \(v^T\overline{A}v\), i.e., at a unit eigenvector corresponding to \(\lambda _{\min }(\overline{A})\).

Thus, using this set of weights in our general framework (Algorithm 1) yields the following procedure (for a vector \(p\in \mathbb R^m\), we write \(\overline{p}_i:=\frac{p_i}{\sum _ip_i}\)):

Algorithm 2 (figure not reproduced)
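Since the algorithm figure is not reproduced here, the following is a hedged reconstruction (ours) of the scalar MWU loop from the surrounding description; the step-size normalization (largest left-hand-side increase equal to 1) and the final rescaling are our reading of the text, and the original constants may differ:

```python
import numpy as np

def scalar_mwu_packing(A_list, eps):
    """Sketch of Algorithm 2 for normalized (Packing-I): reweight the
    constraints exponentially and push X along a minimum eigenvector of
    the averaged matrix A_bar until some A_i . X reaches T."""
    m, n = len(A_list), A_list[0].shape[0]
    T = np.log(m) / eps**2                  # threshold from the text
    X, a = np.zeros((n, n)), np.zeros(m)    # a_i = A_i . X
    while a.max() < T:
        p = np.power(1.0 + eps, a - a.max()); p /= p.sum()
        A_bar = sum(pi * Ai for pi, Ai in zip(p, A_list))
        v = np.linalg.eigh(A_bar)[1][:, 0]  # eigenvector of lambda_min(A_bar)
        vals = np.array([v @ Ai @ v for Ai in A_list])
        step = 1.0 / vals.max()             # largest LHS increase equals 1
        X += step * np.outer(v, v)
        a += step * vals
    return X / a.max()                      # rescale so that A_i . X <= 1 for all i
```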

The stopping criterion is that the left-hand side (LHS) of at least one inequality in (Packing-I) reaches some threshold \(T:=\epsilon ^{-2}\ln m\), with respect to the current solution X(t). The step size (step 5) is chosen such that, in each iteration of the while-loop, at least one of these left-hand sides increases by at least 1, thus guaranteeing termination within mT iterations.

Theorem 4.4

Given a real \(\epsilon \in (0,1]\), Algorithm 2 outputs an \(\epsilon \)-optimal solution for (Packing-I)-(Covering-I) in \(O(m\log m/\epsilon ^2)\) iterations, where each iteration requires an oracle call that computes an eigenvector corresponding to the minimum eigenvalue of a psd matrix.

For a given matrix \(M\in \mathbb R^{n\times n}\), computing \(\lambda _{\min }(M)\) (almost) exactly requires \(O(n^3)\) time via a full eigenvalue decomposition of the matrix. If M is psd, a faster approximation of \(\lambda _{\min }(M)\) can be obtained (using Lanczos’ algorithm with a random start) via the following result.

Theorem 4.5

([19]) Let \(M\in \mathbb S_+^n\) be a psd matrix with N non-zeros and let \(\gamma \in (0,1)\) be a given constant. Then, there is a randomized algorithm that computes, with high (i.e., \(1-o(1)\)) probability, a unit vector \(v\in \mathbb R^n\) such that \(v^TMv\ge (1-\gamma )\lambda _{\max }(M)\). The algorithm takes \(O\big (\frac{\log n}{\sqrt{\gamma }}\big )\) iterations, each requiring O(N) arithmetic operations.

By applying this theorem to \((\overline{A})^{-1}\), we can approximate \(\lambda _{\min }(\overline{A})\) in \(\tilde{O}(n^{\omega })\) time.
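The following sketch (ours) substitutes plain randomized power iteration for Lanczos, which is slower by a \(\sqrt{\gamma }\) factor in the iteration count but simpler to state; the inversion-based reduction from \(\lambda _{\min }\) to \(\lambda _{\max }\) is the one just described:

```python
import numpy as np

def approx_lambda_max(M, gamma, seed=0):
    """Randomized power iteration on psd M: with high probability the
    returned unit vector v has v^T M v >= (1 - gamma) lambda_max(M) after
    O(log n / gamma) iterations (Lanczos needs only O(log n / sqrt(gamma)))."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    for _ in range(int(np.ceil(np.log(M.shape[0] + 1) / gamma)) + 1):
        v = M @ v
        v /= np.linalg.norm(v)
    return v

def approx_lambda_min(M, gamma):
    """lambda_min(M) = 1 / lambda_max(M^{-1}) for M > 0: run the routine
    above on M^{-1} (in practice one uses linear solves with M instead)."""
    v = approx_lambda_max(np.linalg.inv(M), gamma)
    return float(v @ M @ v)
```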

4.2 Scalar Logarithmic Potential Algorithm For (Packing-I)–(Covering-I)

Given a normalized pair of packing-covering SDPs of type I (Packing-I)-(Covering-I) and a feasible primal solution X, we use the logarithmic-potential weights \(\overline{p}_i=\frac{\epsilon }{m}\frac{\theta ^*}{\theta ^*-A_i\bullet X}\) for \(i\in [m]\). Averaging the inequalities with respect to this set of weights, we arrive at the knapsack problem (4.9). This gives rise to the following procedure:

Algorithm 3 (figure not reproduced)

In the above, for given numbers \(x\in \mathbb R_+\) and \(\delta \in (0,1)\), we define the \(\delta \)-(upper) approximation \(x^\delta \) of x to be a number satisfying: \(x\le x^\delta <(1+\delta )x\).

Theorem 4.6

Given \(\epsilon \in (0,1]\), Algorithm 3 outputs an \(\epsilon \)-optimal solution for (Covering-I)-(Packing-I) in \(O(m\log \psi +m/\epsilon ^2)\) iterations, where \(\psi :=\frac{\lambda _{\max }(\overline{A}(0))}{\lambda _{\min }(\overline{A}(0))}\) and each iteration requires an oracle call that computes an eigenvector corresponding to the minimum eigenvalue of a psd matrix.

5 Matrix Algorithms

5.1 Matrix MWU Algorithm For (Covering-II)-(Packing-II)

Let \(F(y):=\sum _{i=1}^my_iA_i\). Then, we can rewrite the normalized version of (Packing-II) as follows:

$$\begin{aligned} z_{II}^* = \max \ &\mathbf{1}^Ty \qquad \qquad ({\textsc {Packing-II}})\\ \text {s.t.}\ \ &\lambda _j(F(y))\le 1, \ \ \forall j\in [n]\\ &y\in \mathbb R^m,~y\ge 0. \end{aligned}$$

Averaging the inequalities with respect to the weights \(\overline{p}_j:=\frac{p_j}{\sum _{j}p_j}\), where \( p_j:=(1+\epsilon )^{\lambda _j(F(y))}\), we get

$$\begin{aligned} \max \ &\mathbf{1}^Ty\\ \text {s.t.}\ \ &\sum _{j=1}^n\overline{p}_j\lambda _j(F(y))\le 1\\ &y\in \mathbb R^m,~y\ge 0. \end{aligned}$$

Using the eigenvalue decomposition: \(F(y)=U\Lambda U^T\), where \(\Lambda \) is the diagonal matrix containing the eigenvalues of F(y) and \(UU^T=I\), and letting

$$\begin{aligned} \overline{P}:=U\,{\text {diag}}(\overline{p}_1,\ldots ,\overline{p}_n)\,U^T=\frac{(1+\epsilon )^{F(y)}}{\text {Tr}\big ((1+\epsilon )^{F(y)}\big )},\end{aligned}$$

we obtain the following knapsack problem:

$$\begin{aligned} \max \ &\mathbf{1}^Ty\\ \text {s.t.}\ \ &\sum _{i=1}^m(\overline{P}\bullet A_i)\, y_i\le 1\\ &y\in \mathbb R^m,~y\ge 0. \end{aligned}$$

An optimal solution is attained at (a scalar multiple of) the basis vector \(y=\mathbf{1}_i\in \mathbb R^m_+\) that minimizes \(\overline{P}\bullet A_i\). This gives rise to the following matrix MWU algorithm:

Algorithm 4 (figure not reproduced)
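The weight matrix \(\overline{P}\) is the only nonstandard ingredient; here is a minimal sketch (ours) computing it by a full eigendecomposition, which is the O(n³) route mentioned below (Theorem 4.8 gives a faster approximation for sparse matrices):

```python
import numpy as np

def matrix_exp_weights(F, eps):
    """P_bar = (1+eps)^F / Tr((1+eps)^F) via eigendecomposition of F.
    Shifting the eigenvalues by their maximum keeps the powers bounded
    and cancels in the trace normalization."""
    lam, U = np.linalg.eigh(F)
    w = np.power(1.0 + eps, lam - lam.max())
    P = (U * w) @ U.T                 # equals U diag(w) U^T
    return P / np.trace(P)
```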

Theorem 4.7

Given a real \(\epsilon \in (0,1]\), Algorithm 4 outputs an \(\epsilon \)-optimal solution for (Covering-II)-(Packing-II) in \(O(n\log n/\epsilon ^2)\) iterations, where each iteration requires a matrix exponential computation, two calls to an oracle that computes the maximum eigenvalue of a psd matrix, and a single call to the minimization oracle in step 4.

The most demanding step in the above algorithm is the matrix exponential computation, which can be done in \(O(n^3)\) time via a complete eigenvalue decomposition. A more efficient approximation, particularly when the matrices \(A_i\) are sparse, can be obtained via the following result.

Theorem 4.8

([26]) There is an algorithm for approximating the matrix exponential \(e^{F}\) in time \(O(n^2r\log ^3\frac{1}{\epsilon })\), where r denotes the number of non-zeros in \(F\in \mathbb S^{n}\), and \(\epsilon \) is the approximation accuracy.

We remark that a matrix MWU algorithm and a theorem analogous to Algorithm 4 and Theorem 4.7 for (Packing-I)-(Covering-I) have not yet been discovered; this is left as an open problem.

5.2 Matrix Logarithmic Potential Algorithm For (Packing-I)-(Covering-I)

Let \(F(y):=\sum _{i=1}^my_iA_i\). Then, we can rewrite the normalized version of (Covering-I) as

$$\begin{aligned} z_I^* = \min \ &\mathbf{1}^Ty \qquad \qquad ({\textsc {Covering-I}})\\ \text {s.t.}\ \ &\lambda _j(F(y))\ge 1, \ \ \forall j\in [n]\\ &y\in \mathbb R^m,~y\ge 0. \end{aligned}$$

Averaging the inequalities with respect to the weights \(\overline{p}_j:=\frac{\epsilon }{n}\frac{\theta ^*}{\lambda _j(F(y))-\theta ^*}\), we get

$$\begin{aligned} \min \ &\mathbf{1}^Ty\\ \text {s.t.}\ \ &\sum _{j=1}^n\overline{p}_j\lambda _j(F(y))\ge 1\\ &y\in \mathbb R^m,~y\ge 0. \end{aligned}$$

Using the eigenvalue decomposition: \(F(y)=U\Lambda U^T\), where \(\Lambda \) is the diagonal matrix containing the eigenvalues of F(y) and \(UU^T=I\), and letting

$$\begin{aligned} \overline{P}:=U\,{\text {diag}}(\overline{p}_1,\ldots ,\overline{p}_n)\,U^T=\frac{\epsilon \theta ^*}{n}\big (F(y)-\theta ^*I\big )^{-1},\end{aligned}$$

we obtain the following knapsack problem:

$$\begin{aligned} \min \ &\mathbf{1}^Ty\\ \text {s.t.}\ \ &\sum _{i=1}^m(\overline{P}\bullet A_i)\, y_i\ge 1\\ &y\in \mathbb R^m,~y\ge 0. \end{aligned}$$

An optimal solution is attained at (a scalar multiple of) the basis vector \(y=\mathbf{1}_i\in \mathbb R^m_+\) that maximizes \(\overline{P}\bullet A_i\). This gives rise to the following matrix logarithmic-potential algorithm:

Algorithm 5 (figure not reproduced)

The most demanding steps are the computation of \(\theta (t)\) and X(t). Computing \(\theta (t)\) can be done via binary search over a region determined by repeated matrix multiplications and approximate minimum eigenvalue computations (cf. Theorem 4.5). Once \(\theta (t)\) is determined, computing X(t) requires a single matrix inversion. The overall running time per iteration is \(\tilde{O}(n^{\omega })\), plus the time needed by the maximization oracle.
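For illustration, note that \(\theta ^*\) is characterized by the weights summing to one, i.e., \(\frac{\epsilon }{n}\sum _j\frac{\theta }{\lambda _j-\theta }=1\) with \(\theta <\lambda _{\min }(F(y))\), and the left-hand side is increasing in \(\theta \) on this interval; the sketch below (ours) therefore locates it by bisection from exact eigenvalues, whereas the algorithm proper uses only approximate eigenvalue computations:

```python
import numpy as np

def theta_star(lams, eps, tol=1e-12):
    """Bisection for theta* in (0, min(lams)) solving
    (eps/n) * sum_j theta / (lam_j - theta) = 1, the point at which the
    logarithmic-potential weights p_j sum to one.  lams are the eigenvalues
    of F(y); the left-hand side is increasing in theta on this interval."""
    n = len(lams)
    g = lambda t: (eps / n) * np.sum(t / (lams - t))
    lo, hi = 0.0, float(np.min(lams))
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        if g(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```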

Theorem 4.9

Given \(\epsilon \in (0,1]\), Algorithm 5 outputs an \(\epsilon \)-optimal solution for (Covering-I)-(Packing-I) in \(O(n\log \psi +\frac{n}{\epsilon ^2})\) iterations, where \(\psi := \frac{r\cdot \max _i\lambda _{\max }(A_i)}{\lambda _{\min }(\hat{A})}\) and each iteration requires \(O(\log \frac{n}{\epsilon })\) matrix multiplications and a single oracle call to the maximization in step 5.

5.3 Matrix Logarithmic Potential Algorithm For (Packing-II)-(Covering-II)

A symmetric version of Algorithm 5 for (Packing-II)-(Covering-II) can be given as follows:

Algorithm 6 (figure not reproduced)

Theorem 4.10

Given \(\epsilon \in (0,1]\), Algorithm 6 outputs an \(\epsilon \)-optimal solution for (Packing-II)-(Covering-II) in \(O(n\log \psi +\frac{n}{\epsilon ^2})\) iterations, where \(\psi := O(\log \frac{n}{\epsilon })\) and each iteration requires \(O(\log \frac{n}{\epsilon })\) matrix inversions and a single oracle call to the minimization in step 4.