Abstract
We study the problem of approximating the cone of positive semidefinite (PSD) matrices with a cone that can be described by smaller-sized PSD constraints. Specifically, we ask: how closely can we approximate the set of unit-trace \(n \times n\) PSD matrices, denoted by D, using at most N constraints, each of size \(k \times k\)? In this paper, we prove lower bounds on N by considering two constructions of an approximating set. First, we consider the unit-trace \(n \times n\) symmetric matrices that are PSD when restricted to a fixed set of k-dimensional subspaces in \({\mathbb {R}}^n\). We prove that if this set is a good approximation of D, then the number of subspaces must be exponentially large in n for any \(k = o(n)\). Second, we show that any set S that approximates D within a constant approximation ratio must have superpolynomial \({\varvec{S}}_+^k\)-extension complexity. More precisely, if S is a constant-factor approximation of D, then S must have \({\varvec{S}}_+^k\)-extension complexity at least \(\exp ( C \cdot \min \{ \sqrt{n}, n/k \})\), where C is an absolute constant. In addition, we show that any set S containing D whose Gaussian width is at most a constant times larger than the Gaussian width of D must have \({\varvec{S}}_+^k\)-extension complexity at least \(\exp ( C \cdot \min \{ n^{1/3}, \sqrt{n/k} \})\). These results imply that the cone of \(n \times n\) PSD matrices cannot be approximated by a polynomial number of \(k \times k\) PSD constraints for any \(k = o(n / \log ^2 n)\). They generalize the recent work of Fawzi (Math Oper Res 46(4):1479–1489, 2021) on the hardness of polyhedral approximations of \({\varvec{S}}_+^n\), which corresponds to the special case \(k=1\).
Notes
The smallest integer m such that \(\text {COR}(n)\) admits a \({\varvec{S}}_+^m\)-lift; see Sect. 2.2. This is related to the \({\varvec{S}}_+^k\)-extension complexity, but they are different notions of complexity.
Note that \(f_k\) is square-free (because \(f_k\) is multilinear), and thus, \(\nabla ^2 f_k = 0\).
More precisely, \(A_{ij} = \frac{1}{2} {\mathbb {E}}_{X \sim \mu (H_n)} [ X_i X_j f(X) ]\) for \(i, j \in [n]\) such that \(i \ne j\).
That is, \(Q_{\chi ^2}(s) := \inf \{ x \in {\mathbb {R}}: F_{\chi ^2}(x) \ge s \}\) for \(0 < s \le 1\), where \(F_{\chi ^2}\) denotes the cumulative distribution function of the \(\chi ^2\)-distribution with one degree of freedom.
In the language of matrix operations, given a slack matrix s(x, y) whose rows are indexed by x and columns are indexed by y, we left-multiply s(x, y) by the degree-2 projection matrix and then take the (scaled) trace of the resulting matrix.
See (8) for the definition of p-norm in Step 3 of the proof. Note that \(\Vert \text {proj}_2 f(x)\Vert _2^2\) for \(f: H_n \rightarrow {\mathbb {R}}\) can be interpreted as the variance of the random variable \(\text {proj}_2 f(X)\) where \(X \sim \mu (H_n)\).
Notice that we are computing the second and the fourth moments of the sum of the entries of a random vector uniformly distributed over the n-dimensional hypercube. Thus, we already expect to obtain by the central limit theorem that \(\lim _{n \rightarrow \infty } \frac{1}{n} {\mathbb {E}}_{x \sim \mu }\big [ (x^T y)^2 \big ] = 1\) and \(\lim _{n\rightarrow \infty }\frac{1}{n^2} {\mathbb {E}}_{x \sim \mu }\big [ (x^T y)^4 \big ] = 3\), which correspond to the second and the fourth moments of the standard Gaussian distribution.
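These limits can be verified exactly by enumeration for small n (a quick sanity check, not part of the original argument). With y the all-ones vector, \(x^T y = \sum_i x_i\), and the exact moments are \({\mathbb {E}}[(\sum_i x_i)^2] = n\) and \({\mathbb {E}}[(\sum_i x_i)^4] = 3n^2 - 2n\):

```python
from itertools import product

# Exhaustively compute the normalized second and fourth moments of
# s = sum_i x_i for x uniform on the hypercube H_n = {-1, +1}^n.
# Exact values: E[s^2]/n = 1 and E[s^4]/n^2 = 3 - 2/n -> 3.
def normalized_moments(n):
    pts = list(product([-1, 1], repeat=n))
    m2 = sum(sum(x) ** 2 for x in pts) / len(pts)
    m4 = sum(sum(x) ** 4 for x in pts) / len(pts)
    return m2 / n, m4 / n ** 2

print(normalized_moments(10))  # (1.0, 2.8), i.e. (1, 3 - 2/10)
```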
See [21, Lemma 4.4.1] for example.
The lemma in the reference is stated with an unspecified constant C, but one can verify the inequality stated here by carefully following the proof of [21, Lemma 6.2.3].
References
Ahmadi, A.A., Dash, S., Hall, G.: Optimization over structured subsets of positive semidefinite matrices via column generation. Discrete Optim. 24, 129–151 (2017)
Ahmadi, A.A., Hall, G.: Sum of squares basis pursuit with linear and second order cone programming. Algebraic Geom. Methods Discrete Math. 685, 27–53 (2017)
Ahmadi, A.A., Majumdar, A.: DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. SIAM J. Appl. Algebra Geom. 3(2), 193–230 (2019)
Aubrun, G., Szarek, S.: Dvoretzky’s theorem and the complexity of entanglement detection. Discrete Analysis 2017:1 (2017)
Aubrun, G., Szarek, S.J.: Alice and Bob Meet Banach, vol. 223. American Mathematical Society, Providence (2017)
Beckner, W.: Inequalities in Fourier analysis. Ann. Math. 102, 159–182 (1975)
Blekherman, G., Dey, S.S., Molinaro, M., Sun, S.: Sparse PSD approximation of the PSD cone. Math. Program. 191, 981–1004 (2022)
Boman, E.G., Chen, D., Parekh, O., Toledo, S.: On factor width and symmetric H-matrices. Linear Algebra Appl. 405, 239–248 (2005)
Bonami, A.: Étude des coefficients de Fourier des fonctions de \( L^{p} (G)\). In: Annales de l’Institut Fourier, vol. 20, pp. 335–402. (1970)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)
Fawzi, H.: On polyhedral approximations of the positive semidefinite cone. Math. Oper. Res. 46(4), 1479–1489 (2021)
Fawzi, H., Gouveia, J., Parrilo, P.A., Saunderson, J., Thomas, R.R.: Lifting for simplicity: concise descriptions of convex sets. arXiv preprint arXiv:2002.09788 (2020)
Fawzi, H., Parrilo, P.A.: Exponential lower bounds on fixed-size psd rank and semidefinite extension complexity. arXiv preprint arXiv:1311.2571 (2013)
Gouveia, J., Parrilo, P.A., Thomas, R.R.: Lifts of convex sets and cone factorizations. Math. Oper. Res. 38(2), 248–264 (2013)
Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28, 1302–1338 (2000)
Lee, J.R., Raghavendra, P., Steurer, D.: Lower bounds on the size of semidefinite programming relaxations. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing, pp. 567–576 (2015)
O’Donnell, R.: Analysis of Boolean Functions. Cambridge University Press, Cambridge (2014)
O’Rourke, S., Vu, V., Wang, K.: Eigenvectors of random matrices: a survey. J. Comb. Theory Ser. A 144, 361–442 (2016)
Klartag, B., Regev, O.: Quantum one-way communication can be exponentially stronger than classical communication. In: Proceedings of the forty-third annual ACM symposium on theory of computing, pp. 31–40 (2011)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press, Cambridge (2018)
Yannakakis, M.: Expressing combinatorial optimization problems by linear programs. J. Comput. Syst. Sci. 43(3), 441–466 (1991)
Acknowledgements
This research was partially supported by NSF grant AF CCF-1565235.
Appendices
Proofs of some lemmas from Section 2
1.1 Proof of Lemma 6
Proof
Let \(f = f_0 + f_1 + f_2 + \dots + f_n\) be the Fourier expansion of f. Then for \(0 \le \rho \le 1\),
With \(\rho = \sqrt{p-1}\) for \(1 \le p \le 2\), we have \(\Vert T_{\rho }f \Vert _2 \le \Vert f \Vert _p\) by hypercontractivity. Then it follows that
because \(\Vert f \Vert _p = {\mathbb {E}}[ f^p ]^{\frac{1}{p}} \le \Lambda ^{\frac{p-1}{p}} {\mathbb {E}}[ f ]^{\frac{1}{p}} \le \Lambda ^{\frac{p-1}{p}} \le \Lambda ^{p-1}\). If \(\Lambda < e\), we choose \(p = 2\) to get \(\Vert \text {proj}_2 f \Vert _2 \le \Lambda \). Otherwise, we choose \(p = 1 + \frac{1}{\log \Lambda }\) to obtain \(\Vert \text {proj}_2 f \Vert _2 \le e \log ( \Lambda )\). \(\square \)
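The two choices of p can be checked numerically (a sanity check, not part of the proof). Note that the bound of the form \(\Lambda^{p-1}/(p-1)\) used below is our reconstruction from the two choices made in the text (the displayed inequalities are omitted in this version):

```python
import math

# Assumed form of the bound from the proof: ||proj_2 f||_2 <= Lambda^(p-1)/(p-1)
# for 1 < p <= 2 (reconstructed; the original displays are elided).
# Check that p = 1 + 1/log(Lambda) attains the stated value e*log(Lambda),
# that p = 2 gives Lambda, and that no p in (1, 2] does better.
def bound(p, lam):
    return lam ** (p - 1) / (p - 1)

lam = 100.0
p_star = 1 + 1 / math.log(lam)
stated = math.e * math.log(lam)
assert abs(bound(p_star, lam) - stated) < 1e-9
assert abs(bound(2.0, lam) - lam) < 1e-9

grid_min = min(bound(1 + k / 10**5, lam) for k in range(1, 10**5 + 1))
assert stated - 1e-6 <= grid_min <= stated + 1e-3  # p_star is optimal
```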
1.2 Proof of Lemma 8
Proof
We consider a Gaussian process \((X_v)_{v \in {\mathbb {S}}^{n-1}}\) defined over \( {\mathbb {S}}^{n-1}\) such that \(X_v = v^T G v + \gamma \) with G being standard Gaussian in \({\varvec{S}}^n\) and \(\gamma \sim N(0,1)\) independent of G. It is easy to verify that \({\mathbb {E}}\big [ \sup _{v \in {\mathbb {S}}^{n-1}} \langle v, Gv \rangle \big ] = {\mathbb {E}}_{G, \gamma }\big [ \sup _{v \in {\mathbb {S}}^{n-1}} X_v \big ]\). Now we introduce an auxiliary Gaussian process \((Y_v)_{v \in {\mathbb {S}}^{n-1}}\) such that \(Y_v = g^Tv\) with \(g \sim N(0, 2 I_n)\). Observe that for all \(u, v \in {\mathbb {S}}^{n-1}\), (1) \({\mathbb {E}}X_v = {\mathbb {E}}Y_v = 0\); and (2) \({\mathbb {E}}(X_u - X_v)^2 \le {\mathbb {E}}(Y_u - Y_v)^2\) because \({\mathbb {E}}X_v^2 = {\mathbb {E}}Y_v^2 = 2\) and \({\mathbb {E}}X_u X_v - {\mathbb {E}}Y_u Y_v = (1 - u^T v )^2 \ge 0\). Thus, we can apply Sudakov-Fernique inequality (Lemma 7) to obtain \({\mathbb {E}}_{G, \gamma } \big [ \sup _{v \in {\mathbb {S}}^{n-1}} X_v \big ] \le {\mathbb {E}}_{g \sim N(0, 2I_n)} \big [ \sup _{v \in {\mathbb {S}}^{n-1}} Y_v \big ] = {\mathbb {E}}_{g \sim N(0, 2I_n)} \Vert g \Vert _2 \le \big ( {\mathbb {E}}_{g \sim N(0, 2I_n)} \Vert g \Vert _2^2 \big )^{1/2} = \sqrt{2n}\). \(\square \)
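The conclusion \({\mathbb {E}}\big [ \sup _{v \in {\mathbb {S}}^{n-1}} \langle v, Gv \rangle \big ] \le \sqrt{2n}\) can be sanity-checked by Monte Carlo (an illustration, not part of the proof). We realize the standard Gaussian on \({\varvec{S}}^n\) as \((B + B^T)/2\) for B with i.i.d. N(0, 1) entries, which has diagonal entries N(0, 1) and off-diagonal entries N(0, 1/2); this realization is our assumption, consistent with \({\mathbb {E}} X_v^2 = 2\) in the proof:

```python
import numpy as np

# Monte Carlo check of Lemma 8: E[ sup_{v in S^{n-1}} v^T G v ]
# = E[ lambda_max(G) ] <= sqrt(2n), with G = (B + B^T)/2 for B i.i.d. N(0,1)
# (so diag(G) ~ N(0,1) and off-diagonal entries ~ N(0,1/2)).
rng = np.random.default_rng(0)
n, trials = 100, 50
vals = []
for _ in range(trials):
    B = rng.standard_normal((n, n))
    G = (B + B.T) / 2
    vals.append(np.linalg.eigvalsh(G)[-1])  # eigenvalues in ascending order
est = float(np.mean(vals))
print(est <= np.sqrt(2 * n))  # prints True: estimate sits below sqrt(200) ~ 14.14
```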
1.3 Proof of Lemma 10
Proof
Let \(h_K(u) := \max _{x \in K} \left\langle u,~ x \right\rangle = \Vert u \Vert _{K^{\circ }}\) denote the support function of K. The function \(h_K\) is L-Lipschitz with \(L = \sup _{x \in K} \Vert x \Vert _2\), the maximal Euclidean norm over K, because for any \(u, v \in {\mathbb {R}}^d\),
Moreover, we can show that \(\sup _{x \in K} \Vert x \Vert _2 \le \sqrt{2\pi } w_G(K)\). To see this, let B(0, R) denote the Euclidean ball centered at 0 with radius R. It follows from [21, Proposition 7.5.2-(e)] that \(\sup _{x, y \in K} \Vert x - y \Vert _2 \le \sqrt{2\pi } w_G(K)\). Since \(0 \in K\), this implies \(K \subseteq B(0, \sqrt{2\pi }w_G(K) )\). Applying Lemma 9 with \(f=h_K\) and \(\tau = \alpha w_G(K)\) completes the proof. \(\square \)
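The Lipschitz property of the support function is easy to verify numerically (an illustration; the polytope K below is our own example):

```python
import numpy as np

# Check that the support function h_K(u) = max_{x in K} <u, x> of a random
# polytope K = conv{x_1, ..., x_m} is L-Lipschitz with L = max_i ||x_i||_2.
rng = np.random.default_rng(4)
pts = rng.standard_normal((30, 5))          # vertices of K in R^5
h = lambda u: (pts @ u).max()               # support function of K
L = np.linalg.norm(pts, axis=1).max()       # sup_{x in K} ||x||_2
for _ in range(1000):
    u, v = rng.standard_normal(5), rng.standard_normal(5)
    assert abs(h(u) - h(v)) <= L * np.linalg.norm(u - v) + 1e-9
```

The inequality follows pointwise: if \(x^*\) attains \(h_K(u)\), then \(h_K(u) - h_K(v) \le \langle u - v, x^* \rangle \le L \Vert u - v \Vert_2\).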
1.4 Proof sketch of Lemma 11
An Auxiliary Lemma. The moment generating function (MGF) of a decoupled Gaussian chaos is bounded as in Lemma 15.
Lemma 15
(MGF of Gaussian chaos) Let \(X, X' \sim N(0, I_n)\) be independent Gaussian random vectors and let \(A \in {\mathbb {R}}^{n \times n}\). Then
for all \(\lambda \) satisfying \(|\lambda | \le \frac{1}{\sqrt{2}\Vert A\Vert _{op}}\).
Proof
Let \(A = U \Sigma V^T\) be a singular value decomposition of A, and let \(g = U^T X\), \(g' = V^T X'\). Observe that \(g, g'\) are independent standard Gaussian random vectors in \({\mathbb {R}}^n\), and that \(X^T A X' = \sum _i s_i g_i g_i'\) where \(\{ s_i \}_{i=1}^n\) are the singular values of A (i.e., the diagonal elements of the nonnegative diagonal matrix \(\Sigma \)). As this is a sum of n independent random variables, we have
Now, for each \(i \in [n]\), we use the MGF formulas for the Gaussian and the chi-squared random variables to get
Since \((1-t)^{-1/2} \le e^t\) for all t satisfying \(0 \le t \le 0.7968\), we have
for all \(\lambda \) such that \(\lambda ^2 \Vert A\Vert _{op}^2 \le 1/2 < 0.7968\). \(\square \)
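The elementary inequality \((1-t)^{-1/2} \le e^t\) on \([0, 0.7968]\) used in the last step can be confirmed numerically (a sanity check only):

```python
import math

# Check that (1 - t)^(-1/2) <= e^t holds on [0, 0.7968] and fails just beyond
# (the crossover point is approximately t = 0.79681).
def ok(t):
    return (1 - t) ** -0.5 <= math.exp(t)

assert all(ok(k * 0.7968 / 10**5) for k in range(10**5 + 1))
assert not ok(0.80)  # the inequality flips a little past 0.7968
```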
Proof Sketch of Lemma 11
Proof (Sketch)
Let \(X'\) be an independent copy of X. Then \({\mathbb {E}}\exp ( \lambda X^T A X ) \le {\mathbb {E}}\exp ( 4 \lambda X^T A X' )\) for all \(\lambda \in {\mathbb {R}}\) by the decoupling lemma [21, Theorem 6.1.1]. Next, let \(g, g' \sim N(0, I_n)\) be independent Gaussian random vectors. Then \({\mathbb {E}}\exp ( \lambda X^T A X' ) \le {\mathbb {E}}\exp ( \lambda v g^T A g' )\) for all \(\lambda \in {\mathbb {R}}\) by a comparison lemma [21, Lemma 6.2.3]. Lastly, we apply Lemma 15, the MGF upper bound for Gaussian chaos, to conclude the proof. \(\square \)
1.5 Proof of Lemma 12
Proof
Let A be a symmetric \(n \times n\) matrix such that \(A_{ii} = 0, ~\forall i\) and \(A_{ij} = \frac{1}{2} {\mathbb {E}}_{Y \sim \mu (H_n)}[ Y_i Y_j f(Y) ]\) for \(i \ne j\). Then we observe that for all \(X \in H_n\),
Note that \(X_i\) is sub-Gaussian with sub-Gaussian parameter 1 for all i because \({\mathbb {E}}[ e^{\lambda X_i} ] = \frac{1}{2}( e^{\lambda } + e^{-\lambda } ) \le e^{\lambda ^2/2}\). To conclude the proof, it suffices to observe that \(\Vert A \Vert _F^2 = \sum _{\begin{array}{c} {i = 1} \\ {j \ne i} \end{array}}^n \big ( \frac{1}{2} {\mathbb {E}}_{X \sim \mu (H_n)}[ X_i X_j f(X) ] \big )^2 = \frac{1}{2} \Vert \text {proj}_2 f \Vert _2^2\) and \(\Vert A \Vert _{op} \le \Vert A \Vert _F\), and then apply Lemma 11. \(\square \)
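The identity \(\Vert A \Vert _F^2 = \frac{1}{2} \Vert \text {proj}_2 f \Vert _2^2\) can be verified by exhaustive enumeration for small n (a sanity check; the random f below is our own example):

```python
import itertools
import numpy as np

# Verify ||A||_F^2 = (1/2) ||proj_2 f||_2^2 by enumeration over the
# hypercube H_n = {-1, +1}^n for an arbitrary f: H_n -> R.
rng = np.random.default_rng(1)
n = 4
pts = np.array(list(itertools.product([-1, 1], repeat=n)))  # all 2^n points
fvals = rng.standard_normal(len(pts))                       # arbitrary f on H_n

# Degree-2 Fourier coefficients: fhat({i,j}) = E[X_i X_j f(X)]
fhat2 = {(i, j): np.mean(pts[:, i] * pts[:, j] * fvals)
         for i, j in itertools.combinations(range(n), 2)}
proj2_sq = sum(c ** 2 for c in fhat2.values())  # ||proj_2 f||_2^2

# A_ij = (1/2) E[X_i X_j f(X)] off the diagonal, A_ii = 0
A = np.zeros((n, n))
for (i, j), c in fhat2.items():
    A[i, j] = A[j, i] = c / 2
assert abs(np.sum(A ** 2) - proj2_sq / 2) < 1e-12
```

Indeed, \(\Vert A \Vert_F^2 = 2 \sum_{i<j} (\hat f(\{i,j\})/2)^2 = \frac12 \sum_{i<j} \hat f(\{i,j\})^2\).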
1.6 Proof of Lemma 13
Proof
For any \(\lambda \in (0, 1/c]\),
It remains to choose \(\lambda \) in the interval \((0, 1/c]\) to optimize the upper bound. If \(\sqrt{2 \log N / v } \le 1/c\), then we choose \(\lambda = \sqrt{ 2 \log N / v }\) to get \({\mathbb {E}}\big [ \max _{i \in [N] } X_i \big ] \le \sqrt{2 v \log N}\). On the other hand, if \(\sqrt{2 \log N / v } > 1/c\), then we choose \(\lambda = 1/c\) to get \({\mathbb {E}}\big [ \max _{i \in [N] } X_i \big ] \le 2c \log N \), since \(\sqrt{2 \log N / v} > 1/c\) implies \(v/(2c) < c \log N\). \(\square \)
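A Monte Carlo illustration of Lemma 13 (not part of the proof), assuming the standard sub-exponential MGF condition \({\mathbb {E}} e^{\lambda X_i} \le e^{\lambda^2 v/2}\) for \(|\lambda| \le 1/c\), which matches the form of the bound. Centered standard Laplace variables satisfy it with \(v = 4\), \(c = \sqrt{2}\) (our own instantiation), since \(1/(1-\lambda^2) \le e^{2\lambda^2}\) for \(|\lambda| \le 1/\sqrt{2}\):

```python
import numpy as np

# Monte Carlo illustration of Lemma 13 with i.i.d. centered Laplace
# variables: MGF 1/(1 - lam^2) <= e^{2 lam^2} for |lam| <= 1/sqrt(2),
# i.e. sub-exponential with v = 4 and c = sqrt(2) (our instantiation).
rng = np.random.default_rng(2)
N, trials = 1000, 200
emax = float(np.mean(rng.laplace(size=(trials, N)).max(axis=1)))
bound = max(np.sqrt(2 * 4 * np.log(N)), 2 * np.sqrt(2) * np.log(N))
print(emax <= bound)  # prints True: the bound ~19.5 covers E[max] ~ log N
```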
More on Example 3 (ball, needle, and pancake)
Let \(B_2^d := \{ x \in {\mathbb {R}}^d: \Vert x \Vert _2 \le 1 \}\) denote the d-dimensional unit \(\ell _2\)-ball, and let \(B = B_2^d\). Fix \(0< \delta < 1\), and let \(N = \mathrm {conv}\big \{ B_2^d(0, 1) \cup \{ \pm \frac{1}{\delta } e_1 \} \big \}\) be the ‘needle’ where \(e_1 = (1, 0, \dots , 0) \in {\mathbb {R}}^d\). Lastly, we define the ‘pancake’ \(P= \{ x \in B: -\delta \le x_1 \le \delta \}\) where \(x_1\) is the first coordinate of \(x \in {\mathbb {R}}^d\). Observe that N and P are the polars of each other, and B is the polar of itself.
First of all, \(w_G(B) = {\mathbb {E}}_g \Vert g \Vert _2 = \kappa _d\) and it is known that \(\sqrt{d - 1/2} \le \kappa _d \le \sqrt{d - d/(2d+1)}\), cf. the paragraph below Definition 2. Next, we can see that \(w_G(N) \ge \frac{1}{\delta }\sqrt{2/\pi }\) because \( \{ \pm \frac{1}{\delta } e_1 \} \subseteq N\) and thus, \(w_G(N) \ge w_G \big ( \{ \pm \frac{1}{\delta } e_1 \} \big ) = \frac{1}{\delta } {\mathbb {E}}_{g \sim {\mathcal {N}}(0,1)} |g| = \frac{1}{\delta }\sqrt{2/\pi }\). Lastly, observe that \(w_G(P) \ge \kappa _{d-1} \ge \sqrt{d - 3/2}\) because \(\{0\} \times B_2^{d-1}(0,1) \subseteq P\) and \(w_G(P) \ge w_G \big ( \{0\} \times B_2^{d-1}(0,1) \big ) = w_G \big ( B_2^{d-1}(0,1) \big ) = \kappa _{d-1}\).
It follows that B is an \(\epsilon \)-approximation of P in the average sense for \(\epsilon = \kappa _d / \kappa _{d-1} - 1 \le 3 / (2d - 3)\). Nevertheless, B is not an \(\epsilon '\)-approximation of P in the dual-average sense unless \(\epsilon ' \ge \frac{1}{\delta } \sqrt{2/\pi } / \kappa _d - 1\ge \frac{2}{ \delta \sqrt{ \pi (2d-1) } } - 1\), which can be made arbitrarily large by choosing \(\delta \) small. For example, if we choose \(\delta \le 1/\sqrt{\pi (2d-1)}\), then \(\epsilon ^*_{\text {dual-avg}}(P,B) \ge 1\) whereas \(\epsilon ^*_{\text {avg}}(P, B) \le 3/(2d-3)\) regardless of \(\delta \).
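The three Gaussian widths can be estimated by Monte Carlo (an illustration; the closed-form support functions used below are straightforward to derive for these three bodies and are our own computation):

```python
import numpy as np

# Monte Carlo estimates of the Gaussian widths of the ball B, needle N,
# and pancake P from Example 3, via their support functions:
#   h_B(g) = ||g||,  h_N(g) = max(||g||, |g_1|/delta),
#   h_P(g) = ||g|| if |g_1|/||g|| <= delta,
#            else |g_1|*delta + ||g_{2:}||*sqrt(1 - delta^2).
rng = np.random.default_rng(3)
d, delta, samples = 50, 0.05, 20000
g = rng.standard_normal((samples, d))
norms = np.linalg.norm(g, axis=1)
rest = np.linalg.norm(g[:, 1:], axis=1)

w_B = norms.mean()                                        # kappa_d
w_N = np.maximum(norms, np.abs(g[:, 0]) / delta).mean()
w_P = np.where(np.abs(g[:, 0]) / norms <= delta, norms,
               np.abs(g[:, 0]) * delta + rest * np.sqrt(1 - delta**2)).mean()

assert abs(w_B - np.sqrt(d - 0.5)) < 0.05        # ~ kappa_d
assert w_N >= (1 / delta) * np.sqrt(2 / np.pi)   # needle width blows up
assert w_P >= np.sqrt(d - 1.5) - 0.1             # ~ kappa_{d-1}
```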
Solving the cubic inequality \(z^3 + \alpha z \ge \beta \) with \(\beta > 0\)
Consider a cubic equation of the form \(z^3 + \alpha z - \beta = 0\), which is commonly referred to as a depressed cubic. Note that when \(\beta > 0\), this cubic equation always has a positive real root. The other two roots can be either negative real roots (when \(D \le 0\)), or a pair of complex conjugate roots (when \(D > 0\)), depending on the sign of its discriminant, \(D = (\alpha /3)^3 + (\beta /2)^2\).
Indeed, we can find the roots with a generic cubic formula, known as Cardano’s formula. Let \(i = \sqrt{-1}\) denote the imaginary unit, let \(\omega = \frac{-1 + \sqrt{3} i}{2}\) be a primitive 3rd root of unity, and
Case 1: \(D > 0\). When \(D > 0\), the cubic equation \(z^3 + \alpha z - \beta = 0\) with \(\beta > 0\) has only one real root, \(z^*= T_+ + T_-\), which turns out to be positive. Thus, the set of real solutions for the cubic inequality \(z^3 + \alpha z \ge \beta \) is \(\{ z \in {\mathbb {R}}: z \ge T_+ + T_- \}\).
Case 2: \(D \le 0\). There are three real roots for the cubic equation \(z^3 + \alpha z - \beta = 0\), which can be written as
One of these three real roots is positive, and the other two are negative.
Note that (42) now involves complex roots, and the choice of branches might affect the ordering of the roots \(z_1, z_2, z_3\); however, it will not change their values. To avoid any ambiguity in our description, we choose the principal branch so that \(\text {Arg}\left( \root m \of {z} \right) \in (-\frac{\pi }{m}, \frac{\pi }{m}]\) for any complex number z and any positive integer m.
Observe that \(T_+ = \root 3 \of { \beta /2 + \sqrt{|D|}i }\) and \(\text {Arg}\left( T_+ \right) \in [0, \pi /3)\). Similarly, we can see that \(\text {Arg}\left( T_- \right) \in (-\pi /3, 0]\). It follows that \(T_+ + T_-\) is a positive real number, and thus, the largest real root. Thus, the set of real solutions for the cubic inequality \(z^3 + \alpha z \ge \beta \) is \(\{ z \in {\mathbb {R}}: z \ge T_+ + T_- \}\).
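The case analysis above can be exercised numerically (an illustration; the function below is our own sketch of the recipe, using real cube roots when \(D > 0\) and the principal complex branch when \(D \le 0\)):

```python
import cmath
import math
import numpy as np

# Compute the positive real root z* = T_+ + T_- of z^3 + alpha*z - beta = 0
# (beta > 0) via Cardano's formula, following the case split above:
# real cube roots when D >= 0, principal complex cube roots when D < 0.
def cardano_positive_root(alpha, beta):
    D = (alpha / 3) ** 3 + (beta / 2) ** 2
    if D >= 0:
        cbrt = lambda x: math.copysign(abs(x) ** (1 / 3), x)  # real cube root
        return cbrt(beta / 2 + math.sqrt(D)) + cbrt(beta / 2 - math.sqrt(D))
    # D < 0: T_+ and T_- are complex conjugates; Python's principal powers
    # have Arg in (-pi/3, pi/3], so T_+ + T_- is the largest (positive) root.
    T_plus = (beta / 2 + cmath.sqrt(complex(D))) ** (1 / 3)
    T_minus = (beta / 2 - cmath.sqrt(complex(D))) ** (1 / 3)
    return (T_plus + T_minus).real

for alpha, beta in [(1.0, 2.0), (-3.0, 1.0)]:  # D > 0 and D < 0
    z = cardano_positive_root(alpha, beta)
    roots = np.roots([1, 0, alpha, -beta])
    real_roots = roots[np.abs(roots.imag) < 1e-8].real
    assert abs(z ** 3 + alpha * z - beta) < 1e-8   # z solves the cubic
    assert z > 0 and abs(z - real_roots.max()) < 1e-8  # and is the largest root
```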
Song, D., Parrilo, P.A. On approximations of the PSD cone by a polynomial number of smaller-sized PSD cones. Math. Program. 198, 733–785 (2023). https://doi.org/10.1007/s10107-022-01795-7