Abstract
The little Grothendieck problem consists of maximizing \(\sum _{ij}C_{ij}x_ix_j\) for a positive semidefinite matrix C, over binary variables \(x_i\in \{\pm 1\}\). In this paper we focus on a natural generalization of this problem, the little Grothendieck problem over the orthogonal group. Given a positive semidefinite matrix \(C\in \mathbb {R}^{dn\times dn}\), the objective is to maximize \(\sum _{ij}{\text {tr}}\left( C^T_{ij}O_iO_j^T\right) \), restricting \(O_i\) to take values in the group of orthogonal matrices \(\mathcal {O}_d\), where \(C_{ij}\) denotes the (i, j)-th \(d\times d\) block of C. We propose an approximation algorithm, which we refer to as Orthogonal-Cut, to solve the little Grothendieck problem over the group of orthogonal matrices \(\mathcal {O}_d\), and show a constant approximation ratio. Our method is based on semidefinite programming. For a given \(d\ge 1\), we show a constant approximation ratio of \(\alpha _{\mathbb {R}}(d)^2\), where \(\alpha _{\mathbb {R}}(d)\) is the expected average singular value of a \(d\times d\) matrix with i.i.d. Gaussian \(\mathcal {N}\left( 0,\frac{1}{d}\right) \) entries. For \(d=1\) we recover the known \(\alpha _{\mathbb {R}}(1)^2=2/\pi \) approximation guarantee for the classical little Grothendieck problem. Our algorithm and analysis naturally extend to the complex-valued case, also providing a constant approximation ratio for the analogous little Grothendieck problem over the unitary group \(\mathcal {U}_d\). Orthogonal-Cut also serves as an approximation algorithm for several applications, including the Procrustes problem, where it improves over the best previously known approximation ratio of \(\frac{1}{2\sqrt{2}}\). The little Grothendieck problem falls under the larger class of problems approximated by a recent algorithm proposed in the context of the non-commutative Grothendieck inequality. Nonetheless, our approach is simpler and provides a better approximation, with matching integrality gaps.
Finally, we also provide an improved approximation algorithm for the more general little Grothendieck problem over the orthogonal (or unitary) group with rank constraints, recovering the known sharp ratios when \(d=1\).
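As a quick numerical illustration of the constant in the approximation ratio, the sketch below (a Monte Carlo estimate, not part of the paper's algorithm; the function name is ad hoc) approximates \(\alpha _{\mathbb {R}}(d)\) and recovers \(\alpha _{\mathbb {R}}(1)^2 \approx 2/\pi \):

```python
import numpy as np

def alpha_real(d, n_samples=20000, seed=0):
    """Monte Carlo estimate of alpha_R(d): the expected average
    singular value of a d x d matrix with i.i.d. N(0, 1/d) entries."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        G = rng.normal(scale=np.sqrt(1.0 / d), size=(d, d))
        total += np.linalg.svd(G, compute_uv=False).mean()
    return total / n_samples

# For d = 1 the classical guarantee alpha_R(1)^2 = 2/pi should emerge,
# since alpha_R(1) = E|g| = sqrt(2/pi) for g ~ N(0, 1).
print(alpha_real(1) ** 2)
```

For larger d the same estimator gives the constants \(\alpha _{\mathbb {R}}(d)^2\) appearing in the approximation guarantee.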
Notes
We also note that these semidefinite programs satisfy Slater’s condition as the identity matrix is a feasible point. This ensures strong duality, which can be exploited by many semidefinite programming solvers.
These ideas also play a major role in the unidimensional complex case treated by Man-Cho So et al. [25].
The additional constraint that forces a matrix to be in the special orthogonal or unitary group is that its determinant be equal to 1, which is not a quadratic constraint.
References
Alizadeh, F., Haeberly, J.-P.A., Overton, M.L.: Primal-dual interior-point methods for semidefinite programming: convergence rates, stability and numerical results. SIAM J. Optim. 8(3), 746–768 (1998)
Alon, N., Makarychev, K., Makarychev, Y., Naor, A.: Quadratic forms on graphs. Invent. Math. 163, 486–493 (2005)
Alon, N., Naor, A.: Approximating the cut-norm via Grothendieck’s inequality. In: Proceedings of the 36 th ACM STOC, pp. 72–80. ACM Press (2004)
Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York (1964)
Bandeira, A.S.: Convex relaxations for certain inverse problems on graphs. Ph.D. thesis, Program in Applied and Computational Mathematics, Princeton University (2015)
Briët, J., Buhrman, H., Toner, B.: A generalized Grothendieck inequality and nonlocal correlations that require high entanglement. Commun. Math. Phys. 305(3), 827–843 (2011)
Briët, J., de Oliveira Filho, F.M., Vallentin, F.: The positive semidefinite Grothendieck problem with rank constraint. In: Automata, Languages and Programming, vol. 6198 of Lecture Notes in Computer Science, pp. 31–42. Springer, Berlin (2010)
Briët, J., Regev, O., Saket, R.: Tight hardness of the non-commutative Grothendieck problem. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1108–1122. doi:10.1109/FOCS.2015.72 (2015)
Bandeira, A.S., Singer, A., Spielman, D.A.: A Cheeger inequality for the graph connection Laplacian. SIAM J. Matrix Anal. Appl. 34(4), 1611–1630 (2013)
Ben-Tal, A., Nemirovski, A.: On tractable approximations of uncertain linear matrix inequalities affected by interval uncertainty. SIAM J. Optim. 12, 811–833 (2002)
Carlen, E.A.: Trace inequalities and quantum entropy: an introductory course. http://www.ueltschi.org/azschool/notes/ericcarlen.pdf (2009)
Couillet, R., Debbah, M.: Random Matrix Methods for Wireless Communications. Cambridge University Press, New York (2011)
Chaudhury, K.N., Khoo, Y., Singer, A.: Global registration of multiple point clouds using semidefinite programming. SIAM J. Optim. 25(1), 126–185 (2015)
Charikar, M., Wirth, A.: Maximizing quadratic programs: extending Grothendieck’s inequality. In: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’04, pp. 54–60. IEEE Computer Society, Washington (2004)
Fan, K., Hoffman, A.J.: Some metric inequalities in the space of matrices. Proc. Am. Math. Soc. 6(1), 111–116 (1955)
Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products, 5th edn. Academic Press, Cambridge (1994)
Grothendieck, A.: Résumé de la théorie métrique des produits tensoriels topologiques. Bol. Soc. Mat. São Paulo 8, 1–79 (1953). (French)
Götze, F., Tikhomirov, A.: On the rate of convergence to the Marchenko–Pastur distribution. arXiv:1110.1284 [math.PR] (2011)
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach. 42, 1115–1145 (1995)
Higham, N.J.: Computing the polar decomposition-with applications. SIAM J. Sci. Stat. Comput. 7, 1160–1174 (1986)
Keller, J.B.: Closest unitary, orthogonal and hermitian operators to a given operator. Math. Mag. 48(4), 192–197 (1975)
Khot, S.: On the unique games conjecture (invited survey). In: Proceedings of the 2010 IEEE 25th Annual Conference on Computational Complexity, CCC ’10, pp. 99–121. IEEE Computer Society, Washington (2010)
Leveque, O.: Random matrices and communication systems: Wishart random matrices: marginal eigenvalue distribution. http://ipg.epfl.ch/~leveque/Matrix/ (2012)
Livan, G., Vivo, P.: Moments of Wishart–Laguerre and Jacobi ensembles of random matrices: application to the quantum transport problem in chaotic cavities. Acta Phys. Pol. B 42, 1081 (2011)
Man-Cho So, A., Zhang, J., Ye, Y.: On approximating complex quadratic optimization problems via semidefinite programming relaxations. Math. Program. 110(1), 93–110 (2007)
Nemirovski, A.: Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program. 109(2–3), 283–317 (2007)
Nesterov, Y.: Semidefinite relaxation and nonconvex quadratic optimization. Optim. Methods Softw. 9(1–3), 141–160 (1998)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 of Applied Optimization. Springer, Berlin (2004)
Nemirovski, A., Roos, C., Terlaky, T.: On maximization of quadratic form over intersection of ellipsoids with common center. Math. Program. 86(3), 463–473 (1999)
Naor, A., Regev, O., Vidick, T.: Efficient rounding for the noncommutative Grothendieck inequality. In: Proceedings of the 45th annual ACM Symposium on Symposium on theory of computing, STOC ’13, pp. 71–80. ACM, New York (2013)
Pisier, G.: Grothendieck’s theorem, past and present. Bull. Am. Math. Soc. 49, 237–323 (2012)
Raghavendra, P.: Optimal algorithms and inapproximability results for every CSP. In: Proceedings of 40th ACM STOC, pp. 245–254 (2008)
Schönemann, P.H.: A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1), 1–10 (1966)
Shen, J.: On the singular values of Gaussian random matrices. Linear Algebra Appl. 326(1–3), 1–14 (2001)
Singer, A.: Angular synchronization by eigenvectors and semidefinite programming. Appl. Comput. Harmon. Anal. 30(1), 20–36 (2011)
So, A.-C.: Moment inequalities for sums of random matrices and their applications in optimization. Math. Program. 130(1), 125–151 (2011)
Singer, A., Shkolnisky, Y.: Three-dimensional structure determination from common lines in Cryo-EM by eigenvectors and semidefinite programming. SIAM J. Imaging Sci. 4(2), 543–572 (2011)
Tulino, A.M., Verdú, S.: Random matrix theory and wireless communications. Commun. Inf. Theory 1(1), 1–182 (2004)
Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. 38, 49–95 (1996)
Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y., Kutyniok, G. (eds.) Chapter 5 of: Compressed Sensing, Theory and Applications. Cambridge University Press, Cambridge (2012)
Wen, Z., Goldfarb, D., Scheinberg, K.: Block coordinate descent methods for semidefinite programming. In: Handbook on Semidefinite, Conic and Polynomial Optimization, vol. 166 of International Series in Operations Research & Management Science, pp. 533–564. Springer, US (2012)
Acknowledgments
The authors would like to thank Moses Charikar for valuable guidance in the context of this work, and Jop Briët, Alexander Iriza, Yuehaw Khoo, Dustin Mixon, Oded Regev, and Zhizhen Zhao for insightful discussions on the topic of this paper. Special thanks to Johannes Trost for a very useful answer to a MathOverflow question posed by the first author. Finally, we would like to thank the reviewers for numerous suggestions that helped to greatly improve the quality of this paper.
A. S. Bandeira was supported by AFOSR Grant No. FA9550-12-1-0317. A. Singer was partially supported by Award Number FA9550-12-1-0317 and FA9550-13-1-0076 from AFOSR, by Award Number R01GM090200 from the NIGMS, and by Award Number LTR DTD 06-05-2012 from the Simons Foundation. Parts of this work have appeared in C. Kennedy’s senior thesis at Princeton University.
Most of this work was done while ASB was at Princeton University.
Appendices
Appendix 1: Technical proofs—analysis of algorithm for the Stiefel Manifold setting
Lemma 17
Let \(r\ge d\). Let G be a \(d\times r\) Gaussian random matrix with real-valued i.i.d. \(\mathcal {N}\left( 0,\frac{1}{r}\right) \) entries and let \(\alpha _{\mathbb {R}}(d,r)\) be as defined in Definition 11. Then,

$$\mathbb {E}\left[ \mathcal {P}_{(d,r)}(G)\, G^T\right] = \alpha _{\mathbb {R}}(d,r)\, I_{d\times d}.$$
Furthermore, if G is a \(d\times r\) Gaussian random matrix with complex-valued i.i.d. \(\mathcal {N}\left( 0,\frac{1}{r}\right) \) entries and \(\alpha _{\mathbb {C}}(d,r)\) the analogous constant (Definition 11), then

$$\mathbb {E}\left[ \mathcal {P}_{(d,r)}(G)\, G^H\right] = \alpha _{\mathbb {C}}(d,r)\, I_{d\times d}.$$
The proof of this lemma is a simple adaptation of the proof of Lemma 6.
Proof
We restrict the presentation to the real case. As before, all the arguments carry over to the complex case by replacing all transposes with Hermitian adjoints and \(\alpha _{\mathbb {R}}(d,r)\) with \(\alpha _{\mathbb {C}}(d,r)\).
Let \(G = U [\Sigma \ 0] V^T\) be the singular value decomposition of G. Since \(GG^T = U\Sigma ^2 U^T\) is a Wishart matrix, it is well known that its eigenvalues and eigenvectors are independent and U is distributed according to the Haar measure in \(\mathcal {O}_d\) (see e.g. Lemma 2.6 in [38]). To resolve ambiguities, we consider \(\Sigma \) ordered such that \(\Sigma _{11} \ge \Sigma _{22} \ge \cdots \ge \Sigma _{dd}\).
Let \(Y = \mathcal {P}_{(d,r)}(G) G^T\). Since \(\mathcal {P}_{(d,r)}(G) = U\left[ I_{d\times d}\ 0\right] V^T\), we have

$$Y = U\left[ I_{d\times d}\ 0\right] V^T V \left[ \Sigma \ 0\right] ^T U^T = U \Sigma U^T.$$
Note that \( G\mathcal {P}_{(d,r)}(G)^T = U \Sigma U^T = Y\).
Since \(Y_{ij} = u_i \Sigma u_j^T\), where \(u_1,\ldots ,u_d\) are the rows of U, and U is distributed according to the Haar measure, we have that \(u_j\) and \(-u_j\) have the same distribution conditioned on any \(u_i\), for \(i\ne j\), and \(\Sigma \). This implies that, if \(i\ne j\), \(Y_{ij} = u_i \Sigma u_j^T\) is a symmetric random variable, and so \(\mathbb {E}Y_{ij} = 0\). Also, \(u_i\sim u_j\) implies that \(Y_{ii}\sim Y_{jj}\). This means that \(\mathbb {E}Y = c I_{d\times d}\) for some constant c. To obtain c,
which shows the lemma. \(\square \)
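A numerical sanity check of Lemma 17, under the assumption (consistent with the proof above) that \(\mathcal {P}_{(d,r)}\) maps \(G = U[\Sigma \ 0]V^T\) to \(U[I_{d\times d}\ 0]V^T\); the empirical mean of \(\mathcal {P}_{(d,r)}(G) G^T\) should be approximately a multiple of the identity:

```python
import numpy as np

def polar_round(G):
    # G = U [Sigma 0] V^T  ->  U [I 0] V^T: the nearest matrix with
    # orthonormal rows, via the thin SVD (valid for d <= r).
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

d, r, n_mc = 2, 4, 20000
rng = np.random.default_rng(0)
Y_bar = np.zeros((d, d))
for _ in range(n_mc):
    G = rng.normal(scale=np.sqrt(1.0 / r), size=(d, r))
    Y_bar += polar_round(G) @ G.T
Y_bar /= n_mc
# Off-diagonal entries should vanish and the diagonal entries should be
# equal, their common value approximating alpha_R(d, r).
print(Y_bar)
```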
Lemma 18
Let \(r\ge d\). Let \(M,N\in \mathbb {R}^{d\times nd}\) such that \(MM^T = NN^T = I_{d \times d}\). Let \(R \in \mathbb {R}^{nd \times r}\) be a Gaussian random matrix with real-valued i.i.d. entries \(\mathcal {N}\left( 0, \frac{1}{r} \right) \). Then

$$\mathbb {E}\left[ \mathcal {P}_{(d,r)}(MR)\, (NR)^T\right] = \alpha _{\mathbb {R}}(d,r)\, MN^T \quad \text {and} \quad \mathbb {E}\left[ \mathcal {P}_{(d,r)}(NR)\, (MR)^T\right] = \alpha _{\mathbb {R}}(d,r)\, NM^T,$$
where \(\alpha _{\mathbb {R}}(d,r)\) is the constant in Definition 11.
Analogously, if \(M,N\in \mathbb {C}^{d\times nd}\) such that \(MM^H = NN^H = I_{d \times d}\), and \(R \in \mathbb {C}^{nd \times r}\) is a Gaussian random matrix with complex-valued i.i.d. entries \(\mathcal {N}\left( 0, \frac{1}{r} \right) \), then

$$\mathbb {E}\left[ \mathcal {P}_{(d,r)}(MR)\, (NR)^H\right] = \alpha _{\mathbb {C}}(d,r)\, MN^H \quad \text {and} \quad \mathbb {E}\left[ \mathcal {P}_{(d,r)}(NR)\, (MR)^H\right] = \alpha _{\mathbb {C}}(d,r)\, NM^H,$$
where \(\alpha _{\mathbb {C}}(d,r)\) is the constant in Definition 11.
Similarly, the proof of this lemma is a simple adaptation of the proof of Lemma 5.
Proof
We restrict the presentation of proof to the real case. Nevertheless, all the arguments trivially adapt to the complex case by, essentially, replacing all transposes with Hermitian adjoints and \(\alpha _{\mathbb {R}}(d)\) and \(\alpha _{\mathbb {R}}(d,r)\) with \(\alpha _{\mathbb {C}}(d)\) and \(\alpha _{\mathbb {C}}(d,r)\).
Let \(A = \left[ M^T\text { } N^T\right] \in \mathbb {R}^{dn\times 2d}\) and \(A=QB\) be the QR decomposition of A with \(Q\in \mathbb {R}^{nd\times nd}\) an orthogonal matrix and \(B \in \mathbb {R}^{nd \times 2d}\) upper triangular with non-negative diagonal entries; note that only the first 2d rows of B are nonzero. We can write
where \(B_{11}\in \mathbb {R}^{d\times d}\) and \(B_{22}\in \mathbb {R}^{d\times d}\) are upper triangular matrices with non-negative diagonal entries. Since
\(B_{11} = (Q^T M^T)_{11}\) is an orthogonal matrix, which together with the non-negativity of the diagonal entries (and the fact that \(B_{11}\) is upper-triangular) forces \(B_{11} = I_{d\times d}\).
Since R is a Gaussian matrix and Q is an orthogonal matrix, \(QR \sim R\) which implies
Since \(MQ=[B_{11}^T, 0_{d\times d},\ldots ,0_{d\times d}] = [I_{d\times d}, 0_{d\times d},\ldots ,0_{d\times d}]\) and \(NQ = [B_{12}^T, B_{22}^T, 0_{d\times d},\ldots ,0_{d\times d}]\),
where \(R_1\) and \(R_2\) are the first two \(d\times r\) blocks of R. Since these blocks are independent, the second term vanishes and we have
The lemma now follows from using Lemma 17 to obtain \(\mathbb {E}\left[ \mathcal {P}_{(d,r)}(R_1) R_1^T\right] = \alpha _{\mathbb {R}}(d,r)I_{d\times d}\) and noting that \(B_{12} = (Q^TM^T)^T(Q^TN^T) = MN^T\).
The same argument, with \(Q'B'\) the QR decomposition of \(A' = \left[ N^T M^T\right] \in \mathbb {R}^{dn\times 2d}\) instead, shows
\(\square \)
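The conclusion of Lemma 18 can likewise be checked numerically. The sketch below (illustrative only; all names are ad hoc) draws M and N as the top rows of two independent random orthogonal matrices, so that \(MM^T = NN^T = I_{d\times d}\), and compares the empirical mean of \(\mathcal {P}_{(d,r)}(MR)(NR)^T\) with \(\alpha _{\mathbb {R}}(d,r)\, MN^T\), estimating \(\alpha _{\mathbb {R}}(d,r)\) via Lemma 17:

```python
import numpy as np

def polar_round(G):
    # Round a d x r matrix (d <= r) to the nearest matrix with
    # orthonormal rows, via the thin SVD.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
d, n, r, n_mc = 2, 2, 3, 20000
nd = d * n

# Rows of random orthogonal matrices give M M^T = N N^T = I exactly.
M = np.linalg.qr(rng.normal(size=(nd, nd)))[0][:d, :]
N = np.linalg.qr(rng.normal(size=(nd, nd)))[0][:d, :]

emp = np.zeros((d, d))
alpha = 0.0
for _ in range(n_mc):
    R = rng.normal(scale=np.sqrt(1.0 / r), size=(nd, r))
    emp += polar_round(M @ R) @ (N @ R).T
    # Independent estimate of alpha_R(d, r) from Lemma 17.
    G = rng.normal(scale=np.sqrt(1.0 / r), size=(d, r))
    alpha += np.trace(polar_round(G) @ G.T) / d
emp /= n_mc
alpha /= n_mc
# Lemma 18 predicts emp ~ alpha * M N^T.
print(np.abs(emp - alpha * (M @ N.T)).max())
```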
Appendix 2: Bounds for the average singular value
Lemma 19
Let \(G_{\mathbb {C}} \in \mathbb {C}^{d\times d}\) be a Gaussian random matrix with i.i.d. complex valued \(\mathcal {N}(0,d^{-1})\) entries and define \(\alpha _{\mathbb {C}}(d):= \mathbb {E}\left[ \frac{1}{d} \sum _{j=1}^d \sigma _j(G_{\mathbb {C}})\right] \). We have the following bound
Proof
We express \(\alpha _{\mathbb {C}}(d)\) as sums and products of Gamma functions and then use classical bounds to obtain our result.
Recall that from equation (16),
where
and \(L_n(x)\) is the nth Laguerre polynomial,
This integral can be expressed as (see [16], Section 7.414, Equation 4)
where \((x)_m\) is the Pochhammer symbol, \((x)_m = x(x+1)\cdots (x+m-1) = \frac{\Gamma (x+m)}{\Gamma (x)}\).
The next lemma states a couple of basic facts about the Gamma function that we will need in the subsequent computations. \(\square \)
Lemma 20
The Gamma function satisfies the following inequalities:
Proof
See [4] page 255. \(\square \)
We want to bound the summation in (27), which we rewrite as
For simplicity define
so that (27) becomes
The first term we can compute explicitly (see [16]) as
For the second term we use the fact that \((\frac{-1}{2})_m = \Gamma (m-1/2)/\Gamma (-1/2)\) to get
Using the first inequality in Lemma 20 and the multiplication formula for the Gamma function,
so we have
For the third term, we use the formula \((x)_m = \frac{\Gamma (x+m)}{\Gamma (x)}\) to deduce
Using the second bound in Lemma 20,
and also
so that
If we multiply top and bottom by \(\sqrt{n+1} + \sqrt{n-m+1/2}\) and use the fact that
then
Combining our bounds for (I), (II) and (III),
and by (26),
The term \(\frac{1}{d^{3/2}} \sum _{n=1}^{d-1} 4 \sqrt{n+1/2}/ \pi \) is the main term and can be bounded below by
The other error terms are at most
Combining the main and error term bounds, the lemma follows. \(\square \)
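A Monte Carlo estimate of \(\alpha _{\mathbb {C}}(d)\) complements the bound above. Here a complex \(\mathcal {N}(0,d^{-1})\) entry is taken to have variance \(\frac{1}{2d}\) in each of its real and imaginary parts, so that \(\mathbb {E}|G_{ij}|^2 = 1/d\) (this normalization is an assumption of the sketch, and the function name is ad hoc):

```python
import numpy as np

def alpha_complex(d, n_samples=20000, seed=0):
    """Monte Carlo estimate of alpha_C(d): the expected average
    singular value of a d x d matrix with i.i.d. complex N(0, 1/d)
    entries (real and imaginary parts each N(0, 1/(2d)))."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(1.0 / (2 * d))
    total = 0.0
    for _ in range(n_samples):
        G = (rng.normal(scale=s, size=(d, d))
             + 1j * rng.normal(scale=s, size=(d, d)))
        total += np.linalg.svd(G, compute_uv=False).mean()
    return total / n_samples

# For d = 1, |G_11| is Rayleigh distributed and
# alpha_C(1) = sqrt(pi)/2, matching the known pi/4 ratio squared.
print(alpha_complex(1))
```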
Lemma 21
For \(G_{\mathbb {K}} \in \mathbb {K}^{d\times d}\) a Gaussian random matrix with i.i.d. \(\mathbb {K}\) valued \(\mathcal {N}(0,d^{-1})\) entries, define \(\alpha _{\mathbb {K}}(d) := \mathbb {E}\left[ \frac{1}{d} \sum _{j=1}^d \sigma _j(G_{\mathbb {K}})\right] \). The following holds
Proof
To find an explicit formula for \(\alpha _{\mathbb {R}}(d)\), we need an expression for the spectral distribution of the Wishart matrix \(d G_{\mathbb {R}} G_{\mathbb {R}}^T\), which we call \(p_d^{\mathbb {R}}(x)\), given by equation (16) in [24]:
where
\(\kappa = d\mod 2\) and \(\Gamma (a,y) = \int _y^\infty t^{a-1}e^{-t}dt\) is the incomplete Gamma function.
This means that
Recall that (see Sect. 5)
which implies
We are especially interested in the following terms which appear in the full expression for \(\alpha _{\mathbb {R}}(d)\):
From [16], Section 7.414, Equation 4, we have
The following lemma deals with bounds on sums involving Q(m, k) terms. \(\square \)
Lemma 22
For Q(m, k) as defined in (29) we have the following bounds
Proof
Note that in (30),
since \(m\ge k\).
For \(0<i<2k-1\), the ith term in the summation of Q(2m, 2k) can be bounded above by
This means that
We bound the sum from \(i=1\) to \(2k-3\) by
so that for \(k \ge 1\),
For \(k=0\), \(Q(2m,0) < 0\) except for the term \(Q(0,0) = \sqrt{\pi }/2\), which also becomes negative in the full sum, so we ignore these terms.
We now turn our attention to the full sum \(\sum _{k=0}^m \frac{\Gamma (k+1/2)}{\Gamma (k+1)} Q(2m,2k)\). As before, we define for clarity
Using the bounds in Lemma 20,
Finally,
To deduce the inequality (31), we use the previously derived bounds to show that
so that \(Q(2m-1,2k-1) \le Q(2m,2k)\). Now it suffices to note that in the full sum, \(\sum _{k=1}^m \frac{\Gamma (k+3/2)}{\Gamma (k+1)} Q(2m-1,2k-1) \le 2 \sum _{k=1}^m \frac{\Gamma (k+1/2)}{\Gamma (k)} Q(2m-1,2k-1)\) and we get
\(\square \)
We now return our focus to finding a bound on the expression for \(\alpha _{\mathbb {R}}(d)\) given in (28). Since \(\psi _1,\psi _2\) depend on the parity of d, we split into two cases.
Odd \(d=2m+1\)
From [16], Section 7.414, Equation 6,
thus Eq. (28) becomes
and using the first bound in Lemma 22,
Even \(d=2m\)
For \(d=2m\), we have
We split the integral into two parts,
Expanding from the definition of \(\psi _1\) above, we have
so by Lemma 22,
The other part of the integral is
where we use the fact that, for odd \(2m-1\) (see [16], Section 7.414, Equation 6),
We can bound the first integral in the expression of (II) by
so finally
Combining the above bounds we see that in the case of even \(d=2m\),
\(\square \)
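Though not spelled out in this appendix, the bounds above are consistent with the large-d behavior suggested by the quarter-circle law: the singular value distribution of the normalized Gaussian matrix converges to the density \(\sqrt{4-s^2}/\pi \) on \([0,2]\), whose mean is \(\int _0^2 s\,\frac{\sqrt{4-s^2}}{\pi }\,ds = \frac{8}{3\pi }\approx 0.8488\). A one-matrix numerical check of this limit (a sketch under the stated normalization, not part of the proofs):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 400  # large enough for the singular values to concentrate
G = rng.normal(scale=np.sqrt(1.0 / d), size=(d, d))
avg_sv = np.linalg.svd(G, compute_uv=False).mean()
limit = 8 / (3 * np.pi)  # mean of the quarter-circle density on [0, 2]
print(avg_sv, limit)
```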
Bandeira, A.S., Kennedy, C. & Singer, A. Approximating the little Grothendieck problem over the orthogonal and unitary groups. Math. Program. 160, 433–475 (2016). https://doi.org/10.1007/s10107-016-0993-7