Sublinear time algorithms for approximate semidefinite programming

Full Length Paper, Series A, Mathematical Programming

Abstract

We consider semidefinite optimization in a saddle point formulation where the primal solution is in the spectrahedron and the dual solution is a distribution over affine functions. We present an approximation algorithm for this problem that runs in sublinear time in the size of the data. To the best of our knowledge, this is the first algorithm to achieve this. Our algorithm is also guaranteed to produce low-rank solutions. We further prove lower bounds on the running time of any algorithm for this problem, showing that certain terms in the running time of our algorithm cannot be further improved. Finally, we consider a non-affine version of the saddle point problem and give an algorithm that under certain assumptions runs in sublinear time.

Notes

  1. The results presented in this paper are a continuation of preliminary results on sublinear semidefinite optimization presented in [11].

  2. Our results hold also under the weaker assumption that every \(c_i\) has a supergradient everywhere in \(\mathcal {S}\).

  3. As stated before it suffices to assume that \(c_i\) has a supergradient everywhere in \(\mathcal {S}\).

References

  1. Agarwal, A., Charikar, M., Makarychev, K., Makarychev, Y.: \(O(\sqrt{\log n})\) approximation algorithms for Min UnCut, Min 2CNF Deletion, and directed cut problems. In: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, STOC ’05, pp. 573–581 (2005)

  2. Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)

  3. Arora, S., Lee, J.R., Naor, A.: Euclidean distortion and the sparsest cut. In: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, STOC ’05, pp. 553–562 (2005)

  4. Arora, S., Rao, S., Vazirani, U.: Expander flows, geometric embeddings and graph partitioning. In: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, STOC ’04, pp. 222–231 (2004)

  5. Baes, M., Bürgisser, M., Nemirovski, A.: A randomized mirror-prox method for solving structured large-scale matrix saddle-point problems. SIAM J. Optim. 23(2), 934–962 (2013)

  6. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  7. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York, NY (2006)

  8. Clarkson, K.L., Hazan, E., Woodruff, D.P.: Sublinear optimization for machine learning. J. ACM 59(5), 23 (2012)

  9. d’Aspremont, A.: Subsampling algorithms for semidefinite programming. Stoch. Syst. 1, 274–305 (2011). doi:10.1214/10-SSY018

  10. d’Aspremont, A., Ghaoui, L.E., Jordan, M.I., Lanckriet, G.R.G.: A direct formulation of sparse PCA using semidefinite programming. SIAM Rev. 3, 41–48 (2004)

  11. Garber, D., Hazan, E.: Approximating semidefinite programs in sublinear time. In: NIPS, pp. 1080–1088 (2011)

  12. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42, 1115–1145 (1995)

  13. Grigoriadis, M.D., Khachiyan, L.G.: A sublinear-time randomized approximation algorithm for matrix games. Oper. Res. Lett. 18(2), 53–58 (1995)

  14. Hazan, E.: Approximate convex optimization by online game playing. CoRR. arXiv:0610119 (2006)

  15. Hazan, E.: Sparse approximate solutions to semidefinite programs. In: Proceedings of the 8th Latin American conference on Theoretical informatics, LATIN’08, pp. 306–316 (2008)

  16. Sra, S., Nowozin, S., Wright, S.J.: Optimization for machine learning. MIT Press (2012)

  17. Juditsky, A., Nemirovski, A., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011). doi:10.1214/10-SSY011

  18. Kuczyński, J., Woźniakowski, H.: Estimating the largest eigenvalues by the power and Lanczos algorithms with a random start. SIAM J. Matrix Anal. Appl. 13, 1094–1122 (1992)

  19. Lanckriet, G.R.G., Cristianini, N., Ghaoui, L.E., Bartlett, P., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5, 27–72 (2004)

  20. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2005)

  21. Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Program. 110(2), 245–259 (2007)

  22. Recht, B.: A simpler approach to matrix completion. J. Mach. Learn. Res. 12, 3413–3430 (2011)

  23. Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trend. Mach. Learn. 4(2), 107–194 (2012)

  24. Shalev-Shwartz, S., Singer, Y., Ng, A.Y.: Online and batch learning of pseudo-metrics. In: ICML, pp. 743–750 (2004)

  25. Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)

  26. Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning, with application to clustering with side-information. Adv. Neural Inf. Process. Syst. 15, 505–512 (2002)

  27. Zinkevich, M.: Online convex programming and generalized infinitesimal gradient ascent. In: ICML, pp. 928–936 (2003)

Acknowledgments

We would like to thank both of the anonymous reviewers for their constructive comments which have contributed significantly to the improvement of this paper.

Author information

Correspondence to Dan Garber.

Appendix: Auxiliary lemmas used in the proof of the main theorem

Most of the proofs given below are adapted from [8] to our needs and are given here in full detail for completeness.

We begin by proving Lemma 5.

Proof

As a first step, note that for \(x>C\) we have \(x-\mathbb {E}[X] \ge C/2\) (since \(\mathbb {E}[X] \le C/2\)), so that

$$\begin{aligned} C(x-C) \le 2(x-\mathbb {E}[X])(x-C) \le 2(x-\mathbb {E}[X])^2 . \end{aligned}$$

Hence, we obtain,

$$\begin{aligned} \mathbb {E}[X] - \mathbb {E}[\bar{X}]&= \int _{x<-C} (x+C) d\mu _X + \int _{x>C} (x-C) d\mu _X \\&\le \int _{x>C} (x-C) d\mu _X \\&\le \frac{2}{C} \int _{x>C} (x-\mathbb {E}[X])^2 d\mu _X \\&\le \frac{2}{C} \text{ Var }[X] . \end{aligned}$$

Similarly one can prove that \(\mathbb {E}[X] - \mathbb {E}[\bar{X}] \ge -2\text{ Var }[X]/C\), and the result follows.
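As a quick numerical sanity check of this bound, the following sketch (illustrative only; it assumes the standard symmetric clipping \({{\mathrm{clip}}}(x,C)=\max (-C,\min (x,C))\), which is how the operator is used in the lemmas below) estimates \(|\mathbb {E}[X]-\mathbb {E}[\bar{X}]|\) for a skewed distribution and compares it with \(2\text{ Var }[X]/C\).

```python
import numpy as np

def clip(x, C):
    # Symmetric clipping to [-C, C] (assumed definition of clip(., C)).
    return np.clip(x, -C, C)

rng = np.random.default_rng(0)
# Skewed distribution with mean 0 and variance 1, so clipping at C truncates some upper-tail mass.
X = rng.exponential(scale=1.0, size=1_000_000) - 1.0
C = 2.0  # clipping threshold; note |E[X]| = 0 <= C/2

bias = abs(X.mean() - clip(X, C).mean())
bound = 2.0 * X.var() / C
print(f"|E[X] - E[clip(X, C)]| = {bias:.4f}, 2 Var[X]/C = {bound:.4f}")
assert bias <= bound + 1e-3  # empirical bias respects the bound of Lemma 5
```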

In the following lemmas we assume only that \(v_t(i) = {{\mathrm{clip}}}(\tilde{v}_t(i),1/\eta )\) is the clipping of a random variable \(\tilde{v}_t(i)\) whose variance is at most one (\(\text{ Var }[\tilde{v}_t(i)] \le 1\)), and we write \(\mu _t(i) = \mathbb {E}[\tilde{v}_t(i)]\). We also assume that the expectations of \(\tilde{v}_t(i)\) are bounded in absolute value by a constant C, i.e., \(|\mu _t(i)| \le C\), with \(1 \le 2C \le 1/\eta \).

The following lemma is a Bernstein-type inequality for martingales. For a proof see [8], Lemma B.3.

Lemma 22

Let \(\{Z_t\}\) be a martingale difference sequence with respect to filtration \(\{S_t\}\) (i.e., \(\mathbb {E}[Z_t|S_1,\ldots ,S_t] = 0\)). Assume the filtration \(\{S_t\}\) is such that the values in \(S_t\) are determined using only those in \(S_{t-1}\), and not any previous history, and so the joint probability distribution satisfies:

$$\begin{aligned} {{\mathrm{Pr}}}\left( S_1=s_1, S_2=s_2, \ldots , S_T=s_T\right) = \prod _{t\in [T-1]} {{\mathrm{Pr}}}\left( S_{t+1}=s_{t+1}\mid S_t=s_t\right) . \end{aligned}$$

In addition, assume for all t, \(\mathbb {E}[Z_t^2 | S_1,\ldots ,S_t]\le s\), and \(|Z_t| \le V\). Then

$$\begin{aligned} \log {{\mathrm{Pr}}}\left( \sum \nolimits _{t\in [T]} Z_t \ge \alpha \right) \le -\frac{\alpha ^2/2}{Ts + \alpha V/3}. \end{aligned}$$
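To make the bound concrete, here is a small simulation, an illustrative sketch only: the Markov-structured filtration of the lemma is replaced by i.i.d. bounded, centered increments, which satisfy its hypotheses with \(s\) the per-step second moment and \(V\) the range bound. It compares the empirical tail of \(\sum _{t} Z_t\) with \(\exp \left( -\frac{\alpha ^2/2}{Ts+\alpha V/3}\right) \).

```python
import numpy as np

rng = np.random.default_rng(1)
T, V = 1000, 1.0
s = V**2 / 3.0            # E[Z_t^2] for Z_t uniform on [-V, V]
n_runs = 20_000
sums = rng.uniform(-V, V, size=(n_runs, T)).sum(axis=1)

for alpha in (30.0, 50.0, 80.0):
    empirical = (sums >= alpha).mean()
    bound = np.exp(-(alpha**2 / 2.0) / (T * s + alpha * V / 3.0))
    print(f"alpha={alpha:5.1f}  empirical tail={empirical:.5f}  Bernstein-type bound={bound:.5f}")
```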

The following three lemmas prove Lemmas 12 and 17.

Lemma 23

For \(\sqrt{\frac{4\log {m}}{3T}} \le \eta \le 1/(2C)\), with probability at least \(1-O(1/m)\) it holds that

$$\begin{aligned} \max _{i\in [m]}\sum _{t\in [T]}[v_t(i) - \mu _t(i)] \le 5 \eta T. \end{aligned}$$

Proof

Lemma 5 implies that \(|\mathbb {E}[{v}_t(i)] - \mu _t(i)| \le 2\eta \), since \(\text{ Var }[\tilde{v}_t(i)]\le 1\).

We show that for given \(i\in [m]\), with probability \(1 - O(1/m^{2})\), \(\sum _{t\in [T]}[v_t(i) - \mathbb {E}[{v}_t(i)]] \le 3 \eta T \), and then apply the union bound over all \(i\in [m]\). This together with the above bound on \(|\mathbb {E}[{v}_t(i)] - \mu _t(i)|\) implies the lemma via the triangle inequality.

Fixing i, let \(Z_t^i \equiv v_t(i) - \mathbb {E}[{v}_t(i)]\), and consider the filtration given by

$$\begin{aligned} S_t \equiv (x_t, p_t, w_t, v_{t-1}, i_{t-1}, j_{t-1}, v_{t-1} - \mathbb {E}[{v}_{t-1}]). \end{aligned}$$

Using the notation \(\mathbb {E}_t[\cdot ] = \mathbb {E}[\cdot |S_t]\), observe that

  1. \(\forall t \ . \ \mathbb {E}_t[(Z_t^i) ^2 ] = \mathbb {E}_t[v_t(i)^2] - \mathbb {E}_t[v_t(i)]^2 = \text{ Var }(v_t(i)) \le 1\).

  2. \(|Z_t^i|\le 2/\eta \). This holds since by construction, \(|v_t(i)|\le 1/\eta \), and hence

     $$\begin{aligned} |Z_t^i|&= | v_t(i) - \mathbb {E}[v_t(i)]| \le | v_t(i)| + |\mathbb {E}[v_t(i)]| \le \frac{2}{\eta }. \end{aligned}$$

Using these conditions, and despite the fact that the \(Z_t^i\) are not independent, we can apply Lemma 22 and conclude that \(Z\equiv \sum _{t\in [T]} Z_t^i\) satisfies the Bernstein-type inequality with \(s=1\) and \(V=2/\eta \), and so

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge \alpha \right) \ge \frac{\alpha ^2/2}{Ts + \alpha V/3} \ge \frac{\alpha ^2/2}{T + 2\alpha /3\eta } . \end{aligned}$$

Hence, for \(a \ge 3\) we have that

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge a \eta T\right) \ge \frac{a^2/2}{1 + 2a/3} \eta ^2 T \ge \frac{a}{2} \eta ^2 T. \end{aligned}$$

For \(\eta \ge \sqrt{4\log {m}/aT}\), the above probability is at most \(e^{- 2 \log {m}} = 1/m^2\). Letting \(a=3\) we obtain the statement of the lemma.
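For completeness, the elementary step used above, namely that \(\frac{a^2/2}{1+2a/3} \ge \frac{a}{2}\) holds (for \(a>0\)) precisely when \(a \ge 3\), can be verified directly:

$$\begin{aligned} \frac{a^2/2}{1+2a/3} \ge \frac{a}{2} \iff \frac{a}{1+2a/3} \ge 1 \iff a \ge 1 + \frac{2a}{3} \iff a \ge 3 . \end{aligned}$$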

Lemma 24

For \(\sqrt{\log {m}/T} \le \eta \le 1/(2C)\), with probability at least \(1-O(1/m)\),

$$\begin{aligned} \Big | \sum _{t\in [T]}p_t ^{\top }v_t - \sum _{t\in [T]}{}p_t^{\top }\mu _t \Big | \le 4 \eta T . \end{aligned}$$

Proof

This lemma is proven in essentially the same manner as Lemma 23; the proof is given below for completeness.

Lemma 5 implies that \( |\mathbb {E}[{v}_t(i)] - \mu _t(i)| \le 2\eta \), using \(\text{ Var }[\tilde{v}_t(i)]\le 1\). Since \(p_t\) is a distribution, it follows that \( |\mathbb {E}[p_t ^{\top }{v}_t] - p_t ^{\top }\mu _t| \le 2\eta \). Let \(Z_t \equiv p_t ^{\top }v_t - \mathbb {E}[p_t ^{\top }{v}_t] = \sum _i p_t(i) Z_t^i\), where \(Z_t^i = v_t(i) - \mathbb {E}[v_t(i)]\). Consider the filtration given by

$$\begin{aligned} S_t \equiv (x_t, p_t, w_t, v_{t-1}, i_{t-1}, j_{t-1}, v_{t-1} - \mathbb {E}[{v}_{t-1}]) . \end{aligned}$$

Using the notation \(\mathbb {E}_t[\cdot ] = \mathbb {E}[\cdot |S_t]\), the quantities \(|Z_t|\) and \(\mathbb {E}_t[Z_t^2] \) can be bounded as follows:

$$\begin{aligned} |Z_t| = \left| \sum _{i\in [m]}p_t(i) Z_t^i\right| \le \sum _{i\in [m]}p_t(i) \left| Z_t^i\right| \le \frac{2}{\eta }, \end{aligned}$$

using \(\left| Z_t^i\right| \le 2/\eta \) as in Lemma 23.

Also, since the square function is convex, Jensen's inequality (applied with the weights \(p_t(i)\)) gives

$$\begin{aligned} \mathbb {E}[Z_t^2] = {{\mathrm{Var}}}[p_t ^{\top }v_t] \le \sum _{i\in [m]}p_t(i) \text{ Var }[v_t(i)] \le \max _i \text{ Var }[v_t(i)] \le 1. \end{aligned}$$

With these conditions, we can apply Lemma 22 and conclude that \(Z\equiv \sum _{t\in [T]} Z_t\) satisfies the Bernstein-type inequality with \(s=1\) and \(V=2/\eta \), and so

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge \alpha \right) \ge \frac{\alpha ^2/2}{Ts + \alpha V/3} \ge \frac{\alpha ^2/2}{T + 2\alpha /3\eta } . \end{aligned}$$

Hence, for \(a \ge 3\) we have that

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge a \eta T\right) \ge \frac{a^2/2}{1 + 2a/3} \eta ^2 T \ge \frac{a}{2} \eta ^2 T. \end{aligned}$$

For \(\eta \ge \sqrt{2\log {m}/aT}\), the above probability is at most \(e^{-\log {m}} = 1/m\). Letting \(a=2\) we obtain the statement of the lemma.
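For concreteness, the martingale bound (with \(a=2\)) and the per-step bias bound combine via the triangle inequality into the claimed estimate:

$$\begin{aligned} \Big | \sum _{t\in [T]}p_t ^{\top }v_t - \sum _{t\in [T]}p_t^{\top }\mu _t \Big | \le \Big | \sum _{t\in [T]}\left( p_t ^{\top }v_t - \mathbb {E}[p_t ^{\top }{v}_t]\right) \Big | + \sum _{t\in [T]}\left| \mathbb {E}[p_t ^{\top }{v}_t] - p_t ^{\top }\mu _t\right| \le 2\eta T + 2\eta T = 4 \eta T . \end{aligned}$$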

Lemma 25

For \(\sqrt{\log {m}/T} \le \eta \le 1/4\), with probability at least \(1-O(1/m)\),

$$\begin{aligned} \Big | \sum _{t\in [T]}\mu _t(i_t) - \sum _{t\in [T]}p_t ^{\top }\mu _t \Big | \le 4 C \eta T . \end{aligned}$$

Proof

Let \(Z_t \equiv \mu _t(i_t) - p_t ^{\top }\mu _t \), where now \(\mu _t\) is a constant vector and \(i_t\) is the random variable, and consider the filtration given by

$$\begin{aligned} S_t \equiv (x_t, p_t, w_t, y_t, v_{t-1}, i_{t-1}, j_{t-1}, Z_{t-1}). \end{aligned}$$

The expectation of \(\mu _t(i_t) \), conditioned on \(S_t\) and with respect to the random choice of \(i_t\), is \(p_t ^{\top }\mu _t\). Hence \(\mathbb {E}_t[Z_t] = 0\), where \(\mathbb {E}_t[\cdot ]\) denotes \(\mathbb {E}[\cdot |S_t]\). The quantities \(|Z_t|\) and \(\mathbb {E}[Z_t^2] \) can be bounded as follows:

$$\begin{aligned} |Z_t|&\le |\mu _t(i_t)| + \left| p_t ^{\top }\mu _t\right| \le 2C,\\ \mathbb {E}\left[ Z_t^2\right]&= \mathbb {E}\left[ \left( \mu _t(i_t) - p_t ^{\top }\mu _t\right) ^2 \right] \le 2 \mathbb {E}\left[ \mu _t(i_t)^2\right] + 2 \left( p_t ^{\top }\mu _t\right) ^2 \le 4C^2 . \end{aligned}$$

Applying Lemma 22 to \(Z\equiv \sum _{t\in [T]} Z_t\), with parameters \(s = 4C^2,\ V = 2C\), we obtain

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge \alpha \right) \ge \frac{\alpha ^2/2}{4C^2T + 2C\alpha /3}. \end{aligned}$$

Hence, for \(\eta \le 1/a\) we have that

$$\begin{aligned} -\log {{\mathrm{Pr}}}\left( Z \ge a C \eta T\right) \ge \frac{a^2 \eta ^2 T/2}{4 + 2 a \eta /3} \ge \frac{a^2}{10} \eta ^2 T \end{aligned}$$

and if \(\eta \ge \sqrt{10 \log {m}/a^2 T}\), the above probability is no more than \(1/m\). Letting \(a=4\) we obtain the lemma.
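The second inequality in the display above uses \(a\eta \le 1\):

$$\begin{aligned} \frac{a^2 \eta ^2 T/2}{4 + 2 a \eta /3} \ge \frac{a^2 \eta ^2 T/2}{4 + 2/3} = \frac{3a^2}{28}\,\eta ^2 T \ge \frac{a^2}{10}\,\eta ^2 T , \end{aligned}$$

since \(3/28 \ge 1/10\).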

The following is a proof of Lemma 13.

Proof

For all \(i\in [m]\) it holds that \(\mathbb {E}[v_t(i)^2] \le \mathbb {E}[\tilde{v}_t(i)^2] \le 4\). Thus since \(p_t\) is a distribution we have that \(\mathbb {E}\left[ {\sum _{t\in {[T]}}p_t^{\top }v_t^2}\right] \le 4T\).

The result follows from applying Markov’s inequality to the random variable \(\sum _{t\in {[T]}}p_t^{\top }v_t^2\).
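Spelling out that last step: for any threshold \(b > 0\) (the specific constant appears in the statement of Lemma 13, which is not reproduced in this appendix), Markov's inequality gives

$$\begin{aligned} {{\mathrm{Pr}}}\left( \sum _{t\in {[T]}}p_t^{\top }v_t^2 \ge b\,T\right) \le \frac{\mathbb {E}\left[ \sum _{t\in {[T]}}p_t^{\top }v_t^2\right] }{b\,T} \le \frac{4}{b} . \end{aligned}$$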

The following is a proof of Lemma 11.

Proof

The proof relies on the analysis for the Lanczos method in [18], Theorem 4.2.

According to [18], given a positive semidefinite matrix M such that \(\Vert {M}\Vert _2 \le \rho \) and parameters \(\epsilon , \delta > 0\), the Lanczos method returns in time \(O\left( {\frac{N}{\sqrt{\epsilon }}\log \frac{n}{\delta }}\right) \), with probability at least \(1-\delta \), a unit vector x such that

$$\begin{aligned} x^{\top }Mx \ge \lambda _{\max }(M)(1-\epsilon ) . \end{aligned}$$

In our case M need not be positive semidefinite. Given M such that \(\Vert {M}\Vert _2 \le \rho \), we define \(M' = M + \rho {}\mathbf I \). Then \(M'\) is positive semidefinite and \(\Vert {M'}\Vert _2 \le 2\rho \). Hence, applying the Lanczos procedure with error parameter \(\epsilon ' = \epsilon /(2\rho )\) yields a unit vector x such that

$$\begin{aligned} x^{\top }M'x \ge \lambda _{\max }(M') - \frac{\epsilon }{2\rho }\lambda _{\max }(M') \ge \lambda _{\max }(M') - \epsilon . \end{aligned}$$

Now it holds that

$$\begin{aligned} x^{\top }M'x= & {} x^{\top }Mx + x^{\top }\rho {}\mathbf I x = x^{\top }Mx + \rho \\\ge & {} \lambda _{\max }(M' ) - \epsilon = \lambda _{\max }(M) + \rho - \epsilon . \end{aligned}$$

Thus,

$$\begin{aligned} x^{\top }Mx \ge \lambda _{\max }(M) - \epsilon . \end{aligned}$$
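The shift \(M \mapsto M + \rho {}\mathbf I \) is easy to exercise numerically. The sketch below is illustrative only: it uses plain power iteration on the shifted matrix as a stand-in for the Lanczos method analyzed in [18], with an ad hoc iteration budget rather than the \(O(1/\sqrt{\epsilon '})\) bound, and checks that the multiplicative guarantee for \(M'\) yields the additive guarantee \(x^{\top }Mx \ge \lambda _{\max }(M) - \epsilon \) for an indefinite M.

```python
import numpy as np

rng = np.random.default_rng(2)

# Indefinite symmetric test matrix with known spectrum in [-1, 1].
n = 200
eigvals = np.linspace(-1.0, 1.0, n)            # largest eigenvalue is exactly 1.0
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = (Q * eigvals) @ Q.T                        # M = Q diag(eigvals) Q^T, ||M||_2 <= 1
rho = 1.0                                      # known upper bound on ||M||_2
eps = 1e-3

M_shift = M + rho * np.eye(n)                  # positive semidefinite, ||M'||_2 <= 2*rho

# Plain power iteration on the shifted matrix (stand-in for Lanczos with a random start).
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(5000):                          # ad hoc iteration budget
    x = M_shift @ x
    x /= np.linalg.norm(x)

approx = x @ M @ x                             # Rayleigh quotient of the *original* matrix
print(f"lambda_max(M) = 1.0,  x^T M x = {approx:.6f}")
assert approx >= 1.0 - eps                     # additive guarantee from the shift argument
```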

Cite this article

Garber, D., Hazan, E. Sublinear time algorithms for approximate semidefinite programming. Math. Program. 158, 329–361 (2016). https://doi.org/10.1007/s10107-015-0932-z
