Multisection in the Stochastic Block Model Using Semidefinite Programming

Agarwal, Naman; Bandeira, Afonso S.; Koiliaris, Konstantinos; Kolla, Alexandra

doi:10.1007/978-3-319-69802-1_4

Part of the book series: Applied and Numerical Harmonic Analysis (ANHA)

Abstract

We consider the problem of identifying underlying community-like structures in graphs. Toward this end, we study the stochastic block model (SBM) on k clusters: a random model on n = km vertices, partitioned into k equal-sized clusters, with edges sampled independently across clusters with probability q and within clusters with probability p, where p > q. The goal is to recover the initial “hidden” partition of [n]. We study semidefinite programming (SDP)-based algorithms in this context. In the regime $p = \frac {\alpha \log (m)}{m}$ and $q = \frac {\beta \log (m)}{m}$, we show that a certain natural SDP-based algorithm solves the problem of exact recovery in the k-community SBM, with high probability, whenever $\sqrt {\alpha } - \sqrt {\beta } > \sqrt {1}$, as long as $k=o(\log n)$. This threshold is known to be information-theoretically optimal. We also study the case when $k=\Theta (\log (n))$. In this case, however, our recovery guarantees no longer match the optimal condition $\sqrt {\alpha } - \sqrt {\beta } > \sqrt {1}$, which leaves achieving optimality in this range as an open question.
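To make the model concrete, the following is a minimal sketch of how one might sample a graph from the k-cluster SBM in the regime described above. It is an illustration only, not the authors' code: the function names `sample_sbm` and `above_threshold`, the choice of NumPy, and the convention that vertex i belongs to cluster i // m are all assumptions made here. The SDP-based recovery algorithm itself is not reproduced.

```python
import math

import numpy as np


def sample_sbm(k, m, alpha, beta, seed=0):
    """Sample an adjacency matrix from the k-cluster SBM on n = k*m vertices.

    Uses p = alpha*log(m)/m within clusters and q = beta*log(m)/m across
    clusters. The function name and the labeling convention are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n = k * m
    p = alpha * math.log(m) / m
    q = beta * math.log(m) / m
    if not (0.0 <= q < p <= 1.0):
        raise ValueError("need 0 <= q < p <= 1 for valid edge probabilities")

    # Hidden partition (assumed convention): vertex i is in cluster i // m.
    labels = np.repeat(np.arange(k), m)
    same_cluster = labels[:, None] == labels[None, :]
    probs = np.where(same_cluster, p, q)

    # Draw each unordered pair {i, j} with i < j once, then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(np.int8)
    return adj, labels


def above_threshold(alpha, beta):
    """Check the condition sqrt(alpha) - sqrt(beta) > 1 from the abstract."""
    return math.sqrt(alpha) - math.sqrt(beta) > 1.0
```

For instance, `sample_sbm(3, 50, 9.0, 1.0)` returns a symmetric 150 × 150 matrix with zero diagonal, and `above_threshold(9.0, 1.0)` holds because 3 − 1 > 1.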

Notes

1.
Indeed, by definition, any vector $y \in \mathbb {R}_{n | k} \oplus \mathbb {1}$ can be written as $x + \delta \frac {\mathbb {1}}{\sqrt {n}}$ for some δ and $x \in \mathbb {R}_{n | k}$. For the purpose of proving positive definiteness, we can always divide by any positive number and can therefore consider $\frac {y}{\|x\|}$. Also note that we can consider y or −y equivalently, and hence can restrict to the case δ > 0.

Acknowledgements

Most of the work presented in this paper was conducted while ASB was at Princeton University and partly conducted while ASB was at the Massachusetts Institute of Technology. ASB acknowledges support from AFOSR Grant No. FA9550-12-1-0317, NSF DMS-1317308, NSF DMS-1712730, and NSF DMS-1719545.

Author information

Authors and Affiliations

Princeton University, Princeton, NJ, USA
Naman Agarwal
Department of Mathematics, Courant Institute of Mathematical Sciences and Center for Data Science, New York University, New York, NY, USA
Afonso S. Bandeira
University of Illinois Urbana - Champaign, Urbana, IL, USA
Konstantinos Koiliaris & Alexandra Kolla

Corresponding author

Correspondence to Afonso S. Bandeira.

Editor information

Editors and Affiliations

Fakultät für Elektrotechnik und Informationstechnik, Technische Universität München, Munich, Bavaria, Germany
Holger Boche
Institut für Telekommunikationssysteme, Technische Universität Berlin, Berlin, Germany
Giuseppe Caire
Department of Electrical & Computer Engineering, Duke University, Durham, North Carolina, USA
Robert Calderbank
Institut für Mathematik, Technische Universität Berlin, Berlin, Germany
Maximilian März
Institut für Mathematik, Technische Universität Berlin, Berlin, Germany
Gitta Kutyniok
Lehrstuhl und Institute für Statistik, RWTH Aachen, Aachen, Germany
Rudolf Mathar

1 Appendix

Forms of Chernoff Bounds and Hoeffding Bounds Used in the Arguments

Theorem 7 (Chernoff)

Let $X_1, \dots, X_n$ be independent random variables taking values in {0, 1}. Let X denote their sum and let $\mu = \mathbb {E}[X]$ be its expectation. Then for any δ > 0 (with δ < 1 in the lower-tail bound (26)) it holds that

$$\displaystyle \begin{aligned} \mathbb{P}\left( X > (1 + \delta)\mu\right) < \left(\frac{e^{\delta}}{(1 + \delta)^{(1+\delta)}}\right)^{\mu}\:, \end{aligned} $$

(25)

$$\displaystyle \begin{aligned} \mathbb{P}\left( X < (1 - \delta)\mu\right) < \left(\frac{e^{-\delta}}{(1 - \delta)^{(1-\delta)}}\right)^{\mu} \:. \end{aligned} $$

(26)

Simplified forms of the above bounds, valid for 0 < δ ≤ 1, are the following:

$$\displaystyle \begin{aligned}\mathbb{P}\left( X \geq (1 + \delta)\mu\right) \leq e^{-\frac{\delta^2 \mu}{3}}\:, \end{aligned}$$

$$\displaystyle \begin{aligned}\mathbb{P}\left( X \leq (1 - \delta)\mu\right) \leq e^{-\frac{\delta^2 \mu}{2}} \:.\end{aligned}$$
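As a sanity check on the simplified bounds, one can compare them with exact binomial tails. The sketch below does this for i.i.d. Bernoulli variables, which is a special case of the theorem; the helper names are hypothetical and the parameters n = 200, p = 0.1, δ = 0.5 are arbitrary choices made for illustration.

```python
import math


def binom_pmf(n, p, j):
    """P(X = j) for X ~ Binomial(n, p)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


def binom_upper_tail(n, p, threshold):
    """Exact P(X >= threshold) for X ~ Binomial(n, p)."""
    start = max(0, math.ceil(threshold))
    return sum(binom_pmf(n, p, j) for j in range(start, n + 1))


def binom_lower_tail(n, p, threshold):
    """Exact P(X <= threshold) for X ~ Binomial(n, p)."""
    end = min(n, math.floor(threshold))
    return sum(binom_pmf(n, p, j) for j in range(0, end + 1))


def chernoff_upper(mu, delta):
    """Simplified upper-tail bound exp(-delta^2 * mu / 3)."""
    return math.exp(-(delta**2) * mu / 3)


def chernoff_lower(mu, delta):
    """Simplified lower-tail bound exp(-delta^2 * mu / 2)."""
    return math.exp(-(delta**2) * mu / 2)


if __name__ == "__main__":
    n, p, delta = 200, 0.1, 0.5  # illustrative values, not from the chapter
    mu = n * p
    print("upper tail:", binom_upper_tail(n, p, (1 + delta) * mu),
          "bound:", chernoff_upper(mu, delta))
    print("lower tail:", binom_lower_tail(n, p, (1 - delta) * mu),
          "bound:", chernoff_lower(mu, delta))
```

In both cases the exact tail should come out below the corresponding bound, typically by a wide margin, since the simplified forms trade tightness for convenience.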

Theorem 8 (Bernstein)

Let $X_1, \dots, X_n$ be independent random variables taking values in [−M, M]. Let X denote their sum and let $\mu = \mathbb {E}[X]$ be its expectation. Then

$$\displaystyle \begin{aligned} \mathbb{P}\left( |X - \mu| \geq t \right) \leq \exp\left(-\frac{1}{2}\frac{t^2}{\sum_i \mathbb{E}[(X_i - \mathbb{E}[X_i])^2] + Mt/3}\right) \:. \end{aligned}$$

Corollary 1

Suppose $X_1, \dots, X_n$ are i.i.d. Bernoulli variables with parameter p. Let $\sigma = \sigma(X_i) = \sqrt{p(1 - p)}$; then for any r ≥ 1 we have

$$\displaystyle \begin{aligned}\mathbb{P}\left(X \geq \mu + \alpha\sigma\sqrt{n\log(r)}+ \alpha\log(r)\right) \leq e^{-\frac{\alpha\log(r)}{4}} \:.\end{aligned}$$

Proof

We have $n\sigma^2 = np(1 - p)$ and M = 1. Choose $t = \alpha \sigma \sqrt {n\log (r)} + \alpha \log (r)$. Then $\frac {n\sigma ^2 + t/3}{t^2} \leq \frac {1}{\log (r)}\left (\frac {1}{\alpha ^2} + \frac {1}{3\alpha }\right ) \leq \frac {2}{\alpha \log (r)}$, which by Theorem 8 implies that $\mathbb {P}\left (X \geq \mu + \alpha \sigma \sqrt {n\log (r)}+ \alpha \log (r)\right ) \leq e^{-\frac {\alpha \log (r)}{4}}.$ □
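The two inequalities in the proof above can be unpacked as follows. This is a sketch rather than the authors' own write-up, and it assumes $\log(r) > 0$, $\sigma > 0$, and $\alpha \geq 3/5$; the last condition is not stated in the corollary but appears to be what the final step needs. Since $t \geq \alpha \sigma \sqrt{n\log(r)}$ and $t \geq \alpha \log(r)$,

$$\displaystyle \begin{aligned} \frac{n\sigma^2}{t^2} \leq \frac{n\sigma^2}{\alpha^2\sigma^2 n\log(r)} = \frac{1}{\alpha^2\log(r)}\:, \qquad \frac{t/3}{t^2} = \frac{1}{3t} \leq \frac{1}{3\alpha\log(r)}\:. \end{aligned}$$

Adding the two terms and using $\frac{1}{\alpha} \leq \frac{5}{3}$ gives

$$\displaystyle \begin{aligned} \frac{n\sigma^2 + t/3}{t^2} \leq \frac{1}{\alpha\log(r)}\left(\frac{1}{\alpha} + \frac{1}{3}\right) \leq \frac{2}{\alpha\log(r)}\:, \end{aligned}$$

so the exponent in Theorem 8 (with M = 1) satisfies

$$\displaystyle \begin{aligned} -\frac{1}{2}\,\frac{t^2}{n\sigma^2 + t/3} \leq -\frac{1}{2}\cdot\frac{\alpha\log(r)}{2} = -\frac{\alpha\log(r)}{4}\:. \end{aligned}$$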

Theorem 9 (Hoeffding)

Let $X_1, \dots, X_n$ be independent random variables. Assume that each $X_i$ is bounded in the interval $[a_i, b_i]$. Define the empirical mean of these variables as

$$\displaystyle \begin{aligned} \bar{X} = \frac{\sum_i X_i}{n} \:, \end{aligned}$$

then

$$\displaystyle \begin{aligned} \mathbb{P}\left( |\bar{X} - \mathbb{E}[\bar{X}]| \geq t \right) \leq 2\exp\left(- \frac{2n^2t^2}{\sum_{i = 1}^{n} (b_i - a_i)^2}\right) \:. \end{aligned} $$

(27)
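The bound (27) can be checked numerically. The following sketch evaluates its right-hand side and compares it with a Monte Carlo estimate of the deviation probability. The helper names are hypothetical, and drawing each $X_i$ uniformly on $[a_i, b_i]$ is an assumption made purely for illustration; the theorem itself only requires boundedness and independence.

```python
import math

import numpy as np


def hoeffding_bound(t, lows, highs):
    """Right-hand side of (27): 2*exp(-2 n^2 t^2 / sum_i (b_i - a_i)^2)."""
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    n = lows.size
    width_sq = float(np.sum((highs - lows) ** 2))
    return 2.0 * math.exp(-2.0 * n**2 * t**2 / width_sq)


def empirical_deviation_rate(t, lows, highs, trials=20000, seed=0):
    """Monte Carlo estimate of P(|mean - E[mean]| >= t).

    Assumes X_i ~ Uniform[a_i, b_i] (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    # One row per trial, one column per variable X_i.
    samples = rng.uniform(lows, highs, size=(trials, lows.size))
    means = samples.mean(axis=1)
    expected = float(((lows + highs) / 2.0).mean())
    return float(np.mean(np.abs(means - expected) >= t))


if __name__ == "__main__":
    lows, highs = np.zeros(50), np.ones(50)
    print("bound:", hoeffding_bound(0.1, lows, highs))
    print("empirical:", empirical_deviation_rate(0.1, lows, highs))
```

With 50 variables on [0, 1] and t = 0.1, the bound evaluates to 2e⁻¹, and the empirical rate should fall below it.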

Cite this chapter

Agarwal, N., Bandeira, A.S., Koiliaris, K., Kolla, A. (2017). Multisection in the Stochastic Block Model Using Semidefinite Programming. In: Boche, H., Caire, G., Calderbank, R., März, M., Kutyniok, G., Mathar, R. (eds) Compressed Sensing and its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-69802-1_4

DOI: https://doi.org/10.1007/978-3-319-69802-1_4
Published: 18 January 2018
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-319-69801-4
Online ISBN: 978-3-319-69802-1
eBook Packages: Mathematics and Statistics (R0)
