A Proof of the Block Model Threshold Conjecture

Abstract

We study a random graph model called the “stochastic block model” in statistics and the “planted partition model” in theoretical computer science. In its simplest form, this is a random graph with two equal-sized classes of vertices, with a within-class edge probability of q and a between-class edge probability of q′.
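
As an illustration of the model just described, here is a minimal Python sketch of a sampler. The function name `sample_planted_partition`, the ±1 labels and the `seed` parameter are assumptions made for this example and do not come from the paper; the sketch only follows the abstract's description of two equal-sized classes with edge probabilities q and q′.

```python
import random
from itertools import combinations


def sample_planted_partition(n, q, q_prime, seed=None):
    """Sample a two-class planted partition graph on n vertices (n even).

    Returns (labels, edges), where labels[v] is +1 or -1 and edges is a
    list of pairs (u, v) with u < v. Hypothetical helper for illustration.
    """
    if n % 2 != 0:
        raise ValueError("n must be even so that the classes have equal size")
    rng = random.Random(seed)
    # Assign exactly n/2 vertices to each class, in random order.
    labels = [1] * (n // 2) + [-1] * (n // 2)
    rng.shuffle(labels)
    edges = []
    for u, v in combinations(range(n), 2):
        # Within-class pairs use q, between-class pairs use q_prime.
        p = q if labels[u] == labels[v] else q_prime
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges
```

In the sparse regime one would call this with `q = a / n` and `q_prime = b / n` for constants `a` and `b`. The loop visits every pair of vertices, so the sketch takes quadratic time and is only meant for small n.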

A striking conjecture of Decelle, Krzakala, Moore and Zdeborová [9], based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if q = a/n and q′ = b/n, s = (a − b)/2 and d = (a + b)/2, then Decelle et al. conjectured that it is possible to efficiently cluster in a way correlated with the true partition if s² > d and impossible if s² < d. By comparison, until recently the best-known rigorous result showed that clustering is possible if s² > C d ln d for sufficiently large C.
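
The conjectured threshold can be checked numerically for given parameters. The following Python sketch is only an illustration: the function name `threshold_regime` and its return strings are assumptions for this example, and it takes s to be half the difference of a and b and d to be half their sum.

```python
def threshold_regime(a, b):
    """Classify (a, b) against the conjectured threshold s^2 versus d.

    Here s = (a - b) / 2 and d = (a + b) / 2, where the within-class edge
    probability is a/n and the between-class edge probability is b/n.
    Hypothetical helper for illustration.
    """
    s = (a - b) / 2
    d = (a + b) / 2
    if s * s > d:
        return "possible"    # conjecture: efficient correlated clustering exists
    if s * s < d:
        return "impossible"  # conjecture: no clustering correlated with the truth
    return "boundary"        # s^2 == d, not covered by the conjecture as stated


print(threshold_regime(5, 1))  # s = 2, d = 3, so s^2 = 4 > 3
print(threshold_regime(3, 1))  # s = 1, d = 2, so s^2 = 1 < 2
```

For a = 5 and b = 1 the sketch reports the regime where clustering is conjectured to be possible, and for a = 3 and b = 1 the regime where it is conjectured to be impossible.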

In previous work, we proved that it is indeed information-theoretically impossible to cluster if s² ≤ d and, moreover, that it is information-theoretically impossible even to estimate the model parameters from the graph when s² < d. Here we prove the rest of the conjecture by providing an efficient algorithm for clustering in a way that is correlated with the true partition when s² > d. A different, independent proof of the same result was recently obtained by Massoulié [20].

References

  1. [1] K. B. Athreya and P. E. Ney: Branching processes, Springer-Verlag, New York, 1972. Die Grundlehren der mathematischen Wissenschaften, Band 196.

  2. [2] P. J. Bickel and A. Chen: A nonparametric view of network models and Newman-Girvan and other modularities, Proceedings of the National Academy of Sciences 106 (2009), 21068–21073.

  3. [3] R. B. Boppana: Eigenvalues and graph bisection: An average-case analysis, in: 28th Annual Symposium on Foundations of Computer Science, 280–285. IEEE, 1987.

  4. [4] C. Bordenave: A new proof of Friedman’s second eigenvalue theorem and its extension to random lifts. arXiv preprint arXiv:1502.04482, 2015.

  5. [5] C. Bordenave, M. Lelarge and L. Massoulié: Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. arXiv preprint arXiv:1501.06087, 2015.

  6. [6] T. N. Bui, S. Chaudhuri, F. T. Leighton and M. Sipser: Graph bisection algorithms with good average case behavior, Combinatorica 7 (1987), 171–191.

  7. [7] A. Coja-Oghlan: Graph partitioning via adaptive spectral techniques, Combinatorics, Probability and Computing 19 (2010), 227–284.

  8. [8] A. Condon and R. M. Karp: Algorithms for graph partitioning on the planted partition model, Random Structures and Algorithms 18 (2001), 116–140.

  9. [9] A. Decelle, F. Krzakala, C. Moore and L. Zdeborová: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications, Physical Review E 84 (2011), 066106.

  10. [10] M. E. Dyer and A. M. Frieze: The solution of some random NP-hard problems in polynomial expected time, Journal of Algorithms 10 (1989), 451–489.

  11. [11] L. Erdős, A. Knowles, H.-T. Yau and J. Yin: Spectral statistics of Erdős-Rényi graphs II: eigenvalue spacing and the extreme eigenvalues, Communications in Mathematical Physics 314 (2012), 587–640.

  12. [12] U. Feige and E. Ofek: Spectral techniques applied to sparse random graphs, Random Structures & Algorithms 27 (2005), 251–275.

  13. [13] A. Flaxman, A. Frieze and T. Fenner: High degree vertices and eigenvalues in the preferential attachment graph, Internet Math. 2 (2005), 1–19.

  14. [14] O. Guédon and R. Vershynin: Community detection in sparse networks via Grothendieck’s inequality. arXiv preprint arXiv:1411.4686, 2014.

  15. [15] P. W. Holland, K. B. Laskey and S. Leinhardt: Stochastic blockmodels: First steps, Social Networks 5 (1983), 109–137.

  16. [16] M. Jerrum and G. B. Sorkin: The Metropolis algorithm for graph bisection, Discrete Applied Mathematics 82 (1998), 155–175.

  17. [17] H. Kesten and B. P. Stigum: Additional limit theorems for indecomposable multidimensional Galton-Watson processes, Ann. Math. Statist. 37 (1966), 1463–1481.

  18. [18] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová and P. Zhang: Spectral redemption: clustering sparse networks. arXiv:1306.5550, 2013.

  19. [19] J. Leskovec, K. J. Lang, A. Dasgupta and M. W. Mahoney: Statistical properties of community structure in large social and information networks, in: Proceedings of the 17th International Conference on World Wide Web, 695–704. ACM, 2008.

  20. [20] L. Massoulié: Community detection thresholds and the weak Ramanujan property, in: Proceedings of the 46th Annual ACM Symposium on Theory of Computing, 694–703. ACM, 2014.

  21. [21] F. McSherry: Spectral partitioning of random graphs, in: 42nd IEEE Symposium on Foundations of Computer Science, 529–537. IEEE, 2001.

  22. [22] E. Mossel, J. Neeman and A. Sly: Stochastic block models and reconstruction, Probability Theory and Related Fields, 2014 (to appear).

  23. [23] R. R. Nadakuditi and M. E. J. Newman: Graph spectra and the detectability of community structure in networks, Physical Review Letters 108 (2012), 188701.

  24. [24] K. Rohe, S. Chatterjee and B. Yu: Spectral clustering and the high-dimensional stochastic blockmodel, The Annals of Statistics 39 (2011), 1878–1915.

  25. [25] B. Roos: Binomial approximation to the Poisson binomial distribution: The Krawtchouk expansion, Theory of Probability and Its Applications 45 (2001), 258–272.

  26. [26] T. A. B. Snijders and K. Nowicki: Estimation and prediction for stochastic block-models for graphs with latent block structure, Journal of Classification 14 (1997), 75–100.

  27. [27] S. H. Strogatz: Exploring complex networks, Nature 410 (2001), 268–276.

  28. [28] T. Tao and V. Vu: Random matrices: the circular law, Communications in Contemporary Mathematics 10 (2008), 261–307.

  29. [29] P. M. Wood: Universality and the circular law for sparse random matrices, The Annals of Applied Probability 22 (2012), 1266–1300.

  30. [30] S.-Y. Yun and A. Proutiere: Community detection via random and adaptive sampling, arXiv preprint arXiv:1402.3072, 2014.

Author information

Corresponding author

Correspondence to Joe Neeman.

Additional information

Supported by NSF grant DMS-1106999, NSF grant CCF 1320105 and DOD ONR grant N000141110140.

Supported by NSF grant DMS-1106999 and DOD ONR grant N000141110140.

Supported by an Alfred Sloan Fellowship and NSF grant DMS-1208338.

About this article

Cite this article

Mossel, E., Neeman, J. & Sly, A. A Proof of the Block Model Threshold Conjecture. Combinatorica 38, 665–708 (2018). https://doi.org/10.1007/s00493-016-3238-8

Mathematics Subject Classification (2000)

  • 05C80