Probability Theory and Related Fields, Volume 162, Issue 3–4, pp 431–461

# Reconstruction and estimation in the planted partition model

• Elchanan Mossel
• Joe Neeman
• Allan Sly

## Abstract

The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $$n$$ nodes with two equal-sized clusters, with a between-class edge probability of $$q$$ and a within-class edge probability of $$p$$. Although most of the literature on this model has focused on the case of increasing degrees (i.e. $$pn, qn \rightarrow \infty$$ as $$n \rightarrow \infty$$), the sparse case $$p, q = O(1/n)$$ is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $$p = a/n$$ and $$q = b/n$$, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $$(a - b)^2 > 2(a + b)$$, and impossible if $$(a - b)^2 < 2(a + b)$$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $$(a - b)^2 > C (a + b)$$ for some sufficiently large $$C$$. We prove half of the prediction of Decelle et al., showing that it is indeed impossible to cluster if $$(a - b)^2 < 2(a + b)$$. Furthermore, we show that it is impossible even to estimate the model parameters from the graph when $$(a - b)^2 < 2(a + b)$$; on the other hand, we provide a simple and efficient algorithm for estimating $$a$$ and $$b$$ when $$(a - b)^2 > 2(a + b)$$. Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem. This connection points to fascinating applications and open problems.
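
To make the model and the threshold concrete, the following is a minimal sketch in Python, not the authors' code. It samples a sparse planted partition graph with $$p = a/n$$ and $$q = b/n$$ and checks the condition $$(a - b)^2 > 2(a + b)$$. The function names `above_threshold` and `sample_planted_partition`, the use of $$\pm 1$$ labels, and the choice of exactly balanced clusters are assumptions made for illustration.

```python
import random


def above_threshold(a: float, b: float) -> bool:
    """Return True when (a - b)^2 > 2(a + b), the conjectured clustering regime."""
    return (a - b) ** 2 > 2 * (a + b)


def sample_planted_partition(n: int, a: float, b: float, seed: int = 0):
    """Sample a graph on n nodes with two equal-sized clusters.

    Within-cluster edges appear with probability p = a/n and
    between-cluster edges with probability q = b/n, independently.
    Returns (labels, edges); labels are +1/-1 (an illustrative convention).
    """
    if n % 2 != 0:
        raise ValueError("n must be even to have two equal-sized clusters")
    rng = random.Random(seed)
    labels = [+1] * (n // 2) + [-1] * (n // 2)
    rng.shuffle(labels)  # hide the partition among the node indices
    p, q = a / n, b / n
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))
    return labels, edges


if __name__ == "__main__":
    labels, edges = sample_planted_partition(n=200, a=5.0, b=1.0, seed=1)
    print("edges sampled:", len(edges))
    print("a=5, b=1 above threshold:", above_threshold(5.0, 1.0))
    print("a=3, b=1 above threshold:", above_threshold(3.0, 1.0))
```

For example, $$a = 5$$, $$b = 1$$ gives $$(a - b)^2 = 16 > 12 = 2(a + b)$$, so this pair lies in the regime where clustering is conjectured to be possible, while $$a = 3$$, $$b = 1$$ gives $$4 < 8$$ and lies in the regime shown here to be impossible. The quadratic loop over node pairs is only meant for small $$n$$.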

## Mathematics Subject Classification (2010)

Primary 05C80; Secondary 60J85, 90B15, 91D30

## Notes

### Acknowledgments

A.S. would like to thank Christian Borgs for suggesting the problem and Lenka Zdeborová for useful discussions. Part of this work was done while A.S. was at Microsoft Research, Redmond. The authors would also like to thank Lenka Zdeborová for comments on a draft of this work, and the two anonymous reviewers for further helpful comments.

## References

1. Bickel, P.J., Chen, A.: A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. 106(50), 21068–21073 (2009)
2. Bleher, P.M., Ruiz, J., Zagrebnov, V.A.: On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice. J. Stat. Phys. 79(1), 473–482 (1995)
3. Bollobás, B.: Random Graphs, 2nd edn. Cambridge University Press, Cambridge (2001)
4. Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous random graphs. Random Struct. Alg. 31(1), 3–122 (2007)
5. Boppana, R.B.: Eigenvalues and graph bisection: an average-case analysis. In: 28th Annual Symposium on Foundations of Computer Science, pp. 280–285. IEEE (1987)
6. Bui, T.N., Chaudhuri, S., Leighton, F.T., Sipser, M.: Graph bisection algorithms with good average case behavior. Combinatorica 7(2), 171–191 (1987)
7. Coja-Oghlan, A.: Graph partitioning via adaptive spectral techniques. Combinat. Prob. Comput. 19(02), 227–284 (2010)
8. Condon, A., Karp, R.M.: Algorithms for graph partitioning on the planted partition model. Random Struct. Alg. 18(2), 116–140 (2001)
9. Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (2011)
10. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)
11. Dyer, M.E., Frieze, A.M.: The solution of some random NP-hard problems in polynomial expected time. J. Alg. 10(4), 451–489 (1989)
12. Evans, W., Kenyon, C., Peres, Y., Schulman, L.J.: Broadcasting on trees and the Ising model. Ann. Appl. Prob. 10(2), 410–433 (2000)
13. Friedman, J.: A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Am. Math. Soc. 195(910), viii+100 (2008)
14. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete graph problems. Theor. Computer Sci. 1(3), 237–267 (1976)
15. Häggström, O., Mossel, E.: Nearest-neighbor walks with low predictability profile and percolation in $$2+\epsilon$$ dimensions. Ann. Prob. 26(3), 1212–1231 (1998)
16. Hartuv, E., Shamir, R.: A clustering algorithm based on graph connectivity. Inf. Process. Lett. 76(4), 175–181 (2000)
17. Hodges, J.L., Le Cam, L.: The Poisson approximation to the Poisson binomial distribution. Ann. Math. Stat. 31(3), 737–740 (1960)
18. Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983)
19. Janson, S.: Random regular graphs: asymptotic distributions and contiguity. Combinat. Prob. Comput. 4(04), 369–405 (1995)
20. Janson, S.: Asymptotic equivalence and contiguity of some random graphs. Random Struct. Alg. 36(1), 26–45 (2010)
21. Jerrum, M., Sorkin, G.B.: The Metropolis algorithm for graph bisection. Discrete Appl. Math. 82(1–3), 155–175 (1998)
22. Johnson, S.C.: Hierarchical clustering schemes. Psychometrika 32(3), 241–254 (1967)
23. Kesten, H., Stigum, B.P.: A limit theorem for multidimensional Galton–Watson processes. Ann. Math. Stat. 37(5), 1211–1223 (1966)
24. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical properties of community structure in large social and information networks. In: Proceeding of the 17th International Conference on World Wide Web, pp. 695–704. ACM (2008)
25. McSherry, F.: Spectral partitioning of random graphs. In: 42nd IEEE Symposium on Foundations of Computer Science, pp. 529–537. IEEE (2001)
26. Mossel, E.: Survey: information flow on trees. DIMACS Ser. Discrete Math. Theor. Computer Sci. 63, 155–170 (2004)
27. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
28. Newman, M.E.J., Watts, D.J., Strogatz, S.H.: Random graph models of social networks. Proc. Natl. Acad. Sci. USA 99(Suppl 1), 2566 (2002)
29. Pritchard, J.K., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155(2), 945–959 (2000)
30. Robinson, R.W., Wormald, N.C.: Almost all cubic graphs are Hamiltonian. Random Struct. Alg. 3(2), 117–125 (1992)
31. Robinson, R.W., Wormald, N.C.: Almost all regular graphs are Hamiltonian. Random Struct. Alg. 5(2), 363–374 (1994)
32. Rohe, K., Chatterjee, S., Yu, B.: Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 39(4), 1878–1915 (2011)
33. Shi, J., Malik, J.: Normalized cuts and image segmentation. Pattern Anal. Mach. Intell. IEEE Trans. 22(8), 888–905 (2000)
34. Sly, A.: Reconstruction for the Potts model. Ann. Prob. 39(4), 1365–1406 (2011)
35. Snijders, T.A.B., Nowicki, K.: Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classif. 14(1), 75–100 (1997)
36. Sonka, M., Hlavac, V., Boyle, R.: Image processing: analysis and machine vision, 4th edn. Cengage Learning, Stamford (2015)
37. Strogatz, S.H.: Exploring complex networks. Nature 410(6825), 268–276 (2001)
38. Wormald, N.C.: Models of random regular graphs. In: Lamb, J.D., Preece, D.A. (eds.) Surveys in Combinatorics 1999. London Mathematical Society Lecture Note Series, vol. 267. Cambridge University Press, Cambridge (1999)