
Probability Theory and Related Fields, Volume 162, Issue 3–4, pp. 431–461

Reconstruction and estimation in the planted partition model

  • Elchanan Mossel
  • Joe Neeman
  • Allan Sly

Abstract

The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on \(n\) nodes with two equal-sized clusters, with a between-class edge probability of \(q\) and a within-class edge probability of \(p\). Although most of the literature on this model has focused on the case of increasing degrees (i.e., \(pn, qn \rightarrow \infty \) as \(n \rightarrow \infty \)), the sparse case \(p, q = O(1/n)\) is interesting from both a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if \(p = a/n\) and \(q = b/n\), then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if \((a - b)^2 > 2(a + b)\), and impossible if \((a - b)^2 < 2(a + b)\). By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if \((a - b)^2 > C (a + b)\) for some sufficiently large \(C\). We prove half of the prediction of Decelle et al., showing that it is indeed impossible to cluster if \((a - b)^2 < 2(a + b)\). Furthermore, we show that it is impossible even to estimate the model parameters from the graph when \((a - b)^2 < 2(a + b)\); on the other hand, we provide a simple and efficient algorithm for estimating \(a\) and \(b\) when \((a - b)^2 > 2(a + b)\). Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem. This connection points to fascinating applications and open problems.
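The model and threshold described in the abstract can be made concrete with a short Python sketch. It assumes, purely for illustration, that the two classes are nodes \(0, \dots, n/2 - 1\) and \(n/2, \dots, n - 1\); it joins each pair with probability \(a/n\) within a class and \(b/n\) between classes, and checks the condition \((a - b)^2 > 2(a + b)\). Writing \(d = (a + b)/2\) and \(f = (a - b)/2\), that condition is equivalent to \(f^2 > d\). All function names here (`above_threshold`, `sample_planted_partition`, `count_triangles`, `estimate_ab`) are hypothetical. In particular, `estimate_ab` is an assumed moment estimator, not the authors' algorithm (the abstract does not spell theirs out): a short calculation gives an expected triangle count of about \((a^3 + 3ab^2)/24 = (d^3 + f^3)/6\), and the expected edge count is about \(nd/2\), so both \(d\) and \(f\) can be solved for. Because the triangle count has bounded mean, this estimate stays noisy for every \(n\).

```python
import math
import random


def above_threshold(a: float, b: float) -> bool:
    """True iff (a - b)^2 > 2(a + b), the condition stated in the abstract."""
    return (a - b) ** 2 > 2 * (a + b)


def sample_planted_partition(n: int, a: float, b: float, seed: int = 0):
    """Sample a sparse two-class planted partition graph on n nodes.

    Assumed layout: nodes 0..n/2-1 are class 0, nodes n/2..n-1 are class 1.
    Each pair is joined independently with probability a/n (same class)
    or b/n (different classes). Returns (labels, list of neighbour sets).
    The O(n^2) loop over pairs is only meant for small illustrative n.
    """
    if n % 2:
        raise ValueError("n must be even for two equal-sized classes")
    if not (0 <= a <= n and 0 <= b <= n):
        raise ValueError("a/n and b/n must be valid probabilities")
    rng = random.Random(seed)
    labels = [0] * (n // 2) + [1] * (n // 2)
    adj = [set() for _ in range(n)]
    p, q = a / n, b / n
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return labels, adj


def count_triangles(adj) -> int:
    """Count each triangle exactly once by requiring u < v < w."""
    total = 0
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            if v > u:
                total += sum(1 for w in adj[v] if w > v and w in nbrs)
    return total


def estimate_ab(n: int, num_edges: float, num_triangles: float):
    """Hypothetical moment estimate of (a, b) from edge and triangle counts.

    Assumes E[edges] ~ n*d/2 and E[triangles] ~ (d^3 + f^3)/6 with
    d = (a + b)/2 and f = (a - b)/2. Not the paper's estimator; noisy.
    """
    d = 2 * num_edges / n
    f_cubed = 6 * num_triangles - d ** 3
    # Real cube root that keeps the sign (f < 0 would mean b > a).
    f = math.copysign(abs(f_cubed) ** (1 / 3), f_cubed)
    return d + f, d - f


if __name__ == "__main__":
    n, a, b = 1000, 5.0, 1.0
    labels, adj = sample_planted_partition(n, a, b, seed=1)
    num_edges = sum(len(nbrs) for nbrs in adj) // 2
    num_triangles = count_triangles(adj)
    print("above threshold:", above_threshold(a, b))  # (5-1)^2 = 16 > 12
    print("edges:", num_edges, "triangles:", num_triangles)
    print("rough (a, b) estimate:", estimate_ab(n, num_edges, num_triangles))
```

With \(a = 5\) and \(b = 1\) the expected triangle count is only \(35/6 \approx 5.8\), so repeated runs with different seeds will scatter the estimate noticeably; fixing a short cycle length is what keeps this sketch simple at the cost of accuracy.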

Mathematics Subject Classification (2010)

Primary 05C80; Secondary 60J85, 90B15, 91D30

Notes

Acknowledgments

A.S. would like to thank Christian Borgs for suggesting the problem and Lenka Zdeborová for useful discussions. Part of this work was done while A.S. was at Microsoft Research, Redmond. The authors would also like to thank Lenka Zdeborová for comments on a draft of this work, and the two anonymous reviewers for further helpful comments.

References

  1. Bickel, P.J., Chen, A.: A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106(50), 21068–21073 (2009)
  2. Bleher, P.M., Ruiz, J., Zagrebnov, V.A.: On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice. J. Stat. Phys. 79(1), 473–482 (1995)
  3. Bollobás, B.: Random Graphs, 2nd edn. Cambridge University Press, Cambridge (2001)
  4. Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous random graphs. Random Struct. Alg. 31(1), 3–122 (2007)
  5. Boppana, R.B.: Eigenvalues and graph bisection: an average-case analysis. In: 28th Annual Symposium on Foundations of Computer Science, pp. 280–285. IEEE (1987)
  6. Bui, T.N., Chaudhuri, S., Leighton, F.T., Sipser, M.: Graph bisection algorithms with good average case behavior. Combinatorica 7(2), 171–191 (1987)
  7. Coja-Oghlan, A.: Graph partitioning via adaptive spectral techniques. Combinat. Prob. Comput. 19(2), 227–284 (2010)
  8. Condon, A., Karp, R.M.: Algorithms for graph partitioning on the planted partition model. Random Struct. Alg. 18(2), 116–140 (2001)
  9. Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (2011)
  10. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodological) 39(1), 1–38 (1977)
  11. Dyer, M.E., Frieze, A.M.: The solution of some random NP-hard problems in polynomial expected time. J. Alg. 10(4), 451–489 (1989)
  12. Evans, W., Kenyon, C., Peres, Y., Schulman, L.J.: Broadcasting on trees and the Ising model. Ann. Appl. Prob. 10(2), 410–433 (2000)
  13. Friedman, J.: A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Am. Math. Soc. 195(910), viii+100 (2008)
  14. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete graph problems. Theor. Computer Sci. 1(3), 237–267 (1976)
  15. Häggström, O., Mossel, E.: Nearest-neighbor walks with low predictability profile and percolation in \(2+\epsilon \) dimensions. Ann. Prob. 26(3), 1212–1231 (1998)
  16. Hartuv, E., Shamir, R.: A clustering algorithm based on graph connectivity. Inf. Process. Lett. 76(4), 175–181 (2000)
  17. Hodges, J.L., Le Cam, L.: The Poisson approximation to the Poisson binomial distribution. Ann. Math. Stat. 31(3), 737–740 (1960)
  18. Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983)
  19. Janson, S.: Random regular graphs: asymptotic distributions and contiguity. Combinat. Prob. Comput. 4(4), 369–405 (1995)
  20. Janson, S.: Asymptotic equivalence and contiguity of some random graphs. Random Struct. Alg. 36(1), 26–45 (2010)
  21. Jerrum, M., Sorkin, G.B.: The Metropolis algorithm for graph bisection. Discrete Appl. Math. 82(1–3), 155–175 (1998)
  22. Johnson, S.C.: Hierarchical clustering schemes. Psychometrika 32(3), 241–254 (1967)
  23. Kesten, H., Stigum, B.P.: A limit theorem for multidimensional Galton–Watson processes. Ann. Math. Stat. 37(5), 1211–1223 (1966)
  24. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th International Conference on World Wide Web, pp. 695–704. ACM (2008)
  25. McSherry, F.: Spectral partitioning of random graphs. In: 42nd IEEE Symposium on Foundations of Computer Science, pp. 529–537. IEEE (2001)
  26. Mossel, E.: Survey: information flow on trees. DIMACS Ser. Discrete Math. Theor. Computer Sci. 63, 155–170 (2004)
  27. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
  28. Newman, M.E.J., Watts, D.J., Strogatz, S.H.: Random graph models of social networks. Proc. Natl. Acad. Sci. USA 99(Suppl 1), 2566 (2002)
  29. Pritchard, J.K., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155(2), 945–959 (2000)
  30. Robinson, R.W., Wormald, N.C.: Almost all cubic graphs are Hamiltonian. Random Struct. Alg. 3(2), 117–125 (1992)
  31. Robinson, R.W., Wormald, N.C.: Almost all regular graphs are Hamiltonian. Random Struct. Alg. 5(2), 363–374 (1994)
  32. Rohe, K., Chatterjee, S., Yu, B.: Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 39(4), 1878–1915 (2011)
  33. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
  34. Sly, A.: Reconstruction for the Potts model. Ann. Prob. 39(4), 1365–1406 (2011)
  35. Snijders, T.A.B., Nowicki, K.: Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classif. 14(1), 75–100 (1997)
  36. Sonka, M., Hlavac, V., Boyle, R.: Image Processing: Analysis and Machine Vision, 4th edn. Cengage Learning, Stamford (2015)
  37. Strogatz, S.H.: Exploring complex networks. Nature 410(6825), 268–276 (2001)
  38. Wormald, N.C.: Models of random regular graphs. In: Lamb, J.D., Preece, D.A. (eds.) Surveys in Combinatorics 1999. London Mathematical Society Lecture Note Series, vol. 267. Cambridge University Press, Cambridge (1999)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Department of Statistics, UC Berkeley, Berkeley, USA
  2. Department of Computer Science, UC Berkeley, Berkeley, USA
  3. Department of Mathematics, Australian National University, Canberra, Australia
