Posterior Contraction Rates for Stochastic Block Models

Abstract

With the advent of structured data in the form of social networks, genetic circuits and protein interaction networks, statistical analysis of networks has gained popularity over recent years. The stochastic block model constitutes a classical cluster-exhibiting random graph model for networks. There is a substantial amount of literature devoted to proposing strategies for estimating and inferring parameters of the model, both from classical and Bayesian viewpoints. Unlike the classical counterpart, there is a dearth of theoretical results on the accuracy of estimation in the Bayesian setting. In this article, we undertake a theoretical investigation of the posterior distribution of the parameters in a stochastic block model. In particular, we show that one obtains near-optimal rates of posterior contraction with routinely used multinomial-Dirichlet priors on cluster indicators and uniform or general Beta priors on the probabilities of the random edge indicators. Our theoretical results are corroborated through a small-scale simulation study.

Notes

  1. Our result continues to hold for general Beta priors on the edge-inclusion probabilities.

Author information

Corresponding author

Correspondence to Prasenjit Ghosh.

Appendix

A.1 Proof of Corollary 4.3

Following exactly the same set of arguments as in the proof of Theorem 4.1, we have for all sufficiently large n,

$$ \prod\limits_{r=1}^{k} \prod\limits_{s=1}^{k} [Q_{rs}^{0} - \delta \epsilon_{n}/2, Q_{rs}^{0} + \delta \epsilon_{n}/2] \subset \left\{ Q: \sum\limits_{r=1}^{k} \sum\limits_{s=1}^{k} n_{0r} n_{0s} (Q_{rs} - Q^{0}_{rs})^{2} < n^{2} \delta^{2} {\epsilon_{n}^{2}}\right\}. $$
(4.26)

Since for each (r, s), \(Q_{rs}^{0}\in (\delta ,1-\delta )\), the prior probability of the embedded rectangle \({\prod }_{r=1}^{k}{\prod }_{s=1}^{k}[Q_{rs}^{0}-\delta \epsilon _{n}/2,Q_{rs}^{0}+\delta \epsilon _{n}/2]\) can be bounded below as follows:

$$ \begin{array}{@{}rcl@{}} p\left( \prod\limits_{r=1}^{k} \prod\limits_{s=1}^{k} [Q_{rs}^{0} - \delta \epsilon_{n}/2, Q_{rs}^{0} + \delta \epsilon_{n}/2]\right) &\geq& (\delta\epsilon_{n})^{k^{2}} \prod\limits_{r=1}^{k}\prod\limits_{s=1}^{k} \inf p\left( [Q_{rs}^{0} - \delta \epsilon_{n}/2, Q_{rs}^{0} + \delta \epsilon_{n}/2]\right)\\ &\geq& (\delta\epsilon_{n})^{k^{2}} \left\{\inf p[\delta (1-\epsilon_{n}/2), 1-\delta (1-\epsilon_{n}/2)]\right\}^{k^{2}}\\ &=& \left( \text{Beta}(\beta_{1},\beta_{2})\right)^{-k^{2}}(\delta\epsilon_{n})^{k^{2}} \times \\ && \left\{\inf_{q\in[\delta (1-\epsilon_{n}/2), 1-\delta (1-\epsilon_{n}/2)]} q^{\beta_{1}-1}(1-q)^{\beta_{2}-1} \right\}^{k^{2}}\\ \end{array} $$
(4.27)

where Beta(β1, β2) denotes the standard Beta function with parameters (β1, β2). Next we observe that

$$ \inf_{q\in[\delta (1-\epsilon_{n}/2), 1-\delta (1-\epsilon_{n}/2)]} q^{\beta_{1}-1}(1-q)^{\beta_{2}-1}\geq \psi_{\delta,\epsilon_{n}}(\beta_{1},\beta_{2}), $$
(4.28)

where for each fixed \((\delta, \epsilon_{n})\), the function \(\psi _{\delta ,\epsilon _{n}}\colon (0,\infty )^{2} \rightarrow (0,\infty )\) is defined as

$$ {\psi_{\delta,\epsilon_{n}}(\beta_{1},\beta_{2}) :=} \left\{ \begin{array}{ll} \{\delta (1-\epsilon_{n}/2)\}^{\beta_{1}+\beta_{2}-2} & \text{ if } \beta_{1}\geq 1,\beta_{2}\geq 1\\ \\ \{1-\delta (1-\epsilon_{n}/2)\}^{\beta_{1}-1}\{\delta (1-\epsilon_{n}/2)\}^{\beta_{2}-1} & \text{ if } \beta_{1} < 1,\beta_{2}\geq 1\\ \\ \{\delta (1-\epsilon_{n}/2)\}^{\beta_{1}-1}\{1-\delta (1-\epsilon_{n}/2)\}^{\beta_{2}-1} & \text{ if } \beta_{1} \geq 1,\beta_{2} < 1\\ \\ \{1-\delta (1-\epsilon_{n}/2)\}^{\beta_{1}+\beta_{2}-2} & \text{ if } \beta_{1}< 1,\beta_{2}< 1. \end{array}\right. $$
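The lower bound in Eq. 4.28 can be checked numerically. The following is a minimal sketch, not part of the original proof: the helper names `psi` and `grid_min` and the particular values of δ, 𝜖n and (β1, β2) are illustrative assumptions. It compares ψ with the minimum of q^{β1−1}(1−q)^{β2−1} over a fine grid on the interval [δ(1−𝜖n/2), 1−δ(1−𝜖n/2)].

```python
def psi(beta1, beta2, delta, eps):
    """Piecewise lower bound psi_{delta,eps}(beta1, beta2) from the text."""
    a = delta * (1.0 - eps / 2.0)          # left endpoint of the interval
    # An exponent beta - 1 >= 0 is minimised at the smaller base a,
    # an exponent beta - 1 < 0 at the larger base 1 - a.
    base1 = a if beta1 >= 1 else 1.0 - a
    base2 = a if beta2 >= 1 else 1.0 - a
    return base1 ** (beta1 - 1) * base2 ** (beta2 - 1)


def grid_min(beta1, beta2, delta, eps, m=2000):
    """Grid approximation of inf q^(beta1-1) (1-q)^(beta2-1) on [a, 1-a]."""
    a = delta * (1.0 - eps / 2.0)
    points = [a + (1.0 - 2.0 * a) * t / m for t in range(m + 1)]
    return min(q ** (beta1 - 1) * (1.0 - q) ** (beta2 - 1) for q in points)


# Hypothetical parameter values, one per case of the piecewise definition.
cases = [(2.0, 3.0), (0.5, 2.0), (2.0, 0.5), (0.5, 0.7)]
delta, eps = 0.25, 0.1
checks = [grid_min(b1, b2, delta, eps) >= psi(b1, b2, delta, eps) * (1 - 1e-9)
          for b1, b2 in cases]
print(checks)
```

Each case bounds the two factors separately, so ψ is in general a conservative bound rather than the exact infimum.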

Using Eqs. 4.26–4.28, and following exactly the same line of arguments as in the proof of Theorem 4.1, we obtain

$$ \mathbb{E}_{0} \left\{{\varPi}_{n}(U_{n} \mid A) 1_{\mathcal{A}_{n}^{c}} \right\} \leq \sum\limits_{l=M}^{\infty} \left\{e^{-{C_{1}^{2}} l^{2} n^{2} {\epsilon_{n}^{2}} } + \frac{ e^{-{C_{2}^{2}} l^{2} n^{2} {\epsilon_{n}^{2}}} e^{C_{3}n \log k}}{(\delta\epsilon_{n})^{k^{2}}C(\beta_{1},\beta_{2},\delta,\epsilon_{n})}\right\}, $$
(4.29)

where \(C(\beta _{1},\beta _{2},\delta ,\epsilon _{n})=\left (\text {Beta}(\beta _{1},\beta _{2})\right )^{-k^{2}}\left (\psi _{\delta ,\epsilon _{n}}(\beta _{1},\beta _{2})\right )^{k^{2}}>0\). For every possible choice of the pair (β1, β2), we note that \(\log C(\beta _{1},\beta _{2},\delta ,\epsilon _{n})\sim k^{2}\). For instance, suppose β1 ≥ 1, β2 ≥ 1. Then, as δ ∈ (0,1/2) is fixed and \(\epsilon _{n}\rightarrow 0\) as \(n\rightarrow \infty \), \(\log C(\beta _{1},\beta _{2},\delta ,\epsilon _{n})=-k^{2} \log \text {Beta}(\beta _{1},\beta _{2})+(\beta _{1}+\beta _{2}-2)k^{2}\log (\delta (1-\epsilon _{n}/2))\sim k^{2}\). Therefore, for \(n^{2}{\epsilon _{n}^{2}} = k^{2} \{\log n + \log (\delta ^{-1})\} + n \log k\), \(\log C(\beta _{1},\beta _{2},\delta ,\epsilon _{n})=o(n^{2}{\epsilon _{n}^{2}})\) as \(n\rightarrow \infty \). Thus, for a sufficiently large constant M > 0 (depending on (β1, β2)), the sum in Eq. 4.29 converges to zero as \(n\rightarrow \infty \), which concludes the argument.
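The order comparison \(\log C = o(n^{2}\epsilon_{n}^{2})\) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the number of communities k is held fixed, the values k = 3, δ = 0.25 and β1 = β2 = 2 are hypothetical, and the helper names are not from the paper. It evaluates |log C| / (n²𝜖n²) for increasing n, using the expressions for C and n²𝜖n² given in the text.

```python
import math


def log_beta(b1, b2):
    """Logarithm of the Beta function via log-gamma."""
    return math.lgamma(b1) + math.lgamma(b2) - math.lgamma(b1 + b2)


def n2_eps2(n, k, delta):
    """n^2 eps_n^2 = k^2 {log n + log(1/delta)} + n log k, as in the text."""
    return k ** 2 * (math.log(n) + math.log(1.0 / delta)) + n * math.log(k)


def log_C(n, k, delta, b1, b2):
    """log C = k^2 {log psi - log Beta(b1, b2)} with eps_n from n2_eps2."""
    eps = math.sqrt(n2_eps2(n, k, delta)) / n
    a = delta * (1.0 - eps / 2.0)
    base1 = a if b1 >= 1 else 1.0 - a      # same case split as psi
    base2 = a if b2 >= 1 else 1.0 - a
    log_psi = (b1 - 1) * math.log(base1) + (b2 - 1) * math.log(base2)
    return k ** 2 * (log_psi - log_beta(b1, b2))


def ratio(n, k, delta, b1, b2):
    """|log C| relative to n^2 eps_n^2; should shrink as n grows."""
    return abs(log_C(n, k, delta, b1, b2)) / n2_eps2(n, k, delta)


# Hypothetical settings: k fixed at 3, delta = 0.25, Beta(2, 2) prior.
ratios = [ratio(n, 3, 0.25, 2.0, 2.0) for n in (10 ** 2, 10 ** 3, 10 ** 4)]
print(ratios)
```

With k fixed, the numerator stays of order k² while the denominator grows at least like k² log n, which is why the ratio decays.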

A.2 Proof of Theorem 4.4

Observe that the posterior distribution in the case of undirected networks can be written as

$$ \begin{array}{@{}rcl@{}} {\varPi}_{n}(U_{n} \mid A) = \frac{ {\int}_{U_{n}} \underset{1\leq i < j \leq n}{\prod\prod} \frac{ f_{\theta_{ij}}(A_{ij}) }{ f_{\theta_{ij}^{0}}(A_{ij}) } p(dz, dQ) }{ {\int}_{{\varTheta}_{k}} \underset{1\leq i < j \leq n}{\prod\prod} \frac{ f_{\theta_{ij}}(A_{ij}) }{ f_{\theta_{ij}^{0}}(A_{ij}) } p(dz, dQ) }. \end{array} $$
(4.30)

Observe that the discrepancy measure in Eq. 4.8 can also be written as

$$ \begin{array}{@{}rcl@{}} \frac{1}{n^{2}} \underset{1\leq i < j \leq n}{\sum\sum} (\hat{\theta}_{ij} - \theta_{ij}^{0})^{2} = \frac{1}{2n^{2}} \left\Vert\hat{\theta} - \theta^{0}\right\Vert^{2} \end{array} $$

for \(\hat{\theta }, \theta ^{0} \in {{\varTheta }_{k}^{u}}\), with \({{\varTheta }_{k}^{u}}\) as defined in Eq. 4.9. Hence, it is straightforward to obtain versions of Lemmata 4.5, 4.6, 4.9 and 4.10, as well as Corollary 4.8, for parameters \(\theta \in {{\varTheta }_{k}^{u}}\). The conclusion then follows by replicating the arguments in Eqs. 4.21–4.25.
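The identity between the sum over pairs i < j and half the squared norm can be verified on a small example. This is a minimal sketch under two assumptions that this appendix does not spell out: the norm is taken to be the Frobenius norm, and elements of the parameter space are taken to be symmetric matrices with zero diagonal (Eq. 4.9 is not reproduced here). All function names are illustrative.

```python
import random


def pairwise_discrepancy(theta_hat, theta0):
    """(1/n^2) * sum over i < j of (theta_hat_ij - theta0_ij)^2."""
    n = len(theta0)
    total = sum((theta_hat[i][j] - theta0[i][j]) ** 2
                for i in range(n) for j in range(i + 1, n))
    return total / n ** 2


def half_frobenius_discrepancy(theta_hat, theta0):
    """(1/(2 n^2)) * squared Frobenius norm of theta_hat - theta0."""
    n = len(theta0)
    total = sum((theta_hat[i][j] - theta0[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (2 * n ** 2)


def random_symmetric_zero_diag(n, rng):
    """Random symmetric matrix with entries in [0, 1) and zero diagonal."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = rng.random()
    return m


rng = random.Random(0)
theta_hat = random_symmetric_zero_diag(8, rng)
theta0 = random_symmetric_zero_diag(8, rng)
gap = abs(pairwise_discrepancy(theta_hat, theta0)
          - half_frobenius_discrepancy(theta_hat, theta0))
print(gap)
```

Symmetry makes every off-diagonal squared difference appear twice in the full sum, and the zero diagonal contributes nothing, which gives the factor 1/2.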

A.3 Additional Simulation Results

Below we present an additional small-scale simulation study in which we simulate 100 replicates of an SBM network with k = 3 and k = 5 equi-sized communities, n = 30, 60, and 90 nodes, and ρ = 0.3, 0.5. These additional results are summarized in Tables 3 and 4 below.
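For readers who want to reproduce the flavour of such an experiment, the following is a minimal sketch of simulating one SBM network with equi-sized communities and computing a Rand index between two labelings. It is not the authors' simulation code. The role of ρ is not specified in this appendix, so the sketch uses hypothetical within- and between-community edge probabilities `p_in` and `p_out` instead, and it assumes an undirected network without self-loops.

```python
import random
from itertools import combinations


def simulate_sbm(n, k, p_in, p_out, rng):
    """Simulate labels z and a symmetric adjacency matrix A.

    p_in / p_out are hypothetical within / between community edge
    probabilities; communities are equi-sized when k divides n.
    """
    z = [i * k // n for i in range(n)]
    A = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        p = p_in if z[i] == z[j] else p_out
        A[i][j] = A[j][i] = 1 if rng.random() < p else 0
    return z, A


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two labelings agree
    (both put the pair together, or both put it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


rng = random.Random(1)
z, A = simulate_sbm(n=30, k=3, p_in=0.8, p_out=0.2, rng=rng)
relabeled = [(label + 1) % 3 for label in z]   # same partition, new names
print(rand_index(z, relabeled))
```

The Rand index depends only on the induced partition, so it is unaffected by the label-switching inherent in cluster indicators.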

Table 3 MSE (× 10²) and standard error (× 10³) comparison over 100 replicates
Table 4 Rand Index and standard error (× 10³) comparison over 100 replicates

About this article

Cite this article

Ghosh, P., Pati, D. & Bhattacharya, A. Posterior Contraction Rates for Stochastic Block Models. Sankhya A 82, 448–476 (2020). https://doi.org/10.1007/s13171-019-00180-5

  • DOI: https://doi.org/10.1007/s13171-019-00180-5

Keywords and phrases

  • Bayesian asymptotics
  • Stochastic block models
  • Clustering
  • Multinomial-Dirichlet
  • Networks
  • Posterior contraction
  • Random graphs

AMS (2000) subject classification

  • Primary 62G07, 62G20
  • Secondary 60K35