Variational Bayes model averaging for graphon functions and motif frequencies inference in W-graph models

Statistics and Computing

Abstract

W-graph refers to a general class of random graph models that can be seen as a random graph limit. It is characterized by both its graphon function and its motif frequencies. In this paper, relying on an existing variational Bayes algorithm for the stochastic block models (SBMs) along with the corresponding weights for model averaging, we derive an estimate of the graphon function as an average of SBMs with increasing number of blocks. In the same framework, we derive the variational posterior frequency of any motif. A simulation study and an illustration on a social network complete our work.

References

  • Airoldi, E.M., Costa, T.B., Chan, S.H.: Stochastic blockmodel approximation of a graphon: theory and consistent estimation. Adv. Neural Inf. Process. Syst. 692–700 (2013)

  • Asta, D., Shalizi, C.R.: Geometric Network Comparison. Technical report (2014). arXiv:1411.1350v1

  • Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)

  • Barbour, A., Reinert, G.: Discrete small world networks. Electron. J. Probab. 11(47), 1234–1283 (2006)

  • Beal, J.M., Ghahramani, Z.: The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. Bayesian Stat. 7, 543–552 (2003)

  • Bhattacharyya, S., Bickel, P.J.: Subsampling bootstrap of count features of networks. Ann. Stat. 43(6), 2384–2411 (2015)

  • Bickel, P., Chen, A.: A non parametric view of network models and Newman–Girvan and other modularities. Proc. Natl Acad. Sci. USA 106, 21068–21073 (2009)

  • Bickel, P., Chen, A., Levina, E.: The method of moments and degree distributions for network models. Ann. Stat. 39(5), 2280–2301 (2011)

  • Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous random graphs. Random Struct. Algorithms 31(1), 3–122 (2007)

  • Borgs, C., Chayes, J., Cohn, H., Ganguly, S.: Consistent Nonparametric Estimation for Heavy-Tailed Sparse Graphs. Technical report (2015). arXiv:1508.06675

  • Celisse, A., Daudin, J.-J., Pierre, L.: Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Stat. 6, 1847–1899 (2012)

  • Chan, S., Airoldi, E.: A consistent histogram estimator for exchangeable graph models. J. Mach. Learn. Res. Conf. Proc. 32, 208–216 (2014)

  • Channarond, A., Daudin, J.-J., Robin, S.: Classification and estimation in the stochastic block model based on the empirical degrees. Electron. J. Stat. 6, 2574–2601 (2012)

  • Chatterjee, S.: Matrix estimation by universal singular value thresholding. Ann. Stat. 43(1), 177–214 (2015)

  • Daudin, J.-J., Picard, F., Robin, S.: A mixture model for random graphs. Stat. Comput. 18(2), 173–183 (2008)

  • Diaconis, P., Janson, S.: Graph limits and exchangeable random graphs. Rend. Mat. Appl. 7(28), 33–61 (2008)

  • Gazal, S., Daudin, J.-J., Robin, S.: Accuracy of variational estimates for random graph mixture models. J. Stat. Comput. Simul. 82(6), 849–862 (2012)

  • Girvan, M., Newman, M.: Community structure in social and biological networks. Proc. Natl Acad. Sci. USA 99(12), 7821 (2002)

  • Gouda, A., Szántai, T.: On numerical calculation of probabilities according to Dirichlet distribution. Ann. Oper. Res. 177, 185–200 (2010). doi:10.1007/s10479-009-0601-9

  • Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T.: Bayesian model averaging: a tutorial. Stat. Sci. 14(4), 382–417 (1999)

  • Hoff, P.: Modeling homophily and stochastic equivalence in symmetric relational data. Adv. Neural Inf. Process. Syst. 20, 657–664 (2008)

  • Kallenberg, O.: Multivariate sampling and the estimation problem for exchangeable arrays. J. Theor. Probab. 12(3), 859–883 (1999)

  • Latouche, P., Birmelé, E., Ambroise, C.: Overlapping stochastic block models with application to the French political blogosphere. Ann. Appl. Stat. 5(1), 309–336 (2011)

  • Latouche, P., Birmelé, E., Ambroise, C.: Variational Bayesian inference and complexity control for stochastic block models. Stat. Model. 12(1), 93–115 (2012)

  • Lloyd, J., Orbanz, P., Ghahramani, Z., Roy, D.: Random function priors for exchangeable arrays with applications to graphs and relational data. Adv. Neural Inf. Process. Syst. 998–1006 (2012)

  • Lovász, L., Szegedy, B.: Limits of dense graph sequences. J. Comb. Theory B 96(6), 933–957 (2006)

  • Mariadassou, M., Robin, S., Vacher, C.: Uncovering latent structure in valued graphs: a variational approach. Ann. Appl. Stat. 4(2), 715–742 (2010)

  • Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Networks motifs: simple building blocks of complex networks. Science 298, 824–827 (2002)

  • Nowicki, K., Snijders, T.: Estimation and prediction for stochastic block-structures. J. Am. Stat. Assoc. 96, 1077–1087 (2001)

  • Palla, G., Lovasz, L., Vicsek, T.: Multifractal network generator. Proc. Natl Acad. Sci. USA 107(17), 7640–7645 (2010)

  • Picard, F., Daudin, J.-J., Koskas, M., Schbath, S., Robin, S.: Assessing the exceptionality of network motifs. J. Comput. Biol. 15(1), 1–20 (2008)

  • Robins, G., Pattison, P., Kalish, Y., Lusher, D.: An introduction to exponential random graph models for social networks. Soc. Netw. 29, 173–191 (2007)

  • Stark, D.: Compound Poisson approximations of subgraph counts in random graphs. Random Struct. Algorithms 18(1), 39–60 (2001)

  • Volant, S., Magniette, M.-L.M., Robin, S.: Variational Bayes approach for model aggregation in unsupervised classification with Markovian dependency. Comput. Stat. Data Anal. 56(8), 2375–2387 (2012)

  • Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 440–442 (1998)

  • Wolfe, P.J., Olhede, S.C.: Nonparametric graphon estimation. Technical report (2013). arXiv:1309.5936

  • Yang, J., Han, Q., Airoldi, E.: Nonparametric estimation and testing of exchangeable graph models. J. Mach. Learn. Res. Conf. Proc. 30, 1060–1067 (2014)

  • Zanghi, H., Ambroise, C., Miele, V.: Fast online graph clustering via Erdös Renyi mixture. Pattern Recognit. 41(12), 3592–3599 (2008)

Acknowledgments

The authors thank Stevenn Volant for helpful comments and discussions. The authors also thank the anonymous reviewer for his helpful remarks on our work.

Author information

Corresponding author

Correspondence to Pierre Latouche.

Appendix

1.1 Inference of the function W

Proof of Proposition 1

The first part is straightforward: conditioning on the bins \(C(u)\) and \(C(v)\) that contain \(u\) and \(v\) gives

$$\begin{aligned} \widetilde{p}(w(u,\,v)|\mathbf{X},\,Q) &= \widetilde{p}\left( \pi _{C(u),\,C(v)}|\mathbf{X},\,Q\right) \\ &= \sum _{q \le \ell } \widetilde{p}\left( \pi _{q, \ell } | \mathbf{X},\, Q,\, C(u) = q,\, C(v) = \ell \right) \, \widetilde{\Pr }\{C(u) = q,\, C(v) = \ell | \mathbf{X},\,Q\} \\ &= \sum _{q \le \ell } b\left( w;\, \eta _{q, \ell },\, \zeta _{q, \ell }\right) \, \widetilde{\Pr }\{C(u) = q,\,C(v) = \ell | \mathbf{X},\, Q\}. \end{aligned}$$

We are now left with the calculation of

$$\begin{aligned} \widetilde{\Pr }\{C(u) = q,\,C(v) = \ell | \mathbf{X},\, Q\} &= \widetilde{\Pr }\left\{ \sigma _{q-1}< u< \sigma _q,\, \sigma _{\ell -1}< v < \sigma _\ell | \mathbf{X},\, Q\right\} \\ &= F_{{q-1}, {\ell -1}}(u,\, v;\, \mathbf{a}) - F_{{q}, {\ell -1}}(u,\, v;\, \mathbf{a}) - F_{{q-1}, {\ell }}(u,\, v;\, \mathbf{a}) + F_{{q}, {\ell }}(u,\, v;\, \mathbf{a}), \end{aligned}$$

where

  • \(\mathbf{a},\,\varvec{\eta }\) and \(\varvec{\zeta }\) are the parameters of the variational Bayes posterior distributions;

  • \(b(\cdot ;\,\varvec{\eta },\,\varvec{\zeta })\) stands for the pdf of the Beta distribution \(\text{ Beta }(\varvec{\eta },\, \varvec{\zeta });\)

  • \(F_{q, \ell }(u,\,v;\,\mathbf{a})\) denotes the joint cdf of \((\sigma _q,\, \sigma _\ell ),\) as defined in (1), when \({\varvec{\alpha }}\) has a Dirichlet distribution \(\text{ Dir }(\mathbf{a}).\)

The last argument comes from Gouda and Szántai (2010) who give explicit recursions to compute the uni- and bi-variate cdf for the Dirichlet \(\text{ Dir }(\mathbf{a}),\) denoted \( G_{q}(u;\,\mathbf{a})\) and \(G_{q, \ell }(u,\, v;\, \mathbf{a}),\) respectively.

Recalling that the approximate variational posterior of \({\varvec{\alpha }}\) is \(\text{ Dir }(\mathbf{a})\) and using the aggregation property of the Dirichlet distribution

$$\begin{aligned} {\varvec{\alpha }} \sim \text{ Dir }(\mathbf{a}) \;\Rightarrow \;&\left( \sum _{j=1}^q \alpha _j,\,\sum _{j=q+1}^\ell \alpha _j,\,\sum _{j=\ell +1}^Q\alpha _j\right) \\ &\sim \text{ Dir }\left( \sum _{j=1}^q a_j,\,\sum _{j=q+1}^\ell a_j,\, \sum _{j=\ell +1}^Q a_j\right) , \end{aligned}$$

the calculation of \(F_{q, \ell }(u,\,v)\) follows as

$$\begin{aligned} F_{q, \ell }(u,\,v) &= \widetilde{\Pr }\left\{ \sigma _q< u,\,\sigma _\ell< v | \mathbf{X},\,Q\right\} \\ &= \widetilde{\Pr }\left\{ \sigma _q < u,\, 1 - \sigma _\ell > 1 - v | \mathbf{X},\, Q\right\} \\ &= \widetilde{\Pr }\left\{ \sigma _q< u | \mathbf{X},\,Q\right\} - \widetilde{\Pr }\left\{ \sigma _q< u,\, 1 - \sigma _\ell < 1 - v | \mathbf{X},\,Q\right\} \\ &= G_1\left( u;\,\left[ s_q,\,s_\ell -s_q,\,s_Q-s_\ell \right] \right) - G_{1, 3}\left( u,\, 1-v;\,\left[ s_q,\,s_\ell -s_q,\,s_Q-s_\ell \right] \right) , \end{aligned}$$

where the \((s_q)\) are the cumulated parameters: \(s_q = \sum _{j=1}^q a_j.\) \(\square \)
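The inclusion-exclusion step of the proof can be checked numerically. The sketch below is only an illustration: it replaces the exact recursions of Gouda and Szántai (2010) with plain Monte Carlo draws from \(\text{ Dir }(\mathbf{a})\), and the function names (`draw_sigmas`, `joint_cdf`, `bin_probability`, `bin_probability_from_cdf`) as well as the parameter values are hypothetical choices, not part of the original work. It compares the bin probability counted directly with the combination of four joint cdf values.

```python
import random


def draw_sigmas(a, n_draws, seed=0):
    """Draw cumulative sums (sigma_0, ..., sigma_Q) of alpha ~ Dir(a).

    A Dirichlet draw is obtained by normalising independent Gamma(a_j, 1)
    variables. By construction sigma_0 = 0 and sigma_Q = 1.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        g = [rng.gammavariate(aj, 1.0) for aj in a]
        total = sum(g)
        sigma, cum = [0.0], 0.0
        for x in g[:-1]:
            cum += x / total
            sigma.append(cum)
        sigma.append(1.0)
        draws.append(sigma)
    return draws


def joint_cdf(q, l, u, v, draws):
    """Empirical F_{q,l}(u, v) = Pr{sigma_q < u, sigma_l < v}."""
    hits = sum(1 for s in draws if s[q] < u and s[l] < v)
    return hits / len(draws)


def bin_probability(q, l, u, v, draws):
    """Empirical Pr{C(u) = q, C(v) = l}, counted directly.

    The upper bound is non-strict here; this differs from the strict
    inequalities of the text only on an event of probability zero.
    """
    hits = sum(
        1 for s in draws
        if s[q - 1] < u <= s[q] and s[l - 1] < v <= s[l]
    )
    return hits / len(draws)


def bin_probability_from_cdf(q, l, u, v, draws):
    """Same probability, via the four-term combination of joint cdfs."""
    F = lambda i, j: joint_cdf(i, j, u, v, draws)
    return F(q - 1, l - 1) - F(q, l - 1) - F(q - 1, l) + F(q, l)


if __name__ == "__main__":
    # Illustrative Dirichlet parameters for Q = 3 blocks (assumed values).
    draws = draw_sigmas([2.0, 3.0, 1.5], n_draws=4000, seed=1)
    for q in range(1, 4):
        for l in range(1, 4):
            print(q, l,
                  bin_probability(q, l, 0.3, 0.7, draws),
                  bin_probability_from_cdf(q, l, 0.3, 0.7, draws))
```

Because both quantities are evaluated on the same draws, they agree up to floating-point error. The Monte Carlo error only affects how close either one is to the exact variational probability, which is why exact recursions are preferable in practice.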

1.2 Motif probability

Proof of Proposition 3

We directly write the approximate variational expectation

$$\begin{aligned} \begin{aligned} \widetilde{{\mathbb {E}}}[\mu (\mathbf{m}) | \mathbf{X},\,Q]&= \int \int {\mathbb {E}}[\mu (\mathbf{m})|{\varvec{\alpha }},\,{\varvec{\pi }}] \widetilde{p}({\varvec{\alpha }},\, {\varvec{\pi }}|{\mathbf{X},\,Q})\text{ d }{\varvec{\alpha }}\text{ d }{\varvec{\pi }}\\&= \int \int \left\{ \sum _{\mathbf{c}} {\mathbb {E}}[\mu (\mathbf{m}) |\mathbf{c},\,{\varvec{\pi }}] p(\mathbf{c}|{\varvec{\alpha }})\right\} \\&\quad \, \widetilde{p}({\varvec{\alpha }},\, {\varvec{\pi }}|{\mathbf{X},\,Q})\text{ d }{\varvec{\alpha }}\text{ d }{\varvec{\pi }}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned}&p(\mathbf{c}|{\varvec{\alpha }}) = \prod _{1 \le a \le k} p\left( c_a|{\varvec{\alpha }}\right) \\&\quad = \prod _{1 \le a \le k}\prod _{1 \le q \le Q}\alpha _{q}^{{\mathbb {I}}\{c_a = q\}} = \prod _{1 \le q \le Q} \alpha _{q}^{n_{q}^{\mathbf{c}}}. \end{aligned}$$

Furthermore, we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[\mu (\mathbf{m}) |\mathbf{c},\,{\varvec{\pi }}]&= \Pr \left\{ \prod _{1 \le a<b \le k}X_{{a}{b}}^{m_{ab}} = 1| \mathbf{c},\,{\varvec{\pi }}\right\} \\&= \prod _{1 \le a<b \le k}\Pr \left\{ X_{{a}{b}} = 1 | c_a,\, c_b,\, {\varvec{\pi }}\right\} ^{m_{ab}} \\&= \prod _{1 \le a<b \le k}\prod _{1 \le q ,\,\ell \le Q} \pi _{q\ell }^{{\mathbb {I}}\{c_a=q\}{\mathbb {I}}\{c_b=\ell \}m_{ab}} \\&= \prod _{1 \le q < \ell \le Q}\prod _{a \ne b}\pi _{q\ell }^{{\mathbb {I}}\{c_a=q\}{\mathbb {I}}\{c_b=\ell \}m_{ab}}\\&\quad \prod _{1 \le q \le Q}\prod _{1 \le a<b \le k}\pi _{qq}^{{\mathbb {I}}\{c_a=q\}{\mathbb {I}}\{c_b=q\}m_{ab}}\\&= \prod _{1 \le q \le \ell \le Q} \pi _{q\ell }^{\eta _{q\ell }^{\mathbf{c}}}, \end{aligned} \end{aligned}$$

so we end up with

$$\begin{aligned} \widetilde{{\mathbb {E}}}[\mu (\mathbf{m}) | \mathbf{X},\,Q] &= \int \int \sum _{\mathbf{c}} \prod _{1 \le q \le \ell \le Q} \pi _{q\ell }^{\eta _{q\ell }^{\mathbf{c}}} \prod _{1 \le q \le Q}\alpha _{q}^{n_{q}^{\mathbf{c}}}\, \widetilde{p}({\varvec{\alpha }},\,{\varvec{\pi }}| \mathbf{X},\,Q)\,\text{ d }{\varvec{\alpha }}\,\text{ d }{\varvec{\pi }}\\ &= \int \int \sum _{\mathbf{c}}\prod _{1 \le q \le \ell \le Q} \pi _{q\ell }^{\eta _{q\ell }^{\mathbf{c}}} \prod _{1 \le q \le Q} \alpha _{q}^{n_{q}^{\mathbf{c}}} \prod _{1 \le q \le \ell \le Q} \frac{\Gamma (\eta _{q\ell }+\zeta _{q\ell })}{\Gamma (\eta _{q\ell })\Gamma (\zeta _{q\ell })}\pi _{q\ell }^{\eta _{q\ell }-1} \left( 1-\pi _{q\ell }\right) ^{\zeta _{q\ell }-1}\\ &\quad \times \frac{\Gamma \left( \sum _{1 \le q \le Q}n_{q}\right) }{\prod _{1 \le q \le Q}\Gamma (n_{q})}\prod _{1 \le q \le Q}\alpha _{q}^{n_{q}-1}\,\text{ d }{\varvec{\alpha }}\,\text{ d }{\varvec{\pi }}\\ &= \sum _{\mathbf{c}} \left\{ \prod _{1 \le q \le \ell \le Q} \frac{\Gamma (\eta _{q\ell }+\zeta _{q\ell })}{\Gamma (\eta _{q\ell })\Gamma (\zeta _{q\ell })}\int \pi _{q\ell }^{\eta _{q\ell } + \eta _{q\ell }^{\mathbf{c}}-1} \left( 1-\pi _{q\ell }\right) ^{\zeta _{q\ell }-1}\,\text{ d }\pi _{q\ell } \right\} \\ &\quad \times \frac{\Gamma \left( \sum _{1 \le q \le Q}n_{q}\right) }{\prod _{1 \le q \le Q}\Gamma (n_{q})}\int \prod _{1 \le q \le Q} \alpha _{q}^{n_{q}+n_{q}^{\mathbf{c}}-1}\,\text{ d }{\varvec{\alpha }} \\ &= \sum _{\mathbf{c}} \left\{ \prod _{1 \le q \le \ell \le Q} \frac{\Gamma (\eta _{q\ell }+\zeta _{q\ell })}{\Gamma (\eta _{q\ell })\Gamma (\zeta _{q\ell })}\, \frac{\Gamma (\eta _{q\ell }+\eta _{q\ell }^{\mathbf{c}})\Gamma (\zeta _{q\ell })}{\Gamma (\eta _{q\ell }+\eta _{q\ell }^{\mathbf{c}}+\zeta _{q\ell })} \right\} \\ &\quad \times \frac{\Gamma \left( \sum _{1 \le q \le Q}n_{q}\right) }{\prod _{1 \le q \le Q}\Gamma (n_{q})}\, \frac{\prod _{1 \le q \le Q}\Gamma (n_{q}+n_{q}^{\mathbf{c}})}{\Gamma \left( \sum _{1 \le q \le Q} (n_{q}+n_{q}^{\mathbf{c}})\right) }, \end{aligned}$$

and the proof is completed (Fig. 8). \(\square \)
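For a small motif and a small number of blocks, the closed form obtained in the proof of Proposition 3 can be evaluated by brute-force enumeration of the label vectors \(\mathbf{c}\). The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the input layout (symmetric \(Q \times Q\) lists `eta` and `zeta` for the Beta parameters, a list `n` for the Dirichlet parameters, a symmetric 0/1 matrix `m` for the motif) are hypothetical. The cost grows as \(Q^k\), so it is only meant for illustration.

```python
import itertools
import math


def log_beta_moment(eta, zeta, power):
    """log E[pi^power] for pi ~ Beta(eta, zeta), via log-Gamma functions."""
    return (math.lgamma(eta + zeta) - math.lgamma(eta)
            + math.lgamma(eta + power) - math.lgamma(eta + power + zeta))


def log_dirichlet_moment(n, counts):
    """log E[prod_q alpha_q^counts[q]] for alpha ~ Dir(n)."""
    out = math.lgamma(sum(n)) - math.lgamma(sum(n) + sum(counts))
    for nq, cq in zip(n, counts):
        out += math.lgamma(nq + cq) - math.lgamma(nq)
    return out


def motif_expectation(m, eta, zeta, n):
    """Sum over all label vectors c of the product of Beta and Dirichlet moments.

    m         : k x k symmetric 0/1 adjacency matrix of the motif
    eta, zeta : Q x Q symmetric lists of Beta parameters (assumed layout)
    n         : list of Q Dirichlet parameters
    """
    k, Q = len(m), len(n)
    edges = [(a, b) for a in range(k) for b in range(a + 1, k) if m[a][b]]
    total = 0.0
    for c in itertools.product(range(Q), repeat=k):
        # n_q^c: number of motif nodes assigned to block q
        node_counts = [c.count(q) for q in range(Q)]
        # eta_{ql}^c: number of motif edges joining blocks q <= l
        edge_counts = {}
        for a, b in edges:
            key = (min(c[a], c[b]), max(c[a], c[b]))
            edge_counts[key] = edge_counts.get(key, 0) + 1
        log_term = log_dirichlet_moment(n, node_counts)
        # Block pairs with no motif edge contribute a factor of 1.
        for (q, l), count in edge_counts.items():
            log_term += log_beta_moment(eta[q][l], zeta[q][l], count)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    # Single-edge motif with Q = 2 blocks; all parameter values are made up.
    edge = [[0, 1], [1, 0]]
    eta = [[2.0, 1.0], [1.0, 3.0]]
    zeta = [[1.0, 4.0], [4.0, 2.0]]
    print(motif_expectation(edge, eta, zeta, [1.5, 2.5]))
```

Two sanity checks follow from the formula itself: with a single block and a single edge the result reduces to the Beta mean \(\eta /(\eta + \zeta )\), and for a motif without edges the sum over \(\mathbf{c}\) equals 1.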

Proof of Proposition 2

Because the \(Z_i\)’s are uniformly distributed over \([0;\,1],\) we have

\(\square \)


Cite this article

Latouche, P., Robin, S. Variational Bayes model averaging for graphon functions and motif frequencies inference in W-graph models. Stat Comput 26, 1173–1185 (2016). https://doi.org/10.1007/s11222-015-9607-0

  • DOI: https://doi.org/10.1007/s11222-015-9607-0
