The stochastic topic block model for the clustering of vertices in networks with textual edges


Due to the significant increase in communication between individuals via social media (Facebook, Twitter, LinkedIn) or electronic formats (email, web, e-publication) over the past two decades, network analysis has become an essential discipline. Many random graph models have been proposed to extract information from networks based only on person-to-person links, without taking the content of the communications into account. This paper introduces the stochastic topic block model, a probabilistic model for networks with textual edges. We address the problem of discovering meaningful clusters of vertices that are coherent with respect to both the network interactions and the text contents. A classification variational expectation-maximization (C-VEM) algorithm is proposed to perform inference. Simulated datasets are considered in order to assess the proposed approach and to highlight its main features. Finally, we demonstrate the effectiveness of our methodology on two real-world datasets: a directed communication network and an undirected co-authorship network.




  1. Airoldi, E., Blei, D., Fienberg, S., Xing, E.: Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014 (2008)

  2. Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, pp. 267–281 (1973)

  3. Ambroise, C., Grasseau, G., Hoebeke, M., Latouche, P., Miele, V., Picard, F.: The mixer R package (version 1.8) (2010)

  4. Bickel, P., Chen, A.: A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. 106(50), 21068–21073 (2009)

  5. Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22(7), 719–725 (2000)

  6. Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41(3–4), 561–575 (2003)

  7. Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Int. Comput. Sci. Inst. 4, 126 (1998)

  8. Blei, D., Lafferty, J.: Correlated topic models. Adv. Neural Inf. Process. Syst. 18, 147 (2006)

  9. Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)

  10. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  11. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 10, 10008–10020 (2008)

  12. Bouveyron, C., Latouche, P., Zreik, R.: The dynamic random subgraph model for the clustering of evolving networks. Comput. Stat. (2016)

  13. Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Comput. Stat. Q. 2(1), 73–82 (1991)

  14. Chang, J., Blei, D.M.: Relational topic models for document networks. In: International Conference on Artificial Intelligence and Statistics, pp. 81–88 (2009)

  15. Côme, E., Randriamanamihaga, A., Oukhellou, L., Aknin, P.: Spatio-temporal analysis of dynamic origin-destination data using latent Dirichlet allocation: application to the Vélib' bike sharing system of Paris. In: Proceedings of the 93rd Annual Meeting of the Transportation Research Board (2014)

  16. Côme, E., Latouche, P.: Model selection and clustering in stochastic block models with the exact integrated complete data likelihood. Stat. Model. doi:10.1177/1471082X15577017 (2015)

  17. Daudin, J.-J., Picard, F., Robin, S.: A mixture model for random graphs. Stat. Comput. 18(2), 173–183 (2008)

  18. Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)

  19. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–38 (1977)

  20. Fienberg, S., Wasserman, S.: Categorical data analysis of single sociometric relations. Sociol. Methodol. 12, 156–192 (1981)

  21. Girvan, M., Newman, M.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)

  22. Gormley, I.C., Murphy, T.B.: A mixture of experts latent position cluster model for social network data. Stat. Methodol. 7(3), 385–405 (2010)

  23. Grün, B., Hornik, K.: The topicmodels R package (version 0.2-3) (2013)

  24. Handcock, M., Raftery, A., Tantrum, J.: Model-based clustering for social networks. J. R. Stat. Soc. A 170(2), 301–354 (2007)

  25. Hathaway, R.: Another interpretation of the EM algorithm for mixture distributions. Stat. Probab. Lett. 4(2), 53–56 (1986)

  26. Hoff, P., Raftery, A., Handcock, M.: Latent space approaches to social network analysis. J. Am. Stat. Assoc. 97(460), 1090–1098 (2002)

  27. Hofman, J., Wiggins, C.: Bayesian approach to network modularity. Phys. Rev. Lett. 100(25), 258701 (2008)

  28. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57. ACM, New York (1999)

  29. Jernite, Y., Latouche, P., Bouveyron, C., Rivera, P., Jegou, L., Lamassé, S.: The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul. Ann. Appl. Stat. 8(1), 55–74 (2014)

  30. Kemp, C., Tenenbaum, J., Griffiths, T., Yamada, T., Ueda, N.: Learning systems of concepts with an infinite relational model. Proc. Natl. Conf. Artif. Intell. 21, 381–391 (2006)

  31. Latouche, P., Birmelé, E., Ambroise, C.: Overlapping stochastic block models with application to the French political blogosphere. Ann. Appl. Stat. 5(1), 309–336 (2011)

  32. Latouche, P., Birmelé, E., Ambroise, C.: Variational Bayesian inference and complexity control for stochastic block models. Stat. Model. 12(1), 93–115 (2012)

  33. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178. IEEE, Piscataway (2006)

  34. Liu, Y., Niculescu-Mizil, A., Gryc, W.: Topic-link LDA: joint models of topic and author community. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 665–672. ACM, New York (2009)

  35. Mariadassou, M., Robin, S., Vacher, C.: Uncovering latent structure in valued graphs: a variational approach. Ann. Appl. Stat. 4(2), 715–742 (2010)

  36. Matias, C., Miele, V.: Statistical clustering of temporal networks through a dynamic stochastic block model. Preprint HAL n.01167837 (2016)

  37. Matias, C., Robin, S.: Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM Proc. Surv. 47, 55–74 (2014)

  38. McDaid, A., Murphy, T., Friel, N., Hurley, N.: Improved Bayesian inference for the stochastic block model with application to large networks. Comput. Stat. Data Anal. 60, 12–31 (2013)

  39. McCallum, A., Corrada-Emmanuel, A., Wang, X.: The author-recipient-topic model for topic and role discovery in social networks, with application to Enron and academic email. In: Workshop on Link Analysis, Counterterrorism and Security, pp. 33–44 (2005)

  40. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004)

  41. Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Mach. Learn. 39(2–3), 103–134 (2000)

  42. Nowicki, K., Snijders, T.: Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc. 96(455), 1077–1087 (2001)

  43. Papadimitriou, C., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. In: Proceedings of the Tenth ACM PODS, pp. 159–168. ACM, New York (1998)

  44. Pathak, N., DeLong, C., Banerjee, A., Erickson, K.: Social topic models for community extraction. In: The 2nd SNA-KDD Workshop, vol. 8. Citeseer (2008)

  45. Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971)

  46. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 487–494. AUAI Press, Arlington (2004)

  47. Sachan, M., Contractor, D., Faruquie, T., Subramaniam, L.: Using content and interactions for discovering communities in social networks. In: Proceedings of the 21st International Conference on World Wide Web, pp. 331–340. ACM, New York (2012)

  48. Salter-Townshend, M., White, A., Gollini, I., Murphy, T.B.: Review of statistical network analysis: models, algorithms, and software. Stat. Anal. Data Min. 5(4), 243–264 (2012)

  49. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)

  50. Steyvers, M., Smyth, P., Rosen-Zvi, M., Griffiths, T.: Probabilistic author-topic models for information discovery. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 306–315. ACM, New York (2004)

  51. Sun, Y., Han, J., Gao, J., Yu, Y.: iTopicModel: information network-integrated topic modeling. In: Ninth IEEE International Conference on Data Mining (ICDM '09), pp. 493–502. IEEE, Piscataway (2009)

  52. Teh, Y., Newman, D., Welling, M.: A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. Adv. Neural Inf. Process. Syst. 18, 1353–1360 (2006)

  53. Than, K., Ho, T.: Fully sparse topic models. In: Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 7523, pp. 490–505. Springer, Berlin (2012)

  54. Wang, Y., Wong, G.: Stochastic blockmodels for directed graphs. J. Am. Stat. Assoc. 82, 8–19 (1987)

  55. White, H., Boorman, S., Breiger, R.: Social structure from multiple networks. I. Blockmodels of roles and positions. Am. J. Sociol. 81, 730–780 (1976)

  56. Xu, K., Hero III, A.: Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Social Computing, Behavioral-Cultural Modeling and Prediction, pp. 201–210. Springer, Berlin (2013)

  57. Yang, T., Chi, Y., Zhu, S., Gong, Y., Jin, R.: Detecting communities and their evolutions in dynamic social networks: a Bayesian approach. Mach. Learn. 82(2), 157–189 (2011)

  58. Zanghi, H., Ambroise, C., Miele, V.: Fast online graph clustering via Erdős–Rényi mixture. Pattern Recognit. 41, 3592–3599 (2008)

  59. Zanghi, H., Volant, S., Ambroise, C.: Clustering based on random graph model embedding vertex features. Pattern Recognit. Lett. 31(9), 830–836 (2010)

  60. Zhou, D., Manavoglu, E., Li, J., Giles, C., Zha, H.: Probabilistic models for discovering e-communities. In: Proceedings of the 15th International Conference on World Wide Web, pp. 173–182. ACM, New York (2006)



The authors would like to thank the editor and the two reviewers for their helpful remarks on the first version of this paper, as well as Laurent Bergé for his kind suggestions and for developing the visualization tools.

Author information



Corresponding author

Correspondence to P. Latouche.

Electronic supplementary material


Supplementary material 1 (pdf 253 KB)



Appendix 1: Optimization of R(Z)

The VEM update step for each distribution \(R(Z_{ij}^{dn}), A_{ij}=1\), is given by

$$\begin{aligned} \begin{aligned} \log R(Z_{ij}^{dn})&= \mathrm {E}_{Z^{\backslash i,j,d,n},\theta }[\log p(W|A, Z, \beta ) \\&\quad + \log p(Z|A, Y, \theta )] + \mathrm {const}\\&= \sum _{k=1}^{K} Z_{ij}^{dnk}\sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv}\\&\quad + \sum _{q,r}^{Q}Y_{iq}Y_{jr}\sum _{k=1}^{K}Z_{ij}^{dnk}\mathrm {E}_{\theta _{qr}}[\log \theta _{qrk}] + \mathrm {const}\\&= \sum _{k=1}^{K}Z_{ij}^{dnk}\left( \sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv} \right. \\&\left. \quad +\sum _{q,r}^{Q}Y_{iq}Y_{jr}\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \right) \\&\quad + \mathrm {const}, \end{aligned} \end{aligned}$$

where all terms that do not depend on \(Z_{ij}^{dn}\) have been put into the constant term \(\mathrm {const}\). Moreover, \(\psi (\cdot )\) denotes the digamma function. The functional form of a multinomial distribution is then recognized in (9)

$$\begin{aligned} R(Z_{ij}^{dn})={\mathcal {M}}\left( Z_{ij}^{dn};1,\phi _{ij}^{dn}=\left( \phi _{ij}^{dn1},\dots , \phi _{ij}^{dnK}\right) \right) , \end{aligned}$$


$$\begin{aligned} \phi _{ij}^{dnk} \propto \left( \prod _{v=1}^{V} \beta _{kv}^{W_{ij}^{dnv}}\right) \prod _{q,r}^{Q}\exp \left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) ^{Y_{iq}Y_{jr}}{.} \end{aligned}$$

Thus, \(\phi _{ij}^{dnk}\) is the (approximate) posterior probability that word \(W_{ij}^{dn}\) belongs to topic k.
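The update above is easy to implement in log space. The following is a minimal numerical sketch (not the authors' implementation; all sizes and values are toy assumptions), computing \(\phi _{ij}^{dn}\) for a single word on an edge whose endpoints lie in clusters q and r, using \(\mathrm {E}[\log \theta _{qrk}]=\psi (\gamma _{qrk})-\psi (\sum _{l}\gamma _{qrl})\):

```python
# Minimal sketch of the R(Z) update (Appendix 1); toy sizes, illustrative only.
# For a word v on an edge linking clusters q and r, phi_k is proportional to
# beta_{kv} * exp(E[log theta_{qrk}]).
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
K, V, Q = 3, 10, 2                     # topics, vocabulary size, clusters (toy)
log_beta = np.log(rng.dirichlet(np.ones(V), size=K))   # (K, V) topic-word log-probs
gamma = rng.gamma(2.0, size=(Q, Q, K))                 # variational Dirichlet params
q, r, v = 0, 1, 4                      # clusters of i and j, observed word index

# log phi_k = log beta_{kv} + psi(gamma_{qrk}) - psi(sum_l gamma_{qrl}), normalized.
log_phi = log_beta[:, v] + digamma(gamma[q, r]) - digamma(gamma[q, r].sum())
log_phi -= log_phi.max()               # subtract the max for numerical stability
phi = np.exp(log_phi)
phi /= phi.sum()                       # phi lies on the K-simplex
```

Subtracting the maximum before exponentiating avoids underflow when many words and topics are involved; the normalization then recovers the multinomial parameters exactly.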

Appendix 2: Optimization of \(R(\theta )\)

The VEM update step for distribution \(R(\theta )\) is given by

$$\begin{aligned} \begin{aligned} \log R(\theta )&= \mathrm {E}_{Z}[\log p(Z|A, Y, \theta )] + \mathrm {const}\\&= \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{q,r}^{Q}Y_{iq}Y_{jr}\\&\quad \times \sum _{k=1}^{K}\mathrm {E}_{Z_{ij}^{dn}}\left[ Z_{ij}^{dnk}\right] \log \theta _{qrk}\\&\quad + \sum _{q,r}^{Q}\sum _{k=1}^{K}(\alpha _k - 1)\log \theta _{qrk} + \mathrm {const}\\&= \sum _{q,r}^{Q}\sum _{k=1}^{K}\left( \alpha _{k} + \sum _{i \ne j}^{M} A_{ij}Y_{iq}Y_{jr}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\phi _{ij}^{dnk}-1\right) \\&\qquad \log \theta _{qrk} + \mathrm {const}. \end{aligned} \end{aligned}$$

We recognize the functional form of a product of Dirichlet distributions

$$\begin{aligned} \begin{aligned} R(\theta )= \prod _{q,r}^{Q}\mathrm {Dir}(\theta _{qr};\gamma _{qr}=(\gamma _{qr1},\dots , \gamma _{qrK})), \end{aligned} \end{aligned}$$


$$\begin{aligned} \gamma _{qrk} = \alpha _{k} + \sum _{i \ne j}^{M} A_{ij}Y_{iq}Y_{jr}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\phi _{ij}^{dnk}. \end{aligned}$$
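This closed-form update can be sketched numerically as follows (toy data, not the authors' code; `phi_sum[i, j]` is a hypothetical array standing for \(\sum _{d,n}\phi _{ij}^{dn\cdot }\), a K-vector per edge):

```python
# Toy sketch of the gamma update (Appendix 2): gamma_{qr} is the prior alpha
# plus the responsibilities of all words exchanged between clusters q and r.
import numpy as np

rng = np.random.default_rng(1)
K, Q, M = 3, 2, 4                                     # topics, clusters, vertices
alpha = np.ones(K)
A = (rng.random((M, M)) < 0.5).astype(int)
np.fill_diagonal(A, 0)                                # no self-loops
Y = np.eye(Q)[rng.integers(Q, size=M)]                # (M, Q) one-hot cluster indicators
phi_sum = rng.random((M, M, K))                       # pooled word responsibilities

gamma = np.tile(alpha, (Q, Q, 1))
for q in range(Q):
    for r in range(Q):
        w = A * np.outer(Y[:, q], Y[:, r])            # selects present edges from q to r
        gamma[q, r] += np.einsum('ij,ijk->k', w, phi_sum)
```

Since each row of Y is one-hot, summing the accumulated mass over all (q, r) blocks recovers the total responsibility mass on present edges, which is a useful sanity check in practice.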

Appendix 3: Derivation of the lower bound \(\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) \)

The lower bound \(\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) \) in (7) is given by

$$\begin{aligned}&\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) \nonumber \\&\quad = \sum _{Z}\int _{\theta }R(Z,\theta ) \log \frac{p(W, Z, \theta |A, Y,\beta )}{R(Z,\theta )} \mathrm{d}\theta \nonumber \\&\quad = \mathrm {E}_{Z}[\log p(W|A, Z, \beta )] \nonumber \\&\qquad + \mathrm {E}_{Z, \theta }[\log p(Z|A, Y, \theta )] + \mathrm {E}_{\theta }[\log p(\theta )]\nonumber \\&\qquad - \mathrm {E}_{Z}[\log R(Z)]-\mathrm {E}_{\theta }[\log R(\theta )] \nonumber \\&\quad =\sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{k=1}^{K}\phi _{ij}^{dnk}\sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv} \nonumber \\&\qquad + \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}} \sum _{q,r}^{Q}Y_{iq}Y_{jr}\nonumber \\&\qquad \times \sum _{k=1}^{K}\phi _{ij}^{dnk}\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \\&\qquad + \sum _{q,r}^{Q}\left( \log \varGamma \left( \sum _{l=1}^{K}\alpha _{l}\right) - \sum _{l=1}^{K}\log \varGamma (\alpha _{l})\right. \nonumber \\&\qquad \left. +\sum _{k=1}^{K}(\alpha _{k}-1)\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \right) \nonumber \\&\qquad - \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{k=1}^{K}\phi _{ij}^{dnk}\log \phi _{ij}^{dnk}\nonumber \\&\qquad - \sum _{q,r}^{Q}\left( \log \varGamma \left( \sum _{l=1}^{K}\gamma _{qrl}\right) - \sum _{l=1}^{K}\log \varGamma (\gamma _{qrl})\right. \nonumber \\&\qquad \left. +\sum _{k=1}^{K}(\gamma _{qrk}-1)\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \right) .\nonumber \end{aligned}$$

Appendix 4: Optimization of \(\beta \)

In order to maximize the lower bound \(\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) \), we isolate the terms in (10) that depend on \(\beta \) and add Lagrange multipliers to satisfy the constraints \(\sum _{v=1}^{V}\beta _{kv}=1,\forall k\)

$$\begin{aligned} \tilde{{\mathcal {L}}}_{\beta }= & {} \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{k=1}^{K}\phi _{ij}^{dnk}\sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv}\\&+ \sum _{k=1}^{K}\lambda _{k}\left( \sum _{v=1}^{V}\beta _{kv}-1\right) . \end{aligned}$$

Setting the derivative with respect to \(\beta _{kv}\) to zero, we find

$$\begin{aligned} \beta _{kv}\propto \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\phi _{ij}^{dnk}W_{ij}^{dnv}. \end{aligned}$$
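A minimal numerical sketch of this M-step (toy data, not the authors' code; `counts[i, j]` is a hypothetical array standing for the responsibility-weighted word counts \(\sum _{d,n}\phi _{ij}^{dnk}W_{ij}^{dnv}\)):

```python
# Toy sketch of the beta update (Appendix 4): accumulate responsibility-weighted
# word counts over present edges and normalize each topic row over the vocabulary.
import numpy as np

rng = np.random.default_rng(2)
K, V, M = 3, 6, 4                                 # topics, vocabulary, vertices (toy)
A = np.ones((M, M), dtype=int)
np.fill_diagonal(A, 0)                            # fully connected toy graph, no loops
counts = rng.random((M, M, K, V))                 # pooled weighted counts per edge

beta = np.einsum('ij,ijkv->kv', A, counts)        # sum over edges with A_ij = 1
beta /= beta.sum(axis=1, keepdims=True)           # Lagrange constraint: rows on simplex
```

The division by the row sums is exactly the role of the Lagrange multipliers \(\lambda _{k}\): they only enforce that each topic's word distribution sums to one.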

Appendix 5: Optimization of \(\rho \)

Only the distribution \(p(Y|\rho )\) in the complete data log-likelihood \(\log p(A, Y|\rho , \pi )\) depends on the parameter vector \(\rho \) of cluster proportions. Taking the log, we have

$$\begin{aligned} \log p(Y|\rho ) = \sum _{i=1}^{M}\sum _{q=1}^{Q}Y_{iq}\log \rho _{q}. \end{aligned}$$

Adding a Lagrange multiplier to satisfy the constraint \(\sum _{q=1}^{Q}\rho _{q}=1\) and setting the derivative with respect to \(\rho _{q}\) to zero, we find

$$\begin{aligned} \rho _{q} \propto \sum _{i=1}^{M}Y_{iq}. \end{aligned}$$

Appendix 6: Optimization of \(\pi \)

Only the distribution \(p(A|Y, \pi )\) in the complete data log-likelihood \(\log p(A, Y|\rho , \pi )\) depends on the parameter matrix \(\pi \) of connection probabilities. Taking the log we have

$$\begin{aligned}&\log p(A|Y, \pi )\\&\quad = \sum _{i \ne j}^{M}\sum _{q,r}^{Q}Y_{iq}Y_{jr}\Big (A_{ij}\log \pi _{qr} +(1-A_{ij})\log (1-\pi _{qr})\Big ). \end{aligned}$$

Setting the derivative with respect to \(\pi _{qr}\) to zero, we obtain

$$\begin{aligned} \pi _{qr} = \frac{ \sum _{i \ne j}^{M}Y_{iq}Y_{jr}A_{ij}}{ \sum _{i \ne j}^{M}Y_{iq}Y_{jr}}. \end{aligned}$$
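The two closed-form estimates of Appendices 5 and 6 can be sketched together (toy data, not the authors' code): \(\rho _{q}\) is the fraction of vertices assigned to cluster q, and \(\pi _{qr}\) the observed edge density among ordered pairs linking clusters q and r.

```python
# Toy sketch of the M-step for rho and pi (Appendices 5 and 6).
import numpy as np

rng = np.random.default_rng(3)
M, Q = 6, 2
A = (rng.random((M, M)) < 0.4).astype(int)
np.fill_diagonal(A, 0)                            # directed network, no self-loops
Y = np.eye(Q)[rng.integers(Q, size=M)]            # (M, Q) one-hot cluster indicators

rho = Y.sum(axis=0) / M                           # cluster proportions (Appendix 5)
edges = Y.T @ A @ Y                               # edge counts per (q, r) block
pairs = Y.T @ (np.ones((M, M)) - np.eye(M)) @ Y   # ordered pairs per (q, r) block
pi = np.where(pairs > 0, edges / np.maximum(pairs, 1), 0.0)
```

The `np.maximum` guard only avoids a division-by-zero warning for empty blocks; `np.where` then reports a density of zero for cluster pairs with no vertices.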

Appendix 7: Model selection

Assuming that the prior distribution over the model parameters \((\rho , \pi , \beta )\) can be factorized, the integrated complete data log-likelihood \(\log p(A, W, Y|K, Q)\) is given by

$$\begin{aligned} \begin{aligned}&\log p(A, W, Y|K, Q)\\&\quad = \log \int _{\rho ,\pi ,\beta } p(A, W, Y, \rho , \pi , \beta |K, Q) \mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \\&\quad = \log \int _{\rho ,\pi ,\beta } p(A, W, Y|\rho , \pi , \beta , K, Q)\\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta . \end{aligned} \end{aligned}$$

Note that the dependency on K and Q is made explicit here in all expressions. In the other sections of the paper, we dropped these terms to keep the notation uncluttered. We find

$$\begin{aligned}&\log p(A, W, Y|K, Q)\nonumber \\&\quad = \log \int _{\rho , \pi , \beta }\left( \sum _{Z}\int _{\theta }p(A, W, Y, Z, \theta |\rho , \pi , \beta , K, Q)\mathrm{d}\theta \right) \nonumber \\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\rho , \pi , \beta } \left( \sum _{Z}\int _{\theta }p(W, Z, \theta |A, Y, \beta , K, Q)p(A, Y|\rho , \pi , Q)\mathrm{d}\theta \right) \nonumber \\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\rho , \pi , \beta }p(W|A, Y, \beta , K, Q) p(A|Y, \pi , Q)p(Y|\rho , Q)\\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\beta }p(W|A, Y, \beta , K, Q)\nonumber \\&\qquad \times p(\beta |K) \mathrm{d}\beta + \log \int _{\pi } p(A|Y, \pi , Q)p(\pi |Q)\mathrm{d}\pi \nonumber \\&\qquad + \log \int _{\rho }p(Y|\rho , Q)p(\rho |Q)\mathrm{d}\rho .\nonumber \end{aligned}$$

Following the derivation of the ICL criterion, we apply a Laplace (BIC-like) approximation to the second term of Eq. (11). Moreover, considering a Jeffreys prior distribution for \(\rho \) and using Stirling's formula for large values of M, we obtain

$$\begin{aligned}&\log \int _{\pi } p(A|Y, \pi , Q)p(\pi |Q)\mathrm{d}\pi \\&\quad \approx \max _{\pi }\log p(A|Y, \pi , Q) - \frac{Q^2}{2}\log M(M-1), \end{aligned}$$

as well as

$$\begin{aligned}&\log \int _{\rho }p(Y|\rho , Q)p(\rho |Q)\mathrm{d}\rho \\&\quad \approx \max _{\rho } \log p(Y|\rho , Q) - \frac{Q-1}{2}\log M. \end{aligned}$$

For more details, we refer to Biernacki et al. (2000). Furthermore, we emphasize that adding these two approximations leads to the ICL criterion for the SBM model, as derived by Daudin et al. (2008)

$$\begin{aligned} \begin{aligned} ICL_{SBM}&= \max _{\pi }\log p(A|Y, \pi , Q)\\&\quad - \frac{Q^2}{2}\log M(M-1) + \max _{\rho } \log p(Y|\rho , Q)\\&\quad - \frac{Q-1}{2}\log M \\&= \max _{\rho , \pi } \log p(A,Y|\rho , \pi , Q)\\&\quad - \frac{Q^2}{2}\log M(M-1) - \frac{Q-1}{2}\log M. \end{aligned} \end{aligned}$$

In Daudin et al. (2008), \(M(M-1)\) is replaced by \(M(M-1)/2\) and \(Q^2\) by \(Q(Q+1)/2\) since they considered undirected networks.

Now, it is worth taking a closer look at the first term of Eq. (11). This term involves a marginalization over \(\beta \). Let us emphasize that \(p(W|A, Y, \beta , K, Q)\) is related to the LDA model and involves a marginalization over \(\theta \) (and Z). Because we aim at approximating the first term of Eq. (11), also with a Laplace (BIC-like) approximation, it is crucial to identify the number of observations in the associated likelihood term \(p(W|A, Y, \beta , K, Q)\). As pointed out in Sect. 2.4, given Y (and \(\theta \)), it is possible to reorganize the documents in W as \(W=({\tilde{W}}_{qr})_{qr}\) in such a way that all words in \({\tilde{W}}_{qr}\) follow the same mixture distribution over topics. Each aggregated document \({\tilde{W}}_{qr}\) has its own vector \(\theta _{qr}\) of topic proportions and since the distribution over \(\theta \) factorizes (\(p(\theta )=\prod _{q,r}^{Q}p(\theta _{qr}))\), we find

$$\begin{aligned} \begin{aligned}&p(W|A, Y, \beta , K, Q)\\&\quad = \int _{\theta } p(W |A, Y, \theta , \beta , K, Q)p(\theta |K, Q)\mathrm{d}\theta \\&\quad = \prod _{q,r}^{Q}\int _{\theta _{qr}}p({\tilde{W}}_{qr}|\theta _{qr}, \beta , K, Q)p(\theta _{qr}| K)\mathrm{d}\theta _{qr} \\&\quad = \prod _{q,r}^{Q} \ell ({\tilde{W}}_{qr}|\beta , K, Q), \end{aligned} \end{aligned}$$

where \(\ell ({\tilde{W}}_{qr}|\beta , K, Q)\) is exactly the likelihood term of the LDA model associated with document \({\tilde{W}}_{qr}\), as described in Blei et al. (2003). Thus

$$\begin{aligned}&\log \int _{\beta }p(W|A, Y, \beta , K, Q) p(\beta |K) \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\beta } p(\beta |K) \prod _{q,r}^{Q} \ell ({\tilde{W}}_{qr}|\beta , K, Q)\mathrm{d}\beta . \end{aligned}$$

Applying a Laplace approximation on Eq. (12) is then equivalent to deriving a BIC-like criterion for the LDA model with documents in \(W=({\tilde{W}}_{qr})_{qr}\). In the LDA model, the number of observations in the penalization term of BIC is the number of documents [see Than and Ho (2012) for instance]. In our case, this leads to

$$\begin{aligned}&\log \int _{\beta }p(W|A, Y, \beta , K, Q) p(\beta |K) \mathrm{d}\beta \nonumber \\&\quad \approx \max _{\beta } \log p(W|A, Y, \beta , K, Q) - \frac{K(V-1)}{2}\log Q^2.\nonumber \\ \end{aligned}$$

Unfortunately, \(\log p(W|A, Y, \beta , K, Q)\) is not tractable and so we propose to replace it with its variational approximation \(\tilde{{\mathcal {L}}}\), after convergence of the C-VEM algorithm. By analogy with \(ICL_{SBM}\), we call the corresponding criterion \(BIC_{LDA|Y}\) such that

$$\begin{aligned} \log p(A, W, Y|K, Q) \approx BIC_{LDA|Y} + ICL_{SBM}. \end{aligned}$$
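The SBM part of the criterion is simple enough to compute directly. A short sketch (illustrative only, not the authors' code; the maximized log-likelihood values passed below are made up for the example) assembles \(ICL_{SBM}\) for a directed network from its two BIC-like penalties:

```python
# Sketch of the ICL_SBM criterion of Appendix 7 (directed case): maximized
# complete-data log-likelihood terms minus the two BIC-like penalties.
import numpy as np

def icl_sbm(loglik_A, loglik_Y, M, Q):
    """ICL for the SBM part of a directed network, as in Daudin et al. (2008)."""
    pen_pi = 0.5 * Q**2 * np.log(M * (M - 1))     # penalty for the Q x Q matrix pi
    pen_rho = 0.5 * (Q - 1) * np.log(M)           # penalty for the proportions rho
    return loglik_A + loglik_Y - pen_pi - pen_rho

crit = icl_sbm(-120.0, -15.0, M=50, Q=3)          # toy maximized log-likelihoods
```

For an undirected network, as noted above, \(M(M-1)\) would be replaced by \(M(M-1)/2\) and \(Q^2\) by \(Q(Q+1)/2\); the model with the largest criterion value over a grid of (K, Q) is then selected.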


Cite this article

Bouveyron, C., Latouche, P. & Zreik, R. The stochastic topic block model for the clustering of vertices in networks with textual edges. Stat Comput 28, 11–31 (2018).

Keywords


  • Random graph models
  • Topic modeling
  • Textual edges
  • Clustering
  • Variational inference

Mathematics Subject Classification

  • 62F15
  • 62F86