The stochastic topic block model for the clustering of vertices in networks with textual edges
Abstract
Due to the significant increase in communications between individuals via social media (Facebook, Twitter, LinkedIn) or electronic formats (email, web, e-publication) in the past two decades, network analysis has become an unavoidable discipline. Many random graph models have been proposed to extract information from networks based on person-to-person links only, without taking into account information on the contents. This paper introduces the stochastic topic block model, a probabilistic model for networks with textual edges. We address here the problem of discovering meaningful clusters of vertices that are coherent from both the network interactions and the text contents. A classification variational expectation-maximization algorithm is proposed to perform inference. Simulated datasets are considered in order to assess the proposed approach and to highlight its main features. Finally, we demonstrate the effectiveness of our methodology on two real-world datasets: a directed communication network and an undirected co-authorship network.
Keywords
Random graph models · Topic modeling · Textual edges · Clustering · Variational inference
Mathematics Subject Classification
62F15 · 62F86
1 Introduction
1.1 Statistical models for network analysis
On the one hand, there is a long history of research in the statistical analysis of networks, which has received strong interest in the last decade. In particular, statistical methods have established themselves as efficient and flexible techniques for network clustering. Most of those methods look for specific structures, the so-called communities, which exhibit a transitivity property such that nodes of the same community are more likely to be connected (Hofman and Wiggins 2008). Popular approaches for community discovery, though asymptotically biased (Bickel and Chen 2009), are based on the modularity score of Girvan and Newman (2002). Alternative clustering methods usually rely on the latent position cluster model (LPCM) of Handcock et al. (2007), or the stochastic block model (SBM) (Wang and Wong 1987; Nowicki and Snijders 2001). The LPCM model, which extends the work of Hoff et al. (2002), assumes that the links between the vertices depend on their positions in a social latent space and allows the simultaneous visualization and clustering of a network.
The SBM model is a flexible random graph model which is based on a probabilistic generalization of the method applied by White et al. (1976) on Sampson’s famous monastery (Fienberg and Wasserman 1981). It assumes that each vertex belongs to a latent group, and that the probability of connection between a pair of vertices depends exclusively on their group. Because no specific assumption is made on the connection probabilities, various types of structures of vertices can be taken into account. At this point, it is important to notice that, in network clustering, two types of clusters are usually considered: communities (vertices within a community are more likely to connect than vertices of different communities) and stars or disassortative clusters (the vertices of a cluster highly connect to vertices of another). In this context, SBM is particularly useful in practice since it has the ability to characterize both types of clusters.
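To make this generative mechanism concrete, here is a minimal sketch of SBM sampling in Python (the parameter values are purely illustrative): each vertex first draws a latent group, then each ordered pair of vertices is connected with a probability that depends only on the two groups.

```python
import random

def sample_sbm(M, rho, pi, seed=0):
    """Sample a directed SBM graph: each vertex draws a latent group
    from the proportions rho, then each ordered pair (i, j), i != j,
    is connected with probability pi[q][r] given their groups (q, r)."""
    rng = random.Random(seed)
    groups = rng.choices(range(len(rho)), weights=rho, k=M)
    A = [[0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            if i != j and rng.random() < pi[groups[i]][groups[j]]:
                A[i][j] = 1
    return groups, A

# Two communities: dense within, sparse between (illustrative values).
groups, A = sample_sbm(M=50, rho=[0.5, 0.5],
                       pi=[[0.25, 0.01], [0.01, 0.25]])
```

Because no constraint is placed on the matrix `pi`, the same code generates communities (large diagonal), disassortative structures (large off-diagonal), or stars.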
While SBM was originally developed to analyze mainly binary networks, many extensions have been proposed since to deal for instance with valued edges (Mariadassou et al. 2010), categorical edges (Jernite et al. 2014), or to take into account prior information (Zanghi et al. 2010; Matias and Robin 2014). Note that other extensions of SBM have focused on looking for overlapping clusters (Airoldi et al. 2008; Latouche et al. 2011) or on the modeling of dynamic networks (Yang et al. 2011; Xu and Hero 2013; Bouveyron et al. 2016; Matias and Miele 2016).
The inference of SBM-like models is usually done using variational expectation maximization (VEM) (Daudin et al. 2008), variational Bayes EM (VBEM) (Latouche et al. 2012), or Gibbs sampling (Nowicki and Snijders 2001). Moreover, we emphasize that various strategies have been derived to estimate the number of clusters using model selection criteria (Daudin et al. 2008; Latouche et al. 2012), an allocation sampler (Mc Daid et al. 2013), greedy search (Côme and Latouche 2015), or nonparametric schemes (Kemp et al. 2006). We refer to Salter-Townshend et al. (2012) for an overview of statistical models for network analysis.
1.2 Statistical models for text analytics
On the other hand, the statistical modeling of texts appeared at the end of the last century with an early model described by Papadimitriou et al. (1998) for latent semantic indexing (LSI) (Deerwester et al. 1990). LSI is known in particular for its ability to recover linguistic notions such as synonymy and polysemy from “term frequency - inverse document frequency” (tf-idf) data. Hofmann (1999) proposed an alternative model for LSI, called probabilistic latent semantic analysis (pLSI), which models each word within a document using a mixture model. In pLSI, each mixture component is modeled by a multinomial random variable and the latent groups can be viewed as “topics.” Thus, each word is generated from a single topic and different words in a document can be generated from different topics. However, pLSI has no model at the document level and may suffer from overfitting. Notice that pLSI can also be viewed as an extension of the mixture of unigrams, proposed by Nigam et al. (2000).
The model which finally concentrates the most desired features was proposed by Blei et al. (2003) and is called latent Dirichlet allocation (LDA). The LDA model has rapidly become a standard tool in statistical text analytics and is even used in different scientific fields such as image analysis (Lazebnik et al. 2006) or transportation research (Côme et al. 2014) for instance. The idea of LDA is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. LDA is therefore similar to pLSI except that the topic distribution in LDA has a Dirichlet distribution. Several inference procedures have been proposed in the literature ranging from VEM (Blei et al. 2003) to collapsed VBEM (Teh et al. 2006).
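The word-level generative process of LDA described above can be sketched as follows. This is a toy illustration with made-up topic and word distributions; in full LDA the proportions `theta` are themselves drawn from a Dirichlet prior, whereas they are fixed here for brevity.

```python
import random

def sample_lda_document(theta, beta, n_words, rng):
    """Draw one document of n_words words: for each word, pick a topic
    from the document's topic proportions theta, then a word index from
    that topic's distribution over the vocabulary (rows of beta)."""
    words = []
    for _ in range(n_words):
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        word = rng.choices(range(len(beta[topic])), weights=beta[topic])[0]
        words.append((topic, word))
    return words

rng = random.Random(1)
# Toy corpus: 2 topics over a 4-word vocabulary (illustrative values).
beta = [[0.4, 0.4, 0.1, 0.1],   # topic 0 favours words 0-1
        [0.1, 0.1, 0.4, 0.4]]   # topic 1 favours words 2-3
doc = sample_lda_document(theta=[0.7, 0.3], beta=beta, n_words=20, rng=rng)
```

Each word keeps its own latent topic, which is exactly the property that distinguishes LDA and pLSI from the mixture of unigrams.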
Note that a limitation of LDA is its inability to take into account possible topic correlations. This is due to the use of the Dirichlet distribution to model the variability among the topic proportions. To overcome this limitation, the correlated topic model (CTM) was developed by Blei and Lafferty (2006). Similarly, the relational topic model (RTM) (Chang and Blei 2009) models the links between documents as binary random variables conditioned on their contents, but ignores the community ties between the authors of these documents. Notice that the “itopic” model (Sun et al. 2009) extends RTM to weighted networks. The reader may refer to Blei (2012) for an overview of probabilistic topic models.
1.3 Statistical models for the joint analysis of texts and networks
Finally, a few recent works have focused on the joint modeling of texts and networks. Those works are mainly motivated by the desire to analyze social networks, such as Twitter or Facebook, or electronic communication networks. Some of them are partially based on LDA: the author-topic (AT) (Steyvers et al. 2004; Rosen-Zvi et al. 2004) and the author-recipient-topic (ART) (McCallum et al. 2005) models. The AT model extends LDA to include authorship information, whereas the ART model includes authorships and information about the recipients. Although potentially powerful, these models do not take into account the network structure (communities, stars, ...), even though the concept of community is very important in the context of social networks, in the sense that a community is a group of users sharing similar interests.
Among the most advanced models for the joint analysis of texts and networks, the first models which explicitly take into account both text contents and network structure are the community-user-topic (CUT) models proposed by Zhou et al. (2006). Two models are proposed, CUT1 and CUT2, which differ in the way they construct the communities. Indeed, CUT1 determines the communities based only on the network structure, whereas CUT2 determines the communities based on the content information solely. The CUT models therefore each deal with only part of the problem we are interested in. It is also worth noticing that the authors of these models rely on Gibbs sampling for inference, which may prohibit their use on large networks.
A second attempt was made by Pathak et al. (2008), who extended the ART model by introducing the community-author-recipient-topic (CART) model. The CART model extends ART by assuming that authors and recipients belong to latent communities, which allows it to recover groups of nodes that are homogeneous with regard to both the network structure and the message contents. Notice that CART allows the nodes to be part of multiple communities and each couple of actors to have a specific topic. Thus, though extremely flexible, CART is also a highly parametrized model. In addition, the recommended inference procedure based on Gibbs sampling may also prohibit its application to large networks.
More recently, the topic-link LDA (Liu et al. 2009) also performs topic modeling and author community discovery in a unified framework. As its name suggests, topic-link LDA extends LDA with a community layer where the link between two documents (and consequently their authors) depends on both topic proportions and author latent features through a logistic transformation. However, whereas CART focuses only on directed networks, topic-link LDA is only able to deal with undirected networks. On the positive side, the authors derive a variational EM algorithm for inference, allowing topic-link LDA to eventually be applied to large networks.
Finally, a family of four topic-user-community models (TUCM) were proposed by Sachan et al. (2012). The TUCM models are designed such that they can find topic-meaningful communities in networks with different types of edges. This is in particular relevant in social networks such as Twitter where different types of interactions (followers, tweet, re-tweet, ...) exist. Another specificity of the TUCM models is that they allow both multiple community and topic memberships. Inference is also done here through Gibbs sampling, implying a possible scale limitation.
1.4 Contributions and organization of the paper
We propose here a new generative model for the clustering of networks with textual edges, such as communication or co-authorship networks. In contrast to existing works, whose models for the network structure are either too simple or highly parametrized, our model relies for the network modeling on the SBM model, which offers sufficient flexibility with a reasonable complexity. This model is one of the few able to recover different topological structures such as communities, stars, or disassortative clusters (see Latouche et al. 2012 for instance). Regarding the topic modeling, our approach is based on the LDA model, in which the topics are conditioned on the latent groups. Thus, the proposed modeling will be able to exhibit node partitions that are meaningful both regarding the network structure and the topics, with a model of limited complexity, highly interpretable, and for both directed and undirected networks. In addition, the proposed inference procedure—a classification-VEM algorithm—allows the use of our model on large-scale networks.
The proposed model, named stochastic topic block model (STBM), is introduced in Sect. 2. The model inference is discussed in Sect. 3 as well as model selection. Section 4 is devoted to numerical experiments highlighting the main features of the proposed approach and proving the validity of the inference procedure. Two applications to real-world networks (the Enron email and the Nips co-authorship networks) are presented in Sect. 5. Section 6 finally provides some concluding remarks.
2 The model
This section presents the notations used in the paper and introduces the STBM model. The joint distributions of the model to create edges and the corresponding documents are also given.
2.1 Context and notations
A directed network with M vertices, described by its \(M \times M\) adjacency matrix A, is considered. Thus, \(A_{ij}=1\) if there is an edge from vertex i to vertex j, 0 otherwise. The network is assumed not to have any self-loop and therefore \(A_{ii}=0\) for all i. If an edge from i to j is present, then it is characterized by a set of \(D_{ij}\) documents, denoted \(W_{ij}=(W_{ij}^{d})_d\). Each document \(W_{ij}^d\) is made of a collection of \(N_{ij}^{d}\) words \(W_{ij}^d=(W_{ij}^{dn})_n\). In the directed scenario considered, \(W_{ij}\) can model for instance a set of emails or text messages sent from actor i to actor j. Note that all the methodology proposed in this paper easily extends to undirected networks. In such a case, \(A_{ij}=A_{ji}\) and \(W_{ij}^{d}=W_{ji}^{d}\) for all i and j. The set \(W_{ij}^{d}\) of documents can then model for example books or scientific papers written by both i and j. In the following, we denote \(W=(W_{ij})_{ij}\) the set of all documents exchanged, for all the edges present in the network.
Our goal is to cluster the vertices into Q latent groups sharing homogeneous connection profiles, i.e., to find an estimate of the set \(Y=(Y_1, \dots , Y_M)\) of latent variables \(Y_i\) such that \(Y_{iq}=1\) if vertex i belongs to cluster q, and 0 otherwise. Although discrete or continuous edges are sometimes taken into account, the network literature focuses on modeling the presence of edges as binary variables. The clustering task then consists in building groups of vertices having similar trends to connect to others. In this paper, the connection profiles are characterized both by the presence of edges and by the documents exchanged between pairs of vertices. Therefore, we aim at uncovering clusters by integrating these two sources of information. Two nodes in the same cluster should have the same trend to connect to others and, when connected, the documents they are involved in should be made of words related to similar topics.
2.2 Modeling the presence of edges
2.3 Modeling the construction of documents
As mentioned previously, if an edge is present from vertex i to vertex j, then a set of documents \(W_{ij}=(W_{ij}^d)_d\), characterizing the oriented pair (i, j), is assumed to be given. Thus, in a generative perspective, the edges in A are first sampled as described in the previous section. Given A, the documents in \(W=(W_{ij})_{ij}\) are then constructed. The generative process we consider to build documents is strongly related to the latent Dirichlet allocation (LDA) model of Blei et al. (2003). The link between STBM and LDA is made clear in the following section. The STBM model relies on two concepts at the core of the SBM and LDA models, respectively. On the one hand, a generalization of the SBM model would assume that any kind of relationship between two vertices can be explained by their latent clusters only. In the LDA model, on the other hand, the main assumption is that words in documents are drawn from a mixture distribution over topics, each document d having its own vector of topic proportions \(\theta _d\). The STBM model combines these two concepts to introduce a new generative procedure for documents in networks.
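Combining the two mechanisms, the document-generation step of STBM might be sketched as follows. This is an illustrative sketch only: the values of `theta` and `beta` are made up, and in STBM the proportions \(\theta_{qr}\) are drawn from a Dirichlet prior rather than fixed. The key point is that the topic mixture depends only on the latent groups (q, r) of the two vertices.

```python
import random

def sample_stbm_documents(A, groups, theta, beta, n_docs=1, n_words=10, seed=2):
    """For each edge (i, j) present in A, generate n_docs documents whose
    words follow an LDA-like scheme, except that the topic proportions
    theta[q][r] depend only on the latent groups q of i and r of j."""
    rng = random.Random(seed)
    K, V = len(beta), len(beta[0])
    W = {}
    for i in range(len(A)):
        for j in range(len(A)):
            if A[i][j] == 1:
                q, r = groups[i], groups[j]
                docs = []
                for _ in range(n_docs):
                    doc = []
                    for _ in range(n_words):
                        k = rng.choices(range(K), weights=theta[q][r])[0]
                        doc.append(rng.choices(range(V), weights=beta[k])[0])
                    docs.append(doc)
                W[(i, j)] = docs
    return W

# Toy run: 2 groups, 2 topics, 3-word vocabulary (illustrative values).
A = [[0, 1], [1, 0]]
groups = [0, 1]
theta = [[[1.0, 0.0], [0.5, 0.5]],   # theta[q][r]: topic mix for pair (q, r)
         [[0.5, 0.5], [0.0, 1.0]]]
beta = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
W = sample_stbm_documents(A, groups, theta, beta)
```

Documents only exist where edges do, which is why the graph A must be sampled first.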
2.4 Link with LDA and SBM
As mentioned in Sect. 2.2, the second part of Eq. (4) involves the sampling of the clusters and the construction of binary variables describing the presence of edges between pairs of vertices. Interestingly, it corresponds exactly to the complete data likelihood of the SBM model, as considered in Zanghi et al. (2008) for instance. Such a likelihood term only involves the model parameters \(\rho \) and \(\pi \).
3 Inference
As mentioned previously, we introduce our methodology in the directed case. However, we emphasize that the STBM package for R that we developed implements the inference strategy for both directed and undirected networks.
3.1 Variational decomposition
3.2 Model decomposition
3.3 Optimization
In this section, we derive the optimization steps of the C-VEM algorithm we propose, which aims at maximizing the lower bound \({\mathcal {L}}\). The algorithm alternates between the optimization of \(R(Z, \theta ), Y\), and \((\rho , \pi , \beta )\) until convergence of the lower bound.
Estimation of \(R(Z,\theta )\). The following propositions give the update formulae of the E step of the VEM algorithm applied to Eq. (7).
Proposition 1
Proposition 2
Estimation of the model parameters
The lower bound \({\mathcal {L}}\) in Eq. (7) is maximized to provide estimates of the model parameters \((\rho , \pi , \beta )\). We recall that \(\beta \) is only involved in \(\tilde{{\mathcal {L}}}\), while \((\rho , \pi )\) only appear in the SBM complete data log-likelihood. The derivation of \(\tilde{{\mathcal {L}}}\) is given in Appendix 3.
Proposition 3
Estimation of Y. At this step, the model parameters \((\rho , \pi , \beta )\) along with the distribution \(R(Z, \theta )\) are held fixed. Therefore, the lower bound \({\mathcal {L}}\) in (7) only involves the set Y of cluster membership vectors. Looking for the optimal solution Y maximizing this bound is not feasible since it involves testing the \(Q^M\) possible cluster assignments. However, heuristics are available to provide local maxima for this combinatorial problem. These so-called greedy methods have been used, for instance, to look for communities in networks by Newman (2004) and Blondel et al. (2008), as well as for the SBM model (Côme and Latouche 2015). They are sometimes referred to as online clustering methods (Zanghi et al. 2008).
The algorithm cycles randomly through the vertices. At each step, a single vertex is considered and all membership vectors \(Y_{j}\) are held fixed, except \(Y_{i}\). If i is currently in cluster q, then the method looks for every possible label swap, i.e., removing i from cluster q and assigning it to a cluster \(r \ne q\). The corresponding change in the SBM complete data log-likelihood is then computed. If no label swap induces an increase in the SBM complete data log-likelihood, then \(Y_{i}\) remains unchanged. Otherwise, the label swap that yields the maximal increase is applied, and \(Y_{i}\) is changed accordingly.
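The greedy swap procedure above can be sketched as follows. The `score` callable stands in for the SBM complete data log-likelihood (the real criterion is more involved), and the toy score at the bottom, which simply rewards agreement with a fixed target partition, is purely illustrative.

```python
import random

def greedy_label_swaps(Y, Q, score, seed=0, max_sweeps=20):
    """Cycle randomly through the vertices; for each vertex, try every
    label swap q -> r, keep the swap yielding the largest increase in
    score(Y) (if any), and stop when a full sweep brings no change."""
    rng = random.Random(seed)
    for _ in range(max_sweeps):
        improved = False
        order = list(range(len(Y)))
        rng.shuffle(order)
        for i in order:
            current = Y[i]
            base = score(Y)
            best, best_gain = current, 0.0
            for r in range(Q):
                if r == current:
                    continue
                Y[i] = r                      # tentative swap
                gain = score(Y) - base
                if gain > best_gain:
                    best, best_gain = r, gain
            Y[i] = best                       # keep best swap (or revert)
            if best != current:
                improved = True
        if not improved:                      # local maximum reached
            break
    return Y

# Toy score rewarding agreement with a fixed target partition.
target = [0, 0, 1, 1, 2, 2]
score = lambda Y: sum(y == t for y, t in zip(Y, target))
labels = greedy_label_swaps([0] * 6, Q=3, score=score)
```

In practice, the change in the SBM complete data log-likelihood induced by a single swap can be computed incrementally, which is what makes such greedy methods fast.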
3.4 Initialization strategy and model selection
- 1.
The VEM algorithm (Blei et al. 2003) for LDA is applied on the aggregation of all documents exchanged from vertex i to vertex j, for each pair (i, j) of vertices, in order to characterize the type of interaction from i to j. Thus, an \(M \times M\) matrix X is first built such that \(X_{ij}=k\) if k is the majority topic used by i when discussing with j.
- 2.
The \(M \times M\) distance matrix \(\varDelta \) is then computed as follows:$$\begin{aligned} \varDelta (i,j)= & {} \sum _{h=1}^{M}\delta (X_{ih}\ne X_{jh})A_{ih}A_{jh}\nonumber \\&+ \sum _{h=1}^{M}\delta (X_{hi}\ne X_{hj})A_{hi}A_{hj}. \end{aligned}$$(8)
The first term looks at all possible edges from i and j toward a third vertex h. If both i and j are connected to h, i.e., \(A_{ih}A_{jh}=1\), the edge types \(X_{ih}\) and \(X_{jh}\) are compared. By symmetry, the second term looks at all possible edges from a vertex h to both i and j, and compares their types. Thus, the distance counts the number of discordances in the way i and j connect to other vertices, or other vertices connect to them.
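A direct implementation of the distance in Eq. (8) could read as follows (the small A and X below are made-up illustrative values):

```python
def distance_matrix(X, A):
    """Discordance distance of Eq. (8): for each pair (i, j), count the
    third vertices h such that i and j both connect to h (or h connects
    to both) but with different majority topics in X."""
    M = len(A)
    D = [[0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            d = 0
            for h in range(M):
                if A[i][h] and A[j][h] and X[i][h] != X[j][h]:
                    d += 1                    # discordant out-edges toward h
                if A[h][i] and A[h][j] and X[h][i] != X[h][j]:
                    d += 1                    # discordant in-edges from h
            D[i][j] = d
    return D

# Tiny example: vertices 0 and 1 disagree only about vertex 2.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
X = [[0, 1, 1], [1, 0, 2], [1, 2, 0]]   # X[i][j]: majority topic from i to j
D = distance_matrix(X, A)
```

The resulting matrix is symmetric with a zero diagonal, so it can be fed directly to a standard distance-based clustering routine for the initialization.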
Proposition 4
Parameter values for the three simulation scenarios (see text for details)
Scenario | A | B | C |
---|---|---|---|
M (nb of nodes) | 100 | ||
K (topics) | 4 | 3 | 3 |
Q (groups) | 3 | 2 | 4 |
\(\rho \) (group prop.) | \((1/Q,\ldots ,1/Q)\) | ||
\(\pi \) (connection prob.) | \(\left\{ \begin{array}{l} \pi _{qq}= 0.25\\ \pi _{qr,\, r\ne q}= 0.01 \end{array}\right. \) | \(\pi _{qr,\,\forall q,r}=0.25\) | \(\left\{ \begin{array}{l} \pi _{qq}= 0.25\\ \pi _{qr,\, r\ne q}= 0.01 \end{array}\right. \) |
\(\theta \) (prop. of topics) | \({\left\{ \begin{array}{ll} \theta _{111}=\theta _{222}= \quad 1\\ \theta _{333}= \quad 1\\ \theta _{qr4,\, r\ne q}= \quad 1\\ \text {otherwise} \quad 0 \end{array}\right. }\) | \({\left\{ \begin{array}{ll} \theta _{111}=\theta _{222}= \quad 1\\ \theta _{qr3,\, r\ne q}= \quad 1\\ \text {otherwise} \quad 0 \end{array}\right. }\) | \({\left\{ \begin{array}{ll} \theta _{111}=\theta _{331}= \quad 1\\ \theta _{222}=\theta _{442}= \quad 1\\ \theta _{qr3,\, r\ne q}= \quad 1\\ \text {otherwise} \quad 0 \end{array}\right. }\) |
4 Numerical experiments
This section aims at highlighting the main features of the proposed approach on synthetic data and at proving the validity of the inference algorithm presented in the previous section. Model selection is also considered to validate the criterion choice. Numerical comparisons with state-of-the-art methods conclude this section.
4.1 Experimental setup
First, regarding the parametrization of our approach, we chose \(\alpha _{k}=1\) for all k, which induces a uniform distribution over the topic proportions \(\theta _{qr}\).
Scenario A consists of networks with \(Q=3\) groups, corresponding to clear communities, where persons within a group talk preferentially about a unique topic and use a different topic when talking with persons of other groups. Thus, those networks contain \(K=4\) topics.
Scenario B consists of networks with a unique community, where the \(Q=2\) groups are only differentiated by the way they discuss within and between groups. Persons within groups 1 and 2 talk preferentially about topics 1 and 2, respectively. A third topic is used for the communications between persons of different groups.
Scenario C, finally, consists of networks with \(Q=4\) groups which use \(K=3\) topics to communicate. Among the 4 groups, two groups correspond to clear communities where persons talk preferentially about a unique topic within the communities. The two other groups correspond to a single community and are only discriminated by the topic used in the communications. People from group 3 use topic 1 and those of group 4 use topic 2. The third topic is used for communications between groups.
4.2 Introductory example
Percentage of selections by ICL for each STBM model (Q, K) on 50 simulated networks for each of the three scenarios
\(K \backslash Q\) | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Scenario A (\(Q=3, K=4\)) | ||||||
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 12 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 82 | 2 | 0 | 2 |
5 | 0 | 0 | 2 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 |
Scenario B (\(Q=2\),\(K=3\)) | ||||||
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 12 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 88 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 |
Scenario C (\(Q=4, K=3\)) | ||||||
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 2 | 82 | 0 | 0 |
4 | 0 | 0 | 0 | 16 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 |
4.3 Model selection
This experiment focuses on the ability of the ICL criterion to select appropriate values for Q and K. To this end, we simulated 50 networks according to each of the three scenarios and applied STBM to those networks for values of Q and K ranging from 1 to 6. Table 2 presents the percentage of selections by ICL for each STBM model (Q, K).
In the three different situations, ICL succeeds most of the time in identifying the actual combination of the number of groups and topics. For scenarios A and B, when ICL does not select the correct values for Q and K, the criterion seems to underestimate them, whereas it tends to overestimate them in the case of scenario C. One can also notice that wrongly selected models are usually close to the simulated one. Let us also recall that, since the data are not strictly simulated according to a STBM model, the set of tested models does not contain the model which generated the data. This experiment allows us to validate ICL as a model selection tool for STBM.
4.4 Benchmark study
This third experiment aims at comparing the ability of STBM to recover the network structure, both in terms of node partition and topics. STBM is here compared to SBM, using the mixer package (Ambroise et al. 2010), and LDA, using the topicmodels package (Grun and Hornik 2013). Obviously, SBM and LDA will only be able to recover either the node partition or the topics. We chose here to evaluate the results by comparing the resulting node and topic partitions with the actual ones (the simulated partitions). In the clustering community, the adjusted Rand index (ARI) (Rand 1971) serves as a widely accepted criterion for the difficult task of clustering evaluation. The ARI looks at all pairs of nodes and checks whether they are classified in the same group or not in both partitions. As a result, an ARI value close to 1 means that the partitions are similar. Notice that the actual values of Q and K are provided to the three algorithms.
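For reference, the ARI can be computed from the pair-counting contingency table of the two partitions; the sketch below follows the standard chance-corrected form of the index.

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items,
    computed from the pair-counting contingency table; 1.0 means the
    partitions are identical (up to label names), and values near 0
    indicate chance-level agreement."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)

# Identical partitions (up to label names) give ARI = 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

Because the index is invariant to label permutations, it is directly applicable to comparing an estimated clustering with the simulated one.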
Clustering results for the SBM, LDA and STBM on 20 networks simulated according to the three scenarios
Method | Scenario A | Scenario B | Scenario C | |||
---|---|---|---|---|---|---|
Node ARI | Edge ARI | Node ARI | Edge ARI | Node ARI | Edge ARI | |
Easy | ||||||
SBM | 1.00 ± 0.00 | – | 0.01 ± 0.01 | – | 0.69 ± 0.07 | – |
LDA | – | 0.97 ± 0.06 | – | 1.00 ± 0.00 | – | 1.00 ± 0.00 |
STBM | 0.98 ± 0.04 | 0.98 ± 0.04 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 |
Hard 1 | ||||||
SBM | 0.01 ± 0.01 | – | 0.01 ± 0.01 | – | 0.01 ± 0.01 | – |
LDA | – | 0.90 ± 0.17 | – | 1.00 ± 0.00 | – | 0.99 ± 0.01 |
STBM | 1.00 ± 0.00 | 0.90 ± 0.13 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.98 ± 0.03 |
Hard 2 | ||||||
SBM | 1.00 ± 0.00 | – | \(-0.01\pm 0.01\) | – | 0.65 ± 0.05 | – |
LDA | – | 0.21 ± 0.13 | – | 0.08 ± 0.06 | – | 0.09 ± 0.05 |
STBM | 0.99 ± 0.02 | 0.99 ± 0.01 | 0.59 ± 0.35 | 0.54 ± 0.40 | 0.68 ± 0.07 | 0.62 ± 0.14 |
In the “Easy” situation, the results are coherent with our initial guess when building the simulation scenarios. Indeed, besides the fact that SBM and LDA are only able to recover one of the two partitions, scenario A is an easy situation for all methods since the clusters perfectly match the topic partition. Scenario B, which has no communities and where groups only depend on topics, is obviously a difficult situation for SBM but does not disturb LDA, which perfectly recovers the topics. In scenario C, LDA still succeeds in identifying the topics, whereas SBM recognizes the two communities well but fails to discriminate the two groups hidden in a single community. Here, STBM obtains in all scenarios the best performance on both nodes and edges (Table 3).
The “Hard 1” situation considers the case where the communities are actually not well differentiated. Here, LDA is barely affected (only in scenario A), whereas SBM is no longer able to distinguish the groups of nodes. Conversely, STBM relies on the found topics to correctly identify the node groups and obtains, here again, excellent ARI values in all three scenarios.
The last situation, the so-called “Hard 2” case, aims at highlighting the effect of the word sampling on the recovery of the topics used. On the one hand, SBM now achieves a satisfying classification of nodes for scenarios A and C, while LDA fails to recover the majority topic used for simulation. On those two scenarios, STBM performs well on both nodes and topics. This proves that STBM is also able to recover the topics in a noisy situation by relying on the network structure. On the other hand, scenario B presents an extremely difficult situation where topics are noised and there are no communities. Here, although both LDA and SBM fail, STBM achieves a satisfying result on both nodes and edges. This is, once again, an illustration of the fact that the joint modeling of network structure and topics makes it possible to recover complex hidden structures in a network with textual edges.
5 Application to real-world problems
5.1 Analysis of the Enron email network
The dataset considered here contains 20 940 emails sent between the \(M=149\) employees. All messages sent between two individuals were merged into a single meta-message. Thus, we end up with a dataset of 1 234 directed edges between employees, each edge carrying the text of all messages between two persons.
Topic 1 seems to refer to the financial and trading activities of Enron.
Topic 2 is concerned with Enron activities in Afghanistan (Enron and the Bush administration were suspected to work secretly with the Taliban up to a few weeks before the 9/11 attacks).
Topic 3 contains elements related to the California electricity crisis, in which Enron was involved, and which almost caused the bankruptcy of SCE-corp (Southern California Edison Corporation) early 2001.
Topic 4 is about usual logistic issues (building equipment, computers, ...).
Topic 5 refers to technical discussions on gas deliveries (mmBTU represents 1 million of British thermal unit, which is equal to 1055 joules).
Figure 10 presents a visual summary of connection probabilities between groups (the estimated \(\pi \) matrix) and majority topics for group interactions. A few elements deserve to be highlighted in view of this summary. First, group 10 contains a single individual who has a central place in the network and who mostly discusses logistic issues (topic 4) with groups 4, 5, 6, and 7. Second, group 8 is made of 6 individuals who mainly communicate about Enron activities in Afghanistan (topic 2) among themselves and with other groups. Finally, groups 4 and 6 seem to be more focused on trading activities (topic 1), whereas groups 1, 3, and 9 are dealing with technical issues on gas deliveries (topic 5).
As a comparison, the network has also been processed with SBM, using the mixer package (Ambroise et al. 2010). The number of groups selected by SBM was 8. Figure 11 compares the partitions of nodes provided by SBM and STBM. One can observe that the two partitions differ on several points. On the one hand, some clusters found by SBM (the bottom-left one for instance) have been split by STBM since some nodes use different topics than the rest of the community. On the other hand, SBM isolates two “hubs” which seem to have similar behaviors. Conversely, STBM identifies a unique “hub,” while the second node is gathered with other nodes using similar discussion topics. STBM has therefore allowed a better and deeper understanding of the Enron network through the combination of text contents with network structure.
5.2 Analysis of the Nips co-authorship network
Topic 1 seems to be focused on neural network theory, which was and still is a central topic in Nips.
Topic 2 is concerned with phoneme classification and recognition.
Topic 3 is a more general topic about statistical learning and artificial intelligence.
Topic 4 is about Neuroscience and focuses on experimental works about the visual cortex.
Topic 5 deals with network learning theory.
Topic 6 is also about Neuroscience but seems to be more focused on EEG.
Topic 7 is finally devoted to neural coding, i.e., characterizing the relationship between the stimulus and the individual responses.
As a conclusive remark on this network, STBM has proved its ability to bring out concise and relevant analyses on the structure of a large and dense network. In this view, the meta-network of Fig. 13 is a great help since it summarizes several model parameters of STBM.
6 Conclusion
This work has introduced a probabilistic model, named the stochastic topic block model (STBM), for the modeling and clustering of vertices in networks with textual edges. The proposed model allows the modeling of both directed and undirected networks, enabling its application to networks of various types (communication, social media, co-authorship, ...). A classification variational EM (C-VEM) algorithm has been proposed for model inference, and model selection is done through the ICL criterion. Numerical experiments on simulated datasets have proved the effectiveness of the proposed methodology. Two real-world networks (a communication and a co-authorship network) have also been studied using the STBM model and insightful results have been exhibited. It is worth noticing that STBM has been applied to a large co-authorship network with thousands of vertices, demonstrating the scalability of our approach.
Further work may include the extension of the STBM model to dynamic networks and to networks with covariate information on the nodes and/or edges. The extension to the dynamic framework could be achieved by adding, for instance, a state-space model over the group and topic proportions. Such an approach has already been used successfully on SBM-like models, as in Bouveyron et al. (2016). It would also be possible to take into account covariate information available on the nodes by adopting a mixture-of-experts approach, as in Gormley and Murphy (2010). Extending the STBM model to overlapping clusters of nodes would be another natural direction. It is indeed commonplace in social analysis to allow individuals to belong to multiple groups (family, work, friends, ...). One possible choice would be to derive an extension of the MMSBM model (Airoldi et al. 2008); however, this would significantly increase the parameterization of the model. Finally, STBM could also be adapted to take into account the intensity or the type of the communications between individuals.
Acknowledgments
The authors would like to warmly thank the editor and the two reviewers for their helpful remarks on the first version of this paper, and Laurent Bergé for his kind suggestions and the development of visualization tools.
References
- Airoldi, E., Blei, D., Fienberg, S., Xing, E.: Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014 (2008)
- Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, pp. 267–281 (1973)
- Ambroise, C., Grasseau, G., Hoebeke, M., Latouche, P., Miele, V., Picard, F.: The mixer R package (version 1.8). http://cran.r-project.org/web/packages/mixer/ (2010)
- Bickel, P., Chen, A.: A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl Acad. Sci. 106(50), 21068–21073 (2009)
- Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 7, 719–725 (2000)
- Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41(3–4), 561–575 (2003)
- Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Int. Comput. Sci. Inst. 4, 126 (1998)
- Blei, D., Lafferty, J.: Correlated topic models. Adv. Neural Inf. Process. Syst. 18, 147 (2006)
- Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)
- Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
- Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 10, 10008–10020 (2008)
- Bouveyron, C., Latouche, P., Zreik, R.: The dynamic random subgraph model for the clustering of evolving networks. Comput. Stat. (2016)
- Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Comput. Stat. Q. 2(1), 73–82 (1991)
- Chang, J., Blei, D.M.: Relational topic models for document networks. In: International Conference on Artificial Intelligence and Statistics, pp. 81–88 (2009)
- Côme, E., Randriamanamihaga, A., Oukhellou, L., Aknin, P.: Spatio-temporal analysis of dynamic origin-destination data using latent Dirichlet allocation: application to the Vélib' bike sharing system of Paris. In: Proceedings of the 93rd Annual Meeting of the Transportation Research Board (2014)
- Côme, E., Latouche, P.: Model selection and clustering in stochastic block models with the exact integrated complete data likelihood. Stat. Model. doi:10.1177/1471082X15577017 (2015)
- Daudin, J.-J., Picard, F., Robin, S.: A mixture model for random graphs. Stat. Comput. 18(2), 173–183 (2008)
- Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391 (1990)
- Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–38 (1977)
- Fienberg, S., Wasserman, S.: Categorical data analysis of single sociometric relations. Sociol. Methodol. 12, 156–192 (1981)
- Girvan, M., Newman, M.: Community structure in social and biological networks. Proc. Natl Acad. Sci. 99(12), 7821 (2002)
- Gormley, I.C., Murphy, T.B.: A mixture of experts latent position cluster model for social network data. Stat. Methodol. 7(3), 385–405 (2010)
- Grün, B., Hornik, K.: The topicmodels R package (version 0.2-3). http://cran.r-project.org/web/packages/topicmodels/ (2013)
- Handcock, M., Raftery, A., Tantrum, J.: Model-based clustering for social networks. J. R. Stat. Soc. A 170(2), 301–354 (2007)
- Hathaway, R.: Another interpretation of the EM algorithm for mixture distributions. Stat. Prob. Lett. 4(2), 53–56 (1986)
- Hoff, P., Raftery, A., Handcock, M.: Latent space approaches to social network analysis. J. Am. Stat. Assoc. 97(460), 1090–1098 (2002)
- Hofman, J., Wiggins, C.: Bayesian approach to network modularity. Phys. Rev. Lett. 100(25), 258701 (2008)
- Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57. ACM, New York (1999)
- Jernite, Y., Latouche, P., Bouveyron, C., Rivera, P., Jegou, L., Lamassé, S.: The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul. Ann. Appl. Stat. 8(1), 55–74 (2014)
- Kemp, C., Tenenbaum, J., Griffiths, T., Yamada, T., Ueda, N.: Learning systems of concepts with an infinite relational model. Proc. Natl Conf. Artif. Intell. 21, 381–391 (2006)
- Latouche, P., Birmelé, E., Ambroise, C.: Overlapping stochastic block models with application to the French political blogosphere. Ann. Appl. Stat. 5(1), 309–336 (2011)
- Latouche, P., Birmelé, E., Ambroise, C.: Variational Bayesian inference and complexity control for stochastic block models. Stat. Model. 12(1), 93–115 (2012)
- Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178. IEEE, Piscataway (2006)
- Liu, Y., Niculescu-Mizil, A., Gryc, W.: Topic-link LDA: joint models of topic and author community. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 665–672. ACM, New York (2009)
- Mariadassou, M., Robin, S., Vacher, C.: Uncovering latent structure in valued graphs: a variational approach. Ann. Appl. Stat. 4(2), 715–742 (2010)
- Matias, C., Miele, V.: Statistical clustering of temporal networks through a dynamic stochastic block model. Preprint HAL n.01167837 (2016)
- Matias, C., Robin, S.: Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM Proc. Surv. 47, 55–74 (2014)
- McCallum, A., Corrada-Emmanuel, A., Wang, X.: The author-recipient-topic model for topic and role discovery in social networks, with application to Enron and academic email. In: Workshop on Link Analysis, Counterterrorism and Security, pp. 33–44 (2005)
- McDaid, A., Murphy, T., Friel, N., Hurley, N.: Improved Bayesian inference for the stochastic block model with application to large networks. Comput. Stat. Data Anal. 60, 12–31 (2013)
- Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004)
- Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Mach. Learn. 39(2–3), 103–134 (2000)
- Nowicki, K., Snijders, T.: Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc. 96(455), 1077–1087 (2001)
- Papadimitriou, C., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. In: Proceedings of the Tenth ACM PODS, pp. 159–168. ACM, New York (1998)
- Pathak, N., DeLong, C., Banerjee, A., Erickson, K.: Social topic models for community extraction. In: The 2nd SNA-KDD Workshop, vol. 8. Citeseer (2008)
- Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971)
- Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 487–494. AUAI Press, Arlington (2004)
- Sachan, M., Contractor, D., Faruquie, T., Subramaniam, L.: Using content and interactions for discovering communities in social networks. In: Proceedings of the 21st International Conference on World Wide Web, pp. 331–340. ACM, New York (2012)
- Salter-Townshend, M., White, A., Gollini, I., Murphy, T.B.: Review of statistical network analysis: models, algorithms, and software. Stat. Anal. Data Min. 5(4), 243–264 (2012)
- Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
- Steyvers, M., Smyth, P., Rosen-Zvi, M., Griffiths, T.: Probabilistic author-topic models for information discovery. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 306–315. ACM, New York (2004)
- Sun, Y., Han, J., Gao, J., Yu, Y.: iTopicModel: information network-integrated topic modeling. In: Ninth IEEE International Conference on Data Mining (ICDM '09), pp. 493–502. IEEE, Piscataway (2009)
- Teh, Y., Newman, D., Welling, M.: A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. Adv. Neural Inf. Process. Syst. 18, 1353–1360 (2006)
- Than, K., Ho, T.: Fully sparse topic models. In: Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 7523, pp. 490–505. Springer, Berlin (2012)
- Wang, Y., Wong, G.: Stochastic blockmodels for directed graphs. J. Am. Stat. Assoc. 82, 8–19 (1987)
- White, H., Boorman, S., Breiger, R.: Social structure from multiple networks. I. Blockmodels of roles and positions. Am. J. Sociol. 81, 730–780 (1976)
- Xu, K., Hero III, A.: Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Social Computing, Behavioral-Cultural Modeling and Prediction, pp. 201–210. Springer, Berlin (2013)
- Yang, T., Chi, Y., Zhu, S., Gong, Y., Jin, R.: Detecting communities and their evolutions in dynamic social networks: a Bayesian approach. Mach. Learn. 82(2), 157–189 (2011)
- Zanghi, H., Ambroise, C., Miele, V.: Fast online graph clustering via Erdős–Rényi mixture. Pattern Recognit. 41, 3592–3599 (2008)
- Zanghi, H., Volant, S., Ambroise, C.: Clustering based on random graph model embedding vertex features. Pattern Recognit. Lett. 31(9), 830–836 (2010)
- Zhou, D., Manavoglu, E., Li, J., Giles, C., Zha, H.: Probabilistic models for discovering e-communities. In: Proceedings of the 15th International Conference on World Wide Web, pp. 173–182. ACM, New York (2006)