The stochastic topic block model for the clustering of vertices in networks with textual edges

Bouveyron, C.; Latouche, P.; Zreik, R.

doi:10.1007/s11222-016-9713-7

Published: 21 October 2016

Volume 28, pages 11–31, (2018)

Statistics and Computing

C. Bouveyron¹, P. Latouche² & R. Zreik¹,²

Abstract

Due to the significant increase in communication between individuals via social media (Facebook, Twitter, LinkedIn) or electronic formats (email, web, e-publication) in the past two decades, network analysis has become an unavoidable discipline. Many random graph models have been proposed to extract information from networks based on person-to-person links only, without taking into account information on the contents. This paper introduces the stochastic topic block model, a probabilistic model for networks with textual edges. We address here the problem of discovering meaningful clusters of vertices that are coherent from both the network interactions and the text contents. A classification variational expectation-maximization algorithm is proposed to perform inference. Simulated datasets are considered in order to assess the proposed approach and to highlight its main features. Finally, we demonstrate the effectiveness of our methodology on two real-world datasets: a directed communication network and an undirected co-authorship network.

References

Airoldi, E., Blei, D., Fienberg, S., Xing, E.: Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014 (2008)
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, pp. 267–281 (1973)
Ambroise, C., Grasseau, G., Hoebeke, M., Latouche, P., Miele, V., Picard, F.: The mixer R package (version 1.8) (2010). http://cran.r-project.org/web/packages/mixer/
Bickel, P., Chen, A.: A nonparametric view of network models and newman-girvan and other modularities. Proc. Natl Acad. Sci. 106(50), 21068–21073 (2009)
Biernacki, C., Celeux, G., Govaert, G.: Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intel. 7, 719–725 (2000)
Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate gaussian mixture models. Comput. Stat. Data Anal. 41(3–4), 561–575 (2003)
Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for gaussian mixture and hidden markov models. Int. Comput. Sci. Inst. 4, 126 (1998)
Blei, D., Lafferty, J.: Correlated topic models. Adv. Neural Inf. Process. Syst. 18, 147 (2006)
Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 10, 10008–10020 (2008)
Bouveyron, C., Latouche, P., Zreik, R.: The dynamic random subgraph model for the clustering of evolving networks. Comput. Stat. (2016)
Celeux, G., Govaert, G.: A classification em algorithm for clustering and two stochastic versions. Comput. Stat. Q. 2(1), 73–82 (1991)
Chang, J., Blei, D.M.: Relational topic models for document networks. In: International Conference on Artificial Intelligence and Statistics, pp. 81–88 (2009)
Côme, E., Randriamanamihaga, A., Oukhellou, L., Aknin, P.: Spatio-temporal analysis of dynamic origin-destination data using latent Dirichlet allocation. Application to the Vélib' bike sharing system of Paris. In: Proceedings of 93rd Annual Meeting of the Transportation Research Board (2014)
Côme, E., Latouche, P.: Model selection and clustering in stochastic block models with the exact integrated complete data likelihood. Stat. Model. doi:10.1177/1471082X15577017 (2015)
Daudin, J.-J., Picard, F., Robin, S.: A mixture model for random graphs. Stat. Comput. 18(2), 173–183 (2008)
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391 (1990)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–38 (1977)
Fienberg, S., Wasserman, S.: Categorical data analysis of single sociometric relations. Sociol. Methodol. 12, 156–192 (1981)
Girvan, M., Newman, M.: Community structure in social and biological networks. Proc. Natl Acad. Sci. 99(12), 7821 (2002)
Gormley, I.C., Murphy, T.B.: A mixture of experts latent position cluster model for social network data. Stat. Methodol. 7(3), 385–405 (2010)
Grun, B., Hornik, K.: The topicmodels package (version 0.2-3). http://cran.r-project.org/web/packages/topicmodels/ (2013)
Handcock, M., Raftery, A., Tantrum, J.: Model-based clustering for social networks. J. R. Stat. Soc. A 170(2), 301–354 (2007)
Hathaway, R.: Another interpretation of the EM algorithm for mixture distributions. Stat. Prob. Lett. 4(2), 53–56 (1986)
Hoff, P., Raftery, A., Handcock, M.: Latent space approaches to social network analysis. J. Am. Stat. Assoc. 97(460), 1090–1098 (2002)
Hofman, J., Wiggins, C.: Bayesian approach to network modularity. Phys. Rev. Lett. 100(25), 258701 (2008)
Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, 50–57. ACM, New York (1999)
Jernite, Y., Latouche, P., Bouveyron, C., Rivera, P., Jegou, L., Lamassé, S.: The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul. Ann. Appl. Stat. 8(1), 55–74 (2014)
Kemp, C., Tenenbaum, J., Griffiths, T., Yamada, T., Ueda, N.: Learning systems of concepts with an infinite relational model. Proc. Natl Conf. Artif. Intell. 21, 381–391 (2006)
Latouche, P., Birmelé, E., Ambroise, C.: Overlapping stochastic block models with application to the French political blogosphere. Ann. Appl. Stat. 5(1), 309–336 (2011)
Latouche, P., Birmelé, E., Ambroise, C.: Variational Bayesian inference and complexity control for stochastic block models. Stat. Model. 12(1), 93–115 (2012)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178. IEEE, Piscataway (2006)
Liu, Y., Niculescu-Mizil, A., Gryc, W.: Topic-link LDA: joint models of topic and author community. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 665–672. ACM, New York (2009)
Mariadassou, M., Robin, S., Vacher, C.: Uncovering latent structure in valued graphs: a variational approach. Ann. Appl. Stat. 4(2), 715–742 (2010)
Matias, C., Miele, V.: Statistical clustering of temporal networks through a dynamic stochastic block model. Preprint HAL. n.01167837 (2016)
Matias, C., Robin, S.: Modeling heterogeneity in random graphs through latent space models: a selective review. Esaim Proc. Surv. 47, 55–74 (2014)
McDaid, A., Murphy, T., Friel, N., Hurley, N.: Improved bayesian inference for the stochastic block model with application to large networks. Comput. Stat. Data Anal. 60, 12–31 (2013)
McCallum, A., Corrada-Emmanuel, A., Wang, X.: The author-recipient-topic model for topic and role discovery in social networks, with application to enron and academic email, pp. 33–44. In: Workshop on Link Analysis, Counterterrorism and Security (2005)
Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004)
Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using em. Mach. Learn. 39(2–3), 103–134 (2000)
Nowicki, K., Snijders, T.: Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc. 96(455), 1077–1087 (2001)
Papadimitriou, C., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. In: Proceedings of the tenth ACM PODS, pp. 159–168. ACM, New York (1998)
Pathak, N., DeLong, C., Banerjee, A., Erickson, K.: Social topic models for community extraction. In: The 2nd SNA-KDD workshop, vol. 8. Citeseer (2008)
Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971)
Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th conference on Uncertainty in artificial intelligence, pp. 487–494. AUAI Press, Arlington (2004)
Sachan, M., Contractor, D., Faruquie, T., Subramaniam, L.: Using content and interactions for discovering communities in social networks. In: Proceedings of the 21st international conference on World Wide Web, pp. 331–340. ACM, New York (2012)
Salter-Townshend, M., White, A., Gollini, I., Murphy, T.B.: Review of statistical network analysis: models, algorithms, and software. Stat. Anal. Data Min. 5(4), 243–264 (2012)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Steyvers, M., Smyth, P., Rosen-Zvi, M., Griffiths, T.: Probabilistic author-topic models for information discovery. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 306–315. ACM, New York (2004)
Sun, Y., Han, J., Gao, J., Yu, Y.: itopicmodel: Information network-integrated topic modeling. In: Ninth IEEE International Conference on Data Mining, 2009. ICDM’09, pp. 493–502. IEEE, Piscataway (2009)
Teh, Y., Newman, D., Welling, M.: A collapsed variational bayesian inference algorithm for latent Dirichlet allocation. Adv. Neural Inf. Process. Syst. 18, 1353–1360 (2006)
Than, K., Ho, T.: Fully sparse topic models. Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science. vol. 7523, pp. 490–505. Springer, Berlin (2012)
Wang, Y., Wong, G.: Stochastic blockmodels for directed graphs. J. Am. Stat. Assoc. 82, 8–19 (1987)
White, H., Boorman, S., Breiger, R.: Social structure from multiple networks. I. Blockmodels of roles and positions. Am. J. Sociol. 81, 730–780 (1976)
Xu, K., Hero III, A.: Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Social Computing, Behavioral-Cultural Modeling and Prediction, pp. 201–210. Springer, Berlin (2013)
Yang, T., Chi, Y., Zhu, S., Gong, Y., Jin, R.: Detecting communities and their evolutions in dynamic social networks: a bayesian approach. Mach. Learn. 82(2), 157–189 (2011)
Zanghi, H., Ambroise, C., Miele, V.: Fast online graph clustering via Erdos–Renyi mixture. Pattern Recognit. 41, 3592–3599 (2008)
Zanghi, H., Volant, S., Ambroise, C.: Clustering based on random graph model embedding vertex features. Pattern Recognit. Lett. 31(9), 830–836 (2010)
Zhou, D., Manavoglu, E., Li, J., Giles, C., Zha, H.: Probabilistic models for discovering e-communities. In: Proceedings of the 15th international conference on World Wide Web, pp. 173–182. ACM, New York (2006)

Acknowledgments

The authors would like to greatly thank the editor and the two reviewers for their helpful remarks on the first version of this paper, and Laurent Bergé for his kind suggestions and the development of visualization tools.

Author information

Authors and Affiliations

Laboratoire MAP5, UMR CNRS 8145, Université Paris Descartes & Sorbonne Paris Cité, Paris, France
C. Bouveyron & R. Zreik
Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, Paris, France
P. Latouche & R. Zreik

Corresponding author

Correspondence to P. Latouche.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 253 KB)

Appendix

1.1 Appendix 1: Optimization of R(Z)

The VEM update step for each distribution $R(Z_{ij}^{dn}), A_{ij}=1$, is given by

$$\begin{aligned} \begin{aligned} \log R(Z_{ij}^{dn})&= \mathrm {E}_{Z^{\backslash i,j,d,n},\theta }[\log p(W|A, Z, \beta ) \\&\quad + \log p(Z|A, Y, \theta )] + \mathrm {const}\\&= \sum _{k=1}^{K} Z_{ij}^{dnk}\sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv}\\&\quad + \sum _{q,r}^{Q}Y_{iq}Y_{jr}\sum _{k=1}^{K}Z_{ij}^{dnk}\mathrm {E}_{\theta _{qr}}[\log \theta _{qrk}] + \mathrm {const}\\&= \sum _{k=1}^{K}Z_{ij}^{dnk}\left( \sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv} \right. \\&\left. \quad +\sum _{q,r}^{Q}Y_{iq}Y_{jr}\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \right) \\&\quad + \mathrm {const}, \end{aligned} \end{aligned}$$

(9)

where all terms that do not depend on $Z_{ij}^{dn}$ have been put into the constant term $\mathrm {const}$. Moreover, $\psi (\cdot )$ denotes the digamma function. The functional form of a multinomial distribution is then recognized in (9)

$$\begin{aligned} R(Z_{ij}^{dn})={\mathcal {M}}\left( Z_{ij}^{dn};1,\phi _{ij}^{dn}=\left( \phi _{ij}^{dn1},\dots , \phi _{ij}^{dnK}\right) \right) , \end{aligned}$$

where

$$\begin{aligned} \phi _{ij}^{dnk} \propto \left( \prod _{v=1}^{V} \beta _{kv}^{W_{ij}^{dnv}}\right) \prod _{q,r}^{Q}\exp \left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) ^{Y_{iq}Y_{jr}}{.} \end{aligned}$$

$\phi _{ij}^{dnk}$ is the (approximate) posterior probability that the word $W_{ij}^{dn}$ belongs to topic $k$.
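
As a minimal sketch of this update (not the authors' implementation), the Python function below computes the vector $\phi _{ij}^{dn}$ for a single word. It assumes that the clusters of $i$ and $j$ are known, so that the product over $(q,r)$ reduces to the single pair with $Y_{iq}Y_{jr}=1$, that all entries of $\beta $ are strictly positive, and that the word is given by its vocabulary index. The names `update_phi` and `digamma` are hypothetical; the digamma helper uses the standard recurrence and asymptotic series only to avoid external dependencies.

```python
import math


def digamma(x: float) -> float:
    """Approximate digamma for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward using psi(x) = psi(x + 1) - 1 / x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def update_phi(word, beta, gamma_qr):
    """Return the length-K vector phi for one word on an edge (i, j).

    word     : vocabulary index v of the observed word (assumed encoding)
    beta     : K x V nested list of topic-word probabilities, all > 0
    gamma_qr : length-K Dirichlet parameters of the pair (q, r) such that
               vertex i is in cluster q and vertex j is in cluster r
    """
    psi_sum = digamma(sum(gamma_qr))
    log_phi = [
        math.log(beta[k][word]) + digamma(g) - psi_sum
        for k, g in enumerate(gamma_qr)
    ]
    # Normalize in log space for numerical stability.
    shift = max(log_phi)
    unnorm = [math.exp(lp - shift) for lp in log_phi]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

With a symmetric `gamma_qr` the digamma terms cancel, so the result is simply the normalized column of `beta` for the observed word.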

1.2 Appendix 2: Optimization of $R(\theta )$

The VEM update step for distribution $R(\theta )$ is given by

$$\begin{aligned} \begin{aligned} \log R(\theta )&= \mathrm {E}_{Z}[\log p(Z|A, Y, \theta )] + \log p(\theta ) + \mathrm {const}\\&= \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{q,r}^{Q}Y_{iq}Y_{jr}\\&\quad \times \sum _{k=1}^{K}\mathrm {E}_{Z_{ij}^{dn}}\left[ Z_{ij}^{dnk}\right] \log \theta _{qrk}\\&\quad + \sum _{q,r}^{Q}\sum _{k=1}^{K}(\alpha _k - 1)\log \theta _{qrk} + \mathrm {const}\\&= \sum _{q,r}^{Q}\sum _{k=1}^{K}\left( \alpha _{k} + \sum _{i \ne j}^{M} A_{ij}Y_{iq}Y_{jr}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\phi _{ij}^{dnk}-1\right) \\&\qquad \times \log \theta _{qrk} + \mathrm {const}. \end{aligned} \end{aligned}$$

We recognize the functional form of a product of Dirichlet distributions

$$\begin{aligned} \begin{aligned} R(\theta )= \prod _{q,r}^{Q}\mathrm {Dir}(\theta _{qr};\gamma _{qr}=(\gamma _{qr1},\dots , \gamma _{qrK})), \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \gamma _{qrk} = \alpha _{k} + \sum _{i \ne j}^{M} A_{ij}Y_{iq}Y_{jr}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\phi _{ij}^{dnk}. \end{aligned}$$
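
The update of $\gamma $ amounts to adding, for each pair of clusters, the soft topic counts of all words exchanged between them to the prior parameter. The sketch below illustrates this under assumed data structures that are not taken from the paper: hard labels stored as a list `labels`, and a dictionary `phi` keyed by the present edges $(i,j)$. The function name `update_gamma` is hypothetical.

```python
def update_gamma(alpha, labels, phi, Q):
    """Return gamma as a Q x Q x K nested list.

    alpha  : length-K Dirichlet prior parameters
    labels : labels[i] is the cluster index of vertex i (nonzero entry of Y_i)
    phi    : dict mapping each edge (i, j) with A_ij = 1 to a list of
             documents; each document is a list of length-K phi vectors,
             one per word
    Q      : number of clusters
    """
    K = len(alpha)
    # Start every (q, r) block from the prior alpha.
    gamma = [[list(alpha) for _ in range(Q)] for _ in range(Q)]
    for (i, j), docs in phi.items():
        q, r = labels[i], labels[j]
        for doc in docs:
            for phi_word in doc:
                for k in range(K):
                    gamma[q][r][k] += phi_word[k]
    return gamma
```

Cluster pairs that exchange no documents keep $\gamma _{qr}=\alpha $, which matches the formula since their inner sums are empty.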

1.3 Appendix 3: Derivation of the lower bound $\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) $

The lower bound $\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) $ in (7) is given by

$$\begin{aligned}&\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) \nonumber \\&\quad = \sum _{Z}\int _{\theta }R(Z,\theta ) \log \frac{p(W, Z, \theta |A, Y,\beta )}{R(Z,\theta )} \mathrm{d}\theta \nonumber \\&\quad = \mathrm {E}_{Z}[\log p(W|A, Z, \beta )] \nonumber \\&\qquad + \mathrm {E}_{Z, \theta }[\log p(Z|A, Y, \theta )] + \mathrm {E}_{\theta }[\log p(\theta )]\nonumber \\&\qquad - \mathrm {E}_{Z}[\log R(Z)]-\mathrm {E}_{\theta }[\log R(\theta )] \nonumber \\&\quad =\sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{k=1}^{K}\phi _{ij}^{dnk}\sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv} \nonumber \\&\qquad + \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}} \sum _{q,r}^{Q}Y_{iq}Y_{jr}\nonumber \\&\qquad \times \sum _{k=1}^{K}\phi _{ij}^{dnk}\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \\&\qquad + \sum _{q,r}^{Q}\left( \log \varGamma \left( \sum _{l=1}^{K}\alpha _{l}\right) - \sum _{l=1}^{K}\log \varGamma (\alpha _{l})\right. \nonumber \\&\qquad \left. +\sum _{k=1}^{K}(\alpha _{k}-1)\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \right) \nonumber \\&\qquad - \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{k=1}^{K}\phi _{ij}^{dnk}\log \phi _{ij}^{dnk}\nonumber \\&\qquad - \sum _{q,r}^{Q}\left( \log \varGamma \left( \sum _{l=1}^{K}\gamma _{qrl}\right) - \sum _{l=1}^{K}\log \varGamma (\gamma _{qrl})\right. \nonumber \\&\qquad \left. +\sum _{k=1}^{K}(\gamma _{qrk}-1)\left( \psi (\gamma _{qrk})-\psi \left( \sum _{l=1}^{K}\gamma _{qrl}\right) \right) \right) .\nonumber \end{aligned}$$

(10)
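
The two Dirichlet blocks of the bound, $\mathrm {E}_{\theta }[\log p(\theta )]-\mathrm {E}_{\theta }[\log R(\theta )]$, can be evaluated separately for each pair $(q,r)$. The sketch below, under the assumption of plain Python lists for $\alpha $ and $\gamma _{qr}$, computes this contribution for one pair; it equals minus the Kullback-Leibler divergence from $\mathrm {Dir}(\gamma _{qr})$ to $\mathrm {Dir}(\alpha )$, so it is zero when $\gamma _{qr}=\alpha $ and negative otherwise. The function names are hypothetical and the digamma helper is an approximation introduced only to keep the example dependency-free.

```python
import math


def digamma(x: float) -> float:
    """Approximate digamma for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward using psi(x) = psi(x + 1) - 1 / x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def dirichlet_bound_terms(alpha, gamma_qr):
    """E[log p(theta_qr)] - E[log R(theta_qr)] under R = Dir(gamma_qr)."""
    psi_sum = digamma(sum(gamma_qr))
    # E[log theta_qrk] = psi(gamma_qrk) - psi(sum_l gamma_qrl)
    e_log_theta = [digamma(g) - psi_sum for g in gamma_qr]
    e_log_prior = (
        math.lgamma(sum(alpha))
        - sum(math.lgamma(a) for a in alpha)
        + sum((a - 1.0) * e for a, e in zip(alpha, e_log_theta))
    )
    e_log_r = (
        math.lgamma(sum(gamma_qr))
        - sum(math.lgamma(g) for g in gamma_qr)
        + sum((g - 1.0) * e for g, e in zip(gamma_qr, e_log_theta))
    )
    return e_log_prior - e_log_r
```

Summing this quantity over all pairs $(q,r)$ gives the third and fifth blocks of the expansion; the remaining blocks only involve $\phi $, $\beta $ and the expectations $\psi (\gamma _{qrk})-\psi (\sum _{l}\gamma _{qrl})$.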

1.4 Appendix 4: Optimization of $\beta $

In order to maximize the lower bound $\tilde{{\mathcal {L}}}\left( R(\cdot ); Y, \beta \right) $, we isolate the terms in (10) that depend on $\beta $ and add Lagrange multipliers to satisfy the constraints $\sum _{v=1}^{V}\beta _{kv}=1,\forall k$

$$\begin{aligned} \tilde{{\mathcal {L}}}_{\beta }= & {} \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\sum _{k=1}^{K}\phi _{ij}^{dnk}\sum _{v=1}^{V}W_{ij}^{dnv}\log \beta _{kv}\\&+ \sum _{k=1}^{K}\lambda _{k}\left( \sum _{v=1}^{V}\beta _{kv}-1\right) . \end{aligned}$$

Setting the derivative, with respect to $\beta _{kv}$, to zero, we find

$$\begin{aligned} \beta _{kv}\propto \sum _{i \ne j}^{M}A_{ij}\sum _{d=1}^{D_{ij}}\sum _{n=1}^{N_{ij}^{d}}\phi _{ij}^{dnk}W_{ij}^{dnv}. \end{aligned}$$
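
In practice, this update accumulates soft word counts per topic and normalizes each row. The sketch below assumes, for illustration only, that all words on all present edges have been flattened into a list of `(v, phi_word)` pairs, where `v` is the vocabulary index of the word and `phi_word` its length-$K$ vector $\phi _{ij}^{dn}$. The function name `update_beta` and the uniform fallback for a topic with no mass are assumptions, not part of the paper.

```python
def update_beta(tokens, K, V):
    """Return the K x V matrix beta as a nested list.

    tokens : iterable of (v, phi_word) pairs, one per word on any edge
             with A_ij = 1 (assumed flattened representation)
    K, V   : number of topics and vocabulary size
    """
    counts = [[0.0] * V for _ in range(K)]
    for v, phi_word in tokens:
        for k in range(K):
            counts[k][v] += phi_word[k]
    beta = []
    for row in counts:
        total = sum(row)
        if total > 0.0:
            beta.append([c / total for c in row])
        else:
            # Assumption: fall back to a uniform row if topic k is unused.
            beta.append([1.0 / V] * V)
    return beta
```

Each row of the result sums to one, which is the constraint enforced by the Lagrange multipliers $\lambda _{k}$.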

1.5 Appendix 5: Optimization of $\rho $

Only the distribution $p(Y|\rho )$ in the complete data log-likelihood $\log p(A, Y|\rho , \pi )$ depends on the parameter vector $\rho $ of cluster proportions. Taking the log, we have

$$\begin{aligned} \log p(Y|\rho ) = \sum _{i=1}^{M}\sum _{q=1}^{Q}Y_{iq}\log \rho _{q}. \end{aligned}$$

Adding a Lagrange multiplier to enforce the constraint $\sum _{q=1}^{Q}\rho _{q}=1$ and setting the derivative with respect to $\rho _{q}$ to zero, we find

$$\begin{aligned} \rho _{q} \propto \sum _{i=1}^{M}Y_{iq}. \end{aligned}$$
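
For completeness, the normalizing constant can be made explicit. The following worked step is a sketch in which $\lambda $ denotes the Lagrange multiplier (a symbol introduced here for illustration); it uses the fact that each vertex belongs to exactly one cluster, so that $\sum _{q=1}^{Q}Y_{iq}=1$:

$$\begin{aligned} \frac{\partial }{\partial \rho _{q}}\left[ \sum _{i=1}^{M}\sum _{q'=1}^{Q}Y_{iq'}\log \rho _{q'} + \lambda \left( \sum _{q'=1}^{Q}\rho _{q'}-1\right) \right]&= \frac{1}{\rho _{q}}\sum _{i=1}^{M}Y_{iq} + \lambda = 0 \quad \Rightarrow \quad \rho _{q} = -\frac{1}{\lambda }\sum _{i=1}^{M}Y_{iq},\\ \sum _{q=1}^{Q}\rho _{q} = 1&\quad \Rightarrow \quad \lambda = -\sum _{q=1}^{Q}\sum _{i=1}^{M}Y_{iq} = -M, \qquad \rho _{q} = \frac{1}{M}\sum _{i=1}^{M}Y_{iq}. \end{aligned}$$

The estimate of $\rho _{q}$ is therefore the fraction of vertices assigned to cluster $q$.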

1.6 Appendix 6: Optimization of $\pi $

Only the distribution $p(A|Y, \pi )$ in the complete data log-likelihood $\log p(A, Y|\rho , \pi )$ depends on the parameter matrix $\pi $ of connection probabilities. Taking the log we have

$$\begin{aligned}&\log p(A|Y, \pi )\\&\quad = \sum _{i \ne j}^{M}\sum _{q,r}^{Q}Y_{iq}Y_{jr}\Big (A_{ij}\log \pi _{qr} +(1-A_{ij})\log (1-\pi _{qr})\Big ). \end{aligned}$$

Setting the derivative with respect to $\pi _{qr}$ to zero, we obtain

$$\begin{aligned} \pi _{qr} = \frac{ \sum _{i \ne j}^{M}Y_{iq}Y_{jr}A_{ij}}{ \sum _{i \ne j}^{M}Y_{iq}Y_{jr}}. \end{aligned}$$
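
For a given pair of clusters $(q,r)$, the estimate is the number of observed edges from vertices of cluster $q$ to vertices of cluster $r$, divided by the number of ordered pairs $(i,j)$, $i \ne j$, with $i$ in cluster $q$ and $j$ in cluster $r$. A minimal sketch follows, assuming a directed adjacency matrix stored as a nested list and hard labels stored as a list; the function name `update_pi` and the convention of returning 0 for a pair of clusters with no vertex pairs are assumptions.

```python
def update_pi(A, labels, Q):
    """Return the Q x Q matrix of estimated connection probabilities.

    A      : M x M binary adjacency matrix (nested list, directed)
    labels : labels[i] is the cluster index of vertex i
    Q      : number of clusters
    """
    M = len(A)
    edges = [[0] * Q for _ in range(Q)]
    pairs = [[0] * Q for _ in range(Q)]
    for i in range(M):
        for j in range(M):
            if i == j:
                continue  # self-loops are excluded (sum over i != j)
            q, r = labels[i], labels[j]
            pairs[q][r] += 1
            edges[q][r] += A[i][j]
    return [
        [edges[q][r] / pairs[q][r] if pairs[q][r] else 0.0 for r in range(Q)]
        for q in range(Q)
    ]
```

For instance, with three vertices labeled `[0, 0, 1]`, the block $(1,1)$ contains no ordered pair of distinct vertices, so its entry falls back to the assumed default.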

1.7 Appendix 7: Model selection

Assuming that the prior distribution over the model parameters $(\rho , \pi , \beta )$ can be factorized, the integrated complete data log-likelihood $\log p(A, W, Y|K, Q)$ is given by

$$\begin{aligned} \begin{aligned}&\log p(A, W, Y|K, Q)\\&\quad = \log \int _{\rho ,\pi ,\beta } p(A, W, Y, \rho , \pi , \beta |K, Q) \mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \\&\quad = \log \int _{\rho ,\pi ,\beta } p(A, W, Y|\rho , \pi , \beta , K, Q)\\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta . \end{aligned} \end{aligned}$$

Note that the dependency on K and Q is made explicit here, in all expressions. In all other sections of the paper, we did not include these terms to keep the notations uncluttered. We find

$$\begin{aligned}&\log p(A, W, Y|K, Q)\nonumber \\&\quad = \log \int _{\rho , \pi , \beta }\left( \sum _{Z}\int _{\theta }p(A, W, Y, Z, \theta |\rho , \pi , \beta , K, Q)\mathrm{d}\theta \right) \nonumber \\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\rho , \pi , \beta } \left( \sum _{Z}\int _{\theta }p(W, Z, \theta |A, Y, \beta , K, Q)p(A, Y|\rho , \pi , Q)\mathrm{d}\theta \right) \nonumber \\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\rho , \pi , \beta }p(W|A, Y, \beta , K, Q) p(A|Y, \pi , Q)p(Y|\rho , Q)\\&\qquad \times p(\rho |Q)p(\pi |Q)p(\beta |K)\mathrm{d}\rho \mathrm{d}\pi \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\beta }p(W|A, Y, \beta , K, Q)\nonumber \\&\qquad \times p(\beta |K) \mathrm{d}\beta + \log \int _{\pi } p(A|Y, \pi , Q)p(\pi |Q)\mathrm{d}\pi \nonumber \\&\qquad + \log \int _{\rho }p(Y|\rho , Q)p(\rho |Q)\mathrm{d}\rho .\nonumber \end{aligned}$$

(11)

Following the derivation of the ICL criterion, we apply a Laplace (BIC-like) approximation to the second term of Eq. (11). For the third term, we consider a Jeffreys prior distribution for $\rho $ and use Stirling's formula for large values of $M$. We obtain

$$\begin{aligned}&\log \int _{\pi } p(A|Y, \pi , Q)p(\pi |Q)\mathrm{d}\pi \\&\quad \approx \max _{\pi }\log p(A|Y, \pi , Q) - \frac{Q^2}{2}\log M(M-1), \end{aligned}$$

as well as

$$\begin{aligned}&\log \int _{\rho }p(Y|\rho , Q)p(\rho |Q)\mathrm{d}\rho \\&\quad \approx \max _{\rho } \log p(Y|\rho , Q) - \frac{Q-1}{2}\log M. \end{aligned}$$

For more details, we refer to Biernacki et al. (2000). Furthermore, we emphasize that adding these two approximations leads to the ICL criterion for the SBM model, as derived by Daudin et al. (2008)

$$\begin{aligned} \begin{aligned} ICL_{SBM}&= \max _{\pi }\log p(A|Y, \pi , Q)\\&\quad - \frac{Q^2}{2}\log M(M-1) + \max _{\rho } \log p(Y|\rho , Q)\\&\quad - \frac{Q-1}{2}\log M \\&= \max _{\rho , \pi } \log p(A,Y|\rho , \pi , Q)\\&\quad - \frac{Q^2}{2}\log M(M-1) - \frac{Q-1}{2}\log M. \end{aligned} \end{aligned}$$

In Daudin et al. (2008), $M(M-1)$ is replaced by $M(M-1)/2$ and $Q^2$ by $Q(Q+1)/2$ since they considered undirected networks.
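
The directed version of $ICL_{SBM}$ given above can be evaluated directly from the adjacency matrix and a hard clustering, since both maximizations have the closed forms of Appendices 5 and 6. The following sketch assumes nested-list inputs, at least two vertices, and the convention $0 \log 0 = 0$; the function name `icl_sbm` is hypothetical.

```python
import math


def icl_sbm(A, labels, Q):
    """ICL_SBM for a directed binary network with hard labels (sketch)."""
    M = len(A)
    sizes = [0] * Q
    for q in labels:
        sizes[q] += 1
    # max over rho of log p(Y | rho), attained at rho_q = sizes[q] / M
    loglik = sum(n * math.log(n / M) for n in sizes if n > 0)
    edges = [[0] * Q for _ in range(Q)]
    pairs = [[0] * Q for _ in range(Q)]
    for i in range(M):
        for j in range(M):
            if i != j:
                pairs[labels[i]][labels[j]] += 1
                edges[labels[i]][labels[j]] += A[i][j]
    # max over pi of log p(A | Y, pi), attained at pi_qr = edges / pairs
    for q in range(Q):
        for r in range(Q):
            e, p = edges[q][r], pairs[q][r]
            if 0 < e < p:  # blocks with e = 0 or e = p contribute zero
                loglik += e * math.log(e / p) + (p - e) * math.log(1 - e / p)
    penalty = 0.5 * Q * Q * math.log(M * (M - 1)) + 0.5 * (Q - 1) * math.log(M)
    return loglik - penalty
```

Comparing the returned value across candidate values of $Q$, each with its own estimated labels, is how such a criterion is typically used; the undirected variant would change the penalty as described above.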

Now, it is worth taking a closer look at the first term of Eq. (11). This term involves a marginalization over $\beta $. Let us emphasize that $p(W|A, Y, \beta , K, Q)$ is related to the LDA model and involves a marginalization over $\theta $ (and Z). Because we aim at approximating the first term of Eq. (11), also with a Laplace (BIC-like) approximation, it is crucial to identify the number of observations in the associated likelihood term $p(W|A, Y, \beta , K, Q)$. As pointed out in Sect. 2.4, given Y (and $\theta $), it is possible to reorganize the documents in W as $W=({\tilde{W}}_{qr})_{qr}$ in such a way that all words in ${\tilde{W}}_{qr}$ follow the same mixture distribution over topics. Each aggregated document ${\tilde{W}}_{qr}$ has its own vector $\theta _{qr}$ of topic proportions and, since the distribution over $\theta $ factorizes ($p(\theta )=\prod _{q,r}^{Q}p(\theta _{qr})$), we find

$$\begin{aligned} \begin{aligned}&p(W|A, Y, \beta , K, Q)\\&\quad = \int _{\theta } p(W |A, Y, \theta , \beta , K, Q)p(\theta |K, Q)\mathrm{d}\theta \\&\quad = \prod _{q,r}^{Q}\int _{\theta _{qr}}p({\tilde{W}}_{qr}|\theta _{qr}, \beta , K, Q)p(\theta _{qr}| K)\mathrm{d}\theta _{qr} \\&\quad = \prod _{q,r}^{Q} \ell ({\tilde{W}}_{qr}|\beta , K, Q), \end{aligned} \end{aligned}$$

where $\ell ({\tilde{W}}_{qr}|\beta , K, Q)$ is exactly the likelihood term of the LDA model associated with document ${\tilde{W}}_{qr}$, as described in Blei et al. (2003). Thus

$$\begin{aligned}&\log \int _{\beta }p(W|A, Y, \beta , K, Q) p(\beta |K) \mathrm{d}\beta \nonumber \\&\quad = \log \int _{\beta } p(\beta |K) \prod _{q,r}^{Q} \ell ({\tilde{W}}_{qr}|\beta , K, Q)\mathrm{d}\beta . \end{aligned}$$

(12)

Applying a Laplace approximation on Eq. (12) is then equivalent to deriving a BIC-like criterion for the LDA model with documents in $W=({\tilde{W}}_{qr})_{qr}$. In the LDA model, the number of observations in the penalization term of BIC is the number of documents [see Than and Ho (2012) for instance]. In our case, this leads to

$$\begin{aligned}&\log \int _{\beta }p(W|A, Y, \beta , K, Q) p(\beta |K) \mathrm{d}\beta \nonumber \\&\quad \approx \max _{\beta } \log p(W|A, Y, \beta , K, Q) - \frac{K(V-1)}{2}\log Q^2.\nonumber \\ \end{aligned}$$

(13)

Unfortunately, $\log p(W|A, Y, \beta , K, Q)$ is not tractable and so we propose to replace it with its variational approximation $\tilde{{\mathcal {L}}}$, after convergence of the C-VEM algorithm. By analogy with $ICL_{SBM}$, we call the corresponding criterion $BIC_{LDA|Y}$ such that

$$\begin{aligned} \log p(A, W, Y|K, Q) \approx BIC_{LDA|Y} + ICL_{SBM}. \end{aligned}$$
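
Putting the pieces together, the criterion only requires the converged lower bound, the value of $ICL_{SBM}$, and the penalty of Eq. (13) with $Q^2$ aggregated documents. The sketch below assumes that these two quantities have already been computed elsewhere; the function and argument names are hypothetical.

```python
import math


def bic_lda_given_y(lower_bound, K, V, Q):
    """BIC_{LDA|Y}: converged variational bound minus the Eq. (13) penalty.

    lower_bound : value of the variational lower bound after convergence
    K, V, Q     : number of topics, vocabulary size, number of clusters
    """
    return lower_bound - 0.5 * K * (V - 1) * math.log(Q ** 2)


def stbm_criterion(lower_bound, icl_sbm_value, K, V, Q):
    """Approximation of log p(A, W, Y | K, Q) as BIC_{LDA|Y} + ICL_{SBM}."""
    return bic_lda_given_y(lower_bound, K, V, Q) + icl_sbm_value
```

The pair $(K,Q)$ with the highest value of `stbm_criterion` would then be retained, each candidate pair being evaluated after its own run of the inference algorithm.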

About this article

Cite this article

Bouveyron, C., Latouche, P. & Zreik, R. The stochastic topic block model for the clustering of vertices in networks with textual edges. Stat Comput 28, 11–31 (2018). https://doi.org/10.1007/s11222-016-9713-7

Received: 29 April 2016
Accepted: 11 October 2016
Published: 21 October 2016
Issue Date: January 2018
DOI: https://doi.org/10.1007/s11222-016-9713-7
