Abstract
The Stochastic Block Model (SBM) provides a statistical tool for modeling and clustering network data. In this paper, we propose an extension of this model for discrete-time dynamic networks that takes into account the variability in node degrees, allowing us to model a broader class of networks. We develop a probabilistic model that generates temporal graphs with a dynamic cluster structure and time-dependent degree corrections for each node. Thanks to these degree corrections, the nodes can have variable in- and out-degrees, allowing us to model complex cluster structures as well as interactions that decrease or increase over time. We compare the proposed model to a model without degree correction and highlight its advantages in the case of inhomogeneous degree distributions within clusters and in the recovery of unstable cluster dynamics. We propose an inference procedure based on Variational Expectation-Maximization (VEM) that also provides the means to estimate the time-dependent degree corrections. Extensive experiments on simulated and real datasets confirm the benefits of our approach and show the effectiveness of the proposed algorithm.
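For concreteness, the generative process described in the abstract (Markov cluster dynamics plus Poisson edge counts with time-dependent degree corrections) can be illustrated as follows. This is a minimal sketch, not the authors' reference implementation; the function names and the Knuth Poisson sampler are ours.

```python
import math
import random

def knuth_poisson(rate, rng):
    """Sample from a Poisson distribution (Knuth's method; fine for small rates)."""
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_dynamic_graph(N, K, T, alpha, pi, gamma, mu, nu, seed=0):
    """Sketch of a degree-corrected dynamic SBM generative process:
    each node follows a Markov chain over K clusters (initial law alpha,
    transition matrix pi); the directed edge count X[t][i][j] is Poisson
    with rate mu[t][i] * nu[t][j] * gamma[z_i][z_j], so mu and nu act as
    time-dependent out- and in-degree corrections."""
    rng = random.Random(seed)
    # latent cluster trajectories
    z = [[0] * N for _ in range(T)]
    for i in range(N):
        z[0][i] = rng.choices(range(K), weights=alpha)[0]
        for t in range(1, T):
            z[t][i] = rng.choices(range(K), weights=pi[z[t - 1][i]])[0]
    # weighted directed adjacency tensors (no self-loops)
    X = [[[0] * N for _ in range(N)] for _ in range(T)]
    for t in range(T):
        for i in range(N):
            for j in range(N):
                if i != j:
                    rate = mu[t][i] * nu[t][j] * gamma[z[t][i]][z[t][j]]
                    X[t][i][j] = knuth_poisson(rate, rng)
    return z, X
```

A diagonally dominant `pi` yields temporally stable clusters, while varying `mu` and `nu` over time produces the increasing or decreasing interaction patterns mentioned above.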
Notes
We could not compare pdc-dsbm directly to the authors’ algorithm because their R package dynsbm V0.7 only implements Bernoulli, Multinomial and Gaussian distributions; we therefore re-implemented it for Poisson distributions.
More noise needs to be added for \(M_+\) since margins greater than one spread the classes apart.
References
Abbe E (2017) Community detection and stochastic block models: recent developments. J Mach Learn Res 18(1):6446–6531
Affeldt S, Labiod L, Nadif M (2021) Regularized bi-directional co-clustering. Stat Comput 31(3):1–17
Ailem M, Role F, Nadif M (2017) Model-based co-clustering for the effective handling of sparse data. Pattern Recognit 72:108–122
Ailem M, Role F, Nadif M (2017) Sparse Poisson latent block model for document clustering. IEEE Trans Knowl Data Eng 29(7):1563–1576
Airoldi E, Blei D, Fienberg S, Xing E (2008) Mixed membership stochastic blockmodels. J Mach Learn Res 9:1981–2014
Banerjee A, Dhillon I, Ghosh J, Merugu S, Modha DS (2007) A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. J Mach Learn Res 8(67):1919–1986
Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Bartolucci F, Pandolfi S (2020) An exact algorithm for time-dependent variational inference for the dynamic stochastic block model. Pattern Recognit Lett 138:362–369
Benzecri J-P (1973) L’analyse des données, tome 2: l’analyse des correspondances. Dunod, Paris
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Bock H-H (2020) Co-clustering for object by variable data matrices. In: Imaizumi T, Nakayama A, Yokoyama S (eds) Advanced studies in behaviormetrics and data science: essays in Honor of Akinori Okada. Springer Singapore, Singapore, pp 3–17
Chi Y, Song X, Zhou D, Hino K, Tseng BL (2007) Evolutionary spectral clustering by incorporating temporal smoothness. In: KDD. Association for Computing Machinery, pp 153–162
Corneli M, Latouche P, Rossi F (2016) Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks. Neurocomputing 192:81–91
Corneli M, Latouche P, Rossi F (2018) Multiple change points detection and clustering in dynamic networks. Stat Comput 28(5):989–1007
Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183
Fu W, Song L, Xing EP (2009) Dynamic mixed membership blockmodel for evolving networks. In: ICML, pp 329–336
Ghahramani Z, Jordan MI (1997) Factorial hidden Markov models. Mach Learn 29(2–3):245–273
Govaert G, Nadif M (2005) An EM algorithm for the block mixture model. IEEE Trans Pattern Anal Mach Intell 27(4):643–647
Govaert G, Nadif M (2013) Co-clustering: models, algorithms and applications. Wiley, Hoboken
Govaert G, Nadif M (2018) Mutual information, phi-squared and model-based co-clustering for contingency tables. Adv Data Anal Classif 12(3):455–488
Greenacre M (2007) Correspondence analysis in practice. Chapman & Hall/CRC, Boca Raton
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Karrer B, Newman ME (2011) Stochastic blockmodels and community structure in networks. Phys Rev E Stat Nonlinear Soft Matter Phys 83(1):016107
Lin Y-R, Chi Y, Zhu S, Sundaram H, Tseng BL (2009) Analyzing communities and their evolutions in dynamic social networks. ACM Trans Knowl Discovery Data 3(2):1–31
Liu S, Wang S, Krishnan R (2014) Persistent community detection in dynamic social networks. In: Tseng VS, Ho TB, Zhou Z-H, Chen ALP, Kao H-Y (eds) Advances in knowledge discovery and data mining. Springer, Berlin, pp 78–89
Mariadassou M, Robin S, Vacher C (2010) Uncovering latent structure in valued graphs: a variational approach. Ann Appl Stat 4(2):715–742
Matias C, Miele V (2017) Statistical clustering of temporal networks through a dynamic stochastic block model. J R Stat Soc Ser B Stat Methodol 79(4):1119–1141
Matias C, Rebafka T, Villers F (2018) A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika 105(3):665–680
Meng X-L, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278
Neal RM, Hinton GE (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models. Springer, Berlin, pp 355–368
Qiao M, Yu J, Bian W, Li Q, Tao D (2017) Improving stochastic block models by incorporating power-law degree characteristic. In: IJCAI international joint conference on artificial intelligence, pp 2620–2626
Rastelli R, Latouche P, Friel N (2018) Choosing the number of groups in a latent stochastic blockmodel for dynamic networks. Netw Sci 6(4):469–493
Razaee Z, Amini A, Li JJ (2019) Matched bipartite block model with covariates. J Mach Learn Res 20:1–44
Salah A, Nadif M (2019) Directional co-clustering. Adv Data Anal Classif 13(3):591–620
Schepers J, Bock H-H, Van Mechelen I (2017) Maximal interaction two-mode clustering. J Classif 34(1):49–75
Sewell DK, Chen Y (2016) Latent space models for dynamic networks with weighted edges. Soc Netw 44:105–116
Snijders T, Nowicki K (1997) Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J Classif 14:75–100
Wang YJ, Wong GY (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82(397):8–19
Xu KS, Hero AO (2014) Dynamic stochastic blockmodels for time-evolving social networks. IEEE J Sel Top Signal Process 8(4):552–562
Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Mach Learn 82(2):157–189
Acknowledgements
We thank the three anonymous reviewers for their detailed comments, which greatly helped us improve this manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Derivation of the objective criterion (6)
We derive the criterion (6) in the case of a constant number of nodes (\(\forall t, \, V^t = V\)), the other case easily follows. Let Q be a probability over the space of complete data \(\mathcal {Z}\), i.e. the set of all possible latent trajectories for N nodes, over K possible states and T time steps. From Neal and Hinton (1998), we have:
Let Q factorize as N independent inhomogeneous Markov models:
where Q is parameterized by \({\varvec{q}}= \Big (\big (q(i, k)\big )_{ik}, \big (q(t, i, k, \ell )\big )_{tikl}\Big )\), with \(q(i, k) = Q(Z_{ik}^{1}=1)\), \(q(t, i, k, \ell ) = Q(Z_{i\ell }^{t}=1 | Z_{ik}^{t-1}=1)\).
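Under this factorization, the marginal probabilities \(Q(Z_{ik}^{t}=1)\) used throughout the appendix follow from the initial and transition parameters by a forward recursion over each node's chain; a minimal sketch for a single node (function name ours):

```python
def variational_marginals(q_init, q_trans):
    """Forward recursion for one node's inhomogeneous Markov chain.
    q_init[k]          = Q(Z^1_k = 1)
    q_trans[t][k][l]   = Q(Z^{t+1}_l = 1 | Z^t_k = 1)  (one K x K matrix per step)
    Returns marginals m[t][k] = Q(Z^t_k = 1), computed via
    m[t][l] = sum_k m[t-1][k] * q_trans[t-1][k][l]."""
    K = len(q_init)
    marginals = [list(q_init)]
    for trans in q_trans:
        prev = marginals[-1]
        marginals.append([sum(prev[k] * trans[k][l] for k in range(K))
                          for l in range(K)])
    return marginals
```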
First, from the variational distributions, we have:
Secondly, from the model, we have:
To develop \(F({\varvec{q}}, {\varvec{\theta }})\) we rely on the following Lemma.
Lemma 1
We have the following equalities:
Thus, from expressions (14) and (15) and Lemma 1, the variational lower bound on the log-likelihood of the model is given by:
Proof
(of Lemma 1) As the Markov chains \({\varvec{Z}}_i\) and \({\varvec{Z}}_j\) are independent under the distribution Q, it suffices to prove that \(\mathbb {E}_Q(Z_{ik}^{t}) = q(t, i, k)\). The proofs for (16a) and (16b) are analogous.
In the paper, the latent processes are defined over the index set \(\{1, \ldots , T\}\). In the following, we consider a virtual source cluster \(k_s\) at virtual time step \(t = 0\) from which every node starts. Let \(\mathcal {Z}_i^{(t, t')}(k)\) be all the possible latent trajectories for node i over \(t' - t\) time steps, starting at cluster k at time t:
For \(t \le \tau \le t'\) and \(k' \in \{1, \ldots , K\}\), we define \(\mathcal {Z}_i^{(t, t')}(k, \tau , k')\) as the set of all paths of \(\mathcal {Z}_i^{(t, t')}(k)\) that pass through cluster \(k'\) at time step \(\tau \):
Let \(Q^i\) be the distribution for node i. As the N chains are independent:
Note that \(\mathcal {Z}_i^{(0, T)}(k_s, t, k)\) decomposes as \(\mathcal {Z}_i^{(0, t-1)}(k_s) \times \mathcal {Z}_i^{(t, T)}(k)\). In the following, we identify the elements of the sets with their index (\((k_0^{(c)}, \ldots , k_T^{(c)})\) denotes the cth element of \(\mathcal {Z}_i^{(0, T)}(k)\)). For consistency of notation, we define \(q(1, i, k_s, k') = q(i, k')\), the transition probability from the virtual cluster \(k_s\) at \(t=0\). We can then write:
The second sum in the last equation corresponds to summing over all possible paths in a chain of length \(T-t\) starting at cluster k, so it equals one. Now, recall that:
This concludes the proof. \(\square \)
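The identity at the heart of this proof — the marginal \(\mathbb {E}_Q(Z_{ik}^{t})\) equals the sum, over all latent trajectories passing through cluster k at time t, of the product of the transition probabilities along the path — can be verified by brute-force path enumeration on a small chain (a sketch with our own names; time indices are 0-based in the code):

```python
from itertools import product

def marginal_by_path_sum(q_init, q_trans, t, k):
    """Sum, over every trajectory (k_0, ..., k_{T-1}) with k_t = k, of
    q_init[k_0] * prod_s q_trans[s][k_s][k_{s+1}]; Lemma 1 says this
    equals the forward-recursion marginal Q(Z^t_k = 1)."""
    K = len(q_init)
    T = len(q_trans) + 1
    total = 0.0
    for path in product(range(K), repeat=T):
        if path[t] != k:   # keep only paths through cluster k at time t
            continue
        p = q_init[path[0]]
        for s in range(T - 1):
            p *= q_trans[s][path[s]][path[s + 1]]
        total += p
    return total
```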
Derivation of the expectation step
Here, we present a way to derive the formulae proposed in the E-step for a fixed set of nodes (i.e. \(\forall t,\; V^t = V\)). The results for a variable number of nodes follow easily.
As shown in Bartolucci and Pandolfi (2020), the exact VE step can be performed but is computationally heavy. In fact, in order to optimize \(F({\varvec{q}}, {\varvec{\theta }})\) w.r.t. \(q(t, i, k, \ell )\), we notice that every \(q(t', i, k')\) with \(t' \ge t\) depends on \(q(t, i, k, \ell )\). Here, we instead propose a VE step that merely increases \(F({\varvec{q}}, {\varvec{\theta }})\) w.r.t. the variational parameters \({\varvec{q}}\).
We consider the variational parameters \({\varvec{q}}(i) = \Big (\big (q(i, k)\big )_{k}, \big (q(t, i, k, \ell )\big )_{tkl}\Big )\) as well as auxiliary variables \({\varvec{q}}_m^t(i) = \big (q(t, i, k)\big )_{ik}\) for the marginal probabilities, where \(q(t, i, k) = Q(Z_{ik}^{t}=1)\).
We first note that F can be decomposed over each node and cluster thanks to the variational approximation: \(F({\varvec{q}}, {\varvec{\theta }}) = \sum _{i\ell } F_{i\ell } ({\varvec{q}}(i), {\varvec{q}}_m(-i), {\varvec{\theta }})\) where \({\varvec{q}}_m(-i) = \big ({\varvec{q}}_m^1(j), \ldots , {\varvec{q}}_m^T(j)\big )_{j \ne i}\) and
where we note \(\Phi _{ijk\ell }^{t} = \phi _{ijk\ell }^{t}\phi _{ji\ell k}^{t}\).
For constant marginal probabilities \({\varvec{q}}_m(-i)\), we optimize
by applying a single step of coordinate ascent to each coordinate block \(({\varvec{q}}^t(i),{\varvec{q}}_m^t(i))\), keeping the other blocks \(({\varvec{q}}^{-t}(i),{\varvec{q}}_m^{-t}(i))\) fixed. We apply this procedure sequentially, for t in \(\{1, \ldots , T\}\), and update the marginal probabilities q(t, i, k) with the obtained transition probabilities \(q(t, i, k, \ell )\) at each time step.
The formulae of the E-step can be obtained as follows. Since \(q(t, i, k) = \sum _{k'} q(t-1, i, k') q(t, i, k', k)\), the block \(({\varvec{q}}^t(i),{\varvec{q}}_m^t(i))\) only depends on \({\varvec{q}}^t(i)\). For \(t \ge 2\), we can write:
Let \(\mathcal {L}({\varvec{q}}^t(i), \lambda )\) be the Lagrangian of the constrained optimization problem:
For \((q(t', i, k))_{k \in \{1, \ldots , K\}, t'\ne t}\) constant and \(s \in \{1, \ldots , T\}\), we have:
and
where \({\varvec{q}}(t+1,i,\ell , :) = (q(t+1,i,\ell , 1), \ldots , q(t+1,i,\ell , K))^\intercal \) and \({\varvec{\pi }}_{\ell , :} = \big (\pi _{1\ell }, \ldots , \pi _{K\ell }\big )^\intercal \). Let \({d_{ik}^t = D_{\text {KL}}({\varvec{q}}(t,i,k, :) || {\varvec{\pi }}_{k, :})}\). We then have:
Setting the derivative of the Lagrangian to zero, we have:
Thus, \(q(t, i, k, \ell ) \propto \pi _{k\ell } \exp (- d_{i\ell }^{t+1}) \prod _{j \ne i} \prod _{\ell '} {\Phi _{ij\ell \ell '}^{t}}^{q(t, j, \ell ')}\). This justifies the proposed formula. Note that, contrary to Matias and Miele (2017), this formula applies a penalty term \(\exp (- d_{i\ell }^{t+1})\) to the mixture proportions; the E-step formula of Matias and Miele (2017) appears to be an approximation of ours. In our experiments, we observed that our formula gives better clustering results when the data have many cluster transitions (\({\varvec{\pi }}\) has low trace) and the margins are not smoothed, and comparable results when the margins are smoothed.
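The resulting update can be sketched as follows. This is a simplified illustration for one node and one time step: the emission factors \(\Phi \) are assumed precomputed and folded into `log_phi`, and all function names are ours.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def transition_update(pi, q_next, log_phi):
    """Sketch of the penalized E-step update
        q(t,i,k,l) ∝ pi[k][l] * exp(-d_l) * exp(log_phi[l]),
    where d_l = KL(q(t+1,i,l,:) || pi[l,:]) penalizes moving into a
    cluster whose outgoing transition row deviates from the prior row,
    and log_phi[l] stands for sum_{j!=i} sum_{l'} q(t,j,l') * log Phi_{ij l l'}
    (assumed precomputed from the emission terms, not defined here)."""
    K = len(pi)
    d = [kl(q_next[l], pi[l]) for l in range(K)]
    q = [[pi[k][l] * math.exp(-d[l] + log_phi[l]) for l in range(K)]
         for k in range(K)]
    # normalize each row so that q(t,i,k,:) is a probability distribution
    return [[v / sum(row) for v in row] for row in q]
```

When the next-step transition rows already match \({\varvec{\pi }}\) (so every \(d_{i\ell }^{t+1} = 0\)) and the emission factors are uninformative, the update reduces to the prior rows, as expected.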
Derivation of the M-step
To update the parameters in the maximization step, we increase \(F({\varvec{q}}, {\varvec{\theta }})\) w.r.t. \({\varvec{\theta }}\) by maximizing F for each parameter, conditionally on the others. We first update the mixture proportions \({\varvec{\alpha }}\) and \({\varvec{\pi }}\), since they only depend on \({\varvec{q}}\). Next, we update \({\varvec{\gamma }}\), then \({\varvec{\mu }}\) and finally \({\varvec{\nu }}\). The updates (8a, 8b) with respect to \({\varvec{\alpha }}\) and \({\varvec{\pi }}\) are direct. Concerning \({\varvec{\mu }}\), \({\varvec{\nu }}\) and \({\varvec{\gamma }}\), the lower-bound on the log-likelihood of the model is:
By computing the derivative of (17) w.r.t. \(\mu _i^t\), \(\nu _j^t\) and \(\gamma _{k\ell }\) and setting it to zero we obtain the maximization step in (8d, 8e, 8c).
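As an illustration of the "direct" updates for \({\varvec{\alpha }}\) and \({\varvec{\pi }}\), here is the usual HMM-style form; this is a sketch under the assumption that (8a, 8b) take this standard shape (the updates for \({\varvec{\mu }}\), \({\varvec{\nu }}\), \({\varvec{\gamma }}\) are not reproduced). All names are ours.

```python
def update_mixture_params(q_init, q_trans, marginals):
    """Sketch of the direct M-step updates for the mixture proportions:
        alpha_k ∝ sum_i q(1, i, k)
        pi_{kl} ∝ sum_{t>=2} sum_i q(t-1, i, k) * q(t, i, k, l)
    q_init[i][k], q_trans[i][t][k][l] and marginals[i][t][k] index nodes i,
    time steps t (0-based) and clusters k, l."""
    N, K = len(q_init), len(q_init[0])
    alpha = [sum(q_init[i][k] for i in range(N)) / N for k in range(K)]
    pi_num = [[0.0] * K for _ in range(K)]
    for i in range(N):
        for t, trans in enumerate(q_trans[i]):
            for k in range(K):
                for l in range(K):
                    pi_num[k][l] += marginals[i][t][k] * trans[k][l]
    pi = []
    for row in pi_num:
        s = sum(row)
        pi.append([v / s for v in row] if s > 0 else [1.0 / K] * K)
    return alpha, pi
```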
Model selection with the ICL criterion
In order to choose the appropriate number of clusters K, we consider the Integrated Classification Likelihood (ICL) criterion (Biernacki et al. 2000), as proposed in Daudin et al. (2008) for the static SBM and in Corneli et al. (2016), Matias and Miele (2017) and Rastelli et al. (2018) for dynamic models based on the SBM. The ICL criterion for a model \(M_K\) with K clusters is defined as:
where \({\varvec{\theta }}= ({\varvec{\alpha }}, {\varvec{\pi }}, {\varvec{\mu }}, {\varvec{\nu }}, {\varvec{\gamma }}) \in {\varvec{\varTheta }}\), \({\varvec{\varTheta }} = A_K \times A_K^K \times \mathbb {R}^{+TN} \times \mathbb {R}^{+TN} \times \mathbb {R}^{+TK^2}\), \(A_K\) is the K-dimensional simplex and g is the density of the prior distribution on \({\varvec{\varTheta }}\).
Let \(g_{\pi _k}({\varvec{\pi }}_k|M_{K}) = \frac{1}{B(\delta , \ldots , \delta )}\prod _{k'} \pi _{kk'}^{\delta - 1}\) be a prior on \({\varvec{\pi }}_k\), the kth row of \({\varvec{\pi }}\).
where \(n_{kk'}^z = \sum _{t \ge 2}\sum _i Z_{ik}^{t-1}Z_{ik'}^{t}\) and \(I_\alpha \) is computed as in Daudin et al. (2008).
We use Stirling’s formula \(\log \varGamma (x) \approx (x - \frac{1}{2}) \log (x - 1) - (x - 1) + \frac{1}{2} \log (2\pi )\), which remains valid for small values of x. Thus, Stirling’s formula for \(\log \varGamma (n_{kk'}^z + \delta )\) remains accurate even for small values of \(n_{kk'}^z\). Following Biernacki et al. (2000), it can be shown that, assuming \(K = o(N)\) and removing terms in O(1) (since the error term of the BIC is O(1)):
where \(n_{k.}^z = \sum _{k'} n_{kk'}^z\) and \(Z_{.k}^1 = \sum _i Z_{ik}^1\). Using the hypothesis \(n_{k.}^z = \frac{N(T-1)}{K}\), we have \(\sum _k \log n_{k.}^z = K\log N(T-1) + o(N)\). Replacing \({\varvec{Z}}\) by \(\widehat{{\varvec{Z}}}\), the estimated partition, we obtain:
The term \(\frac{K - 1}{2} \log N\) is due to the estimated parameter \({\varvec{\alpha }}\). In Matias and Miele (2017), the parameter \({\varvec{\alpha }}\) is not estimated and is considered to be equal to the stationary distribution of \({\varvec{\pi }}\). Omitting the term due to \({\varvec{\alpha }}\) in the proposed ICL results in the same ICL as proposed in Matias and Miele (2017).
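The Stirling-type approximation invoked in this derivation — with the standard constant \(\frac{1}{2}\log (2\pi )\), as follows from applying Stirling's formula to \((x-1)!\) — can be checked numerically against the exact log-gamma function (function name ours):

```python
import math

def stirling_lgamma(x):
    """Stirling-type approximation to log Gamma(x), obtained by applying
    Stirling's formula to (x-1)!:
        log Gamma(x) ≈ (x - 1/2) log(x - 1) - (x - 1) + (1/2) log(2*pi).
    Already reasonably accurate for moderate x (requires x > 1)."""
    return (x - 0.5) * math.log(x - 1) - (x - 1) + 0.5 * math.log(2 * math.pi)
```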
We note that we have no guarantee that Dirichlet priors with Jeffreys uninformative parameters on each row of \({\varvec{\pi }}\) are a good choice. In fact, with this dynamic model, we are interested in partitions that are relatively stable over time, which implies that \({\varvec{\pi }}\) should be diagonally dominant. Thus, contrary to the mixture proportions in mixture models, the prior on the rows of \({\varvec{\pi }}\) should favor some dimensions of the simplex: \({\varvec{\pi }}_k\), the kth row of \({\varvec{\pi }}\), could have a prior of the form \(\text {Dir}({\varvec{\delta }}_k)\), with \(\delta _{k\ell } = \delta _0\) if \(k \ne \ell \) and \(\delta _{kk} = \delta _{\text {diag}}\), where \(\delta _{\text {diag}} > \delta _0\).
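Such a "sticky" Dirichlet prior can be sampled via the standard normalized-Gamma construction; a sketch with illustrative hyperparameter values (not taken from the paper):

```python
import random

def sample_sticky_transition_row(K, k, delta0=0.5, delta_diag=4.0, rng=None):
    """Sample the k-th row of pi from a Dirichlet(delta_k) prior with
    delta_{kl} = delta0 for l != k and delta_{kk} = delta_diag > delta0,
    favoring diagonally dominant (temporally stable) transition matrices.
    delta0 and delta_diag are illustrative values, not the paper's.
    Dirichlet sampling via normalized Gamma draws (standard construction)."""
    rng = rng or random.Random(0)
    gammas = [rng.gammavariate(delta_diag if l == k else delta0, 1.0)
              for l in range(K)]
    s = sum(gammas)
    return [g / s for g in gammas]
```

With these values the prior mean of the diagonal entry is \(\delta _{\text {diag}} / (\delta _{\text {diag}} + (K-1)\delta _0)\), so sampled rows concentrate mass on staying in the same cluster.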
Riverain, P., Fossier, S. & Nadif, M. Poisson degree corrected dynamic stochastic block model. Adv Data Anal Classif 17, 135–162 (2023). https://doi.org/10.1007/s11634-022-00492-9