
Poisson degree corrected dynamic stochastic block model


Abstract

The Stochastic Block Model (SBM) provides a statistical tool for modeling and clustering network data. In this paper, we propose an extension of this model for discrete-time dynamic networks that takes into account the variability in node degrees, allowing us to model a broader class of networks. We develop a probabilistic model that generates temporal graphs with a dynamic cluster structure and time-dependent degree corrections for each node. Thanks to these degree corrections, the nodes can have variable in- and out-degrees, allowing us to model complex cluster structures as well as interactions that decrease or increase over time. We compare the proposed model to a model without degree correction and highlight its advantages in the case of inhomogeneous degree distributions within clusters and in the recovery of unstable cluster dynamics. We propose an inference procedure based on Variational Expectation-Maximization (VEM) that also provides the means to estimate the time-dependent degree corrections. Extensive experiments on simulated and real datasets confirm the benefits of our approach and show the effectiveness of the proposed algorithm.


Notes

  1. We could not compare pdc-dsbm directly to the authors’ algorithm because their R package dynsbm V0.7 only implements Bernoulli, Multinomial and Gaussian distributions, so we had to re-implement it for Poisson distributions.

  2. More noise needs to be added for \(M_+\) since margins greater than one spread the classes apart.

  3. http://64.111.127.166/origin-destination/.

  4. https://cycling.data.tfl.gov.uk/.

References

  • Abbe E (2017) Community detection and stochastic block models: recent developments. J Mach Learn Res 18(1):6446–6531

  • Affeldt S, Labiod L, Nadif M (2021) Regularized bi-directional co-clustering. Stat Comput 31(3):1–17

  • Ailem M, Role F, Nadif M (2017) Model-based co-clustering for the effective handling of sparse data. Pattern Recognit 72:108–122

  • Ailem M, Role F, Nadif M (2017) Sparse Poisson latent block model for document clustering. IEEE Trans Knowl Data Eng 29(7):1563–1576

  • Airoldi E, Blei D, Fienberg S, Xing E (2008) Mixed membership stochastic blockmodels. J Mach Learn Res 9:1981–2014

  • Banerjee A, Dhillon I, Ghosh J, Merugu S, Modha DS (2007) A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. J Mach Learn Res 8(67):1919–1986

  • Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  • Bartolucci F, Pandolfi S (2020) An exact algorithm for time-dependent variational inference for the dynamic stochastic block model. Pattern Recognit Lett 138:362–369

  • Benzécri J-P (1973) L'analyse des données, tome 2: l'analyse des correspondances. Dunod, Paris

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Bock H-H (2020) Co-clustering for object by variable data matrices. In: Imaizumi T, Nakayama A, Yokoyama S (eds) Advanced studies in behaviormetrics and data science: essays in honor of Akinori Okada. Springer, Singapore, pp 3–17

  • Chi Y, Song X, Zhou D, Hino K, Tseng BL (2007) Evolutionary spectral clustering by incorporating temporal smoothness. In: KDD. Association for Computing Machinery, pp 153–162

  • Corneli M, Latouche P, Rossi F (2016) Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks. Neurocomputing 192:81–91

  • Corneli M, Latouche P, Rossi F (2018) Multiple change points detection and clustering in dynamic networks. Stat Comput 28(5):989–1007

  • Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183

  • Fu W, Song L, Xing EP (2009) Dynamic mixed membership blockmodel for evolving networks. In: ICML, pp 329–336

  • Ghahramani Z, Jordan MI (1997) Factorial hidden Markov models. Mach Learn 29(2–3):245–273

  • Govaert G, Nadif M (2005) An EM algorithm for the block mixture model. IEEE Trans Pattern Anal Mach Intell 27(4):643–647

  • Govaert G, Nadif M (2013) Co-clustering: models, algorithms and applications. Wiley, Hoboken

  • Govaert G, Nadif M (2018) Mutual information, phi-squared and model-based co-clustering for contingency tables. Adv Data Anal Classif 12(3):455–488

  • Greenacre M (2007) Correspondence analysis in practice. Chapman & Hall/CRC, Boca Raton

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218

  • Karrer B, Newman ME (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83(1):016107

  • Lin Y-R, Chi Y, Zhu S, Sundaram H, Tseng BL (2009) Analyzing communities and their evolutions in dynamic social networks. ACM Trans Knowl Discov Data 3(2):1–31

  • Liu S, Wang S, Krishnan R (2014) Persistent community detection in dynamic social networks. In: Tseng VS, Ho TB, Zhou Z-H, Chen ALP, Kao H-Y (eds) Advances in knowledge discovery and data mining. Springer, Berlin, pp 78–89

  • Mariadassou M, Robin S, Vacher C (2010) Uncovering latent structure in valued graphs: a variational approach. Ann Appl Stat 4(2):715–742

  • Matias C, Miele V (2017) Statistical clustering of temporal networks through a dynamic stochastic block model. J R Stat Soc Ser B Stat Methodol 79(4):1119–1141

  • Matias C, Rebafka T, Villers F (2018) A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika 105(3):665–680

  • Meng X-L, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278

  • Neal RM, Hinton GE (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Learning in graphical models. Springer, Berlin, pp 355–368

  • Qiao M, Yu J, Bian W, Li Q, Tao D (2017) Improving stochastic block models by incorporating power-law degree characteristic. In: IJCAI, pp 2620–2626

  • Rastelli R, Latouche P, Friel N (2018) Choosing the number of groups in a latent stochastic blockmodel for dynamic networks. Netw Sci 6(4):469–493

  • Razaee Z, Amini A, Li JJ (2019) Matched bipartite block model with covariates. J Mach Learn Res 20:1–44

  • Salah A, Nadif M (2019) Directional co-clustering. Adv Data Anal Classif 13(3):591–620

  • Schepers J, Bock H-H, Van Mechelen I (2017) Maximal interaction two-mode clustering. J Classif 34(1):49–75

  • Sewell DK, Chen Y (2016) Latent space models for dynamic networks with weighted edges. Soc Netw 44:105–116

  • Snijders T, Nowicki K (1997) Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J Classif 14:75–100

  • Wang YJ, Wong GY (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82(397):8–19

  • Xu KS, Hero AO (2014) Dynamic stochastic blockmodels for time-evolving social networks. IEEE J Sel Top Signal Process 8(4):552–562

  • Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Mach Learn 82(2):157–189


Acknowledgements

We thank the three anonymous reviewers for their detailed comments, which helped us greatly improve this manuscript.

Author information

Correspondence to Paul Riverain.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Derivation of the objective criterion (6)

We derive the criterion (6) in the case of a constant number of nodes (\(\forall t, \, V^t = V\)); the other case follows easily. Let Q be a probability distribution over the space of complete data \(\mathcal {Z}\), i.e. the set of all possible latent trajectories for N nodes over K possible states and T time steps. From Neal and Hinton (1998), we have:

$$\begin{aligned} \ell ({\varvec{\theta }}) \ge F({\varvec{q}}, {\varvec{\theta }})&= \ell ({\varvec{\theta }}) - KL(Q||P(.|{\varvec{X}}, {\varvec{\theta }})) \\&= \mathbb {E}_Q(\log P({\varvec{X}}, {\varvec{Z}}; {\varvec{\theta }})) + \mathbb {H}(Q) \\&= \mathbb {E}_Q(\log P({\varvec{X}}, {\varvec{Z}}; {\varvec{\theta }}) - \log Q({\varvec{Z}}; {\varvec{q}})). \end{aligned}$$

Let Q factorize as N independent inhomogeneous Markov models:

$$\begin{aligned} Q({\varvec{Z}}; {\varvec{q}})&= \prod _{i} Q(Z_{i}^{1}; {\varvec{q}}) \prod _{t \ge 2} Q(Z_{i}^{t}|Z_{i}^{t-1}; {\varvec{q}})\\&= \prod _{ik}q(i, k)^{Z_{ik}^1} \prod _{t \ge 2}\prod _{\ell } q(t, i, k, \ell )^{Z_{ik}^{t-1} Z_{i\ell }^t} \end{aligned}$$

where Q is parameterized by \({\varvec{q}}= \Big (\big (q(i, k)\big )_{ik}, \big (q(t, i, k, \ell )\big )_{tik\ell }\Big )\), with \(q(i, k) = Q(Z_{ik}^{1}=1)\) and \(q(t, i, k, \ell ) = Q(Z_{i\ell }^{t}=1 | Z_{ik}^{t-1}=1)\).

First, from the variational distributions, we have:

$$\begin{aligned} \mathbb {E}_Q(\log Q({\varvec{Z}}; {\varvec{q}})) =&\sum _{ik} \mathbb {E}_Q(Z_{ik}^1) \log q(i, k) \nonumber \\&+ \sum _{t \ge 2} \sum _{ik\ell } \mathbb {E}_Q(Z_{ik}^{t-1} Z_{i\ell }^t) \log q(t,i,k,\ell ). \end{aligned}$$
(14)

Secondly, from the model, we have:

$$\begin{aligned} \mathbb {E}_Q(\log P({\varvec{X}}, {\varvec{Z}}; {\varvec{\theta }})) =&\sum _{ik} \mathbb {E}_Q(Z_{ik}^1) \log \alpha _k + \sum _{t \ge 2}\sum _{ik\ell } \mathbb {E}_Q(Z_{ik}^{t-1} Z_{i\ell }^t) \log \pi _{k\ell } \nonumber \\&+ \sum _{t}\sum _{i \ne j}\sum _{k\ell } \mathbb {E}_Q(Z_{ik}^{t} Z_{j\ell }^{t}) \log \phi (X_{ij}^t; \mu _i^t \nu _j^t \gamma _{k\ell }). \end{aligned}$$
(15)

To develop \(F({\varvec{q}}, {\varvec{\theta }})\), we rely on the following lemma.

Lemma 1

We have the following equalities:

$$\begin{aligned}&\mathbb {E}_Q(Z_{ik}^1) = q(i, k) , \end{aligned}$$
(16a)
$$\begin{aligned}&\mathbb {E}_Q(Z_{ik}^{t-1} Z_{i\ell }^t) = q(t - 1, i, k)q(t, i, k, \ell ) , \end{aligned}$$
(16b)
$$\begin{aligned}&\forall i \ne j, \; \mathbb {E}_Q(Z_{ik}^{t} Z_{j\ell }^t) = q(t, i, k)q(t, j, \ell ). \end{aligned}$$
(16c)

Therefore, combining (14), (15) and Lemma 1, the variational lower bound on the log-likelihood of the model is given by:

$$\begin{aligned} F({\varvec{q}}, {\varvec{\theta }})&= \mathbb {E}_Q(\log P({\varvec{X}}, {\varvec{Z}}; {\varvec{\theta }}) - \log Q({\varvec{Z}}; {\varvec{q}}))\\&= \sum _{ik} q(i,k) \log \alpha _k + \sum _{t \ge 2}\sum _{ik\ell } q(t-1,i,k) q(t,i,k,\ell ) \log \pi _{k\ell } \\&\quad + \sum _{t}\sum _{i \ne j}\sum _{k\ell } q(t,i,k) q(t,j,\ell )\log \phi (X_{ij}^t; \mu _i^t \nu _j^t \gamma _{k\ell }) \\&\quad - \sum _{ik} q(i, k) \log q(i, k) - \sum _{t \ge 2} \sum _{ik\ell } q(t-1,i,k) q(t,i,k,\ell ) \log q(t,i,k,\ell ). \end{aligned}$$
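Before turning to the proof of Lemma 1, here is a minimal numerical sketch that evaluates this expanded bound for the Poisson model. It is purely illustrative: the array names and shapes (q1 for the initial probabilities, qt for the transition probabilities, m for the marginals, and X, mu, nu, gamma, alpha, pi for the data and parameters) are our own conventions, not the authors' implementation.

    import numpy as np
    from scipy.stats import poisson

    def marginals(q1, qt):
        # Forward recursion q(t, i, l) = sum_k q(t-1, i, k) q(t, i, k, l);
        # qt[0] is unused (time t = 1 of the paper is index 0 here).
        T, N, K, _ = qt.shape
        m = np.empty((T, N, K))
        m[0] = q1
        for t in range(1, T):
            m[t] = np.einsum('ik,ikl->il', m[t - 1], qt[t])
        return m

    def lower_bound(X, q1, qt, alpha, pi, mu, nu, gamma, eps=1e-12):
        # Evaluate F(q, theta) term by term, as in the expansion above.
        T, N = X.shape[:2]
        m = marginals(q1, qt)
        # initial-state term: sum_{ik} q(i,k) log(alpha_k / q(i,k))
        F = np.sum(q1 * (np.log(alpha + eps) - np.log(q1 + eps)))
        # transition terms, t >= 2
        for t in range(1, T):
            joint = m[t - 1][:, :, None] * qt[t]        # q(t-1,i,k) q(t,i,k,l)
            F += np.sum(joint * (np.log(pi + eps) - np.log(qt[t] + eps)))
        # Poisson emission terms over ordered pairs i != j
        off = ~np.eye(N, dtype=bool)
        for t in range(T):
            lam = (mu[t][:, None, None, None] * nu[t][None, :, None, None]
                   * gamma[None, None, :, :])           # (N, N, K, K) intensities
            logphi = poisson.logpmf(X[t][:, :, None, None], lam)
            w = m[t][:, None, :, None] * m[t][None, :, None, :]
            F += np.sum((w * logphi)[off])
        return F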

Proof

(of Lemma 1) Since the Markov chains \({\varvec{Z}}_i\) and \({\varvec{Z}}_j\) are independent under the distribution Q, it suffices to prove that \(\mathbb {E}_Q(Z_{ik}^{t}) = q(t, i, k)\); the proofs of (16a) and (16b) are analogous.

In the paper, the latent processes are defined over the index set \(\{1, \ldots , T\}\). In the following, we consider a virtual source cluster \(k_s\) at a virtual time step \(t = 0\) from which every node starts. Let \(\mathcal {Z}_i^{(t, t')}(k)\) be the set of all possible latent trajectories for node i over time steps t to \(t'\), starting from cluster k at time t:

$$\begin{aligned} \mathcal {Z}_i^{(t, t')}(k) = \{{\varvec{Z}}_i \in \{0, 1\}^{(t' - t + 1)K}|&{\varvec{Z}}_i = ({\varvec{Z}}_i^t, \ldots , {\varvec{Z}}_i^{t'})^\intercal , Z_{ik}^{t} = 1 \wedge \forall \tau , \, \sum _{k'} Z_{ik'}^{\tau } = 1 \}. \end{aligned}$$

For \(t \le \tau \le t'\) and \(k' \in \{1, \ldots , K\}\), we define \(\mathcal {Z}_i^{(t, t')}(k, \tau , k')\) as the set of all paths of \(\mathcal {Z}_i^{(t, t')}(k)\) that pass through cluster \(k'\) at time step \(\tau \):

$$\begin{aligned} \mathcal {Z}_i^{(t, t')}(k, \tau , k') = \{{\varvec{Z}}_i \in \mathcal {Z}_i^{(t, t')}(k) | Z_{ik'}^{\tau } = 1\}. \end{aligned}$$

Let \(Q^i\) be the distribution for node i. As the N chains are independent:

$$\begin{aligned} \mathbb {E}_Q(Z_{ik}^{t}) = \mathbb {E}_{Q^{i}}(Z_{ik}^{t}) = Q^{i}(Z_{ik}^{t}=1) = \sum _{{\varvec{Z}}\in \mathcal {Z}_i^{(0, T)}(k_s, t, k)} Q^{i}({\varvec{Z}}). \end{aligned}$$

The set \(\mathcal {Z}_i^{(0, T)}(k_s, t, k)\) decomposes as \(\mathcal {Z}_i^{(0, t-1)}(k_s) \times \mathcal {Z}_i^{(t, T)}(k)\). In the following, we identify the elements of these sets with their indices (\((k_0^{(c)}, \ldots , k_T^{(c)})\) denotes the cth element of \(\mathcal {Z}_i^{(0, T)}(k_s)\)). For consistency with the notations, we define \(q(1, i, k_s, k') = q(i, k')\), the transition probability from the virtual cluster \(k_s\) at \(t=0\). We can then write:

$$\begin{aligned} \mathbb {E}_Q(Z_{ik}^{t})&= \sum _{{\varvec{Z}}\in \mathcal {Z}_i^{(0, T)}(k_s, t, k)} Q^{i}({\varvec{Z}}) \\&= \sum _{c \in \mathcal {Z}_i^{(0, T)}(k_s, t, k)}q(1, i, k_0^{(c)}, k_1^{(c)}) q(2, i, k_1^{(c)}, k_2^{(c)}) \ldots q(T, i, k_{T-1}^{(c)}, k_T^{(c)}) \\&= \sum _{c' \in \mathcal {Z}_i^{(0, t-1)}(k_s)} \sum _{c'' \in \mathcal {Z}_i^{(t, T)}(k)} \bigg ( q(1, i, k_0^{(c')}, k_1^{(c')}) \ldots q(t, i, k_{t-1}^{(c')}, k) \\&\quad \times q(t+1, i, k, k_{1}^{(c'')}) \ldots q(T, i, k_{T-t-1}^{(c'')}, k_{T-t}^{(c'')}) \bigg ) \\&= \bigg (\sum _{c' \in \mathcal {Z}_i^{(0, t-1)}(k_s)} q(1, i, k_0^{(c')}, k_1^{(c')}) \ldots q(t, i, k_{t-1}^{(c')}, k)\bigg ) \\&\quad \times \bigg (\sum _{c'' \in \mathcal {Z}_i^{(t, T)}(k)} q(t+1, i, k, k_{1}^{(c'')}) \ldots q(T, i, k_{T-t-1}^{(c'')}, k_{T-t}^{(c'')})\bigg ). \end{aligned}$$

The second sum in the last equation corresponds to summing over all possible paths in a chain of length \(T-t\) starting at cluster k, so it equals one. Now, recall that:

$$\begin{aligned} q(t, i, k)&= \sum _{k'} q(t-1, i, k') q(t, i, k', k) \\&= \sum _{k_1, \ldots , k_{t-1}} q(i, k_1) q(2, i, k_1, k_2) \ldots q(t, i, k_{t-1}, k) \\&= \sum _{c' \in \mathcal {Z}_i^{(0, t-1)}(k_s)} q(1, i, k_0^{(c')}, k_1^{(c')}) q(2, i, k_1^{(c')}, k_2^{(c')}) \ldots q(t, i, k_{t-1}^{(c')}, k). \end{aligned}$$

This concludes the proof. \(\square \)
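The telescoping argument above is easy to check numerically: for a single chain, the marginals produced by the recursion \(q(t, i, k) = \sum _{k'} q(t-1, i, k') q(t, i, k', k)\) coincide with the expectations obtained by brute-force enumeration of all \(K^T\) trajectories. A small self-contained sketch with toy sizes and illustrative names:

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    T, K = 4, 3
    q1 = rng.dirichlet(np.ones(K))                 # q(i, k) for a single node i
    qt = rng.dirichlet(np.ones(K), size=(T, K))    # q(t, i, k, l); qt[0] is unused

    # Marginals by the recursion of Lemma 1.
    m = [q1]
    for t in range(1, T):
        m.append(m[-1] @ qt[t])

    # E_Q[Z_k^t] by summing Q over every trajectory (k_1, ..., k_T).
    E = np.zeros((T, K))
    for path in itertools.product(range(K), repeat=T):
        p = q1[path[0]]
        for t in range(1, T):
            p *= qt[t][path[t - 1], path[t]]
        for t in range(T):
            E[t, path[t]] += p

    assert np.allclose(E, np.array(m))             # recursion matches enumeration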

Derivation of the expectation step

Here, we present a way to derive the formulae proposed for the E-step in the case of a fixed set of nodes (i.e. \(\forall t,\; V^t = V\)); the results for a variable number of nodes follow easily.

As shown in Bartolucci and Pandolfi (2020), the exact VE step can be carried out, but it is computationally heavy: in order to optimize \(F({\varvec{q}}, {\varvec{\theta }})\) w.r.t. \(q(t, i, k, \ell )\), one must take into account that every \(q(t', i, k')\) with \(t' \ge t\) depends on \(q(t, i, k, \ell )\). Here, we instead propose a VE step that increases \(F({\varvec{q}}, {\varvec{\theta }})\) w.r.t. the variational parameters \({\varvec{q}}\).

We consider the variational parameters \({\varvec{q}}(i) = \Big (\big (q(i, k)\big )_{k}, \big (q(t, i, k, \ell )\big )_{tk\ell }\Big )\) as well as auxiliary variables \({\varvec{q}}_m^t(i) = \big (q(t, i, k)\big )_{k}\) for the marginal probabilities, where \(q(t, i, k) = Q(Z_{ik}^{t}=1)\).

We first note that F can be decomposed over each node and cluster thanks to the variational approximation: \(F({\varvec{q}}, {\varvec{\theta }}) = \sum _{i\ell } F_{i\ell } ({\varvec{q}}(i), {\varvec{q}}_m(-i), {\varvec{\theta }})\) where \({\varvec{q}}_m(-i) = \big ({\varvec{q}}_m^1(j), \ldots , {\varvec{q}}_m^T(j)\big )_{j \ne i}\) and

$$\begin{aligned}&F_{i\ell } ({\varvec{q}}(i), {\varvec{q}}_m(-i), {\varvec{\theta }}) = q(i,\ell ) \log \frac{\alpha _\ell }{q(i, \ell )}\\&\quad + \sum _{t \ge 2}\sum _{k} q(t-1,i,k) q(t,i,k,\ell ) \log \frac{\pi _{k\ell }}{q(t,i,k,\ell )}\\&\quad + \sum _{t} \bigg ( q(t,i,\ell ) \sum _{j \ne i} \sum _{k} q(t,j,k) \log \phi _{ij\ell k}^{t} + q(t,i,\ell ) \sum _{j \ne i} \sum _{k} q(t,j,k) \log \phi _{jik\ell }^{t} \bigg ) \\&\quad = q(i,\ell ) \log \frac{\alpha _\ell }{q(i, \ell )}\\&\quad + \sum _{t \ge 2}\sum _{k} q(t-1,i,k) q(t,i,k,\ell ) \log \frac{\pi _{k\ell }}{q(t,i,k,\ell )} \\&\quad + \sum _{t} q(t, i,\ell ) \sum _{j \ne i} \sum _{k} q(t, j,k) \log \Phi _{ij\ell k}^{t} \end{aligned}$$

where we note \(\Phi _{ijk\ell }^{t} = \phi _{ijk\ell }^{t}\phi _{ji\ell k}^{t}\).

For constant marginal probabilities \({\varvec{q}}_m(-i)\), we optimize

$$\begin{aligned} F_{i\ell } \big (({\varvec{q}}^1(i),{\varvec{q}}_m^1(i)), \ldots , ({\varvec{q}}^T(i), {\varvec{q}}_m^T(i)) \,|\, {\varvec{q}}_m(-i), {\varvec{\theta }}\big ) \end{aligned}$$

by applying a single step of coordinate ascent to each coordinate \(({\varvec{q}}^t(i),{\varvec{q}}_m^t(i))\), the other coordinates \(({\varvec{q}}^{-t}(i),{\varvec{q}}_m^{-t}(i))\) being held constant. We apply this procedure sequentially for t in \(\{1, \ldots , T\}\), updating the marginal probabilities \(q(t, i, k)\) with the obtained transition probabilities \(q(t, i, k, \ell )\) at each time step.

The E-step formula can be obtained as follows. Since \(q(t, i, k) = \sum _{k'} q(t-1, i, k') q(t, i, k', k)\), the coordinate \(({\varvec{q}}^t(i),{\varvec{q}}_m^t(i))\) only depends on \({\varvec{q}}^t(i)\). For \(t \ge 2\), we can write:

$$\begin{aligned}&F_{i\ell }\big (q(t, i, 1, \ell ), \ldots , q(t, i, K, \ell )| {\varvec{q}}^{-t}(i),{\varvec{q}}_m^{-t}(i), {\varvec{q}}_m(-i), {\varvec{\theta }}\big )\\&\quad = \sum _{k} q(t-1,i,k) q(t,i,k,\ell ) \log \frac{\pi _{k\ell }}{q(t,i,k,\ell )}\\&\qquad + \sum _{k} q(t,i,k) q(t+1,i,k,\ell ) \log \frac{\pi _{k\ell }}{q(t+1,i,k,\ell )}\\&\qquad + q(t, i,\ell ) \sum _{j \ne i} \sum _{k} q(t, j,k) \log \Phi _{ij\ell k}^{t}. \end{aligned}$$

Let \(\mathcal {L}({\varvec{q}}^t(i), \lambda )\) be the Lagrangian of the constrained optimization problem:

$$\begin{aligned}&\mathcal {L}(q(t, i, 1, \ell ), \ldots , q(t, i, K, \ell ), \lambda )\\&\quad = F_{i\ell }\big (q(t, i, 1, \ell ), \ldots , q(t, i, K, \ell )| {\varvec{q}}^{-t}(i),{\varvec{q}}_m^{-t}(i), {\varvec{q}}_m(-i), {\varvec{\theta }}\big )\\&\qquad + \lambda \big (1 - \sum _{\ell '}q(t, i, k, \ell ')\big ) \end{aligned}$$

For \((q(t', i, k))_{k \in \{1, \ldots , K\}, t'\ne t}\) constant and \(s \in \{1, \ldots , T\}\), we have:

$$\begin{aligned} \frac{\partial {q(s, i, k')}}{\partial {q(t, i, k, \ell )}}&= \mathbb {1}(s=t) \frac{\partial {}}{\partial {q(t, i, k, \ell )}} \sum _{\ell '} q(t-1, i, \ell ') q(t, i, \ell ', k') \\&\quad = \mathbb {1}(s=t)\mathbb {1}(k'=\ell ) q(t-1, i, k) \end{aligned}$$

and

$$\begin{aligned}&\frac{\partial {}}{\partial {q(t, i, k, \ell )}} \sum _{k} q(t,i,k) q(t+1,i,k,\ell ) \log \frac{\pi _{k\ell }}{q(t+1,i,k,\ell )} \\&\quad = -q(t-1,i,k) \sum _{\ell '} q(t+1,i,\ell ,\ell ') \big ( \log q(t+1, i, \ell , \ell ') - \log \pi _{\ell \ell '} \big ) \\&\quad = -q(t-1,i,k) D_{\text {KL}}({\varvec{q}}(t+1,i,\ell , :) || {\varvec{\pi }}_{\ell , :}) \end{aligned}$$

where \({\varvec{q}}(t+1,i,\ell , :) = (q(t+1,i,\ell , 1), \ldots , q(t+1,i,\ell , K))^\intercal \) and \({\varvec{\pi }}_{\ell , :} = \big (\pi _{\ell 1}, \ldots , \pi _{\ell K}\big )^\intercal \) is the \(\ell \)th row of \({\varvec{\pi }}\). Let \({d_{ik}^t = D_{\text {KL}}({\varvec{q}}(t,i,k, :) || {\varvec{\pi }}_{k, :})}\). We then have:

$$\begin{aligned}&\frac{\partial {}}{\partial {q(t, i, k, \ell )}} F_{i\ell }\big (q(t, i, 1, \ell ), \ldots , q(t, i, K, \ell )| {\varvec{q}}^{-t}(i),{\varvec{q}}_m^{-t}(i), {\varvec{q}}_m(-i), {\varvec{\theta }}\big )\\&\quad = q(t-1, i, k) \big (\log \pi _{k\ell } - d_{i\ell }^{t+1} - 1 - \log q(t, i, k, \ell )\big ) \\&\qquad + \frac{\partial {}}{\partial {q(t, i, k, \ell )}} \sum _{s} q(s,i,\ell ) \sum _{j \ne i} \sum _{\ell '} q(s,j,\ell ') \log \Phi _{ij\ell \ell '}^{s} \\&\quad = q(t-1, i, k) \big ( \log \pi _{k\ell } - d_{i\ell }^{t+1} - 1 - \log q(t, i, k, \ell ) + \sum _{j \ne i} \sum _{\ell '} q(t, j, \ell ') \log \Phi _{ij\ell \ell '}^t \big ). \end{aligned}$$

Setting the derivative of the Lagrangian to zero, we have:

$$\begin{aligned} \log q(t, i, k, \ell ) = - \frac{\lambda }{q(t-1, i, k)} - 1 + \log \pi _{k\ell } - d_{i\ell }^{t+1} + \sum _{j \ne i} \sum _{\ell '} q(t, j, \ell ') \log \Phi _{ij\ell \ell '}^t. \end{aligned}$$

Thus, \(q(t, i, k, \ell ) \propto \pi _{k\ell } \exp (- d_{i\ell }^{t+1}) \prod _{j \ne i} \prod _{\ell '} {\Phi _{ij\ell \ell '}^{t}}^{q(t, j, \ell ')}\), which justifies the proposed formula. Note that, contrary to Matias and Miele (2017), this formula applies a penalty term \(\exp (- d_{i\ell }^{t+1})\) to the mixture proportions; the E-step formula of Matias and Miele (2017) appears to be an approximation of ours. In our experiments, we observed that our formula gives better clustering results when the data contains many cluster transitions (\({\varvec{\pi }}\) has low trace) and the margins are not smoothed, and comparable results when the margins are smoothed. A sketch of the resulting update is given below.
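For concreteness, one sequential pass of this update for a single node i could be sketched as follows. The names and shapes (m for the marginals q(t, i, k), qt for the transitions, and X, mu, nu, gamma, alpha, pi) are our own, so this is a plausible reconstruction of the procedure described above, not the authors' code.

    import numpy as np
    from scipy.stats import poisson

    def ve_step_node(i, X, m, qt, alpha, pi, mu, nu, gamma, eps=1e-12):
        # One pass over t = 1..T of the fixed-point update for node i.
        T, N, K = m.shape
        others = np.arange(N) != i
        for t in range(T):
            # field s[l] = sum_{j != i} sum_{l'} q(t, j, l') log Phi_{ij l l'}^t
            lam_out = mu[t, i] * nu[t, others][:, None, None] * gamma[None, :, :]
            lam_in = mu[t, others][:, None, None] * nu[t, i] * gamma.T[None, :, :]
            logPhi = (poisson.logpmf(X[t, i, others][:, None, None], lam_out)
                      + poisson.logpmf(X[t, others, i][:, None, None], lam_in))
            s = np.einsum('jm,jlm->l', m[t, others], logPhi)
            # penalty d[l] = KL(q(t+1, i, l, :) || pi_{l,:}); zero at the horizon
            d = np.zeros(K)
            if t + 1 < T:
                d = np.sum(qt[t + 1, i] * (np.log(qt[t + 1, i] + eps)
                                           - np.log(pi + eps)), axis=1)
            if t == 0:
                logq = np.log(alpha + eps) + s - d
                m[0, i] = np.exp(logq - logq.max())
                m[0, i] /= m[0, i].sum()
            else:
                logq = np.log(pi + eps) + (s - d)[None, :]   # rows k, columns l
                new = np.exp(logq - logq.max(axis=1, keepdims=True))
                qt[t, i] = new / new.sum(axis=1, keepdims=True)
                m[t, i] = m[t - 1, i] @ qt[t, i]             # refresh the marginal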

Derivation of the M-step

To update the parameters in the maximization step, we increase \(F({\varvec{q}}, {\varvec{\theta }})\) w.r.t. \({\varvec{\theta }}\) by maximizing F for each parameter, conditionally on the others. We first update the mixture proportions \({\varvec{\alpha }}\) and \({\varvec{\pi }}\), since they only depend on \({\varvec{q}}\). Next, we update \({\varvec{\gamma }}\), then \({\varvec{\mu }}\) and finally \({\varvec{\nu }}\). The updates (8a, 8b) with respect to \({\varvec{\alpha }}\) and \({\varvec{\pi }}\) are direct. Concerning \({\varvec{\mu }}\), \({\varvec{\nu }}\) and \({\varvec{\gamma }}\), the lower-bound on the log-likelihood of the model is:

$$\begin{aligned} F({\varvec{q}}, {\varvec{\theta }})&= \sum _{t}\sum _{\begin{array}{c} ij\\ i \ne j \end{array}}\sum _{k\ell } q(t,i,k) q(t,j,\ell ) \log \phi _{ijk\ell }^t + \text {const} \nonumber \\&= \sum _{t}\sum _{\begin{array}{c} ij\\ i \ne j \end{array}}\sum _{k\ell } q(t,i,k) q(t,j,\ell ) \big (X_{ij}^t \log (\mu _i^t \nu _j^t \gamma _{k\ell }) - \mu _i^t \nu _j^t \gamma _{k\ell } \big ) + \text {const}.\nonumber \\ \end{aligned}$$
(17)

By computing the derivative of (17) w.r.t. \(\mu _i^t\), \(\nu _j^t\) and \(\gamma _{k\ell }\) and setting it to zero, we obtain the maximization steps in (8d, 8e, 8c); a sketch of these conditional updates is given below.
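Since the exact expressions (8c)-(8e) appear in the main text, we only sketch the shape of the conditional updates obtained by zeroing the derivative of (17) in each parameter. The normalizations below follow from the Poisson form of (17) under our own array conventions (m for the marginals q(t, i, k)); treat them as a plausible reconstruction rather than a verbatim copy of the paper's formulas.

    import numpy as np

    def m_step_rates(X, m, mu, nu, eps=1e-12):
        # One ECM-style sweep: update gamma, then mu, then nu.
        T, N, K = m.shape
        off = (~np.eye(N, dtype=bool)).astype(float)   # mask out self-loops
        # gamma_{kl} = sum_{t,i!=j} q(t,i,k) q(t,j,l) X_ij^t
        #            / sum_{t,i!=j} q(t,i,k) q(t,j,l) mu_i^t nu_j^t
        num = np.einsum('tik,tjl,tij->kl', m, m, X * off)
        den = np.einsum('tik,tjl,tij->kl', m, m,
                        mu[:, :, None] * nu[:, None, :] * off)
        gamma = num / (den + eps)
        for t in range(T):
            Xt = X[t] * off
            # G_ij = sum_{kl} q(t,i,k) gamma_{kl} q(t,j,l)
            G = (m[t] @ gamma @ m[t].T) * off
            mu[t] = Xt.sum(axis=1) / (G @ nu[t] + eps)    # observed / expected out-weight
            nu[t] = Xt.sum(axis=0) / (G.T @ mu[t] + eps)  # observed / expected in-weight
        return mu, nu, gamma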

Model selection with the ICL criterion

In order to choose the appropriate number of clusters K, we consider the Integrated Classification Likelihood (ICL) of Biernacki et al. (2000), as proposed in Daudin et al. (2008) for the static SBM and in Corneli et al. (2016), Matias and Miele (2017) and Rastelli et al. (2018) for dynamic models based on the SBM. The ICL criterion for a model \(M_K\) with K clusters is defined as:

$$\begin{aligned} ICL(M_K)&= \log P({\varvec{X}}, {\varvec{Z}}|M_K) = \int _{{\varvec{\varTheta }}} P({\varvec{X, Z}}|{\varvec{\theta }}, M_K) g({\varvec{\theta }}|M_K)\,d{\varvec{\theta }}, \end{aligned}$$
(18)

where \({\varvec{\theta }}= ({\varvec{\alpha }}, {\varvec{\pi }}, {\varvec{\mu }}, {\varvec{\nu }}, {\varvec{\gamma }}) \in {\varvec{\varTheta }}\), \({\varvec{\varTheta }} = A_K \times A_K^K \times \mathbb {R}_+^{TN} \times \mathbb {R}_+^{TN} \times \mathbb {R}_+^{K^2}\), \(A_K\) is the K-dimensional simplex and g is the density of the prior distribution on \({\varvec{\varTheta }}\).

Let \(g_{\pi _k}({\varvec{\pi }}_k|M_{K}) = \frac{1}{B(\delta , \ldots , \delta )}\prod _{k'} \pi _{kk'}^{\delta - 1}\) be a prior on \({\varvec{\pi }}_k\), the kth row of \({\varvec{\pi }}\).

$$\begin{aligned} \log P({\varvec{Z}}|M_K) =&\log \int _{A_K} \frac{1}{B(\delta , \ldots , \delta )} \alpha _1^{Z_{.1}^1 + \delta - 1}\dots \alpha _K^{Z_{.K}^1 + \delta - 1} \, d{\varvec{\alpha }}\\&+ \log \int _{A_K^K} \prod _{k} \frac{1}{B(\delta , \ldots , \delta )} \pi _{k1}^{n_{k1}^z + \delta - 1} \dots \pi _{kK}^{n_{kK}^z + \delta - 1} \,d{\varvec{\pi }}\\ =&I_\alpha + I_\pi \end{aligned}$$

where \(n_{kk'}^z = \sum _{t \ge 2}\sum _i Z_{ik}^{t-1}Z_{ik'}^{t}\) and \(I_\alpha \) is computed as in Daudin et al. (2008).

$$\begin{aligned} I_\pi&= \log \int _{A_K^K} \prod _{k} \frac{1}{B(\delta , \ldots , \delta )} \pi _{k1}^{n_{k1}^z + \delta - 1} \dots \pi _{kK}^{n_{kK}^z + \delta - 1} \,d{\varvec{\pi }}\\&= \log \prod _{k} \frac{1}{B(\delta , \ldots , \delta )} \int _{A_K} \pi _{k1}^{n_{k1}^z + \delta - 1} \dots \pi _{kK}^{n_{kK}^z + \delta - 1} \,d{\varvec{\pi }}_k\\&= \sum _k \log \Big ( \frac{B(n_{k1}^z + \delta , \ldots , n_{kK}^z + \delta )}{B(\delta , \ldots , \delta )} \Big ) \\&= K \log \varGamma (\delta K) - K^2 \log \varGamma (\delta ) - \sum _k \log \varGamma (n_{k.}^z + K\delta ) + \sum _{kk'} \log \varGamma (n_{kk'}^z + \delta ) \end{aligned}$$

We use Stirling's formula \(\log \varGamma (x) \approx (x - \frac{1}{2}) \log (x - 1) - (x - 1) + \frac{1}{2} \log (2\pi )\), which remains accurate even for small values of x; in particular, the approximation of \(\log \varGamma (n_{kk'}^z + \delta )\) remains valid for small values of \(n_{kk'}^z\). Following Biernacki et al. (2000), it can be shown that, assuming \(K = o(N)\) and removing terms in O(1) (since the error term of the BIC is O(1)):

$$\begin{aligned} \log P({\varvec{Z}}|M_K) =&- \frac{K-1}{2} \log N + \sum _{k} Z_{.k}^1 \log \frac{Z_{.k}^1}{N} \\&- \frac{K-1}{2} \sum _k \log n_{k.}^z +\sum _{kk'} n_{kk'}^z \log \frac{n_{kk'}^z}{n_{k.}^z}, \end{aligned}$$

where \(n_{k.}^z = \sum _{k'} n_{kk'}^z\) and \(Z_{.k}^1 = \sum _i Z_{ik}^1\). Using the hypothesis \(n_{k.}^z = \frac{N(T-1)}{K}\), we have \(\sum _k \log n_{k.}^z = K\log N(T-1) + o(N)\). Replacing \({\varvec{Z}}\) by \(\widehat{{\varvec{Z}}}\), the estimated partition, we obtain:

$$\begin{aligned} ICL(K) \approx&\max _{{\varvec{\theta }}} \log P({\varvec{X}}, \widehat{{\varvec{Z}}}|{\varvec{\theta }}, M_{K}) - \frac{K - 1}{2} \log N \\&- \frac{K(K - 1)}{2}\log N(T-1) - \frac{K^2 + 2TN}{2} \log (TN(N-1)). \end{aligned}$$

The term \(\frac{K - 1}{2} \log N\) is due to the estimated parameter \({\varvec{\alpha }}\). In Matias and Miele (2017), the parameter \({\varvec{\alpha }}\) is not estimated and is considered to be equal to the stationary distribution of \({\varvec{\pi }}\). Omitting the term due to \({\varvec{\alpha }}\) in the proposed ICL results in the same ICL as proposed in Matias and Miele (2017).

We note that we have no guarantee that the assumption of Dirichlet priors with Jeffreys' uninformative hyperparameters for each row of \({\varvec{\pi }}\) is a good choice. Indeed, with this dynamic model, we are interested in partitions that are relatively stable through time, which implies that \({\varvec{\pi }}\) should be diagonally dominant. Thus, contrary to the mixture proportions in mixture models, the prior on the rows of \({\varvec{\pi }}\) should favor some dimensions of the simplex: \({\varvec{\pi }}_k\), the kth row of \({\varvec{\pi }}\), could have a prior of the form \(\text {Dir}({\varvec{\delta }}_k)\), with \(\delta _{k\ell } = \delta _0\) if \(k \ne \ell \) and \(\delta _{kk} = \delta _{\text {diag}}\), where \(\delta _{\text {diag}} > \delta _0\).
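Once the VEM run has produced \(\max _{{\varvec{\theta }}} \log P({\varvec{X}}, \widehat{{\varvec{Z}}}|{\varvec{\theta }}, M_{K})\), the approximate criterion above is cheap to evaluate. A minimal sketch, where the argument loglik stands for that maximized complete-data log-likelihood:

    import numpy as np

    def approximate_icl(loglik, N, T, K):
        # Penalties from the asymptotic ICL formula above.
        pen_alpha = 0.5 * (K - 1) * np.log(N)                  # initial proportions
        pen_pi = 0.5 * K * (K - 1) * np.log(N * (T - 1))       # transition matrix
        pen_rates = 0.5 * (K**2 + 2 * T * N) * np.log(T * N * (N - 1))
        return loglik - pen_alpha - pen_pi - pen_rates

Model selection then amounts to running the inference for a range of K and keeping the value of K that maximizes this criterion.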


About this article


Cite this article

Riverain, P., Fossier, S. & Nadif, M. Poisson degree corrected dynamic stochastic block model. Adv Data Anal Classif 17, 135–162 (2023). https://doi.org/10.1007/s11634-022-00492-9

