Computational Statistics

Volume 32, Issue 2, pp 501–533

The dynamic random subgraph model for the clustering of evolving networks

Original Paper

Abstract

In recent years, many clustering methods have been proposed to extract information from networks. The principle is to look for groups of vertices with homogeneous connection profiles. Most of these techniques are suitable for static networks, that is to say, they do not take the temporal dimension into account. This work is motivated by the need to analyze evolving networks for which a decomposition into subgraphs is given. Therefore, in this paper, we consider the random subgraph model (RSM), which was proposed recently to model networks through latent clusters built within known partitions. Using a state space model to characterize the cluster proportions, RSM is extended to deal with dynamic networks. We call the resulting model the dynamic random subgraph model (dRSM). A variational expectation maximization (VEM) algorithm is proposed to perform inference. We show that the variational approximations lead to an update step which involves a new state space model, from which the parameters along with the hidden states can be estimated using the standard Kalman filter and Rauch–Tung–Striebel smoother. Simulated data sets are considered to assess the proposed methodology. Finally, dRSM along with the corresponding VEM algorithm are applied to an original maritime network built from printed Lloyd’s voyage records.

Keywords

State space model · Variational inference · Variational expectation maximization · Maritime data

1 Introduction

Network analysis has become a mature discipline since the original work of Moreno (1934); it is no longer limited to sociology and is now applied in many areas such as biology (Albert and Barabási 2002; Barabási and Oltvai 2004; Palla et al. 2005), geography (Ducruet 2013) or history (Rossi et al. 2014). The growing interest in network analysis is explained partly by the strong presence of this type of data in the digital world, and partly by recent advances in the modeling and processing of these data. Clustering methods, in particular, allow clusters of vertices sharing homogeneous connection profiles to be uncovered. Most methods look for specific structures, so-called communities, which exhibit a transitivity property such that nodes of the same community are more likely to be connected (Hofman and Wiggins 2008). A popular approach for community discovery, though asymptotically biased (Bickel and Chen 2009), is based on the modularity score of Girvan and Newman (2002). Alternative methods usually rely on the latent position cluster model (LPCM) of Handcock et al. (2007), which assumes that the links between the vertices depend on their positions in a latent social space.

The stochastic block model (SBM; Wang and Wong 1987; Nowicki and Snijders 2001) is a flexible random graph model which can characterize communities, but also other types of connectivity structures. It is based on a probabilistic generalization of the method applied by White et al. (1976) to Sampson’s famous monastery (Fienberg and Wasserman 1981). The SBM model assumes that each vertex belongs to a latent group, and that the probability of connection between a pair of vertices depends exclusively on their groups. Because no assumption is made on the connection probabilities, various types of vertex structures can be taken into account. While SBM was originally developed to analyze mainly binary networks, many extensions have since been proposed to deal, for instance, with valued edges (Mariadassou et al. 2010) or to take into account prior information (Zanghi et al. 2010; Matias and Robin 2014). In particular, the random subgraph model (RSM) of Jernite et al. (2014) aims at modeling categorical edges using prior knowledge of a partition of the network into subgraphs. These known subgraphs are assumed to be made of latent clusters which have to be inferred. The vertices are then connected with a probability depending only on the subgraphs, whereas the edge type is assumed to be sampled conditionally on the latent groups. This model was applied in the original paper to analyze a historical network in Merovingian Gaul. Note that other extensions of SBM have focused on looking for overlapping clusters (Airoldi et al. 2008; Latouche et al. 2011). The inference of SBM-like models is usually done using variational expectation maximization (VEM; Daudin et al. 2008), variational Bayes EM (VBEM; Latouche et al. 2012), or Gibbs sampling (Nowicki and Snijders 2001). Moreover, we emphasize that various strategies have been derived to estimate the corresponding number of clusters using model selection criteria (Daudin et al. 2008; Latouche et al. 2012), an allocation sampler (Mc Daid et al. 2013), greedy search (Côme and Latouche 2015), or nonparametric schemes (Kemp et al. 2006).

Recently, a few attempts have been made to extend the models mentioned previously in order to deal with dynamic networks. The main idea consists in introducing temporal processes to characterize the evolution of nodes and edges through time. Thus, Yang et al. (2011) proposed a dynamic version of SBM allowing a node to switch its class at time \(t+1\) depending on its state at time t, with the switching probabilities characterized by a transition matrix. The alternative extension of SBM by Xu and Hero III (2013) focuses on modeling the temporal changes through a state space model and relies on the Kalman filter for inference. Contrary to Yang et al. (2011), Xu and Hero III (2013) treated the edge probabilities as time varying parameters. In parallel, the mixed membership SBM (MMSBM) of Airoldi et al. (2008), capable of characterizing overlapping clusters, was adapted to deal with dynamic networks by Xing et al. (2010), Ho et al. (2011) and Kim and Leskovec (2013). Moreover, Sarkar and Moore (2005) derived a dynamic version of the LPCM model of Handcock et al. (2007), keeping the transitivity property that nodes which are close in a latent social space should be more likely to connect. Finally, we would like to highlight the work of Dubois et al. (2013) and Heaukulani and Ghahramani (2013). In Dubois et al. (2013), a non-homogeneous Poisson process is considered. Thus, contrary to most clustering models for dynamic networks, a continuous time period is taken into account and events, i.e. the creation or removal of an edge, occur one at a time. While models usually focus on modeling the dynamics of networks through the evolution of their latent structures, Heaukulani and Ghahramani (2013) extended the dynamic latent feature model of Foulds et al. (2011) to describe how observed social interactions can affect future unobserved latent structures. In the same vein, a dynamic model inspired by SBM was proposed recently by Xu (2015).

In this paper, we aim at modeling dynamic networks with binary or, more generally, typed edges, for which a partition of the nodes is given. As an example, we will consider an original network, built from printed Lloyd’s voyage records, describing maritime flows between ports, where the geographical positions of the ports play an important role. The partition was obtained by associating each port with a region according to its geographical position. Figure 1 presents the evolution of the ship connections over the period from October 1890 to October 2008. A (given) partition of the nodes is seen here as a decomposition of the network into known subgraphs that we propose to model using unobserved clusters, which have to be inferred from the data in practice. Thus, considering a slightly different version of the original RSM model of Jernite et al. (2014) and relying on a state space model as in Xing et al. (2010), we propose a new random graph model for evolving networks that we call the dynamic RSM (dRSM) model. The model focuses on describing the network dynamics by characterizing the evolution of the cluster proportions within the known subgraphs. A logistic transformation is used to link the hidden states and the cluster proportions, as in Blei and Lafferty (2007a) and Ahmed and Xing (2007). The inference of the model is done using a VEM algorithm.

The article is organized as follows. In Sect. 2, we introduce the dRSM model; the inference procedure is detailed in Sect. 3, where variational techniques are considered and a model selection criterion is derived. Finally, the methodology is tested on simulated data in Sect. 4 and on the maritime network built from Lloyd’s data in Sect. 5.
Fig. 1

Connections between a subset of 26 ports (from October 1890 to October 2008). Data extracted from Lloyd’s list. The known subgraphs correspond to geographical regions (continents) indicated using colors (color figure online)

2 The dynamic random subgraph model

This section presents the context of the work and introduces the dRSM model along with the modeling of its dynamic. The joint distribution associated with the model is also detailed.

2.1 Context and notations

We consider a set of T networks \(\lbrace {\mathcal {G}}^{(t)}\rbrace _{t=1}^{T}\), where \({\mathcal {G}}^{(t)}\) is a directed graph observed at time t. Each \({\mathcal {G}}^{(t)}\) is represented by its \(N \times N\) adjacency matrix \(X^{(t)}\), where N denotes the number of nodes. The edge \(X_{ij}^{(t)}\), describing the relationship between nodes i and j, is assumed to take its values in \(\lbrace 0,\ldots ,C\rbrace \), such that \(X_{ij}^{(t)} = c\) means that nodes i and j are linked by a relationship of type c at time t, and \(X_{ij}^{(t)} = 0\) indicates the absence of a relationship between the two nodes at time t. Note that no self loops, i.e. connections of a node to itself, are considered, thus \(X_{ii}^{(t)}=0,\,\forall \, i, t\).

Moreover, a partition \({\mathcal {P}}\) of the network into S classes of vertices is assumed to be given. We emphasize that the observed partition induces a decomposition of the graph into subgraphs, where each class of vertices corresponds to a specific subgraph. To describe the subgraph membership of each vertex, the variable s is introduced; it takes its values in \(\lbrace 1,\ldots ,S\rbrace \) and is such that \(s_{i}\) indicates the subgraph of vertex i. In some cases, and in order to clarify the equations, we will also consider the indicator variables \(y_{is}\) such that \(y_{is}=1\) if node i is in subgraph s, and 0 otherwise. Finally, because vertex i can only belong to a single subgraph, we have \(\sum _{s=1}^{S}y_{is}=1\).

Our goal is to cluster, at each time t, the N nodes into K latent groups with homogeneous connection profiles, i.e. to find an estimate of the set Z of latent variables \(Z_{ik}^{(t)}\) such that \(Z_{ik}^{(t)} = 1\) if, at time t, node i belongs to cluster k, and 0 otherwise. Please note that N, C, \({\mathcal {P}}\), S and K are all assumed to be constant over time.
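For concreteness, the quantities above can be laid out as plain arrays. The following minimal sketch (our own illustration, with 0-based indices and the dimensions used in Sect. 4; none of these names come from the paper) fixes the data structures reused in the later sketches:

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, K, S, C = 10, 300, 4, 2, 1      # dimensions of the experiments in Sect. 4

# X[t, i, j] in {0, ..., C}: type of the edge from i to j at time t (0 = no edge)
X = np.zeros((T, N, N), dtype=int)

# s[i] in {0, ..., S-1}: known subgraph of vertex i (observed, not inferred)
s = rng.integers(0, S, size=N)
y = np.eye(S, dtype=int)[s]           # indicator variables y_{is}

# Z[t, i, k] = 1 iff vertex i belongs to the latent cluster k at time t
Z = np.zeros((T, N, K), dtype=int)
```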

2.2 The model at each time t

As in the original RSM model, the (known) subgraphs are assumed to be built from K unobserved clusters of vertices, with varying proportions. Thus, each subgraph s has its own mixing proportion vector \(\alpha _{s}^{(t)}=(\alpha _{s1}^{(t)},\ldots ,\alpha _{sK}^{(t)})\) where \(\alpha _{sk}^{(t)}\) is the proportion of cluster k in subgraph s at time t and \(\,\sum _{k=1}^{K}\alpha _{sk}^{(t)}=1, \, \forall \, s,t\). The network is then assumed to be generated at each time t as follows.

Each vertex i is first associated with a latent cluster k with a probability depending on its subgraph \(s_i\). In practice, the variable \(Z_{i}^{(t)}\) is drawn from a multinomial distribution of parameter \(\alpha _{s_{i}}^{(t)}\):
$$\begin{aligned} Z_{i}^{\left( t\right) }\sim {\mathcal {M}}\left( 1,\alpha _{s_{i}}^{\left( t\right) }\right) , \end{aligned}$$
and therefore \(\sum _{k=1}^{K}Z_{ik}^{(t)}=1\). Note that \(Z_{ik}^{(t)}=1\) indicates that vertex i belongs to cluster k at time t, 0 otherwise.
On the other hand, the type of link between nodes i and j is assumed to be sampled from a multinomial distribution depending on the latent vectors \(Z_{i}^{(t)}\) and \(Z_{j}^{(t)}\):
$$\begin{aligned} X_{ij}^{(t)}|Z_{ik}^{(t)}Z_{jl}^{(t)}=1\sim {\mathcal {M}}(1,\Pi _{kl}), \end{aligned}$$
with \(\Pi _{kl}\in [0,1]^{C+1}\) and \(\sum _{c=0}^{C}\Pi ^c_{kl}=1,\forall k,l\).
As in the RSM model, and more generally in SBM-like models, all vectors \(Z_{i}^{(t)}\) are sampled independently and, conditionally on these membership vectors, the edges are assumed to be independent. However, contrary to the original RSM model, the edges depend directly on the latent clusters exclusively; there is no direct dependency on the subgraphs (see Fig. 3). Each edge between a pair (i, j) of vertices does depend on the subgraphs \(s_i\) and \(s_j\), but only through the fact that the edge depends on the latent clusters of the vertices, which themselves depend on the subgraphs. The dependency is thus indirect, while in the original RSM model the latent clusters along with the subgraphs are all involved in the creation of edges and have different roles. Indeed, in the original model, the presence or absence of an edge between (i, j) is first drawn from a Bernoulli distribution depending on \(s_i\) and \(s_j\); if an edge is present, its type is then sampled depending on the latent clusters. This separation of roles between the latent clusters and the subgraphs was originally motivated by assumptions regarding the nature of the networks analyzed. We do not make such assumptions in this paper: the latent clusters explain both the creation of an edge and its type.
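The generative mechanism at a fixed time t can be summarized by the following sketch (a hedged illustration of the model above; the function name and array layout are ours):

```python
import numpy as np

def sample_snapshot(alpha_t, Pi, s, rng):
    """Sample (Z^(t), X^(t)) given the mixing proportions alpha_t (S x K),
    the connection tensor Pi (K x K x (C+1)) and the subgraph labels s (N,)."""
    N, K = s.shape[0], alpha_t.shape[1]
    # each vertex draws its cluster from the proportions of its subgraph
    z = np.array([rng.choice(K, p=alpha_t[s[i]]) for i in range(N)])
    # each ordered pair draws an edge type (0 = no edge) from Pi_{z_i z_j}
    X_t = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(N):
            if i != j:
                X_t[i, j] = rng.choice(Pi.shape[2], p=Pi[z[i], z[j]])
    return z, X_t
```

Combined with the proportions \(\alpha ^{(t)}\) produced by the dynamic layer of Sect. 2.3 below, repeated calls to this function for \(t=1,\ldots ,T\) generate a full dRSM network.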
Fig. 2

A dRSM network observed at time t. The network is made of 9 nodes belonging to \(S=2\) subgraphs (denoted through the form of the nodes) and split into \(K=3\) clusters (indicated by the colors). According to the dRSM model, the directed edges between the nodes can be of different types (\(C=2\) types are considered here). Given the clusters, the presence of an edge depends on the connection probabilities between clusters (\(\Pi \)) (color figure online)

Figure 2 presents an example of a dRSM network, observed at time t, made of 9 nodes belonging to 2 subgraphs (denoted through the form of nodes) and split into 3 clusters (indicated by the colors).

2.3 Modeling the evolution of random subgraphs

In order to model the evolution of the cluster proportions within the subgraphs through time, a state space model is considered as in Xing et al. (2010). Thus, the latent variable \( \gamma _{s}^{(t)} \) is introduced and a logistic transformation \(f(\cdot )\) is used to link the mixing vector \(\alpha ^{(t)}_{s}\) with \( \gamma _{s}^{(t)} \):
$$\begin{aligned} \alpha _{s}^{(t)}=f(\gamma _{s}^{(t)}), \end{aligned}$$
such that
$$\begin{aligned} \alpha _{sk}^{(t)} = f_{k}(\gamma _{s}^{(t)})=\exp (\gamma _{sk}^{(t)} - C(\gamma _s^{(t)})), \forall s,k,t, \end{aligned}$$
where \(\gamma _{sK}^{(t)} = 0\) and \(C(\gamma _s^{(t)})= \log (\sum _{k=1}^{K}\exp (\gamma _{sk}^{(t)}))\). The choice to fix the last component of the vector \(\gamma _s^{(t)}\) arbitrarily to 0 is widely used in the literature (see for instance Blei and Lafferty 2007a; Lafferty and Blei 2006; Blei and Lafferty 2007b; Xing et al. 2010) and is due to the bijectivity constraint of this logistic transformation, which requires \(\gamma _s^{(t)}\) to live in a (\(K-1\)) dimensional space since \(\alpha _{s}^{(t)}\) has (\(K-1\)) degrees of freedom. This induces that \(\gamma _{sk}^{(t)} = \log (\alpha _{sk}^{(t)}/\alpha _{sK}^{(t)}),\forall s,k,t\). In addition, the (\(K-1\)) first components of the vector \(\gamma _{s}^{(t)}\) are assumed to be distributed according to a Gaussian distribution with mean \(B\nu ^{(t)}\) and covariance matrix \(\Sigma \):
$$\begin{aligned} \gamma _{s \backslash K}^{(t)} \sim {\mathcal {N}}(B\nu ^{(t)},\Sigma ), \end{aligned}$$
(1)
where \(\gamma _{s \backslash K}^{(t)}\) is the vector \(\gamma _s^{(t)}\) without its last component. Both \(\Sigma \) and B are matrices of size \((K-1) \times (K-1)\) while \(\nu ^{(t)}\) is a \((K-1)\) dimensional vector. Let us notice that even though the \(\gamma _{s}^{(t)}\) share the same mean in the state space, they are sampled independently and can thus play different roles.
The rest of the model now involves a classic state space model for linear dynamic systems. It is defined as follows:
$$\begin{aligned} \left\{ \begin{array}{l} \nu ^{(t)}=A\nu ^{(t-1)}+\omega \\ \nu ^{(1)}=\mu _{0}+u. \end{array}\right. \end{aligned}$$
The noise terms \(\omega \) and u are supposed to be Gaussian and independent:
$$\begin{aligned} \left\{ \begin{array}{l} \omega \sim {\mathcal {N}}(0,\Phi )\\ u\sim {\mathcal {N}}(0,V_{0}). \end{array}\right. \end{aligned}$$
Again, A, \(\Phi \) and \(V_0\) are matrices of size \((K-1)\times (K-1) \) while \(\mu _0\) is a \((K-1)\) dimensional vector.
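A short simulation sketch of this dynamic layer, under the constraints used in our experiments (\(A=B=V_0=I_{K-1}\), \(\mu _0=0\); the function name is ours and \(\Sigma \), \(\Phi \) are \((K-1)\times (K-1)\) covariance matrices), reads:

```python
import numpy as np

def simulate_dynamics(T, S, K, Sigma, Phi, rng):
    """Simulate nu^(t), gamma_s^(t) and alpha_s^(t) = f(gamma_s^(t)),
    with A = B = V_0 = I_{K-1} and mu_0 = 0 (the constraints of Sect. 2.3)."""
    d = K - 1
    nu = np.zeros((T, d))
    nu[0] = rng.multivariate_normal(np.zeros(d), np.eye(d))  # nu^(1) = mu_0 + u
    for t in range(1, T):
        nu[t] = nu[t - 1] + rng.multivariate_normal(np.zeros(d), Phi)
    gamma = np.zeros((T, S, K))               # last component fixed to 0
    for t in range(T):
        for s in range(S):
            gamma[t, s, :d] = rng.multivariate_normal(nu[t], Sigma)
    # logistic transform: alpha_sk^(t) = exp(gamma_sk^(t) - C(gamma_s^(t)))
    alpha = np.exp(gamma) / np.exp(gamma).sum(axis=2, keepdims=True)
    return nu, gamma, alpha
```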
Fig. 3

Graphical representation of the dRSM model

Table 1

Summary of the notations used in the paper

Notation | Description
X | Adjacency matrices; \(X_{ij}^{(t)} \in \{0,\ldots ,C\}\) at each time t
Z | Binary matrices; \(Z_{ik}^{(t)}=1\) indicates that i belongs to cluster k at time t
N | Number of vertices in the network
K | Number of latent clusters
S | Number of subgraphs
C | Number of edge types
\(\Pi \) | \(\Pi ^c_{kl}\) is the probability of having an edge of type c between vertices of clusters k and l
\(\alpha \) | \(\alpha _{sk}^{(t)}=f_k(\gamma _s^{(t)})\) is the proportion of cluster k in subgraph s at time t

Notice that the state space model for linear dynamic systems may suffer from model identifiability issues and constraints have to be introduced (see for instance Harvey 1989). In the following, we derive the inference procedure in a general context since different constraints can be considered. In practice, in all the experiments that we carried out, we fixed A, B, and \(V_0\) to be equal to the identity matrix \(I_{K-1}\) and all components of \(\mu _{0}\) to zero.

The model described here has three sets of latent variables (\(\nu =(\nu ^{(t)})_t,\gamma =(\gamma _s^{(t)})_{st},Z=(Z_{ik}^{(t)})_{ikt}\)) and is parameterized by \(\theta =(\mu _{0},A,B,\Phi ,V_{0},\Sigma ,\Pi )\). Note that all parameters in \( \theta \) depend neither on time nor subgraphs. This model is called the dynamic random subgraph model (dRSM) in the rest of the document. Figure 3 gives the graphical model for dRSM and Table 1 summarizes the notations used in the model.

At this point, it is possible to see some links and differences between dRSM and dM3SBM (Ho et al. 2011), which is the closest model in the literature. On the one hand, dRSM and dM3SBM share a common way of modeling the latent clusters and the temporal dynamics through a state space model. On the other hand, dRSM is able to handle categorical edges, a useful feature when working on real-world networks, whereas dM3SBM cannot. In addition, dRSM requires the knowledge of the subgraphs whereas dM3SBM proposes to estimate them. Furthermore, dM3SBM allows the nodes to belong to several clusters. However, estimating the subgraphs and allowing multi-group memberships may make dM3SBM too flexible a model, causing it to fail to recover the network structure. Indeed, providing the subgraphs to dRSM allows it to avoid looking for obvious structures, so that it can focus on the search for hidden patterns. The comparisons presented in Sect. 4 support this interpretation.

2.4 Joint distribution of dRSM

The dRSM model proposed above is defined by the joint distribution:
$$\begin{aligned} p\left( X,Z,\gamma ,\nu |\theta \right) = p\left( X|Z,\Pi \right) p\left( Z|\gamma \right) p\left( \gamma _{\backslash K} | B,\nu ,\Sigma \right) p\left( \nu |\mu _{0},A,\Phi ,V_{0}\right) , \end{aligned}$$
(2)
where \(\gamma _{\backslash K}=(\gamma _{s\backslash K}^{(t)})_{st}\). Moreover
$$\begin{aligned} p\left( X|Z,\Pi \right) =\prod _{t=1}^{T}\prod _{k,l}^{K}\prod _{c=0}^{C}\left( \Pi _{kl}^{c}\right) ^{\sum _{i\ne j}^{N}\delta \left( X_{ij}^{\left( t\right) }=c\right) Z_{ik}^{\left( t\right) }Z_{jl}^{\left( t\right) }}, \end{aligned}$$
and
$$\begin{aligned} \begin{aligned} p\left( Z|\gamma \right)&= \prod _{t=1}^{T}\prod _{i=1}^{N}\prod _{k=1}^{K} f_k\left( \gamma _{s_{i}}^{\left( t\right) }\right) ^{Z_{ik}^{\left( t\right) }}\\&= \prod _{t=1}^{T}\prod _{k=1}^{K} \prod _{s=1}^{S} f_k\left( \gamma _{s}^{\left( t\right) }\right) ^{\sum _{i=1}^{N}y_{is}Z_{ik}^{\left( t\right) }}. \end{aligned} \end{aligned}$$
(3)
Note that
$$\begin{aligned} p\left( \gamma _{\backslash K}|B, \nu ,\Sigma \right) =\prod _{t=1}^{T}\prod _{s=1}^{S}{\mathcal {N}}\left( \gamma _{s \backslash K}^{\left( t\right) }; B\nu ^{\left( t\right) },\Sigma \right) , \end{aligned}$$
where \({\mathcal {N}}(\gamma _{s \backslash K}^{(t)};B\nu ^{(t)},\Sigma )\) denotes the multivariate Gaussian distribution, with mean vector \(B\nu ^{(t)}\) and covariance matrix \(\Sigma \), evaluated at \(\gamma _{s \backslash K}^{(t)}\). Finally
$$\begin{aligned} p\left( \nu |\mu _{0},A,\Phi ,V_{0}\right) =p\left( \nu ^{\left( 1\right) }|\mu _{0},V_{0}\right) \prod _{t=2}^{T} p\left( \nu ^{\left( t\right) }|\nu ^{\left( t-1\right) },A,\Phi \right) . \end{aligned}$$

3 Estimation

This section focuses on the inference of the model proposed above. A variational EM algorithm is considered and a model selection criterion is derived.

3.1 A variational framework

We aim at maximizing the log-likelihood \(\log p(X| \theta )\) associated with the model. To achieve this maximization, a common approach consists in using an expectation maximization (EM) algorithm (Dempster et al. 1977; Krishnan and McLachlan 1997). However, such an algorithm cannot be derived here since \(p(Z,\gamma ,\nu |X,\theta )\) is intractable. Therefore, we propose to use a variational EM-type algorithm (VEM; Hathaway 1986) which locally optimizes the model parameters with respect to a lower bound of the log-likelihood. Thus, given a distribution q for the three sets of latent variables \((Z,\gamma ,\nu )\), the log-likelihood can be written:
$$\begin{aligned} \log p(X|\theta )={\mathcal {L}}(q,\theta )+KL(q(.)\parallel p(.|X,\theta )), \end{aligned}$$
(4)
where \({\mathcal {L}}\) is defined as follows:
$$\begin{aligned} {\mathcal {L}}(q,\theta )=\sum _{Z}\int _{\gamma }\int _{\nu }q(Z,\gamma ,\nu )\log \dfrac{p(X,Z,\gamma ,\nu |{{\theta }})}{q(Z,\gamma ,\nu )}d\gamma \, d\nu , \end{aligned}$$
(5)
and KL denotes the Kullback-Leibler divergence between the true and approximate posterior distributions:
$$\begin{aligned} KL(q(.)\parallel p(.|X,\theta ))=-\,\sum _{Z}\int _{\gamma }\int _{\nu }q(Z,\gamma ,\nu )\log \dfrac{p(Z,\gamma ,\nu |X,\theta )}{q(Z,\gamma ,\nu )}d\gamma \, d\nu . \end{aligned}$$
(6)
Looking for the best approximation of the posterior distribution \(p(Z,\gamma ,\nu |X,\theta )\) in the sense of the KL divergence is then equivalent to searching for a distribution \(q(\cdot )\) that maximizes the lower bound \({\mathcal {L}}\) of the integrated log-likelihood. Unfortunately, because the joint distribution (2) in the lower bound involves the quantity \(p(Z|\gamma )\), which depends on the normalizing constant \(C(\gamma _{s}^{(t)})\), \({\mathcal {L}}\) has no analytical form and cannot be optimized with respect to \(q(\cdot )\). Indeed, \(C(\gamma _{s}^{(t)})=\log (\sum _{l=1}^{K}\exp (\gamma _{sl}^{(t)}))\) is based on a nonlinear transformation of the vector \(\gamma _s^{(t)}\), which makes some of the expectations required by the standard VEM algorithm impossible to derive.

Following the work of Lafferty and Blei (2006) on correlated topic models, we propose a new bound of \({\mathcal {L}}(q,\theta )\) based on a variational lower bound of \(p(Z|\gamma )\), as in Jordan et al. (1999).

Proposition 3.1

(Proof in Appendix 1) Given any set \(\xi \) of variational parameters \(\xi _{s}^{(t)}\in {\mathbb {R}}_{+}^{*}\), a lower bound of the first lower bound \({\mathcal {L}}(q,\theta )\) is given by:
$$\begin{aligned} \log p(X|\theta )\geqslant {\mathcal {L}}(q,\theta )\geqslant \tilde{{\mathcal {L}}}(q,\theta ,\xi ), \end{aligned}$$
where
$$\begin{aligned}&\tilde{{\mathcal {L}}}(q,\theta ,\xi )\nonumber \\&\quad =\sum _{Z}\int _{\gamma }\int _{\nu }q(Z,\gamma ,\nu )\log \dfrac{p(X|Z,\Pi )h(Z,\gamma ,\xi )p(\gamma _{\backslash K}|B,\nu ,\Sigma )p(\nu |\mu _{0},A,\Phi ,V_{0})}{q(Z,\gamma ,\nu )} d\gamma \, d\nu \end{aligned}$$
(7)
with
$$\begin{aligned}&\log h(Z,\gamma ,\xi )\\&\quad = \sum _{t=1}^{T}\sum _{k=1}^{K}\sum _{i=1}^{N} \sum _{s=1}^{S} y_{is}Z_{ik}^{(t)}\Big (\gamma _{sk}^{(t)}-\big (\xi _{s}^{-1(t)}\sum _{l=1}^{K}\exp \big (\gamma _{sl}^{(t)}\big )-1+\log (\xi _{s}^{(t)})\big )\Big ) . \end{aligned}$$

Note that the variational parameters \(\xi _{s}^{(t)}\) can be optimized to obtain tight bounds (see the end of Sect. 3.2). Moreover, we emphasize that a variational parameter \(\xi _{s}^{(t)}\) is considered for each subgraph s and each time t for more flexibility and to improve the inference procedure. We point out that the quality of the variational approximation we propose cannot be tested analytically since \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\) and the Kullback-Leibler divergence in (6) are not tractable. Nevertheless, we rely on them for inference purposes. Note that similar approximation schemes have been used for instance by Bishop and Svensén (2003) and Latouche et al. (2014), in the context of model selection.
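The key inequality behind Proposition 3.1 is the classical bound \(\log \sum _{l}\exp (\gamma _{sl}^{(t)}) \leqslant \xi ^{-1}\sum _{l}\exp (\gamma _{sl}^{(t)})-1+\log \xi \), valid for any \(\xi >0\). A quick numerical check (our own illustration, not from the paper) shows the bound and its tightness:

```python
import numpy as np

gamma_s = np.array([0.5, -1.2, 2.0, 0.0])    # some gamma_s^(t), last entry 0
C_gamma = np.log(np.exp(gamma_s).sum())      # C(gamma_s^(t)) = log sum exp

for xi in (1.0, 5.0, np.exp(gamma_s).sum(), 50.0):
    bound = np.exp(gamma_s).sum() / xi - 1.0 + np.log(xi)
    print(f"xi = {xi:7.3f}  bound = {bound:.4f}  >=  C(gamma) = {C_gamma:.4f}")
# equality holds exactly at xi = sum_l exp(gamma_sl), the tightest choice
```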

In order to maximize \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\), we further assume that \(q(Z,\gamma ,\nu )\) can be factorized:
$$\begin{aligned} q(Z,\gamma ,\nu )&=q(Z)q(\gamma )q(\nu ) =\Big (\prod _{t=1}^{T}\prod _{i=1}^{N}q\big (Z_{i}^{(t)}\big )\Big )q(\gamma ) q(\nu ). \end{aligned}$$
Finally \(q(\gamma )\) is chosen within the family of Gaussian distributions of the form:
$$\begin{aligned} q(\gamma )=\prod _{t=1}^{T}\prod _{s=1}^{S}\prod _{k=1}^{K}{\mathcal {N}}\left( \gamma _{sk}^{(t)};\hat{\gamma }_{sk}^{(t)},\hat{\sigma }_{sk}^{2^{(t)}}\right) , \end{aligned}$$
to derive analytical expectations in the E step, as in Lafferty and Blei (2006). Since the last component of each vector \(\gamma _{s}^{(t)}\) has to remain equal to zero, to preserve the bijectivity constraints of the transformation \(f(\cdot )\), the terms \(\hat{\gamma }_{sK}^{(t)}\) and \(\hat{\sigma }_{sK}^{2^{(t)}}\) are all set to zero to ensure a Dirac mass at zero. All other mean and variance terms \((\hat{\gamma }_{sk}^{(t)},\hat{\sigma }_{sk}^{2^{(t)}}),\forall s, k\ne K, t\), are parameters to be estimated.

3.2 A VEM algorithm for the dRSM model

In this section, we first assume that the variational terms \(\xi \), which were introduced for approximation purposes, are given. This allows the use of a VEM algorithm (Jordan et al. 1999) to maximize the lower bound \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\) with respect to \(q(Z,\gamma ,\nu )\) and the model parameters \(\theta \). Such an optimization procedure is iterative and involves a series of successive updates. In the E step, the model parameters are fixed and the lower bound is optimized with respect to \(q(Z,\gamma ,\nu )\). Conversely, during the M step, the variational distribution is held fixed while \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\) is maximized with respect to \(\theta \). In standard VEM algorithms, a unique set of latent variables is usually considered. In our case, there are three sets \((Z, \gamma , \nu )\) of latent variables and therefore the E step itself involves iterative updates (as in Latouche et al. 2014, for instance). All distributions in \(q(Z, \gamma , \nu )\) are held fixed, except one, which is optimized. This procedure is repeated for all distributions in turn.

In the following, we give the update formulae for the E and M steps. The details of the calculations along with the derivation of the lower bound are given in the “Appendices 1, 2 and 3”.

Proposition 3.2

The VEM update step for each distribution \( q (Z_{i}^{(t)}) \) is given by:
$$\begin{aligned} q(Z_{i}^{(t)})\sim {\mathcal {M}}\left( Z_{i}^{(t)};1,\tau _{i}^{(t)}=\left( \tau _{i1}^{(t)},\dots ,\tau _{iK}^{(t)}\right) \right) \,\forall i,t, \end{aligned}$$
where
$$\begin{aligned}&\tau _{ik}^{(t)}\propto \exp \Bigg (\sum _{l=1}^{K}\sum _{c=0}^{C}\sum _{i\ne j}^{N}\delta \Big (X_{ij}^{(t)}=c\Big )\tau _{jl}^{(t)}\Big [\log \Big (\Pi _{kl}^{c}\Big )+\log \Big (\Pi _{lk}^{c}\Big )\Big ]\\&\quad +\sum _{s=1}^{S} y_{is} \Big ( \hat{\gamma }_{sk}^{(t)}-\Big (\xi _{s}^{-1(t)}\sum _{l=1}^{K}\exp \Big (\hat{\gamma }_{sl}^{(t)}+\dfrac{\hat{\sigma }_{sl}^{2^{(t)}}}{2}\Big )-1+\log \Big (\xi _{s}^{(t)}\Big )\Big )\Big )\Bigg ). \end{aligned}$$

Note that \(\tau _{ik}^{(t)}\) is the approximate posterior probability that node i belongs to cluster k at time t.
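A direct, unoptimized transcription of this fixed-point update might look as follows. This is our own sketch; in particular, for a directed network we keep the two directed contributions of an ordered pair explicit (the printed formula collapses them when X is symmetric):

```python
import numpy as np

def update_tau(X_t, tau_t, log_Pi, gamma_hat_t, sig2_hat_t, xi_t, s):
    """One fixed-point update of the variational probabilities tau^(t)
    (Proposition 3.2). X_t: (N, N) edge types; log_Pi: (K, K, C+1) log of
    the connection tensor; gamma_hat_t, sig2_hat_t: (S, K) variational
    means and variances; xi_t: (S,) variational parameters; s: (N,) labels."""
    N, K = tau_t.shape
    log_tau = np.zeros((N, K))
    for i in range(N):
        si = s[i]
        # bound term of Proposition 3.1 for subgraph s_i
        bound = (np.exp(gamma_hat_t[si] + sig2_hat_t[si] / 2).sum() / xi_t[si]
                 - 1.0 + np.log(xi_t[si]))
        for k in range(K):
            acc = 0.0
            for j in range(N):
                if j == i:
                    continue
                # contributions of the edges i -> j and j -> i
                acc += tau_t[j] @ (log_Pi[k, :, X_t[i, j]] + log_Pi[:, k, X_t[j, i]])
            log_tau[i, k] = acc + gamma_hat_t[si, k] - bound
    log_tau -= log_tau.max(axis=1, keepdims=True)   # numerical stability
    tau = np.exp(log_tau)
    return tau / tau.sum(axis=1, keepdims=True)
```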

Proposition 3.3

The VEM update step for the distribution \( q (\nu ) \) is given by:
$$\begin{aligned} q(\nu )\propto p(\nu ^{(1)}|\mu _{0},V_{0})\Big [\prod _{t=2}^{T}p(\nu ^{(t)}|\nu ^{(t-1)},A,\Phi )\Big ]\Big [\prod _{t=1}^{T}{\mathcal {N}}\Big (\dfrac{\sum _{s=1}^{S}\hat{\gamma }_{s}^{(t)}}{S};B\nu ^{(t)},\dfrac{\Sigma }{S}\Big )\Big ]. \end{aligned}$$
At this step, we recall that the terms \(\hat{\gamma }_{s}^{(t)}\) are fixed and so is the variable \(x^{(t)}=\sum _{s=1}^{S}\hat{\gamma }_{s}^{(t)}/S\). It is then worth noting that the functional form of \(q(\nu )\) corresponds exactly to the form of the posterior distribution associated with a state space model where \(\nu \) is the set of all latent state variables and \(x=(x^{(t)})_t\) the set of observed outputs. Thus, each \(x^{(t)}\) can be written as \(x^{(t)}=B\nu ^{(t)}+\tilde{v}\) where \(\tilde{v}\sim {\mathcal {N}}(0,\Sigma /S)\), while the variables in \(\nu \) are defined as previously:
$$\begin{aligned} \left\{ \begin{array}{l} \nu ^{(t)}=A\nu ^{(t-1)}+\omega \\ \nu ^{(1)}=\mu _{0}+u, \end{array}\right. \end{aligned}$$
with
$$\begin{aligned} \left\{ \begin{array}{l} \omega \sim {\mathcal {N}}(0,\Phi )\\ u\sim {\mathcal {N}}(0,V_{0}).\\ \end{array}\right. \end{aligned}$$
Contrary to the original state space model introduced in Sect. 2, where both \(\gamma \) and \(\nu \) were sets of unobserved variables, we obtain here a standard linear dynamic system from which the corresponding parameters, i.e. \(\theta ^{'}=(\mu _{0}, A, B, \Phi , V_{0}, \Sigma /S)\), can be estimated using the Kalman filter and the Rauch–Tung–Striebel (RTS) smoother (Rauch et al. 1965; details can also be found in Minka 1998). The expectations \(\hat{\nu }^{(t)}\) and covariance matrices \(\hat{V}^{(t)}\) of the random variables \(\nu ^{(t)}\), given all the observed data x, are determined through forward–backward recursions.
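For completeness, a minimal Kalman filter and RTS smoother for this derived system might look as follows (our own sketch, not the authors' code; \(R=\Sigma /S\) plays the role of the observation noise covariance):

```python
import numpy as np

def kalman_rts(x, A, B, mu0, V0, Phi, R):
    """Kalman filter and Rauch-Tung-Striebel smoother for the system of
    Proposition 3.3: nu^(t) = A nu^(t-1) + w, x^(t) = B nu^(t) + v, with
    w ~ N(0, Phi), v ~ N(0, R). Returns smoothed means and covariances."""
    T, d = x.shape
    mu_f = np.zeros((T, d)); V_f = np.zeros((T, d, d))   # filtered
    mu_p = np.zeros((T, d)); V_p = np.zeros((T, d, d))   # predicted
    for t in range(T):
        if t == 0:
            mu_p[t], V_p[t] = mu0, V0                    # nu^(1) = mu_0 + u
        else:
            mu_p[t] = A @ mu_f[t - 1]
            V_p[t] = A @ V_f[t - 1] @ A.T + Phi
        S_t = B @ V_p[t] @ B.T + R                       # innovation covariance
        K_t = V_p[t] @ B.T @ np.linalg.inv(S_t)          # Kalman gain
        mu_f[t] = mu_p[t] + K_t @ (x[t] - B @ mu_p[t])
        V_f[t] = V_p[t] - K_t @ B @ V_p[t]
    mu_s = mu_f.copy(); V_s = V_f.copy()                 # backward (RTS) pass
    for t in range(T - 2, -1, -1):
        J_t = V_f[t] @ A.T @ np.linalg.inv(V_p[t + 1])   # smoother gain
        mu_s[t] = mu_f[t] + J_t @ (mu_s[t + 1] - A @ mu_f[t])
        V_s[t] = V_f[t] + J_t @ (V_s[t + 1] - V_p[t + 1]) @ J_t.T
    return mu_s, V_s
```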

Proposition 3.4

After the E step of the VEM algorithm, the lower bound \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\) simplifies into:
$$\begin{aligned}&\tilde{{\mathcal {L}}}(q,\theta ,\xi ) \\&\quad = \sum _{t=1}^{T}\sum _{k,l}^{K}\sum _{c=0}^{C}\sum _{i\ne j}^{N}\delta \Big (X_{ij}^{(t)}=c\Big )\tau _{ik}^{(t)}\tau _{jl}^{(t)}\log \Big (\Pi _{kl}^{c}\Big )\\&\qquad + \sum _{t=1}^{T}\sum _{s=1}^{S}\Big (\sum _{k=1}^{K}r_{sk}^{(t)}\hat{\gamma }_{sk}^{(t)}-N_{s}\xi _{s}^{-1(t)}\sum _{l=1}^{K}\exp \Big (\hat{\gamma }_{sl}^{(t)}+\dfrac{\hat{\sigma }_{sl}^{2^{(t)}}}{2}\Big ) +N_{s}-N_{s}\log (\xi _{s}^{(t)})\Big )\\&\qquad + \sum _{t=1}^{T}\sum _{s=1}^{S}\Big (\log {\mathcal {N}}(\hat{\gamma }_{s}^{(t)};B\hat{\nu }^{(t)},\Sigma )-\dfrac{1}{2}tr(\Sigma ^{-1}B^{\intercal }\hat{V}^{(t)}B)-\dfrac{1}{2}tr\Big (\Sigma ^{-1}\hat{\sigma }_{s}^{(t)^{2}}\Big )\Big )\\&\qquad + \sum _{t=1}^{T}\sum _{s=1}^{S}\sum _{k=1}^{K-1}\log \Big ( (2\pi )^{\frac{1}{2}}\hat{\sigma }_{sk}^{(t)}\Big )+\dfrac{TKS}{2}\\&\qquad - \sum _{t=1}^{T}\Big (\log {\mathcal {N}}\Big (x^{(t)};B\hat{\nu }^{(t)},\frac{\Sigma }{S}\Big )+\dfrac{1}{2}tr\Big (\Sigma ^{-1}SB^{\intercal }\hat{V}^{(t)}B\Big )\Big )\\&\qquad - \sum _{i=1}^{N}\sum _{t=1}^{T}\sum _{k=1}^{K}\tau _{ik}^{(t)}\log \Big (\tau _{ik}^{(t)}\Big )\\&\qquad + \log p(x|\theta ^{'}) \end{aligned}$$
where \(r_{sk}^{(t)}=\sum _{i=1}^{N}\tau _{ik}^{(t)}y_{is}\), \(N_{s}\) is the number of nodes in subgraph s, and \(\log p(x|\theta ^{'})\) is the log-likelihood of the linear dynamic system associated with the variational distribution \(q(\nu )\) (see Proposition 3.3).
The maximization of this bound allows to obtain the updating formula for the tensor matrix \(\Pi \):
$$\begin{aligned} \hat{\Pi }_{kl}^{c}=\dfrac{\sum _{t=1}^{T}\sum _{i\ne j}^{N}\delta (X_{ij}^{(t)}=c)\tau _{ik}^{(t)}\tau _{jl}^{(t)}}{\sum _{t=1}^{T}\sum _{c=0}^{C}\sum _{i\ne j}^{N}\delta (X_{ij}^{(t)}=c)\tau _{ik}^{(t)}\tau _{jl}^{(t)}},\forall k,l,c. \end{aligned}$$
For the parameters \(\hat{\gamma }_{sk}^{(t)}\) and \(\hat{\sigma }_{sk}^{2^{(t)}}\), we do not obtain analytical expressions, and therefore we rely on a quasi-Newton algorithm for the optimization task.
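The update of \(\Pi \) is a simple weighted counting step. A vectorized sketch (ours, not the authors' code) is:

```python
import numpy as np

def update_Pi(X, tau, C):
    """M-step update of the connection tensor Pi (Proposition 3.4).
    X: (T, N, N) edge types; tau: (T, N, K) variational probabilities."""
    T, N, K = tau.shape
    num = np.zeros((K, K, C + 1))
    off = ~np.eye(N, dtype=bool)                   # exclude self loops
    for t in range(T):
        for c in range(C + 1):
            mask = ((X[t] == c) & off).astype(float)   # delta(X_ij^(t) = c)
            # sum_{i != j} delta(X_ij=c) tau_ik tau_jl, for all pairs (k, l)
            num[:, :, c] += tau[t].T @ mask @ tau[t]
    return num / num.sum(axis=2, keepdims=True)    # normalize over edge types c
```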

3.3 Optimization of \(\xi \)

So far, we have seen that a VEM algorithm could be implemented from approximations depending on the variational parameters \(\xi _{s}^{(t)}\). However, we have not addressed yet how these parameters could be estimated from the data. We follow the work of Svensén and Bishop (2004) on Bayesian hierarchical mixture of experts. Thus, the lower bound \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\) is optimized with respect to the variational terms \(\xi _{s}^{(t)}\) to obtain the tightest bound \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\) of \({\mathcal {L}}(q,\theta )\). This leads to new estimates \(\hat{\xi }_{s}^{(t)}\) of \(\xi _{s}^{(t)}\):
$$\begin{aligned} \hat{\xi }_{s}^{\left( t\right) }=\sum _{l=1}^{K}\exp \left( \hat{\gamma }_{sl}^{\left( t\right) }+\dfrac{\hat{\sigma }_{sl}^{2^{\left( t\right) }}}{2}\right) ,\forall s,t. \end{aligned}$$
This procedure gives rise to a three step optimization scheme. Given all \(\xi =(\xi _{s}^{(t)})_{st}\), the VEM algorithm described previously is used to maximize the lower bound with respect to \(q(Z,\gamma ,\nu )\) and \(\theta \). These terms are then held fixed and a new estimate of \(\xi \) is computed. The three steps are repeated until convergence of the lower bound.
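The \(\xi \) step itself is closed form; a one-line sketch (ours, with the \(\hat{\sigma }^{2}/2\) term arising from the Gaussian expectation of \(\exp (\gamma )\) under \(q(\gamma )\)) is:

```python
import numpy as np

def update_xi(gamma_hat, sig2_hat):
    """Tightest variational parameters xi_s^(t), one per subgraph and time.
    gamma_hat, sig2_hat: (T, S, K) variational means and variances of q(gamma)."""
    return np.exp(gamma_hat + sig2_hat / 2.0).sum(axis=2)
```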

3.4 Model selection: choice of the number K of latent groups

Using the VEM algorithm proposed in the previous paragraphs, the estimation of the model parameters and of the group memberships is fully automatic for a given value of K. Since we consider here a model-based approach, two dRSM models with different values of K can be considered as two different models. The problem of choosing K can therefore be viewed as a model selection problem. It can be tackled in a model-based context using model selection criteria, such as the Akaike information criterion (AIC; Akaike 1974) or the Bayesian information criterion (BIC; Schwarz 1978). Due to its popularity and its asymptotic properties (Leroux 1992), we use BIC in the numerical experiments presented in the following sections. BIC relies on an asymptotic approximation of the marginal log-likelihood, also called integrated log-likelihood, and is defined in the specific context of the dRSM model \({\mathcal {M}}\) by:
$$\begin{aligned} BIC({\mathcal {M}}) = \log p(X |\hat{\theta }) - \dfrac{\eta ({\mathcal {M}})}{2} \log (TN(N-1)), \end{aligned}$$
where \(\eta ({\mathcal {M}})\) is the number of free model parameters, which depends on K and on the identifiability constraints considered. Unfortunately, the log-likelihood \(\log p(X |\hat{\theta })=\log \big ( \sum _{Z} p(X,Z|\hat{\theta })\big )\) is not tractable here because it involves marginalizing over all latent vectors \(Z_{i}^{(t)}\) in Z. Therefore, we propose to replace the log-likelihood with its variational approximation \(\tilde{{\mathcal {L}}}(q,\theta ,\xi )\). Thus, the VEM algorithm is run for various values of K. For each K, the algorithm iterates until convergence of the lower bound. \(\hat{K}\) is then chosen such that the (approximate) BIC criterion is maximized.
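As an illustration, the criterion can be computed from the converged bound as follows. This is a sketch under our own assumption on the parameter count \(\eta ({\mathcal {M}})\), which the section above does not spell out: we count \(K^{2}C\) free entries for \(\Pi \) and two symmetric \((K-1)\times (K-1)\) covariance matrices \(\Sigma \) and \(\Phi \), under the constraints \(A=B=V_0=I_{K-1}\) and \(\mu _0=0\):

```python
import numpy as np

def bic_drsm(bound, K, C, T, N):
    """Approximate BIC: the intractable log-likelihood is replaced by the
    converged variational bound. The count eta below is our assumption:
    Pi has K^2 * C free entries (one sum-to-one constraint per pair),
    plus the entries of the symmetric matrices Sigma and Phi."""
    eta = K * K * C + 2 * ((K - 1) * K // 2)
    return bound - 0.5 * eta * np.log(T * N * (N - 1))
```

\(\hat{K}\) is then the value maximizing this criterion over the candidate numbers of groups.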

4 Numerical experiments and comparisons

This section aims at demonstrating, on synthetic data, the validity of the inference algorithm presented in Sect. 3. An introductory example is first considered to highlight the main features of the proposed approach. Model selection is then considered to validate the choice of criterion. Extensive comparisons with state-of-the-art methods conclude this section.

4.1 Experimental setup

In order to validate our approach, we use in this section artificial data generated according to a common experimental setup. To simplify the characterization and facilitate the reproducibility of the experiments, we designed five different scenarios. The generation setup for each scenario is summarized in Table 2. Data from scenario 0 are drawn using SBM at each time t, without an explicit temporal dependence. The data sets for all other scenarios (scenarios 1–4) are drawn according to the dRSM model; the temporal dependence is therefore generated through a state space model. All generated networks are made of \(N=300\) nodes, distributed into \(K=4\) latent groups, and have \(T=10\) time points. Depending on the scenario, the networks have \(S=1\) or 2 subgraphs, with binary (\(C=1\)) or categorical (\(C=2\)) edges. When \(S>1\), the nodes are assigned to the subgraphs uniformly at random. Notice that scenario 2 has the parameter \(\Pi _{kl,k \ne l}^0\) equal to 0.8, which leads to less heterogeneous latent groups.

Table 2

Parameter values for the five types of graphs used in the experiments

Parameters | Scenario 0 | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4
N | 300 | 300 | 300 | 300 | 300
K | 4 | 4 | 4 | 4 | 4
T | 10 (indep.) | 10 (SSM) | 10 (SSM) | 10 (SSM) | 10 (SSM)
S | 1 | 1 | 1 | 2 | 2
C | 1 | 1 | 1 | 1 | 2
\((\Pi _{ll}^{0})_{l=1,\ldots ,K}\) | (0.1, 0.4, 0.5, 0.6) for all scenarios
\(\Pi _{kl,k\ne l}^{0}\) | 0.99 | 0.99 | 0.8 | 0.99 | 0.99
\(\Pi _{kl}^{c\ne 0}\) | \((1-\Pi _{kl}^{0})/C\) for all scenarios

In scenario 0, the networks are drawn without an explicit temporal dependence whereas, in the other scenarios, the temporal dependence is generated through a state space model (SSM)

The model parameters used for the simulation are as follows. For the simulation of \(\gamma \), the matrices A, B and \(V_0\) are set to \(I_{K-1}\), while \(\Sigma = 0.1 \times K \times I_{K-1}\) and \(\Phi = 0.01 \times I_{K-1}\). Finally, the tensor matrix \(\Pi \), which defines the connection probabilities between clusters for the C different types, is set up such that, within the clusters, the probability \(1-\Pi _{ll}^{0}\) of having an edge of any type is larger than the corresponding connection probability between clusters \(1-\Pi ^0_{k l,k\ne l}\) (see Table 2). Notice that such a choice of parameters induces networks made of communities. Then, in case of a connection between two nodes, the edge type is sampled uniformly, i.e. \(\Pi ^{c \ne 0}_{k l} = (1 - \Pi ^0_{k l}) / C,~\forall k,l\).

4.2 An introductory example

We first focus on an introductory example to illustrate the global behavior of the proposed methodology. To this end, we simulated a single network according to scenario 3, to facilitate the understanding of the results. We remind that in this setup the number K of latent groups is fixed to 4 and that \(C=1\). Therefore, the network is binary and \(\Pi _{kl}^{1}\) indicates the probability of occurrence of an edge. We ran the VEM algorithm on it for a number K of groups ranging from 3 to 6, and then selected the most appropriate number of groups using the BIC criterion.

Figure 4 shows the BIC values associated with the results provided by our VEM algorithm for the different values of K. One can observe that the criterion peaks at \(K=4\), which is the actual simulated value for K. Figure 5 presents the evolution of the bound \(\tilde{{\mathcal {L}}}\) for this specific value of K along the 10 iterations of the VEM algorithm. A clear plateau of the bound is visible in the figure, which indicates the convergence of the algorithm.

Fig. 4

Choice of K by model selection with BIC for a simulated network. The actual value for K is 4

Fig. 5

Evolution of the bound \(\tilde{{\mathcal {L}}}\) for \(K=4\)

To quickly assess the estimation quality, Table 3 allows us to compare the actual (left panel) and estimated (right panel) values of the terms \(\Pi _{kl}^1\) in the tensor matrix \(\Pi \), which define the connection probabilities between the latent clusters. On this single example, the estimated values \(\Pi _{kl}^1\) turn out to be extremely close to the true ones. Similarly, Fig. 6 compares the actual (solid black lines) and estimated (dashed red lines) values of the group proportions \(\alpha \) for the simulated example. Once again, the estimate of \(\alpha \) appears to be very close to the true proportions.

4.3 Choice of K

We now focus on the evaluation of the criterion we proposed to select the number K of latent groups. Since our approach aims at recovering the unobserved clustering partition of the nodes, we chose here to evaluate the combination of our VEM algorithm with the BIC criterion by comparing the resulting partition with the actual one (the simulated partition). In the clustering community, the adjusted Rand index (ARI; Rand 1971) serves as a widely accepted criterion for the difficult task of clustering evaluation. The ARI looks at all pairs of nodes and checks whether they are classified in the same group or not in both partitions. As a result, an ARI value close to 1 means that the partitions are similar and, in our case, that the VEM algorithm succeeds in recovering the simulated partition.
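In practice, ARI values can be obtained with standard tooling, e.g. scikit-learn (a usage sketch on toy labels; in our experiments the labels are the concatenation over all time points):

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 1, 1, 2, 2]      # simulated partition (toy example)
est_labels  = [1, 1, 0, 0, 2, 2]      # same partition up to label switching
print(adjusted_rand_score(true_labels, est_labels))   # -> 1.0
```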
Table 3

Actual (left) and estimated (right) values for the terms \(\Pi _{kl}^1\) of the tensor matrix \(\Pi \)

Actual values:

Cluster | 1 | 2 | 3 | 4
1 | 0.90 | 0.01 | 0.01 | 0.01
2 | 0.01 | 0.60 | 0.01 | 0.01
3 | 0.01 | 0.01 | 0.50 | 0.01
4 | 0.01 | 0.01 | 0.01 | 0.40

Estimated values:

Cluster | 1 | 2 | 3 | 4
1 | 0.89 | 0.01 | 0.01 | 0.01
2 | 0.01 | 0.59 | 0.01 | 0.01
3 | 0.01 | 0.01 | 0.48 | 0.01
4 | 0.01 | 0.01 | 0.01 | 0.39

See text for details

Fig. 6

Actual (solid black lines) and estimated (dashed red lines) values of the group proportions for the simulated example (\(K=4\) groups and \(S=2\) subgraphs) (color figure online)

Fig. 7

Criterion and ARI values over the 50 generated networks

To validate the combination of our VEM algorithm with the BIC criterion, the analysis was repeated for 50 different data sets, generated according to scenario 2, for a number K of latent groups ranging from 3 to 6. This allows us both to verify the consistency of the BIC criterion and to study the clustering ability of our approach. Figure 7 shows the distribution of the criterion values (left panel) as well as the associated ARI values (right panel). These results first confirm that BIC is a valid criterion for selecting the number of groups in this context. Indeed, the value \(K=4\) is the one most frequently associated with the highest value of BIC. We remind that \(K=4\) is the actual number of latent groups. One can also observe that the partition resulting from our VEM algorithm is associated, for this value of K, with an ARI value extremely close to 1, indicating a close match with the actual partition of the data.

4.4 Comparison with the other stochastic models

Our third set of experiments aims at comparing the performance of our approach to that of state-of-the-art methods. We are interested here in the comparison of dRSM with the following methods: SBM (Nowicki and Snijders 2001), RSM (Jernite et al. 2014) and dM3SBM (Ho et al. 2011). Once again, the evaluation of the results is done using the ARI criterion. In order to fit a SBM on a dynamic network, we ran the mixer package (Ambroise et al. 2010) for the R software at each time t, and the ARI was then computed on the concatenation of all group labels. Let us notice, however, that SBM is not able to handle networks with categorical edges (scenario 4). For RSM, we used the Rambo package (Bouveyron et al. 2013) for R on an aggregated version of the whole network. Contrary to SBM, RSM is only able to deal with categorical networks and, consequently, it works only in scenario 4. Finally, we used the Matlab toolbox dM3SBM, kindly provided by the authors, to fit the dM3SBM on the dynamic networks. However, dM3SBM is also not able to handle networks with categorical edges (scenario 4).

In order to consider a wide range of network types, we compare here the methods over the five simulation scenarios. We remind that Table 2 summarizes the main features of each scenario. This comparison has been conducted in two different situations: with and without the knowledge of the actual number of clusters. Table 4 presents the clustering results for the four studied methods in the case where the actual number \(K=4\) of groups has been provided to each method. Conversely, Table 5 presents the clustering results when the methods have to look for the value of K. Reported values are averaged ARI values (with standard deviations) on 20 networks for each scenario. The average selected number K of latent groups is also provided in Table 5.

First, for scenarios 0, 1 and 2, which consider dynamic networks with binary edges (\(C=1\)) and with only one subgraph (\(S=1\)), one can see on Tables 4 and 5 that SBM is, as expected, not able to handle the network dynamic. Indeed, SBM obtains a low ARI value in all situations, even though it correctly estimates the number of clusters (Table 5). Conversely, the two dynamic methods (dM3SBM and dRSM) turn out to be able to recover the clustering structure of the dynamic networks. One can however notice that dRSM significantly outperforms dM3SBM in this situation. Notice also the accurate estimation of the number K of clusters made by dRSM (Table 5).
Table 4

Clustering results for the four studied methods on networks simulated according to the five scenarios

Method | Scenario 0 | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4
SBM | 0.10 \(\pm \) 0.04 | 0.12 \(\pm \) 0.05 | 0.18 \(\pm \) 0.07 | 0.14 \(\pm \) 0.09 | –
RSM | – | – | – | – | 0.01 \(\pm \) 0.01
dM3SBM | 0.36 \(\pm \) 0.09 | 0.30 \(\pm \) 0.16 | 0.25 \(\pm \) 0.16 | 0.32 \(\pm \) 0.20 | –
dRSM | 1.00 \(\pm \) 0.00 | 0.98 \(\pm \) 0.04 | 0.90 \(\pm \) 0.20 | 0.97 \(\pm \) 0.07 | 0.75 \(\pm \) 0.24

The actual number \(K=4\) of groups has been provided to each method here. Average ARI values are reported (with standard deviations) and results are averaged over 20 networks for each scenario

Table 5

Clustering results for the four studied methods on networks simulated according to the five scenarios (ARI / selected K)

Method | Scenario 0 | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4
SBM | 0.01 \(\pm \) 0.04 / 4.00 \(\pm \) 0.00 | 0.18 \(\pm \) 0.13 / 3.94 \(\pm \) 0.71 | 0.21 \(\pm \) 0.11 / 3.97 \(\pm \) 0.46 | 0.13 \(\pm \) 0.05 / 4.16 \(\pm \) 0.79 | –
RSM | – | – | – | – | 0.01 \(\pm \) 0.01 / 2.00 \(\pm \) 0.00
dM3SBM | 0.01 \(\pm \) 0.01 / 5.55 \(\pm \) 1.39 | 0.35 \(\pm \) 0.21 / 5.95 \(\pm \) 1.15 | 0.30 \(\pm \) 0.21 / 4.35 \(\pm \) 1.63 | 0.32 \(\pm \) 0.19 / 5.15 \(\pm \) 1.17 | –
dRSM | 1.00 \(\pm \) 0.00 / 4.00 \(\pm \) 0.00 | 0.87 \(\pm \) 0.17 / 4.01 \(\pm \) 0.65 | 0.89 \(\pm \) 0.21 / 4.10 \(\pm \) 0.30 | 0.85 \(\pm \) 0.22 / 4.10 \(\pm \) 0.45 | 0.68 \(\pm \) 0.30 / 4.05 \(\pm \) 0.51

Average ARI values are reported (with standard deviations) as well as the selected number K of latent groups. Results are averaged over 20 networks for each scenario

In scenario 3, the simulated dynamic networks are now made of two subgraphs (\(S=2\)), still with binary edges (\(C=1\)). Naturally, SBM does not perform well in this situation either. The dM3SBM provides clustering results similar to those of the previous scenarios: it globally succeeds in recovering the dynamics but fails to recognize the clustering pattern. On the other hand, dRSM again provides accurate clustering results associated with good estimates of K, meaning that it succeeds in identifying both the dynamic and clustering patterns.

Finally, scenario 4 considers the case of dynamic networks with two subgraphs (\(S=2\)) and categorical edges (\(C=2\)). Only RSM and dRSM are able to deal with this kind of network. Similarly to SBM in the previous scenarios, RSM does not succeed in recovering the dynamics and provides very unsatisfactory clustering results. Conversely, dRSM gives very good clustering results considering the difficulty of the situation. It is worth noticing the accurate estimation of the number K of groups made by dRSM in this case as well. This confirms the efficiency of both our inference algorithm and our model selection criterion.

We also used scenario 4 to highlight that providing the methodology with the right subgraph structure helps in clustering the vertices. Thus, with the knowledge of the actual number of clusters, we ran dRSM with the wrong subgraph structure (\(S=1\)), and we obtained an average ARI of \(0.54\pm 0.2\). This result is to be compared to the ARI performances for scenario 4, as presented in Table 4.

5 Maritime network

This section presents an application of the proposed methodology to the analysis of a network of maritime flows in which a temporal dynamic is present. The dynamic network was provided by Dr. César Ducruet, from the Géographie-Cités laboratory, who is interested in studying the evolution of maritime flows over time. The data were extracted from the well-known Lloyd’s list, which has recorded almost all ship movements worldwide since 1890.

5.1 Data and study protocol

Table 6

The time points considered in the maritime network

Time point | Date
\(t_{1}\) | October 1890
\(t_{2} \dots t_{4}\) | October 1925 to October 1940, every 5 years
\(t_{5}\) | October 1946
\(t_{6}\) | October 1951
\(t_{7}\) | October 1960
\(t_{8} \dots t_{16}\) | October 1965 to October 2000, every 5 years
\(t_{17}\) | October 2008

Data were obtained from the printed Lloyd’s voyage records published every October from 1890 to 2008. The list details, for each merchant vessel, its successive movements from one port to another. From the raw database of vessel flows, we extracted a dynamic network with 17 time points. The first observation is October 1890 and the network ends in October 2008. Table 6 provides the correspondence between the 17 time points and the actual dates.

At each time point, the adjacency matrix between ports was constructed as follows. First, for every pair of ports, we calculated the total number of ship movements between those ports. Then, we set the associated entry in the adjacency matrix to 1 if at least one ship movement occurred between the two ports, and to 0 otherwise. The original network contained 4472 ports worldwide. We however had to reduce the network to 286 ports, since most of the ports were not active throughout the whole period of the study.
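The thresholding step can be written in a few lines (a sketch of our own; the counts matrix is assumed to be already aggregated per time point):

```python
import numpy as np

def adjacency_from_counts(counts):
    """Binarize an (N, N) matrix of ship-movement counts between ports:
    X_ij = 1 iff at least one movement was recorded; no self loops."""
    X = (counts >= 1).astype(int)
    np.fill_diagonal(X, 0)
    return X
```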

We finally applied dRSM to a maritime network which describes the navigation of ships among 286 ports in the world at 17 time points. Let us highlight that the study period includes many major historical or economical events (the two world wars, the oil crisis, the economic crisis in Europe, \(\ldots \)), which could directly affect the navigation movements at a global scale and could also change the port behaviors.

The partition of the network into subgraphs is here provided by the port memberships to the four main maritime basins: Asia–Pacific, Europe–Atlantic, Mediterranean–Black Seas, and Middle East–Indian Ocean. Figure 8 presents this partition of the ports, where the colors indicate the different subgraphs.

To summarize, the network is an undirected and binary network without self loops, i.e. \(C=1\) and \(X_{ij}^{t}=1\) if ports i and j exchange at least one ship at time t, 0 otherwise, with \(t \in \{1,\dots ,17\}\) and \(S=4\). Figure 9 shows the adjacency matrix between the 286 ports, organized by subgraph, in 1890 and in 2008.
Fig. 8

The given partition of the 286 nodes (ports) into 4 subgraphs

Fig. 9

Adjacency matrix of the maritime network organized by subgraph (basin) in 1890 (left) and 2008 (right)

5.2 Results

We used the variational EM algorithm introduced in Sect. 3 in order to find the latent groups that may be hidden in the data. The choice of the number of groups is made by applying the VEM algorithm for \(K=3,\ldots ,8\) and then computing the associated BIC values. The retained value for K is the one associated with the highest BIC value. To ensure a good accuracy of the results, the VEM algorithm was run 5 times for each value of K. Figure 10 shows the evolution of the BIC criterion according to K. One can observe that BIC peaks at \(K=7\), meaning that 7 latent groups seem to organize the network. We therefore chose this specific value for K and retained the best of the five runs for \(K=7\) as the final clustering result.
Fig. 10

BIC values according to the number K of groups for the maritime network

On the one hand, it is of primary interest to look at the estimated tensor matrix \(\Pi \) in order to understand and characterize the latent groups found. Indeed, the tensor matrix \(\Pi \) describes the connection probabilities between the groups and allows one to identify the different connection patterns. Since the network considered here is binary, it is enough to look at the terms \(\Pi _{kl}^{1}\), since \(\Pi _{k l}^{0} + \Pi _{k l}^{1}=1\) for all k, l. Figure 11 presents those estimated values. From the figure, clusters 6 and 7 appear to be groups of hubs for which the connection probabilities are large within and between clusters.
Fig. 11

Terms \(\Pi _{kl}^{1}\) of the tensor matrix \(\Pi \) estimated using the VEM algorithm

On the other hand, the estimated group proportions over time help in understanding the dynamics of the network. Figure 12 presents the evolution of those proportions over time for each subgraph. One can first observe that the proportion of cluster 6 is low and rather stable over time. This confirms that cluster 6 is a group of a limited number of hubs with a high connectivity and probably a high level of traffic. Cluster 6 includes ports such as Antwerp, Rotterdam or Singapore. It is also interesting to see that, in subgraph 2 (Europe–Atlantic), the number of hubs increased until 1930, was then perturbed during the Second World War, and finally decreased from 1951 onward. Conversely, in subgraph 1 (Asia–Pacific), the proportion of hubs was low until 1975 and then significantly increased. From a global point of view, one can also observe a clear and recent reorganization of the network in which hubs tend to be less numerous worldwide (and probably bigger).
Fig. 12

Evolution of the proportions of the \(K=7\) latent clusters

Regarding cluster 7, one can see in Fig. 12 that its proportions in the subgraphs are higher than those of cluster 6. The ports of cluster 7 can be qualified as hubs of second rank, subordinated to the main hubs of cluster 6. Most of them are marked by a colonial logic, such as Marseille, Kolkata or Cape Town. The evolution of this cluster until the recent period shows persistent North–South (e.g. Le Havre–Casablanca) and East–West (e.g. Spain–Brazil–Canaries) links.

Cluster 5 is mainly made of ports from the Asia–Pacific and Middle East–India basins, except during major crises such as the Second World War and the oil crisis. During those crises, the cluster mainly contains European ports. The rapid modification of this cluster appears clearly in Fig. 12 around 1946, 1980 and 2008. This cluster can be interpreted as being made of active ports from the developing world which move to cluster 2 during the crises. This may highlight the disintegration of long distance links during such crises. Conversely, cluster 2 turns out to be mostly made, except during crises, of European ports of average size, mainly on the Atlantic coast. Those ports are rather a reflection of past glory and most of them have declined over the century. This may be due to a failed industrialization or a significant distance from the major trade routes.

Finally, clusters 3 and 4 are made of very small ports with low activity. Those ports are usually not connected together and communicate with the rest of the network only through the ports of clusters 2 and 5. This connection with clusters 2 and 5 explains the abrupt changes in the proportions of clusters 3 and 4 that one can also observe.

6 Conclusion

This work has considered the problem of analyzing dynamic networks with categorical edges and for which a partition into subgraphs is known. Such networks are frequent in a wide range of scientific fields, notably in geography. For this purpose, we proposed an extension of the RSM model to the dynamic setting. The new model, called dRSM, uses a state space model to describe the evolution of the latent group proportions over time. A variational expectation maximization (VEM) algorithm is proposed to perform inference. We have shown in particular that the variational approximations lead to a new state space model from which the parameters can be estimated using the standard Kalman filter and the Rauch–Tung–Striebel (RTS) smoother. Model selection is also considered through an approximate BIC criterion.

Numerical experiments have highlighted the main features of the dRSM model and have demonstrated the efficiency of both the VEM algorithm and the model selection criterion. A numerical comparison has also shown that existing methods, dynamic or not, are less flexible and efficient than dRSM when applied to dynamic networks. Finally, dRSM has been applied to a dynamic maritime flow network, built from the famous Lloyd's list, and has allowed us to characterize interesting dynamic phenomena.
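
For completeness, the generic form of such an approximate BIC, in which the intractable log-likelihood is replaced by the variational lower bound \(\mathcal{L}\), can be written as below; this recalls the usual device, not the exact dRSM penalty, with \(\nu(K)\) denoting the number of free parameters for \(K\) clusters and \(n\) the number of observed quantities.

\[
\mathrm{BIC}(K) \approx \mathcal{L}(\hat{q};\hat{\theta}_K) - \frac{\nu(K)}{2}\,\log(n).
\]
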

Acknowledgments

The authors would like to warmly thank César Ducruet, from the Géographie-Cités laboratory, Paris, France, for providing the maritime network and for his painstaking analysis of the results. The data were collected in the context of the ERC Grant No. 313847 “World Seastems” (http://www.world-seastems.cnrs.fr). The authors would also like to thank Catherine Matias and Stéphane Robin for their useful remarks and comments on this work.

References

  1. Ahmed A, Xing EP (2007) On tight approximate inference of logistic-normal admixture model. In: Proceedings of the international conference on artificial intelligence and statistics, pp 1–8
  2. Airoldi E, Blei D, Fienberg S, Xing E (2008) Mixed membership stochastic blockmodels. J Mach Learn Res 9:1981–2014
  3. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
  4. Albert R, Barabási A (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–97
  5. Ambroise C, Grasseau G, Hoebeke M, Latouche P, Miele V, Picard F (2010) The mixer R package (version 1.8). http://cran.r-project.org/web/packages/mixer/
  6. Barabási A, Oltvai Z (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5:101–113
  7. Bickel P, Chen A (2009) A nonparametric view of network models and Newman–Girvan and other modularities. Proc Natl Acad Sci 106(50):21068–21073
  8. Bishop C, Svensén M (2003) Bayesian hierarchical mixtures of experts. In: Kjaerulff U, Meek C (eds) Proceedings of the 19th conference on uncertainty in artificial intelligence, pp 57–64
  9. Blei D, Lafferty J (2007a) A correlated topic model of science. Ann Appl Stat 1:17–35
  10. Blei D, Lafferty J (2007b) A correlated topic model of science. Ann Appl Stat 1(1):17–35
  11. Bouveyron C, Jernite Y, Latouche P, Nouedoui L (2013) The Rambo R package (version 1.1). http://cran.r-project.org/web/packages/Rambo/
  12. Côme E, Latouche P (2015) Model selection and clustering in stochastic block models with the exact integrated complete data likelihood. Stat Model. doi:10.1177/1471082X15577017
  13. Daudin J-J, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183
  14. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39:1–38
  15. Dubois C, Butts C, Smyth P (2013) Stochastic blockmodelling of relational event dynamics. In: International conference on artificial intelligence and statistics, vol 31 of JMLR proceedings, pp 238–246
  16. Ducruet C (2013) Network diversity and maritime flows. J Transp Geogr 30:77–88
  17. Fienberg S, Wasserman S (1981) Categorical data analysis of single sociometric relations. Sociol Methodol 12:156–192
  18. Foulds JR, DuBois C, Asuncion AU, Butts CT, Smyth P (2011) A dynamic relational infinite feature model for longitudinal social networks. In: International conference on artificial intelligence and statistics, pp 287–295
  19. Girvan M, Newman M (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821
  20. Handcock M, Raftery A, Tantrum J (2007) Model-based clustering for social networks. J R Stat Soc Ser A (Stat Soc) 170(2):301–354
  21. Harvey A (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge
  22. Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56
  23. Heaukulani C, Ghahramani Z (2013) Dynamic probabilistic models for latent feature propagation in social networks. In: Proceedings of the 30th international conference on machine learning (ICML-13), pp 275–283
  24. Ho Q, Song L, Xing EP (2011) Evolving cluster mixed-membership blockmodel for time-evolving networks. In: International conference on artificial intelligence and statistics, pp 342–350
  25. Hofman J, Wiggins C (2008) Bayesian approach to network modularity. Phys Rev Lett 100(25):258701
  26. Jernite Y, Latouche P, Bouveyron C, Rivera P, Jegou L, Lamassé S (2014) The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul. Ann Appl Stat 8(1):55–74
  27. Jordan M, Ghahramani Z, Jaakkola T, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233
  28. Kemp C, Tenenbaum J, Griffiths T, Yamada T, Ueda N (2006) Learning systems of concepts with an infinite relational model. In: Proceedings of the national conference on artificial intelligence, vol 21, pp 381–391
  29. Kim M, Leskovec J (2013) Nonparametric multi-group membership model for dynamic networks. In: Weiss Y, Schölkopf B, Platt J (eds) Advances in neural information processing systems, vol 25. MIT Press, Cambridge, pp 1385–1393
  30. Krishnan T, McLachlan G (1997) The EM algorithm and extensions. Wiley, New York
  31. Lafferty JD, Blei DM (2006) Correlated topic models. In: Weiss Y, Schölkopf B, Platt J (eds) Advances in neural information processing systems, vol 18. MIT Press, Cambridge, pp 147–154
  32. Latouche P, Birmelé E, Ambroise C (2011) Overlapping stochastic block models with application to the French political blogosphere. Ann Appl Stat 5(1):309–336
  33. Latouche P, Birmelé E, Ambroise C (2012) Variational Bayesian inference and complexity control for stochastic block models. Stat Model 12(1):93–115
  34. Latouche P, Birmelé E, Ambroise C (2014) Model selection in overlapping stochastic block models. Electron J Stat 8(1):762–794
  35. Leroux B (1992) Consistent estimation of a mixing distribution. Ann Stat 20:1350–1360
  36. Mariadassou M, Robin S, Vacher C (2010) Uncovering latent structure in valued graphs: a variational approach. Ann Appl Stat 4(2):715–742
  37. Matias C, Robin S (2014) Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM Proc Surv 47:55–74
  38. Mc Daid A, Murphy T, Friel N, Hurley N (2013) Improved Bayesian inference for the stochastic block model with application to large networks. Comput Stat Data Anal 60:12–31
  39. Minka T (1998) From hidden Markov models to linear dynamical systems. Technical report, MIT
  40. Moreno J (1934) Who shall survive?: A new approach to the problem of human interrelations. Nervous and Mental Disease Publishing Co
  41. Nowicki K, Snijders T (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96(455):1077–1087
  42. Palla G, Derenyi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
  43. Rand W (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66:846–850
  44. Rauch H, Tung F, Striebel T (1965) Maximum likelihood estimates of linear dynamic systems. AIAA J 3(8):1445–1450
  45. Rossi F, Villa-Vialaneix N, Hautefeuille F (2014) Exploration of a large database of French notarial acts with social network methods. Digit Mediev 9:1–20
  46. Sarkar P, Moore AW (2005) Dynamic social network analysis using latent space models. ACM SIGKDD Explor Newsl 7(2):31–40
  47. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
  48. Svensén M, Bishop C (2004) Robust Bayesian mixture modelling. Neurocomputing 64:235–252
  49. Wang Y, Wong G (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82:8–19
  50. White H, Boorman S, Breiger R (1976) Social structure from multiple networks. I. Blockmodels of roles and positions. Am J Sociol 81:730–780
  51. Xing E, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4(2):535–566
  52. Xu KS (2015) Stochastic block transition models for dynamic networks. In: International conference on artificial intelligence and statistics, pp 1079–1087
  53. Xu KS, Hero III AO (2013) Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Greenberg AM, Kennedy WG, Bos ND (eds) Social computing, behavioral-cultural modeling and prediction. Springer, Berlin, Heidelberg, pp 201–210
  54. Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks: a Bayesian approach. Mach Learn 82(2):157–189
  55. Zanghi H, Volant S, Ambroise C (2010) Clustering based on random graph model embedding vertex features. Pattern Recognit Lett 31(9):830–836

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Rawya Zreik 1,2
  • Pierre Latouche 1
  • Charles Bouveyron 2

  1. Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, Paris, France
  2. Laboratoire MAP5, UMR CNRS 8145, Université Paris Descartes & Sorbonne Paris Cité, Paris, France
