Abstract
We develop a model in which interactions between nodes of a dynamic network are counted by non-homogeneous Poisson processes. From a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes depend only on the clusters of the nodes. In order to make inference tractable, we move to discrete time by partitioning the entire time horizon in which interactions are observed into fixed-length time sub-intervals. First, we derive an exact integrated classification likelihood criterion and maximize it relying on a greedy search approach. This allows us to estimate the cluster memberships and the number of clusters simultaneously. Then, a maximum likelihood estimator is developed to estimate the integrated intensities nonparametrically. We discuss the over-fitting problems of the model and propose a regularized version that solves these issues. Experiments on real and simulated data are carried out in order to assess the proposed methodology.
Notes
In practice, the starting time of an interaction with a duration will be considered.
The model can easily be extended to the more general framework:
$$\begin{aligned} p\left( \pi _{kgu}|a_{kgu}, b_{kgu}\right) ={\text {Gamma}}(\pi _{kgu}|a_{kgu}, b_{kgu}). \end{aligned}$$
Hereafter, the “*” notation refers to the statistics after switching/merging.
The dimension of the vector \(\varvec{\omega }\) does not change.
More information about the way the data were collected can be found in Isella et al. (2011) or on the website http://www.sociopatterns.org/datasets/hypertext-2009-dynamic-contact-network/.
More information at http://www.ht2009.org/program.php.
References
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Experi 2008(10):10008. http://stacks.iop.org/1742-5468/2008/i=10/a=P10008
Côme E, Latouche P (2015) Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Stat Model 15(6):564–589
Corneli M, Latouche P, Rossi F (2015) Modelling time evolving interactions in networks through a non stationary extension of stochastic block models. In: Pei J, Silvestri F, Tang J (eds) International conference on advances in social networks analysis and mining ASONAM 2015. IEEE/ACM, ACM, Paris, France, pp 1590–1591. https://hal.archives-ouvertes.fr/hal-01263540
Dubois C, Butts C, Smyth P (2013) Stochastic blockmodelling of relational event dynamics. In: International conference on artificial intelligence and statistics. Volume 31 of the Journal of Machine Learning Research Proceedings, pp 238–246
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Goldenberg A, Zheng X, Fienberg SE, Airoldi EM (2009) A survey of statistical network models. Mach Learn 2(2):129–133
Guigourès R, Boullé M, Rossi F (2015) Discovering patterns in time-varying graphs: a triclustering approach. Adv Data Anal Classif. doi:10.1007/s11634-015-0218-6
Guigourès R, Boullé M, Rossi F (2012) A triclustering approach for time evolving graphs. In: Co-clustering and applications, IEEE 12th international conference on data mining workshops (ICDMW 2012). Brussels, Belgium, pp 115–122
Holland P, Laskey K, Leinhardt S (1983) Stochastic blockmodels: first steps. Soc Netw 5:109–137
Isella L, Stehlé J, Barrat A, Cattuto C, Pinton J, Van den Broeck W (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271(1):166–180
Leemis LM (1991) Nonparametric estimation of the cumulative intensity function for a nonhomogeneous Poisson process. Manag Sci 37(7):886–900. http://www.jstor.org/stable/2632541
Lorrain F, White H (1971) Structural equivalence of individuals in social networks. J Math Sociol 1:49–80
Matias C, Rebafka T, Villers F (2015) Estimation and clustering in a semiparametric Poisson process stochastic block model for longitudinal networks, HAL (preprint)
Noack A, Rotta R (2008) Multi-level algorithms for modularity clustering. CoRR arXiv:0812.4073
Nouedoui L, Latouche P (2013) Bayesian non parametric inference of discrete valued networks. In: 21th European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2013). Bruges, Belgium, pp 291–296
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes 3rd edition: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64
Wang Y, Wong G (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82:8–19
Wasserman S, Faust K (1994) Social network analysis: methods and applications, vol 506. Cambridge University Press, Cambridge
White HC, Boorman S, Breiger R (1976) Social structure from multiple networks: I. Blockmodels of roles and positions. Am J Sociol 81(4):730–780
Wyse J, Friel N, Latouche P (2014) Inferring structure in bipartite networks using the latent block model and exact ICL. arXiv preprint arXiv:1404.2911
Xing EP, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4(2):535–566
Xu KS, Hero III AO (2013) Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pp 201–210
Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Mach Learn 82(2):157–189
Appendix: Computational complexity
In this section, we provide details about the computational complexity of the main model presented in this paper, namely model A. Assuming that the gamma function can be computed in constant time (see Press et al. 2007), we focus on the three statistics appearing in Eq. (9), namely
1. \(S_{kgu}:=\sum _{z_i=k} \sum _{z_j=g}Y_{ij}^{I_u}\),
2. \(P_{kgu}:=\prod _{z_i=k} \prod _{z_j=g}Y_{ij}^{I_u}!\),
3. \(R_{kg}:=|{\mathcal {A}}_k||{\mathcal {A}}_g|\).
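As an illustration, the three statistics above can be computed by brute force as in the following minimal sketch. The function name `block_statistics`, the nested-list layout `Y[i][j][u]` (count of interactions from node i to node j in interval \(I_u\)) and the 0-based labels `z[i]` are assumptions made for this example and are not taken from the paper. Since \(P_{kgu}\) is a product of factorials, the sketch stores its logarithm, computed with `math.lgamma`, which is a choice of this example to avoid overflow.

```python
import math


def block_statistics(Y, z, K):
    """Brute-force S[k][g][u], log P[k][g][u] and R[k][g].

    Y[i][j][u] : interaction count from node i to node j in interval u (assumed layout)
    z[i]       : cluster label of node i, in 0..K-1 (assumed encoding)
    """
    N = len(Y)
    U = len(Y[0][0])
    S = [[[0] * U for _ in range(K)] for _ in range(K)]
    logP = [[[0.0] * U for _ in range(K)] for _ in range(K)]
    sizes = [0] * K
    for i in range(N):
        sizes[z[i]] += 1
    for i in range(N):
        for j in range(N):
            for u in range(U):
                y = Y[i][j][u]
                S[z[i]][z[j]][u] += y
                # log(y!) = lgamma(y + 1); summing logs replaces the product
                logP[z[i]][z[j]][u] += math.lgamma(y + 1)
    # R[k][g] = |A_k| * |A_g|, as in item 3 of the list
    R = [[sizes[k] * sizes[g] for g in range(K)] for k in range(K)]
    return S, logP, R
```

This direct computation costs \(O(N^2U)\) and would only be needed once, at initialisation; the point of the rest of the appendix is that exchanges and merges can then be handled by updating the stored arrays.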
The whole computation task consists in evaluating the increase in ICL induced by node exchanges and merges. Those computations involve the three quantities listed above. The tensor \(\{S_{kgu}\}_{k,g \le K, u\le U}\) is stored in a three-dimensional array, never resized, occupying a \(O(K_{\max }^2U)\) memory space. Hence, at any time during the algorithm its elements can be accessed and modified in constant time. The tensor \(\{P_{kgu}\}_{k,g \le K, u\le U}\) is handled similarly and cluster sizes (we recall that \(|{\mathcal {A}}_k|\) corresponds to the size of cluster \({\mathcal {A}}_k\)) are also stored in arrays. In order to evaluate the ICL changes induced by an operation, we need to maintain aggregated interaction counts for each node: for a node i we have, e.g.
$$\begin{aligned} S_{igu}:=\sum _{z_j=g}Y_{ij}^{I_u}, \end{aligned}$$
the number of interactions from node i to cluster \({\mathcal {A}}_g\) inside the time interval \(I_u\). Similarly
$$\begin{aligned} S_{igu}':=\sum _{z_j=g}Y_{ji}^{I_u} \end{aligned}$$
denotes the number of interactions from cluster \({\mathcal {A}}_g\) to node i inside the time interval \(I_u\). Other related quantities are considered. These structures occupy a memory space of \(O(N^2 U)\).
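The node-level aggregates described above (interactions from node i to each cluster, and from each cluster to node i, per time interval) could be built as in the sketch below. The names `node_aggregates`, `out_counts` and `in_counts`, as well as the `Y[i][j][u]` layout and 0-based labels `z`, are hypothetical and chosen for this example only.

```python
def node_aggregates(Y, z, K):
    """Per-node interaction counts towards and from each cluster.

    out_counts[i][g][u] : interactions from node i to cluster g in interval u
    in_counts[i][g][u]  : interactions from cluster g to node i in interval u
    Y[i][j][u] and z[i] follow an assumed layout (counts i -> j, labels 0..K-1).
    """
    N = len(Y)
    U = len(Y[0][0])
    out_counts = [[[0] * U for _ in range(K)] for _ in range(N)]
    in_counts = [[[0] * U for _ in range(K)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for u in range(U):
                y = Y[i][j][u]
                out_counts[i][z[j]][u] += y  # i sends to the cluster of j
                in_counts[j][z[i]][u] += y   # j receives from the cluster of i
    return out_counts, in_counts
```

With K clusters these two arrays hold 2NKU entries, which stays within the \(O(N^2U)\) bound quoted above because \(K\le N\).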
1.1 Exchanges
In order to evaluate the ICL increase induced by the switch of a node (say i) from cluster \({\mathcal {A}}_{k'}\) to cluster \({\mathcal {A}}_{l}\), we perform the following operations:
- \(S_{k'gu}\) (respectively, \(S_{gk'u}\)) is reduced by \(S_{igu}\) (\(S_{igu}'\)) and \(S_{lgu}\) (\(S_{glu}\)) is increased by the same amount;
- \(P_{k'gu}\) (respectively, \(P_{gk'u}\)) is reduced by \(P_{igu}\) (\(P_{igu}'\)) and \(P_{lgu}\) (\(P_{glu}\)) is increased by the same amount;
- the size \(|{\mathcal {A}}_{k'}|\) (\(|{\mathcal {A}}_l|\)) is reduced (increased) by one.
Although each of these operations takes constant time, they are involved in a sum with KU elements (this can be seen in Eq. (22)), so that the total cost of the test is O(KU). Since node i can be switched to any of the \(K-1\) remaining clusters and the graph has N nodes, the cost of a full exchange routine is \(O(NK^2U)\).
Remark 6
When a node is actually switched from its cluster to another one, all data structures are updated, but the update cost is dominated by the cost of the testing phase described above.
Notice that we have evaluated the total cost of one full exchange routine, i.e. the case where all nodes are considered once. Reductions in the number of clusters (very likely to be induced by exchanges when \(K_{\max }\) is high) are not taken into account.
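The update of \(S_{kgu}\) under a switch can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the names `block_sums`, `node_to_cluster` and `switch_node` are hypothetical, counts are stored as `Y[i][j][u]`, labels are 0-based, and there are no self-interactions (`Y[i][i][u] == 0`), so that the node-level counts do not depend on the cluster of node i itself. The ICL change of Eq. (22) is not reproduced here.

```python
def block_sums(Y, z, K):
    """Brute-force S[k][g][u]; used only as a reference for checking."""
    N, U = len(Y), len(Y[0][0])
    S = [[[0] * U for _ in range(K)] for _ in range(K)]
    for i in range(N):
        for j in range(N):
            for u in range(U):
                S[z[i]][z[j]][u] += Y[i][j][u]
    return S


def node_to_cluster(Y, z, K, i):
    """Counts from node i to each cluster (out) and from each cluster to i (inc)."""
    N, U = len(Y), len(Y[0][0])
    out = [[0] * U for _ in range(K)]
    inc = [[0] * U for _ in range(K)]
    for j in range(N):
        for u in range(U):
            out[z[j]][u] += Y[i][j][u]
            inc[z[j]][u] += Y[j][i][u]
    return out, inc


def switch_node(S, sizes, out, inc, k_old, l):
    """Move one node from cluster k_old to cluster l, updating S and sizes in place.

    Touches 4*K*U entries of S, i.e. O(KU) work, assuming no self-interactions.
    """
    K, U = len(S), len(S[0][0])
    for g in range(K):
        for u in range(U):
            S[k_old][g][u] -= out[g][u]  # node no longer sends from k_old
            S[g][k_old][u] -= inc[g][u]  # node no longer receives in k_old
            S[l][g][u] += out[g][u]      # node now sends from l
            S[g][l][u] += inc[g][u]      # node now receives in l
    sizes[k_old] -= 1
    sizes[l] += 1
```

In this sketch only the block sums and the cluster sizes are updated; after an actual switch, the node-level aggregates of the other nodes would also have to be refreshed.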
1.2 Merges
The entire merge routine, consisting of a test phase and an actual merge, has a computational cost that is dominated by the cost of exchanges. Consider a cluster \({\mathcal {A}}_{k'}\). We first look for the cluster (say \({\mathcal {A}}_l\)) leading to the best merge (highest increase in the ICL) with \({\mathcal {A}}_{k'}\). This operation has a cost of \(O(K^2U)\): for each \({\mathcal {A}}_l\) the evaluation of the increase in ICL has a cost of O(KU) (see Eq. (23)) and l can take \(K-1\) possible values. Since we look for the best merge for all \(k' \in \{1,\ldots , K \}\), the computational cost for a merge of two node clusters is \(O(K^3U)\), where we recall that \(D\le N\).
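Once a merge has been selected, the stored block sums can be updated in O(KU) by folding one row and one column of the tensor into another, as in the sketch below. The names `block_sums` and `merge_clusters`, the `Y[i][j][u]` layout and the 0-based labels are assumptions of this example; the test phase based on Eq. (23) is not reproduced, and the emptied cluster is simply left with zero counts rather than removed.

```python
def block_sums(Y, z, K):
    """Brute-force S[k][g][u]; used only as a reference for checking."""
    N, U = len(Y), len(Y[0][0])
    S = [[[0] * U for _ in range(K)] for _ in range(K)]
    for i in range(N):
        for j in range(N):
            for u in range(U):
                S[z[i]][z[j]][u] += Y[i][j][u]
    return S


def merge_clusters(S, sizes, k_old, l):
    """Merge cluster k_old into cluster l (k_old != l), in place, in O(KU)."""
    K, U = len(S), len(S[0][0])
    for g in range(K):  # fold row k_old into row l
        for u in range(U):
            S[l][g][u] += S[k_old][g][u]
            S[k_old][g][u] = 0
    for g in range(K):  # fold column k_old into column l
        for u in range(U):
            S[g][l][u] += S[g][k_old][u]
            S[g][k_old][u] = 0
    sizes[l] += sizes[k_old]
    sizes[k_old] = 0
```

Folding the row first moves the within-cluster block of `k_old` into the entry `S[l][k_old]`, which the column pass then adds to `S[l][l]`, so the four blocks involving the two clusters end up summed in a single entry.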
1.3 Total cost
The worst-case complexity for one iteration of the algorithm, with each node considered once, is \(O(NK^2U)\). However, it is difficult to evaluate the actual complexity of the whole algorithm for two reasons. Firstly, we have no way to estimate the number of exchanges needed in the exchange phase. Secondly, node exchanges are very likely to reduce the number of clusters, especially at the beginning of the algorithm, when \(K_{\max }\) is relatively high. Thus, the individual cost of an exchange decreases very quickly, so that the proposed bounds vastly overestimate the actual cost. A detailed evaluation of the behaviour of the proposed algorithm, although outside the scope of this paper, would be necessary to assess its use on large data sets.
Cite this article
Corneli, M., Latouche, P. & Rossi, F. Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL. Soc. Netw. Anal. Min. 6, 55 (2016). https://doi.org/10.1007/s13278-016-0368-3
DOI: https://doi.org/10.1007/s13278-016-0368-3
Keywords
- Dynamic network
- Stochastic block model
- Exact ICL
- Non-homogeneous Poisson process