Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL

Corneli, Marco; Latouche, Pierre; Rossi, Fabrice

doi:10.1007/s13278-016-0368-3

Original Article
Published: 02 August 2016

Volume 6, article number 55, (2016)
Social Network Analysis and Mining

Marco Corneli¹,
Pierre Latouche¹ &
Fabrice Rossi¹

Abstract

We develop a model in which interactions between nodes of a dynamic network are counted by non-homogeneous Poisson processes. In a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes only depend on the clusters of nodes. In order to make inference tractable, we move to discrete time by partitioning the entire time horizon in which interactions are observed into fixed-length time sub-intervals. First, we derive an exact integrated classification likelihood criterion and maximize it relying on a greedy search approach. This allows us to estimate the cluster memberships and the number of clusters simultaneously. Then, a maximum likelihood estimator is developed to estimate the integrated intensities nonparametrically. We discuss the over-fitting problems of the model and propose a regularized version solving these issues. Experiments on real and simulated data are carried out in order to assess the proposed methodology.

Notes

In practice, the starting time of an interaction with a duration will be considered.
The model can easily be extended to the more general framework:
$$\begin{aligned} p\left( \pi _{kgu}|a_{kgu}, b_{kgu}\right) ={\text {Gamma}}(\pi _{kgu}|a_{kgu}, b_{kgu}). \end{aligned}$$
Hereafter, the “*” notation refers to the statistics after switching/merging.
The dimension of the vector $\varvec{\omega }$ does not change.
More information about the way the data were collected can be found in Isella et al. (2011) or by visiting the website http://www.sociopatterns.org/datasets/hypertext-2009-dynamic-contact-network/.
More information is available at http://www.ht2009.org/program.php.

References

Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Experi 2008(10):10008. http://stacks.iop.org/1742-5468/2008/i=10/a=P10008
Côme E, Latouche P (2015) Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Stat Model 15(6):564–589
Corneli M, Latouche P, Rossi F (2015) Modelling time evolving interactions in networks through a non stationary extension of stochastic block models. In: Pei J, Silvestri F, Tang J (eds) International conference on advances in social networks analysis and mining ASONAM 2015. IEEE/ACM, ACM, Paris, France, pp 1590–1591. https://hal.archives-ouvertes.fr/hal-01263540
Dubois C, Butts C, Smyth P (2013) Stochastic blockmodelling of relational event dynamics. In: International conference on artificial intelligence and statistics. Volume 31 of the Journal of Machine Learning Research Proceedings, pp 238–246
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Goldenberg A, Zheng X, Fienberg SE, Airoldi EM (2009) A survey of statistical network models. Mach Learn 2(2):129–133
Guigourès R, Boullé M, Rossi F (2015) Discovering patterns in time-varying graphs: a triclustering approach. Adv Data Anal Classif. doi:10.1007/s11634-015-0218-6
Guigourès R, Boullé M, Rossi F (2012) A triclustering approach for time evolving graphs. In: Co-clustering and applications, IEEE 12th international conference on data mining workshops (ICDMW 2012). Brussels, Belgium, pp 115–122
Holland P, Laskey K, Leinhardt S (1983) Stochastic blockmodels: first steps. Soc Netw 5:109–137
Isella L, Stehlé J, Barrat A, Cattuto C, Pinton J, Van den Broeck W (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271(1):166–180
Leemis LM (1991) Nonparametric estimation of the cumulative intensity function for a nonhomogeneous Poisson process. Manag Sci 37(7):886–900. http://www.jstor.org/stable/2632541
Lorrain F, White H (1971) Structural equivalence of individuals in social networks. J Math Sociol 1:49–80
Matias C, Rebafka T, Villers F (2015) Estimation and clustering in a semiparametric Poisson process stochastic block model for longitudinal networks, HAL (preprint)
Noack A, Rotta R (2008) Multi-level algorithms for modularity clustering. CoRR arXiv:0812.4073
Nouedoui L, Latouche P (2013) Bayesian non parametric inference of discrete valued networks. In: 21st European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2013). Bruges, Belgium, pp 291–296
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes 3rd edition: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64
Wang Y, Wong G (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82:8–19
Wasserman S, Faust K (1994) Social network analysis: methods and applications, vol 506. Cambridge University Press, Cambridge
White HC, Boorman S, Breiger R (1976) Social structure from multiple networks: I. Blockmodels of roles and positions. Am J Sociol 81(4):730–780
Wyse J, Friel N, Latouche P (2014) Inferring structure in bipartite networks using the latent block model and exact ICL. arXiv preprint arXiv:1404.2911
Xing EP, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4(2):535–566
Xu KS, Hero III AO (2013) Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pp 201–210
Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Mach Learn 82(2):157–189

Author information

Authors and Affiliations

Laboratoire SAMM, Université Paris 1 Panthéon-Sorbonne, 90 rue de Tolbiac, 75634, Paris Cedex 13, France
Marco Corneli, Pierre Latouche & Fabrice Rossi

Corresponding author

Correspondence to Marco Corneli.

Appendix: Computational complexity

In this section, we provide details about the computational complexity of the main model presented in this paper, namely model A. Assuming that the gamma function can be computed in constant time (see Press et al. 2007), we focus on the three statistics appearing in Eq. (9), namely

1. $S_{kgu}:=\sum _{z_i=k} \sum _{z_j=g}Y_{ij}^{I_u}$,
2. $P_{kgu}:=\prod _{z_i=k} \prod _{z_j=g}Y_{ij}^{I_u}!$,
3. $R_{kg}:=|{\mathcal {A}}_k||{\mathcal {A}}_g|$.
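As an illustration, the three statistics can be computed from scratch as in the following minimal sketch. It is not the authors' implementation: the function name `block_statistics`, the nested-list layout `Y[u][i][j]` (count of interactions from node i to node j in interval $I_u$) and the choice to store $\log P_{kgu}$ instead of $P_{kgu}$ (to avoid overflowing factorials) are all assumptions made for this example.

```python
from math import lgamma


def block_statistics(Y, z, K):
    """Compute S, log P and R for a given clustering (illustrative sketch).

    Y[u][i][j] -- number of interactions from node i to node j in interval I_u
    z[i]       -- cluster index (0..K-1) of node i
    Returns (S, logP, R) with S[k][g][u], logP[k][g][u] and R[k][g].
    """
    U, N = len(Y), len(z)
    S = [[[0] * U for _ in range(K)] for _ in range(K)]
    logP = [[[0.0] * U for _ in range(K)] for _ in range(K)]

    # Cluster sizes |A_k|
    sizes = [0] * K
    for zi in z:
        sizes[zi] += 1

    for u in range(U):
        for i in range(N):
            for j in range(N):
                y = Y[u][i][j]
                S[z[i]][z[j]][u] += y
                # log(y!) = lgamma(y + 1); summing logs replaces the product
                logP[z[i]][z[j]][u] += lgamma(y + 1)

    # R_kg = |A_k| * |A_g|
    R = [[sizes[k] * sizes[g] for g in range(K)] for k in range(K)]
    return S, logP, R
```

This direct computation visits every pair of nodes in every interval, which is why the algorithm maintains the statistics incrementally rather than recomputing them.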

The whole computation task consists in evaluating the increase in ICL induced by node exchanges and merges. These computations involve the three quantities listed above. The tensor $\{S_{kgu}\}_{k,g \le K, u\le U}$ is stored in a three-dimensional array that is never resized and occupies $O(K_{\max }^2U)$ memory. Hence, at any time during the algorithm, its elements can be accessed and modified in constant time. The tensor $\{P_{kgu}\}_{k,g \le K, u\le U}$ is handled similarly, and cluster sizes (we recall that $|{\mathcal {A}}_k|$ is the size of cluster ${\mathcal {A}}_k$) are also stored in arrays. In order to evaluate the ICL changes induced by an operation, we need to maintain aggregated interaction counts for each node: for a node i we have, e.g.

$$\begin{aligned} S_{igu}:=\sum _{z_j=g}Y_{ij}^{I_u}, \end{aligned}$$

the number of interactions from node i to cluster ${\mathcal {A}}_g$ inside the time interval $I_u$. Similarly

$$\begin{aligned} S_{igu}':=\sum _{z_j=g}Y_{ji}^{I_u} \end{aligned}$$

denotes the number of interactions from cluster ${\mathcal {A}}_g$ to node i inside the time interval $I_u$. Other related quantities are considered. These structures occupy a memory space of $O(N^2 U)$.
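The two per-node aggregates defined above can be sketched as follows. The function name `node_aggregates` and the nested-list layout `Y[u][i][j]` are assumptions of this example, not part of the original description.

```python
def node_aggregates(Y, z, K):
    """Per-node aggregated counts (illustrative sketch).

    Y[u][i][j] -- number of interactions from node i to node j in interval I_u
    z[i]       -- cluster index (0..K-1) of node i
    Returns (S_out, S_in) where
      S_out[i][g][u] = S_{igu}  : interactions from node i to cluster g in I_u
      S_in[i][g][u]  = S'_{igu} : interactions from cluster g to node i in I_u
    """
    U, N = len(Y), len(z)
    S_out = [[[0] * U for _ in range(K)] for _ in range(N)]
    S_in = [[[0] * U for _ in range(K)] for _ in range(N)]
    for u in range(U):
        for i in range(N):
            for j in range(N):
                y = Y[u][i][j]
                S_out[i][z[j]][u] += y  # i sends to the cluster of j
                S_in[j][z[i]][u] += y   # j receives from the cluster of i
    return S_out, S_in
```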

1.1 Exchanges

In order to evaluate the ICL increase induced by the switch of a node (say i) from cluster ${\mathcal {A}}_{k'}$ to cluster ${\mathcal {A}}_{l}$, we perform the following operations:

$S_{k'gu}$ (respectively, $S_{gk'u}$) is reduced by $S_{igu}$ ($S_{igu}'$) and $S_{lgu}$ ($S_{glu}$) is increased by the same amount;
$P_{k'gu}$ (respectively, $P_{gk'u}$) is reduced by $P_{igu}$ ($P_{igu}'$) and $P_{lgu}$ ($P_{glu}$) is increased by the same amount;
the size $|{\mathcal {A}}_{k'}|$ ($|{\mathcal {A}}_l|$) is reduced (increased) by one.

Although each of these operations takes constant time, they are involved in a sum with $KU$ elements (this can be seen in Eq. (22)), so that the total cost of the test is O(KU). Since node i can be switched to any of the $K-1$ remaining clusters and the graph has N nodes, the cost of a full exchange routine is $O(NK^2U)$.
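The update of the tensor $S$ for a single switch can be sketched as below. This is only an illustration under stated assumptions: the function name `switch_node_in_S` and the argument layout are hypothetical, the per-node aggregates are those computed with the labels before the move, and the self-interaction correction is added here so that the update stays exact if $Y_{ii}^{I_u}$ is nonzero (it vanishes when there are no self-loops). The ICL change itself (Eq. (22)) is not reproduced.

```python
def switch_node_in_S(S, out_i, in_i, y_ii, k_old, l_new):
    """Update S in place when node i moves from cluster k_old to l_new.

    S[k][g][u]  -- block counts S_{kgu}
    out_i[g][u] -- S_{igu}, computed with the labels before the move
    in_i[g][u]  -- S'_{igu}, computed with the labels before the move
    y_ii[u]     -- self-interaction count of node i in I_u (0 if no self-loops)
    The double loop touches O(K U) entries, matching the cost of one test.
    """
    K, U = len(S), len(S[0][0])
    for g in range(K):
        for u in range(U):
            # Outgoing interactions of i leave row k_old and join row l_new
            S[k_old][g][u] -= out_i[g][u]
            S[l_new][g][u] += out_i[g][u]
            # Incoming interactions of i leave column k_old and join column l_new
            S[g][k_old][u] -= in_i[g][u]
            S[g][l_new][u] += in_i[g][u]
    for u in range(U):
        # Self-loop correction: Y_ii was moved twice by the loop above
        S[k_old][k_old][u] += y_ii[u]
        S[l_new][l_new][u] += y_ii[u]
        S[k_old][l_new][u] -= y_ii[u]
        S[l_new][k_old][u] -= y_ii[u]
```

Calling this once per candidate cluster for each node gives the $N(K-1)$ tests of a full exchange routine.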

Remark 6

When a node is actually switched from its cluster to another one, all data structures are updated, but the update cost is dominated by the cost of the testing phase described above.

Notice that we have evaluated the total cost of one full exchange routine, i.e. in the case where all nodes are considered once. Reductions in the number of clusters (very likely to be induced by exchanges in case $K_{\max }$ is high) are not taken into account.

1.2 Merges

The entire merge routine, consisting of a test phase and an actual merge, has a computational cost that is dominated by the cost of exchanges. Consider a cluster ${\mathcal {A}}_{k'}$. We first look for the cluster (say ${\mathcal {A}}_l$) leading to the best merge (highest increase in the ICL) with ${\mathcal {A}}_{k'}$. This operation has a cost of $O(K^2U)$: for each ${\mathcal {A}}_l$, the evaluation of the increase in ICL has a cost of O(KU) (see Eq. (23)) and l can take $K-1$ possible values. Since we look for the best merge for all $k' \in \{1,\ldots , K \}$, the computational cost of a merge of two clusters of nodes is $O(K^3U)$. Since $K\le N$, this is at most $O(NK^2U)$, the cost of a full exchange routine.
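Once the best merge has been selected, the tensor $S$ and the cluster sizes can be updated as in the following sketch. The function name `merge_clusters_in_S` is hypothetical, and leaving the emptied cluster as a zero row and column is an assumption made here to mirror the statement that the array is never resized.

```python
def merge_clusters_in_S(S, sizes, k, l):
    """Merge cluster k into cluster l, updating S and sizes in place.

    S[a][b][u] -- block counts S_{abu}
    sizes[a]   -- cluster sizes |A_a|
    After the call, row k and column k of S are zero and sizes[k] == 0.
    The update touches O(K U) entries.
    """
    K, U = len(S), len(S[0][0])
    for u in range(U):
        # Fold row k into row l (interactions sent by the merged cluster)
        for g in range(K):
            S[l][g][u] += S[k][g][u]
            S[k][g][u] = 0
        # Fold column k into column l (interactions received by the merged cluster)
        for g in range(K):
            S[g][l][u] += S[g][k][u]
            S[g][k][u] = 0
    sizes[l] += sizes[k]
    sizes[k] = 0
```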

1.3 Total cost

The worst-case complexity for one iteration of the algorithm, with each node considered once, is $O(NK^2U)$. However, it is difficult to evaluate the actual complexity of the whole algorithm for two reasons. Firstly, we have no way to estimate the number of exchanges needed in the exchange phase. Secondly, node exchanges are very likely to reduce the number of clusters, especially at the beginning of the algorithm, when $K_{\max }$ is relatively high. Thus, the individual cost of an exchange decreases very quickly, and the proposed bounds vastly overestimate it. A detailed evaluation of the behaviour of the proposed algorithm, although outside the scope of this paper, would be necessary to assess its use on large data sets.
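The worst-case bound can be recovered by adding the costs of the exchange and merge routines, using only the fact that the number of clusters cannot exceed the number of nodes:

$$\begin{aligned} O(NK^2U) + O(K^3U) = O\left( K^2U\,(N+K)\right) = O(NK^2U) \quad \text {since } K\le N. \end{aligned}$$

As a purely illustrative example with hypothetical sizes $N=100$, $K=10$ and $U=20$ (not taken from the experiments), the exchange bound is $NK^2U = 200{,}000$ elementary operations, whereas the merge bound is $K^3U = 20{,}000$.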

About this article

Cite this article

Corneli, M., Latouche, P. & Rossi, F. Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL. Soc. Netw. Anal. Min. 6, 55 (2016). https://doi.org/10.1007/s13278-016-0368-3

Received: 18 December 2015
Revised: 22 July 2016
Accepted: 24 July 2016
Published: 02 August 2016
DOI: https://doi.org/10.1007/s13278-016-0368-3
