Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL

  • Original Article
  • Social Network Analysis and Mining

Abstract

We develop a model in which interactions between nodes of a dynamic network are counted by non-homogeneous Poisson processes. In a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes depend only on the clusters of the nodes. To make inference tractable, we move to discrete time by partitioning the entire time horizon over which interactions are observed into fixed-length sub-intervals. First, we derive an exact integrated classification likelihood (ICL) criterion and maximize it with a greedy search approach. This allows us to estimate the cluster memberships and the number of clusters simultaneously. Then, a maximum likelihood estimator is developed to estimate the integrated intensities nonparametrically. We discuss the over-fitting problems of the model and propose a regularized version that solves these issues. Experiments on real and simulated data are carried out to assess the proposed methodology.
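The move to discrete time can be illustrated with a minimal sketch that bins timestamped interactions into fixed-length sub-intervals. The input format (triples of source, target, time), the function name `bin_interactions` and the convention of assigning the final instant to the last sub-interval are assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def bin_interactions(events, t_start, t_end, n_intervals):
    """Count interactions per ordered node pair and per time sub-interval.

    events: iterable of (i, j, t) triples (hypothetical input format).
    Returns a dict mapping (i, j) to a list of n_intervals counts.
    """
    width = (t_end - t_start) / n_intervals  # fixed sub-interval length
    counts = defaultdict(lambda: [0] * n_intervals)
    for i, j, t in events:
        if not (t_start <= t <= t_end):
            continue  # ignore events outside the observation window
        # Half-open sub-intervals; the final instant t_end is assigned
        # to the last sub-interval (an assumed convention).
        u = min(int((t - t_start) / width), n_intervals - 1)
        counts[(i, j)][u] += 1
    return dict(counts)


# Toy data: four interactions observed on [0, 4], split into 4 sub-intervals.
events = [(0, 1, 0.5), (0, 1, 1.2), (1, 2, 3.9), (0, 1, 4.0)]
Y = bin_interactions(events, t_start=0.0, t_end=4.0, n_intervals=4)
```

Each list in `Y` plays the role of the counts of one node pair over the successive sub-intervals.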

Notes

  1. In practice, when an interaction has a duration, only its starting time is considered.

  2. The model can easily be extended to the more general framework:

    $$\begin{aligned} p\left( \pi _{kgu}|a_{kgu}, b_{kgu}\right) ={\text {Gamma}}(\pi _{kgu}|a_{kgu}, b_{kgu}). \end{aligned}$$

  3. Hereafter, the “*” notation refers to the statistics after switching/merging.

  4. The dimension of the vector \(\varvec{\omega }\) does not change.

  5. More information about the way the data were collected can be found in Isella et al. (2011) or by visiting the website http://www.sociopatterns.org/datasets/hypertext-2009-dynamic-contact-network/.

  6. More information at http://www.ht2009.org/program.php.

References

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008(10):P10008. http://stacks.iop.org/1742-5468/2008/i=10/a=P10008

  • Côme E, Latouche P (2015) Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Stat Model 15(6):564–589

  • Corneli M, Latouche P, Rossi F (2015) Modelling time evolving interactions in networks through a non stationary extension of stochastic block models. In: Pei J, Silvestri F, Tang J (eds) International conference on advances in social networks analysis and mining ASONAM 2015. IEEE/ACM, ACM, Paris, France, pp 1590–1591. https://hal.archives-ouvertes.fr/hal-01263540

  • Dubois C, Butts C, Smyth P (2013) Stochastic blockmodelling of relational event dynamics. In: International conference on artificial intelligence and statistics. Volume 31 of the Journal of Machine Learning Research Proceedings, pp 238–246

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  • Goldenberg A, Zheng X, Fienberg SE, Airoldi EM (2009) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233

  • Guigourès R, Boullé M, Rossi F (2015) Discovering patterns in time-varying graphs: a triclustering approach. Adv Data Anal Classif. doi:10.1007/s11634-015-0218-6

  • Guigourès R, Boullé M, Rossi F (2012) A triclustering approach for time evolving graphs. In: Co-clustering and applications, IEEE 12th international conference on data mining workshops (ICDMW 2012). Brussels, Belgium, pp 115–122

  • Holland P, Laskey K, Leinhardt S (1983) Stochastic blockmodels: first steps. Soc Netw 5:109–137

  • Isella L, Stehlé J, Barrat A, Cattuto C, Pinton J, Van den Broeck W (2011) What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271(1):166–180

  • Leemis LM (1991) Nonparametric estimation of the cumulative intensity function for a nonhomogeneous Poisson process. Manag Sci 37(7):886–900. http://www.jstor.org/stable/2632541

  • Lorrain F, White H (1971) Structural equivalence of individuals in social networks. J Math Sociol 1:49–80

  • Matias C, Rebafka T, Villers F (2015) Estimation and clustering in a semiparametric Poisson process stochastic block model for longitudinal networks, HAL (preprint)

  • Noack A, Rotta R (2008) Multi-level algorithms for modularity clustering. CoRR arXiv:0812.4073

  • Nouedoui L, Latouche P (2013) Bayesian non parametric inference of discrete valued networks. In: 21st European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2013). Bruges, Belgium, pp 291–296

  • Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge

  • Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850

  • Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64

  • Wang Y, Wong G (1987) Stochastic blockmodels for directed graphs. J Am Stat Assoc 82:8–19

  • Wasserman S, Faust K (1994) Social network analysis: methods and applications, vol 506. Cambridge University Press, Cambridge

  • White HC, Boorman S, Breiger R (1976) Social structure from multiple networks: I. Blockmodels of roles and positions. Am J Sociol 81(4):730–780

  • Wyse J, Friel N, Latouche P (2014) Inferring structure in bipartite networks using the latent block model and exact ICL. arXiv preprint arXiv:1404.2911

  • Xing EP, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4(2):535–566

  • Xu KS, Hero III AO (2013) Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction, pp 201–210

  • Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Mach Learn 82(2):157–189

Author information

Corresponding author

Correspondence to Marco Corneli.

Appendix: Computational complexity

In this section, we provide details about the computational complexity of the main model presented in this paper, namely model A. Assuming that the gamma function can be computed in constant time (see Press et al. 2007), we focus on the three statistics appearing in Eq. (9), namely

  1. \(S_{kgu}:=\sum _{z_i=k} \sum _{z_j=g}Y_{ij}^{I_u}\),

  2. \(P_{kgu}:=\prod _{z_i=k} \prod _{z_j=g}Y_{ij}^{I_u}!\),

  3. \(R_{kg}:=|{\mathcal {A}}_k||{\mathcal {A}}_g|\).
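As a rough illustration, the three statistics can be computed from scratch as in the sketch below. The nested-list layout `Y[i][j][u]`, the label vector `z` and the function name `block_statistics` are hypothetical. Storing \(\log P_{kgu}\) instead of \(P_{kgu}\) is an implementation choice made here to avoid overflow, not something stated in the text, and the diagonal counts `Y[i][i][u]` are assumed to be zero (no self-loops).

```python
from math import lgamma


def block_statistics(Y, z, K):
    """Brute-force computation of S, log P and R.

    Y[i][j][u]: number of interactions from i to j in sub-interval u.
    z[i]: cluster label of node i, in range(K).
    """
    N, U = len(Y), len(Y[0][0])
    S = [[[0] * U for _ in range(K)] for _ in range(K)]
    logP = [[[0.0] * U for _ in range(K)] for _ in range(K)]
    sizes = [0] * K
    for i in range(N):
        sizes[z[i]] += 1
    for i in range(N):
        for j in range(N):
            for u in range(U):
                y = Y[i][j][u]
                S[z[i]][z[j]][u] += y                  # sum of counts
                logP[z[i]][z[j]][u] += lgamma(y + 1)   # log(y!)
    # R_kg = |A_k| |A_g|, as defined in the list above.
    R = [[sizes[k] * sizes[g] for g in range(K)] for k in range(K)]
    return S, logP, R


# Toy example: 3 nodes, 2 sub-intervals, clusters {0, 1} and {2}.
N, U = 3, 2
Y = [[[0] * U for _ in range(N)] for _ in range(N)]
Y[0][1] = [2, 0]
Y[0][2] = [1, 3]
Y[2][0] = [0, 1]
S, logP, R = block_statistics(Y, z=[0, 0, 1], K=2)
```

This full pass costs \(O(N^2U)\) and would only be needed once, at initialization; the incremental updates discussed next avoid repeating it.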

The whole computation consists in evaluating the increase in ICL induced by node exchanges and merges. These computations involve the three quantities listed above. The tensor \(\{S_{kgu}\}_{k,g \le K, u\le U}\) is stored in a three-dimensional array that is never resized and occupies \(O(K_{\max }^2U)\) memory. Hence, at any time during the algorithm, its elements can be accessed and modified in constant time. The tensor \(\{P_{kgu}\}_{k,g \le K, u\le U}\) is handled similarly, and cluster sizes (we recall that \(|{\mathcal {A}}_k|\) is the size of cluster \({\mathcal {A}}_k\)) are also stored in arrays. In order to evaluate the ICL change induced by an operation, we need to maintain aggregated interaction counts for each node: for a node i we have, e.g.

$$\begin{aligned} S_{igu}:=\sum _{z_j=g}Y_{ij}^{I_u}, \end{aligned}$$

the number of interactions from node i to cluster \({\mathcal {A}}_g\) inside the time interval \(I_u\). Similarly

$$\begin{aligned} S_{igu}':=\sum _{z_j=g}Y_{ji}^{I_u} \end{aligned}$$

denotes the number of interactions from cluster \({\mathcal {A}}_g\) to node i inside the time interval \(I_u\). Analogous per-node quantities, such as the \(P_{igu}\) and \(P_{igu}'\) used in the exchange updates, are maintained in the same way. These structures occupy a memory space of \(O(N^2 U)\).
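A minimal sketch of these per-node aggregates is given below, with the same hypothetical nested-list layout `Y[i][j][u]` and label vector `z`; the function name is illustrative. Each returned array has \(N \times K \times U\) entries, which stays within the \(O(N^2U)\) bound since \(K \le N\).

```python
def node_to_cluster_counts(Y, z, K):
    """Return (S_out, S_in) with
    S_out[i][g][u] = interactions from node i to cluster g in sub-interval u,
    S_in[i][g][u]  = interactions from cluster g to node i in sub-interval u.
    """
    N, U = len(Y), len(Y[0][0])
    S_out = [[[0] * U for _ in range(K)] for _ in range(N)]
    S_in = [[[0] * U for _ in range(K)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for u in range(U):
                S_out[i][z[j]][u] += Y[i][j][u]
                S_in[i][z[j]][u] += Y[j][i][u]
    return S_out, S_in


# Toy example: 3 nodes, 2 sub-intervals, clusters {0, 1} and {2}.
N, U = 3, 2
Y = [[[0] * U for _ in range(N)] for _ in range(N)]
Y[0][1] = [2, 0]
Y[0][2] = [1, 3]
Y[2][0] = [0, 1]
S_out, S_in = node_to_cluster_counts(Y, z=[0, 0, 1], K=2)
```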

1.1 Exchanges

In order to evaluate the ICL increase induced by the switch of a node (say i) from cluster \({\mathcal {A}}_{k'}\) to cluster \({\mathcal {A}}_{l}\), we perform the following operations:

  • \(S_{k'gu}\) (respectively, \(S_{gk'u}\)) is reduced by \(S_{igu}\) (\(S_{igu}'\)) and \(S_{lgu}\) (\(S_{glu}\)) is increased by the same amount;

  • \(P_{k'gu}\) (respectively, \(P_{gk'u}\)) is divided by \(P_{igu}\) (\(P_{igu}'\)) and \(P_{lgu}\) (\(P_{glu}\)) is multiplied by the same factor, since these statistics are products rather than sums;

  • the size \(|{\mathcal {A}}_{k'}|\) (\(|{\mathcal {A}}_l|\)) is reduced (increased) by one.

Although each of these operations takes constant time, they are involved in a sum with \(KU\) elements (this can be seen in Eq. (22)), so that the total cost of the test is \(O(KU)\). Since node i can be switched to any of the \(K-1\) remaining clusters and the graph has N nodes, the cost of a full exchange routine is \(O(NK^2U)\).
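The update of the count tensor during a switch can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, only \(S\) is updated (the ICL evaluation of Eq. (22) is not reproduced), and self-loops are assumed absent so that the per-node aggregates do not depend on the node's own label. The update touches \(O(KU)\) entries, in line with the cost given above.

```python
def block_counts(Y, z, K):
    """Brute-force S[k][g][u], used here only to check the update."""
    N, U = len(Y), len(Y[0][0])
    S = [[[0] * U for _ in range(K)] for _ in range(K)]
    for i in range(N):
        for j in range(N):
            for u in range(U):
                S[z[i]][z[j]][u] += Y[i][j][u]
    return S


def node_counts(Y, z, K, i):
    """Aggregates of node i: counts to each cluster and from each cluster."""
    N, U = len(Y), len(Y[0][0])
    out = [[0] * U for _ in range(K)]
    inc = [[0] * U for _ in range(K)]
    for j in range(N):
        for u in range(U):
            out[z[j]][u] += Y[i][j][u]
            inc[z[j]][u] += Y[j][i][u]
    return out, inc


def switch_node(S, out, inc, k_old, l_new):
    """Move a node with aggregates (out, inc) from cluster k_old to l_new.

    Updates S in place; assumes the node has no self-loops.
    """
    K, U = len(S), len(S[0][0])
    for g in range(K):
        for u in range(U):
            S[k_old][g][u] -= out[g][u]   # outgoing counts leave row k_old
            S[g][k_old][u] -= inc[g][u]   # incoming counts leave column k_old
            S[l_new][g][u] += out[g][u]   # ... and enter row l_new
            S[g][l_new][u] += inc[g][u]   # ... and column l_new


# Toy example: move node 1 from cluster 0 to cluster 1.
N, U, K = 3, 2, 2
Y = [[[0] * U for _ in range(N)] for _ in range(N)]
Y[0][1] = [2, 0]
Y[0][2] = [1, 3]
Y[1][0] = [1, 1]
Y[1][2] = [0, 2]
Y[2][0] = [0, 1]
z = [0, 0, 1]
S = block_counts(Y, z, K)
out, inc = node_counts(Y, z, K, i=1)
switch_node(S, out, inc, k_old=0, l_new=1)
```

After an accepted switch, the aggregates of the other nodes with respect to the two affected clusters would also have to be refreshed; that step is omitted here.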

Remark 6

When a node is actually switched from its cluster to another one, all data structures are updated, but the update cost is dominated by the cost of the testing phase described above.

Notice that we have evaluated the total cost of one full exchange routine, i.e. in the case where all nodes are considered once. Reductions in the number of clusters (very likely to be induced by exchanges in case \(K_{\max }\) is high) are not taken into account.

1.2 Merges

The entire merge routine, consisting of a test phase and an actual merge, has a computational cost that is dominated by the cost of exchanges. Consider a cluster \({\mathcal {A}}_{k'}\). We first look for the cluster (say \({\mathcal {A}}_l\)) leading to the best merge (highest increase in the ICL) with \({\mathcal {A}}_{k'}\). This operation has a cost of \(O(K^2U)\): for each \({\mathcal {A}}_l\), the evaluation of the increase in ICL has a cost of \(O(KU)\) (see Eq. (23)) and l can take \(K-1\) possible values. Since we look for the best merge for all \(k' \in \{1,\ldots , K \}\), the computational cost of a merge of two clusters of nodes is \(O(K^3U)\), which does not exceed the \(O(NK^2U)\) cost of exchanges because \(K\le N\).
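The effect of an actual merge on the stored counts can be sketched as below. The function name and the convention of leaving the absorbed cluster as an empty row and column (so that the array is never resized) are assumptions made for illustration; the ICL test of Eq. (23) is not reproduced. Only one row and one column are folded, so the update touches \(O(KU)\) entries.

```python
def merge_clusters(S, sizes, k, l):
    """Merge cluster k into cluster l in place; cluster k is left empty."""
    K, U = len(S), len(S[0][0])
    for g in range(K):            # fold row k into row l
        for u in range(U):
            S[l][g][u] += S[k][g][u]
            S[k][g][u] = 0
    for g in range(K):            # fold column k into column l
        for u in range(U):
            S[g][l][u] += S[g][k][u]
            S[g][k][u] = 0
    sizes[l] += sizes[k]
    sizes[k] = 0


# Toy example with K = 3 clusters and a single sub-interval (U = 1).
S = [[[1], [2], [3]],
     [[4], [5], [6]],
     [[7], [8], [9]]]
sizes = [2, 3, 1]
merge_clusters(S, sizes, k=2, l=0)
```

In the toy example, the new diagonal block of the merged cluster collects the four old blocks linking clusters 0 and 2.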

1.3 Total cost

The worst-case complexity for one iteration of the algorithm, with each node considered once, is \(O(NK^2U)\). However, it is difficult to evaluate the actual complexity of the whole algorithm for two reasons. Firstly, we have no way to estimate the number of exchanges needed in the exchange phase. Secondly, node exchanges are very likely to reduce the number of clusters, especially at the beginning of the algorithm, when \(K_{\max }\) is relatively high. Thus, the individual cost of an exchange decreases very quickly, and the proposed bounds vastly overestimate the actual cost. A detailed evaluation of the behaviour of the proposed algorithm, although outside the scope of this paper, would be necessary to assess its use on large data sets.
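The worst-case bound can be recovered by adding the costs of the two routines derived above, ignoring constants and using the fact that clusters are non-empty, so that \(K \le N\):

$$\begin{aligned} \underbrace{N\,(K-1)\,O(KU)}_{\text {exchanges}} + \underbrace{K\,(K-1)\,O(KU)}_{\text {merges}} = O(NK^2U) + O(K^3U) = O(NK^2U). \end{aligned}$$

As a purely illustrative order of magnitude, with hypothetical values \(N=1000\), \(K=10\) and \(U=20\), the exchange term is \(NK^2U = 2\times 10^6\) while the merge term is \(K^3U = 2\times 10^4\), so the exchanges account for almost all of the cost.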

Cite this article

Corneli, M., Latouche, P. & Rossi, F. Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL. Soc. Netw. Anal. Min. 6, 55 (2016). https://doi.org/10.1007/s13278-016-0368-3

  • DOI: https://doi.org/10.1007/s13278-016-0368-3
