Discovering patterns in time-varying graphs: a triclustering approach

  • Regular Article, published in Advances in Data Analysis and Classification

Abstract

This paper introduces a novel technique for tracking structures in time-varying graphs. The method uses a maximum a posteriori approach to fit a three-dimensional co-clustering of the source vertices, the destination vertices and the time to the data under study, without requiring any hyper-parameter tuning. The three dimensions are segmented simultaneously in order to build clusters of source vertices, clusters of destination vertices and time segments such that the edge distributions across clusters of vertices follow the same evolution over the time segments. The main novelty of this approach is that the time segments are inferred directly from the evolution of the edge distribution between the vertices, so the user does not have to supply any a priori quantization of time. Experiments conducted on artificial data illustrate the good behavior of the technique, and a study of a real-life data set shows the potential of the proposed approach for exploratory data analysis.


Notes

  1. To avoid confusion, we denote by \(\nu \) the number of edges as a parameter of the model, and by \(m\) the number of edges in a given data set.

  2. Transport for London, http://www.tfl.gov.uk.

  3. On a standard desktop PC, this takes approximately 50 min, with a maximal memory occupation of 4.5 GB.

References

  • Bekkerman R, El-Yaniv R, McCallum A (2005) Multi-way distributional clustering via pairwise interactions. In: ICML, pp 41–48

  • Borgatti SP (1988) A comment on Doreian’s regular equivalence in symmetric structures. Soc Netw 10:265–271

  • Boullé M (2011) Data grid models for preparation and modeling in supervised learning. In: Guyon I, Cawley G, Dror G, Saffari A (eds) Hands-on pattern recognition: challenges in machine learning, vol 1. Microtome Publishing, pp 99–130

  • Casteigts A, Flocchini P, Quattrociocchi W, Santoro N (2012) Time-varying graphs and dynamic networks. Int J Parallel Emerg Distrib Syst 27(5):387–408. doi:10.1080/17445760.2012.668546

  • Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York

  • Dhillon IS, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: KDD ’03, pp 89–98

  • Erdős P, Rényi A (1959) On random graphs. I. Publ Math 6:290–297

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3):75–174

  • Goldenberg A, Zheng AX, Fienberg S, Airoldi EM (2009) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233

  • Grünwald P (2007) The minimum description length principle. MIT Press, Cambridge

  • Guigourès R, Boullé M, Rossi F (2012) A triclustering approach for time evolving graphs. In: Co-clustering and applications, IEEE 12th international conference on data mining workshops (ICDMW 2012), Brussels, Belgium, pp 115–122. doi:10.1109/ICDMW.2012.61

  • Hansen P, Mladenovic N (2001) Variable neighborhood search: principles and applications. Eur J Oper Res 130(3):449–467

  • Hartigan J (1972) Direct clustering of a data matrix. J Am Stat Assoc 67(337):123–129

  • Hintze JL, Nelson RD (1998) Violin plots: a box plot-density trace synergism. Am Stat 52(2):181–184. doi:10.1080/00031305.1998.10480559

  • Hopcroft J, Khan O, Kulis B, Selman B (2004) Tracking evolving communities in large linked networks. PNAS 101:5249–5253

  • Kemp C, Tenenbaum J (2006) Learning systems of concepts with an infinite relational model. In: AAAI’06

  • Lang KJ (2009) Information theoretic comparison of stochastic graph models: some experiments. In: WAW, pp 1–12

  • Li Y, Jain A (1998) Classification of text documents. Comput J 41(8):537–546

  • Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151

  • Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge

  • Nadel SF (1957) The theory of social structure. Cohen & West, London

  • Nadif M, Govaert G (2010) Model-based co-clustering for continuous data. In: ICMLA, pp 175–180

  • Nowicki K, Snijders T (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96:1077–1087

  • Palla G, Derenyi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818

  • Palla G, Barabási AL, Vicsek T (2007) Quantifying social group evolution. Nature 446:664–667

  • Rege M, Dong M, Fotouhi F (2006) Co-clustering documents and words using bipartite isoperimetric graph partitioning. In: ICDM, pp 532–541

  • Schaeffer S (2007) Graph clustering. Comput Sci Rev 1(1):27–64

  • Schepers J, Van Mechelen I, Ceulemans E (2006) Three-mode partitioning. Comput Stat Data Anal 51(3):1623–1642

  • Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423

  • Slonim N, Tishby N (1999) Agglomerative information bottleneck. Adv Neural Inf Process Syst 12:617–623

  • Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. JMLR 3:583–617

  • Sun J, Faloutsos C, Papadimitriou S, Yu P (2007) Graphscope: parameter-free mining of large time-evolving graphs. In: KDD ’07, pp 687–696

  • Van Mechelen I, Bock HH, De Boeck P (2004) Two-mode clustering methods: a structured overview. Stat Methods Med Res 13(5):363–394

  • White DR, Reitz KP (1983) Graph and semigroup homomorphisms on networks of relations. Soc Netw 5(2):193–324

  • White H, Boorman S, Breiger R (1976) Social structure from multiple networks: I. Blockmodels of roles and positions. Am J Sociol 81(4):730–780

  • Xing EP, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4(2):535–566

  • Zhao L, Zaki M (2005) Tricluster: an effective algorithm for mining coherent clusters in 3d microarray data. In: SIGMOD conference, pp 694–705

Acknowledgments

The authors thank the anonymous reviewers and the associate editor for their valuable comments, which helped improve this paper.

Author information

Correspondence to Fabrice Rossi.

Appendix 1: Interpretations of the dissimilarity between two clusters

Interestingly, the dissimilarity given in Definition 3 admits several interpretations. It corresponds to a loss of coding length (when the MODL criterion is interpreted as a description length), to a loss of posterior probability of the triclustering given the data (see Proposition 1), and, asymptotically, to a divergence between probability distributions associated with the clusters (see Proposition 2).

Proposition 1

The exponential of the dissimilarity between two clusters \(c_1\) and \(c_2\) equals the ratio of the posterior probability of the original triclustering given the data set to that of the simplified triclustering obtained by merging them:

$$\begin{aligned} P(\mathcal {M}|E)=e^{\Delta _{ MODL }(c_1,c_2)} P(\mathcal {M}_{\text {merge } c_1 \text { and } c_2}|E). \end{aligned}$$
(31)
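As an illustration of Eq. (31), consider the following minimal sketch (our own code, not from the paper; the function name and the toy cost values are hypothetical). If the MODL criterion is read as a negative log posterior, the merge dissimilarity is the increase in criterion value caused by the merge, and the posterior ratio follows by exponentiation:

```python
import math

# Hypothetical sketch: treat the MODL criterion c(M) as -log P(M|E) + const.
# Merging two clusters turns the model M into M_merged and raises the cost.
def merge_dissimilarity(cost_original, cost_merged):
    """Delta_MODL(c1, c2): increase in criterion value caused by the merge."""
    return cost_merged - cost_original

# Toy values (not from the paper): the merged model costs 3.2 nats more,
# so its posterior is lower by a factor exp(-3.2), which is Eq. (31):
# P(M|E) = exp(Delta) * P(M_merged|E).
delta = merge_dissimilarity(cost_original=100.0, cost_merged=103.2)
posterior_ratio = math.exp(delta)  # P(M|E) / P(M_merged|E)
```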

Asymptotically, i.e. when the number of edges tends to infinity, the dissimilarity between two clusters is proportional to a generalized Jensen–Shannon divergence between two probability distributions that characterize the clusters in the triclustering structure. To simplify the discussion, we give the definition and result only for the case of source clusters, but they can be generalized to the two other cases.

Definition 5

Let \(\mathcal {M}\) be a triclustering. For all \(i\in \{1,\ldots ,k_S\}\) we denote

$$\begin{aligned} \mathbb {P}^S_i=\left( \frac{\mu _{ijl}}{\mu _{{i}..}}\right) _{1\le j\le k_D, 1\le l\le k_T}. \end{aligned}$$
(32)

The matrix \(\mathbb {P}^S_i\) can be interpreted as a probability distribution over \(\{1, \ldots , k_D\}\times \{1, \ldots , k_T\}\). It characterizes \(c^S_i\) as a cluster of source vertices as seen from clusters of destination vertices and of time stamps.

We denote by \(\mathbb {P}^S\) the associated marginal probability distribution, obtained as

$$\begin{aligned} \mathbb {P}^S=\left( \frac{\sum _{i=1}^{k_S}\mu _{ijl}}{\sum _{i=1}^{k_S}\mu _{{i}..}}\right) _{1\le j\le k_D, 1\le l\le k_T}. \end{aligned}$$
(33)

Obviously, we have

$$\begin{aligned} \mathbb {P}^S=\sum _{i=1}^{k_S}\pi _i\mathbb {P}^S_i, \end{aligned}$$
(34)

where

$$\begin{aligned} \pi _i=\frac{\mu _{{i}..}}{\sum _{k=1}^{k_S}\mu _{{k}..}}. \end{aligned}$$
(35)
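Definition 5 and the mixture identity (34)–(35) can be sketched numerically as follows (our own code; the tensor `mu` is random toy data and all names are ours), with \(\mu _{ijl}\) stored as a 3D array indexed by source cluster, destination cluster and time-segment cluster:

```python
import numpy as np

# Toy edge-count tensor: mu[i, j, l] = number of edges from source cluster i
# to destination cluster j within time-segment cluster l (k_S=3, k_D=4, k_T=2).
rng = np.random.default_rng(0)
mu = rng.integers(1, 10, size=(3, 4, 2)).astype(float)

# Eq. (32): P^S_i, a distribution over (j, l) characterizing source cluster i.
P_S_i = mu / mu.sum(axis=(1, 2), keepdims=True)

# Eq. (33): the marginal P^S, pooling all source clusters before normalizing.
P_S = mu.sum(axis=0) / mu.sum()

# Eqs. (34)-(35): P^S is the pi-weighted mixture of the P^S_i.
pi = mu.sum(axis=(1, 2)) / mu.sum()
mixture = np.einsum('i,ijl->jl', pi, P_S_i)
assert np.allclose(P_S, mixture)
```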

Proposition 2

Let \(\mathcal {M}\) be a triclustering and let \(c^S_i\) and \(c^S_k\) be two source clusters. Then

$$\begin{aligned} \dfrac{\Delta _{ MODL }(c^S_i,c^S_k)}{\nu }\underset{\nu \rightarrow +\infty }{\longrightarrow } (\pi _i+\pi _k) JS^{\alpha _i,\alpha _k} (\mathbb {P}^S_i,\mathbb {P}^S_k), \end{aligned}$$
(36)

with

$$\begin{aligned} JS^{\alpha _i,\alpha _k} (\mathbb {P}^S_i,\mathbb {P}^S_k)=\alpha _i KL(\mathbb {P}^S_i || \alpha _i \mathbb {P}^S_i + \alpha _k \mathbb {P}^S_k) + \alpha _k KL(\mathbb {P}^S_k || \alpha _i \mathbb {P}^S_i + \alpha _k \mathbb {P}^S_k), \end{aligned}$$
(37)

and where \(\alpha _i\) and \(\alpha _k\) are the normalized mixture coefficients, such that \(\alpha _i = \frac{\pi _i}{\pi _i+\pi _k}\) and \(\alpha _k = \frac{\pi _k}{\pi _i+\pi _k}\).

Proof

In this statement, JS denotes the generalized Jensen–Shannon divergence (Lin 1991) and KL the Kullback–Leibler divergence. The full proof is omitted for brevity; it relies on the Stirling approximation \(\log n!= n \log (n) - n + O(\log n)\), applied when computing the difference between the criterion values after and before the merge. \(\square \)
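The Stirling approximation can be checked numerically; the following sketch (ours, purely illustrative) compares \(\log n!\), computed via the log-gamma function, with \(n\log n - n\):

```python
import math

# Sanity check of the Stirling approximation: log n! = n log n - n + O(log n).
for n in [10, 100, 1000]:
    exact = math.lgamma(n + 1)      # lgamma(n + 1) = log(n!)
    approx = n * math.log(n) - n
    # The residual is about 0.5 * log(2*pi*n), i.e. O(log n).
    assert abs(exact - approx) < 2 * math.log(n)
```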

The Jensen–Shannon divergence has some interesting properties: it is a symmetric, non-negative divergence measure between two probability distributions, and the Jensen–Shannon divergence of two identical distributions is equal to zero. While this divergence is not a metric, since it does not satisfy the triangle inequality, it nevertheless has the minimal properties needed to be used as a dissimilarity measure within an agglomerative process in the context of co-clustering (Slonim and Tishby 1999).
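The divergence of Eq. (37) and the properties above can be sketched as follows (our own code; the function names and toy distributions are ours, not from the paper):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_j = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def generalized_js(p, q, alpha_p, alpha_q):
    """Generalized Jensen-Shannon divergence, Eq. (37); weights sum to 1."""
    m = alpha_p * np.asarray(p, dtype=float) + alpha_q * np.asarray(q, dtype=float)
    return alpha_p * kl(p, m) + alpha_q * kl(q, m)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.4, 0.5])

# Symmetric for equal weights, non-negative, and zero for identical inputs.
assert abs(generalized_js(p, q, 0.5, 0.5) - generalized_js(q, p, 0.5, 0.5)) < 1e-12
assert generalized_js(p, q, 0.3, 0.7) >= 0.0
assert abs(generalized_js(p, p, 0.3, 0.7)) < 1e-12
```

Note that with unequal weights, the symmetry holds when the weights are swapped along with the distributions.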

Cite this article

Guigourès, R., Boullé, M. & Rossi, F. Discovering patterns in time-varying graphs: a triclustering approach. Adv Data Anal Classif 12, 509–536 (2018). https://doi.org/10.1007/s11634-015-0218-6
