GUISE: a uniform sampler for constructing frequency histogram of graphlets
Regular Paper
Abstract
The graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, computing the GFD of a network requires the exact count of the graphlets embedded in that network, which is computationally expensive. As a result, it is practically infeasible to compute the GFD of even a moderately large network. In this paper, we propose Guise, which uses a Markov Chain Monte Carlo sampling method to construct the approximate GFD of a large network. Our experiments on networks with millions of nodes show that Guise obtains the GFD with a very low error rate within a few minutes, whereas the exhaustive counting-based approach takes several days.
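The abstract does not spell out the Markov chain, so the following is only a minimal sketch of one common way to sample graphlets uniformly, not necessarily the construction used by Guise. It assumes a Metropolis-Hastings walk over connected vertex sets of size 3 to 4, where two sets are neighbors if they differ by adding, removing, or swapping one vertex. Graphlet types are identified by their sorted degree sequence, which is sufficient only up to size 4. The normalization over all sampled types and every name here (`build_adj`, `neighbors`, `graphlet_type`, `sample_gfd`, `exact_gfd`, `TOY_ADJ`) are illustrative assumptions.

```python
import random
from collections import Counter
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_connected(adj, nodes):
    """Check that the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def neighbors(adj, state, kmin=3, kmax=4):
    """Connected vertex sets reachable by adding, removing, or swapping one vertex."""
    frontier = {w for v in state for w in adj[v]} - state
    result = set()
    if len(state) < kmax:
        for u in frontier:               # add a vertex adjacent to the set
            result.add(state | {u})
    for v in state:
        rest = state - {v}
        if len(rest) >= kmin and is_connected(adj, rest):
            result.add(rest)             # remove a vertex
        for u in frontier:
            cand = rest | {u}
            if is_connected(adj, cand):
                result.add(cand)         # swap v for u
    return sorted(result, key=sorted)    # fixed order for reproducibility


def graphlet_type(adj, state):
    """Sorted induced degree sequence; distinguishes all graphlets of size 3 and 4."""
    return tuple(sorted(len(adj[v] & state) for v in state))


def sample_gfd(adj, start, steps, rng, kmin=3, kmax=4):
    """Estimate relative graphlet frequencies with a Metropolis-Hastings walk."""
    x = frozenset(start)
    nx = neighbors(adj, x, kmin, kmax)
    counts = Counter()
    for _ in range(steps):
        y = rng.choice(nx)
        ny = neighbors(adj, y, kmin, kmax)
        # Accepting with min(1, deg(x) / deg(y)) makes the target distribution uniform.
        if rng.random() < min(1.0, len(nx) / len(ny)):
            x, nx = y, ny
        counts[graphlet_type(adj, x)] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def exact_gfd(adj, kmin=3, kmax=4):
    """Brute-force reference; only feasible for very small graphs."""
    counts = Counter()
    for k in range(kmin, kmax + 1):
        for nodes in combinations(sorted(adj), k):
            s = frozenset(nodes)
            if is_connected(adj, s):
                counts[graphlet_type(adj, s)] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


# Hypothetical toy graph: two triangles joined by two edges.
TOY_ADJ = build_adj([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3), (1, 4)])
```

On a toy graph, `sample_gfd(TOY_ADJ, [0, 1, 2], 30000, random.Random(0))` can be compared against `exact_gfd(TOY_ADJ)`. On a large network only the sampler is practical, since each step inspects just the local neighborhood of the current vertex set.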
Keywords
Graphlet counting, MCMC sampling, Graph analysis, Graph mining, Graphlet sampling, Graphlet degree distribution, Uniform sampling, Subgraph concentration
Copyright information
© Springer-Verlag London 2013