Knowledge and Information Systems, Volume 38, Issue 3, pp 511–536

GUISE: a uniform sampler for constructing frequency histogram of graphlets

  • Mahmudur Rahman
  • Mansurul Alam Bhuiyan
  • Mahmuda Rahman
  • Mohammad Al Hasan
Regular Paper

Abstract

Graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, computing the GFD of a network requires an exact count of the graphlets embedded in that network, which is computationally expensive. As a result, it is practically infeasible to compute the GFD of even a moderately large network. In this paper, we propose Guise, which uses a Markov Chain Monte Carlo (MCMC) sampling method to construct an approximate GFD of a large network. Our experiments on networks with millions of nodes show that Guise obtains the GFD with a very low error rate within a few minutes, whereas the exhaustive counting-based approach takes several days.
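To make the contrast between exhaustive counting and MCMC sampling concrete, the following is a minimal sketch and not the authors' published algorithm. It assumes a simplified setting that the abstract does not specify: only 3-node graphlets (paths and triangles), an undirected graph stored as a dict of neighbor sets, and a Metropolis-Hastings walk whose moves swap one vertex at a time. All function names (is_connected, kind, neighbors, exact_gfd, sampled_gfd) are hypothetical and introduced here for illustration only.

```python
import random
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if `nodes` induce a connected subgraph of `adj`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def kind(state, adj):
    """Classify a connected 3-node set: 3 edges is a triangle, 2 a path."""
    edges = sum(1 for u, v in combinations(state, 2) if v in adj[u])
    return "triangle" if edges == 3 else "path"


def neighbors(state, adj):
    """Connected 3-node sets obtained by swapping exactly one vertex."""
    result = set()
    for out in state:
        rest = state - {out}
        candidates = set().union(*(adj[u] for u in rest)) - state
        for w in candidates:
            new = frozenset(rest | {w})
            if is_connected(new, adj):
                result.add(new)
    return result


def exact_gfd(adj):
    """Exhaustive baseline: test every 3-node subset (cubic in node count)."""
    counts = {"path": 0, "triangle": 0}
    for triple in combinations(sorted(adj), 3):
        if is_connected(triple, adj):
            counts[kind(triple, adj)] += 1
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def sampled_gfd(adj, start, steps, seed=0):
    """Metropolis-Hastings walk with a uniform target over connected triples.

    Assumption of this sketch: a proposal is drawn uniformly from the current
    state's neighbors and accepted with probability min(1, d(x) / d(y)),
    where d() is the number of neighbors. This corrects the bias of a plain
    random walk toward states with many neighbors.
    """
    rng = random.Random(seed)
    state = frozenset(start)
    nbrs = neighbors(state, adj)
    counts = {"path": 0, "triangle": 0}
    for _ in range(steps):
        if nbrs:
            # Sort so that the walk is reproducible for a fixed seed.
            cand = rng.choice(sorted(nbrs, key=sorted))
            cand_nbrs = neighbors(cand, adj)
            if rng.random() < len(nbrs) / len(cand_nbrs):
                state, nbrs = cand, cand_nbrs
        counts[kind(state, adj)] += 1
    return {k: c / steps for k, c in counts.items()}
```

On a toy graph made of a triangle 0-1-2 with a tail 2-3-4, there are four connected triples (one triangle, three paths), so exact_gfd returns 0.25 for triangles and 0.75 for paths; sampled_gfd started at {0, 1, 2} should approach the same proportions as the number of steps grows. The exhaustive version inspects every triple, while each step of the walk only touches the neighborhood of the current sample, which is the kind of saving the abstract describes.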

Keywords

  • Graphlet counting
  • MCMC sampling
  • Graph analysis
  • Graph mining
  • Graphlet sampling
  • Graphlet degree distribution
  • Uniform sampling
  • Subgraph concentration

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Mahmudur Rahman (1)
  • Mansurul Alam Bhuiyan (1)
  • Mahmuda Rahman (2)
  • Mohammad Al Hasan (1)

  1. Department of Computer Science, Indiana University–Purdue University, Indianapolis, USA
  2. Department of Computer Science, Syracuse University, Syracuse, USA
