Rand-FaSE: fast approximate subgraph census

Original Article

Abstract

Determining the frequency of small subgraphs is an important graph mining primitive. One major class of algorithms for this task is based on enumerating all sets of \(k\) connected nodes; these are known as network-centric algorithms. FAst Subgraph Enumeration (FaSE) is an exact algorithm for subgraph counting that, in contrast with previous approaches, performs the isomorphism tests during the enumeration, encapsulating the topological information in a g-trie and thus greatly reducing the number of isomorphism tests required. Our goal in this paper is to extend this approach by providing an approximate algorithm, which we call Rand-FaSE. It uses an unbiased sampling estimator for the number of subgraphs of each type, allowing a user to trade some accuracy for even faster execution times. We tested our algorithm on a set of representative complex networks, comparing it with the exact alternative, FaSE. We also carried out an extensive analysis of its accuracy and speed gains relative to previous sampling approaches. With all of this, we believe FaSE and Rand-FaSE pave the way for faster network-centric census algorithms.
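
To make the idea of an unbiased sampling estimator over a network-centric enumeration concrete, the following is a minimal Python sketch. It is not the authors' implementation: it assumes an ESU-style enumeration of connected \(k\)-node sets with one keep-probability per depth of the search tree, it does not use a g-trie, and it identifies subgraph types by their sorted induced degree sequence, which only distinguishes connected undirected subgraphs up to \(k = 4\). The names `sampled_census`, `adj`, `probs` and `seed` are hypothetical.

```python
import random
from collections import Counter
from math import prod


def sampled_census(adj, k, probs, seed=None):
    """Estimate how many connected induced k-node subgraphs of each type exist.

    adj   : dict mapping each node (comparable labels) to its set of neighbours
    probs : k probabilities in (0, 1]; probs[d] is the chance of following a
            branch that adds the (d + 1)-th node (assumed per-depth scheme)
    Types are keyed by the sorted induced degree sequence (valid for k <= 4).
    """
    rng = random.Random(seed)
    counts = Counter()

    def signature(sub):
        # Degree of each node inside the induced subgraph, sorted.
        return tuple(sorted(sum(1 for u in sub if u in adj[w]) for w in sub))

    def extend(sub, ext, root):
        if len(sub) == k:
            counts[signature(sub)] += 1
            return
        ext = set(ext)  # local copy, consumed below
        while ext:
            w = ext.pop()
            # Exclusive neighbours of w: larger than the root, not in the
            # subgraph, and not adjacent to any node already in it.
            fresh = {u for u in adj[w]
                     if u > root and u not in sub
                     and not any(u in adj[s] for s in sub)}
            # Follow this branch only with the probability for this depth.
            if rng.random() < probs[len(sub)]:
                extend(sub + [w], ext | fresh, root)

    for v in adj:
        if rng.random() < probs[0]:
            extend([v], {u for u in adj[v] if u > v}, v)

    # Each k-set is reached with probability prod(probs), so dividing the
    # sampled counts by that product gives an unbiased estimate.
    scale = prod(probs)
    return {sig: c / scale for sig, c in counts.items()}
```

With every probability set to 1.0 the sketch performs the full enumeration and returns exact counts; lowering the probabilities visits fewer branches and returns scaled estimates, which is the accuracy-for-speed trade described in the abstract.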

Keywords

Complex networks · Graph mining · Subgraphs · G-tries · Network motifs · Graphlets

Copyright information

© Springer-Verlag Wien 2015

Authors and Affiliations

  1. CRACS and INESC-TEC, DCC-FCUP, Universidade do Porto, Porto, Portugal
