Knowledge and Information Systems, Volume 50, Issue 3, pp 689–722

Graphlet decomposition: framework, algorithms, and applications

  • Nesreen K. Ahmed
  • Jennifer Neville
  • Ryan A. Rossi
  • Nick G. Duffield
  • Theodore L. Willke
Regular paper

Abstract

From social science to biology, many applications rely on graphlets for intuitive and meaningful characterization of networks. Although graphlets have had considerable success and impact in a variety of domains, there has yet to be a fast and efficient framework for computing the frequencies of these subgraph patterns: existing methods do not scale to large networks with billions of nodes and edges. In this paper, we propose a fast, efficient, and parallel framework, as well as a family of algorithms, for counting k-node graphlets. The proposed framework leverages a number of theoretical combinatorial arguments that allow us to obtain significant improvements in the scalability of graphlet counting. For each edge, we count a few graphlets directly and obtain the exact counts of the others in constant time using the combinatorial arguments. On a large collection of 300+ networks from a variety of domains, our graphlet counting strategies are on average 460× faster than existing methods. This brings new opportunities to investigate the use of graphlets on much larger networks and in newer applications, as we show in the experiments. To the best of our knowledge, this paper provides the largest graphlet computations to date.
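To illustrate the per-edge idea described in the abstract, the following is a minimal sketch restricted to 3-node graphlets. It is not the authors' implementation: the function name `count_3node_graphlets`, the input format (node ids `0..n-1` and an undirected edge list), and the returned dictionary keys are assumptions made for this example. Only triangles are counted directly for each edge (by intersecting neighbor sets); the other 3-node patterns then follow from node degrees and simple combinatorial identities.

```python
def count_3node_graphlets(n, edges):
    """Count induced 3-node subgraph patterns in a simple undirected graph.

    Hypothetical helper for illustration: nodes are 0..n-1 and `edges`
    is an iterable of (u, v) pairs. Self-loops and duplicates are ignored.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    tri_sum = wedge_sum = one_edge = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                du, dv = len(adj[u]), len(adj[v])
                # Counted directly: triangles that contain edge (u, v).
                t = len(adj[u] & adj[v])
                tri_sum += t
                # Derived in O(1): open 2-paths that contain edge (u, v).
                wedge_sum += (du - 1 - t) + (dv - 1 - t)
                # Derived in O(1): nodes adjacent to neither u nor v.
                one_edge += n - (du + dv - t)

    triangles = tri_sum // 3   # each triangle is seen from its 3 edges
    wedges = wedge_sum // 2    # each open 2-path is seen from its 2 edges
    total = n * (n - 1) * (n - 2) // 6
    empty = total - triangles - wedges - one_edge
    return {
        "triangle": triangles,
        "wedge": wedges,
        "one_edge": one_edge,
        "empty": empty,
    }


# Example: a triangle on nodes 0, 1, 2 with a pendant node 3 attached to 2.
counts = count_3node_graphlets(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(counts)
```

In this sketch the only non-constant work per edge is the neighbor-set intersection; the remaining counts are obtained from degrees alone, which mirrors, in a simplified 3-node setting, the strategy of counting a few graphlets and deriving the rest.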

Keywords

  • Graphlet
  • Motif
  • Graph mining
  • Graph kernel
  • Classification
  • Graph features
  • Higher-order graph statistics
  • Biological networks
  • Visual graph analytics

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Nesreen K. Ahmed (1)
  • Jennifer Neville (2)
  • Ryan A. Rossi (3)
  • Nick G. Duffield (4)
  • Theodore L. Willke (1)

  1. Parallel Computing Lab, Intel Corporation, Santa Clara, USA
  2. Department of Computer Science, Purdue University, West Lafayette, USA
  3. Palo Alto Research Center (PARC), Palo Alto, USA
  4. Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA
