Graphlet decomposition: framework, algorithms, and applications

Ahmed, Nesreen K.; Neville, Jennifer; Rossi, Ryan A.; Duffield, Nick G.; Willke, Theodore L.

doi:10.1007/s10115-016-0965-5

Regular paper
Published: 27 June 2016

Volume 50, pages 689–722, (2017)
Knowledge and Information Systems

Nesreen K. Ahmed¹,
Jennifer Neville²,
Ryan A. Rossi³,
Nick G. Duffield⁴ &
…
Theodore L. Willke¹

Abstract

From social science to biology, many applications rely on graphlets for intuitive and meaningful characterization of networks. Although graphlets have had considerable success and impact in a variety of domains, there has yet to be a fast and efficient framework for computing the frequencies of these subgraph patterns: existing methods do not scale to large networks with billions of nodes and edges. In this paper, we propose a fast, efficient, and parallel framework, as well as a family of algorithms, for counting k-node graphlets. The framework leverages a number of combinatorial arguments that significantly improve the scalability of graphlet counting. For each edge, we count a few graphlets directly and obtain the exact counts of the others in constant time using the combinatorial arguments. On a large collection of 300+ networks from a variety of domains, our graphlet counting strategies are on average 460× faster than existing methods. This brings new opportunities to investigate the use of graphlets on much larger networks and in newer applications, as we show in the experiments. To the best of our knowledge, this paper provides the largest graphlet computations to date.
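The per-edge idea in the abstract can be illustrated on the smallest case, 3-node graphlets. The sketch below is an assumption-laden illustration rather than the paper's algorithm: it is serial, handles only k = 3, and the function name `count_3node_graphlets` and the dictionary keys are hypothetical. For each edge (u, v) it counts triangles directly by intersecting neighbor sets, then derives the remaining counts from the degrees and the triangle count with constant-time arithmetic.

```python
from math import comb


def count_3node_graphlets(n, edges):
    """Count induced 3-node graphlets of a simple undirected graph.

    n     -- number of nodes, labeled 0..n-1 (assumed input format)
    edges -- iterable of (u, v) pairs; duplicates and self-loops are ignored
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    tri_sum = 0    # triangles, each seen once per edge (3 times in total)
    wedge_sum = 0  # induced 2-paths, each seen once per edge (2 times in total)
    one_edge = 0   # triples with exactly one edge, each seen exactly once

    for u in range(n):
        for v in adj[u]:
            if u >= v:
                continue  # visit each undirected edge once
            du, dv = len(adj[u]), len(adj[v])
            # Counted directly: common neighbors of u and v close a triangle.
            t = len(adj[u] & adj[v])
            tri_sum += t
            # Derived in O(1): neighbors of u (or v) that are not the other
            # endpoint and not a common neighbor form an open wedge.
            wedge_sum += (du - 1 - t) + (dv - 1 - t)
            # Derived in O(1): nodes adjacent to neither endpoint.
            # |N(u) ∪ N(v)| = du + dv - t, and that union contains u and v.
            one_edge += n - (du + dv - t)

    triangles = tri_sum // 3
    wedges = wedge_sum // 2
    # Whatever remains of the C(n, 3) triples has no edges at all.
    independent = comb(n, 3) - triangles - wedges - one_edge
    return {
        "triangle": triangles,
        "2-star": wedges,
        "3-node-1-edge": one_edge,
        "3-node-independent": independent,
    }


if __name__ == "__main__":
    # A triangle 0-1-2, a pendant node 3 attached to 2, and an isolated node 4.
    print(count_3node_graphlets(5, [(0, 1), (1, 2), (0, 2), (2, 3)]))
```

Only the set intersection touches the neighborhoods; every other count follows from arithmetic on values already in hand, which is the pattern the abstract describes for obtaining exact counts in constant time per edge.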

Author information

Authors and Affiliations

Parallel Computing Lab, Intel Corporation, Santa Clara, CA, 95054, USA
Nesreen K. Ahmed & Theodore L. Willke
Department of Computer Science, Purdue University, West Lafayette, IN, 47906, USA
Jennifer Neville
Palo Alto Research Center (PARC), Palo Alto, CA, 94304, USA
Ryan A. Rossi
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
Nick G. Duffield

Corresponding author

Correspondence to Nesreen K. Ahmed.

About this article

Cite this article

Ahmed, N.K., Neville, J., Rossi, R.A. et al. Graphlet decomposition: framework, algorithms, and applications. Knowl Inf Syst 50, 689–722 (2017). https://doi.org/10.1007/s10115-016-0965-5

Received: 14 November 2015
Revised: 05 March 2016
Accepted: 04 June 2016
Published: 27 June 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10115-016-0965-5
