Abstract
Mining graphlet statistics is valuable because of its wide applications in social networks, bioinformatics, and information security. Counting graphlets exactly is challenging, however, because the number of subgraphs grows exponentially with the graph size, so sampling algorithms are widely used to estimate graphlet statistics within a reasonable time. Existing sampling algorithms do not scale to large graphlets; for example, they may get stuck when estimating graphlets with more than five nodes. To address this issue, we propose a highly scalable algorithm, Scalable subgraph Sampling via Random Walk (SSRW), for estimating graphlet counts and concentrations. SSRW samples graphlets by generating new nodes from the neighbors of previously visited nodes rather than from those of fixed nodes. Thanks to this flexibility, we can generate any k-graphlet in a unified way and estimate the statistics of k-graphlets efficiently even for large k. Our extensive experiments on estimating the counts and concentrations of \(\{4,5,6,7\}\)-graphlets show that SSRW is scalable, accurate, and fast.
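The sampling idea in the abstract, drawing each new node from the neighbors of previously visited nodes, can be illustrated with a minimal sketch. This is not the SSRW algorithm itself: the abstract does not give the transition probabilities or the estimator, so the sketch only shows how a connected set of k nodes might be grown, picking uniformly from the combined frontier (an assumption). The names `sample_connected_nodes`, `induced_edges`, and the toy adjacency list `adj` are hypothetical.

```python
import random


def sample_connected_nodes(adj, k, start, rng=None):
    """Grow a connected set of k nodes starting from `start`.

    Each new node is drawn from the neighbors of ANY already-visited node,
    not only the most recent one. Uniform choice over the frontier is an
    assumption made for illustration, not the paper's sampling rule.
    """
    rng = rng or random.Random()
    visited = [start]
    seen = {start}
    while len(visited) < k:
        # Candidate pool: neighbors of all visited nodes, minus the sample.
        frontier = sorted({v for u in visited for v in adj[u]} - seen)
        if not frontier:
            raise ValueError("connected component has fewer than k nodes")
        nxt = rng.choice(frontier)
        visited.append(nxt)
        seen.add(nxt)
    return visited


def induced_edges(adj, nodes):
    """Return the edges of the subgraph induced by `nodes` (undirected)."""
    node_set = set(nodes)
    return sorted(
        (u, v) for u in nodes for v in adj[u] if v in node_set and u < v
    )


# Toy undirected graph as an adjacency list (hypothetical data).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}

nodes = sample_connected_nodes(adj, k=4, start=0, rng=random.Random(7))
print(nodes, induced_edges(adj, nodes))
```

Turning such samples into count or concentration estimates would additionally require correcting for the probability with which each subgraph is sampled, which this sketch deliberately leaves out.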
Acknowledgements
This work is supported by the NSFC (61672486, 61772484, 11671376) and the Key Program of the NSFC (71631006).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Yang, C., Lyu, M., Li, Y., Zhao, Q., Xu, Y. (2018). SSRW: A Scalable Algorithm for Estimating Graphlet Statistics Based on Random Walk. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_18
DOI: https://doi.org/10.1007/978-3-319-91452-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science, Computer Science (R0)