SSRW: A Scalable Algorithm for Estimating Graphlet Statistics Based on Random Walk

  • Chen Yang
  • Min Lyu
  • Yongkun Li
  • Qianqian Zhao
  • Yinlong Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10827)

Abstract

Mining graphlet statistics has wide applications in social networks, bioinformatics, and information security. However, counting graphlets exactly is a major challenge because the number of subgraphs increases exponentially with the graph size, so sampling algorithms are widely used to estimate graphlet statistics within reasonable time. Existing sampling algorithms, however, do not scale to large graphlets; for example, they may get stuck when estimating graphlets with more than five nodes. To address this issue, we propose a highly scalable algorithm, Scalable subgraph Sampling via Random Walk (SSRW), for estimating graphlet counts and concentrations. SSRW samples graphlets by generating new nodes from the neighbors of previously visited nodes instead of from the neighbors of fixed ones. Thanks to this flexibility, it can generate any k-graphlet in a unified way and estimate k-graphlet statistics efficiently even for large k. Our extensive experiments on estimating counts and concentrations of \(\{4,5,6,7\}\)-graphlets show that SSRW is scalable, accurate, and fast.
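
The sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives neither the sampling probabilities nor the estimator, so the code below only shows one plausible reading of "generating new nodes from the neighbors of previously visited nodes". The function names (`random_walk`, `grow_subgraph`, `is_connected`), the uniform choice over the frontier, and the toy graph are all assumptions made for illustration, and no bias correction or count estimation is attempted.

```python
import random


def random_walk(adj, start, steps, rng):
    """Yield the nodes visited by a simple random walk of `steps` steps."""
    node = start
    for _ in range(steps):
        node = rng.choice(adj[node])  # move to a uniformly random neighbor
        yield node


def grow_subgraph(adj, seed, k, rng):
    """Grow a connected k-node set starting from `seed`.

    Each new node is drawn from the neighbors of ALL nodes chosen so far,
    not only from the neighbors of the most recently added node.
    The uniform choice over the frontier is an assumption of this sketch.
    Returns None if fewer than k nodes are reachable from `seed`.
    """
    chosen = [seed]
    while len(chosen) < k:
        frontier = sorted({v for u in chosen for v in adj[u]} - set(chosen))
        if not frontier:
            return None
        chosen.append(rng.choice(frontier))
    return frozenset(chosen)


def is_connected(adj, nodes):
    """Check that `nodes` induces a connected subgraph (sanity check only)."""
    nodes = set(nodes)
    stack, seen = [next(iter(nodes))], set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(v for v in adj[u] if v in nodes and v not in seen)
    return seen == nodes


# Toy undirected graph as an adjacency list (hypothetical data).
adj = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 4],
    3: [1, 4, 5],
    4: [2, 3, 5],
    5: [3, 4],
}
rng = random.Random(0)

# Use each node visited by the walk as the seed of one 4-node sample.
samples = [grow_subgraph(adj, v, 4, rng) for v in random_walk(adj, 0, 20, rng)]
```

Because the frontier is rebuilt from every chosen node, the same loop works unchanged for any k, which is the kind of uniformity across graphlet sizes the abstract refers to. Turning such samples into unbiased counts or concentrations requires weighting each sample by its sampling probability, which this sketch does not cover.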

Notes

Acknowledgements

This work is supported by NSFC (61672486, 61772484, 11671376), and Key Program of NSFC (71631006).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Chen Yang (1)
  • Min Lyu (1, 2)
  • Yongkun Li (1, 2)
  • Qianqian Zhao (1)
  • Yinlong Xu (1, 2)
  1. University of Science and Technology of China, Hefei, China
  2. AnHui Province Key Laboratory of High Performance Computing, Hefei, China
