Practical Survey on MapReduce Subgraph Enumeration Algorithms

  • Conference paper
Advances in Internet, Data & Web Technologies (EIDWT 2022)

Abstract

Subgraph enumeration is a basic task in many graph analyses, so it must complete within a reasonable amount of time. This is challenging when the input graph is very large, with millions of nodes and edges, and known solutions are limited in scalability. Distributed computing is often proposed as a way to improve scalability, but it must be applied carefully to keep overhead costs low enough that the distributed solution actually pays off. In this work we provide a comprehensive overview of several MapReduce subgraph enumeration algorithms that currently represent the state of the art. We identify and describe the main conceptual approaches, give insight into their advantages and limitations, and summarize their similarities and differences.

Notes

  1. Here we use the term MapReduce in a broad sense. The algorithms in this work can be implemented in newer frameworks such as Apache Spark as well.

  2. http://www.lemurproject.org/clueweb12/webgraph.php.

  3. http://webdatacommons.org/hyperlinkgraph/.

  4. http://law.di.unimi.it/webdata/indochina-2004/.

Author information

Correspondence to Xiaozhou Liu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X., Santoso, Y., Srinivasan, V., Thomo, A. (2022). Practical Survey on MapReduce Subgraph Enumeration Algorithms. In: Barolli, L., Kulla, E., Ikeda, M. (eds) Advances in Internet, Data & Web Technologies. EIDWT 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 118. Springer, Cham. https://doi.org/10.1007/978-3-030-95903-6_45
