Skip to main content
Log in

Distributed discovery of frequent subgraphs of a network using MapReduce

  • Published:
Computing Aims and scope Submit manuscript

Abstract

Discovery of frequent subgraphs of a network is a challenging and time-consuming process. Several heuristics and improvements have been proposed before. However, when the size of subgraphs or the size of network is big, the process cannot be done in feasible time on a single machine. One of the promising solutions is using the processing power of available parallel and distributed systems. In this paper, we present a distributed solution for discovery of frequent subgraphs of a network using the MapReduce framework. The solution is named MRSUB and is developed to run over the Hadoop framework. MRSUB uses a novel and load-balanced parallel subgraph enumeration algorithm and fits it into the MapReduce framework. Also, a fast subgraph isomorphism detection heuristic is used which accelerates the whole process further. We executed MRSUB on a private cloud infrastructure with 40 machines and performed several experiments with different networks. Experimental results show that MRSUB scales well and offers an effective solution for discovery of frequent subgraphs of networks which are not possible on a single machine in feasible time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

References

  1. Milo R, Shen-Orr S, Itzkovitz S et al (2002) Network motifs: simple building blocks of complex networks. Science 298:824–827

    Article  Google Scholar 

  2. Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3:347–359

    Article  Google Scholar 

  3. Wernicke S, Rasche F (2006) FANMOD: a tool for fast network motif detection. Bioinformatics 22:1152–1153

    Article  Google Scholar 

  4. Kashani ZRM, Ahrabian H, Elahi E et al (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinform 10:318

    Article  Google Scholar 

  5. Ribeiro P, Silva F, Lopes L (2012) Parallel discovery of network motifs. J Parallel Distrib Comput 72:144–154

    Article  Google Scholar 

  6. Grochow J, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Speed T, Huang H (eds) Research in computational molecular biology. Springer, Berlin, pp 92–106

    Chapter  Google Scholar 

  7. Rudi AG, Shahrivari S, Jalili S, Kashani ZRM (2013) RANGI: a fast list-colored graph motif finding algorithm. IEEE/ACM Trans Comput Biol Bioinform 10:504–513

    Article  Google Scholar 

  8. Masoudi-Nejad A, Schreiber F, Kashani ZRM (2012) Building blocks of biological networks: a review on major network motif discovery algorithms. Syst Biol 6:164–174

    Google Scholar 

  9. Wong E, Baur B, Quader S, Huang C-H (2012) Biological network motif detection: principles and practice. Brief Bioinform 13:202–215

    Article  Google Scholar 

  10. Kang U, Tsourakakis CE, Faloutsos C (2011) PEGASUS: mining peta-scale graphs. Knowl Inf Syst 27:303–325

    Article  Google Scholar 

  11. Inokuchi A, Washio T, Motoda H (2003) Complete mining of frequent patterns from graphs: mining graph data. Mach Learn 50:321–354

    Article  MATH  Google Scholar 

  12. Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of IEEE international conference on data mining, pp 313–320

  13. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of IEEE international conference on data mining, pp 721–724

  14. Schreiber F, Schwbbermeyer H (2004) Towards motif detection in networks: frequency concepts and flexible search. In: Proceedings of the international workshop on network tools and applications in biology. NETTAB, pp 91–102

  15. Kuramochi M, Karypis G (2004) GREW—a scalable frequent subgraph discovery algorithm. In: Proceedings of fourth IEEE international conference on data mining ICDM ’04, pp 439–442

  16. Holder LB, Cook DJ, Djoko S (1994) Substructure discovery in the subdue system. In: Proceedings of the workshop on knowledge discovery in databases, pp 169–180

  17. Afrati FN, Fotakis D, Ullman JD (2013) Enumerating subgraph instances using map-reduce. In: Proceedings of IEEE 29th international conference on data engineering (ICDE), pp 62–73

  18. Cohen J (2009) Graph twiddling in a MapReduce world. Comput Sci Eng 11:29–41

    Article  Google Scholar 

  19. Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on world wide web. ACM, NY, pp 607–614

  20. Kolda TG, Pinar A, Plantenga T et al (2013) Counting triangles in massive graphs with MapReduce. arXiv:1301.5887

  21. Pagh R, Tsourakakis CE (2012) Colorful triangle counting and a mapreduce implementation. Inf Process Lett 112:277–281

    Article  MATH  MathSciNet  Google Scholar 

  22. Zhao Z, Wang G, Butt AR et al (2012) Sahad: subgraph analysis in massive networks using hadoop. In: Proceedings of IEEE international parallel & distributed processing symposium (IPDPS). IEEE, pp 390–401

  23. Kang U, Tsourakakis CE, Appel AP, Faloutsos C, Leskovec J (2011) Hadi: mining radii of large graphs. ACM Trans Knowl Discov Data 5(2):1–24

    Article  Google Scholar 

  24. Johnson DS (2005) The NP-completeness column. ACM Trans Algorithms 1:160–176

    Article  MathSciNet  Google Scholar 

  25. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:1–13

    Article  Google Scholar 

  26. Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53:72–77

    Article  Google Scholar 

  27. White T (2012) Hadoop: the definitive guide. Yahoo Press, California

    Google Scholar 

  28. McKay BD (1981) Practical graph isomorphism. Department of Computer Science, Vanderbilt University, Tennessee, US

    Google Scholar 

  29. Junttila T, Kaski P (2007) Engineering an efficient canonical labeling tool for large and sparse graphs. In: Applegate D, Brodal GS, Panario D, Sedgewick R (eds) Proceedings of the ninth workshop on algorithm engineering and experiments and the fourth workshop on analytic algorithms and combinatorics. SIAM, pp 135–149

  30. West DB (2001) Introduction to graph theory. Prentice Hall, Englewood Cliffs

    Google Scholar 

  31. Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20:1746–1758

    Article  Google Scholar 

  32. Srihari S, Leong HW (2013) A survey of computational methods for protein complex prediction from protein interaction networks. J Bioinform Comput Biol 11:12–30

    Article  Google Scholar 

  33. Pablo MG, Danon L (2003) Community structure in jazz. Adv Complex Syst 6:565–573

    Article  Google Scholar 

  34. Stehlé J, Voirin N, Barrat A et al (2011) High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6:e23176

    Article  Google Scholar 

  35. Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World wide web, pp 641–650

  36. Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Proceedings of advances in neural information processing systems (NIPS), pp 539–547

  37. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):1–41

    Article  Google Scholar 

  38. Albert R, Jeong H, Barabási A-L (1999) Internet: diameter of the world-wide web. Nature 401:130–131

    Article  Google Scholar 

  39. Stoica I, Morris R, Karger D et al (2001) Chord: a scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput Commun Rev 31:149–160

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Saeed Jalili.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Shahrivari, S., Jalili, S. Distributed discovery of frequent subgraphs of a network using MapReduce. Computing 97, 1101–1120 (2015). https://doi.org/10.1007/s00607-015-0446-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00607-015-0446-9

Keywords

Mathematics Subject Classification

Navigation