Distributed discovery of frequent subgraphs of a network using MapReduce

Shahrivari, Saeed; Jalili, Saeed

doi:10.1007/s00607-015-0446-9

Distributed discovery of frequent subgraphs of a network using MapReduce

Published: 26 February 2015

Volume 97, pages 1101–1120, (2015)
Cite this article

Computing Aims and scope Submit manuscript

Saeed Shahrivari¹ &
Saeed Jalili¹

504 Accesses
17 Citations
Explore all metrics

Abstract

Discovery of frequent subgraphs of a network is a challenging and time-consuming process. Several heuristics and improvements have been proposed before. However, when the size of subgraphs or the size of network is big, the process cannot be done in feasible time on a single machine. One of the promising solutions is using the processing power of available parallel and distributed systems. In this paper, we present a distributed solution for discovery of frequent subgraphs of a network using the MapReduce framework. The solution is named MRSUB and is developed to run over the Hadoop framework. MRSUB uses a novel and load-balanced parallel subgraph enumeration algorithm and fits it into the MapReduce framework. Also, a fast subgraph isomorphism detection heuristic is used which accelerates the whole process further. We executed MRSUB on a private cloud infrastructure with 40 machines and performed several experiments with different networks. Experimental results show that MRSUB scales well and offers an effective solution for discovery of frequent subgraphs of networks which are not possible on a single machine in feasible time.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Milo R, Shen-Orr S, Itzkovitz S et al (2002) Network motifs: simple building blocks of complex networks. Science 298:824–827
Article Google Scholar
Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3:347–359
Article Google Scholar
Wernicke S, Rasche F (2006) FANMOD: a tool for fast network motif detection. Bioinformatics 22:1152–1153
Article Google Scholar
Kashani ZRM, Ahrabian H, Elahi E et al (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinform 10:318
Article Google Scholar
Ribeiro P, Silva F, Lopes L (2012) Parallel discovery of network motifs. J Parallel Distrib Comput 72:144–154
Article Google Scholar
Grochow J, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Speed T, Huang H (eds) Research in computational molecular biology. Springer, Berlin, pp 92–106
Chapter Google Scholar
Rudi AG, Shahrivari S, Jalili S, Kashani ZRM (2013) RANGI: a fast list-colored graph motif finding algorithm. IEEE/ACM Trans Comput Biol Bioinform 10:504–513
Article Google Scholar
Masoudi-Nejad A, Schreiber F, Kashani ZRM (2012) Building blocks of biological networks: a review on major network motif discovery algorithms. Syst Biol 6:164–174
Google Scholar
Wong E, Baur B, Quader S, Huang C-H (2012) Biological network motif detection: principles and practice. Brief Bioinform 13:202–215
Article Google Scholar
Kang U, Tsourakakis CE, Faloutsos C (2011) PEGASUS: mining peta-scale graphs. Knowl Inf Syst 27:303–325
Article Google Scholar
Inokuchi A, Washio T, Motoda H (2003) Complete mining of frequent patterns from graphs: mining graph data. Mach Learn 50:321–354
Article MATH Google Scholar
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of IEEE international conference on data mining, pp 313–320
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of IEEE international conference on data mining, pp 721–724
Schreiber F, Schwbbermeyer H (2004) Towards motif detection in networks: frequency concepts and flexible search. In: Proceedings of the international workshop on network tools and applications in biology. NETTAB, pp 91–102
Kuramochi M, Karypis G (2004) GREW—a scalable frequent subgraph discovery algorithm. In: Proceedings of fourth IEEE international conference on data mining ICDM ’04, pp 439–442
Holder LB, Cook DJ, Djoko S (1994) Substructure discovery in the subdue system. In: Proceedings of the workshop on knowledge discovery in databases, pp 169–180
Afrati FN, Fotakis D, Ullman JD (2013) Enumerating subgraph instances using map-reduce. In: Proceedings of IEEE 29th international conference on data engineering (ICDE), pp 62–73
Cohen J (2009) Graph twiddling in a MapReduce world. Comput Sci Eng 11:29–41
Article Google Scholar
Suri S, Vassilvitskii S (2011) Counting triangles and the curse of the last reducer. In: Proceedings of the 20th international conference on world wide web. ACM, NY, pp 607–614
Kolda TG, Pinar A, Plantenga T et al (2013) Counting triangles in massive graphs with MapReduce. arXiv:1301.5887
Pagh R, Tsourakakis CE (2012) Colorful triangle counting and a mapreduce implementation. Inf Process Lett 112:277–281
Article MATH MathSciNet Google Scholar
Zhao Z, Wang G, Butt AR et al (2012) Sahad: subgraph analysis in massive networks using hadoop. In: Proceedings of IEEE international parallel & distributed processing symposium (IPDPS). IEEE, pp 390–401
Kang U, Tsourakakis CE, Appel AP, Faloutsos C, Leskovec J (2011) Hadi: mining radii of large graphs. ACM Trans Knowl Discov Data 5(2):1–24
Article Google Scholar
Johnson DS (2005) The NP-completeness column. ACM Trans Algorithms 1:160–176
Article MathSciNet Google Scholar
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:1–13
Article Google Scholar
Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53:72–77
Article Google Scholar
White T (2012) Hadoop: the definitive guide. Yahoo Press, California
Google Scholar
McKay BD (1981) Practical graph isomorphism. Department of Computer Science, Vanderbilt University, Tennessee, US
Google Scholar
Junttila T, Kaski P (2007) Engineering an efficient canonical labeling tool for large and sparse graphs. In: Applegate D, Brodal GS, Panario D, Sedgewick R (eds) Proceedings of the ninth workshop on algorithm engineering and experiments and the fourth workshop on analytic algorithms and combinatorics. SIAM, pp 135–149
West DB (2001) Introduction to graph theory. Prentice Hall, Englewood Cliffs
Google Scholar
Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20:1746–1758
Article Google Scholar
Srihari S, Leong HW (2013) A survey of computational methods for protein complex prediction from protein interaction networks. J Bioinform Comput Biol 11:12–30
Article Google Scholar
Pablo MG, Danon L (2003) Community structure in jazz. Adv Complex Syst 6:565–573
Article Google Scholar
Stehlé J, Voirin N, Barrat A et al (2011) High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6:e23176
Article Google Scholar
Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World wide web, pp 641–650
Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Proceedings of advances in neural information processing systems (NIPS), pp 539–547
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):1–41
Article Google Scholar
Albert R, Jeong H, Barabási A-L (1999) Internet: diameter of the world-wide web. Nature 401:130–131
Article Google Scholar
Stoica I, Morris R, Karger D et al (2001) Chord: a scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput Commun Rev 31:149–160
Article Google Scholar

Download references

Author information

Authors and Affiliations

Computer Engineering Department, Tarbiat Modares University (TMU), Tehran, Iran
Saeed Shahrivari & Saeed Jalili

Authors

Saeed Shahrivari
View author publications
You can also search for this author in PubMed Google Scholar
Saeed Jalili
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Saeed Jalili.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Shahrivari, S., Jalili, S. Distributed discovery of frequent subgraphs of a network using MapReduce. Computing 97, 1101–1120 (2015). https://doi.org/10.1007/s00607-015-0446-9

Download citation

Received: 27 July 2014
Accepted: 17 February 2015
Published: 26 February 2015
Issue Date: November 2015
DOI: https://doi.org/10.1007/s00607-015-0446-9

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Distributed discovery of frequent subgraphs of a network using MapReduce

Abstract

Access this article

Similar content being viewed by others

A comprehensive survey on community detection methods and applications in complex information networks

Clustering graph data: the roadmap to spectral techniques

Algorithms for generating all possible spanning trees of a simple undirected connected graph: an extensive review

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Distributed discovery of frequent subgraphs of a network using MapReduce

Abstract

Access this article

Similar content being viewed by others

A comprehensive survey on community detection methods and applications in complex information networks

Clustering graph data: the roadmap to spectral techniques

Algorithms for generating all possible spanning trees of a simple undirected connected graph: an extensive review

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation