# Detecting subgraph isomorphism with MapReduce


## Abstract

In recent years, the MapReduce framework has become one of the most popular parallel computing platforms for processing big data. Companies such as Facebook, IBM, and Google use MapReduce to process and analyze massive data sets. Because the approach is frequently used in industrial solutions, algorithms based on the MapReduce framework have gained significant attention within the scientific community. Subgraph isomorphism is a fundamental problem in graph theory, and finding small patterns in large graphs is a core challenge in analyzing big data sets. This paper introduces two novel algorithms that are capable of finding matching patterns in arbitrarily large graphs. The algorithms are designed to exploit the straightforward parallelization offered by the MapReduce framework. The approaches are evaluated with regard to their space and memory requirements. The paper also describes the applied data structure and presents a formal analysis of the algorithms.
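To make the problem statement concrete, the following is a minimal sequential sketch of subgraph isomorphism on directed graphs. It is not one of the two MapReduce algorithms the paper introduces. It is a plain backtracking search, and the function name `find_matches` and the edge-list representation are assumptions chosen for illustration. A match here is an injective mapping of pattern nodes to target nodes under which every pattern edge is also a target edge.

```python
from typing import Dict, Hashable, Iterator, List, Tuple

Edge = Tuple[Hashable, Hashable]


def find_matches(pattern_edges: List[Edge],
                 target_edges: List[Edge]) -> Iterator[Dict]:
    """Yield every injective node mapping that preserves all pattern edges.

    Hypothetical helper for illustration only: a brute-force backtracking
    search, not the MapReduce algorithms described in the paper.
    Node labels within one graph must be mutually comparable (for sorting).
    """
    pattern_nodes = sorted({n for edge in pattern_edges for n in edge})
    target_nodes = sorted({n for edge in target_edges for n in edge})
    target_set = set(target_edges)

    def extend(mapping: Dict) -> Iterator[Dict]:
        # All pattern nodes are mapped: report a copy of the mapping.
        if len(mapping) == len(pattern_nodes):
            yield dict(mapping)
            return
        node = pattern_nodes[len(mapping)]
        used = set(mapping.values())
        for candidate in target_nodes:
            if candidate in used:
                continue  # keep the mapping injective
            mapping[node] = candidate
            # Check every pattern edge whose endpoints are both mapped.
            consistent = all(
                (mapping[u], mapping[v]) in target_set
                for (u, v) in pattern_edges
                if u in mapping and v in mapping
            )
            if consistent:
                yield from extend(mapping)
            del mapping[node]  # backtrack

    yield from extend({})


# Pattern: a directed triangle. Target: a triangle plus one extra edge.
pattern = [("a", "b"), ("b", "c"), ("c", "a")]
target = [(1, 2), (2, 3), (3, 1), (3, 4)]
for match in find_matches(pattern, target):
    print(match)
```

The search is exponential in the worst case, which is why parallel formulations of the problem are of interest for large target graphs.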

### Keywords

Subgraph isomorphism, MapReduce, Pattern matching

## Notes

### Acknowledgments

This work was partially supported by the European Union and the European Social Fund through project FuturICT.hu (Grant No.: TAMOP-4.2.2.C-11/1/KONV-2012-0013) organized by VIKING Zrt. Balatonfüred. This work was partially supported by the Hungarian Government, managed by the National Development Agency, and financed by the Research and Technology Innovation Fund (Grant No.: KMR_12-1-2012-0441).
