The Journal of Supercomputing, Volume 73, Issue 5, pp 1810–1851

Detecting subgraph isomorphism with MapReduce

  • Péter Fehér
  • Márk Asztalos
  • Tamás Vajk
  • Tamás Mészáros
  • László Lengyel


In recent years, the MapReduce framework has become one of the most popular parallel computing platforms for processing big data. Companies such as Facebook, IBM, and Google use MapReduce to process and analyze massive data sets. Because the approach is frequently used in industrial solutions, algorithms based on the MapReduce framework have gained significant attention within the scientific community. Subgraph isomorphism is a fundamental problem in graph theory, and finding small patterns in large graphs is a core challenge in analyzing big data sets. This paper introduces two novel algorithms that are capable of finding matching patterns in arbitrarily large graphs. The algorithms are designed to exploit the straightforward parallelization offered by the MapReduce framework. The approaches are evaluated with regard to their space and memory requirements. The paper also describes the data structure used and presents a formal analysis of the algorithms.


Subgraph isomorphism, MapReduce, Pattern matching



This work was partially supported by the European Union and the European Social Fund through a project (Grant No.: TAMOP-4.2.2.C-11/1/KONV-2012-0013) organized by VIKING Zrt. Balatonfüred. It was also partially supported by the Hungarian Government, managed by the National Development Agency, and financed by the Research and Technology Innovation Fund (Grant No.: KMR_12-1-2012-0441).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Péter Fehér (1)
  • Márk Asztalos (1)
  • Tamás Vajk (1)
  • Tamás Mészáros (1)
  • László Lengyel (1)

  1. Department of Automation and Applied Informatics, Budapest University of Technology and Economics, Budapest, Hungary
