Advertisement

Knowledge and Information Systems

, Volume 51, Issue 1, pp 61–87 | Cite as

Efficient and scalable labeled subgraph matching using SGMatch

  • Carlos R. RiveroEmail author
  • Hasan M. Jamil
Regular Paper

Abstract

Graphs are natural candidates for modeling application domains, such as social networks, pattern recognition, citation networks, or protein–protein interactions. One of the most challenging tasks in managing graphs is subgraph matching over data graphs, which attempts to find one-to-one correspondences, called solutions, among the query and data nodes. To compute solutions, most contemporary techniques use backtracking and recursion. An open research question is whether graphs can be matched based on parts and local solutions can be combined to reach a global matching. In this paper, we present an approach based on graph decomposition called SGMatch to match graphs. We represent graphs in smaller units called graphlets and develop a matching technique to leverage this representation. Pruning strategies use a new notion of edge covering called minimum hub cover and metadata, such as statistics and inverted indices, to reduce the number of matching candidates. Our evaluation of SGMatch versus contemporary algorithms, i.e., VF2, GraphQL, QuickSI, GADDI, or SPath, shows that SGMatch substantially improves the performance of current state-of-the-art techniques for larger query graphs with different structures, i.e., cliques, paths or subgraphs.

Keywords

Graph management Query processing Subgraph matching 

Notes

Acknowledgments

This work was completed while Carlos R. Rivero was working as a postdoctoral researcher at the University of Idaho. The ILP formulation discussed in Sect. 3.2 for the computation of minimum hub cover was contributed by Belma Yelbay. We would like to thank the anonymous reviewers and Stanisław P. Radziszowski for their valuable suggestions. This research was supported in part by the National Science Foundation grant DRL 1515550, Idaho National Laboratory grant FFK039, and a University of Idaho Office of Research and Economic Development grant.

References

  1. 1.
    Amin MS, Finley RL, Jamil HM (2012) Top-k similar graph matching using TraM in biological networks. IEEE/ACM Trans Comput Biol Bioinform 9(6):1790–1804CrossRefGoogle Scholar
  2. 2.
    Bonnici V, Giugno R, Pulvirenti A, Shasha D, Ferro A (2013) A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinform 14(S–7):S13CrossRefGoogle Scholar
  3. 3.
    Cordella LP, Foggia P, Sansone C, Vento M (2004) A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans Pattern Anal Mach Intell 26(10):1367–1372CrossRefGoogle Scholar
  4. 4.
    Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman, San FranciscozbMATHGoogle Scholar
  5. 5.
    Han W-S, Lee J, Lee J-H (2013) Turbo\(_{{\rm iso}}\): towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD conference, pp 337–348Google Scholar
  6. 6.
    He H, Singh AK (2006) Closure-tree: an index structure for graph queries. In: ICDE, p 38Google Scholar
  7. 7.
    He H, Singh AK(2008) Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD conference, pp 405–418Google Scholar
  8. 8.
    Jamil HM (2009) A novel knowledge representation framework for computing sub-graph isomorphic queries in interaction network databases. In ICTAI, pp 131–138Google Scholar
  9. 9.
    Jamil HM (2011) Computing subgraph isomorphic queries using structural unification and minimum graph structures. In SAC, pp 1053–1058Google Scholar
  10. 10.
    Lee J, Han W-S, Kasperovics R, Lee J-H (2012) An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2):133–144Google Scholar
  11. 11.
    Leskovec J (2016) Stanford large network dataset collection. http://snap.stanford.edu/data/index.html
  12. 12.
    Messmer BT, Bunke H (2000) Efficient subgraph isomorphism detection: a decomposition approach. IEEE Trans Knowl Data Eng 12(2):307–323CrossRefGoogle Scholar
  13. 13.
    Prechelt L (2000) An empirical comparison of seven programming languages. IEEE Comput 33(10):23–29CrossRefGoogle Scholar
  14. 14.
    Prud’homme C, Fages J-G (2016) CHOCO solver: a Java library for CSP and CP. http://choco-solver.org/
  15. 15.
    Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Benchmarking data exchange among semantic-web ontologies. IEEE Trans Knowl Data Eng 25(9):1997–2009CrossRefGoogle Scholar
  16. 16.
    Rivero CR, Hernández I, Ruiz D, Corchuelo R (2013) Exchanging data amongst linked data applications. Knowl Inf Syst 37(3):693–729CrossRefGoogle Scholar
  17. 17.
    Rivero CR, Jamil HM (2014) On matching graphs using disk-based processing: an XML and XQuery implementation. In: ICDE workshops, pp 20–27Google Scholar
  18. 18.
    Rivero CR, Jamil HM (2016) SGMatch: implementation, data and query sets, running scripts. http://dblab.nkn.uidaho.edu/SGMatch/
  19. 19.
    Shang H, Zhang Y, Lin X, Yu JX (2008) Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1):364–375Google Scholar
  20. 20.
    Sun Z, Wang H, Wang H, Shao B, Li J (2012) Efficient subgraph matching on billion node graphs. PVLDB 5(9):788–799Google Scholar
  21. 21.
    Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph”. PVLDB 7(3):193–204Google Scholar
  22. 22.
    Tian Y, Patel JM(2008) TALE: a tool for approximate large graph matching. In: ICDE, pp 963–972Google Scholar
  23. 23.
    Ullmann JR (1976) An algorithm for subgraph isomorphism. J ACM 23(1):31–42MathSciNetCrossRefGoogle Scholar
  24. 24.
    Yelbay B, Birbil SI, Bülbül K, Jamil HM (2016) Approximating the minimum hub cover problem on planar graphs. Optim Lett 10(1):33–45MathSciNetCrossRefzbMATHGoogle Scholar
  25. 25.
    Zhang S, Li S, Yang J (2009) GADDI: distance index based subgraph matching in biological networks. In: EDBT, pp 192–203Google Scholar
  26. 26.
    Zhang S, Yang J, Jin W (2010) SAPPER: subgraph indexing and approximate matching in large graphs. PVLDB 3(1):1185–1194Google Scholar
  27. 27.
    Zhao P, Han J (2010) On graph query optimization in large networks. PVLDB 3(1):340–351Google Scholar
  28. 28.
    Zobel J, Moffat A (2006) Inverted files for text search engines. ACM Comput Surv 38(2), Article No 6. doi: 10.1145/1132956.1132959
  29. 29.
    Zou Z, Li J, Gao H, Zhang S (2010) Mining frequent subgraph patterns from uncertain graph data. IEEE Trans Knowl Data Eng 22(9):1203–1218CrossRefGoogle Scholar

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  1. 1.Department of Computer ScienceRochester Institute of TechnologyRochesterUSA
  2. 2.Department of Computer ScienceUniversity of IdahoMoscowUSA

Personalised recommendations