Efficient processing of graph similarity search

World Wide Web

Abstract

Graph similarity search finds the graphs in a graph database that are similar to a given query graph. Existing approaches solve this problem by first defining a similarity measure between two graphs and then applying a filtering mechanism that reduces the number of candidate graphs. The candidates are then verified by expensive graph operations such as finding maximum common subgraphs. Existing approaches, however, fail to report some similar graphs in the database, and they fail to discard some dissimilar graphs during the filtering phase. To overcome these problems, we first present a graph distance measure that can identify hidden but similar graphs that previous graph distance measures could not discover. We then devise a series of filtering and validation rules that discard non-matching graphs and identify definitely-matching graphs, respectively, by calculating lower and upper bounds on the distance between the query and a data graph. To execute these rules efficiently at runtime, we also propose an index structure. Lastly, we present a verification algorithm that checks the remaining candidate graphs against our graph distance measure. Experiments on real datasets show that our approach performs graph similarity search efficiently and effectively: it significantly reduces the number of candidate graphs that must be verified, and it returns similar graphs.
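The filter, validate, verify flow described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's distance measure, bounds, or index. As a stand-in it assumes that graphs are sets of edges over shared vertex identifiers and that the distance is the size of the symmetric difference of the edge sets. The names `lower_bound`, `upper_bound`, `exact_distance`, `similarity_search`, and the `probe` parameter are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def exact_distance(q: Graph, g: Graph) -> int:
    """Toy distance (an assumption, not the paper's measure):
    number of edges present in exactly one of the two graphs."""
    return len(q ^ g)


def lower_bound(q: Graph, g: Graph) -> int:
    """Cheap lower bound: the edge counts alone force this many differences."""
    return abs(len(q) - len(g))


def upper_bound(q: Graph, g: Graph, probe: int = 2) -> int:
    """Cheap upper bound: only `probe` query edges are looked up, so the
    number of shared edges is underestimated and the distance overestimated."""
    shared = sum(1 for e in sorted(q)[:probe] if e in g)
    return len(q) + len(g) - 2 * shared


def similarity_search(query: Graph, database: Dict[str, Graph],
                      tau: int) -> Tuple[List[str], int]:
    """Return the names of graphs within distance tau of the query and the
    number of graphs that needed the expensive exact verification."""
    answers: List[str] = []
    verified = 0
    for name, g in database.items():
        if lower_bound(query, g) > tau:
            continue                      # filtering rule: cannot match
        if upper_bound(query, g) <= tau:
            answers.append(name)          # validation rule: must match
            continue
        verified += 1                     # undecided: verify exactly
        if exact_distance(query, g) <= tau:
            answers.append(name)
    return answers, verified


query: Graph = frozenset({("a", "b"), ("b", "c"), ("c", "d")})
database: Dict[str, Graph] = {
    "g1": frozenset({("a", "b"), ("b", "c"), ("c", "e")}),
    "g2": frozenset({("a", "b"), ("p", "q"), ("q", "r"), ("r", "s"),
                     ("s", "t"), ("t", "u"), ("u", "v")}),
    "g3": frozenset({("x", "y"), ("b", "c"), ("c", "d")}),
    "g4": frozenset({("x", "y"), ("y", "z"), ("c", "d")}),
}
answers, verified = similarity_search(query, database, tau=2)
print(answers, verified)
```

In this toy run, `g2` is discarded by the lower bound, `g1` is accepted by the upper bound, and only `g3` and `g4` reach the exact distance computation. The design point is that both bounds are far cheaper than the exact check, so tighter bounds leave fewer graphs to verify.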




Corresponding author

Correspondence to Chin-Wan Chung.


About this article

Cite this article

Choi, R., Chung, CW. Efficient processing of graph similarity search. World Wide Web 18, 633–659 (2015). https://doi.org/10.1007/s11280-014-0274-4


  • DOI: https://doi.org/10.1007/s11280-014-0274-4
