Efficient processing of graph similarity search

Choi, Ryan; Chung, Chin-Wan

doi:10.1007/s11280-014-0274-4

Published: 24 January 2014

World Wide Web, Volume 18, pages 633–659 (2015)

Abstract

Graph similarity search finds the graphs in a graph database that are similar to a given query graph. Existing approaches solve this problem by first defining a similarity measure between two graphs and then applying a filtering mechanism that reduces the number of candidate graphs. The remaining candidates are verified with expensive graph operations such as finding maximum common subgraphs. These approaches, however, fail to report some similar graphs, and they fail to discard some dissimilar graphs during the filtering phase. To overcome these problems, we first present a graph distance measure that identifies hidden but similar graphs that previous graph distance measures could not discover. We then devise a series of filtering and validation rules that discard non-matching graphs and identify definitely-matching graphs, respectively, by calculating lower and upper bounds on the distance between a query graph and a data graph. To execute these rules efficiently at runtime, we also propose an index structure. Lastly, we present a verification algorithm that checks the remaining candidate graphs against our graph distance measure. Experiments on real datasets show that our approach performs graph similarity search efficiently and effectively: it significantly reduces the number of candidate graphs that must be verified, and it returns similar graphs.
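
The filter, validate, and verify flow described in the abstract can be illustrated with a minimal sketch. It is not the distance measure, the rules, or the index of the paper, none of which the abstract specifies. As a stand-in it assumes a toy distance (the size of the symmetric difference of two labelled edge sets) with simple lower and upper bounds on it; the names `similarity_search`, `toy_distance`, `toy_lower_bound`, `toy_upper_bound`, and the threshold `tau` are all hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Toy graph representation (an assumption): a set of labelled edges.
Graph = FrozenSet[Tuple[str, str]]


def toy_distance(a: Graph, b: Graph) -> int:
    """Stand-in for an expensive exact distance: edges not shared by both graphs."""
    return len(a ^ b)


def toy_lower_bound(a: Graph, b: Graph) -> int:
    """Cheap lower bound: the difference in edge counts never exceeds the distance."""
    return abs(len(a) - len(b))


def toy_upper_bound(a: Graph, b: Graph) -> int:
    """Cheap upper bound: the distance is largest when no edge is shared."""
    return len(a) + len(b)


def similarity_search(
    query: Graph,
    database: Iterable[Graph],
    tau: int,
    lower_bound: Callable[[Graph, Graph], int] = toy_lower_bound,
    upper_bound: Callable[[Graph, Graph], int] = toy_upper_bound,
    exact_distance: Callable[[Graph, Graph], int] = toy_distance,
) -> Tuple[List[Graph], int]:
    """Return graphs within distance tau of the query, and how many needed verification."""
    answers: List[Graph] = []
    verified = 0
    for g in database:
        if lower_bound(query, g) > tau:
            continue  # filtering: the graph cannot be within tau
        if upper_bound(query, g) <= tau:
            answers.append(g)  # validation: the graph is certainly within tau
            continue
        verified += 1  # only undecided candidates pay for the exact distance
        if exact_distance(query, g) <= tau:
            answers.append(g)
    return answers, verified


query: Graph = frozenset({("a", "b"), ("b", "c")})
database: List[Graph] = [
    frozenset({("a", "b"), ("b", "c")}),  # undecided by the bounds, then verified
    frozenset(),                          # accepted by the upper bound alone
    frozenset({("p", str(i)) for i in range(6)}),  # discarded by the lower bound
    frozenset({("a", "b"), ("x", "y")}),  # undecided by the bounds, then verified
    frozenset({("x", "y"), ("y", "z")}),  # undecided, rejected by verification
]
answers, verified = similarity_search(query, database, tau=2)
```

In this toy run only the graphs that neither bound can decide reach the exact distance computation, which is the saving the filtering and validation rules aim for; tighter bounds leave fewer undecided candidates.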

Author information

Authors and Affiliations

Department of Computer Science, KAIST, Daejeon, Korea
Ryan Choi & Chin-Wan Chung

Corresponding author

Correspondence to Chin-Wan Chung.

About this article

Cite this article

Choi, R., Chung, CW. Efficient processing of graph similarity search. World Wide Web 18, 633–659 (2015). https://doi.org/10.1007/s11280-014-0274-4

Received: 17 January 2013
Revised: 22 December 2013
Accepted: 03 January 2014
Published: 24 January 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11280-014-0274-4
