The VLDB Journal

, Volume 27, Issue 1, pp 53–78 | Cite as

Efficient structure similarity searches: a partition-based approach

  • Xiang Zhao
  • Chuan Xiao
  • Xuemin Lin
  • Wenjie Zhang
  • Yang Wang
Regular Paper


Graphs are widely used to model complex data in many applications, such as bioinformatics, chemistry, social networks, pattern recognition. A fundamental and critical query primitive is to efficiently search similar structures in a large collection of graphs. This article mainly studies threshold-based graph similarity search with edit distance constraints. Existing solutions to the problem utilize fixed-size overlapping substructures to generate candidates, and thus become susceptible to large vertex degrees and distance thresholds. In this article, we present a partition-based approach to tackle the problem. By dividing data graphs into variable-size non-overlapping partitions, the edit distance constraint is converted to a graph containment constraint for candidate generation. We develop efficient query processing algorithms based on the novel paradigm. Moreover, candidate-pruning techniques and an improved graph edit distance verification algorithm are developed to boost the performance. In addition, a cost-aware graph partitioning method is devised to optimize the index. Extending the partition-based filtering paradigm, we present a solution to the top-\(k\) graph similarity search problem, where tailored filtering, look-ahead and computation-sharing strategies are exploited. Using both public real-life and synthetic datasets, extensive experiments demonstrate that our approaches significantly outperform the baseline and its alternatives.


Graph database Similarity query Graph edit distance Top-k search 



Funding was provided by Japan Society for the Promotion of Science (Grant No. 16H01722), National Natural Science Foundation of China (Grant Nos. 61402494, 71690233), Natural Science Foundation of Hunan Province (Grant No. 2015JJ4009) and Australian Research Council (Grant Nos. DP150103071 and DP150102728).

Supplementary material

778_2017_487_MOESM1_ESM.pdf (147 kb)
Supplementary material 1 (pdf 146 KB)


  1. 1.
    Bi, F., Chang, L., Lin, X., Qin, L., Zhang, W.: Efficient subgraph matching by postponing Cartesian products. In: SIGMOD Conference, pp. 1199–1214 (2016)Google Scholar
  2. 2.
    Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. PRL 1(4), 245–253 (1983)CrossRefMATHGoogle Scholar
  3. 3.
    Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. IJPRAI 18(3), 265–298 (2004)Google Scholar
  4. 4.
    Fankhauser, S., Riesen, K., Bunke, H.: Speeding up graph edit distance computation through fast bipartite matching. In: GbRPR, pp. 102–111 (2011)Google Scholar
  5. 5.
    Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness, 1st edn. W. H. Freeman, San Francisco (1979)MATHGoogle Scholar
  6. 6.
    Gouda, K., Arafa, M., Calders, T.: Bfst_ed: a novel upper bound computation framework for the graph edit distance. In: SISAP, pp. 3–19 (2016)Google Scholar
  7. 7.
    Gouda, K., Hassaan, M.: CSI_GED: an efficient approach for graph edit similarity computation. In: ICDE, pp. 265–276 (2016)Google Scholar
  8. 8.
    Gupta, M., Gao, J., Yan, X., Cam, H., Han, J.: Top-k interesting subgraph discovery in information networks. In: ICDE, pp. 820–831 (2014)Google Scholar
  9. 9.
    Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann, Los Altos (2011)MATHGoogle Scholar
  10. 10.
    Han, W.-S., Lee, J., Lee, J.-H.: Turbo\(_{\text{iso}}\): towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD Conference, pp. 337–348 (2013)Google Scholar
  11. 11.
    He, H., Singh, A.K.: Closure-Tree: an index structure for graph queries. In: ICDE, p. 38 (2006)Google Scholar
  12. 12.
    Jin, C., Bhowmick, S.S., Choi, B., Zhou, S.: PRAGUE: towards blending practical visual subgraph query formulation and query processing. In: ICDE, pp. 222–233 (2012)Google Scholar
  13. 13.
    Marín, R.M., Aguirre, N.F., Daza, E.E.: Graph theoretical similarity approach to compare molecular electrostatic potentials. J. Chem. Inf. Model. 48(1), 109–118 (2008)CrossRefGoogle Scholar
  14. 14.
    Ranu, S., Hoang, M.X., Singh, A.K.: Answering top-\(k\) representative queries on graph databases. In: SIGMOD Conference, pp. 1163–1174 (2014)Google Scholar
  15. 15.
    Raveaux, R., Burie, J.-C., Ogier, J.-M.: A graph matching method and a graph matching distance based on subgraph assignments. PRL 31(5), 394–406 (2010)CrossRefGoogle Scholar
  16. 16.
    Ren, X., Wang, J.: Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. PVLDB 8(5), 617–628 (2015)Google Scholar
  17. 17.
    Riesen, K., Fankhauser, S., Bunke, H.: Speeding up graph edit distance computation with a bipartite heuristic. In: MLG (2007)Google Scholar
  18. 18.
    Sanfeliu, A., Fu, K.-S.: A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cyber. 13(3), 353–362 (1983)CrossRefMATHGoogle Scholar
  19. 19.
    Shang, H., Lin, X., Zhang, Y., Yu, J.X., Wang, W.: Connected substructure similarity search. In: SIGMOD Conference, pp. 903–914 (2010)Google Scholar
  20. 20.
    Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008)Google Scholar
  21. 21.
    Ullmann, J.R.: Bit-vector algorithms for binary constraint satisfaction and subgraph isomorphism. ACM J. Exp. Algorithmics 15, 1–6 (2010)MathSciNetMATHGoogle Scholar
  22. 22.
    Ullmann, J.R.: Degree reduction in labeled graph retrieval. ACM J. Exp. Algorithmics 20, 1–3 (2015)MathSciNetMATHGoogle Scholar
  23. 23.
    Wang, G., Wang, B., Yang, X., Yu, G.: Efficiently indexing large sparse graphs for similarity search. IEEE Trans. Knowl. Data Eng. 24(3), 440–451 (2012)CrossRefGoogle Scholar
  24. 24.
    Wang, X., Ding, X., Tung, A.K.H., Ying, S., Jin, H.: An efficient graph indexing method. In: ICDE, pp. 210–221 (2012)Google Scholar
  25. 25.
    Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD Conference, pp. 335–346 (2004)Google Scholar
  26. 26.
    Yan, X., Yu, P.S., Han, J.: Substructure similarity search in graph databases. In: SIGMOD Conference, pp. 766–777 (2005)Google Scholar
  27. 27.
    Yang, S., Han, F., Wu, Y., Yan, X.: Fast top-k search in knowledge graphs. In: ICDE (to appear) (2016)Google Scholar
  28. 28.
    Yang, Z., Fu, A.W., Liu, R.: Diversified top-\(k\) subgraph querying in a large graph. In: SIGMOD Conference, pp. 1167–1182 (2016)Google Scholar
  29. 29.
    Zeng, Z., Tung, A.K.H., Wang, J., Feng, J., Zhou, L.: Comparing stars: on approximating graph edit distance. PVLDB 2(1), 25–36 (2009)Google Scholar
  30. 30.
    Zhang, K., Wang, J.T.-L., Shasha, D.: On the editing distance between undirected acyclic graphs and related problems. In: CPM, pp. 395–407 (1995)Google Scholar
  31. 31.
    Zhang, S., Yang, J., Jin, W.: SAPPER: subgraph indexing and approximate matching in large graphs. PVLDB 3(1), 1185–1194 (2010)Google Scholar
  32. 32.
    Zhao, X., Xiao, C., Lin, X., Liu, Q., Zhang, W.: A partition-based approach to structure similarity search. PVLDB 7(3), 169–180 (2013)Google Scholar
  33. 33.
    Zhao, X., Xiao, C., Lin, X., Wang, W., Ishikawa, Y.: Efficient processing of graph similarity queries with edit distance constraints. VLDB J. 22(6), 727–752 (2013)CrossRefGoogle Scholar
  34. 34.
    Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Efficient graph similarity search over large graph databases. IEEE Trans. Knowl. Data Eng. 27(4), 964–978 (2015)CrossRefGoogle Scholar
  35. 35.
    Zhu, Y., Qin, L., Yu, J.X., Cheng, H.: Finding top-\(k\) similar graphs in graph databases. In: EDBT, pp. 456–467 (2012)Google Scholar
  36. 36.
    Zhu, Y., Yu, J.X., Qin, L.: Leveraging graph dimensions in online graph search. PVLDB 8(1), 85–96 (2014)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. 1.National University of Defense TechnologyChangshaChina
  2. 2.Collaborative Innovation Center of Geospatial TechnologyWuhanChina
  3. 3.Nagoya UniversityNagoyaJapan
  4. 4.The University of New South WalesSydneyAustralia

Personalised recommendations