World Wide Web, Volume 18, Issue 4, pp 871–887

Efficient subgraph join based on connectivity similarity

  • Yue Wang
  • Hongzhi Wang (corresponding author)
  • Jianzhong Li
  • Hong Gao


Graphs are a widely accepted model for representing complex data and have been applied in many real-world domains, including social networks, chemistry, and pattern recognition. Noisy and inconsistent data make graph similarity join imperative. The graph similarity join problem studied in this paper is to find pairs of graphs that can be joined because they are similar under a given similarity metric. Exact graph join based on edit distance has been proved to be NP-hard, which makes approximate similarity join essential. In this paper, we propose a matching method based on connectivity similarity, a new measure for evaluating graph similarity. We also apply a strategy called vertex similarity upper bound filtering to obtain a set of promising candidate pairs, which improves join efficiency. We perform experiments on real and synthetic graph databases to test the proposed method; the results show that it achieves both good result quality and high efficiency among approximate join methods.
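The filter-then-verify pattern that the abstract alludes to can be illustrated with a minimal sketch. It does not reproduce the paper's connectivity similarity or its vertex similarity upper bound, neither of which is defined in this abstract. As stand-ins (assumptions made only for illustration), the exact measure is the Jaccard coefficient of the two edge sets, and the filter is the size ratio min(|E1|, |E2|) / max(|E1|, |E2|), which can never be smaller than that Jaccard value. All function names are hypothetical.

```python
from itertools import product


def edge_set(graph):
    """Return the undirected edges of a graph given as (label_u, label_v) tuples."""
    return {frozenset(edge) for edge in graph}


def similarity(g1, g2):
    """Stand-in 'exact' similarity (assumption): Jaccard coefficient of the edge sets."""
    e1, e2 = edge_set(g1), edge_set(g2)
    if not e1 and not e2:
        return 1.0
    return len(e1 & e2) / len(e1 | e2)


def upper_bound(g1, g2):
    """Cheap bound (assumption): Jaccard <= min(|E1|, |E2|) / max(|E1|, |E2|)."""
    n1, n2 = len(edge_set(g1)), len(edge_set(g2))
    if max(n1, n2) == 0:
        return 1.0
    return min(n1, n2) / max(n1, n2)


def similarity_join(left, right, threshold):
    """Return (matching index pairs, number of pairs that needed verification)."""
    results, verified = [], 0
    for (i, g1), (j, g2) in product(enumerate(left), enumerate(right)):
        # Filtering step: a pair whose upper bound is below the threshold
        # cannot reach the threshold, so the costlier check is skipped.
        if upper_bound(g1, g2) < threshold:
            continue
        # Verification step: compute the stand-in exact similarity.
        verified += 1
        if similarity(g1, g2) >= threshold:
            results.append((i, j))
    return results, verified


if __name__ == "__main__":
    left = [[("a", "b"), ("b", "c")], [("a", "b")]]
    right = [
        [("b", "a"), ("b", "c"), ("c", "d")],
        [("x", "y"), ("y", "z"), ("z", "w"), ("w", "x"), ("x", "z")],
    ]
    print(similarity_join(left, right, threshold=0.6))
```

Because the bound never underestimates the stand-in similarity, pruning on it discards no true result; it only reduces how many pairs reach the verification step. In this toy input, three of the four pairs are pruned by the bound alone.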


Graph similarity joins, Connectivity similarity, Upper bound of vertex similarity





Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Yue Wang (1)
  • Hongzhi Wang (1, corresponding author)
  • Jianzhong Li (1)
  • Hong Gao (1)

  1. The School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
