Frontiers of Computer Science, Volume 10, Issue 2, pp 317–329

Efficient graph similarity join for information integration on graphs

  • Yue Wang
  • Hongzhi Wang (corresponding author)
  • Jianzhong Li
  • Hong Gao
Research Article

Abstract

Graphs are widely used to represent complex data in many real applications, such as social networks, bioinformatics, and computer vision. Graph similarity join has therefore become essential for integrating noisy and inconsistent data from multiple data sources. Edit distance is commonly used to measure the similarity between graphs, and the graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the join, we first apply a preprocessing strategy that removes mismatching graph pairs with significant differences. We then propose a novel indexing method, k-hop tree based indexing, which builds an index for each graph by grouping, for each key node, the nodes reachable within k hops while conserving structure. For each candidate pair, we propose a similarity computation algorithm with boundary filtering, which is both efficient and effective. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join. Moreover, the join process finishes in polynomial time.
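
The abstract names two steps that a small example can make concrete: discarding graph pairs that differ too much, and collecting the nodes within k hops of a key node. The Python sketch below is only an illustration under stated assumptions and is not the paper's algorithm. It assumes undirected graphs stored as adjacency dictionaries, and unit-cost edit operations that each insert or delete a single node or a single edge, or relabel, so that the difference in node and edge counts is a lower bound on the edit distance. The names `size_lower_bound`, `candidate_pairs`, `k_hop_levels`, and the threshold `tau` are hypothetical.

```python
from itertools import combinations


def size_lower_bound(adj1, adj2):
    """Lower bound on edit distance from node and edge count differences.

    Assumes each edit operation changes the node count or the edge
    count by at most one (illustrative assumption, not from the paper).
    """
    n1, n2 = len(adj1), len(adj2)
    # Each undirected edge appears twice in a symmetric adjacency dict.
    m1 = sum(len(nb) for nb in adj1.values()) // 2
    m2 = sum(len(nb) for nb in adj2.values()) // 2
    return abs(n1 - n2) + abs(m1 - m2)


def candidate_pairs(graphs, tau):
    """Keep only the pairs whose size lower bound does not exceed tau."""
    return [
        (a, b)
        for a, b in combinations(sorted(graphs), 2)
        if size_lower_bound(graphs[a], graphs[b]) <= tau
    ]


def k_hop_levels(adj, root, k):
    """Breadth-first search from root, limited to k hops.

    Returns a list where entry d is the set of nodes exactly d hops
    from root; grouping by level keeps part of the local structure.
    """
    levels = [{root}]
    seen = {root}
    frontier = [root]
    for _ in range(k):
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        if not nxt:
            break
        levels.append(set(nxt))
        frontier = nxt
    return levels


# Hypothetical toy data: a path, a triangle, and a six-node star.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
star = {"x": ["1", "2", "3", "4", "5"],
        "1": ["x"], "2": ["x"], "3": ["x"], "4": ["x"], "5": ["x"]}
graphs = {"path": path, "star": star, "triangle": triangle}
```

With `tau = 1`, `candidate_pairs(graphs, 1)` keeps only the path and triangle pair, since the star differs from both by more than one node or edge. A count-based filter like this is cheap but loose; pairs that survive it would still need a more precise similarity computation.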

Keywords

graph similarity join, edit distance constraint, k-hop tree based indexing, structure conservation, boundary filtering

Supplementary material

11704_2015_4505_MOESM1_ESM.ppt (357 kb)

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Yue Wang (1)
  • Hongzhi Wang (1), corresponding author
  • Jianzhong Li (1)
  • Hong Gao (1)
  1. Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
