Efficient graph similarity join for information integration on graphs
Graphs are widely used to represent complex data in many real applications, such as social networks, bioinformatics, and computer vision. Graph similarity join has therefore become essential for integrating noisy and inconsistent data from multiple data sources. Edit distance is the measure most commonly used to quantify the similarity between graphs, and the graph similarity join problem studied in this paper is defined by a graph edit distance constraint. To accelerate the join, we first apply a preprocessing strategy that removes mismatched graph pairs that differ significantly. We then propose a new way of indexing each graph, the k-hop tree based indexing method: for each key node, the nodes reachable within k hops are grouped together while their structure is conserved. For each remaining candidate pair, we propose a similarity computation algorithm with boundary filtering that is both efficient and effective. Experiments on real and synthetic graph databases confirm that our method achieves good join quality. In addition, the join process finishes in polynomial time.
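The abstract does not state the exact preprocessing criterion or the precise shape of a k-hop tree, so the following is only a minimal Python sketch under labeled assumptions, not the authors' method. It assumes unit-cost edit operations in which vertex insertion and deletion apply only to isolated vertices. Under that convention every edit changes either the vertex count or the edge count by at most one, so the size difference is a lower bound on the edit distance and can serve as a stand-in pruning filter. It also assumes a k-hop tree can be approximated by a breadth-first tree truncated at depth k around a key node. The names `size_lower_bound`, `prefilter` and `k_hop_tree` are hypothetical. Key-node selection and boundary filtering are not modeled.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

# Hypothetical representation: undirected, unlabelled graph as node -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def edge_count(g: Graph) -> int:
    # Each undirected edge is stored in two adjacency sets.
    return sum(len(nbrs) for nbrs in g.values()) // 2


def size_lower_bound(g1: Graph, g2: Graph) -> int:
    """Lower bound on unit-cost graph edit distance.

    Assumption: vertex insertion/deletion only touches isolated vertices,
    so each edit changes |V| or |E| (never both) by at most one.
    """
    return abs(len(g1) - len(g2)) + abs(edge_count(g1) - edge_count(g2))


def prefilter(pairs: List[Tuple[Graph, Graph]], tau: int) -> List[Tuple[Graph, Graph]]:
    # Stand-in preprocessing: drop pairs that cannot be within edit distance tau.
    return [(a, b) for a, b in pairs if size_lower_bound(a, b) <= tau]


def k_hop_tree(g: Graph, key: Hashable, k: int) -> Dict[Hashable, Tuple[int, Optional[Hashable]]]:
    """Breadth-first tree rooted at `key`, truncated at depth k.

    Returns node -> (depth, parent); the parent links keep the tree shape
    of the k-hop neighbourhood (a simplified notion of structure conservation).
    """
    tree: Dict[Hashable, Tuple[int, Optional[Hashable]]] = {key: (0, None)}
    queue = deque([key])
    while queue:
        u = queue.popleft()
        depth = tree[u][0]
        if depth == k:
            continue  # do not expand beyond k hops
        for v in g[u]:
            if v not in tree:
                tree[v] = (depth + 1, u)
                queue.append(v)
    return tree
```

For example, on the path graph 0-1-2-3 with k = 2, `k_hop_tree` rooted at node 0 reaches nodes 0, 1 and 2 but not node 3. A path of four nodes and a triangle have a size lower bound of 1, so `prefilter` discards that pair when tau is 0.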
Keywords: graph similarity join; edit distance constraint; k-hop tree based indexing; structure conservation; boundary filtering