Abstract
Similarity join of complex structures is an important operation in managing graph data. In this paper, we investigate the problem of graph similarity join with edit distance constraints. Existing algorithms extract substructures (either rooted trees or simple paths) as features and transform the edit distance constraint into a weaker count filtering condition. However, their performance suffers from heavy overlap among, or low selectivity of, the extracted substructures. To resolve this issue, we first present a general framework for substructure-based similarity join and a tighter count filtering condition. Under this framework, we observe that using either too few or too many substructures can result in poor filtering performance. We therefore devise an algorithm that selects substructures for filtering. The proposed techniques are integrated into the framework, constituting a new algorithm whose superiority is demonstrated by experimental results.
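To make the idea of a count filtering condition concrete, here is a minimal sketch of the classic (weaker) count filter that feature-based methods apply, not the tighter condition or the substructure selection proposed in this paper. It assumes each graph is represented by a multiset of hashable substructure features (for example, label sequences of simple paths) and that a single edit operation on graph g can destroy at most `d_g` of its features. The function name and parameters (`count_filter`, `tau`, `d_g`, `d_h`) are hypothetical. If the edit distance between g and h is at most `tau`, at least |F(g)| - tau * d_g features of g survive and must be shared with h (and symmetrically for h), so pairs sharing fewer features can be pruned without computing the edit distance.

```python
from collections import Counter
from typing import Hashable, Iterable


def count_filter(features_g: Iterable[Hashable], features_h: Iterable[Hashable],
                 tau: int, d_g: int, d_h: int) -> bool:
    """Return True if the pair (g, h) survives the count filter and stays a candidate.

    Hypothetical sketch. Assumption: one edit operation on g (resp. h) affects at
    most d_g (resp. d_h) of its features, so tau edits leave at least
    |F(g)| - tau * d_g features of g intact, all of which must also occur in h.
    """
    fg, fh = Counter(features_g), Counter(features_h)
    # Multiset intersection: Counter & Counter keeps the minimum count per feature.
    common = sum((fg & fh).values())
    # Lower bound on the number of shared features if ged(g, h) <= tau.
    lower_bound = max(sum(fg.values()) - tau * d_g,
                      sum(fh.values()) - tau * d_h)
    return common >= lower_bound


if __name__ == "__main__":
    g = ["A-B", "B-C", "C-D"]
    h = ["A-B", "B-C", "C-E"]
    print(count_filter(g, h, tau=1, d_g=2, d_h=2))  # candidate: 2 shared >= bound 1
    print(count_filter(g, h, tau=0, d_g=2, d_h=2))  # pruned: 2 shared < bound 3
```

The sketch also shows why heavily overlapping substructures hurt: overlap makes `d_g` large, so once `tau * d_g` reaches the number of features the lower bound drops to zero or below and the filter prunes nothing.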
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhao, X., Xiao, C., Zhang, W., Lin, X., Tang, J. (2014). Improving Performance of Graph Similarity Joins Using Selected Substructures. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_11
DOI: https://doi.org/10.1007/978-3-319-05810-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05809-2
Online ISBN: 978-3-319-05810-8
eBook Packages: Computer Science, Computer Science (R0)