Abstract
Recently, uncertain graph data management and mining techniques have attracted significant interest and research effort owing to applications such as protein interaction networks and social networks. As a fundamental problem, subgraph similarity all-matching is widely applied in exploratory data analysis. Its purpose is to find all similarity occurrences of a query graph in a large data graph. Numerous algorithms and pruning methods have been developed for the subgraph matching problem over certain (deterministic) graphs. However, little effort has been devoted to subgraph similarity all-matching over an uncertain data graph, which is challenging because of its high computational cost. In this paper, we define the problem of subgraph similarity maximal all-matching over a large uncertain data graph and propose a framework to solve it. To further improve efficiency, we propose several speed-up techniques, including partial graph evaluation, vertex pruning, calculation model transformation, incremental evaluation, and probability upper bound filtering. Finally, comprehensive experiments on real graph data test the performance of our framework and optimization methods. The results verify that our solutions outperform the basic approach by orders of magnitude in efficiency.
Acknowledgments
This work was supported by the National Basic Research Program of China (973 Program) under Grant No. 2012CB316201, the National Natural Science Foundation of China (61472071, 61272179), and the Fundamental Research Funds for the Central Universities (N130404010).
Cite this article
Gu, Y., Gao, C., Wang, L. et al. Subgraph similarity maximal all-matching over a large uncertain graph. World Wide Web 19, 755–782 (2016). https://doi.org/10.1007/s11280-015-0358-9
DOI: https://doi.org/10.1007/s11280-015-0358-9