Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

Li, Jianzhong; Zou, Zhaonian; Gao, Hong

doi:10.1007/s00778-012-0268-8

Regular Paper
Published: 28 February 2012

Volume 21, pages 753–777 (2012)

The VLDB Journal

Abstract

Frequent subgraph mining has been extensively studied on certain graph data. In practice, however, uncertainty is intrinsic to graph data, yet there is very little work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over uncertain graph data under the probabilistic semantics. Specifically, a measure called \(\varphi\)-frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two real numbers \(0 < \varphi, \tau < 1\), the goal is to quickly find all subgraphs with \(\varphi\)-frequent probability at least \(\tau\). Because the problem is NP-hard and computing the \(\varphi\)-frequent probability of a subgraph is #P-hard, an approximate mining algorithm is proposed to produce an \((\varepsilon, \delta)\)-approximate set \(\Pi\) of “frequent subgraphs”, where \(0 < \varepsilon < \tau\) is an error tolerance and \(0 < \delta < 1\) is a confidence bound. The algorithm guarantees that (1) any frequent subgraph \(S\) is contained in \(\Pi\) with probability at least \(((1 - \delta)/2)^{s}\), where \(s\) is the number of edges in \(S\); and (2) any infrequent subgraph with \(\varphi\)-frequent probability less than \(\tau - \varepsilon\) is contained in \(\Pi\) with probability at most \(\delta/2\). The theoretical analysis shows that, to obtain any frequent subgraph with probability at least \(1 - \Delta\), the input parameter \(\delta\) of the algorithm must be set to at most \(1 - 2(1 - \Delta)^{1/\ell_{\max}}\), where \(0 < \Delta < 1\) and \(\ell_{\max}\) is the maximum number of edges in frequent subgraphs. Extensive experiments on real uncertain graph data verify that the proposed algorithm is efficient in practice and achieves very high approximation quality. Moreover, this paper is the first to discuss the difference between the probabilistic semantics and the expected semantics for mining frequent subgraphs over uncertain graph data.
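The abstract contrasts the probabilistic semantics (\(\varphi\)-frequent probability) with the expected semantics. The following Python sketch shows the difference on a toy model; it is not the authors' approximate mining algorithm. Assumptions that do not come from the abstract: each edge of an uncertain graph exists independently with a given probability, uncertain graphs are independent of one another, the support of a pattern in a possible world is the fraction of graphs containing it, and the \(\varphi\)-frequent probability is read as the probability that this support is at least \(\varphi\). To keep the sketch exact and short, containment is tested by edge-set inclusion on fixed vertex ids instead of subgraph isomorphism. All names (`containment_probability`, `phi_frequent_probability`, and so on) are hypothetical.

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical toy model (not the paper's data structures):
# an uncertain graph maps each edge (u, v) to an independent existence probability.
Edge = Tuple[str, str]
UncertainGraph = Dict[Edge, float]


def containment_probability(graph: UncertainGraph, pattern: List[Edge]) -> float:
    """P(pattern appears in a random possible world of `graph`).

    Simplification: the pattern has a single fixed embedding, so it appears
    exactly when all of its edges exist and the probability is a product.
    With several embeddings this becomes the probability of a disjunction.
    """
    if not all(e in graph for e in pattern):
        return 0.0
    return prod(graph[e] for e in pattern)


def support_distribution(probs: List[float]) -> List[float]:
    """dist[k] = P(exactly k graphs contain the pattern), graphs independent."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this graph does not contain the pattern
            nxt[k + 1] += mass * p      # this graph contains the pattern
        dist = nxt
    return dist


def expected_support(db: List[UncertainGraph], pattern: List[Edge]) -> float:
    """Expected semantics: mean support over all possible worlds."""
    return sum(containment_probability(g, pattern) for g in db) / len(db)


def phi_frequent_probability(db: List[UncertainGraph], pattern: List[Edge],
                             phi: float) -> float:
    """Probabilistic semantics: P(support >= phi), where support = k / |db|."""
    dist = support_distribution([containment_probability(g, pattern) for g in db])
    n = len(db)
    return sum(mass for k, mass in enumerate(dist) if k / n >= phi)


if __name__ == "__main__":
    pattern = [("A", "B")]
    # Two toy databases giving `pattern` the same expected support.
    db_spread = [{("A", "B"): 0.5}, {("A", "B"): 0.5}]
    db_skewed = [{("A", "B"): 1.0}, {("B", "C"): 0.8}]
    for name, db in (("spread", db_spread), ("skewed", db_skewed)):
        print(name, expected_support(db, pattern),
              phi_frequent_probability(db, pattern, 0.5),
              phi_frequent_probability(db, pattern, 1.0))
```

Both toy databases give the pattern an expected support of 0.5, yet with \(\varphi = 1.0\) its \(\varphi\)-frequent probability is 0.25 in `db_spread` and 0.0 in `db_skewed`, so the expected support alone does not determine the \(\varphi\)-frequent probability. The dynamic program over graphs is exact only under the single-embedding simplification; in the general setting the abstract states that computing the \(\varphi\)-frequent probability is #P-hard, which is why the paper turns to an approximate algorithm.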

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
Jianzhong Li, Zhaonian Zou & Hong Gao

Corresponding author

Correspondence to Zhaonian Zou.

About this article

Cite this article

Li, J., Zou, Z. & Gao, H. Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. The VLDB Journal 21, 753–777 (2012). https://doi.org/10.1007/s00778-012-0268-8

Received: 27 May 2011
Revised: 11 January 2012
Accepted: 09 February 2012
Published: 28 February 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s00778-012-0268-8

Similar content being viewed by others

Uncertain maximal frequent subgraph mining algorithm based on adjacency matrix and weight

Graph similarity search on large uncertain graph databases

Subgraph similarity maximal all-matching over a large uncertain graph
