Abstract
Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic to graph data in practice, and there is very little work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over uncertain graph data under the probabilistic semantics. Specifically, a measure called \({\varphi}\)-frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two real numbers \({0 < \varphi, \tau < 1}\), the goal is to quickly find all subgraphs with \({\varphi}\)-frequent probability at least \({\tau}\). Because the problem is NP-hard and computing the \({\varphi}\)-frequent probability of a subgraph is #P-hard, an approximate mining algorithm is proposed that produces an \({(\varepsilon, \delta)}\)-approximate set \({\Pi}\) of “frequent subgraphs”, where \({0 < \varepsilon < \tau}\) is an error tolerance and \({0 < \delta < 1}\) is a confidence bound. The algorithm guarantees that (1) any frequent subgraph \({S}\) is contained in \({\Pi}\) with probability at least \({((1 - \delta)/2)^{s}}\), where \({s}\) is the number of edges in \({S}\); and (2) any infrequent subgraph with \({\varphi}\)-frequent probability less than \({\tau - \varepsilon}\) is contained in \({\Pi}\) with probability at most \({\delta/2}\). The theoretical analysis shows that, to obtain any frequent subgraph with probability at least \({1 - \Delta}\), the input parameter \({\delta}\) of the algorithm must be set to at most \({1 - 2 (1 - \Delta)^{1 / \ell_{\max}}}\), where \({0 < \Delta < 1}\) and \({\ell_{\max}}\) is the maximum number of edges in frequent subgraphs. Extensive experiments on real uncertain graph data verify that the proposed algorithm is efficient in practice and has very high approximation quality. Moreover, the difference between the probabilistic semantics and the expected semantics for mining frequent subgraphs over uncertain graph data is discussed in this paper for the first time.
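The bound on \({\delta}\) can be recovered from guarantee (1) alone. The following is a short sketch rather than the paper's full analysis; it assumes only that every frequent subgraph has \({s \le \ell_{\max}}\) edges and that \({0 < (1 - \delta)/2 < 1}\), so the guarantee is weakest for the largest subgraphs:

\[
\begin{aligned}
\left(\frac{1 - \delta}{2}\right)^{s} \;\ge\; \left(\frac{1 - \delta}{2}\right)^{\ell_{\max}} &\;\ge\; 1 - \Delta \\
\iff\quad \frac{1 - \delta}{2} &\;\ge\; (1 - \Delta)^{1 / \ell_{\max}} \\
\iff\quad \delta &\;\le\; 1 - 2 (1 - \Delta)^{1 / \ell_{\max}} .
\end{aligned}
\]

Read literally, this bound is positive only when \({(1 - \Delta)^{1 / \ell_{\max}} < 1/2}\), that is, when \({\Delta > 1 - 2^{-\ell_{\max}}}\).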
Cite this article
Li, J., Zou, Z. & Gao, H. Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. The VLDB Journal 21, 753–777 (2012). https://doi.org/10.1007/s00778-012-0268-8