Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

  • Regular Paper
The VLDB Journal

Abstract

Frequent subgraph mining has been extensively studied on certain graph data. In practice, however, uncertainty is intrinsic to graph data, yet there has been very little work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over uncertain graph data under probabilistic semantics. Specifically, a measure called \(\varphi\)-frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two real numbers \(0 < \varphi, \tau < 1\), the goal is to quickly find all subgraphs whose \(\varphi\)-frequent probability is at least \(\tau\). Because the problem is NP-hard and computing the \(\varphi\)-frequent probability of a subgraph is #P-hard, an approximate mining algorithm is proposed that produces an \((\varepsilon, \delta)\)-approximate set \(\Pi\) of “frequent subgraphs”, where \(0 < \varepsilon < \tau\) is an error tolerance and \(0 < \delta < 1\) is a confidence bound. The algorithm guarantees that (1) any frequent subgraph \(S\) is contained in \(\Pi\) with probability at least \(((1 - \delta)/2)^{s}\), where \(s\) is the number of edges in \(S\); and (2) any infrequent subgraph whose \(\varphi\)-frequent probability is less than \(\tau - \varepsilon\) is contained in \(\Pi\) with probability at most \(\delta/2\). The theoretical analysis shows that, to obtain any frequent subgraph with probability at least \(1 - \Delta\), the input parameter \(\delta\) of the algorithm must be set to at most \(1 - 2(1 - \Delta)^{1/\ell_{\max}}\), where \(0 < \Delta < 1\) and \(\ell_{\max}\) is the maximum number of edges in frequent subgraphs. Extensive experiments on real uncertain graph data verify that the proposed algorithm is efficient in practice and achieves very high approximation quality. Moreover, this paper discusses, for the first time, the difference between the probabilistic semantics and the expected semantics for mining frequent subgraphs over uncertain graph data.
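The quantities named in the abstract can be made concrete with a small sketch. It rests on assumptions that are not stated above: the uncertain graphs are mutually independent, the support of \(S\) in a possible world is the fraction of graphs that contain \(S\), and the probability that each uncertain graph \(G_i\) contains \(S\) is already known. Under those assumptions the number of graphs containing \(S\) is a sum of independent Bernoulli variables, so its distribution, and hence the \(\varphi\)-frequent probability, follows from a simple dynamic program. The sketch treats the per-graph containment probabilities as inputs and does not attempt the hard part of computing them. It also solves the stated guarantee \(((1 - \delta)/2)^{s} \ge 1 - \Delta\) for \(\delta\). All function names are hypothetical, not the paper's.

```python
from typing import Sequence


def support_count_distribution(containment_probs: Sequence[float]) -> list[float]:
    """Distribution of the number of graphs that contain S across possible worlds.

    Assumption: containment_probs[i] = Pr[S is contained in the graph implicated
    by uncertain graph G_i], and the graphs are independent, so the count is
    Poisson-binomial and can be built one graph at a time.
    """
    dist = [1.0]  # before looking at any graph: count 0 with probability 1
    for p in containment_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)  # this graph does not contain S
            nxt[k + 1] += q * p      # this graph contains S
        dist = nxt
    return dist


def phi_frequent_probability(containment_probs: Sequence[float], phi: float) -> float:
    """Pr[support(S) >= phi], with support = fraction of graphs containing S."""
    n = len(containment_probs)
    dist = support_count_distribution(containment_probs)
    # Sum the probability of every count whose support fraction reaches phi.
    return sum(q for count, q in enumerate(dist) if count / n >= phi)


def expected_support(containment_probs: Sequence[float]) -> float:
    """Expected-semantics counterpart: the mean of the per-graph probabilities."""
    return sum(containment_probs) / len(containment_probs)


def max_delta(big_delta: float, l_max: int) -> float:
    """Largest delta with ((1 - delta) / 2) ** s >= 1 - big_delta for all s <= l_max.

    Solving for delta gives delta <= 1 - 2 * (1 - big_delta) ** (1 / s). Since
    0 < 1 - big_delta < 1, the right-hand side decreases as s grows, so s = l_max
    is the binding case. A non-positive result means no delta in (0, 1) works.
    """
    return 1.0 - 2.0 * (1.0 - big_delta) ** (1.0 / l_max)
```

For example, with containment probabilities 0.9, 0.5 and 0.5 and \(\varphi = 0.5\), \(S\) must appear in at least 2 of the 3 graphs, so `phi_frequent_probability([0.9, 0.5, 0.5], 0.5)` evaluates to about 0.7, whereas `expected_support` gives about 0.63. The probabilistic semantics compares the former with \(\tau\); the expected semantics compares the latter with \(\varphi\).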

Author information

Corresponding author

Correspondence to Zhaonian Zou.

About this article

Cite this article

Li, J., Zou, Z. & Gao, H. Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. The VLDB Journal 21, 753–777 (2012). https://doi.org/10.1007/s00778-012-0268-8

  • DOI: https://doi.org/10.1007/s00778-012-0268-8