The VLDB Journal, Volume 21, Issue 6, pp 753–777

Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

  • Jianzhong Li
  • Zhaonian Zou
  • Hong Gao
Regular Paper

Abstract

Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, and there is very little work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over uncertain graph data under the probabilistic semantics. Specifically, a measure called \({\varphi}\)-frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two real numbers \({0 < \varphi, \tau < 1}\), the goal is to quickly find all subgraphs with \({\varphi}\)-frequent probability at least \({\tau}\). Because the problem is NP-hard and computing the \({\varphi}\)-frequent probability of a subgraph is #P-hard, an approximate mining algorithm is proposed that produces an \({(\varepsilon, \delta)}\)-approximate set \({\Pi}\) of “frequent subgraphs”, where \({0 < \varepsilon < \tau}\) is the error tolerance and \({0 < \delta < 1}\) is a confidence bound. The algorithm guarantees that (1) any frequent subgraph S is contained in \({\Pi}\) with probability at least \({((1 - \delta)/2)^{s}}\), where s is the number of edges in S; and (2) any infrequent subgraph with \({\varphi}\)-frequent probability less than \({\tau - \varepsilon}\) is contained in \({\Pi}\) with probability at most \({\delta/2}\). The theoretical analysis shows that, to obtain any frequent subgraph with probability at least \({1 - \Delta}\), the input parameter \({\delta}\) of the algorithm must be set to at most \({1 - 2 (1 - \Delta)^{1 / \ell_{\max}}}\), where \({0 < \Delta < 1}\) and \({\ell_{\max}}\) is the maximum number of edges in frequent subgraphs. Extensive experiments on real uncertain graph data verify that the proposed algorithm is efficient in practice and has very high approximation quality. Moreover, the difference between the probabilistic semantics and the expected semantics for mining frequent subgraphs over uncertain graph data is discussed in this paper for the first time.
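The relationship between the two confidence parameters can be checked numerically. The following is a minimal sketch, not the authors' implementation: it only evaluates the two formulas quoted in the abstract, and the function names `inclusion_lower_bound` and `max_delta` are hypothetical names chosen for illustration. Setting \({((1 - \delta)/2)^{\ell_{\max}} \ge 1 - \Delta}\) and solving for \({\delta}\) gives the stated bound. As a purely arithmetic consequence of the formula as written, the bound is positive only when \({\Delta > 1 - 2^{-\ell_{\max}}}\); otherwise the sketch returns `None`.

```python
from typing import Optional


def inclusion_lower_bound(delta: float, s: int) -> float:
    """Lower bound ((1 - delta) / 2) ** s quoted in the abstract for the
    probability that a frequent subgraph with s edges is reported."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if s < 1:
        raise ValueError("s must be a positive number of edges")
    return ((1.0 - delta) / 2.0) ** s


def max_delta(big_delta: float, l_max: int) -> Optional[float]:
    """Largest delta with ((1 - delta) / 2) ** l_max >= 1 - big_delta,
    i.e. 1 - 2 * (1 - big_delta) ** (1 / l_max).

    Returns None when that value is not positive, because the abstract
    requires 0 < delta < 1 (this check is an assumption of the sketch).
    """
    if not 0.0 < big_delta < 1.0:
        raise ValueError("big_delta must lie strictly between 0 and 1")
    if l_max < 1:
        raise ValueError("l_max must be a positive number of edges")
    bound = 1.0 - 2.0 * (1.0 - big_delta) ** (1.0 / l_max)
    return bound if bound > 0.0 else None


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for big_delta, l_max in [(0.9, 3), (0.5, 3), (0.99, 5)]:
        d = max_delta(big_delta, l_max)
        if d is None:
            print(f"Delta={big_delta}, l_max={l_max}: no delta in (0, 1) satisfies the bound")
        else:
            print(f"Delta={big_delta}, l_max={l_max}: delta <= {d:.6f}, "
                  f"guarantee = {inclusion_lower_bound(d, l_max):.6f}")
```

Plugging the returned value back into `inclusion_lower_bound` reproduces \({1 - \Delta}\) up to floating-point error, which is a quick sanity check that the two expressions in the abstract are consistent with each other.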

Keywords

  • Uncertain graph
  • Frequent subgraph mining
  • Probabilistic semantics
  • \({\varphi}\)-frequent probability
  • #P

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China