Efficient Pattern Mining of Uncertain Data with Sampling

  • Toon Calders
  • Calin Garboni
  • Bart Goethals
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6118)

Abstract

Mining frequent itemsets from transactional datasets is a well-known problem with good algorithmic solutions. For uncertain data, however, several new techniques have been proposed. Unfortunately, these proposals often perform poorly when many items occur with many different probabilities. Here we propose an approach based on sampling: we instantiate “possible worlds” of the uncertain data and subsequently run optimized frequent itemset mining algorithms on them. In this way we gain efficiency at a surprisingly low loss in accuracy. This is confirmed by a statistical evaluation and an empirical evaluation on real and synthetic data.
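
To make the possible-worlds idea concrete, here is a minimal Python sketch built on assumptions the abstract does not state. Each uncertain transaction is modeled as a dict that maps an item to its existence probability, and items are treated as independent. A possible world is drawn by keeping each item with its probability. A plain level-wise Apriori then mines the sampled, certain database, standing in for the optimized miners the authors use. The function names (`sample_world`, `apriori`, `expected_support`, `estimate_expected_support`) are hypothetical. Averaging support over several samples is one plausible way to aggregate results and is not necessarily the authors' procedure.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical representation (assumption): item -> independent existence probability.
UncertainDB = List[Dict[str, float]]


def sample_world(db: UncertainDB, rng: random.Random) -> List[FrozenSet[str]]:
    """Instantiate one possible world: keep each item with its probability."""
    return [frozenset(i for i, p in t.items() if rng.random() < p) for t in db]


def apriori(transactions: List[FrozenSet[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Plain level-wise Apriori on a certain (sampled) database."""
    frequent: Dict[FrozenSet[str], int] = {}
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    k = 1
    while candidates:
        # Count the support of every candidate in one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any candidate
        # with an infrequent k-subset (downward closure).
        k += 1
        keys = list(level)
        joined = {a | b for a in keys for b in keys if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def expected_support(db: UncertainDB, itemset: Iterable[str]) -> float:
    """Exact expected support under the independence assumption."""
    items = list(itemset)
    total = 0.0
    for t in db:
        prob = 1.0
        for i in items:
            prob *= t.get(i, 0.0)  # a missing item has probability 0
        total += prob
    return total


def estimate_expected_support(db: UncertainDB, itemset: Iterable[str],
                              n_samples: int, seed: int = 0) -> float:
    """Average the support of `itemset` over `n_samples` sampled worlds."""
    rng = random.Random(seed)
    target = frozenset(itemset)
    total = 0
    for _ in range(n_samples):
        total += sum(1 for t in sample_world(db, rng) if target <= t)
    return total / n_samples
```

For example, with `db = [{"a": 0.9, "b": 0.8}, {"a": 0.5, "c": 1.0}]`, `expected_support(db, ["a"])` is 0.9 + 0.5 = 1.4. `estimate_expected_support(db, ["a"], 20000)` should land close to that value. Once sampled, each world is an ordinary transaction database, so any existing frequent itemset miner can be reused without change.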

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Toon Calders (TU Eindhoven, The Netherlands)
  • Calin Garboni (University of Antwerp, Belgium)
  • Bart Goethals (University of Antwerp, Belgium)