Efficient Pattern Mining of Uncertain Data with Sampling
Mining frequent itemsets from transactional datasets is a well-known problem with good algorithmic solutions. For uncertain data, however, several new techniques have been proposed. Unfortunately, these proposals often perform poorly when many items occur with many different probabilities. Here we propose an approach based on sampling: we instantiate “possible worlds” of the uncertain data and then run optimized frequent itemset mining algorithms on them. In this way we gain efficiency at a surprisingly low loss in accuracy. This is confirmed by a statistical and an empirical evaluation on real and synthetic data.
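The sampling idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each item in a transaction carries an independent existence probability and that the quantity of interest is the expected support of an itemset, neither of which the abstract states explicitly. All function names are hypothetical, and the brute-force counter merely stands in for an optimized frequent itemset miner.

```python
import random
from collections import Counter
from itertools import combinations


def sample_world(uncertain_db, rng):
    """Instantiate one possible world: keep each item with its probability.

    Assumption: items exist independently of one another.
    """
    return [
        frozenset(item for item, p in transaction.items() if rng.random() < p)
        for transaction in uncertain_db
    ]


def count_itemsets(world, max_size):
    """Count the support of every itemset up to max_size in a certain world.

    Brute-force stand-in for an optimized miner; fine for tiny examples only.
    """
    counts = Counter()
    for transaction in world:
        items = sorted(transaction)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts


def estimate_expected_support(uncertain_db, n_worlds=200, max_size=2, seed=0):
    """Average itemset supports over n_worlds sampled possible worlds."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_worlds):
        totals.update(count_itemsets(sample_world(uncertain_db, rng), max_size))
    return {itemset: total / n_worlds for itemset, total in totals.items()}


def exact_expected_support(uncertain_db, itemset):
    """Exact expected support under the independence assumption."""
    total = 0.0
    for transaction in uncertain_db:
        prob = 1.0
        for item in itemset:
            prob *= transaction.get(item, 0.0)
        total += prob
    return total
```

Here an uncertain dataset is a list of dictionaries mapping each item to its probability, for example `[{"a": 0.9, "b": 0.5}, {"a": 0.2}]`. Comparing `estimate_expected_support` with `exact_expected_support` on such a toy dataset shows how the sampled estimate approaches the exact value as `n_worlds` grows.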