Static Load Balancing of Parallel Mining of Frequent Itemsets Using Reservoir Sampling
Conference paper
Abstract
In this paper, we present a novel method for parallelizing an arbitrary depth-first search (DFS) algorithm for mining all frequent itemsets (FIs). The method is based on the reservoir sampling algorithm. Combined with an arbitrary DFS mining algorithm executed on a sample of the database, reservoir sampling yields a uniformly, but not independently, distributed sample of all FIs. This sample is then used for static load balancing of the computational load of a DFS algorithm for mining all FIs.
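The abstract outlines a three-step pipeline: mine FIs depth-first on a database sample, reservoir-sample the FIs as they are emitted, and use that sample to split the work statically. The Python sketch below illustrates one way these steps could fit together; it is not the paper's implementation. Assumptions: a simple Eclat-style DFS on tid-sets stands in for "an arbitrary DFS mining algorithm"; reservoir sampling is Vitter's Algorithm R; and the load-balancing step, which partitions FIs into prefix classes by first item and assigns classes to processors with a greedy largest-first (LPT) rule, is purely our illustrative choice. All function names (`dfs_frequent_itemsets`, `reservoir_sample`, `assign_prefix_classes`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Sequence, Set, Tuple

Itemset = Tuple[str, ...]  # assumption: items are strings (any mutually comparable type works)


def dfs_frequent_itemsets(transactions: Sequence[Set[str]], min_support: int) -> Iterator[Itemset]:
    """Enumerate all frequent itemsets depth-first (Eclat-style, on tid-sets).

    Hypothetical stand-in for "an arbitrary DFS mining algorithm". Itemsets are
    yielded one at a time (as sorted tuples), so they can be streamed into a reservoir.
    """
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent_items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)

    def extend(prefix: Itemset, prefix_tids: Set[int], candidates: List[str]) -> Iterator[Itemset]:
        for idx, item in enumerate(candidates):
            tids = prefix_tids & tidsets[item]  # tid-set of prefix + (item,)
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                yield itemset
                # Only items after `item` may extend it, so each FI is emitted exactly once.
                yield from extend(itemset, tids, candidates[idx + 1:])

    yield from extend((), set(range(len(transactions))), frequent_items)


def reservoir_sample(stream: Iterable, k: int, rng: random.Random) -> list:
    """Vitter's Algorithm R: a uniform k-element sample without replacement from a
    stream of unknown length, in one pass and O(k) memory."""
    reservoir = []
    for n, x in enumerate(stream):  # n = number of elements seen before x
        if n < k:
            reservoir.append(x)  # fill the reservoir first
        else:
            j = rng.randrange(n + 1)  # uniform in 0..n
            if j < k:  # keep x with probability k / (n + 1)
                reservoir[j] = x
    return reservoir


def assign_prefix_classes(fi_sample: Sequence[Itemset], num_procs: int):
    """Hypothetical static scheme: estimate each prefix class size (FIs sharing the
    same first item) from the sample, then assign the classes to processors with
    the greedy largest-first (LPT) rule."""
    weights = Counter(fi[0] for fi in fi_sample)
    loads = [0] * num_procs
    assignment: Dict[str, int] = {}
    for item, w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        p = min(range(num_procs), key=lambda q: loads[q])  # least-loaded processor
        assignment[item] = p
        loads[p] += w
    # Classes with no sampled FI get no estimate and are left unassigned in this sketch.
    return assignment, loads


if __name__ == "__main__":
    rng = random.Random(0)
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}, {"a", "d"}] * 50
    db_sample = rng.sample(db, 60)  # step 1: sample the database
    min_sup = math.ceil(0.3 * len(db_sample))  # assumed relative support threshold of 30%
    fi_sample = reservoir_sample(dfs_frequent_itemsets(db_sample, min_sup), 5, rng)  # step 2
    print(fi_sample)
    print(assign_prefix_classes(fi_sample, num_procs=2))  # step 3
```

Because the reservoir holds a sample drawn without replacement, the sampled FIs are uniformly distributed but not independent. This matches the wording of the abstract. The sample describes the FIs of the database sample, not the full database. Using the number of sampled FIs per prefix class as a proxy for that class's mining cost is a further simplifying assumption.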
Keywords
Association Rule, Frequent Itemsets, Sequential Algorithm, Frequent Itemset Mining, Coverage Algorithm
These keywords were added by machine and not by the authors.
Copyright information
© Springer-Verlag Berlin Heidelberg 2011