Static Load Balancing of Parallel Mining of Frequent Itemsets Using Reservoir Sampling

  • Robert Kessl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6871)

Abstract

In this paper, we present a novel method for parallelizing an arbitrary depth-first search (DFS) algorithm for mining all frequent itemsets (FIs). The method is based on the so-called reservoir sampling algorithm. Combined with an arbitrary DFS mining algorithm executed on a database sample, reservoir sampling produces a uniformly but not independently distributed sample of all FIs. The sample is then used for static load balancing of the computational load of a DFS algorithm for mining all FIs.
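The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The toy miner `dfs_frequent_itemsets`, the use of an itemset's first item as a partition key, and the greedy assignment in `assign_prefixes` are all assumptions made for illustration; the abstract does not say how the sample is turned into partitions. Only `reservoir_sample` follows a known published technique, the classic Algorithm R for reservoir sampling, which draws a uniform sample without replacement from a stream of unknown length.

```python
import random
from collections import Counter


def dfs_frequent_itemsets(transactions, min_support):
    """Toy DFS miner (hypothetical): yield each frequent itemset as a sorted tuple."""
    items = sorted({i for t in transactions for i in t})
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t)

    def expand(prefix, candidates):
        for idx, item in enumerate(candidates):
            itemset = prefix + (item,)
            if support(frozenset(itemset)) >= min_support:
                yield itemset
                # Depth-first: extend this itemset before moving to its siblings.
                yield from expand(itemset, candidates[idx + 1:])

    yield from expand((), items)


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform sample of up to k stream elements, without replacement."""
    reservoir = []
    for n, element in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(element)      # fill the reservoir first
        else:
            j = rng.randrange(n)           # uniform integer in [0, n)
            if j < k:                      # keep the new element with probability k/n
                reservoir[j] = element
    return reservoir


def assign_prefixes(sample, num_workers):
    """Assumed balancing step: estimate load per first item, then assign greedily."""
    load_estimate = Counter(itemset[0] for itemset in sample)
    workers = [{"prefixes": [], "load": 0} for _ in range(num_workers)]
    # Heaviest prefix class first, always to the currently least-loaded worker.
    for prefix, weight in sorted(load_estimate.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(workers, key=lambda w: w["load"])
        target["prefixes"].append(prefix)
        target["load"] += weight
    return workers


if __name__ == "__main__":
    db_sample = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
    rng = random.Random(0)
    sample = reservoir_sample(dfs_frequent_itemsets(db_sample, 2), 4, rng)
    for worker in assign_prefixes(sample, 2):
        print(worker)
```

Because the reservoir is filled while the miner runs, the total number of FIs does not need to be known in advance. The sample is drawn without replacement, which is one way to read "uniformly but not independently distributed" in the abstract.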

Keywords

Association Rule; Frequent Itemsets; Sequential Algorithm; Frequent Itemset Mining; Coverage Algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Robert Kessl (1)
  1. Institute of Computer Science, Czech Academy of Science, Prague 8, Czech Republic
