An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets

  • Salvatore Orlando
  • Paolo Palmerini
  • Raffaele Perego
  • Fabrizio Silvestri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2565)

Abstract

Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms have been proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach which explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-Node multithreading effectively exploits the memory hierarchies of each SMP node, while Inter-Node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In-depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performance under a variety of conditions.
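The general idea of intersection-based counting over a partitioned dataset can be illustrated with a minimal sketch. This is not the ParDCI implementation: the function names (`build_tidsets`, `local_support`, `frequent_sets`), the round-robin partitioning, and the use of a thread pool to stand in for cluster nodes are all assumptions made for illustration. Each partition stores, for every item, the set of transaction ids that contain it; the support of a candidate set in a partition is the size of the intersection of its items' id sets, and the global support is the sum over partitions.

```python
from itertools import combinations
from concurrent.futures import ThreadPoolExecutor


def build_tidsets(transactions):
    """Vertical layout: map each item to the ids of the transactions containing it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def local_support(tidsets, candidate):
    """Support of `candidate` in one partition, by intersecting its items' tidsets."""
    sets = [tidsets.get(item, set()) for item in candidate]
    return len(set.intersection(*sets))


def frequent_sets(transactions, min_support, n_partitions=2):
    # Hypothetical partitioning scheme: round-robin split of the transactions.
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    layouts = [build_tidsets(p) for p in partitions]

    items = sorted({item for t in transactions for item in t})
    candidates = [(item,) for item in items]
    result = {}

    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        while candidates:
            # Every partition counts all candidates locally; counts are then summed.
            per_partition = list(pool.map(
                lambda layout: [local_support(layout, c) for c in candidates],
                layouts))
            totals = [sum(column) for column in zip(*per_partition)]
            frequent = [c for c, n in zip(candidates, totals) if n >= min_support]
            result.update((c, n) for c, n in zip(candidates, totals)
                          if n >= min_support)

            # Apriori-style generation of (k+1)-candidates: join k-sets that share
            # a (k-1)-prefix and keep those whose k-subsets are all frequent.
            frequent_lookup = set(frequent)
            k = len(frequent[0]) if frequent else 0
            candidates = []
            for a, b in combinations(frequent, 2):
                if a[:-1] == b[:-1]:
                    cand = a + (b[-1],)
                    if all(s in frequent_lookup for s in combinations(cand, k)):
                        candidates.append(cand)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_sets(data, min_support=3).items()):
        print(itemset, support)
```

The threads here only mirror the structure of partitioned counting; in CPython they do not provide real CPU parallelism, and the sketch makes no attempt to model the memory-hierarchy or communication optimizations the abstract refers to.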

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Salvatore Orlando (1)
  • Paolo Palmerini (1, 2)
  • Raffaele Perego (2)
  • Fabrizio Silvestri (2, 3)

  1. Dipartimento di Informatica, Università Ca’ Foscari, Venezia, Italy
  2. Istituto CNUCE, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy
  3. Dipartimento di Informatica, Università di Pisa, Italy
