Skip to main content

An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets

  • Conference paper
  • First Online:
High Performance Computing for Computational Science — VECPAR 2002 (VECPAR 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2565))

Abstract

Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms were proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approachwh ichex plicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-Node multithreading effectively exploits the memory hierarchies of each SMP node, while Inter-Node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performances under a variety of conditions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Inkeri Verkamo. Fast Discovery of Association Rules in Large Databases. In Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, 1996. 422

    Google Scholar 

  2. R. Agrawal and J. C. Shafer. Parallel Mining of Association Rules. IEEE Transaction On Knowledge And Data Engineering, 8:962–969, 1996. 424, 425, 426

    Article  Google Scholar 

  3. R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. In Proc. of the 20th VLDB Conf., pages 487–499, 1994. 422

    Google Scholar 

  4. R. Baraglia, D. Laforenza, S. Orlando, P. Palmerini, and R. Perego. Implementation Issues in the Design of I/O Intensive Data Mining Applications on Clusters of Workstations. In Proc. of the 3rd Work. on High Performance Data Mining, (IPDPS-2000), Cancun, Mexico, pages 350–357. LNCS 1800 Spinger-Verlag, 2000. 426

    Google Scholar 

  5. R. J. Bayardo. Efficiently Mining Long Patterns from Databases. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, pages 85–93, 1998. 422

    Google Scholar 

  6. U. M. Fayyad, G. Piatetsky-Shapiro, P. Smith, and R. Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining. AAAI Press, 1998. 421

    Google Scholar 

  7. V. Ganti, J. Gehrke, and R. Ramakrishnan. Mining Very Large Databases. IEEE Computer, 32(8):38–45, 1999. 421

    Google Scholar 

  8. E. H. Han, G. Karypis, and Kumar V. Scalable Parallel Data Mining for Association Rules. IEEE Transactions on Knowledge and Data Engineering, 12(3):337–352, May/June 2000. 422, 424

    Article  Google Scholar 

  9. J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 1–12, Dallas, Texas, USA, 2000. 426, 433

    Google Scholar 

  10. Hipp, J. and Güntzer, U. and Nakhaeizadeh, G. Algorithms for Association Rule Mining-A General Survey and Comparison. SIGKDD Explorations, 2(1):58–64, June 2000. 425

    Article  Google Scholar 

  11. S. Orlando, P. Palmerini, and R. Perego. Enhancing the Apriori Algorithm for Frequent Set Counting. In Proc. of the 3rd Int. Conf. on Data Warehousing and Knowledge Discovery, LNCS 2114, pages 71–82, Germany, 2001. 423, 425, 426

    Google Scholar 

  12. S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. Adaptive and Resource-Aware Mining of Frequent Sets. In Proc. of the 2002 IEEE Int. Conference on Data Mining (ICDM’02), Maebashi City, Japan, Dec. 2002. 423, 424, 428, 430, 433

    Google Scholar 

  13. J. S. Park, M.-S. Chen, and P. S. Yu. An Effective Hash Based Algorithm for Mining Association Rules. In Proc. of the 1995 ACM SIGMOD Int. Conf. on Management of Data, pages 175–186, 1995. 422

    Google Scholar 

  14. A. Savasere, E. Omiecinski, and S. B. Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases. In Proc. of the 21th VLDB Conf., pages 432–444, Zurich, Switzerland, 1995. 422, 423

    Google Scholar 

  15. M. J. Zaki. Parallel and Distributed Association Mining: A Survey. IEEE Concurrency, 7(4):14–25, 1999. 424

    Article  Google Scholar 

  16. M. J. Zaki. Scalable Algorithms for Association Mining. IEEE Transactions on Knowledge and Data Engineering, 12:372–390, May/June 2000. 422

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Orlando, S., Palmerini, P., Perego, R., Silvestri, F. (2003). An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets. In: Palma, J.M.L.M., Sousa, A.A., Dongarra, J., Hernández, V. (eds) High Performance Computing for Computational Science — VECPAR 2002. VECPAR 2002. Lecture Notes in Computer Science, vol 2565. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36569-9_28

Download citation

  • DOI: https://doi.org/10.1007/3-540-36569-9_28

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00852-1

  • Online ISBN: 978-3-540-36569-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics