An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets

Orlando, Salvatore; Palmerini, Paolo; Perego, Raffaele; Silvestri, Fabrizio

doi:10.1007/3-540-36569-9_28

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2565)

Included in the following conference series:

International Conference on High Performance Computing for Computational Science

Abstract

Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms have been proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach which explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-Node multithreading effectively exploits the memory hierarchies of each SMP node, while Inter-Node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In-depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performance under a variety of conditions.
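
For readers unfamiliar with intersection-based counting, the following is a minimal Python sketch of the general idea. It is not the ParDCI or DCI implementation, whose details are not given in the abstract. It assumes a vertical layout in which each item maps to the set of ids of the transactions containing it, and it assumes that the support of an itemset is the size of the intersection of those sets. All function names and the toy dataset are hypothetical. The last helper illustrates, under the same assumptions, why disjoint partitions of the transactions can be counted independently and then summed.

```python
from itertools import combinations

def build_tidlists(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists

def support(itemset, tidlists):
    """Support of a non-empty itemset: size of the intersection of its tidlists."""
    # Start from the shortest list so the running intersection stays small.
    lists = sorted((tidlists.get(item, set()) for item in itemset), key=len)
    common = set(lists[0])
    for tids in lists[1:]:
        common &= tids
        if not common:  # no transaction can match any more
            break
    return len(common)

def partitioned_support(itemset, partitions):
    """Sum of local supports over disjoint partitions of the transactions."""
    return sum(support(itemset, build_tidlists(part)) for part in partitions)

# Toy dataset (hypothetical), one set of items per transaction.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]
tidlists = build_tidlists(transactions)
min_support = 2

# Keep every pair of items that occurs in at least min_support transactions.
frequent_pairs = {}
for pair in combinations(sorted(tidlists), 2):
    count = support(pair, tidlists)
    if count >= min_support:
        frequent_pairs[pair] = count

print(frequent_pairs)
```

Because each transaction belongs to exactly one partition, `partitioned_support(("a", "c"), [transactions[:2], transactions[2:]])` gives the same count as `support(("a", "c"), tidlists)`. Only the per-partition counts need to be exchanged, which is one simple way partitioning can keep communication low.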

Author information

Authors and Affiliations

Dipartimento di Informatica, Università Ca’ Foscari, Venezia, Italy
Salvatore Orlando & Paolo Palmerini
Istituto CNUCE, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy
Paolo Palmerini, Raffaele Perego & Fabrizio Silvestri
Dipartimento di Informatica, Università di Pisa, Italy
Fabrizio Silvestri

Editor information

Editors and Affiliations

Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
José M. L. M. Palma & A. Augusto Sousa
Department of Computer Science, University of Tennessee, 37996-1301, Knoxville, TN, USA
Jack Dongarra
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Camino de Vera, s/n, Apartado 22012, 46020, Valencia, Spain
Vicente Hernández

About this paper

Cite this paper

Orlando, S., Palmerini, P., Perego, R., Silvestri, F. (2003). An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets. In: Palma, J.M.L.M., Sousa, A.A., Dongarra, J., Hernández, V. (eds) High Performance Computing for Computational Science — VECPAR 2002. VECPAR 2002. Lecture Notes in Computer Science, vol 2565. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36569-9_28

DOI: https://doi.org/10.1007/3-540-36569-9_28
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00852-1
Online ISBN: 978-3-540-36569-3
eBook Packages: Springer Book Archive
