Abstract
Mining frequent itemsets in large databases is a widely used technique in data mining. Several sequential and parallel algorithms have been developed, but on high data volumes their execution takes more time and resources than expected. Speeding up these algorithms is therefore an active research topic. Previous attempts at acceleration using custom architectures have been limited because the underlying algorithms were conceived sequentially and do not exploit the intrinsic parallelism that the hardware provides. The contribution of this paper is a highly parallel algorithm that uses a vertical bit vector (VBV) data layout, which lends itself to support counting. Our results show that, for dense databases, a custom architecture for this algorithm can run an order of magnitude faster than the fastest architecture reported in previous work.
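To make the idea of support counting over a vertical bit vector layout concrete, here is a minimal software sketch. It is not the algorithm or the hardware architecture proposed in the paper, only a generic illustration of the layout: each item is assumed to own one bit vector with one bit per transaction, and the support of an itemset is the number of set bits in the AND of its items' vectors. The function names `build_vbv` and `support` and the toy transactions are hypothetical.

```python
def build_vbv(transactions):
    """Build a vertical bit vector per item (illustrative helper).

    Bit t of vectors[item] is set when transaction t contains the item.
    Python integers serve as arbitrary-length bit vectors.
    """
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(vectors, itemset):
    """Count the transactions that contain every item of a non-empty itemset."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    # AND the item vectors: a bit survives only if all items occur together.
    combined = vectors.get(items[0], 0)
    for item in items[1:]:
        combined &= vectors.get(item, 0)
    # The population count of the result is the support.
    return bin(combined).count("1")


# Toy database of four transactions (assumed data, not from the paper).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
vectors = build_vbv(transactions)
print(support(vectors, {"a", "c"}))  # 3
print(support(vectors, {"a", "b"}))  # 2
```

Each bit position of the AND can be computed independently of the others, which is one plausible reason such a layout maps well onto parallel hardware.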
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mesa, A., Feregrino-Uribe, C., Cumplido, R., Hernández-Palancar, J. (2010). A Highly Parallel Algorithm for Frequent Itemset Mining. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Kittler, J. (eds) Advances in Pattern Recognition. MCPR 2010. Lecture Notes in Computer Science, vol 6256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15992-3_31
DOI: https://doi.org/10.1007/978-3-642-15992-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15991-6
Online ISBN: 978-3-642-15992-3
eBook Packages: Computer Science (R0)