Abstract
In this paper we describe a new parallel Frequent Itemset Mining algorithm called “Frontier Expansion.” The implementation is optimized for high performance on a heterogeneous platform consisting of a shared-memory multiprocessor and multiple Graphics Processing Unit (GPU) coprocessors. Frontier Expansion is an improved data-parallel algorithm derived from the Equivalence Class Clustering (Eclat) method; it uses a partial breadth-first search to expose as much parallelism as the available memory capacity allows. In our approach, the vertical transaction lists are stored as bitsets and processed with wide bitwise operations across multiple threads on a GPU. We evaluate our approach on four NVIDIA Tesla GPUs and observe a 6–30× speedup relative to state-of-the-art sequential Eclat and FPGrowth implementations executed on a multicore CPU.
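To make the bitset idea concrete, the following is a minimal CPU-only sketch in Python, not the paper's GPU implementation. It assumes each item's vertical transaction list is packed into one integer (bit t is set when transaction t contains the item), so that the support of a joined itemset is the population count of a bitwise AND. The function names `build_bitsets`, `support`, and `expand_frontier`, the toy transactions, and the level-by-level loop are illustrative assumptions; the memory-constrained partial breadth-first scheduling mentioned in the abstract is not modeled here.

```python
from itertools import combinations

def build_bitsets(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets

def support(bits):
    """Support of an itemset is the number of set bits in its bitset."""
    return bin(bits).count("1")

def expand_frontier(frontier, min_support):
    """Join itemsets that share all but their last item (one Eclat equivalence class)."""
    next_frontier = {}
    for (a, bits_a), (b, bits_b) in combinations(sorted(frontier.items()), 2):
        if a[:-1] == b[:-1]:
            joined = bits_a & bits_b  # intersection of the two transaction lists
            if support(joined) >= min_support:
                next_frontier[a + (b[-1],)] = joined
    return next_frontier

# Hypothetical toy database: five transactions over items a-d.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}, {"a", "b", "c"}]
min_support = 2

# Frequent 1-itemsets form the initial frontier.
level = {(item,): bits
         for item, bits in build_bitsets(transactions).items()
         if support(bits) >= min_support}
frequent = dict(level)

# Expand one level at a time until no new frequent itemsets appear.
while level:
    level = expand_frontier(level, min_support)
    frequent.update(level)
```

In this sketch every join within a level is independent of the others, which is the property a data-parallel implementation can exploit. On a GPU the single large integer would typically be replaced by an array of fixed-width words processed by many threads.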
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant Nos. CCF-0844951 and CCF-091560.
About this article
Cite this article
Zhang, F., Zhang, Y. & Bakos, J.D. Accelerating frequent itemset mining on graphics processing units. J Supercomput 66, 94–117 (2013). https://doi.org/10.1007/s11227-013-0887-x
DOI: https://doi.org/10.1007/s11227-013-0887-x