Accelerating frequent itemset mining on graphics processing units

The Journal of Supercomputing

Abstract

In this paper we describe a new parallel Frequent Itemset Mining algorithm called “Frontier Expansion.” The implementation is optimized for high performance on a heterogeneous platform consisting of a shared-memory multiprocessor and multiple Graphics Processing Unit (GPU) coprocessors. Frontier Expansion is an improved data-parallel algorithm derived from the Equivalence Class Clustering (Eclat) method, in which a partial breadth-first search is used to expose as much parallelism as the available memory capacity allows. In our approach, the vertical transaction lists are stored as bitsets and processed with wide bitwise operations spread across multiple threads on a GPU. We evaluated our approach on four NVIDIA Tesla GPUs and observed a 6–30× speedup relative to state-of-the-art sequential Eclat and FPGrowth implementations executed on a multicore CPU.
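
The bitset idea in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' Frontier Expansion implementation: it assumes plain Python integers as stand-ins for the wide machine words a GPU would process in parallel, and the names `build_bitsets`, `support`, and `mine` are hypothetical. Each item gets one bitset in which bit t is set when transaction t contains the item. The support of an itemset is then the population count of the bitwise AND of its items' bitsets, and candidates are expanded one level at a time in a simple breadth-first manner, with no memory limit modeled.

```python
from typing import Dict, FrozenSet, List, Tuple


def build_bitsets(transactions: List[List[str]]) -> Dict[str, int]:
    """Vertical layout: one integer per item, with bit t set if transaction t has the item."""
    bitsets: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(bits: int) -> int:
    """Support equals the number of set bits (population count)."""
    return bin(bits).count("1")


def mine(transactions: List[List[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise (breadth-first) Eclat-style search over bitsets.

    Illustrative only: a real GPU version would split each bitset into
    machine words and let many threads AND and count them in parallel.
    """
    bitsets = build_bitsets(transactions)
    singles: List[Tuple[Tuple[str, ...], int]] = [
        ((item,), bits)
        for item, bits in sorted(bitsets.items())
        if support(bits) >= min_support
    ]
    frontier = list(singles)
    frequent: Dict[FrozenSet[str], int] = {}
    while frontier:
        next_frontier = []
        for items, bits in frontier:
            frequent[frozenset(items)] = support(bits)
            # Extend only with lexicographically larger items so that
            # each itemset is generated exactly once.
            for (other,), other_bits in singles:
                if other > items[-1]:
                    joined = bits & other_bits  # transaction-list intersection
                    if support(joined) >= min_support:
                        next_frontier.append((items + (other,), joined))
        frontier = next_frontier
    return frequent


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in mine(data, min_support=3).items():
        print(sorted(itemset), count)
```

Intersecting two transaction lists costs one AND per machine word regardless of how many transactions match, which is what makes this layout a natural fit for wide, data-parallel hardware.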

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Nos. CCF-0844951 and CCF-091560.

Author information

Corresponding author

Correspondence to Jason D. Bakos.

About this article

Cite this article

Zhang, F., Zhang, Y. & Bakos, J.D. Accelerating frequent itemset mining on graphics processing units. J Supercomput 66, 94–117 (2013). https://doi.org/10.1007/s11227-013-0887-x

  • DOI: https://doi.org/10.1007/s11227-013-0887-x
