Skip to main content
Log in

Efficient mining of maximal frequent itemsets from databases on a cluster of workstations

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

In this paper, we propose two parallel algorithms for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. One parallel algorithm is named distributed max-miner (DMM), and it requires very low communication and synchronization overhead in distributed computing systems. DMM has the local mining phase and the global mining phase. During the local mining phase, each node mines the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix tree data structure is developed to facilitate the storage and counting of the global candidate itemsets of different sizes. This global mining phase using the prefix tree can work with any local mining algorithm. Another parallel algorithm, named parallel max-miner (PMM), is a parallel version of the sequential max-miner algorithm (Proc of ACM SIGMOD Int Conf on Management of Data, 1998, pp 85–93). Most of existing mining algorithms discover the frequent k-itemsets on the kth pass over the databases, and then generate the candidate (k + 1)-itemsets for the next pass. Compared to those level-wise algorithms, PMM looks ahead at each pass and prunes more candidate itemsets by checking the frequencies of their supersets. Both DMM and PMM were implemented on a cluster of workstations, and their performance was evaluated for various cases. They demonstrate very good performance and scalability even when there are large maximal frequent itemsets (i.e., long patterns) in databases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Agarwal RC, Aggarwal CC, Prasad VVV (2000) Depth first generation of long patterns. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 108–118

  2. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, pp 487–499

  3. Agrawal R and Shafer JC (1996). Parallel mining of association rules. IEEE Trans Knowl Data Eng 8(6): 962–969

    Article  Google Scholar 

  4. Bayardo RJ (1998) Efficient mining long patterns from databases. In: Proceedings of ACM SIGMOD international conference on management of data, pp 85–93

  5. Brin S, Motwani R, Ullman J, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: Proceedings of ACM SIGMOD international conference on management of data, pp 255–264

  6. Burdick D, Calimlim M, Gehrke J (2001) MAFIA: a maximal frequent itemset algorithm for transaction databases. In: Proceedings of international conference on data engineering, pp 443–452

  7. Cheung DW, Han J, Ng V, Fu AW, Fu Y (1996) A fast distributed algorithm for mining association rules. In: Proceedings of the 4th international conference on parallel and distributed information systems, pp 31–43

  8. Cheung DW, Lee SD and Xiao Y (2002). Effect of data skewness and workload balance in parallel data mining. IEEE Trans Knowl Data Eng 14(3): 498–514

    Article  Google Scholar 

  9. Chung SM and Yang J (1996). A parallel distributive join algorithm for cube-connected multiprocessors. IEEE Trans Parallel Distrib Systems 7(2): 127–137

    Article  Google Scholar 

  10. Frequent Itemset Mining Implementations (FIMI) Repository (2004) http://fimi.cs.helsinki.fi/fimi04/

  11. Geurts K, Wets G, Brijs T, Vanhoof K (2003) Profiling high frequency accident locations using association rules. In: Proceedings of the 82nd annual meeting of the transportation research board, Washington, p 18

  12. Gouda K, Zaki MJ (2001) Efficiently mining maximal frequent itemsets. In: Proceedings of the 1st IEEE international conference on data mining, pp 163–170

  13. Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of international conference on data engineering, pp 512–521

  14. Han EH, Karypis G and Kumar V (2000). Scalable parallel data mining for association rules. IEEE Trans Knowl Data Eng 12(3): 337–352

    Article  Google Scholar 

  15. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD international conference on management of data, pp 1–12

  16. Holt JD and Chung SM (2001). Multipass algorithms for mining association rules in text databases. Knowl Inf Systems 3(2): 168–183

    Article  MATH  Google Scholar 

  17. Holt JD and Chung SM (2002). Mining association rules using inverted hashing and pruning. Inf Process Lett 83(4): 211–220

    Article  MATH  MathSciNet  Google Scholar 

  18. Lin D and Kedem ZM (2002). Pincer-Search: an efficient algorithm for discovering the maximal frequent set. IEEE Trans Knowl Data Eng 14(3): 553–566

    Article  Google Scholar 

  19. Snir M, Otto S, Huss-Lederman S, Walker D and Dongarra J (1996). MPI: the complete reference. MIT Press, Cambridge

    Google Scholar 

  20. Park JS, Chen MS, Yu PS (1995) Efficient parallel mining for association rules. In: Proceedings of international conference on information and knowledge management, pp 31–36

  21. Park JS, Chen MS and Yu PS (1997). Using a hash-based method with transaction trimming for mining association rules. IEEE Trans Knowl Data Eng 9(5): 813–825

    Article  Google Scholar 

  22. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of international conference on database theory, LNCS, vol 1540, pp 398–416

  23. Savasere A, Omiecinski E, Navathe S (1995) An efficient algorithm for mining association rules in large databases. In: Proceedings of the 21st VLDB conference, pp 432–444

  24. Thakur R, Rabenseifner R and Gropp W (2000). Optimization of collective communication operations in MPICH. Int J High Perform Comput Appl 19(1): 49–66

    Article  Google Scholar 

  25. Wang K, Xu C, Liu B (1999) Clustering transactions using large items. In: Proceedings of international conference on information and knowledge management, pp 483–490

  26. Yang Y, Guan X, You J (2002) CLOPE: a fast and effective clustering algorithm for transactional data. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining, pp 682–687

  27. Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. Computer Science Department Technical Rep. #651, University of Rochester

  28. Zaki MJ, Parthasarathy S, Ogihara M and Li W (1997). Parallel algorithms for fast discovery of association rules. Data Mining Knowl Discovery 1(4): 343–373

    Article  Google Scholar 

  29. Zaki MJ (2000). Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3): 372–390

    Article  MathSciNet  Google Scholar 

  30. Zaki MJ (2000) Generating non-redundant association rules. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining, pp 34–43

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Soon M. Chung.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chung, S.M., Luo, C. Efficient mining of maximal frequent itemsets from databases on a cluster of workstations. Knowl Inf Syst 16, 359–391 (2008). https://doi.org/10.1007/s10115-007-0115-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-007-0115-1

Keywords

Navigation