Task-Parallel FP-Growth on Cluster Computers
Frequent itemset mining (FIM) is one of the most deeply studied data mining tasks. A number of algorithms, employing different approaches and advanced data structures, have already been proposed to solve it efficiently. However, even the fastest serial FIM algorithms fail to scale with the rapid growth of database sizes. Hence, parallel FIM algorithms are the only viable solutions in many domains, as serial solutions have almost reached the physical barriers. To this end, parallel versions of a few serial FIM algorithms, including FP-Growth, have already been developed. In this study, we develop three parallel FP-Growth implementations for cluster computers, all based on MPI: (i) Static Parallel FP-Growth, (ii) Dynamic Parallel FP-Growth, and (iii) (Tree-Sharing) Dynamic Parallel FP-Growth. All three variants are task-parallel, i.e., they are not based on horizontal or vertical partitioning of the database. The algorithms are experimentally evaluated on a 16-node cluster computer. Our results demonstrate the utility of the algorithms.
Keywords: Frequent Itemsets, Cluster Computer, Master Node, Support Threshold, Work Node
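The abstract does not spell out how the variants assign work, so the following is only a minimal sketch of what task-parallel pattern-growth mining can look like, under several assumptions. It assumes one task per frequent item (all itemsets whose least frequent member is that item), that "static" means the task-to-worker mapping is fixed up front, and that "dynamic" means the next task goes to whichever worker becomes free first. Workers are simulated in a single process instead of using MPI, plain projected transaction lists stand in for the FP-tree, and the tree-sharing variant is not modeled. All function names (`frequent_items`, `mine_task`, `static_schedule`, `dynamic_schedule`, `run`) are hypothetical.

```python
import heapq
from collections import Counter


def frequent_items(transactions, min_support):
    """Frequent items, most frequent first (ties broken alphabetically)."""
    counts = Counter(item for t in transactions for item in set(t))
    kept = [i for i, c in counts.items() if c >= min_support]
    return sorted(kept, key=lambda i: (-counts[i], i))


def mine_task(transactions, order, item, min_support):
    """One task: every frequent itemset whose least frequent member is `item`."""
    rank = {it: r for r, it in enumerate(order)}
    # Conditional database: transactions containing `item`, restricted to
    # frequent items ranked ahead of it (a stand-in for a conditional FP-tree).
    cond = [[x for x in set(t) if x in rank and rank[x] < rank[item]]
            for t in transactions if item in t]
    found = {frozenset([item]): len(cond)}

    def grow(suffix, db):
        counts = Counter(x for t in db for x in t)
        for x, c in counts.items():
            if c >= min_support:
                found[suffix | {x}] = c
                # Project again, keeping only items ranked ahead of x.
                grow(suffix | {x},
                     [[y for y in t if rank[y] < rank[x]] for t in db if x in t])

    grow(frozenset([item]), cond)
    return found


def static_schedule(tasks, n_workers):
    """Assumed static variant: round-robin mapping fixed before mining starts."""
    return [tasks[w::n_workers] for w in range(n_workers)]


def dynamic_schedule(tasks, n_workers, cost):
    """Assumed dynamic variant (simulated): next task goes to the first free worker."""
    free_at = [(0, w) for w in range(n_workers)]
    plan = [[] for _ in range(n_workers)]
    for task in tasks:
        t, w = heapq.heappop(free_at)
        plan[w].append(task)
        heapq.heappush(free_at, (t + cost(task), w))
    return plan


def run(transactions, min_support, plan, order):
    """Execute each worker's task list in turn and merge the partial results."""
    merged = {}
    for worker_tasks in plan:
        for item in worker_tasks:
            merged.update(mine_task(transactions, order, item, min_support))
    return merged
```

Because each task owns a disjoint set of itemsets, the merged result does not depend on the schedule; only the load balance does. In a real cluster setting the `plan` lists would correspond to work done on separate nodes, with the master node handing out tasks in the dynamic case.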