Abstract
Searching for frequent patterns in transactional databases is considered one of the most important data mining problems, and Apriori is one of the classic algorithms for this task. Because transactional databases can be very large, developing fast and efficient algorithms that can handle such volumes of data is a challenging task. In this paper, we implement a parallel Apriori algorithm based on MapReduce, a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes). The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
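The abstract does not spell out how Apriori is mapped onto MapReduce. The following is a minimal, single-process Python sketch of one common formulation, assuming a count-distribution scheme. Each iteration k acts as one MapReduce job. Mappers scan their own input split and emit (candidate itemset, 1) for every candidate contained in a transaction. Reducers sum the counts and keep the itemsets that meet an absolute minimum support. The next candidate set is then generated from the surviving itemsets. The function names (`map_phase`, `reduce_phase`, `generate_candidates`, `mapreduce_apriori`), the list-of-partitions input standing in for HDFS splits, and the absolute-count support threshold are all hypothetical illustrations. This is not the authors' implementation, and no Hadoop API is used.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(partition, candidates):
    """Mapper (hypothetical): emit (candidate, 1) for each candidate found in a transaction."""
    for transaction in partition:
        items = frozenset(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of this transaction
                yield cand, 1


def reduce_phase(pairs, min_support):
    """Shuffle + reducer (hypothetical): group by itemset, sum counts, keep frequent ones."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return {k: c for k, c in counts.items() if c >= min_support}


def generate_candidates(frequent_prev, k):
    """Apriori-gen: join frequent (k-1)-itemsets, prune any candidate with an infrequent subset."""
    prev = list(frequent_prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k and all(
                frozenset(sub) in frequent_prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def mapreduce_apriori(partitions, min_support):
    """Run one simulated MapReduce job per itemset size until no frequent itemsets remain."""
    # Pass 1: every distinct item is a 1-itemset candidate.
    candidates = {frozenset([i]) for part in partitions for t in part for i in t}
    result = {}
    k = 1
    while candidates:
        # In a real cluster, each partition would be processed by a separate mapper,
        # and the candidate set would need to be available to every mapper.
        pairs = (pair for part in partitions for pair in map_phase(part, candidates))
        frequent = reduce_phase(pairs, min_support)
        if not frequent:
            break
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


if __name__ == "__main__":
    # Five transactions spread over three hypothetical input splits.
    splits = [
        [["bread", "milk"], ["bread", "diaper", "beer", "eggs"]],
        [["milk", "diaper", "beer", "cola"], ["bread", "milk", "diaper", "beer"]],
        [["bread", "milk", "diaper", "cola"]],
    ]
    for itemset, count in sorted(mapreduce_apriori(splits, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With a minimum support of 3, this toy dataset yields four frequent single items and four frequent pairs. The only 3-itemset candidate, {bread, milk, diaper}, appears in just two transactions and is discarded. Keeping the per-iteration scan in the mappers and the summation in the reducers is what allows the counting work, which dominates Apriori's cost, to be spread across nodes.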
Rights and permissions
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Li, N., Zeng, L., He, Q. et al. Parallel Implementation of Apriori Algorithm Based on MapReduce. Int J Netw Distrib Comput 1, 89–96 (2013). https://doi.org/10.2991/ijndc.2013.1.2.3
DOI: https://doi.org/10.2991/ijndc.2013.1.2.3