Parallel Implementation of Apriori Algorithm Based on MapReduce

Li, Ning; Zeng, Li; He, Qing; Shi, Zhongzhi

doi:10.2991/ijndc.2013.1.2.3

Research Article
Open access
Published: 01 April 2013

Volume 1, pages 89–96, (2013)
International Journal of Networked and Distributed Computing

Ning Li¹,²,³, Li Zeng¹, Qing He¹ & Zhongzhi Shi¹

Abstract

Searching for frequent patterns in transactional databases is considered one of the most important data mining problems, and Apriori is one of the typical algorithms for this task. Developing fast and efficient algorithms that can handle large volumes of data is challenging because of the size of the databases involved. In this paper, we implement a parallel Apriori algorithm based on MapReduce, a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes). The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
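The abstract does not spell out the key/value design used in the paper, so the following is only a minimal sketch of how Apriori's counting step is commonly mapped onto MapReduce, not the authors' implementation. It simulates the map and reduce phases in a single Python process: each map call emits `(candidate, 1)` for every candidate itemset contained in a transaction, and the reduce step sums the counts and keeps the candidates that meet the minimum support. The function names (`map_count`, `reduce_count`, `generate_candidates`, `apriori`), the `splits` layout and the sample transactions are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_count(transaction, candidates):
    """Map step: emit (candidate, 1) for each candidate contained in the transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if set(c) <= items]


def reduce_count(pairs, min_support):
    """Reduce step: sum the counts per candidate and keep those meeting min_support."""
    totals = defaultdict(int)
    for candidate, count in pairs:
        totals[candidate] += count
    return {c: n for c, n in totals.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Build k-itemsets from items of frequent (k-1)-itemsets,
    pruning any candidate that has an infrequent (k-1)-subset."""
    items = sorted({i for itemset in frequent for i in itemset})
    return [
        c for c in combinations(items, k)
        if all(s in frequent for s in combinations(c, k - 1))
    ]


def apriori(splits, min_support):
    """Run one simulated map/reduce round per itemset size until no candidates remain.

    `splits` is a list of transaction lists, standing in for the input splits
    that a real cluster would hand to separate mapper nodes (an assumption).
    """
    candidates = sorted({(i,) for split in splits for t in split for i in t})
    result, k = {}, 1
    while candidates:
        pairs = [p for split in splits for t in split
                 for p in map_count(t, candidates)]
        frequent = reduce_count(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical sample data: two splits of two transactions each.
splits = [
    [["bread", "milk"], ["bread", "butter"]],
    [["bread", "milk", "butter"], ["milk"]],
]
frequent_itemsets = apriori(splits, min_support=2)
for itemset, support in sorted(frequent_itemsets.items()):
    print(itemset, support)
```

In this sketch every itemset size costs one full pass over the data, which is why distributing the counting pass across nodes is the natural place to parallelize; on an actual Hadoop cluster the candidate list would have to be shipped to each mapper rather than shared in memory.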

Author information

Authors and Affiliations

The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, No.6 Kexueyuan South Road Zhongguancun, 100190, Haidian District Beijing, China
Ning Li, Li Zeng, Qing He & Zhongzhi Shi
Graduate University of Chinese Academy of Sciences, 100139, Beijing, China
Ning Li
Key Lab. of Machine Learning and Computational Intelligence, College of Mathematics and Computer Science, Hebei University, 071002, Baoding, Hebei, China
Ning Li

Corresponding author

Correspondence to Ning Li.

Rights and permissions

This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

About this article

Cite this article

Li, N., Zeng, L., He, Q. et al. Parallel Implementation of Apriori Algorithm Based on MapReduce. Int J Netw Distrib Comput 1, 89–96 (2013). https://doi.org/10.2991/ijndc.2013.1.2.3

Received: 22 March 2012
Accepted: 13 November 2012
Published: 01 April 2013
Issue Date: April 2013
DOI: https://doi.org/10.2991/ijndc.2013.1.2.3
