Abstract
Mining big datasets poses a number of challenges that traditional mining methods do not easily address, since both memory and computational requirements are hard to satisfy. One way to meet such requirements is to take advantage of parallel frameworks, such as MapReduce, which make it possible to build powerful computing and storage units on top of ordinary machines. In this paper, we address the problem of mining closed frequent itemsets (CFI) from big datasets in such environments. We introduce a new parallel algorithm for CFI mining, called CloPN. A noteworthy feature of CloPN is that it uses a prime number based approach to transform the data into numerical form, and then mines closed frequent itemsets using only multiplication and division operations. We carried out extensive experiments over big real-world datasets to assess the performance of CloPN. The results show that our algorithm is efficient at CFI mining from large real-world datasets with up to 53 million articles.
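The prime number encoding mentioned in the abstract can be illustrated with a small single-machine sketch. It is not the CloPN algorithm itself, which the abstract does not detail. It only assumes that each distinct item is mapped to a distinct prime and that a transaction is represented by the product of its items' primes, so that an itemset is contained in a transaction exactly when its product divides the transaction's product. The function names (`encode`, `support`, `closure`, `decode`) are hypothetical, and the GCD-based closure computation is our own illustration, which may differ from how CloPN computes closures.

```python
from functools import reduce
from math import gcd


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions):
    """Map each item to a prime and each transaction to a product of primes."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = []
    for t in transactions:
        code = 1
        for item in set(t):  # each item counted once (multiplicity 1)
            code *= prime_of[item]
        codes.append(code)
    return prime_of, codes


def support(itemset_code, codes):
    """Count transactions whose code is divisible by the itemset code."""
    return sum(1 for c in codes if c % itemset_code == 0)


def closure(itemset_code, codes):
    """Code of the largest itemset shared by all transactions covering the itemset."""
    covering = [c for c in codes if c % itemset_code == 0]
    return reduce(gcd, covering) if covering else itemset_code


def decode(code, prime_of):
    """Recover the items whose primes divide the code."""
    return sorted(item for item, p in prime_of.items() if code % p == 0)


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
prime_of, codes = encode(transactions)
c_code = prime_of["c"]
print(support(c_code, codes), decode(closure(c_code, codes), prime_of))
```

In this toy dataset the itemset {c} is not closed, because every transaction containing c also contains a; an itemset is closed exactly when `closure` returns its own code.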
Notes
1. There are no duplicated items in a transaction of a transactional dataset, so we assume that each multiplicity is \(m_{i} = 1\), without any loss of information.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zitouni, M., Akbarinia, R., Yahia, S.B., Masseglia, F. (2015). A Prime Number Based Approach for Closed Frequent Itemset Mining in Big Data. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_35
DOI: https://doi.org/10.1007/978-3-319-22849-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)