A Prime Number Based Approach for Closed Frequent Itemset Mining in Big Data

Zitouni, Mehdi; Akbarinia, Reza; Yahia, Sadok Ben; Masseglia, Florent

doi:10.1007/978-3-319-22849-5_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9261)

Abstract

Mining big datasets poses a number of challenges that are not easily addressed by traditional mining methods, since both memory and computational requirements are hard to satisfy. One solution for dealing with such requirements is to take advantage of parallel frameworks, such as MapReduce, which make it possible to build powerful computing and storage units on top of ordinary machines. In this paper, we address the issue of mining closed frequent itemsets (CFI) from big datasets in such environments. We introduce a new parallel algorithm, called CloPN, for CFI mining. One noteworthy feature of CloPN is that it uses a prime number based approach to transform the data into numerical form, and then mines closed frequent itemsets by using only multiplication and division operations. We carried out extensive experiments over big real-world datasets to assess the performance of CloPN. The obtained results highlight that our algorithm is very efficient in CFI mining from large real-world datasets with up to 53 million articles.
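To make the prime number idea concrete, the following is a minimal single-machine sketch, not the CloPN algorithm itself (whose details are not given in the abstract). It assumes that each distinct item is mapped to a distinct prime and that a transaction is encoded as the product of its items' primes, so that itemset containment becomes a divisibility test. The function names `first_primes`, `encode`, `support`, `closure` and `decode`, as well as the toy transactions, are hypothetical and chosen only for illustration.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (adequate for a toy example)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions):
    """Map each distinct item to a prime and each transaction to a product of primes.

    Assumption: items appear at most once per transaction, so every prime
    has exponent 1 in the product.
    """
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    values = [prod(prime_of[item] for item in set(t)) for t in transactions]
    return prime_of, values


def support(itemset_value, values):
    """Count the transactions whose value is divisible by the itemset's value."""
    return sum(1 for v in values if v % itemset_value == 0)


def closure(itemset_value, values, prime_of):
    """Multiply together the primes shared by every transaction containing the itemset."""
    covering = [v for v in values if v % itemset_value == 0]
    if not covering:
        return itemset_value  # no supporting transaction: leave the itemset unchanged
    result = 1
    for p in prime_of.values():
        if all(v % p == 0 for v in covering):
            result *= p
    return result


def decode(value, prime_of):
    """Recover the items whose primes divide the given value."""
    return [item for item, p in prime_of.items() if value % p == 0]


# Toy dataset (hypothetical): four transactions over the items a, b, c, d.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
prime_of, values = encode(transactions)
ab = prime_of["a"] * prime_of["b"]
print(decode(ab, prime_of), "support:", support(ab, values))
print("closure of d:", decode(closure(prime_of["d"], values, prime_of), prime_of))
```

In this sketch an itemset is closed when its value equals its closure, and it is frequent when its support reaches a chosen threshold. A distributed version would have to partition the encoded transactions across workers, which is beyond what the abstract specifies.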

Notes

1. There are no duplicated items in a transaction of a transactional dataset, so we assume that the multiplicity is \(m_{i} = 1\), without any loss of information.
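
As a worked illustration, assume that the note refers to the exponents in the prime factorization of a transaction's numerical value. The body of the paper is not reproduced here, so the symbols \(T\), \(V(T)\) and \(p_{i}\) are introduced only for this sketch. A transaction \(T\) whose items are mapped to distinct primes \(p_{i}\) would then be encoded as

\[
V(T) \;=\; \prod_{i \in T} p_{i}^{m_{i}} \;=\; \prod_{i \in T} p_{i} \qquad \text{with } m_{i} = 1 .
\]

For example, under the hypothetical mapping \(a \mapsto 2\), \(b \mapsto 3\), \(c \mapsto 5\), the transaction \(\{a, b, c\}\) gives \(V(T) = 2 \cdot 3 \cdot 5 = 30\). Because prime factorization is unique, the items can be recovered exactly from \(V(T)\), which is consistent with the claim that fixing \(m_{i} = 1\) loses no information.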

Author information

Authors and Affiliations

University of Tunis ElManar, Faculty of Sciences of Tunis, LIPAH-LR 11ES14, 2092, Tunis, Tunisia
Mehdi Zitouni & Sadok Ben Yahia
INRIA and LIRMM, Montpellier, France
Mehdi Zitouni, Reza Akbarinia & Florent Masseglia
Institut Telecom, Telecom SudParis, umr 5157, Cnrs Samovar, Evry, France
Sadok Ben Yahia

Corresponding author

Correspondence to Mehdi Zitouni.

Editor information

Editors and Affiliations

Hewlett-Packard Enterprise, Sunnyvale, California, USA
Qiming Chen
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Blaise Pascal University, Aubiere, France
Farouk Toumani
University of Linz, Linz, Austria
Roland Wagner
Universidad Politécnica de Valencia, Valencia, Spain
Hendrik Decker

About this paper

Cite this paper

Zitouni, M., Akbarinia, R., Yahia, S.B., Masseglia, F. (2015). A Prime Number Based Approach for Closed Frequent Itemset Mining in Big Data. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_35

DOI: https://doi.org/10.1007/978-3-319-22849-5_35
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)
