A Prime Number Based Approach for Closed Frequent Itemset Mining in Big Data

  • Mehdi Zitouni
  • Reza Akbarinia
  • Sadok Ben Yahia
  • Florent Masseglia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9261)

Abstract

Mining big datasets poses a number of challenges that traditional mining methods do not easily address, since both memory and computational requirements are hard to satisfy. One way of dealing with such requirements is to take advantage of parallel frameworks, such as MapReduce, which make it possible to build powerful computing and storage units on top of ordinary machines. In this paper, we address the issue of mining closed frequent itemsets (CFI) from big datasets in such environments. We introduce a new parallel algorithm, called CloPN, for CFI mining. One notable feature of CloPN is that it uses a prime-number-based approach to transform the data into numerical form, and then mines closed frequent itemsets using only multiplication and division operations. We carried out extensive experiments over big real-world datasets to assess the performance of CloPN. The results show that our algorithm is very efficient at mining CFIs from large real-world datasets with up to 53 million articles.
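The prime-number encoding mentioned in the abstract can be illustrated with a small, single-machine sketch. This is not the CloPN algorithm itself, whose details are not given here. It only shows the general idea under stated assumptions: each item is mapped to a distinct prime, a transaction becomes the product of its items' primes, an itemset is contained in a transaction exactly when its product divides the transaction's product, and the closure of an itemset is taken as the greatest common divisor of the transactions that contain it. The function names (`first_primes`, `encode`, `closed_frequent_itemsets`) are hypothetical, and the brute-force enumeration is only suitable for toy inputs.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions):
    """Map each item to a distinct prime and each transaction to a product."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    encoded = []
    for t in transactions:
        value = 1
        for item in set(t):  # ignore repeated items within a transaction
            value *= prime_of[item]
        encoded.append(value)
    return prime_of, encoded


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force illustration (assumption: tiny item universe)."""
    prime_of, encoded = encode(transactions)
    closed = {}
    items = sorted(prime_of)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            code = 1
            for item in combo:
                code *= prime_of[item]
            # An itemset is contained in a transaction iff its code divides it.
            covering = [t for t in encoded if t % code == 0]
            if len(covering) < min_support:
                continue
            # The GCD of the covering transactions encodes their intersection,
            # which is the closure of the itemset.
            if reduce(gcd, covering) == code:
                closed[frozenset(combo)] = len(covering)
    return closed


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"]]
    for itemset, support in closed_frequent_itemsets(data, 2).items():
        print(sorted(itemset), support)
```

With the toy data above, items `a`, `b`, `c` receive the primes 2, 3, 5, so the transactions are encoded as 30, 6, 10 and 2. The itemset `{b}` (code 3) is frequent but not closed, because the transactions containing it have GCD 6, which encodes `{a, b}`.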

Keywords

  • Data mining
  • Closed frequent itemset
  • MapReduce
  • Big data
  • Parallel algorithm
  • CloPN

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mehdi Zitouni (1, 2)
  • Reza Akbarinia (2)
  • Sadok Ben Yahia (1, 3)
  • Florent Masseglia (2)

  1. University of Tunis ElManar, Faculty of Sciences of Tunis, Tunis, Tunisia
  2. INRIA and LIRMM, Montpellier, France
  3. Institut Telecom, Telecom SudParis, UMR 5157, CNRS Samovar, Evry, France
