A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework

Article
  • 75 Downloads

Abstract

Frequent itemset mining (FIM) is an interesting sub-area of research in the field of Data Mining. With the increase in the size of datasets, conventional FIM algorithms are not suitable and efforts are made to migrate to the Big Data Frameworks for designing algorithms using MapReduce like computing paradigms. We too interested in designing MapReduce based algorithm. Initially, our Parallel Compression algorithm makes data simpler to handle. A novel bit vector data structure is proposed to maintain compressed transactions and it is formed by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets from a given list of transactions. The experimental results are present to prove the efficacy of our approach over some of the recent works.

Keywords

Big Data Bit Vector Compression Data Mining Frequent itemset mining MapReduce framework 

References

  1. 1.
    Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Trans. Knowl. Data Eng. 8(6), 962–969 (1996)CrossRefGoogle Scholar
  2. 2.
    Agrawal, R., Imieliski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 22(2), 207–216 (1993)CrossRefGoogle Scholar
  3. 3.
    Al-kahtani, M.S., Karim, L.: An efficient distributed algorithm for big data processing. Arab. J. Sci. Eng. doi: 10.1007/s13369-016-2405-y
  4. 4.
    Bechini, A., Marcelloni, F., Segatori, A.: A MapReduce solution for associative classification of big data. Inf. Sci. 332, 33–55 (2016)CrossRefGoogle Scholar
  5. 5.
    Chambi, S., Lemire, D., Kaser, O., Godin, R.: Better bitmap performance with Roaring bitmaps. Softw. Pract. Exp. 46(5), 709–719 (2016)CrossRefGoogle Scholar
  6. 6.
    Colantonio, A., Pietro, R.D.: Concise: compressed ’n’ composable integer set. Inf. Process. Lett. 110(16), 644–650 (2010)CrossRefMATHGoogle Scholar
  7. 7.
    Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRefGoogle Scholar
  8. 8.
    Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl. Based Syst. 20(4), 329–335 (2007)MathSciNetCrossRefGoogle Scholar
  9. 9.
    Emani, C.K., Cullot, N., Nicolle, C.: Understandable big data: a survey. Comput. Sci. Rev. 17, 70–81 (2015)MathSciNetCrossRefGoogle Scholar
  10. 10.
    Fournier-Viger, P., Lin, J.C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H.T.: The SPMF open-source data mining library version 2. In: Proceedings of 19th European Conference on Principles of Data Mining and Knowledge Discovery 9853, pp. 36–40 (2016)Google Scholar
  11. 11.
    Han, E.H., Karypis, G., Kumar, V.: Scalable parallel data mining for association rules. IEEE Trans. Knowl. Data Eng. 12(3), 337–352 (2000)CrossRefGoogle Scholar
  12. 12.
    Han, J., Yin, J.P.Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Disc. 8(1), 53–87 (2004)MathSciNetCrossRefGoogle Scholar
  13. 13.
    Hong, S., Huaxuan, Z., Shiping, C., Chunyan, H.: The study of improved FP-Growth Algorithm in MapReduce. In: Proceedings of 1st International Workshop on Cloud Computing and Information Security, pp. 250–253 (2013)Google Scholar
  14. 14.
    Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: Parallel FP-growth for query recommendation. In: Proceedings of the 2008 ACM Conference on Recommender systems, pp. 107–114 (2008)Google Scholar
  15. 15.
    Li, L., Zhang, M.: The strategy of mining association rule based on cloud computing. In: Proceedings of IEEE International Conference on Business Computing and Global Informatization, pp. 475–478 (2011)Google Scholar
  16. 16.
    Lin, M.Y., Lee, P.Y., Hsueh, S.C.: Apriori-based frequent itemset mining algorithms on MapReduce. In: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication (2012)Google Scholar
  17. 17.
    Meng, S., Dou, W., Zhang, X., Chen, J.: KASR: a keyword-aware service recommendation method on MapReduce for big data applications. IEEE Trans. Parallel Distrib. Syst. 25(12), 3221–3231 (2014)Google Scholar
  18. 18.
    Moens, S., Aksehirli, E., Goethals, B.: Frequent itemset mining for big data. In: Proceedings of IEEE International Conference on Big data, pp. 111–118 (2013)Google Scholar
  19. 19.
    Mohamed, M.H., Darwieesh, M.M.: Efficient mining frequent itemsets algorithms. Int. J. Mach. Learn. Cybern. 5(6), 823–833 (2013)CrossRefGoogle Scholar
  20. 20.
    Sakr, N.A., ELdesouky, A., Arafat, H.: An efficient fast-response content-based image retrieval framework for big data. Comput. Electr. Eng. 54, 522–538 (2016)Google Scholar
  21. 21.
    Sandhu, R., Sood, S.K.: Scheduling of big data applications on distributed cloud based on QoS parameters. Clust. Comput. 18(2), 817–828 (2015)Google Scholar
  22. 22.
    Song, W., Yang, B., Xu, Z.: Index-BitTableFI: an improved algorithm for mining frequent itemsets. Knowl. Based Syst. 21(6), 507–513 (2008)CrossRefGoogle Scholar
  23. 23.
    Spiegler, I., Maayan, R.: Storage and retrieval considerations of binary data bases. Inf. Process. Manag. 21(3), 233–254 (1985)CrossRefGoogle Scholar
  24. 24.
    Sun, D., Lee, V.C., Burstein, F., Haghighi, P.D.: An efficient vertical-Apriori Mapreduce algorithm for frequent item-set mining. In: Proceedings of IEEE Industrial Electronics and Applications, pp. 108–112 (2015)Google Scholar
  25. 25.
    Tsay, Y.J., Hsu, T.J., Yu, J.R.: FIUT: a new method for mining frequent itemsets. Inf. Sci. 179(11), 1724–1737 (2009)CrossRefGoogle Scholar
  26. 26.
    Vennila, V., Kannan, A.R.: Symmetric Matrix-based Predictive Classifier for Big Data computation and information sharing in Cloud. Comput. Electr. Eng. 56, 831–841 (2016)CrossRefGoogle Scholar
  27. 27.
    Wang, L., Feng, L., Zhang, J., Liao, P.: An efficient algorithm of frequent itemsets mining based on MapReduce. J. Inf. Comput. Sci. 11(8), 2809–2816 (2014)CrossRefGoogle Scholar
  28. 28.
    Wu, K., Stockinger, K., Shoshani, A.: Breaking the curse of cardinality on bitmap indexes. In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management 5069, pp. 348–365 (2008)Google Scholar
  29. 29.
    Wu, X., Zhu, X., Wu, G.Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)CrossRefGoogle Scholar
  30. 30.
    Xia, D., Zhou, Y., Rong, Z., Zhang, Z.: IPFP: An improved parallel FP-growth algorithm for frequent itemsets mining. In Proceedings of 59th ISI World Statistics Congress, pp. 4034–4039 (2013)Google Scholar
  31. 31.
    Xun, Y., Zhang, J., Qin, X.: FiDoop: parallel mining of frequent itemsets using MapReduce. IEEE Trans. Syst. Man Cybern. Syst. 46(3), 313–325 (2016)CrossRefGoogle Scholar
  32. 32.
    Yu, K.M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37(3), 2486–2494 (2010)MathSciNetCrossRefGoogle Scholar
  33. 33.
    Yuan, Y., Huang, T.: A matrix algorithm for mining association rules. Proc. Int. Conf. Intell. Comput. 3644, 370–379 (2005)Google Scholar
  34. 34.
    Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326–335 (2003)Google Scholar
  35. 35.
    Zhang, F., Liu, M., Gui, F., Shen, W., Shami, A., Ma, Y.: A distributed frequent itemset mining algorithm using Spark for Big Data analytics. Clust. Comput. 18(4), 1493–1501 (2015)CrossRefGoogle Scholar
  36. 36.
    Zhou, L., Wang, X.: Research of the FP-growth algorithm based on cloud environments. J. Softw. 9(3), 676–683 (2014)Google Scholar
  37. 37.
    Zhou, L., Zhong, Z., Chang, J., Li, J., Huang, J.Z., Feng, S.: Balanced parallel FP-growth with MapReduce. In: Proceedings of IEEE Youth Conference Information Computing and Telecommunications, pp. 243–246 (2010)Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. 1.National Institute of TechnologyWarangalIndia

Personalised recommendations