
A Parallel Incremental Frequent Itemsets Mining IFIN+: Improvement and Extensive Evaluation

  • Van Quoc Phuong Huynh (Email author)
  • Josef Küng
  • Tran Khanh Dang
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11390)

Abstract

In this paper, we propose IFIN+, a shared-memory parallelization of the Frequent Itemsets Mining algorithm IFIN. The motivation for our work is that commodity processors now provide many physical computational units, and taking full advantage of them is a potential way to improve computational performance in single-machine environments. Portions of the serial version are improved in ways that increase efficiency and computational independence, which simplifies the design of parallel computation with the Work-Pool model, known as a good model for load balancing. We conducted extensive experiments on both synthetic and real datasets to evaluate IFIN+ against its serial version IFIN, the well-known algorithm FP-Growth, and two other state-of-the-art algorithms, FIN and PrePost+. The experimental results show that IFIN+ is the most efficient in running time, especially when mining at different support thresholds within the same running session. Compared to its serial version, the performance of IFIN+ is improved significantly.
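The Work-Pool idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the IFIN+ algorithm: it uses a simple Eclat-style intersection of transaction-id sets rather than the IPPC-Tree, and the function name `mine_frequent_itemsets`, its parameters, and the choice of one task per frequent item are assumptions made only for illustration. Idle workers take the next prefix task from a shared pool, which is what gives the model its load-balancing behaviour.

```python
from concurrent.futures import ThreadPoolExecutor


def mine_frequent_itemsets(transactions, min_support, workers=4):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Illustrative only: Eclat-style tidset intersection, not IFIN+.
    """
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    # Frequent single items in a fixed order; each one becomes a pool task.
    frequent = sorted(i for i, t in tidsets.items() if len(t) >= min_support)

    def mine_prefix(index):
        # One task: every frequent itemset whose first item is frequent[index].
        found = {}

        def extend(prefix, tids, start):
            found[frozenset(prefix)] = len(tids)
            for j in range(start, len(frequent)):
                new_tids = tids & tidsets[frequent[j]]
                if len(new_tids) >= min_support:
                    extend(prefix + [frequent[j]], new_tids, j + 1)

        extend([frequent[index]], tidsets[frequent[index]], index + 1)
        return found

    # Work pool: workers only read the shared tidsets, so tasks are independent.
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(mine_prefix, range(len(frequent))):
            results.update(partial)
    return results
```

Because of the global interpreter lock, threads in CPython show the task structure but do not give real CPU parallelism; the sketch is meant to convey how independent tasks are handed to a pool, not to reproduce the reported performance.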

Keywords

Incremental, Parallel, Frequent Itemsets Mining, Data mining, Big Data, IPPC-Tree, IFIN, IFIN+

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Van Quoc Phuong Huynh (1), Email author
  • Josef Küng (1)
  • Tran Khanh Dang (2)
  1. Institute for Application Oriented Knowledge Processing (FAW), Faculty of Engineering and Natural Sciences (TNF), Johannes Kepler University (JKU), Linz, Austria
  2. Ho Chi Minh City University of Technology, VNUHCM, Ho Chi Minh City, Vietnam
