A fast and low idle time method for mining frequent patterns in distributed and many-task computing environments

  • Chun-Cheng Lin
  • Sheng-Hao Chung
  • Ju-Chin Chen
  • Yuan-Tse Yu
  • Kawuu W. Lin
Article

Abstract

Association rule mining has attracted considerable attention in data mining because it has been applied successfully in various fields to find associations between purchased items by identifying frequent patterns (FPs). Databases today are huge, ranging in size from terabytes to petabytes. Although existing methods can effectively discover FPs and deduce association rules from them, execution efficiency remains a critical problem, particularly for big data. Progressive size working set (PSWS) and parallel FP-growth (PFP) are state-of-the-art methods that apply parallel and distributed computing to reduce mining time in many-task computing, a paradigm that bridges the gap between high-throughput and high-performance computing. However, these methods cannot begin mining until a computing node has obtained a complete FP-tree or the corresponding subdatabase, which leaves computing nodes idle for long periods. We propose a method that can begin mining as soon as a small part of an FP-tree has been received. This reduces the idle time of computing nodes and, in turn, the overall mining time. In an empirical evaluation, the proposed method is shown to be faster than PSWS and PFP.
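
The idle-time argument in the abstract can be illustrated with a toy timing model. The sketch below is not the proposed method, PSWS, or PFP; it only assumes, hypothetically, that an FP-tree reaches a computing node in chunks, that each chunk can be mined once it has arrived, and that transfer and mining costs per chunk are known. The function names and the numbers are illustrative assumptions.

```python
from itertools import accumulate


def finish_time_wait_for_all(transfer, mine):
    """Node starts mining only after every chunk has arrived."""
    return sum(transfer) + sum(mine)


def finish_time_start_early(transfer, mine):
    """Node mines each chunk once it has arrived and the node is free."""
    arrivals = accumulate(transfer)  # arrival time of each chunk
    clock = 0.0
    for arrived, cost in zip(arrivals, mine):
        # Wait for the chunk if it has not arrived yet, then mine it.
        clock = max(clock, arrived) + cost
    return clock


def idle_time(finish, mine):
    """Time up to `finish` during which the node was not mining."""
    return finish - sum(mine)


# Hypothetical costs (arbitrary time units) for four chunks.
transfer = [2.0, 2.0, 2.0, 2.0]
mine = [1.0, 1.0, 1.0, 1.0]

late = finish_time_wait_for_all(transfer, mine)
early = finish_time_start_early(transfer, mine)
print(late, idle_time(late, mine))    # 12.0 8.0
print(early, idle_time(early, mine))  # 9.0 5.0
```

Under these assumed costs, overlapping transfer with mining shortens both the finish time and the idle time; the real gain depends on how the partial FP-tree can actually be mined, which this model does not address.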

Keywords

Distributed mining; Distributed computing; Frequent pattern mining; Many-task computing

Notes

Acknowledgement

This work was supported by the Ministry of Science and Technology of Taiwan, R.O.C., under Grant Nos. MOST 104-2221-E-151-055 and MOST 105-2221-E-151-056.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Industrial Engineering and Management, National Chiao Tung University, Hsinchu, Taiwan
  2. Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
  3. Department of Software Engineering and Management, National Kaohsiung Normal University, Kaohsiung, Taiwan
