Soft Computing, Volume 21, Issue 9, pp 2237–2249

A parallel algorithm for mining constrained frequent patterns using MapReduce

Methodologies and Application

Abstract

A constrained frequent pattern is a frequent pattern generated under constraint conditions given by the user; such patterns are more pertinent, more practical, and more efficient to mine. As datasets grow, shortcomings arise in the construction of the constrained frequent pattern tree, which make the tree difficult to apply to massive datasets. In this paper, a parallel algorithm for mining constrained frequent patterns, called PACFP, is proposed using the MapReduce programming model. First, the key steps of the algorithm (mapping the transactions in a dataset to frequent item support counts, constructing the constrained frequent pattern tree, generating constrained frequent patterns, and aggregating frequent patterns) are implemented by three pairs of Map and Reduce functions. Second, data records are migrated by a data grouping strategy based on frequent item support, which balances the load while constrained frequent patterns are generated. Finally, experimental results on celestial spectrum datasets validate the availability, scalability, and expandability of the algorithm.
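The abstract does not give the details of PACFP, so the following is only a minimal sketch, in plain Python rather than on a real MapReduce cluster, of the two ideas it names: mapping transactions to item support counts, and grouping frequent items by support so that work is spread evenly. The function names, the `min_support` threshold, and the greedy "heaviest item to lightest group" rule are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter, defaultdict
import heapq


def map_item_counts(transaction):
    """Map step (illustrative): emit (item, 1) for each distinct item."""
    for item in set(transaction):
        yield item, 1


def reduce_item_counts(pairs):
    """Reduce step (illustrative): sum the counts emitted for each item."""
    support = Counter()
    for item, count in pairs:
        support[item] += count
    return support


def frequent_items(support, min_support):
    """Keep only items whose support count reaches the threshold."""
    return {item: n for item, n in support.items() if n >= min_support}


def group_by_support(freq, num_groups):
    """Assumed grouping rule: take items in descending support order and
    assign each one to the group with the smallest total support so far."""
    heap = [(0, g) for g in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    groups = defaultdict(list)
    for item, n in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + n, g))
    return dict(groups)


# Toy transactions, invented for this example.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["b", "c"],
    ["a", "e"],
]

pairs = [kv for t in transactions for kv in map_item_counts(t)]
support = reduce_item_counts(pairs)
freq = frequent_items(support, min_support=2)
groups = group_by_support(freq, num_groups=2)
print(groups)
```

In a real deployment each group would presumably be handled by its own reducer, which builds a local tree from the records routed to it; the sketch stops at the grouping step because the abstract gives no further detail.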

Keywords

Association rule; Constrained frequent pattern; MapReduce; Frequent item support; Load balance

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, China
  2. Department of Science and Software Engineering, Auburn University, Auburn, USA
