Abstract
In the past, a novel framework named recent weighted frequent itemset mining (RWFIM) and two projection-based algorithms, RWFIM-P and RWFIM-PE, were proposed to consider both the relative importance of items (item weights) and the recency of patterns. However, the projection-and-test mechanism that these algorithms use to discover recent weighted frequent itemsets (RWFIs) recursively may perform poorly when the database is dense or contains long transactions. To address this issue, this paper proposes an efficient tree-based algorithm, RWFI-Mine, which mines RWFIs by considering both the weight and the recency of patterns. A novel set-enumeration tree called the recent weighted frequent (RWF)-tree is proposed, together with a sorted downward closure property of RWFIs for the RWF-tree. Moreover, two data structures, named the element (E)-table and the recent weighted frequent (RWF)-table, are designed to store the information needed for discovering RWFIs. RWFI-Mine discovers RWFIs recursively without candidate generation, thus reducing the computational cost and memory requirements of mining RWFIs. A second, improved algorithm named RWFI-EMine is further proposed; it adopts the Estimated Weight of 2-itemset Pruning (EW2P) strategy to avoid building E-tables and RWF-tables for unpromising itemsets and their child nodes. Extensive experiments are conducted on several real-world and synthetic datasets to evaluate the performance of the two proposed algorithms and the ratio between the number of generated RWFIs and the number of weighted frequent itemsets (WFIs). Results show that the proposed algorithms outperform not only the traditional PWA algorithm for weighted frequent itemset mining (WFIM), but also the state-of-the-art RWFIM-P and RWFIM-PE algorithms for RWFIM, in terms of runtime, memory usage and scalability.
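As a rough illustration of what an RWFI is, the following minimal sketch enumerates itemsets by brute force. It is not RWFI-Mine or RWFI-EMine: it builds no RWF-tree, E-table or RWF-table and applies no pruning. The abstract does not state the measures, so the definitions used here are assumptions: the weight of an itemset is the average weight of its items, its weighted support is that weight times its support count, the recency of a transaction is `(1 - decay) ** (now - timestamp)`, and the recency of an itemset is the sum over the transactions that contain it. The function name, parameters, and toy data are all hypothetical.

```python
from itertools import combinations


def recent_weighted_frequent_itemsets(db, weights, now, decay, min_wsup, min_recency):
    """Brute-force sketch under assumed definitions (see the lead-in).

    db is a list of (timestamp, set_of_items) pairs.
    min_wsup is a ratio of the database size; min_recency is an absolute value.
    """
    items = sorted({item for _, trans in db for item in trans})
    results = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            cand_set = set(cand)
            # Timestamps of the transactions that contain the candidate.
            hits = [ts for ts, trans in db if cand_set <= trans]
            if not hits:
                continue
            avg_weight = sum(weights[i] for i in cand) / k        # assumed itemset weight
            wsup = avg_weight * len(hits)                          # assumed weighted support
            recency = sum((1 - decay) ** (now - ts) for ts in hits)  # assumed recency
            if wsup >= min_wsup * len(db) and recency >= min_recency:
                results[cand] = (wsup, recency)
    return results


# Toy data (hypothetical): four transactions with timestamps 1..4.
db = [(1, {"a", "b"}), (2, {"a", "c"}), (3, {"a", "b", "c"}), (4, {"b", "c"})]
weights = {"a": 0.8, "b": 0.4, "c": 0.6}
found = recent_weighted_frequent_itemsets(
    db, weights, now=4, decay=0.1, min_wsup=0.28, min_recency=1.65
)
for itemset in sorted(found):
    print(itemset, found[itemset])
```

On this toy data, the itemset `("a", "b")` meets the weighted-support threshold but is rejected because the transactions containing it are too old, which is the kind of pattern the recency constraint is meant to filter out. Exhaustive enumeration like this grows exponentially with the number of items, which is the cost that tree-based search and pruning aim to avoid.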
References
Frequent itemset mining dataset repository. Available: http://fimi.ua.ac.be/data/ (2012)
Agrawal R, Srikant R (1994) Quest synthetic data generator. Available: http://www.Almaden.ibm.com/cs/quest/syndata.html
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases, International Conference on Very Large Data Bases, pp 487–499
Agrawal R, Srikant R (1995) Mining sequential patterns, International Conference on Data Engineering, pp 3–14
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases, The ACM SIGMOD International Conference on Management of Data, pp 207–216
Cai CH, Fu AWC, Cheng CH, Kwong WW (1998) Mining association rules with weighted items, International Database Engineering and Applications Symposium, pp 68–77
Chen MS, Han J, Yu PS (1996) Data mining: An overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Geng L, Hamilton HJ (2006) Interestingness measures for data mining: A survey. ACM Comput Surv 38(3):9
Han J, Lakshmanan L, Ng RT (1999) Constraint-based, multidimensional data mining. Computer 32(8):46–50
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: Current status and future directions. Data Min Knowl Disc 15(1):55–86
Hong TP, Wu YY, Wang SL (2009) An effective mining approach for up-to-date patterns. Expert Systems with Applications 36(6):9747–9752
Lin JCW, Gan W, Fournier-Viger P, Hong TP (2015) RWFIM: Recent weighted-frequent itemsets mining. Eng Appl Artif Intell 45:18–32
Lin JCW, Gan W, Hong TP, Tseng VS (2015) HEWIM: High expected weighted itemset mining in uncertain databases, International Conference on Machine Learning and Cybernetics, pp 439–444
Lan GC, Hong TP, Lee HY (2014) An efficient approach for finding weighted sequential patterns from sequence databases. Appl Intell 41(2):439–452
Lan GC, Hong TP, Lee HY, Lin CW (2013) Mining weighted frequent itemsets, The 30th workshop on Combinatorial Mathematics and Computation Theory, pp 85–89
Lee G, Yun U, Ryu KH (2014) Sliding window based weighted maximal frequent pattern mining over data streams. Expert Systems with Applications 41(2):694–708
Lin JCW, Gan W, Hong TP, Zhang B (2015) An incremental high-utility mining algorithm with transaction insertion, The Scientific World Journal
Lin JCW, Gan W, Fournier-Viger P, Hong TP (2015) Mining weighted frequent itemsets with the recency constraint, Asia-Pacific Web Conference, pp 635–646
Lin JCW, Gan W, Fournier-Viger P, Hong TP (2016) Efficient mining of weighted frequent itemsets in uncertain databases, Machine Learning and Data Mining in Pattern Recognition, pp 236–250
Lin JCW, Gan W, Fournier-Viger P, Hong TP (2016) Efficient algorithms for mining recent weighted frequent itemsets in temporal transactional databases, The 31st Annual ACM Symposium on Applied Computing, pp 861–866
Microsoft. Example database foodmart of microsoft analysis services. Available: http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx
Pasquier N, Bastide Y, Taouil R, Lakhal L (1998) Pruning closed itemset lattices for association rules, International Conference on Advanced Databases, pp 177–196
Ng RT, Lakshmanan L, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained association rules. ACM SIGMOD Rec 27(2):13–24
Fournier-Viger P, Nkambou R, Tseng VS (2011) RuleGrowth: Mining sequential rules common to several sequences by pattern-growth, ACM Symposium on Applied Computing, pp 956–961
Fournier-Viger P, Faghihi U, Nkambou R, Nguifo EM (2012) CMRules: Mining sequential rules common to several sequences. Knowl-Based Syst 25(1):63–76
Pei J, Han J (2002) Constrained frequent pattern mining: A pattern-growth view. ACM SIGKDD Explorations Newsletter 4(1):31–39
Rymon R (1992) Search through systematic set enumeration, International Conference Principles of Knowledge Representation and Reasoning, pp 539–550
Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements, The International Conference on Extending Database Technology: Advances in Database Technology, pp 3–17
Sun K, Bai F (2008) Mining weighted association rules without preassigned weights. IEEE Trans Knowl Data Eng 20(4):489–495
Tao F, Murtagh F, Farid M (2003) Weighted association rule mining using weighted support and significance framework, The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 661–666
Tseng VS, Wu CW, Shie BE, Yu PS (2010) UP-Growth: An efficient algorithm for high utility itemset mining, The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 253–262
Vo B, Coenen F, Le B (2013) A new method for mining frequent weighted itemsets based on wit-trees. Expert Systems with Applications 40(4):1256–1264
Wang W, Yang J, Yu PS (2000) Efficient mining of weighted association rules (WAR), The 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 270–274
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases, The SIAM International Conference on Data Mining, pp 211–225
Yun U, Leggett J (2005) WFIM: Weighted frequent itemset mining with a weight range and a minimum weight, SIAM International Conference on Data Mining, pp 636–640
Yun U, Leggett J (2006) WSpan: Weighted sequential pattern mining in large sequential database, IEEE International Conference on Intelligent Systems, pp 512–517
Acknowledgments
This research was partially supported by the National Natural Science Foundation of China (NSFC) under grant No. 61503092 and by the Tencent Project under grant CCF-Tencent IAGR20160115.
Cite this article
Lin, J.CW., Gan, W., Fournier-Viger, P. et al. Efficiently mining frequent itemsets with weight and recency constraints. Appl Intell 47, 769–792 (2017). https://doi.org/10.1007/s10489-017-0915-2
DOI: https://doi.org/10.1007/s10489-017-0915-2