Abstract
Mining frequent sequences is a critical step that precedes rule generation in sequence databases. There are currently two main approaches to mining frequent sequences: intra-sequence mining and inter-sequence mining. Inter-sequence mining is the more attractive of the two because it considers the relationships between sequences across transactions. However, mining all frequent inter-sequences takes a long time and requires a lot of memory. Mining only the frequent closed inter-sequences is more efficient because closed sequences are compact and retain only the necessary information. CISP-Miner was proposed for mining frequent closed inter-sequence patterns, but it consumes a lot of memory because many closed patterns are mined. This paper proposes an algorithm called ClosedISP for mining frequent closed inter-sequence patterns. The algorithm uses a checking scheme for early elimination and for checking closed patterns without candidate maintenance. ClosedISP compresses the data with a dynamic bit vector that incorporates transaction information. In addition, ClosedISP adopts a prefix tree and a depth-first search order to reduce the search space and to generate non-redundant sequential rules efficiently. Experiments comparing ClosedISP with CISP-Miner demonstrate its effectiveness in terms of runtime and memory usage.
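To make the dynamic bit vector idea concrete, the following is a minimal Python sketch, not the ClosedISP implementation. It assumes a layout in the spirit of DBV-Miner: one bit per transaction, with leading and trailing zero bytes dropped and only the offset of the first non-zero byte stored. The class name `DynamicBitVector` and its methods are hypothetical, and how ClosedISP encodes positions inside a transaction is not specified in the abstract, so it is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicBitVector:
    """Bit vector stored without its leading and trailing zero bytes (assumed layout)."""
    start: int   # index of the first non-zero byte in the full vector
    data: bytes  # bytes from the first to the last non-zero byte

    @classmethod
    def from_ids(cls, ids):
        """Set bit i for every transaction id i in ids."""
        ids = sorted(set(ids))
        if not ids:
            return cls(0, b"")
        full = bytearray(ids[-1] // 8 + 1)
        for i in ids:
            full[i // 8] |= 1 << (i % 8)
        start = ids[0] // 8
        return cls(start, bytes(full[start:]))

    def intersect(self, other):
        """AND two vectors, touching only the byte range they share."""
        lo = max(self.start, other.start)
        hi = min(self.start + len(self.data), other.start + len(other.data))
        if lo >= hi:
            return DynamicBitVector(0, b"")
        merged = bytes(
            self.data[k - self.start] & other.data[k - other.start]
            for k in range(lo, hi)
        )
        # The AND may create new zero bytes at either end; trim them again.
        head = 0
        while head < len(merged) and merged[head] == 0:
            head += 1
        if head == len(merged):
            return DynamicBitVector(0, b"")
        tail = len(merged)
        while merged[tail - 1] == 0:
            tail -= 1
        return DynamicBitVector(lo + head, merged[head:tail])

    def support(self):
        """Number of set bits, i.e. transactions that contain the pattern."""
        return sum(bin(byte).count("1") for byte in self.data)

    def ids(self):
        """Transaction ids whose bits are set, in increasing order."""
        return [
            (self.start + k) * 8 + bit
            for k, byte in enumerate(self.data)
            for bit in range(8)
            if (byte >> bit) & 1
        ]


a = DynamicBitVector.from_ids([70, 71, 90])
b = DynamicBitVector.from_ids([3, 71, 90, 200])
both = a.intersect(b)
print(both.ids(), both.support())
```

In this sketch the intersection scans only the four bytes where `a` and `b` overlap, although `b` spans 26 bytes in full. That is the kind of saving a trimmed representation is meant to give when the transactions supporting a pattern are clustered.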
Acknowledgments
This work was funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant no. 102.05-2013.20.
Cite this article
Le, B., Tran, MT. & Vo, B. Mining frequent closed inter-sequence patterns efficiently using dynamic bit vectors. Appl Intell 43, 74–84 (2015). https://doi.org/10.1007/s10489-014-0630-1
DOI: https://doi.org/10.1007/s10489-014-0630-1