Fast Discovery of Time-Constrained Sequential Patterns Using Time-Indexes
Sequential pattern mining finds all the frequent sub-sequences in a sequence database. To obtain more accurate results, constraints in addition to the support threshold need to be specified in the mining. Time constraints cannot be handled simply by retrieving patterns after mining, because computing the support of a pattern requires validating the time attributes of every data sequence during the mining process. In this paper, we propose a memory time-indexing approach (called METISP) to discover sequential patterns with time constraints, including minimum/maximum/exact gaps, sliding window, and duration. METISP loads the database into memory and constructs time-index sets for effective processing. Using the index sets and the pattern-growth strategy, METISP efficiently mines the desired patterns without generating candidates or sub-databases. Comprehensive experiments show that METISP outperforms GSP and DELISP in the discovery of time-constrained sequential patterns, even with low support thresholds and very large databases.
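To illustrate why support counting must inspect the time attributes of each data sequence, the following is a minimal brute-force sketch in Python. It is not METISP and uses no time-index sets. The names `supports` and `support_count`, the restriction to single-item pattern elements, the inclusive gap bounds, and the omission of the sliding-window and exact-gap constraints are all assumptions made for this illustration.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Assumed representation: a data sequence is a time-ordered list of
# (timestamp, itemset) pairs.
Event = Tuple[int, Set[str]]


def supports(seq: Sequence[Event], pattern: Sequence[str],
             min_gap: int = 0, max_gap: Optional[int] = None,
             duration: Optional[int] = None) -> bool:
    """Return True if `seq` contains `pattern` under the time constraints.

    Assumed conventions (hypothetical, for illustration only):
      - consecutive matched elements must be strictly increasing in time;
      - min_gap <= gap <= max_gap between consecutive matched elements;
      - last matched time - first matched time <= duration.
    """
    def extend(k: int, prev_t: Optional[int], first_t: Optional[int]) -> bool:
        if k == len(pattern):
            return True
        # Try every occurrence, not just the earliest: with a maximum gap,
        # a later occurrence of an item may succeed where an earlier one fails.
        for t, items in seq:
            if pattern[k] not in items:
                continue
            if prev_t is not None:
                gap = t - prev_t
                if gap <= 0 or gap < min_gap:
                    continue
                if max_gap is not None and gap > max_gap:
                    continue
                if duration is not None and t - first_t > duration:
                    continue
            if extend(k + 1, t, t if first_t is None else first_t):
                return True
        return False

    return extend(0, None, None)


def support_count(db: List[Sequence[Event]], pattern: Sequence[str],
                  **constraints) -> int:
    """Count the data sequences in `db` that support `pattern`."""
    return sum(supports(seq, pattern, **constraints) for seq in db)


db = [
    [(1, {"a"}), (3, {"b"}), (10, {"c"})],
    [(1, {"a"}), (2, {"a", "b"}), (4, {"c"})],
    [(5, {"b"}), (6, {"a"})],
]
unconstrained = support_count(db, ["a", "b", "c"])
with_max_gap = support_count(db, ["a", "b", "c"], max_gap=3)
```

In this toy database the pattern a, b, c is contained in two sequences when no constraint is given, but only one of them satisfies a maximum gap of 3. The difference cannot be recovered from the unconstrained pattern and its support alone, which is why the timestamps in each data sequence have to be checked while the support is computed.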
Keywords: Sequential Pattern, Time Index, Minimum Support, Frequent Item, Support Threshold