Mining sequential patterns from data streams: a centroid approach
In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and must be processed in linear time, no blocking operator can be used, and the data can be examined only once. So far, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatorial explosion inherent in sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the one-scan constraint, we propose a greedy clustering algorithm combined with an alignment method. We show that our proposal is able to extract relevant sequences with very low thresholds.
Keywords: Data streams, Sequential patterns, Web usage mining, Clustering, Sequence alignment
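The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of one-scan greedy clustering of navigation sequences, not the authors' method. Each incoming sequence is compared once against one representative per cluster and either joins the most similar cluster or starts a new one. The function names `similarity` and `greedy_cluster`, the threshold value 0.6, and the sample sessions are all hypothetical. Python's `difflib.SequenceMatcher` stands in for the alignment method, and the first sequence of each cluster stands in for an aligned centroid.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two page sequences.

    Assumption: difflib's matching-block ratio replaces a real
    sequence alignment score.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(stream, threshold=0.6):
    """Assign each sequence to a cluster in a single pass over the stream.

    Assumption: a cluster is represented by its first sequence rather
    than by a centroid computed through alignment.
    """
    clusters = []  # each cluster: {"rep": sequence, "members": [sequences]}
    for seq in stream:
        best, best_score = None, threshold
        for cluster in clusters:
            score = similarity(cluster["rep"], seq)
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            # No representative is similar enough: open a new cluster.
            clusters.append({"rep": seq, "members": [seq]})
        else:
            best["members"].append(seq)
    return clusters


# Hypothetical Web usage sessions, each a list of visited pages.
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "sports", "scores"],
    ["login", "cart", "checkout"],
    ["login", "cart", "pay", "checkout"],
]
clusters = greedy_cluster(sessions)
```

Each sequence is read exactly once and never revisited, which is what makes a greedy assignment compatible with the one-scan constraint. The cost per sequence grows with the number of clusters, so a real stream setting would also need to bound or expire clusters to respect the memory restriction.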