Journal of Intelligent Information Systems

Volume 27, Issue 3, pp 291–307

Mining sequential patterns from data streams: a centroid approach

  • Alice Marascu
  • Florent Masseglia
Article

Abstract

In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and must be processed in linear time, no blocking operator can be used, and the data can be examined only once. So far, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatorial explosion inherent in sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the one-scan constraint, we propose a greedy clustering algorithm combined with an alignment method. We show that our proposal is able to extract relevant sequences with very low thresholds.
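To make the one-scan idea concrete, the following is a minimal sketch of greedy single-pass clustering of navigation sequences. It is not the authors' algorithm: the similarity measure (longest common subsequence length divided by the longer sequence's length), the `threshold` value, the use of each cluster's first member as its representative, and the names `lcs_length`, `similarity` and `greedy_cluster` are all assumptions made for illustration, standing in for the alignment method described in the paper.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed similarity: LCS length normalised by the longer sequence."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def greedy_cluster(batch: List[List[str]], threshold: float = 0.5) -> List[List[List[str]]]:
    """Assign each sequence to a cluster in a single pass over the batch.

    Each sequence is compared with the first member of every existing
    cluster (a simplification of a cluster representative). It joins the
    most similar cluster if the similarity reaches `threshold`; otherwise
    it starts a new cluster. No sequence is revisited.
    """
    clusters: List[List[List[str]]] = []
    for seq in batch:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = similarity(seq, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(seq)
        else:
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    # Hypothetical page-visit sequences from one batch of a Web usage stream.
    batch = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["x", "y", "z"], ["a", "c"]]
    for cluster in greedy_cluster(batch):
        print(cluster)
```

Because every sequence is assigned once and never moved, the result depends on arrival order; that is the price of respecting the single-scan constraint. In a fuller version, each cluster's sequences would then be aligned to produce one approximate pattern per cluster.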

Keywords

Data streams, Sequential patterns, Web usage mining, Clustering, Sequences alignment

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. INRIA Sophia Antipolis, Sophia Antipolis, France
