Abstract
Mining frequent patterns on streaming data is a new and challenging problem for the data mining community, since data arrives sequentially in the form of continuous, rapid streams. In this paper, we propose a new approach for mining itemsets. Our approach has the following advantages: an efficient representation of items and a novel data structure to maintain frequent patterns, coupled with a fast pruning strategy. At any time, users can issue requests for frequent itemsets over an arbitrary time interval. Furthermore, our approach produces an approximate answer with an assurance that it will not bypass user-defined frequency and temporal thresholds. Finally, the proposed method is evaluated through a series of experiments on different datasets.
Cite this article
Raïssi, C., Poncelet, P. & Teisseire, M. Towards a new approach for mining frequent itemsets on data stream. J Intell Inf Syst 28, 23–36 (2007). https://doi.org/10.1007/s10844-006-0002-3