Towards a new approach for mining frequent itemsets on data stream
Mining frequent patterns on streaming data is a challenging new problem for the data mining community, because data arrives sequentially in the form of continuous, rapid streams. In this paper we propose a new approach for mining frequent itemsets. Our approach offers the following advantages: an efficient representation of items, and a novel data structure for maintaining frequent patterns coupled with a fast pruning strategy. At any time, users can request the frequent itemsets over an arbitrary time interval. Furthermore, our approach produces an approximate answer with the assurance that it does not violate the user-defined frequency and temporal thresholds. Finally, the proposed method is evaluated through a series of experiments on different datasets.
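The abstract does not describe the paper's data structure, so the following is only an illustrative sketch under stated assumptions. It is not the authors' method. It shows how per-period pruning, arbitrary-interval queries and an approximation guarantee can coexist. The stream is assumed to arrive in batches, one per time period. Itemset counts (capped at a hypothetical `max_size`, since enumerating every subset is exponential) are stored per batch. Any count below ε·n in a batch of n transactions is pruned, in the spirit of Manku and Motwani's Lossy Counting. The class `BatchedItemsetStore` and the parameters `epsilon` and `sigma` are hypothetical names, and a "time interval" is simplified to a range of batch indices.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


class BatchedItemsetStore:
    """Hypothetical per-batch itemset counts with epsilon pruning (illustrative only)."""

    def __init__(self, epsilon: float, max_size: int = 2) -> None:
        self.epsilon = epsilon    # per-batch pruning ratio (assumed parameter)
        self.max_size = max_size  # largest itemset size tracked (assumed cap)
        # One entry per batch: (number of transactions, surviving itemset counts).
        self.batches: List[Tuple[int, Dict[Itemset, int]]] = []

    def add_batch(self, transactions: Iterable[Iterable[str]]) -> None:
        counts: Counter = Counter()
        n = 0
        for transaction in transactions:
            n += 1
            items = sorted(set(transaction))
            # Count every itemset of size 1..max_size contained in this transaction.
            for k in range(1, self.max_size + 1):
                for combo in combinations(items, k):
                    counts[frozenset(combo)] += 1
        # Prune itemsets below epsilon * n: each pruned count is < epsilon * n,
        # so this batch undercounts any itemset by less than epsilon * n.
        kept = {s: c for s, c in counts.items() if c >= self.epsilon * n}
        self.batches.append((n, kept))

    def query(self, start: int, end: int, sigma: float) -> Dict[Itemset, int]:
        """Approximate frequent itemsets over batches start..end (inclusive).

        Estimates never exceed true counts and undercount by less than
        epsilon * N in total, so returning every itemset whose estimate is at
        least (sigma - epsilon) * N misses no itemset with true support >= sigma * N.
        """
        window = self.batches[start:end + 1]
        total = sum(n for n, _ in window)
        estimate: Counter = Counter()
        for _, kept in window:
            estimate.update(kept)  # Counter.update adds counts
        threshold = (sigma - self.epsilon) * total
        return {s: c for s, c in estimate.items() if c >= threshold}
```

For example, after `store.add_batch([["a", "b"], ["a", "c"], ["a", "b"]])` and `store.add_batch([["b", "c"], ["a", "b"]])` with `epsilon=0.1`, the call `store.query(0, 1, sigma=0.6)` returns `{a}`, `{b}` and `{a, b}` with counts 4, 4 and 3. The trade-off is the usual one for approximate stream mining. A larger `epsilon` prunes more aggressively and saves memory, but it admits more false positives between (σ − ε)·N and σ·N.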
Keywords: Data streams · Frequent itemsets · Approximate answer