A Data Mining Algorithm of Frequent Pattern for Data Flow Based on Landmark Window
Data flows are large in volume and must be processed in real time. This article adopts the landmark window model, which avoids the information loss of the sliding window and decaying window models, and proposes a representation of the data flow based on transaction two-tuples. It introduces the concept of the data flow base and obtains the transaction two-tuples in a single real-time scan of the data flow. However large the data flow grows, the number of transaction two-tuples does not exceed the data flow base. If the value ranges of the data flow's attributes are reasonably distributed, all of the two-tuples fit in memory, where they are stored in a hash table. The scheme speeds up data mining without losing the basic information of the data flow, and offers a degree of practicality and reliability.
Keywords: Data Stream; Association Rule; Hash Table; Frequent Itemsets; Data Mining Algorithm
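The abstract does not define the transaction two-tuple or the data flow base precisely, so the following is only a minimal sketch under stated assumptions: a two-tuple is read here as a (transaction, count) pair, and the number of hash table entries is bounded by the number of distinct transactions that can occur. The class name `LandmarkTwoTupleStore` and its methods are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import combinations


class LandmarkTwoTupleStore:
    """Hypothetical store of (transaction, count) two-tuples since the landmark.

    Assumption: identical transactions collapse into one hash table entry,
    so memory depends on the number of distinct transactions, not on the
    length of the stream.
    """

    def __init__(self):
        self.counts = Counter()  # hash table: transaction -> occurrence count
        self.total = 0           # transactions seen since the landmark

    def add(self, transaction):
        """Process one arriving transaction (single pass, nothing discarded)."""
        self.counts[frozenset(transaction)] += 1
        self.total += 1

    def support(self, itemset):
        """Fraction of all transactions since the landmark containing itemset."""
        target = frozenset(itemset)
        hits = sum(c for t, c in self.counts.items() if target <= t)
        return hits / self.total if self.total else 0.0

    def frequent_itemsets(self, min_support, max_size=2):
        """Brute-force enumeration over stored two-tuples (illustrative only)."""
        items = sorted(set().union(*self.counts))
        result = {}
        for k in range(1, max_size + 1):
            for candidate in combinations(items, k):
                s = self.support(candidate)
                if s >= min_support:
                    result[candidate] = s
        return result


store = LandmarkTwoTupleStore()
for tx in [("a", "b"), ("b", "a"), ("a", "c"), ("a", "b", "c"), ("b",)]:
    store.add(tx)

print(len(store.counts), "two-tuples for", store.total, "transactions")
print(store.frequent_itemsets(min_support=0.5))
```

Because counts are kept from the landmark onward and never decayed or evicted, supports computed this way are exact over the whole history, which is the property the abstract contrasts with sliding and decaying windows. The candidate enumeration is deliberately naive and would need pruning for realistic item counts.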