False-Negative Frequent Items Mining from Data Streams with Bursting
False-negative frequent items mining from a high speed transactional data stream is to find an approximate set of frequent items with respect to a minimum support threshold, s. It controls the possibility of missing frequent items using a reliability parameter δ. The importance of false-negative frequent items mining is that it can exclude false-positives and therefore significantly reduce the memory consumption for frequent itemsets mining. The key issue of false-negative frequent items mining is how to minimize the possibility of missing frequent items. In this paper, we propose a new false-negative frequent items mining algorithm, called Loss-Negative, for handling bursting in data streams. The new algorithm consumes the smallest memory in comparison with other false-negative and false-positive frequent items algorithms. We present theoretical bound of the new algorithm, and analyze the possibility of minimization of missing frequent items, in terms of two possibilities, namely, in-possibility and out-possibility. The former is about how a frequent item can possibly pass the first pruning. The latter is about how long a frequent item can stay in memory while no occurrences of the item comes in the following data stream for a certain period. The new proposed algorithm is superior to the existing false-negative frequent items mining algorithms in terms of the two possibilities. We demonstrate the effectiveness of the new algorithm in this paper.
Unable to display preview. Download preview PDF.
- 1.Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Proc. of the 29th ICALP (2002)Google Scholar
- 2.Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: Tracking most frequent items dynamically. In: Proc. of PODS 2003 (2003)Google Scholar
- 3.Demaine, E., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Proc. of 10th Annual European Symposium on Algorithms (2002)Google Scholar
- 4.Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proc. of VLDB 2002 (2002)Google Scholar
- 6.Yu, J.X., Chong, Z., Lu, H., Zhou, A.: False positive or false negative: Mining frequent itemsets from high speed transactional data streams. In: Proc. of VLDB 2004 (2004)Google Scholar