False-Negative Frequent Items Mining from Data Streams with Bursting

  • Zhihong Chong
  • Jeffrey Xu Yu
  • Hongjun Lu
  • Zhengjie Zhang
  • Aoying Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3453)

Abstract

False-negative frequent items mining from a high-speed transactional data stream finds an approximate set of frequent items with respect to a minimum support threshold s, and controls the possibility of missing frequent items through a reliability parameter δ. Its importance is that it excludes false positives and can therefore significantly reduce the memory consumed by frequent itemsets mining. The key issue is how to minimize the possibility of missing frequent items. In this paper, we propose a new false-negative frequent items mining algorithm, called Loss-Negative, for handling bursting in data streams. The new algorithm consumes the least memory in comparison with other false-negative and false-positive frequent items algorithms. We present a theoretical bound for the new algorithm and analyze how the possibility of missing frequent items can be minimized in terms of two possibilities, namely the in-possibility and the out-possibility. The former concerns how likely a frequent item is to pass the first pruning. The latter concerns how long a frequent item can stay in memory when no occurrences of the item arrive in the subsequent data stream for a certain period. The proposed algorithm is superior to existing false-negative frequent items mining algorithms in terms of both possibilities. We demonstrate the effectiveness of the new algorithm.
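
The false-negative setting described in the abstract can be illustrated with a minimal sketch. This is not the Loss-Negative algorithm, whose details are not given here. The function name, the `prune_every` parameter, and the particular slack formula `eps` are assumptions chosen for illustration, and no δ guarantee is claimed for this code. The one property that does hold by construction is that retained counts never exceed true counts, so any item reported with a count of at least s·n is truly frequent (no false positives), while an item pruned early may be missed.

```python
import math


def false_negative_frequent_items(stream, s, delta, prune_every=1000):
    """Illustrative false-negative frequent items pass over a stream.

    s           -- minimum support threshold (fraction of the stream)
    delta       -- reliability parameter, used only in the assumed slack below
    prune_every -- hypothetical pruning interval, in stream elements
    """
    counts = {}  # item -> retained count (a lower bound on the true count)
    n = 0        # number of stream elements seen so far
    for item in stream:
        n += 1
        counts[item] = counts.get(item, 0) + 1
        if n % prune_every == 0:
            # Assumed slack that shrinks as n grows; not taken from the paper.
            eps = math.sqrt(2 * s * math.log(2 / delta) / n)
            threshold = (s - eps) * n
            # Pruning discards the count, so a later burst restarts from zero.
            counts = {k: c for k, c in counts.items() if c >= threshold}
    # Report only items whose retained count alone reaches the support.
    return {k: c for k, c in counts.items() if c >= s * n}
```

In this sketch, an item that occurs rarely before the first pruning and then arrives in a burst loses its early occurrences, which is how a truly frequent item can become a false negative. How easily an item survives the first pruning and how long it is kept without new occurrences correspond loosely to the in-possibility and out-possibility discussed in the abstract.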

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Zhihong Chong (1)
  • Jeffrey Xu Yu (2)
  • Hongjun Lu (3)
  • Zhengjie Zhang (1)
  • Aoying Zhou (1)

  1. Fudan University, China
  2. Chinese University of Hong Kong, China
  3. Hong Kong University of Science and Technology, China
