Maintaining Frequent Itemsets over High-Speed Data Streams

  • James Cheng
  • Yiping Ke
  • Wilfred Ng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)

Abstract

We propose a false-negative approach to approximate the set of frequent itemsets (FIs) over a sliding window. Existing approximate algorithms use an error parameter, ε, to control the accuracy of the mining result. However, the use of ε leads to a dilemma. A smaller ε gives a more accurate mining result but higher computational complexity, while increasing ε degrades the mining accuracy. We address this dilemma by introducing a progressively increasing minimum support function. When an itemset is retained in the window longer, we require its minimum support to approach the minimum support of an FI. Thus, the number of potential FIs to be maintained is greatly reduced. Our experiments show that our algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than do existing algorithms for mining FIs over a sliding window.
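The idea of a progressively increasing minimum support can be sketched as follows. This is a minimal illustration only, not the function defined in the paper: the linear ramp, the relaxation ratio `r`, the window length `w` (in units), and the names `progressive_min_support` and `keep_itemset` are all assumptions introduced here. The sketch shows a relative support threshold that starts low for a newly seen itemset and reaches the full minimum support σ once the itemset has been retained for the whole window.

```python
def progressive_min_support(k: int, w: int, sigma: float, r: float) -> float:
    """Hypothetical relative support threshold for an itemset retained
    for k of the w units in the sliding window.

    Assumption: the threshold ramps linearly from r * sigma (k = 1)
    up to sigma (k = w). The paper's actual function may differ.
    """
    if not 1 <= k <= w:
        raise ValueError("k must satisfy 1 <= k <= w")
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    if k == w:
        # An itemset kept for the full window must meet the FI threshold.
        return sigma
    ratio = r + (1.0 - r) * (k - 1) / (w - 1)
    return ratio * sigma


def keep_itemset(count: int, transactions_seen: int, k: int, w: int,
                 sigma: float, r: float) -> bool:
    """Decide whether a potential FI stays in the maintained set.

    count: occurrences of the itemset in the k units it has been tracked.
    transactions_seen: number of transactions in those k units.
    """
    support = count / transactions_seen
    return support >= progressive_min_support(k, w, sigma, r)
```

With, say, `sigma = 0.1`, `r = 0.2`, and `w = 5`, a new itemset only needs a relative support of about 0.02 to be tracked, but one that has been held for all five units must reach 0.1. Itemsets that stay infrequent are therefore pruned earlier than they would be under a fixed relaxed threshold, which is the source of the reduction in maintained candidates that the abstract describes.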

Keywords

Data Stream; Minimum Support; Frequent Itemset; Memory Consumption; Frequent Itemset Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • James Cheng (1)
  • Yiping Ke (1)
  • Wilfred Ng (1)
  1. Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
