Advertisement

Journal of Intelligent Information Systems

, Volume 31, Issue 3, pp 191–215 | Cite as

Maintaining frequent closed itemsets over a sliding window

  • James Cheng
  • Yiping Ke
  • Wilfred Ng
Article

Abstract

In this paper, we study the incremental update of Frequent Closed Itemsets (FCIs) over a sliding window in a high-speed data stream. We propose the notion of semi-FCIs, which is to progressively increase the minimum support threshold for an itemset as it is retained longer in the window, thereby drastically reducing the number of itemsets that need to be maintained and processed. We explore the properties of semi-FCIs and observe that a majority of the subsets of a semi-FCI are not semi-FCIs and need not be updated. This finding allows us to devise an efficient algorithm, IncMine, that incrementally updates the set of semi-FCIs over a sliding window. We also develop an inverted index to facilitate the update process. Our empirical results show that IncMine achieves significantly higher throughput and consumes less memory than the state-of-the-art streaming algorithms for mining FCIs and FIs. IncMine also attains high accuracy of 100% precision and over 93% recall.

Keywords

Frequent Closed Itemset Data stream mining Sliding window 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R., Imielinski, T., & Swami, A. N. (1993). Mining association rules between sets of items in large databases. In SIGMOD, (pp. 207–216).Google Scholar
  2. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In VLDB, (pp. 487–499).Google Scholar
  3. Chang, J. H., & Lee, W. S. (2003). Finding recent frequent itemsets adaptively over online data streams. In KDD, (pp. 487–492).Google Scholar
  4. Chang, J. H., & Lee, W. S. (2004). A sliding window method for finding recently frequent itemsets over online data streams. Journal of Information Science and Engineering, 20(4), 753–762.Google Scholar
  5. Chen, Y., Dong, G., Han, J., Wah, B. W., & Wang, J. (2002). Multi-dimensional regression analysis of time-series data streams. In VLDB, (pp. 323–334).Google Scholar
  6. Chi, Y., Wang, H., Yu, P. S., & Muntz, R. R. (2004). Moment: Maintaining closed frequent itemsets over a stream sliding window. In ICDM, (pp. 59–66).Google Scholar
  7. FIMI Dataset Repository (2003). Frequent itemset mining dataset repository. http://fimi.cs.helsinki.fi/data/.
  8. Garofalakis, M. N., Gehrke, J., & Rastogi, R. (2002). Querying and mining data streams: You only get one look a tutorial. In SIGMOD, (p. 63).Google Scholar
  9. Giannella, C., Han, J., Pei, J., Yan, X., & Yu, P. S. (2004). Mining frequent patterns in data streams at multiple time granularities. Cambridge, MA: MIT Press.Google Scholar
  10. Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. In SIGMOD, (pp. 1–12).Google Scholar
  11. IBM Quest (1996). Ibm quest data mining project. frequent itemset mining dataset repository. http://www.almaden.ibm.com/software/quest/.
  12. Jiang, N. & Gruenwald, L. (2006). CFI-Stream: Mining closed frequent itemsets in data streams. In KDD, (pp. 592–597).Google Scholar
  13. Kifer, D., Ben-David, S., & Gehrke, J. (2004). Detecting change in data streams. In VLDB, (pp. 180–191).Google Scholar
  14. Lee, C.-H., Lin, C.-R., & Chen, M.-S. (2001). Sliding-window filtering: An efficient algorithm for incremental mining. In CIKM, (pp. 263–270).Google Scholar
  15. Li, H., Lee, S., & Shan, M. (2004). Algorithm for mining frequent itemsets over the entire history of data streams. In Proc. of First International Workshop on Knowledge Discovery in Data Streams.Google Scholar
  16. Manku, G. S. & Motwani, R. (2002). Approximate frequency counts over data streams. In VLDB, pages 346–357.Google Scholar
  17. Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In ICDT, (pp. 398–416).Google Scholar
  18. Pei, J., Han, J., & Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, (pp. 21–30).Google Scholar
  19. Wang, J., Han, J., & Pei, J. (2003). CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In KDD, (pp. 236–245).Google Scholar
  20. Yu, J. X., Chong, Z., Lu, H., & Zhou, A. (2004). False positive or false negative: Mining frequent itemsets from high speed transactional data streams. In VLDB, (pp. 204–215).Google Scholar
  21. Zaki, M. J. (2000). Generating non-redundant association rules. In KDD, (pp. 34–43).Google Scholar
  22. Zaki, M. J., & Hsiao, C.-J. (2002). Charm: An efficient algorithm for closed itemset mining. In SDM.Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. 1.Department of Computer Science and EngineeringThe Hong Kong University of Science and TechnologyClear Water BayHong Kong

Personalised recommendations