Maintaining frequent closed itemsets over a sliding window
In this paper, we study the incremental update of Frequent Closed Itemsets (FCIs) over a sliding window in a high-speed data stream. We propose the notion of semi-FCIs, under which the minimum support threshold for an itemset increases progressively the longer the itemset is retained in the window, thereby drastically reducing the number of itemsets that need to be maintained and processed. We explore the properties of semi-FCIs and observe that a majority of the subsets of a semi-FCI are not themselves semi-FCIs and therefore need not be updated. This finding allows us to devise an efficient algorithm, IncMine, that incrementally updates the set of semi-FCIs over a sliding window. We also develop an inverted index to facilitate the update process. Our empirical results show that IncMine achieves significantly higher throughput and consumes less memory than the state-of-the-art streaming algorithms for mining FCIs and Frequent Itemsets (FIs). IncMine also attains 100% precision and over 93% recall.
Keywords: Frequent closed itemset, data stream mining, sliding window
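The semi-FCI idea in the abstract, a support threshold that tightens the longer an itemset stays in the window, can be illustrated with a minimal sketch. The abstract does not give the threshold schedule, so everything below is an assumption made for illustration: the linear ramp from a relaxation ratio `r` up to 1, the fixed `batch_size`, and the names `progressive_min_support` and `prune` are hypothetical and are not taken from IncMine itself.

```python
from fractions import Fraction
from math import ceil


def progressive_min_support(k, window_size, sigma, r, batch_size):
    """Support count an itemset needs after being retained for k batches.

    Assumption (not from the source): the relaxation ratio rises linearly
    from r (at k = 1) to 1 (at k = window_size).
    """
    if not 1 <= k <= window_size:
        raise ValueError("k must be between 1 and window_size")
    if window_size == 1:
        ratio = Fraction(1)
    else:
        ratio = r + (1 - r) * Fraction(k - 1, window_size - 1)
    # Exact rational arithmetic avoids float rounding before the ceiling.
    return ceil(ratio * sigma * k * batch_size)


def prune(candidates, window_size, sigma, r, batch_size):
    """Keep only itemsets whose support meets the threshold for their age.

    candidates maps an itemset (frozenset) to (support_count, k), where
    support_count is accumulated over the k most recent batches.
    """
    return {
        itemset: (count, k)
        for itemset, (count, k) in candidates.items()
        if count >= progressive_min_support(k, window_size, sigma, r, batch_size)
    }


if __name__ == "__main__":
    # Hypothetical parameters: 4 batches of 100 transactions, 10% minimum
    # support, and an initial relaxation ratio of 1/2.
    params = dict(window_size=4, sigma=Fraction(1, 10), r=Fraction(1, 2), batch_size=100)
    for k in range(1, 5):
        print(k, progressive_min_support(k, **params))
```

Under these assumed parameters, a newly seen itemset is kept with a relaxed count, while one that has survived the full window must meet the unrelaxed minimum support. Calling `prune` after each batch then discards itemsets that fail the threshold for their age, which is the mechanism the abstract credits with reducing the number of itemsets to maintain.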