Mining top-k frequent patterns in the presence of the memory constraint

  • Regular Paper
The VLDB Journal

Abstract

This paper explores a practically interesting mining task: retrieving the top-k (closed) itemsets under a memory constraint. In contrast to most previous work, which concentrates on improving mining efficiency or on reducing memory use on a best-effort basis, we make a first attempt to specify an upper bound on the memory that mining frequent itemsets may use. To comply with this upper bound, we devise two efficient algorithms, MTK and MTK_Close, for mining frequent itemsets and closed itemsets, respectively, without requiring the user to specify a subtle minimum support. Instead, users only give a more human-understandable parameter: the desired number k of frequent (closed) itemsets. In practice, it is quite challenging to constrain memory consumption while still retrieving the top-k itemsets efficiently. To achieve this, MTK and MTK_Close are designed as level-wise search algorithms in which the number of candidates generated and tested in each database scan is limited. A novel search approach, called δ-stair search, is used in MTK and MTK_Close to allocate the available memory effectively among candidate itemsets of various lengths, which leads to a small number of required database scans. As demonstrated in the empirical study on real and synthetic data, rather than merely offering a trade-off between execution efficiency and memory consumption, MTK and MTK_Close achieve both high efficiency and a constrained memory bound, showing their prominent advantage as practical algorithms for mining frequent patterns.
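To make the idea of a level-wise top-k search with a capped number of candidates per scan more concrete, the following is a minimal sketch only. It is not MTK, MTK_Close, or δ-stair search, none of which are specified in the abstract. It assumes a naive strategy: the candidates of each itemset length are split into batches of at most `max_candidates`, one database scan is spent per batch, and the support threshold rises as the top-k heap fills. The names `top_k_itemsets` and `max_candidates` are hypothetical, and only the per-scan counters are bounded here, not total memory.

```python
from itertools import combinations
import heapq


def top_k_itemsets(transactions, k, max_candidates):
    """Illustrative level-wise top-k search (a sketch, not MTK).

    Counts at most max_candidates candidates per database scan and
    returns (ranked list of (itemset, support), number of scans).
    """
    if k < 1 or max_candidates < 1:
        raise ValueError("k and max_candidates must be positive")
    db = [frozenset(t) for t in transactions]
    top = []    # min-heap of (support, itemset), at most k entries
    scans = 0

    def threshold():
        # Until k itemsets are known, any itemset occurring once qualifies.
        return top[0][0] if len(top) == k else 1

    # Length-1 candidates: every distinct item, as a sorted 1-tuple.
    candidates = sorted({(item,) for t in db for item in t})
    length = 1
    while candidates:
        frequent = {}
        for start in range(0, len(candidates), max_candidates):
            batch = candidates[start:start + max_candidates]
            counts = dict.fromkeys(batch, 0)
            for t in db:                      # one database scan per batch
                for cand in batch:
                    if t.issuperset(cand):
                        counts[cand] += 1
            scans += 1
            for cand, support in counts.items():
                if support >= threshold():
                    frequent[cand] = support
                    heapq.heappush(top, (support, cand))
                    if len(top) > k:
                        heapq.heappop(top)    # drop the weakest entry
        # The threshold may have risen during this level; prune again.
        bound = threshold()
        survivors = sorted(c for c, s in frequent.items() if s >= bound)
        known = set(survivors)
        # Apriori-style join: merge itemsets sharing a prefix, then keep
        # a candidate only if all its subsets of the current length survived.
        candidates = [
            a + (b[-1],)
            for a, b in combinations(survivors, 2)
            if a[:-1] == b[:-1]
            and all(sub in known for sub in combinations(a + (b[-1],), length))
        ]
        length += 1
    ranked = sorted(top, key=lambda entry: (-entry[0], entry[1]))
    return [(itemset, support) for support, itemset in ranked], scans
```

On the toy database `[{"a","b","c"}, {"a","b"}, {"a","c"}, {"a"}, {"b","c"}]` with `k=3`, the sketch returns the itemsets `("a",)`, `("b",)`, `("c",)` with supports 4, 3, 3. It needs 4 scans when `max_candidates=2` and 2 scans when `max_candidates=10`, which illustrates the tension between the per-scan memory budget and the number of database scans that the abstract describes.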

Author information

Corresponding author

Correspondence to Kun-Ta Chuang.

About this article

Cite this article

Chuang, KT., Huang, JL. & Chen, MS. Mining top-k frequent patterns in the presence of the memory constraint. The VLDB Journal 17, 1321–1344 (2008). https://doi.org/10.1007/s00778-007-0078-6

  • DOI: https://doi.org/10.1007/s00778-007-0078-6
