Mining top-k frequent patterns in the presence of the memory constraint

Chuang, Kun-Ta; Huang, Jiun-Long; Chen, Ming-Syan

doi:10.1007/s00778-007-0078-6

Regular Paper
Published: 07 November 2007

Volume 17, pages 1321–1344, (2008)

The VLDB Journal

Kun-Ta Chuang¹,
Jiun-Long Huang² &
Ming-Syan Chen¹

Abstract

In this paper we explore a practically interesting mining task: retrieving the top-k frequent (closed) itemsets under a memory constraint. In contrast to most previous work, which concentrates on improving mining efficiency or on reducing memory usage on a best-effort basis, we first attempt to specify an upper bound on the memory that frequent itemset mining may use. To comply with this upper bound, we devise two efficient algorithms, MTK and MTK_Close, for mining frequent itemsets and closed itemsets, respectively, without requiring the user to specify the subtle minimum support threshold. Instead, users only need to give a more intuitive parameter, namely the desired number k of frequent (closed) itemsets. In practice, it is quite challenging to constrain memory consumption while still retrieving the top-k itemsets efficiently. To achieve this, MTK and MTK_Close are designed as level-wise search algorithms in which the number of candidates generated and tested in each database scan is limited. A novel search approach, called δ-stair search, is used in MTK and MTK_Close to allocate the available memory to testing candidate itemsets of various lengths, which leads to a small number of required database scans. As demonstrated in our empirical study on real and synthetic data, rather than merely offering a trade-off between execution efficiency and memory consumption, MTK and MTK_Close achieve high efficiency within a constrained memory bound, which makes them practical algorithms for mining frequent patterns.
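To make the idea of a level-wise search with a bounded number of candidates per scan concrete, the following is a minimal Python sketch. It is not MTK or MTK_Close, and it does not reproduce δ-stair search, whose details are not given in the abstract. It is a plain Apriori-style top-k miner in which a hypothetical parameter `max_candidates` caps how many candidate counters are live during any one database scan, and in which the support threshold is raised as better itemsets are found instead of being supplied by the user. The function name, its parameters, and the assumption that the set of items and the surviving itemsets of one level fit in memory are all illustrative choices.

```python
from heapq import heappush, heappushpop
from itertools import combinations


def top_k_itemsets(transactions, k, max_candidates):
    """Return (result, scans): the k best (support, itemset) pairs and the
    number of database scans used. Hypothetical sketch, not MTK itself."""
    db = [frozenset(t) for t in transactions]
    heap = []  # min-heap of the best k (support, itemset) pairs seen so far
    scans = 0

    def threshold():
        # Until k itemsets are known, any itemset that occurs is eligible.
        return heap[0][0] if len(heap) == k else 1

    def count_batch(batch):
        # One database scan; only len(batch) counters are live.
        counts = dict.fromkeys(batch, 0)
        for t in db:
            for cand in batch:
                if t.issuperset(cand):
                    counts[cand] += 1
        return counts

    # Assumption: the item alphabet itself fits in memory.
    level = sorted({(item,) for t in db for item in t})
    while level:
        supports = {}
        for start in range(0, len(level), max_candidates):
            batch = level[start:start + max_candidates]
            scans += 1
            for itemset, support in count_batch(batch).items():
                if support == 0:
                    continue
                supports[itemset] = support
                entry = (support, itemset)
                if len(heap) < k:
                    heappush(heap, entry)
                elif entry > heap[0]:
                    heappushpop(heap, entry)

        # Keep only itemsets that can still have a top-k superset.
        t_min = threshold()
        frequent = sorted(s for s, sup in supports.items() if sup >= t_min)
        frequent_set = set(frequent)

        # Apriori join: merge itemsets sharing all but their last item,
        # then require every (n-1)-subset to be frequent.
        level = []
        for a, b in combinations(frequent, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                subsets = combinations(cand, len(cand) - 1)
                if all(sub in frequent_set for sub in subsets):
                    level.append(cand)

    return sorted(heap, reverse=True), scans
```

In this sketch a smaller `max_candidates` lowers the peak number of counters but increases the number of scans, since each level is split into more batches. That is exactly the trade-off between memory and scans that the abstract says δ-stair search is designed to manage.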

Author information

Authors and Affiliations

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, ROC
Kun-Ta Chuang & Ming-Syan Chen
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC
Jiun-Long Huang

Corresponding author

Correspondence to Kun-Ta Chuang.

About this article

Cite this article

Chuang, KT., Huang, JL. & Chen, MS. Mining top-k frequent patterns in the presence of the memory constraint. The VLDB Journal 17, 1321–1344 (2008). https://doi.org/10.1007/s00778-007-0078-6

Received: 16 January 2006
Revised: 11 March 2007
Accepted: 08 August 2007
Published: 07 November 2007
Issue Date: August 2008
DOI: https://doi.org/10.1007/s00778-007-0078-6
