Abstract
We present GenMax, a backtrack-search-based algorithm for mining maximal frequent itemsets. GenMax uses a number of optimizations to prune the search space. It uses a novel technique called progressive focusing to perform maximality checking, and diffset propagation to perform fast frequency computation. Systematic experimental comparison with previous work indicates that different methods have varying strengths and weaknesses depending on dataset characteristics. We found GenMax to be a highly efficient method for mining the exact set of maximal patterns.
Acknowledgments
We would like to thank Roberto Bayardo for providing us with the MaxMiner algorithm and Johannes Gehrke for the MAFIA algorithm. This work was supported in part by NSF CAREER Award IIS-0092978, DOE Career Award DE-FG02-02ER25538, and NSF grants EIA-0103708 and EMT-0432098.
Cite this article
Gouda, K., Zaki, M.J. GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets. Data Min Knowl Disc 11, 223–242 (2005). https://doi.org/10.1007/s10618-005-0002-x