Performance Drivers for Depth-First Frequent Pattern Mining
Today's fast algorithms for mining frequent itemsets are highly optimized and specialized, and they often bear only a faint resemblance to basic algorithms such as Apriori and Eclat. Algorithms for other pattern domains, such as sequences, are typically built on top of the basic algorithms and therefore cannot benefit from the improvements made in highly specialized itemset algorithms.
We therefore investigate different properties of a basic depth-first search algorithm, Eclat, and identify its performance drivers. We view Eclat as a basic algorithm plus a bundle of optional algorithmic features, some taken from other algorithms such as LCM and Apriori and some new. We evaluate the performance impact of these features and identify the best configuration of Eclat.
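To make the notion of a "basic" depth-first algorithm concrete, the following is a minimal sketch of the plain Eclat scheme: a vertical layout of transaction-id sets (tidsets) that are intersected recursively along a depth-first search. It is an illustration under stated assumptions, not the implementation evaluated here. The function name `eclat`, its parameters, the use of Python sets as tidsets, and the lexicographic item order are all assumptions, and none of the optional features discussed above are included.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    # Vertical layout: map each item to the set of ids of transactions containing it.
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs, each already frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Combine only with later items so each itemset is generated once.
            extensions = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids  # support counting by tidset intersection
                if len(common) >= min_support:
                    extensions.append((other, common))
            if extensions:
                extend(itemset, extensions)  # depth-first descent

    # Assumption: items are mutually comparable so they can be put in a fixed order.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent
```

For example, calling `eclat([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], 2)` reports the three single items with support 3 and the three pairs with support 2, while `{"a", "b", "c"}` is pruned because it occurs in only one transaction. Item ordering and the tidset representation are the kind of design choices that optional features of this sort would vary.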
Keywords: Incidence Matrix, Frequent Item, Mining Task, Performance Driver, Transaction Database