Discovering Frequent Closed Itemsets for Association Rules
Abstract
In this paper, we address the problem of finding frequent itemsets in a database. Using the closed itemset lattice framework, we show that this problem can be reduced to the problem of finding frequent closed itemsets. Based on this statement, we can construct efficient data mining algorithms by limiting the search space to the closed itemset lattice rather than the subset lattice. Moreover, we show that the set of all frequent closed itemsets suffices to determine a reduced set of association rules, thus addressing another important data mining problem: limiting the number of rules produced without information loss.We propose a new algorithm, called A-Close, using a closure mechanism to find frequent closed itemsets. We realized experiments to compare our approach to the commonly used frequent itemset search approach. Those experiments showed that our approach is very valuable for dense and/or correlated data that represent an important part of existing databases.
Preview
Unable to display preview. Download preview PDF.
References
- [1]R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Int’l Conference on Management of Data, pages 207–216, May 1993.Google Scholar
- [2]R. Agrawal and R. Srikant. Fast algorithms for mining association rules. Proceedings of the 20th Int’l Conference on Very Large Data Bases, pages 478–499, June 1994. Expanded version in IBM Research Report RJ9839.Google Scholar
- [3]R. J. Bayardo. Efficiently mining long patterns from databases. Proceedings of the ACM SIGMOD Int’l Conference on Management of Data, pages 85–93, June 1998.Google Scholar
- [4]G. Birkhoff. Lattices theory. In Coll. Pub. XXV, volume 25. American Mathematical Society, 1967. Third edition.Google Scholar
- [5]S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. Proceedings of the ACM SIGMOD Int’l Conference on Management of Data, pages 255–264, May 1997.Google Scholar
- [6]M.-S. Chen, J. Han, and P. S. Yu. Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6):866–883, December 1996.CrossRefGoogle Scholar
- [7]B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 1994. Fourth edition.Google Scholar
- [8]V. Duquenne and L.-L. Guigues. Famille minimale d’implication informatives résultant d’un tableau de données binaires. Math. Sci. Hum., 24(95):5–18, 1986.MathSciNetGoogle Scholar
- [9]B. Ganter and K. Reuter. Finding all closed sets: A general approach. In Order, pages 283–290. Kluwer Academic Publishers, 1991.Google Scholar
- [10]D. Lin and Z. M. Kedem. Pincer-search: A new algorithm for discovering the maximum frequent set. Proceedings of the 6th Int’l Conference on Extending Database Technology, pages 105–119, March 1998.Google Scholar
- [11]M. Luxenburger. Implications partielles dans un contexte. Math. Inf. Sci. Hum., 29(113):35–55, 1991.MathSciNetGoogle Scholar
- [12]H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3):241–258, 1997.CrossRefGoogle Scholar
- [13]H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, pages 181–192, July 1994.Google Scholar
- [14]A. M. Mueller. Fast sequential and parallel algorithms for association rules mining: A comparison. Technical report, Faculty of the Graduate School of The University of Maryland, 1995.Google Scholar
- [15]N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Pruning closed itemset lattices for association rules. Proceedings of the BDA French Conference on Advanced Databases, October 1998. To appear.Google Scholar
- [16]A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in larges databases. Proceedings of the 21th Int’l Conference on Very Large Data Bases, pages 432–444, September 1995.Google Scholar
- [17]H. Toivonen. Sampling large databases for association rules. Proceedings of the 22nd Int’l Conference on Very Large Data Bases, pages 134–145, September 1996.Google Scholar
- [18]H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hatonen, and H. Mannila. Pruning and grouping discovered association rules. ECML-95 Workshop on Statistics, Machine Learning, and Knowledge Discovery in Databases, pages 47–52, April 1995.Google Scholar
- [19]R. Wille. Concept lattices and conceptual knowledge systems. Computers and Mathematics with Applications, 23:493–515, 1992.zbMATHCrossRefGoogle Scholar
- [20]M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. Proceedings of the 3rd Int’l Conference on Knowledge Discovery in Databases, pages 283–286, August 1997.Google Scholar