A well-known approach to Knowledge Discovery in Databases involves the identification of association rules linking database attributes. Extracting all possible association rules from a database, however, is a computationally intractable problem, because of the combinatorial explosion in the number of sets of attributes for which incidence counts must be computed. Existing methods for dealing with this may involve multiple passes over the database, and still tend to cope badly with densely packed database records. We describe here a class of methods we have introduced that begin by using a single database pass to perform a partial computation of the totals required, storing these in the form of a set enumeration tree, which is created in time linear in the size of the database. We discuss algorithms for using this structure to complete the count summations, and describe one such method, derived from the well-known Apriori algorithm. Results are presented demonstrating the performance advantage to be gained from the use of this approach. Finally, we discuss possible further applications of the method.
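The two-phase idea can be illustrated with a minimal sketch. It is not the structure used in the paper: it assumes a plain prefix tree over lexicographically sorted records, in which each node holds a partial count (the number of records whose sorted item list begins with that node's path). The names `Node`, `build_tree`, and `total_support` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node; `count` is a partial total, not a full support count."""
    count: int = 0
    children: dict = field(default_factory=dict)


def build_tree(records):
    """Single pass over the records; work is proportional to the total
    number of items read (plus the cost of sorting each record)."""
    root = Node()
    for record in records:
        root.count += 1  # the root counts every record
        node = root
        for item in sorted(set(record)):
            node = node.children.setdefault(item, Node())
            node.count += 1  # record has this path as a prefix
    return root


def total_support(node, target, matched=0):
    """Complete the summation for one itemset.

    `target` must be a sorted list of distinct items. The partial counts
    are summed at the shallowest nodes whose path covers all of `target`.
    """
    if matched == len(target):
        return node.count
    needed = target[matched]
    total = 0
    for item, child in node.children.items():
        if item < needed:
            # Item is not in the target; the target may still occur deeper.
            total += total_support(child, target, matched)
        elif item == needed:
            total += total_support(child, target, matched + 1)
        # item > needed: paths are sorted, so `needed` cannot appear below.
    return total


# Hypothetical toy database of five records.
records = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"], ["b"]]
tree = build_tree(records)
print(total_support(tree, ["a", "c"]))  # support of {a, c}
```

In this sketch the database is read only once; every later support query is answered from the tree, which is the division of labour the abstract describes. An Apriori-style search would call `total_support` only for candidate itemsets whose subsets have already proved frequent.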