Journal of Intelligent Information Systems

, Volume 24, Issue 1, pp 29–60 | Cite as

Generating a Condensed Representation for Association Rules

  • Nicolas Pasquier
  • Rafik Taouil
  • Yves Bastide
  • Gerd Stumme
  • Lotfi Lakhal
Article

Abstract

Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of association rules. Moreover, many of these rules are redundant and thus useless. Using a semantic based on the closure of the Galois connection, we define a condensed representation for association rules. This representation is characterized by frequent closed itemsets and their generators. It contains the non-redundant association rules having minimal antecedent and maximal consequent, called min-max association rules. We think that these rules are the most relevant since they are the most general non-redundant association rules. Furthermore, this representation is a basis, i.e., a generating set for all association rules, their supports and their confidences, and all of them can be retrieved needless accessing the data. We introduce algorithms for extracting this basis and for reconstructing all association rules. Results of experiments carried out on real datasets show the usefulness of this approach. In order to generate this basis when an algorithm for extracting frequent itemsets—such as Apriori for instance—is used, we also present an algorithm for deriving frequent closed itemsets and their generators from frequent itemsets without using the dataset.

Keywords

data mining Galois closure operator frequent closed itemsets generators min-max association rules basis for association rules condensed representation 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining Association Rules Between Sets of Items in Large Databases. In Proceedings of the SIGMOD Conference, pp. 207–216.Google Scholar
  2. Agrawal, R. and Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the VLDB Conference, pp. 478–499.Google Scholar
  3. Armstrong, W.W. (1974). Dependency Structures of Data Base Relationships. In Proceedings of the IFIP Congress, pp. 580–583.Google Scholar
  4. Baralis, E. and Psaila, G. (1997). Designing Templates for Mining Association Rules. Journal of Intelligent Information Systems, 9(1), 7–32; L. Kerschberg, Z. Ras, and M. Zemankova (Eds.), Kluwer Academic Publishers.Google Scholar
  5. Bastide, Y., Taouil, Y., Pasquier, N., Stumme, G., and Lakhal, L. (2000). Mining Frequent Patterns with Counting Inference. SIGKDD Explorations, 2(2), 66–75; U. Fayyad, S. Sarawagi and P. Bradley (Eds.), ACMComputer Press.Google Scholar
  6. Bayardo, R.J. (1998). Efficiently Mining Long Patterns from Databases. In Proceedings of the SIGMOD Conference, pp. 85–93.Google Scholar
  7. Bayardo, R.J. and Agrawal, R. (1999). Mining the Most Interesting Rules.In Proceedings of the KDD Conference, pp. 145–154.Google Scholar
  8. Bayardo, R.J., Agrawal, R., and Gunopulos, D. (2000).Constraint-Based Rule Mining in Large, Dense Databases. Data Mining and Knowledge Discovery, 4(2/3), 217–240; S. Chaudhuri (Ed.), Kluwer Academic Publishers.Google Scholar
  9. Beeri, C. and Bernstein, P.A. (1979). Computational Problems Related to the Design of Normal form Relational Schemas. Transactions on Database Systems, 4(1), 30–59.Google Scholar
  10. Blake, C.L. and Merz, C.J. (1998). UCI Machine Learning Databases Repository. University of California, Irvine, Department of Information and Computer Science, http://www.ics.uci.edu/~mlearn/MLRepository.html.
  11. Brin, S., Motwani, R., and Silverstein, C. (1997.) Beyond Market Baskets: Generalizing Association Rules to Correlation. In Proceedings of the SIGMOD Conference, pp. 265–276.Google Scholar
  12. Brin, S., Motwani, R., Ullman, J.D., and Tsur, S. (1997). Dynamic Itemset Counting and Implication Rules for Market Basket Data. In Proceedings of the SIGMOD Conference, pp. 255–264.Google Scholar
  13. Duquenne, V. and Guigues, J.-L. (1986). Famille Minimale d’implications Informatives résultant d’un Tableau de données Binaires.Mathématiques et Sciences Humaines, 24(95), 5–18.Google Scholar
  14. Ganter, B. and Wille, R. (1999.) Formal Concept Analysis: Mathematical Foundations. Springer-Verlag.Google Scholar
  15. Han, J. and Fu, Y. (1999). Mining Multiple-Level Association Rules in Large Databases. Transactions on Knowledge and Data Engineering, 11(5), 798–804; P.-S. Yu (Ed.), IEEE Computer ScienceGoogle Scholar
  16. Hettich, S. and Bay, S.D. (1999.) UCI Knowledge Discovery in Databases Archive. University of California, Irvine, Department of Information and Computer Science,http://kdd.ics.uci.edu
  17. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., and Verkamo, A.I. (1994). Finding interesting Rules from Large Sets of Discovered Association Rules. In Proceedings of the CIKM conference, pp. 401–407.Google Scholar
  18. Lin, D. and Kedem, Z.M. (1998). Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set. In Proceedings of the EDBT Conference, pp. 105–119.Google Scholar
  19. Liu, B., Hsu, W., and Ma, Y. (1999). Pruning and Summarizing the Discovered Association Rules. In Proceedings of the KDD Conference, pp. 125–134.Google Scholar
  20. Luxenburger, M. (1991). Implications Partielles Dans un Contexte. Mathématiques, Informatique et Sciences Humaines, 29(113), 35–55.Google Scholar
  21. Maier, D. (1980). Minimum Covers in Relational Database Model. Journal of the ACM, 27(4), 664–674; ACM Computer Press.Google Scholar
  22. Mannila, H. and Toivonen, H. (1996). Multiple Uses of Frequent Sets and Condensed Representations. In Proceedings of the KDD Conference, pp. 189–194.Google Scholar
  23. Mannila, H. and Toivonen, H. (1997). Levelwise Search and Borders of Theories in Knowledge Discovery. Data Mining and Knowledge Discovery, 1(3), 241–258; U. Fayyad (Ed.), Kluwer Academic Publishers.Google Scholar
  24. Mannila, H., Toivonen, H., and Verkamo, A.I. (1994). Efficient Algorithms for Discovering Association Rules. In Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, pp. 181–192.Google Scholar
  25. Meo, R., Psaila, G., and Ceri, S. (1998). An Extension to SQL for MiningAssociation Rules. Data Mining and Knowledge Discovery, 2(2), 195–224, U. Fayyad (Ed.), Kluwer Academic Publishers.Google Scholar
  26. Morimoto, Y., Fukuda, T., Matsuzawa, H., Tokuyama, T., and Yoda, K. (1998). Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases. In Proceedings of the VLDB Conference, pp. 380–391.Google Scholar
  27. Ng, R.T., Lakshmanan, V.S., Han, J., and Pang, A. (1998). Exploratory Mining and Pruning Optimizations of Constrained Association Rules. In Proceedings ofthe SIGMOD Conference, pp. 13–24.Google Scholar
  28. Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. (1998). Pruning Closed Itemset Lattices for Association Rules. In Proceedings of the BDA Conference, pp. 177–196.Google Scholar
  29. Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. (1999). Discovering Frequent Closed Itemsets for Association Rules. In Proceedings of the ICDT Conference, LNCS 1540, pp. 398–416.Google Scholar
  30. Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. (1999). Efficient Mining of Association Rules Using Closed Itemset Lattices. Information Systems, 24(1), 25–46; M. Jarke and D. Shasha (Eds.), Elsevier Science.Google Scholar
  31. Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. (1999). Closed SetBased Discovery of Small Covers for Association Rules. In Proceedings of the BDA Conference, pp. 361–381.Google Scholar
  32. Pei, J., Han, J., and Mao, R. (2000). Closet: An Efficient Algorithm for Mining Frequent Closed Itemsets. In Proceedings of the DMKD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21–30.Google Scholar
  33. Piatetsky-Shapiro, G. and Matheus, C.J. (1994). The Interestingness of Deviations. In Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, pp. 25–36.Google Scholar
  34. Silberschatz, A. and Tuzhilin, A. (1996). What Makes Patterns Interesting in Knowledge Discovery Systems. Transactions on Knowledge and Data Engineering, 8(6), 970–974; IEEE Computer Science.Google Scholar
  35. Silverstein, C., Brin, S., and Motwani, R. (1998). Beyond Market Baskets: Generalizing Association Rules to Dependence Rules. Data Miningand Knowledge Discovery, 2(1), 39–68; U. Fayyad (Ed.), Kluwer AcademicPublishers.Google Scholar
  36. Srikant, R. and Agrawal, R. (1995). Mining Generalized Association Rules. In Proceedings of the VLDB Conference, pp. 407–419.Google Scholar
  37. Srikant, R. and Agrawal, R. (1996). Mining Quantitative Association Rules in Large Relational Tables. In Proceedings of the SIGMOD Conference, pp. 1–12.Google Scholar
  38. Srikant, R., Vu, Q., and Agrawal, R. (1997). Mining Association Rules with Item Constraints. In Proceedings of the KDD Conference, pp. 67–73.Google Scholar
  39. Taouil, R., Pasquier, N., Bastide, Y., and Lakhal, L. (2000). Mining Bases for Association Rules Using Closed Sets. In Proceedings of theICDE Conference, p. 307.Google Scholar
  40. Toivonen, H., Klemettinen, M., Ronkainen, P., Hätönen, K.,and Mannila, H. (1995). Pruning and Grouping Discovered Association Rules. In Proceedings of the ECML Workshop, pp. 47–52.Google Scholar
  41. Zaki, M.J. (2000). Generating Non-Redundant Association Rules. In Proceedings of the KDD Conference, pp. 34–43.Google Scholar
  42. Zaki, M.J. and Hsiao, C.-J. (1999). ChARM: An Efficient Algorithm for Closed Association Rule Mining. Technical Report 99-10.Google Scholar
  43. Zaki, M.J. and Ogihara, M. (1998). Theoretical Foundations of Association Rules. In Proceedings of the DMKD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 7:1–7:8.Google Scholar
  44. Zaki, M.J., Parthasarathy, S., Ogihara, M., and Li, W. (1997). New Algorithms for Fast Discovery of Association Rules. In Proceedings of the KDD Conference, pp. 283–286.Google Scholar

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Nicolas Pasquier
    • 1
  • Rafik Taouil
    • 2
  • Yves Bastide
    • 3
  • Gerd Stumme
    • 4
  • Lotfi Lakhal
    • 5
  1. 1.I3S (CNRS UMR 6070)—Université de Nice-Sophia AntipolisFrance
  2. 2.LI—Université Francois Rabelais de ToursBloisFrance
  3. 3.IRISA—INRIA RennesCampus Universitaire de BeaulieuRennesFrance
  4. 4.Fachbereich Mathematik/InformatikUniversität KasselKasselGermany
  5. 5.LIM (CNRS FRE 2246)—Université de la MéditerranéeMarseilleFrance

Personalised recommendations