Transaction Databases, Frequent Itemsets, and Their Condensed Representations

  • Taneli Mielikäinen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3933)


Mining frequent itemsets is a fundamental task in data mining. Unfortunately, the number of frequent itemsets describing the data is often too large to comprehend. This problem has been addressed by condensed representations of frequent itemsets: subcollections of the frequent itemsets that contain only those itemsets that cannot be deduced, using some deduction rules, from the other frequent itemsets in the subcollection. In this paper we review the most popular condensed representations of frequent itemsets, study their relationship to transaction databases and to each other, examine their combinatorial and computational complexity, and describe their relationship to other important concepts in combinatorial data analysis, such as the Vapnik-Chervonenkis dimension and hypergraph transversals.
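The deduction idea behind condensed representations can be made concrete with a small sketch. The Python below is illustrative only and is not code from the paper: the function names, the toy five-transaction database and the absolute support threshold of 2 are all assumptions. It enumerates frequent itemsets levelwise, extracts two widely used condensed representations (closed and maximal frequent itemsets), and applies the standard deduction rule for closed itemsets, supp(X) = max{supp(C) : C closed, X ⊆ C}, which recovers the support of every frequent itemset. Maximal itemsets, by contrast, determine which itemsets are frequent but not their supports. The final check illustrates the link to hypergraph transversals: the minimal infrequent itemsets coincide with the minimal transversals of the hypergraph whose edges are the complements of the maximal frequent itemsets.

```python
from itertools import combinations

def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_support):
    """Levelwise (Apriori-style) search. Returns {itemset: support} for all
    non-empty itemsets whose support is at least min_support (a count)."""
    items = sorted(set().union(*transactions))
    result = {}
    level = [frozenset([i]) for i in items]
    while level:
        current = {}
        for c in level:
            s = support(c, transactions)
            if s >= min_support:
                current[c] = s
        result.update(current)
        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # with an infrequent k-subset (support is anti-monotone).
        next_level = set()
        for a, b in combinations(current, 2):
            u = a | b
            if len(u) == len(a) + 1 and all(
                frozenset(s) in current for s in combinations(u, len(a))
            ):
                next_level.add(u)
        level = list(next_level)
    return result

def closed_itemsets(freq):
    """Frequent itemsets with no proper superset of equal support."""
    return {x: s for x, s in freq.items()
            if not any(x < y and freq[y] == s for y in freq)}

def maximal_itemsets(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {x for x in freq if not any(x < y for y in freq)}

def support_from_closed(itemset, closed):
    """Deduce supp(X) as the largest support of a closed superset of X.
    Returns None when X has no frequent closed superset, i.e. X is infrequent."""
    sups = [s for c, s in closed.items() if itemset <= c]
    return max(sups) if sups else None

def negative_border(freq, items):
    """Minimal infrequent itemsets: infrequent, but every immediate subset is
    frequent (the empty set is treated as frequent)."""
    freq_and_empty = set(freq) | {frozenset()}
    candidates = {x | {i} for x in freq_and_empty for i in items if i not in x}
    return {c for c in candidates
            if c not in freq and all(c - {i} in freq_and_empty for i in c)}

def minimal_transversals(edges, items):
    """Brute force: minimal subsets of items that intersect every edge."""
    hitting = [frozenset(c)
               for k in range(len(items) + 1)
               for c in combinations(sorted(items), k)
               if all(frozenset(c) & e for e in edges)]
    return {h for h in hitting if not any(g < h for g in hitting)}

# Toy database (an assumption for illustration only).
transactions = [frozenset(t) for t in ("abc", "abc", "ab", "bd", "acd")]
items = frozenset().union(*transactions)
freq = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(freq)
maximal = maximal_itemsets(freq)

print(len(freq), len(closed), len(maximal))         # 8 6 2
print(support_from_closed(frozenset("c"), closed))  # 3
print(negative_border(freq, items)
      == minimal_transversals([items - m for m in maximal], items))  # True
```

Here the 6 closed itemsets suffice to recover the supports of all 8 frequent itemsets, while the 2 maximal itemsets ({d} and {a, b, c}) only delimit the frequent collection. The brute-force transversal routine is exponential in the number of items and is meant only for toy inputs like this one.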


Keywords (machine-generated): Association Rule; Frequent Itemsets; Transaction Database; Frequent Itemset Mining; Minimum Support Threshold





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Taneli Mielikäinen (1)
  1. HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Finland
