An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining

  • Takeaki Uno
  • Hiroki Arimura
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4755)

Abstract

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. Particularly, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and has been extensively studied. In the real-world use, one of difficulties of frequent itemset mining is that real-world data is often incorrect, or missing some parts. It causes that some records which should include a pattern do not have it. To deal with real-world problems, one can use an ambiguous inclusion relation and find patterns which are mostly included in many records. However, computational difficulty have prevented such problems from being actively used in practice. In this paper, we use an alternative inclusion relation in which we consider an itemset P to be included in an itemset T if at most k items of P are not included in T, i.e., |P ∖ T| ≤ k. We address the problem of enumerating frequent itemsets under this inclusion relation and propose an efficient polynomial delay polynomial space algorithm. Moreover, To enable us to skip many small non-valuable frequent itemsets, we propose an algorithm for directly enumerating frequent itemsets of a certain size.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining, 307–328 (1996)Google Scholar
  2. 2.
    Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient Substructure Discovery from Large Semi-structured Data. In: SDM 2002 (2002)Google Scholar
  3. 3.
    Avis, D., Fukuda, K.: Reverse Search for Enumeration. Discrete App. Math. 65, 21–46 (1996)MATHCrossRefMathSciNetGoogle Scholar
  4. 4.
    Bayardo Jr., R.J.: Efficiently Mining Long Patterns from Databases. In: Proc. SIGMOD 1998, pp. 85–93 (1998)Google Scholar
  5. 5.
    Besson, J., Robardet, C., Boulicaut, J.F.: Mining Formal Concepts with a Bounded Number of Exceptions from Transactional Data. In: Goethals, B., Siebes, A. (eds.) KDID 2004. LNCS, vol. 3377, pp. 33–45. Springer, Heidelberg (2005)Google Scholar
  6. 6.
    Liu, J., Paulsen, S., Wang, W., Nobel, A., Prins, J.: Mining Approximate Frequent Itemsets from Noisy Data. In: ICDM 2005. 5th IEEE International Conference on Data Mining, pp. 721–724 (2005)Google Scholar
  7. 7.
    Seppanen, J.K., Mannila, H.: Dense Itemsets. In: SIGKDD 2004 (2004)Google Scholar
  8. 8.
    Shen-Shung, W., Suh-Yin, L.: Mining Fault-Tolerant Frequent Patterns in Large Databases. In: ICS 2002 (2002)Google Scholar
  9. 9.
    Takeda, M., Inenaga, S., Bannai, H., Shinohara, A., Arikawa, S.: Discovering Most Classificatory Patterns for Very Expressive Pattern Classes. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 486–493. Springer, Heidelberg (2003)Google Scholar
  10. 10.
    Uno, T., Asai, T., Uchida, Y., Arimura, H.: LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets. In: Proc. IEEE ICDM 2003 Workshop FIMI 2003 (2003)Google Scholar
  11. 11.
    Uno, T., Asai, T., Uchida, Y., Arimura, H.: An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 16–31. Springer, Heidelberg (2004)Google Scholar
  12. 12.
    Uno, T., Kiyomi, M., Arimura, H.: LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets. In: Proc. IEEE ICDM 2004 Workshop FIMI 2004 (2004)Google Scholar
  13. 13.
    Wang, J.T.L., Chirn, G.W., Marr, T.G., Shapiro, B., Shasha, D., Zhang, K.: Combinatorial pattern discovery for scientific data: some preliminary results. In: Proceedings of the 1994 ACM SIGMOD international conference on Management of data, pp. 115–125 (1994)Google Scholar
  14. 14.
    Yang, C., Fayyad, U., Bradley, P.S.: Efficient Discovery of Error-Tolerant Frequent Itemsets in High Dimensions. In: SIGKDD 2001 (2001)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Takeaki Uno
    • 1
  • Hiroki Arimura
    • 2
  1. 1.National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo 101-8430Japan
  2. 2.Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Sapporo 060-0814Japan

Personalised recommendations