LCM over ZBDDs: Fast Generation of Very Large-Scale Frequent Itemsets Using a Compact Graph-Based Representation

  • Shin-ichi Minato
  • Takeaki Uno
  • Hiroki Arimura
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5012)

Abstract

Frequent itemset mining is one of the fundamental techniques for data mining and knowledge discovery. In the last decade, a number of efficient algorithms have been presented for frequent itemset mining, but most of them focus only on enumerating the itemsets that satisfy the given conditions; how to store and index the mining results to support efficient data analysis is a separate matter.
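For readers new to the problem, the usual "given condition" is a minimum support: an itemset is frequent when it is contained in at least `min_support` transactions. The sketch below is a deliberately naive Python illustration of that definition. It is not LCM or any other algorithm the abstract refers to. It tries every candidate level by level and stops early only when a whole level is infrequent, which is safe because support never increases when items are added. The function name `frequent_itemsets` and the toy database `db` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every non-empty itemset that occurs in
    at least min_support transactions. Naive, for illustration only."""
    tx = [frozenset(t) for t in transactions]
    items = sorted(set().union(*tx))
    result = {}
    for k in range(1, len(items) + 1):
        level_has_frequent = False
        for candidate in combinations(items, k):
            s = frozenset(candidate)
            support = sum(1 for t in tx if s <= t)  # transactions containing s
            if support >= min_support:
                result[s] = support
                level_has_frequent = True
        # Support is anti-monotone: if no k-itemset is frequent,
        # no (k+1)-itemset can be frequent either.
        if not level_has_frequent:
            break
    return result


# Hypothetical toy database of five transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = frequent_itemsets(db, min_support=3)
for itemset, support in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a threshold of 3, the three single items and the three pairs are frequent, while {a, b, c} is not, since it occurs in only two transactions. Even in small examples, the number of frequent itemsets can grow exponentially with the number of items, which is why a compact way to store the result matters.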

In this paper, we propose a fast algorithm for generating very large-scale all/closed/maximal frequent itemsets using Zero-suppressed BDDs (ZBDDs), a compact graph-based data structure. Our method, “LCM over ZBDDs,” is based on LCM, one of the most efficient state-of-the-art algorithms proposed thus far. Not only does it enumerate the itemsets, but it also generates a compact output data structure in main memory. The result can be efficiently postprocessed using algebraic ZBDD operations. The original LCM is known to be an output-linear-time algorithm, but our new method requires time sub-linear in the number of frequent patterns when the ZBDD-based data compression works well. Our method will greatly accelerate the data mining process, and this will lead to a new style of in-memory processing for knowledge discovery problems.
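To give a feel for why a ZBDD can hold a very large family of itemsets in few nodes, here is a toy Python ZBDD written only for illustration. It is not the authors' implementation and does not reproduce LCM. It shows two standard ZBDD ingredients: the zero-suppression rule (a node whose 1-edge points to the empty family is removed) and node sharing through a unique table, plus one algebraic operation, family `union`. The class `ZBDD`, its methods, and the terminal names `EMPTY` and `BASE` are hypothetical and do not correspond to any particular BDD package's API.

```python
# Toy ZBDD for families of itemsets. Illustrative sketch only: all names are
# hypothetical and do not mirror any real BDD package.

EMPTY, BASE = 0, 1  # terminal ids: the empty family {} and the family {{}}


class ZBDD:
    def __init__(self):
        self.nodes = [None, None]  # id -> (var, lo, hi); ids 0 and 1 are terminals
        self.unique = {}           # (var, lo, hi) -> id, so equal subgraphs are shared
        self.union_memo = {}

    def node(self, var, lo, hi):
        """Family: the sets of lo, plus every set of hi extended with var."""
        if hi == EMPTY:            # zero-suppression rule
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def single(self, itemset):
        """Family containing exactly one itemset (smaller vars nearer the root)."""
        f = BASE
        for v in sorted(itemset, reverse=True):
            f = self.node(v, EMPTY, f)
        return f

    def power_set(self, items):
        """Family of all 2^n subsets of items, built with one node per item."""
        f = BASE
        for v in sorted(items, reverse=True):
            f = self.node(v, f, f)
        return f

    def union(self, f, g):
        if f == EMPTY:
            return g
        if g == EMPTY or f == g:
            return f
        if f > g:
            f, g = g, f            # union is commutative; normalize the memo key
        key = (f, g)
        if key in self.union_memo:
            return self.union_memo[key]
        if f == BASE:              # add the empty set to g's 0-branch
            v, lo, hi = self.nodes[g]
            r = self.node(v, self.union(BASE, lo), hi)
        else:
            fv, flo, fhi = self.nodes[f]
            gv, glo, ghi = self.nodes[g]
            if fv < gv:
                r = self.node(fv, self.union(flo, g), fhi)
            elif fv > gv:
                r = self.node(gv, self.union(f, glo), ghi)
            else:
                r = self.node(fv, self.union(flo, glo), self.union(fhi, ghi))
        self.union_memo[key] = r
        return r

    def count(self, f, memo=None):
        """Number of itemsets in the family; work is per node, not per itemset."""
        if f in (EMPTY, BASE):
            return f               # EMPTY holds 0 sets, BASE holds 1 (the empty set)
        memo = {} if memo is None else memo
        if f not in memo:
            _, lo, hi = self.nodes[f]
            memo[f] = self.count(lo, memo) + self.count(hi, memo)
        return memo[f]

    def size(self, f):
        """Number of non-terminal nodes reachable from f."""
        seen, stack = set(), [f]
        while stack:
            n = stack.pop()
            if n > BASE and n not in seen:
                seen.add(n)
                _, lo, hi = self.nodes[n]
                stack += [lo, hi]
        return len(seen)


z = ZBDD()
family = EMPTY
for itemset in [{1, 2}, {1, 3}, {2, 3}, {1}, {2}, {3}]:
    family = z.union(family, z.single(itemset))
print(z.count(family), z.size(family))          # 6 5

everything = z.power_set(range(1, 21))
print(z.count(everything), z.size(everything))  # 1048576 20
```

The power set is the extreme best case: 20 nodes stand for 2^20 itemsets, and `count` visits nodes rather than itemsets. This is one way to see how processing time can be sub-linear in the number of patterns when compression works well; real mining results will generally compress less predictably.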



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Shin-ichi Minato (1)
  • Takeaki Uno (2)
  • Hiroki Arimura (1)
  1. Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
  2. National Institute of Informatics, Tokyo, Japan
