Memory-Aware Frequent k-Itemset Mining

  • Maurizio Atzori
  • Paolo Mancarella
  • Franco Turini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3933)


In this paper we show that the well-known problem of computing frequent k-itemsets (i.e., itemsets of cardinality k) in a given dataset can be reduced to the problem of finding iceberg queries from a stream of queries suitably constructed from the original dataset. Hence, algorithms for computing frequent k-itemsets can be obtained by adapting algorithms for computing iceberg queries. We show that, for sparse datasets, this can be done directly, i.e., without first generating the frequent x-itemsets for each x < k, as the most common level-wise algorithms do. We exploit a recent algorithm for finding iceberg queries and define an algorithm that requires only three sequential passes over the dataset to compute the frequent k-itemsets (even for k > 3). An important feature of the algorithm is that the amount of main memory required can be determined in advance, and it is shown to be very low for sparse datasets. Experiments show that, for very large datasets with millions of small transactions, our proposal outperforms state-of-the-art algorithms. Furthermore, we sketch a first extension of our algorithm that works over data streams.
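The reduction described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name the iceberg-query method it builds on, so the sketch assumes a Misra-Gries-style frequent-elements summary, and the function name `frequent_k_itemsets`, its parameters, and the sample data are hypothetical. Each transaction is expanded into its k-subsets, which form the stream. One pass measures the stream length and fixes the number of counters in advance, a second pass collects candidates within that memory bound, and a third pass counts the candidates exactly.

```python
from itertools import combinations
from math import comb


def frequent_k_itemsets(transactions, k, min_support):
    """Return {k-itemset: count} for itemsets in >= min_support transactions.

    Illustrative sketch only (assumed Misra-Gries-style summary); the
    transactions must be re-iterable and contain sortable, hashable items.
    """
    if min_support <= 0:
        raise ValueError("min_support must be positive")

    # Pass 1: length of the derived stream (total number of k-subsets).
    stream_len = sum(comb(len(set(t)), k) for t in transactions)

    # With c counters the summary keeps every element occurring more than
    # stream_len / (c + 1) times; c = stream_len // min_support is enough
    # to keep every itemset that occurs at least min_support times.
    capacity = stream_len // min_support
    if capacity == 0:
        return {}  # the stream is too short for any itemset to qualify

    # Pass 2: candidate generation with at most `capacity` counters.
    counters = {}
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            if itemset in counters:
                counters[itemset] += 1
            elif len(counters) < capacity:
                counters[itemset] = 1
            else:
                # Summary is full: decrement all counters, drop the zeros.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]

    # Pass 3: exact support of the surviving candidates (removes false positives).
    exact = dict.fromkeys(counters, 0)
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            if itemset in exact:
                exact[itemset] += 1

    return {s: c for s, c in exact.items() if c >= min_support}


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c", "d"], ["a", "c"]]
    print(frequent_k_itemsets(transactions, k=2, min_support=3))
```

In this sketch the counter budget depends only on the stream length and the support threshold, which is one way the memory requirement can be known before the mining passes start. The stream length grows quickly with transaction size, which is consistent with the abstract's focus on sparse datasets with small transactions.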


Keywords: Association Rule, Space Complexity, Main Memory, Mining Association Rule, Frequent Itemset Mining





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Maurizio Atzori (1, 2)
  • Paolo Mancarella (1)
  • Franco Turini (1)
  1. Dipartimento di Informatica, University of Pisa, Italy
  2. ISTI-CNR, Area della Ricerca di Pisa, Italy
