Simple Algorithms for Frequent Item Set Mining

  • Christian Borgelt
Part of the Studies in Computational Intelligence book series (SCI, volume 263)

Abstract

In this paper I introduce SaM, a split and merge algorithm for frequent item set mining. Its core advantages are its extremely simple data structure and processing scheme, which not only make it quite easy to implement, but also very convenient to execute on external storage, thus rendering it a highly useful method if the transaction database to mine cannot be loaded into main memory. Furthermore, I review RElim (an algorithm I proposed in an earlier paper and improved in the meantime) and discuss different optimization options for both SaM and RElim. Finally, I present experiments comparing SaM and RElim with classical frequent item set mining algorithms (like Apriori, Eclat and FP-growth).

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Christian Borgelt (1)
  1. European Center for Soft Computing, Mieres, Asturias, Spain
