Simple Algorithms for Frequent Item Set Mining

Borgelt, Christian

doi:10.1007/978-3-642-05179-1_16

Part of the book series: Studies in Computational Intelligence (SCI, volume 263)

Abstract

In this paper I introduce SaM, a split and merge algorithm for frequent item set mining. Its core advantages are its extremely simple data structure and processing scheme, which not only make it quite easy to implement, but also very convenient to execute on external storage, thus rendering it a highly useful method if the transaction database to mine cannot be loaded into main memory. Furthermore, I review RElim (an algorithm I proposed in an earlier paper and improved in the meantime) and discuss different optimization options for both SaM and RElim. Finally, I present experiments comparing SaM and RElim with classical frequent item set mining algorithms (like Apriori, Eclat and FP-growth).
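
The abstract describes SaM only as a split-and-merge scheme over a very simple data structure. The Python sketch below illustrates one way such a scheme can work; it is not the author's implementation. It assumes that (a) only frequent items are kept and are recoded so that rarer items get smaller codes, (b) each transaction is a sorted tuple of item codes paired with an integer weight, and (c) the whole database is a lexicographically sorted in-memory list. The function names `sam`, `sam_rec` and `merge` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Tuple

# A weighted transaction: (item codes in ascending order, number of occurrences).
WTrans = Tuple[Tuple[int, ...], int]


def merge(a: List[WTrans], b: List[WTrans]) -> List[WTrans]:
    """Merge two lexicographically sorted lists, adding weights of equal transactions."""
    out: List[WTrans] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        elif a[i][0] > b[j][0]:
            out.append(b[j])
            j += 1
        else:  # same item sequence in both lists: combine into one entry
            out.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def sam_rec(db: List[WTrans], prefix: Tuple[int, ...], minsupp: int,
            report: Callable[[Tuple[int, ...], int], None]) -> None:
    while db:
        item = db[0][0][0]  # leading item of the first (smallest) transaction
        split: List[WTrans] = []
        support = 0
        k = 0
        # Split step: all transactions starting with `item` form one block
        # at the front; strip `item` and collect the remaining suffixes.
        while k < len(db) and db[k][0][0] == item:
            items, weight = db[k]
            support += weight
            if len(items) > 1:  # suffixes that become empty are dropped
                split.append((items[1:], weight))
            k += 1
        rest = db[k:]
        if support >= minsupp:
            report(prefix + (item,), support)
            sam_rec(split, prefix + (item,), minsupp, report)
        # Merge step: fold the suffixes back in, eliminating `item` from the database.
        db = merge(split, rest)


def sam(transactions: Sequence[Sequence[Hashable]],
        minsupp: int) -> Dict[FrozenSet[Hashable], int]:
    counts = Counter(i for t in transactions for i in set(t))
    # Keep frequent items only; rarer items get smaller codes (assumed heuristic).
    frequent = sorted((i for i, c in counts.items() if c >= minsupp),
                      key=lambda i: (counts[i], str(i)))
    code = {item: c for c, item in enumerate(frequent)}
    encoded = sorted(tuple(sorted(code[i] for i in set(t) if i in code))
                     for t in transactions)
    # Combine identical transactions (adjacent after sorting) into weighted entries.
    db: List[WTrans] = []
    for items in encoded:
        if not items:
            continue
        if db and db[-1][0] == items:
            db[-1] = (items, db[-1][1] + 1)
        else:
            db.append((items, 1))
    result: Dict[FrozenSet[Hashable], int] = {}

    def report(itemset: Tuple[int, ...], support: int) -> None:
        result[frozenset(frequent[c] for c in itemset)] = support

    sam_rec(db, (), minsupp, report)
    return result
```

For instance, `sam([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]], 2)` should return the six item sets {a}: 4, {b}: 3, {c}: 3, {a, b}: 2, {a, c}: 2 and {b, c}: 2; {a, b, c} occurs only once and is not reported. Both steps only scan the lists front to back, which hints at why such a scheme lends itself to external storage, although this sketch keeps everything in memory.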

References

Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between Sets of Items in Large Databases. In: Proc. Conf. on Management of Data, pp. 207–216. ACM Press, New York (1993)
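Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.: Fast Discovery of Association Rules. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI Press / MIT Press (1996)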
Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. Dept. of Information and Computer Science, University of California at Irvine, CA, USA (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Böttcher, M., Spott, M., Nauck, D.: Detecting Temporally Redundant Association Rules. In: Proc. 4th Int. Conf. on Machine Learning and Applications (ICMLA 2005), Los Angeles, CA, pp. 397–403. IEEE Press, Piscataway (2005)
Böttcher, M., Spott, M., Nauck, D.: A Framework for Discovering and Analyzing Changing Customer Segments. In: Perner, P. (ed.) ICDM 2007. LNCS (LNAI), vol. 4597, pp. 255–268. Springer, Heidelberg (2007)
Borgelt, C.: Efficient Implementations of Apriori and Eclat. In: Proc. Workshop Frequent Item Set Mining Implementations (FIMI 2003), Melbourne, FL, USA, Aachen, Germany. CEUR Workshop Proceedings, vol. 90 (2003)
Borgelt, C.: An Implementation of the FP-growth Algorithm. In: Proc. Workshop Open Software for Data Mining (OSDM 2005 at KDD 2005), Chicago, IL, pp. 1–5. ACM Press, New York (2005)
Borgelt, C.: Keeping Things Simple: Finding Frequent Item Sets by Recursive Elimination. In: Proc. Workshop Open Software for Data Mining (OSDM 2005 at KDD 2005), Chicago, IL, pp. 66–70. ACM Press, New York (2005)
Cheng, Y., Fayyad, U., Bradley, P.S.: Efficient Discovery of Error-Tolerant Frequent Itemsets in High Dimensions. In: Proc. 7th Int. Conf. on Knowledge Discovery and Data Mining (KDD 2001), San Francisco, CA, pp. 194–203. ACM Press, New York (2001)
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.): Advances in Knowledge Discovery and Data Mining. AAAI Press / MIT Press (1996)
Han, J., Pei, J., Yin, Y.: Mining Frequent Patterns without Candidate Generation. In: Proc. Conf. on the Management of Data (SIGMOD 2000), Dallas, TX, pp. 1–12. ACM Press, New York (2000)
Kohavi, R., Bradley, C.E., Frasca, B., Mason, L., Zheng, Z.: KDD-Cup 2000 Organizers’ Report: Peeling the Onion. SIGKDD Explorations 2(2), 86–93 (2000)
Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining Access Patterns Efficiently from Web Logs. In: Terano, T., Chen, A.L.P. (eds.) PAKDD 2000. LNCS, vol. 1805, pp. 396–407. Springer, Heidelberg (2000)
Pei, J., Tung, A.K.H., Han, J.: Fault-Tolerant Frequent Pattern Mining: Problems and Challenges. In: Proc. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMK 2001), Santa Barbara, CA, ACM Press, New York (2001)
Rácz, B.: nonordfp: An FP-growth Variation without Rebuilding the FP-Tree. In: Proc. Workshop Frequent Item Set Mining Implementations (FIMI 2004), Brighton, UK, Aachen, Germany. CEUR Workshop Proceedings, vol. 126 (2004)
Rácz, B., Bodon, F., Schmidt-Thieme, L.: On Benchmarking Frequent Itemset Mining Algorithms. In: Proc. Workshop Open Software for Data Mining (OSDM 2005 at KDD 2005), Chicago, IL, pp. 36–45. ACM Press, New York (2005)
Zaki, M., Parthasarathy, S., Ogihara, M., Li, W.: New Algorithms for Fast Discovery of Association Rules. In: Proc. 3rd Int. Conf. on Knowledge Discovery and Data Mining (KDD 1997), Newport Beach, CA, pp. 283–296. AAAI Press, Menlo Park (1997)
Mannila, H., Toivonen, H., Verkamo, A.I.: Discovery of Frequent Episodes in Event Sequences. Report C-1997-15, University of Helsinki, Finland (1997)
Kuok, C., Fu, A., Wong, M.: Mining Fuzzy Association Rules in Databases. SIGMOD Record 27(1), 41–46 (1998)
Moen, P.: Attribute, Event Sequence, and Event Type Similarity Notions for Data Mining. Ph.D. Thesis/Report A-2000-1, Department of Computer Science, University of Helsinki, Finland (2000)
Wang, X., Borgelt, C., Kruse, R.: Mining Fuzzy Frequent Item Sets. In: Proc. 11th Int. Fuzzy Systems Association World Congress (IFSA 2005), Beijing, China, pp. 528–533. Tsinghua University Press and Springer-Verlag (2005)
Webb, G.I., Zhang, S.: k-Optimal Rule Discovery. Data Mining and Knowledge Discovery 10(1), 39–79 (2005)
Webb, G.I.: Discovering Significant Patterns. Machine Learning 68(1), 1–33 (2007)
Synthetic Data Generation Code for Associations and Sequential Patterns. Intelligent Information Systems, IBM Almaden Research Center, http://www.almaden.ibm.com/software/quest/Resources/index.shtml

Author information

Authors and Affiliations

European Center for Soft Computing, c/ Gonzalo Gutiérrez Quirós s/n, 33600, Mieres, Asturias, Spain
Christian Borgelt

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Ordona 21, 01-237, Warsaw, Poland
Jacek Koronacki & Sławomir T. Wierzchoń
Woodward Hall 430C, University of North Carolina, 9201 University City Blvd., N.C. 28223, Charlotte, USA
Zbigniew W. Raś
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Janusz Kacprzyk

About this chapter

Cite this chapter

Borgelt, C. (2010). Simple Algorithms for Frequent Item Set Mining. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds) Advances in Machine Learning II. Studies in Computational Intelligence, vol 263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05179-1_16

DOI: https://doi.org/10.1007/978-3-642-05179-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05178-4
Online ISBN: 978-3-642-05179-1
eBook Packages: Engineering (R0)
