Abstract
In this paper I introduce SaM, a split and merge algorithm for frequent item set mining. Its core advantages are its extremely simple data structure and processing scheme, which not only make it quite easy to implement, but also very convenient to execute on external storage, thus rendering it a highly useful method if the transaction database to mine cannot be loaded into main memory. Furthermore, I review RElim (an algorithm I proposed in an earlier paper and improved in the meantime) and discuss different optimization options for both SaM and RElim. Finally, I present experiments comparing SaM and RElim with classical frequent item set mining algorithms (like Apriori, Eclat and FP-growth).
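The abstract names the split and merge idea but gives no pseudocode. The following is a minimal in-memory sketch under stated assumptions, not the chapter's implementation: the database is assumed to be a lexicographically sorted list of (item tuple, weight) pairs, the split step moves all transactions sharing the leading item into a conditional database, and the merge step folds their suffixes back into the remainder. The names `merge`, `sam` and `mine` are hypothetical, and none of the optimizations or the external-storage handling mentioned above is attempted.

```python
from collections import Counter


def merge(left, right):
    """Merge two sorted lists of (items, weight); equal item tuples are combined."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            out.append(left[i])
            i += 1
        elif left[i][0] > right[j][0]:
            out.append(right[j])
            j += 1
        else:
            out.append((left[i][0], left[i][1] + right[j][1]))
            i += 1
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def sam(db, min_support, prefix=(), found=None):
    """Recursive split-and-merge search over a sorted, weighted transaction list."""
    if found is None:
        found = {}
    while db:
        item = db[0][0][0]            # leading item of the first transaction
        support, split, k = 0, [], 0
        # Split: collect every transaction that starts with `item`.
        while k < len(db) and db[k][0][0] == item:
            items, weight = db[k]
            support += weight
            if len(items) > 1:        # drop transactions that become empty
                split.append((items[1:], weight))
            k += 1
        # Merge: the shortened transactions rejoin the rest of the database.
        db = merge(split, db[k:])
        if support >= min_support:
            found[prefix + (item,)] = support
            sam(split, min_support, prefix + (item,), found)
    return found


def mine(transactions, min_support):
    """Preprocess raw transactions (assumed iterables of sortable items) and mine them."""
    counts = Counter(item for t in transactions for item in set(t))
    weights = Counter()
    for t in transactions:
        kept = tuple(sorted(i for i in set(t) if counts[i] >= min_support))
        if kept:
            weights[kept] += 1
    return sam(sorted(weights.items()), min_support)


example = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = mine(example, 3)
```

In this sketch `result` maps each frequent item set (as a sorted tuple) to its support. Only sequential passes over sorted lists are used, which is the property that would make such a scheme plausible to run from external storage.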
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Borgelt, C. (2010). Simple Algorithms for Frequent Item Set Mining. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds) Advances in Machine Learning II. Studies in Computational Intelligence, vol 263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05179-1_16
DOI: https://doi.org/10.1007/978-3-642-05179-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05178-4
Online ISBN: 978-3-642-05179-1
eBook Packages: Engineering, Engineering (R0)