Abstract
Most itemset mining approaches are memory-based and run outside the database. When we deal with a data warehouse, however, the tables are far too large to copy into memory. A pure SQL approach, in turn, is quite inefficient, and such implementations rarely take advantage of database programming, even though RDBMS vendors offer many features for controlling and managing the data. We propose a pattern growth mining approach that uses database programming to find all frequent itemsets. The main idea is to avoid one-at-a-time record retrieval from the database, thereby saving both copying and process context switching, as well as expensive joins and table reconstruction. The empirical evaluation shows that our approach runs competitively with the best-known SQL-based itemset mining implementations. The performance evaluation was carried out with SQL Server 2000 (v.8) and T-SQL on several synthetic datasets.
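To make the contrast with one-at-a-time record retrieval concrete, the following minimal sketch counts item supports with a single set-based query that runs inside the database engine, so only the aggregated result crosses into the client. It is an illustration under stated assumptions, not the authors' T-SQL implementation: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server, it covers only the first pass (frequent single items) rather than full pattern growth, and the table name `transactions`, its columns `tid` and `item`, and the function `frequent_items` are hypothetical.

```python
import sqlite3


def frequent_items(conn, min_support):
    """Return (item, support) pairs whose support meets min_support.

    The counting is done by one GROUP BY query inside the database,
    instead of fetching transaction rows one at a time into the client.
    """
    return conn.execute(
        "SELECT item, COUNT(DISTINCT tid) AS support "
        "FROM transactions "
        "GROUP BY item "
        "HAVING COUNT(DISTINCT tid) >= ? "
        "ORDER BY support DESC, item",
        (min_support,),
    ).fetchall()


# Hypothetical toy data: one row per (transaction id, item) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        (1, "a"), (1, "b"), (1, "c"),
        (2, "a"), (2, "c"),
        (3, "a"), (3, "d"),
        (4, "b"), (4, "c"),
    ],
)

result = frequent_items(conn, 2)
print(result)
```

With a minimum support of 2, item `d` (present in a single transaction) is filtered out by the `HAVING` clause before any row reaches the client.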
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alves, R., Belo, O. (2005). Programming Relational Databases for Itemset Mining over Large Transactional Tables. In: Bento, C., Cardoso, A., Dias, G. (eds) Progress in Artificial Intelligence. EPIA 2005. Lecture Notes in Computer Science, vol 3808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11595014_32
DOI: https://doi.org/10.1007/11595014_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30737-2
Online ISBN: 978-3-540-31646-6
eBook Packages: Computer Science, Computer Science (R0)