Shaping SQL-Based Frequent Pattern Mining Algorithms

  • Csaba István Sidló
  • András Lukács
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3933)

Abstract

Integration of data mining and database management systems could significantly ease the process of knowledge discovery in large databases. We consider implementations of frequent itemset mining (FIM) algorithms, in particular pattern-growth algorithms similar to the top-down FP-growth variations, tightly coupled to relational database management systems. Our implementations remain within the confines of conventional relational database facilities such as tables, indices, and SQL operations. We compare our algorithm to the most promising previously proposed SQL-based FIM algorithm. Experiments show that our method performs better in many cases, but still has severe limitations compared to traditional stand-alone implementations of pattern-growth methods. We identify the bottlenecks of our SQL-based pattern-growth methods and investigate the applicability of tightly coupled algorithms in practice.
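
To make the idea of mining with only tables, indices, and SQL operations concrete, the following is a minimal sketch of support counting expressed entirely in SQL. It is not the pattern-growth algorithm of the paper: it uses a simple join-based count of frequent items and item pairs, and the table names (`trans`, `f1`), the function name `frequent_pairs`, and the use of SQLite through Python's `sqlite3` module are all assumptions made for illustration.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return (frequent items, frequent item pairs) computed with plain SQL.

    Illustrative only: a join-based support count, not a pattern-growth method.
    """
    conn = sqlite3.connect(":memory:")

    # Transactions are stored in the usual (tid, item) relational layout.
    conn.execute("CREATE TABLE trans (tid INTEGER, item TEXT)")
    conn.execute("CREATE INDEX trans_item ON trans (item, tid)")
    conn.executemany(
        "INSERT INTO trans VALUES (?, ?)",
        [(tid, item)
         for tid, items in enumerate(transactions)
         for item in set(items)],  # set() drops duplicate items per transaction
    )

    # Frequent 1-itemsets: items occurring in at least min_support transactions.
    conn.execute("CREATE TABLE f1 (item TEXT PRIMARY KEY, support INTEGER)")
    conn.execute(
        "INSERT INTO f1 "
        "SELECT item, COUNT(*) FROM trans GROUP BY item HAVING COUNT(*) >= ?",
        (min_support,),
    )

    # Frequent 2-itemsets: self-join on tid, restricted to frequent items.
    pairs = conn.execute(
        """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM trans a
        JOIN trans b ON a.tid = b.tid AND a.item < b.item
        JOIN f1 fa ON fa.item = a.item
        JOIN f1 fb ON fb.item = b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
        """,
        (min_support,),
    ).fetchall()

    items = conn.execute(
        "SELECT item, support FROM f1 ORDER BY item"
    ).fetchall()
    conn.close()
    return items, pairs
```

Calling `frequent_pairs([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]], 2)` counts every item and pair inside the database engine; the client only issues the statements and reads the result rows, which is the sense of "tightly coupled" illustrated here.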

Keywords

Association Rule, Minimum Support, Association Rule Mining, Support Counting, Data Mining Task
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Csaba István Sidló (1)
  • András Lukács (2)
  1. Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
  2. Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary
