Integration of Dataset Scans in Processing Sets of Frequent Itemset Queries

  • Marek Wojciechowski
  • Maciej Zakrzewicz
  • Pawel Boinski
Part of the Intelligent Systems Reference Library book series (ISRL, volume 23)

Abstract

Frequent itemset mining is often regarded as advanced querying, where a user specifies the source dataset and pattern constraints using a given constraint model. In this chapter, we address the problem of processing sets of frequent itemset queries, which brings the ideas of multiple-query optimization to the domain of data mining. From the perspective of practical applications, the most attractive solution to this problem is Common Counting. It executes the queries concurrently using Apriori and integrates the scans of those parts of the database that are shared among the queries. The major advantage of Common Counting over its alternatives is that it can be applied to arbitrarily large batches of queries. If the memory structures of all the queries to be processed by Common Counting do not fit in main memory together, the set of queries has to be partitioned into subsets that are processed in several phases. We formalize the problem of dividing the set of queries for Common Counting as a specific case of hypergraph partitioning and provide a comprehensive overview of the query set partitioning algorithms proposed so far.
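The idea of Common Counting can be illustrated with a minimal Python sketch. It is not the chapter's implementation; it rests on a toy model assumed here. The database is split into disjoint named parts. Each query reads a subset of those parts and carries an absolute minimum support count. Itemsets are sorted tuples. All names (`common_counting`, `apriori_gen`, `parts`, `queries`) are hypothetical. In each Apriori iteration, every database part is read once, and its transactions are used to count the candidates of all still-active queries that share it. For simplicity, candidates are checked by plain subset tests instead of hash trees, and memory limits are ignored. Such limits would force the queries to be split into phases.

```python
from collections import defaultdict
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Apriori candidate generation: join (k-1)-itemsets sharing their first
    k-2 items, then prune candidates that have an infrequent (k-1)-subset."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1]:
                c = a + (b[-1],)  # stays sorted: a < b with a common prefix
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates


def common_counting(parts, queries):
    """Run the Apriori iterations of all queries together (hypothetical model).

    parts:   {part_id: [frozenset of items, ...]}
    queries: {query_id: (set of part_ids, absolute minimum support count)}
    Returns ({query_id: {itemset_tuple: support}}, number of part reads).
    """
    results = {q: {} for q in queries}
    candidates = {q: None for q in queries}  # None marks iteration 1 (all items)
    k, part_reads = 1, 0
    while any(c is None or c for c in candidates.values()):
        counts = {q: defaultdict(int) for q in queries}
        for pid, transactions in parts.items():
            # Active queries whose source dataset includes this part.
            users = [q for q, (qparts, _) in queries.items()
                     if pid in qparts and (candidates[q] is None or candidates[q])]
            if not users:
                continue
            part_reads += 1  # one shared read of the part serves all its users
            for t in transactions:
                for q in users:
                    if candidates[q] is None:
                        for item in t:
                            counts[q][(item,)] += 1
                    else:
                        for c in candidates[q]:
                            if t.issuperset(c):
                                counts[q][c] += 1
        for q, (_, minsup) in queries.items():
            if candidates[q] is not None and not candidates[q]:
                continue  # this query already finished in an earlier iteration
            frequent = {c: n for c, n in counts[q].items() if n >= minsup}
            results[q].update(frequent)
            candidates[q] = apriori_gen(frequent, k + 1) if frequent else set()
        k += 1
    return results, part_reads


# Toy example: q1 reads part A only; q2 reads parts A and B.
parts = {
    "A": [frozenset("abc"), frozenset("ab"), frozenset("ac")],
    "B": [frozenset("bc"), frozenset("abc")],
}
queries = {"q1": ({"A"}, 2), "q2": ({"A", "B"}, 3)}

shared_results, shared_reads = common_counting(parts, queries)
separate_reads = sum(common_counting(parts, {q: spec})[1]
                     for q, spec in queries.items())
print(shared_reads, separate_reads)  # 6 8
```

In this toy run, processing the two queries together takes six part reads. Running each query on its own takes eight. The saving comes from part A, which is read once per iteration for both queries instead of once per query.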

Keywords

Association Rule; Total Execution Time; Frequent Itemset Mining; Minimum Support Threshold; Restricted Candidate List
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marek Wojciechowski (1)
  • Maciej Zakrzewicz (1)
  • Pawel Boinski (1)
  1. Institute of Computing Science, Poznan University of Technology, Poznan, Poland
