Advertisement

Improving Quality of Agglomerative Scheduling in Concurrent Processing of Frequent Itemset Queries

  • Pawel Boinski
  • Konrad Jozwiak
  • Marek Wojciechowski
  • Maciej Zakrzewicz
Part of the Advances in Soft Computing book series (AINSC, volume 35)

Abstract

Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. Recently, a new problem of optimizing processing of batches of frequent itemset queries has been considered. The best technique for this problem proposed so far is Common Counting, which consists in concurrent processing of frequent itemset queries and integrating their database scans. Common Counting requires that data structures of several queries are stored in main memory at the same time. Since in practice memory is limited, the crucial problem is scheduling the queries to Common Counting phases so that the I/O cost is optimized. According to our previous studies, the best algorithm for this task, applicable to large batches of queries, is CCAgglomerative. In this paper we present a novel query scheduling method CCAgglomerativeNoise, built around CCAgglomerative, increasing its chances of finding an optimal solution.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    1. Agrawal, R., Imielinski, T., Swami, A. (1993) Mining Association Rules Between Sets of Items in Large Databases. Proceedings of the 1993 ACM SIG-MOD Conference on Management of Data, Washington, D. C., 207–216Google Scholar
  2. 2.
    2. Agrawal, R., Mehta, M., Shafer, J., Srikant, R., Arning, A., Bollinger, T. (1996) The Quest Data Mining System. Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining, Portland, Oregon, 244–249Google Scholar
  3. 3.
    3. Agrawal, R,., Srikant, R,. (1994) Fast Algorithms for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, 487–499Google Scholar
  4. 4.
    4. Baralis, E., Psaila, G. (1999) Incremental Refinement of Mining Queries. Proceedings of the 1st International Conference on Data Warehousing and Knowledge Discovery, Florence, Italy, 173–182Google Scholar
  5. 5.
    5. Blockeel, H., Dehaspe, L., Demoen, B., Janssens, G., Ramon, J., Vandecasteele, H. (2002) Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs. Journal of Artificial Intelligence Research 16, 135–166zbMATHGoogle Scholar
  6. 6.
    6. Cheung, D. W.-L., Han, J., Ng, V., Wong, C. Y. (1996) Maintenance of discovered association rules in large databases: An incremental updating technique. Proceedings of the 12th International Conference on Data Engineering, NewOrleans, Louisiana, USA, 106–114Google Scholar
  7. 7.
    7. Garey, M., Johnson, D., Stockmeyer, L. (1976) Some simplified NP-complete graph problems. Theoretical Computer Science 1(3), 237–267zbMATHCrossRefMathSciNetGoogle Scholar
  8. 8.
    8. Hart, J.P., Shogan, A.W. (1987) Semi-greedy heuristics: An empirical study. Operations Research Letters 6, 107–114zbMATHCrossRefMathSciNetGoogle Scholar
  9. 9.
    9. Imielinski, T., Mannila, H. (1996) A Database Perspective on Knowledge Discovery. Communications of the ACM 39(11), 58–64CrossRefGoogle Scholar
  10. 10.
    10. Meo, R,. (2003) Optimization of a Language for Data Mining. Proceedings of the ACM Symposium on Applied Computing - Data Mining Track, Melbourne, Florida, USA, 437–444Google Scholar
  11. 11.
    11. Morzy, M., Wojciechowski, M., Zakrzewicz, M. (2005) Optimizing a Sequence of Frequent Pattern Queries. Proceedings of the 7th International Conference on Data Warehousing and Knowledge Discovery, Copenhagen, Denmark, 448–457Google Scholar
  12. 12.
    12. Nag, B., Deshpande, P. M., DeWitt, D. J. (1999) Using a Knowledge Cache for Interactive Discovery of Association Rules. Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, San Diego, California, 244–253Google Scholar
  13. 13.
    13. Sellis, T. (1988) Multiple-query optimization. ACM Transactions on Database Systems 13(1), 23–52CrossRefGoogle Scholar
  14. 14.
    14. Wojciechowski, M., Zakrzewicz, M. (2003) Evaluation of Common Counting Method for Concurrent Data Mining Queries. Proceedings of 7th East European Conference on Advances in Databases and Information Systems, Dresden, Germany, 76–87Google Scholar
  15. 15.
    15. Wojciechowski, M., Zakrzewicz, M. (2004) Evaluation of the Mine Merge Method for Data Mining Query Processing. Proceedings of the 8th East European Conference on Advances in Databases and Information Systems, Budapest, Hungary, 78–88Google Scholar
  16. 16.
    16. Wojciechowski, M., Zakrzewicz, M. (2005) On Multiple Query Optimization in Data Mining. Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hanoi, Vietnam, 696–701Google Scholar

Copyright information

© Springer 2006

Authors and Affiliations

  • Pawel Boinski
    • 1
  • Konrad Jozwiak
    • 1
  • Marek Wojciechowski
    • 1
  • Maciej Zakrzewicz
    • 1
  1. 1.Poznan University of TechnologyPoznanPoland

Personalised recommendations