
Applied Intelligence, 31:332

Strategies for complex data cube queries

  • Shichao Zhang
  • Rifeng Wang
  • Zhi Jin

Abstract

This paper proposes a method for computing holistic multi-feature cube (MF-Cube) queries that exploits the characteristics of MF-Cubes. Three simple yet efficient strategies are designed to optimize dependent complex aggregates at multiple granularities for complex data-mining queries within data cubes. The first strategy computes holistic MF-Cube queries using the Part Distributive Aggregate Property (PDAP). The second, dynamic subset data selection (the iceberg query technique), gains further efficiency by reducing the size of the materialized data cubes. A third strategy extends this efficiency further by combining the second with chunk-based caching, which reuses the output of previous queries. Combining these three strategies, we design an algorithm called PDIC (Part Distributive Iceberg Chunk). We evaluate the algorithm experimentally on synthetic and real-world datasets and show that our approach delivers up to approximately twice the performance of traditional computation methods.
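
To make the iceberg query technique named in the abstract concrete, the following is a minimal sketch of the general idea only, not the paper's PDIC algorithm. It assumes a COUNT measure and a fixed minimum-count threshold. The function name `iceberg_cube`, the parameter `min_count`, and the sample rows are all hypothetical, chosen for illustration. The sketch enumerates every group-by naively and keeps only the cells that meet the threshold, which shows why the materialized cube shrinks.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, dims, min_count):
    """Naive iceberg cube over COUNT (illustrative; not the paper's PDIC).

    Returns {(group_by_dims, cell_values): count}, keeping only cells
    whose count is at least min_count.
    """
    result = {}
    # Enumerate every group-by: all subsets of the dimension list.
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            # Count the rows that fall into each cell of this group-by.
            counts = Counter(tuple(row[d] for d in group) for row in rows)
            for cell, n in counts.items():
                if n >= min_count:  # the iceberg condition
                    result[(group, cell)] = n
    return result


# Hypothetical sample data, invented for this sketch.
rows = [
    {"city": "Sydney", "item": "pen", "year": 2007},
    {"city": "Sydney", "item": "pen", "year": 2008},
    {"city": "Sydney", "item": "ink", "year": 2008},
    {"city": "Guilin", "item": "pen", "year": 2008},
]

cube = iceberg_cube(rows, ("city", "item", "year"), min_count=2)
full = iceberg_cube(rows, ("city", "item", "year"), min_count=1)
print(len(cube), "of", len(full), "cells kept")
```

On this toy input the threshold keeps 7 of the 20 cells in the full cube. Practical iceberg-cube algorithms avoid enumerating the discarded cells in the first place by pruning during computation; this sketch filters after counting purely for clarity.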

Keywords

Multidimensional data, Data cube query, Data cube modeling


Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Faculty of Information Technology, University of Technology Sydney, Sydney, Australia
  2. College of CS and IT, Guangxi Normal University, Guilin, China
  3. Department of Computer Science, Guangxi University of Technology, Liuzhou, China
  4. School of EE and CS, Peking University, Beijing, China
