Abstract
This paper proposes a method for computing holistic multi-feature cube (MF-Cube) queries that exploits the characteristics of MF-Cubes. We design three simple yet efficient strategies to optimize dependent complex aggregates at multiple granularities for complex data-mining queries over data cubes. The first strategy computes holistic MF-Cube queries using the Part Distributive Aggregate Property (PDAP). The second, dynamic subset data selection (the iceberg query technique), gains further efficiency by reducing the size of the materialized data cubes. The third, chunk-based caching, extends this gain by reusing the output of previous queries. Combining the three strategies yields an algorithm we call PDIC (Part Distributive Iceberg Chunk). We evaluate PDIC experimentally on synthetic and real-world datasets and show that it runs up to approximately twice as efficiently as traditional computation methods.
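To make the iceberg idea in the abstract concrete, the following is a minimal sketch, not the PDIC algorithm and not an implementation of PDAP, neither of which is specified in the abstract. It assumes a toy fact table with hypothetical dimensions `city` and `year` and a hypothetical measure `sales`, and it uses only the standard distributive aggregates SUM and COUNT: a coarser cuboid is derived from a finer one, and an iceberg condition (a minimum COUNT) then discards cells that fall below the threshold. All function names are illustrative.

```python
from collections import defaultdict


def base_cuboid(rows, dims, measure):
    """Aggregate raw rows into the finest-granularity cells as (sum, count)."""
    cells = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = tuple(row[d] for d in dims)
        cells[key][0] += row[measure]
        cells[key][1] += 1
    return {k: (s, c) for k, (s, c) in cells.items()}


def roll_up(cells, dims, keep):
    """Derive a coarser cuboid from a finer one.

    This works because SUM and COUNT are distributive: the aggregate of a
    group equals the aggregate of its sub-groups' aggregates.
    """
    idx = [dims.index(d) for d in keep]
    out = defaultdict(lambda: [0.0, 0])
    for key, (s, c) in cells.items():
        sub = tuple(key[i] for i in idx)
        out[sub][0] += s
        out[sub][1] += c
    return {k: (s, c) for k, (s, c) in out.items()}


def iceberg(cells, min_count):
    """Keep only the cells whose COUNT meets the iceberg threshold."""
    return {k: v for k, v in cells.items() if v[1] >= min_count}


# Hypothetical toy data, for illustration only.
rows = [
    {"city": "A", "year": 2001, "sales": 10.0},
    {"city": "A", "year": 2001, "sales": 20.0},
    {"city": "A", "year": 2002, "sales": 5.0},
    {"city": "B", "year": 2001, "sales": 7.0},
]
dims = ["city", "year"]

base = base_cuboid(rows, dims, "sales")
by_city = roll_up(base, dims, ["city"])   # rolled up from the unpruned base
frequent_cities = iceberg(by_city, 2)
frequent_cells = iceberg(base, 2)
```

Note that the sketch rolls up from the unpruned base cuboid and filters afterwards. Pruning the fine cells first would drop counts that coarser cells still need. Holistic aggregates such as the median cannot be rolled up this way, which is the difficulty the paper's strategies address.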
Cite this article
Zhang, S., Wang, R. & Jin, Z. Strategies for complex data cube queries. Appl Intell 31, 332–346 (2009). https://doi.org/10.1007/s10489-008-0130-2
DOI: https://doi.org/10.1007/s10489-008-0130-2