Complex aggregation at multiple granularities
Datacube queries compute simple aggregates at multiple granularities. In this paper we examine the more general and useful problem of computing a complex subquery involving multiple dependent aggregates at multiple granularities. We call such queries “multi-feature cubes.” An example is “Broken down by all combinations of month and customer, find the fraction of the total sales in 1996 of a particular item due to suppliers supplying within 10% of the minimum price (within the group), showing all subtotals across each dimension.” We classify multi-feature cubes based on the extent to which fine granularity results can be used to compute coarse granularity results; this classification includes distributive, algebraic and holistic multi-feature cubes. We provide syntactic sufficient conditions to determine when a multi-feature cube is either distributive or algebraic. This distinction is important because, as we show, existing datacube evaluation algorithms can be used to compute multi-feature cubes that are distributive or algebraic, without any increase in I/O complexity. We evaluate the CPU performance of computing multi-feature cubes using the datacube evaluation algorithm of Ross and Srivastava. Using a variety of synthetic, benchmark and real-world data sets, we demonstrate that the CPU cost of evaluating distributive multi-feature cubes is comparable to that of evaluating simple datacubes. We also show that a variety of holistic multi-feature cubes can be evaluated with a manageable overhead compared to the distributive case.
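The example query above can be made concrete with a small sketch. This is a naive illustration, not the evaluation algorithm of Ross and Srivastava: it recomputes every granularity from the base rows rather than reusing fine-granularity results. The function name `multi_feature_cube`, the row layout, and the sample data are hypothetical, and the rows are assumed to be already restricted to 1996 and to the item of interest.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("month", "customer")  # assumed grouping dimensions


def multi_feature_cube(rows, dims=DIMS):
    """For every subset of dims, return the fraction of each group's sales
    that come from rows priced within 10% of the group's minimum price."""
    result = {}
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            # Dimensions not in `kept` are rolled up into the subtotal "ALL".
            groups = defaultdict(list)
            for r in rows:
                key = tuple(r[d] if d in kept else "ALL" for d in dims)
                groups[key].append(r)
            for key, members in groups.items():
                # First aggregate: minimum price within the group.
                min_price = min(r["price"] for r in members)
                # Dependent aggregate: sales at prices within 10% of that minimum.
                near_min = sum(r["sales"] for r in members
                               if r["price"] <= 1.1 * min_price)
                # Assumes each group has positive total sales.
                total = sum(r["sales"] for r in members)
                result[key] = near_min / total
    return result


# Hypothetical sample data.
rows = [
    {"month": 1, "customer": "A", "price": 10.0, "sales": 100.0},
    {"month": 1, "customer": "A", "price": 12.0, "sales": 300.0},
    {"month": 1, "customer": "B", "price": 20.0, "sales": 50.0},
    {"month": 2, "customer": "A", "price": 10.5, "sales": 200.0},
]
cube = multi_feature_cube(rows)
```

Here `cube[(1, "A")]` is the fraction for month 1 and customer A, `cube[("ALL", "A")]` is the subtotal for customer A across all months, and `cube[("ALL", "ALL")]` is the grand total. The second aggregate depends on the first (the group minimum), which is what makes the query a multi-feature cube rather than a simple datacube.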
Keywords: Total Sales, Minimum Price, Aggregate Function, Coarse Granularity, Multiple Granularity
- [AAD+96] S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proceedings of VLDB, pages 506–521, 1996.
- [AGS97] R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. In Proceedings of IEEE ICDE, 1997.
- [CR96] D. Chatziantoniou and K. A. Ross. Querying multiple features of groups in relational databases. In Proceedings of VLDB, pages 295–306, 1996.
- [GBLP96] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Datacube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In Proceedings of IEEE ICDE, pages 152–159, 1996. Also available as Microsoft Technical Report MSR-TR-95-22.
- [HWL94] C. J. Hahn, S. G. Warren, and J. London. Edited synoptic cloud reports from ships and land stations over the globe, 1982–1991. Available from http://cdiac.esd.ornl.gov/cdiac/ndps/ndp026b.html, 1994.
- [LW96] C. Li and X. S. Wang. A data model for supporting on-line analytical processing. In Proceedings of CIKM, pages 81–88, 1996.
- [RS97] K. A. Ross and D. Srivastava. Fast computation of sparse datacubes. In Proceedings of VLDB, pages 116–125, 1997.
- [RSC97] K. A. Ross, D. Srivastava, and D. Chatziantoniou. Complex aggregation at multiple granularities. AT&T Technical Report, 1997.
- [Tra95] Transaction Processing Performance Council (TPC), 777 N. First Street, Suite 600, San Jose, CA 95112, USA. TPC Benchmark D (Decision Support), May 1995.
- [ZDN97] Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In Proceedings of ACM SIGMOD, pages 159–170, 1997.