Abstract
Computing data cubes requires the aggregation of measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large sizes of input data sets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of the queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited for computing aggregates in cubes whose data sets are sparse. SST stores only the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes), whose aggregate values are above a threshold, while sparse cubes are stored on disk. This allows a fast, accurate approximation for queries. If users desire more refined answers, the related sparse cubes are aggregated as well. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and the analysis of streaming data. Experimental results demonstrate the excellent performance and good scalability of our approach.
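The dense/sparse split described in the abstract can be illustrated with a small sketch. This is not the SST structure or the CUPS algorithm themselves, whose details are not given here; it is a toy model under stated assumptions: cells are held in plain dictionaries, the "disk" is just a second dictionary, and the names `build_cells`, `split_by_threshold`, and `answer`, as well as the sample records and the threshold value, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # wildcard marking a dimension that has been aggregated away


def build_cells(records, num_dims):
    """Sum the measure for every non-empty cell of every group-by."""
    cells = defaultdict(int)
    for *dims, measure in records:
        # Each subset of dimensions that is kept fixed defines one group-by.
        for k in range(num_dims + 1):
            for kept in combinations(range(num_dims), k):
                key = tuple(dims[i] if i in kept else ALL
                            for i in range(num_dims))
                cells[key] += measure
    return cells


def split_by_threshold(cells, threshold):
    """Separate dense cells (in memory) from sparse ones (stand-in for disk)."""
    dense = {k: v for k, v in cells.items() if v > threshold}
    sparse = {k: v for k, v in cells.items() if v <= threshold}
    return dense, sparse


def answer(query_cells, dense, sparse, refine=False):
    """Sum the queried cells: dense cells only, or dense plus sparse if refined."""
    total = sum(dense.get(c, 0) for c in query_cells)
    if refine:
        # Assumption: in a real system this step would read from disk.
        total += sum(sparse.get(c, 0) for c in query_cells)
    return total


# Hypothetical sample data: (region, product, sales).
records = [
    ("east", "tv", 50),
    ("east", "tv", 40),
    ("east", "radio", 5),
    ("west", "tv", 60),
    ("west", "radio", 3),
]
cells = build_cells(records, num_dims=2)
dense, sparse = split_by_threshold(cells, threshold=10)

# A query covering two cells: all products sold in the east, listed per product.
query = [("east", "tv"), ("east", "radio")]
approximate = answer(query, dense, sparse)             # dense cells only
refined = answer(query, dense, sparse, refine=True)    # adds sparse cells
print(approximate, refined)
```

In this toy data the approximate answer misses only the small `("east", "radio")` cell, which is the intuition behind answering from the in-memory dense cells first and touching the sparse cells only on request.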
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fu, L. (2004). Efficient Evaluation of Sparse Data Cubes. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_34
DOI: https://doi.org/10.1007/978-3-540-27772-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22418-1
Online ISBN: 978-3-540-27772-9
eBook Packages: Springer Book Archive