Efficient Evaluation of Sparse Data Cubes

Fu, Lixin

doi:10.1007/978-3-540-27772-9_34

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3129)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Computing data cubes requires the aggregation of measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large sizes of input datasets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited to computing aggregates in cubes whose data sets are sparse. SST stores only the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes) whose aggregate values are above a threshold. Sparse cubes are stored on disk. This allows a fast, accurate approximation for queries. If users desire more refined answers, related sparse cubes are aggregated. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and analysis of streaming data. Experimental results demonstrate the excellent performance and good scalability of our approach.
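
The dense/sparse split described in the abstract can be illustrated with a minimal Python sketch. This is not the SST data structure or the CUPS algorithm from the paper: it uses flat dictionaries instead of a tree, assumes SUM as the aggregate over non-negative measures, and keeps the "sparse" cells in a second in-memory dictionary that merely stands in for disk storage. All names (`build_cube`, `split_by_threshold`, `query`, `ALL`) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # wildcard: the dimension has been aggregated away


def build_cube(records, n_dims):
    """Aggregate each record's measure into every non-empty cube cell."""
    cells = defaultdict(float)
    for *dims, measure in records:
        # A record contributes to one cell per subset of dimensions kept.
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                key = tuple(dims[i] if i in kept else ALL for i in range(n_dims))
                cells[key] += measure
    return dict(cells)


def split_by_threshold(cells, threshold):
    """Separate dense cells (aggregate >= threshold) from sparse ones."""
    dense = {k: v for k, v in cells.items() if v >= threshold}
    sparse = {k: v for k, v in cells.items() if v < threshold}
    return dense, sparse


def query(constraints, dense, sparse=None):
    """Sum the cells matching per-dimension constraints.

    Each constraint is either ALL or a set of allowed values. With only
    `dense`, the result is a fast approximation (a lower bound here, since
    measures are non-negative); passing `sparse` refines it to the exact sum.
    """
    def matches(key):
        return all(
            (c == ALL and v == ALL) or (c != ALL and v in c)
            for c, v in zip(constraints, key)
        )

    total = sum(v for k, v in dense.items() if matches(k))
    if sparse is not None:
        total += sum(v for k, v in sparse.items() if matches(k))
    return total


# Hypothetical records: (region, product, sales)
records = [
    ("east", "tv", 5.0),
    ("east", "tv", 4.0),
    ("east", "radio", 1.0),
    ("west", "tv", 2.0),
]
cells = build_cube(records, n_dims=2)
dense, sparse = split_by_threshold(cells, threshold=3.0)

tv_sales = ({"east", "west"}, {"tv"})
approx = query(tv_sales, dense)          # dense cells only
exact = query(tv_sales, dense, sparse)   # refined with sparse cells
print(approx, exact)
```

In this toy data set the approximate answer omits the small `("west", "tv")` cell, and the refined answer adds it back from the sparse store, mirroring the two-stage querying the abstract describes.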

Author information

Authors and Affiliations

Division of Computer Science, Department of Mathematical Sciences, University of North Carolina, Greensboro, 383 Bryan Building, P.O. Box 26170, Greensboro, NC 27402-6170, U.S.A.
Lixin Fu

Editor information

Editors and Affiliations

Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Qing Li
Northeastern University, Shenyang, Liaoning, 110004, China
Guoren Wang
Dept. of Computer Science & Technology, Tsinghua University, Beijing, China
Ling Feng

About this paper

Cite this paper

Fu, L. (2004). Efficient Evaluation of Sparse Data Cubes. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_34

DOI: https://doi.org/10.1007/978-3-540-27772-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22418-1
Online ISBN: 978-3-540-27772-9
eBook Packages: Springer Book Archive
