
Efficient Evaluation of Sparse Data Cubes

  • Conference paper
Advances in Web-Age Information Management (WAIM 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3129))


Abstract

Computing data cubes requires aggregating measures over arbitrary combinations of dimensions in a data set. Efficient data cube evaluation remains challenging because of the potentially very large size of input data sets (e.g., in the data warehousing context), the well-known curse of dimensionality, and the complexity of the queries that need to be supported. This paper proposes a new dynamic data structure called SST (Sparse Statistics Trees) and a novel, interactive, and fast cube evaluation algorithm called CUPS (Cubing by Pruning SST), which is especially well suited to computing aggregates in cubes whose data sets are sparse. SST stores only the aggregations of non-empty cube cells instead of the detailed records. Furthermore, it retains in memory the dense cubes (a.k.a. iceberg cubes), whose aggregate values are above a threshold. Sparse cubes are stored on disk. This allows a fast, accurate approximation for queries. If users desire more refined answers, the related sparse cubes are aggregated as well. SST is incrementally maintainable, which makes CUPS suitable for data warehousing and for the analysis of streaming data. Experimental results demonstrate the excellent performance and good scalability of our approach.
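
The dense/sparse split described in the abstract can be illustrated with a small Python sketch. This is not the SST data structure or the CUPS algorithm, whose details are not given on this page. It is a toy model under stated assumptions: cells are kept in flat dictionaries rather than a tree, the aggregate is a SUM over non-negative measures, the "disk" store is simulated by a second in-memory dictionary, and every name (`build_cells`, `split_by_threshold`, `query`, `ALL`) is hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product

ALL = "*"  # wildcard marking a dimension that has been aggregated away


def build_cells(records, num_dims):
    """Aggregate SUM(measure) for every non-empty cell of every group-by.

    Each record is (dim_1, ..., dim_n, measure). Only cells that receive
    at least one record are materialized, so empty cells cost nothing.
    """
    cells = defaultdict(float)
    for *dims, measure in records:
        for k in range(num_dims + 1):
            for kept in combinations(range(num_dims), k):
                key = tuple(dims[i] if i in kept else ALL
                            for i in range(num_dims))
                cells[key] += measure
    return cells


def split_by_threshold(cells, threshold):
    """Separate dense (iceberg) cells from sparse ones.

    Assumption: 'dense' means aggregate >= threshold. The sparse dict
    stands in for the on-disk store mentioned in the abstract.
    """
    dense = {k: v for k, v in cells.items() if v >= threshold}
    sparse = {k: v for k, v in cells.items() if v < threshold}
    return dense, sparse


def query(dense, sparse, constraints, refine=False):
    """Sum the cells selected by per-dimension value lists.

    With refine=False only the in-memory dense cells are consulted,
    giving a fast approximation. With refine=True the sparse cells
    are added as well, giving the exact answer.
    """
    total = 0.0
    for key in product(*constraints):
        total += dense.get(key, 0.0)
        if refine:
            total += sparse.get(key, 0.0)
    return total


# Illustrative data: (region, product, sales). Values are made up.
records = [
    ("east", "tv", 10.0),
    ("east", "tv", 5.0),
    ("east", "radio", 1.0),
    ("west", "tv", 2.0),
]
dense, sparse = split_by_threshold(build_cells(records, 2), threshold=5.0)
approx = query(dense, sparse, [["east", "west"], ["tv"]])
exact = query(dense, sparse, [["east", "west"], ["tv"]], refine=True)
```

With this made-up data, the query for TV sales in both regions touches the cells `("east", "tv")` and `("west", "tv")`. Only the first is dense, so `approx` is 15.0, and refining with the sparse cell raises the answer to the exact total of 17.0. Because the measures are non-negative, the approximation in this sketch is always a lower bound on the exact answer.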




© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fu, L. (2004). Efficient Evaluation of Sparse Data Cubes. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_34


  • DOI: https://doi.org/10.1007/978-3-540-27772-9_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22418-1

  • Online ISBN: 978-3-540-27772-9

  • eBook Packages: Springer Book Archive
