A Pareto Model for OLAP View Size Estimation

Information Systems Frontiers

Abstract

On-Line Analytical Processing (OLAP) aims to extract useful information quickly from the large volumes of data held in a data warehouse. Pre-aggregation is a useful strategy for reducing query response time. However, it is usually impossible to pre-aggregate along every combination of dimensions, because the multi-dimensional nature of the data leads to a combinatorial explosion in both the number and the potential storage size of the aggregates. Aggregates must therefore be chosen selectively, and the cost/benefit analysis behind that choice requires estimating the storage each candidate aggregate would need. We present an original algorithm, based on the Pareto distribution model, for estimating the number of rows in an aggregate. We test the Pareto Model Algorithm empirically against four published algorithms and conclude that it is consistently the best of the algorithms tested for estimating view size.
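To make the combinatorial explosion concrete, the following minimal sketch counts the candidate aggregates (views) of a cube and gives a rough row estimate for one of them. It is not the Pareto Model Algorithm, whose details are not given in this abstract. It assumes a cube without dimension hierarchies, so that d dimensions yield 2^d views, and it uses the classical uniform-distribution estimate commonly attributed to Cardenas as a simple baseline. The function names, dimension names, and cardinalities are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def enumerate_views(dimensions: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of the dimensions; each subset is one candidate view.

    Assumption: no dimension hierarchies, so d dimensions give 2**d views.
    """
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)


def uniform_view_size_estimate(fact_rows: int, cardinalities: Sequence[int]) -> float:
    """Expected number of rows in a view if fact rows fall uniformly at random
    into the view's cells (the uniform baseline, NOT the Pareto model).

    With v cells and n fact rows the expectation is v * (1 - (1 - 1/v)**n).
    """
    cells = math.prod(cardinalities)  # v: product of dimension cardinalities
    if fact_rows <= 0:
        return 0.0
    if cells == 1:
        return 1.0  # the fully aggregated view always has one row
    # log1p/expm1 keep the result accurate when v is very large.
    return cells * -math.expm1(fact_rows * math.log1p(-1.0 / cells))


if __name__ == "__main__":
    dims = ["product", "store", "day", "customer"]  # hypothetical dimensions
    views = list(enumerate_views(dims))
    print(len(views), "candidate views for", len(dims), "dimensions")

    # Hypothetical view grouped by product (1,000 values) and store (50 values)
    # over a fact table of 100,000 rows.
    print(round(uniform_view_size_estimate(100_000, [1_000, 50])))
```

Because the uniform assumption ignores skew, it tends to overestimate view size on skewed data; skewed distributions are the case a Pareto-based model is intended to address.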




Nadeau, T.P., Teorey, T.J. A Pareto Model for OLAP View Size Estimation. Information Systems Frontiers 5, 137–147 (2003). https://doi.org/10.1023/A:1022693305401
