A Pareto Model for OLAP View Size Estimation

Nadeau, Thomas P.; Teorey, Toby J.

doi:10.1023/A:1022693305401

Published: April 2003

Volume 5, pages 137–147, (2003)

Information Systems Frontiers


Abstract

On-Line Analytical Processing (OLAP) aims at gaining useful information quickly from large amounts of data residing in a data warehouse. To improve the quickness of response to queries, pre-aggregation is a useful strategy. However, it is usually impossible to pre-aggregate along all combinations of the dimensions. The multi-dimensional aspects of the data lead to combinatorial explosion in the number and potential storage size of the aggregates. We must selectively pre-aggregate. Cost/benefit analysis involves estimating the storage requirements of the aggregates in question. We present an original algorithm for estimating the number of rows in an aggregate based on the Pareto distribution model. We test the Pareto Model Algorithm empirically against four published algorithms, and conclude the Pareto Model Algorithm is consistently the best of these algorithms for estimating view size.
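To make the combinatorial explosion and the estimation problem concrete, here is a minimal Python sketch. It is not the Pareto Model Algorithm, which this abstract does not specify. It only counts the candidate views for a set of dimensions (ignoring hierarchies) and applies the classic uniform-distribution estimate from Cardenas (1975, in the reference list) for the expected number of distinct rows in one view. The function names and the uniformity assumption are illustrative choices, not taken from the article.

```python
from itertools import combinations


def count_views(num_dimensions: int) -> int:
    """Number of group-by combinations over flat dimensions (no hierarchies)."""
    return 2 ** num_dimensions


def enumerate_views(dimensions):
    """Yield every subset of the dimension names, starting with the empty group-by."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)


def cardenas_estimate(num_rows: int, num_cells: int) -> float:
    """Expected number of distinct cells occupied when num_rows fact rows
    fall uniformly at random into num_cells possible cells.

    Assumption: uniform, independent placement. Skewed data would
    generally occupy fewer cells than this estimate suggests.
    """
    return num_cells * (1.0 - (1.0 - 1.0 / num_cells) ** num_rows)


if __name__ == "__main__":
    dims = ["product", "store", "day"]  # hypothetical dimension names
    print(count_views(len(dims)), "candidate views")
    for view in enumerate_views(dims):
        print(view)
    # Hypothetical sizes: 1,000,000 fact rows, 50,000 possible cells in a view.
    print(round(cardenas_estimate(1_000_000, 50_000)))
```

With d flat dimensions the number of candidate views doubles for every dimension added, which is why a cheap per-view row estimate matters for deciding what to pre-aggregate.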


References

Cardenas A. Analysis and performance of inverted database structures. Communications of the ACM 1975;18(5):253-263.
Charikar M, Chaudhuri S, Motwani R, Narasayya V. Towards estimation error guarantees for distinct values. In: Proceedings of the Nineteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'00), Dallas, 2000;268-279.
Chaudhuri S, Motwani R, Narasayya V. Random sampling for histogram construction: How much is enough? In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD'98), Seattle, 1998;436-447.
DeGroot M. Optimal Statistical Decisions. McGraw-Hill Book Company, 1970.
Faloutsos C, Matias Y, Silberschatz A. Modeling skewed distributions using multifractals and the '80-20' law. In: Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB'96), Mumbai, 1996;307-317.
Flajolet P, Martin G. Probabilistic counting algorithms for database applications. Journal of Computer and System Sciences 1985;31:182-209.
Gibbons P. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB'01), Roma, 2001;541-550.
Harinarayan V, Rajaraman A, Ullman J. Implementing data cubes efficiently. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD'96), Montreal, 1996;205-216.
Kimball R. The Data Warehouse Toolkit. John Wiley, 1996.
Nadeau T, Runapongsa K, Teorey T. Binomial multifractal curve fitting for view size estimation in OLAP. In: SCI 2001 Proceedings, Vol. II, Information Systems, Orlando, 2001;194-199.
Nadeau T, Teorey T. A Pareto Model for OLAP view size estimation. In: Proceedings of CASCON 2001, Toronto, 2001;1-13.
Runapongsa K, Nadeau T, Teorey T. Storage estimation for multidimensional aggregates in OLAP. In: Proceedings of CASCON 1999, Toronto, 1999;40-54.
Shukla A, Deshpande P, Naughton J, Ramasamy K. Storage estimation for multidimensional aggregates in the presence of hierarchies. In: Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB'96), Mumbai, 1996;522-531.
Zipf G. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Cambridge: Addison Wesley, 1949.


Author information

Authors and Affiliations

Computer Science and Engineering Division (CSE), Department of Electrical Engineering and Computer Science (EECS), The University of Michigan, 1301 Beal Avenue, Ann Arbor, MI, 48109-2122, USA
Thomas P. Nadeau



About this article

Cite this article

Nadeau, T.P., Teorey, T.J. A Pareto Model for OLAP View Size Estimation. Information Systems Frontiers 5, 137–147 (2003). https://doi.org/10.1023/A:1022693305401


Issue Date: April 2003
DOI: https://doi.org/10.1023/A:1022693305401
