Estimation of View Size Using Sampling Techniques

  • Conference paper
Innovations in Computational Intelligence and Computer Vision

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1189)

Abstract

Online analytical systems based on multidimensional data models give analysts and decision makers fast, interactive responses to queries posed against very large databases. Materialized views can speed up the execution of many queries, but materializing every possible view is constrained by storage space and maintenance cost, so a subset of views must be selected for materialization. Algorithms that select the views to materialize need the size of each view as input. However, determining the actual number of rows in a view can itself be computationally very expensive because of the large sizes of the tables and views; counting the rows of every view takes considerable time in an extremely large database environment. Methods that produce a good estimate of a view's size in reasonable time, instead of computing its actual size, are therefore very important. We explore the use of sampling to estimate the size of views. In this paper, we propose a hybrid estimator that takes into account the degree of skew in the data and combines the Jackknife estimator with Schlosser’s estimator and the guaranteed error estimator to estimate view size more accurately. The proposed hybrid estimator has been used to estimate view sizes, and simulation results show better estimates than those of the individual estimators.
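
As a rough illustration of the estimators named in the abstract, the sketch below treats the size of an aggregate view as the number of distinct grouping-key values and estimates that number from a uniform row sample. The three formulas are the commonly published forms of the guaranteed error estimator, the first-order unsmoothed jackknife, and Schlosser's estimator. The skew measure, the thresholds, and the selection rule in `hybrid_estimate` are hypothetical stand-ins chosen for illustration; they are not the hybrid rule proposed in the paper, which the abstract does not specify.

```python
import math
from collections import Counter


def frequency_profile(sample):
    """Return {i: f_i}: f_i is the number of distinct values seen exactly i times."""
    return Counter(Counter(sample).values())


def gee(sample, population_size):
    """Guaranteed error estimator: sqrt(N/n) * f_1 + sum of f_i for i >= 2."""
    f = frequency_profile(sample)
    n = len(sample)
    repeated = sum(count for i, count in f.items() if i >= 2)
    return math.sqrt(population_size / n) * f.get(1, 0) + repeated


def jackknife(sample, population_size):
    """First-order unsmoothed jackknife: d / (1 - (1 - q) * f_1 / n), q = n/N."""
    f = frequency_profile(sample)
    n = len(sample)
    d = sum(f.values())  # distinct values observed in the sample
    q = n / population_size
    return d / (1 - (1 - q) * f.get(1, 0) / n)


def schlosser(sample, population_size):
    """Schlosser's estimator: d + f_1 * sum((1-q)^i f_i) / sum(i q (1-q)^(i-1) f_i)."""
    f = frequency_profile(sample)
    n = len(sample)
    d = sum(f.values())
    f1 = f.get(1, 0)
    if f1 == 0:
        return float(d)  # the correction term vanishes without singletons
    q = n / population_size
    numerator = sum((1 - q) ** i * count for i, count in f.items())
    denominator = sum(i * q * (1 - q) ** (i - 1) * count for i, count in f.items())
    return d + f1 * numerator / denominator


def sample_skew(sample):
    """Hypothetical skew measure: squared coefficient of variation of the
    per-value counts in the sample (an assumption, not taken from the paper)."""
    counts = list(Counter(sample).values())
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return variance / mean ** 2


def hybrid_estimate(sample, population_size, low=0.5, high=2.0):
    """Illustrative selection rule with assumed thresholds `low` and `high`:
    jackknife for low skew, GEE for moderate skew, Schlosser for high skew."""
    skew = sample_skew(sample)
    if skew < low:
        estimate = jackknife(sample, population_size)
    elif skew < high:
        estimate = gee(sample, population_size)
    else:
        estimate = schlosser(sample, population_size)
    # Sanity bounds: at least the distinct values seen, at most the table size.
    d = len(set(sample))
    return min(max(estimate, d), population_size)


if __name__ == "__main__":
    # Grouping keys of 8 rows sampled from a hypothetical 80-row base table.
    keys = [1, 1, 2, 3, 3, 3, 4, 5]
    print(gee(keys, 80), jackknife(keys, 80), schlosser(keys, 80))
    print(hybrid_estimate(keys, 80))
```

In practice the sample would hold the grouping-attribute values (for example, tuples of dimension keys) of rows drawn uniformly from the base table, and the thresholds would need tuning against data with known view sizes.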

Corresponding author

Correspondence to Madhu Bhan.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bhan, M., Rajanikanth, K. (2021). Estimation of View Size Using Sampling Techniques. In: Sharma, M.K., Dhaka, V.S., Perumal, T., Dey, N., Tavares, J.M.R.S. (eds) Innovations in Computational Intelligence and Computer Vision. Advances in Intelligent Systems and Computing, vol 1189. Springer, Singapore. https://doi.org/10.1007/978-981-15-6067-5_31
