Abstract
Online analytical systems based on multidimensional data models give analysts and decision makers fast, interactive responses to queries posed against very large databases. Materialized views can speed up the execution of many queries, but storage space and maintenance cost make it infeasible to materialize every possible view, so a subset of views must be selected. View selection algorithms take the size of each view as input. Determining the actual number of rows in a view, however, can itself be computationally expensive and time-consuming in an extremely large database environment because the underlying tables and views are so large. Methods that produce a good estimate of a view's size in reasonable time, rather than computing its exact size, are therefore very important. We explore the use of sampling to estimate view sizes. In this paper, we propose a hybrid estimator that takes into account the degree of skew in the data and combines the Jackknife estimator with Schlosser’s estimator and the guaranteed error estimator to estimate view size more accurately. Simulation results show that the proposed hybrid estimator yields better estimates than the individual estimators.
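As an illustration of the idea, the sketch below estimates the number of distinct rows in a view from a uniform random sample of its rows. It uses the textbook forms of the first-order jackknife, Schlosser, and guaranteed error (GEE) estimators as they are commonly stated in the distinct-value estimation literature. The skew measure (squared coefficient of variation of the sample frequencies), the thresholds `low` and `high`, and the rule for choosing among the three estimators are assumptions made for this example only; the abstract does not specify how the paper's hybrid estimator combines them.

```python
import math
from collections import Counter


def frequency_profile(sample):
    """Return {i: f_i}: f_i is the number of distinct values seen exactly i times."""
    return Counter(Counter(sample).values())


def jackknife(f, n, N):
    """First-order (unsmoothed) jackknife estimate of the distinct count."""
    d = sum(f.values())
    q = n / N  # sampling fraction
    return d / (1.0 - (1.0 - q) * f.get(1, 0) / n)


def schlosser(f, n, N):
    """Schlosser's estimate of the distinct count."""
    d = sum(f.values())
    q = n / N
    num = sum((1.0 - q) ** i * fi for i, fi in f.items())
    den = sum(i * q * (1.0 - q) ** (i - 1) * fi for i, fi in f.items())
    if den == 0:  # only possible when the whole table was sampled
        return float(d)
    return d + f.get(1, 0) * num / den


def gee(f, n, N):
    """Guaranteed error estimator: scale singletons by sqrt(N/n), keep the rest."""
    repeated = sum(fi for i, fi in f.items() if i >= 2)
    return math.sqrt(N / n) * f.get(1, 0) + repeated


def sample_skew(f, n):
    """Squared coefficient of variation of per-value sample counts (assumed skew measure)."""
    d = sum(f.values())
    mean = n / d
    var = sum(fi * (i - mean) ** 2 for i, fi in f.items()) / d
    return var / mean ** 2


def hybrid_estimate(sample, N, low=1.0, high=10.0):
    """Pick an estimator by skew. The thresholds and the rule are hypothetical."""
    n = len(sample)
    f = frequency_profile(sample)
    d = sum(f.values())
    skew = sample_skew(f, n)
    if skew < low:
        est = jackknife(f, n, N)
    elif skew < high:
        est = schlosser(f, n, N)
    else:
        est = gee(f, n, N)
    # The true distinct count lies between the observed count d and the table size N.
    return min(max(est, d), N)
```

Here `sample` is a list of hashable row keys (for example, tuples of the view's group-by attributes) and `N` is the row count of the base table. Clamping the result to the range from `d` to `N` is a common safeguard, since no estimate outside that range can be correct.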
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhan, M., Rajanikanth, K. (2021). Estimation of View Size Using Sampling Techniques. In: Sharma, M.K., Dhaka, V.S., Perumal, T., Dey, N., Tavares, J.M.R.S. (eds) Innovations in Computational Intelligence and Computer Vision. Advances in Intelligent Systems and Computing, vol 1189. Springer, Singapore. https://doi.org/10.1007/978-981-15-6067-5_31
DOI: https://doi.org/10.1007/978-981-15-6067-5_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6066-8
Online ISBN: 978-981-15-6067-5
eBook Packages: Intelligent Technologies and Robotics (R0)