Abstract
Selectivity is a parameter used by a query optimizer to estimate the size of the data that satisfies a query condition. Calculating selectivity requires some representation of the distribution of attribute values. DBMSes commonly use one-dimensional histograms, each describing the distribution of a single attribute. Complex queries whose range selection condition involves many attributes require a multidimensional (m-d) representation. Storing such a representation directly (e.g. as an m-d histogram) consumes a great deal of space in high dimensions, so a copula-based approach is proposed in which only a few parameters need to be stored. Using very few copula parameters, the method estimates selectivity more accurately than the approach based on the attribute value independence assumption, which database management systems commonly use. The paper presents a software module that provides the copula-based method of selectivity estimation for an m-d range query. The solution is based on Rserve and is integrated with the Oracle DBMS. Additional advantages of the module, resulting from caching selectivity values for similar conditions, are also shown.
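To illustrate the contrast the abstract draws, the following Python sketch estimates the selectivity of a two-dimensional range query from one-dimensional marginals plus a single copula parameter, and compares it with the attribute value independence (AVI) estimate. The Clayton copula family, the fit of its parameter from Kendall's tau (θ = 2τ/(1 − τ)), the empirical-CDF marginals and the rounding-based cache for "similar conditions" are all assumptions made for this illustration. The abstract does not state which copula family or caching policy the module uses, and this is not the module's Rserve/Oracle implementation. All function names are hypothetical.

```python
from bisect import bisect_right
from functools import lru_cache


def make_ecdf(values):
    """Empirical marginal CDF F(x) = fraction of values <= x.
    (Assumed stand-in for the 1-D histograms a DBMS already keeps.)"""
    data = sorted(values)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def kendall_tau(xs, ys):
    """Kendall's tau-a in O(n^2), ties ignored; adequate for a small sample."""
    n, s = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (d > 0) - (d < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def clayton(u, v, theta):
    """Clayton copula C(u, v) = (u^-t + v^-t - 1)^(-1/t), valid for t > 0.
    The copula family is an assumption of this sketch."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


@lru_cache(maxsize=4096)
def rect_prob(u_lo, u_hi, v_lo, v_hi, theta):
    """P(u_lo < U <= u_hi, v_lo < V <= v_hi) via inclusion-exclusion on C."""
    return (clayton(u_hi, v_hi, theta) - clayton(u_lo, v_hi, theta)
            - clayton(u_hi, v_lo, theta) + clayton(u_lo, v_lo, theta))


def selectivity(F1, F2, theta, a1, b1, a2, b2, grid=3):
    """Copula estimate for a1 < X <= b1 AND a2 < Y <= b2.

    Marginal probabilities are rounded to `grid` decimals so that conditions
    with nearly equal marginals share a cache entry (hypothetical policy).
    """
    key = tuple(round(p, grid) for p in (F1(a1), F1(b1), F2(a2), F2(b2)))
    return rect_prob(*key, theta)


def selectivity_avi(F1, F2, a1, b1, a2, b2):
    """Baseline under attribute value independence: product of marginals."""
    return (F1(b1) - F1(a1)) * (F2(b2) - F2(a2))
```

On a toy sample with xs = 1..8 and ys = 2, 1, 4, 3, 6, 5, 8, 7, Kendall's tau is 40/56 ≈ 0.714, giving θ ≈ 5. For the condition 0 < X ≤ 4 AND 0 < Y ≤ 4, the copula estimate is roughly 0.44, AVI gives 0.25, and the true fraction is 0.5. Only one number (θ) is stored beyond the marginals. Because the cache is keyed on rounded marginal probabilities rather than on raw bounds, a condition such as 0 < X ≤ 4.5 AND 0 < Y ≤ 4.5 maps to the same key on this data and reuses the stored value.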
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Augustyn, D.R. (2018). Copula-Based Module for Selectivity Estimation of Multidimensional Range Queries. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-67792-7_55
DOI: https://doi.org/10.1007/978-3-319-67792-7_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67791-0
Online ISBN: 978-3-319-67792-7
eBook Packages: Engineering, Engineering (R0)