Copula-Based Module for Selectivity Estimation of Multidimensional Range Queries

  • Conference paper
Man-Machine Interactions 5 (ICMMI 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 659)

Abstract

Selectivity is a parameter used by a query optimizer to estimate the size of the data that satisfies a query condition. Calculating selectivity requires some representation of the distribution of attribute values. DBMSes commonly use one-dimensional histograms, each describing the distribution of a single attribute. A multidimensional (m-d) representation is required for complex queries whose range selection condition involves many attributes. Storing an m-d representation directly (e.g., as an m-d histogram) is very space-consuming in high dimensions, hence a copula-based approach is proposed, in which only a few parameters need to be stored. Using only a few copula parameters, the method achieves more accurate selectivity estimation than the method based on the attribute value independence assumption, which is commonly used by database management systems. The paper presents a software module that provides the copula-based method of selectivity estimation for an m-d range query. The presented solution is based on Rserve and is integrated with Oracle DBMS. Additional advantages of the module, resulting from caching selectivity values for similar conditions, are also shown.
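The abstract describes the idea only in words. By Sklar's theorem, a joint distribution can be written as C(F1(x1), F2(x2)), where F1 and F2 are the 1-D marginal CDFs and C is a copula with few parameters; the selectivity of a1 < X1 <= b1 AND a2 < X2 <= b2 then follows by inclusion-exclusion: C(F1(b1), F2(b2)) - C(F1(a1), F2(b2)) - C(F1(b1), F2(a2)) + C(F1(a1), F2(a2)). The Python sketch below illustrates this for two attributes. It is not the paper's module (which runs on Rserve and Oracle DBMS), and several choices are assumptions made only for illustration: the Clayton copula family, empirical CDFs standing in for 1-D histograms, fitting theta by inverting Kendall's tau, and a cache keyed on rounded bounds. The names `CopulaSelectivity`, `empirical_cdf` and `precision` are hypothetical.

```python
# Minimal sketch, not the paper's module: a 2-D copula-based selectivity
# estimator. Assumed (hypothetical) choices: Clayton copula family, empirical
# marginal CDFs standing in for 1-D histograms, theta fitted by inverting
# Kendall's tau, and a cache keyed on query bounds rounded to `precision` digits.
import random
from bisect import bisect_right
from functools import lru_cache


def empirical_cdf(sample):
    """Return F(x) = fraction of sample values <= x (a stand-in for a 1-D histogram)."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def kendall_tau(xs, ys):
    """O(n^2) Kendall's tau-a; tied pairs contribute zero."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (d > 0) - (d < 0)
    return 2.0 * s / (n * (n - 1))


def clayton_theta(tau):
    """Invert Kendall's tau = theta / (theta + 2) for the Clayton family."""
    if not 0.0 < tau < 1.0:
        raise ValueError("the Clayton copula needs 0 < tau < 1")
    return 2.0 * tau / (1.0 - tau)


def clayton_cdf(u, v, theta):
    """C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta); C is 0 if u or v is 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


class CopulaSelectivity:
    """Stores two marginal CDFs plus a single copula parameter theta."""

    def __init__(self, xs, ys, precision=2):
        self.f1, self.f2 = empirical_cdf(xs), empirical_cdf(ys)
        self.theta = clayton_theta(kendall_tau(xs, ys))
        self.precision = precision
        self._cached = lru_cache(maxsize=1024)(self._compute)

    def _compute(self, a1, b1, a2, b2):
        """P(a1 < X1 <= b1, a2 < X2 <= b2) by inclusion-exclusion on C."""
        u_lo, u_hi = self.f1(a1), self.f1(b1)
        v_lo, v_hi = self.f2(a2), self.f2(b2)
        t = self.theta
        return (clayton_cdf(u_hi, v_hi, t) - clayton_cdf(u_lo, v_hi, t)
                - clayton_cdf(u_hi, v_lo, t) + clayton_cdf(u_lo, v_lo, t))

    def selectivity(self, a1, b1, a2, b2):
        """Conditions equal after rounding share one cached selectivity value."""
        key = tuple(round(x, self.precision) for x in (a1, b1, a2, b2))
        return self._cached(*key)

    def independence(self, a1, b1, a2, b2):
        """Baseline under the attribute value independence assumption."""
        return (self.f1(b1) - self.f1(a1)) * (self.f2(b2) - self.f2(a2))


# Usage on synthetic, positively dependent data (illustrative only).
rng = random.Random(0)
xs = [rng.random() for _ in range(400)]
ys = [0.7 * x + 0.3 * rng.random() for x in xs]
est = CopulaSelectivity(xs, ys)

query = (-1.0, 0.3, -1.0, 0.3)  # X1 <= 0.3 AND X2 <= 0.3
actual = sum(x <= 0.3 and y <= 0.3 for x, y in zip(xs, ys)) / len(xs)
print(f"actual={actual:.3f} copula={est.selectivity(*query):.3f} "
      f"independence={est.independence(*query):.3f}")
```

Only the marginals and one number (theta) are stored, which mirrors the space argument in the abstract. Rounding the bounds lets nearby conditions share one cache entry, at the cost of answering the rounded condition rather than the exact one; `precision=2` is an arbitrary illustrative value, and a real module would pick the copula family and fitting method to suit its data.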

Author information

Corresponding author

Correspondence to Dariusz Rafal Augustyn.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Augustyn, D.R. (2018). Copula-Based Module for Selectivity Estimation of Multidimensional Range Queries. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds) Man-Machine Interactions 5. ICMMI 2017. Advances in Intelligent Systems and Computing, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-319-67792-7_55

  • DOI: https://doi.org/10.1007/978-3-319-67792-7_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67791-0

  • Online ISBN: 978-3-319-67792-7

  • eBook Packages: Engineering, Engineering (R0)