Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

XML Selectivity Estimation

  • Maya Ramanath
  • Juliana Freire
  • Neoklis Polyzotis
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_801

Synonyms

XML cardinality estimation

Definition

Selectivity estimation in database systems refers to the task of estimating the number of results that will be output for a given query. Selectivity estimates are crucial in query optimization, since they enable optimizers to select efficient query plans. They are also employed in interactive data exploration as timely feedback about the expected outcome of user queries, and can even serve as approximate answers for count queries.

Selectivity estimators apply an estimation procedure to a synopsis of the data. Because query optimization, of which selectivity estimation is only one step, operates under stringent time and space constraints, selectivity estimators face two often-conflicting requirements: they must estimate query cardinalities accurately and efficiently while keeping the synopsis as small as possible.
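The trade-off between accuracy and synopsis size can be illustrated with a minimal sketch. It is not a method described in this entry: it assumes a hypothetical synopsis that stores one count per root-to-node label path, keeps only the `budget` most frequent paths, and answers queries on pruned paths with the average count of the pruned entries. The names `build_synopsis`, `estimate`, `budget`, and the sample document `DOC` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document, used only for illustration.
DOC = """<lib>
  <book><title/><author/><author/></book>
  <book><title/><author/></book>
  <article><title/></article>
</lib>"""

def build_synopsis(xml_text, budget):
    """Count elements per root-to-node label path, keeping the `budget` most frequent paths."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    kept = dict(counts.most_common(budget))
    # Summarize the pruned paths by how many there were and their total count.
    pruned_paths = len(counts) - len(kept)
    pruned_total = sum(counts.values()) - sum(kept.values())
    return kept, pruned_paths, pruned_total

def estimate(synopsis, path):
    """Estimate how many elements match a simple child-axis path such as /lib/book/author."""
    kept, pruned_paths, pruned_total = synopsis
    if path in kept:
        return float(kept[path])      # exact count survives in the synopsis
    if pruned_paths == 0:
        return 0.0                    # nothing was pruned, so the path does not occur
    return pruned_total / pruned_paths  # assumption: pruned paths are uniformly frequent

full = build_synopsis(DOC, budget=10)
small = build_synopsis(DOC, budget=3)
print(estimate(full, "/lib/book/author"))   # 3.0
print(estimate(full, "/lib/magazine"))      # 0.0
print(estimate(small, "/lib/magazine"))     # 1.0 (overestimate; the true count is 0)
```

With a budget large enough to hold every path, the estimates are exact. With `budget=3`, the synopsis is half the size but can no longer tell a rare path from a nonexistent one, so it overestimates the count for `/lib/magazine`.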

While there is a large body of literature on selectivity estimation in the context of relational databases, the...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Maya Ramanath (1)
  • Juliana Freire (2, 3, 4)
  • Neoklis Polyzotis (5)

  1. Max-Planck Institute for Informatics, Saarbrücken, Germany
  2. NYU Tandon School of Engineering, Brooklyn, USA
  3. NYU Center for Data Science, New York, USA
  4. New York University, New York, USA
  5. University of California Santa Cruz, Santa Cruz, USA

Section editors and affiliations

  • Sihem Amer-Yahia (1)

  1. Laboratoire d'Informatique de Grenoble, CNRS and LIG, Grenoble, France