A Decomposition-Based Probabilistic Framework for Estimating the Selectivity of XML Twig Queries

  • Chao Wang
  • Srinivasan Parthasarathy
  • Ruoming Jin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

In this paper we present a novel approach for estimating the selectivity of XML twig queries. Such estimates are useful for answering approximate queries and for choosing an optimal query plan for complex queries. Our approach relies on a summary structure that stores the occurrence statistics of small twigs. A novel probabilistic method decomposes a larger twig query into smaller ones, and we show how this decomposition, combined with the summary information, yields an estimate of the selectivity of the larger query. We present and evaluate different decomposition strategies and compare this work against a state-of-the-art selectivity estimation approach on synthetic and real datasets. The experimental results show that the proposed approach is effective in estimating the selectivity of XML twig queries.
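
The abstract does not state the estimation formula, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that a twig too large for the summary is split into two sub-twigs T1 and T2 that overlap in T12, and that the two are conditionally independent given T12, which gives the estimate s(T1) * s(T2) / s(T12). The names `twig`, `estimate`, `summary`, and the size bound `k` are hypothetical, and the counts in the usage lines are made up for illustration.

```python
def twig(label, *children):
    """Build a twig as (label, sorted tuple of child twigs)."""
    return (label, tuple(sorted(children)))

def canon(t):
    """Canonical form: children sorted recursively, so equal twigs compare equal."""
    return (t[0], tuple(sorted(canon(c) for c in t[1])))

def size(t):
    """Number of nodes in the twig."""
    return 1 + sum(size(c) for c in t[1])

def leaf_paths(t, prefix=()):
    """Yield the child-index path from the root to every leaf."""
    if not t[1]:
        yield prefix
    for i, child in enumerate(t[1]):
        yield from leaf_paths(child, prefix + (i,))

def drop(t, doomed, prefix=()):
    """Copy of t without the nodes whose index paths are in `doomed`."""
    kept = tuple(
        drop(child, doomed, prefix + (i,))
        for i, child in enumerate(t[1])
        if prefix + (i,) not in doomed
    )
    return (t[0], kept)

def estimate(t, summary, k):
    """Estimate the match count of twig t.

    summary maps canonical twigs of at most k nodes to exact counts.
    Larger twigs are decomposed under the ASSUMED independence rule
    s(T) ~= s(T1) * s(T2) / s(T12).
    """
    assert k >= 2, "the summary must cover at least single nodes and edges"
    t = canon(t)
    if size(t) <= k:
        return float(summary.get(t, 0))
    leaves = list(leaf_paths(t))
    if len(leaves) >= 2:
        # Remove one leaf for T1, another for T2, and both for the overlap.
        a, b = leaves[0], leaves[1]
        t1, t2, t12 = drop(t, {a}), drop(t, {b}), drop(t, {a, b})
    else:
        # A twig with one leaf is a simple path: peel off the leaf and the root.
        a = leaves[0]
        t1 = drop(t, {a})
        t2 = t[1][0]
        t12 = drop(t2, {a[1:]})
    overlap = estimate(t12, summary, k)
    if overlap == 0:
        return 0.0
    return estimate(t1, summary, k) * estimate(t2, summary, k) / overlap

# Usage with made-up counts: the summary holds all twigs of at most 2 nodes.
summary = {
    twig("a"): 10,
    twig("b"): 4,
    twig("a", twig("b")): 20,
    twig("a", twig("c")): 5,
    twig("b", twig("c")): 8,
}
branching = twig("a", twig("b"), twig("c"))   # a with children b and c
path = twig("a", twig("b", twig("c")))        # the path a/b/c
print(estimate(branching, summary, 2))        # 20 * 5 / 10
print(estimate(path, summary, 2))             # 20 * 8 / 4
```

This sketch always removes the first two leaves it finds. The abstract says several decomposition strategies are evaluated, and which nodes to remove is exactly where such strategies would differ.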



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chao Wang (1)
  • Srinivasan Parthasarathy (1)
  • Ruoming Jin (1)
  1. Department of Computer Science and Engineering, The Ohio State University
