A relational model for XML structural joins and their size estimations

  • Regular Paper
  • Knowledge and Information Systems

Abstract

XML structural joins, which evaluate the containment (ancestor-descendant) relationships between XML elements, are important operations in XML query processing. Estimating structural join sizes accurately and quickly is crucial to XML query plan selection and query optimization. XML structural joins are essentially complex θ-joins, which renders well-known estimation techniques for relational equijoins, such as the discrete cosine transform, the wavelet transform, and sketches, inapplicable. In this paper, we model structural joins from a relational point of view and convert the complex θ-joins to equijoins so that these well-known estimation techniques become applicable to structural join size estimation. Theoretical analyses and extensive experiments have been performed on these estimation methods. The results show that the discrete cosine transform requires the least memory and yields the best estimates among the three techniques. Compared with the state-of-the-art method IM-DA-Est, the discrete cosine transform is much faster, requires less memory, and yields comparable estimates.
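
The conversion from a θ-join to an equijoin can be illustrated with a minimal sketch. It assumes the common region encoding, in which every element carries a (start, end) pair and any two regions are either nested or disjoint; the abstract does not specify the paper's actual construction, so the relations, function names, and sample numbering below are hypothetical. Each ancestor is expanded into one tuple per position strictly inside its region, each descendant is represented by its start position, and the containment join size then equals the size of the equijoin on position.

```python
from collections import Counter

def theta_join_size(ancestors, descendants):
    """Count (a, d) pairs with a.start < d.start and d.end < a.end."""
    return sum(
        1
        for a_start, a_end in ancestors
        for d_start, d_end in descendants
        if a_start < d_start and d_end < a_end
    )

def equijoin_size(ancestors, descendants):
    """Same count, computed as an equijoin on position.

    Assumption (not taken from the paper): regions are nested or
    disjoint, so a descendant whose start lies strictly inside an
    ancestor's region is fully contained in that region.
    """
    # Frequency of each position covered by some ancestor region.
    covered = Counter()
    for start, end in ancestors:
        for position in range(start + 1, end):
            covered[position] += 1
    # Frequency of each descendant start position.
    starts = Counter(start for start, _ in descendants)
    # Equijoin size: sum over positions of the product of frequencies.
    return sum(covered[p] * count for p, count in starts.items())

# Hypothetical numbering of <a><b><c/></b><c/></a><a/>
a_elements = [(1, 8), (9, 10)]
b_elements = [(2, 5)]
c_elements = [(3, 4), (6, 7)]

print(theta_join_size(a_elements, c_elements), equijoin_size(a_elements, c_elements))
print(theta_join_size(b_elements, c_elements), equijoin_size(b_elements, c_elements))
```

Once the join is written as a sum of products of two frequency distributions over a shared domain, it has the form that equijoin estimators expect as input, which is the property the abstract relies on.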

Corresponding author

Correspondence to Cheng Luo.

About this article

Cite this article

Luo, C., Jiang, Z., Hou, WC. et al. A relational model for XML structural joins and their size estimations. Knowl Inf Syst 16, 97–127 (2008). https://doi.org/10.1007/s10115-007-0089-z

  • DOI: https://doi.org/10.1007/s10115-007-0089-z
