Scaling XML query processing: distribution, localization and pruning

Distributed and Parallel Databases

Abstract

Fragmenting a data collection and distributing the fragments is an effective way to improve the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data and query models present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on this, we propose solutions to two problems encountered in distributed query processing and optimization on XML data: localization and pruning. Localization takes a fragmentation-unaware query plan and converts it into a distributed query plan that can be executed at the sites that hold XML data fragments. We then show how the resulting distributed query plan can be pruned so that only sites that can contribute to the query result are accessed. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.
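
The following toy sketch illustrates the general idea of localization and pruning over horizontal fragments. It is not the algorithm from the paper, which operates on query plans inside an XML database system. The sketch assumes, purely for illustration, that each fragment is defined by a range of a hypothetical `year` attribute; the names `Fragment`, `localize`, `prune`, `execute`, and the site names are all made up.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Fragment:
    site: str  # hypothetical site name holding this fragment
    lo: int    # inclusive lower bound of @year stored in this fragment
    hi: int    # inclusive upper bound of @year stored in this fragment
    xml: str   # the fragment's XML data


def localize(fragments):
    """Localization (toy version): one sub-plan per fragment, unioned later."""
    return list(fragments)


def prune(plan, q_lo, q_hi):
    """Pruning (toy version): keep fragments whose range overlaps the query range."""
    return [f for f in plan if f.lo <= q_hi and q_lo <= f.hi]


def execute(plan, q_lo, q_hi):
    """Evaluate the selection at each remaining site and union the results."""
    titles = []
    for f in plan:
        root = ET.fromstring(f.xml)
        for book in root.iter("book"):
            if q_lo <= int(book.get("year")) <= q_hi:
                titles.append(book.findtext("title"))
    return titles


# Illustrative data only: three horizontal fragments split by decade.
FRAGMENTS = [
    Fragment("site_a", 1990, 1999,
             '<lib><book year="1995"><title>A</title></book></lib>'),
    Fragment("site_b", 2000, 2009,
             '<lib><book year="2003"><title>B</title></book>'
             '<book year="2008"><title>C</title></book></lib>'),
    Fragment("site_c", 2010, 2019,
             '<lib><book year="2011"><title>D</title></book></lib>'),
]

# Query: titles of books with 2005 <= year <= 2012.
plan = prune(localize(FRAGMENTS), 2005, 2012)
sites = [f.site for f in plan]
result = execute(plan, 2005, 2012)
unpruned_result = execute(localize(FRAGMENTS), 2005, 2012)
```

In this sketch, pruning removes `site_a` from the plan because its fragment range cannot overlap the query range, while the query result stays the same as for the unpruned plan.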

Author information

Corresponding author

Correspondence to Patrick Kling.

Additional information

Communicated by Ahmed K. Elmagarmid.

About this article

Cite this article

Kling, P., Özsu, M.T. & Daudjee, K. Scaling XML query processing: distribution, localization and pruning. Distrib Parallel Databases 29, 445–490 (2011). https://doi.org/10.1007/s10619-011-7085-8

  • DOI: https://doi.org/10.1007/s10619-011-7085-8
