Scaling XML query processing: distribution, localization and pruning

Kling, Patrick; Özsu, M. Tamer; Daudjee, Khuzaima

doi:10.1007/s10619-011-7085-8

Published: 17 June 2011

Volume 29, pages 445–490 (2011)

Distributed and Parallel Databases

Abstract

Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data and query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on this, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning. Localization takes a fragmentation-unaware query plan and converts it to a distributed query plan that can be executed at the sites that hold XML data fragments in a distributed system. We then show how the resulting distributed query plan can be pruned so that only those sites are accessed that can contribute to the query result. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.
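
As a rough illustration of the localization and pruning steps described above, the following Python sketch works on a horizontally fragmented collection. It is not the algorithm from the paper: the range-based fragment definitions, the `year` attribute, the site names, and the functions `localize`, `prune`, and `execute` are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Fragment:
    """A horizontal fragment: the books whose year lies in [lo, hi] (assumed scheme)."""
    site: str
    lo: int
    hi: int
    xml: str


# Hypothetical three-site deployment, fragmented by publication year.
FRAGMENTS = [
    Fragment("site_a", 1990, 1999,
             "<books><book year='1995'><title>A</title></book></books>"),
    Fragment("site_b", 2000, 2009,
             "<books><book year='2003'><title>B</title></book>"
             "<book year='2008'><title>C</title></book></books>"),
    Fragment("site_c", 2010, 2019,
             "<books><book year='2011'><title>D</title></book></books>"),
]


def localize(fragments):
    """Turn the fragmentation-unaware plan 'scan the collection' into
    a distributed plan: one sub-query per fragment, results unioned."""
    return list(fragments)


def prune(plan, q_lo, q_hi):
    """Drop sub-queries whose fragment range cannot overlap the query range."""
    return [f for f in plan if f.lo <= q_hi and q_lo <= f.hi]


def execute(plan, q_lo, q_hi):
    """Run each remaining sub-query and union the titles in plan order."""
    titles = []
    for fragment in plan:
        root = ET.fromstring(fragment.xml)
        for book in root.findall("book"):
            if q_lo <= int(book.get("year")) <= q_hi:
                titles.append(book.findtext("title"))
    return titles


if __name__ == "__main__":
    full_plan = localize(FRAGMENTS)
    pruned_plan = prune(full_plan, 2005, 2012)
    print([f.site for f in pruned_plan])
    print(execute(pruned_plan, 2005, 2012))
```

In this toy setting the pruned plan skips `site_a` entirely, yet returns the same titles as the unpruned plan, which is the property pruning is meant to preserve.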

Author information

Authors and Affiliations

Cheriton School of Computer Science, University of Waterloo, 200 University Ave W, Waterloo, ON, N2L 3G1, Canada
Patrick Kling, M. Tamer Özsu & Khuzaima Daudjee

Corresponding author

Correspondence to Patrick Kling.

Additional information

Communicated by Ahmed K. Elmagarmid.

About this article

Cite this article

Kling, P., Özsu, M.T. & Daudjee, K. Scaling XML query processing: distribution, localization and pruning. Distrib Parallel Databases 29, 445–490 (2011). https://doi.org/10.1007/s10619-011-7085-8

Issue Date: October 2011
