Scaling XML query processing: distribution, localization and pruning

  • Patrick Kling
  • M. Tamer Özsu
  • Khuzaima Daudjee


Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of the XML data model and query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on this, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning. Localization takes a fragmentation-unaware query plan and converts it into a distributed query plan that can be executed at the sites that hold XML data fragments in a distributed system. We then show how the resulting distributed query plan can be pruned so that only those sites that can contribute to the query result are accessed. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.
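To make the two steps concrete, the following is a minimal sketch of localization and pruning over a horizontally fragmented collection. It is an illustration under stated assumptions, not the algorithm of the paper: the `Fragment` and `Query` types, the site names, the sample documents, and the restriction to a single equality predicate on a `region` element are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    """A horizontal fragment (hypothetical): all documents whose region equals `region`."""
    site: str
    region: str
    docs: Tuple[str, ...]


@dataclass(frozen=True)
class Query:
    """Return person names, optionally restricted to one region (hypothetical query form)."""
    region: Optional[str] = None


Plan = List[Tuple[Fragment, Query]]

# Hypothetical sample data: two sites, fragmented by the value of <region>.
FRAGMENTS = [
    Fragment("site-a", "eu", (
        "<person><name>Ana</name><region>eu</region></person>",
        "<person><name>Bo</name><region>eu</region></person>",
    )),
    Fragment("site-b", "na", (
        "<person><name>Cy</name><region>na</region></person>",
    )),
]


def localize(query: Query, fragments: List[Fragment]) -> Plan:
    """Turn a fragmentation-unaware query into one sub-query per fragment."""
    return [(fragment, query) for fragment in fragments]


def prune(plan: Plan) -> Plan:
    """Drop sub-queries whose fragment predicate contradicts the query predicate."""
    return [
        (fragment, query)
        for fragment, query in plan
        if query.region is None or query.region == fragment.region
    ]


def execute(plan: Plan) -> List[str]:
    """Evaluate each sub-query against its fragment and concatenate the results."""
    names: List[str] = []
    for fragment, query in plan:
        for doc in fragment.docs:
            person = ET.fromstring(doc)
            if query.region is None or person.findtext("region") == query.region:
                names.append(person.findtext("name"))
    return names


def sites(plan: Plan) -> List[str]:
    """List the sites a plan would access."""
    return [fragment.site for fragment, _ in plan]
```

In this sketch, `localize(Query(region="eu"), FRAGMENTS)` produces a plan that touches both sites, while `prune` reduces it to the single site whose fragment can hold matching documents; both plans yield the same result, which is the property that makes pruning safe.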


Keywords: Distributed XML, Localization, Pruning



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Patrick Kling (1)
  • M. Tamer Özsu (1)
  • Khuzaima Daudjee (1)
  1. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
