Efficient Fragmentation of Large XML Documents

  • Angela Bonifati
  • Alfredo Cuzzocrea
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4653)


Fragmentation techniques for XML data are gaining momentum within both distributed and centralized XML query engines, and they pose novel, so far unrecognized challenges to the community. Although not new, and clearly inspired by the classical divide et impera principle, fragmentation of XML trees has proved successful in boosting query performance and in cutting down memory requirements. However, the fragmentation considered so far has been driven by semantics, i.e., built around query predicates. In this paper, we propose a novel fragmentation technique that builds on structural constraints of XML documents (size, tree-width, and tree-depth) and on special-purpose structure histograms able to meaningfully summarize XML documents. This allows us to predict bounding intervals for the structural properties of the output XML fragments, enabling efficient query processing of distributed XML data. An experimental evaluation confirms the effectiveness of our fragmentation methodology on some representative XML data sets.
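
To make the three structural constraints concrete, the following minimal sketch measures size, tree-depth, and tree-width of a small XML tree and checks them against given bounds. It is an illustration only and not the technique of the paper: the function names `structural_profile` and `satisfies` are hypothetical, tree-width is assumed here to mean the largest number of element nodes on any single level, and the per-level node counts are only a crude stand-in for a structure histogram.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structural_profile(root):
    """Return (size, depth, width, level_counts) for an element tree.

    Assumptions (not taken from the paper): size counts element nodes,
    depth counts levels with the root at level 1, and width is the
    largest number of element nodes found on any one level.
    """
    level_counts = Counter()
    stack = [(root, 1)]  # iterative traversal avoids recursion limits
    while stack:
        node, level = stack.pop()
        level_counts[level] += 1
        for child in node:
            stack.append((child, level + 1))
    size = sum(level_counts.values())
    depth = max(level_counts)
    width = max(level_counts.values())
    return size, depth, width, dict(level_counts)


def satisfies(root, max_size, max_depth, max_width):
    """Check whether a (sub)tree fits within the given structural bounds."""
    size, depth, width, _ = structural_profile(root)
    return size <= max_size and depth <= max_depth and width <= max_width


# A tiny sample document; the element names are arbitrary.
doc = ET.fromstring(
    "<site><people><person/><person/><person/></people>"
    "<regions><item/></regions></site>"
)
print(structural_profile(doc))
print(satisfies(doc, max_size=10, max_depth=3, max_width=4))
```

A fragmenter working under such constraints could call a check like `satisfies` on candidate subtrees to decide whether a subtree can be emitted as one fragment or needs to be split further.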


Query Processing, Query Plan, Path Expression, Query Processor, Fragmentation Technique
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Angela Bonifati (1)
  • Alfredo Cuzzocrea (2)
  1. ICAR Inst., National Research Council, Italy
  2. DEIS Dept., University of Calabria, Italy
