Fast Answering of XPath Query Workloads on Web Collections

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4704)

Abstract

Several web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly larger web collections containing hundreds of thousands of XML documents. Even when tasks only need to deal with a single document at a time, developers benefit from understanding the behaviour of XPath expressions across multiple documents (e.g., what will a query return when run over the thousands of hourly feeds collected during the last few months?). Dealing with the (highly variable) structure of such web collections poses additional challenges.

This paper introduces DescribeX, a framework capable of describing arbitrarily complex XML summaries of web collections, enabling the efficient evaluation of XPath workloads (supporting all the axes and language constructs of XPath). Experiments show that DescribeX enables existing document-at-a-time XPath tools to scale up to multi-gigabyte XML collections.
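The abstract does not describe how DescribeX builds its summaries, so the following is only a rough illustration of the general idea behind a collection-wide structural summary, not the authors' method. It assumes the simplest possible partition: elements are grouped by their root-to-element label path, and each group records which documents contribute to it. Answering a query against this summary tells you which documents will return a non-empty result, so a document-at-a-time XPath tool would only need to run on those. The function names `build_summary` and `docs_matching`, the sample `feeds` collection, and the restriction to two query shapes (absolute child-axis paths and `//name`) are all hypothetical simplifications; DescribeX itself, per the abstract, supports every XPath axis and construct. Namespaced tags are not handled specially.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_summary(docs):
    """Build a label-path summary of a collection (illustrative sketch).

    docs: mapping doc_id -> XML text (hypothetical in-memory collection).
    Returns a mapping from a root-to-element tag path, e.g.
    ("rss", "channel", "item"), to the set of doc_ids that contain at
    least one element reached by that path. Extents are kept at document
    granularity only, which is a simplification.
    """
    summary = defaultdict(set)
    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        stack = [(root, (root.tag,))]
        while stack:  # iterative DFS over the element tree
            elem, path = stack.pop()
            summary[path].add(doc_id)
            for child in elem:
                stack.append((child, path + (child.tag,)))
    return summary


def docs_matching(summary, query):
    """Return the doc_ids on which `query` has a non-empty result.

    Only two query shapes are supported in this sketch:
      "/a/b/c"  absolute path using the child axis only
      "//c"     any element named c, at any depth
    """
    if query.startswith("//"):
        name = query[2:]
        hits = [ids for path, ids in summary.items() if path[-1] == name]
    else:
        steps = tuple(query.strip("/").split("/"))
        hits = [summary.get(steps, set())]  # .get avoids adding keys
    return set().union(*hits)


# Hypothetical collection of two tiny RSS-like feeds.
feeds = {
    "feed-01": "<rss><channel><item><title>a</title></item></channel></rss>",
    "feed-02": "<rss><channel><title>empty</title></channel></rss>",
}
s = build_summary(feeds)
print(sorted(docs_matching(s, "/rss/channel/item")))  # ['feed-01']
print(sorted(docs_matching(s, "//title")))            # ['feed-01', 'feed-02']
```

For tree-shaped documents and these two query shapes, the label-path partition is exact, so the summary identifies precisely the documents with matches rather than just over-approximating them. Richer queries (other axes, predicates) would need a finer partition, which is the kind of flexibility the abstract attributes to DescribeX.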

Editor information

Denilson Barbosa, Angela Bonifati, Zohra Bellahsène, Ela Hunt, Rainer Unland

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Consens, M.P., Rizzolo, F. (2007). Fast Answering of XPath Query Workloads on Web Collections. In: Barbosa, D., Bonifati, A., Bellahsène, Z., Hunt, E., Unland, R. (eds) Database and XML Technologies. XSym 2007. Lecture Notes in Computer Science, vol 4704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75288-2_4

  • DOI: https://doi.org/10.1007/978-3-540-75288-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75287-5

  • Online ISBN: 978-3-540-75288-2

  • eBook Packages: Computer Science, Computer Science (R0)
