A Synopsis based Approach for XML Fast Approximate Querying

Flexible Databases Supporting Imprecision and Uncertainty

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 203)

Abstract

In the last few years, XML has spread to many application fields, and today it is used as a format for exchanging data on the web and for ensuring interoperability among applications. Due to this success, the W3C has proposed a new query language, XQuery [25], specifically designed to query XML data. XQuery is a well-defined but rather complex language [14]. In this work we propose a new approach to overcome the high computational cost of aggregate queries over massive XML data collections. In traditional relational warehouses [11], a similar problem is solved by means of fast approximate queries, which use concise data statistics based on histograms or on other statistical techniques. Their most common application is for aggregate queries in modern decision support systems, where large volumes of data need to be queried and quick, interactive responses from the DBMS are required, e.g., to analyze the data in the warehouse in order to extract trend information for evaluating marketing strategies. In such applications, users are often more interested in obtaining an approximate answer computed in a short time than an exact one obtained in minutes or, at worst, hours.
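
As a rough illustration of the histogram-based approximate querying that the abstract attributes to relational warehouses, the following Python sketch builds an equi-width histogram over a numeric column and estimates COUNT and SUM for a range predicate from the bucket statistics alone. It is not the chapter's XML synopsis, which the abstract does not describe; the names `Bucket`, `build_histogram`, and `approx_count_sum` are hypothetical, and the estimate assumes values are spread uniformly within each bucket.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    lo: float     # lower bound of the bucket
    hi: float     # upper bound of the bucket
    count: int    # number of values that fell into the bucket
    total: float  # sum of the values that fell into the bucket


def build_histogram(values, num_buckets):
    """Build an equi-width histogram (the synopsis) in one pass over the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # avoid zero width for constant data
    buckets = [
        Bucket(lo + i * width, lo + (i + 1) * width, 0, 0.0)
        for i in range(num_buckets)
    ]
    for v in values:
        # Clamp so that the maximum value lands in the last bucket.
        i = min(int((v - lo) / width), num_buckets - 1)
        buckets[i].count += 1
        buckets[i].total += v
    return buckets


def approx_count_sum(buckets, lo, hi):
    """Estimate COUNT and SUM of the values in [lo, hi) from the synopsis only.

    Assumption: values are uniformly distributed inside each bucket, so a
    bucket contributes in proportion to how much of it the range overlaps.
    """
    count = 0.0
    total = 0.0
    for b in buckets:
        overlap = max(0.0, min(hi, b.hi) - max(lo, b.lo))
        if overlap > 0 and b.hi > b.lo:
            frac = overlap / (b.hi - b.lo)
            count += frac * b.count
            total += frac * b.total
    return count, total


if __name__ == "__main__":
    data = list(range(100))            # stand-in for a large numeric column
    synopsis = build_histogram(data, 10)
    est_count, est_sum = approx_count_sum(synopsis, 0, 50)
    print(est_count, est_sum)          # approximate answers
    print(sum(1 for v in data if 0 <= v < 50),
          sum(v for v in data if 0 <= v < 50))  # exact answers, for comparison
```

The query touches only the ten buckets rather than the full data, which is the trade the abstract describes: a small, bounded error in exchange for a much cheaper answer.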

References

  1. The Galax project, http://www.galaxquery.org/.

  2. A. Aboulnaga and J. F. Naughton. Building XML statistics for the hidden web. In Proc. CIKM’03 Conference, New Orleans, Louisiana, USA, 2003.

  3. S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. SIGMOD Record (ACM Special Interest Group on Management of Data), 28:275–286, 1999.

  4. D. Barbarà, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik. The New Jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3–45, 1997.

  5. A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In Proc. 30th ACM Symp. on the Theory of Computing, pages 327–336, 1998.

  6. Z. Chen, H. V. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. In ICDE, pages 595–604, 2001.

  7. C. Faloutsos, Y. Matias, and A. Silberschatz. Modeling skewed distributions using multifractals and the ‘80–20’ law. In Proc. 22nd International Conf. on Very Large Data Bases, pages 299–310, 1996.

  8. J. Freire, J. R. Haritsa, M. Ramanath, P. Roy, and J. Simeon. StatiX: Making XML count. In ACM SIGMOD, Madison, Wisconsin, June 4–6, 2002.

  9. P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science: Special Issue on External Memory Algorithms and Visualization, vol. A, 1999.

  10. P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. In Proc. of Very Large Data Bases, 1997.

  11. P. B. Gibbons, V. Poosala, S. Acharya, Y. Bartal, Y. Matias, S. Muthukrishnan, S. Ramaswamy, and T. Suel. Aqua: System and techniques for approximate query answering. Technical report, Murray Hill, New Jersey, 1998.

  12. R. Goldman, J. McHugh, and J. Widom. From semistructured data to XML: Migrating the Lore data model and query language. In Proc. WebDB, pages 25–30, 1999.

  13. M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In ACM SIGMOD, 2001.

  14. J. Hidders, J. Paredaens, and D. Van Gucht. A light but formal introduction to XQuery. In Second International XML Database Symposium, 2004.

  15. P. Krishnan, J. S. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 282–293, 1996.

  16. U. Manber. Finding similar files in a large file system. In Proc. Usenix Winter 1994 Technical Conf., pages 1–10, 1994.

  17. S. Marrara. Aggregate queries in XQuery. PhD thesis, Politecnico di Milano, XVII PhD School Edition, 2005.

  18. Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In Proc. of ACM SIGMOD Conference, pages 448–459, 1998.

  19. F. Olken. Random sampling from databases. PhD thesis, U.C. Berkeley, 1993.

  20. N. Polyzotis and M. Garofalakis. Statistical synopses for graph-structured XML databases. In Proc. ACM SIGMOD Conference, Madison, Wisconsin, USA, 2002.

  21. N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Approximate XML query answers. In SIGMOD, 2004.

  22. V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimation of range predicates. In Proc. ACM SIGMOD, 1996.

  23. J. S. Vitter, M. Wang, and B. Iyer. Data cube approximation and histograms via wavelets. In Proc. of the 7th Int. Conf. on Information and Knowledge Management, 1998.

  24. W3C. XML Path Language (XPath) version 1.0, 1999. http://www.w3.org/TR/xpath.

  25. W3C. XML Query (XQuery) version 1.0, 2004. http://www.w3.org/XML/Query.

  26. L. Wang and E. A. Rundensteiner. Updating XQuery views.

Copyright information

© 2006 Springer

About this chapter

Cite this chapter

Comai, S., Marrara, S., Tanca, L. (2006). A Synopsis based Approach for XML Fast Approximate Querying. In: Bordogna, G., Psaila, G. (eds) Flexible Databases Supporting Imprecision and Uncertainty. Studies in Fuzziness and Soft Computing, vol 203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33289-8_10

  • DOI: https://doi.org/10.1007/3-540-33289-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33288-6

  • Online ISBN: 978-3-540-33289-3

  • eBook Packages: Engineering, Engineering (R0)