A Synopsis based Approach for XML Fast Approximate Querying

Comai, Sara; Marrara, Stefania; Tanca, Letizia

doi:10.1007/3-540-33289-8_10

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 203)

Abstract

In recent years, XML has spread to many application fields, and today it is used as a format for exchanging data on the Web and for ensuring interoperability among applications. Owing to this success, the W3C has proposed a new query language, XQuery [25], specifically designed to query XML data. XQuery is a well-defined but rather complex language [14]. In this work we propose a new approach to reducing the high computational cost of aggregate queries over massive XML data collections. In traditional relational warehouses [11], a similar problem is solved by means of fast approximate queries, which rely on concise data statistics based on histograms or other statistical techniques. Their most common application is to aggregate queries in modern decision support systems, where large volumes of data must be queried and quick, interactive responses from the DBMS are required, e.g., to analyze warehouse data and extract trend information for evaluating marketing strategies. In such applications, users are often more interested in obtaining an approximate answer quickly than an exact one that takes minutes or, at worst, hours.
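The abstract notes that relational warehouses answer aggregate queries approximately from concise histogram statistics. As a rough illustration only (a generic equi-width histogram, not the synopsis proposed in this chapter), the Python sketch below extracts numeric values from hypothetical `<amount>` elements of an XML document, summarizes them in a few buckets that keep just a count and a sum, and then estimates COUNT and SUM for a range predicate without rescanning the data. The names `EquiWidthHistogram`, `values_of` and `exact` are illustrative assumptions, and the estimate assumes values are spread uniformly inside each bucket.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Bucket:
    lo: float            # lower bound of the bucket
    hi: float            # upper bound of the bucket
    count: int = 0       # how many values fell into this bucket
    total: float = 0.0   # sum of those values


class EquiWidthHistogram:
    """A concise synopsis storing only a count and a sum per bucket."""

    def __init__(self, values, num_buckets):
        self.lo, hi = min(values), max(values)
        # Guard against a zero width when all values are equal.
        self.width = (hi - self.lo) / num_buckets or 1.0
        self.buckets = [
            Bucket(self.lo + i * self.width, self.lo + (i + 1) * self.width)
            for i in range(num_buckets)
        ]
        for v in values:
            # The maximum value is clamped into the last bucket.
            i = min(int((v - self.lo) / self.width), num_buckets - 1)
            self.buckets[i].count += 1
            self.buckets[i].total += v

    def estimate(self, a, b):
        """Approximate (COUNT, SUM) of values in [a, b] from the buckets only.

        Assumes values are spread uniformly inside each bucket, so a bucket
        contributes the covered fraction of its count and of its sum.
        """
        est_count = est_sum = 0.0
        for bk in self.buckets:
            overlap = min(b, bk.hi) - max(a, bk.lo)
            if overlap > 0:
                frac = overlap / (bk.hi - bk.lo)
                est_count += frac * bk.count
                est_sum += frac * bk.total
        return est_count, est_sum


def values_of(xml_text, tag):
    """Collect the numeric text of every element named `tag`."""
    root = ET.fromstring(xml_text)
    return [float(el.text) for el in root.iter(tag)]


def exact(values, a, b):
    """Exact (COUNT, SUM) by scanning every value, for comparison."""
    hits = [v for v in values if a <= v <= b]
    return len(hits), sum(hits)


if __name__ == "__main__":
    doc = "<orders>" + "".join(
        f"<order><amount>{i}</amount></order>" for i in range(100)
    ) + "</orders>"
    vals = values_of(doc, "amount")
    hist = EquiWidthHistogram(vals, num_buckets=10)
    print(hist.estimate(20, 60))  # approximately (40.40, 1614.14)
    print(exact(vals, 20, 60))    # (41, 1640.0)
```

With ten buckets over 100 amounts the estimate stays within about 2% of the exact answer here; keeping more buckets improves accuracy at the cost of a larger synopsis, which is the basic space/accuracy trade-off behind fast approximate querying.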

References

[1] The Galax project. http://www.galaxquery.org/.
[2] A. Aboulnaga and J. F. Naughton. Building XML statistics for the hidden Web. In Proc. CIKM'03 Conference, New Orleans, Louisiana, USA, 2003.
[3] S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. SIGMOD Record (ACM Special Interest Group on Management of Data), 28:275–286, 1999.
[4] D. Barbará, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik. The New Jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3–45, 1997.
[5] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In Proc. 30th ACM Symp. on the Theory of Computing, pages 327–336, 1998.
[6] Z. Chen, H. V. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. In Proc. ICDE, pages 595–604, 2001.
[7] C. Faloutsos, Y. Matias, and A. Silberschatz. Modeling skewed distributions using multifractals and the '80–20' law. In Proc. 22nd International Conf. on Very Large Data Bases, pages 299–310, 1996.
[8] J. Freire, J. R. Haritsa, M. Ramanath, P. Roy, and J. Siméon. StatiX: Making XML count. In Proc. ACM SIGMOD, Madison, Wisconsin, June 4–6, 2002.
[9] P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. DIMACS Series in Discrete Mathematics and Theoretical Computer Science: Special Issue on External Memory Algorithms and Visualization, vol. A, 1999.
[10] P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. In Proc. of Very Large Data Bases, 1997.
[11] P. B. Gibbons, V. Poosala, S. Acharya, Y. Bartal, Y. Matias, S. Muthukrishnan, S. Ramaswamy, and T. Suel. AQUA: System and techniques for approximate query answering. Technical report, Murray Hill, New Jersey, 1998.
[12] R. Goldman, J. McHugh, and J. Widom. From semistructured data to XML: Migrating the Lore data model and query language. In Proc. WebDB, pages 25–30, 1999.
[13] M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. ACM SIGMOD, 2001.
[14] J. Hidders, J. Paredaens, and D. Van Gucht. A light but formal introduction to XQuery. In Proc. Second International XML Database Symposium, 2004.
[15] P. Krishnan, J. S. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 282–293, 1996.
[16] U. Manber. Finding similar files in a large file system. In Proc. USENIX Winter 1994 Technical Conf., pages 1–10, 1994.
[17] S. Marrara. Aggregate queries in XQuery. PhD thesis, Politecnico di Milano, XVII PhD School Edition, 2005.
[18] Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In Proc. ACM SIGMOD Conference, pages 448–459, 1998.
[19] F. Olken. Random sampling from databases. PhD thesis, U.C. Berkeley, 1993.
[20] N. Polyzotis and M. Garofalakis. Statistical synopses for graph-structured XML databases. In Proc. ACM SIGMOD Conference, Madison, Wisconsin, USA, 2002.
[21] N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Approximate XML query answers. In Proc. ACM SIGMOD, 2004.
[22] V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimation of range predicates. In Proc. ACM SIGMOD, 1996.
[23] J. S. Vitter, M. Wang, and B. Iyer. Data cube approximation and histograms via wavelets. In Proc. 7th Int. Conf. on Information and Knowledge Management, 1998.
[24] W3C. XML Path Language (XPath) version 1.0, 1999. http://www.w3.org/TR/xpath.
[25] W3C. XML Query (XQuery) version 1.0, 2004. http://www.w3.org/XML/Query.
[26] L. Wang and E. A. Rundensteiner. Updating XQuery views.

Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Piazza L. Da Vinci 32, Politecnico di Milano, Milano, I-20133, Italy
Sara Comai, Stefania Marrara & Letizia Tanca

Editor information

Editors and Affiliations

CNR IDPA, National Research Council of Italy Institute for the Study of Environmental Process Dynamics, via Pasubio 5, Dalmine, BG, I-24044, Italy
Gloria Bordogna Dr.
Faculty of Engineering, University of Bergamo, viale Marconi 5, Dalmine, BG, I-24044, Italy
Giuseppe Psaila Professor

About this chapter

Cite this chapter

Comai, S., Marrara, S., Tanca, L. (2006). A Synopsis based Approach for XML Fast Approximate Querying. In: Bordogna, G., Psaila, G. (eds) Flexible Databases Supporting Imprecision and Uncertainty. Studies in Fuzziness and Soft Computing, vol 203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33289-8_10

DOI: https://doi.org/10.1007/3-540-33289-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33288-6
Online ISBN: 978-3-540-33289-3
eBook Packages: Engineering, Engineering (R0)
