Abstract
In the last few years, XML has spread to many application fields, and today it is used as a format for exchanging data on the web and for ensuring interoperability among applications. Due to this success, the W3C has proposed a new query language, XQuery [25], specifically designed to query XML data. XQuery is a well-defined but rather complex language [14]. In this work we propose a new approach to overcome the high computational cost of aggregate queries over massive XML data collections. In traditional relational warehouses [11] a similar problem is solved by means of fast approximate queries, which use concise data statistics based on histograms or other statistical techniques. Their most common application is to aggregate queries in modern decision support systems, where large volumes of data need to be queried and quick, interactive responses from the DBMS are required, e.g., to analyze the data in the warehouse for trend information when evaluating marketing strategies. In such applications, users are often more interested in obtaining an approximate answer computed in a short time than an exact one obtained in minutes or, at worst, hours.
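As a rough illustration of the relational-style technique mentioned in the abstract (not the synopsis structure proposed in this chapter), the sketch below answers range COUNT and SUM queries from an equi-width histogram instead of scanning the data. The names `Bucket`, `build_histogram`, `approx_count`, and `approx_sum` are hypothetical, and the estimate assumes values are spread uniformly inside each bucket.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    lo: float     # inclusive lower bound of the bucket
    hi: float     # exclusive upper bound of the bucket
    count: int    # number of values that fell into [lo, hi)
    total: float  # sum of those values


def build_histogram(values, lo, hi, n_buckets):
    """Summarize values from [lo, hi) into n_buckets equi-width buckets."""
    width = (hi - lo) / n_buckets
    buckets = [Bucket(lo + i * width, lo + (i + 1) * width, 0, 0.0)
               for i in range(n_buckets)]
    for v in values:
        # Clamp so that a value equal to hi lands in the last bucket.
        i = min(int((v - lo) / width), n_buckets - 1)
        buckets[i].count += 1
        buckets[i].total += v
    return buckets


def _overlap_fraction(bucket, a, b):
    """Fraction of the bucket's range covered by the query range [a, b)."""
    overlap = max(0.0, min(b, bucket.hi) - max(a, bucket.lo))
    return overlap / (bucket.hi - bucket.lo)


def approx_count(buckets, a, b):
    """Estimate COUNT of values in [a, b) using only the histogram."""
    return sum(bk.count * _overlap_fraction(bk, a, b) for bk in buckets)


def approx_sum(buckets, a, b):
    """Estimate SUM of values in [a, b) using only the histogram."""
    return sum(bk.total * _overlap_fraction(bk, a, b) for bk in buckets)


# Illustrative data only: ten values summarized by five buckets.
hist = build_histogram(range(10), lo=0.0, hi=10.0, n_buckets=5)
estimated_count = approx_count(hist, 3.0, 6.0)
estimated_sum = approx_sum(hist, 3.0, 6.0)
```

The query cost depends on the number of buckets rather than on the size of the data, which is the trade-off the abstract describes: a small, bounded error in exchange for a fast answer.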
References
The Galax project, http://www.galaxquery.org/.
A. Aboulnaga and J. F. Naughton. Building XML statistics for the hidden web. In Proc. CIKM’03 Conference, New Orleans, Louisiana, USA, 2003.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. SIGMOD Record (ACM Special Interest Group on Management of Data), 28:275–286, 1999.
D. Barbarà, W. Dumouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik. The New Jersey data reduction report. In Bulletin of Technical Committee on Data Engineering, 20(4):3–45, 1997.
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In Proc. 30th ACM Symp. on the Theory of Computing, pages 327–336, 1998.
Z. Chen, H. V. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. In ICDE, pages 595–604, 2001.
C. Faloutsos, Y. Matias, and A. Silberschatz. Modeling skewed distributions using multifractals and the ‘80–20’ law. In Proc. 22nd International Conf. on Very Large Data Bases, pages 299–310, 1996.
J. Freire, J. R. Haritsa, M. Ramanath, P. Roy, and J. Simeon. StatiX: Making XML count. In ACM SIGMOD, Madison, Wisconsin, June 4–6, 2002.
P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science: Special Issue on External Memory Algorithms and Visualization, vol. A, 1999.
P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. In Proc. of Very Large Data Bases, 1997.
P. B. Gibbons, V. Poosala, S. Acharya, Y. Bartal, Y. Matias, S. Muthukrishnan, S. Ramaswamy, and T. Suel. Aqua: System and techniques for approximate query answering. In Technical Report, Murray Hill, New Jersey, 1998.
R. Goldman, J. McHugh, and J. Widom. From semistructured data to XML: Migrating the Lore data model and query language. In Proc. WebDB, pages 25–30, 1999.
M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In ACM SIGMOD, 2001.
Jan Hidders, Jan Paredaens, and Dirk Van Gucht. A light but formal introduction to XQuery. In Second International XML Database Symposium, 2004.
P. Krishnan, J. S. Vitter, and B. Iyer. Estimating alphanumeric selectivity in the presence of wildcards. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 282–293, 1996.
U. Manber. Finding similar files in a large file system. In Proc. Usenix Winter 1994 Technical Conf., pages 1–10, 1994.
S. Marrara. Aggregate queries in XQuery. PhD thesis, Politecnico di Milano, XVII PhD School Edition, 2005.
Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In Proc. of ACM SIGMOD Conference, pages 448–459, 1998.
F. Olken. Random sampling from databases. PhD thesis, U.C. Berkeley, 1993.
N. Polyzotis and M. Garofalakis. Statistical synopses for graph-structured XML databases. In Proc. ACM SIGMOD Conference, Madison, Wisconsin, USA, 2002.
N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Approximate XML query answers. In SIGMOD, 2004.
V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimation of range predicates. In Proc. ACM SIGMOD, 1996.
J. S. Vitter, M. Wang, and B. Iyer. Data cube approximation and histograms via wavelets. In Proc. 7th Int. Conf. on Information and Knowledge Management, 1998.
W3C. XML Path Language (XPath) version 1.0, 1999. http://www.w3.org/TR/xpath.
W3C. XML Query (XQuery) version 1.0, 2004. http://www.w3.org/XML/Query.
Ling Wang and Elke A. Rundensteiner. Updating xquery views.
© 2006 Springer
About this chapter
Cite this chapter
Comai, S., Marrara, S., Tanca, L. (2006). A Synopsis based Approach for XML Fast Approximate Querying. In: Bordogna, G., Psaila, G. (eds) Flexible Databases Supporting Imprecision and Uncertainty. Studies in Fuzziness and Soft Computing, vol 203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33289-8_10
DOI: https://doi.org/10.1007/3-540-33289-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33288-6
Online ISBN: 978-3-540-33289-3
eBook Packages: Engineering, Engineering (R0)