Abstract
Approximate query processing is an appropriate technique for reducing response times and system load in cases where approximate results suffice. In the database literature, sampling has been proposed to evaluate queries approximately by using only a subset of the original data. Unfortunately, most of these methods address either only specific problems that arise from the use of samples in databases (e.g., data skew) or only join operations involving multiple relations. We describe how well-known sampling techniques for group-by operations can be combined with foreign-key joins such that the join is computed after the sample has been generated. Specifically, we show how senate sampling and small group sampling can be combined efficiently with the idea of join synopses. Additionally, we introduce several algorithms that maintain the sample when the underlying data changes. Finally, an extensive set of experiments demonstrates the superiority of our method over the naive approach.
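The core idea of drawing a group-aware sample first and computing the foreign-key join only on that sample can be illustrated with a small Python sketch. It is not the authors' hierarchical group-based sampling algorithm. It assumes a simple senate-style allocation: each group receives an equal share of a hypothetical `budget`, and unused budget is not redistributed. It also assumes the dimension table is held in a dictionary keyed by its primary key. The names `senate_sample`, `join_after_sampling`, and `estimate_sums` are illustrative and do not come from the paper.

```python
# Illustrative sketch only: function names, the equal-share allocation policy,
# and the in-memory tables are assumptions, not the paper's method.
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Row = Dict[str, object]


def senate_sample(rows: List[Row], group_of: Callable[[Row], Hashable],
                  budget: int, seed: int = 0) -> Tuple[List[Row], Dict[Hashable, float]]:
    """Stratified sample that gives every group an equal share of `budget`.

    Groups smaller than their share are kept whole. Also returns a per-group
    scale factor (group size / sampled rows) for scaling aggregates back up.
    """
    rng = random.Random(seed)
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)
    if not groups:
        return [], {}
    share = budget // len(groups)  # equal allocation; leftover budget stays unused here
    sample: List[Row] = []
    scale: Dict[Hashable, float] = {}
    for g, members in groups.items():
        k = min(share, len(members))
        sample.extend(rng.sample(members, k))  # uniform, without replacement, within the group
        scale[g] = len(members) / k if k else 0.0
    return sample, scale


def join_after_sampling(fact_sample: List[Row], dim: Dict[Hashable, Row], fk: str) -> List[Row]:
    """Foreign-key join applied only to the sampled fact rows.

    Each fact row references exactly one dimension row, so the join neither
    drops nor duplicates sampled rows.
    """
    return [{**row, **dim[row[fk]]} for row in fact_sample]


def estimate_sums(joined: List[Row], group_of: Callable[[Row], Hashable],
                  scale: Dict[Hashable, float], measure: str) -> Dict[Hashable, float]:
    """Per-group SUM estimate: each sampled value is weighted by its group's scale factor."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for row in joined:
        g = group_of(row)
        totals[g] += float(row[measure]) * scale[g]
    return dict(totals)


# Usage with hypothetical toy tables: 'orders' is the fact table and
# 'customers' is the dimension table referenced through the 'cust' foreign key.
customers = {1: {"region": "EU"}, 2: {"region": "US"}}
orders = [{"cust": 1, "price": 10.0} for _ in range(4)] + [
    {"cust": 2, "price": 5.0},
    {"cust": 2, "price": 7.0},
]


def region_of(row: Row) -> Hashable:
    # The grouping attribute lives in the dimension table.
    return customers[row["cust"]]["region"]


sample, scale = senate_sample(orders, region_of, budget=4)
synopsis = join_after_sampling(sample, customers, fk="cust")
estimates = estimate_sums(synopsis, region_of, scale, "price")
print(estimates)  # {'EU': 40.0, 'US': 12.0}
```

A foreign-key constraint pairs every fact row with exactly one dimension row. In this sketch, joining after sampling therefore yields the same rows as sampling the joined result with the same random choices. Only the few sampled rows need to be joined. The small `US` group is kept whole, which is the behavior senate-style allocation is meant to guarantee.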
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gemulla, R., Berthold, H., Lehner, W. (2005). Hierarchical Group-Based Sampling. In: Jackson, M., Nelson, D., Stirk, S. (eds) Database: Enterprise, Skills and Innovation. BNCOD 2005. Lecture Notes in Computer Science, vol 3567. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11511854_10
DOI: https://doi.org/10.1007/11511854_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26973-1
Online ISBN: 978-3-540-31677-0
eBook Packages: Computer Science, Computer Science (R0)