Abstract
Integrating, cleaning and analyzing data from heterogeneous sources is often complicated by the large volume of data and its physical distribution, both of which can result in poor query response times. One approach to speeding up processing is to reduce the cardinality of results, either by querying only the first tuples or by obtaining a sample for further processing. In this paper, we address the processing of such queries in a multidatabase environment. We discuss implementations of the query operators, strategies for their placement in a query plan, and in particular the use of histograms for estimating attribute value distributions and result cardinalities in order to parameterize the operators.
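To make the idea of parameterizing an operator from a histogram more concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes an equi-width histogram with uniformly distributed values inside each bucket, estimates the cardinality of a range predicate, and derives a sampling fraction for a desired sample size. The names `EquiWidthHistogram`, `estimate_range` and `sampling_fraction` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EquiWidthHistogram:
    """Hypothetical equi-width histogram over a numeric attribute."""
    low: float          # lower bound of the first bucket
    width: float        # width of every bucket
    counts: List[int]   # number of tuples per bucket

    def estimate_range(self, lo: float, hi: float) -> float:
        """Estimate how many tuples satisfy lo <= value < hi.

        Assumption: values are spread uniformly inside each bucket,
        so a partially covered bucket contributes proportionally.
        """
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.low + i * self.width
            b_hi = b_lo + self.width
            # Length of the intersection of [lo, hi) with this bucket.
            overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
            total += count * overlap / self.width
        return total


def sampling_fraction(estimated_cardinality: float, target_size: int) -> float:
    """Fraction of qualifying tuples to keep so that the expected
    sample size is about target_size (capped at 1.0)."""
    if estimated_cardinality <= 0:
        return 1.0
    return min(1.0, target_size / estimated_cardinality)


# Illustrative data only: four buckets covering [0, 40).
hist = EquiWidthHistogram(low=0.0, width=10.0, counts=[100, 200, 50, 150])
est = hist.estimate_range(5.0, 25.0)   # estimated result cardinality
frac = sampling_fraction(est, 30)      # parameter for a sampling operator
print(est, frac)
```

In this sketch the estimate is used only to choose a sampling rate; the same estimate could equally inform where a limiting operator is placed in a query plan, which is the kind of decision the abstract refers to.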
Research was supported by the grant FOR 345/1 from the DFG.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sattler, KU., Dunemann, O., Geist, I., Saake, G., Conrad, S. (2001). Limiting Result Cardinalities for Multidatabase Queries Using Histograms. In: Read, B. (eds) Advances in Databases. BNCOD 2001. Lecture Notes in Computer Science, vol 2097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45754-2_10
DOI: https://doi.org/10.1007/3-540-45754-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42265-5
Online ISBN: 978-3-540-45754-1
eBook Packages: Springer Book Archive