Limiting Result Cardinalities for Multidatabase Queries Using Histograms

Sattler, Kai-Uwe; Dunemann, Oliver; Geist, Ingolf; Saake, Gunter; Conrad, Stefan

doi:10.1007/3-540-45754-2_10


Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2097))

Included in the following conference series:

British National Conference on Databases


Abstract

Integrating, cleaning and analyzing data from heterogeneous sources is often complicated by the large amounts of data and their physical distribution, both of which can lead to poor query response times. One approach to speeding up processing is to reduce the cardinality of results, either by querying only the first tuples or by obtaining a sample for further processing. In this paper we address the processing of such queries in a multidatabase environment. We discuss implementations of the query operators, strategies for their placement in a query plan, and in particular the use of histograms for estimating attribute value distributions and result cardinalities in order to parameterize the operators.
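The idea of using a histogram to parameterize a cardinality-limiting operator can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple equi-width histogram over a numeric attribute, a uniform spread of values within each bucket, and a range predicate. All function names are hypothetical. The estimated result cardinality is then turned into a sampling fraction for a target sample size.

```python
def build_equi_width_histogram(values, num_buckets, lo, hi):
    """Count values per bucket over [lo, hi); assumed equi-width layout."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that v == hi falls into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range_cardinality(counts, width, lo, a, b):
    """Estimate the number of tuples with a <= value < b.

    Assumes values are spread uniformly inside each bucket.
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


def sampling_fraction(estimated_cardinality, target_size):
    """Fraction of qualifying tuples to keep to obtain about target_size."""
    if estimated_cardinality <= 0:
        return 1.0
    return min(1.0, target_size / estimated_cardinality)


if __name__ == "__main__":
    values = list(range(100))
    counts, width = build_equi_width_histogram(values, 10, 0, 100)
    estimate = estimate_range_cardinality(counts, width, 0, 20, 50)
    print(estimate, sampling_fraction(estimate, 6))
```

In a multidatabase setting, an estimate of this kind could be computed from histograms maintained for the component sources, so that a sample or first-n operator can be parameterized before the remote query is issued.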


Research was supported by the grant FOR 345/1 from the DFG.



Author information

Authors and Affiliations

Department of Computer Science, University of Magdeburg, P.O. Box 4120, D-39016, Magdeburg, Germany
Kai-Uwe Sattler, Oliver Dunemann, Ingolf Geist & Gunter Saake
Department of Computer Science, University of Munich, Oettingenstr. 67, D-80538, München, Germany
Stefan Conrad


Editor information

Editors and Affiliations

Information Technology Department, CLRC Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, OX11 0QX, UK
Brian Read


About this paper

Cite this paper

Sattler, KU., Dunemann, O., Geist, I., Saake, G., Conrad, S. (2001). Limiting Result Cardinalities for Multidatabase Queries Using Histograms. In: Read, B. (eds) Advances in Databases. BNCOD 2001. Lecture Notes in Computer Science, vol 2097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45754-2_10


DOI: https://doi.org/10.1007/3-540-45754-2_10
Published: 22 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42265-5
Online ISBN: 978-3-540-45754-1
eBook Packages: Springer Book Archive
