Limiting Result Cardinalities for Multidatabase Queries Using Histograms

  • Conference paper
Advances in Databases (BNCOD 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2097))

Abstract

Integrating, cleaning and analyzing data from heterogeneous sources is often complicated by the large volume of data and its physical distribution, both of which can result in poor query response times. One approach to speeding up processing is to reduce the cardinality of results, either by querying only the first tuples or by obtaining a sample for further processing. In this paper we address the processing of such queries in a multidatabase environment. We discuss implementations of the query operators, strategies for their placement in a query plan and, in particular, the use of histograms for estimating attribute value distributions and result cardinalities in order to parameterize the operators.
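To make the role of histograms more concrete, the following is a minimal sketch of how a histogram might be used to estimate a result cardinality and to derive a cutoff value for a "first n tuples" query. It is not the implementation described in the paper: the equi-width bucketing, the uniform-spread assumption within a bucket, and all names (`EquiWidthHistogram`, `estimate_le`, `cutoff_for_first_n`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EquiWidthHistogram:
    """Hypothetical equi-width histogram over one numeric attribute."""
    lo: float
    hi: float
    counts: List[int]

    @property
    def width(self) -> float:
        # Fall back to 1.0 when all values are equal, to avoid division by zero.
        return (self.hi - self.lo) / len(self.counts) or 1.0

    @classmethod
    def build(cls, values: Sequence[float], buckets: int) -> "EquiWidthHistogram":
        hist = cls(min(values), max(values), [0] * buckets)
        for v in values:
            # The maximum value is clamped into the last bucket.
            i = min(int((v - hist.lo) / hist.width), buckets - 1)
            hist.counts[i] += 1
        return hist

    def estimate_le(self, x: float) -> float:
        """Estimate how many values are <= x, assuming values are
        spread uniformly inside each bucket."""
        if x < self.lo:
            return 0.0
        if x >= self.hi:
            return float(sum(self.counts))
        pos = (x - self.lo) / self.width
        full = int(pos)
        return sum(self.counts[:full]) + (pos - full) * self.counts[full]

    def cutoff_for_first_n(self, n: int) -> float:
        """Return a bucket upper bound c such that the predicate
        `attr <= c` is estimated to yield at least n tuples."""
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running >= n:
                return self.lo + (i + 1) * self.width
        return self.hi


if __name__ == "__main__":
    hist = EquiWidthHistogram.build(list(range(100)), buckets=10)
    cutoff = hist.cutoff_for_first_n(25)
    print("cutoff:", cutoff, "estimated tuples:", hist.estimate_le(cutoff))
```

In a multidatabase setting, a cutoff of this kind could be pushed to a component database as an ordinary selection predicate, so that fewer tuples are shipped before the final limit is applied. The estimate is only as good as the histogram, so a practical operator would still need a fallback for the case where fewer than n tuples qualify.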

Research was supported by the grant FOR 345/1 from the DFG.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sattler, KU., Dunemann, O., Geist, I., Saake, G., Conrad, S. (2001). Limiting Result Cardinalities for Multidatabase Queries Using Histograms. In: Read, B. (eds) Advances in Databases. BNCOD 2001. Lecture Notes in Computer Science, vol 2097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45754-2_10

  • DOI: https://doi.org/10.1007/3-540-45754-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42265-5

  • Online ISBN: 978-3-540-45754-1

  • eBook Packages: Springer Book Archive
