Workload-Optimal Histograms on Streams

  • S. Muthukrishnan
  • M. Strauss
  • X. Zheng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3669)

Abstract

A histogram is a piecewise-constant approximation of an observed data distribution. A histogram is used as a small-space, approximate synopsis of the underlying data distribution, which is often too large to be stored precisely. Histograms have found many applications in database management systems, perhaps most commonly for query selectivity estimation in query optimizers [1], but also in approximate query answering [2], load balancing in parallel join execution [3], mining time-series data [4], partition-based temporal join execution, query profiling for user feedback, etc. Ioannidis gives a good overview of the history of histograms, their applications, and their use in commercial DBMSs [5]. Also, Poosala’s thesis provides a systematic treatment of different types of histograms [3].
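To make the definition concrete, the following is a minimal sketch of building a B-bucket piecewise-constant approximation and using it to estimate a range sum, as a selectivity estimator would. It uses the classic offline dynamic program that minimizes sum-squared error (the V-optimal criterion; see Jagadish et al. [7]). It is not the streaming, workload-optimal algorithm of this paper, and the function names `v_optimal_histogram` and `estimate_range_sum` are hypothetical.

```python
from typing import List, Tuple

# A bucket is (start index, end index inclusive, constant value).
Bucket = Tuple[int, int, float]


def v_optimal_histogram(data: List[float], num_buckets: int) -> List[Bucket]:
    """Offline O(n^2 * B) dynamic program for the minimum sum-squared-error histogram."""
    n = len(data)
    if n == 0 or num_buckets < 1:
        raise ValueError("need non-empty data and at least one bucket")
    num_buckets = min(num_buckets, n)

    # Prefix sums of values and squared values give O(1) interval error.
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(i: int, j: int) -> float:
        """Squared error of approximating data[i:j] (j exclusive) by its mean."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # err[k][j] = min error covering data[:j] with exactly k buckets.
    err = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    err[0][0] = 0.0
    for k in range(1, num_buckets + 1):
        for j in range(k, n + 1):
            # The last bucket is data[i:j]; the first k-1 buckets cover data[:i].
            for i in range(k - 1, j):
                cand = err[k - 1][i] + sse(i, j)
                if cand < err[k][j]:
                    err[k][j] = cand
                    cut[k][j] = i

    # Walk the recorded cut points backwards to recover the buckets.
    buckets: List[Bucket] = []
    j = n
    for k in range(num_buckets, 0, -1):
        i = cut[k][j]
        buckets.append((i, j - 1, (ps[j] - ps[i]) / (j - i)))
        j = i
    buckets.reverse()
    return buckets


def estimate_range_sum(buckets: List[Bucket], lo: int, hi: int) -> float:
    """Estimate sum(data[lo..hi]) (inclusive) assuming values are uniform within each bucket."""
    total = 0.0
    for start, end, value in buckets:
        overlap = min(end, hi) - max(start, lo) + 1
        if overlap > 0:
            total += overlap * value
    return total
```

For example, `v_optimal_histogram([1, 1, 1, 8, 8, 8, 2, 2], 3)` recovers the three runs exactly, and `estimate_range_sum` over indices 2..4 returns 17.0, matching the true sum. This dynamic program takes O(n^2 B) time and needs the whole array in memory, which is what the streaming algorithms cited in the references are designed to avoid.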


References

  1. Ioannidis, Y., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. 18, 709–748 (1993)
  2. Acharya, S., Gibbons, P., Poosala, V., Ramaswamy, S.: The Aqua approximate query answering system. In: SIGMOD Conference, pp. 574–576 (1999)
  3. Poosala, V.: Histogram-based estimation techniques in database systems. PhD thesis, Univ. of Wisconsin (1997)
  4. Keogh, E., Chakrabarti, K., Mehrotra, S., Pazzani, M.: Locally adaptive dimensionality reduction for indexing large time series databases. In: Proc. SIGMOD (2001)
  5. Ioannidis, Y.: The history of histograms (abridged). In: Proc. VLDB (2003)
  6. Ioannidis, Y., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Proc. SIGMOD, pp. 233–244 (1995)
  7. Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K., Suel, T.: Optimal histograms with quality guarantees. In: Proc. VLDB, pp. 275–286 (1998)
  8. Muthukrishnan, S.: Data stream algorithms and applications (2003), http://www.cs.rutgers.edu/~muthu/stream-1-1.ps
  9. Guha, S., Koudas, N., Shim, K.: Data-streams and histograms. In: Proc. ACM STOC, pp. 471–475 (2001)
  10. Guha, S., Koudas, N.: Approximating a data stream for querying and estimation: Algorithms and performance evaluation. In: Proc. ICDE (2002)
  11. Guha, S., Indyk, P., Muthukrishnan, S., Strauss, M.: Histogramming data streams with fast per-item processing. In: Proc. 29th ICALP, pp. 681–692 (2002)
  12. Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Fast, small-space algorithms for approximate histogram maintenance. In: Proc. ACM STOC, pp. 389–398 (2002)
  13. Chen, C., Roussopoulos, N.: Adaptive selectivity estimation using query feedback. In: Proc. ACM SIGMOD (1994)
  14. König, A., Weikum, G.: Combining histograms and parametric curve fitting for feedback driven query result size estimation. In: Proc. VLDB (1999)
  15. Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: Building histograms without looking at data. In: Proc. ACM SIGMOD (1999)
  16. Qiao, L., Agrawal, D., Abbadi, A.E.: RHist: adaptive summarization over continuous data streams. In: Proc. CIKM, pp. 469–476 (2002)
  17. Ganti, V., Lee, M., Ramakrishnan, R.: Icicles: self-tuning samples for approximate query answering. In: Proc. VLDB (2000)
  18. Stillger, M., Lohman, G., Markl, V., Kandil, M.: LEO - DB2’s learning optimizer. In: Proc. VLDB, pp. 19–28 (2001)
  19. Muthukrishnan, S.: Nonuniform sparse approximation theory with Haar wavelets. Technical report, DIMACS (2004)
  20. Guha, S.: A note on wavelet optimization (2004), http://www.cis.upenn.edu/~sudipto/notes/wavelet.pdf.gz
  21. Matias, Y., Urieli, D.: Optimal workload-based wavelet synopses. Technical report, TAU (2004)
  22. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24, 530–536 (1978)
  23. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. IEEE FOCS, pp. 390–398 (2000)
  24. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23, 337–343 (1977)
  25. Muthukrishnan, S., Strauss, M., Zheng, X.: Workload-optimal histograms on streams. Technical report, DIMACS (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • S. Muthukrishnan (1)
  • M. Strauss (2)
  • X. Zheng (2)
  1. Supported by NSF ITR 0220280 and NSF 0354600, Rutgers University
  2. Supported by NSF DMS 0354600, University of Michigan
