Smooth Interpolating Histograms with Error Guarantees

  • Thomas Neumann
  • Sebastian Michel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5071)


Accurate selectivity estimates are essential for query optimization decisions. They are typically derived from various kinds of histograms, which condense value distributions into compact representations. The estimation accuracy of existing approaches typically varies across the domain, with some estimates being very accurate and others quite inaccurate. This is particularly unfortunate when performing a parametric search using these estimates, as the estimation artifacts can dominate the search results. We propose using linear splines to construct histograms with known error guarantees across the whole continuous domain. These histograms are particularly well suited for using the estimates in parameter optimization. A comprehensive performance evaluation using both synthetic and real-world data shows that our approach clearly outperforms existing techniques.
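The abstract does not describe the construction itself, so the following is only an illustrative sketch of the general idea of a linear-spline histogram with a bounded error. It is not the authors' algorithm. It assumes a greedy segmentation of the cumulative frequency curve, with knots placed on data points, and it checks the error bound only at the stored distinct values. The names `build_spline`, `estimate`, and `eps` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def _fits(xs, cum, i, j, eps):
    """True if the line from point i to point j is within eps of every point between them."""
    for k in range(i + 1, j):
        interp = cum[i] + (cum[j] - cum[i]) * (xs[k] - xs[i]) / (xs[j] - xs[i])
        if abs(interp - cum[k]) > eps:
            return False
    return True


def build_spline(xs, counts, eps):
    """Greedy piecewise-linear fit of the cumulative frequencies (assumed scheme).

    xs must be strictly increasing distinct values, counts their frequencies.
    Returns a list of (value, cumulative_count) knots.
    """
    cum = list(accumulate(counts))
    knots = [(xs[0], cum[0])]
    i, n = 0, len(xs)
    while i < n - 1:
        j = i + 1  # a segment to the direct neighbour has no interior points
        # Extend the segment while the error bound still holds.
        while j + 1 < n and _fits(xs, cum, i, j + 1, eps):
            j += 1
        knots.append((xs[j], cum[j]))
        i = j
    return knots


def estimate(knots, x):
    """Estimate the number of values <= x by linear interpolation between knots."""
    kx = [k[0] for k in knots]
    if x < kx[0]:
        return 0.0
    if x >= kx[-1]:
        return float(knots[-1][1])
    s = bisect_right(kx, x) - 1  # segment with kx[s] <= x < kx[s + 1]
    (x0, c0), (x1, c1) = knots[s], knots[s + 1]
    return c0 + (c1 - c0) * (x - x0) / (x1 - x0)


# Made-up example data: a dense region followed by a sparse tail.
xs = [1, 2, 3, 4, 10, 11, 12]
counts = [5, 5, 5, 5, 1, 1, 1]
eps = 1.0
knots = build_spline(xs, counts, eps)
```

Under these assumptions, a range estimate for the interval (a, b] would be `estimate(knots, b) - estimate(knots, a)`, whose error at stored values is at most `2 * eps`. Extending the guarantee to every point of the continuous domain, as the paper claims for its histograms, requires additional care that this sketch does not attempt.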


Keywords (added by machine, not by the authors): Maximum Error, Range Query, Point Query, Average Relative Error, Query Optimization





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Thomas Neumann (1)
  • Sebastian Michel (2)

  1. Max-Planck-Institut Informatik, Saarbrücken, Germany
  2. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
