# Encyclopedia of Database Systems

2018 Edition, edited by Ling Liu and M. Tamer Özsu

# Histogram

Reference work entry by Qing Zhang
DOI: https://doi.org/10.1007/978-1-4614-8265-9_544

## Definition

Given a relation $R$ and an attribute $X$ of $R$, the domain $D$ of $X$ is the set of all possible values of $X$, and the finite set $V \subseteq D$ denotes the distinct values of $X$ in an instance $r$ of $R$. Let $V$ be ordered, that is, $V = \{v_i : 1 \le i \le n\}$, where $v_i < v_j$ if $i < j$. The instance $r$ of $R$ restricted to $X$ is denoted by $T$ and can be represented as $T = \{(v_1, f_1), \cdots, (v_n, f_n)\}$. In $T$, each $v_i$ is distinct and is called a value of $T$; $f_i$ is the number of occurrences of $v_i$ in $T$ and is called the frequency of $v_i$; and $T$ is called the data distribution. A histogram on the data distribution $T$ is constructed in the following two steps:
1. Partitioning the values of $T$ into $\beta \ (\ge 1)$ disjoint intervals, called buckets, $\{B_i : 1 \le i \le \beta\}$, such that each value in $B_i$ is smaller than every value in $B_j$ if $i < j$.
2. Approximately representing the frequencies and values in each bucket.
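
The two construction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a construction prescribed by this entry: the helper names (`data_distribution`, `Bucket`, `build_histogram`, `approx_frequency`) are hypothetical; step 1 uses an equi-depth rule (buckets of roughly equal total frequency), which is only one of many possible partitioning rules; and step 2 keeps, for each bucket, its value range, total frequency, and number of distinct values, then reconstructs individual frequencies under a uniform frequency assumption.

```python
from collections import Counter
from dataclasses import dataclass


def data_distribution(column):
    """Return T = [(v1, f1), ..., (vn, fn)]: the distinct values of the
    column in ascending order, each paired with its frequency."""
    return sorted(Counter(column).items())


@dataclass
class Bucket:
    low: object       # smallest value of T in this bucket
    high: object      # largest value of T in this bucket
    total_freq: int   # sum of the frequencies of the bucket's values
    distinct: int     # number of distinct values in the bucket


def build_histogram(T, beta):
    """Step 1: split the sorted T into at most beta contiguous, disjoint
    buckets of roughly equal total frequency (equi-depth; an illustrative
    choice). Step 2: keep only a summary of each bucket."""
    if beta < 1:
        raise ValueError("beta must be >= 1")
    total = sum(f for _, f in T)
    target = total / beta
    groups, current, acc = [], [], 0
    for v, f in T:
        current.append((v, f))
        acc += f
        # Close the bucket once the cumulative frequency reaches the next
        # multiple of the target, leaving the last bucket for the remainder.
        if acc >= target * (len(groups) + 1) and len(groups) < beta - 1:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return [
        Bucket(low=g[0][0], high=g[-1][0],
               total_freq=sum(f for _, f in g), distinct=len(g))
        for g in groups
    ]


def approx_frequency(hist, v):
    """Estimate the frequency of v from the buckets alone, assuming every
    distinct value in a bucket has the same frequency (uniform frequency
    assumption). Any v inside a bucket's range gets the bucket average,
    even if v does not actually occur; values outside all buckets get 0."""
    for b in hist:
        if b.low <= v <= b.high:
            return b.total_freq / b.distinct
    return 0.0


if __name__ == "__main__":
    T = data_distribution([1, 1, 2, 3, 3, 3, 4, 5, 5, 6])
    hist = build_histogram(T, beta=2)
    for b in hist:
        print(b)
    print(approx_frequency(hist, 3))
```

For example, for the column `[1, 1, 2, 3, 3, 3, 4, 5, 5, 6]` and $\beta = 2$, the sketch produces the buckets $[1, 3]$ (total frequency 6, three distinct values) and $[4, 6]$ (total frequency 4, three distinct values), so the estimated frequency of value 3 is $6/3 = 2$, whereas its true frequency is 3. This gap is the approximation error that step 2 accepts in exchange for a much smaller summary than $T$ itself.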

## Key Points

A histogram, as a summarization of the data...

