
Efficient two-dimensional Haar\(^+\) synopsis construction for the maximum absolute error measure

  • Regular Paper
The VLDB Journal

Abstract

Several wavelet synopsis construction algorithms have been proposed for optimal Haar\(^+\) synopses. Recently, we proposed the OptExtHP-EB algorithm to find an optimal one-dimensional \(\hbox {Haar}^+\) synopsis. By exploiting novel properties of optimal synopses, OptExtHP-EB represents the set of optimal synopses in a node of a \(\hbox {Haar}^+\) tree by a set of extended synopses. While it is much faster than previous \(\hbox {Haar}^+\) synopsis construction algorithms, it can handle only one-dimensional data. In this paper, we propose the OptExtHP-EB2D algorithm for two-dimensional \(\hbox {Haar}^+\) synopses by extending OptExtHP-EB. Whereas each node of a one-dimensional \(\hbox {Haar}^+\) tree has only two child nodes and three coefficients, each node of a two-dimensional \(\hbox {Haar}^+\) tree has four child nodes and seven coefficients, which makes the tree much more complex. Thus, for each possible subset of the coefficients selected in a node, we develop efficient methods to compute a set of optimal synopses represented by extended synopses. Our experiments confirm the effectiveness of the proposed OptExtHP-EB2D algorithm.



Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1A02937186) as well as Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (NRF-2017M3C4A7063570). It was also supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (NRF-2019R1F1A1062511).

Author information

Corresponding author

Correspondence to Jun-Ki Min.


Appendix


Proof of Lemma 5

For a pair of ranges \(r\in R\) and \(r'\in R'\), \(R_{\mathsf {req}}(r\,{\bowtie }_{\delta }r')=\{r \oplus z\cdot \delta \,|\, mat_\mathsf {max}(r,r') \le z\cdot \delta \le mat_\mathsf {min}(r,r'), z\in {\mathbb {Z}}\}\). Let \(r_1\) and \(r'_1\) be the front ranges of \(R\) and \(R'\), respectively. Then, for every range \(r_{\delta }\in R_{\mathsf {req}}(r\,{\bowtie }_{\delta }r')\) with \(r\in R\) and \(r'\in R'\), we have the following.

$$\begin{aligned}
r_{\delta }.\mathsf {min}&= r.\mathsf {min}+ z\cdot \delta&&(\text {since }r_{\delta }=r\oplus z\cdot \delta )\\
&\ge r.\mathsf {min}+ mat_\mathsf {max}(r,r')&&(\text {due to }z\cdot \delta \ge mat_\mathsf {max}(r,r'))\\
&= r.\mathsf {min}+ (r'.\mathsf {max}- r.\mathsf {max})/2&&(\text {by the value of } mat_\mathsf {max}(r,r'))\\
&\ge r_{1}.\mathsf {min} + (r'_1.\mathsf {max}- r_1.\mathsf {max})/2&&(\text {since }R\text { and }R'\text { are }\delta \text {-shifted range sets}) \\
&= r_{1}.\mathsf {min} + mat_\mathsf {max}(r_1,r'_1)&&(\text {by the value of } mat_\mathsf {max}(r_{1},r'_1))\\
r_{\delta }.\mathsf {max}&= r.\mathsf {max}+ z\cdot \delta&&(\text {since }r_{\delta }=r\oplus z\cdot \delta )\\
&\ge r.\mathsf {max}+ mat_\mathsf {max}(r,r')&&(\text {due to }z\cdot \delta \ge mat_\mathsf {max}(r,r')) \\
&= (r'.\mathsf {max}+ r.\mathsf {max})/2&&(\text {by }mat_\mathsf {max}(r,r')= (r'.\mathsf {max}- r.\mathsf {max})/2)\\
&\ge (r'_{1}.\mathsf {max}+ r_{1}.\mathsf {max})/2&&(\text {since }R\text { and }R'\text { are }\delta \text {-shifted range sets}) \\
&= r_{1}.\mathsf {max} + mat_\mathsf {max}(r_1,r'_1)&&(\text {by }mat_\mathsf {max}(r_1,r'_1)= (r'_{1}.\mathsf {max}- r_{1}.\mathsf {max})/2)
\end{aligned}$$
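
The fourth line of each chain is the only step that uses the \(\delta \)-shifted structure, so it may help to unpack it. As a sketch, assume (this parametrization is an illustration and is not stated in the proof) that \(\delta >0\), \(r=r_1\oplus a\cdot \delta \) and \(r'=r'_1\oplus b\cdot \delta \) for some integers \(a,b\ge 0\). Then:

$$\begin{aligned}
r.\mathsf {min}+ (r'.\mathsf {max}- r.\mathsf {max})/2&= r_1.\mathsf {min}+ a\cdot \delta + (r'_1.\mathsf {max}- r_1.\mathsf {max})/2 + (b-a)\cdot \delta /2\\
&= r_1.\mathsf {min}+ (r'_1.\mathsf {max}- r_1.\mathsf {max})/2 + (a+b)\cdot \delta /2\\
&\ge r_1.\mathsf {min}+ (r'_1.\mathsf {max}- r_1.\mathsf {max})/2\\
(r'.\mathsf {max}+ r.\mathsf {max})/2&= (r'_1.\mathsf {max}+ r_1.\mathsf {max})/2 + (a+b)\cdot \delta /2\\
&\ge (r'_1.\mathsf {max}+ r_1.\mathsf {max})/2
\end{aligned}$$

Both inequalities hold with equality exactly when \(a=b=0\), that is, when \(r\) and \(r'\) are the front ranges themselves.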

Since \(r_{\delta }.\mathsf {min}\ge (R_{\mathsf {F}}.\mathsf {front}).\mathsf {min}\) and \(r_{\delta }.\mathsf {max}\ge (R_{\mathsf {F}}. \mathsf {front}).\mathsf {max}\), we have \(R_{\mathsf {req}}(R\,{\bowtie }_{\delta }R').\mathsf {front}=R_{\mathsf {F}}.\mathsf {front}\). Similarly, we can show that \(R_{\mathsf {req}}(R\,{\bowtie }_{\delta }R').\mathsf {rear}=R_{\mathsf {R}}.\mathsf {rear}\).

Since \(R_{\mathsf {req}}(r\,{\bowtie }_{\delta }r')\) is a \(\delta \)-shifted range set, there always exists a range \(r''\) in \(R\,{\bowtie }_{\delta }R'\) such that \(r''.\mathsf {min}\) is located in \([(R_{\mathsf {F}}.\mathsf {front}).\mathsf {min},(R_{\mathsf {R}}.\mathsf {rear}).\mathsf {min}]\). Thus, \(R_\mathsf {req}(R\,{\bowtie }_{\delta }R')\) is also a \(\delta \)-shifted range set whose front and rear ranges are \(R_{\mathsf {F}}.\mathsf {front}\) and \(R_{\mathsf {R}}.\mathsf {rear}\), respectively. \(\square \)
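
The set \(R_{\mathsf {req}}(r\,{\bowtie }_{\delta }r')\) used at the start of this proof can be enumerated directly for a single pair of ranges. The following is a minimal sketch under stated assumptions: ranges are `(min, max)` tuples of integers, the function names are hypothetical, and \(mat_\mathsf {min}(r,r')\) is assumed to be \((r'.\mathsf {min}-r.\mathsf {min})/2\) by symmetry with the value of \(mat_\mathsf {max}\) given in the proof (the appendix does not state it).

```python
from fractions import Fraction
from math import ceil, floor


def shift(r, offset):
    """r ⊕ offset: move both endpoints of the range (min, max) by offset."""
    return (r[0] + offset, r[1] + offset)


def required_delta_join(r, r_prime, delta):
    """Enumerate {r ⊕ z*delta | mat_max <= z*delta <= mat_min, z integer}."""
    # mat_max(r, r') = (r'.max - r.max) / 2, as used in the proof of Lemma 5.
    mat_max = Fraction(r_prime[1] - r[1], 2)
    # ASSUMPTION: mat_min mirrors mat_max on the lower endpoints.
    mat_min = Fraction(r_prime[0] - r[0], 2)
    z_lo = ceil(mat_max / delta)   # smallest integer z with z*delta >= mat_max
    z_hi = floor(mat_min / delta)  # largest integer z with z*delta <= mat_min
    return [shift(r, z * delta) for z in range(z_lo, z_hi + 1)]


# Consecutive results differ by exactly delta, i.e. they form a delta-shifted set.
print(required_delta_join((0, 10), (4, 6), 1))
# [(-2, 8), (-1, 9), (0, 10), (1, 11), (2, 12)]
```

Exact rational arithmetic (`Fraction`) is used so that the rounding of the two bounds to multiples of `delta` is not affected by floating-point error.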

Proof of Lemma 6

We prove each case as follows:

(a) When every range in R does not contain any range in \(R'\): We break the proof into two subcases.

(a-1) When \(r_{1}.{\mathsf {min}} <r'_{1}.{\mathsf {min}}\): Suppose \(r_{m}.{\mathsf {min}} \ge r'_{1}.{\mathsf {min}}\) holds. Since R is a \(\delta \)-shifted range set, there always exists a range \(r_j \in R\) such that \(r_{j}.{\mathsf {min}} =r'_{1}.{\mathsf {min}}\). Then \(r_j\) contains \(r'_1\), which is a contradiction. Thus, we have \(r_{m}.{\mathsf {min}} <r'_{1}.{\mathsf {min}}\). In this case, \(r_m\) and \(r'_1\) form the pair of ranges, one from R and one from \(R'\), whose minimum values are the closest among all such pairs.

For a pair of ranges \(r_j\in R\) and \(r'_k\in R'\), let \(\mathtt{mbr}(\{r_j, r'_k\})=[e_\mathsf {min}, e_\mathsf {max}]\). Then, we have the property \(e_\mathsf {min} \le \min (r_{m}.{\mathsf {min}}, r'_{1}.{\mathsf {min}})\) from the following inequalities.

$$\begin{aligned} e_\mathsf {min}&= \min (r_{j}.{\mathsf {min}},r'_{k}.{\mathsf {min}})&(\text {by Definition}~6) \\&\le \min (r_{m}.{\mathsf {min}},r'_{1}.{\mathsf {min}})&(\text {since }r_j.\mathsf {min}\le r_{m}.{\mathsf {min}}\le r'_{1}.{\mathsf {min}}) \end{aligned}$$

Symmetrically, we can show \( \max (r_{m}.{\mathsf {max}},r'_{1}.{\mathsf {max}}) \le e_\mathsf {max} \).

Since \(e_\mathsf {min} \le \min (r_{m}.{\mathsf {min}}, r'_{1}.{\mathsf {min}})\), \(e_\mathsf {max} \ge \max (r_{m}.{\mathsf {max}}, r'_{1}.{\mathsf {max}})\) and \([\min (r_{m}.{\mathsf {min}},r'_{1}.{\mathsf {min}}), \max (r_{m}.{\mathsf {max}},r'_{1}.{\mathsf {max}})]=\mathtt{mbr}(\{r_m, r'_1\})\), \(\mathtt{mbr}(\{r_j, r'_k\})\) always contains \(\mathtt{mbr}(\{r_m, r'_1\})\). That is, every range in \(R\,{\bowtie }_\mathsf {mbr}R'\) contains \(\mathtt{mbr}(\{r_m, r'_1\})\). Thus, by Definition 6, the required range set of \(R\,{\bowtie }_\mathsf {mbr}R'\) becomes \(\{\mathtt{mbr}(\{r_m, r'_1\})\}\).

(a-2) When \(r_{1}.{\mathsf {min}} \ge r'_{1}.{\mathsf {min}}\): We omit the proof since it is similar to that of case (a-1).

(b) When there exists a range in R containing a range in \(R'\): Let \(r'_{k_\mathsf {F}}\) be the range in \(R'\) which has the smallest minimum value among all ranges contained by \(r_{j_\mathsf {F}}\). We first show that, for every pair of ranges \(r_j\in R\) and \(r'_k\in R'\) satisfying \(j\le j_\mathsf {F}\) and \(k\le k_\mathsf {F}\), \(\mathtt{mbr}(\{r_j, r'_k\})\) contains \(\mathtt{mbr}(\{r_{j_\mathsf {F}}, r'_{k_\mathsf {F}}\})\). This implies that no such \(\mathtt{mbr}(\{r_j, r'_k\})\) other than \(\mathtt{mbr}(\{r_{j_\mathsf {F}}, r'_{k_\mathsf {F}}\})\) itself is included in \(R_\mathsf {req}(R\,{\bowtie }_\mathsf {mbr}R')\). We consider two subcases: (b-1) \(j_\mathsf {F}=1\) and (b-2) \(j_\mathsf {F}>1\).

(b-1) When \(j_\mathsf {F}=1\): Since \(\mathtt{mbr}(\{r_{1}, r'_{k}\})\) always contains \(r_{1}\) and \(\mathtt{mbr}(\{r_{1}, r'_{k_\mathsf {F}}\})=r_{1}\), every \(\mathtt{mbr}(\{r_{1}, r'_{k}\})\) with \(k\le k_\mathsf {F}\) contains \(\mathtt{mbr}(\{r_{1}, r'_{k_\mathsf {F}}\})\).

(b-2) When \(j_\mathsf {F}>1\): If \(r_{j_\mathsf {F}}.\mathsf {max}<r'_1.\mathsf {max}\), \(r_{j_\mathsf {F}}\) cannot contain any range in \(R'\). If \(r_{j_\mathsf {F}}.\mathsf {max}>r'_1.\mathsf {max}\), \(r_{j_\mathsf {F}-1}\) contains \(r'_1\), which is a contradiction. Thus, we get \(r_{j_\mathsf {F}}.\mathsf {max}=r'_1.\mathsf {max}\) and \(k_\mathsf {F}=1\). Then, for every \(\mathtt{mbr}(\{r_j, r'_1\})\) with \(j\le j_\mathsf {F}\), since \(\mathtt{mbr}(\{r_j, r'_1\}).\mathsf {max}=r'_1.\mathsf {max}=\mathtt{mbr}(\{r_{j_\mathsf {F}}, r'_1\}).\mathsf {max}\) and \(\mathtt{mbr}(\{r_j, r'_1\}).\mathsf {min}=r_j.\mathsf {min}\le \mathtt{mbr}(\{r_{j_\mathsf {F}}, r'_1\}).\mathsf {min}\), \(\mathtt{mbr}(\{r_j, r'_1\})\) contains \(\mathtt{mbr}(\{r_{j_\mathsf {F}}, r'_{1}\})\).

For every pair of ranges \(r_j\in R\) and \(r'_k\in R'\) satisfying \(j\ge j_\mathsf {R}\) and \(k\ge k_\mathsf {R}\), we can symmetrically show that \(\mathtt{mbr}(\{r_j, r'_k\})\) contains \(\mathtt{mbr}(\{r_{j_\mathsf {R}}, r'_{k_\mathsf {R}}\})\). Thus, we need to consider \(\mathtt{mbr}(\{r_j, r'_k\})\)s with \(j_\mathsf {F}\le j\le j_\mathsf {R}\) and \(k_\mathsf {F}\le k\le k_\mathsf {R}\). Since a range \(r_j\in R\) contains a range \(r'_k\in R'\), \(\mathtt{mbr}(\{r_j, r'_k\})=r_j\) and it is contained by \(\mathtt{mbr}(\{r_j, r'_{k'}\})\) with every \(r'_{k'}\in R'\), the required range set of \(\{\mathtt{mbr}(\{r_j, r'_k\})\,|\,j_\mathsf {F}\le j\le j_\mathsf {R},k_\mathsf {F}\le k\le k_\mathsf {R}\}\) becomes \(\{r_j\,|\,j_\mathsf {F}\le j\le j_\mathsf {R}\}\). \(\square \)
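
Both cases of Lemma 6 can be checked by brute force on small inputs. The sketch below is illustrative only and rests on assumptions not spelled out in this appendix: ranges are `(min, max)` tuples, a \(\delta \)-shifted range set is a list of equal-width ranges whose endpoints advance by \(\delta =1\), the ranges of R are wider than those of \(R'\), and the required range set keeps exactly the ranges that do not contain another range of the set (our reading of how the proofs use \(R_\mathsf {req}\)). All function names are hypothetical.

```python
from itertools import product


def mbr(a, b):
    """Minimum bounding range of two ranges given as (min, max) tuples."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def contains(outer, inner):
    """True if outer covers inner on both endpoints."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def required(ranges):
    """ASSUMED reading of R_req: drop every range that contains another one."""
    uniq = set(ranges)
    return sorted(r for r in uniq
                  if not any(s != r and contains(r, s) for s in uniq))


def mbr_join(R, R_prime):
    """All pairwise minimum bounding ranges of R and R_prime."""
    return [mbr(r, rp) for r, rp in product(R, R_prime)]


# Case (a): no range of R contains a range of R'.
R_a = [(0, 4), (1, 5), (2, 6)]
Rp_a = [(5, 7), (6, 8)]
case_a = required(mbr_join(R_a, Rp_a))
print(case_a)  # [(2, 7)], i.e. mbr of the rear range of R and the front range of R'

# Case (b): the first four ranges of R each contain some range of R'.
R_b = [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8)]
Rp_b = [(2, 4), (3, 5)]
case_b = required(mbr_join(R_b, Rp_b))
print(case_b)  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

In case (b) the brute-force result is exactly the ranges \(r_j\) with \(j_\mathsf {F}\le j\le j_\mathsf {R}\) (here \(j_\mathsf {F}=1\) and \(j_\mathsf {R}=4\)), so the quadratic join never needs to be materialized.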

Proof of Lemma 7

Consider a pair of ranges \(r_1\in R\) and \(r_2\in R\) such that \(r_1\) contains \(r_2\). By Definition 6, \(\mathtt{mbr}(\{r_1, r_3\})\) contains \(\mathtt{mbr}(\{r_2, r_3\})\) for every range \(r_3\in R'\), and hence \(\mathtt{mbr}(\{r_1, r_3\})\not \in R_\mathsf {req}(\{\mathtt{mbr}(\{r, r'\})\,|\,r\in R,r'\in R'\})\) by Definition 7. Thus, \(R_\mathsf {req}(\{\mathtt{mbr}(\{r, r'\})\,|\,r\in R,r'\in R'\})=R_\mathsf {req}(\{\mathtt{mbr}(\{r, r'\})\,|\,r\in R_\mathsf {req}(R),r'\in R'\})\). Symmetrically, we can show \(R_\mathsf {req}(\{\mathtt{mbr}(\{r, r'\})\,|\,r\in R_\mathsf {req}(R),r'\in R'\})=R_\mathsf {req}(\{\mathtt{mbr}(\{r, r'\})\,|\,r\in R_\mathsf {req}(R),r'\in R_\mathsf {req}(R')\})\). Thus, \(R_\mathsf {req}(R\,{\bowtie }_\mathsf {mbr}R')=R_\mathsf {req}(R_\mathsf {req}(R)\,{\bowtie }_\mathsf {mbr}R_\mathsf {req}(R'))\). We can similarly prove \(R_\mathsf {req}(R\,{\bowtie }_{\delta }R')=R_\mathsf {req}(R_\mathsf {req}(R)\,{\bowtie }_{\delta } R_\mathsf {req}(R'))\). \(\square \)
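
The \(\mathtt{mbr}\) half of Lemma 7 says that inputs may be pruned to their required range sets before joining. A small self-contained check, again a sketch rather than the paper's implementation: ranges are `(min, max)` tuples, `required` encodes the assumed reading that \(R_\mathsf {req}\) drops every range containing another range of the set, and the input sets are arbitrary illustrative values.

```python
from itertools import product


def mbr(a, b):
    """Minimum bounding range of two (min, max) ranges."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def required(ranges):
    """ASSUMED reading of R_req: keep ranges that contain no other range."""
    uniq = set(ranges)
    return sorted(r for r in uniq
                  if not any(s != r and r[0] <= s[0] and s[1] <= r[1]
                             for s in uniq))


def mbr_join(R, R_prime):
    """All pairwise minimum bounding ranges of R and R_prime."""
    return [mbr(r, rp) for r, rp in product(R, R_prime)]


R = [(0, 10), (2, 6), (3, 9)]
R_prime = [(5, 7), (4, 12), (5, 6)]

full = required(mbr_join(R, R_prime))                        # join, then prune
pruned = required(mbr_join(required(R), required(R_prime)))  # prune, join, prune
print(full, pruned)  # both are [(2, 6), (3, 9)]
```

Here pruning first shrinks the join from nine pairs to two, which is the practical benefit of the lemma.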


Cite this article

Kim, J., Min, JK. & Shim, K. Efficient two-dimensional Haar\(^+\) synopsis construction for the maximum absolute error measure. The VLDB Journal 28, 675–701 (2019). https://doi.org/10.1007/s00778-019-00551-2
