
The VLDB Journal, Volume 28, Issue 5, pp 675–701

Efficient two-dimensional Haar\(^+\) synopsis construction for the maximum absolute error measure

  • Jinhyun Kim
  • Jun-Ki Min
  • Kyuseok Shim
Regular Paper

Abstract

Several wavelet synopsis construction algorithms have previously been proposed for finding optimal Haar\(^+\) synopses. Recently, we proposed the OptExtHP-EB algorithm, which finds an optimal one-dimensional Haar\(^+\) synopsis. By exploiting novel properties of optimal synopses, OptExtHP-EB represents the set of optimal synopses in a node of a Haar\(^+\) tree by a set of extended synopses. Although it is much faster than the previous Haar\(^+\) synopsis construction algorithms, it handles only one-dimensional data. In this paper, we propose the OptExtHP-EB2D algorithm, which extends OptExtHP-EB to two-dimensional Haar\(^+\) synopses. A one-dimensional Haar\(^+\) tree has only two child nodes and three coefficients per node, whereas a two-dimensional Haar\(^+\) tree is much more complex, with four child nodes and seven coefficients per node. Thus, for each possible subset of the coefficients selected in a node, we develop efficient methods to compute the set of optimal synopses, represented as extended synopses. Our experiments confirm the effectiveness of the proposed OptExtHP-EB2D algorithm.
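To make the one-dimensional setting that the abstract contrasts against more concrete, the following is a minimal Python sketch of how a given Haar\(^+\) synopsis might be decoded and scored under the maximum absolute error measure. It assumes the node semantics of the Haar\(^+\) tree of Karras and Mamoulis [17] as we understand them: the root coefficient contributes to every value, a head coefficient is added to its left subtree and subtracted from its right subtree, and each supplementary coefficient is added to one child subtree only. The node indexing (triad 1 covers the whole array; the children of triad j are 2j and 2j + 1) and the names `reconstruct` and `max_abs_error` are hypothetical choices for illustration. The sketch only evaluates a synopsis; it does not implement OptExtHP-EB or OptExtHP-EB2D.

```python
from typing import Dict, List, Sequence, Tuple

# (head, left supplementary, right supplementary) for one internal node.
Triad = Tuple[float, float, float]


def reconstruct(n: int, root: float, triads: Dict[int, Triad]) -> List[float]:
    """Decode all n = 2**k values from a sparse 1-D Haar+ synopsis.

    Hypothetical layout: triad 1 covers [0, n); the children of triad j are
    2j (left half) and 2j + 1 (right half). A missing triad means all three
    of its coefficients were dropped (treated as 0).
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    values = []
    for i in range(n):
        v = root  # the root coefficient contributes to every value
        node, lo, hi = 1, 0, n
        while hi - lo > 1:
            mid = (lo + hi) // 2
            head, left, right = triads.get(node, (0.0, 0.0, 0.0))
            if i < mid:
                # Left subtree: +head, plus the left supplementary coefficient.
                v += head + left
                node, hi = 2 * node, mid
            else:
                # Right subtree: -head, plus the right supplementary coefficient.
                v += -head + right
                node, lo = 2 * node + 1, mid
        values.append(v)
    return values


def max_abs_error(data: Sequence[float], approx: Sequence[float]) -> float:
    """Maximum absolute error: the largest |d_i - approx_i| over all i."""
    return max(abs(d - a) for d, a in zip(data, approx))


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0, 6.0]
    # Two retained coefficients: the root and triad 1's left supplementary.
    approx = reconstruct(4, 6.0, {1: (0.0, -3.0, 0.0)})
    print(approx, max_abs_error(data, approx))
```

Running it prints `[3.0, 3.0, 6.0, 6.0] 1.0`: with a budget of two coefficients, the synopsis keeps the right half exact and misses each left-half value by 1. In the two-dimensional tree, each node would instead branch into four quadrants and carry seven coefficients; the sketch does not attempt to model that layout.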

Keywords

Query processing, Data synopses, Optimal wavelet synopsis construction, Approximate query answering

Notes

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1A02937186), as well as by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017M3C4A7063570). It was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2019R1F1A1062511).

References

  1. Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: ACM SIGMOD Record, vol. 30, pp. 211–222. ACM (2001)
  2. Chakrabarti, K., Garofalakis, M., Rastogi, R., Shim, K.: Approximate query processing using wavelets. VLDB J. 10(2–3), 199–223 (2001)
  3. Cormode, G., Garofalakis, M., Sacharidis, D.: Fast approximate wavelet tracking on streams. In: International Conference on Extending Database Technology, pp. 4–22. Springer (2006)
  4. Deshpande, A., Garofalakis, M., Rastogi, R.: Independence is good: dependency-based histogram synopses for high-dimensional data. ACM SIGMOD Rec. 30(2), 199–210 (2001)
  5. Garofalakis, M., Gibbons, P.B.: Probabilistic wavelet synopses. ACM TODS 29(1), 43–90 (2004)
  6. Garofalakis, M., Kumar, A.: Deterministic wavelet thresholding for maximum-error metrics. In: PODS, pp. 166–176 (2004)
  7. Garofalakis, M., Kumar, A.: Wavelet synopses for general error metrics. ACM TODS 30(4), 888–928 (2005)
  8. Gilbert, A.C., Kotidis, Y., Muthukrishnan, S., Strauss, M.J.: One-pass wavelet decompositions of data streams. IEEE TKDE 15(3), 541–554 (2003)
  9. Guha, S.: Space efficiency in synopsis construction algorithms. In: VLDB, pp. 409–420 (2005)
  10. Guha, S.: On the space-time of optimal, approximate and streaming algorithms for synopsis construction problems. VLDB J. 17(6), 1509–1535 (2008)
  11. Guha, S., Harb, B.: Wavelet synopsis for data streams: minimizing non-Euclidean error. In: SIGKDD, pp. 88–97 (2005)
  12. Guha, S., Harb, B.: Approximation algorithms for wavelet transform coding of data streams. IEEE Trans. Inf. Theory 54(2), 811–830 (2008)
  13. Guha, S., Park, H., Shim, K.: Wavelet synopsis for hierarchical range queries with workloads. VLDB J. 17(5), 1079–1099 (2008)
  14. Jestes, J., Yi, K., Li, F.: Building wavelet histograms on large data in MapReduce. PVLDB 5(2), 109–120 (2011)
  15. Karras, P.: Optimality and scalability in lattice histogram construction. PVLDB 2(1), 670–681 (2009)
  16. Karras, P., Mamoulis, N.: One-pass wavelet synopses for maximum-error metrics. In: VLDB, pp. 421–432 (2005)
  17. Karras, P., Mamoulis, N.: The Haar+ tree: a refined synopsis data structure. In: ICDE, pp. 436–445 (2007)
  18. Karras, P., Mamoulis, N.: Hierarchical synopses with optimal error guarantees. ACM TODS 33(3), 18 (2008)
  19. Karras, P., Sacharidis, D., Mamoulis, N.: Exploiting duality in summarization with deterministic guarantees. In: SIGKDD, pp. 380–389. ACM (2007)
  20. Kim, J., Min, J.K., Shim, K.: Efficient Haar+ synopsis construction for the maximum absolute error measure. PVLDB 11(1), 40–52 (2017)
  21. Matias, Y., Vitter, J.S., Wang, M.: Wavelet-based histograms for selectivity estimation. In: SIGMOD, vol. 27, pp. 448–459. ACM (1998)
  22. Matias, Y., Vitter, J.S., Wang, M.: Dynamic maintenance of wavelet-based histograms. In: VLDB, pp. 101–110 (2000)
  23. Morton, G.M.: A computer oriented geodetic data base and a new technique in file sequencing (1966)
  24. Muralikrishna, M., DeWitt, D.J.: Equi-depth multidimensional histograms. In: ACM SIGMOD Record, vol. 17, pp. 28–36. ACM (1988)
  25. Muthukrishnan, S.: Subquadratic algorithms for workload-aware Haar wavelet synopses. In: FSTTCS, pp. 285–296 (2005)
  26. Muthukrishnan, S., Poosala, V., Suel, T.: On rectangular partitionings in two dimensions: algorithms, complexity and applications. In: International Conference on Database Theory, pp. 236–256. Springer (1999)
  27. Muthukrishnan, S., Strauss, M.: Rangesum histograms. In: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 233–242. Society for Industrial and Applied Mathematics (2003)
  28. Mytilinis, I., Tsoumakos, D., Koziris, N.: Distributed wavelet thresholding for maximum error metrics. In: SIGMOD, pp. 663–677. ACM (2016)
  29. Natsev, A., Rastogi, R., Shim, K.: WALRUS: a similarity retrieval algorithm for image databases. In: SIGMOD, vol. 28, pp. 395–406 (1999)
  30. Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: VLDB, pp. 486–495 (1997)
  31. Reiss, F., Garofalakis, M., Hellerstein, J.M.: Compact histograms for hierarchical identifiers. In: VLDB, pp. 870–881 (2006)
  32. Srivastava, U., Haas, P.J., Markl, V., Kutsch, M., Tran, T.M.: ISOMER: consistent histogram construction using query feedback. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), p. 39. IEEE (2006)
  33. Thaper, N., Guha, S., Indyk, P., Koudas, N.: Dynamic multidimensional histograms. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 428–439. ACM (2002)
  34. Vitter, J.S., Wang, M.: Approximate computation of multidimensional aggregates of sparse data using wavelets. In: SIGMOD, vol. 28, pp. 193–204. ACM (1999)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Seoul National University, Seoul, Korea
  2. Korea University of Technology and Education, Cheonan, Korea
